This repository provide the code corresponding to the article One Class Splitting Criteria for Random Forests, and other anomaly detection algorithms.
Random Forests (RFs) are strong machine learning tools for classification and regression. However, they remain supervised algorithms, and no extension of RFs to the one-class setting has been proposed, except for techniques based on second-class sampling. This work fills this gap by proposing a natural methodology to extend standard splitting criteria to the one-class setting, structurally generalizing RFs to one-class classification. An extensive benchmark of seven state-of-the-art anomaly detection algorithms is also presented. This empirically demonstrates the relevance of our approach.
The implementation is based on a fork of scikit-learn. To have a working version of both scikit-learn and OCRF scikit-learn one can use conda to create a virtual environment specific to OCRF while keeping the original version of scikit-learn clean.
This package uses distutils, which is the default way of installing python modules. To install in your home directory, use:
python setup.py build_ext --inplace
and run your personal code inside the folder OCRF. To use OCRF outside of the OCRF folder change the environment variable PYTHONPATH or create a virtual environment with Conda.
First install conda Conda and update it:
conda update conda conda update --all
Then create a virtual environment for OCRF, activate it and install OCRF and its dependencies on the new virtual environment:
conda create -n OCRF_env python=2.7 anaconda source activate OCRF_env conda install -n OCRF_env numpy scipy cython matplotlib git clone https://github.com/ngoix/OCRF cd OCRF pip install --upgrade pip pip install pyper python setup.py install cd ..
Now OCRF is installed. To check it run the script benchmark_oneclassrf.py:
python benchmarks/benchmark_oneclassrf.py
To quit the environment and revert to the original scikit-learn use:
source deactivate
To return to the OCRF environment use:
source activate OCRF_env
scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the AUTHORS.rst file for a complete list of contributors.
It is currently maintained by a team of volunteers.
Note scikit-learn was previously referred to as scikits.learn.