PyODDS

Official Website: http://pyodds.com/

PyODDS is an end-to-end Python system for outlier detection with database support. It provides outlier detection algorithms for users in different fields, with or without a data science or machine learning background. PyODDS can execute machine learning algorithms in-database, without moving data out of the database server or over the network. It also provides access to a wide range of outlier detection algorithms, from statistical analysis to more recent deep learning based approaches. It is developed by DATA Lab at Texas A&M University.

PyODDS is featured for:

The full API reference can be found in the handbook.

API Demo:

from utils.import_algorithm import algorithm_selection
from utils.utilities import output_performance, connect_server, query_data

# connect to the database
conn, cursor = connect_server(host, user, password)

# query data from a specific time range
data = query_data(database_name, table_name, start_time, end_time)

# train the anomaly detection algorithm
clf = algorithm_selection(algorithm_name)
clf.fit(X_train)

# get outlier results and scores
prediction_result = clf.predict(X_test)
outlierness_score = clf.decision_function(X_test)

# visualize the prediction result
visualize_distribution(X_test, prediction_result, outlierness_score)
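
The demo above needs a running database, so the fit / predict / decision_function pattern it relies on is hard to try in isolation. The sketch below mimics that pattern with a toy one-dimensional detector. `ToyZScoreDetector`, its `threshold` parameter, and the -1/1 label convention are assumptions made for illustration; none of them is taken from the PyODDS code base.

```python
import statistics


class ToyZScoreDetector:
    """Hypothetical stand-in for a detector; not a PyODDS class."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cut-off on the absolute z-score

    def fit(self, X):
        # Learn the mean and spread of the one-dimensional training data.
        self.mean_ = statistics.mean(X)
        self.std_ = statistics.pstdev(X) or 1.0  # avoid division by zero
        return self

    def decision_function(self, X):
        # Higher score means more outlying.
        return [abs(x - self.mean_) / self.std_ for x in X]

    def predict(self, X):
        # Assumed convention: -1 marks an outlier, 1 marks an inlier.
        return [-1 if s > self.threshold else 1
                for s in self.decision_function(X)]


X_train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
X_test = [10.1, 25.0]

clf = ToyZScoreDetector().fit(X_train)
prediction_result = clf.predict(X_test)
outlierness_score = clf.decision_function(X_test)
print(prediction_result)
```

The second test point lies far from the training data, so it receives a much larger score than the first and is the only one flagged.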

Cite this work

Yuening Li, Daochen Zha, Praveen Kumar Venugopal, Na Zou, Xia Hu. "PyODDS: An End-to-End Outlier Detection System with Automated Machine Learning"

Biblatex entry:

@inproceedings{10.1145/3366424.3383530,
    author = {Li, Yuening and Zha, Daochen and Venugopal, Praveen and Zou, Na and Hu, Xia},
    title = {PyODDS: An End-to-End Outlier Detection System with Automated Machine Learning},
    year = {2020},
    isbn = {9781450370240},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3366424.3383530},
    doi = {10.1145/3366424.3383530},
    booktitle = {Companion Proceedings of the Web Conference 2020},
    pages = {153--157},
    numpages = {5},
    keywords = {Automated Machine Learning, Outlier Detection, Open Source Package, End-to-end System},
    location = {Taipei, Taiwan},
    series = {WWW '20}
  }

Quick Start

python demo.py --ground_truth --visualize_distribution

The results are printed as follows:

connect to TDengine success
Load dataset and table
Loading cost: 0.151061 seconds
Load data successful
Start processing:
100%|████████████████████████████████████| 10/10 [00:00<00:00, 14.02it/s]
==============================
Results in Algorithm dagmm are:
accuracy_score: 0.98
precision_score: 0.99
recall_score: 0.99
f1_score: 0.99
roc_auc_score: 0.99
processing time: 15.330137 seconds
==============================
connection is closed
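
The metric names in this output match functions in `sklearn.metrics`. To make clear what each number measures, here is a dependency-free sketch that computes the same five quantities on a tiny made-up example. The label encoding (1 = outlier, 0 = normal), the data, and the helper names `binary_metrics` and `roc_auc` are illustrative assumptions, not output or code from demo.py.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels where 1 = outlier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random outlier scores above a random normal point."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 0, 1, 1]                 # made-up ground truth
y_pred = [0, 0, 0, 1, 1, 0]                 # made-up hard predictions
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.4]     # made-up outlierness scores

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the hard predictions, while ROC AUC depends only on how the scores rank outliers against normal points.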

Installation

To install the package from PyPI:

pip install pyodds

Alternatively, to install directly from the GitHub repository:

pip install git+https://github.com/datamllab/PyODDS.git

Note: PyODDS is only compatible with Python 3.6 and above.

Required Dependencies

- pandas>=0.25.0
- taos==1.4.15
- tensorflow==2.0.0b1
- numpy>=1.16.4
- seaborn>=0.9.0
- torch>=1.1.0
- luminol==0.4
- tqdm>=4.35.0
- matplotlib>=3.1.1
- scikit_learn>=0.21.3
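
A quick way to see whether an environment meets the Python requirement, and which of the packages listed above are installed, is the standard-library `importlib.metadata` module (available from Python 3.8). This is a minimal sketch: it only reports installed versions and does not compare them against the pins above. The helper name `installed_version` is hypothetical, and `scikit_learn` is looked up under its usual distribution name `scikit-learn`.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Distribution names taken from the dependency list above.
REQUIRED = ["pandas", "taos", "tensorflow", "numpy", "seaborn",
            "torch", "luminol", "tqdm", "matplotlib", "scikit-learn"]


def installed_version(name):
    """Return the installed version string, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info >= (3, 6)
print("Python >= 3.6:", python_ok)
for name in REQUIRED:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```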

To compile and package the JDBC driver source code, you need JDK 8 or higher and Apache Maven 2.7 or higher installed. To install OpenJDK 8 on Ubuntu:

sudo apt-get install openjdk-8-jdk

To install Apache Maven on Ubuntu:

sudo apt-get install maven

To install TDengine as the back-end database service, please refer to these instructions.

To enable the Python client APIs for TDengine, please follow this handbook.

To ensure that the locale in the config file is valid:

sudo locale-gen "en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
locale

To start the service after installation, in a terminal, use:

taosd

Implemented Algorithms

Statistical Based Methods

| Methods | Algorithm | Class API |
|---|---|---|
| CBLOF | Clustering-Based Local Outlier Factor | :class:`algo.cblof.CBLOF` |
| HBOS | Histogram-based Outlier Score | :class:`algo.hbos.HBOS` |
| IFOREST | Isolation Forest | :class:`algo.iforest.IFOREST` |
| KNN | k-Nearest Neighbors | :class:`algo.knn.KNN` |
| LOF | Local Outlier Factor | :class:`algo.lof.LOF` |
| OCSVM | One-Class Support Vector Machines | :class:`algo.ocsvm.OCSVM` |
| PCA | Principal Component Analysis | :class:`algo.pca.PCA` |
| RobustCovariance | Robust Covariance | :class:`algo.robustcovariance.RCOV` |
| SOD | Subspace Outlier Detection | :class:`algo.sod.SOD` |
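
To give a feel for how a distance-based method from this table assigns scores, the sketch below computes a k-nearest-neighbor outlier score in plain Python: each query point is scored by its distance to its k-th nearest training point. This is a textbook simplification and not the implementation behind `algo.knn.KNN`; the function name `knn_scores` and its parameters are hypothetical.

```python
import math


def knn_scores(train, queries, k=2):
    """Score each query by the distance to its k-th nearest training point."""
    scores = []
    for q in queries:
        # Euclidean distances to every training point, smallest first.
        dists = sorted(math.dist(q, t) for t in train)
        scores.append(dists[k - 1])
    return scores


train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
queries = [(0.5, 0.5), (5.0, 5.0)]  # one central point, one far away

scores = knn_scores(train, queries, k=2)
print(scores)
```

The far-away query gets a much larger score than the central one. A threshold on the score (or a fixed contamination rate) would then turn scores into outlier labels.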

Deep Learning Based Methods

| Methods | Algorithm | Class API |
|---|---|---|
| autoencoder | Outlier detection using replicator neural networks | :class:`algo.autoencoder.AUTOENCODER` |
| dagmm | Deep autoencoding Gaussian mixture model for unsupervised anomaly detection | :class:`algo.dagmm.DAGMM` |

Time Series Methods

| Methods | Algorithm | Class API |
|---|---|---|
| lstmad | Long short-term memory networks for anomaly detection in time series | :class:`algo.lstm_ad.LSTMAD` |
| lstmencdec | LSTM-based encoder-decoder for multi-sensor anomaly detection | :class:`algo.lstm_enc_dec_axl.LSTMED` |
| luminol | LinkedIn's luminol | :class:`algo.luminol.LUMINOL` |

APIs Cheatsheet

The full API reference can be found in the handbook.

License

You may use this software under the MIT License.