AutoML

Angel's automatic machine learning toolkit.

Angel-AutoML provides automatic hyper-parameter tuning and feature engineering operators. It is written in Scala and, as a stand-alone library, can easily be integrated into Java and Scala projects.

We welcome everyone interested in machine learning to contribute code, open issues, or submit pull requests. Please refer to the Angel Contribution Guide for more details.

Hyper-parameter tuning

Strategies

Angel-AutoML provides three tuning strategies: grid search, random search, and Bayesian optimization.

Grid search and random search
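
Angel-AutoML's own grid and random search code is not reproduced in this section. The following is only a minimal Scala sketch of the two strategies, assuming a search space in which every hyper-parameter has a finite list of candidate values. `SearchSketch`, `grid`, `random`, and `best` are hypothetical names used for illustration, not Angel-AutoML APIs. Grid search enumerates the full Cartesian product of the candidate values, while random search draws a fixed number of configurations independently.

```scala
import scala.util.Random

// Illustrative sketch only: all names are hypothetical, not Angel-AutoML APIs.
object SearchSketch {
  // A configuration assigns one value to each hyper-parameter name.
  type Config = Map[String, Double]

  // Grid search: enumerate the Cartesian product of all candidate values.
  def grid(space: Map[String, Seq[Double]]): Seq[Config] =
    space.toSeq.foldLeft(Seq(Map.empty[String, Double])) { case (acc, (name, values)) =>
      for (cfg <- acc; v <- values) yield cfg + (name -> v)
    }

  // Random search: draw n configurations, picking each value uniformly from its candidates.
  def random(space: Map[String, Seq[Double]], n: Int, rng: Random): Seq[Config] =
    Seq.fill(n)(space.map { case (name, values) => name -> values(rng.nextInt(values.size)) })

  // Return the configuration with the lowest objective value (e.g. a validation loss).
  def best(configs: Seq[Config], objective: Config => Double): Config =
    configs.minBy(objective)
}
```

For example, a space with 3 learning rates and 2 regularization values yields 6 grid configurations. The grid grows multiplicatively with every added hyper-parameter, whereas random search keeps the evaluation budget `n` fixed regardless of the size of the space.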

Bayesian optimization

For Bayesian optimization (BO), Angel-AutoML implements several surrogate functions and acquisition functions.
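
This section does not list which surrogate and acquisition functions are included. As an illustration only, the sketch below computes expected improvement (EI), a widely used acquisition function, for a minimization problem, assuming the surrogate (for example, a Gaussian process) returns a posterior mean `mu` and standard deviation `sigma` for each candidate configuration. The object name and the exploration parameter `xi` are assumptions, not Angel-AutoML code. Because the Scala standard library has no `erf`, the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation.

```scala
// Illustrative sketch of the expected-improvement (EI) acquisition function for minimization.
// Not Angel-AutoML code; the object name and the `xi` parameter are assumptions.
object ExpectedImprovement {
  // Standard normal probability density function phi(z).
  def pdf(z: Double): Double = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.Pi)

  // Standard normal CDF Phi(z), using the Abramowitz & Stegun 7.1.26 approximation of erf
  // (absolute error of erf below 1.5e-7).
  def cdf(z: Double): Double = {
    val x = math.abs(z) / math.sqrt(2.0)
    val t = 1.0 / (1.0 + 0.3275911 * x)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    val erf = 1.0 - poly * math.exp(-x * x)
    if (z >= 0) 0.5 * (1.0 + erf) else 0.5 * (1.0 - erf)
  }

  // EI = (fBest - mu - xi) * Phi(z) + sigma * phi(z), with z = (fBest - mu - xi) / sigma.
  // mu and sigma are the surrogate's posterior mean and standard deviation at a candidate;
  // fBest is the lowest objective value observed so far; xi >= 0 encourages exploration.
  def apply(mu: Double, sigma: Double, fBest: Double, xi: Double = 0.0): Double = {
    val improvement = fBest - mu - xi
    if (sigma <= 0.0) math.max(improvement, 0.0) // no uncertainty: plain improvement
    else {
      val z = improvement / sigma
      // Clamp tiny negative values that the erf approximation can produce far in the tails.
      math.max(improvement * cdf(z) + sigma * pdf(z), 0.0)
    }
  }
}
```

A BO step evaluates EI for many candidate configurations and proposes the one with the largest value. A larger `sigma` raises EI even when `mu` is no better than the current best, which is how EI balances exploring uncertain regions against exploiting promising ones.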

Usage

The tuning component of Angel-AutoML provides easy-to-use interfaces; users can integrate it into their programs with fewer than 10 lines of code.
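
The interface itself is not shown in this section. The sketch below illustrates the kind of suggest-evaluate-feed loop such an interface typically supports, assuming a hypothetical `Tuner` trait with `suggest`, `feed`, and `optimal` methods and a toy `RandomTuner` that samples continuous ranges. These names are illustrative and are not Angel-AutoML's actual API. The body of `TuningLoop.run` is the part a user program would contain: propose configurations, evaluate them, and report the results back.

```scala
import scala.util.Random

// Hypothetical suggest/feed interface; the names are illustrative, not Angel-AutoML's API.
trait Tuner {
  def suggest(batchSize: Int): Seq[Map[String, Double]]
  def feed(configs: Seq[Map[String, Double]], results: Seq[Double]): Unit
  def optimal: Option[(Map[String, Double], Double)]
}

// Toy tuner: samples each parameter uniformly from [lo, hi] and tracks the lowest result.
final class RandomTuner(space: Map[String, (Double, Double)], seed: Long) extends Tuner {
  private val rng = new Random(seed)
  private var best: Option[(Map[String, Double], Double)] = None

  def suggest(batchSize: Int): Seq[Map[String, Double]] =
    Seq.fill(batchSize)(space.map { case (name, (lo, hi)) => name -> (lo + rng.nextDouble() * (hi - lo)) })

  def feed(configs: Seq[Map[String, Double]], results: Seq[Double]): Unit =
    configs.zip(results).foreach { case (cfg, y) =>
      if (best.forall(_._2 > y)) best = Some(cfg -> y) // keep the minimum seen so far
    }

  def optimal: Option[(Map[String, Double], Double)] = best
}

object TuningLoop {
  def run(tuner: Tuner, objective: Map[String, Double] => Double,
          iterations: Int, batchSize: Int): Option[(Map[String, Double], Double)] = {
    for (_ <- 0 until iterations) {
      val configs = tuner.suggest(batchSize) // 1. propose candidate configurations
      val results = configs.map(objective)   // 2. evaluate them, e.g. train and validate a model
      tuner.feed(configs, results)           // 3. report the results back to the tuner
    }
    tuner.optimal
  }
}
```

Because the loop depends only on the `Tuner` trait, replacing random sampling with a Bayesian optimization tuner would not change the calling code. Keeping tuning as a suggest/feed exchange, rather than a single blocking call, also leaves the user in control of how and where each configuration is evaluated.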

Feature engineering

Feature engineering, such as feature selection and feature synthesis, is of great importance in industrial applications of machine learning. Angel-AutoML implements useful feature engineering operators on top of Spark MLlib, and they can easily be assembled into a Spark ML pipeline.

Feature selection

Because Spark MLlib provides only a limited set of feature selection operators, we extend Spark with two categories of operators.
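
The two operator categories are not described in the text that survives here. As a generic illustration of what a filter-style selector does, the sketch below scores each feature column by its absolute Pearson correlation with the label and keeps the top `k`. `TopKSelector` and its methods are hypothetical names, and this is not one of Angel-AutoML's operators; Spark MLlib's built-in `ChiSqSelector` follows the same score-then-select pattern using a chi-squared test.

```scala
// Generic filter-style feature selection sketch; not an Angel-AutoML operator.
object TopKSelector {
  // Pearson correlation between one feature column and the label (0.0 if either is constant).
  def pearson(x: Array[Double], y: Array[Double]): Double = {
    val n = x.length
    val mx = x.sum / n
    val my = y.sum / n
    var sxy = 0.0; var sxx = 0.0; var syy = 0.0
    for (i <- 0 until n) {
      val dx = x(i) - mx
      val dy = y(i) - my
      sxy += dx * dy; sxx += dx * dx; syy += dy * dy
    }
    if (sxx == 0.0 || syy == 0.0) 0.0 else sxy / math.sqrt(sxx * syy)
  }

  // Indices of the k features with the highest |correlation| with the label, in ascending order.
  def select(rows: Array[Array[Double]], label: Array[Double], k: Int): Array[Int] = {
    val numFeatures = rows.head.length
    val scores = (0 until numFeatures).map(j => j -> math.abs(pearson(rows.map(_(j)), label)))
    scores.sortBy(-_._2).take(k).map(_._1).sorted.toArray
  }
}
```

A filter selector like this scores each feature independently of any model, which makes it cheap to run on wide data but blind to interactions between features.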

Feature synthesis

Most online recommendation systems use linear models such as logistic regression because of their high throughput and low latency. However, logistic regression needs manually crafted cross features to achieve high accuracy, which makes automatic feature synthesis essential. Existing automatic feature synthesis methods simply generate high-order cross features through the Cartesian product of the original features, which leads to the curse of dimensionality. We therefore propose Auto Feature Synthesis (AFS), an iterative approach to generating high-order features.
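
To make the dimensionality problem concrete: crossing k categorical fields produces one new feature value per combination of their values, so the number of k-th order cross features is the sum, over every choice of k distinct fields, of the product of their cardinalities. The sketch below computes that count; `CrossDimension` is a hypothetical helper and the cardinalities in the example are made up for illustration.

```scala
// Counts k-th order cross features produced by an exhaustive Cartesian product.
object CrossDimension {
  // For every combination of `order` distinct fields (chosen by index, so equal cardinalities
  // are still treated as different fields), add the product of their cardinalities.
  def crossDim(cardinalities: Seq[Long], order: Int): BigInt =
    cardinalities.indices
      .combinations(order)
      .map(fields => fields.map(i => BigInt(cardinalities(i))).product)
      .sum
}
```

With four hypothetical fields of 1,000 values each, `CrossDimension.crossDim(Seq.fill(4)(1000L), 2)` gives 6,000,000 second-order features and `CrossDimension.crossDim(Seq.fill(4)(1000L), 3)` gives 4,000,000,000 third-order ones, which is why exhaustively enumerating high-order crosses quickly becomes impractical.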

Automatic feature synthesis

In AFS, each iteration consists of two stages:

The figure above shows an example of an AFS iteration.
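
The details of the two AFS stages are not given in this text, so the sketch below does not reproduce AFS. It only illustrates, under stated assumptions, the general idea motivated above: raise feature order by one per iteration while pruning candidates, so their number stays bounded instead of growing with the full Cartesian product. It assumes that each iteration crosses every kept feature with one additional base field and that a caller-supplied `score` function ranks the candidates; `CrossAndPrune`, `iterate`, `run`, and `score` are all hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Generic cross-and-prune loop for illustration; not the AFS algorithm itself.
object CrossAndPrune {
  // A feature is identified by the set of base fields it crosses, e.g. Set("age", "city").
  type Feature = Set[String]

  // One iteration: cross each kept feature with every base field it does not already contain
  // (raising its order by one), then keep only the topK candidates ranked by `score`.
  def iterate(current: Seq[Feature], baseFields: Seq[String], topK: Int,
              score: Feature => Double): Seq[Feature] = {
    val candidates = (for (f <- current; b <- baseFields if !f.contains(b)) yield f + b).distinct
    candidates.sortBy(f => -score(f)).take(topK)
  }

  // Start from first-order features and run `iterations` rounds, collecting every kept feature.
  def run(baseFields: Seq[String], iterations: Int, topK: Int,
          score: Feature => Double): Seq[Feature] = {
    var frontier: Seq[Feature] = baseFields.map(b => Set(b))
    val selected = ArrayBuffer.empty[Feature]
    for (_ <- 1 to iterations) {
      frontier = iterate(frontier, baseFields, topK, score)
      selected ++= frontier
    }
    selected.toSeq
  }
}
```

Each round considers at most `topK * baseFields.size` candidates rather than every k-field combination. How candidates are scored (for example, by the validation gain of a model trained with them) is left open in this sketch.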