Feature Engine

Supported Python versions: 3.6, 3.7, 3.8

Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Its transformers follow the Scikit-learn convention of fit() and transform() methods: fit() first learns the transformation parameters from the data, and transform() then applies them to the data.
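To illustrate the fit() / transform() pattern, here is a minimal sketch in plain Python. The class `MeanImputerSketch` is hypothetical and is not part of Feature-engine; it only shows how a parameter learned in fit() is reused in transform(), including on data the transformer has not seen before.

```python
from statistics import mean


class MeanImputerSketch:
    """Hypothetical transformer for illustration; not part of Feature-engine."""

    def fit(self, values):
        # Learn the parameter (the mean of the observed values) from the data.
        observed = [v for v in values if v is not None]
        self.mean_ = mean(observed)
        return self

    def transform(self, values):
        # Apply the learned parameter: replace missing entries with the mean.
        return [self.mean_ if v is None else v for v in values]


train = [1.0, None, 3.0]
test = [None, 10.0]

# The mean is learned from the training data only...
imputer = MeanImputerSketch().fit(train)

# ...and then reused to transform both the training and the test data.
train_imputed = imputer.transform(train)
test_imputed = imputer.transform(test)

print(train_imputed)
print(test_imputed)
```

Learning the parameters in fit() and applying them in transform() keeps information from the test data out of the learned parameters.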

Feature-engine is featured in the following resources:

Feature Engineering for Machine Learning, Online Course

Python Feature Engineering Cookbook

Documentation

Feature-engine's transformers currently include functionality for:

Imputing Methods

Encoding Methods

Outlier Handling methods

Discretisation methods

Variable Transformation methods

Installing

pip install feature_engine

or

git clone https://github.com/solegalli/feature_engine.git

Usage

>>> import pandas as pd
>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder

>>> # Toy dataset: 'C' and 'D' each appear in fewer than 10% of the 23 rows
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64

>>> # Group the infrequent categories under the single label 'Rare'
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
A       10
B       10
Rare     3
Name: var_A, dtype: int64
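
The grouping in the example above can be reproduced by hand. The sketch below is a plain-Python approximation, not the library's implementation. The function `group_rare_labels` and its `replace_with` argument are hypothetical. It assumes that `tol` is a minimum relative frequency and that `n_categories` is the number of distinct categories a variable must exceed before any grouping is applied.

```python
from collections import Counter


def group_rare_labels(values, tol=0.10, n_categories=3, replace_with="Rare"):
    """Hypothetical helper: relabel categories whose relative frequency is below tol."""
    counts = Counter(values)

    # Assumption: variables with n_categories or fewer distinct labels are left as-is.
    if len(counts) <= n_categories:
        return list(values)

    total = len(values)
    # Keep the labels whose share of the observations is at least tol.
    frequent = {label for label, count in counts.items() if count / total >= tol}
    return [v if v in frequent else replace_with for v in values]


values = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
encoded = group_rare_labels(values, tol=0.10, n_categories=3)
print(Counter(encoded))
```

Here 'C' (2 of 23 rows) and 'D' (1 of 23 rows) both fall below the 10% threshold, so they are merged into a single 'Rare' group of 3 observations, matching the encoded counts shown above.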

See more usage examples in the Jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io

Contributing

Local Setup Steps

Opening Pull Requests

PRs are welcome! Please make sure the CI tests pass on your branch.

License

BSD 3-Clause

Authors

References

Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.

To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering for Machine Learning, Online Course.

For a summary of the methods, check this presentation and this article.

To stay informed of the latest releases, sign up at trainindata.