Feature-engine is a Python library with multiple transformers to engineer features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's conventions, with fit() and transform() methods that first learn the transformation parameters from the data and then transform the data.
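To illustrate the fit() / transform() pattern, here is a minimal sketch of a transformer that learns a parameter from training data and then applies it to new data. The class `MedianImputerSketch` and the `age` column are hypothetical and written only for this illustration; they are not part of Feature-engine.

```python
import pandas as pd


class MedianImputerSketch:
    """Hypothetical transformer for illustration; not a Feature-engine class."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, X, y=None):
        # Learn one parameter per variable: its median in the training data.
        self.medians_ = X[self.variables].median().to_dict()
        return self

    def transform(self, X):
        # Apply the learned medians to a copy, leaving the input untouched.
        X = X.copy()
        X[self.variables] = X[self.variables].fillna(self.medians_)
        return X


train = pd.DataFrame({"age": [20.0, 30.0, None, 40.0]})
test = pd.DataFrame({"age": [None, 50.0]})

imputer = MedianImputerSketch(variables=["age"]).fit(train)
test_imputed = imputer.transform(test)
print(test_imputed)
```

The missing value in `test` is filled with the median learned from `train`, not from `test`. Keeping the learned parameters tied to the training data is the reason for separating the two methods.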
Feature Engineering for Machine Learning, Online Course. Python Feature Engineering Cookbook
pip install feature_engine
or
git clone https://github.com/solegalli/feature_engine.git
>>> from feature_engine.categorical_encoders import RareLabelCategoricalEncoder
>>> import pandas as pd
>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
Out[1]:
A 10
B 10
C 2
D 1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelCategoricalEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
Out[2]:
A 10
B 10
Rare 3
Name: var_A, dtype: int64
See more usage examples in the Jupyter notebooks in the example folder of this repository, or in the documentation: http://feature-engine.readthedocs.io
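For intuition about what the rare label grouping above does, the following is a rough pandas-only sketch, not the library's implementation. The helper `group_rare_labels` is hypothetical. It assumes that categories with a relative frequency below `tol` are replaced by the string "Rare", and it leaves out the `n_categories` parameter entirely.

```python
import pandas as pd


def group_rare_labels(series, tol=0.10, replace_with="Rare"):
    """Hypothetical helper: replace infrequent categories with one label."""
    # Relative frequency of each category in the series.
    freqs = series.value_counts(normalize=True)
    # Assumption: a category is kept if its frequency is at least `tol`.
    frequent = freqs[freqs >= tol].index
    # Keep frequent categories; replace everything else.
    return series.where(series.isin(frequent), replace_with)


data = pd.DataFrame({"var_A": ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1})
grouped = group_rare_labels(data["var_A"], tol=0.10)
counts = grouped.value_counts().to_dict()
print(counts)
```

With 23 rows, C (2/23) and D (1/23) both fall below the 0.10 threshold, so they are merged into a single "Rare" category with 3 observations, matching the encoder output shown above.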
pip install tox
tox
If the tests pass, your local setup is complete.
PRs are welcome! Please make sure the CI tests pass on your branch.
BSD 3-Clause
Much of the engineering and encoding functionality is inspired by this series of articles from the 2009 KDD competition.
To learn more about the rationale, functionality, pros and cons of each imputer, encoder and transformer, refer to the Feature Engineering for Machine Learning, Online Course.
For a summary of the methods, check this presentation and this article.
To stay up to date with the latest releases, sign up at trainindata.