
Pandas dataframe goes in, XGBoost model results come out

Feature engineering (creating new features and selectively removing unwanted ones) is the most creative and fun part of training a model. What follows is usually a standard data-processing flow.
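As a minimal sketch of that step, the pandas snippet below derives one new feature and drops a column that is not wanted. The column names (`income`, `debt`, `notes`) and the derived `debt_ratio` are made up for illustration; none of this is part of xgbmagic.

```python
import pandas as pd

# Toy training frame; all column names here are hypothetical.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "income": [50000.0, 64000.0, 80000.0],
    "debt": [5000.0, 16000.0, 20000.0],
    "notes": ["a", "b", "c"],
    "TARGET": [0, 1, 0],
})

# Create a new feature from existing columns.
df["debt_ratio"] = df["debt"] / df["income"]

# Remove a column that should not be used for training.
df = df.drop(columns=["notes"])
```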

Once you're done engineering your features, xgbmagic automatically runs a standard workflow for using XGBoost to train a model on a pandas dataframe.

New features!

Iterate faster with smaller samples! Improve accuracy with ensemble learning (bagging)!
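The two ideas in that line can be pictured together: fit several models, each on a random subsample of the rows, and average their predictions. The sketch below does this with pandas sampling and a least-squares line as a stand-in learner, so that it runs without XGBoost. The helper names and the `n_models` and `sample_fraction` parameters are hypothetical, not xgbmagic's API. Classic bagging draws rows with replacement (`replace=True` in `DataFrame.sample`); this version subsamples without replacement.

```python
import numpy as np
import pandas as pd

def fit_linear(X, y):
    """Stand-in learner: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the fitted coefficients to new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def bagged_fit(df, feature_cols, target_col, n_models=5, sample_fraction=0.5, seed=0):
    """Fit one model per random row subsample (hypothetical helper)."""
    models = []
    for i in range(n_models):
        # A different seed per model gives a different subsample each time.
        sample = df.sample(frac=sample_fraction, random_state=seed + i)
        X = sample[feature_cols].to_numpy()
        y = sample[target_col].to_numpy()
        models.append(fit_linear(X, y))
    return models

def bagged_predict(models, df, feature_cols):
    """Average the predictions of all sub-models."""
    X = df[feature_cols].to_numpy()
    return np.mean([predict_linear(m, X) for m in models], axis=0)

# Synthetic data: y = 3*x + 1 plus a little noise.
rng = np.random.default_rng(42)
train = pd.DataFrame({"x": rng.uniform(0, 10, 200)})
train["y"] = 3.0 * train["x"] + 1.0 + rng.normal(0, 0.1, 200)

models = bagged_fit(train, ["x"], "y")
preds = bagged_predict(models, pd.DataFrame({"x": [0.0, 2.0]}), ["x"])
```

A smaller `sample_fraction` makes each fit cheaper, which is where the faster iteration comes from; averaging over several fits is what recovers stability.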

To do


Install xgboost first

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; make -j4
cd python-package; sudo python setup.py install

Then install xgbmagic

pip install xgbmagic


Input parameters:

predict(test_df, return_multi_outputs, return_mean_std)
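The parameter names suggest that, when several models are trained, `predict` can return either one output per model or the mean and standard deviation across them. That reading is an assumption. The sketch below only shows the aggregation itself in NumPy on made-up predictions; it is not xgbmagic's implementation.

```python
import numpy as np

# Made-up predictions from three models for four test rows (one row per model).
multi_outputs = np.array([
    [0.10, 0.80, 0.40, 0.95],
    [0.20, 0.70, 0.50, 0.90],
    [0.30, 0.90, 0.60, 1.00],
])

# Aggregate across models (axis 0): one mean and one spread per test row.
mean = multi_outputs.mean(axis=0)
std = multi_outputs.std(axis=0)
```

A large standard deviation for a row indicates that the models disagree on that row.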


import xgbmagic
import pandas as pd

# read the training data
df = pd.read_csv('train.csv')

# for logistic regression, target_type is 'binary'
target_type = 'binary'

# set columns that are categorical, numeric, and to be dropped here.
xgb = xgbmagic.Xgb(
    df, target_column='TARGET', id_column='ID', target_type=target_type,
    categorical_columns=[], drop_columns=[], numeric_columns=[],
    num_training_rounds=500, verbose=1, early_stopping_rounds=50,
)

# use the model to predict values for the test set
test_df = pd.read_csv('test.csv')
output = xgb.predict(test_df)

# write to csv

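# save model (a joblib-based sketch that pairs with the load call below)
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(xgb, 'xgbmodel.pkl')

# load model
xgb = joblib.load('xgbmodel.pkl')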
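If the predictions need to go into a CSV by hand, pandas can do it directly. This is a minimal sketch, assuming `output` holds one prediction per row of `test_df`. The IDs and values below are stand-ins, and the in-memory buffer can be swapped for a file path.

```python
import io
import pandas as pd

# Stand-ins for the test set and for the predictions returned by predict().
test_df = pd.DataFrame({"ID": [101, 102, 103]})
output = [0.12, 0.87, 0.45]

# Pair each ID with its prediction and write without the index column.
submission = pd.DataFrame({"ID": test_df["ID"], "TARGET": output})
buffer = io.StringIO()  # replace with a path such as 'output.csv' to write a file
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```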


Please report issues and feedback here