The 4th Place Solution to the 2019 ACM RecSys Challenge

Team Members

Kung-hsiang (Steeve), Huang (Rosetta.ai); Yi-fu, Fu; Yi-ting, Lee; Tzong-hann, Lee; Yao-chun, Chan (National Taiwan University); Yi-hui, Lee (University of Texas, Dallas); Shou-de, Lin (National Taiwan University)

Contact: steeve@rosetta.ai

Introduction

This repository contains RosettaAI's approach to the 2019 ACM RecSys Challenge (paper, writeup). Instead of treating the task as a ranking problem, we train our models with binary cross-entropy as the loss function. We implemented three models:

  1. Neural networks (based on DeepFM and this YouTube paper)
  2. LightGBM
  3. XGBoost
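To illustrate the pointwise framing, the following minimal sketch computes binary cross-entropy over the impressions of a single session and then derives a ranking by sorting the predicted click probabilities. The session data and the helper names `binary_cross_entropy` and `rank_impressions` are hypothetical and are not taken from the code in src.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over (label, probability) pairs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def rank_impressions(item_ids, probs):
    """Order one session's impressions by predicted click probability, highest first."""
    order = sorted(range(len(item_ids)), key=lambda i: probs[i], reverse=True)
    return [item_ids[i] for i in order]


# Hypothetical session: four impressions, the user clicked item 103.
impressions = [101, 102, 103, 104]
labels = [0, 0, 1, 0]                 # 1 marks the clicked item
probs = [0.10, 0.25, 0.60, 0.05]      # made-up model outputs

loss = binary_cross_entropy(labels, probs)
ranking = rank_impressions(impressions, probs)
```

Each impression is scored independently during training; the ranked list needed for a submission only appears at prediction time, when the scores within a session are sorted.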

Environment

Project Structure

├── input
├── output
├── src
└── weights

Setup

Run the following command to create the directories that match the project structure, then place the unzipped data in the input directory:

. setup.sh

Run the two Python scripts to pickle the input data and obtain the UTC offsets of the countries:

cd src
python picklization.py
python country2utc.py

To train the models on the full dataset, set debug and sub_sample to False in config.py:

class Configuration(object):

    def __init__(self):
        ...
        self.debug = False
        self.sub_sample = False
        ...

Training & Submission

All models are trained in an end-to-end fashion. To train each of the three models and generate its predictions, run the following commands:

python run_nn.py
python run_lgb.py
python run_xgb.py

The submission files are stored in the output directory.

The results generated by LightGBM alone would place us 5th on the public leaderboard. To ensemble the three models, set the output file name of each model in Merge.ipynb and run the notebook.
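As a rough illustration of what ensembling can look like, the sketch below blends per-item scores from the three models with a weighted average and re-ranks the items. This is one common blending scheme and is not necessarily what Merge.ipynb does; the scores, the weights, and the function name `blend_scores` are all assumptions made for the example.

```python
def blend_scores(model_scores, weights):
    """Weighted average of per-item scores from several models."""
    total_weight = sum(weights.values())
    items = next(iter(model_scores.values())).keys()
    return {
        item: sum(weights[m] * model_scores[m][item] for m in model_scores) / total_weight
        for item in items
    }


# Hypothetical scores for the three impressions of one session.
scores = {
    "lgb": {101: 0.20, 102: 0.50, 103: 0.30},
    "xgb": {101: 0.10, 102: 0.40, 103: 0.50},
    "nn": {101: 0.30, 102: 0.30, 103: 0.40},
}
weights = {"lgb": 0.4, "xgb": 0.4, "nn": 0.2}  # illustrative weights, not tuned values

blended = blend_scores(scores, weights)
ranking = sorted(blended, key=blended.get, reverse=True)
```

In practice, the weights would be chosen on the local validation set rather than fixed by hand.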

Performance

Model      Local Validation MRR   Public Leaderboard MRR
LightGBM   0.685787               N/A
XGBoost    0.684521               0.681128
NN         0.675206               0.672117
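For reference, mean reciprocal rank (MRR) averages, over all sessions, the reciprocal of the position at which the clicked item appears in the ranked list. The sketch below is a minimal implementation on made-up data; the function name `mean_reciprocal_rank` is hypothetical and is not taken from the code in src.

```python
def mean_reciprocal_rank(rankings, clicked_items):
    """Average of 1 / rank of the clicked item, one ranking per session."""
    total = 0.0
    for ranking, clicked in zip(rankings, clicked_items):
        if clicked in ranking:
            total += 1.0 / (ranking.index(clicked) + 1)  # ranks start at 1
        # a clicked item missing from the ranking contributes 0
    return total / len(rankings)


# Three hypothetical sessions: the clicked item is ranked 1st, 2nd, and 3rd.
rankings = [[103, 102, 101], [201, 202, 203], [301, 302, 303]]
clicked = [103, 202, 303]

mrr = mean_reciprocal_rank(rankings, clicked)
```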