SparkSMOTE

An implementation of the Synthetic Minority Oversampling Technique (SMOTE) in Spark (see original paper). SMOTE is useful for highly imbalanced datasets: it generates synthetic minority-class instances rather than duplicating existing ones.

Usage Details

Getting started

Compile the project and run it on the example data (in the data directory). The input and output paths must be specified (see Algorithmic parameters).

sbt compile
sbt package
./run

The output file contains the original dataset combined with the artificial instances generated by SMOTE.
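
To make the shape of that output concrete, here is a minimal sketch of the SMOTE interpolation step in plain Scala, without Spark. It is not this repository's code: the object name SmoteSketch, the functions synthesize and oversample, and the parameters perPoint, k and seed are all hypothetical names chosen for illustration. It assumes numeric features, Euclidean distance, and at least two minority points.

```scala
import scala.util.Random

// Illustrative sketch only; all names here are hypothetical.
object SmoteSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two feature vectors.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // One synthetic point: pick one of the k nearest minority neighbours of p
  // and interpolate at a random position on the segment between them.
  // Assumes `minority` holds at least one point other than p.
  def synthesize(p: Point, minority: Seq[Point], k: Int, rng: Random): Point = {
    val neighbours = minority.filter(_ ne p).sortBy(sqDist(p, _)).take(k)
    val n = neighbours(rng.nextInt(neighbours.length))
    val gap = rng.nextDouble() // in [0, 1)
    p.zip(n).map { case (x, y) => x + gap * (y - x) }
  }

  // Original minority points followed by `perPoint` synthetic ones per point.
  def oversample(minority: Seq[Point], perPoint: Int, k: Int, seed: Long): Seq[Point] = {
    val rng = new Random(seed)
    val synthetic =
      for (p <- minority; _ <- 1 to perPoint) yield synthesize(p, minority, k, rng)
    minority ++ synthetic
  }

  def main(args: Array[String]): Unit = {
    val minority = Seq(Array(1.0, 1.0), Array(2.0, 1.5), Array(1.5, 3.0))
    oversample(minority, perPoint = 2, k = 2, seed = 42L)
      .foreach(p => println(p.mkString(",")))
  }
}
```

With three minority points and perPoint = 2, the result holds nine points: the three originals first, then six synthetic ones, each lying on a segment between two original minority points.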

Data format

Algorithmic parameters

Parameters that MUST be specified in the "run" file:

Parameters that can be specified in the "run" file: