# Data workflows for the numer.ai machine learning competition
## Currently implemented
### FetchAndExtractData

Fetches the dataset zipfile and extracts the contents to `output-path`.

- `output-path`: where the datasets should be saved eventually (defaults to `./data/`)
- `dataset-path`: URI of the remote dataset
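The core of this task can be sketched with the standard library alone. The function names and the local `dataset.zip` file name below are assumptions, not taken from the source; the real task is a Luigi task that receives `output-path` and `dataset-path` as parameters.

```python
import os
import zipfile
from urllib.request import urlretrieve


def extract_dataset(zip_path, output_path):
    """Unpack every member of the downloaded archive into output_path."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(output_path)


def fetch_and_extract(dataset_path, output_path="./data/"):
    """Download the dataset zipfile from dataset_path, then extract it."""
    os.makedirs(output_path, exist_ok=True)
    # "dataset.zip" is an illustrative local name for the downloaded archive
    zip_target = os.path.join(output_path, "dataset.zip")
    urlretrieve(dataset_path, zip_target)  # fetch the remote archive
    extract_dataset(zip_target, output_path)
```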
### TrainAndPredict

Trains a Bernoulli Naïve Bayes classifier and predicts the targets. The output file is saved at `output-path` with a custom, timestamped file name.

- `output-path`: where the datasets should be saved eventually (defaults to `./data/`)
- `dataset-path`: URI of the remote dataset
### UploadPredictions

Uploads the predictions if they have not already been uploaded.

- `output-path`: where the datasets should be saved eventually (defaults to `./data/`)
- `dataset-path`: URI of the remote dataset
- `usermail`: user email
- `userpass`: user password
- `filepath`: path to the file that ought to be uploaded
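In Luigi, the "if not already uploaded" guard normally falls out of the task's output target: a task whose output exists is not re-run. One simple way to model that state for an upload is a marker file next to the predictions file. This marker scheme is an illustration of the guard, not the project's actual mechanism.

```python
import os


def already_uploaded(filepath):
    """Return True if a previous run left an upload marker for this file."""
    # the ".uploaded" marker suffix is an assumed convention
    return os.path.exists(filepath + ".uploaded")


def mark_uploaded(filepath):
    """Record a successful upload so re-running the workflow skips it."""
    with open(filepath + ".uploaded", "w") as marker:
        marker.write("done")
```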
## Prepare the project

```shell
pip install -r requirements.txt --ignore-installed
```
If not already done, create an API key here with at least the following permissions:
To run the complete pipeline:

```shell
env PYTHONPATH='.' luigi --local-scheduler --module workflow Workflow --secret="YOURSECRET" --public-id="YOURPUBLICID"
```