Sample CNN

A TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms"

This is a TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms", built with Keras. This repository implements only the best model from the paper (the model described in Table 1; m=3, n=9).
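To make the m=3, n=9 notation concrete, here is a rough sketch of how the time axis might shrink through such a network. It rests on assumptions that this README does not state: that m is the filter and pooling length, that n is the number of convolution-and-pooling modules, that one strided convolution of length m comes first, and that the input is m^(n+1) samples long. The function name is hypothetical and is not part of this repository.

```python
def sequence_lengths(m=3, n=9):
    """Sketch: length of the time axis after each stage of an m^n model.

    Assumption (not from this README): input of m**(n + 1) samples, one
    strided convolution of length m, then n modules that each pool by m.
    """
    length = m ** (n + 1)       # assumed input length in samples
    lengths = [length]
    length //= m                # assumed strided first convolution
    lengths.append(length)
    for _ in range(n):          # n convolution + pooling modules
        length //= m
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    print(sequence_lengths())
```

Under these assumptions the time axis collapses to a single frame after the last module, which is why the input length and the (m, n) pair have to be chosen together.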

Prerequisites

Installing required Python packages

pip install -r requirements.txt
pip install madmom

The madmom package has an install-time dependency, so it should be installed after the packages in requirements.txt.

Together, these two commands install all the required Python packages.

Installing ffmpeg

ffmpeg is required for madmom.

macOS (with Homebrew):

brew install ffmpeg

Ubuntu:

add-apt-repository ppa:mc3man/trusty-media
apt-get update
apt-get dist-upgrade
apt-get install ffmpeg

CentOS:

yum install epel-release
rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
rpm -Uvh http://li.nux.ro/download/nux/dextop/el ... noarch.rpm
yum install ffmpeg

Preparing MagnaTagATune (MTT) dataset

Download the audio data and tag annotations from here. You should then have three .zip parts and one .csv file:

mp3.zip.001
mp3.zip.002
mp3.zip.003
annotations_final.csv

The .zip files are parts of a single archive, so merge them first and then unzip the result (referenced here):

cat mp3.zip.* > mp3_all.zip
unzip mp3_all.zip

You should see 16 directories named 0 to f. Typically, 0 ~ b are used for training, c for validation, and d ~ f for testing.
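The conventional split can be written down as a small lookup table. This is only an illustrative sketch; the names ALL_DIRS, SPLITS, and split_of are hypothetical and do not come from this repository.

```python
# Directory names are the 16 hexadecimal digits "0" through "f".
ALL_DIRS = list("0123456789abcdef")

# The conventional split: 0 ~ b for training, c for validation, d ~ f for testing.
SPLITS = {
    "train": ALL_DIRS[:12],    # 0 ~ b
    "val": ALL_DIRS[12:13],    # c
    "test": ALL_DIRS[13:],     # d ~ f
}


def split_of(dir_name):
    """Return "train", "val", or "test" for an MTT directory name."""
    for split, dirs in SPLITS.items():
        if dir_name in dirs:
            return split
    raise ValueError(f"unknown MTT directory: {dir_name!r}")


if __name__ == "__main__":
    for name in ALL_DIRS:
        print(name, split_of(name))
```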

To make your life easier, place the annotation file and the 16 directories in one directory, laid out as below:

├── annotations_final.csv
└── raw
    ├── 0
    ├── 1
    ├── ...
    └── f

We will call this directory BASE_DIR. The MTT dataset is now ready.
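Before moving on, it can help to confirm that BASE_DIR really has this layout. The following is a minimal sketch that uses only the standard library; the function name missing_entries is hypothetical and not part of this repository.

```python
from pathlib import Path

# The 16 directories produced by unzipping the audio archive.
EXPECTED_DIRS = list("0123456789abcdef")


def missing_entries(base_dir):
    """Return the entries expected under base_dir that are not there."""
    base = Path(base_dir)
    missing = []
    if not (base / "annotations_final.csv").is_file():
        missing.append("annotations_final.csv")
    for name in EXPECTED_DIRS:
        if not (base / "raw" / name).is_dir():
            missing.append(f"raw/{name}")
    return missing


if __name__ == "__main__":
    import sys

    problems = missing_entries(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("layout looks fine" if not problems else f"missing: {problems}")
```

An empty result means the annotation file and all 16 audio directories are where the layout above expects them.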

Preprocessing the MTT dataset

This section describes the preprocessing that the MTT dataset requires before training. Note that it needs 57 GB of storage space.
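Since the preprocessing needs this much room, a quick free-space check beforehand can save a failed run. The sketch below assumes that "57G" means roughly 57 * 10^9 bytes and that the space is needed on the filesystem holding BASE_DIR; both are assumptions, and has_enough_space is a hypothetical helper.

```python
import shutil

# Assumption: "57G" is read as about 57 * 10**9 bytes.
REQUIRED_BYTES = 57 * 10**9


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    free_gb = shutil.disk_usage(target).free / 10**9
    print(f"{free_gb:.1f} GB free; enough: {has_enough_space(target)}")
```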

In short, the preprocessing converts the audio under raw into TFRecord files for the train, val, and test splits.

To run the preprocessing, copy a shell template and edit the copy:

cp scripts/build_mtt.sh.template scripts/build_mtt.sh
vi scripts/build_mtt.sh

Fill in the environment variables BASE_DIR, N_PROCESSES, and ENV_NAME in the copy. For example:

BASE_DIR="/path/to/mtt/basedir"
N_PROCESSES=4
ENV_NAME="sample_cnn"

And run it:

./scripts/build_mtt.sh

The script starts the preprocessing in the background and tails its output. This takes from a few minutes to an hour, depending on your machine.

The converted TFRecord files are written to ${BASE_DIR}/tfrecord. Your BASE_DIR should now look like this:

├── annotations_final.csv
├── build_mtt.log
├── labels.txt
├── raw
│   ├── 0
│   ├── ...
│   └── f
└── tfrecord
    ├── test-000-of-036.seq.tfrecords
    ├── ...
    ├── test-035-of-036.seq.tfrecords
    ├── train-000-of-128.tfrecords
    ├── ...
    ├── train-127-of-128.tfrecords
    ├── val-000-of-012.seq.tfrecords
    ├── ...
    └── val-011-of-012.seq.tfrecords
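To check that every shard was written, the file names can be regenerated from the pattern visible in the listing and compared against the directory. This is a sketch only: the shard counts and suffixes are read off the listing above, and shard_names and missing_shards are hypothetical helpers rather than functions from this repository.

```python
from pathlib import Path

# (number of shards, file suffix) per split, taken from the listing above.
EXPECTED = {
    "train": (128, ".tfrecords"),
    "val": (12, ".seq.tfrecords"),
    "test": (36, ".seq.tfrecords"),
}


def shard_names(split, num_shards, suffix):
    """Build names such as train-000-of-128.tfrecords."""
    return [
        f"{split}-{i:03d}-of-{num_shards:03d}{suffix}"
        for i in range(num_shards)
    ]


def missing_shards(tfrecord_dir):
    """Return the expected shard file names absent from tfrecord_dir."""
    present = {p.name for p in Path(tfrecord_dir).iterdir()}
    missing = []
    for split, (num_shards, suffix) in EXPECTED.items():
        for name in shard_names(split, num_shards, suffix):
            if name not in present:
                missing.append(name)
    return missing


if __name__ == "__main__":
    import sys

    print(missing_shards(sys.argv[1] if len(sys.argv) > 1 else "."))
```

If the preprocessing was interrupted, the returned list shows which shards are absent.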

Training a model from scratch

To train a model from scratch, copy the shell template and edit the copy, as in the preprocessing step:

cp scripts/train.sh.template scripts/train.sh
vi scripts/train.sh

Fill in the environment variables BASE_DIR, TRAIN_DIR, and ENV_NAME in the copy. For example:

BASE_DIR="/path/to/mtt/basedir"
TRAIN_DIR="/path/to/save/outputs"
ENV_NAME="sample_cnn"

Let's kick off the training:

./scripts/train.sh

The script starts the training process in the background and tails its output.

Evaluating a model

Copy the evaluation shell script template and edit the copy:

cp scripts/evaluate.sh.template scripts/evaluate.sh
vi scripts/evaluate.sh

Fill in the environment variables in the copy.

By default, the script evaluates the best model rather than the latest one. To evaluate the latest model instead, pass --best=False as an option.