A TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms"
This is a TensorFlow implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms" using Keras. This repository implements only the best model from the paper (the model described in Table 1; m=3, n=9).
ffmpeg (required for madmom)

This will install the required packages:

pip install -r requirements.txt
pip install madmom

The madmom package has an install-time dependency, so it should be installed after the packages in requirements.txt.

ffmpeg is required for madmom. To install it:
macOS:

brew install ffmpeg

Ubuntu:

add-apt-repository ppa:mc3man/trusty-media
apt-get update
apt-get dist-upgrade
apt-get install ffmpeg

CentOS:

yum install epel-release
rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
rpm -Uvh http://li.nux.ro/download/nux/dextop/el ... noarch.rpm
yum install ffmpeg
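After installing, you can sanity-check that ffmpeg is discoverable on your PATH. This small helper is not part of the repository; it is just an illustrative sketch:

```python
import shutil


def has_ffmpeg() -> bool:
    """Return True if the ffmpeg binary is discoverable on PATH."""
    return shutil.which("ffmpeg") is not None
```

If this returns False, madmom's audio decoding will fail, so fix your PATH before moving on.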
Download audio data and tag annotations from here. Then you should see 3 .zip files and 1 .csv file:
mp3.zip.001
mp3.zip.002
mp3.zip.003
annotations_final.csv
To unzip the .zip files, merge and unzip them (referenced here):
cat mp3.zip.* > mp3_all.zip
unzip mp3_all.zip
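The `cat mp3.zip.* > mp3_all.zip` step simply concatenates the split-archive parts in order. If you prefer, the same merge can be done in Python (a sketch, not repository code):

```python
import glob
import shutil


def merge_parts(pattern: str, out_path: str) -> None:
    """Concatenate split-archive parts (in sorted order) into one file,
    equivalent to `cat mp3.zip.* > mp3_all.zip`."""
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(pattern)):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)

# merge_parts("mp3.zip.*", "mp3_all.zip")
```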
You should see 16 directories named 0 to f. Typically, 0 ~ b are used for training, c for validation, and d ~ f for testing.
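The conventional split above can be expressed as a small helper that maps each top-level directory (a hex digit, 0 to f) to its split. This is a sketch of the convention, not code from the repository:

```python
def split_of(directory: str) -> str:
    """Map a top-level MTT directory name ('0'..'f') to its conventional split."""
    index = int(directory, 16)  # hex digit -> 0..15
    if index <= 0xb:            # '0'..'b' -> training
        return "train"
    if index == 0xc:            # 'c' -> validation
        return "val"
    return "test"               # 'd'..'f' -> test
```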
To make your life easier, place them in a directory as below:
├── annotations_final.csv
└── raw
├── 0
├── 1
├── ...
└── f
We will call this directory BASE_DIR. Preparing the MTT dataset is done!
This section describes a required preprocessing step for the MTT dataset. Note that it requires 57 GB of storage space.
The preprocessing does the following:

- reads the tag annotations from annotations_final.csv
- segments each audio clip to a 59049-sample length
- converts the results to TFRecord files
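The 59049-sample segmentation can be sketched as follows. This is a simplified illustration under the assumption of non-overlapping segments, not the repository's actual code:

```python
import numpy as np

SEGMENT_LENGTH = 59049  # 3**10 samples, the model's input length


def segment(waveform: np.ndarray, length: int = SEGMENT_LENGTH) -> np.ndarray:
    """Chop a 1-D waveform into non-overlapping segments, dropping the tail."""
    n_segments = len(waveform) // length
    return waveform[: n_segments * length].reshape(n_segments, length)
```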
To run the preprocessing, copy a shell template and edit the copy:
cp scripts/build_mtt.sh.template scripts/build_mtt.sh
vi scripts/build_mtt.sh
You should fill in the environment variables:

- BASE_DIR: the directory containing the annotations_final.csv file and the raw directory
- N_PROCESSES: the number of processes to use; the preprocessing uses multiprocessing
- ENV_NAME: (optional) if you use virtualenv or conda to create a separate environment, write your environment name

The following is an example:
BASE_DIR="/path/to/mtt/basedir"
N_PROCESSES=4
ENV_NAME="sample_cnn"
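N_PROCESSES controls how the work is fanned out across a process pool. Conceptually, the preprocessing distributes audio files over workers roughly like this hypothetical sketch (the function name and body are placeholders, not repository code):

```python
from multiprocessing import Pool


def process_file(path: str) -> str:
    # Placeholder for the real work: decode the audio, segment it,
    # and write the segments to a TFRecord shard.
    return path.upper()


def run(paths, n_processes: int = 4):
    """Distribute the per-file work over a pool of N_PROCESSES workers."""
    with Pool(processes=n_processes) as pool:
        return pool.map(process_file, paths)
```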
And run it:
./scripts/build_mtt.sh
The script will automatically run a process in the background and tail the output the process prints. This will take between a few minutes and an hour, depending on your hardware.
The converted TFRecord files will be located in ${BASE_DIR}/tfrecord. Now, your BASE_DIR's structure should look like this:
├── annotations_final.csv
├── build_mtt.log
├── labels.txt
├── raw
│ ├── 0
│ ├── ...
│ └── f
└── tfrecord
├── test-000-of-036.seq.tfrecords
├── ...
├── test-035-of-036.seq.tfrecords
├── train-000-of-128.tfrecords
├── ...
├── train-127-of-128.tfrecords
├── val-000-of-012.seq.tfrecords
├── ...
└── val-011-of-012.seq.tfrecords
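The shard file names above follow a `<split>-<index>-of-<total>` pattern (with a `.seq.tfrecords` extension for the sequence-encoded validation and test shards). For reference, the naming scheme can be generated like this (a sketch of the pattern, not repository code):

```python
def shard_names(split: str, total: int, seq: bool = False) -> list:
    """Generate TFRecord shard names such as 'train-000-of-128.tfrecords'."""
    ext = ".seq.tfrecords" if seq else ".tfrecords"
    return [f"{split}-{i:03d}-of-{total:03d}{ext}" for i in range(total)]
```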
To train a model from scratch, copy a shell template and edit the copy, as above:
cp scripts/train.sh.template scripts/train.sh
vi scripts/train.sh
And fill in the environment variables:

- BASE_DIR: the directory containing the tfrecord directory
- TRAIN_DIR: where to save your trained model and the summaries for visualizing your training with TensorBoard
- ENV_NAME: (optional) if you use virtualenv or conda to create a separate environment, write your environment name

The following is an example:
BASE_DIR="/path/to/mtt/basedir"
TRAIN_DIR="/path/to/save/outputs"
ENV_NAME="sample_cnn"
Let's kick off the training:
./scripts/train.sh
The script will automatically run a process in the background and tail the output the process prints.
Copy an evaluating shell script template and edit the copy:
cp scripts/evaluate.sh.template scripts/evaluate.sh
vi scripts/evaluate.sh
Fill in the environment variables:

- BASE_DIR: the directory containing the tfrecord directory
- CHECKPOINT_DIR: where you saved your model (TRAIN_DIR when training)
- ENV_NAME: (optional) if you use virtualenv or conda to create a separate environment, write your environment name

The script evaluates the best model, not the latest one. If you want to evaluate the latest model, pass --best=False as an option.