A Two-Stream Baseline on the Kinetics Dataset

Kinetics Training on 1 GPU in 2 Days

This is a PyTorch implementation of a two-stream network for action classification on the Kinetics dataset. We train the two streams independently on individual (or stacked) frames of RGB (appearance) and optical flow (flow) as inputs.

The objective of this repository is to establish a two-stream baseline and to ease the training process on such a large dataset.

Table of Contents

Installation

Dataset

The Kinetics dataset can be downloaded using the crawler.

Preprocess

First, we need to extract frames from the videos using ffmpeg and resave the annotations so that they are compatible with this code.
You can use the scripts in the prep folder of this repo to do both; see the sketch below for the general idea.
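For illustration, a minimal frame-extraction helper might look like the following. The fps value, output naming pattern, and directory layout are assumptions for this sketch; follow the prep scripts for the repo's actual conventions.

```python
import os
import subprocess

def extract_frames(video_path, out_dir, fps=25):
    # Dump JPEG frames with ffmpeg; the fps and the naming pattern
    # are assumptions here, not values fixed by this repo.
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}",
         os.path.join(out_dir, "%05d.jpg")],
        check=True)

# Example (hypothetical paths):
# extract_frames("/home/user/kinetics/videos/abc.mp4",
#                "/home/user/kinetics/rgb-images/abc")
```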
You also need to compute optical flow images using optical-flow. Compute Farneback flow, as it is much faster to compute than other methods and gives reasonable results (a rough example follows). You might want to run multiple processes in parallel.
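Here is a minimal sketch of the Farneback computation with OpenCV. The clipping bound and the way the flow is packed into image channels are assumptions; the optical-flow repo linked above may use different conventions.

```python
import cv2
import numpy as np

def save_farneback_flow(frame_paths, out_dir, bound=20.0):
    # Compute Farneback flow between consecutive frames and save it
    # as 8-bit images; the 20-pixel clipping bound is an assumption.
    prev = cv2.imread(frame_paths[0], cv2.IMREAD_GRAYSCALE)
    for i, path in enumerate(frame_paths[1:], start=1):
        curr = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        # Clip displacements and rescale them to [0, 255].
        flow = np.clip(flow, -bound, bound)
        flow = ((flow + bound) * 255.0 / (2 * bound)).astype(np.uint8)
        # Pack x-flow and y-flow into the first two image channels.
        img = np.dstack([flow[..., 0], flow[..., 1],
                         np.zeros_like(flow[..., 0])])
        cv2.imwrite(f"{out_dir}/flow_{i:05d}.jpg", img)
        prev = curr
```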

Training

Let's assume that you extracted the dataset into the /home/user/kinetics/ directory. Then your train command from the root directory of this repo is going to be:

CUDA_VISIBLE_DEVICES=0 python train.py --root=/home/user/kinetics/ --global_models_dir=/home/user/pretrained-models/
--visdom=True --input_type=rgb --stepvalues=200000,350000 --max_iterations=500000

To train on flow inputs:

CUDA_VISIBLE_DEVICES=1 python train.py --root=/home/user/kinetics/ --global_models_dir=/home/user/pretrained-models/
--visdom=True --input_type=farneback --stepvalues=250000,400000 --max_iterations=500000

Different parameter settings in train.py will result in different performance.

Evaluation

You can use test.py to generate frame-level scores and save video-level results in a JSON file. Then use eval.py to evaluate the results on the validation set.

Produce frame-level scores

Once you have a trained network, you can use test.py to generate frame-level scores. Simply specify the parameters listed in test.py as flags or change them manually, e.g.:

CUDA_VISIBLE_DEVICES=0 python3 test.py --root=/home/user/kinetics/ --input=rgb --test-iteration=500000


Video-level evaluation

Video-level labelling requires frame-level scores. test.py not only stores frame-level scores but also computes video-level scores in its evaluate function. It dumps the video-level output for the validation set in JSON format (the same format used in the ActivityNet challenge). You can then specify the parameters in eval.py and run the evaluation.
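For reference, the step from frame-level to video-level scores is essentially mean-pooling over frames followed by a top-k ranking. Below is a hedged sketch of that aggregation and the ActivityNet-style JSON dump; the helper name and exact field names are illustrative, not this repo's code.

```python
import json
import numpy as np

def dump_video_level_results(frame_scores, class_names, out_path, top_k=5):
    # frame_scores: {video_id: (num_frames, num_classes) score array}.
    results = {}
    for video_id, scores in frame_scores.items():
        avg = scores.mean(axis=0)              # mean-pool over frames
        top = np.argsort(avg)[::-1][:top_k]    # indices of top-k classes
        results[video_id] = [
            {"label": class_names[c], "score": float(avg[c])} for c in top
        ]
    # ActivityNet-challenge-style submission layout.
    with open(out_path, "w") as f:
        json.dump({"version": "VERSION 1.3",
                   "results": results,
                   "external_data": {}}, f)
```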

Performance

The table below records the performance of a ResNet-101 model on the Mini-Kinetics dataset. It is trained for 60K iterations with a learning rate of 0.0005, dropped by a factor of 10 after 25000, 40000, and 55000 iterations. The batch size used is 64.

| method        | frame-top1 | frame-top3 | video-top1 | video-top5 | video-AVG | video-mAP |
|---------------|------------|------------|------------|------------|-----------|-----------|
| ResNet101-RGB | 61.5       | 77.9       | 75.7       | 92.2       | 83.9      | 78.1      |
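This step schedule corresponds to a standard PyTorch multi-step learning-rate decay applied per iteration. A minimal sketch, assuming SGD with illustrative momentum (train.py's actual optimizer settings may differ):

```python
import torch
import torchvision

# Stand-in for the backbone; momentum here is an assumption.
model = torchvision.models.resnet101()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0005, momentum=0.9)
# Drop the learning rate by a factor of 10 at the listed iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[25000, 40000, 55000], gamma=0.1)

for iteration in range(60000):
    # ... forward pass, compute loss, loss.backward() ...
    optimizer.step()
    scheduler.step()   # stepped once per iteration, not per epoch
    optimizer.zero_grad()
```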

Extras (coming soon)

Pre-trained models can be downloaded from the links given below. You will need to make changes in test.py to accept the downloaded weights.

Download pre-trained networks

TODO

References