Two-stream-action-recognition-keras

License: MIT

We use spatial- and temporal-stream CNNs under the Keras framework to reproduce published results on the UCF-101 action recognition dataset. This is a project from a research internship at the Machine Intelligence team, IBM Research AI, Almaden Research Center, by Wushi Dong (dongws@uchicago.edu).

References

[1] K. Simonyan and A. Zisserman. Two-Stream Convolutional Networks for Action Recognition in Videos. NIPS 2014.

Data

Spatial input data -> rgb frames

First, download the dataset from UCF into the data folder: cd data && wget http://crcv.ucf.edu/data/UCF101/UCF101.rar

Then extract it with unrar e UCF101.rar. The extracted data takes about 5.9 GB on disk.

We use split #1 for all of our experiments.
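
UCF-101 distributes its official splits as text files (e.g. trainlist01.txt in the ucfTrainTestlist folder), one video per line with a 1-based class label; test lists omit the label. A minimal parser sketch, assuming that standard format:

```python
def parse_split_file(lines):
    """Parse UCF-101 split lines like
    'ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1' into (video, label) pairs.
    The official test lists have no label column, so label may be None."""
    samples = []
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue
        # Convert the official 1-based class index to a 0-based one.
        label = int(parts[1]) - 1 if len(parts) > 1 else None
        samples.append((parts[0], label))
    return samples

# Usage with in-memory lines; in practice, read ucfTrainTestlist/trainlist01.txt.
print(parse_split_file(["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1"]))
```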

Motion input data -> stacked optical flows

Download the preprocessed TV-L1 optical flow data directly from https://github.com/feichtenhofer/twostreamfusion:

  wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.001
  wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.002
  wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.003
  cat ucf101_tvl1_flow.zip* > ucf101_tvl1_flow.zip
  unzip ucf101_tvl1_flow.zip
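
Each temporal-stream input stacks L consecutive pairs of horizontal (u) and vertical (v) flow frames into one 2L-channel volume (L = 10 in the original two-stream paper, giving 20 channels). A NumPy sketch, assuming the flow frames are already loaded as 2-D arrays:

```python
import numpy as np

def stack_flows(u_frames, v_frames):
    """Interleave L horizontal (u) and vertical (v) flow frames into a
    single (H, W, 2L) input volume for the temporal stream."""
    assert len(u_frames) == len(v_frames)
    channels = []
    for u, v in zip(u_frames, v_frames):
        channels.append(u)
        channels.append(v)
    return np.stack(channels, axis=-1)

# Example: L = 10 flow pairs of size 224x224 -> one 224x224x20 volume.
L = 10
u = [np.zeros((224, 224), dtype=np.float32) for _ in range(L)]
v = [np.zeros((224, 224), dtype=np.float32) for _ in range(L)]
print(stack_flows(u, v).shape)  # (224, 224, 20)
```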

Training

Spatial-stream CNN
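
The spatial stream classifies actions from single RGB frames. A minimal sketch using the Keras functional API; the tiny architecture here is an illustrative assumption, not the network (or the ImageNet pre-training) behind the reported numbers:

```python
import tensorflow as tf

def build_spatial_cnn(num_classes=101, input_shape=(224, 224, 3)):
    """Toy spatial-stream CNN: one RGB frame in, action softmax out.
    (Illustrative only; not the architecture used for the reported results.)"""
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```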

Temporal-stream CNN
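
The temporal stream follows the same pattern but takes the 2L-channel stacked-flow volume as input (20 channels for L = 10). A sketch, again with an illustrative architecture:

```python
import tensorflow as tf

def build_temporal_cnn(num_classes=101, flow_depth=20):
    """Toy temporal-stream CNN over a (224, 224, 2L) stacked-flow volume.
    (Illustrative only; not the architecture used for the reported results.)"""
    inputs = tf.keras.Input(shape=(224, 224, flow_depth))
    x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```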

Data augmentation
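
A common augmentation recipe for two-stream training is random cropping plus random horizontal flipping. A NumPy sketch (the 224x224 crop size is an assumption matching standard two-stream setups):

```python
import numpy as np

def augment_frame(frame, crop_size=224, rng=None):
    """Random crop + random horizontal flip for one RGB frame (H, W, C).
    (For flow volumes, a horizontal flip must also negate the u channels,
    which this sketch omits.)"""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = frame.shape[:2]
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    crop = frame[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]  # flip left-right
    return crop

# Usage: crop a 256x340 frame down to 224x224.
print(augment_frame(np.zeros((256, 340, 3), dtype=np.uint8)).shape)  # (224, 224, 3)
```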

Testing
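
At test time, the usual two-stream protocol samples a fixed number of frames per video (25 in the original paper), scores each one, and averages the per-frame class scores into a video-level prediction. A sketch of the averaging step:

```python
import numpy as np

def video_prediction(frame_scores):
    """Average per-frame softmax scores of shape (num_frames, num_classes)
    into one video-level score vector; the argmax is the predicted class."""
    mean_scores = np.asarray(frame_scores).mean(axis=0)
    return int(np.argmax(mean_scores)), mean_scores

pred, scores = video_prediction([[0.1, 0.9], [0.5, 0.5]])
print(pred)  # 1  (mean scores are [0.3, 0.7])
```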

Results

Network     Simonyan et al. [1]     Ours
Spatial     72.7%                   73.1%
Temporal    81.0%                   78.8%
Fusion      85.9%                   82.0%
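
The fusion row combines the two streams' softmax scores. A minimal late-fusion sketch; the equal weighting is an assumption (weighted averaging, often favoring the temporal stream, is also common):

```python
import numpy as np

def fuse_scores(spatial, temporal, w_spatial=1.0, w_temporal=1.0):
    """Late fusion: weighted average of spatial- and temporal-stream
    softmax scores (both shaped (num_classes,) or (N, num_classes))."""
    spatial = np.asarray(spatial, dtype=np.float64)
    temporal = np.asarray(temporal, dtype=np.float64)
    return (w_spatial * spatial + w_temporal * temporal) / (w_spatial + w_temporal)

print(fuse_scores([0.2, 0.8], [0.6, 0.4]))  # [0.4 0.6]
```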