RecNet

This project implements RecNet, proposed in "Reconstruction Network for Video Captioning" [1], CVPR 2018.

Environment

Requirements

How to use

Step 1. Setup python virtual environment

$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt

Step 2. Prepare Data

  1. Extract Inception-v4 [2] features from each dataset and place them at <PROJECT ROOT>/<DATASET>/features/<DATASET>_InceptionV4.hdf5. I extracted the Inception-v4 features from here. A quick way to sanity-check the resulting file is sketched after this list.

    Dataset    Inception-v4
    MSVD       link
    MSR-VTT    link
  2. Split each dataset according to its official split by running the following:

    (.env) $ python -m splits.MSVD
    (.env) $ python -m splits.MSR-VTT
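
Before moving on, it can help to verify that the extracted feature file is readable. The snippet below is only a sketch: it assumes each video id is stored as a top-level dataset inside the HDF5 file, which may differ from the layout actually produced by the extraction code.

    # Minimal sketch, not part of this repo: inspect the extracted features.
    # Assumes each video id is a top-level key of the HDF5 file.
    import h5py

    with h5py.File("MSVD/features/MSVD_InceptionV4.hdf5", "r") as f:
        video_ids = list(f.keys())
        print(len(video_ids), "videos found")
        feats = f[video_ids[0]][()]   # per-frame features, typically (n_frames, 1536) for Inception-v4
        print(video_ids[0], feats.shape)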

Step 3. Prepare Evaluation Codes

Clone the evaluation code from the official coco-caption repository.

   (.env) $ git clone https://github.com/tylin/coco-caption.git
   (.env) $ mv coco-caption/pycocoevalcap .
   (.env) $ rm -rf coco-caption
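
Once pycocoevalcap is in place, it can be used directly to score generated captions against references. The sketch below uses the standard BLEU and CIDEr scorers from that package; the video id and sentences are made-up examples.

    # Minimal scoring sketch with pycocoevalcap (METEOR additionally requires Java).
    from pycocoevalcap.bleu.bleu import Bleu
    from pycocoevalcap.cider.cider import Cider

    gts = {"vid1": ["a man is playing a guitar", "someone plays the guitar"]}  # references
    res = {"vid1": ["a man plays a guitar"]}                                   # prediction

    bleu, _ = Bleu(4).compute_score(gts, res)    # BLEU-1 .. BLEU-4
    cider, _ = Cider().compute_score(gts, res)
    print(bleu, cider)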

Step 4. Train

Training is done in two stages: stage 1 trains the encoder-decoder, and stage 2 adds the reconstructor on top of it [1]. You can change the hyperparameters of each stage by modifying configs/train_stage1.py and configs/train_stage2.py, respectively.
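
For reference, a stage-1 hyperparameter block might look roughly like the sketch below. The attribute names here are illustrative assumptions only; check configs/train_stage1.py for the names this repo actually uses.

    # Illustrative config sketch (attribute names are assumptions, not this repo's).
    class TrainConfig:
        batch_size = 100     # minibatch size
        lr = 1e-4            # learning rate for the encoder-decoder stage
        epochs = 30          # number of training epochs
        beam_size = 5        # beam width used when generating captions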

Step 5. Inference

  1. Set the checkpoint path by changing ckpt_fpath of RunConfig in configs/run.py (see the sketch after this list).
  2. Run
    (.env) $ python run.py
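
For example, the checkpoint setting in configs/run.py might look like the sketch below; ckpt_fpath comes from this repo, while the surrounding class body and the example path are assumptions.

    # configs/run.py -- sketch only; the example path is a placeholder.
    class RunConfig:
        ckpt_fpath = "checkpoints/stage2/best.ckpt"   # path to the trained checkpoint to evaluate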

Performances

* NOTE: The performance of this RecNet implementation does not outperform SA-LSTM; better hyperparameters have yet to be found.

References

[1] Wang, Bairui, et al. "Reconstruction Network for Video Captioning." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[2] Szegedy, Christian, et al. "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning." Proceedings of the AAAI Conference on Artificial Intelligence. 2017.