(Note: You can read an in-depth tutorial about the implementation in this blog post.)
This is an implementation of an image captioning model based on Vinyals et al., with a few differences:
We use different values for some hyperparameters:
Hyperparameter | Value |
---|---|
Learning rate | 0.00051 |
Batch size | 32 |
Epochs | 33 |
Dropout rate | 0.22 |
Embedding size | 300 |
LSTM output size | 300 |
LSTM layers | 3 |
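For reference, the values in the table above could be collected in a single configuration object. This is only a sketch: the `Hyperparameters` container and its field names are hypothetical and are not taken from the repository's code; only the values come from the table.

```python
from collections import namedtuple

# Hypothetical container: the field names are illustrative,
# the values are copied from the hyperparameter table above.
Hyperparameters = namedtuple('Hyperparameters', [
    'learning_rate', 'batch_size', 'epochs', 'dropout_rate',
    'embedding_size', 'lstm_output_size', 'lstm_layers',
])

FINAL_MODEL = Hyperparameters(
    learning_rate=0.00051,
    batch_size=32,
    epochs=33,
    dropout_rate=0.22,
    embedding_size=300,
    lstm_output_size=300,
    lstm_layers=3,
)

# Print each setting on its own line.
for name, value in zip(FINAL_MODEL._fields, FINAL_MODEL):
    print('{} = {}'.format(name, value))
```

A `namedtuple` is used here because it works unchanged on both Python 2.7 and Python 3.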
Quantitatively, the proposed model's performance is on par with Vinyals' model on the Flickr8k dataset:
Metric | Proposed Model | Vinyals' Model |
---|---|---|
BLEU-1 | 61.8 | 63 |
BLEU-2 | 40.8 | 41 |
BLEU-3 | 27.8 | 27 |
BLEU-4 | 19.0 | N/A |
METEOR | 21.5 | N/A |
CIDEr | 41.5 | N/A |
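To make the comparison concrete, the gap between the two models can be computed for the metrics that both report. The snippet below only redoes the arithmetic on the numbers in the table above; the names `SCORES` and `score_gaps` are hypothetical and not part of the repository.

```python
# Scores copied from the table above; None marks values that are N/A.
SCORES = [
    ('BLEU-1', 61.8, 63.0),
    ('BLEU-2', 40.8, 41.0),
    ('BLEU-3', 27.8, 27.0),
    ('BLEU-4', 19.0, None),
    ('METEOR', 21.5, None),
    ('CIDEr', 41.5, None),
]


def score_gaps(rows):
    """Return (metric, proposed - reference) for rows where both scores exist."""
    return [(name, round(ours - theirs, 1))
            for name, ours, theirs in rows
            if ours is not None and theirs is not None]


gaps = score_gaps(SCORES)
for name, gap in gaps:
    print('{}: {:+.1f}'.format(name, gap))
# BLEU-1: -1.2
# BLEU-2: -0.2
# BLEU-3: +0.8
```

On the three metrics available for both models, the largest absolute difference is 1.2 points.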
Download the dataset needed.
./scripts/download_dataset.sh
Download pretrained word vectors.
./scripts/download_pretrained_word_vectors.sh
Download pycocoevalcap data.
./scripts/download_pycocoevalcap_data.sh
Install the dependencies.
Note: The code has only been tested on Python 2.7. It may need minor changes to work on Python 3.
# Optional: Create and activate your virtualenv / Conda environment
pip install -r requirements.txt
Set up PYTHONPATH:
source ./scripts/setup_pythonpath.sh
Download a pretrained model from the releases page.
Copy model-weights.hdf5 to keras-image-captioning/results/flickr8k/final-model.
Now you can run inference from that checkpoint by executing the command below from the keras-image-captioning directory:
python -m keras_image_captioning.inference \
--dataset-type test \
--method beam_search \
--beam-size 3 \
--training-dir results/flickr8k/final-model
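The `--method beam_search` and `--beam-size 3` options select beam search decoding. The sketch below illustrates the general technique on a toy bigram model. It is a simplified variant (no length normalization) and is not the repository's implementation; `beam_search`, `toy_next_log_probs`, and `TOY_MODEL` are hypothetical names introduced for illustration.

```python
import math


def beam_search(next_log_probs, start, end, beam_size=3, max_len=20):
    """Return the highest-scoring token sequence found by beam search.

    next_log_probs(seq) must return a dict {token: log_probability}
    for the token that follows seq.
    """
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_len):
        # Extend every partial sequence by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, log_p in next_log_probs(seq).items():
                candidates.append((score + log_p, seq + [token]))
        if not candidates:
            break
        # Keep only the beam_size best extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached the end token.
    return max(finished or beams, key=lambda c: c[0])[1]


# Toy bigram "language model": probability of the next word given the last one.
TOY_MODEL = {
    '<s>': {'a': 0.6, 'the': 0.4},
    'a': {'dog': 0.5, 'cat': 0.5},
    'the': {'dog': 0.9, 'cat': 0.1},
    'dog': {'</s>': 1.0},
    'cat': {'</s>': 1.0},
}


def toy_next_log_probs(seq):
    return {tok: math.log(p) for tok, p in TOY_MODEL[seq[-1]].items()}


best = beam_search(toy_next_log_probs, '<s>', '</s>', beam_size=3)
greedy = beam_search(toy_next_log_probs, '<s>', '</s>', beam_size=1)
print(' '.join(best))    # <s> the dog </s>
print(' '.join(greedy))  # starts with "<s> a", a lower-probability caption
```

With a beam size of 1 the search is greedy: it commits to "a" (probability 0.6) and ends with a sequence of probability 0.3, whereas a beam size of 3 keeps "the" alive and finds "the dog" with probability 0.36.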
To reproduce the model, execute:
python -m keras_image_captioning.training \
--training-label repro-final-model \
--from-training-dir results/flickr8k/final-model
Many more arguments are available; see training.py for the full list.
To run inference with the reproduced model, execute:
python -m keras_image_captioning.inference \
--dataset-type test \
--method beam_search \
--beam-size 3 \
--training-dir var/flickr8k/training-results/repro-final-model
Note: dataset_type can be either 'validation' or 'test'.
The predictions file is written to var/flickr8k/training-results/repro-final-model/test-predictions-3-20.yaml. You can compare it with my result at results/flickr8k/final-model/test-predictions-3-20.yaml.
MIT License. See the LICENSE file for details.