Representation Learning by Learning to Count in Tensorflow

As part of the implementation series of Joseph Lim's group at USC, our motivation is to accelerate (or sometimes delay) research in the AI community by promoting open-source projects. To this end, we implement state-of-the-art research papers, and publicly share them with concise reports. Please visit our group github site for other projects.

This project is implemented by Shao-Hua Sun and the codes have been reviewed by Te-Lin Wu before being published.

Descriptions

This project is a Tensorflow implementation of Representation Learning by Learning to Count. This paper proposes a novel framework for representation learning, where we are interested in learning good representations of visual content, by utilizing the concept of counting visual primitives.

In particular, it exploits the fact that the number of visual primitives presented in an image should be invariant to transformations such as scaling, rotation, etc. Given this fact, the model is able to learn meaningful representations by minimizing a contrastive loss where we enforce that the counting feature should be different between a pair of randomly selected images. During the fine-tuning phase, we train a set of linear classifiers to perform an image classification task on ImageNet based on learned representations to verify the effectiveness of the proposed framework. An illustration of the proposed framework is as follows.

The implemented model is trained and tested on ImageNet.

Note that this implementation only follows the main idea of the original paper while differing a lot in implementation details such as model architectures, hyperparameters, applied optimizer, etc. For example, the implementation adopts the VGG-19 architecture instead of AlexNet which is used in the origianl paper.

*This code is still being developed and subject to change.

Prerequisites

Usage

Datasets

The ImageNet dataset is located in the Downloads section of the website. Please specify the path to the downloaded dataset by changing the variable __IMAGENET_IMG_PATH__ in datasets/ImageNet.py. Also, please provide a list of file names for trainings in the directory __IMAGENET_LIST_PATH__ with the file name train_list.txt. By default, the train_list.txt includes all the training images in ImageNet dataset.

Train the models

Train models with downloaded datasets. For example:

$ python trainer.py --prefix train_from_scratch --learning_rate 1e-4 --batch_size 8

Fine-tune the models

Train models with downloaded datasets. For example:

$ python trainer_classifier.py --prefix fine_tune --learning_rate 1e-5 --batch_size 8 --checkpoint train_dir/train_from_scratch-ImageNet_lr_0.003-20170828-172936/model-10001

Note that you must specify a checkpoint storing the pretrained model. Also, linear classifiers are applied to all the features including conv1, conv2, ..., fc1, fc2, ..., etc, coming from the pretrained model with the same learning rate, optimizers, etc. To fine tune the model only with a certain feature, please specify it in the code model_classifier.py.

Test the fine-tuned models

$ python evaler.py --checkpoint train_dir/fine_tune-ImageNet_lr_0.0001-20170915-172936/model-10001

Train and test your own datasets:

Training details

Learning the representations

Related works

Author

Shao-Hua Sun / @shaohua0116 @ Joseph Lim's research lab @ USC