tf-lcnn: Fast Inference on CPU Based on 'LCNN'

TensorFlow implementation of 'LCNN: Lookup-based Convolutional Neural Network'

It also includes multi-GPU training code for various models, so you can train your own models faster and run image inference faster with lookup convolutions.

Lookup Convolution
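The LCNN paper's core idea is to represent each filter's weight vector at every spatial position (the vector across the m input channels) as a sparse linear combination of s vectors drawn from a small shared dictionary of k vectors. Convolution then splits into one dense 1x1 convolution of the input with the whole dictionary, followed by cheap lookups and weighted sums. The NumPy sketch below checks that both routes give the same output. It is a simplified single-image, stride-1, 'valid'-padding illustration, not this repository's TensorFlow op. The names `dictionary`, `indices` and `coeffs` are hypothetical.

```python
import numpy as np


def build_dense_weights(dictionary, indices, coeffs):
    """Rebuild ordinary conv weights from the lookup representation.

    dictionary: (k, m) shared vectors over the m input channels.
    indices, coeffs: (n_out, kh, kw, s), i.e. s lookups per filter position.
    Returns (n_out, kh, kw, m) with
    weights[o, r, c] = sum_t coeffs[o, r, c, t] * dictionary[indices[o, r, c, t]].
    """
    return np.einsum("orct,orctm->orcm", coeffs, dictionary[indices])


def dense_conv(x, weights):
    """Plain cross-correlation. x: (H, W, m), weights: (n_out, kh, kw, m)."""
    n_out, kh, kw, _ = weights.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.zeros((out_h, out_w, n_out))
    for r in range(kh):
        for c in range(kw):
            patch = x[r:r + out_h, c:c + out_w, :]        # (out_h, out_w, m)
            y += patch @ weights[:, r, c, :].T             # (out_h, out_w, n_out)
    return y


def lookup_conv(x, dictionary, indices, coeffs):
    """Same output as dense_conv, computed the lookup way."""
    n_out, kh, kw, _ = indices.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # Step 1: a single dense 1x1 convolution with every dictionary vector.
    scores = x @ dictionary.T                               # (H, W, k)
    y = np.zeros((out_h, out_w, n_out))
    # Step 2: for each filter position, look up s score channels and
    # accumulate their shifted, weighted values.
    for o in range(n_out):
        for r in range(kh):
            for c in range(kw):
                window = scores[r:r + out_h, c:c + out_w, :]  # (out_h, out_w, k)
                picked = window[:, :, indices[o, r, c]]       # (out_h, out_w, s)
                y[:, :, o] += picked @ coeffs[o, r, c]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, m, s, n_out, kh, kw = 6, 4, 2, 3, 3, 3
    x = rng.standard_normal((8, 8, m))
    dictionary = rng.standard_normal((k, m))
    indices = rng.integers(0, k, size=(n_out, kh, kw, s))
    coeffs = rng.standard_normal((n_out, kh, kw, s))
    dense = dense_conv(x, build_dense_weights(dictionary, indices, coeffs))
    lookup = lookup_conv(x, dictionary, indices, coeffs)
    print("max abs difference:", np.abs(dense - lookup).max())
```

The saving comes from step 2 touching only s values per position instead of m, while the step-1 dictionary convolution is shared by every filter. It pays off when s and k are small compared with m.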

Implementations

[x] Reproduce the MNIST and ILSVRC2012 baselines

[x] Training on ImageNet across multiple nodes with multiple GPUs

[x] Training Code - Lookup-based Convolution Layer

[x] Training results matching the original paper

[x] Inference Code - Optimized dense matrix operation implemented as a custom TensorFlow operation

[ ] Inference as fast as in the original paper

Custom Operation for Sparse Convolutional Layer

Build

A custom operation has been implemented for LCNN's lookup convolution.

The source code is in /ops, and it must be built before running the inference code.

(Building TensorFlow with the '-mavx -msse4.1 -msse4.2' options is recommended.)
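These flags only help if the CPU actually supports the instruction sets. On Linux, support shows up as the `avx`, `sse4_1` and `sse4_2` entries on the `flags` line of /proc/cpuinfo. The helper below is a hypothetical, Linux-only sketch that reports which of them are missing.

```python
import os


def missing_cpu_flags(cpuinfo_text, required=("avx", "sse4_1", "sse4_2")):
    """Return the required flags absent from the first 'flags' line of cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    return list(required)  # no 'flags' line found: treat every flag as missing


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):  # Linux only
    with open("/proc/cpuinfo") as f:
        print("missing flags:", missing_cpu_flags(f.read()) or "none")
```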

$ cp {tf-lcnn}/ops/* {tensorflow}/tensorflow/core/user_ops/
$ bazel build --config opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/core/user_ops:sparse_conv2d.so
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{tensorflow}/bazel-bin/tensorflow/core/user_ops/
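The `export` line makes the Bazel output directory visible through LD_LIBRARY_PATH. Before loading the op from Python, it can help to confirm that sparse_conv2d.so is actually reachable there. The hypothetical, stdlib-only helper below walks LD_LIBRARY_PATH in order and skips the empty entry that a leading `:` leaves behind when the variable was previously unset.

```python
import os


def find_op_library(filename="sparse_conv2d.so", search_path=None):
    """Return the first path on LD_LIBRARY_PATH that contains `filename`, else None."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # e.g. the empty entry produced by "$LD_LIBRARY_PATH:..." when unset
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The returned path can then be passed to `tf.load_op_library(path)`, TensorFlow's standard loader for custom-op shared libraries. The Python names of the ops it exposes depend on the registrations in /ops and are not listed here.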

Performance

As the timeline below shows, the custom lookup convolution operation accounts for only a small fraction of the total inference time compared with a normal convolutional layer.

inference timeline
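One way to turn such a timeline into numbers: in TensorFlow 1.x, `timeline.Timeline(run_metadata.step_stats).generate_chrome_trace_format()` (from `tensorflow.python.client`, run with `tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)`, as in the Stack Overflow answer listed under Training Techniques) emits Chrome-trace JSON. The sketch below sums the durations of complete events (`"ph": "X"`) per event name and reports each name's share. It assumes event names are op types. The op name `SparseConv2D` in the test data is a hypothetical placeholder, and the share is of summed op time, not wall-clock time.

```python
import json
from collections import defaultdict


def op_time_shares(trace_json):
    """Map each event name to its fraction of the summed duration of all 'X' events."""
    events = json.loads(trace_json).get("traceEvents", [])
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":  # complete events carry a 'dur' (microseconds)
            totals[event["name"]] += event.get("dur", 0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: t / grand_total for name, t in totals.items()}
```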

Training Results

AlexNet's fully connected layer was replaced with a convolutional layer. The code will be optimized soon, and the inference times will be updated.
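This replacement works because a fully connected layer applied to an H x W x C feature map computes the same thing as a convolution whose kernel covers the whole map with 'valid' padding. Both take one dot product per output unit over all H*W*C inputs. The NumPy sketch below checks this on random data. The shapes are illustrative only and are not AlexNet's layer sizes.

```python
import numpy as np


def fc_layer(x, weight):
    # x: (H, W, C); weight: (H*W*C, n_out). Flatten in C order, then matmul.
    return x.reshape(-1) @ weight


def conv_valid(x, kernel):
    # kernel: (kh, kw, C, n_out); stride 1, 'valid' padding.
    kh, kw, _, n_out = kernel.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1, n_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Contract the (kh, kw, C) window against the first three kernel axes.
            out[i, j] = np.tensordot(x[i:i + kh, j:j + kw, :], kernel, axes=3)
    return out
```

Reshaping the FC weight to (H, W, C, n_out) gives a kernel whose output on an H x W input is a 1 x 1 map equal to the FC output. On a larger input the same kernel slides and produces a spatial grid of such outputs.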

MNIST Dataset

For the LCNN model, two network variants (LCNN-Fast and LCNN-Accurate) were trained.

The original paper did not evaluate on MNIST, but the dataset is well suited to rapid experimentation.

Model   | Conv. Filter  | Inference Time / Top-1 | GPUs | Training Time | Notes
--------|---------------|------------------------|------|---------------|--------------------
Alexnet | Convolution   | 140ms / 99.98%         | 1    | 1h 35m        | Epoch 40, Batch 128
Alexnet | Convolution   | 140ms / 99.42%         | 4    | 27m (x3.5)    | Epoch 40, Batch 512
Alexnet | LCNN-Fast     | 15ms / 99.24%          | 8    | 23m           | Epoch 40, Batch 128
Alexnet | LCNN-Accurate | 56ms / 99.43%          | 8    | 23m           | Epoch 40, Batch 128
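The 4-GPU row uses batch 512, presumably 128 samples per GPU (an assumption, since the table lists only the total). Its x3.5 figure matches 95 min / 27 min, or about 88% of an ideal 4x. In synchronous data-parallel training, of the kind covered by the distributed TensorFlow and Inception references, each GPU ("tower") computes gradients on its own shard, and the gradients are averaged before one shared update. When the loss is a mean over samples and the shards are equal-sized, that average equals the full-batch gradient. The NumPy sketch below checks this with a toy least-squares model, not AlexNet. The function names are hypothetical.

```python
import numpy as np


def mse_grad(w, x, y):
    # Gradient of mean((x @ w - y) ** 2) with respect to w.
    return 2.0 * x.T @ (x @ w - y) / len(y)


def averaged_tower_grad(w, x, y, num_towers):
    # Split the global batch into equal shards (one per GPU/tower),
    # compute each shard's gradient independently, then average them.
    x_shards = np.split(x, num_towers)
    y_shards = np.split(y, num_towers)
    grads = [mse_grad(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return np.mean(grads, axis=0)
```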

ImageNet ILSVRC2012 Classification Task

Tests are in progress. The results below are partial and will be updated soon.

Model        | Conv. Filter  | Inference Time / Top-1, Top-5 | GPUs | Training Time | Notes
-------------|---------------|-------------------------------|------|---------------|--------------------
Alexnet      | Convolution   | 144ms / 59.40%, 81.50%        | 1    | 53h           | Epoch 65, Batch 128
Alexnet      | Convolution   | 144ms / 59.21%, 81.33%        | 4    | 14h (x3.78)   | Epoch 65, Batch 128
Alexnet-LCNN | LCNN-Fast     | 15ms / 50.60%, 72.34%         | 1    | 46h           | Epoch 65, Batch 128
Alexnet-LCNN | LCNN-Mid      |                               |      |               |
Alexnet-LCNN | LCNN-Accurate | 62ms / 58.17%, 78.54%         | 1    | 47h           | Epoch 65, Batch 128
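The ImageNet rows imply a speed/accuracy trade-off for each LCNN variant against the 1-GPU convolution baseline. The short sketch below only recomputes that trade-off from the numbers listed in the table. LCNN-Mid is left out because it has no results yet, and the hardware and batch size behind the inference times are not specified.

```python
# (inference_ms, top1_pct, top5_pct), copied from the ImageNet table.
BASELINE = (144, 59.40, 81.50)  # Alexnet, Convolution, 1 GPU
LCNN = {
    "LCNN-Fast": (15, 50.60, 72.34),
    "LCNN-Accurate": (62, 58.17, 78.54),
}


def tradeoff(baseline, variant):
    """Return (speedup, top-1 drop, top-5 drop) of `variant` relative to `baseline`."""
    base_ms, base_top1, base_top5 = baseline
    ms, top1, top5 = variant
    return base_ms / ms, base_top1 - top1, base_top5 - top5


for name, row in LCNN.items():
    speedup, d1, d5 = tradeoff(BASELINE, row)
    print(f"{name}: {speedup:.1f}x faster, top-1 -{d1:.2f} pts, top-5 -{d5:.2f} pts")
```

For LCNN-Fast this works out to about a 9.6x speedup for an 8.8-point top-1 drop. For LCNN-Accurate it is about 2.3x for a 1.23-point drop.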

TODO: More tests on ResNet and other architectures.

The experimental results from the original paper are as follows.

(Figures: LCNN result tables from the original paper)


References & Open-Source Packages

This code is highly experimental and draws heavily on the following resources.

LCNN

[1] LCNN: Lookup-based Convolutional Neural Network

[2] http://openresearch.ai/t/lcnn-lookup-based-convolutional-neural-network

[3] Author's code: https://github.com/hessamb/lcnn/blob/master/layers/PooledSpatialConvolution.lua

Base Networks (LeNet, AlexNet) & Datasets (MNIST, ImageNet)

[1] ImageNet Classification with Deep Convolutional Neural Networks

[2] ImageNet training on AlexNet: https://github.com/dontfollowmeimcrazy/imagenet

[3] https://github.com/mouradmourafiq/tensorflow-convolution-models

[4] https://github.com/hpssjellis/easy-tensorflow-on-cloud9/blob/master/aymericdamien-Examples/examples/alexnet.py

TensorFlow Custom Operations

[1] https://www.tensorflow.org/extend/adding_an_op

[2] http://davidstutz.de/implementing-tensorflow-operations-in-c-including-gradients/

[3] https://github.com/tensorflow/tensorflow/blob/8eaf671025e8cd5358278f91f7e89e2fbbe6a26b/tensorflow/core/kernels/conv_ops.cc#L94

[4] https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/python/ops/sparse_ops.py

[5] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/nn_ops.cc#L503

[6] https://github.com/tensorflow/tensorflow/issues/2412

TensorFlow Build with CMake

[1] https://www.tensorflow.org/install/install_sources

[2] https://github.com/cjweeks/tensorflow-cmake

[3] https://github.com/tensorflow/tensorflow/issues/2412

Multi-GPU / Multi-Node Training

[1] Distributed TensorFlow: https://www.tensorflow.org/deploy/distributed

[2] Distributed TensorFlow Example: https://github.com/tensorflow/models/tree/master/inception

[3] https://research.fb.com/publications/imagenet1kin1h/

Training Techniques

[1] https://stackoverflow.com/questions/34293714/can-i-measure-the-execution-time-of-individual-operations-with-tensorflow/37774470#37774470

[2] https://github.com/ppwwyyxx/tensorpack

[3] https://github.com/sorki/python-mnist

[4] imgaug: https://github.com/aleju/imgaug