Classification Bot

Join the chat at https://gitter.im/AntreasAntoniou/DeepClassificationBot

Welcome to the Classification Bot codebase. Classification Bot is an attempt to simplify the collection, extraction, and preprocessing of data, and to provide an end-to-end pipeline for using that data to train large deep neural networks.

The system is composed of scrapers, data extractors, preprocessors, deep neural network models built with Francois Chollet's Keras library, and an easy-to-use deployment module.

Installation

Make sure you have a GPU, as training is very compute-intensive.

  1. (OSX) Install gcc: brew install gcc
  2. Install the CUDA Toolkit 7.5
  3. Install cuDNN 4
  4. Install Theano: sudo pip install git+git://github.com/Theano/Theano.git
  5. Install OpenCV
  6. Install the HDF5 library (libhdf5-dev)
  7. Make sure you have Python 2.7.6 and virtualenv installed on your system
  8. Install the Python dependencies:
$ virtualenv --python=python2 --system-site-packages env
$ . env/bin/activate
$ pip install -r requirements.txt
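
After installing, you can sanity-check that Theano is present and configured for the GPU. A minimal sketch (run it with THEANO_FLAGS=device=gpu; newer Theano versions use device=cuda instead):

# Print which device Theano is configured to use; expect 'gpu' (or
# 'cuda'), not 'cpu', before starting a training run.
import theano
print(theano.config.device)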

Training and deploying

To download images

Use google_image_scraper.py to download images. It takes a .csv file listing the categories you want and downloads a number of images for each line.

The first line of the .csv file is treated as a header and ignored.

The number of images per category is configurable; we suggest a number between 200 and 1000:

$ python google_image_scraper.py -n 200 yourfilehere.csv
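
For example, a hypothetical yourfilehere.csv might look like this (the first line is the header, which the scraper skips; the category names are taken from the sample output further below):

category
Gurren Lagann
Cowboy Bebop
Naruto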

Easy Mode:

(For users who already have a list of categories at hand):

  1. Create a .csv file with one category per line, describing what you want the scraper to search for.
  2. Now let's download some images! Run python google_image_scraper.py yourfilehere.csv

Hacker Mode:

(For users who know an online resource that lists their categories and want to fetch them programmatically, who have too many categories to copy by hand, or who would simply rather write code than copy and paste.)

  1. Write a script that fetches your categories from Wikipedia or any other resource you like. For an example, look at examples/anime_names.py to see what we used to get our categories; a minimal sketch also follows after this list.
  2. Have your script create a .csv file with the categories you require.
  3. Then run python google_image_scraper.py yourfilehere.csv
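
As an illustration, here is a minimal sketch of such a script. It queries the public MediaWiki API for the members of a Wikipedia category and writes them to a .csv file the scraper can consume; the category name, output filename, and the requests dependency are assumptions for this example, not part of the repo:

# Hypothetical example: fetch page titles from a Wikipedia category via
# the MediaWiki API and write them as a categories .csv for the scraper.
import csv
import requests

API_URL = 'https://en.wikipedia.org/w/api.php'

def fetch_members(category, limit=500):
    # list=categorymembers returns the pages inside Category:<category>
    params = {
        'action': 'query',
        'list': 'categorymembers',
        'cmtitle': 'Category:' + category,
        'cmlimit': limit,
        'format': 'json',
    }
    data = requests.get(API_URL, params=params).json()
    return [member['title'] for member in data['query']['categorymembers']]

if __name__ == '__main__':
    with open('categories.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(['category'])  # header line; the scraper skips it
        for title in fetch_members('Anime_television_series'):  # assumed category
            writer.writerow([title])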

To extract and preprocess data ready for training

  1. Once you have your data ready, run python train.py --extract_data to get all of your data preprocessed and saved in HDF5 files.
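
If you want to verify the extraction, you can inspect the resulting files with h5py. A minimal sketch, assuming a file named data.h5; the actual filenames and dataset names are whatever train.py writes in your working directory:

# List each top-level entry in an HDF5 file; the repr of a dataset
# includes its shape and dtype.
from __future__ import print_function
import h5py

with h5py.File('data.h5', 'r') as f:
    for name in f:
        print(name, f[name])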

To train your network

  1. Once all of the above steps are complete, you are ready to train your network: run python train.py --run to load data from the HDF5 files, or python train.py --run --extract_data to extract data and train in one procedure.
  2. The weights are saved after each epoch, so if you want to continue training a model, simply run python train.py --run --continue

Deploying a model

  1. Once training has finished and you have a good model, you can deploy it.
  2. To deploy a model on a single image URL, use python deploy.py --URL [URL_LINK]
  3. To deploy a model on a folder full of images, use python deploy.py --image-folder path/to/folder
  4. To deploy a model on a single file, use python deploy.py --image-path path/to/file

Once deployed, the model returns the top 5 predictions for each image in a nicely formatted view, e.g.:

Image Name: Tengen.Toppa.Gurren-Lagann.full.174481.jpg
Categories:
0. Gurren Lagann: 0.999914288521
1. Kill La Kill: 7.29278544895e-05
2. Naruto: 4.92283288622e-06
3. Redline: 2.71744352176e-06
4. Cowboy Bebop: 1.41406655985e-06
_________________________________________________

Things for you to try

  1. Create your own classifiers
  2. Try different model architectures (Hint: go to Google Scholar or arXiv and search for GoogLeNet, VGG-Net, AlexNet, ResNet and follow the waves :) ); a minimal starting point is sketched below.
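
To try a new architecture, you swap out the model definition that train.py builds. Here is a minimal sketch of a small convolutional classifier in the Keras 1 API (matching this project's Theano-era dependencies; Keras 2 renames Convolution2D to Conv2D and nb_epoch to epochs). The input shape, class count, and dummy data are assumptions for illustration, not the project's actual settings:

# A minimal convolutional classifier sketch in the Keras 1 API.
import numpy as np
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from keras.callbacks import ModelCheckpoint

nb_classes = 5  # assumed number of categories

model = Sequential()
# Theano-style channels-first input: (channels, height, width)
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(3, 64, 64)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(nb_classes, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Dummy data standing in for the preprocessed HDF5 datasets.
X = np.random.rand(16, 3, 64, 64).astype('float32')
y = np.eye(nb_classes)[np.random.randint(0, nb_classes, 16)]

# Save weights after each epoch, mirroring the resume-friendly behaviour
# described in the training section above.
checkpoint = ModelCheckpoint('weights.{epoch:02d}.hdf5')
model.fit(X, y, nb_epoch=2, batch_size=8, callbacks=[checkpoint])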

Twitter bot

deepanimebot/bot.py is a Twitter bot that provides an interface for querying the classifier.

Running the bot locally

Prerequisites

Copy bot.ini.example to bot.ini and fill in your Twitter consumer key/secret and access token/secret.
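
The result is a plain INI file along these lines; the section and key names below are illustrative guesses, so check bot.ini.example for the actual field names:

[twitter]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET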

Run it

$ PYTHONPATH=. python deepanimebot/bot.py -c bot.ini --debug --classifier=local

python deepanimebot/bot.py --help will list all available command line options.

Web interface

deepanimebot/webapp.py is a Flask app for querying the classifier.

$ PYTHONPATH=. python deepanimebot/webapp.py
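
Once it is running, you can confirm the app responds. A minimal sketch using requests, assuming the Flask development server's default address of http://localhost:5000 (check deepanimebot/webapp.py for the actual routes and port):

# Hit the local web interface's root page to confirm it is up.
from __future__ import print_function
import requests

resp = requests.get('http://localhost:5000/')
print(resp.status_code, resp.text[:200])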

Deploying to Google Cloud Platform

This repo comes with the necessary support files for deploying the Twitter bot and/or the web app to Google Cloud Platform.

Prerequisites

Building and registering your own Docker image

classificationbot/base:latest comes with all the dependencies installed. If you've modified the code and added a new dependency, make a new Docker image based on the dockerfiles in this repo.

This repo's base images are built with these commands:

$ docker build -t classificationbot/base:latest -f dockerfiles/base/Dockerfile .
$ docker push classificationbot/base:latest

$ docker build -t classificationbot/ci:latest -f dockerfiles/ci/Dockerfile .
$ docker push classificationbot/ci:latest

Deploying

There are two options:

  1. (Not used anymore) Google Compute Engine, container-optimized instance, supervisord + tweepy: bot-standalone
  2. Google Container Engine, Kubernetes, gunicorn + flask + tweepy: follow this gist

Special Thanks

Special thanks to Francois Chollet (fchollet) for building the superb Keras deep learning library. We could not have delivered a project ready to be used by non-machine-learning people without Keras's ease of use.

Special thanks to https://github.com/shuvronewscred/ for building the image scraper we adapted for this project. The original source code can be found at https://github.com/shuvronewscred/google-search-image-downloader