Welcome to the Classification Bot codebase. Classification Bot is an attempt to simplify the collection, extraction, and preprocessing of data, and to provide an end-to-end pipeline for using that data to train large deep neural networks.
The system is composed of scrapers, data extractors, preprocessors, deep neural network models built with Keras (by Francois Chollet), and an easy-to-use deployment module.
Make sure you have a GPU, as training is very compute-intensive.
brew install gcc
sudo pip install git+https://github.com/Theano/Theano.git
$ virtualenv --python=python2 --system-site-packages env
$ . env/bin/activate
$ pip install -r requirements.txt
Use google_image_scraper.py to download images. It takes a .csv file of the categories you want and downloads a number of images for each line.
The first line of the .csv file will be ignored.
The number of images per category is configurable. We suggest a number between 200 and 1000:
$ python google_image_scraper.py -n 200 yourfilehere.csv
(For users who already have a list of categories at hand):
python google_image_scraper.py yourfilehere.csv
(For users who know of an online repository that lists their categories and want to fetch them, who have too many categories to enter by hand, or who would rather write code than copy and paste):
See examples/anime_names.py for what we used to get our categories.
python google_image_scraper.py yourfilehere.csv
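Since the scraper ignores the first line of the .csv file, a header row is a simple way to avoid losing the first real category. The following is a minimal sketch of producing such a file from a list of names, written as standalone Python 3. The function name `write_categories_csv`, the `category` header text, and the single-column layout are assumptions made for illustration; they are not taken from google_image_scraper.py or examples/anime_names.py.

```python
import csv
import io


def write_categories_csv(names, out_file, header="category"):
    """Write a header row, then one category per row.

    Hypothetical helper: assumes the scraper wants one category name per
    line and skips the first line, as described in this README.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow([header])  # this line is the one the scraper ignores
    seen = set()
    for name in names:
        cleaned = name.strip()
        # Skip blank entries and duplicates so each category is scraped once.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            writer.writerow([cleaned])


# Usage: build the CSV in memory; pass an open file object to write to disk.
buffer = io.StringIO()
write_categories_csv(["Gurren Lagann", "Naruto", " Naruto ", ""], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the target file with `open("yourfilehere.csv", "w", newline="")` and pass it in place of the in-memory buffer.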
python train.py --extract_data
to get all of your data ready and saved in HDF5 files.
python train.py --run
to load data from HDF5 files, or
python train.py --run --extract_data
to extract data and train in one procedure.
python train.py --run --continue
python deploy.py --URL [URL_LINK]
python deploy.py --image-folder path/to/folder
python deploy.py --image-path path/to/file
Once deployed, the model should return the top 5 predictions for each image as a formatted string, e.g.:
Image Name: Tengen.Toppa.Gurren-Lagann.full.174481.jpg
Categories:
0. Gurren Lagann: 0.999914288521
1. Kill La Kill: 7.29278544895e-05
2. Naruto: 4.92283288622e-06
3. Redline: 2.71744352176e-06
4. Cowboy Bebop: 1.41406655985e-06
_________________________________________________
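For readers who want to produce a similar report from their own model output, here is a rough sketch of ranking class probabilities and formatting the top entries. It is not the code in deploy.py: the function name `format_top_predictions` and its parameters are hypothetical, and the probabilities in the usage example are made-up values chosen only to show the layout.

```python
def format_top_predictions(image_name, categories, probabilities, k=5):
    """Return a report listing the k most probable categories.

    Hypothetical helper: `categories` and `probabilities` are parallel
    sequences, e.g. class names and one row of a softmax output.
    """
    # Pair each category with its probability and sort from most to least likely.
    ranked = sorted(
        zip(categories, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]
    lines = ["Image Name: {}".format(image_name), "Categories:"]
    # Ranks start at 0 to match the sample output shown in this README.
    for rank, (category, probability) in enumerate(ranked):
        lines.append("{}. {}: {}".format(rank, category, probability))
    return "\n".join(lines)


# Usage with illustrative (not real) probabilities.
report = format_top_predictions(
    "example.jpg",
    ["Naruto", "Gurren Lagann", "Redline"],
    [0.25, 0.5, 0.125],
    k=2,
)
print(report)
```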
deepanimebot/bot.py is a Twitter bot that provides an interface for querying the classifier.
Copy bot.ini.example to bot.ini and fill in your consumer key/secret and access token/secret.
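Before starting the bot, it can help to check that all four credentials are actually filled in. The sketch below does this with Python 3's standard `configparser`. The `[twitter]` section name and the four key names are assumptions made for illustration; check bot.ini.example for the names the bot really expects.

```python
import configparser

# Hypothetical layout: the real section and key names are defined by
# bot.ini.example in this repo and may differ.
EXAMPLE_INI = """
[twitter]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
"""

REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def load_credentials(ini_text, section="twitter"):
    """Parse INI text and return the four credentials as a dict.

    Raises ValueError if the section or any required key is missing or empty.
    """
    # interpolation=None keeps any '%' characters in secrets literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        raise ValueError("missing section: {}".format(section))
    missing = [
        key for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]
    if missing:
        raise ValueError("missing keys: {}".format(", ".join(missing)))
    return {key: parser.get(section, key) for key in REQUIRED_KEYS}


# Usage: validate the example text; for a real file, read its contents first.
credentials = load_credentials(EXAMPLE_INI)
print(sorted(credentials))
```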
$ PYTHONPATH=. python deepanimebot/bot.py -c bot.ini --debug --classifier=local
python deepanimebot/bot.py --help
lists all available command line options.
deepanimebot/webapp.py is a Flask app for querying the classifier.
$ PYTHONPATH=. python deepanimebot/webapp.py
This repo comes with the necessary support files for deploying the Twitter bot and/or the web app to Google Cloud Platform.
The classificationbot/base:latest image comes with all the dependencies installed.
If you've modified the code and added a new dependency,
make a new Docker image based on the dockerfiles in this repo.
This repo's base images are built with these commands:
$ docker build -t classificationbot/base:latest -f dockerfiles/base/Dockerfile .
$ docker push classificationbot/base:latest
$ docker build -t classificationbot/ci:latest -f dockerfiles/ci/Dockerfile .
$ docker push classificationbot/ci:latest
There are two options:
Special thanks to Francois Chollet (fchollet) for building the superb Keras deep learning library. We couldn't have delivered a project usable by people without machine learning experience if it weren't for the ease of use of Keras.
Special thanks to https://github.com/shuvronewscred/ for building the image scraper we adapted for our project. Original source code can be found at https://github.com/shuvronewscred/google-search-image-downloader