BvS

Dawn of AI
An image classifier that identifies whether a given image shows Batman or Superman, using a CNN with high accuracy. (No Dogs vs. Cats here: everything from getting images from Google to saving our trained model for reuse.)

What are we gonna do:

Setup:

In-depth explanation of each section:
A Medium post with a detailed, step-by-step explanation for a deeper understanding of CNNs and the architecture of the network.

Data:

Collect data:

Augmentation:

Standardize:

Architecture:

A Simple Architecture:

For a detailed explanation of the architecture and of CNNs, please read the Medium post.
I've explained CNNs in depth there, and I highly recommend reading it.

In code:

      #level 1 convolution
      network=model.conv_layer(images_ph,5,3,16,1)
      network=model.pooling_layer(network,5,2)
      network=model.activation_layer(network)

      #level 2 convolution
      network=model.conv_layer(network,4,16,32,1)
      network=model.pooling_layer(network,4,2)
      network=model.activation_layer(network)

      #level 3 convolution
      network=model.conv_layer(network,3,32,64,1)
      network=model.pooling_layer(network,3,2)
      network=model.activation_layer(network)

      #flattening layer
      network,features=model.flattening_layer(network)

      #fully connected layer
      network=model.fully_connected_layer(network,features,1024)
      network=model.activation_layer(network)

      #output layer      
      network=model.fully_connected_layer(network,1024,no_of_classes)
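To get a feel for the size of this network, here is a small sketch that counts trainable parameters layer by layer. It assumes that the arguments of `conv_layer` are (input, filter size, input channels, output channels, stride) and that every conv and fully connected layer has one bias per output unit. It also uses a hypothetical flattened size of 16384 features (a 16x16x64 map); the real `features` value depends on your input image size.

```python
def conv_params(filter_size, in_channels, out_channels):
    """Weights of a square filter bank plus one bias per output channel."""
    return filter_size * filter_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


# Assumption: a 16x16x64 map after the third pooling layer (depends on input size)
FEATURES = 16 * 16 * 64
NO_OF_CLASSES = 2  # Batman, Superman

layers = {
    "conv1": conv_params(5, 3, 16),
    "conv2": conv_params(4, 16, 32),
    "conv3": conv_params(3, 32, 64),
    "fc": dense_params(FEATURES, 1024),
    "output": dense_params(1024, NO_OF_CLASSES),
}

for name, count in layers.items():
    print(f"{name:>6}: {count:,}")
print(f" total: {sum(layers.values()):,}")
```

Under this assumption the fully connected layer holds more than 99% of the parameters, so the flattened size (and therefore the input resolution) dominates the size of the model.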

A Brief Architecture:

With dimensional information:
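The dimensions at each level can be worked out by hand. This sketch assumes 'SAME' padding for both convolution and pooling (so a stride-1 convolution keeps height and width, and a stride-2 pool halves them, rounding up) and a hypothetical 128x128 RGB input. Neither the padding nor the input size is stated here, so adjust both to match the actual code.

```python
import math


def same_out(size, stride):
    """Output length along one axis of a 'SAME'-padded conv or pool."""
    return math.ceil(size / stride)


def trace_shapes(height, width, channels):
    # (output channels, conv stride, pool stride) for the three levels above
    levels = [(16, 1, 2), (32, 1, 2), (64, 1, 2)]
    shapes = [(height, width, channels)]
    for out_channels, conv_stride, pool_stride in levels:
        height = same_out(same_out(height, conv_stride), pool_stride)
        width = same_out(same_out(width, conv_stride), pool_stride)
        channels = out_channels
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(128, 128, 3)  # hypothetical input size
for level, shape in enumerate(shapes):
    print(level, shape)
h, w, c = shapes[-1]
print("flattened features:", h * w * c)
```

With 'SAME' padding the filter sizes (5, 4, 3) do not change the spatial size; with 'VALID' padding each layer would shrink the map by the filter size minus one before striding.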

Training:

Update: You can get the data folder itself from here (50 MB). Just download and extract it!

Our file structure should look like this,

The data folder will be generated automatically by trainer.py from raw_data if it does not already exist.
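The "generate only if missing" behaviour can be sketched like this. The folder names follow the text, but `prepare_data` and the `build` callback are hypothetical names for illustration, not the actual functions in trainer.py.

```python
import os
import tempfile


def prepare_data(raw_dir, data_dir, build):
    """Run build(raw_dir, data_dir) only if data_dir does not exist yet.

    Returns True if the data folder was generated, False if it was reused.
    """
    if os.path.isdir(data_dir):
        return False  # already generated (or downloaded), reuse as-is
    if not os.path.isdir(raw_dir):
        raise FileNotFoundError(f"raw data folder not found: {raw_dir}")
    os.makedirs(data_dir)
    build(raw_dir, data_dir)
    return True


def fake_build(raw_dir, data_dir):
    # hypothetical stand-in for the real augmentation/standardization step
    open(os.path.join(data_dir, "done.txt"), "w").close()


with tempfile.TemporaryDirectory() as root:
    raw = os.path.join(root, "raw_data")
    data = os.path.join(root, "data")
    os.makedirs(raw)
    first = prepare_data(raw, data, fake_build)   # generates data/
    second = prepare_data(raw, data, fake_build)  # skipped, folder exists
    print(first, second)
```

This is also why downloading and extracting the ready-made data folder skips the generation step: the existence check succeeds and the raw images are never touched.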

Saving our model:

Once training is over, you will see that a folder named checkpoints has been created, containing the model we trained. These two simple lines do that for us in TensorFlow:

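      import tensorflow as tf  # TensorFlow 1.x API
      saver = tf.train.Saver(max_to_keep=4)  # keep at most the 4 most recent checkpoints
      saver.save(session, model_save_name)   # write the graph (.meta) and trained weights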
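`max_to_keep=4` tells the Saver to keep only the four most recent checkpoints and to delete older ones as new ones are written. The sketch below is a simplified pure-Python model of that bookkeeping, not TensorFlow's implementation; the `CheckpointKeeper` class and the `checkpoint-<step>` names are hypothetical.

```python
from collections import deque


class CheckpointKeeper:
    """Simplified model of 'keep only the N most recent checkpoints'."""

    def __init__(self, max_to_keep=4):
        self.max_to_keep = max_to_keep
        self.kept = deque()
        self.deleted = []

    def save(self, path):
        if path in self.kept:
            # saving to an existing path overwrites it and marks it newest
            self.kept.remove(path)
        self.kept.append(path)
        while len(self.kept) > self.max_to_keep:
            self.deleted.append(self.kept.popleft())  # oldest goes first


keeper = CheckpointKeeper(max_to_keep=4)
for step in range(1, 7):
    keeper.save(f"checkpoint-{step}")
print(list(keeper.kept))  # ['checkpoint-3', 'checkpoint-4', 'checkpoint-5', 'checkpoint-6']
print(keeper.deleted)     # ['checkpoint-1', 'checkpoint-2']
```

When `saver.save` is called with the same path every time, as above, each save simply overwrites the previous files, so the limit only matters if you save under different names (for example by passing `global_step` to `saver.save`).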

You can get my pretrained model here.

We have three files in our checkpoints folder,

How do we use it? TensorFlow is so well built that it does all the heavy lifting for us. We just have to write four simple lines to load the model and run inference with it.

      import os
      import tensorflow as tf  # TensorFlow 1.x API

      #Create a saver object and rebuild the saved graph from the .meta file
      saver = tf.train.import_meta_graph(os.path.join(model_folder, '.meta'))
      #Restore the trained weights, using the same checkpoint prefix as the .meta file
      saver.restore(session, os.path.join(model_folder, ''))
      #Get the graph object holding the same network architecture
      graph = tf.get_default_graph()
      #Get the last layer of the network by its name; it depends on all the previous layers too
      network = graph.get_tensor_by_name("add_4:0")

Yeah, simple. Now that we have our network as well as the trained values, we have to pass an image to it using the same placeholders (images, labels):

      im_ph = graph.get_tensor_by_name("Placeholder:0")
      label_ph = graph.get_tensor_by_name("Placeholder_1:0")
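The names "add_4:0", "Placeholder:0" and "Placeholder_1:0" come from TensorFlow's default naming: an unnamed op gets its type name, later ops of the same kind get _1, _2 and so on, and the ":0" suffix selects the op's first output. The sketch below is a simplified model of that scheme, not TensorFlow's code. It assumes each of the three conv layers and two fully connected layers adds its bias with `+`, which would create five add ops, the last of which is add_4.

```python
from collections import defaultdict


class NameScope:
    """Simplified model of TensorFlow 1.x default op naming."""

    def __init__(self):
        self.counts = defaultdict(int)

    def unique(self, op_type):
        n = self.counts[op_type]
        self.counts[op_type] += 1
        return op_type if n == 0 else f"{op_type}_{n}"


names = NameScope()
image_ph = names.unique("Placeholder")  # images
label_ph = names.unique("Placeholder")  # labels
# Assumption: 3 conv layers + 2 fully connected layers, each adding a bias with '+'
adds = [names.unique("add") for _ in range(5)]

print(image_ph + ":0", label_ph + ":0")  # Placeholder:0 Placeholder_1:0
print(adds[-1] + ":0")                   # add_4:0
```

Because these counters depend on the order in which ops are created, passing an explicit `name=` argument when building the placeholders and the output layer makes the lookup less fragile if the architecture changes.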

If you run it now, you will see output like [1234, -4322]. This is correct, since the index of the maximum value represents the class, but it is not as convenient as seeing it as 1 and 0, like [1, 0]. For that, we add one line of code before running it:

      network = tf.nn.sigmoid(network)

We could have done this in our training architecture itself and nothing would have changed, but I want to show you that you can add layers to the model even now, at the prediction stage. Flexibility.
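`tf.nn.sigmoid` applies 1 / (1 + e^(-x)) to each value independently, squashing it into the range (0, 1). Here is a small pure-Python sketch of the same function applied to the example output above. It uses the numerically stable form, because evaluating e^(4322) directly would overflow a float.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)


raw = [1234, -4322]  # example raw output from the text
squashed = [sigmoid(v) for v in raw]
print(squashed)      # [1.0, 0.0]
print(sigmoid(0))    # 0.5
```

With raw values this far from zero, the outputs saturate to exactly 1.0 and 0.0 in floating point; less confident inputs give values in between.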

Inference time:

Your training is nothing if you don't have the will to act. - Ra's al Ghul

To run a simple prediction,

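You will see results like [1,0] (Batman) or [0,1] (Superman), where the position of the larger value gives the class.
Please note that this is not one-hot encoding: each value is an independent sigmoid output, so the two values do not have to add up to 1.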
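Turning an output vector into a name is just a matter of taking the index of the largest value. A minimal sketch: the class order follows the text above, while the `to_label` function name is hypothetical. Because the sigmoid preserves order, this works the same on raw and on squashed outputs.

```python
CLASSES = ["Batman", "Superman"]  # index 0 -> Batman, index 1 -> Superman


def to_label(output):
    """Return the class name at the index of the largest output value."""
    best = max(range(len(output)), key=lambda i: output[i])
    return CLASSES[best]


print(to_label([1, 0]))         # Batman
print(to_label([0, 1]))         # Superman
print(to_label([1234, -4322]))  # Batman
```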

Accuracy:

It is actually pretty good: it is right almost all the time. I even gave it an image with both Batman and Superman, and it gave me values of almost the same magnitude (after removing the sigmoid layer we added earlier).

Comment out network=tf.nn.sigmoid(network) in predict.py to see the raw magnitudes; with the sigmoid in place you only get squashed outputs.

From here on you can do whatever you want with those values.
Loading the model takes some time at first (70 seconds), but once it is loaded you can use a for loop (or similar) to feed in images and get each output in a second or two!
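The pattern here is to pay the loading cost once and reuse the loaded model for every image. A minimal sketch: `load_model` and `predict` are hypothetical stand-ins for the restore code and the `session.run` call above. A counter replaces the real 70-second load, and the fake model always returns the same scores.

```python
import functools

load_count = 0


@functools.lru_cache(maxsize=None)
def load_model():
    """Hypothetical slow loader; lru_cache makes it run only once."""
    global load_count
    load_count += 1
    # stand-in for import_meta_graph + restore: a function from image to scores
    return lambda image_path: [1.0, 0.0]


def predict(image_path):
    model = load_model()  # only the first call pays the loading cost
    return model(image_path)


for path in ["img1.jpg", "img2.jpg", "img3.jpg"]:
    print(path, predict(path))
print("loads:", load_count)  # loads: 1
```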

Tensorboard:

I have added a few lines to the training code for TensorBoard. Using TensorBoard, we can track the progress of training both while it runs and afterwards. You can also see the network structure and all the components inside it, which is very useful for visualizing what is happening. To start it, go to the project directory, open a command line and run:

      tensorboard --logdir checkpoints

You should see the following,

Now type the same address into your browser. TensorBoard is now running. Play with it.

Graph Structure Visualization:

Yeah, you can see our entire model with dimensions in each layer and operations here!

Future Implementations:

This works for binary classification, and it will also work for multiclass classification, but not as well. We might need to alter the architecture and build a larger model depending on the number of classes we want.

So, that's how Batman wins!


Please star the repo if you like it.
For any suggestions, doubts or clarifications, please mail ipaar3@gmail.com or raise an issue!