Inter species classification
Updates: Paper accepted at 3rd WCVA Workshop, 11th ICVGIP'18 Conference :grimacing:
This is an implementation for the bird species classification challenge hosted by IIT Mandi at the ICVGIP'18 conference, written in Python 3 and Keras with a TensorFlow backend. The architecture combines Mask R-CNN and ImageNet-pretrained models end to end. The ImageNet models used are Inception V3 and Inception ResNet V2.
The repository includes:
The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below).
pip install -r requirements.txt
To help with running the model end to end, a .docx document has been added in case more detail is needed about each function and its parameters. Here are the steps in summary:
Because the original training set contains only 150 images, several data augmentation techniques were applied first, increasing the number of images to around 1330. Without augmentation, the models' huge number of parameters could not learn to generalize to the validation data. Augmentation also helps reduce the effect of class imbalance: some classes have only 6 images whereas others have around 20.
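The augmentation step can be sketched as follows. This is a minimal sketch, not the repository's code: the repository uses imgaug, whereas here a random horizontal flip and brightness jitter in plain NumPy stand in for it, and the function names and the per-class target of 83 images (roughly 1330 / 16) are illustrative assumptions. Topping every class up toward a common target is what lets augmentation reduce class imbalance, since a 6-image class receives more copies per image than a 20-image class.

```python
import math

import numpy as np


def copies_needed(n_images, target_per_class):
    """Augmented copies per original image so that a class with `n_images`
    reaches at least `target_per_class` images in total (hypothetical helper)."""
    return max(0, math.ceil(target_per_class / n_images) - 1)


def simple_augment(image, rng):
    """Random horizontal flip plus brightness jitter.
    Assumption: a NumPy stand-in for the imgaug pipeline used in the repo."""
    out = image[:, ::-1] if rng.random() < 0.5 else image  # flip left-right
    scale = rng.uniform(0.8, 1.2)                           # brightness factor
    return np.clip(out.astype(np.float32) * scale, 0, 255).astype(np.uint8)


def augment_class(images, target_per_class, seed=0):
    """Return the original images followed by their augmented copies."""
    rng = np.random.default_rng(seed)
    k = copies_needed(len(images), target_per_class)
    augmented = [simple_augment(img, rng) for img in images for _ in range(k)]
    return list(images) + augmented
```

For example, with a target of 83, a 6-image class gets 13 copies per image (84 images in total) while a 20-image class gets 4 copies per image (100 in total).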
After data augmentation, a validation set is created: 10% of the images of each bird species are held out to test model performance.
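A per-species (stratified) 10% split can be sketched as below. It assumes samples are `(path, label)` pairs; the function name, the fixed seed, and the rule of keeping at least one validation image per species are illustrative assumptions rather than details from the repository.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.1, seed=0):
    """Split (path, label) pairs so that each label contributes
    `val_fraction` of its images to the validation set."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)
    train, val = [], []
    for label, items in sorted(by_label.items()):
        rng.shuffle(items)
        # Assumption: every species keeps at least one validation image.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```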
Various ImageNet models, namely AlexNet, VGG-16/19, ResNet50, Inception V3 and Inception ResNet V2, were trained starting from pretrained ImageNet weights. Inception ResNet V2 outperformed them all.
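Transfer learning from pretrained ImageNet weights can be sketched in Keras as below. The classification head (global average pooling, dropout, a 16-way softmax), the 299x299 input size, and the Adam learning rate are assumptions for illustration, not settings taken from the repository.

```python
import tensorflow as tf

NUM_CLASSES = 16  # bird species in the challenge dataset


def build_classifier(weights="imagenet", input_shape=(299, 299, 3)):
    """Inception ResNet V2 backbone with a new softmax head.

    Assumption: the head (pooling, dropout, dense) and the optimizer
    settings are illustrative, not the repository's exact configuration.
    """
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights=weights, input_shape=input_shape
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The same pattern works with `tf.keras.applications.InceptionV3`; inputs should be preprocessed with the matching function, e.g. `tf.keras.applications.inception_resnet_v2.preprocess_input`.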
Multi-stage training comes next. Mask R-CNN was used to localize the birds in the images at their original resolution. Single Shot Detector (SSD) and YOLO were also tried, but they require the images to be resized to 416x416 or 512x512, which loses a lot of information. The Mask R-CNN code and modules are well explained in the Mask R-CNN GitHub repo [4]. Because Mask R-CNN is trained on the COCO dataset, which includes a bird class, it was able to crop the birds in most cases.
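Cropping a detected bird at full resolution might look like the following. This sketch assumes the output format of the matterport Mask R-CNN implementation's `model.detect()`, a dict with `'rois'` boxes as `(y1, x1, y2, x2)` plus `'class_ids'` and `'scores'`, and that "bird" has class ID 15 in that implementation's COCO class list. The function name, the score threshold, and the 10% padding are hypothetical choices.

```python
import numpy as np

# Assumption: index of "bird" in the COCO class list of matterport's Mask R-CNN demo.
BIRD_CLASS_ID = 15


def crop_best_bird(image, result, min_score=0.7, pad=0.1):
    """Return the highest-scoring bird crop at original resolution, or None.

    `result` is assumed to follow matterport's `model.detect()` output:
    'rois' as (y1, x1, y2, x2) boxes, plus 'class_ids' and 'scores'.
    """
    keep = (result["class_ids"] == BIRD_CLASS_ID) & (result["scores"] >= min_score)
    if not np.any(keep):
        return None  # no confident bird detection
    # Map the argmax over the kept detections back to an index into all detections.
    idx = np.flatnonzero(keep)[np.argmax(result["scores"][keep])]
    y1, x1, y2, x2 = result["rois"][idx]
    # Pad the box by a fraction of its size, clipped to the image borders.
    dy, dx = int((y2 - y1) * pad), int((x2 - x1) * pad)
    h, w = image.shape[:2]
    y1, x1 = max(0, y1 - dy), max(0, x1 - dx)
    y2, x2 = min(h, y2 + dy), min(w, x2 + dx)
    return image[y1:y2, x1:x2]
```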
After obtaining the crops, data augmentation was applied to the cropped images, increasing the dataset to around 1600 images. Multi-stage training was then performed using both the cropped-image dataset and the original images. This improved accuracy by around 10% for the Inception V3 model and by 2% for Inception ResNet V2.
For testing, an end-to-end architecture combining Mask R-CNN with the trained Inception models was built. All test images are first passed through Mask R-CNN, after which the pipeline splits into two cases:
After applying Mask R-CNN with both classifiers, the confusion matrices showed that Inception V3 performs better than Inception ResNet V2 on some classes. The two models were therefore ensembled: for each image, the prediction vectors of both models are compared, and the image is assigned the species that receives the single highest prediction score from either model. This improved the results by almost 5 percentage points, from 51% to around 56%. The results tables are discussed below.
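The ensembling rule described above, keeping whichever model is most confident about some species, can be sketched with NumPy. The function name is hypothetical, and the inputs are assumed to be softmax outputs of both models over the same class order.

```python
import numpy as np


def max_confidence_ensemble(prob_a, prob_b):
    """Per image, pick the class holding the single highest probability
    across both models' prediction vectors.

    prob_a, prob_b: arrays of shape (num_images, num_classes).
    """
    stacked = np.stack([prob_a, prob_b])  # (2, num_images, num_classes)
    best = stacked.max(axis=0)            # best score per class over the two models
    return best.argmax(axis=1)            # class where the overall maximum occurs
```

Averaging the two vectors is a common alternative; the max rule above follows the description in this README.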
If you use this repository, please use this bibtex to cite the paper:
@InProceedings{10.1007/978-981-15-1387-9_3,
author="Kumar, Akash
and Das, Sourya Dipta",
editor="Arora, Chetan
and Mitra, Kaushik",
title="Bird Species Classification Using Transfer Learning with Multistage Training",
booktitle="Computer Vision Applications",
year="2019",
publisher="Springer Singapore",
address="Singapore",
pages="28--38",
isbn="978-981-15-1387-9"
}
Some important sub-parts are discussed below:
In this repo, I have used the dataset from the ICVGIP'18 Bird Species Classification Challenge. The training dataset contains 150 images and the test dataset contains 158 images, one of which is corrupted. There are 16 bird species in total to be classified. Image resolutions range from 800x600 to 6000x4000.
Data augmentation was done using imgaug. A table of the augmentations applied to each species is shared in the data_augmentation folder.
Running Mask R-CNN on the whole image helped to localize the birds. Below are examples of bird detection in high-resolution images. Because Mask R-CNN is trained on the COCO dataset, which has a bird class, it carves out bird ROIs very accurately: successful bird crops were obtained for more than 140 of the 150 training images.
As with any new dataset, there were some problems as well as some major challenges:
1) In most training images, the bird covers only about 10-20% of the image, whereas in the test images it covers 70-80%. Sometimes the model fails to detect the birds because of the small number of bird images in the dataset.
2) In some classes, the bird covers less than 10% of the image, or the colour of the bird is very similar to its surroundings, as with brown birds. In those cases, the model fails to localize the bird because of occlusion or background similarity. Some such cases are as follows:
I tried multi-stage training in both orders: first on the original images and then on the crops, and first on the crops and then on the original images. Training on the images first and then on the crops gave better results. For testing, we use the Inception V3 crops weights and the Inception ResNet V2 images+crops weights to identify the bird species.
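The images-then-crops schedule amounts to calling `fit` repeatedly on the same model, so the weights learned in one stage initialise the next. A minimal sketch, assuming a compiled Keras model and in-memory arrays; the helper name, batch size, and stage tuples are hypothetical.

```python
def multi_stage_fit(model, stages, batch_size=16):
    """Train one Keras-style model through successive datasets.

    `stages` is an ordered list of (name, x, y, epochs). Weights carry over
    from one stage to the next because the same `model` object is reused.
    """
    histories = {}
    for name, x, y, epochs in stages:
        histories[name] = model.fit(
            x, y, epochs=epochs, batch_size=batch_size, verbose=0
        )
    return histories
```

Usage would look like `multi_stage_fit(model, [("images", x_images, y_images, n1), ("crops", x_crops, y_crops, n2)])`, with the stage order reversed to reproduce the crops-first experiment.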
The weight files, trained for 7 epochs, are as follows:
[1] inception_v3_crops.h5 - Trained only on cropped images.
[2] inception_v3_crops+images.h5 - Trained on Images plus crops.
[3] inception_resnet_v2_images.h5 - Trained on Images only.
[4] inception_resnet_v2_images+crops.h5 - Trained on Images + crops for 7 epochs.
We could have trained for more epochs, but doing so did not give significant improvements in the results.
The architecture of the model is as below:
Accuracy (%) on the train, validation and test data after multi-stage training:
Model Architecture | Data Subset | Train | Validation | Test |
---|---|---|---|---|
Inception V3 | Images | 91.26 | 12.76 | 30.95 |
Inception V3 | Images + Crops | 93.97 | 15.50 | 41.66 |
Inception ResNet V2 | Images | 97.29 | 29.17 | 47.96 |
Inception ResNet V2 | Images + Crops | 92.29 | 33.69 | 49.09 |
Evaluation on the test data in terms of class-averaged precision, recall and F1 score (%):
Model Architecture | Precision | Recall | F1 |
---|---|---|---|
Mask R-CNN + Inception V3 | 48.61 | 45.65 | 47.09 |
Mask R-CNN + Inception ResNet V2 | 53.62 | 48.72 | 51.05 |
Mask R-CNN + Ensemble | 56.58 | 54.8 | 55.67 |
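Class-averaged scores like these can be computed from a confusion matrix as sketched below. The F1 column appears consistent with the harmonic mean of the averaged precision and recall (for example, 2 × 48.61 × 45.65 / (48.61 + 45.65) ≈ 47.08, against 47.09 in the table), so the sketch computes F1 that way; this is an inference from the numbers, and the function name is hypothetical.

```python
import numpy as np


def macro_scores(confusion):
    """Class-averaged precision and recall from a confusion matrix
    (rows = true class, columns = predicted class), with F1 taken as the
    harmonic mean of the two averages."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # images predicted as each class
    actual = cm.sum(axis=1)     # images truly in each class
    # Classes with no predictions (or no images) contribute 0 instead of NaN.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0).mean()
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0).mean()
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```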
Final Confusion Matrix:
Hope it helps! If you make any progress on the dataset or face any problems, please let me know. :relaxed:
The dataset is uploaded on Kaggle and the link is shared as follows:
Dataset
A description of all the code is shared in this PDF.
Medium Post: Bird Species Classification in High Resolution Images
Paper on arXiv (preprint version): Bird Species Classification with Transfer Learning using Multistage Training
[1] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, "Rethinking the Inception Architecture for Computer Vision" arXiv preprint arXiv:1512.00567.
[2] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" arXiv preprint arXiv:1602.07261.
[3] Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, "Mask R-CNN" arXiv preprint arXiv:1703.06870.
[4] Mask R-CNN Github repo. "Link"