bird-species-classification

Inter species classification

Updates: Paper accepted at 3rd WCVA Workshop, 11th ICVGIP'18 Conference :grimacing:

:relaxed:CHALLENGE WINNERS:relaxed:

This is an implementation for the bird species classification challenge hosted by IIT Mandi at the ICVGIP'18 conference, written in Python 3 and Keras with a TensorFlow backend. The architecture combines Mask R-CNN and ImageNet models end to end. The ImageNet models used are Inception V3 and Inception ResNet V2.

main_image

The repository includes:

The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below).

Getting Started

Step by Step Classification

To help with running the model end to end, a docx has been added in case more information about each function and its parameters is required. Here are the steps in summary:

Citation

If you use this repository, please use this bibtex to cite the paper:

@InProceedings{10.1007/978-981-15-1387-9_3,
author="Kumar, Akash
and Das, Sourya Dipta",
editor="Arora, Chetan
and Mitra, Kaushik",
title="Bird Species Classification Using Transfer Learning with Multistage Training",
booktitle="Computer Vision Applications",
year="2019",
publisher="Springer Singapore",
address="Singapore",
pages="28--38",
isbn="978-981-15-1387-9"
} 

Some important sub-parts are discussed below:

Dataset

In this repo, I have used the dataset from the ICVGIP'18 Bird Species Classification Challenge. The training dataset contains 150 images and the test dataset contains 158 images, with 1 corrupted image. There are 16 bird species to classify in total. Image resolutions range from 800x600 to 6000x4000.

Data Augmentation

Data augmentation has been done using imgaug. A table of the augmentations applied to each species is shared in the data_augmentation folder.

Mask R-CNN

Running Mask R-CNN on the whole image helps to localize the birds in it. Below are examples of bird detection in a high-resolution image. Because Mask R-CNN is trained on the COCO dataset, which has a bird class, it carves out bird ROIs accurately. More than 140 of the 150 images yielded successfully cropped bird images.

mask_rcnn
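The cropping step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in this repository. It assumes detections arrive as parallel lists of boxes in (y1, x1, y2, x2) pixel order, class ids and scores. The function names, the padding fraction, the score threshold, and the bird class id of 15 are all assumptions that should be checked against the detector actually used.

```python
# Hypothetical helpers: the names, padding, threshold and class id below are
# illustrative assumptions, not values taken from this repository.
BIRD_CLASS_ID = 15  # assumed index of "bird" in the detector's class list


def bird_crop_boxes(rois, class_ids, scores, image_shape,
                    bird_class_id=BIRD_CLASS_ID, min_score=0.5, pad=0.05):
    """Return padded (y1, x1, y2, x2) crop boxes for confident bird detections.

    rois        -- list of (y1, x1, y2, x2) pixel boxes
    class_ids   -- predicted class id for each box
    scores      -- confidence for each box
    image_shape -- (height, width) of the full image
    """
    height, width = image_shape
    boxes = []
    for (y1, x1, y2, x2), cls, score in zip(rois, class_ids, scores):
        if cls != bird_class_id or score < min_score:
            continue  # skip other classes and low-confidence detections
        # Grow the box by a fraction of its size, then clip to the image.
        dy = int(round((y2 - y1) * pad))
        dx = int(round((x2 - x1) * pad))
        boxes.append((max(0, y1 - dy), max(0, x1 - dx),
                      min(height, y2 + dy), min(width, x2 + dx)))
    return boxes


def area_fraction(box, image_shape):
    """Fraction of the full image covered by a (y1, x1, y2, x2) box."""
    y1, x1, y2, x2 = box
    return ((y2 - y1) * (x2 - x1)) / float(image_shape[0] * image_shape[1])


# Example with made-up detections on a 600x800 image.
example_boxes = bird_crop_boxes(
    rois=[(100, 200, 300, 600), (0, 0, 50, 50), (550, 700, 600, 800)],
    class_ids=[15, 1, 15],
    scores=[0.98, 0.90, 0.30],
    image_shape=(600, 800),
)
for box in example_boxes:
    print(box, area_fraction(box, (600, 800)))
```

Each returned box can then be used to slice the image array, for example `image[y1:y2, x1:x2]`. The area fraction gives a quick way to see how much of the frame a bird occupies.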

Challenges

A new dataset always has some problems, and this one posed some major challenges:
1) In the training dataset, the bird mostly occupies only about 10-20% of the whole image, whereas in the test images the bird occupies 70-80% of the image. Sometimes the model fails to detect the birds due to the small number of birds in the dataset.
2) In some classes the birds cover less than 10% of the whole image, or the colour of the bird and its surroundings are very similar, for example when the birds are brown. In those cases the model fails to localize the birds because of occlusion or background similarity.
Some such cases are shown below:
drawbacks

Experiments

I tried multi-stage training in two orders: training on the original images first and then on the crops, and training on the crops first and then on the original images. Training on the images first and then on the crops gave better results. For testing, we use the Inception V3 crops weights and the Inception ResNet V2 crops+images weights to identify the species of the bird.
The weight files for 7 epochs are as follows:
[1] inception_v3_crops.h5 - Trained only on cropped images.
[2] inception_v3_crops+images.h5 - Trained on Images plus crops.
[3] inception_resnet_v2_images.h5 - Trained on Images only.
[4] inception_resnet_v2_images+crops.h5 - Trained on Images + crops for 7 epochs.

We could have trained for more epochs, but that did not give any significant improvement in the results.
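The text above says two sets of weights are used together at test time but does not say how their predictions are combined. One common option, shown here purely as an assumption, is to blend the per-class probabilities of the two models and take the top class. The function name and the equal default weighting are hypothetical.

```python
def average_ensemble(prob_a, prob_b, weight_a=0.5):
    """Blend two per-class probability vectors and return (best_class, blended).

    prob_a, prob_b -- per-class probabilities from two models for one image
    weight_a       -- weight given to the first model (assumed 0.5 here)
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    blended = [weight_a * a + (1.0 - weight_a) * b
               for a, b in zip(prob_a, prob_b)]
    # Index of the class with the highest blended probability.
    best_class = max(range(len(blended)), key=blended.__getitem__)
    return best_class, blended


# Made-up probabilities for a 3-class toy problem.
best, blended = average_ensemble([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
print(best, blended)
```

In this toy case the two models disagree, and the blend follows whichever model is more confident overall. Setting `weight_a` differently would favour one model over the other.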

Model Architecture

The architecture of the model is as below: model_architecture

Test Results

Results on the test data after Multi-stage training:

| Model Architecture | Data Subset | Train | Validation | Test |
|---|---|---|---|---|
| Inception V3 | Images | 91.26 | 12.76 | 30.95 |
| Inception V3 | Images + Crops | 93.97 | 15.50 | 41.66 |
| Inception ResNet V2 | Images | 97.29 | 29.17 | 47.96 |
| Inception ResNet V2 | Images + Crops | 92.29 | 33.69 | 49.09 |

Evaluation on test data in terms of class-averaged Precision, Recall and F1-scores:

| Model Architecture | Precision | Recall | F1 |
|---|---|---|---|
| Mask R-CNN + Inception V3 | 48.61 | 45.65 | 47.09 |
| Mask R-CNN + Inception ResNet V2 | 53.62 | 48.72 | 51.05 |
| Mask R-CNN + Ensemble | 56.58 | 54.8 | 55.67 |
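Class-averaged scores like these can be computed from a confusion matrix, as in the sketch below. It assumes rows are true classes and columns are predicted classes. The source does not state how F1 was computed. The reported F1 values agree, to within about 0.01, with the harmonic mean of the class-averaged precision and recall, so the sketch uses that formula as an inference rather than a confirmed detail. The function names are hypothetical.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall (any consistent scale)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def class_averaged_scores(confusion):
    """Class-averaged precision, recall and F1 in percent.

    confusion[i][j] counts samples of true class i predicted as class j.
    Classes with no predictions (or no samples) contribute 0 to the average.
    """
    n = len(confusion)
    precisions, recalls = [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precisions.append(true_pos / predicted_k if predicted_k else 0.0)
        recalls.append(true_pos / actual_k if actual_k else 0.0)
    precision = 100.0 * sum(precisions) / n
    recall = 100.0 * sum(recalls) / n
    return precision, recall, harmonic_f1(precision, recall)


# Toy 2-class confusion matrix, not data from this project.
toy_precision, toy_recall, toy_f1 = class_averaged_scores([[2, 0], [1, 1]])
print(toy_precision, toy_recall, toy_f1)
```

Averaging the per-class F1 scores instead is another common definition and would generally give a different number, so the formula is worth confirming before comparing against other work.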

Final Confusion Matrix: final_confusion_matrix

Hope it helps!!! If you make any progress on the dataset or face any problems, please let me know. :relaxed:

Extras

The dataset is uploaded on Kaggle and the link is shared as follows: Dataset

A description of all the code is shared in this PDF

Medium Post: Bird Species Classification in High Resolution Images

Paper uploaded on arXiv: prePrint Version: Bird Species Classification with Transfer Learning using Multistage Training

References

[1] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, "Rethinking the Inception Architecture for Computer Vision" arXiv preprint arXiv:1512.00567.
[2] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning" arXiv preprint arXiv:1602.07261.
[3] Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, "Mask R-CNN" arXiv preprint arXiv:1703.06870.
[4] Mask R-CNN Github repo. "Link"