This project shows how to localize objects in images by using simple convolutional neural networks.
Before getting started, we have to download a dataset and generate a CSV file containing the annotations (bounding boxes).
First, let's look at YOLOv2's approach:
We proceed in the same way to build the object detector:
(e.g. `_inverted_res_block` for MobileNetv2). The code in this repository uses MobileNetv2, because it is faster than other models and its performance can be adapted. For example, if `alpha = 0.35` with 96x96 input is not good enough, one can simply increase both values (see here for a comparison). If you use another architecture, change `preprocess_input` accordingly.
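For reference, Keras's MobileNetV2 `preprocess_input` scales pixel values from [0, 255] to [-1, 1]. A minimal NumPy equivalent (an illustration of what the preprocessing does, not the repository's code):

```python
import numpy as np

def preprocess_input(x):
    """Scale pixel values from [0, 255] to [-1, 1],
    the input range MobileNetV2 expects."""
    return x.astype(np.float32) / 127.5 - 1.0

image = np.array([[0.0, 127.5, 255.0]])
print(preprocess_input(image))  # [[-1.  0.  1.]]
```

If you swap in a different backbone, replace this with that model's own preprocessing function.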
Train the model with `python3 example_1/train.py`. Then set the weights file in `example_1/test.py` (given by the last script) and run `python3 example_1/test.py`.
In the following images red is the predicted box, green is the ground truth:
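A common way to score how well the predicted box (red) matches the ground truth (green) is intersection over union (IoU). A small sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)` (the repository's exact box format may differ):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # The overlap is zero when the boxes do not intersect
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes -> 1/3
```

An IoU of 1 means a perfect match, 0 means no overlap at all.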
This time we have to run the scripts `example_2/train.py` and `example_2/test.py`.
In order to distinguish between classes, we have to modify the loss function. I'm using here `w_1 * log((y_hat - y)^2 + 1) + w_2 * FL(p_hat, p)`, where `w_1 = w_2 = 1` are two weights and `FL(p_hat, p) = -(0.9 * (1 - p_hat)^2 * p * log(p_hat) + 0.1 * p_hat^2 * (1 - p) * log(1 - p_hat))` is the focal loss.
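To make the formula concrete, here is a NumPy sketch of this combined loss on scalar inputs (an illustration only; the training code would implement it as a Keras loss over tensors):

```python
import numpy as np

def focal_loss(p_hat, p):
    """FL(p_hat, p) with the 0.9/0.1 weighting from the formula above."""
    p_hat = np.clip(p_hat, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -(0.9 * (1 - p_hat) ** 2 * p * np.log(p_hat)
             + 0.1 * p_hat ** 2 * (1 - p) * np.log(1 - p_hat))

def combined_loss(y_hat, y, p_hat, p, w_1=1.0, w_2=1.0):
    """w_1 * log((y_hat - y)^2 + 1) + w_2 * FL(p_hat, p)."""
    return w_1 * np.log((y_hat - y) ** 2 + 1) + w_2 * focal_loss(p_hat, p)

# A correct box with a confident, correct class gives a near-zero loss;
# a wrong box and class probability give a much larger one.
print(combined_loss(0.5, 0.5, 0.99, 1.0))
print(combined_loss(0.9, 0.5, 0.2, 1.0))
```

The `log((y_hat - y)^2 + 1)` term keeps the regression part non-negative and dampens the influence of large box errors compared to a plain squared error.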
Instead of using all 37 classes, the code will only output class 0 (images containing class 0) or class 1 (images containing classes 1 to 36). However, it is easy to extend this to more classes: use categorical cross-entropy instead of focal loss and try out different weights.
In this example, we use a skip-net architecture similar to U-Net. For an in-depth explanation see my blog post.
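The key idea of such a skip-net is to upsample coarse, deep features and concatenate them with high-resolution features from earlier layers. A shape-only NumPy sketch of one skip connection (illustrative; the actual model is built with Keras layers):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_connect(deep, shallow):
    """Upsample the deep features and concatenate them with the shallow
    (higher-resolution) features along the channel axis."""
    return np.concatenate([upsample2x(deep), shallow], axis=-1)

deep = np.zeros((8, 8, 64))       # coarse features from late in the network
shallow = np.zeros((16, 16, 32))  # fine features from an earlier layer
print(skip_connect(deep, shallow).shape)  # (16, 16, 96)
```

The concatenated map combines "what" (deep, semantic features) with "where" (shallow, spatially precise features), which is what makes U-Net-style decoders work well for localization.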
This example is based on the three YOLO papers. For an in-depth explanation see this blog post.
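At the heart of the YOLO papers is a grid: each ground-truth box is assigned to the cell containing its centre, and the network predicts the centre's offset within that cell. A minimal sketch of this target encoding (my own illustration; the details in `example_4` will differ):

```python
def encode_box(cx, cy, grid_size=7):
    """Map a normalised box centre (cx, cy) in [0, 1) to its grid cell
    (row, col) plus the centre's offset within that cell."""
    col = int(cx * grid_size)
    row = int(cy * grid_size)
    # Offsets in [0, 1) relative to the cell's top-left corner
    dx = cx * grid_size - col
    dy = cy * grid_size - row
    return row, col, dx, dy

print(encode_box(0.5, 0.5))  # the centre lands in cell (3, 3) of a 7x7 grid
```

During training, only the predictor in that cell is made responsible for the box, which is what lets YOLO detect multiple objects in one forward pass.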
Finally, some knobs worth tuning:

- augmentations: see `example_4`; the same code can be added to the other examples
- `ALPHA` and `IMAGE_SIZE` in `train_model.py` (larger values tend to improve accuracy at the cost of speed)
- `BATCH_SIZE`