This project contains code to train and run a neural network that detects cat faces in videos. The network uses a pretrained ResNet-18 with the à trous trick as its core and adds three convolutional layers on top of that. It predicts heatmaps of face locations and derives bounding boxes from those outputs. The model does not use an RPN (region proposal network). Runtime is around 30-60ms per frame on medium hardware (though only ~5ms of that is spent in the CNN, so there is a lot of room for improvement). The implementation uses PyTorch.
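To make the heatmap-to-box step more concrete, here is a minimal sketch of one common way to do it: threshold the heatmap and take the bounding box of each connected region. This is an illustration under assumptions, not the repository's actual code; the function name `heatmap_to_boxes`, the `threshold` parameter, and the choice of 4-connectivity are all hypothetical.

```python
from collections import deque


def heatmap_to_boxes(heatmap, threshold=0.5):
    """Return (x1, y1, x2, y2, score) for each connected region >= threshold.

    heatmap is a list of rows of floats in [0, 1]. Coordinates are inclusive
    cell indices in heatmap space; score is the region's peak value.
    (Hypothetical helper, not taken from the repository.)
    """
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or heatmap[y][x] < threshold:
                continue
            # Flood-fill the 4-connected region that starts at (x, y).
            queue = deque([(x, y)])
            seen[y][x] = True
            x1 = x2 = x
            y1 = y2 = y
            score = heatmap[y][x]
            while queue:
                cx, cy = queue.popleft()
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                score = max(score, heatmap[cy][cx])
                neighbours = ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1))
                for nx, ny in neighbours:
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and heatmap[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x1, y1, x2, y2, score))
    return boxes
```

In a real pipeline the boxes would still have to be scaled from heatmap resolution back to frame resolution, and a higher threshold would yield fewer boxes.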
Example video of detected bounding boxes:
Example video of the training progress:
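Install the imgaug dependency:
sudo pip install imgaug
Place the dataset in a directory such as /foo/bar/10k-cats. That directory should contain the subdirectories CAT_00, CAT_01, etc.
Then clone the repository and run the dataset and training scripts:
git clone https://github.com/aleju/cat-bbs.git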
cd cat-bbs
python create_dataset.py --dataset_dir="/foo/bar/10k-cats"
python train.py
python predict_video.py --video="/path/to/video.mp4" --conf=0.7 --size=400
conf is the confidence threshold for bounding boxes (higher values lead to fewer bounding boxes being shown).
size is the size of the images fed through the network (higher values lead to smaller cat faces being spotted).
The resulting frames are written to <repository-directory>/outputs/videos/<video-filename>/%05d.jpg.
To combine them into a video, run cd <repository-directory>/outputs/videos and then avconv -i "<video-filename>/%05d.jpg" -b:v 1000k "<video-filename>.mp4" (you might have to replace avconv with ffmpeg, depending on what is installed on your system; the parameters are the same for both).
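If you would rather not check by hand which of the two tools is installed, the command can be assembled programmatically. The following is a small sketch under assumptions: `build_encode_command` is a hypothetical helper that is not part of the repository, and it only mirrors the avconv/ffmpeg invocation given above.

```python
import shutil


def build_encode_command(video_name, tool=None, bitrate="1000k"):
    """Build the frames-to-video command as an argument list.

    video_name is the name of the frame directory (hypothetical parameter).
    The command is meant to be run from <repository-directory>/outputs/videos.
    """
    if tool is None:
        # Prefer avconv and fall back to ffmpeg, whichever is on PATH.
        tool = shutil.which("avconv") or shutil.which("ffmpeg")
        if tool is None:
            raise RuntimeError("neither avconv nor ffmpeg was found on PATH")
    return [
        tool,
        "-i", f"{video_name}/%05d.jpg",
        "-b:v", bitrate,
        f"{video_name}.mp4",
    ]
```

The returned list can be passed to `subprocess.run(cmd, cwd=..., check=True)`, with `cwd` pointing at the outputs/videos directory.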