This is the repo for a course project in DD2424 Deep Learning in Data Science at KTH.
This project is a GoogLeNet-based TensorFlow implementation of Fully Convolutional Networks for Semantic Segmentation (CVPR 2015). Another TensorFlow implementation we drew on: FCN.tensorflow.
Our project is mainly based on these previous works, with several changes of our own. We attach our report and slides (with several introductory pages omitted from the presentation) here for reference.
Model Downloads
We provide models trained on two different datasets: PASCAL VOC 2012 and MIT Scene Parsing. Please download the corresponding folder, rename it to `logs`, and put it in your local repo in place of the old one. For more details, please read the subsection Visualize and test results.
Detailed Origins
In the original paper Fully Convolutional Networks for Semantic Segmentation, the authors reported several results for FCN-GoogLeNet and compared them with FCN-VGG16. The results showed GoogLeNet performing worse than VGG16 on semantic segmentation tasks. Two things make this conclusion questionable:
Given the above two points, we were quite curious how a publicly available version of GoogLeNet would perform if actually put to use, and it also seemed good practice to fill the gap of an open-source FCN-GoogLeNet. That is basically why we made this repo.
Visualize and test results
First download the model checkpoints (PASCAL VOC and MIT Scene Parsing) we've trained, put them in the folder `logs`, and replace any other checkpoints if they exist. Note that if the directory `logs/all` doesn't exist, please create it with `mkdir FCN-GoogLeNet/logs/all`. Then change the flag `tf.flags.DEFINE_string('mode', "visualize", "Mode: train/ test/ visualize")` at the beginning of the script `inception_FCN.py` to set the mode to visualize or test results. After that, run `python inception_FCN.py` from the terminal. The segmentation results are saved in the folder `results`.
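The directory setup above can also be done from Python; the snippet below is just an equivalent of the `mkdir` command, assuming your local clone is named `FCN-GoogLeNet`:

```python
import os

# Create logs/all inside the local clone if it doesn't exist yet.
# "FCN-GoogLeNet" is assumed to be the name of your local clone.
repo = "FCN-GoogLeNet"
os.makedirs(os.path.join(repo, "logs", "all"), exist_ok=True)
```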
After training the FCN (or downloading our models), you can launch TensorBoard by typing `tensorboard --logdir=logs/all` in the terminal from inside the folder `FCN-GoogLeNet`. Then open your web browser and navigate to `localhost:6006` (TensorBoard's default port). You should now be able to view graphs of the pixelwise training loss and validation loss.
The following steps are needed if you want to train your own model from scratch.
First delete all the files in `logs` and `logs/all`. After this, you need to provide the path to a checkpoint from which to fine-tune. You can download the checkpoint of the Inception v3 model and change `tf.flags.DEFINE_string('checkpoint_path', '/path/to/checkpoint', 'The path to a checkpoint from which to fine-tune.')` accordingly. To avoid problems, it's easier to copy the Inception v3 checkpoint directly into `logs` and change the above flag to `tf.flags.DEFINE_string('checkpoint_path', 'logs/inception_v3.ckpt', ...)`, although this is somewhat inelegant. Training the whole net takes two steps:
(1) Add upsampling layers on top of Inception v3; freeze the lower layers and train only the output layer of the pretrained model and the upsampling layers. To achieve this, change `tf.flags.DEFINE_string('trainable_scopes', ...)` to `'InceptionV3/Logits,InceptionV3/Upsampling'`. Make sure you've set the `skip_layers` flag to the architecture you want. Set the mode to train and run `inception_FCN.py`. If the code is to run on PDC clusters, run `sbatch ./batchpyjobunix.sh` to submit your job to the Slurm queuing system.
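The upsampling layers added in step (1) are typically transposed convolutions initialized with bilinear interpolation weights, as in the original FCN paper. A minimal NumPy sketch of such an initializer (the function name and the factor below are illustrative, not taken from this repo):

```python
import numpy as np

def bilinear_upsample_kernel(factor):
    """2-D bilinear interpolation kernel, commonly used to initialize
    a transposed-convolution (upsampling) filter in FCNs."""
    size = 2 * factor - factor % 2                 # standard FCN kernel size
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    # Outer product of two 1-D triangular (bilinear) weight profiles.
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

kernel = bilinear_upsample_kernel(2)               # 4x4 kernel for 2x upsampling
```

Each output pixel is then a bilinearly weighted blend of its input neighbors, which the fine-tuning in step (2) can refine further.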
(2) Fine-tune all the variables. Change `tf.flags.DEFINE_string('trainable_scopes', ...)` to `None`. Also remember to change `tf.flags.DEFINE_string('checkpoint_path', ...)` to `'logs'`. Run `inception_FCN.py` again. As before, on PDC clusters run `sbatch ./batchpyjobunix.sh` to submit the job to Slurm.
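The two-step schedule hinges on how `trainable_scopes` selects variables: in the TF-Slim pattern, only variables whose names start with one of the comma-separated scope prefixes are trained, and `None` means train everything. A small stand-alone sketch of that selection logic (the variable names below are hypothetical):

```python
def variables_to_train(var_names, trainable_scopes):
    """Mimic TF-Slim's trainable_scopes behavior: None trains every
    variable; otherwise keep names starting with a listed scope prefix."""
    if trainable_scopes is None:
        return list(var_names)
    scopes = [s.strip() for s in trainable_scopes.split(',')]
    return [v for v in var_names if any(v.startswith(s) for s in scopes)]

names = ['InceptionV3/Conv2d_1a_3x3/weights',        # a frozen lower layer
         'InceptionV3/Logits/Conv2d_1c_1x1/weights',
         'InceptionV3/Upsampling/deconv1/weights']

step1 = variables_to_train(names, 'InceptionV3/Logits,InceptionV3/Upsampling')
step2 = variables_to_train(names, None)              # fine-tune everything
```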
To train and test the FCN on MIT Scene Parsing, two scripts need to be changed manually as follows. Afterwards, you can play around with this new dataset following the steps above.
(1) Script `inception_FCN.py`:
- Import `read_MITSceneParsingData` and comment out `read_PascalVocData` (lines 8-9);
- Change `data_dir` to the path of MIT Scene Parsing (lines 20-21);
- Set `NUM_OF_CLASSES = 151` (line 59).
(2) Script `BatchDatSetReader.py`:
- Change `...[np.expand_dims(self._transform(filename['annotation'], True), ...)` to `...[np.expand_dims(self._transform(filename['annotation'], False), ...)` (line 39).
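Put together, the edited lines in `inception_FCN.py` might look roughly like the fragment below. This is only a sketch: the dataset path is a placeholder, the module alias is illustrative, and the referenced line numbers follow the repo.

```python
# Lines 8-9: swap the dataset reader (alias is illustrative).
import read_MITSceneParsingData as scene_parsing
# import read_PascalVocData as scene_parsing

# Lines 20-21: point data_dir at MIT Scene Parsing (placeholder path).
tf.flags.DEFINE_string('data_dir', '/path/to/MITSceneParsing/', 'path to dataset')

# Line 59: 150 scene-parsing classes plus background.
NUM_OF_CLASSES = 151
```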
http://techtalks.tv/talks/fully-convolutional-networks-for-semantic-segmentation/61606/
http://cs231n.github.io/convolutional-networks/#convert
This author's blog and his TensorFlow Image Segmentation project can be useful.