Official PyTorch implementation of Semantic-Aware Scene Recognition by Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós and Álvaro García-Martín (Elsevier Pattern Recognition).
This paper proposes to improve scene recognition by using object information to focus learning during training. The main contributions of the paper are threefold:
The proposed CNN architecture is as follows:
Results on ADE20K:

| RGB | Semantic | Top@1 | Top@2 | Top@5 | MCA |
|---|---|---|---|---|---|
| ✓ | | 55.90 | 67.25 | 78.00 | 20.96 |
| | ✓ | 50.60 | 60.45 | 72.10 | 12.17 |
| ✓ | ✓ | 62.55 | 73.25 | 82.75 | 27.00 |
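
The tables report Top@k accuracy (the percentage of images whose ground-truth class is among the k highest-scoring predictions) and MCA (mean class accuracy). The sketch below assumes the usual definition of MCA, top-1 accuracy computed separately for each class and then averaged, so every class counts equally regardless of how many images it has. It is plain Python for illustration, with hypothetical function names, and is not taken from the repository's evaluation code.

```python
from collections import defaultdict
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose ground-truth label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by decreasing score; keep the first k
        ranked: List[int] = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)


def mean_class_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Top-1 accuracy computed per ground-truth class, then averaged over classes (assumed MCA definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda c: row[c])
        total[label] += 1
        correct[label] += predicted == label
    per_class = [correct[c] / total[c] for c in total]
    return 100.0 * sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy example: class 0 has three images (all correct), class 1 has one (misclassified)
    scores = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
    labels = [0, 0, 0, 1]
    print(top_k_accuracy(scores, labels, 1), top_k_accuracy(scores, labels, 2),
          mean_class_accuracy(scores, labels))  # 75.0 100.0 50.0
```

With three of four images correct but one class never recognised, Top@1 is 75.0 while MCA is 50.0, which is why MCA can sit well below Top@1 when classes are imbalanced.
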

Results on MIT Indoor 67:

Method | Backbone | Number of Parameters | Top@1 |
---|---|---|---|
PlaceNet | Places-CNN | 62 M | 68.24 |
MOP-CNN | CaffeNet | 62 M | 68.90 |
CNNaug-SVM | OverFeat | 145 M | 69.00 |
HybridNet | Places-CNN | 62 M | 70.80 |
URDL + CNNaug | AlexNet | 62 M | 71.90 |
MPP-FCR2 | AlexNet | 62 M | 75.67 |
DSFL + CNN (7 Scales) | AlexNet | 62 M | 76.23 |
MPP + DSFL | AlexNet | 62 M | 80.78 |
CFV | VGG-19 | 143 M | 81.00 |
CS | VGG-19 | 143 M | 82.24 |
SDO (1 Scale) | 2 x VGG-19 | 276 M | 83.98 |
VSAD | 2 x VGG-19 | 276 M | 86.20 |
SDO (9 Scales) | 2 x VGG-19 | 276 M | 86.76 |
Ours | ResNet-18 + Sem Branch + G-RGB-H | 47 M | 85.58 |
**Ours*** | ResNet-50 + Sem Branch + G-RGB-H | 85 M | 87.10 |

Results on SUN 397:

Method | Backbone | Number of Parameters | Top@1 |
---|---|---|---|
Decaf | AlexNet | 62 M | 40.94 |
MOP-CNN | CaffeNet | 62 M | 51.98 |
HybridNet | Places-CNN | 62 M | 53.86 |
Places-CNN | Places-CNN | 62 M | 54.23 |
Places-CNN ft | Places-CNN | 62 M | 56.20 |
CS | VGG-19 | 143 M | 64.53 |
SDO (1 Scale) | 2 x VGG-19 | 276 M | 66.98 |
VSAD | 2 x VGG-19 | 276 M | 73.00 |
SDO (9 Scales) | 2 x VGG-19 | 276 M | 73.41 |
Ours | ResNet-18 + Sem Branch + G-RGB-H | 47 M | 71.25 |
**Ours*** | ResNet-50 + Sem Branch + G-RGB-H | 85 M | 74.04 |

Results on Places 365:

Network | Number of Parameters | Top@1 | Top@2 | Top@5 | MCA |
---|---|---|---|---|---|
AlexNet | 62 M | 47.45 | 62.33 | 78.39 | 49.15 |
AlexNet* | 62 M | 53.17 | - | 82.59 | - |
GoogLeNet* | 7 M | 53.63 | - | 83.88 | - |
ResNet-18 | 12 M | 53.05 | 68.87 | 83.86 | 54.40 |
ResNet-50 | 25 M | 55.47 | 70.40 | 85.36 | 55.47 |
ResNet-50* | 25 M | 54.74 | - | 85.08 | - |
VGG-19* | 143 M | 55.24 | - | 84.91 | - |
DenseNet-161 | 29 M | 56.12 | 71.48 | 86.12 | 56.12 |
Ours | 47 M | 56.51 | 71.57 | 86.00 | 56.51 |
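
The "Number of Parameters" column counts learnable weights. As a sanity check of the ResNet-18 entry (12 M), the count can be derived by hand, assuming the standard torchvision ResNet-18 layout: bias-free convolutions, a BatchNorm (scale and shift) after every convolution, and 1x1 projection shortcuts where the channel count changes. The helper names are illustrative; in PyTorch the same figure is normally obtained with `sum(p.numel() for p in model.parameters())`.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    # k x k convolution without bias, as used throughout ResNet
    return k * k * c_in * c_out


def bn_params(c: int) -> int:
    # Learnable gamma and beta; running mean/variance are buffers, not parameters
    return 2 * c


def basic_block_params(c_in: int, c_out: int) -> int:
    total = conv_params(3, c_in, c_out) + bn_params(c_out)
    total += conv_params(3, c_out, c_out) + bn_params(c_out)
    if c_in != c_out:
        # 1x1 projection shortcut at the start of stages 2-4 (channels and stride change)
        total += conv_params(1, c_in, c_out) + bn_params(c_out)
    return total


def resnet18_params(num_classes: int = 1000) -> int:
    total = conv_params(7, 3, 64) + bn_params(64)  # stem convolution + BatchNorm
    c_in = 64
    for c_out in (64, 128, 256, 512):  # four stages, two BasicBlocks each
        total += basic_block_params(c_in, c_out)
        total += basic_block_params(c_out, c_out)
        c_in = c_out
    total += 512 * num_classes + num_classes  # fully connected classifier (weights + bias)
    return total


if __name__ == "__main__":
    print(resnet18_params())  # 11689512
```

About 11.7 M parameters with a 1000-class head, consistent with the rounded 12 M listed above; only the final classifier changes with the number of scene classes.
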

The repository has been tested with the following software versions.
Clone the repository by running the following command:
$ git clone https://github.com/vpulab/Semantic-Aware-Scene-Recognition.git
To create and set up the Anaconda environment, run the following terminal commands from the repository folder:
$ conda env create -f Config/Conda_Env.yml
$ conda activate SA-Scene-Recognition
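After activating the environment, a short script along these lines can report which versions of the key packages were actually installed, for comparison with the tested versions. It is a sketch that uses only the standard library (`importlib.metadata`, Python 3.8+); the package names passed in are assumptions about what the environment contains, and the function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed package names; adjust to the dependencies listed in Config/Conda_Env.yml
    for pkg, ver in installed_versions(["torch", "torchvision", "numpy"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```
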
Download and setup instructions for each dataset are provided at the following links:
To evaluate the models independently, download them from the following links and set their path in the YAML configuration files (usually /Data/Model Zoo/DATASET FOLDER).
[Recommended] Alternatively, you can run the following script from the repository folder to download the entire Model Zoo:
bash ./Scripts/download_ModelZoo.sh
ADE20K
MIT Indoor 67
SUN 397
Places 365
To evaluate the models, run evaluation.py from the repository folder, passing the path to the dataset's YAML configuration file:
python evaluation.py --ConfigPath [PATH to configuration file]
Example for the ADE20K dataset:
python evaluation.py --ConfigPath Config/config_ADE20K.yaml
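To evaluate several datasets in one go, evaluation.py can be called once per configuration file. The sketch below assumes, hypothetically, that every dataset configuration follows the Config/config_<DATASET>.yaml naming of the ADE20K example; only the ADE20K file name comes from this README, and the helper names are illustrative.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_eval_command(config_path: str) -> List[str]:
    """Command line that runs evaluation.py with the given YAML configuration file."""
    return [sys.executable, "evaluation.py", "--ConfigPath", config_path]


def evaluate_all(config_dir: str = "Config", pattern: str = "config_*.yaml") -> None:
    # Hypothetical naming pattern, extrapolated from Config/config_ADE20K.yaml
    for cfg in sorted(Path(config_dir).glob(pattern)):
        cmd = build_eval_command(str(cfg))
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first evaluation that fails


if __name__ == "__main__":
    evaluate_all()
```

Using `sys.executable` instead of a bare `python` keeps each evaluation inside the interpreter of the currently activated conda environment.
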
All settings (backbone architecture, model to load, batch size, etc.) are set in the YAML configuration file of each dataset.
The performance metrics computed for both the training and validation sets are:
If you find this code and work useful, please consider citing:
@article{lopez2020semantic,
title={Semantic-Aware Scene Recognition},
author={L{\'o}pez-Cifuentes, Alejandro and Escudero-Vi{\~n}olo, Marcos and Besc{\'o}s, Jes{\'u}s and Garc{\'\i}a-Mart{\'\i}n, {\'A}lvaro},
journal={Pattern Recognition},
pages={107256},
year={2020},
publisher={Elsevier}
}
This study has been partially supported by the Spanish Government through its TEC2017-88169-R MobiNetVideo project.