Visualization of Convolutional Neural Networks for Monocular Depth Estimation

Junjie Hu, Yan Zhang, Takayuki Okatani, "Visualization of Convolutional Neural Networks for Monocular Depth Estimation," ICCV, 2019. paper

Introduction

We attempt to interpret CNNs for monocular depth estimation. To this end, we propose to locate the pixels of the input image that are most relevant to depth inference. We formulate this as an optimization problem: identify the smallest number of image pixels from which the CNN can estimate a depth map with minimum difference from the estimate obtained from the entire image.
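The trade-off described above can be illustrated with a small sketch. It is not the code in this repository: it assumes the pixel count is relaxed to a mean absolute value (L1) penalty on a soft mask, takes the mask as given rather than predicting it with a network, and uses hypothetical names (`masked_objective`, `toy_depth_net`, `lam`) together with a toy stand-in for the depth CNN.

```python
# Sketch only: all names here are hypothetical and not taken from the repository.

def apply_mask(image, mask):
    """Element-wise product of an image and a mask with values in [0, 1]."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2-D maps."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def masked_objective(depth_net, image, mask, lam=1.0):
    """Depth difference on the masked input plus an assumed L1 sparsity penalty."""
    full_depth = depth_net(image)                      # estimate from the entire image
    masked_depth = depth_net(apply_mask(image, mask))  # estimate from the selected pixels
    n = sum(len(row) for row in mask)
    sparsity = sum(abs(m) for row in mask for m in row) / n
    return mean_abs_diff(full_depth, masked_depth) + lam * sparsity


def toy_depth_net(image):
    """Stand-in for a depth CNN: each output value is twice the input pixel."""
    return [[2.0 * p for p in row] for row in image]


image = [[1.0, 0.0], [0.0, 3.0]]
keep_all = [[1.0, 1.0], [1.0, 1.0]]
keep_nonzero = [[1.0, 0.0], [0.0, 1.0]]

loss_all = masked_objective(toy_depth_net, image, keep_all, lam=0.5)
loss_sparse = masked_objective(toy_depth_net, image, keep_nonzero, lam=0.5)
```

In this toy case both masks reproduce the full-image estimate exactly, so the sparser mask yields the lower objective. The weight `lam` controls how strongly a smaller mask is preferred over a closer depth estimate.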

Predicted Masks

Extensive experimental results show that:

CNNs seem to select edges in input images depending not on their strength but on their importance for inferring scene geometry.
CNNs tend to attend not only to the boundary but also to the inside region of each individual object.
Image regions around the vanishing points are important for depth estimation in outdoor scenes.

Please check our paper for more details.

Dependencies

Python 2.7
PyTorch 0.3.1

Running

Download the trained networks for depth estimation: Depth estimation networks

Download the trained networks for mask prediction: Mask prediction network

Download the NYU-v2 dataset: NYU-v2 dataset

Test

python test.py

Train

python train.py

Citation

If you use the code or the pre-processed data, please cite:

@inproceedings{Hu2019VisualizationOC,
  title={Visualization of Convolutional Neural Networks for Monocular Depth Estimation},
  author={Junjie Hu and Yan Zhang and Takayuki Okatani},
  booktitle={IEEE International Conf. on Computer Vision (ICCV)},
  year={2019}
}

@inproceedings{Hu2018RevisitingSI,
  title={Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps With Accurate Object Boundaries},
  author={Junjie Hu and Mete Ozay and Yan Zhang and Takayuki Okatani},
  booktitle={IEEE Winter Conf. on Applications of Computer Vision (WACV)},
  year={2019}
}