ACAN: Attention-based Context Aggregation Model for Monocular Depth Estimation.

PyTorch implementation of ACAN for monocular depth estimation.
More details are available on arXiv.

Architecture


Visualization of Attention Maps
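Attention maps of this kind are usually inspected one query pixel at a time. The minimal sketch below assumes the learned attention is stored as an (H*W) x (H*W) matrix whose row i holds the weights that pixel i assigns to every pixel, with pixels flattened in row-major order. This is the usual layout of non-local/self-attention blocks, but it is an assumption about this repository, and the function name `attention_map_for_pixel` is hypothetical.

```python
def attention_map_for_pixel(attention, height, width, row, col):
    """Return the H x W attention map of one query pixel.

    Assumes `attention` is an (H*W) x (H*W) nested list whose row i holds
    the weights query pixel i assigns to every pixel, with pixels flattened
    in row-major order (index = r * width + c).
    """
    query = row * width + col
    weights = attention[query]
    if len(weights) != height * width:
        raise ValueError("attention row length must equal height * width")
    # Fold the flat weight vector back into image layout for plotting.
    return [weights[r * width:(r + 1) * width] for r in range(height)]


# Toy 2 x 2 image: every pixel attends only to pixels in its own column.
att = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
]
print(attention_map_for_pixel(att, 2, 2, 0, 1))  # [[0.0, 0.5], [0.0, 0.5]]
```

The resulting H x W grid can be passed directly to any image-plotting routine and overlaid on the input frame.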


Soft Inference vs. Hard Inference
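The comparison concerns how a depth value is read off a per-pixel probability distribution over discretized depth bins. A minimal sketch, assuming K bins spaced uniformly in log depth between hypothetical limits `d_min` and `d_max`: hard inference returns the center of the single most probable bin, while soft inference returns the probability-weighted sum of all bin centers, which gives a continuous estimate instead of a piecewise-constant one. The bin layout and all function names are assumptions for illustration, not this repository's code.

```python
import math


def log_bin_centers(d_min, d_max, num_bins):
    """Centers of `num_bins` bins spaced uniformly in log depth (assumed layout)."""
    lo, hi = math.log(d_min), math.log(d_max)
    step = (hi - lo) / num_bins
    return [math.exp(lo + (i + 0.5) * step) for i in range(num_bins)]


def hard_inference(probs, centers):
    """Depth of the single most probable bin (piecewise-constant output)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return centers[best]


def soft_inference(probs, centers):
    """Probability-weighted sum of bin centers (continuous output)."""
    return sum(p * d for p, d in zip(probs, centers))


centers = log_bin_centers(1.0, 10.0, 4)
probs = [0.1, 0.4, 0.35, 0.15]  # hypothetical softmax output for one pixel
print(hard_inference(probs, centers))
print(soft_inference(probs, centers))
```

With a one-hot distribution the two strategies agree; they differ when probability mass is split across neighboring bins, which is where soft inference avoids the visible quantization steps of hard inference.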


Requirements

torch==0.4.1
torchvision
tensorboardX
pillow
tqdm
h5py
scikit-learn
opencv-python (cv2)

This code was tested with PyTorch 0.4.1, CUDA 9.1 and Ubuntu 18.04.
Training on the KITTI dataset with the default parameters takes about 48 hours on an NVIDIA GTX 1080 Ti machine.

Data

There are two main datasets available:

KITTI

We used the Eigen split of the data, which amounts to approximately 22k training samples; the sample lists are in the kitti_path_txt folder.

NYU v2

We downloaded the raw dataset (about 428 GB) and used the NYU v2 toolbox to sample around 12k training samples. To produce the training set, run Get_Dataset.m from the matlab folder, or download the processed dataset from BaiduCloud.

Training

Warning: The input height and width must be multiples of 8.
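If your images do not already satisfy this constraint, you can either crop down or pad up to the nearest multiple of 8. The sketch below only computes the sizes; it assumes a centered crop or bottom/right padding is acceptable for your data, and the helper names (`round_to_multiple`, `center_crop_box`, `pad_amounts`) are hypothetical, not functions from this repository. Check the data loaders to see which approach the training scripts actually use.

```python
def round_to_multiple(size, multiple=8, mode="down"):
    """Nearest multiple of `multiple`: 'down' for cropping, 'up' for padding."""
    if mode == "down":
        return (size // multiple) * multiple
    if mode == "up":
        return -(-size // multiple) * multiple  # ceiling division
    raise ValueError("mode must be 'down' or 'up'")


def center_crop_box(height, width, multiple=8):
    """(top, left, new_h, new_w) of a centered crop whose sides are multiples."""
    new_h = round_to_multiple(height, multiple, "down")
    new_w = round_to_multiple(width, multiple, "down")
    return (height - new_h) // 2, (width - new_w) // 2, new_h, new_w


def pad_amounts(height, width, multiple=8):
    """(bottom, right) padding that lifts both sides to the next multiple."""
    return (round_to_multiple(height, multiple, "up") - height,
            round_to_multiple(width, multiple, "up") - width)


# Example with a 375 x 1242 frame, a common KITTI image size.
print(center_crop_box(375, 1242))  # (3, 1, 368, 1240)
print(pad_amounts(375, 1242))      # (1, 6)
```

Cropping discards a few border pixels but keeps the ground truth aligned without extra work; padding keeps every pixel but the padded region should then be masked out of the loss.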

bash ./code/train_nyu_script.sh

Testing

bash ./code/test_nyu_script.sh

Attention Map

To obtain task-specific attention maps, first train the model from scratch, then fine-tune it with the attention loss by setting:

BETA=1
RESUME=./workspace/log/best.pkl
EPOCHES=10
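BETA presumably weights the attention-loss term during fine-tuning. The sketch below illustrates one plausible form of such a loss, stated as an assumption rather than the repository's exact formulation: for each query pixel, the target is a distribution spread uniformly over the pixels that share its ground-truth depth bin, the predicted attention row is compared to it with KL divergence, and the result is added to the depth loss with weight `beta`. All names (`target_attention`, `attention_kl_loss`, `total_loss`) are hypothetical; consult the training code for the actual loss.

```python
import math


def target_attention(bin_labels):
    """Row i is uniform over pixels whose depth bin equals pixel i's bin."""
    rows = []
    for label in bin_labels:
        same = [1.0 if other == label else 0.0 for other in bin_labels]
        count = sum(same)  # >= 1, since a pixel always matches itself
        rows.append([s / count for s in same])
    return rows


def attention_kl_loss(attention, bin_labels, eps=1e-12):
    """Mean over query pixels of KL(target_row || predicted_row)."""
    target = target_attention(bin_labels)
    total = 0.0
    for t_row, p_row in zip(target, attention):
        total += sum(t * math.log(t / max(p, eps))
                     for t, p in zip(t_row, p_row) if t > 0)
    return total / len(bin_labels)


def total_loss(depth_loss, attention_loss, beta):
    """Depth loss plus the attention term weighted by beta (assumed role of BETA)."""
    return depth_loss + beta * attention_loss


labels = [0, 0, 1]  # hypothetical depth-bin labels of three pixels
perfect = target_attention(labels)
uniform = [[1 / 3] * 3 for _ in labels]
print(attention_kl_loss(perfect, labels))  # 0.0
print(total_loss(0.8, attention_kl_loss(uniform, labels), beta=1.0))
```

Under this reading, setting beta to 0 recovers plain depth training, which is consistent with training from scratch first and switching the attention term on only for the short fine-tuning stage.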

Thanks to the Third-Party Libraries

Non-local_pytorch

Pytorch-OCNet

NConv-CNN