Neural Rerendering in the Wild

Moustafa Meshry¹, Dan B Goldman², Sameh Khamis², Hugues Hoppe², Rohit Pandey², Noah Snavely², Ricardo Martin-Brualla²

¹University of Maryland, College Park      ²Google Inc.

To appear at CVPR 2019 (Oral).

We will provide a TensorFlow implementation and pretrained models for our paper soon.

Paper | Video | Code | Project page

Abstract

We explore total scene capture — recording, modeling, and rerendering a scene under varying appearance such as season and time of day. Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud. For each photo, we render the scene points into a deep framebuffer, and train a neural network to learn the mapping of these initial renderings to the actual photos. This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians. The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions. We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling. We also compare results with prior work on scene reconstruction from internet photos.

Video

Supplementary material video

Appearance variation

We capture the appearance of the original images in the left column and rerender several viewpoints under each of them. The last column shows a detail of the previous one. The top row shows the initial renderings that form part of the input to the rerenderer; they exhibit artifacts such as incomplete features in the statue and an inconsistent mix of day and night appearances. Note the hallucinated twilight sky produced with the last appearance. Image credits: Flickr users William Warby, Neil Rickards, Rafael Jimenez, acme401 (Creative Commons).

Appearance interpolation

Frames from a synthesized camera path that transitions smoothly from the photo on the left to the photo on the right by interpolating both the viewpoint and the latent appearance vectors. Please see the supplementary video. Photo credits: Allie Caulfield, Tahbepet, Till Westermayer, Elliott Brown (Creative Commons).
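The idea of interpolating viewpoint and appearance together can be illustrated with a minimal sketch. It assumes plain linear interpolation between two camera positions and two latent appearance vectors; this page does not state which interpolation scheme is actually used, and the names `lerp` and `interpolate_path` are hypothetical. Camera orientation is left out, since blending rotations would typically call for something like quaternion slerp.

```python
from typing import List, Sequence, Tuple


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> List[float]:
    """Linearly blend two equal-length vectors; t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_path(
    pos_a: Sequence[float],
    pos_b: Sequence[float],
    app_a: Sequence[float],
    app_b: Sequence[float],
    num_frames: int,
) -> List[Tuple[List[float], List[float]]]:
    """Return (camera position, appearance vector) pairs for each frame.

    Assumption: both quantities share the same interpolation parameter t,
    so viewpoint and appearance change in lockstep along the path.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to define a path")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # runs from 0.0 to 1.0 inclusive
        frames.append((lerp(pos_a, pos_b, t), lerp(app_a, app_b, t)))
    return frames


if __name__ == "__main__":
    # Toy values: 3D camera positions and 2D appearance vectors.
    path = interpolate_path(
        [0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [1.0, -1.0], [0.0, 1.0], num_frames=5
    )
    for position, appearance in path:
        print(position, appearance)
```

Each returned pair would then drive one rerendered frame: the position selects the viewpoint for the initial rendering, and the appearance vector is passed to the rerendering network.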

Acknowledgements

We thank Gregory Blascovich for his help in conducting the user study, and Johannes Schönberger and True Price for their help generating datasets.

Run and train instructions

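Staged training consists of three stages: pretraining the appearance encoder, training the rerendering network with a fixed appearance encoder, and finetuning the rerendering network and the appearance encoder together. Dataset preprocessing comes before these stages, and evaluation on the validation set follows them.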

Aligned dataset preprocessing

Manual preparation

base_dir=//to/neural_rendering
mkdir -p "$base_dir"
cd "$base_dir"

Data preprocessing

# Run locally
# Assumes $base_dir and $dataset_name are set; $dataset_name names the
# dataset's directory under $base_dir/datasets.
python tools/dataset_utils.py \
  --dataset_name=$dataset_name \
  --dataset_parent_dir=$base_dir/datasets/$dataset_name \
  --output_dir=$base_dir/datasets/$dataset_name \
  --xception_frozen_graph_path=$base_dir/deeplabv3_xception_ade20k_train/frozen_inference_graph.pb \
  --alsologtostderr
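Before launching preprocessing, it can help to confirm that the paths the command refers to actually exist. The following is a minimal sketch of such a check, not part of the released code: the function name `missing_preprocessing_inputs` is hypothetical, and the only paths it knows about are the two input locations that appear in the command above.

```python
from pathlib import Path
from typing import List


def missing_preprocessing_inputs(base_dir: str, dataset_name: str) -> List[Path]:
    """Return the expected input paths that do not exist yet.

    The layout is taken from the preprocessing command: the dataset lives
    under <base_dir>/datasets/<dataset_name>, and the frozen segmentation
    graph under <base_dir>/deeplabv3_xception_ade20k_train/.
    """
    base = Path(base_dir)
    expected = [
        base / "datasets" / dataset_name,
        base / "deeplabv3_xception_ade20k_train" / "frozen_inference_graph.pb",
    ]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    # Hypothetical values; substitute your own base directory and dataset name.
    for path in missing_preprocessing_inputs("/tmp/neural_rendering", "my_dataset"):
        print(f"missing: {path}")
```

An empty result means both inputs are in place; anything printed should be created or downloaded before running `tools/dataset_utils.py`.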

Pretraining the appearance encoder network

# Run locally
python pretrain_appearance.py \
  --dataset_name=$dataset_name \
  --train_dir=$base_dir/train_models/$dataset_name-app_pretrain \
  --imageset_dir=$base_dir/datasets/$dataset_name/train \
  --train_resolution=512 \
  --metadata_output_dir=$base_dir/datasets/$dataset_name

Training the rerendering network with a fixed appearance encoder

Set the dataset_parent_dir variable below to point to the directory containing the generated TFRecords.

# Run locally:
dataset_parent_dir=$base_dir/datasets/$dataset_name
train_dir=$base_dir/train_models/$dataset_name-staged-fixed_appearance
load_pretrained_app_encoder=true
appearance_pretrain_dir=$base_dir/train_models/$dataset_name-app_pretrain
load_from_another_ckpt=false
fixed_appearance_train_dir=''
train_app_encoder=false

python neural_rerendering.py \
  --dataset_name=$dataset_name \
  --dataset_parent_dir=$dataset_parent_dir \
  --train_dir=$train_dir \
  --load_pretrained_app_encoder=$load_pretrained_app_encoder \
  --appearance_pretrain_dir=$appearance_pretrain_dir \
  --train_app_encoder=$train_app_encoder \
  --load_from_another_ckpt=$load_from_another_ckpt \
  --fixed_appearance_train_dir=$fixed_appearance_train_dir \
  --total_kimg=4000

Finetuning the rerendering network and the appearance encoder

Set the fixed_appearance_train_dir to the train directory from the previous step.

# Run locally:
dataset_parent_dir=$base_dir/datasets/$dataset_name
train_dir=$base_dir/train_models/$dataset_name-staged-finetune_appearance
load_pretrained_app_encoder=false
appearance_pretrain_dir=''
load_from_another_ckpt=true
fixed_appearance_train_dir=$base_dir/train_models/$dataset_name-staged-fixed_appearance
train_app_encoder=true

python neural_rerendering.py \
  --dataset_name=$dataset_name \
  --dataset_parent_dir=$dataset_parent_dir \
  --train_dir=$train_dir \
  --load_pretrained_app_encoder=$load_pretrained_app_encoder \
  --appearance_pretrain_dir=$appearance_pretrain_dir \
  --train_app_encoder=$train_app_encoder \
  --load_from_another_ckpt=$load_from_another_ckpt \
  --fixed_appearance_train_dir=$fixed_appearance_train_dir \
  --total_kimg=4000
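The fixed-encoder and finetuning commands share the same flags and differ only in five values. As a rough illustration of that difference, the sketch below builds the argument list for either stage. The helper `stage_args` is hypothetical and not part of the repository; the flag names, directory layout, and values are copied from the two commands above.

```python
from typing import List


def stage_args(base_dir: str, dataset_name: str, finetune: bool) -> List[str]:
    """Build the neural_rerendering.py command for one training stage.

    finetune=False: train the rerenderer with a fixed, pretrained encoder.
    finetune=True:  resume from that run and train the encoder as well.
    """
    models = f"{base_dir}/train_models"
    fixed_dir = f"{models}/{dataset_name}-staged-fixed_appearance"
    if finetune:
        train_dir = f"{models}/{dataset_name}-staged-finetune_appearance"
        load_pretrained, pretrain_dir = "false", ""
        load_ckpt, previous_dir, train_encoder = "true", fixed_dir, "true"
    else:
        train_dir = fixed_dir
        load_pretrained = "true"
        pretrain_dir = f"{models}/{dataset_name}-app_pretrain"
        load_ckpt, previous_dir, train_encoder = "false", "", "false"
    return [
        "python",
        "neural_rerendering.py",
        f"--dataset_name={dataset_name}",
        f"--dataset_parent_dir={base_dir}/datasets/{dataset_name}",
        f"--train_dir={train_dir}",
        f"--load_pretrained_app_encoder={load_pretrained}",
        f"--appearance_pretrain_dir={pretrain_dir}",
        f"--train_app_encoder={train_encoder}",
        f"--load_from_another_ckpt={load_ckpt}",
        f"--fixed_appearance_train_dir={previous_dir}",
        "--total_kimg=4000",
    ]


if __name__ == "__main__":
    # Hypothetical base directory and dataset name, for illustration only.
    for finetune in (False, True):
        print(" ".join(stage_args("/tmp/neural_rendering", "my_dataset", finetune)))
```

The returned list could be passed to `subprocess.run` to launch a stage. Note that the finetuning stage reads its starting checkpoint from the train directory written by the fixed-encoder stage, so the two must run in order.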

Evaluating the model on the validation set

experiment_title=$dataset_name-staged-finetune_appearance
local_train_dir=$base_dir/train_models/$experiment_title
dataset_parent_dir=$base_dir/datasets/$dataset_name
val_set_out_dir=$local_train_dir/val_set_output

# Run the model on validation set
echo "Evaluating the validation set"
python neural_rerendering.py \
      --train_dir=$local_train_dir \
      --dataset_name=$dataset_name \
      --dataset_parent_dir=$dataset_parent_dir \
      --run_mode='eval_subset' \
      --virtual_seq_name='val' \
      --output_validation_dir=$val_set_out_dir \
      --logtostderr
# Evaluate quantitative metrics
python evaluate_quantitative_metrics.py \
      --val_set_out_dir=$val_set_out_dir \
      --experiment_title=$experiment_title \
      --logtostderr