3D Scene Understanding from RGB-D Images Thomas Funkhouser

Page 1:

3D Scene Understanding

from RGB-D Images

Thomas Funkhouser

Page 2:

Disclaimer: I am talking about the work of these people …

Shuran Song

Manolis Savva Angel Chang

Yinda Zhang Maciej Halber

Fisher Yu

Andy Zeng Kyle Genova

Current Ph.D. Students

Recent Ph.D. Student

Current Postdocs

Page 3:

Motivation

Help devices with RGB-D cameras understand their 3D environments

• Robot manipulation

• Augmented reality

• Virtual reality

• Personal assistance

• Surveillance

• Navigation

• Mapping

• Games

• etc.

Page 4:

Goal

Given an RGB-D image, infer a complete, annotated 3D representation

Input: RGB-D Image

Color (RGB)

Depth (D)

Output: complete, annotated 3D representation

Bed

Door

Nightstand Nightstand

Bench

Wall

Wall Picture

Pillow

Free space

Page 5:

Problem

Challenge: get only partial observation of scene, must infer the rest

Side view

Input: RGB-D Image

Page 6:

Problem

Challenge: get only partial observation of scene, must infer the rest

Rotating side view

Input: RGB-D Image

Page 7:

Problem

Challenge: get only partial observation of scene, must infer the rest

Top view

Input: RGB-D Image

Page 8:

Problem

Challenge: get only partial observation of scene, must infer the rest

Top view

Input: RGB-D Image

Beyond Field of View

Page 9:

Problem

Challenge: get only partial observation of scene, must infer the rest

Top view

Input: RGB-D Image

Beyond Field of View

Occluded Regions

Page 10:

Problem

Challenge: get only partial observation of scene, must infer the rest

Missing Depths

Top view

Input: RGB-D Image

Beyond Field of View

Occluded Regions

Page 11:

Problem

Challenge: get only partial observation of scene, must infer the rest

Top view

Missing Depths

Structure

Free space

Input: RGB-D Image

Beyond Field of View

Occluded Regions

Page 12:

Problem

Challenge: get only partial observation of scene, must infer the rest

Top view

Bed

Door

Nightstand Nightstand

Bench

Wall

Wall Picture

Pillow

Missing Depths

Semantics

Structure

Free space

Input: RGB-D Image

Beyond Field of View

Occluded Regions

Page 13:

Talk Outline

Introduction

Three recent projects

• Deep depth completion [CVPR 2018]

• Semantic scene completion [CVPR 2017]

• Semantic view extrapolation [CVPR 2018]

Common themes

Future work

Page 14:

Talk Outline (Part 1)

Introduction

Three recent projects

• Deep depth completion [CVPR 2018]

• Semantic scene completion [CVPR 2017]

• Semantic view extrapolation [CVPR 2018]

Common themes

Future work

Yinda Zhang and Thomas Funkhouser,

“Deep Depth Completion of a Single RGB-D Image,”

CVPR 2018 (spotlight on Tuesday)

Page 15:

Deep Depth Completion

Goal: estimate depths missing from an RGB-D image

Color (RGB)

Raw Depth (D)

Output Depth (D)

Page 16:

Deep Depth Completion

Goal: estimate depths missing from an RGB-D image

Color (RGB)

Raw Depth (D) from Intel R200 camera

Missing Depth

• Shiny surfaces

• Bright illumination

• Distant surfaces

• Thin structures

• Black surfaces

Page 17:

Deep Depth Completion

Motivation: help downstream applications “understand” the 3D environment

Raw Depth Output Depth

RGB-D images shown as colored 3D point clouds
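
Point-cloud views like these can be produced by back-projecting each depth pixel through a pinhole camera model. The following is a minimal sketch, not the speaker's code: the function name `depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a missing depth is stored as 0 are all assumptions made for illustration.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (list of rows, metres) to 3D camera-space points.

    Assumption: pixels with depth <= 0 are holes (missing depth) and are skipped,
    which is why raw sensor depth yields a point cloud with gaps.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no measurement at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with two holes (hypothetical values).
raw_depth = [[0.0, 2.0],
             [1.0, 0.0]]
cloud = depth_to_points(raw_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

A completed depth image passed through the same function would yield one point per pixel, which is why the completed cloud looks denser than the raw one.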

Page 18:

Deep Depth Completion

Previous work on depth estimation (from RGB):

• Deeper Depth Prediction [Laina, 2016]

• Harmonizing Overcomplete Predictions [Chakrabarti, 2016]

Previous work on depth completion (from RGB-D):

• Joint Bilateral Filter [Silberman, 2012]

• Sparsity Invariant CNNs [Uhrig, 2017]

Page 19:

Deep Depth Completion

Problem: estimating depth from color requires global scene understanding

Input Color → FCN → Output Depth

Page 20:

Deep Depth Completion

Approach: estimate local surface normals from color,

and then solve for depths globally with system of equations

Input Color → FCN → Surface Normals

Surface Normals + Input Depth → System of Equations → Output Depth

Page 21:

Deep Depth Completion

Rationale 1: estimating surface normals is easier than estimating depths

• Constant within planar regions

• Determined by local shading (for diffuse surfaces)

• Often associated with specific textures

Color Estimated Surface Normals

Y. Zhang, S. Song, E. Yumer, M. Savva, J.-Y. Lee, H. Jin, T. Funkhouser, “Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks,” CVPR 2017

Page 22:

Deep Depth Completion

Rationale 2: depths can be estimated robustly from normals

• Solution is unique for each continuously connected component (up to scale)

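Diagram: pixel p with neighboring pixels q and r and surface normal N(p); v(p,q) is the 3D vector from the surface point at p to the surface point at q.

Non-linear system of equations:

N(p) = (v(p,q) × v(p,r)) / ||v(p,q) × v(p,r)||

Linear approximation:

N(p) · v(p,q) = 0

N(p) · v(p,r) = 0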
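
To make the linear approximation concrete, here is a minimal sketch on a single scanline (a 2D slice with coordinates x and z), not the implementation from the paper. It assumes each 3D point is its depth times a pixel ray, so the tangent constraint N(p) · v(p,q) = 0 becomes linear in the unknown depths, and it adds a few observed depths as data terms to fix the scale. The function names, the camera parameters `focal` and `cx`, the `data_weight` parameter, and the synthetic tilted-wall scene are all hypothetical; the smoothness term mentioned in the talk is omitted.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def complete_depth(normals, observed, focal, cx, data_weight=1.0):
    """Least-squares depths for one scanline from per-pixel normals and sparse depths."""
    n = len(normals)
    rays = [((u - cx) / focal, 1.0) for u in range(n)]  # point = depth * ray
    rows, rhs = [], []
    # Data terms: D[u] should match the observed depth where one exists.
    for u, d in observed.items():
        row = [0.0] * n
        row[u] = data_weight
        rows.append(row)
        rhs.append(data_weight * d)
    # Normal terms: N(u) . (D[u+1] * ray[u+1] - D[u] * ray[u]) = 0.
    for u in range(n - 1):
        nx, nz = normals[u]
        row = [0.0] * n
        row[u + 1] = nx * rays[u + 1][0] + nz * rays[u + 1][1]
        row[u] = -(nx * rays[u][0] + nz * rays[u][1])
        rows.append(row)
        rhs.append(0.0)
    # Normal equations (A^T A) x = A^T b of the stacked linear system.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(n)]
    return solve_linear(AtA, Atb)


# Synthetic scene: one tilted wall satisfying normal . P = offset, seen by 8 pixels.
plane_normal, offset = (0.6, -0.8), -2.0
focal, cx, width = 4.0, 3.5, 8
truth = [offset / (plane_normal[0] * (u - cx) / focal + plane_normal[1])
         for u in range(width)]
normals = [plane_normal] * width
observed = {0: truth[0], 5: truth[5]}  # only two pixels have sensor depth
completed = complete_depth(normals, observed, focal, cx)
```

Without the two observed depths the normal constraints alone would leave the overall scale undetermined, which matches the "up to scale" remark on the slide.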

Page 23:

Deep Depth Completion

Rationale 2: depths can be estimated robustly from normals

• Solution is unique for each continuously connected component (up to scale)

Diagram: pixel p with neighboring pixels q and r and surface normal N(p)

Page 24:

Deep Depth Completion

Rationale 2: depths can be estimated robustly from normals

• Real-world scenes generally have few (one) continuously connected components

Page 25:

Deep Depth Completion

Rationale 2: depths can be estimated robustly from normals

• We use observed depths and smoothness constraints to guarantee a solution

Diagram: pixel p with neighboring pixels q and r and surface normal N(p)

Page 26:

Deep Depth Completion

Rationale 2: depths can be estimated robustly from normals

• Solving the linearized equations guarantees a globally optimal solution

Input Color → FCN → Surface Normals

Surface Normals + Input Depth → Linear System of Equations → Output Depth

Page 27:

Deep Depth Completion: Data

Where to get real training/test data?

Color Raw Depth

Missing Depth

Page 28:

Deep Depth Completion: Data

Where to get real training/test data?

• Complete depths by rendering RGB-D SLAM surface reconstructions (ScanNet, Matterport3D)

ScanNet Surface Reconstruction

Color Raw Depth

A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Niessner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017

Page 29:

Deep Depth Completion: Data

Where to get real training/test data?

• Complete depths by rendering RGB-D SLAM surface reconstructions (ScanNet, Matterport3D)

Color Raw Depth

ScanNet Surface Reconstruction

A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Niessner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017

Page 30:

Deep Depth Completion: Data

Where to get real training/test data?

• Complete depths by rendering RGB-D SLAM surface reconstructions (ScanNet, Matterport3D)

Rendered Depth

Color Raw Depth

ScanNet Surface Reconstruction

A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, M. Niessner, “ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017

Page 31:

Deep Depth Completion: Results

Comparisons to other depth completion methods:

[5] J. T. Barron and B. Poole. The fast bilateral solver. ECCV 2016.

[6] D. Garcia. Robust smoothing of gridded data in one and higher dimensions with missing values. Comp. stat. & data anal., 2010.

[13] Y. Zhang et al. Physically-based rendering for indoor scene understanding using convolutional neural networks. CVPR 2017.

[20] D. Ferstl et al. Image guided depth upsampling using anisotropic total generalized variation. ICCV 2013.

[64] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. ECCV 2012.

Page 32:

Deep Depth Estimation: Results

Comparison to other depth estimation methods:

Laina [37]

Chakr. [7]

Laina [37]

Chakr. [7]

[7] Chakrabarti, A. et al., Depth from a single image by harmonizing overcomplete local network predictions. NIPS 2016.

[37] Laina, C. et al., Deeper depth prediction with fully convolutional residual networks. 3DV 2016.

Page 33:

Color Image Sensor Depth Completed Depth

Sensor Point Cloud Completed Point Cloud

Deep Depth Completion: Results

Intel RealSense R200 examples:

Page 34:

Color Image Sensor Depth Completed Depth

Sensor Point Cloud Completed Point Cloud

Deep Depth Completion: Results

Intel RealSense R200 examples:

Page 35:

Talk Outline (Part 2)

Introduction

Three recent projects

• Deep depth completion [CVPR 2018]

• Semantic scene completion [CVPR 2017]

• Semantic view extrapolation [CVPR 2018]

Common themes

Future work

Shuran Song, Fisher Yu, Andy Zeng, Angel Chang, Manolis Savva, and Thomas Funkhouser,

“Semantic Scene Completion from a Single Depth Image,”

CVPR 2017 (oral)

Page 36:

Input: Single view depth map Output: Semantic scene completion

Semantic Scene Completion

Goal: estimate the semantics and geometry occluded from a depth camera

RGB-D Image

Page 37:

3D Scene

visible surface

free space

occluded space

outside view

outside room

Semantic Scene Completion

Formulation: given a depth image, label all voxels by semantic class

Page 38:

visible surface

free space

occluded space

outside view

outside room

3D Scene

Semantic Scene Completion

Formulation: given a depth image, label all voxels by semantic class

Page 39:

Semantic Scene Completion

Prior work: segmentation OR completion

• Surface segmentation: Silberman et al.

• Scene completion: Firman et al.

• Semantic scene completion: this paper

The occupancy and the object identity are tightly intertwined!

3D Scene

Page 40:

Semantic Scene Completion

Approach: end-to-end 3D deep network

Prediction: N+1 classes

Simultaneously predict voxel occupancy and semantic classes in a single forward pass.

Input:

Single view depth map

Output:

Volumetric occupancy + semantics

SSCNet

Page 41:

Semantic Scene Completion: Network Architecture

Page 42:

Semantic Scene Completion: Network Architecture

Page 43:

Voxel size: 0.02 m

Semantic Scene Completion: Network Architecture

Page 44:

Voxel size: 0.02 m

Semantic Scene Completion: Network Architecture

View

Standard TSDF

Page 45:

Semantic Scene Completion: Network Architecture

Encode 3D space using flipped TSDF

Voxel size: 0.02 m

View

Standard TSDF

Flipped TSDF
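
The slide does not spell out how the flipped TSDF is computed. One common formulation, given here as an assumption rather than as the talk's exact definition, flips the truncated magnitude so that values are largest near the surface and fall to zero at the truncation distance, while keeping the sign. The function name `flipped_tsdf` and the truncation distance `d_max` are hypothetical.

```python
def flipped_tsdf(distance, d_max):
    """Map a signed distance to a flipped, truncated value.

    Standard TSDF: magnitude grows away from the surface (up to d_max).
    Flipped TSDF (as assumed here): magnitude is d_max at the surface and
    shrinks to 0 at the truncation distance; the sign is preserved.
    """
    clamped = max(-d_max, min(d_max, distance))  # truncate to [-d_max, d_max]
    magnitude = d_max - abs(clamped)
    return magnitude if clamped >= 0 else -magnitude


near_surface = flipped_tsdf(0.02, d_max=0.24)   # one voxel in front of a surface
far_away = flipped_tsdf(1.0, d_max=0.24)        # deep in free space
behind = flipped_tsdf(-0.04, d_max=0.24)        # slightly behind the surface
```

Under this encoding the strongest signal sits on the observed surfaces, while large empty regions carry values near zero.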

Page 46:

Semantic Scene Completion: Network Architecture

Extract features for different physical scales

Voxel size: 0.02 m

Receptive fields: 0.98 m, 1.62 m, 2.26 m

Page 47:

Semantic Scene Completion: Network Architecture

Dilated Convolutions

Larger receptive field with same number of parameters and same output resolution!

Figure legend: learnable parameter, receptive field

Receptive Field = 7x7x7

Parameters = 27

F. Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions, ICLR 2016
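
The numbers on this slide follow from the usual formula for the extent of a dilated kernel, dilation × (k − 1) + 1. The sketch below checks one configuration consistent with the slide: a 3x3x3 kernel with dilation 3, which is an assumption, since the slide does not state the dilation rate. The helper names are hypothetical.

```python
def effective_extent(kernel_size, dilation):
    """Spatial extent (per axis) covered by a dilated convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def kernel_parameters(kernel_size, dims=3):
    """Number of learnable weights in one dense kernel (single channel, no bias)."""
    return kernel_size ** dims


extent = effective_extent(kernel_size=3, dilation=3)  # per-axis receptive field
params = kernel_parameters(3)                         # weights in the dilated kernel
dense_params = kernel_parameters(extent)              # weights a dense kernel would need
```

A dense kernel covering the same 7x7x7 extent would need 343 weights, which is the saving the slide points to.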

Page 48:

Semantic Scene Completion: Data

Where to get training data?

NYUv2

Small number of objects labeled with CAD models (suitable for testing, not training)

N. Silberman, P. Kohli, D. Hoiem, R. Fergus, Indoor Segmentation and Support Inference from RGBD Images, ECCV 2012

R. Guo, C. Zou, D. Hoiem, Predicting Complete 3D models of Indoor Scenes, arXiv 2015

Page 49:

Semantic Scene Completion: Data

SUNCG dataset

• 46K houses

• 50K floors

• 400K rooms

• 5.6M object instances

Page 50:

Semantic Scene Completion: Data

SUNCG dataset

Synthetic camera views

Depth

Ground truth semantic scene completion

Page 51:

Semantic Scene Completion: Experiments

Pre-train on SUNCG

Fine-tune and test on NYUv2

Page 52:

Semantic Scene Completion: Results

Ground Truth

Our Result

Input Color

Input Depth

Page 53:

Semantic Scene Completion: Results

Ground Truth

Our Result

Input Color

Input Depth

Page 54:

Semantic Scene Completion: Results

Result 1: better than previous volumetric completion algorithms

Comparison to previous algorithms for volumetric completion

Page 55:

Semantic Scene Completion: Results

Result 2: better than previous semantic labeling algorithms

Comparison to previous algorithms for semantic labeling with 3D model fitting

Page 56:

Talk Outline (Part 3)

Introduction

Three recent projects

• Deep depth completion [CVPR 2018]

• Semantic scene completion [CVPR 2017]

• Semantic view extrapolation [CVPR 2018]

Common themes

Future work

Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, and Thomas Funkhouser,

“Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View,”

CVPR 2018 (oral)

Page 57:

Input: RGB-D Image

Semantic View Extrapolation

Goal: given an RGB-D image, predict 3D structure and semantics outside view

Output 1: 3D structure (360°)

Output 2: semantic segmentation (360°)

Segmentation labels: bed, nightstand, door, chair, ceiling, floor

Page 58:

Semantic View Extrapolation

Input:

RGB-D Image

Page 59:

Wall

Window

Bed

Nightstand

Semantic View Extrapolation

Input:

RGB-D Image

Output:

360° panorama

with 3D structure

& semantics

360°

Page 60:

Semantic View Extrapolation

Prior work: extrapolating appearance (color) outside field of view

Pathak et al. CVPR 2017

Page 61:

Semantic View Extrapolation

Our work: predicting 3D structure and semantics for full 360° panorama

3D structure (360°)

Semantic segmentation (360°)

Segmentation labels: bed, nightstand, door, chair, ceiling, floor

Page 62:

Semantic View Extrapolation

3D structure representation: plane equation per pixel (normal and offset)

Plane equation: ax + by + cz - d = 0

(a,b,c) = normal

d = plane offset from origin

Similar to first project
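
A per-pixel plane equation determines the 3D point seen at that pixel: intersect the pixel's viewing ray with the plane. The following is a small sketch under the slide's convention ax + by + cz - d = 0, with the camera at the origin; the function name `point_from_plane` and the example ray are hypothetical, and this is not claimed to be the network's decoding step.

```python
def point_from_plane(normal, offset, ray, eps=1e-9):
    """Intersect a viewing ray through the origin with the plane n . P = d.

    normal: (a, b, c), offset: d, ray: direction of the pixel's viewing ray.
    Returns the 3D point t * ray, or None if the ray is parallel to the plane.
    """
    a, b, c = normal
    denom = a * ray[0] + b * ray[1] + c * ray[2]
    if abs(denom) < eps:
        return None  # ray never meets the plane
    t = offset / denom
    return (t * ray[0], t * ray[1], t * ray[2])


# A fronto-parallel wall 2 m away, seen along a ray tilted to the right.
hit = point_from_plane(normal=(0.0, 0.0, 1.0), offset=2.0, ray=(0.5, 0.0, 1.0))
miss = point_from_plane(normal=(0.0, 0.0, 1.0), offset=2.0, ray=(1.0, 0.0, 0.0))
```

Because neighboring pixels on the same flat surface share one (normal, offset) pair, this representation is constant across planar regions, much like the surface normals in the first project.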

Page 63:

Semantic View Extrapolation: Network Architecture

Scene attribute losses:

Scene category

Object distribution

Pixel-wise loss

Adversarial loss

Page 64:

Semantic View Extrapolation: Training Objectives

Page 65:

Every pixel is correct

Prediction

Ground truth

• Loses the ability to generalize.

• Hard even for humans to do.

Semantic View Extrapolation: Training Objectives

Page 66:

Semantic View Extrapolation: Training Objectives

Every pixel is correct

Prediction is plausible

Adversarial loss: real or fake (Goodfellow et al. 2014)

Prediction

G: generator, D: discriminator

Page 67:

Semantic View Extrapolation: Training Objectives

Every pixel is correct

Prediction is plausible

Similar scene attributes

• Scene Category

• Object Distribution

Figure: object distribution histograms for prediction and ground truth (classes: wall, floor, ceiling, chair, …)
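
One plausible way to compare object distributions, shown here only as an illustration, is to turn each label map into a normalized per-class histogram and measure the distance between the two histograms. The class list, the L1 distance, and the function names below are assumptions; the slides do not say how the scene attribute loss is actually computed.

```python
from collections import Counter


def object_distribution(label_map, classes):
    """Fraction of pixels assigned to each class, in the order given by `classes`."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts[c] for c in classes)
    return [counts[c] / total if total else 0.0 for c in classes]


def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


classes = ["wall", "floor", "ceiling", "chair"]
prediction = [["wall", "wall"], ["floor", "chair"]]
ground_truth = [["wall", "floor"], ["floor", "chair"]]

pred_dist = object_distribution(prediction, classes)
true_dist = object_distribution(ground_truth, classes)
mismatch = l1_distance(pred_dist, true_dist)
```

A measure like this ignores where objects are placed and only asks whether the predicted scene contains a similar mix of content, which is less strict than requiring every pixel to be correct.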

Page 68:

Semantic View Extrapolation: Training Objectives

Every pixel is correct

Prediction is plausible

Similar scene attributes

• Scene Category

• Object Distribution

Prediction Ground truth

Page 69:

Semantic View Extrapolation: Training Objectives

Every pixel is correct

Prediction is plausible

Similar scene attributes

Page 70:

Semantic View Extrapolation: Network Architecture

Scene attribute losses:

Scene category

Object distribution

Pixel-wise loss

Adversarial loss

Page 71:

Semantic View Extrapolation: Data

Where to get training/test data?

3D structure

Semantic segmentation

Segmentation labels: bed, nightstand, door, chair, ceiling, floor

Page 72:

Semantic View Extrapolation: Data

Matterport3D dataset

Matterport Camera

3D Building Reconstruction

A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, Y. Zhang, “Matterport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017

Page 73:

Semantic View Extrapolation: Data

Matterport3D dataset

Matterport Camera

3D Building Reconstruction

A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, Y. Zhang, “Matterport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017

Page 74:

Semantic View Extrapolation: Data

Matterport3D dataset

Matterport Camera

RGB-D Panorama with Semantics

3D Building Reconstruction

A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, Y. Zhang, “Matterport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017

Page 75:

Semantic View Extrapolation: Experiments

Pre-train on SUNCG

58,866 synthetic panoramas

Fine-tune and test on Matterport3D

5,315 real panoramas

Page 76:

Semantic View Extrapolation: Results

Input Observation

Page 77:

Semantic View Extrapolation: Results

Ceiling

Bed

Wall

Floor

Prediction

Page 78:

Semantic View Extrapolation: Results

Prediction

Bed

Object

Window

Ground truth

Page 79:

Prediction

Bed

Object

Window

Ground truth

Semantic View Extrapolation: Results

Page 80:

Prediction

Bed

Object

Window

Ground truth

Semantic View Extrapolation: Results

Page 81:

Prediction

Bed

Object

Window

Ground truth

Semantic View Extrapolation: Results

Page 82:

Prediction

Bed

Object

Window

Ground truth

Semantic View Extrapolation: Results

Page 83:

Semantic View Extrapolation: Results

Comparison to alternative completion methods

[Charts: Semantic Accuracy (IoU) and 3D Structure Error (L2) for Nearest, Two-Step, and Ours]

[Figure: Input, Image Inpainting, Two-Step Approach, Ours]
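The two chart axes above name standard metrics: intersection-over-union (IoU) for semantic labels and an L2 error for 3D structure. The slide does not say exactly how either was computed, so the following is only a minimal sketch under assumptions: `mean_iou` averages per-class IoU over flat label lists, and `l2_error` is a root-mean-square difference over per-pixel values. Both function names are hypothetical.

```python
from math import sqrt


def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes that occur in either labeling (assumed definition)."""
    ious = []
    for c in range(num_classes):
        # Intersection: positions where both labelings say class c.
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        # Union: positions where at least one labeling says class c.
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def l2_error(pred, truth):
    """Root-mean-square difference between predicted and true values (assumed definition)."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Toy labels for six pixels and three classes.
pred_labels = [0, 0, 1, 1, 2, 2]
true_labels = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred_labels, true_labels, num_classes=3))
print(l2_error([0.0, 0.0], [3.0, 3.0]))
```

Higher IoU is better and lower L2 error is better, which is how the two charts should be read.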

Summary

Scene understanding from partial observation …

Input: RGB-D image

Output: complete, annotated 3D representation (structure and semantics)

[Figure labels: Bed, Door, Nightstand, Nightstand, Bench, Wall, Wall, Picture, Pillow, Free space]

Talk Outline

Introduction

Three recent projects

• Deep depth completion [CVPR 2018]

• Semantic scene completion [CVPR 2017]

• Semantic view extrapolation [CVPR 2018]

Common themes

Future work

Common Themes

Geometric representation

• Choice of 3D representation is critical

• Choosing the most obvious representation is usually not best

Large-scale context

• Global context is very important … even for simply estimating depth

• Can leverage larger contexts with global minimization, dilated convolutions, etc.

3D Dataset curation

• Synthetic 3D datasets very useful for training

• Real 3D datasets are important for testing. More are needed

Common Themes: Geometric representation

[Figure panels: Surface Normals, Plane Equations, Flipped TSDF]
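As one illustration of why the obvious representation is not always the best choice, here is a minimal sketch of a flipped TSDF next to an ordinary truncated signed distance. The slide gives no formula, so the definition below is an assumption: the value has magnitude 1 at the surface and falls linearly to 0 at the truncation distance `d_max`, keeping the sign of the distance. The function names are hypothetical.

```python
def tsdf(d, d_max):
    """Ordinary truncated signed distance, scaled to [-1, 1]: zero at the surface."""
    d = max(-d_max, min(d_max, d))
    return d / d_max


def flipped_tsdf(d, d_max):
    """Flipped TSDF (assumed form): magnitude 1 at the surface, 0 at d_max and beyond."""
    d = max(-d_max, min(d_max, d))  # truncate to [-d_max, d_max]
    sign = 1.0 if d >= 0 else -1.0
    return sign * (d_max - abs(d)) / d_max


# Compare the two encodings at a few signed distances.
for d in (0.0, 0.25, 1.0, 5.0, -0.25):
    print(d, tsdf(d, 1.0), flipped_tsdf(d, 1.0))
```

Under this assumed form, the largest values sit next to surfaces rather than in empty space far from them, so the signal is concentrated where the geometry is.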

Common Themes: Large-scale context

[Figure panels: Dilated Convolutions, Global Solution to Linear System of Equations, Panoramic Representations]
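To make the dilated-convolution point concrete, the sketch below shows a 1D dilated convolution and the receptive field of a stack of such layers. It is an illustration only, not the networks from the talk; the function names, kernel size, and dilation rates are assumptions chosen for the example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D correlation with kernel taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    out = []
    for i in range(len(signal) - span + 1):
        out.append(sum(k * signal[i + j * dilation] for j, k in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 dilated layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


signal = [1, 2, 3, 4, 5]
print(dilated_conv1d(signal, [1, 1, 1], dilation=1))  # taps at adjacent samples
print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # taps at every other sample
print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
```

With the same three layers and the same number of weights, increasing the dilation rates widens the context each output sees, which is the sense in which dilation helps leverage larger contexts.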

Common Themes: 3D Dataset curation

Largest 3D datasets available today for indoor environments

Scale       Synthetic   RGB-D Image       RGB-D Video
Object      ShapeNet    Intel RealSense   Redwood
Room        SUNCG       SUN RGB-D         ScanNet
Multiroom   SUNCG       Matterport3D      SUN3D

Future work

Large-scale scenes

Self-supervision

Active sensing

Acknowledgments

Princeton students and postdocs:

• Angel X. Chang, Kyle Genova, Maciej Halber, Manolis Savva, Elena Sizikova, Shuran Song, Fisher Yu, Yinda Zhang, Andy Zeng

Google collaborators:

• Martin Bokeloh, Alireza Fathi, Sean Fanello, Aleksey Golovinskiy, Shahram Izadi, Sameh Khamis, Adarsh Kowdle, Johnny Lee, Christoph Rhemann, Jurgen Sturm, Vladimir Tankovich, Julien Valentin, Stefan Welker

Other collaborators:

• Angela Dai, Vladlen Koltun, Matthias Niessner, Alberto Rodriguez, Silvio Savarese, Yifei Shi, Jianxiong Xiao, Kai Xu

Data:

• SUN3D, NYU, Trimble, Planner5D, Matterport

Funding:

• NSF, Google, Intel, Facebook, Amazon, Adobe, Pixar

Thank You!