
Learning to generate 3D shapes
Subhransu Maji
College of Information and Computer Sciences
University of Massachusetts, Amherst
http://people.cs.umass.edu/smaji

August 10, 2018 @ Caltech

Image from Autodesk 3D Maya

Creating 3D shapes is not easy


Inferring 3D shapes from images

What shapes are puffins?

What shapes are pumpkinseed fish?

• Many techniques for recognizing 3D data, but relatively few techniques for generating them

• Representations for generation?
  • Voxels
  • Multiview
  • Geometry images
  • Shape basis
  • Set-based (points, triangles, etc.)
  • Procedural, e.g., constructive solid geometry

Creating 3D shapes is not easy

Talk overview

• Generative models for 3D shapes and applications
  • Multiview [3DV'17]
  • Multiresolution tree networks [ECCV'18]
  • Constructive solid geometry [CVPR'18]

• Learning 3D shapes with weak supervision [3DV'17]

3D Shape Reconstruction from Sketches

via Multi-view Convolutional Networks

Zhaoliang Lun, Matheus Gadelha

Evangelos Kalogerakis, Subhransu Maji

Rui Wang

3DV 2017

Part 1


Why line drawings? A simple & intuitive medium to convey shape!

Image from Suggestive Contour Gallery, DeCarlo et al. 2003

Goal: 2D line drawings in, 3D shapes out!

ShapeMVD: front view + side view → 3D shape

Deep net architecture: U-net structure

Feature representations in the decoder depend on the previous layer & the encoder's corresponding layer

U-net: Ronneberger et al. 2015, Isola et al. 2016

front view

side view

output view 1

output view 12

Penalize per-pixel depth reconstruction error, per-pixel normal reconstruction error, and "unreal" outputs:

Training: full loss

front view

side view

Generator Network

output view 1

output view 12

… front view

side view

output view 1

output view 12

Adversarial term: −log P(real)

Discriminator Network

Real? Fake?

Real? Fake?

Discriminator Network

cGAN: Isola et al. 2016

Depth loss: Σ_pixels |d_pred − d_gt|

Normal loss: Σ_pixels (1 − n_pred · n_gt)
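The depth and normal terms above can be sketched in NumPy. This is a hypothetical illustration: the function name and the sum-over-pixels reduction are assumptions, and the adversarial ("unreal") term from the discriminator is omitted.

```python
import numpy as np

def reconstruction_losses(d_pred, d_gt, n_pred, n_gt):
    """Depth (L1) and normal (cosine) reconstruction losses, summed over pixels.

    d_*: (H, W) depth maps; n_*: (H, W, 3) unit normal maps.
    """
    depth_loss = np.abs(d_pred - d_gt).sum()   # sum over pixels of |d_pred - d_gt|
    cos = (n_pred * n_gt).sum(axis=-1)         # per-pixel dot product n_pred . n_gt
    normal_loss = (1.0 - cos).sum()            # sum over pixels of (1 - n_pred . n_gt)
    return depth_loss, normal_loss

# Toy check: identical depth and normal maps give zero loss.
d = np.ones((4, 4))
n = np.zeros((4, 4, 3))
n[..., 2] = 1.0                                # all normals point along +z
dl, nl = reconstruction_losses(d, d, n, n)
```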

Training data

Character: 10K models

Chair: 10K models

Airplane: 3K models

Models from "The Models Resource" & 3D Warehouse

Synthetic line drawings

Training depth and normal maps

12 views

Training data

Predict multi-view depth and normal maps!

output view 1

output view 12

front view

side view

Test time

• Depth derivatives should be consistent with normals

• Corresponding depths and normals across different views should agree

Optimization problem

Multi-view depth & normal maps

Consolidated point cloud

output view 1

output view 12
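The depth-normal consistency constraint can be illustrated by deriving normals from depth via finite differences. This is only a sketch: `normals_from_depth` and the axis conventions are assumptions, not the paper's actual multi-view optimization.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel normals from depth derivatives (finite differences).

    The normal implied by the depth gradient, n ~ (-dz/dx, -dz/dy, 1),
    should agree with the predicted normal map.
    """
    dz_dy, dz_dx = np.gradient(depth)          # derivatives along rows, columns
    n = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A planar depth map d = 0.5 * x yields a constant tilted normal.
x = np.arange(8, dtype=float)
depth = np.tile(0.5 * x, (8, 1))
n = normals_from_depth(depth)
```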

Multi-view depth & normal map fusion

Multi-view depth & normal maps

Consolidated point cloud

Surface reconstruction [Kazhdan et al. 2013]

output view 1

output view 12

Surface reconstruction

Multi-view depth & normal maps

Consolidated point cloud

Surface "fine-tuning"

[Nealen et al. 2005]

output view 1

output view 12

Surface reconstruction [Kazhdan et al. 2013]

Surface deformation

Multi-view depth & normal maps

Consolidated point cloud

Surface "fine-tuning" [Nealen et al. 2005]

output view 1

output view 12

Surface reconstruction [Kazhdan et al. 2013]

Surface deformation

Experiments

Qualitative Results

[Figure: for each input sketch, the reference shape, nearest retrieval, our result, and the volumetric net result are shown side by side]

Quantitative Results

Character (human drawing)

Metric               Our method   Volumetric   NN
Hausdorff distance        0.120        0.638   0.242
Chamfer distance          0.023        0.052   0.045
Normal distance           34.27        56.97   47.94
Volumetric distance       0.309        0.497   0.550

Man-made (human drawing)

Metric               Our method   Volumetric   NN
Hausdorff distance        0.171        0.211   0.228
Chamfer distance          0.028        0.032   0.038
Normal distance           34.19        48.81   43.75
Volumetric distance       0.439        0.530   0.560

Single vs two input line drawings

Single sketch

Two sketches

Resulting shape

Resulting shape

More results

Multiresolution Tree Networks for 3D

Point Cloud Processing

Matheus Gadelha, Subhransu Maji

Rui Wang

ECCV 2018

Part 2


Point-cloud decoders

• Global shape basis or fully-connected decoders

Morphable models [Figure from Booth et al., 2016]

Requires perfect correspondence

How important is correspondence?

Gadelha et al., BMVC 17

Related work: Fan et al., CVPR 2017

How important is correspondence?

Multiresolution tree networks

• Addresses the lack of
  • convolutional structure, and
  • coarse-to-fine reasoning

• Basic idea: linearize 3D points and use 1D convolutions

Points colored with kd-tree sort index

Multiresolution tree networks

• Addresses the lack of
  • convolutional structure, and
  • coarse-to-fine reasoning

• Basic idea: linearize 3D points and use 1D convolutions

Implicit multiresolution structure
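The linearization step can be sketched as a recursive kd-tree sort that produces a 1D ordering where nearby indices are spatially close. This is a hypothetical illustration; MRTNet's actual tree construction may differ in details such as axis choice and tie handling.

```python
import numpy as np

def kdtree_sort(points, axis=0):
    """Recursively split points at the median along alternating axes and
    concatenate the sorted halves, yielding a kd-tree (depth-first) ordering."""
    if len(points) <= 1:
        return points
    order = np.argsort(points[:, axis], kind="stable")
    points = points[order]
    mid = len(points) // 2
    nxt = (axis + 1) % points.shape[1]       # cycle through x, y, z
    return np.concatenate([kdtree_sort(points[:mid], nxt),
                           kdtree_sort(points[mid:], nxt)])

pts = np.random.default_rng(0).random((1024, 3))
ordered = kdtree_sort(pts)                   # (1024, 3), ready for 1D convolutions
```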

Multiresolution tree networks

Architecture for encoding and decoding

Multiresolution convolution block

Does multiresolution analysis help?

single-resolution, fully-connected, multi-resolution

Color indicates the position in the list

Other shape tasks with MRTNet

Single image shape reconstruction

Quantitative evaluation: ShapeNet dataset

voxel-based fully-conn. multiview

Chamfer distance: pred → GT / GT → pred

Lin et al. AAAI 2018

Quantitative evaluation: ShapeNet dataset

voxel-based fully-conn. multiview

Chamfer distance: pred → GT / GT → pred
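The two directed Chamfer terms used in the evaluation can be sketched as follows. This is a generic brute-force version; the paper's exact normalization and scaling may differ.

```python
import numpy as np

def chamfer(pred, gt):
    """Two one-sided Chamfer distances between point sets (N, 3) and (M, 3).

    pred -> GT averages each predicted point's distance to its nearest GT
    point; GT -> pred does the reverse.
    """
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)  # (N, M) squared dists
    pred_to_gt = float(np.sqrt(d2.min(axis=1)).mean())
    gt_to_pred = float(np.sqrt(d2.min(axis=0)).mean())
    return pred_to_gt, gt_to_pred

a = np.zeros((5, 3))
b = np.zeros((7, 3))
print(chamfer(a, b))  # (0.0, 0.0) for coincident (degenerate) sets
```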

MRTNet summary

• A generic architecture for
  • Point cloud classification (91.7% on ModelNet40)
  • Semantic segmentation (see results in the paper)
  • Generation

• Project page: http://mgadelha.me/mrt/index.html

CSGNet: Neural Shape Parser for

Constructive Solid Geometry

Gopal Sharma, Rishabh Goyal

Difan Liu, Evangelos Kalogerakis

Subhransu Maji

interpretable and editable

Part 3

CVPR 18

Constructive 2D geometry

Constructive solid geometry

Constructive solid geometry

Circle1 Triangle1 - Circle2 - Triangle2 - Triangle3 - [STOP]

Postfix notation

CNN RNN Program
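Postfix programs like the one above can be executed with a simple stack machine. The sketch below uses disk primitives and boolean occupancy grids purely for illustration; `disk`, `execute`, and the operator set are assumptions, not CSGNet's actual grammar or executor.

```python
import numpy as np

def disk(cx, cy, r, size=64):
    """A filled disk rendered as a boolean occupancy grid (illustrative primitive)."""
    y, x = np.mgrid[:size, :size]
    return (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2

OPS = {"+": np.logical_or,                                   # union
       "*": np.logical_and,                                  # intersection
       "-": lambda a, b: np.logical_and(a, np.logical_not(b))}  # subtraction

def execute(program):
    """Evaluate a postfix token list: shape grids push, operators pop two."""
    stack = []
    for tok in program:
        if isinstance(tok, np.ndarray):
            stack.append(tok)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[tok](a, b))
    return stack.pop()

# "Circle1 Circle2 -": a big disk with a smaller concentric disk subtracted.
img = execute([disk(32, 32, 20), disk(32, 32, 10), "-"])
```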

Learning

• Supervised setting: learn to predict programs directly

• Unsupervised setting: no ground-truth programs
  • Learn parameters to minimize a reconstruction error through policy gradients [REINFORCE, Williams 1992]

CNN RNN Program

reconstruction loss
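The policy-gradient idea can be illustrated with a toy REINFORCE loop over a tiny categorical "parser". The three candidate programs and their rewards are entirely hypothetical; in CSGNet the reward comes from the reconstruction error of the executed program.

```python
import numpy as np

rng = np.random.default_rng(0)

logits = np.zeros(3)                        # policy parameters over 3 candidate programs
rewards = np.array([0.1, 0.9, 0.2])         # hypothetical: program 1 reconstructs best

for _ in range(2000):
    p = np.exp(logits - logits.max())
    p /= p.sum()                            # softmax policy
    a = rng.choice(3, p=p)                  # sample a program
    grad_logp = -p
    grad_logp[a] += 1.0                     # d log p(a) / d logits = e_a - p
    logits += 0.1 * rewards[a] * grad_logp  # REINFORCE: ascend R * grad log p
```

After training, the policy concentrates on the highest-reward program; in practice a baseline is usually subtracted from the reward to reduce variance.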

CSGNet: 2D/3D → programs

Train on synthetic data and adapt to new domains using policy gradients

Logos

Synthetic CAD shapes

CSGNet: 2D/3D → programs

How well does the nearest neighbor perform? Chamfer distance 1.88 (NN), 1.36 (CSGNet) with 675K training examples


CSGNet: 2D/3D → programs

CAD shapes dataset: Chamfer distance 1.94 (NN), 0.51 (CSGNet)

CSGNet: 2D/3D → programs

Input

• More results in the paper: reward shaping, comparison to Faster R-CNN for primitive detection, results on 3D, etc.

• Preprint available: https://arxiv.org/abs/1712.08290

Learning 3D Shape Representations

with Weak Supervision

Matheus Gadelha, Subhransu Maji

Rui Wang

3DV 2017

Part 4

Related Work

• 3D shape from a collection of images
  • Visual hull: same instance, known viewpoints
  • Photometric stereo: same instance, known lighting, simple reflectance
  • Structure from motion: same instance (or 3D)
  • Non-rigid structure from motion: known shape family (e.g., faces)
  • Our work: unknown shape family, unknown viewpoints

• 3D shape from a single image
  • Optimization-based approaches
  • Recognition-based approaches

A motivating example

Small cubes are reddish; big cubes are bluish

Hypothesis: It is easier to generate these images by reasoning in 3D

Approach

• Our goal is to learn a 3D shape generator whose projections match the provided set of views

Generator → Projection

How do we match distributions? Minimize the KL divergence between the generated distribution G and the true data distribution D:

min_G KL(G || D) = min_G E_{z ~ G} [ log G(z) / D(z) ]   (G: generated, D: true)

The density ratio G(z)/D(z) can be estimated using logistic regression, which leads to the adversarial objective:

min_G max_d E_{x ~ D} [log d(x)] + E_{z ~ G} [log (1 − d(z))]

Generative adversarial networks [Goodfellow et al.]

PrGAN

Projection using line integration along the view direction

I(x) = 1 − exp( −∫_0^1 V(x + r) dr )

Generator maps z to a voxel occupancy grid and a viewpoint
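For an axis-aligned view, the projection operator can be sketched in a few lines. This is only a sketch: PrGAN rotates the voxel grid to the predicted viewpoint before integrating, which is omitted here, and `project` is a hypothetical name.

```python
import numpy as np

def project(voxels, dr):
    """Integrate occupancy V along the view direction (last grid axis) and
    map the line integral to image intensity: I = 1 - exp(-integral)."""
    integral = voxels.sum(axis=-1) * dr
    return 1.0 - np.exp(-integral)

# An empty grid projects to a black image; an occupied column maps to 1 - e^{-1}.
v = np.zeros((32, 32, 32))
v[16, 16, :] = 1.0
img = project(v, dr=1.0 / 32)
```

Because the exponential saturates smoothly, this projection is differentiable in the occupancies, which is what lets gradients flow from the 2D discriminator back to the 3D generator.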

Dataset generation

Airplanes

input generated

Airplanes

input generated

Vases

input generated

Vases

input generated

Mixed categories

input generated

MMD metric: 2D-GAN 90.1, PrGAN 88.3

MMD metric: 3D-GAN 347.5, PrGAN 442.9

Trained with 3D supervision

Projection GAN

• The model is able to recover the coarse 3D structure

• But should use side information when available
  • Viewpoint
  • Landmarks / part labels
  • Pose estimates

• Iterative: bootstrap 3D to estimate pose & viewpoint

Thank you!

• Collaborators: Matheus Gadelha, Zhaoliang Lun, Gopal Sharma, Rui Wang, Evangelos Kalogerakis

• Funding from NSF, NVIDIA, Facebook
• https://people.cs.umass.edu/smaji/projects.html