Learning to generate 3D shapes
Subhransu Maji
College of Information and Computer Sciences, University of Massachusetts, Amherst
http://people.cs.umass.edu/smaji
August 10, 2018 @ Caltech
Image from Autodesk 3D Maya
Creating 3D shapes is not easy
Inferring 3D shapes from images
What shapes are puffins?
What shapes are pumpkinseed fish?
• Many techniques for recognizing 3D data, but relatively few techniques for generating them
• Representations for generation?
  • Voxels
  • Multiview
  • Geometry images
  • Shape basis
  • Set-based (points, triangles, etc.)
  • Procedural, e.g., constructive solid geometry
Talk overview
• Generative models for 3D shapes and applications
  • Multiview [3DV'17]
  • Multiresolution tree networks [ECCV'18]
  • Constructive solid geometry [CVPR'18]
• Learning 3D shapes with weak supervision [3DV'17]
Part 1
3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks
Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, Rui Wang
3DV 2017
Why line drawings? A simple & intuitive medium to convey shape!
Image from Suggestive Contour Gallery, DeCarlo et al. 2003
Goal: 2D line drawings in, 3D shapes out!
ShapeMVD: front view + side view → 3D shape
Deep net architecture: U-net structure
Feature representations in the decoder depend on the previous layer & the encoder's corresponding layer
U-net: Ronneberger et al. 2015, Isola et al. 2016
Training: full loss

Front and side views pass through the Generator Network to produce output views 1–12; a Discriminator Network judges each output Real? Fake? (cGAN: Isola et al. 2016)

Penalize:
• per-pixel depth reconstruction error: Σ_pixels |d_pred − d_gt|
• per-pixel normal reconstruction error: Σ_pixels (1 − n_pred · n_gt)
• "unreal" outputs: −log P(real)
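The reconstruction terms of the full loss can be sketched with NumPy. This is a minimal illustration of the per-pixel penalties described in the talk, not the paper's actual training code; the array names and shapes are assumptions.

```python
import numpy as np

def depth_loss(d_pred, d_gt):
    # Per-pixel L1 depth reconstruction error: sum |d_pred - d_gt|
    return np.abs(d_pred - d_gt).sum()

def normal_loss(n_pred, n_gt):
    # Per-pixel normal error: sum (1 - n_pred . n_gt),
    # assuming unit-length normals stored as (H, W, 3) arrays
    return (1.0 - (n_pred * n_gt).sum(axis=-1)).sum()

def adversarial_loss(p_real):
    # "Unreal" outputs penalty: -log P(real) from the discriminator
    return -np.log(p_real)

# Toy example: identical predicted and ground-truth maps give zero
# reconstruction loss; only the adversarial term remains.
d = np.ones((4, 4))
n = np.zeros((4, 4, 3)); n[..., 2] = 1.0  # all normals pointing +z
total = depth_loss(d, d) + normal_loss(n, n) + adversarial_loss(0.9)
```

In the actual system these terms are summed over all 12 output views and balanced by weighting coefficients.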
Training data
• Characters: 10K models
• Chairs: 10K models
• Airplanes: 3K models
Models from "The Models Resource" & 3D Warehouse
Synthetic line drawings with training depth and normal maps from 12 views
Test time
Predict multi-view depth and normal maps: front and side views in, output views 1–12 out.
• Depth derivatives should be consistent with normals
• Corresponding depths and normals across different views should agree
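The first consistency constraint, that depth derivatives should agree with normals, can be illustrated with a small NumPy sketch. This assumes an orthographic camera with unit pixel spacing, which is a simplification of the actual optimization.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel normals from a depth map via finite
    differences, assuming an orthographic camera with unit pixel
    spacing (a simplifying assumption for illustration)."""
    dz_dy, dz_dx = np.gradient(depth)
    # The surface normal is proportional to (-dz/dx, -dz/dy, 1)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n

# A planar ramp (depth increasing along x) yields the same tilted
# normal at every pixel, so a consistency loss against predicted
# normals would compare these estimates pixel-by-pixel.
xs = np.arange(8, dtype=float)
ramp = np.tile(xs, (8, 1))
n = normals_from_depth(ramp)
```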
Optimization problem
Multi-view depth & normal map fusion
Multi-view depth & normal maps (output views 1–12) → consolidated point cloud
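Fusing the views into a consolidated point cloud begins by back-projecting each depth map into 3D. A minimal orthographic sketch follows; the pose handling and lack of camera intrinsics are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def backproject(depth, pose=None):
    """Lift an orthographic depth map (H, W) into 3D world points,
    given a 4x4 camera-to-world pose. Each pixel (u, v) with depth d
    becomes (u, v, d) in camera space (simplified: unit pixel spacing,
    no intrinsics)."""
    if pose is None:
        pose = np.eye(4)
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pts_cam = np.stack(
        [u, v, depth, np.ones_like(depth)], axis=-1
    ).reshape(-1, 4).astype(float)
    pts_world = pts_cam @ pose.T
    return pts_world[:, :3]

# A constant-depth map becomes a flat grid of points at z = 2;
# repeating this for all 12 views (with their poses) and
# concatenating gives the consolidated cloud.
cloud = backproject(np.full((4, 4), 2.0))
```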
Surface reconstruction
Multi-view depth & normal maps → consolidated point cloud → surface reconstruction [Kazhdan et al. 2013]
Surface deformation
Multi-view depth & normal maps → consolidated point cloud → surface reconstruction [Kazhdan et al. 2013] → surface "fine-tuning" [Nealen et al. 2005]
Experiments
Qualitative Results
Comparisons: reference shape vs. nearest retrieval vs. our result vs. volumetric net
Quantitative Results

Character (human drawing)
Metric               Our method   Volumetric   NN
Hausdorff distance   0.120        0.638        0.242
Chamfer distance     0.023        0.052        0.045
Normal distance      34.27        56.97        47.94
Volumetric distance  0.309        0.497        0.550

Man-made (human drawing)
Metric               Our method   Volumetric   NN
Hausdorff distance   0.171        0.211        0.228
Chamfer distance     0.028        0.032        0.038
Normal distance      34.19        48.81        43.75
Volumetric distance  0.439        0.530        0.560
Single vs. two input line drawings
Single sketch → resulting shape; two sketches → resulting shape
More results
Part 2
Multiresolution Tree Networks for 3D Point Cloud Processing
Matheus Gadelha, Subhransu Maji, Rui Wang
ECCV 2018
Point-cloud decoders
• Global shape basis or fully-connected decoders
Morphable models[Figure from Booth et al., 16]
Requires perfect correspondence
How important is correspondence?
Gadelha et al., BMVC 17
Related work: Fan et al., CVPR 2017
Multiresolution tree networks
• Addresses the lack of
  • convolutional structure, and
  • coarse-to-fine reasoning
• Basic idea: linearize 3D points and use 1D convolutions
Points colored with kd-tree sort index
Implicit multiresolution structure
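The kd-tree linearization idea can be sketched as follows. This is my own simplified version of the ordering step, assuming recursive median splits with a cycling axis; the paper's construction differs in details.

```python
import numpy as np

def kdtree_order(points, depth=0):
    """Return indices that linearize a point set by recursive kd-tree
    median splits, cycling the split axis at each level. Adjacent
    points in the output tend to be spatial neighbors, so 1D
    convolutions over this ordering capture local structure."""
    if len(points) <= 1:
        return np.arange(len(points))
    axis = depth % points.shape[1]
    order = np.argsort(points[:, axis], kind="stable")
    mid = len(points) // 2
    left, right = order[:mid], order[mid:]
    return np.concatenate([
        left[kdtree_order(points[left], depth + 1)],
        right[kdtree_order(points[right], depth + 1)],
    ])

pts = np.random.rand(16, 3)
idx = kdtree_order(pts)
sorted_pts = pts[idx]  # a 1D sequence ready for 1D convolutions
```

Because the splits are hierarchical, strided 1D convolutions over this sequence implicitly operate on coarser and coarser spatial partitions, which is the multiresolution structure the network exploits.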
Multiresolution tree networks
Architecture for encoding and decoding
Multiresolution convolution block
Does multiresolution analysis help?
single-resolution / fully-connected / multi-resolution
Color indicates the position in the list
Other shape tasks with MRTNet
Single image shape reconstruction
Quantitative evaluation: ShapeNet dataset
Comparisons: voxel-based, fully-connected, multiview [Lin et al. AAAI 2018]
Chamfer distance: pred → GT / GT → pred
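The Chamfer distance used in this evaluation can be sketched directly in NumPy. Note that papers differ on squared vs. unsquared distances and on normalization, so this is one common variant, not necessarily the exact metric reported here.

```python
import numpy as np

def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and
    b (M, 3): mean nearest-neighbor distance pred -> GT plus
    GT -> pred. Brute-force pairwise distances; use a kd-tree for
    large clouds."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pts = np.random.rand(64, 3)
shifted = pts + 1.0
```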
MRTNet summary
• A generic architecture for
  • point cloud classification (91.7% on ModelNet40)
  • semantic segmentation (see results in the paper)
  • generation
• Project page: http://mgadelha.me/mrt/index.html
Part 3
CSGNet: Neural Shape Parser for Constructive Solid Geometry
Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, Subhransu Maji
CVPR 2018
Interpretable and editable
Constructive 2D geometry
Constructive solid geometry
Circle1 Triangle1 - Circle2 - Triangle2 - Triangle3 - [STOP]
Postfix notation
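A postfix CSG program like the one above can be evaluated with a tiny stack machine: primitives push an occupancy mask, operators pop two masks and combine them. The sketch below is a simplified illustration with a hypothetical `disk` rasterizer standing in for CSGNet's primitive renderer.

```python
import numpy as np

def disk(cx, cy, r, size=32):
    """Rasterize a filled circle as a boolean occupancy grid
    (a stand-in for the primitive renderer)."""
    y, x = np.mgrid[0:size, 0:size]
    return (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2

def execute(program):
    """Evaluate a postfix CSG program: primitive masks are pushed,
    operator tokens pop two masks and combine them
    (union '+', intersection '*', difference '-')."""
    ops = {
        "+": np.logical_or,
        "*": np.logical_and,
        "-": lambda a, b: np.logical_and(a, np.logical_not(b)),
    }
    stack = []
    for tok in program:
        if isinstance(tok, str):          # operator token
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        else:                             # pre-rendered primitive mask
            stack.append(tok)
    return stack.pop()

# "circle1 circle2 -": a large disk with a smaller one carved out
shape = execute([disk(16, 16, 10), disk(16, 16, 4), "-"])
```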
CNN RNN Program
Learning
• Supervised setting: learn to predict programs directly
• Unsupervised setting: no ground-truth programs.
  • Learn parameters to minimize a reconstruction error through policy gradients [REINFORCE, Williams 1992]
CNN RNN Program
reconstruction loss
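The policy-gradient update in the unsupervised setting amounts to a score-function estimate: scale the gradient of the sampled token's log-probability by the reward. A schematic NumPy sketch (not the paper's code; in CSGNet the reward comes from the reconstruction error of the executed program):

```python
import numpy as np

def reinforce_grad(logits, action, reward):
    """REINFORCE gradient estimate for one sampled token from a
    softmax policy over program tokens:
    d/dlogits [reward * log pi(action)] = reward * (onehot - probs)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    onehot = np.zeros_like(probs)
    onehot[action] = 1.0
    return reward * (onehot - probs)

# Uniform policy over 4 tokens, sampled token 2 rewarded:
# the gradient pushes probability toward token 2.
g = reinforce_grad(np.zeros(4), action=2, reward=1.0)
```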
CSGNet: 2D/3D CSG programs

Train on synthetic data and adapt to new domains using policy gradients (e.g., logos, synthetic CAD shapes).

How well does the nearest neighbor perform? Chamfer distance 1.88 (NN) vs. 1.36 (CSGNet) with 675K training examples.
CAD shapes dataset: Chamfer distance 1.94 (NN) vs. 0.51 (CSGNet).
• More results in the paper: reward shaping, comparison to Faster R-CNN for primitive detection, results on 3D, etc.
• Preprint available: https://arxiv.org/abs/1712.08290
Part 4
Learning 3D Shape Representations with Weak Supervision
Matheus Gadelha, Subhransu Maji, Rui Wang
3DV 2017
Related Work
• 3D shape from collections of images
  • Visual hull: same instance, known viewpoints
  • Photometric stereo: same instance, known lighting, simple reflectance
  • Structure from motion: same instance (or 3D)
  • Non-rigid structure from motion: known shape family (e.g., faces)
  • Our work: unknown shape family, unknown viewpoints
• 3D shape from a single image
  • Optimization-based approaches
  • Recognition-based approaches
A motivating example
Small cubes are reddish; big cubes are bluish
Hypothesis: It is easier to generate these images by reasoning in 3D
Approach
• Our goal is to learn a 3D shape generator whose projections match the provided set of views

Generator → Projection

How do we match distributions?

min_G D_KL(G ‖ D) = min_G E_{z∼G} [ log ( G(z) / D(z) ) ]

(G: generated distribution, D: true distribution; the density ratio is estimated using logistic regression)

This yields the generative adversarial objective [Goodfellow et al.]:

min_G max_d E_{x∼D} [log d(x)] + E_{z∼G} [log(1 − d(z))]
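The logistic-regression estimate of the density ratio works because an optimal binary classifier between real and generated samples recovers exactly the ratio the KL objective needs. A minimal sketch of that identity:

```python
import numpy as np

def log_density_ratio(d_out):
    """An optimal discriminator satisfies d(x) = D(x) / (D(x) + G(x)),
    where D is the true density and G the generated one. Inverting
    gives the log ratio needed by the KL objective:
    log(G(x) / D(x)) = log((1 - d(x)) / d(x))."""
    return np.log((1.0 - d_out) / d_out)

# A perfectly uncertain discriminator (d = 0.5) implies the
# generated and true densities match at x: log ratio = 0.
r = log_density_ratio(0.5)
```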
PrGAN
Projection using line integration along the view direction:

I(x) = 1 − exp( − ∫₀^∞ V(x + r) dr )
Generator maps z to a voxel occupancy grid and a viewpoint
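The projection operator can be implemented as a differentiable sum of occupancies along the viewing ray. The sketch below integrates along a grid axis; this is a simplification, since the actual model also rotates the grid by the predicted viewpoint before projecting.

```python
import numpy as np

def project(voxels, axis=2, dr=1.0):
    """PrGAN-style projection: line-integrate occupancy V along the
    view direction and map to intensity I = 1 - exp(-integral).
    Occupied regions darken the image smoothly, keeping the operator
    differentiable with respect to the voxel values."""
    integral = voxels.sum(axis=axis) * dr
    return 1.0 - np.exp(-integral)

v = np.zeros((8, 8, 8))
v[2:6, 2:6, 2:6] = 1.0   # a solid cube
img = project(v)          # silhouette-like image with values in [0, 1)
```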
Dataset generation
Results (input images → generated shapes): airplanes, vases, mixed categories
MMD metric (2D views): 2D-GAN 90.1, PrGAN 88.3
MMD metric (3D): 3D-GAN 347.5 (trained with 3D supervision), PrGAN 442.9
Projection GAN
• The model is able to recover the coarse 3D structure
• But should use side information when available
  • viewpoint
  • landmarks / part labels
  • pose estimates
• Iterative: bootstrap 3D to estimate pose & viewpoint
Thank you!
• Collaborators: Matheus Gadelha, Zhaoliang Lun, Gopal Sharma, Rui Wang, Evangelos Kalogerakis
• Funding from NSF, NVIDIA, Facebook
• https://people.cs.umass.edu/smaji/projects.html