More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry...

72
More Generative Models Prof. Leal-Taixé and Prof. Niessner 1

Transcript of More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry...

Page 1: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

More Generative Models

Prof. Leal-Taixé and Prof. Niessner 1

Page 2: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Conditional GANs on Videos• Challenge:

– Each frame is high quality, but temporally inconsistent

Prof. Leal-Taixé and Prof. Niessner 2

Page 3: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Video-to-Video Synthesis• Sequential Generator:

• Conditional Image Discriminator 𝐷𝑖 (is it real image)

• Conditional Video Discriminator 𝐷𝑣 (temp. consistency via flow)

Prof. Leal-Taixé and Prof. Niessner 3Wang et al. 18: Vid2Vid

past L source framespast L generated frames

(set L = 2)

Full Learning Objective:

Page 4: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Video-to-Video Synthesis

Prof. Leal-Taixé and Prof. Niessner 4Wang et al. 18: Vid2Vid

Page 5: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Video-to-Video Synthesis• Key ideas:

– Separate discriminator for temporal parts• In this case based on optical flow

– Consider recent history of prev. frames

– Train all of it jointly

Prof. Leal-Taixé and Prof. Niessner 5Wang et al. 18: Vid2Vid

Page 6: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 7: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Similar to “Image-to-Image Translation” (Pix2Pix) [Isola et al.]

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 8: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 9: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Neural Network converts synthetic data to realistic video

Page 10: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 11: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 12: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 13: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Interactive Video Editing

Page 14: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Video Portraits: Insights

• Synthetic data for tracking is great anchor / stabilizer

• Overfitting on small datasets works pretty well

• Need to stay within training set w.r.t. motions

• No real learning; essentially, optimizing the problem with SGD-> should be pretty interesting for future directions

Siggraph’18 [Kim et al 18]: Deep Portraits

Page 15: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Everybody Dance Now

[Chan et al. ’18] Everybody Dance Now

Page 16: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Everybody Dance Now

[Chan et al. ’18] Everybody Dance Now

Page 17: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Everybody Dance Now

[Chan et al. ’18] Everybody Dance Now

Page 18: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Everybody Dance Now- cGANs work with

different input

- Requires consistent input i.e., accurate tracking

- Network has no explicit 3D notion

[Chan et al. ’18] Everybody Dance Now

Page 19: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Everybody Dance Now: Insights• Conditioning via tracking seems promising!

– Tracking quality translates to resulting image quality

– Tracking human skeletons is less developed than faces• Temporally it’s not stable… (e.g., OpenPose etc.)

– Fun fact, there were like 4 papers with a similar same idea that appeared around the same time…

[Chan et al. ’18] Everybody Dance Now

Page 20: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Videos still challenging for cGANs…

Page 21: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels

[Sitzmann et al. CVPR’19] Deep Voxels

Page 22: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels• Main idea for video generation:

– Why learn 3D operations with 2D Convs !?!?

– We know how 3D transformations work• E.g., 6 DoF rigid pose [ R | t ]

– Incorporate these into the architectures• Need to be differentiable!

– Example application: novel view point synthesis• Given rigid pose, generate image for that view

[Sitzmann et al. CVPR’19] Deep Voxels

Page 23: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels

2D U-Net

Rendering

Lifting Layer2D 3D

2D U-Net

2D FeatureExtraction

Source View R, t

Projection Layer

3D 2D

OutputSourceTarget

View R, t3D U-Net

3D Features

Simplified overview for novel view synthesis

[Sitzmann et al. CVPR’19] Deep Voxels

Page 24: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels

[Sitzmann et al. CVPR’19] Deep Voxels

Page 25: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels

Issue: we don’t know the depth for the target!-> Per-pixel softmax along the ray-> Network learns the depth

Occlusion Network:

[Sitzmann et al. CVPR’19] Deep Voxels

Page 26: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels

[Sitzmann et al. ’18] Deep Voxels

Page 27: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels

[Sitzmann et al. ’18] Deep Voxels

Page 28: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deep Voxels: Insights• Lifting from 2D to 3D works great

– No need to take specific care for temp. coherency!

• All 3D operations are differentiable

• Currently, only for novel view-point synthesis– I.e., cGAN for new pose in a given scene

• But: limited resolution due to dense 3D voxel grid

[Sitzmann et al. ’18] Deep Voxels

Page 29: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Neural Textures: Features on 3D Mesh

Page 30: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Siggraph’19 [Thies et al.]: Neural Textures

3D Geometry

Neural Texture

Neural Textures: Features on 3D Mesh

Page 31: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

UV-Map

Sampled Texture

Siggraph’19 [Thies et al.]: Neural Textures

3D Geometry

Rendering

3D 2D

View R, t

Neural Texture

Neural Textures: Features on 3D Mesh

Page 32: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

UV-Map

Renderer Output Image

Sampled Texture

Siggraph’19 [Thies et al.]: Neural Textures

3D Geometry

Rendering

3D 2D

View R, t

Neural Texture

Neural Textures: Features on 3D Mesh

Page 33: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deferred Neural RenderingA

lbe

do

De

pth

No

rm

al

Lig

hti

ng

Deferred Renderer

Handcrafted ”Feature Maps“

Siggraph’19 [Thies et al.]: Neural Textures

Page 34: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deferred Neural RenderingA

lbe

do

De

pth

No

rm

al

Lig

hti

ng

Deferred Renderer

Handcrafted ”Feature Maps“Learned

Neural

Siggraph’19 [Thies et al.]: Neural Textures

Page 35: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

UV-Map

Renderer Output Image

Sampled Texture

3D Geometry

Rendering

3D 2D

View R, t

Neural Texture

Siggraph’19 [Thies et al.]: Neural Textures

Deferred Neural Rendering

Page 36: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Neural Textures: Features on 3D Mesh

Siggraph’19 [Thies et al.]: Neural Textures

Page 37: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Input UV-Map Ours

Novel View-Point Synthesis

Siggraph’19 [Thies et al.]: Neural Textures

Page 38: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Ground Truth Ours

Novel View-Point Synthesis

Siggraph’19 [Thies et al.]: Neural Textures

Page 39: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Ge

om

etr

yE

dit

ing

Inp

ut

Se

qu

en

ce

Scene Editing

Siggraph’19 [Thies et al.]: Neural Textures

Page 40: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Scene Editing

Siggraph’19 [Thies et al.]: Neural Textures

Page 41: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Scene Editing

Siggraph’19 [Thies et al.]: Neural Textures

Page 42: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Facial Animation

Siggraph’19 [Thies et al.]: Neural Textures

Page 43: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Facial Animation

Siggraph’19 [Thies et al.]: Neural Textures

Page 44: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Facial Animation

Siggraph’19 [Thies et al.]: Neural Textures

Page 45: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Facial Animation

Siggraph’19 [Thies et al.]: Neural Textures

Page 46: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Facial Animation

Siggraph’19 [Thies et al.]: Neural Textures

Page 47: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deferred Neural Rendering

Siggraph’19 [Thies et al.]: Neural Textures

Page 48: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Deferred Neural Rendering

Siggraph’19 [Thies et al.]: Neural Textures

Page 49: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Big Open Challenges

Page 50: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Big Open Challenges

Photo-realistic Reconstruction

Page 51: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Big Open Challenges: How much can AI do?

Siggraph’19 [Thies et al.]: Neural Textures

Page 52: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Big Open Challenges: 3D in NetworksWhy learn 3D operations, such as transformations?

-> differentiate known operators

Capsule networks are motivated by inverse graphics [Sabour et al. 17]

Page 53: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Prof. Leal-Taixé and Prof. Niessner 54

Page 54: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Autoregressive Models

Prof. Leal-Taixé and Prof. Niessner 55

Page 55: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Autoregressive Models vs GANs

• GANs learn implicit data distribution– i.e., output are samples (distribution is in model)

• Autoregressive models learn an explicit distribution governed by a prior imposed by model structure– i.e., outputs are probabilities (e.g., softmax)

Prof. Leal-Taixé and Prof. Niessner 56

Page 56: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN• Goal: model distribution of natural images• Interpret pixels of an image as product of conditional

distributions– Modeling an image → sequence problem– Predict one pixel at a time– Next pixel determined by all previously predicted pixels Use a Recurrent Neural Network

Prof. Leal-Taixé and Prof. Niessner 57[Van den Oord et al 2016]

Page 57: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN

Prof. Leal-Taixé and Prof. Niessner 58[Van den Oord et al 2016]

For RGB

Page 58: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN

Prof. Leal-Taixé and Prof. Niessner 59

𝑥𝑖 ∈ 0,255→ 256-way softmax

[Van den Oord et al 2016]

Page 59: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN• Row LSTM model architecture• Image processed row by row• Hidden state of pixel depends

on the 3 pixels above it– Can compute pixels in row in

parallel

• Incomplete context for each pixel

Prof. Leal-Taixé and Prof. Niessner 60[Van den Oord et al 2016]

Page 60: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN• Diagonal BiLSTM model

architecture• Solve incomplete context

problem• Hidden state of pixel

𝑝𝑖,𝑗depends on 𝑝𝑖,𝑗−1 and 𝑝𝑖−1,𝑗

• Image processed by diagonals

Prof. Leal-Taixé and Prof. Niessner 61[Van den Oord et al 2016]

Page 61: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN• Masked Convolutions• Only previously predicted

values can be used as context

• Mask A: restrict context during 1st conv

• Mask B: subsequent convs• Masking by zeroing out

values

Prof. Leal-Taixé and Prof. Niessner 62[Van den Oord et al 2016]

Page 62: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelRNN• Generated

64x64 images, trained on ImageNet

Prof. Leal-Taixé and Prof. Niessner 63[Van den Oord et al 2016]

Page 63: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelCNN• Row and Diagonal LSTM layers have potentially

unbounded dependency range within the receptive field– Can be very computationally costly

PixelCNN: – standard convs capture a bounded receptive field– All pixel features can be computed at once (during

training)

Prof. Leal-Taixé and Prof. Niessner 64[Van den Oord et al 2016]

Page 64: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelCNN• Model preserves spatial

dimensions• Masked convolutions to avoid

seeing future context

Prof. Leal-Taixé and Prof. Niessner 65[Van den Oord et al 2016]

http://sergeiturukin.com/2017/02/22/pixelcnn.html

Mask A

Page 65: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Gated PixelCNN• Gated blocks• Imitate multiplicative complexity of PixelRNNs to

reduce performance gap between PixelCNN and PixelRNN

• Replace ReLU with gated block of sigmoid, tanh

Prof. Leal-Taixé and Prof. Niessner 66[Van den Oord et al 2016]

𝑦 = tanh 𝑊𝑘,𝑓 ∗ 𝑥 ⊙ 𝜎(𝑊𝑘,𝑔 ∗ 𝑥)

kth layer sigmoid

element-wise product convolution

Page 66: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

PixelCNN Blind Spot

Prof. Leal-Taixé and Prof. Niessner 67[Van den Oord et al 2016]

http://sergeiturukin.com/2017/02/24/gated-pixelcnn.html

5x5 image / 3x3 conv Receptive Field Unseen context

Page 67: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

• Split convolution to two stacks• Horizontal stack conditions on

current row• Vertical stack conditions on pixels

above

PixelCNN: Eliminating Blind Spot

Prof. Leal-Taixé and Prof. Niessner 68[Van den Oord et al 2016]

Page 68: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Conditional PixelCNN• Conditional image generation• E.g., condition on semantic class, text description

Prof. Leal-Taixé and Prof. Niessner 69[Van den Oord et al 2016]

𝑦 = tanh 𝑊𝑘,𝑓 ∗ 𝑥 + 𝑉𝑘,𝑓𝑇 ℎ ⊙ 𝜎(𝑊𝑘,𝑔 ∗ 𝑥 + 𝑉𝑘,𝑔

𝑇 ℎ)

latent vector to be conditioned on

Page 69: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Conditional PixelCNN

Prof. Leal-Taixé and Prof. Niessner 70[Van den Oord et al 2016]

Page 70: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Autoregressive Models vs GANs• Advantages of autoregressive:

– Explicitly model probability densities– More stable training– Can be applied to both discrete and continuous data

• Advantages of GANs:– Have been empirically demonstrated to produce higher

quality images– Faster to train

Prof. Leal-Taixé and Prof. Niessner 71

Page 71: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

Autoregressive Models• State of the art is pretty impressive

Prof. Leal-Taixé and Prof. Niessner 72

Generating Diverse High-Fidelity Images with VQ-VAE-2https://arxiv.org/pdf/1906.00446.pdf [Razavi et al. 19]

Vector Quantized Variational AutoEncoder

Page 72: More Generative Models - dvl.in.tum.de · Siggraph’9 [Thies et al.]: Neural Textures 3D Geometry Rendering 3D 2D View R, t Neural Texture Neural Textures: Features on 3D Mesh ...

See you next week

Prof. Leal-Taixé and Prof. Niessner 73