Tuesday, 9 May 2017 Andrew Edelsten - NVIDIA Developer Technologies ZOOM, ENHANCE, SYNTHESIZE! MAGIC UPSCALING AND MATERIAL SYNTHESIS USING DEEP LEARNING


Page 1: Tuesday, 9 May 2017 Andrew Edelsten - NVIDIA Developer ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Tuesday, 9 May 2017 Andrew Edelsten - NVIDIA Developer Technologies



DEEP LEARNING FOR ART – Active R&D but ready now

▪ Style transfer

▪ Generative networks creating images and voxels

▪ Adversarial networks (DCGAN) – still early but promising

▪ DL & ML based tools from NVIDIA and partners

  ▪ NVIDIA

  ▪ Artomatix

  ▪ Allegorithmic

  ▪ Autodesk


STYLE TRANSFER – Something Fun

(Figure: a “Content” image and a “Style” image)

▪ Doodle a masterpiece!

▪ Uses a CNN to take the “style” from one image and apply it to another

▪ Sept 2015: “A Neural Algorithm of Artistic Style” by Gatys et al.

▪ Dec 2015: neural-style (github)

▪ Mar 2016: neural-doodle (github)

▪ Mar 2016: texture-nets (github)

▪ Oct 2016: fast-neural-style (github)

▪ 2 May 2017 (last week!): Deep Image Analogy (arXiv)

▪ Also numerous services: Vinci, Prisma, Artisto, Ostagram
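The objective behind Gatys-style transfer can be sketched in a few lines. This sketch makes several simplifying assumptions:

- The random arrays stand in for activations of a pretrained CNN. Gatys et al. used VGG features from several layers; the sketch uses a single layer.
- The weights `alpha` and `beta` and the Gram normalization are illustrative choices.
- The gradient-descent loop that updates the generated image is omitted.

The idea is that content is matched feature by feature, while style is matched only through Gram-matrix statistics.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-by-channel correlations."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)          # flatten spatial positions
    return f @ f.T / (c * h * w)            # one common normalization, not the only one

def content_loss(gen_feats, content_feats):
    # Per-position squared error: preserves the layout of the content image
    return float(np.mean((gen_feats - content_feats) ** 2))

def style_loss(gen_feats, style_feats):
    # Squared error between Gram matrices: matches feature statistics, not positions
    return float(np.sum((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2))

def total_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    # alpha/beta trade content fidelity against style strength (illustrative values)
    return alpha * content_loss(gen_feats, content_feats) + beta * style_loss(gen_feats, style_feats)

rng = np.random.default_rng(0)
content = rng.standard_normal((8, 16, 16))   # stand-in for CNN activations of the content image
style = rng.standard_normal((8, 16, 16))     # stand-in for CNN activations of the style image
print(total_loss(content, content, style))   # content term is 0 here; only the style term remains
```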


HTTP://OSTAGRAM.RU/STATIC_PAGES/LENTA


STYLE TRANSFER – Something Useful

▪ Game remaster & texture enhancement

  ▪ Try Neural Style and use a real-world photo for the “style”

  ▪ For stylized or anime up-rez, try https://github.com/nagadomi/waifu2x

▪ Experiment with art styles

▪ Dream or power-up sequences

  ▪ “Come Swim” by Kristen Stewart - https://arxiv.org/pdf/1701.04928v1.pdf


GAMEWORKS: MATERIALS & TEXTURES – Using DL for Game Development & Content Creation

▪ Set of tools for the game industry, built on machine learning and deep learning

▪ Launched at the Game Developers Conference (GDC) in March; the tools run as a web service

▪ Sign up for the Beta at: https://gwmt.nvidia.com

▪ Tools in this initial release:

  ▪ Photo to Material: 2Shot

  ▪ Texture Multiplier

  ▪ Super-Resolution


PHOTO TO MATERIAL – The 2Shot Tool

▪ From two photos of a surface, generate a “material”

▪ Based on a SIGGRAPH 2015 paper by NVIDIA Research & Aalto University (Finland)

  ▪ “Two-Shot SVBRDF Capture for Stationary Materials”

  ▪ https://mediatech.aalto.fi/publications/graphics/TwoShotSVBRDF/

▪ Input is a pixel-aligned pair of “flash” and “guide” photographs

  ▪ Use a tripod and a remote shutter, or bracket

  ▪ Or align the photos later

▪ Use for flat surfaces with repeating patterns
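For the "align later" route, the simplest case is a pure integer translation between the two shots. That case can be handled with phase correlation, sketched below. This is a generic registration technique assumed here for illustration; it is not the 2Shot tool's own alignment step.

Real handheld shots can also differ by rotation and perspective, which this sketch does not handle. Because phase correlation discards spectral magnitude, it tends to cope reasonably with the lighting difference between a flash and a guide photo.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Estimate integer (dy, dx) such that moved ≈ np.roll(reference, (dy, dx), axis=(0, 1)).

    Phase correlation: for circularly shifted images, the normalized
    cross-power spectrum is a pure phase ramp whose inverse FFT peaks at the shift.
    """
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12              # keep phase only
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    # Peaks past the midpoint correspond to negative shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def align(reference, moved):
    """Undo the estimated translation so `moved` lines up with `reference`."""
    dy, dx = estimate_shift(reference, moved)
    return np.roll(moved, (-dy, -dx), axis=(0, 1))

rng = np.random.default_rng(1)
guide = rng.random((64, 64))                    # hypothetical guide photo
flash = np.roll(guide, (3, -5), axis=(0, 1))    # simulated misregistered second shot
print(estimate_shift(guide, flash))
```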


MATERIAL SYNTHESIS FROM TWO PHOTOS

(Figure: inputs “Flash image” and “Guide image”; outputs Diffuse albedo, Specular, Normals, Glossiness, Anisotropy)


TEXTURE MULTIPLIER – Organic variations of textures

▪ Put simply: texture in, new texture out

▪ Inspired by Gatys, Ecker & Bethge

  ▪ “Texture Synthesis Using Convolutional Neural Networks”

  ▪ https://arxiv.org/pdf/1505.07376.pdf

▪ Artomatix

  ▪ Similar product: “Texture Mutation”

  ▪ https://artomatix.com/
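The Gatys, Ecker & Bethge method describes a texture by Gram matrices of CNN feature maps. The sketch below shows one reason this yields variations rather than copies: a Gram matrix ignores where features occur, so spatially rearranged feature maps share the same statistics.

A few assumptions apply:

- The feature maps are random stand-ins for real CNN activations.
- The sketch illustrates the cited paper's statistic, not necessarily what Texture Multiplier computes internally.

```python
import numpy as np

def gram_matrix(features):
    """Channel correlations of a (C, H, W) feature map, averaged over positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

rng = np.random.default_rng(2)
feats = rng.standard_normal((4, 8, 8))          # stand-in for CNN activations of a texture

# Shuffle the spatial positions, using the same permutation for every channel
c, h, w = feats.shape
perm = rng.permutation(h * w)
shuffled = feats.reshape(c, h * w)[:, perm].reshape(c, h, w)

# Different feature maps, identical Gram statistics
print(np.allclose(gram_matrix(feats), gram_matrix(shuffled)))
```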


SUPER RESOLUTION


SUPER RESOLUTION – Zoom... ENHANCE!

(Cartoon: “Zoom in on the license plate.” “OK!” “Can you enhance that?” “Sure!”)


SUPER RESOLUTION – The task at hand

Given a low-resolution image of size $W \times H$, upscale it (magic?) to construct a high-resolution image of size $nW \times nH$.


UPSCALE: CREATE MORE PIXELS – An ill-posed task?

(Figure: the pixels of the given image spread over the grid of the upscaled image; the new pixels in between, marked “?”, are unknown)
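Why the task is ill-posed can be shown with a tiny example. Assume the low-resolution image is produced by box-filter downscaling, which averages each $n \times n$ block; this degradation model is an assumption for illustration. Under that model, very different high-resolution images collapse to the same low-resolution one, so no upscaler can tell them apart from the input alone.

```python
import numpy as np

def downscale(img, n):
    """Box-filter downscale: average each n x n block of an (n*H, n*W) image."""
    h, w = img.shape[0] // n, img.shape[1] // n
    return img[: h * n, : w * n].reshape(h, n, w, n).mean(axis=(1, 3))

n = 2
hr_a = np.full((4, 4), 0.5)                       # flat gray
checker = np.indices((4, 4)).sum(axis=0) % 2      # 0/1 checkerboard
hr_b = 0.25 + 0.5 * checker                       # strong one-pixel pattern

print(downscale(hr_a, n))
print(downscale(hr_b, n))                         # identical to the line above
```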


TRADITIONAL APPROACH

▪ Interpolation (bicubic, Lanczos, etc.)

▪ Interpolation + sharpening (and other filtering)

(Figure panels: Interpolation; Filter-based sharpening)

▪ Only a rough, overly general estimate of how the data behaves

▪ Too many possibilities: an 8×8 grayscale patch has $256^{8 \cdot 8} = 2^{512} \approx 1.3 \times 10^{154}$ pixel combinations!
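The "interpolation + sharpening" baseline can be sketched in NumPy as follows. Three substitutions keep it short, and all are assumptions:

- Bilinear interpolation stands in for bicubic or Lanczos.
- The sharpening filter is a simple unsharp mask built on a 3×3 box blur.
- The `amount` parameter is illustrative.

Note that the same fixed rule is applied to every pixel, whatever the image content.

```python
import numpy as np

def bilinear_upscale(img, n):
    """Upscale a 2-D image by an integer factor n with bilinear interpolation."""
    h, w = img.shape
    # Centers of the output pixels, mapped back into input pixel coordinates
    ys = np.clip((np.arange(h * n) + 0.5) / n - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * n) + 0.5) / n - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # Blend horizontally on the two neighbouring rows, then vertically
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def box_blur3(img):
    """3x3 mean filter with edge padding."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def unsharp_mask(img, amount=0.5):
    """Sharpen by adding back the difference between the image and its blur."""
    return img + amount * (img - box_blur3(img))

lr = np.random.default_rng(3).random((16, 16))
hr = unsharp_mask(bilinear_upscale(lr, 4))        # 16x16 -> 64x64, then sharpened
up = bilinear_upscale(np.arange(12.0).reshape(3, 4), 2)
```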


A NEW APPROACH – First: narrow the possible set

Focus on the domain of “natural images”

(Figure: “Natural images”, including photos and textures, as a small subset of “All possible images”)


A NEW APPROACH – Second: place the image in the domain, then reconstruct

Data from natural images is sparse: it is compressible in some domain.

Then “reconstruct” images (rather than create new ones).

(Figure: Compress → Reconstruct, using prior information and constraints)


PATCH-BASED MAPPING: TRAINING

(Figure: training images → (LR, HR) pairs of patches, each a low-resolution patch and its high-resolution counterpart → training → mapping with model params)
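The (LR, HR) training pairs can be produced from high-resolution images alone, by downscaling each image and cutting aligned patches from both versions. The sketch below makes several assumptions:

- Box-filter downscaling stands in for the real degradation.
- Patch size, stride, and scale factor are illustrative.
- The function name `make_patch_pairs` is hypothetical.

```python
import numpy as np

def downscale(img, n):
    """Box-filter downscale by an integer factor n (stand-in for the real degradation)."""
    h, w = img.shape[0] // n, img.shape[1] // n
    return img[: h * n, : w * n].reshape(h, n, w, n).mean(axis=(1, 3))

def make_patch_pairs(hr_img, n=2, lr_patch=4, stride=2):
    """Cut aligned (LR patch, HR patch) pairs from one high-resolution training image."""
    lr_img = downscale(hr_img, n)
    lr_list, hr_list = [], []
    for y in range(0, lr_img.shape[0] - lr_patch + 1, stride):
        for x in range(0, lr_img.shape[1] - lr_patch + 1, stride):
            lr_list.append(lr_img[y:y + lr_patch, x:x + lr_patch].ravel())
            # The HR patch covers the same area, n times larger in each direction
            hr_list.append(hr_img[y * n:(y + lr_patch) * n, x * n:(x + lr_patch) * n].ravel())
    return np.array(lr_list), np.array(hr_list)

hr = np.random.default_rng(4).random((32, 32))    # hypothetical training image
lr_patches, hr_patches = make_patch_pairs(hr)
print(lr_patches.shape, hr_patches.shape)
```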


PATCH-BASED MAPPING

(Figure: LR patch $x_L$ → Encode → high-level information about the patch → Decode → HR patch $x_H$)


PATCH-BASED MAPPING: SPARSE CODING

(Figure: LR patch $x_L$ → Encode → sparse code, i.e. high-level information about the patch (“features”) → Decode → HR patch $x_H$)


PATCH FEATURES & RECONSTRUCTION

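$$x = Dz = d_1 z_1 + \dots + d_K z_K$$

where $x$ is the patch, $D = [d_1, \dots, d_K]$ is the dictionary and $z$ is the sparse code. In the slide's example, $x = 0.8\,d_{36} + 0.3\,d_{42} + 0.5\,d_{63}$.

An image patch can be reconstructed as a sparse linear combination of features.

The features (dictionary atoms) are learned from the dataset during training.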
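The Encode/Decode pair for sparse coding can be sketched as follows. Encoding solves for a sparse code $z$ of the LR patch; decoding applies the same $z$ to an HR dictionary. Several assumptions apply:

- Coupled LR/HR dictionaries sharing one code are assumed as a reading of the diagram.
- Orthogonal Matching Pursuit is one common greedy solver; the slides do not name one.
- The dictionaries are random rather than learned.
- `D_L` is orthonormal so that the demo recovers the slide's example code exactly. Learned dictionaries are usually overcomplete, with more atoms than pixels.

```python
import numpy as np

def omp(D, x, n_nonzero, tol=1e-10):
    """Orthogonal Matching Pursuit: greedy sparse code z with x ≈ D @ z."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break
        # Pick the atom most correlated with what is still unexplained
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all chosen coefficients jointly (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z

rng = np.random.default_rng(5)
K = 64
D_L, _ = np.linalg.qr(rng.standard_normal((K, K)))   # LR dictionary for 8x8 patches (orthonormal here)
D_H = rng.standard_normal((256, K))                  # HR dictionary for 16x16 patches, paired atom by atom

z_true = np.zeros(K)
z_true[[36, 42, 63]] = [0.8, 0.3, 0.5]               # the slide's example code
x_L = D_L @ z_true                                   # observed LR patch

z = omp(D_L, x_L, n_nonzero=3)                       # Encode
x_H = D_H @ z                                        # Decode with the HR dictionary
```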


GENERALIZED PATCH-BASED MAPPING

(Figure: LR patch → mapping → high-level representation of the LR patch (“features”) → mapping in feature space → high-level representation of the HR patch → mapping → HR patch)


GENERALIZED PATCH-BASED MAPPING

(Figure: LR patch → mapping → mapping in feature space → mapping → HR patch, with trainable parameters $W_1$, $W_2$, $W_3$ for the three mappings)


MAPPING OF THE WHOLE IMAGE – Using Convolutions

(Figure: LR image → mapping → mapping in feature space → mapping → HR image, with each mapping implemented by convolutional operators)
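Applying the three trainable mappings $W_1$, $W_2$, $W_3$ as convolutions over the whole image gives a three-layer network. We assume this corresponds to SRCNN (Dong et al.). Further assumptions:

- The kernel sizes (9, 1, 5) and channel counts (64, 32) are the commonly cited SRCNN settings.
- The weights are random and untrained.
- As in SRCNN, the input is assumed to be already interpolated to the target size.

The sketch only shows the forward pass.

```python
import numpy as np

def conv2d_same(x, w, b):
    """'Same'-size 2-D convolution (cross-correlation, as in DL frameworks).

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,)
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Input shifted by (i, j), channels mixed by the weights at that kernel tap
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def relu(a):
    return np.maximum(a, 0.0)

def three_layer_sr(img, params):
    """W1: LR features, W2: mapping in feature space, W3: reconstruction of the HR image."""
    (w1, b1), (w2, b2), (w3, b3) = params
    f = relu(conv2d_same(img[None], w1, b1))
    f = relu(conv2d_same(f, w2, b2))
    return conv2d_same(f, w3, b3)[0]

def init(c_out, c_in, k, rng):
    return rng.standard_normal((c_out, c_in, k, k)) * 0.01, np.zeros(c_out)

rng = np.random.default_rng(6)
params = [init(64, 1, 9, rng), init(32, 64, 1, rng), init(1, 32, 5, rng)]
upscaled = rng.random((24, 24))       # input already interpolated to the target size
hr = three_layer_sr(upscaled, params)
print(hr.shape)
```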


AUTO-ENCODERS

(Figure: input → output ≈ input)


AUTO-ENCODER

(Figure: input → Encode → features → Decode → output ≈ input)


AUTO-ENCODER

(Figure: $x$ → network $F_W$ with parameters $W$ → $y$)

Training: $W = \arg\min_W \sum_i \mathrm{Dist}\big(x_i, F_W(x_i)\big)$, where $x_i$ is the training set

Inference: $y = F_W(x)$
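The training objective above can be made concrete with a minimal sketch. The sketch assumes:

- $F_W$ is a linear auto-encoder with a 4-D bottleneck.
- Dist is the squared error.
- Training is plain gradient descent on synthetic data lying near a 4-D subspace.

These are simplifications; real super-resolution networks are deep, nonlinear and convolutional.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical training set x_i: 500 vectors of length 16 near a 4-D subspace
basis = rng.standard_normal((16, 4))
X = rng.standard_normal((500, 4)) @ basis.T + 0.01 * rng.standard_normal((500, 16))

# F_W(x) = W_dec @ (W_enc @ x): encoder to 4 features, decoder back to 16 values
W_enc = 0.1 * rng.standard_normal((4, 16))
W_dec = 0.1 * rng.standard_normal((16, 4))

def loss(W_enc, W_dec):
    """Mean over the training set of Dist(x_i, F_W(x_i)), with Dist = squared error."""
    recon = X @ W_enc.T @ W_dec.T
    return float(np.mean(np.sum((X - recon) ** 2, axis=1)))

initial = loss(W_enc, W_dec)
step_size = 1e-3
for _ in range(2000):
    codes = X @ W_enc.T                  # encode: (500, 4)
    err = codes @ W_dec.T - X            # decoded minus input: (500, 16)
    # Gradients of the mean squared reconstruction error w.r.t. each weight matrix
    g_dec = 2 * err.T @ codes / len(X)
    g_enc = 2 * (err @ W_dec).T @ X / len(X)
    W_dec -= step_size * g_dec
    W_enc -= step_size * g_enc
final = loss(W_enc, W_dec)
print(initial, final)
```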


AUTO-ENCODER

(Figure: input → Encode, with information loss)

▪ Our encoder is LOSSY by definition
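Why an encoder into fewer dimensions must be lossy can be shown concretely with a linear encoder, a hypothetical stand-in chosen for simplicity. A 16-to-4 linear map has a null space, so two genuinely different inputs can produce exactly the same features. No decoder can then reconstruct both of them perfectly.

```python
import numpy as np

rng = np.random.default_rng(8)
W_enc = rng.standard_normal((4, 16))      # hypothetical encoder: 16 inputs -> 4 features

# Any direction in the encoder's null space is invisible to it
_, _, vt = np.linalg.svd(W_enc)
v = vt[-1]                                # right-singular vector with zero singular value

x1 = rng.standard_normal(16)
x2 = x1 + 3.0 * v                         # a genuinely different input...
print(np.allclose(W_enc @ x1, W_enc @ x2))   # ...with the same code
```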


SUPER-RESOLUTION AUTO-ENCODER

(Figure: $x$ → network $F_W$ with parameters $W$ → $y$)

Training: $W = \arg\min_W \sum_i \mathrm{Dist}\big(x_i, F_W(x_i)\big)$, where $x_i$ is the training set

Inference: $y = F_W(x)$


SUPER RESOLUTION AE: TRAINING

(Figure: ground-truth HR image $x$ → downscaling $D$ → LR image $y$ → SR AE $F_W$ with parameters $W$ → reconstructed HR image $\hat{x}$)

$$W = \arg\min_W \sum_i \mathrm{Dist}\big(x_i, F_W(D(x_i))\big)$$

where $x_i$ is the training set.


SUPER RESOLUTION AE: INFERENCE

(Figure: given LR image $y$ → SR AE $F_W$ with parameters $W$ → constructed HR image $\hat{x}$)

$$\hat{x} = F_W(y)$$
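Training and inference together fit in a deliberately tiny stand-in for $F_W$: a linear map from a flattened 2×2 LR patch to a 4×4 HR patch. The sketch assumes:

- The training images are synthetic.
- $D$ is box-filter downscaling.
- Dist is the squared error.
- $W$ is found in closed form by least squares rather than by gradient descent.

None of this is the GameWorks model.

```python
import numpy as np

def downscale(img, n):
    """D: box-filter downscale by an integer factor n (assumed degradation model)."""
    h, w = img.shape[0] // n, img.shape[1] // n
    return img[: h * n, : w * n].reshape(h, n, w, n).mean(axis=(1, 3))

def smooth_image(rng, size=32):
    """Hypothetical smooth training image built from cumulative sums of noise."""
    img = np.cumsum(np.cumsum(rng.standard_normal((size, size)), axis=0), axis=1)
    return (img - img.min()) / (img.max() - img.min())

def hr_blocks(img, p=4):
    """Split an image into non-overlapping p x p blocks, one flattened block per row."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

rng = np.random.default_rng(9)
X_hr = np.vstack([hr_blocks(smooth_image(rng)) for _ in range(20)])       # x_i: 16 values each
Y_lr = np.vstack([downscale(x.reshape(4, 4), 2).ravel() for x in X_hr])   # D(x_i): 4 values each

# Training: W = argmin_W sum_i ||x_i - W D(x_i)||^2, solved by least squares
W_T, *_ = np.linalg.lstsq(Y_lr, X_hr, rcond=None)
W = W_T.T                                                                  # (16, 4)

def F_W(y):
    """Inference: x_hat = F_W(y) for one flattened 2x2 LR patch."""
    return W @ y

# Baseline: nearest-neighbour upscaling written as a fixed 16x4 matrix
U = np.array([[1.0 if (r // 8) * 2 + (r % 4) // 2 == c else 0.0 for c in range(4)]
              for r in range(16)])
mse_learned = np.mean((X_hr - Y_lr @ W.T) ** 2)
mse_nearest = np.mean((X_hr - Y_lr @ U.T) ** 2)
print(mse_learned, mse_nearest)
```

Nearest-neighbour upscaling is itself one linear map of this shape. Because least squares is optimal over all such maps, the learned $W$ is guaranteed to fit the training data at least as well. A deep network keeps the same objective but replaces $W$ with a nonlinear $F_W$.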


SUPER-RESOLUTION: ILL-POSED TASK?


THE LOSS FUNCTION


THE LOSS FUNCTION – Measuring the “distance” from a good result

$$W = \arg\min_W \sum_i \mathrm{Dist}\big(x_i, F_W(x_i)\big)$$

The distance function Dist is a key element in obtaining good results, so the choice of loss function is an important decision.


LOSS FUNCTION

MSE (Mean Squared Error): $\frac{1}{N}\lVert x - F(x) \rVert^2$


LOSS FUNCTION: PSNR

MSE (Mean Squared Error): $\frac{1}{N}\lVert x - F(x) \rVert^2$

PSNR (Peak Signal-to-Noise Ratio): $10 \cdot \log_{10} \frac{MAX^2}{MSE}$


LOSS FUNCTION: HFEN

MSE (Mean Squared Error): $\frac{1}{N}\lVert x - F(x) \rVert^2$

PSNR (Peak Signal-to-Noise Ratio): $10 \cdot \log_{10} \frac{MAX^2}{MSE}$

HFEN (High Frequency Error Norm, see Ref A): $\lVert HP(x - F(x)) \rVert^2$, where $HP$ is a high-pass filter

Perceptual loss

Ref A: http://ieeexplore.ieee.org/document/5617283/
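The three measures can be sketched directly from the formulas above. Assumptions:

- Images are in $[0, 1]$, so $MAX = 1$.
- The high-pass filter $HP$ is a zero-sum Laplacian-of-Gaussian kernel. This is a common choice for HFEN; the kernel size 15 and sigma 1.5 are assumed values.
- HFEN follows the slide's unnormalized squared norm; other implementations may normalize it differently.

```python
import numpy as np

def mse(x, fx):
    """MSE = (1/N) * ||x - F(x)||^2 over all N pixels."""
    return float(np.mean((x - fx) ** 2))

def psnr(x, fx, max_val=1.0):
    """PSNR = 10 * log10(MAX^2 / MSE) in dB; infinite for a perfect match."""
    m = mse(x, fx)
    return float("inf") if m == 0 else float(10.0 * np.log10(max_val ** 2 / m))

def log_kernel(size=15, sigma=1.5):
    """Laplacian-of-Gaussian high-pass kernel, shifted to sum to zero."""
    r = np.arange(size) - size // 2
    yy, xx = np.meshgrid(r, r, indexing="ij")
    rr = (xx ** 2 + yy ** 2) / (2 * sigma ** 2)
    k = (rr - 1) * np.exp(-rr)
    return k - k.mean()

def high_pass(img, kernel):
    """HP filter: 'same'-size correlation with edge padding."""
    pad = kernel.shape[0] // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * p[i:i + h, j:j + w]
    return out

def hfen(x, fx, kernel=None):
    """HFEN as written on the slide: ||HP(x - F(x))||^2."""
    k = log_kernel() if kernel is None else kernel
    return float(np.sum(high_pass(x - fx, k) ** 2))

x = np.random.default_rng(10).random((32, 32))
brighter = x + 0.1     # uniform offset: nonzero MSE, almost no high-frequency error
print(mse(x, brighter), psnr(x, brighter), hfen(x, brighter))
```

The uniform brightness offset gives an MSE of about 0.01, i.e. about 20 dB PSNR, but an HFEN of essentially zero. This shows that HFEN concentrates on edges and fine detail rather than overall intensity.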


REGULAR LOSS

(Figure: two 4x upscaling results)


REGULAR LOSS + PERCEPTUAL LOSS

(Figure: two 4x upscaling results)


WARNING… THIS IS EXPERIMENTAL!


SUPER-RESOLUTION: GAN-BASED LOSS

Total loss = Regular (MSE + PSNR + HFEN) loss + GAN loss

(Figure: the generator maps $x$ to $F(x)$; the discriminator maps an image $y$ to $D(y)$ and labels it “real” or “fake”)

GAN loss $= -\ln D(F(x))$
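The GAN term of the total loss can be sketched as below. Here $D$ is the discriminator from the slide, not the downscaling operator used earlier. Assumptions:

- $D$'s outputs are treated as given probabilities, with no network.
- The standard binary cross-entropy discriminator objective is assumed.
- The weight on the GAN term is a tunable assumption; its default of 1.0 matches the slide's plain sum.

```python
import numpy as np

def generator_gan_loss(d_fake, eps=1e-12):
    """GAN loss for the generator: batch mean of -ln D(F(x)).

    d_fake: discriminator outputs in (0, 1), its probability that F(x) is real.
    """
    return float(np.mean(-np.log(np.clip(d_fake, eps, 1.0))))

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: push D(y) -> 1 for real images and D(F(x)) -> 0 for fakes."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(-np.log(d_real)) + np.mean(-np.log(1.0 - d_fake)))

def total_generator_loss(regular_loss, d_fake, gan_weight=1.0):
    """Total loss = regular (MSE/PSNR/HFEN-based) loss + weighted GAN loss."""
    return regular_loss + gan_weight * generator_gan_loss(d_fake)

d_fake = np.array([0.5, 0.5])            # discriminator undecided about the generated images
print(generator_gan_loss(d_fake))        # ln 2
```

The generator's term is zero only when the discriminator is fooled completely ($D(F(x)) = 1$). It grows without bound as the discriminator becomes confident that the output is fake.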


Extended presentation from the Game Developers Conference (GDC) 2017

https://developer.nvidia.com/deep-learning-games

GameWorks: Materials & Textures

https://gwmt.nvidia.com

QUESTIONS?