Transcript of: Generative Adversarial Network (cse.iitkgp.ac.in/~sudeshna/courses/DL17/GAN-6-april-17.pdf)
Generative Adversarial Network
Many slides from NIPS 2014
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
Yoshua Bengio
• New method of training deep generative models
• Idea: pit a generator and a discriminator against each other
• Generator tries to draw samples from P(X)
• Discriminator tries to tell if sample came from the generator or the real world
• Both discriminator and generator are deep networks (differentiable functions)
• Can train with backprop: train discriminator for a while, then train generator, then discriminator, …
Generative adversarial networks
2
Generative?
3
• Data:
• Discriminative model: p_D(x; θ_D)
• Generative model: p_G(x; θ_G)
• True data distribution: p_data(x)
• Train so that p_G(x) ≈ p_data(x)
Generative Adversarial Network
4
• Counterfeiters vs Police game
[Cartoon: the counterfeiter insists "IT'S REAL MONEY!", the police officer replies "IT'S FAKE MONEY!"]
5
Generative Adversarial Network
6
[Diagram: random noise z ~ p(z) → Generative Model G → sample x → Discriminative Model D]
Generative Adversarial Network
7
[Diagram: random noise z ~ p(z) → Generative Model G → sample x; Discriminative Model D receives samples x from the data and samples x from G, and outputs 1 / 0]
Discriminative Model D: tries to distinguish between samples from the real data p(x) and generated ones q(x).
• Try to classify the sample x: D(x) = 1 when x comes from the data, D(x) = 0 when x comes from G
Generative Model G:
• Try to generate samples x as similar as possible to the real data
Both D and G are differentiable functions represented by multilayer perceptrons with parameters.
• To learn the generator's distribution p_g over data x,
• we define a prior on input noise variables p_z(z)
• and represent a mapping to data space as G(z; θ_g), where G is a differentiable function (an MLP).
• A second multilayer perceptron D(x; θ_d) outputs a single scalar.
• D(x) represents the probability that x came from the data rather than from p_g.
• We train D to maximize the probability of assigning the correct label to both training examples and samples from G.
• We simultaneously train G to minimize log(1 − D(G(z))).
• D and G play the following 2-player minimax game with value function V(G,D):
12
Generative Adversarial Network
13
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
Generative Adversarial Network
14
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• Fixed G, maximize V:
  max_D V_G(D) = max_D E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
• From samples x^(i), z^(i), i = 1, …, m:
  max_D Σ_{i=1}^m [log D(x^(i)) + log(1 − D(G(z^(i))))]
• Binary classification (logistic loss):
  • Sample from data: label = 1
  • Sample from generator: label = 0
Stochastic Gradient
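As a minimal sketch (not from the slides), the minibatch objective above is, up to sign and an averaging constant, the standard binary cross-entropy loss with label 1 for data samples and label 0 for generator samples; the discriminator outputs below are made-up numbers used only to show the equivalence.

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator outputs on a minibatch of m = 3 samples (values in (0,1)).
d_real = torch.tensor([0.9, 0.7, 0.8])   # D(x^(i)) for samples from the data
d_fake = torch.tensor([0.2, 0.4, 0.1])   # D(G(z^(i))) for samples from the generator

# Minibatch estimate of the objective D is maximizing (averaged over m).
v = (torch.log(d_real) + torch.log(1 - d_fake)).mean()

# The same quantity via the logistic/BCE loss with labels 1 (data) and 0 (generator).
bce = F.binary_cross_entropy(torch.cat([d_real, d_fake]),
                             torch.cat([torch.ones(3), torch.zeros(3)]))
print(v.item(), (-2 * bce).item())  # identical: maximizing V == minimizing the BCE loss
```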
Generative Adversarial Network
15
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• Fixed D, minimize V(G):
  min_G V_D(G) = min_G E_{z~p_z(z)}[log(1 − D(G(z)))]
• i.e., try to make D(G(z)) = 1
Stochastic Gradient
Generative Adversarial Network
16
Update D
Update G
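A compact sketch of the alternating "Update D / Update G" loop. The MLP architectures, optimizer, learning rate, number of discriminator steps k, and the data loader are placeholders, not values taken from the slides.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 100, 784                          # illustrative sizes
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_G = torch.optim.SGD(G.parameters(), lr=0.01)
opt_D = torch.optim.SGD(D.parameters(), lr=0.01)
bce = nn.BCELoss()

def train(data_loader, k=1):
    """data_loader yields minibatches of flattened real samples, shape (m, x_dim)."""
    for x_real in data_loader:
        m = x_real.size(0)
        ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)

        # Update D (k steps): maximize log D(x) + log(1 - D(G(z)))
        for _ in range(k):
            z = torch.randn(m, z_dim)
            d_loss = bce(D(x_real), ones) + bce(D(G(z).detach()), zeros)
            opt_D.zero_grad()
            d_loss.backward()
            opt_D.step()

        # Update G: minimize log(1 - D(G(z))); in practice the paper suggests
        # maximizing log D(G(z)) instead, which is what the BCE form below does.
        z = torch.randn(m, z_dim)
        g_loss = bce(D(G(z)), ones)
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()
```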
GAN Results
17
Generated samples (nearest training example shown alongside for comparison)
18
• Alec Radford, Luke Metz, and Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," ICLR 2016.
Slide credit: Sangdoo Yun @PIL (pil.snu.ac.kr)
DCGAN
19
• Deep Convolutional Network + GAN
• Tricks for stable training
• Experimental Analysis
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
DCGAN
20
• Replace the models' networks with CNNs
• Example of the generator G (the discriminator D is analogous)
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• z: uniform distribution (a minimal generator sketch follows below)
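A minimal DCGAN-style generator sketch; the exact layer counts, widths, and the 32×32 output size here are illustrative assumptions rather than the architecture from the paper. The characteristic ingredients are transposed (fractionally-strided) convolutions, batch norm and ReLU in hidden layers, a tanh output, and uniform input noise as stated on the slide.

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Illustrative DCGAN-style G: project z to a 4x4 map, then upsample with transposed convs."""
    def __init__(self, z_dim=100, ngf=64, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ngf * 4, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1),                            # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1),                                # 8x8 -> 16x16
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, out_channels, 4, 2, 1),                           # 16x16 -> 32x32
            nn.Tanh(),                                                                # outputs in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

z = torch.rand(8, 100) * 2 - 1            # uniform noise in [-1, 1], per the slide
imgs = DCGANGenerator()(z)                # -> (8, 3, 32, 32)
```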
Experiments
21
• LSUN bedroom dataset
• 3 million training examples
[Generated samples after epoch #1 vs. epoch #5]
Experiments
22
• Interpolation in the input noise z, from z_i to z_j (a small sketch follows below)
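A small sketch of the interpolation experiment: move linearly from z_i to z_j and decode every intermediate point with a trained generator (any generator with this interface works; nothing below is specific to the model on the slides).

```python
import torch

def interpolate_latents(G, z_i, z_j, steps=8):
    """Walk linearly from z_i to z_j and decode each point with the generator G."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    zs = (1 - alphas) * z_i + alphas * z_j        # (steps, z_dim)
    with torch.no_grad():
        return G(zs)                              # one generated image per interpolation step

# e.g. two endpoints drawn from the uniform prior:
# z_i, z_j = torch.rand(1, 100) * 2 - 1, torch.rand(1, 100) * 2 - 1
```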
Experiments
23
• Which activations (feature maps) in the CNN represent "window"?
• On the feature activations, label a neuron 1 if it falls inside a window region, 0 otherwise.
• Fit a logistic regression to find the window-representative feature maps (sketched below).
Window feature map removal
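A rough sketch of this probing experiment; the activations and window labels below are random placeholders with made-up shapes, and scikit-learn's logistic regression stands in for whatever implementation the authors used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: activations of a late conv layer of D, one row per spatial unit
# (e.g. 500 images x 16 units), one column per feature map, plus a 0/1 label saying
# whether that unit falls inside an annotated window bounding box.
acts = rng.standard_normal((500 * 16, 128))
in_window = rng.integers(0, 2, size=500 * 16)

# Logistic regression over feature maps; the most positive weights point at
# "window-representative" maps, which can then be zeroed out ("removal").
clf = LogisticRegression(max_iter=1000).fit(acts, in_window)
window_maps = np.argsort(clf.coef_.ravel())[::-1][:20]   # top-20 candidate feature maps
```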
Experiments
24
• Faces: human face images scraped from the web
• 3 million images from 10,000 people
• Vector arithmetic
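A sketch of the vector-arithmetic experiment: as in the DCGAN paper, each concept vector is the average of a few exemplar z vectors, and the arithmetic is done in z space before decoding. The exemplar lists and the generator G are assumed to exist.

```python
import torch

def concept(z_examples):
    """Average the z vectors of a few exemplars of one visual concept."""
    return torch.stack(z_examples).mean(dim=0, keepdim=True)

def vector_arithmetic(G, z_smiling_woman, z_neutral_woman, z_neutral_man):
    """e.g. smiling woman - neutral woman + neutral man, decoded by the generator."""
    z = concept(z_smiling_woman) - concept(z_neutral_woman) + concept(z_neutral_man)
    with torch.no_grad():
        return G(z)
```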
Experiments
25
26
Scott Reed*, Zeynep Akata**, Xinchen Yan*, Lajanugen Logeswaran*, Bernt Schiele**, Honglak Lee*
* University of Michigan, Ann Arbor, MI, USA (UMICH.EDU)
** Max Planck Institute for Informatics, Saarbrücken, Germany (MPI-INF.MPG.DE)
Generative Adversarial Text to Image Synthesis, ICML 2016.
Slide credit: Sangdoo Yun @PIL (pil.snu.ac.kr)
Generative Adversarial Text to Image Synthesis
27
Generative model
28
What I cannot create, I do not understand
—Richard Feynman
• Generating images
• Image data (e.g. ImageNet): samples from the true data distribution.
• Generative model (e.g. a deep neural network): outputs images, i.e., samples from the model.
Review of GAN
29
• Counterfeiters vs Police game
[Cartoon: the counterfeiter insists "IT'S REAL MONEY!", the police officer replies "IT'S FAKE MONEY!"]
Review of GAN
30
[Diagram: random noise z ~ p(z) → Generator Model G → sample x → Discriminator Model D]
Review of GAN
31
[Diagram: random noise z ~ p(z) → Generator Model G → sample x; Discriminator Model D receives samples x from the data and samples x from G, and outputs 1 / 0]
Discriminator Model D:
• Try to classify the sample x: D(x) = 1 when x comes from the data, D(x) = 0 when x comes from G (the generator)
Generator Model G:
• Try to generate samples x as similar as possible to the real data
Both are differentiable functions represented by multilayer perceptrons with parameters.
Generative Adversarial Network
32
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generator Model G → Discriminator Model D → output 1 (real) / 0 (fake)]
Review of DCGAN
33
• Replace the models' networks with CNNs
• Example of the generator G (the discriminator D is analogous)
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• z: uniform distribution
GAN – text to image synthesis
34
• ψ(t): text embedding function (maps to 1024 dims) → fully-connected layer → 128 dims
• A pre-trained text encoder is used (this could also be done in an end-to-end manner)
• z ~ N(0, 1): 100-dim noise vector (a sketch of assembling the generator input follows below)
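A sketch of assembling the generator input from the bullets above: the 1024-dim embedding ψ(t) is compressed to 128 dims by a fully-connected layer and concatenated with the 100-dim noise vector; the activation function and everything beyond the stated dimensions are assumptions.

```python
import torch
import torch.nn as nn

text_dim, compressed_dim, z_dim = 1024, 128, 100

# FC compression of the pre-trained text embedding psi(t); the leaky ReLU is an assumption.
compress = nn.Sequential(nn.Linear(text_dim, compressed_dim), nn.LeakyReLU(0.2))

def generator_input(psi_t):
    """psi_t: (batch, 1024) text embeddings from the pre-trained encoder."""
    h = compress(psi_t)                         # (batch, 128) compressed embedding
    z = torch.randn(psi_t.size(0), z_dim)       # z ~ N(0, 1), 100-dim
    return torch.cat([z, h], dim=1)             # (batch, 228), fed to the image generator

x_in = generator_input(torch.randn(16, text_dim))
```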
Text-Conditional GAN
Conditional GAN
35
Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
Text-conditional GAN (naïve)
36
[Architecture diagram; annotations: 128 dim, 4×4×16]
h = ψ(t)
min_G max_D  E_{x~p_data}[log D(x, h)] + E_{z~p_z}[log(1 − D(G(z, h), h))]
(first term: real image & matched text; second term: fake image & arbitrary text)
Matching-aware discriminator
37
[Architecture diagram; annotations: 128 dim, 4×4×16]
h = ψ(t) (matched text), ĥ (mismatched text)
min_G max_D  E_{x~p_data}[log D(x, h)]          ← real image & matched text
           + E_{x~p_data}[log(1 − D(x, ĥ))]     ← real image & mismatched text
           + E_{z~p_z}[log(1 − D(G(z, h), h))]  ← fake image & matched text
Matching-aware Discriminator
38
h = ψ(t) (matched text), ĥ (mismatched text)
min_G max_D  E_{x~p_data}[log D(x, h)]          ← real image & matched text
           + E_{x~p_data}[log(1 − D(x, ĥ))]     ← real image & mismatched text
           + E_{z~p_z}[log(1 − D(G(z, h), h))]  ← fake image & matched text
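A sketch of the matching-aware discriminator loss with the three pairings above. D(image, h) and G(z, h) are assumed interfaces, mismatched embeddings are produced here by simply rolling the batch, and the three terms are weighted equally as in the slide's formula.

```python
import torch
import torch.nn.functional as F

def matching_aware_d_loss(D, G, x_real, h_match, z_dim=100):
    """Three scores: real+matched (label 1), real+mismatched (0), fake+matched (0)."""
    m = x_real.size(0)
    h_mismatch = h_match.roll(1, dims=0)              # pair each image with another caption
    x_fake = G(torch.randn(m, z_dim), h_match).detach()

    s_r = D(x_real, h_match)                           # real image & matched text
    s_w = D(x_real, h_mismatch)                        # real image & mismatched text
    s_f = D(x_fake, h_match)                           # fake image & matched text

    ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)
    return (F.binary_cross_entropy(s_r, ones)
            + F.binary_cross_entropy(s_w, zeros)
            + F.binary_cross_entropy(s_f, zeros))
```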
Learning with Manifold Interpolation
39
min_G max_D  E_{x~p_data}[log D(x, h)]          ← real image & matched text
           + E_{x~p_data}[log(1 − D(x, ĥ))]     ← real image & mismatched text
           + E_{z~p_z}[log(1 − D(G(z, h), h))]  ← fake image & matched text
Additional term for the generator to minimize:
  E_{t1,t2~p_text-data}[log(1 − D(G(z, h), h))],  where h = β·h1 + (1 − β)·h2, h1 = ψ(t1), h2 = ψ(t2)
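A sketch of this extra generator term: interpolate two text embeddings and ask the discriminator to accept an image generated from the interpolated embedding. The literal log(1 − D(·)) form from the slide is used; β = 0.5 and the D/G interfaces are assumptions.

```python
import torch

def interpolation_g_loss(D, G, h1, h2, beta=0.5, z_dim=100):
    """Generator term on interpolated text embeddings h = beta*h1 + (1-beta)*h2."""
    h = beta * h1 + (1 - beta) * h2            # no real image exists for this embedding
    z = torch.randn(h.size(0), z_dim)
    s = D(G(z, h), h)
    return torch.log(1.0 - s).mean()           # term the generator descends (minimizes)
```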
Experiments
Experiments
Style Transfer
42
L_style = E_{t, z~N(0,1)} ||z − S(G(z, h))||_2^2,   where S: x → z
[Architecture diagram; annotations: 128 dim, 4×4×16]
Finding an inverse mapping S from an image back to the noise (style) vector z.
Style Transfer
L_style = E_{t, z~N(0,1)} ||z − S(G(z, h))||_2^2,   where S: x → z
Input image: x;  Style: z = S(x);  Generated image: G(z, h)
[Pipeline figure: image → style vector (via S), combined with a text description → style-transferred image]
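A sketch of the style-encoder objective L_style above: S is trained to recover the noise ("style") vector z from images produced by a fixed generator; the architecture of S and the batch handling are assumptions.

```python
import torch

def style_loss(S, G, h, z_dim=100, batch=16):
    """L_style = E || z - S(G(z, h)) ||_2^2 for a batch of text embeddings h."""
    z = torch.randn(batch, z_dim)
    x = G(z, h).detach()                        # generated image; G is fixed while training S
    return ((z - S(x)) ** 2).sum(dim=1).mean()

# Style transfer once S is trained: z = S(x_input), then generate G(z, psi(new_text)).
```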
Sentence Interpolation
44
45
Pixel Level Domain Transfer, ECCV 2016.
Domain transfer
46
Whole architecture
47
Converter: deconvnet
Discriminator: Real vs. Fake
Discriminator: associated or not
Dataset
48
Results
49
Results – varying input conditions
50
Results – inverse setting
51
Image to Image Translation
52
Image-to-Image Translation with Conditional Adversarial Networks
Image to Image Translation
53
• + L1 loss function → low-frequency correctness
• + PatchGAN → high-frequency correctness (a sketch of the combined objective follows below)
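A sketch of the combined generator objective implied above: a conditional GAN term scored by a PatchGAN discriminator (high-frequency structure) plus a weighted L1 term (low-frequency correctness). The weight lam, the probability-output PatchGAN, and the G/D interfaces are assumptions.

```python
import torch
import torch.nn.functional as F

def pix2pix_g_loss(D_patch, G, x_in, y_target, lam=100.0):
    """Generator loss: conditional GAN (PatchGAN) term + lam * L1 term."""
    y_fake = G(x_in)
    # D_patch outputs a grid of real/fake probabilities, one per local image patch.
    patch_scores = D_patch(x_in, y_fake)
    gan_term = F.binary_cross_entropy(patch_scores, torch.ones_like(patch_scores))
    l1_term = F.l1_loss(y_fake, y_target)        # pushes low frequencies toward the target
    return gan_term + lam * l1_term
```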
Image to Image Translation
54
Image to Image Translation
55
56
Plug & Play Generative Networks, arXiv 2016.
Plug & Play Generative Networks
57
Plug & Play Generative Networks
58
Noiseless joint PPGN-h
59
Update Rule for a feature vector h
Training G
Training D
Noiseless joint PPGN-h
60
The encoder network is pre-trained.
G and D are trained with the standard GAN learning technique.
G is not used directly to generate images; it serves as a guiding function combined with a DAE (denoising autoencoder).
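A rough sketch of how the pieces above could fit together at sampling time, assuming the PPGN-h style update in which the DAE reconstruction error stands in for the gradient of the prior over h and a pre-trained classifier supplies the class-conditional gradient; the step sizes are placeholders, and the noise term is dropped for the "noiseless" variant.

```python
import torch

def ppgn_h_step(h, classifier_logp, dae, eps1=1e-5, eps2=1.0):
    """One sampling update of the feature vector h (sketch of a noiseless PPGN-h step)."""
    h = h.detach().requires_grad_(True)
    classifier_logp(h).backward()                # scalar log p(y = target class | h)
    cond_grad = h.grad                           # gradient of the condition term
    with torch.no_grad():
        prior_step = dae(h) - h                  # DAE reconstruction error ~ grad of log p(h)
        h_new = h + eps1 * prior_step + eps2 * cond_grad   # noise term omitted (noiseless)
    return h_new

# The generator G is then used only to decode h into an image: x = G(h).
```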
PPGN results
61
PPGN results
62
63
Learning What and Where to Draw
NIPS 2016.
Motivation
64
Generative adversarial what-where nets (GAWWN)
Give a bbox
Give part locations
Give a part location
Bounding Box Control
65
Generator input: z, text, bbox location
Bounding Box Control
66
Discriminator input: real/fake image, text, bbox location
Bounding Box Control
67
Overall structure
Keypoint-Conditional Control
68
Generator input: z, text, keypoint location
(e.g., head in channel 1, left foot in channel 2, …)
Keypoint-Conditional Control
69
Discriminator input: real/fake image, text, keypoint location
Keypoint-Conditional Control
70
Overall structure
Keypoint Generation
71
It takes too much effort to enter all keypoints by hand (e.g., 15 parts for a bird).
Given a subset of keypoints, find the remaining keypoints' locations.
Among many possible approaches, they chose to use a GAN.
Keypoints: k_i = (x_i, y_i, v_i), i = 1, …, K;  k ∈ [0,1]^{K×3};  v_i = 1 if visible, else 0
User input: s ∈ {0,1}^K;  s_i = 1 if given, else 0
Generated keypoints: f: R^{Z+T+3K} → R^{3K}, an MLP (a 3-layer fully-connected network is used)
Keypoints discriminator: distinguish (k_real, t_real) from synthetic pairs
"Given" probability: 0.1 is used
(The paper doesn't say what architecture is used; maybe an MLP.)
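A sketch of the conditional keypoint generator f: R^{Z+T+3K} → R^{3K} described above, as a 3-layer fully-connected network; the hidden widths, the masking of non-given keypoints, and keeping user-given keypoints in the output are assumptions beyond what the slide states.

```python
import torch
import torch.nn as nn

Z, T, K = 100, 128, 15            # noise dim, text-embedding dim, number of keypoints (illustrative)

keypoint_G = nn.Sequential(        # f: R^{Z+T+3K} -> R^{3K}, a 3-layer fully-connected net
    nn.Linear(Z + T + 3 * K, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3 * K), nn.Sigmoid(),       # keypoints k lie in [0,1]^{K x 3}
)

def generate_keypoints(z, text_emb, keypoints, given_mask):
    """given_mask: float 0/1 tensor of shape (batch, K), 1 where the user supplied a keypoint.
    Non-given keypoints are zeroed in the input and filled in from the network's output."""
    k_in = (keypoints * given_mask.unsqueeze(-1)).flatten(1)          # (batch, 3K)
    out = keypoint_G(torch.cat([z, text_emb, k_in], dim=1)).view(-1, K, 3)
    return torch.where(given_mask.unsqueeze(-1).bool(), keypoints, out)
```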
Experiments
72
Bbox control (fix text and z / varying bbox)
Experiments
73
Keypoint control (fix text / use ground-truth keypoints / vary z)
Keypoint control (fix text and z / vary beak and tail keypoint positions and generate the other keypoints conditionally)
Experiments
74
Keypoint control (fix text and z / generate all keypoints conditioned on text)