Transcript of: Generative Adversarial Network (cse.iitkgp.ac.in/~sudeshna/courses/DL17/GAN-6-april-17.pdf)
Generative Adversarial Network
Many slides from NIPS 2014
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville,
Yoshua Bengio
• New method of training deep generative models
• Idea: pit a generator and a discriminator against each other
• Generator tries to draw samples from P(X)
• Discriminator tries to tell if sample came from the generator or the real world
• Both discriminator and generator are deep networks (differentiable functions)
• Can train with backprop: train discriminator for a while, then train generator, then discriminator, …
Generative adversarial networks
2
Generative?
3
• Data:
• Discriminative model: p_D(x; θ_D)
• Generative model: p_G(x; θ_G)
• True data distribution: p_data(x)
• Train so that p_G(x) ≈ p_data(x)
Generative Adversarial Network
4
• Counterfeiters vs Police game
[Cartoon: the counterfeiter insists "IT'S REAL MONEY!", the police officer replies "IT'S FAKE MONEY!"]
5
Generative Adversarial Network
6
[Diagram: random noise z ~ p(z) → Generative Model G → sample x → Discriminative Model D]
Generative Adversarial Network
7
[Diagram: random noise z ~ p(z) → Generative Model G → sample x; Discriminative Model D receives samples x from the data and samples x from G, and outputs 1 / 0]
Discriminative Model D: tries to distinguish between samples from the real data p(x) and generated ones q(x).
• Try to classify the sample x: D(x) = 1 when x comes from the data, D(x) = 0 when x comes from G
Generative Model G:
• Try to generate samples x as similar as possible to the real data
Both D and G are differentiable functions represented by multilayer perceptrons with parameters.
• To learn the generator's distribution p_g over data x,
• we define a prior on input noise variables p_z(z)
• and represent a mapping to data space as G(z; θ_g), where G is a differentiable function (an MLP).
• A second multilayer perceptron D(x; θ_d) outputs a single scalar.
• D(x) represents the probability that x came from the data rather than from p_g.
• We train D to maximize the probability of assigning the correct label to both training examples and samples from G.
• We simultaneously train G to minimize log(1 − D(G(z))).
• D and G play the following 2-player minimax game with value function V(G,D):
12
Generative Adversarial Network
13
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
Generative Adversarial Network
14
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• Fixed G, maximize V:
  max_D V_G(D) = max_D E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
• From samples x^(i), z^(i), i = 1, …, m:
  max_D Σ_{i=1}^m [log D(x^(i)) + log(1 − D(G(z^(i))))]
• Binary classification (logistic loss):
  • Sample from data: label = 1
  • Sample from generator: label = 0
Stochastic Gradient
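As a minimal sketch (not from the slides), the minibatch objective above is, up to sign and an averaging constant, the standard binary cross-entropy loss with label 1 for data samples and label 0 for generator samples; the discriminator outputs below are made-up numbers used only to show the equivalence.

```python
import torch
import torch.nn.functional as F

# Hypothetical discriminator outputs on a minibatch of m = 3 samples (values in (0,1)).
d_real = torch.tensor([0.9, 0.7, 0.8])   # D(x^(i)) for samples from the data
d_fake = torch.tensor([0.2, 0.4, 0.1])   # D(G(z^(i))) for samples from the generator

# Minibatch estimate of the objective D is maximizing (averaged over m).
v = (torch.log(d_real) + torch.log(1 - d_fake)).mean()

# The same quantity via the logistic/BCE loss with labels 1 (data) and 0 (generator).
bce = F.binary_cross_entropy(torch.cat([d_real, d_fake]),
                             torch.cat([torch.ones(3), torch.zeros(3)]))
print(v.item(), (-2 * bce).item())  # identical: maximizing V == minimizing the BCE loss
```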
Generative Adversarial Network
15
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• Fixed D, minimize V(G):
  min_G V_D(G) = min_G E_{z~p_z(z)}[log(1 − D(G(z)))]
• i.e., try to make D(G(z)) = 1
Stochastic Gradient
Generative Adversarial Network
16
Update D
Update G
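A compact sketch of the alternating "Update D / Update G" loop. The MLP architectures, optimizer, learning rate, number of discriminator steps k, and the data loader are placeholders, not values taken from the slides.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 100, 784                          # illustrative sizes
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_G = torch.optim.SGD(G.parameters(), lr=0.01)
opt_D = torch.optim.SGD(D.parameters(), lr=0.01)
bce = nn.BCELoss()

def train(data_loader, k=1):
    """data_loader yields minibatches of flattened real samples, shape (m, x_dim)."""
    for x_real in data_loader:
        m = x_real.size(0)
        ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)

        # Update D (k steps): maximize log D(x) + log(1 - D(G(z)))
        for _ in range(k):
            z = torch.randn(m, z_dim)
            d_loss = bce(D(x_real), ones) + bce(D(G(z).detach()), zeros)
            opt_D.zero_grad()
            d_loss.backward()
            opt_D.step()

        # Update G: minimize log(1 - D(G(z))); in practice the paper suggests
        # maximizing log D(G(z)) instead, which is what the BCE form below does.
        z = torch.randn(m, z_dim)
        g_loss = bce(D(G(z)), ones)
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()
```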
GAN Results
17
Generated samples (nearest training example shown alongside for comparison)
18
• Alec Radford, Luke Metz, and Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," ICLR 2016.
Slide credit: Sangdoo Yun @PIL (pil.snu.ac.kr)
DCGAN
19
• Deep Convolutional Network + GAN
• Tricks for stable training
• Experimental Analysis
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
DCGAN
20
• Replace the models' networks with CNNs
• Example of the generator G (the discriminator D is analogous)
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• z: uniform distribution (a minimal generator sketch follows below)
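A minimal DCGAN-style generator sketch; the exact layer counts, widths, and the 32×32 output size here are illustrative assumptions rather than the architecture from the paper. The characteristic ingredients are transposed (fractionally-strided) convolutions, batch norm and ReLU in hidden layers, a tanh output, and uniform input noise as stated on the slide.

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Illustrative DCGAN-style G: project z to a 4x4 map, then upsample with transposed convs."""
    def __init__(self, z_dim=100, ngf=64, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ngf * 4, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1),                            # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1),                                # 8x8 -> 16x16
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, out_channels, 4, 2, 1),                           # 16x16 -> 32x32
            nn.Tanh(),                                                                # outputs in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

z = torch.rand(8, 100) * 2 - 1            # uniform noise in [-1, 1], per the slide
imgs = DCGANGenerator()(z)                # -> (8, 3, 32, 32)
```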
Experiments
21
• LSUN bedroom dataset
• 3 million training examples
[Generated samples after epoch #1 vs. epoch #5]
Experiments
22
• Interpolation in the input noise z, from z_i to z_j (a small sketch follows below)
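A small sketch of the interpolation experiment: move linearly from z_i to z_j and decode every intermediate point with a trained generator (any generator with this interface works; nothing below is specific to the model on the slides).

```python
import torch

def interpolate_latents(G, z_i, z_j, steps=8):
    """Walk linearly from z_i to z_j and decode each point with the generator G."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    zs = (1 - alphas) * z_i + alphas * z_j        # (steps, z_dim)
    with torch.no_grad():
        return G(zs)                              # one generated image per interpolation step

# e.g. two endpoints drawn from the uniform prior:
# z_i, z_j = torch.rand(1, 100) * 2 - 1, torch.rand(1, 100) * 2 - 1
```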
Experiments
23
• Which activations (feature maps) in the CNN represent "window"?
• On the feature activations, label a neuron 1 if it falls inside a window region, 0 otherwise.
• Fit a logistic regression to find the window-representative feature maps (sketched below).
Window feature map removal
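A rough sketch of this probing experiment; the activations and window labels below are random placeholders with made-up shapes, and scikit-learn's logistic regression stands in for whatever implementation the authors used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: activations of a late conv layer of D, one row per spatial unit
# (e.g. 500 images x 16 units), one column per feature map, plus a 0/1 label saying
# whether that unit falls inside an annotated window bounding box.
acts = rng.standard_normal((500 * 16, 128))
in_window = rng.integers(0, 2, size=500 * 16)

# Logistic regression over feature maps; the most positive weights point at
# "window-representative" maps, which can then be zeroed out ("removal").
clf = LogisticRegression(max_iter=1000).fit(acts, in_window)
window_maps = np.argsort(clf.coef_.ravel())[::-1][:20]   # top-20 candidate feature maps
```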
Experiments
24
• Faces: human face images scraped from the web
• 3 million images from 10,000 people
• Vector arithmetic
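A sketch of the vector-arithmetic experiment: as in the DCGAN paper, each concept vector is the average of a few exemplar z vectors, and the arithmetic is done in z space before decoding. The exemplar lists and the generator G are assumed to exist.

```python
import torch

def concept(z_examples):
    """Average the z vectors of a few exemplars of one visual concept."""
    return torch.stack(z_examples).mean(dim=0, keepdim=True)

def vector_arithmetic(G, z_smiling_woman, z_neutral_woman, z_neutral_man):
    """e.g. smiling woman - neutral woman + neutral man, decoded by the generator."""
    z = concept(z_smiling_woman) - concept(z_neutral_woman) + concept(z_neutral_man)
    with torch.no_grad():
        return G(z)
```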
Experiments
25
26
Scott Reed*, Zeynep Akata**, Xinchen Yan*, Lajanugen Logeswaran*, Bernt Schiele**, Honglak Lee*
* University of Michigan, Ann Arbor, MI, USA (UMICH.EDU)
** Max Planck Institute for Informatics, Saarbrücken, Germany (MPI-INF.MPG.DE)
Generative Adversarial Text to Image Synthesis, ICML 2016.
Slide credit: Sangdoo Yun @PIL (pil.snu.ac.kr)
Generative Adversarial Text to Image Synthesis
27
Generative model
28
What I cannot create, I do not understand
—Richard Feynman
• Generating images
• Image data (e.g. ImageNet): samples from the true data distribution.
• Generative model (e.g. a deep neural network): outputs images, i.e., samples from the model.
Review of GAN
29
• Counterfeiters vs Police game
[Cartoon: the counterfeiter insists "IT'S REAL MONEY!", the police officer replies "IT'S FAKE MONEY!"]
Review of GAN
30
[Diagram: random noise z ~ p(z) → Generator Model G → sample x → Discriminator Model D]
Review of GAN
31
[Diagram: random noise z ~ p(z) → Generator Model G → sample x; Discriminator Model D receives samples x from the data and samples x from G, and outputs 1 / 0]
Discriminator Model D:
• Try to classify the sample x: D(x) = 1 when x comes from the data, D(x) = 0 when x comes from G (the generator)
Generator Model G:
• Try to generate samples x as similar as possible to the real data
Both are differentiable functions represented by multilayer perceptrons with parameters.
Generative Adversarial Network
32
min_G max_D V(D, G)
V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]
[Diagram: random noise z ~ p(z) → Generator Model G → Discriminator Model D → output 1 (real) / 0 (fake)]
Review of DCGAN
33
• Replace the models' networks with CNNs
• Example of the generator G (the discriminator D is analogous)
[Diagram: random noise z ~ p(z) → Generative Model G → Discriminative Model D → output 1 (real) / 0 (fake)]
• z: uniform distribution
GAN – text to image synthesis
34
• ψ(t): text embedding function (maps to 1024 dims) → fully-connected layer → 128 dims
• A pre-trained text encoder is used (this could also be done in an end-to-end manner)
• z ~ N(0, 1): 100-dim noise vector (a sketch of assembling the generator input follows below)
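A sketch of assembling the generator input from the bullets above: the 1024-dim embedding ψ(t) is compressed to 128 dims by a fully-connected layer and concatenated with the 100-dim noise vector; the activation function and everything beyond the stated dimensions are assumptions.

```python
import torch
import torch.nn as nn

text_dim, compressed_dim, z_dim = 1024, 128, 100

# FC compression of the pre-trained text embedding psi(t); the leaky ReLU is an assumption.
compress = nn.Sequential(nn.Linear(text_dim, compressed_dim), nn.LeakyReLU(0.2))

def generator_input(psi_t):
    """psi_t: (batch, 1024) text embeddings from the pre-trained encoder."""
    h = compress(psi_t)                         # (batch, 128) compressed embedding
    z = torch.randn(psi_t.size(0), z_dim)       # z ~ N(0, 1), 100-dim
    return torch.cat([z, h], dim=1)             # (batch, 228), fed to the image generator

x_in = generator_input(torch.randn(16, text_dim))
```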
Text-Conditional GAN
Conditional GAN
35
Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
Text-conditional GAN (naïve)
36
[Architecture diagram; annotations: 128 dim, 4×4×16]
h = ψ(t)
min_G max_D  E_{x~p_data}[log D(x, h)] + E_{z~p_z}[log(1 − D(G(z, h), h))]
(first term: real image & matched text; second term: fake image & arbitrary text)
Matching-aware discriminator
37
[Architecture diagram; annotations: 128 dim, 4×4×16]
h = ψ(t) (matched text), ĥ (mismatched text)
min_G max_D  E_{x~p_data}[log D(x, h)]          ← real image & matched text
           + E_{x~p_data}[log(1 − D(x, ĥ))]     ← real image & mismatched text
           + E_{z~p_z}[log(1 − D(G(z, h), h))]  ← fake image & matched text
Matching-aware Discriminator
38
h = ψ(t) (matched text), ĥ (mismatched text)
min_G max_D  E_{x~p_data}[log D(x, h)]          ← real image & matched text
           + E_{x~p_data}[log(1 − D(x, ĥ))]     ← real image & mismatched text
           + E_{z~p_z}[log(1 − D(G(z, h), h))]  ← fake image & matched text
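A sketch of the matching-aware discriminator loss with the three pairings above. D(image, h) and G(z, h) are assumed interfaces, mismatched embeddings are produced here by simply rolling the batch, and the three terms are weighted equally as in the slide's formula.

```python
import torch
import torch.nn.functional as F

def matching_aware_d_loss(D, G, x_real, h_match, z_dim=100):
    """Three scores: real+matched (label 1), real+mismatched (0), fake+matched (0)."""
    m = x_real.size(0)
    h_mismatch = h_match.roll(1, dims=0)              # pair each image with another caption
    x_fake = G(torch.randn(m, z_dim), h_match).detach()

    s_r = D(x_real, h_match)                           # real image & matched text
    s_w = D(x_real, h_mismatch)                        # real image & mismatched text
    s_f = D(x_fake, h_match)                           # fake image & matched text

    ones, zeros = torch.ones(m, 1), torch.zeros(m, 1)
    return (F.binary_cross_entropy(s_r, ones)
            + F.binary_cross_entropy(s_w, zeros)
            + F.binary_cross_entropy(s_f, zeros))
```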
Learning with Manifold Interpolation
39
min_G max_D  E_{x~p_data}[log D(x, h)]          ← real image & matched text
           + E_{x~p_data}[log(1 − D(x, ĥ))]     ← real image & mismatched text
           + E_{z~p_z}[log(1 − D(G(z, h), h))]  ← fake image & matched text
Additional term for the generator to minimize:
  E_{t1,t2~p_text-data}[log(1 − D(G(z, h), h))],  where h = β·h1 + (1 − β)·h2, h1 = ψ(t1), h2 = ψ(t2)
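A sketch of this extra generator term: interpolate two text embeddings and ask the discriminator to accept an image generated from the interpolated embedding. The literal log(1 − D(·)) form from the slide is used; β = 0.5 and the D/G interfaces are assumptions.

```python
import torch

def interpolation_g_loss(D, G, h1, h2, beta=0.5, z_dim=100):
    """Generator term on interpolated text embeddings h = beta*h1 + (1-beta)*h2."""
    h = beta * h1 + (1 - beta) * h2            # no real image exists for this embedding
    z = torch.randn(h.size(0), z_dim)
    s = D(G(z, h), h)
    return torch.log(1.0 - s).mean()           # term the generator descends (minimizes)
```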
Experiments
Experiments
Style Transfer
42
L_style = E_{t, z~N(0,1)} ||z − S(G(z, h))||_2^2,   where S: x → z
[Architecture diagram; annotations: 128 dim, 4×4×16]
Finding an inverse mapping S from an image back to the noise (style) vector z.
Style Transfer
L_style = E_{t, z~N(0,1)} ||z − S(G(z, h))||_2^2,   where S: x → z
Input image: x;  Style: z = S(x);  Generated image: G(z, h)
[Pipeline figure: image → style vector (via S), combined with a text description → style-transferred image]
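A sketch of the style-encoder objective L_style above: S is trained to recover the noise ("style") vector z from images produced by a fixed generator; the architecture of S and the batch handling are assumptions.

```python
import torch

def style_loss(S, G, h, z_dim=100, batch=16):
    """L_style = E || z - S(G(z, h)) ||_2^2 for a batch of text embeddings h."""
    z = torch.randn(batch, z_dim)
    x = G(z, h).detach()                        # generated image; G is fixed while training S
    return ((z - S(x)) ** 2).sum(dim=1).mean()

# Style transfer once S is trained: z = S(x_input), then generate G(z, psi(new_text)).
```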
Sentence Interpolation
44
45
Pixel Level Domain Transfer, ECCV 2016.
Domain transfer
46
Whole architecture
47
Converter: deconvnet
Discriminator: Real vs. Fake
Discriminator: associated or not
Dataset
48
Results
49
Results – varying input conditions
50
Results – inverse setting
51
Image to Image Translation
52
Image-to-Image Translation with Conditional Adversarial Networks
Image to Image Translation
53
• + L1 loss function → low-frequency correctness
• + PatchGAN → high-frequency correctness (a sketch of the combined objective follows below)
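A sketch of the combined generator objective implied above: a conditional GAN term scored by a PatchGAN discriminator (high-frequency structure) plus a weighted L1 term (low-frequency correctness). The weight lam, the probability-output PatchGAN, and the G/D interfaces are assumptions.

```python
import torch
import torch.nn.functional as F

def pix2pix_g_loss(D_patch, G, x_in, y_target, lam=100.0):
    """Generator loss: conditional GAN (PatchGAN) term + lam * L1 term."""
    y_fake = G(x_in)
    # D_patch outputs a grid of real/fake probabilities, one per local image patch.
    patch_scores = D_patch(x_in, y_fake)
    gan_term = F.binary_cross_entropy(patch_scores, torch.ones_like(patch_scores))
    l1_term = F.l1_loss(y_fake, y_target)        # pushes low frequencies toward the target
    return gan_term + lam * l1_term
```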
Image to Image Translation
54
Image to Image Translation
55
56
Plug & Play Generative Networks, arXiv 2016.
Plug & Play Generative Networks
57
Plug & Play Generative Networks
58
Noiseless joint PPGN-h
59
Update Rule for a feature vector h
Training G
Training D
Noiseless joint PPGN-h
60
The encoder network is pre-trained.
G and D are trained with the standard GAN learning technique.
G is not used directly to generate images; it serves as a guiding function combined with a DAE (denoising autoencoder).
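A rough sketch of how the pieces above could fit together at sampling time, assuming the PPGN-h style update in which the DAE reconstruction error stands in for the gradient of the prior over h and a pre-trained classifier supplies the class-conditional gradient; the step sizes are placeholders, and the noise term is dropped for the "noiseless" variant.

```python
import torch

def ppgn_h_step(h, classifier_logp, dae, eps1=1e-5, eps2=1.0):
    """One sampling update of the feature vector h (sketch of a noiseless PPGN-h step)."""
    h = h.detach().requires_grad_(True)
    classifier_logp(h).backward()                # scalar log p(y = target class | h)
    cond_grad = h.grad                           # gradient of the condition term
    with torch.no_grad():
        prior_step = dae(h) - h                  # DAE reconstruction error ~ grad of log p(h)
        h_new = h + eps1 * prior_step + eps2 * cond_grad   # noise term omitted (noiseless)
    return h_new

# The generator G is then used only to decode h into an image: x = G(h).
```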
PPGN results
61
PPGN results
62
63
Learning What and Where to Draw
NIPS 2016.
Motivation
64
Generative adversarial what-where nets (GAWWN)
Give a bbox
Give part locations
Give a part location
Bounding Box Control
65
Generator input: z, text, bbox location
Bounding Box Control
66
Discriminator input: real/fake image, text, bbox location
Bounding Box Control
67
Overall structure
Keypoint-Conditional Control
68
Generator input: z, text, keypoint location
(e.g., head in channel 1, left foot in channel 2, …)
Keypoint-Conditional Control
69
Discriminator input: real/fake image, text, keypoint location
Keypoint-Conditional Control
70
Overall structure
Keypoint Generation
71
It takes too much effort to enter all keypoints by hand (e.g., 15 parts for a bird).
Given a subset of keypoints, find the remaining keypoints' locations.
Among many possible approaches, they chose to use a GAN.
Keypoints: k_i = (x_i, y_i, v_i), i = 1, …, K;  k ∈ [0,1]^{K×3};  v_i = 1 if visible, else 0
User input: s ∈ {0,1}^K;  s_i = 1 if given, else 0
Generated keypoints: f: R^{Z+T+3K} → R^{3K}, an MLP (a 3-layer fully-connected network is used)
Keypoints discriminator: distinguish (k_real, t_real) from synthetic pairs
"Given" probability: 0.1 is used
(The paper doesn't say what architecture is used; maybe an MLP.)
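A sketch of the conditional keypoint generator f: R^{Z+T+3K} → R^{3K} described above, as a 3-layer fully-connected network; the hidden widths, the masking of non-given keypoints, and keeping user-given keypoints in the output are assumptions beyond what the slide states.

```python
import torch
import torch.nn as nn

Z, T, K = 100, 128, 15            # noise dim, text-embedding dim, number of keypoints (illustrative)

keypoint_G = nn.Sequential(        # f: R^{Z+T+3K} -> R^{3K}, a 3-layer fully-connected net
    nn.Linear(Z + T + 3 * K, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3 * K), nn.Sigmoid(),       # keypoints k lie in [0,1]^{K x 3}
)

def generate_keypoints(z, text_emb, keypoints, given_mask):
    """given_mask: float 0/1 tensor of shape (batch, K), 1 where the user supplied a keypoint.
    Non-given keypoints are zeroed in the input and filled in from the network's output."""
    k_in = (keypoints * given_mask.unsqueeze(-1)).flatten(1)          # (batch, 3K)
    out = keypoint_G(torch.cat([z, text_emb, k_in], dim=1)).view(-1, K, 3)
    return torch.where(given_mask.unsqueeze(-1).bool(), keypoints, out)
```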
Experiments
72
Bbox control (fix text and z / varying bbox)
Experiments
73
Keypoint control (fix text / use ground-truth keypoints / vary z)
Keypoint control (fix text and z / vary beak and tail keypoint positions and generate the other keypoints conditionally)
Experiments
74
Keypoint control (fix text and z / generate all keypoints conditioned on text)