Generative Adversarial Networks (GAN)
"The coolest idea in ML in the last twenty years." - Yann LeCun
2017.02 Namju Kim (buriburisuri@gmail.com)
Introduction
Taxonomy of ML
(Figure: taxonomy of machine learning, from David Silver, Reinforcement Learning (UCL course on RL, 2015).)
Supervised Learning
• Find deterministic function f : y = f(x), x : data, y : label
(Figure: f maps an image of a cat to the label "Cat", and an image of an F-18 to its label.)
Supervised Learning
• How to implement the following function prototype?
(Slide shows a function prototype: label = f(image).)
Supervised Learning
• Challenges
  – Images are high-dimensional data
    • 224 x 224 x 3 = 150,528
  – Many variations
    • viewpoint, illumination, deformation, occlusion, background clutter, intra-class variation
Supervised Learning
• Solution
  – Feature vector
    • 150,528 → 2,048
  – Synonyms
    • latent vector, hidden vector, unobservable vector, feature, representation
Supervised Learning
• More flexible solution
  – Get the probability of each label for given data instead of the label itself
(Figure: f maps an image to label probabilities, e.g. Cat: 0.98, Cake: 0.02, Dog: 0.00.)
Supervised Learning
• Mathematical notation of optimization
  – y : label, x : data, z : latent, θ : learnable parameter

  θ* = argmax_θ P(y | x; θ)

  – P : probability of label y given data x, parameterized by θ
  – θ* : optimal parameter, obtained where P is maximum
Supervised Learning
• Mathematical notation of classifying (greedy policy)
  – y : label, x : data, z : latent, θ* : fixed optimal parameter

  ŷ = argmax_y P(y | x; θ*)

  – ŷ : label prediction, the y for which P is maximum given x, parameterized by the optimal θ*
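To make the notation concrete, here is a minimal sketch of the train-then-argmax pattern in PyTorch; the model, sizes, and data are hypothetical stand-ins (the slides' 150,528 → 2,048 pipeline would just use bigger layers):

```python
import torch
import torch.nn as nn

# Hypothetical classifier: 3,072-dim input -> 256-dim feature -> 3 labels.
model = nn.Sequential(nn.Linear(3072, 256), nn.ReLU(), nn.Linear(256, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()   # minimizing this maximizes log P(y|x; theta)

x = torch.randn(16, 3072)         # fake batch of flattened images
y = torch.randint(0, 3, (16,))    # fake labels

# Optimization: theta* = argmax_theta P(y|x; theta)
opt.zero_grad()
loss_fn(model(x), y).backward()
opt.step()

# Greedy classification: y_hat = argmax_y P(y|x; theta*)
y_hat = model(x).argmax(dim=1)
```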
Unsupervised Learning
• Find deterministic function f : z = f(x), x : data, z : latent
(Figure: f maps an image to a latent vector [0.1, 0.3, -0.8, 0.4, … ].)
Good features
• Less redundancy
• Similar features for similar data
• High fidelity
(Figure: the cake analogy from Yann LeCun, Predictive Learning (NIPS tutorial, 2016): RL is the cherry, SL the icing, UL the cake.)
Unsupervised Learning
(Figure: E2E (end-to-end) learning, from Ian Goodfellow, Deep Learning (MIT Press, 2016).)
Unsupervised Learning
• More challenging than supervised learning:
  – No label or curriculum → self-learning
• Some NN solutions:
  – Boltzmann machine
  – Auto-encoder or variational inference
  – Generative Adversarial Network
Generative Model
• Find generation function g : x = g(z), x : data, z : latent
(Figure: g maps a latent vector [0.1, 0.3, -0.8, 0.4, … ] to an image.)
Unsupervised learning vs. Generative model
• z = f(x) vs. x = g(z)
• P(z|x) vs. P(x|z)
• Encoder vs. Decoder (Generator)
  – P(x, z) is needed. (cf. P(y|x) in supervised learning)
• P(z|x) = P(x, z) / P(x) → P(x) is intractable (→ ELBO)
• P(x|z) = P(x, z) / P(z) → P(z) is given (prior)
Autoencoders
Stacked autoencoder - SAE
• Use data itself as label → convert UL into reconstruction SL
• z = f(x), x = g(z) → x = g(f(x))
• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_sae.py
(Figure: x → f(enc) → z → g(dec) → x'; supervised learning with L2 loss (= MSE).)
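The linked sugartensor example is the author's actual implementation; the following is a rough, hypothetical PyTorch analogue of the same idea:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical MNIST-sized stacked autoencoder: 784 -> 128 -> 784.
enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU())       # f : x -> z
dec = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())    # g : z -> x'
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

x = torch.rand(32, 784)            # fake batch of flattened MNIST images
x_recon = dec(enc(x))              # x' = g(f(x))
loss = F.mse_loss(x_recon, x)      # L2 loss: the data itself is the label
opt.zero_grad()
loss.backward()
opt.step()
```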
Denoising autoencoder - DAE
• Almost the same as SAE
• Add random noise to the input data → convert UL into denoising SL
• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_dae.py
(Figure: x + noise → f(enc) → z → g(dec) → x'; supervised learning with L2 loss (= MSE).)
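Relative to the SAE sketch above (reusing its enc, dec, and x), the DAE variant only corrupts the input while keeping the clean x as the target; the noise scale here is made up:

```python
# Corrupt the input, but reconstruct the clean target.
x_noisy = x + 0.3 * torch.randn_like(x)     # additive Gaussian noise
loss = F.mse_loss(dec(enc(x_noisy)), x)     # denoising SL with L2 loss
```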
Variational autoencoder - VAE
• Kingma et al., Auto-Encoding Variational Bayes, 2013.
• Generative model + stacked autoencoder
  – Based on variational approximation
  – Other approaches:
    • Point estimation (MAP)
    • Markov Chain Monte Carlo (MCMC)
Variational autoencoder - VAE
• Training
• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_vae.py
(Figure: x → f(enc) → z = μ + σ·ε → g(dec) → x'; supervised learning with L2 loss (= MSE) + KL regularizer.)
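A minimal, hypothetical PyTorch sketch of this objective (sizes and names are made up; the linked sugartensor example is the author's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical MNIST-sized VAE: the encoder outputs mean and log-variance.
enc = nn.Linear(784, 2 * 16)                            # f : x -> (mu, log_var)
dec = nn.Sequential(nn.Linear(16, 784), nn.Sigmoid())   # g : z -> x'

x = torch.rand(32, 784)
mu, log_var = enc(x).chunk(2, dim=1)

# Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

recon = F.mse_loss(dec(z), x, reduction="sum")          # L2 reconstruction loss
# Closed-form KLD between N(mu, sigma^2) and N(0, I):
kld = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
loss = recon + kld                                      # MSE + KL regularizer
```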
Variational autoencoder - VAE
• Generating
• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_vae_eval.py
(Figure: z ~ N(0, I) → g(dec) → generated image.)
Variational autoencoder - VAE
• Question: how to compute the intractable posterior p(z|x)?
• Solution:
  – p(z|x) is intractable, and MCMC is slow
  – Introduce an auxiliary variational function q(z|x)
  – Approximate q(z|x) to p(z|x) via the ELBO
  – Make q(z|x) Gaussian: z = μ + σ·ε → q(z|x) = N(μ, σ²)
Variational autoencoder - VAE
• Simply: push μ → 0, σ → 1
• KLD regularizer:
  – KL( q(z|x) ‖ N(0, I) )
  – Gaussian case:
    • KL( N(μ, σ²) ‖ N(0, 1) ) = ½ ( μ² + σ² − log σ² − 1 )
  – If we fix σ = 1 (practical approach):
    • the KL term reduces to MSE(μ)
Variational autoencoder - VAE
• KLD (Kullback-Leibler divergence)
  – A sort of distribution-difference measure
  – cf. KS (Kolmogorov-Smirnov) test, JSD (Jensen-Shannon divergence)
Variational autoencoder - VAE
• Find problems in the following code
(Slide shows a code listing.)
Variational autoencoder - VAE
• Reparameterization trick
  – Enables back-propagation through the sampling step
  – Reduces the variance of gradients
(Figure: instead of sampling z ~ N(μ, σ) directly inside the graph, compute z = μ + σ·ε with ε ~ N(0, I). This is called the 'reparameterization trick'.)
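In code the trick is a single line; a tiny hypothetical sketch:

```python
import torch

mu = torch.zeros(16, requires_grad=True)        # stand-ins for encoder outputs
log_var = torch.zeros(16, requires_grad=True)

# Sampling z ~ N(mu, sigma) directly would block back-propagation.
# Reparameterized, the randomness is isolated in eps, so gradients flow:
eps = torch.randn(16)
z = mu + torch.exp(0.5 * log_var) * eps
z.sum().backward()                              # gradients reach mu and log_var
```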
Variational autoencoder - VAE
• Results
Generative Adversarial Networks
Generative Adversarial Networks - GAN
• Ian Goodfellow et al., Generative Adversarial Networks, 2014.
  – Learnable cost function
  – Mini-max game based on Nash equilibrium
    • Few assumptions
    • High fidelity
  – Hard to train - no guarantee of reaching equilibrium
Generative Adversarial Networks - GAN
• Alternate training
• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_gan.py
(Figure: z ~ N(0, I) → G(generator) → fake image. Discriminator training: D(discriminator) pushes real images toward 1 (real) and fake images toward 0 (fake). Generator training: G pushes D(fake) toward 1 (real).)
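A minimal, hypothetical PyTorch sketch of the alternating updates (the linked sugartensor example is the author's actual code):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 784), nn.Sigmoid())   # z -> fake image
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())     # image -> P(real)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

real = torch.rand(32, 784)        # stand-in for a batch of real images
z = torch.randn(32, 100)          # z ~ N(0, I)

# Discriminator training: real -> 1, fake -> 0 (generator frozen via detach).
opt_d.zero_grad()
d_loss = bce(D(real), ones) + bce(D(G(z).detach()), zeros)
d_loss.backward()
opt_d.step()

# Generator training: push D(fake) -> 1.
opt_g.zero_grad()
g_loss = bce(D(G(z)), ones)
g_loss.backward()
opt_g.step()
```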
Generative Adversarial Networks - GAN
• Generating
• https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_gan_eval.py
(Figure: z ~ N(0, I) → G(generator) → generated image.)
Generative Adversarial Networks - GAN
• Mathematical notation

  min_G max_D V(D, G) = E_{x~p_data(x)}[ log D(x) ] + E_{z~p_z(z)}[ log(1 − D(G(z))) ]

  – x is sampled from real data; z is sampled from N(0, I)
  – E : expectation; D(x) : prob. of D(real); D(G(z)) : prob. of D(fake); G(z) : value of fake
  – D maximizes V; G minimizes V
Generative Adversarial Networks - GAN
• Mathematical notation - discriminator

  max_D E_{x~p_data}[ log D(x) ] + E_{z~p_z}[ log(1 − D(G(z))) ]

  – Maximize prob. of D(real), minimize prob. of D(fake)
  – BCE (binary cross-entropy) with label 1 for real, 0 for fake. (Practically, plain CE is OK, or even preferable.)
Generative Adversarial Networks - GAN
• Mathematical notation - generator

  max_G E_{z~p_z}[ log D(G(z)) ]

  – Maximize prob. of D(fake)
  – BCE (binary cross-entropy) with label 1 for fake. (Practically, plain CE is OK, or even preferable.)
Generative Adversarial Networks - GAN
• Mathematical notation - equilibrium

  D*(x) = p_data(x) / ( p_data(x) + p_g(x) )

  – At the optimum, V reduces to a constant plus the Jensen-Shannon divergence between p_data and p_g, and D(x) → 0.5 everywhere
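For reference, the standard result from Goodfellow et al. (2014) that this slide summarizes, written out:

```latex
% Optimal discriminator for a fixed generator G:
D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}

% Substituting D^* into V(D, G) yields the generator criterion:
C(G) = -\log 4 + 2 \cdot \mathrm{JSD}\left(p_{\mathrm{data}} \,\|\, p_g\right)

% C(G) is minimized iff p_g = p_data, at which point D^*(x) = 0.5 everywhere.
```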
Generative Adversarial Networks - GAN
• Results
Generative Adversarial Networks - GAN
• Why blurry (MSE), and why sharper (GAN)?
(Figures: from Ian Goodfellow, Deep Learning (MIT Press, 2016), and from Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2016.)
Training GANs
Pitfall of GAN
• No guarantee of reaching equilibrium
  – Mode collapsing
  – Oscillation
  – No indicator of when to finish
• All generative models:
  – Evaluation metrics (human Turing test)
  – Overfitting test (nearest neighbor) → GAN is robust!
  – Diversity test
  – Wu et al., On the Quantitative Analysis of Decoder-Based Generative Models, 2016
DCGAN
• Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015
  – Tricks for gradient flow
    • Max pooling → strided convolution or average pooling
    • Use LeakyReLU instead of ReLU
  – Other tricks (see the sketch below)
    • Use batch norm in both generator and discriminator
    • Use Adam optimizer (lr = 0.0002, β1 = 0.5)
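A hypothetical PyTorch sketch of these tricks in a small discriminator (strided convolutions instead of pooling, LeakyReLU, batch norm, and the Adam settings above):

```python
import torch
import torch.nn as nn

# Hypothetical DCGAN-style discriminator for 64x64 RGB images.
D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1),    # strided conv replaces max pooling
    nn.LeakyReLU(0.2),                           # LeakyReLU instead of ReLU
    nn.Conv2d(64, 128, 4, stride=2, padding=1),
    nn.BatchNorm2d(128),                         # batch norm in the discriminator
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 16),                       # 16x16 feature map -> single score
    nn.Sigmoid(),
)
# beta1 = 0.5 per DCGAN; beta2 is left at a common default.
opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

x = torch.randn(8, 3, 64, 64)
print(D(x).shape)                                # torch.Size([8, 1, 1, 1])
```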
DCGAN
• Results
• Latent vector arithmetic (e.g. smiling woman − neutral woman + neutral man ≈ smiling man)
Pitfall of GAN
(Figure: examples of mode collapsing and oscillating during training.)
Escaping mode collapse
• Minibatch discrimination (Salimans et al., 2016) ← slow
• Replay memory (Shrivastava et al., 2016) ← controversial
• Pull-away term regularizer (Zhao et al., 2016) ← not so good
• Unrolled GAN (Metz et al., 2016) ← not so good
Escaping mode collapse
• Discriminator ensemble (Durugkar et al., 2016) ← not tested
• Latent-image pair discrimination (Donahue et al., 2016; Dumoulin et al., 2016) ← good (additional computing time)
• InfoGAN (Chen et al., 2016) ← good
Other tricks
• Salimans et al., Improved Techniques for Training GANs, 2016
• https://github.com/soumith/ganhacks
  – One-sided label smoothing ← not sure
  – Historical averaging ← Unrolled GAN is better
  – Feature matching ← working
  – Add noise or dropout to the real/fake images ← working
  – Don't balance D and G losses ← I agree
Batch normalization in GAN
• Use BN in G, but be cautious in D
• Do not use BN at the last layer of G or the first layer of D
• IMHO, don't use BN in D at all
• Reference BN or virtual BN (not sure)
Variants of GANs
Taxonomy of Generative Models
(Figure: taxonomy of generative models, from Ian Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks, 2016.)
• AR-based models (PixelRNN, WaveNet): SOTA!
(Figure: SOTA with label vs. SOTA without label, from Odena et al., Conditional Image Synthesis With Auxiliary Classifier GANs, 2016.)
Taxonomy of GANs
AFL, ALI
• Donahue et al., Adversarial Feature Learning, 2016
• Dumoulin et al., Adversarially Learned Inference, 2016
AFL, ALI
• Results
InfoGAN
• Chen et al., InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, 2016
InfoGAN
• Results
• My implementation: https://github.com/buriburisuri/sugartensor/blob/master/sugartensor/example/mnist_info_gan.py
EBGAN
• Zhao et al., Energy-Based Generative Adversarial Networks, 2016
EBGAN
• Results
• My implementation: https://github.com/buriburisuri/ebgan
ACGAN
• Odena et al., Conditional Image Synthesis with Auxiliary Classifier GANs, 2016
ACGAN
• Results
• My implementation: https://github.com/buriburisuri/ac-gan
WGAN
• Arjovsky et al., Wasserstein GAN, 2017
(Figure: z ~ N(0, I) → G(generator) → fake image. Critic training: C(critic) pushes real images toward high values and fake images toward low values, with weight clamping and no BCE. Generator training: G pushes the critic's value on fakes up.)
WGAN
• Arjovsky et al., Wasserstein GAN, 2017
  – JSD → Earth Mover's Distance (= Wasserstein-1 distance)
  – Prevents vanishing gradients by using a weaker distance metric
  – Provides a parsimonious training indicator (the critic loss tracks sample quality); see the sketch below
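A minimal, hypothetical PyTorch sketch of the WGAN critic update with weight clamping (learning rate and clamp value follow the paper; the models are made up):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 784), nn.Sigmoid())   # generator
C = nn.Sequential(nn.Linear(784, 1))                   # critic: raw score, no sigmoid
opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)

real = torch.rand(32, 784)
z = torch.randn(32, 100)

# Critic training: maximize C(real) - C(fake); no BCE.
opt_c.zero_grad()
c_loss = -(C(real).mean() - C(G(z).detach()).mean())
c_loss.backward()
opt_c.step()
for p in C.parameters():            # weight clamping keeps the critic Lipschitz
    p.data.clamp_(-0.01, 0.01)

# Generator training: maximize the critic's value on fakes.
opt_g.zero_grad()
g_loss = -C(G(z)).mean()
g_loss.backward()
opt_g.step()
```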
WGAN
• Results
Application of GANs
Generating images
• Wang et al., Generative Image Modeling Using Style and Structure Adversarial Networks, 2016
• Wang et al.: Results
Generating images
• Jamonglab, 128x96 face image generation, 2016.11
  – Two-step generation: sketch → color
  – Binomial random seed instead of Gaussian
Generating images
• Zhang et al., StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks, 2016
• Zhang et al.: Results
Generating images
• Reed et al., Learning What and Where to Draw, 2016
• Reed et al.: Results
Generating images
• Nguyen et al., Plug & Play Generative Networks, 2016
  – Complex Langevin sampling via DAE
Generating images
• Arici et al., Associative Adversarial Networks, 2016
  – (equation for p(z|x) shown on slide)
• Arici et al.: Results
Translating images
• Perarnau et al., Invertible Conditional GANs for Image Editing, 2016
• Perarnau et al.: Results
Translating images
• Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, 2016
• Isola et al.: Results
Translating images
• Ledig et al., Photo-Realistic Single Image Super-Resolution Using a GAN, 2016
• Ledig et al.: Results
Translating images
• Sajjadi et al., EnhanceNet: Single Image Super-Resolution through Automated Texture Synthesis, 2016
  – SRGAN + Gram-matrix loss
Domain adaptation
• Taigman et al., Unsupervised Cross-Domain Image Generation, 2016
Domain adaptation
• Bousmalis et al., Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks, 2016
• Bousmalis et al.: Results
Domain adaptation
• Shrivastava et al., Learning from Simulated and Unsupervised Images through Adversarial Training, 2016
• Shrivastava et al.: Results
Unsupervised clustering
• Jamonglab: https://github.com/buriburisuri/timeseries_gan
  – Using InfoGAN
(Figure: real vs. fake generated time-series samples.)
Text generation
• Yu et al., SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient, 2016
Text generation
• Maddison et al., The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, 2016
• Kusner et al., GANs for Sequences of Discrete Elements with the Gumbel-Softmax Distribution, 2016
Text generation
• Zhang et al., Generating Text via Adversarial Training, 2016
Imitation learning (IRL or optimal control)
• Ho et al., Model-Free Imitation Learning with Policy Optimization, 2016
• Finn et al., A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models, 2016
Future of GANs
Pre-trained unified features
• Unified features by unsupervised learning
• Pre-trained and shipped on mobile chips by default
(Figure: x → E(pre-trained encoder) → z → C(classifier) → 1 (good) / 0 (bad).)
Personalization at scale
• Environment (user) simulator
• Intelligent agent using RL
• For example, a cardiac simulator using GAN
Conclusion
• Unsupervised learning is the next frontier in AI
  – Same algorithms, multiple domains
• Creativity and experience are no longer exclusively human traits
• Towards Artificial General Intelligence
Thank you.
http://www.slideshare.net/ssuser77ee21/generative-adversarial-networks-70896091