Neural Variational Inference - UESTCdm.uestc.edu.cn/wp-content/uploads/seminar/Neural... · 1....

Neural Variational Inference Di Wu

Neural Variational Inference

Di Wu

Begin with VAE

The variational auto-encoder (VAE) performs approximate inference in probabilistic models whose posterior distribution over latent variables and parameters is intractable. It fits an approximate inference model (also called a recognition model; in practice, an encoder) to the true posterior using an estimator of the ELBO.

Two key points:

reparameterization trick

efficient optimization of ELBO by stochastic gradient

VAE Model

Marginal log likelihood is:

Rewrite the likelihood using a variational distribution 𝑞𝜙(𝑧|𝑥):

(*1*)
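For reference, the standard decomposition from Auto-Encoding Variational Bayes (reference 1) can be written as below. The notation (a prior written as p_θ(z), and the label (1) attached to the lower bound) is an assumption about what this slide displayed, not a transcription of it.

```latex
\begin{align*}
\log p_\theta(x) &= \log \int p_\theta(x \mid z)\, p_\theta(z)\, dz \\
\log p_\theta(x) &= D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)
                    + \mathcal{L}(\theta, \phi; x) \\
\mathcal{L}(\theta, \phi; x) &= \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
                    - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z)\big) \tag{1}
\end{align*}
```

Because the KL divergence to the true posterior is non-negative, ℒ is a lower bound on the marginal log likelihood.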

Reparameterization Trick: Estimator of the ELBO

We want to differentiate and optimize the lower bound ℒ in (1) w.r.t. 𝜙 and 𝜃. The main difficulty is the gradient w.r.t. 𝜙, because the expectation in ℒ is taken over samples from 𝑞𝜙 itself. With a well-chosen posterior 𝑞𝜙(𝑧|𝑥), we can reparameterize the variable 𝑧 ∼ 𝑞𝜙(𝑧|𝑥) using a differentiable transformation 𝑔𝜙(𝜖, 𝑥), where 𝜖 is an auxiliary noise variable:

For example:

The ELBO can be rewritten as:

Monte Carlo estimates:
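A minimal sketch of the reparameterized Monte Carlo estimator, assuming a scalar latent with a Gaussian posterior N(mu, sigma^2), a standard normal prior, and a toy decoder p(x|z) = N(x; z, 1). The function names and the toy decoder are illustrative assumptions, not taken from the slides.

```python
import math
import random


def reparameterize(mu, sigma, eps):
    """Differentiable transform g(eps) = mu + sigma * eps for a Gaussian posterior."""
    return mu + sigma * eps


def gaussian_kl(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a scalar latent."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0) - math.log(sigma)


def log_likelihood(x, z, noise_std=1.0):
    """Hypothetical toy decoder: log N(x; z, noise_std^2)."""
    return (-0.5 * math.log(2.0 * math.pi * noise_std ** 2)
            - 0.5 * ((x - z) / noise_std) ** 2)


def elbo_estimate(x, mu, sigma, num_samples=1000, seed=0):
    """Monte Carlo ELBO: average log p(x|z) over reparameterized samples, minus the analytic KL."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)           # noise drawn independently of (mu, sigma)
        z = reparameterize(mu, sigma, eps)  # z is distributed as N(mu, sigma^2)
        total += log_likelihood(x, z)
    return total / num_samples - gaussian_kl(mu, sigma)
```

Since the randomness enters only through eps, the sampled z is a deterministic function of mu and sigma, which is what lets an automatic differentiation framework pass gradients through the sample.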

Flexibility in the choice of the prior and the design of the variational posterior

How to Apply VAE Framework to NLP

Simple approach:

The latent variable Z can be seen as the sentence semantics (a global feature, such as the topic)

It seems okay, but:

An RNN decoder can express arbitrary distributions over output sentences, so it can reach optimal likelihood without using Z. The KL term then falls to zero, and Z is not actually learned.

How to alleviate:

Word dropout:

KL cost annealing:

The weight W on the KL term increases gradually from 0 to 1 during training
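The two remedies can be sketched as follows. This is a minimal illustration under stated assumptions: the `<unk>` placeholder token, the linear shape of the schedule, and all function names are hypothetical choices, and reference 2 may use a different schedule shape.

```python
import random


def word_dropout(tokens, drop_rate, unk="<unk>", seed=None):
    """Replace each decoder-input token with `unk` with probability `drop_rate`."""
    rng = random.Random(seed)
    return [unk if rng.random() < drop_rate else tok for tok in tokens]


def kl_weight(step, warmup_steps):
    """Linear schedule (an assumption): the weight rises from 0 to 1 over `warmup_steps`."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


def annealed_loss(reconstruction_nll, kl, step, warmup_steps):
    """Negative ELBO with the KL term scaled by the annealing weight."""
    return reconstruction_nll + kl_weight(step, warmup_steps) * kl
```

Word dropout weakens the decoder's access to the previous words, and the small early KL weight makes it cheap to encode information in Z, so both push the model toward using the latent variable.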

Neural Variational Inference for Text Processing (1)

The same recipe applies:

ICML 2016

Neural Variational Inference for Text Processing (2)

Scenario: given a question 𝑞, a set of candidate answers (𝑎1, 𝑎2, …, 𝑎𝑛) and judgments (𝑦1, 𝑦2, …, 𝑦𝑛), where 𝑦𝑚 = 1 if 𝑎𝑚 is a correct answer and 0 otherwise. Each training data point is therefore a triple (𝑞, 𝑎, 𝑦)
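As a small illustration of this data layout, the sketch below expands one question with its candidates and judgments into per-candidate triples. The function name and the example strings are hypothetical.

```python
def make_triples(question, candidates, judgments):
    """Expand one question into (question, answer, label) training triples."""
    if len(candidates) != len(judgments):
        raise ValueError("each candidate answer needs exactly one judgment")
    return [(question, a, y) for a, y in zip(candidates, judgments)]


# Hypothetical example data, for illustration only.
triples = make_triples(
    "who wrote hamlet",
    ["william shakespeare wrote hamlet", "hamlet is set in denmark"],
    [1, 0],
)
```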

Neural Variational Inference for Text Processing (2)

Model specification:

Prior:

Generative process:

ELBO:

Variational posterior:

Neural Variational Inference for generating dialogues

AAAI 2017

Neural Variational Inference for generating dialogues

Prior:

Generative process:

Variational posterior:

ELBO:

Neural Variational Topic Model

ICML 2017; ICLR 2018 (under review)

The document-topic distribution is multinomial (discrete), so the Gaussian variable is simply transformed by a softmax; the rest is the same.

𝑔(𝑥) is the transform function:
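A minimal sketch of this Gaussian-to-simplex transform, assuming 𝑔 is a plain softmax applied to a reparameterized Gaussian sample. Reference 5 may place a learned linear layer before the softmax, which is omitted here, and the function names are illustrative.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topic_proportions(mu, sigma, rng):
    """Draw x ~ N(mu, diag(sigma^2)) by reparameterization, then map it onto the simplex."""
    x = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return softmax(x)
```

The output is a vector of positive topic proportions that sums to one, while the sampling step stays the same Gaussian reparameterization used in the basic VAE.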

ELBO:

Neural Variational Topic Model (non-parametric version)

Stick Breaking Process

Kumaraswamy distribution (similar to the Beta distribution, but better suited to the reparameterization trick)

Inverse CDF:
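The Kumaraswamy distribution with parameters a, b > 0 has CDF F(x) = 1 - (1 - x^a)^b on (0, 1), so its inverse CDF is x = (1 - (1 - u)^(1/b))^(1/a). The sketch below uses it to draw stick-breaking fractions; the truncation to a fixed number of sticks and the function names are assumptions made for illustration.

```python
import random


def kumaraswamy_cdf(x, a, b):
    """CDF of Kumaraswamy(a, b) on (0, 1)."""
    return 1.0 - (1.0 - x ** a) ** b


def kumaraswamy_inverse_cdf(u, a, b):
    """Map uniform noise u in [0, 1) to a Kumaraswamy(a, b) sample."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def stick_breaking(fractions):
    """Turn break fractions v_k into weights pi_k = v_k * prod_{j<k} (1 - v_j)."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def sample_truncated_weights(a, b, num_sticks, rng):
    """Sample `num_sticks` fractions by inverse CDF and convert them to weights."""
    fractions = [kumaraswamy_inverse_cdf(rng.random(), a, b)
                 for _ in range(num_sticks)]
    return stick_breaking(fractions)
```

Because the inverse CDF is available in closed form, a sample is a differentiable function of (a, b) given the uniform noise u, which is the property that makes the reparameterization trick applicable here.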

Neural Variational Topic Model (non-parametric version)

Generative Story: Stick breaking process

Here, we want to approximate the posterior distribution of

Prior:

Variational posterior:

Likelihood:

ELBO:

References:

1. Auto-Encoding Variational Bayes, ICLR 2014

2. Generating Sentences from a Continuous Space, ICLR 2016

3. Neural Variational Inference for Text Processing, ICML 2016

4. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues, AAAI 2017

5. Discovering Discrete Latent Topics with Neural Variational Inference, ICML 2017

6. A Bayesian Nonparametric Topic Model with Variational Auto-Encoders, ICLR 2018 under review

7. A Conditional Variational Framework for Dialog Generation, ACL 2017