Modeling Documents with a Deep Boltzmann Machine
Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton
Review By : Nitish Gupta
2nd December, 2016
Authors : Nitish Srivastava, Ruslan Salakhutdinov, Geoffrey Hinton (CS, UIUC)Modeling Documents with a Deep Boltzman Machine 2nd December, 2016 1 / 22
Topic Modeling
A model for finding abstract 'topics' in a collection of documents
Used to uncover the hidden semantic structure of documents
Hypothesizes that each document is composed of multiple topics
Topic modeling builds a generative probabilistic model of the bag of words in a document
At inference time, it gives a distribution over topics for each document
Most commonly used topic modeling technique: Latent Dirichlet Allocation (LDA)
Topic Modeling
RBM vs. DBM
RBM Pros: Can be trained efficiently, and inference of the posterior distribution over hidden units is exact
RBM Cons: Defines a rigid implicit prior on the hidden states
DBM Pros: Defines a more flexible prior over hidden representations
DBM Cons: Training and inference are hard
Contributions of the paper
Extends the Replicated Softmax model, a topic model of the RBM family
Introduces a Deep Boltzmann Machine (DBM) to model documents
Argues that more hidden layers in a DBM give more flexibility to the topic priors, which:
  Helps model short documents better
  Gives better document representations for document retrieval and classification tasks
Contributions of the paper
Introduces a two-layer DBM: the Over-Replicated Softmax model
Gives an easy training procedure and a fast approximate inference method
Retains some flexibility in manipulating the prior
Shows the efficacy of the model as both a better generative model and a feature extractor for retrieval and classification tasks
Replicated Softmax Model (Background)
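$K$ = size of the word dictionary
$N$ = number of words in the document
$h \in \{0, 1\}^F$: binary stochastic hidden topic features
$V$: $N \times K$ observed binary matrix, with $v_{ik} = 1$ if word $i$ of the document is the $k$-th dictionary word
$$E(V, h; \theta) = -\sum_{i=1}^{N}\sum_{j=1}^{F}\sum_{k=1}^{K} W_{ijk} h_j v_{ik} - \sum_{i=1}^{N}\sum_{k=1}^{K} v_{ik} b_{ik} - N\sum_{j=1}^{F} h_j a_j \tag{1}$$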
Replicated Softmax Model (Background)
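With the weights shared across all word positions ($W_{ijk} = W_{jk}$, $b_{ik} = b_k$), the energy depends on the document only through its word counts:
$$E(V, h) = -\sum_{j=1}^{F}\sum_{k=1}^{K} W_{jk} h_j \hat{v}_k - \sum_{k=1}^{K} \hat{v}_k b_k - N\sum_{j=1}^{F} h_j a_j$$
$$\hat{v}_k = \sum_{i=1}^{N} v_{ik}$$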
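A minimal NumPy sketch of the two energy expressions, under the assumption that weights and biases are shared across word positions. The toy dictionary, word ids, and parameter values are hypothetical and chosen only so the numbers are easy to check by hand. It suggests that the position-wise form and the count-based form give the same energy.

```python
import numpy as np

def replicated_softmax_energy(v_hat, h, W, b, a):
    """Count-based energy E(V, h) with weights shared across word positions.

    v_hat: (K,) word counts, h: (F,) binary hidden units,
    W: (F, K) weights, b: (K,) visible biases, a: (F,) hidden biases.
    """
    N = v_hat.sum()  # number of words in the document
    return -(h @ W @ v_hat) - (v_hat @ b) - N * (h @ a)

# Hypothetical toy document over a dictionary of size K = 4.
K, F = 4, 2
word_ids = np.array([0, 2, 2, 3])      # N = 4 words
V = np.eye(K)[word_ids]                # N x K one-hot matrix
v_hat = V.sum(axis=0)                  # word counts

# Illustrative parameter values (not taken from the paper).
W = np.arange(F * K, dtype=float).reshape(F, K) / 10.0
b = np.zeros(K)
a = np.ones(F)
h = np.array([1.0, 0.0])

energy = replicated_softmax_energy(v_hat, h, W, b, a)

# Position-wise form of the same energy, summing over every word position i.
N = V.shape[0]
energy_full = -np.einsum("jk,j,ik->", W, h, V) - (V @ b).sum() - N * (h @ a)
```

Because only the counts enter the energy, documents of any length can be handled by the same parameter set.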
Conditional Distributions
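A hedged sketch of the conditionals implied by the shared-weight Replicated Softmax energy on the previous slide: each hidden unit is a sigmoid of its total input, and every word position shares one softmax over the dictionary. The function names and toy values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()

def p_hidden_given_visible(v_hat, W, a):
    """P(h_j = 1 | V) = sigmoid(sum_k W_jk v_hat_k + N a_j)."""
    N = v_hat.sum()
    return sigmoid(W @ v_hat + N * a)

def p_word_given_hidden(h, W, b):
    """P(v_ik = 1 | h) = softmax_k(sum_j h_j W_jk + b_k), identical for every position i."""
    return softmax(h @ W + b)

# Hypothetical toy values.
W = np.arange(8, dtype=float).reshape(2, 4) / 10.0
v_hat = np.array([1.0, 0.0, 2.0, 1.0])
a = np.zeros(2)
b = np.zeros(4)

p_h = p_hidden_given_visible(v_hat, W, a)
p_v = p_word_given_hidden(np.array([1.0, 0.0]), W, b)
```

The factor N on the hidden bias comes from the last term of the energy, which scales with document length.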
Over-Replicated Softmax Model
$V$: $N$ softmax units (observed words)
$h^{(1)}$: binary hidden layer with shared weights
$H^{(2)}$: $M$ softmax units, an $M \times K$ binary matrix
Over-Replicated Softmax Model
Over-Replicated Softmax Model
Shown in 'A Better Way to Pretrain Deep Boltzmann Machines':
The second layer of a DBM performs 1/2 of the modeling work as compared to the first
Therefore, if $N \ll M$, the prior over $h^{(1)}$ is dominated by the second layer
Conversely, if $M \ll N$, the effect of the second layer diminishes
Learning
Maximize the log-likelihood of the observed data
The derivative of the log-likelihood with respect to W is given by:
As exact maximum-likelihood learning is intractable, a variational approach is employed
Learning - Variational Approach
Variational Evidence Lower-Bound :
Using Mean-Field Approximation :
$\mu = \{\mu^{(1)}, \mu^{(2)}\}$: mean-field parameters
$q(h^{(1)}_j = 1) = \mu^{(1)}_j$
$q(h^{(2)}_{ik} = 1) = \mu^{(2)}_k$, with $\sum_{k=1}^{K} \mu^{(2)}_k = 1$
Learning - Mean Field Parameters
Variational Evidence Lower-Bound in this case :
Update Rules for Mean Field Parameters :
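A minimal sketch of coordinate-ascent mean-field updates. It assumes the Over-Replicated Softmax energy has the tied-weight form $E = -\sum_{jk} W_{jk} h^{(1)}_j (\hat{v}_k + \hat{h}^{(2)}_k) - \sum_k (\hat{v}_k + \hat{h}^{(2)}_k) b_k - (M + N)\sum_j h^{(1)}_j a_j$, which is not spelled out on these slides. Under that assumption, the expected second-layer counts are $M\mu^{(2)}_k$. The function name, iteration count, and toy values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v_hat, W, b, a, M, n_iters=20):
    """Alternate updates of mu1 (hidden layer) and mu2 (second softmax layer).

    Assumes the tied-weight energy described in the lead-in; n_iters must be >= 1.
    """
    N = v_hat.sum()
    K = v_hat.shape[0]
    mu2 = np.full(K, 1.0 / K)  # start from a uniform word distribution
    for _ in range(n_iters):
        # Hidden units see the observed counts plus the expected counts M * mu2.
        mu1 = sigmoid(W @ (v_hat + M * mu2) + (N + M) * a)
        # Each second-layer softmax unit shares the same distribution mu2.
        logits = mu1 @ W + b
        mu2 = np.exp(logits - logits.max())
        mu2 /= mu2.sum()
    return mu1, mu2

# Hypothetical toy problem.
rng = np.random.default_rng(0)
K, F, M = 5, 3, 4
W = 0.1 * rng.standard_normal((F, K))
b, a = np.zeros(K), np.zeros(F)
v_hat = np.array([2.0, 0.0, 1.0, 0.0, 1.0])

mu1, mu2 = mean_field(v_hat, W, b, a, M)
```

A fixed number of sweeps is used here for simplicity; iterating until the parameters stop changing would be an equally reasonable choice.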
Learning - Model Parameters
The variational bound is maximized using MCMC-based stochastic approximation
Let $\theta_t$ and $x_t = \{V_t, h^{(1)}_t, h^{(2)}_t\}$ be the current parameters and state
1. Sample a new state $x_{t+1}$ using Gibbs sampling
2. Make a gradient step using the point estimate at sample $x_{t+1}$ to find the new parameters $\theta_{t+1}$
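The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the tied-weight energy in which $h^{(1)}$ sees the summed counts of both softmax layers, keeps a single persistent chain, and updates only $W$ (biases are left fixed for brevity). The names `gibbs_step` and `sap_update`, the learning rate, and the stand-in mean-field values are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def gibbs_step(v_hat, h2_hat, W, b, a, rng):
    """One Gibbs sweep: sample h1 given both softmax layers, then resample both layers."""
    N, M = int(v_hat.sum()), int(h2_hat.sum())
    p_h1 = sigmoid(W @ (v_hat + h2_hat) + (N + M) * a)
    h1 = (rng.random(p_h1.shape) < p_h1).astype(float)
    p_word = softmax(h1 @ W + b)
    v_new = rng.multinomial(N, p_word).astype(float)   # N observed-layer words
    h2_new = rng.multinomial(M, p_word).astype(float)  # M second-layer units
    return v_new, h1, h2_new

def sap_update(W, pos_stats, chain, b, a, lr, rng):
    """Step 1: advance the chain. Step 2: gradient step using the new sample."""
    v_s, h1_s, h2_s = gibbs_step(chain[0], chain[2], W, b, a, rng)
    neg_stats = np.outer(h1_s, v_s + h2_s)  # point estimate of the model statistics
    W_new = W + lr * (pos_stats - neg_stats)
    return W_new, (v_s, h1_s, h2_s)

# Hypothetical toy setup.
rng = np.random.default_rng(0)
K, F, M = 5, 3, 4
W = 0.01 * rng.standard_normal((F, K))
b, a = np.zeros(K), np.zeros(F)
v_data = np.array([2.0, 0.0, 1.0, 0.0, 1.0])
N = int(v_data.sum())

# Stand-in mean-field parameters; in practice these come from variational inference.
mu1 = np.full(F, 0.5)
mu2 = v_data / N
pos_stats = np.outer(mu1, v_data + M * mu2)  # data-dependent statistics

chain = (v_data.copy(), np.zeros(F), rng.multinomial(M, mu2).astype(float))
W_new, chain = sap_update(W, pos_stats, chain, b, a, lr=0.01, rng=rng)
```

The chain state is carried over between updates rather than restarted at the data, which is what makes the sample a running estimate of the model statistics.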
Pretraining
Exploiting the fact that the weights are shared between the two layers:
1. Train an RBM with the bottom-up weights scaled by a factor of $1 + M/N$:
$$P(h^{(1)}_j = 1 \mid V) = \sigma\left(\left(1 + \frac{M}{N}\right)\sum_{k=1}^{K} \hat{v}_k W_{jk}\right)$$
2. This is similar to training an RBM with $N + M$ observed units, where the $M$ extra units are set to the empirical word distribution of the document
3. Experiments show that this further approximation works well in practice
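A small sketch of why the scaling factor and the "M extra units" view agree, assuming the extra units are clamped to the empirical word distribution $\hat{v}/N$ and bias terms are omitted as in the formula above. Both function names and the toy values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scaled_weight_posterior(v_hat, W, M):
    """Hidden posterior with bottom-up weights scaled by (1 + M/N); biases omitted."""
    N = v_hat.sum()
    return sigmoid((1.0 + M / N) * (W @ v_hat))

def clamped_units_posterior(v_hat, W, M):
    """Hidden posterior with M extra units fixed to the empirical word distribution."""
    N = v_hat.sum()
    extra_counts = M * (v_hat / N)  # expected counts contributed by the extra units
    return sigmoid(W @ (v_hat + extra_counts))

# Hypothetical toy values.
rng = np.random.default_rng(1)
K, F, M = 6, 3, 10
W = 0.1 * rng.standard_normal((F, K))
v_hat = np.array([3.0, 0.0, 1.0, 0.0, 0.0, 2.0])

p_scaled = scaled_weight_posterior(v_hat, W, M)
p_clamped = clamped_units_posterior(v_hat, W, M)
```

The two agree because $\hat{v} + M\hat{v}/N = (1 + M/N)\hat{v}$, so a single matrix-vector product replaces the iterative mean-field loop.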
Inference
Find $P(h^{(1)} \mid V)$: the latent topic structure of the observed document
Correct way: use the mean-field approximation, as done during training
Fast alternative: multiply the visible-to-hidden weights by $(1 + M/N)$ and approximate using the equation on the previous slide
Experiments - Perplexities
Experiments - Document Retrieval
Experiments - Effect of Document Size
Thank you!
Questions?