Deep Boltzmann Machines
Ruslan Salakhutdinov and Geoffrey E. Hinton
Amish Goel
University of Illinois Urbana-Champaign
December 2, 2016
Overview
1 Introduction: Representation of the model
2 Learning in Boltzmann Machines: Variational Lower Bound (Mean-Field Approximation); Stochastic Approximation Procedure (Persistent Markov Chains)
3 Additional Tricks for DBMs: Greedy Pretraining of the Model; Discriminative Finetuning
4 Simulation results
Introduction
A Boltzmann Machine is a pairwise Markov random field. Consider some of the random variables as latent, i.e. hidden (h), and the others as visible (v).
The probability distribution over binary random variables is given by

$$P_\theta(v, h) = \frac{1}{Z_\theta}\, e^{-E_\theta(v,h)}, \qquad \theta = \{L, J, W\},$$

$$E_\theta(v,h) = -\frac{1}{2} v^T L v - \frac{1}{2} h^T J h - v^T W h.$$
Figure: Model for Boltzmann Machines
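As a concrete illustration of these definitions, the energy and the partition function of a tiny model can be evaluated by brute-force enumeration. This is a minimal pure-Python sketch with toy matrices, following the slide's notation rather than any actual implementation:

```python
import itertools
import math

# v, h are tuples/lists of 0/1; L, J, W are nested-list weight matrices
# (symmetric, zero diagonal for L and J), as in the slide's notation.

def energy(v, h, L, J, W):
    """E(v,h) = -1/2 v'Lv - 1/2 h'Jh - v'Wh (biases omitted, as on the slide)."""
    quad = lambda x, A, y: sum(x[i] * A[i][j] * y[j]
                               for i in range(len(x)) for j in range(len(y)))
    return -0.5 * quad(v, L, v) - 0.5 * quad(h, J, h) - quad(v, W, h)

def partition_function(n_v, n_h, L, J, W):
    """Z = sum over all binary (v, h) of exp(-E); feasible only for tiny models."""
    Z = 0.0
    for v in itertools.product([0, 1], repeat=n_v):
        for h in itertools.product([0, 1], repeat=n_h):
            Z += math.exp(-energy(v, h, L, J, W))
    return Z
```

With all weights zero, every configuration has energy 0, so Z is simply the number of binary configurations, 2^(n_v + n_h).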
Representation
While the Boltzmann Machine is a powerful model of the data, it is computationally expensive to learn, so one considers several approximations to it.
Figure: Boltzmann Machines vs RBM
A Deep Boltzmann Machine arranges the hidden nodes in several layers, where a layer is a set of units with no direct connections among them.
Figure: Model for Deep Boltzmann Machines
Learning in Boltzmann Machines
The model can be trained using maximum likelihood. The gradient of the log-likelihood takes the following form:

$$\ln L_\theta(v) = \ln p_\theta(v) = \ln \sum_h p_\theta(v,h) = \ln \sum_h \exp(-E_\theta(v,h)) - \ln \sum_{v,h} \exp(-E_\theta(v,h));$$

$$\frac{\partial \ln L_\theta(v)}{\partial \theta} = -\underbrace{\sum_h p(h|v)\,\frac{\partial E_\theta(v,h)}{\partial \theta}}_{\text{data-dependent expectation}} + \underbrace{\sum_{v,h} p(v,h)\,\frac{\partial E_\theta(v,h)}{\partial \theta}}_{\text{model-dependent expectation}} \qquad (1)$$
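For a tiny model, both expectations in Eq. (1) can be computed exactly by enumeration, which makes the positive/negative-phase structure of the gradient concrete. The sketch below is restricted to v–h couplings only (L = J = 0), an assumption made purely to keep it short:

```python
import itertools
import math

def energy(v, h, W):
    # Restricted case with only v-h couplings (L = J = 0).
    return -sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))

def grad_W(v_data, n_h, W):
    """Exact d ln p(v_data) / d W_ij = E_{p(h|v)}[v h^T] - E_{p(v,h)}[v h^T]."""
    n_v = len(v_data)
    states_h = list(itertools.product([0, 1], repeat=n_h))
    # Positive (data-dependent) phase: expectation under p(h | v_data).
    w_h = [math.exp(-energy(v_data, h, W)) for h in states_h]
    Zh = sum(w_h)
    pos = [[sum(w * v_data[i] * h[j] for w, h in zip(w_h, states_h)) / Zh
            for j in range(n_h)] for i in range(n_v)]
    # Negative (model-dependent) phase: expectation under the joint p(v, h).
    states_v = list(itertools.product([0, 1], repeat=n_v))
    w_vh = {(v, h): math.exp(-energy(v, h, W)) for v in states_v for h in states_h}
    Z = sum(w_vh.values())
    neg = [[sum(w_vh[v, h] * v[i] * h[j] for v in states_v for h in states_h) / Z
            for j in range(n_h)] for i in range(n_v)]
    return [[pos[i][j] - neg[i][j] for j in range(n_h)] for i in range(n_v)]
```

With zero weights, p(h|v) is uniform and the joint is uniform, so the two phases give 0.5·v_i and 0.25 respectively, which the gradient reflects.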
Learning in Boltzmann Machines
Using gradient ascent, and substituting $E_\theta(v,h)$ into the gradient obtained in the previous equation, one obtains the updates for the respective parameters:

$$\Delta W = \alpha\left(E_{P_{\text{data}}}[vh^T] - E_{P_{\text{model}}}[vh^T]\right),$$
$$\Delta L = \alpha\left(E_{P_{\text{data}}}[vv^T] - E_{P_{\text{model}}}[vv^T]\right),$$
$$\Delta J = \alpha\left(E_{P_{\text{data}}}[hh^T] - E_{P_{\text{model}}}[hh^T]\right),$$
$$\Delta b = \alpha\left(E_{P_{\text{data}}}[v] - E_{P_{\text{model}}}[v]\right),$$
$$\Delta c = \alpha\left(E_{P_{\text{data}}}[h] - E_{P_{\text{model}}}[h]\right). \qquad (2)$$

These maximum-likelihood parameter updates are very costly, as computing either expectation requires summing over an exponential number of terms. One needs approximations.
Approximate Maximum Likelihood Learning in Boltzmann Machines

One approximation is to use a variational lower bound on the log-likelihood:

$$\ln p_\theta(v) = \ln \sum_h p_\theta(v,h) = \ln \sum_h q_\mu(h|v)\,\frac{p_\theta(v,h)}{q_\mu(h|v)} \ge \sum_h q_\mu(h|v) \ln p_\theta(v,h) + H(q_\mu) = \mathcal{L}(q_\mu, \theta), \qquad (3)$$

where $q_\mu(h|v)$ is an approximate (variational) posterior distribution and $H(\cdot)$ is the entropy function with natural logarithm.

One then tries to find the tightest lower bound on the log-likelihood by optimizing over the distributions $q_\mu$ and the parameters $\theta$.
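The bound in Eq. (3) can be checked numerically on a toy model. The helper below evaluates $\mathcal{L}(q_\mu, \theta)$ for a factorized q (the W matrix and sizes are arbitrary toy assumptions). When L = J = 0 the true posterior itself factorizes, so the optimal factorized µ attains the bound with equality:

```python
import itertools
import math

def ln_joint(v, h, W, Z):
    # ln p(v,h) for the restricted case with only v-h couplings (L = J = 0).
    neg_E = sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return neg_E - math.log(Z)

def elbo(v, mu, W, Z):
    """L(q, theta) = sum_h q(h) ln p(v,h) + H(q) for factorized q(h_j=1) = mu_j."""
    n_h = len(mu)
    bound = 0.0
    for h in itertools.product([0, 1], repeat=n_h):
        q = 1.0
        for j in range(n_h):
            q *= mu[j] if h[j] == 1 else 1.0 - mu[j]
        if q > 0:
            bound += q * ln_joint(v, h, W, Z) - q * math.log(q)  # second term: entropy
    return bound
```

For any µ the value never exceeds ln p(v), and the gap is exactly KL(q || p(h|v)).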
Variational Learning for Boltzmann Machines
For Boltzmann Machines, the lower bound can be rewritten (ignoring the bias terms) as

$$\mathcal{L}(q_\mu,\theta) = \sum_h q_\mu(h|v)(-E_\theta(v,h)) - \ln Z_\theta + H(q_\mu). \qquad (4)$$

Using the mean-field approximation, $q_\mu(h|v) = \prod_{j=1}^M q(h_j|v)$, and one assumes $q(h_j = 1) = \mu_j$ (M is the number of hidden units). Then

$$\mathcal{L} = \sum_h \prod_{i=1}^M q(h_i|v)\left(\frac{1}{2} v^T L v + \frac{1}{2} h^T J h + v^T W h\right) - \ln Z_\theta + H(q_\mu)$$
$$= \frac{1}{2} v^T L v + \frac{1}{2}\mu^T J \mu + v^T W \mu - \ln Z_\theta + \sum_{j=1}^M H(\mu_j). \qquad (5)$$
Variational EM Learning for Boltzmann Machines
Maximize the lower bound by alternating maximization over the variational parameters µ and the model parameters θ: the typical EM idea.

E-step: $\sup_\mu \mathcal{L}(q_\mu,\theta) = \sup_\mu \frac{1}{2} v^T L v + \frac{1}{2}\mu^T J \mu + v^T W \mu - \ln Z_\theta + \sum_{j=1}^M H(\mu_j)$.

Using alternating maximization over each coordinate, one gets the update

$$\mu_j \leftarrow \sigma\left(\sum_i W_{ij} v_i + \sum_{m \neq j} J_{mj}\,\mu_m\right),$$

where $\sigma(\cdot)$ denotes the sigmoid function.

After running these updates, µ converges to a fixed point µ*.
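This fixed-point iteration can be sketched in a few lines (toy pure-Python version; the initialization and iteration count are assumptions, since the slide only gives the update rule):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_field(v, W, J, n_iter=200):
    """Iterate mu_j <- sigma(sum_i W_ij v_i + sum_{m != j} J_mj mu_m)."""
    n_h = len(W[0])
    mu = [0.5] * n_h  # arbitrary initialization; the slide says "randomly"
    for _ in range(n_iter):
        for j in range(n_h):
            act = sum(v[i] * W[i][j] for i in range(len(v)))
            act += sum(J[m][j] * mu[m] for m in range(n_h) if m != j)
            mu[j] = sigmoid(act)
    return mu
```

When J = 0 the coupling between the µ_j disappears and the iteration converges in a single sweep to µ_j = σ(Σ_i W_ij v_i).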
Stochastic Approximations or Persistent Markov Chains
M-step: $\sup_\theta \mathcal{L}(q_\mu,\theta) = \sup_\theta \frac{1}{2} v^T L v + \frac{1}{2}\mu^T J \mu + v^T W \mu - \ln Z_\theta + \sum_{j=1}^M H(\mu_j)$.

MCMC sampling with persistent Markov chains is used to approximate the gradient of the log-partition function $\ln Z_\theta$.

The parameter updates for one training example, using samples $(\tilde{v}^{(m)}, \tilde{h}^{(m)})$ from the M persistent chains, can be written as

$$\Delta W = \alpha_t\left(v\mu^T - \frac{1}{M}\sum_{m=1}^M \tilde{v}^{(m)} (\tilde{h}^{(m)})^T\right),$$
$$\Delta L = \alpha_t\left(vv^T - \frac{1}{M}\sum_{m=1}^M \tilde{v}^{(m)} (\tilde{v}^{(m)})^T\right),$$
$$\Delta J = \alpha_t\left(\mu\mu^T - \frac{1}{M}\sum_{m=1}^M \tilde{h}^{(m)} (\tilde{h}^{(m)})^T\right). \qquad (6)$$
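One sweep of such a persistent Gibbs chain might look as follows for the restricted case L = J = 0 (an assumption to keep the sketch short; in the full model each unit conditions on all of its neighbours in the same way):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_sweep(v, h, W, rng):
    """Resample all hidden units given v, then all visible units given h."""
    n_v, n_h = len(v), len(h)
    for j in range(n_h):
        p = sigmoid(sum(v[i] * W[i][j] for i in range(n_v)))
        h[j] = 1 if rng.random() < p else 0
    for i in range(n_v):
        p = sigmoid(sum(W[i][j] * h[j] for j in range(n_h)))
        v[i] = 1 if rng.random() < p else 0
    return v, h
```

Because the chain state is kept between parameter updates ("persistent"), each update only needs one or a few sweeps rather than a fresh burn-in.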
Overall Algorithm for Training Boltzmann Machines
Data: a training set S_N of N binary data vectors v, and M, the number of persistent Markov chains.

Initialize the parameter vector θ⁰ and M samples {ṽ^(0,1), h̃^(0,1)}, ..., {ṽ^(0,M), h̃^(0,M)}.
for t = 0 to T (number of iterations) do
    for each training example v^n in S_N do
        Randomly initialize µ^n and run the mean-field updates until convergence:
            µ_j ← σ(Σ_i W_ij v_i + Σ_{m≠j} J_mj µ_m)
    end
    for m = 1 to M (number of persistent Markov chains) do
        Sample (ṽ^(t+1,m), h̃^(t+1,m)) given (ṽ^(t,m), h̃^(t,m)) by running the Gibbs sampler.
    end
    Update θ using equation (6) (adjusting for batch data) and decrease the learning rate α_t.
end
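Putting the E-step, the persistent chains, and the Eq. (6) update together, one iteration of this loop can be sketched for the restricted case L = J = 0, where the mean-field solution is closed-form (toy sizes, learning rate, and data layout are all assumptions; this is not the authors' implementation):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(batch, W, chains, alpha, rng):
    """One iteration: positive phase via mean field, negative via persistent chains."""
    n_v, n_h = len(W), len(W[0])
    grad = [[0.0] * n_h for _ in range(n_v)]
    # Positive phase: with L = J = 0 the mean-field fixed point is closed form.
    for v in batch:
        mu = [sigmoid(sum(v[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
        for i in range(n_v):
            for j in range(n_h):
                grad[i][j] += v[i] * mu[j] / len(batch)
    # Negative phase: advance each persistent chain by one Gibbs sweep.
    for v, h in chains:
        for j in range(n_h):
            p = sigmoid(sum(v[i] * W[i][j] for i in range(n_v)))
            h[j] = 1 if rng.random() < p else 0
        for i in range(n_v):
            p = sigmoid(sum(W[i][j] * h[j] for j in range(n_h)))
            v[i] = 1 if rng.random() < p else 0
        for i in range(n_v):
            for j in range(n_h):
                grad[i][j] -= v[i] * h[j] / len(chains)
    # Gradient-ascent update as in Eq. (6).
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += alpha * grad[i][j]
    return W
```

The chain states are mutated in place, so they carry over to the next call, which is exactly the "persistent" part of the procedure.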
Learning for Deep Boltzmann Machines
For Deep Boltzmann Machines, L = 0 and J has many zero blocks, since hidden-unit interactions are layered; this simplifies some of the computations.

The Gibbs sampling procedure also simplifies, as all units in one layer can be sampled in parallel.

However, learning is observed to be slow, and greedy pretraining can yield faster convergence of the parameters.
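The parallel-sampling observation can be sketched as follows: given the layers below and above, every unit of a middle layer is conditionally independent, so the whole layer is resampled in one pass (W1, W2, and all sizes are hypothetical toy values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_middle_layer(v, h2, W1, W2, rng):
    """Resample all of h1 at once given v (below) and h2 (above).

    W1[i][j] connects v_i to h1_j; W2[j][k] connects h1_j to h2_k.
    """
    n_h1 = len(W1[0])
    probs = []
    for j in range(n_h1):
        act = sum(v[i] * W1[i][j] for i in range(len(v)))      # bottom-up input
        act += sum(W2[j][k] * h2[k] for k in range(len(h2)))   # top-down input
        probs.append(sigmoid(act))
    return [1 if rng.random() < p else 0 for p in probs]
```

In a general Boltzmann machine the h_j interact directly through J, so they would have to be resampled one at a time; the layered structure is what makes this block update valid.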
Pretraining in Deep Boltzmann Machines
Each RBM is trained separately, with some scaling of the weights.
Figure: Greedy Layerwise Pretraining for DBM
Discriminative Finetuning in Deep Boltzmann Machines
Furthermore, an additional finetuning step is considered to improve performance.

For example, for a DBM with 2 hidden layers, an approximate posterior is used as an augmented input to a neural network whose weights are initialized using the parameters of the DBM.
Figure: Finetuning the parameters of DBM
Some Experimental Results and Observations
A DBM was trained to model handwritten digits from the MNIST dataset.
(a) DBM model used for training. (b) Examples of handwritten digits.
Figure: An example DBM used for MNIST data generation, trained on 60,000 examples
Some interesting observations: without greedy pretraining, the models did not produce good results.

Using discriminative finetuning, the DBM gave 99.5% accuracy, the best recognition result on the MNIST dataset at the time.
Thank You