
On Restricted Boltzmann Machine Learning

Yingzhen Li

University of Cambridge

June 10, 2014


Overview

Restricted Boltzmann Machine

Contrastive Divergence

Expectation Propagation

paper in preparation


Restricted Boltzmann Machine

Deep Learning

From feature engineering to feature learning

Layer-wise training of very deep networks

Promising for AI?


Restricted Boltzmann Machine

Commercialization

Microsoft’s breakthrough in speech recognition

‘Google Brain’

Baidu’s Institute of Deep Learning


Restricted Boltzmann Machine

Neural Networks

Discriminative models

feed-forward networks (1950s-1980s)

Generative models

Bayes belief networks (1985)
Sigmoid belief nets (1996)

Helmholtz machine (1995)

Undirected graphical models

Markov random field (1980)
Boltzmann machine (1986)


Restricted Boltzmann Machine

Neural Networks

Figure: Feed-forward Nets


Restricted Boltzmann Machine

Neural Networks

Figure: Bayes Nets


Restricted Boltzmann Machine

Neural Networks

Figure: Helmholtz machine


Restricted Boltzmann Machine

Neural Networks

Figure: Boltzmann machine


Restricted Boltzmann Machine

Restricted Boltzmann Machine

P(x, h | Θ) = (1/Z(Θ)) exp(−E(x, h; Θ))   (1)

E(x, h; Θ) = −h^T W x − b^T x − c^T h

Θ = {W, b, c}

Z(Θ) = Σ_{x,h} exp(−E(x, h; Θ)) is often intractable
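The quantities above can be checked numerically on a toy model. Here is a minimal NumPy sketch (all sizes and values are illustrative, not from the slides) that evaluates E(x, h; Θ) and computes Z(Θ) by brute-force enumeration, which is feasible only for tiny models and illustrates why Z is "often intractable":

```python
import numpy as np
from itertools import product

# Hypothetical tiny RBM: 3 visible units, 2 hidden units, binary states.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))  # hidden x visible weights
b = np.zeros(3)                          # visible biases
c = np.zeros(2)                          # hidden biases

def energy(x, h):
    # E(x, h; Θ) = -h^T W x - b^T x - c^T h
    return -(h @ W @ x + b @ x + c @ h)

def partition(W, b, c):
    # Brute-force Z(Θ) over all 2^(n_v + n_h) configurations.
    nh, nv = W.shape
    return sum(np.exp(-energy(np.array(x), np.array(h)))
               for x in product([0, 1], repeat=nv)
               for h in product([0, 1], repeat=nh))

Z = partition(W, b, c)

def joint(x, h):
    # P(x, h | Θ) from eq. (1).
    return np.exp(-energy(x, h)) / Z
```

The cost of `partition` doubles with every added unit, which is why practical learning avoids computing Z.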


Restricted Boltzmann Machine

Restricted Boltzmann Machine

Maximum likelihood learning

Θ* = argmax_Θ log P(x|Θ), x ∼ P_D   (2)

Gradient ascent

∇_{W_ij} log P(x|Θ) = 〈h_i x_j〉_{P_D} − 〈h_i x_j〉_P   (3)

Difficult to sample from P DIRECTLY


Contrastive Divergence

Contrastive Divergence (Geoff Hinton)

Difficult to sample from P DIRECTLY

⇒ try to approximate that expectation!

Gibbs sampling for k sweeps

k = 1 (CD-k) works well


Contrastive Divergence

MCMC: Gibbs Sampling

Iteratively "alternate" between states:

x_i^t ∼ P(x_i | x_1^t, ..., x_{i−1}^t, x_{i+1}^{t−1}, ..., x_n^{t−1})

no rejection
the chain is ergodic (aperiodic + positive recurrent) and irreducible
can reach the equilibrium distribution P^∞ = P

denote the sampling procedure as the transition operator T:

x^k ∼ T^k P_D
P^k = T^k P_D
P^∞ = T^∞ P_D


Contrastive Divergence

“Truncated” Chain (Geoff Hinton)

Run the chain for k transitions (CD-k)

Block Gibbs sampling:

P(h_i = 1 | x, Θ) = sigm(c_i + W_{i·} x)
P(x_j = 1 | h, Θ) = sigm(b_j + h^T W_{·j})   (4)

sigm(x) = (1 + exp(−x))^{−1}
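Equation (4) makes block Gibbs sampling straightforward: all hidden units are conditionally independent given x, and vice versa. A minimal NumPy sketch of one transition (function names and shapes are my own, not from the slides):

```python
import numpy as np

def sigm(a):
    # sigm(x) = (1 + exp(-x))^-1
    return 1.0 / (1.0 + np.exp(-a))

def gibbs_sweep(x, W, b, c, rng):
    # One block-Gibbs transition: sample the whole hidden layer given x,
    # then the whole visible layer given h, using the conditionals in eq. (4).
    ph = sigm(c + W @ x)                              # P(h_i = 1 | x, Θ)
    h = (rng.random(ph.shape) < ph).astype(float)     # sample h ~ Bernoulli(ph)
    px = sigm(b + h @ W)                              # P(x_j = 1 | h, Θ)
    x_new = (rng.random(px.shape) < px).astype(float) # sample x ~ Bernoulli(px)
    return x_new, h
```

Repeating `gibbs_sweep` k times implements the T^k transition operator from the previous slide.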


Contrastive Divergence

“Truncated” Chain (Geoff Hinton)

Run the chain for k transitions (CD-k)

∇_{W_ij} log P(x|Θ) ≈ 〈h_i x_j〉_{P_D} − 〈h_i x_j〉_{P_k}

〈h_i x_j〉_{P_D} ≈ (1/M) Σ_m h_i^(m) x_j^(m)
〈h_i x_j〉_{P_k} ≈ (1/M) Σ_m h_i^(m,k) x_j^(m,k)
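Putting the two Monte Carlo estimates together gives the usual CD-k update for W. A sketch assuming binary units and a minibatch X of M visible vectors (using the conditional means E[h|x] instead of sampled h in the statistics is a common variance-reduction choice on my part, not something the slides specify):

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd_k_grad_W(X, W, b, c, k=1, rng=None):
    """CD-k estimate of the W-gradient over a minibatch X of shape (M, n_visible)."""
    rng = rng or np.random.default_rng(0)
    M = X.shape[0]
    # Positive ("wake") statistics: <h_i x_j> under the data distribution.
    pos = sigm(c + X @ W.T).T @ X / M
    # Negative phase: k block-Gibbs sweeps starting from the data points.
    Xk = X
    for _ in range(k):
        h = (rng.random((M, W.shape[0])) < sigm(c + Xk @ W.T)).astype(float)
        Xk = (rng.random((M, W.shape[1])) < sigm(b + h @ W)).astype(float)
    # Negative ("sleep") statistics: <h_i x_j> under P_k.
    neg = sigm(c + Xk @ W.T).T @ Xk / M
    return pos - neg  # ascent direction: W += lr * (pos - neg)
```

With k = 1 this is exactly the CD-1 recipe that "works well" in practice.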


Contrastive Divergence

“Truncated” Chain (Geoff Hinton)

Run the chain for k transitions (CD-k)

We are approximately minimizing the objective

KL(P_D || P^∞) − KL(P_k || P^∞)

This tells us the DIRECTION to the (local) optimum


Contrastive Divergence

Minimizing the Energy (Yann LeCun)

High probability at x = low energy at x

“wake” gradients 〈h_i x_j〉_{P_D}: reduce the energy at the data points

“sleep” gradients −〈h_i x_j〉_{P_k}: pull up the energy elsewhere

Hopefully x^(m,k) will be far away from x^(m)

(when the chain mixes quickly)


Contrastive Divergence

(Fast) Persistent Contrastive Divergence

Variance increases with k

CD-1 can overfit when the chain’s mixing rate is low

⇒ just run the chain without restarting!
the model changes only slightly between iterations

Use fast weights to improve mixing

use ∇ log P(x | Θ + Θ_fast) instead of ∇ log P(x | Θ)
use a large weight decay on Θ_fast
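The persistent-chain idea can be sketched in a few lines (fast weights are omitted for brevity; the class and variable names are illustrative, not from the slides):

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

class PCD:
    """Persistent CD sketch: the negative-phase ("fantasy") chains are never
    restarted at the data; they survive across parameter updates, which is
    justified because Θ changes only slightly per iteration."""

    def __init__(self, W, b, c, n_chains, rng=None):
        self.W, self.b, self.c = W, b, c
        self.rng = rng or np.random.default_rng(0)
        # Persistent fantasy particles, initialized at random.
        self.Xf = (self.rng.random((n_chains, W.shape[1])) < 0.5).astype(float)

    def update(self, X, lr=0.01):
        # Positive statistics from the data minibatch.
        pos = sigm(self.c + X @ self.W.T).T @ X / len(X)
        # One Gibbs sweep on the persistent chains (no restart).
        h = (self.rng.random((len(self.Xf), self.W.shape[0]))
             < sigm(self.c + self.Xf @ self.W.T)).astype(float)
        self.Xf = (self.rng.random(self.Xf.shape)
                   < sigm(self.b + h @ self.W)).astype(float)
        neg = sigm(self.c + self.Xf @ self.W.T).T @ self.Xf / len(self.Xf)
        self.W += lr * (pos - neg)
```

The fast-weights variant would additionally maintain Θ_fast with a large weight decay and use Θ + Θ_fast when sampling the fantasy particles.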


Contrastive Divergence

Advanced Models & Techniques

Models

Deep belief nets
Deep Boltzmann machines
Gaussian RBM, gated RBM, ...
Classification RBM
Deep Gaussian process

Training Methods

Annealed importance sampling
Dropout
Hybrid objective (discriminative + generative)

Figure: Deep belief nets


Contrastive Divergence

Possible Drawbacks (Literature, Rich & Me)

Overfitting

Hard to remove modes far away from the data
No idea about the volume of the mode

Fails to capture the uncertainty

when there is a lot of missing data

Small reconstruction error ≠ high data likelihood!

Can walk away from the true model


Expectation Propagation

Bayesian Inference

Learning with Bayes Rule:

P(Θ|D) ∝ P0(Θ)P(D|Θ)

Bayesian Inference:

P(x*) = ∫ P(x*|Θ) P(Θ|D) dΘ

The posterior of an RBM’s parameters is intractable

⇒ approximate that posterior
Variational inference
Expectation propagation


Expectation Propagation

Factor Graph

p(x_1, x_2, x_3, x_4) = f_A(x_1, x_2) f_B(x_2, x_3, x_4) f_C(x_4)   (5)

p(x_S) = Σ_{x\x_S} p(x), ∀ S ⊂ {x_1, x_2, x_3, x_4}
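The factorization and marginalization above can be made concrete with made-up factor tables (the factor names follow eq. (5); the numerical values are purely illustrative):

```python
import numpy as np
from itertools import product

# Hypothetical binary factors; only their names come from eq. (5).
def fA(x1, x2): return 1.0 + x1 * x2
def fB(x2, x3, x4): return np.exp(-0.1 * (x2 + x3 + x4))
def fC(x4): return 2.0 if x4 else 1.0

# Normalizer over all 2^4 binary configurations.
Z = sum(fA(a, b) * fB(b, c, d) * fC(d)
        for a, b, c, d in product([0, 1], repeat=4))

def p(x1, x2, x3, x4):
    # The joint of eq. (5), normalized.
    return fA(x1, x2) * fB(x2, x3, x4) * fC(x4) / Z

def marginal_x2(v):
    # p(x2 = v): sum p over every other variable (the Σ_{x\x_S} above).
    return sum(p(a, v, c, d) for a, c, d in product([0, 1], repeat=3))
```

Exact marginalization like this scales exponentially in the number of variables, which motivates message passing and EP on larger graphs.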


Expectation Propagation

Expectation Propagation (Tom Minka)

Define some “simple” q(x) to approximate p:

q(x_1, x_2, x_3, x_4) = f̃_A(x_1, x_2) f̃_B(x_2, x_3, x_4) f̃_C(x_4)   (6)

Iteratively update f̃_i by minimizing KL(q^{\i} f_i || q_new), where q^{\i} = q / f̃_i, i ∈ {A, B, C}

Moment matching: q_new ← moments[q^{\i} f_i]
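A one-dimensional illustration of the moment-matching step, unrelated to the RBM case: take the cavity q^{\i} to be a standard Gaussian and f_i a hypothetical hard-threshold factor (probit-style). Within the Gaussian family, KL(q^{\i} f_i || q_new) is minimized exactly by matching the first two moments of the tilted distribution, computed here on a grid:

```python
import numpy as np

# Cavity distribution q^\i = N(0, 1) and a hypothetical factor f_i(x) = 1[x > 0].
xs = np.linspace(-8.0, 8.0, 4001)
dx = xs[1] - xs[0]
cavity = np.exp(-0.5 * xs ** 2) / np.sqrt(2.0 * np.pi)
f_i = (xs > 0).astype(float)
tilted = cavity * f_i                # the (unnormalized) tilted distribution q^\i * f_i

Z = tilted.sum() * dx                # normalizer of the tilted distribution
mean = (xs * tilted).sum() * dx / Z  # first moment
var = ((xs - mean) ** 2 * tilted).sum() * dx / Z  # second central moment
# q_new <- N(mean, var) is the moment-matched Gaussian approximation.
```

For this example the moments are known in closed form (a half-normal: mean √(2/π), variance 1 − 2/π), so the grid computation can be checked against them.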


Expectation Propagation

Expectation Propagation (Tom Minka)

Figure: EP moment matching (fig by Rich Turner)


Expectation Propagation

Bayesian Inference (RBM)

True posterior:

P(H, Θ | D) ∝ P_0(Θ) ∏_m P(x^(m), h^(m) | Θ)
            = P_0(Θ) ∏_m (1/Z(Θ)) exp(−E(x^(m), h^(m); Θ))   (7)

Approximated posterior:

Q(H, Θ) = P_0(Θ) ∏_m (1/Z̃(Θ)) f̃(x^(m), h^(m); Θ)   (8)


Expectation Propagation

EP-k (Rich & Me)

Don’t touch Q(Θ) until we have a good estimate of Q(H)

...by updating Q(H) with EP k times

An analogy to CD-k


Expectation Propagation

Bayesian Inference (RBM)

Figure: Restricted Boltzmann Machine as a factor graph, with variables x_j^(m) and h_i^(m), parameters W_ij, b_j, c_i, priors P_0(W), P_0(b), P_0(c), and plate m = 1:N. We separate the graph into three subgraphs (dashed rectangles) for EP approximation.


Expectation Propagation

Bayesian Inference (RBM)

Bayesian point estimate (BPE)

P(h*|D, x*) ≈ P(h*|x*; Θ_post), Θ_post ∼ Q(Θ)   (9)

Approximate model averaging

P(h*|D, x*) ≈ ∫_Θ P(h*|x*; Θ) Q(Θ) dΘ   (10)

⇒ approximate this predictive distribution by EP again!
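The slides propose approximating the predictive distribution with EP; as a simpler baseline, eq. (10) can also be estimated by plain Monte Carlo over posterior samples. A sketch (function and variable names are assumed, not from the slides) that averages the RBM's hidden-unit conditionals over draws from Q(Θ):

```python
import numpy as np

def sigm(a):
    return 1.0 / (1.0 + np.exp(-a))

def predictive_h(x_star, q_samples):
    # Monte Carlo stand-in for eq. (10):
    #   P(h*|D, x*) ≈ (1/S) Σ_s P(h* = 1 | x*; Θ^(s)),  Θ^(s) ~ Q(Θ).
    # q_samples is a list of (W, b, c) tuples drawn from the
    # approximate posterior Q(Θ).
    probs = [sigm(c + W @ x_star) for (W, b, c) in q_samples]
    return np.mean(probs, axis=0)
```

Contrast this with the Bayesian point estimate of eq. (9), which uses a single draw Θ_post and so ignores posterior uncertainty.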


Expectation Propagation

Future Works

Finish the experiments on biased RBM and get it published!

My PhD thesis will cover:

theoretical analysis of deep learning in a Bayesian flavour
denoising & filling in missing data
fast & parallel algorithms (like that of TrueSkill™)
extension to deep models with continuous hidden states

Deep Gaussian process
Boltzmann machines with other continuous energy functions, e.g. Gaussian CDF


Selected References

Bengio, Y. (2009), 'Learning Deep Architectures for AI', Foundations and Trends in Machine Learning 2(1), 1-127.
Bengio, Y.; Courville, A. C. & Vincent, P. (2013), 'Representation Learning: A Review and New Perspectives', IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798-1828.
Dayan, P.; Hinton, G. E.; Neal, R. M. & Zemel, R. S. (1995), 'The Helmholtz Machine', Neural Computation 7, 889-904.
Hinton, G. E.; Osindero, S. & Teh, Y. W. (2006), 'A Fast Learning Algorithm for Deep Belief Nets', Neural Computation 18, 1527-1554.


Selected References

Minka, T. (2001), 'A Family of Algorithms for Approximate Bayesian Inference', PhD thesis, Massachusetts Institute of Technology.
Qi, Y.; Szummer, M. & Minka, T. (2005), 'Bayesian Conditional Random Fields', Proc. AISTATS 2005.
Tieleman, T. (2008), 'Training Restricted Boltzmann Machines Using Approximations to the Likelihood Gradient', in ICML, ACM, pp. 1064-1071.
Tieleman, T. & Hinton, G. E. (2009), 'Using Fast Weights to Improve Persistent Contrastive Divergence', in ICML, ACM, p. 130.
