On Restricted Boltzmann Machine Learning
Yingzhen Li
University of Cambridge
June 10, 2014
Overview
Restricted Boltzmann Machine
Contrastive Divergence
Expectation Propagation
paper in preparation
Restricted Boltzmann Machine
Deep Learning
From feature engineering to feature learning
Layer-wise training of very deep networks
Promising for AI?
Restricted Boltzmann Machine
Commercialization
Microsoft’s breakthrough in speech recognition
‘Google Brain’
Baidu’s Institute of Deep Learning
Restricted Boltzmann Machine
Neural Networks
Discriminative models
feed-forward networks (1950 - 1980s)
Generative models
Bayes belief networks (1985)
Sigmoid belief nets (1996)
Helmholtz machine (1995)
Undirected graphical models
Markov random field (1980)
Boltzmann machine (1986)
Restricted Boltzmann Machine
Neural Networks
Figure: Feed-forward Nets
Figure: Bayes Nets
Figure: Helmholtz machine
Figure: Boltzmann machine
Restricted Boltzmann Machine
Restricted Boltzmann Machine
$$P(\mathbf{x}, \mathbf{h} \,|\, \Theta) = \frac{1}{Z(\Theta)} \exp\left(-E(\mathbf{x}, \mathbf{h}; \Theta)\right) \quad (1)$$
$$E(\mathbf{x}, \mathbf{h}; \Theta) = -\mathbf{h}^T W \mathbf{x} - \mathbf{b}^T \mathbf{x} - \mathbf{c}^T \mathbf{h}$$
$\Theta = \{W, \mathbf{b}, \mathbf{c}\}$
$Z(\Theta) = \sum_{\mathbf{x}, \mathbf{h}} \exp(-E(\mathbf{x}, \mathbf{h}; \Theta))$ is often intractable
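To make the definitions concrete, a minimal numpy sketch of the energy and unnormalized probability, assuming binary units; the shapes and names here are illustrative, not from the slides (W is n_hidden × n_visible, matching $\mathbf{h}^T W \mathbf{x}$):

```python
import numpy as np

def energy(x, h, W, b, c):
    """E(x, h; Theta) = -h^T W x - b^T x - c^T h, as in the equation above."""
    return -h @ W @ x - b @ x - c @ h

def unnormalized_prob(x, h, W, b, c):
    """exp(-E); the partition function Z(Theta) sums this over all joint states."""
    return np.exp(-energy(x, h, W, b, c))

# Hypothetical toy example: 3 visible and 2 hidden binary units.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # n_hidden x n_visible
b = rng.normal(size=3)        # visible biases
c = rng.normal(size=2)        # hidden biases
x = np.array([1.0, 0.0, 1.0])
h = np.array([1.0, 1.0])
print(energy(x, h, W, b, c), unnormalized_prob(x, h, W, b, c))
```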
Restricted Boltzmann Machine
Restricted Boltzmann Machine
Maximum likelihood learning
$$\Theta^* = \arg\max_\Theta \log P(\mathbf{x} \,|\, \Theta), \quad \mathbf{x} \sim P_D \quad (2)$$
Gradient ascent (on the log-likelihood):
$$\nabla_{W_{ij}} \log P(\mathbf{x} \,|\, \Theta) = \langle h_i x_j \rangle_{P_D} - \langle h_i x_j \rangle_P \quad (3)$$
Difficult to sample from P DIRECTLY
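For completeness, a short derivation of equation (3) from equation (1), not spelled out on the slide: since $\partial(-E)/\partial W_{ij} = h_i x_j$,

```latex
\nabla_{W_{ij}} \log P(\mathbf{x} \,|\, \Theta)
  = \nabla_{W_{ij}} \log \sum_{\mathbf{h}} e^{-E(\mathbf{x}, \mathbf{h}; \Theta)}
    - \nabla_{W_{ij}} \log Z(\Theta)
  = \sum_{\mathbf{h}} P(\mathbf{h} \,|\, \mathbf{x}, \Theta)\, h_i x_j
    - \sum_{\mathbf{x}, \mathbf{h}} P(\mathbf{x}, \mathbf{h} \,|\, \Theta)\, h_i x_j
```

Averaging the first term over $\mathbf{x} \sim P_D$ gives $\langle h_i x_j \rangle_{P_D}$, while the second term is the model expectation $\langle h_i x_j \rangle_P$.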
Contrastive Divergence
Contrastive Divergence (Geoff Hinton)
Difficult to sample from P DIRECTLY
⇒ try to approximate that expectation!
Gibbs sampling for k sweeps
k = 1 (CD-1) works well
Contrastive Divergence
MCMC: Gibbs Sampling
Iteratively “alternate” between states:
$$x_i^t \sim P(x_i \,|\, x_1^t, \ldots, x_{i-1}^t, x_{i+1}^{t-1}, \ldots, x_n^{t-1})$$
no rejection
the chain is ergodic (aperiodic + positive recurrent) and irreducible
can reach the equilibrium distribution $P_\infty = P$
Denote the sampling procedure as the transition operator $T$:
$$\mathbf{x}^k \sim T^k P_D, \quad P_k = T^k P_D, \quad P_\infty = T^\infty P_D$$
Contrastive Divergence
“Truncated” Chain (Geoff Hinton)
Run the chain for k transitions (CD-k)
Block Gibbs sampling:
$$P(h_i = 1 \,|\, \mathbf{x}, \Theta) = \mathrm{sigm}(c_i + W_{i\cdot}\, \mathbf{x})$$
$$P(x_j = 1 \,|\, \mathbf{h}, \Theta) = \mathrm{sigm}(b_j + \mathbf{h}^T W_{\cdot j}) \quad (4)$$
$$\mathrm{sigm}(x) = (1 + \exp(-x))^{-1}$$
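A minimal numpy sketch of one block-Gibbs sweep implementing equation (4); variable names and shapes (W is n_hidden × n_visible) are illustrative:

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_sweep(x, W, b, c, rng):
    """One block-Gibbs transition: sample all of h given x, then all of x given h."""
    p_h = sigm(c + W @ x)                      # P(h_i = 1 | x, Theta)
    h = (rng.random(p_h.shape) < p_h) * 1.0
    p_x = sigm(b + h @ W)                      # P(x_j = 1 | h, Theta)
    x_new = (rng.random(p_x.shape) < p_x) * 1.0
    return x_new, h

# Usage: x_new, h = gibbs_sweep(x, W, b, c, np.random.default_rng(0))
```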
Contrastive Divergence
“Truncated” Chain (Geoff Hinton)
Run the chain for k transitions (CD-k)
$$\nabla_{W_{ij}} \log P(\mathbf{x} \,|\, \Theta) \approx \langle h_i x_j \rangle_{P_D} - \langle h_i x_j \rangle_{P_k}$$
$$\langle h_i x_j \rangle_{P_D} \approx \frac{1}{M} \sum_m h_i^{(m)} x_j^{(m)}, \qquad \langle h_i x_j \rangle_{P_k} \approx \frac{1}{M} \sum_m h_i^{(m,k)} x_j^{(m,k)}$$
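Putting the pieces together, a hedged sketch of one CD-k update for W (only the W update is shown; batch handling, using P(h|x) in the positive phase, and the learning rate are illustrative choices, not from the slides):

```python
import numpy as np
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))  # as in the earlier sketch

def cd_k_update(X, W, b, c, k=1, lr=0.1, rng=None):
    """One CD-k step on a batch X (M x n_visible, binary rows)."""
    rng = rng or np.random.default_rng()
    M = X.shape[0]
    # Positive phase: <h_i x_j> under the data.
    pos = sigm(c + X @ W.T).T @ X / M
    # Negative phase: k block-Gibbs sweeps started from the data.
    Xk = X
    for _ in range(k):
        Hk = (rng.random((M, W.shape[0])) < sigm(c + Xk @ W.T)) * 1.0
        Xk = (rng.random(X.shape) < sigm(b + Hk @ W)) * 1.0
    neg = sigm(c + Xk @ W.T).T @ Xk / M
    return W + lr * (pos - neg)                # ascend the approximate gradient
```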
Contrastive Divergence
“Truncated” Chain (Geoff Hinton)
Run the chain for k transitions (CD-k)
We are approximately minimizing the objective
$$\mathrm{KL}(P_D \,\|\, P_\infty) - \mathrm{KL}(P_k \,\|\, P_\infty)$$
This tells us the DIRECTION to the (local) optimum
Contrastive Divergence
Minimizing the Energy (Yann LeCun)
High probability at x = low energy at x
“wake” gradients $\langle h_i x_j \rangle_{P_D}$: reduce the energy at the datapoints
“sleep” gradients $-\langle h_i x_j \rangle_{P_k}$: pull up the energy elsewhere
Hopefully $\mathbf{x}^{(m,k)}$ will be far away from $\mathbf{x}^{(m)}$
(when the chain mixes quickly)
Contrastive Divergence
(Fast) Persistent Contrastive Divergence
Variance increases with k
CD-1 can overfit when the chain’s mixing rate is low
⇒ just run the chain without restarts!
the model changes only slightly between iterations
Use fast weights to improve mixing
use $\nabla \log P(\mathbf{x} \,|\, \Theta + \Theta_{\text{fast}})$ instead of $\nabla \log P(\mathbf{x} \,|\, \Theta)$
use a large weight decay on $\Theta_{\text{fast}}$
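A hedged sketch of the persistent variant: the negative-phase chain Xp survives across updates instead of being restarted at the data. This reuses numpy and sigm from the CD-k sketch above; the fast-weights trick is omitted for brevity:

```python
def pcd_update(X, Xp, W, b, c, lr=0.1, rng=None):
    """One PCD step: Xp holds persistent 'fantasy' particles, advanced by a
    single block-Gibbs sweep per update rather than reset to the data."""
    rng = rng or np.random.default_rng()
    pos = sigm(c + X @ W.T).T @ X / len(X)
    Hp = (rng.random((len(Xp), W.shape[0])) < sigm(c + Xp @ W.T)) * 1.0
    Xp = (rng.random(Xp.shape) < sigm(b + Hp @ W)) * 1.0
    neg = sigm(c + Xp @ W.T).T @ Xp / len(Xp)
    return W + lr * (pos - neg), Xp            # return the updated chain state too
```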
Contrastive Divergence
Advanced Models & Techniques
Models
Deep belief nets
Deep Boltzmann machines
Gaussian RBM, Gated RBM, ...
Classification RBM
Deep Gaussian Process
Training Methods
Annealed importance sampling
Dropout
Hybrid objective (discriminative + generative)
Figure: Deep belief nets
Contrastive Divergence
Possible Drawbacks (Literature, Rich & Me)
Overfitting
Hard to remove modes far away from the data
No idea about the volume of the mode
Fails to capture the uncertainty
when there is a lot of missing data
Small reconstruction error $\neq$ high data likelihood!
Can walk away from the true model
Expectation Propagation
Bayesian Inference
Learning with Bayes Rule:
$$P(\Theta \,|\, D) \propto P_0(\Theta)\, P(D \,|\, \Theta)$$
Bayesian Inference:
$$P(\mathbf{x}^*) = \int P(\mathbf{x}^* \,|\, \Theta)\, P(\Theta \,|\, D)\, d\Theta$$
The posterior of an RBM’s parameters is intractable
⇒ approximate that posterior
Variational inference
Expectation propagation
Expectation Propagation
Factor Graph
$$p(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)\, f_C(x_4) \quad (5)$$
$$p(x_S) = \sum_{\mathbf{x} \setminus x_S} p(\mathbf{x}), \quad \forall S \subset \{x_1, x_2, x_3, x_4\}$$
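To make equation (5) concrete, a small sketch that computes a marginal by brute-force enumeration over binary states; the factor tables here are made up purely for illustration:

```python
import itertools
import numpy as np

# Illustrative (made-up) factors over binary variables.
fA = lambda x1, x2: np.exp(0.5 * x1 * x2)
fB = lambda x2, x3, x4: np.exp(x2 + 0.2 * x3 * x4)
fC = lambda x4: np.exp(-0.3 * x4)

# p(x2): sum the normalized factor product over x1, x3, x4.
joint = {}
for x in itertools.product([0, 1], repeat=4):
    x1, x2, x3, x4 = x
    joint[x] = fA(x1, x2) * fB(x2, x3, x4) * fC(x4)
Z = sum(joint.values())
p_x2 = [sum(v for k, v in joint.items() if k[1] == b) / Z for b in (0, 1)]
print(p_x2)  # marginal distribution of x2
```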
Expectation Propagation
Expectation Propagation (Tom Minka)
Define some “simple” q(x) to approximate p:
$$q(x_1, x_2, x_3, x_4) = \tilde{f}_A(x_1, x_2)\, \tilde{f}_B(x_2, x_3, x_4)\, \tilde{f}_C(x_4) \quad (6)$$
Iteratively update $\tilde{f}_i$ by minimizing $\mathrm{KL}(q^{\backslash i} f_i \,\|\, q^{\text{new}})$, where $q^{\backslash i} = q / \tilde{f}_i$, $i \in \{A, B, C\}$
Moment matching: $q^{\text{new}} \leftarrow \mathrm{moments}[\, q^{\backslash i} f_i \,]$
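A minimal 1-D illustration of a single EP moment-matching step, assuming a Gaussian approximating family and using numerical quadrature for the tilted moments; the sigmoid site factor is just an example, not from the talk:

```python
import numpy as np

# Cavity q^{\i}(x): the approximation with site i removed (Gaussian here).
cav_mean, cav_var = 0.0, 1.0
f_i = lambda x: 1.0 / (1.0 + np.exp(-3.0 * x))   # example site factor

# Tilted distribution q^{\i}(x) f_i(x); moments by quadrature.
xs = np.linspace(-10, 10, 20001)
cav = np.exp(-0.5 * (xs - cav_mean) ** 2 / cav_var)
tilted = cav * f_i(xs)
tilted /= np.trapz(tilted, xs)
new_mean = np.trapz(xs * tilted, xs)
new_var = np.trapz((xs - new_mean) ** 2 * tilted, xs)

# Moment matching: q^{new} is the Gaussian with the tilted moments,
# and the refined site is f~_i proportional to q^{new} / q^{\i}.
print(new_mean, new_var)
```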
Figure: EP moment matching (fig by Rich Turner)
Expectation Propagation
Bayesian Inference (RBM)
True posterior:
$$P(H, \Theta \,|\, D) \propto P_0(\Theta) \prod_m P(\mathbf{x}^{(m)}, \mathbf{h}^{(m)} \,|\, \Theta) = P_0(\Theta) \prod_m \frac{1}{Z(\Theta)} \exp\left(-E(\mathbf{x}^{(m)}, \mathbf{h}^{(m)}; \Theta)\right) \quad (7)$$
Approximated posterior:
$$Q(H, \Theta) = P_0(\Theta) \prod_m \frac{1}{\tilde{Z}(\Theta)}\, \tilde{f}(\mathbf{x}^{(m)}, \mathbf{h}^{(m)}; \Theta) \quad (8)$$
Expectation Propagation
EP-k (Rich & Me)
Don’t touch Q(Θ) until we have a good estimate of Q(H)
...by updating Q(H) with EP for k times
An analogy to CD-k
Expectation Propagation
Bayesian Inference (RBM)
Figure: Restricted Boltzmann Machine as a factor graph, with variables $x_j^{(m)}, h_i^{(m)}$ ($m = 1:N$), parameters $W_{ij}, b_j, c_i$, and priors $P_0(W), P_0(b), P_0(c)$. We separate the graph into three subgraphs (dashed rectangles) for EP approximation.
Expectation Propagation
Bayesian Inference (RBM)
Bayesian point estimate (BPE)
$$P(\mathbf{h}^* \,|\, D, \mathbf{x}^*) \approx P(\mathbf{h}^* \,|\, \mathbf{x}^*; \Theta_{\text{post}}), \quad \Theta_{\text{post}} \sim Q(\Theta) \quad (9)$$
Approximate model averaging
$$P(\mathbf{h}^* \,|\, D, \mathbf{x}^*) \approx \int_\Theta P(\mathbf{h}^* \,|\, \mathbf{x}^*; \Theta)\, Q(\Theta)\, d\Theta \quad (10)$$
⇒ approximate this predictive distribution by EP again!
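For intuition, a simple Monte Carlo alternative to equation (10); note the slide proposes EP for this step, so sampling Θ from an assumed Gaussian Q, as below, is only an illustration:

```python
import numpy as np
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

def predictive_h(x_star, q_mean_W, q_std_W, c, n_samples=100, rng=None):
    """Monte Carlo estimate of P(h*_i = 1 | D, x*) by averaging P(h* | x*; Theta)
    over parameter samples; assumes Q has independent Gaussian weights."""
    rng = rng or np.random.default_rng()
    p = 0.0
    for _ in range(n_samples):
        W = q_mean_W + q_std_W * rng.standard_normal(q_mean_W.shape)
        p += sigm(c + W @ x_star)            # P(h_i = 1 | x*; sampled Theta)
    return p / n_samples
```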
Expectation Propagation
Future Works
Finish the experiments on biased RBM and get it published!
My PhD thesis will cover:
theoretical analysis of deep learning in a Bayesian flavour
denoising & filling in missing data
fast & parallel algorithms (like that of TrueSkill™)
extension to deep models with continuous hidden states
Deep Gaussian Process
Boltzmann machines with other continuous energy functions, e.g. Gaussian CDF
Selected References
Bengio, Y. (2009), 'Learning Deep Architectures for AI', Foundations and Trends in Machine Learning 2(1), 1-127.
Bengio, Y.; Courville, A. C. & Vincent, P. (2013), 'Representation Learning: A Review and New Perspectives', IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798-1828.
Dayan, P.; Hinton, G. E.; Neal, R. M. & Zemel, R. S. (1995), 'The Helmholtz Machine', Neural Computation 7, 889-904.
Hinton, G. E.; Osindero, S. & Teh, Y. W. (2006), 'A Fast Learning Algorithm for Deep Belief Nets', Neural Computation 18, 1527-1554.
Minka, T. (2001), 'A Family of Algorithms for Approximate Bayesian Inference', PhD thesis, Massachusetts Institute of Technology.
Qi, Y.; Szummer, M. & Minka, T. (2005), 'Bayesian Conditional Random Fields', Proc. AISTATS 2005.
Tieleman, T. (2008), 'Training Restricted Boltzmann Machines Using Approximations to the Likelihood Gradient', in ICML, ACM, pp. 1064-1071.
Tieleman, T. & Hinton, G. E. (2009), 'Using Fast Weights to Improve Persistent Contrastive Divergence', in ICML, ACM, p. 130.