CSC2535 Lecture 4 Boltzmann Machines, Sigmoid Belief Nets and Gibbs sampling
Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches...
Transcript of Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches...
![Page 1: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/1.jpg)
Unsupervised Learning of Hierarchical Models
Marc'Aurelio Ranzato Geoff Hintonin collaboration with Josh Susskind and Vlad Mnih
Advanced Machine Learning, 9 March 2011
![Page 2: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/2.jpg)
Example: facial expression recognition
sadnessfear
anger
happiness neutral
Unlabeled face images Few labeled face images
What is the expression?(unseen occluded images)
![Page 3: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/3.jpg)
Example: facial expression recognition
Input image
layer 1
Ranzato, Susskind, Mnih, Hinton “On deep generative models..” CVPR11
1) training on unlabeled data: maximum likelihood
layer 2
![Page 4: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/4.jpg)
Example: facial expression recognition
Ranzato, Susskind, Mnih, Hinton “On deep generative models..” CVPR11
1) training on unlabeled data: maximum likelihood2) use the generative model to inpute missing values
..
.
..
.
![Page 5: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/5.jpg)
Example: facial expression recognition
Ranzato, Susskind, Mnih, Hinton “On deep generative models..” CVPR11
1) training on unlabeled data: maximum likelihood2) use the generative model to inpute missing values3) use the latent representation as features4) train classifier on labeled features
![Page 6: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/6.jpg)
Ranzato et al. “Sparse feature learning for Deep Belief Nets” NIPS07
input image
MNIST digits1 0 0 0 0 0 0 0 0 0
Receptive field of 2nd layer units
The power of hierarchical representations
2nd layer sparse code
1st layer sparse code
![Page 7: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/7.jpg)
Ranzato et al. “Sparse feature learning for Deep Belief Nets” NIPS07
input image
MNIST digits0 1 0 0 0 0 0 0 0 0
Receptive field of 2nd layer units
The power of hierarchical representations
2nd layer sparse code
1st layer sparse code
![Page 8: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/8.jpg)
Ranzato et al. “Sparse feature learning for Deep Belief Nets” NIPS07
input image
MNIST digits
Receptive fields of 2nd layer units
Hierarchical representations can discover complex dependencies!
The power of hierarchical representations
2nd layer sparse code
1st layer sparse code
![Page 9: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/9.jpg)
Unsupervised Learning
- we will learn hierarchical representations in a greedy fashion using unlabeled data (little if any assumption on the input)
- we can focus on single layer unsupervised algorihtms- how can we interpret unsupervised algorithms?- how can they learn representations?- how can we compare them?
![Page 10: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/10.jpg)
Two Approaches to Unsupervised Learning
1st strategy: constrain latent representation & optimize score only at training samples
- K-Means - sparse coding
- use lower dimensional representations Ranzato et al. NIPS 06, Ranzato et al. CVPR 07, Ranzato et al. NIPS 07, Kavukcuoglu et al. CVPR 09
Ranzato et al. ICML 08
![Page 11: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/11.jpg)
Two Approaches to Unsupervised Learning
1st strategy: constrain latent representation & optimize score only at training samples
- K-Means - sparse coding
- use lower dimensional representations
2nd strategy: optimize score for training samples while normalizing the score over the whole space (maximum likelihood)
Ranzato et al. AISTATS 10, Ranzato et al. CVPR 10, Ranzato et al. NIPS 10, Ranzato et al. CVPR 11
p(x)
x
![Page 12: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/12.jpg)
Two Approaches to Unsupervised Learning
Pros Cons
1st strategy(constrain code)
2nd strategy(maximum likelihood)
- efficient learning- defined through objective function(monitor training)
- choice of functions- not robust to missing inputs- not good for generation
- intuitive design- it can generate- compositionality
- hard to train(variational inference, MCMC, etc.)
![Page 13: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/13.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 14: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/14.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 15: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/15.jpg)
Means and Covariances: PPCA
INPUT SPACE LATENT SPACE
Training sample Latent vector
p x∣h
p x specifies covariance across all data points
p x∣h≠ p x
![Page 16: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/16.jpg)
Means and Covariances: PPCA
INPUT SPACE LATENT SPACE
Training sample Latent vector
p x∣h
p x∣h≠ p x
p x∣ht specifies distribution for each “data point”h t~p h∣x t
![Page 17: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/17.jpg)
Means and Covariances: PPCA
INPUT SPACE LATENT SPACE
Training sample Latent vector
p x∣h
p x =∫hp x∣h p h
p x∣h specifies distribution for each “data point”
![Page 18: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/18.jpg)
Conditional Distribution Over Input
INPUT SPACE LATENT SPACE
Training sample Latent vector
p x∣h=N mean h , D
- examples: PPCA, Factor Analysis, ICA, Gaussian RBM
p x∣h
![Page 19: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/19.jpg)
Conditional Distribution Over Input
INPUT SPACE LATENT SPACE
Training sample Latent vector
- examples: PPCA, Factor Analysis, ICA, Gaussian RBM
Poor model of dependency between input variables
p x∣h=N mean h , D
p x∣h
![Page 20: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/20.jpg)
Conditional Distribution Over Input
INPUT SPACE LATENT SPACE
Training sample Latent vector
p x∣h=N 0,Covarianceh
- examples: PoT, cRBM
Welling et al. NIPS 2003, Ranzato et al. AISTATS 10
p x∣h
![Page 21: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/21.jpg)
Conditional Distribution Over Input
INPUT SPACE LATENT SPACE
Training sample Latent vector
- examples: PoT, cRBM
Poor model of mean intensity
p x∣h=N 0,Covarianceh
Welling et al. NIPS 2003, Ranzato et al. AISTATS 10
p x∣h
![Page 22: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/22.jpg)
input image
![Page 23: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/23.jpg)
Which image has same covariance (but different mean)?
Which image has similar mean (but different covariance)?
input image
![Page 24: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/24.jpg)
same covariance but different mean
similar mean but different covariance
Equivalent image according to PoT Equivalent image according to FA
input image
![Page 25: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/25.jpg)
Conditional Distribution Over Input
INPUT SPACE LATENT SPACE
Training sample Latent vector
p x∣h=N meanh ,Covariance h
- this is what we propose: mcRBM, mPoT
Ranzato et al. CVPR 10, Ranzato et al. NIPS 2010, Ranzato et al. CVPR 11
p x∣h
![Page 26: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/26.jpg)
Conditional Distribution Over Input
INPUT SPACE LATENT SPACE
Training sample Latent vector
- this is what we propose: mcRBM, mPoT
p x∣h=N meanh ,Covariance h
Ranzato et al. CVPR 10, Ranzato et al. NIPS 2010, Ranzato et al. CVPR 11
p x∣h
![Page 27: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/27.jpg)
p x ,hm , hc ∝exp−E x ,hm , hc
x
xx
xmcRBM
Ranzato Hinton CVPR 10
- two sets of latent variables to modulate mean and covariance of the conditional distribution over the input
- energy-based model
x∈ℝD
hc∈{0,1 }M
hm∈{0,1}N
![Page 28: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/28.jpg)
- easy generation- slow inference
E=12x−m ' −1
x−m
hm hc
x
p x∣hm , hc=N mhm , hc
![Page 29: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/29.jpg)
- less easy generation- fast inference
p x∣hm , hc=N hc
−1 m hm
, hc
E=12
x ' −1 x−m x
E=12x−m ' −1
x−m
hm hc
x
p x∣hm , hc=N mhm , hc hm hc
x
- easy generation- slow inference
![Page 30: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/30.jpg)
Geometric interpretation
If we multiply them, we get...
![Page 31: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/31.jpg)
If we multiply them, we get...a sharper and shorter Gaussian
Geometric interpretation
p x∣hm , hc
![Page 32: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/32.jpg)
E x , hc , hm=
12
x ' −1 x
Covariance part of the energy function:
pair-wise MRFx p xq
Ranzato Hinton CVPR 10
x∈ℝD
−1∈ℝD×D
![Page 33: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/33.jpg)
E x , hc , hm=
12
x ' C C ' x
Covariance part of the energy function:
pair-wise MRFx p xq
factorization
Ranzato Hinton CVPR 10
x∈ℝD
C∈ℝD×F
![Page 34: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/34.jpg)
E x , hc , hm=
12
x ' C [diag hc]C ' x
Covariance part of the energy function:
gated MRFx p xq
factorization + hiddens
hkc
CC
Ranzato Hinton CVPR 10
x∈ℝD
C∈ℝD×F
hc∈{0,1 }F
![Page 35: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/35.jpg)
E x , hc , hm=
12
x ' C [diag P hc]C ' x
Covariance part of the energy function:
gated MRFx p xq
factorization + hiddens
hkc
CC
P
Ranzato Hinton CVPR 10
x∈ℝD
C∈ℝD×F
hc∈{0,1 }M
P∈ℝF ×M
![Page 36: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/36.jpg)
E x , hc , hm=
12
x ' C [diag P hc]C ' x
12
x ' x− x ' W hm
Overall energy function:
gated MRFx p xq
covariance part
hkc
CC
P
mean part
h jm
W
Ranzato Hinton CVPR 10
x∈ℝD
W ∈ℝD×N
hm∈{0,1}N
![Page 37: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/37.jpg)
Overall energy function:
gated MRF
covariance part
hkc
CC
P
mean part
h jm
W
p x∣hc , hm=N Whm ,
−1=C diag [P hc
]C 'I
Ranzato Hinton CVPR 10
E x , hc , hm=
1
2x ' C [diag P hc
]C ' x12
x ' x− x ' W hm
x p xq
![Page 38: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/38.jpg)
Overall energy function:
gated MRF
covariance part
hkc
CC
P
h jm
W
inference phkc=1 ∣x=−
12
Pk C ' x2bk
ph jm=1 ∣x = W j ' xb j
mean part
Ranzato Hinton CVPR 10
E x , hc , hm=
1
2x ' C [diag P hc
]C ' x12
x ' x− x ' W hm
x p xq
![Page 39: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/39.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 40: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/40.jpg)
Learning
- maximum likelihood
- Fast Persistent Contrastive Divergence - Hybrid Monte Carlo to draw samples
x i x j
hnm
hkc
P
CCW
p x =∫hm , hc e−E x , hm , hc
∫x ,h m , hc e−E x ,h m ,hc
E=1
2x ' C [ diag P hc
]C ' x−x ' W hm...
![Page 41: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/41.jpg)
∂ F∂ w
training sample
F= - log(p(x)) + K
Learning
![Page 42: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/42.jpg)
HMC dynamics∂ F
∂ w
training sample
Learning
![Page 43: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/43.jpg)
HMC dynamics∂ F
∂ w
training sample
Learning
F= - log(p(x)) + K
![Page 44: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/44.jpg)
HMC dynamics∂ F
∂ w
training sample
Learning
F= - log(p(x)) + K
![Page 45: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/45.jpg)
HMC dynamics∂ F
∂ w
training sample
Learning
F= - log(p(x)) + K
![Page 46: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/46.jpg)
HMC dynamics∂ F
∂ w
training sample
Learning
F= - log(p(x)) + K
![Page 47: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/47.jpg)
HMC dynamics∂ F
∂ w
Learning
F= - log(p(x)) + K
![Page 48: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/48.jpg)
∂ F∂ w
weight update∂ F
∂ w
sample datapoint
w w− − ∂ F∂ w ∣sample
∂ F∂ w ∣data
Learning
F= - log(p(x)) + K
![Page 49: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/49.jpg)
∂ F∂ w
weight update∂ F
∂ w
sample datapoint
w w− − ∂ F∂ w ∣sample
∂ F∂ w ∣data
Learning
F= - log(p(x)) + K
![Page 50: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/50.jpg)
w w− − ∂ F∂ w ∣sample
∂ F∂ w ∣data
Start dynamics from previous point
datapoint
sample
Learning
F= - log(p(x)) + K
![Page 51: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/51.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 52: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/52.jpg)
Speech Recognition on TIMIT
<-235 ms-> Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
INPUT- frame: 25ms Hamming window - 10ms overlap between frames- 39 log-filterbank outputs + energy per frame- 15 frames - PCA whitening → 384 components- no deltas, no delta-deltas...
![Page 53: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/53.jpg)
Speech Recognition on TIMIT
<-235 ms->
gated MRF with 1024 precision and 512 mean unitstrained on independent windows
gMRF
![Page 54: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/54.jpg)
Speech Recognition on TIMIT
<-235 ms->
gMRF
gated MRF with 1024 precision and 512 mean unitstrained on independent windows
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 55: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/55.jpg)
Speech Recognition on TIMIT
gMRF
gated MRF with 1024 precision and 512 mean unitstrained on independent windows
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 56: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/56.jpg)
Speech Recognition on TIMIT
- For each training sample, infer latent variables of gated MRF - Train 2nd layer binary-binary RBM (2048 units)
<-235 ms->
gMRF
RBM
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 57: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/57.jpg)
Speech Recognition on TIMIT
- Train 1st layer gated MRF - Train 2nd layer binary-binary RBM (2048 units)- Train 3rd layer binary-binary RBM (2048 units)
<-235 ms->
gMRF
RBM
RBM
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 58: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/58.jpg)
Speech Recognition on TIMIT
8 layer (deep) model
<-235 ms->
gMRF
RBM
RBM
...
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 59: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/59.jpg)
Speech Recognition on TIMIT
<-235 ms->
gMRF
RBM
RBM
...
Supervised training: predict states of aligned HMM
8 layer
(deep
) m
od
el
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 60: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/60.jpg)
Speech Recognition on TIMIT
<-235 ms->
gMRF
RBM
RBM
...
Test: predict states of HMM → decoding (bigram language model)
8 layer
(deep)
model
Dahl, Ranzato, Mohamed, Hinton, NIPS 2010
![Page 61: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/61.jpg)
CRF 34.8% Large-Margin GMM 33.0% CD-HMM 27.3% Augmented CRF 26.6% RNN 26.1% Bayesian Triphone HMM 25.6% Triphone HMM discrim. trained 22.7% DBN with gated MRF 20.5%
METHOD PER
Speech Recognition on TIMIT
![Page 62: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/62.jpg)
CRF 34.8% 2008Large-Margin GMM 33.0% 2006CD-HMM 27.3% 2009Augmented CRF 26.6% 2009RNN 26.1% 1994 Bayesian Triphone HMM 25.6% 1998Triphone HMM discrim. trained 22.7% 2009DBN with gated MRF 20.5% 2010
METHOD PER
Speech Recognition on TIMITYear
![Page 63: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/63.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 64: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/64.jpg)
Generation natural image patches
Natural images
mcRBM
GRBM
S-RBM + DBNfrom Osindero and Hinton NIPS 2008
from Osindero and Hinton NIPS 2008
Ranzato and Hinton CVPR 2010
![Page 65: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/65.jpg)
From Patches to High-Resolution Images
Training by picking patches at random
![Page 66: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/66.jpg)
But we could also take them from a grid
This is not a good way to extend the model to big images: block artifacts
From Patches to High-Resolution Images
![Page 67: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/67.jpg)
IDEA: have one subset of filters applied to these locations,
From Patches to High-Resolution Images
![Page 68: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/68.jpg)
IDEA: have one subset of filters applied to these locations, another subset to these locations
From Patches to High-Resolution Images
![Page 69: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/69.jpg)
IDEA: have one subset of filters applied to these locations, another subset to these locations, etc.
Gregor LeCun arXiv 2010Ranzato, Mnih, Hinton NIPS 2010
Train jointly all parameters.
No block artifacts Reduced redundancy
From Patches to High-Resolution Images
![Page 70: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/70.jpg)
Sampling High-Resolution ImagesGaussian model marginal wavelet
from Simoncelli 2005
![Page 71: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/71.jpg)
Gaussian model marginal wavelet
from Simoncelli 2005
Pair-wise MRF FoE
from Schmidt, Gao, Roth CVPR 2010
Sampling High-Resolution Images
![Page 72: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/72.jpg)
Gaussian model marginal wavelet
from Simoncelli 2005
Pair-wise MRF FoE
from Schmidt, Gao, Roth CVPR 2010
Mean Covariance Model
Ranzato, Mnih, Hinton NIPS 2010
Sampling High-Resolution Images
![Page 73: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/73.jpg)
Gaussian model marginal wavelet
from Simoncelli 2005
Pair-wise MRF FoE
from Schmidt, Gao, Roth CVPR 2010
Mean Covariance Model
Ranzato, Mnih, Hinton NIPS 2010
Sampling High-Resolution Images
![Page 74: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/74.jpg)
from Simoncelli 2005
from Schmidt, Gao, Roth CVPR 2010
Gaussian model marginal wavelet
Pair-wise MRF FoE
Mean Covariance Model
Ranzato, Mnih, Hinton NIPS 2010
Sampling High-Resolution Images
![Page 75: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/75.jpg)
Making the model.. “DEEPER “
Treat these units as data to train a similar model on the top
SECOND STAGEField of binary RBM's. Each hidden unit has a receptive field of 30x30 pixels in input space.
![Page 76: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/76.jpg)
Sampling from the DEEPER model
- Sample from 2nd layer- project sample in image space using 1st layer p(x|h)
RBM RBM RBM RBM
gMRF
...
1st layer
2nd layer
E h1 , h2 =−h1 ' W h2
ph j2=1 ∣h1
= W j ' h 1b j
2
phk1=1 ∣h2
= W k h2bk
1
h 1
h2
![Page 77: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/77.jpg)
Samples from Deep Generative Model
1st stage model3rd stage model
![Page 78: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/78.jpg)
Samples from Deep Generative Model
1st stage model3rd stage model
![Page 79: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/79.jpg)
Samples from Deep Generative Model
1st stage model3rd stage model
![Page 80: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/80.jpg)
Samples from Deep Generative Model
1st stage model3rd stage model
![Page 81: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/81.jpg)
Sampling High-Resolution Images
from Simoncelli 2005from Schmidt, etal CVPR 2010
Gaussian model marginal waveletFoE
Deep - 1 layer
Ranzato, Mnih, Hinton NIPS 2010
from Simoncelli 2005
Deep - 3 layers
Ranzato, et al. CVPR 2011
![Page 82: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/82.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 83: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/83.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
anger
Ranzato, et al. CVPR 2011
![Page 84: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/84.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
disgust
![Page 85: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/85.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
fear
![Page 86: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/86.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
happiness
![Page 87: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/87.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
neutral
![Page 88: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/88.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
sadness
![Page 89: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/89.jpg)
Facial Expression Recognition
Toronto Face Dataset (J. Susskind et al. 2010) ~ 100K unlabeled faces from different sources ~ 4K labeled images Resolution: 48x48 pixels 7 facial expressions
surprise
![Page 90: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/90.jpg)
Facial Expression Recognition
- 1st layer using local (not shared) connectivity- layers above are fully connected- 5 layers in total
- Result
- Linear Classifier on raw pixels 71.5%
- Gaussian RBF SVM on raw pixels 76.2%
- Gabor + PCA + linear classifier 80.1% Dailey et al. J. Cog. Science 2002
- Sparse coding 74.6% Wright et al. PAMI 2008
- DEEP model (3 layers): 82.5%
![Page 91: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/91.jpg)
Facial Expression Recognition
- We can draw samples from the model (5th layer with 128 hiddens)
RBM RBM RBM RBM
gMRF
...
1st layer
5th layer
RBM
...
![Page 92: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/92.jpg)
Facial Expression Recognition
- We can draw samples from the model (5th layer with 128 hiddens)
![Page 93: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/93.jpg)
Facial Expression Recognition
RBM ...5th layer
gMRF
RBM
...
RBM
gMRF
RBM
...
... RBMRBM
- We can draw samples from the model (5th layer with 128 hiddens)
![Page 94: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/94.jpg)
Facial Expression Recognition
- We can draw samples from the model (5th layer with 128 hiddens)
![Page 95: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/95.jpg)
Facial Expression Recognition
- We can draw samples from the model (5th layer with 128 hiddens)
![Page 96: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/96.jpg)
Facial Expression Recognition
- 7 synthetic occlusions- use generative model to fill-in (conditional on the known pixels)
...
gMRF
RBM
RBM
RBM RBM
RBM
RBM
gMRF gMRF
RBM
RBM
RBM RBM
RBM
RBM
gMRF gMRF
RBM
RBM
RBM
![Page 97: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/97.jpg)
Facial Expression Recognition
originals
Type 1 occlusion: eyes
Restored images
![Page 98: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/98.jpg)
Facial Expression Recognition
originals
Type 2 occlusion: mouth
Restored images
![Page 99: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/99.jpg)
Facial Expression Recognition
originals
Type 3 occlusion: right half
Restored images
![Page 100: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/100.jpg)
Facial Expression Recognition
originals
Type 4 occlusion: bottom half
Restored images
![Page 101: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/101.jpg)
Facial Expression Recognition
originals
Type 5 occlusion: top half
Restored images
![Page 102: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/102.jpg)
Facial Expression Recognition
originals
Type 6 occlusion: nose
Restored images
![Page 103: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/103.jpg)
Facial Expression Recognition
originals
Type 7 occlusion: 70% of pixels at random
Restored images
![Page 104: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/104.jpg)
Facial Expression Recognition
Original
Input
![Page 105: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/105.jpg)
Facial Expression Recognition
Original
Input 1st layer
gMRF gMRF
![Page 106: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/106.jpg)
Facial Expression Recognition
Original
Input 1st layer
2nd layer
gMRF
RBM RBM
gMRF
![Page 107: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/107.jpg)
Facial Expression Recognition
Original
Input 1st layer
2nd layer
3rd
layer
gMRF
RBM RBM
gMRF
RBM RBM
![Page 108: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/108.jpg)
Facial Expression Recognition
Original
Input 1st layer
2nd layer
3rd
layer4th
layer
gMRF
RBM
RBM
RBM RBM
RBM
RBM
gMRF
![Page 109: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/109.jpg)
Facial Expression Recognition
Original Input 1st layer
2nd layer
3rd
layer4th
layer 10 times
..
.gMRF
RBM
RBM
RBM RBM
RBM
RBM
gMRF gMRF
RBM
RBM
RBM RBM
RBM
RBM
gMRF gMRF
RBM
RBM
RBM
![Page 110: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/110.jpg)
Facial Expression Recognition
CASE 1: original images for training, occluded for test
Ranzato, et al. CVPR 2011Wright, et al. PAMI 2008
Dailey, et al. J. Cog. Neuros. 2003
![Page 111: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/111.jpg)
Facial Expression Recognition
CASE 2: occluded images for both training and test
Wright, et al. PAMI 2008
Dailey, et al. J. Cog. Neuros. 2003
Ranzato, et al. CVPR 2011
![Page 112: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/112.jpg)
Outline
- mathematical formulation of the model
- training
- learning acoustic features for spech recognition
- generation of natural images
- recognition of facial expression under occlusion
- conclusion
![Page 113: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/113.jpg)
Summary
- Unsupervised Learning: regularize & learn representations
- Deep Generative Model
- 1st layer: gated MRF- Higher layers: binary RBM's
- fast inference
- Realistic generation: natural images - Applications:
- facial expression recognition robust to occlusion- speech recognition
![Page 114: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/114.jpg)
THANK YOU
![Page 115: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/115.jpg)
Learned Filters
v p v q
h jm
hkc
P
CC
W
N
M
F
![Page 116: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/116.jpg)
h jm
hkc
P
CC
W
N
M
Learned Filters
F
![Page 117: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/117.jpg)
Using -Energy to Score Images
>
>
>
less likelytest images
![Page 118: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/118.jpg)
Using Energy to Score Images
>
>
>
>
>
>
Upside-down images
![Page 119: Unsupervised Learning of Hierarchical Modelshinton/csc2535/notes/lecture_mPoT.pdf · Two Approaches to Unsupervised Learning Pros Cons 1st strategy (constrain code) 2nd strategy (maximum](https://reader034.fdocuments.us/reader034/viewer/2022043005/5f8d3bb701dcc959580d2a85/html5/thumbnails/119.jpg)
Using Energy to Score Images
Average of those images for which difference of energy is higher