Deep Learning tutorial
Nicolas Le Roux
Adam Coates, Yoshua Bengio, Yann LeCun, Andrew Ng
DL/UFL workshop
16/12/11
Disclaimer
A lot of the slides of this tutorial have been taken from
presentations from:
Honglak Lee
Yann LeCun
Geoff Hinton
Andrew Ng
Yoshua Bengio
Outline
1 Introduction
    What is deep?
    Training problems
2 Unsupervised pretraining
    Pretraining
    The denoising autoencoder
    Predictive sparse coding
    The deep belief network
3 Conclusion
What does deep mean?
Properties of deep models
Many non-linearities ≠ one non-linearity
More constrained space of transformations
Deep model = prior on the type of transformations
May need fewer computational units for the same function
Features for face recognition
Image courtesy of Honglak Lee
“First-level” features are not task-dependent
“First-level” features are easy to learn
“Higher-level” features are easier to learn given the low-level features
A deep neural network
H1 = g(W1 X)
H2 = g(W2 H1)
Y = W3 H2
g = sigmoid or tanh
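The forward pass above is short enough to write out directly; a minimal NumPy sketch (the layer sizes and random weights are illustrative, not from the slides):

```python
import numpy as np

def g(a):
    # sigmoid non-linearity (tanh would work just as well)
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
# illustrative sizes: 4 inputs, hidden layers of 5 and 3 units, 2 outputs
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(3, 5))
W3 = rng.normal(size=(2, 3))

X = rng.normal(size=(4, 10))   # 10 data points, stored as columns

H1 = g(W1 @ X)                 # first hidden layer
H2 = g(W2 @ H1)                # second hidden layer
Y = W3 @ H2                    # linear output layer
```

Note the output layer is linear here; for classification one would typically put a softmax on top.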
Training issues
W3 is a lot easier to learn than W2 and W1
Compare:
Given a set of features, find the combination
Given the combination, find the set of features
Practical results
Parameters randomly initialized
They do not change much (bad local minima)
A random transformation is not very good
Bengio et al., 2007
Practical results - TIMIT
Mohamed et al., Deep Belief Networks for phone recognition
The gist of pretraining
1 We want (potentially) deep networks
2 Training the lower layers is hard.
Solution: optimize each layer without relying on the layers above.
The gist of pretraining
1 Learn W1 first
2 Given W1, learn W2
3 Given W1 and W2, learn W3
Greedy learning
1 Train each layer in a supervised way
2 Use an alternate cost, hoping that it will be useful.
Alternate cost: modeling the data well.
Denoising autoencoders
Vincent et al., 2008
Corrupt the input (e.g. set 25% of inputs to 0)
Reconstruct the uncorrupted input
Use uncorrupted encoding as input to next level
A representation is good if it works well with noisy data
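A minimal single-layer sketch of the recipe above: corrupt the input, encode the corrupted version, reconstruct the clean one. The tied weights, squared-error loss, sizes, and learning rate are assumptions for illustration, not from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

n_in, n_hid = 8, 4
W = rng.normal(scale=0.5, size=(n_hid, n_in))  # tied weights: decoder uses W.T
b = np.zeros(n_hid)                            # encoder bias
c = np.zeros(n_in)                             # decoder bias

def dae_step(x, lr=0.5, corruption=0.25):
    """One SGD step on the denoising reconstruction error; returns the loss."""
    global W, b, c
    mask = (rng.random(n_in) > corruption).astype(float)
    x_tilde = x * mask                          # corrupt: zero out ~25% of inputs
    h = sigmoid(W @ x_tilde + b)                # encode the *corrupted* input
    x_hat = sigmoid(W.T @ h + c)                # reconstruct the *clean* input
    loss = 0.5 * np.sum((x_hat - x) ** 2)
    # backpropagate through the tied weights
    d_out = (x_hat - x) * x_hat * (1 - x_hat)
    d_hid = (W @ d_out) * h * (1 - h)
    W -= lr * (np.outer(h, d_out) + np.outer(d_hid, x_tilde))
    c -= lr * d_out
    b -= lr * d_hid
    return loss

x = np.array([1., 0., 1., 1., 0., 0., 1., 0.])  # a toy binary input
losses = [dae_step(x) for _ in range(500)]
```

The key point matches the slide: the loss compares the reconstruction against the uncorrupted x, never against x_tilde.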
DAE - A graphic explanation
Projects a “corrupted point” onto its position on the manifold
Two corrupted versions of the same point have the same representation
Stacking DAEs
1 Start with the lowest level and stack upwards
2 Train each layer on the intermediate code (features) from the layer below
3 Top layer can have a different output (e.g., a softmax non-linearity) to provide an output for classification
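The three steps above can be sketched end to end: train a tiny tied-weight DAE on the data, feed its uncorrupted codes to the next DAE, and so on. All sizes and hyperparameters below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_dae(X, n_hid, n_steps=300, lr=0.3, corruption=0.25):
    """Greedily train one tied-weight denoising autoencoder on X (points as rows)."""
    n_in = X.shape[1]
    W = rng.normal(scale=0.3, size=(n_hid, n_in))
    b, c = np.zeros(n_hid), np.zeros(n_in)
    for _ in range(n_steps):
        x = X[rng.integers(len(X))]
        x_tilde = x * (rng.random(n_in) > corruption)   # corrupt the input
        h = sigmoid(W @ x_tilde + b)
        x_hat = sigmoid(W.T @ h + c)                    # reconstruct the clean x
        d_out = (x_hat - x) * x_hat * (1 - x_hat)
        d_hid = (W @ d_out) * h * (1 - h)
        W -= lr * (np.outer(h, d_out) + np.outer(d_hid, x_tilde))
        b -= lr * d_hid
        c -= lr * d_out
    return W, b

X = (rng.random((100, 16)) > 0.5).astype(float)  # toy binary data
W1, b1 = train_dae(X, n_hid=8)
H1 = sigmoid(X @ W1.T + b1)   # *uncorrupted* codes become the next layer's input
W2, b2 = train_dae(H1, n_hid=4)
H2 = sigmoid(H1 @ W2.T + b2)  # features for a softmax classifier stacked on top
```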
Predictive sparse coding
Objective function for sparse coding:
Z(X) = argmin_Z ½‖X − DZ‖₂² + λ‖Z‖₁
Predictive sparse coding adds a prediction term:
Z(X) = argmin_Z ½‖X − DZ‖₂² + λ‖Z‖₁ + α‖Z − F(X, θ)‖₂²
F(X, θ) = G tanh(We X)
Z must:
Be sparse
Reconstruct X well
Be well predicted by a feedforward model
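The objective can be minimised over Z by proximal gradient descent (ISTA): a gradient step on the smooth terms, then a soft-threshold for the L1 penalty. The optimiser choice is ours (the slides do not specify one), G is treated as a diagonal gain, and all shapes and weights are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_code = 8, 5
D = rng.normal(size=(n_in, n_code))              # dictionary
We = rng.normal(scale=0.3, size=(n_code, n_in))  # encoder weights (hypothetical values)
Gdiag = np.ones(n_code)                          # G as a diagonal gain (an assumption)
lam, alpha = 0.5, 0.5
x = rng.normal(size=n_in)

F = Gdiag * np.tanh(We @ x)                      # feedforward prediction F(X, θ)

def objective(z):
    return (0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))
            + alpha * np.sum((z - F) ** 2))

# ISTA: step size 1/L, where L bounds the curvature of the smooth part
L = np.linalg.norm(D, 2) ** 2 + 2 * alpha
eta = 1.0 / L
z = np.zeros(n_code)
for _ in range(200):
    grad = D.T @ (D @ z - x) + 2 * alpha * (z - F)   # gradient of the smooth terms
    u = z - eta * grad
    z = np.sign(u) * np.maximum(np.abs(u) - eta * lam, 0.0)  # soft-threshold
```

With step size 1/L, ISTA decreases the objective monotonically, so the final z is at least as good as the all-zeros initialisation.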
Stacking PSDs
Z(X) = argmin_Z ½‖X − DZ‖₂² + λ‖Z‖₁ + α‖Z − F1(X, θ1)‖₂²
F1(X, θ1) = G tanh(We1 X)
1 Train the first layer using PSD on X
2 Use H1 = |F1(X, θ1)| as features
3 Train the second layer using PSD on H1
4 Use H2 = |F2(H1, θ2)| as features
5 Keep going...
6 Use H37 as input to your final classifier
Stacking PSDs (second layer)
Z(H1) = argmin_Z ½‖H1 − DZ‖₂² + λ‖Z‖₁ + α‖Z − F2(H1, θ2)‖₂²
F2(H1, θ2) = G tanh(We2 H1)
The mothership: the DBN
Unsupervised pretraining works by reconstructing the data
What if we tried to have a full generative model of the data?
Restricted Boltzmann Machine (RBM)
E(x, h) = −∑_{ij} W_{ij} x_i h_j = −x⊤Wh
P(x, h) = exp(x⊤Wh) / ∑_{x0,h0} exp(x0⊤Wh0)
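For a toy RBM the partition function can be enumerated exactly, which makes the definition concrete (bias terms are omitted, as on the slide; the sizes are illustrative):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_vis, n_hid = 3, 2
W = rng.normal(size=(n_vis, n_hid))

def energy(x, h):
    # E(x, h) = -x^T W h  (no bias terms, as on the slide)
    return -x @ W @ h

# enumerate every binary configuration to get the partition function
configs = [(np.array(x), np.array(h))
           for x in product([0, 1], repeat=n_vis)
           for h in product([0, 1], repeat=n_hid)]
Z = sum(np.exp(-energy(x, h)) for x, h in configs)

def P(x, h):
    # joint probability under the Boltzmann distribution
    return np.exp(-energy(x, h)) / Z
```

Enumerating Z costs 2^(n_vis + n_hid) terms, which is exactly why real RBMs need the approximate training methods mentioned later.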
Type of variables in an RBM
P(x, h) = exp(x⊤Wh) / ∑_{x0,h0} exp(x0⊤Wh0)
This formulation is correct when x and h are binary variables
You have no choice over the type of x (this is your data)
You can choose what type h is
There are other formulations when x is not binary
They typically do not work that well (with notable exceptions)
Training an RBM
P(x, h) = exp(x⊤Wh) / ∑_{x0,h0} exp(x0⊤Wh0)
We have a joint probability of data and “features”
Maximizing P(x) will give us meaningful features
There are efficient ways to do that (check my webpage)
Stacking RBMs
Learn the first layer by maximizing P(x) = ∑_{h1} P(x, h1)
Once trained, the features of x are E_P[h1 | x]
Learn the second layer by maximizing P(h1) = ∑_{h2} P(h1, h2)
Keep going...
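The "efficient ways" alluded to on the previous slide usually mean contrastive divergence. Below is a minimal CD-1 sketch of the greedy stacking, with biases omitted and all sizes invented; feeding the real-valued probabilities E[h1 | x] to the next RBM is a common shortcut, not the exact recipe from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_rbm(X, n_hid, n_steps=200, lr=0.1):
    """Train a bias-free binary RBM with one-step contrastive divergence (CD-1)."""
    n_vis = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_vis, n_hid))
    for _ in range(n_steps):
        x0 = X[rng.integers(len(X))]
        p_h0 = sigmoid(x0 @ W)                        # P(h = 1 | x)
        h0 = (rng.random(n_hid) < p_h0).astype(float)
        p_x1 = sigmoid(W @ h0)                        # P(x = 1 | h)
        x1 = (rng.random(n_vis) < p_x1).astype(float) # one Gibbs step
        p_h1 = sigmoid(x1 @ W)
        # CD-1 update: positive phase minus negative phase
        W += lr * (np.outer(x0, p_h0) - np.outer(x1, p_h1))
    return W

X = (rng.random((200, 12)) > 0.5).astype(float)  # toy binary data
W1 = train_rbm(X, n_hid=6)
H1 = sigmoid(X @ W1)           # E[h1 | x], the features passed upward
W2 = train_rbm(H1, n_hid=3)    # second RBM trained on the first layer's features
```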
![Page 49: Nicolas Le Roux - WordPress.com › ...Deep Learning tutorial Nicolas Le Roux Adam Coates, Yoshua Bengio, Yann LeCun, Andrew Ng DL/UFL workshop 16/12/11 Nicolas Le Roux Adam Coates,](https://reader034.fdocuments.us/reader034/viewer/2022052613/5f14a8e4ed0a62113619e60f/html5/thumbnails/49.jpg)
Results - 1
Results - 2
Take-home messages - Deep learning
Building a hierarchy of features can be beneficial
Training all layers at once is very hard
Pretraining alleviates the problem of bad local minima
Regularizes the model
Take-home messages - Pretraining
Pretraining requires building blocks
ALL pretraining methods share the same idea: model the input while having somewhat invariant features
By no means is this necessarily the best method
And now for some applications...
This is the end...
Questions?