Transcript of Deep Learning Bkk 03 03
7/29/2019 Deep Learning Bkk 03 03
Stacking RBMs and Auto-encoders for Deep Architectures
References: [Bengio, 2009], [Vincent et al., 2008]
2011/03/03
Introduction
Deep architectures for various levels of representation
Representations are learned implicitly, via layer-by-layer unsupervised training
Generative model
Stack Restricted Boltzmann Machines (RBMs) to form a Deep Belief Network (DBN)
Discriminative model
Stack Auto-encoders (AEs) to build a multi-layered classifier
Generative Model
Given a training set {x_i} of n examples,
Construct a generative model that produces samples from the same distribution
Start with sigmoid belief networks
Need parameters for each component of the top-most layer, i.e. Bernoulli priors
Deep Belief Network
Same as sigmoid BN, but with different top-layer structure
Use RBM to model the top layer
Restricted Boltzmann Machine (more on next slide):
Divided into visible and hidden layers (2 levels)
Connections form a bipartite graph
Called "Restricted" because there are no connections among same-layer units
Restricted Boltzmann Machines
Energy-based model for the hidden-visible joint distribution
Or, express it as a distribution of the visible variables:
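The formulas on this slide were images and did not survive extraction; a standard reconstruction for a binary RBM (as in [Bengio, 2009]; W are the weights, b and c the visible and hidden biases) is:

```latex
E(x, h) = -b^{\top} x - c^{\top} h - h^{\top} W x
P(x, h) = \frac{e^{-E(x, h)}}{Z}, \qquad Z = \sum_{x, h} e^{-E(x, h)}
P(x) = \sum_{h} P(x, h) = \frac{1}{Z} \sum_{h} e^{-E(x, h)}
```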
RBMs (Cont'd)
How posteriors factorize: notice how the energy is of the form
Then,
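The formulas here are also missing; a standard restatement of the factorization argument (same symbols as before; W_j is the j-th row of W) would be: the energy is a sum of terms each involving only one hidden unit, so the conditional factorizes over hidden units.

```latex
-E(x, h) = b^{\top} x + \sum_{j} h_{j} \left( c_{j} + W_{j} x \right)
P(h \mid x) = \prod_{j} P(h_{j} \mid x)
```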
More on Posteriors
Using the same factorization trick, we can compute the posterior:
Posterior on visible units can be derived similarly
Due to the factorization, Gibbs sampling is easy:
This is just the sigmoid function for binomial h
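These factorized conditionals give a one-step Gibbs sampler directly. A minimal NumPy sketch (the toy dimensions and random weights are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h_given_v(v, W, c):
    """P(h_j = 1 | v) = sigmoid(c_j + W[:, j] . v) -- the factorized posterior."""
    p = sigmoid(c + v @ W)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, W, b):
    """P(v_i = 1 | h) = sigmoid(b_i + W[i, :] . h) -- symmetric by the bipartite structure."""
    p = sigmoid(b + h @ W.T)
    return (rng.random(p.shape) < p).astype(float), p

# One full Gibbs step v -> h -> v' on a toy 6-visible / 4-hidden RBM.
W = rng.normal(scale=0.1, size=(6, 4))
b = np.zeros(6)
c = np.zeros(4)
v0 = rng.integers(0, 2, size=6).astype(float)
h0, _ = sample_h_given_v(v0, W, c)
v1, _ = sample_v_given_h(h0, W, b)
```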
Training RBMs
Given parameters {W, b, c}
Compute the log-likelihood gradient for the steepest-ascent method
The first term is OK, but the second term is intractable due to the partition function
Use k-step Gibbs sampling to approximately sample for the second term
k = 1 performs well empirically
∂ log p(x)/∂θ = − Σ_h p(h|x) ∂E(x,h)/∂θ + Σ_{x̃,h̃} p(x̃,h̃) ∂E(x̃,h̃)/∂θ
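The k = 1 case of this recipe (contrastive divergence, CD-1) can be sketched as follows; the `cd1_update` helper name and toy sizes are assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(v0, W, b, c, lr=0.1):
    """One CD-1 update: the positive phase uses the data point v0, the
    negative phase approximates the model expectation with one Gibbs step."""
    ph0 = sigmoid(c + v0 @ W)                      # P(h | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(b + h0 @ W.T)                    # reconstruction
    ph1 = sigmoid(c + pv1 @ W)
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return W, b, c

W = rng.normal(scale=0.1, size=(6, 4))
b, c = np.zeros(6), np.zeros(4)
v = rng.integers(0, 2, size=6).astype(float)
W, b, c = cd1_update(v, W, b, c)
```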
Training DBNs
Every time we see a sample x, we lower the energy of the distribution at that point
Start from the bottom layer and move up, training each layer unsupervised
Each layer has its own set of parameters
*Q(.) is the RBM posterior for the hidden variables
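The greedy layer-by-layer procedure might be sketched like this (the helper names `train_rbm` / `train_dbn`, the CD-1 inner loop, and the toy sizes are all assumptions; the posterior mean Q(.) of each trained RBM is fed upward as the next layer's input):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm(data, n_hidden, lr=0.1, epochs=5):
    """Stand-in CD-1 trainer: returns (W, b, c) for one RBM layer."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(c + v0 @ W)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            pv1 = sigmoid(b + h0 @ W.T)
            ph1 = sigmoid(c + pv1 @ W)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            b += lr * (v0 - pv1)
            c += lr * (ph0 - ph1)
    return W, b, c

def train_dbn(data, layer_sizes):
    """Greedy layer-wise: train layer k unsupervised, then feed Q(h|x) upward."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        layers.append((W, b, c))
        x = sigmoid(c + x @ W)      # Q(.): posterior mean becomes next input
    return layers

data = rng.integers(0, 2, size=(20, 8)).astype(float)
layers = train_dbn(data, [6, 4])
```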
How to sample from DBNs
1. Sample h_{l-1}, the visible layer of the top-level RBM (using Gibbs sampling)
2. For k = l-1 down to 1: sample h_{k-1} ~ P(. | h_k) from the DBN model
3. x = h_0 is the final sample
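The three steps above can be sketched as follows (the toy layer sizes and the `sample_dbn` helper name are assumptions; each layer stores (W, b, c) as in the training sketches):

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bernoulli(p):
    return (rng.random(p.shape) < p).astype(float)

def sample_dbn(layers, gibbs_steps=50):
    """layers[k] = (W, b, c) for level k; the top pair of levels is the RBM.
    Step 1: Gibbs in the top RBM; steps 2-3: ancestral pass downward."""
    W, b, c = layers[-1]
    v = bernoulli(np.full(W.shape[0], 0.5))        # initialize h_{l-1}
    for _ in range(gibbs_steps):
        h = bernoulli(sigmoid(c + v @ W))
        v = bernoulli(sigmoid(b + h @ W.T))
    for W, b, c in reversed(layers[:-1]):          # h_k -> h_{k-1}
        v = bernoulli(sigmoid(b + v @ W.T))
    return v                                       # x = h_0

layers = [(rng.normal(scale=0.1, size=(8, 6)), np.zeros(8), np.zeros(6)),
          (rng.normal(scale=0.1, size=(6, 4)), np.zeros(6), np.zeros(4))]
x = sample_dbn(layers)
```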
Discriminative Model
Receives an input x to classify
Unlike DBNs, which didn't have inputs
A multi-layer neural network should do
Use auto-encoders to discover compact representations
Use denoising AEs to add robustness to corruption
Auto-encoders
A neural network where Input = Output
Hence the name "auto"; it has one hidden layer for the input representation
[Figure: auto-encoder with input x (d-dimensional), hidden representation y (d'-dimensional), and reconstruction z; d' < d is necessary to avoid learning the identity function]
AE Mechanism
Parameterize each layer with parameters {W, b}
Aim to reconstruct the input by minimizing the reconstruction error
where,
Can train in an unsupervised way:
for any x in the training set, train the AE to reconstruct x
y = f(x) = s(Wx + b)
z = g(y) = s(W'y + b')
L(x, z) = ||x - z||^2
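A minimal sketch of this mechanism, assuming sigmoid activations s and squared-error loss as on the slide (toy dimensions and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d, d_hidden = 8, 3             # d' < d to avoid the identity map
W = rng.normal(scale=0.1, size=(d_hidden, d))
b = np.zeros(d_hidden)
W_prime = rng.normal(scale=0.1, size=(d, d_hidden))
b_prime = np.zeros(d)

def encode(x):                 # y = f(x) = s(Wx + b)
    return sigmoid(W @ x + b)

def decode(y):                 # z = g(y) = s(W'y + b')
    return sigmoid(W_prime @ y + b_prime)

def loss(x):                   # L(x, z) = ||x - z||^2
    z = decode(encode(x))
    return float(np.sum((x - z) ** 2))

x = rng.integers(0, 2, size=d).astype(float)
err = loss(x)
```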
Denoising Auto-encoders
Also need to be robust to missing data
Same structure as a regular AE, but trained against corrupted inputs
Arbitrarily remove a fixed portion of the input components
Rationale: learning the latent structure is important for rebuilding missing data
The hidden layer will learn the structural representation
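The corruption step might look like this (the `corrupt` helper name and the 30% fraction are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

def corrupt(x, fraction=0.3):
    """Zero out a fixed fraction of randomly chosen input components;
    the DAE is then trained to reconstruct the *clean* x from x_tilde."""
    x_tilde = x.copy()
    n_drop = int(round(fraction * x.size))
    idx = rng.choice(x.size, size=n_drop, replace=False)
    x_tilde[idx] = 0.0
    return x_tilde

x = np.ones(10)
x_tilde = corrupt(x, fraction=0.3)
```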
Training Stacked DAEs
Stack the DAEs to form a deep architecture
Take each DAE's hidden layer; this hidden layer becomes the input to the next layer
Training is simple. Given a training set {(x_i, y_i)}:
Initialize each layer (sequentially) in an unsupervised fashion
Each layer's output is fed as input to the next layer
Finally, tune the entire architecture with supervised learning using the training set
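The recipe above, minus the final supervised pass, can be sketched as follows (the helper names, the tied-weight DAE with squared-error gradient steps, and the toy sizes are all assumptions, not the slides' exact method):

```python
import numpy as np

rng = np.random.default_rng(6)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae(data, n_hidden, fraction=0.3, lr=0.1, epochs=5):
    """Toy DAE trainer (tied weights, squared error): corrupt the input,
    encode, decode, and step toward reconstructing the clean x."""
    d = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_hidden, d))
    for _ in range(epochs):
        for x in data:
            x_t = x * (rng.random(d) >= fraction)       # corruption
            y = sigmoid(W @ x_t)                        # encode
            z = sigmoid(W.T @ y)                        # decode (tied W)
            delta = (z - x) * z * (1 - z)               # d L / d preact
            W -= lr * (np.outer(y, delta)
                       + np.outer((W @ delta) * y * (1 - y), x_t))
    return W

def pretrain_stack(data, layer_sizes):
    """Greedy unsupervised initialization; each hidden layer feeds the next."""
    Ws, x = [], data
    for n_hidden in layer_sizes:
        W = train_dae(x, n_hidden)
        Ws.append(W)
        x = sigmoid(x @ W.T)
    return Ws

data = rng.random((20, 8))
Ws = pretrain_stack(data, [6, 4])
# A supervised fine-tuning pass over {(x_i, y_i)} would follow, using these
# Ws to initialize a standard feed-forward classifier.
```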
References
[Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1, 2009.
[Vincent et al., 2008] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of ICML 2008.