Deep Learning Bkk 03 03



    Stacking RBMs and Auto-encoders

    for Deep Architectures

    References: [Bengio, 2009], [Vincent et al., 2008]

    2011/03/03


    Introduction

    Deep architectures for various levels of representations

    Implicitly learn representations

    Layer-by-layer unsupervised training

    Generative model

    Stack Restricted Boltzmann Machines (RBMs) to form a Deep Belief Network (DBN)

    Discriminative model

    Stack Auto-encoders (AEs)

    Multi-layered classifier


    Generative Model

    Given a training set $\{x_i\}_{i=1}^{n}$,

    Construct a generative model that produces samples from the same distribution

    Start with sigmoid belief networks

    Need parameters for each component of the top-most layer, i.e. Bernoulli priors
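    As a hedged sketch of the sigmoid belief network referred to here (standard notation following [Bengio, 2009]; the symbols $h^k$, $W^k$, $b^k$ are assumed, not from the slide):

    $P(x, h^1, \ldots, h^l) = P(h^l) \prod_{k=1}^{l} P(h^{k-1} \mid h^k), \quad h^0 = x$

    $P(h^{k-1}_i = 1 \mid h^k) = \mathrm{sigm}\big(b^{k-1}_i + \sum_j W^k_{ij} h^k_j\big)$

    with an independent Bernoulli prior $P(h^l_i = 1) = p_i$ on each component of the top-most layer.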


    Deep Belief Network

    Same as sigmoid BN, but with different top-layer structure

    Use RBM to model the top layer

    Restricted Boltzmann Machine: (More on next slide)

    Divided into hidden and visible layers (2 levels)

    Connections form a bipartite graph

    Called Restricted because there are no connections among same-layer units
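    The resulting joint, as a sketch in the same assumed notation (this is the standard DBN factorization in [Bengio, 2009]):

    $P(x, h^1, \ldots, h^l) = P(h^{l-1}, h^l) \prod_{k=1}^{l-1} P(h^{k-1} \mid h^k)$

    where $P(h^{l-1}, h^l)$ is the top-level RBM and the remaining factors are sigmoid belief network layers, with $h^0 = x$.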


    Restricted Boltzmann Machines

    Energy-based model for hidden-visible joint distribution

    Or express it as a distribution over the visible variables:
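    A hedged reconstruction of the standard formulation (following [Bengio, 2009], with weight matrix W and biases b, c, matching the parameters $\theta = \{W, b, c\}$ used later):

    $E(x, h) = -b^{\top} x - c^{\top} h - h^{\top} W x$

    $P(x, h) = \frac{e^{-E(x, h)}}{Z}, \qquad Z = \sum_{x, h} e^{-E(x, h)}$

    $P(x) = \frac{1}{Z} \sum_{h} e^{-E(x, h)}$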


    RBMs (Contd)

    How posteriors factorize: notice how the energy is of the form

    Then the posterior factorizes over the hidden units (see the sketch below)
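    A hedged reconstruction in the notation of [Bengio, 2009] (the symbols $\beta$ and $\gamma_j$ are assumed): the energy decomposes with one term per hidden unit,

    $E(x, h) = -\beta(x) + \sum_{j} \gamma_j(x, h_j)$

    so the terms involving different hidden units separate and

    $P(h \mid x) = \prod_{j} P(h_j \mid x)$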


    More on Posteriors

    Using the same factorization trick, we can compute the posterior:

    Posterior on visible units can be derived similarly

    Due to factorization, Gibbs sampling is easy:

    This is just the sigmoid function for binomial h
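    Concretely, for binary units with the energy above (a sketch; the convention that $W_{ji}$ connects hidden unit j to visible unit i is assumed):

    $P(h_j = 1 \mid x) = \mathrm{sigm}\big(c_j + \sum_i W_{ji} x_i\big)$

    $P(x_i = 1 \mid h) = \mathrm{sigm}\big(b_i + \sum_j W_{ji} h_j\big)$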


    Training RBMs

    Given parameters θ = {W, b, c}

    Compute log-likelihood gradient for steepest ascent method

    The first term is OK, but the second term is intractable due to the partition function

    Use k-step Gibbs sampling to approximately sample for the second term

    k=1 performs well empirically

    $\frac{\partial \log p(x)}{\partial \theta} = -\sum_{h} p(h \mid x)\,\frac{\partial E(x, h)}{\partial \theta} + \sum_{\tilde{x}, \tilde{h}} p(\tilde{x}, \tilde{h})\,\frac{\partial E(\tilde{x}, \tilde{h})}{\partial \theta}$
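    A minimal sketch of one CD-k update in NumPy, assuming the binary-RBM conditionals above; the function name, learning rate, and exact update form are illustrative rather than taken from the slides:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def cd_k_update(W, b, c, x, k=1, lr=0.01, rng=np.random.default_rng()):
        # W: (n_hidden, n_visible), b: visible bias, c: hidden bias, x: one binary sample.
        # Positive phase: posterior over hidden units given the data.
        ph_data = sigmoid(c + W @ x)

        # Negative phase: k steps of block Gibbs sampling starting from x.
        v = x.copy()
        for _ in range(k):
            h = (rng.random(c.shape) < sigmoid(c + W @ v)).astype(float)
            v = (rng.random(b.shape) < sigmoid(b + W.T @ h)).astype(float)
        ph_model = sigmoid(c + W @ v)

        # Stochastic approximation of the log-likelihood gradient:
        # data-dependent term minus sampled model term, applied as steepest ascent.
        W += lr * (np.outer(ph_data, x) - np.outer(ph_model, v))
        b += lr * (x - v)
        c += lr * (ph_data - ph_model)
        return W, b, c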


    Training DBNs

    Every time we see a sample x, we lower the energy of the distribution at that point

    Start from the bottom layer, move up, and train each layer unsupervised

    Each layer has its own set of parameters

    *Q(·) is the RBM posterior for the hidden variables


    How to sample from DBNs

    1. Sample a visible $h^{l-1}$ from the top-level RBM (using Gibbs)

    2. For k = l-1 down to 1: sample $h^{k-1} \sim P(\cdot \mid h^k)$ from the DBN model

    3. $x = h^0$ is the final sample
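    A sketch of this procedure in NumPy; the parameter layout (a top-level RBM (W, b, c) plus a list of top-down generative weights) is an assumption for illustration:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sample_dbn(top_rbm, down_layers, n_gibbs=1000, rng=np.random.default_rng()):
        # top_rbm: (W, b, c) of the top-level RBM over (h^{l-1}, h^l);
        # down_layers: [(W_k, b_k), ...] generative weights for P(h^{k-1} | h^k), top to bottom.
        W, b, c = top_rbm

        # 1. Gibbs-sample a "visible" h^{l-1} from the top-level RBM.
        v = (rng.random(b.shape) < 0.5).astype(float)
        for _ in range(n_gibbs):
            h = (rng.random(c.shape) < sigmoid(c + W @ v)).astype(float)
            v = (rng.random(b.shape) < sigmoid(b + W.T @ h)).astype(float)

        # 2. Propagate down through the sigmoid belief network layers.
        sample = v
        for W_k, b_k in down_layers:
            sample = (rng.random(b_k.shape) < sigmoid(b_k + W_k @ sample)).astype(float)

        # 3. The bottom-layer sample is x = h^0.
        return sample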


    Discriminative Model

    Receives an input x to classify

    Unlike DBNs, which didn't have inputs

    A multi-layer neural network should do

    Use auto-encoders to discover compact representations

    Use denoising AEs to add robustness to corruption


    Auto-encoders

    A neural network where Input = Output

    Hence its name, "auto"

    But has one hidden layer for the input representation

    [Figure: auto-encoder with d-dimensional input x, d'-dimensional hidden representation y (d' < d is necessary to avoid learning the identity function), and reconstruction z]


    AE Mechanism

    Parameterize each layer with parameters θ = {W, b}

    Aim to reconstruct the input by minimizing reconstruction error

    where,

    Can train in an unsupervised way

    for any x in the training set, train the AE to reconstruct x

    $y = f_{\theta}(x) = s(Wx + b)$

    $z = g_{\theta'}(y) = s(W'y + b')$

    $L(x, z) = \lVert x - z \rVert^{2}$
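    A minimal NumPy sketch of these equations (the sigmoid s and squared error follow the slide; the function name is illustrative):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def ae_reconstruction_error(x, W, b, W_prime, b_prime):
        y = sigmoid(W @ x + b)                # encoder: y = f_theta(x) = s(Wx + b)
        z = sigmoid(W_prime @ y + b_prime)    # decoder: z = g_theta'(y) = s(W'y + b')
        return np.sum((x - z) ** 2)           # L(x, z) = ||x - z||^2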


    Denoising Auto-encoders

    Also need to be robust to missing data

    Same structure as regular AE

    But train against corrupted inputs

    Arbitrarily remove a fixed portion of the input components

    Rationale: learning the latent structure is important for rebuilding the missing data

    The hidden layer will learn the structural representation
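    A sketch of the corruption step, mirroring the auto-encoder above: zero out a fixed fraction of components but measure the error against the clean x (zero-masking noise as in [Vincent et al., 2008]; names and the 0.25 fraction are illustrative):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def dae_reconstruction_error(x, W, b, W_prime, b_prime,
                                 corrupt_frac=0.25, rng=np.random.default_rng()):
        x_tilde = x.copy()
        drop = rng.choice(x.size, size=int(corrupt_frac * x.size), replace=False)
        x_tilde[drop] = 0.0                   # arbitrarily remove a fixed portion of components
        y = sigmoid(W @ x_tilde + b)          # encode the corrupted input
        z = sigmoid(W_prime @ y + b_prime)    # decode
        return np.sum((x - z) ** 2)           # error is measured against the clean input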


    Training Stacked DAEs

    Stack the DAEs to form a deep architecture

    Take each DAE's hidden layer

    This hidden layer becomes the next layer of the deep network

    Training is simple. Given training set $\{(x_i, y_i)\}$,

    Initialize each layer (sequentially) in an unsupervised fashion

    Each layer's output is fed as input to the next layer

    Finally, fine-tune the entire architecture with supervised learning using the training set
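    A sketch of the greedy layer-wise procedure; train_dae is a placeholder for a routine that fits one DAE (e.g. by gradient descent on the error above) and returns its learned encoder weights, so everything here is illustrative:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def pretrain_stack(X, hidden_sizes, train_dae):
        # X: (n_samples, d) inputs; hidden_sizes: layer widths, bottom to top.
        params, H = [], X
        for size in hidden_sizes:
            W, b = train_dae(H, size)      # unsupervised: fit a DAE on the current representation
            params.append((W, b))
            H = sigmoid(H @ W.T + b)       # this hidden layer's output feeds the next layer
        return params, H

    # H (the top-level representation) together with the labels y_i would then be used to
    # fine-tune the whole stack with supervised learning (e.g. backpropagation).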


    References

    [Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, Vol. 2, No. 1, 2009.

    [Vincent et al., 2008] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of ICML 2008.