
Advanced Machine Learning / Deep Belief Networks
Daniel Ulbricht

Agenda

● Short History in Machine Learning
● What's Deep Learning?
● Derive Learning in General
● Energy Based Models / Restricted Boltzmann Machines
● Bring things together
● You will implement a core algorithm to apply learning

History: 1st wave

"The Perceptron"
Frank Rosenblatt introduced the Perceptron in 1958

● Only simple, linearly separable problems could be solved; problems like XOR could not

History: 2nd wave

"Backpropagation"

Developed by Paul Werbos in 1974

● Complex non-linear problems could be solved

History: 3rd wave

"Deep Belief Networks"

Developed mainly in 2006 by Geoff Hinton

The "magic" behind them will be explained in this talk

History: Deep Learning

● An automatic way to learn representations (descriptors) from given data

● Attempt to learn multiple levels of representation of increasing complexity

(Slide borrowed from Andrew Ng)

History: Deep Learning

● Backpropagation is already an attempt to perform Deep Learning

● But there are some problems
○ The gradient gets progressively diluted (decreasing update strength in the lower layers)
○ Initialization of the weights
○ How to label all the given data

Machine Learning in General

Goal: Find weights W that maximize the probability of a certain output given some input vectors.

Maximize: P(output | input, W)

(Diagram: input vector -> weights W -> output)

Machine Learning in General

Learning can be performed using:
● Gradient ascent on: log P
● Gradient descent on: -log P

From optimization theory we know many downhill optimization algorithms:
● (Stochastic) Gradient Descent
● Conjugate Gradient
● Dogleg

Maximize the log-likelihood of the system, i.e. the average log-likelihood per pattern:

    \frac{1}{N} \sum_n \log P(v^{(n)}) = \frac{1}{N} \sum_n \log \sum_h e^{-E(v^{(n)},h)} - \log \sum_{v,h} e^{-E(v,h)}

The first term is the log-likelihood of the data, the second is the log of the normalization term.
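As a minimal, fully runnable toy in Octave (a single Bernoulli coin model, not an RBM, chosen only to show what "gradient ascent on log P" means in practice):

    % Toy example: gradient ascent on the log-likelihood of a biased coin,
    % parameterised as p = sigmoid(w).  Not an RBM, just "ascent on log P".
    data = [1 1 1 0 1 0 1 1];           % observed coin flips
    eta  = 0.1;                         % learning rate
    w    = 0;                           % initial weight
    sigm = @(x) 1 ./ (1 + exp(-x));
    for it = 1:500
      p = sigm(w);
      g = sum(data - p);                % d/dw of sum_n log P(x_n | w)
      w = w + eta * g;                  % gradient ascent on log P
    end                                 % (descent on -log P is equivalent)
    fprintf('learned p = %.3f, empirical mean = %.3f\n', sigm(w), mean(data));

The learned p converges to the empirical mean of the data, which is exactly the maximum-likelihood solution.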

Rules for Gradient Computation

Sum Rule:      P(v) = \sum_h P(v,h)

Product Rule:  P(v,h) = P(h \mid v) \, P(v)

Rule 1:        \frac{\partial}{\partial \theta} \log f(\theta) = \frac{1}{f(\theta)} \frac{\partial f(\theta)}{\partial \theta}

Rule 2:        \frac{\partial}{\partial \theta} e^{f(\theta)} = e^{f(\theta)} \frac{\partial f(\theta)}{\partial \theta}

Gradient of first part

Gradient of the first part (the data term), using the rules above:

    \frac{\partial}{\partial \theta} \log \sum_h e^{-E(v,h)}
      = \frac{1}{\sum_h e^{-E(v,h)}} \sum_h \frac{\partial}{\partial \theta} e^{-E(v,h)}                              (Rule 1)
      = \sum_h \frac{e^{-E(v,h)}}{\sum_{h'} e^{-E(v,h')}} \left( -\frac{\partial E(v,h)}{\partial \theta} \right)     (Rule 2, reorder)
      = -\sum_h P(h \mid v) \, \frac{\partial E(v,h)}{\partial \theta}                                                (Sum and Product Rules)

A sum over the posterior ;-)

Gradient of second part

Gradient of the second part (the normalization term):

    \frac{\partial}{\partial \theta} \log \sum_{v,h} e^{-E(v,h)}
      = -\sum_{v,h} P(v,h) \, \frac{\partial E(v,h)}{\partial \theta}

A sum over the joint

Full Gradient

Full Gradient:

    \frac{\partial \log P(v)}{\partial \theta}
      = -\sum_h P(h \mid v) \, \frac{\partial E(v,h)}{\partial \theta}
        + \sum_{v',h} P(v',h) \, \frac{\partial E(v',h)}{\partial \theta}

Two averages over the same term. Therefore we can write:

    \frac{\partial \log P(v)}{\partial \theta}
      = \left\langle -\frac{\partial E}{\partial \theta} \right\rangle_{P(h \mid v)}
        - \left\langle -\frac{\partial E}{\partial \theta} \right\rangle_{P(v,h)}

Hebbian / positive phase: the average over the posterior P(h | v)
Anti-Hebbian / negative phase: the average over the joint P(v,h)
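As a toy sanity check of the two-average formula, the sketch below enumerates every state of a tiny model with binary units and a made-up bilinear energy E(v,h) = -v*W*h' (no biases; everything here is my own choice for illustration) and computes both phases by brute force:

    % Brute-force check: gradient = posterior average - joint average
    nv = 2; nh = 2;
    W  = randn(nv, nh);
    V  = dec2bin(0:2^nv - 1) - '0';          % all visible configurations
    H  = dec2bin(0:2^nh - 1) - '0';          % all hidden configurations
    E  = @(v, h) -v * W * h';                % energy of one configuration

    p = zeros(2^nv, 2^nh);                   % unnormalised joint
    for i = 1:2^nv
      for j = 1:2^nh
        p(i, j) = exp(-E(V(i,:), H(j,:)));
      end
    end
    Z = sum(p(:));  P = p / Z;               % partition function and joint P(v,h)

    v0   = [1 0];                            % one observed "data" vector
    row  = find(ismember(V, v0, 'rows'));
    post = P(row, :) / sum(P(row, :));       % posterior P(h | v0)

    pos  = v0' * (post * H);                 % Hebbian:      < v_i h_j > over the posterior
    neg  = zeros(nv, nh);                    % anti-Hebbian: < v_i h_j > over the joint
    for i = 1:2^nv
      neg = neg + V(i,:)' * (P(i,:) * H);
    end
    disp(pos - neg);                         % = d log P(v0) / d W, since -dE/dW_ij = v_i h_j

For this energy, -dE/dW_ij is simply v_i h_j, so the positive and negative phases are the correlations that reappear later in the RBM update rule.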

Gradient in Sigmoid Belief Nets

Apply this knowledge to ordinary sigmoid belief nets (as used with backpropagation).

The joint is automatically normalized, because it is a product of normalized conditional distributions:

    Z = \sum_{v,h} e^{-E(v,h)} = 1

This makes the second gradient term (the average over the joint) vanish:

    \frac{\partial \log Z}{\partial \theta} = 0

Full gradient: only the Hebbian / positive-phase term remains, which is the well-known delta rule:

    \Delta w_{ij} \propto (t_j - y_j) \, x_i

Energy Based Models (EBM)

Energy Based Probabilistic Models define a probability distribution as follows:

    P(x) = \frac{e^{-E(x)}}{Z},    Z = \sum_x e^{-E(x)}    (partition function / normalization term)

● High probability -> low energy
● Low probability -> high energy
-> Minimize the energy
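A tiny numeric illustration in Octave (the four states and their energies are made up):

    % Toy energy-based model over four discrete states:
    % low energy <-> high probability.
    E = [0.5 2.0 1.0 3.0];          % energies of the four states
    Z = sum(exp(-E));               % partition function (normalization term)
    P = exp(-E) / Z;                % Boltzmann distribution
    disp([E; P]);                   % lowest-energy state gets the highest probability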

Energy Based Models with Hidden Units

In reality we can't observe the full state of our data and/or we are not aware of indirect influences. Therefore we add hidden units to increase the expressive power of the model.
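In formulas (standard EBM notation, consistent with the partition function above), a visible vector v is then scored by summing out the hidden units:

    P(v) = \sum_h P(v,h) = \frac{\sum_h e^{-E(v,h)}}{Z},
    \qquad Z = \sum_{v,h} e^{-E(v,h)}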

Restricted Boltzmann Machine

A fancy name for a simple bidirectional graph:
● No connections inside the same layer
● No loops
● The energy function is used to perform transitions
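For reference, the standard RBM energy function (not spelled out on this slide; a_i and b_j are the visible and hidden biases that reappear later in the update rules):

    E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j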

Alternate Gibbs Sampling

Computing the average over the posterior and the average over the joint exactly is very expensive.

To overcome this, Gibbs sampling is used.

Gibbs sampling inside an energy-based model leads to the simple sigmoid function.

The proof is easy to do but would take too long here: use the normal Gibbs algorithm and plug in the energy term for the distribution. You can find it on my webpage.
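A sketch of that proof, assuming the standard RBM energy above: a single hidden unit has only two states, and normalising over them gives exactly the sigmoid:

    P(h_j = 1 \mid v) = \frac{e^{\,b_j + \sum_i v_i w_{ij}}}{1 + e^{\,b_j + \sum_i v_i w_{ij}}}
                      = \sigma\left( b_j + \sum_i v_i w_{ij} \right)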

Alternate Gibbs Sampling

Alternate Gibbs Sampling:

● Sample up (visible to hidden)
● Sample down (hidden to visible)
● Continue...

Alternate Gibbs Sampling

● Running it for infinitely many iterations would give the exact gradient (cf. Markov Chain Monte Carlo)

● Surprisingly, even a single iteration works very well in practice
○ Geoff Hinton tried this in 2006 and recognized that the system converges well even with a single iteration
○ This is called Contrastive Divergence

Alternate Gibbs Sampling

Alternate Gibbs Sampling (one contrastive-divergence step):

● Start with a training vector
● Sample up (visible to hidden)   -> Hebbian statistics
● Sample down (hidden to visible)
● Sample up again                 -> anti-Hebbian statistics

Bring things together

For simplification we use binary input and output units from now on:
● The terms get much easier to compute
● It's also the common choice in practical applications

Bring things together

● Hebbian part (up step): sum over all visible units, each multiplied by the corresponding hidden-unit weight, plus the bias b_j of the hidden unit, passed through the sigmoid:

    P(h_j = 1 \mid v) = \sigma\left( b_j + \sum_i v_i w_{ij} \right)

We can use the sigmoid function because Gibbs sampling inside an EBM is sigmoid. The output is then made stochastic (sampled as a binary state), which simplifies the next steps.

Bring things together

● Hebbian part (down step): same as the up step, only using a different bias (the bias a_i of the visible unit):

    P(v_i = 1 \mid h) = \sigma\left( a_i + \sum_j h_j w_{ij} \right)

Bring things together

● Anti-Hebbian part (up step): the same computation again, but starting from the reconstructed visible vector, and instead of sampling a binary output the probability itself is used:

    P(h_j = 1 \mid v_{\mathrm{recon}}) = \sigma\left( b_j + \sum_i v_{\mathrm{recon},i} \, w_{ij} \right)

Bring things together

● Full gradient: the difference between the Hebbian correlations (average over the posterior, measured on the data) and the anti-Hebbian correlations (average over the joint, approximated from the reconstruction):

    \Delta w_{ij} = \eta \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{recon}} \right)

Don't forget the biases:

    \Delta a_i = \eta \left( \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{recon}} \right)
    \Delta b_j = \eta \left( \langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{recon}} \right)
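A compact Octave sketch of one CD-1 update implementing exactly these steps. All variable names and sizes below are my own choices, not taken from the exercise files:

    % Toy setup plus one CD-1 update for a binary RBM.
    nv = 6; nh = 3; N = 20; eta = 0.1;
    data = double(rand(N, nv) > 0.5);        % fake binary training vectors (N x nv)
    W = 0.01 * randn(nv, nh);                % weights
    a = zeros(1, nv);  b = zeros(1, nh);     % visible and hidden biases
    sigm = @(x) 1 ./ (1 + exp(-x));

    % Hebbian part, up step: P(h = 1 | v), then make the output stochastic
    ph   = sigm(data * W + b);
    h    = double(ph > rand(size(ph)));

    % Down step: reconstruct the visible units, using the visible bias a
    pv   = sigm(h * W' + a);
    vrec = double(pv > rand(size(pv)));

    % Anti-Hebbian part, up step: keep the probabilities, no new sample
    ph2  = sigm(vrec * W + b);

    % Full gradient: Hebbian minus anti-Hebbian correlations, plus the biases
    W = W + eta * (data' * ph - vrec' * ph2) / N;
    a = a + eta * mean(data - vrec);
    b = b + eta * mean(ph - ph2);

Using the probabilities ph for the positive correlations (rather than the sampled h) is a common noise-reducing choice; the sampled states still drive the reconstruction.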

So Far We Have

● The knowledge to train a Restricted Boltzmann Machine

● No need for labels -> our "labels" are the equilibrium level of the energy function

Open question:
● How to perform Deep Learning without the factorial behaviour

Stacking RBM's

To perform Deep Learning we stack multiple RBM's, but train them layer by layer:

● First RBM: Input <-> Hidden 1, weights W1
● Once trained, W1 is fixed (we don't update it anymore)
● Second RBM: Hidden 1 <-> Hidden 2, weights W2, trained on the activities produced by the first layer
● ...and so on for further layers

A sketch of this scheme follows below.
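A rough Octave sketch of the greedy layer-wise scheme. train_rbm is a hypothetical helper (it runs CD-1 updates like the ones above and returns the learned weights and biases); the layer sizes and the variable data are assumptions:

    % Greedy layer-wise stacking sketch (train_rbm is hypothetical).
    sigm = @(x) 1 ./ (1 + exp(-x));
    [W1, a1, b1] = train_rbm(data, 100);     % first RBM, trained on the raw input
    h1 = sigm(data * W1 + b1);               % propagate data through the now-fixed first layer
    [W2, a2, b2] = train_rbm(h1, 50);        % second RBM, trained on those hidden activities
    % W1 is not updated anymore while W2 is being learned.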

Now we have

● A network which learns
○ Without labels
■ The labels are the equilibrium level of the energy term
○ Every layer learns a significant amount
■ Because each layer is trained independently of the others

Get hands on:

Download the example Matlab/Octave files from my homepage.

You will recognize that calling runRBM:
● will do nothing so far
● it is missing the implementation of "Contrastive Divergence"

Try to implement "Contrastive Divergence" yourself (a rough skeleton follows below):
○ The solution can also be found on my homepage
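As a starting point, the missing piece is essentially a loop around the CD-1 update sketched earlier. All names below are assumptions, not necessarily what runRBM expects, so check them against the downloaded files:

    % Skeleton of the missing contrastive-divergence training loop.
    for epoch = 1:maxEpochs
      for batch = 1:numBatches
        v = batchData{batch};                        % one mini-batch of binary vectors
        [W, a, b] = cd1_update(v, W, a, b, eta);     % hypothetical helper: one CD-1 update,
      end                                            % as sketched in "Bring things together"
    end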

Thank you for Listening