Deep Learning


Transcript of Deep Learning

Page 1: Deep  Learning

Deep Learning

Page 2: Deep  Learning

Why?

Page 3: Deep  Learning

[Figure: word error rate of speech recognition systems by year, 1992 to 2012, on a logarithmic scale from 1.0% to 100.0%, for read and conversational speech]

Source: Huang et al., Communications ACM 01/2014

Page 4: Deep  Learning

[Figure: Large Scale Visual Recognition Challenge 2012: error rate (0% to 35% scale) of the entries ISI, OXFORD_VGG, XRCE/INRIA, University of Amsterdam, LEAR-XRCE, and SuperVision]

Page 5: Deep  Learning
Page 6: Deep  Learning

• the 2013 International Conference on Learning Representations
• the 2013 ICASSP special session on New Types of Deep Neural Network Learning for Speech Recognition and Related Applications
• the 2013 ICML Workshop for Audio, Speech, and Language Processing
• the 2013 ICML Workshop on Representation Learning Challenges
• the 2012, 2011, and 2010 NIPS Workshops on Deep Learning and Unsupervised Feature Learning
• the 2012 ICML Workshop on Representation Learning
• the 2011 ICML Workshop on Learning Architectures, Representations, and Optimization for Speech and Visual Information Processing
• the 2009 ICML Workshop on Learning Feature Hierarchies
• the 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications
• the 2012 ICASSP deep learning tutorial
• the special section on Deep Learning for Speech and Language Processing in IEEE Trans. Audio, Speech, and Language Processing (January 2012)
• the special issue on Learning Deep Architectures in IEEE Trans. Pattern Analysis and Machine Intelligence (2013)

Page 7: Deep  Learning

“A fast learning algorithm for deep belief nets”
-- Hinton et al., 2006

“Reducing the dimensionality of data with neural networks”
-- Hinton & Salakhutdinov, 2006

Geoffrey Hinton
University of Toronto

Page 8: Deep  Learning

How?

Page 9: Deep  Learning

Shallow learning
• SVM
• Linear & Kernel Regression
• Hidden Markov Models (HMM)
• Gaussian Mixture Models (GMM)
• Single hidden layer MLP
• ...

Limited modeling capability of concepts
Cannot make use of unlabeled data

Page 10: Deep  Learning

Neural Networks
• Machine Learning
• Knowledge from high-dimensional data
• Classification
• Input: features of the data
• Supervised vs. unsupervised
• Labeled data
• Neurons

Page 11: Deep  Learning

Multi Layer Perceptron

[Figure: feed-forward network with input units i, hidden units j, and output units k; weights v_ij connect input to hidden, weights w_jk connect hidden to output; an input vector [X1, X2, X3] is mapped to a 1-of-N coded output [Y1, Y2]]

• Multiple layers
• Feed forward
• Connected weights
• 1-of-N output

Each unit's net input is a weighted sum of its inputs: $z_j = \sum_i x_i w_{ij}$
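To make the forward computation concrete, here is a minimal NumPy sketch of one pass through such a network. The sigmoid activation, the 1-of-N style output, and the weight names v (input to hidden) and w (hidden to output) follow the figure; the layer sizes, random initialization, and example input are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation used throughout these slides
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Weights as in the figure: v connects input->hidden, w connects hidden->output
n_in, n_hidden, n_out = 3, 4, 2          # [X1, X2, X3] -> [Y1, Y2]
v = rng.normal(scale=0.1, size=(n_in, n_hidden))
w = rng.normal(scale=0.1, size=(n_hidden, n_out))

def forward(x):
    # z_j = sum_i x_i * v_ij, squashed by the sigmoid
    h = sigmoid(x @ v)
    # Output layer: the same weighted sum, giving 1-of-N scores
    y = sigmoid(h @ w)
    return h, y

x = np.array([0.2, 0.7, 0.1])
h, y = forward(x)
print("hidden activations:", h)
print("output (1-of-N scores):", y)
```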

Page 12: Deep  Learning

Backpropagation

[Figure: the same three-layer network with units i, j, k and weights v_ij (input to hidden) and w_jk (hidden to output)]

• Minimize the error of the calculated output
• Adjust weights by gradient descent (sketched below)
• Procedure: forward phase, then backpropagation of errors
• For each sample, multiple epochs
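A minimal sketch of one backpropagation step for the same two-weight-matrix network, assuming a squared-error loss, sigmoid units, and per-sample updates; the learning rate and toy data are illustrative, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, v, w, lr=0.1):
    # Forward phase
    h = sigmoid(x @ v)            # hidden activations
    y = sigmoid(h @ w)            # network output
    # Backward phase: propagate the output error back through the layers
    # (squared-error loss, sigmoid derivative s * (1 - s))
    delta_out = (y - target) * y * (1.0 - y)
    delta_hid = (delta_out @ w.T) * h * (1.0 - h)
    # Gradient descent: adjust the weights against the error gradient
    w -= lr * np.outer(h, delta_out)
    v -= lr * np.outer(x, delta_hid)
    return 0.5 * np.sum((y - target) ** 2)   # current sample error

# One sample, repeated for several epochs (as on the slide)
rng = np.random.default_rng(0)
v = rng.normal(scale=0.1, size=(3, 4))
w = rng.normal(scale=0.1, size=(4, 2))
x, t = np.array([0.2, 0.7, 0.1]), np.array([1.0, 0.0])
for epoch in range(100):
    err = backprop_step(x, t, v, w)
print("final error:", err)
```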

Page 13: Deep  Learning

Best Practice
• Normalization: prevents very high weights and oscillation
• Overfitting / generalisation: validation set, early stopping
• Mini-batch learning: update weights with multiple input vectors combined (see the sketch below)
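For the mini-batch bullet above, a short sketch of the idea: the gradients of several input vectors are averaged and the weights are updated once per batch. The batch size and data here are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_update(X, T, v, w, lr=0.1):
    # Forward pass for the whole batch at once (rows = input vectors)
    H = sigmoid(X @ v)
    Y = sigmoid(H @ w)
    D_out = (Y - T) * Y * (1.0 - Y)
    D_hid = (D_out @ w.T) * H * (1.0 - H)
    # Average the gradients over the batch, then do a single weight update
    n = X.shape[0]
    w -= lr * (H.T @ D_out) / n
    v -= lr * (X.T @ D_hid) / n

rng = np.random.default_rng(1)
v = rng.normal(scale=0.1, size=(3, 4))
w = rng.normal(scale=0.1, size=(4, 2))
X = rng.random((8, 3))                      # mini-batch of 8 input vectors
T = np.tile([1.0, 0.0], (8, 1))             # their targets
minibatch_update(X, T, v, w)
```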

Page 14: Deep  Learning

Problems with Backpropagation
• Multiple hidden layers
• Gets stuck in local optima (start weights from random positions)
• Slow convergence to the optimum; a large training set is needed
• Only uses labeled data, but most data is unlabeled

Generative Approach

Page 15: Deep  Learning

Restricted Boltzmann Machines

[Figure: bipartite graph of a visible layer (units i) and a hidden layer (units j), connected by weights w_ij]

$p(h_j = 1) = \frac{1}{1 + e^{-\sum_{i \in \mathrm{vis}} v_i w_{ij}}}$

• Unsupervised: find complex regularities in the training data
• Bipartite graph: visible and hidden layer
• Binary stochastic units: on/off with a probability
• One iteration: update the hidden units, then reconstruct the visible units (see the sketch below)
• Maximum likelihood of the training data
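A small NumPy sketch of the binary stochastic units described above: each hidden unit switches on with the probability given by the formula, and the visible units are reconstructed the same way through the transposed weights. The layer sizes and random data are illustrative, and bias terms are omitted as in the slide's formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # weights w_ij

def sample_hidden(v):
    # p(h_j = 1) = sigmoid(sum_i v_i w_ij); units switch on/off stochastically
    p_h = sigmoid(v @ W)
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

def sample_visible(h):
    # Reconstruction uses the same (symmetric) weights in the other direction
    p_v = sigmoid(h @ W.T)
    return (rng.random(p_v.shape) < p_v).astype(float), p_v

v0 = np.array([1., 0., 1., 1., 0., 0.])    # a binary training vector
h0, _ = sample_hidden(v0)                  # update the hidden units
v1, _ = sample_visible(h0)                 # reconstruct the visible units
print("hidden sample:", h0, "reconstruction:", v1)
```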

Page 16: Deep  Learning

Restricted Boltzmann Machines
• Training goal: the most probable reproduction of the input
• Unsupervised data: find latent factors of the data set
• Adjust weights to get the maximum probability of the input data

[Figure: the same visible/hidden bipartite graph with weights w_ij and $p(h_j = 1) = \frac{1}{1 + e^{-\sum_{i \in \mathrm{vis}} v_i w_{ij}}}$]

Page 17: Deep  Learning

Training: Contrastive Divergence

[Figure: visible units i and hidden units j at t = 0 (data) and t = 1 (reconstruction)]

• Start with a training vector on the visible units.
• Update all the hidden units in parallel.
• Update all the visible units in parallel to get a “reconstruction”.
• Update the hidden units again (see the CD-1 sketch below).
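A sketch of one CD-1 weight update following the four steps above: the positive statistics come from the data phase (t = 0), the negative statistics from the reconstruction phase (t = 1), and the weights are nudged toward the difference. Learning rate and sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

def cd1_update(v0, W, lr=0.1):
    # t = 0: start with a training vector on the visible units,
    # update all hidden units in parallel
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # t = 1: update all visible units in parallel -> "reconstruction",
    # then update the hidden units again
    p_v1 = sigmoid(h0 @ W.T)
    p_h1 = sigmoid(p_v1 @ W)
    # Contrastive divergence: <v h>_data - <v h>_reconstruction
    positive = np.outer(v0, p_h0)
    negative = np.outer(p_v1, p_h1)
    W += lr * (positive - negative)
    return W

W = rng.normal(scale=0.1, size=(6, 3))
v0 = np.array([1., 0., 1., 1., 0., 0.])
W = cd1_update(v0, W)
```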

Page 18: Deep  Learning

Example: Handwritten 2s

[Figure: an RBM with 50 binary neurons that learn features, connected to a 16 x 16 pixel image. When the visible units show the data (reality), increment the weights between an active pixel and an active feature; when they show the reconstruction, decrement the weights between an active pixel and an active feature.]

Page 19: Deep  Learning
Page 20: Deep  Learning
Page 21: Deep  Learning
Page 22: Deep  Learning
Page 23: Deep  Learning
Page 24: Deep  Learning

The final 50 x 256 weights: Each unit grabs a different feature

Page 25: Deep  Learning

Example: Reconstruction

[Figure: data images and their reconstructions from the activated binary features. A new test image from the digit class that the model was trained on is reconstructed faithfully; for an image from an unfamiliar digit class, the network tries to see every image as a 2.]

Page 26: Deep  Learning

Deep Architecture
• Backpropagation, RBM as building blocks
• Multiple hidden layers
• Motivation (why go deep?)
• Approximate complex decision boundaries
• Fewer computational units for the same functional mapping
• Hierarchical learning
• Increasingly complex features
• Works well in different domains (vision, audio, ...)

Page 27: Deep  Learning

Hierarchical Learning
• Natural progression from low-level to high-level structure, as seen in natural complexity
• Easier to monitor what is being learnt and to guide the machine to better subspaces

Page 28: Deep  Learning

Stacked RBMs

[Figure: an RBM between v and h1 with weights W1, and an RBM between h1 and h2 with weights W2. Train the first RBM (v, h1) first; then copy the binary state of h1 for each v and train the second RBM (h1, h2) on it; finally compose the two RBM models to make a single DBN model.]

• First learn one layer at a time by stacking RBMs (see the sketch below).
• Treat this as “pre-training” that finds a good initial set of weights which can then be fine-tuned by a local search procedure.
• Backpropagation can be used to fine-tune the model to be better at discrimination.
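A minimal sketch of the greedy layer-by-layer procedure, assuming the tiny CD-1 trainer below stands in for whatever RBM trainer is actually used: train the first RBM on the data, copy its hidden activations as the data for the second RBM, and keep the learned weight matrices W1 and W2 as the pre-trained DBN weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

def train_rbm(data, n_hidden, epochs=10, lr=0.1):
    """Very small CD-1 trainer; returns the learned weight matrix."""
    W = rng.normal(scale=0.1, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
            p_v1 = sigmoid(h0 @ W.T)
            p_h1 = sigmoid(p_v1 @ W)
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    return W

data = (rng.random((20, 16)) > 0.5).astype(float)   # toy binary training set

# Train this RBM first ...
W1 = train_rbm(data, n_hidden=8)
# ... then copy the hidden state for each v and train the next RBM on it
h1 = sigmoid(data @ W1)
W2 = train_rbm(h1, n_hidden=4)

# W1, W2 now form the pre-trained DBN; backpropagation can fine-tune them.
```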

Page 29: Deep  Learning

Uses: Dimensionality reduction

Page 30: Deep  Learning

Dimensionality reduction
• Use a stacked RBM as a deep auto-encoder
1. Train the RBM with images as input & output
2. Limit one layer to few dimensions
• Information has to pass through the middle layer (see the sketch below)
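As a sketch of the bottleneck idea, assuming randomly initialized stand-ins for the pre-trained RBM weights: the encoder squeezes a 625-pixel image down to a 30-dimensional code (the sizes of the face example on the next slide), and the decoder reconstructs the input from that code through the transposed weights, so all information has to pass through the middle layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Stand-ins for pre-trained RBM weights: 625-pixel image -> 100 -> 30-d code
W1 = rng.normal(scale=0.1, size=(625, 100))
W2 = rng.normal(scale=0.1, size=(100, 30))

def encode(x):
    # Encoder: squeeze the input down to the narrow middle layer
    return sigmoid(sigmoid(x @ W1) @ W2)

def decode(code):
    # Decoder: the unrolled stack with transposed weights reconstructs the input
    return sigmoid(sigmoid(code @ W2.T) @ W1.T)

image = rng.random(625)            # a flattened 25 x 25 image
code = encode(image)               # 30 numbers: everything must fit in here
reconstruction = decode(code)
print(code.shape, reconstruction.shape)
```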

Page 31: Deep  Learning

Dimensionality reduction

[Figure: Olivetti face data, 25 x 25 pixel images reconstructed from 30 dimensions (625 → 30); rows show the original images, the deep RBM auto-encoder reconstructions, and the PCA reconstructions]

Page 32: Deep  Learning

Dimensionality reduction

[Figure: 804’414 Reuters news stories reduced to 2 dimensions, visualized with PCA and with the deep RBM auto-encoder]

Page 33: Deep  Learning

Uses: Classification

Page 34: Deep  Learning

Unlabeled data
Unlabeled data is readily available.
Example: images from the web
1. Download 10’000’000 images
2. Train a 9-layer DNN
3. Concepts are formed by the DNN
70% better than the previous state of the art

Building High-level Features Using Large Scale Unsupervised Learning
Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng

Page 35: Deep  Learning

Uses: AI

Page 36: Deep  Learning

Artificial intelligence

Enduro, Atari 2600
Expert player: 368 points. Deep Learning: 661 points.

Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

Page 37: Deep  Learning

Uses: Generative (Demo)

Page 38: Deep  Learning

How to use it

Page 39: Deep  Learning

How to use it
• Home page of Geoffrey Hinton: https://www.cs.toronto.edu/~hinton/
• Portal: http://deeplearning.net/
• Accord.NET: http://accord-framework.net/