Transcript of: Apprentissage, réseaux de neurones et modèles graphiques (RCP209), Neural Networks and Deep Learning
(cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf)

Page 1:

Apprentissage, réseaux de neurones et modèles graphiques (RCP209)

Neural Networks and Deep Learning

Nicolas Thome, [email protected]

http://cedric.cnam.fr/vertigo/Cours/ml2/

Département Informatique, Conservatoire National des Arts et Métiers (Cnam)

Page 2:

Outline

1 Deep Learning Strengths

2 Deep Learning Weaknesses

3 Deep Learning History

[email protected] RCP209 / Deep Learning 2/ 54

Page 3: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

MLP: Universal Function Approximators

• Neural network with a single hidden layer ⇒ universal approximator
Can represent any function on compact subsets of R^n [Cyb89]

Approximates any continuous function to any desired precision

Example for regression: any function can be interpolated (see the sketch below)
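A minimal sketch of this idea, assuming scikit-learn is available (the 50-unit width, tanh activation, and sin target are illustrative choices, not from the slides):

import numpy as np
from sklearn.neural_network import MLPRegressor

# One hidden layer regressing a nonlinear 1-d function on a compact set.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(2 * X[:, 0])

mlp = MLPRegressor(hidden_layer_sizes=(50,),   # a single hidden layer
                   activation='tanh', max_iter=5000, random_state=0)
mlp.fit(X, y)
print("train MSE:", np.mean((mlp.predict(X) - y) ** 2))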

[email protected] RCP209 / Deep Learning 3/ 54

Page 4: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

MLP: Universal Function Approximators

• Neural network with a single hidden layer ⇒ universal approximator
Can represent any function on compact subsets of R^n [Cyb89]
Example for classification: any decision boundary can be expressed

⇒ very rich modeling capacities

[email protected] RCP209 / Deep Learning 4/ 54

Page 5: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

MLP: Universal Function Approximators

• 2 layers, i.e. one hidden layer, are enough

• The challenge is NOT fitting the training data
Simple models already have very large (infinite) modeling power

• The real challenges: optimization and overfitting

[email protected] RCP209 / Deep Learning 5/ 54

Page 6: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

MLP: Universal Function Approximators

• 2 layers, i.e. one hidden layer, are enough ... in theory
BUT: this may require an exponential number of hidden units [Bar93]

[email protected] RCP209 / Deep Learning 6/ 54

Page 7: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Models: Universal Function Approximators

• Deeper models: fewer units required to represent the desired function
Functions representable compactly with k layers may require exponential size with k − 1 layers [Has89, Ben09]

• Same modeling power, fewer parameters ⇒ better generalization!

[email protected] RCP209 / Deep Learning 7/ 54

Page 8: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Models

Depth improves generalization: multi-digit recognition, from [GBC16]

[email protected] RCP209 / Deep Learning 8/ 54

Page 9: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Local vs Distributed Representations

• Local representations: one neuron ↔ one concept
• Deep Learning ⇒ distributed representations:

Each concept ↔ many neurons, each neuron ↔ many concepts

⇒ Exponentially more efficient than local representations

From [BD11]

[email protected] RCP209 / Deep Learning 9/ 54

Page 10: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Learning & Distributed Representations

• DL architectures: distributed representations shared across classes

Credit: M.A. Ranzato

[email protected] RCP209 / Deep Learning 10/ 54

Page 11: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep ConvNets

• Deep models: hierarchy of sequential layers

• Layers: fully connected, convolution + non-linearity (together forming a convolution layer), pooling
• Supervised training for classification ⇒ Representation Learning

[email protected] RCP209 / Deep Learning 11/ 54

Page 12: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Error Back-Propagation with ConvNets

• Convolution: example for a 1d scalar convolution with mask w = [w_1 w_2 w_3]^T

• Shared weights: simple chain rule application

⇒ Sum the gradients over every region y_k:

∂L/∂w = ∑_{k=1}^{N} (∂L/∂y_k) (∂y_k/∂w), with ∂y_k/∂w = [x_{k−1} x_k x_{k+1}]^T
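A minimal numpy sketch of this shared-weight gradient (illustrative names; 0-based indexing and "valid" positions only, whereas the formula above centers the window on k):

import numpy as np

def conv1d_valid(x, w):
    # y_k = w_1*x_k + w_2*x_{k+1} + w_3*x_{k+2}, "valid" positions only
    N = len(x) - len(w) + 1
    return np.array([x[k:k + len(w)] @ w for k in range(N)])

def conv1d_weight_grad(x, dL_dy):
    # dL/dw = sum_k (dL/dy_k) * (dy_k/dw), with dy_k/dw = the input window at k
    grad = np.zeros(3)
    for k, delta_k in enumerate(dL_dy):
        grad += delta_k * x[k:k + 3]
    return grad

x = np.arange(6.0)
w = np.array([0.5, -1.0, 2.0])
y = conv1d_valid(x, w)
dL_dy = np.ones_like(y)            # pretend upstream gradient
print(conv1d_weight_grad(x, dL_dy))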

[email protected] RCP209 / Deep Learning 12/ 54

Page 13: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Error Back-Propagation with ConvNets

• Pooling: example for a pooling area of size 2L + 1: y_k = f(x_{k−L}, ..., x_{k+L})

∂L/∂x_h = (∂L/∂y_k) (∂y_k/∂x_h) = δ_k (∂y_k/∂x_h), with δ_k = ∂L/∂y_k

[email protected] RCP209 / Deep Learning 13/ 54

Page 14: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Error Back-Propagation for Average pooling

• Average pooling: y_k = (1/N) ∑_{h=k−L}^{k+L} x_h, with N = 2L + 1

∂L/∂x_h = δ_k (∂y_k/∂x_h) = δ_k / N

⇒ The gradient is propagated to each input node, with factor 1/N

[email protected] RCP209 / Deep Learning 14/ 54

Page 15: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Error Back-Propagation for Max pooling

• Max pooling: y_k = max_{h′ ∈ {k−L, ..., k+L}} x_{h′}

∂L/∂x_h = δ_k if x_h = max_{h′ ∈ {k−L, ..., k+L}} x_{h′}, and 0 otherwise

⇒ The gradient is propagated only through the arg max input node (a numpy sketch of both pooling gradients follows below)
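A minimal numpy sketch of the two pooling gradients above, assuming non-overlapping windows of size P for simplicity (the slides use centered windows of size 2L + 1):

import numpy as np

def avg_pool_backward(dL_dy, P):
    # each input in a window receives delta_k / P
    return np.repeat(dL_dy / P, P)

def max_pool_backward(x, dL_dy, P):
    # only the argmax input of each window receives delta_k
    dL_dx = np.zeros_like(x, dtype=float)
    for k, delta_k in enumerate(dL_dy):
        window = x[k * P:(k + 1) * P]
        dL_dx[k * P + np.argmax(window)] = delta_k
    return dL_dx

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
print(avg_pool_backward(np.array([1.0, 1.0, 1.0]), P=2))     # 0.5 everywhere
print(max_pool_backward(x, np.array([1.0, 1.0, 1.0]), P=2))  # 1.0 at each argmax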

[email protected] RCP209 / Deep Learning 15/ 54

Page 16: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

ConvNets & Prior Distribution

• Prior: imposing a distribution on the fully connected parameters
• Weak prior: high entropy (uncertainty); strong prior: low entropy
• Infinitely strong prior: zero probability on some parameters

[email protected] RCP209 / Deep Learning 16/ 54

Page 17: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

ConvNet as Infinitely Strong Prior

• ConvNet ∼ infinitely strong prior on fully connected net weights
• Convolution: local interactions, shared weights ⇒ zero probability elsewhere

[email protected] RCP209 / Deep Learning 17/ 54

Page 18: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

ConvNet for Learning Representation

• ConvNet ∼ infinitely strong prior on fully connected net weights
• N.B.: weights adjusted for classification with back-prop
• Convolution ⇒ supports learning translation-equivariant features
• Pooling ⇒ supports features invariant (stable) w.r.t. local translations

[email protected] RCP209 / Deep Learning 18/ 54

Page 19: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

ConvNet for Learning Representation

• The ConvNet's infinitely strong prior is adapted to data with local interactions, e.g. images, speech, etc.

• Very rich modeling capacities: local interactions become global with depth
• Significantly reduces the # of parameters ⇒ less over-fitting

From [GBC16]

[email protected] RCP209 / Deep Learning 19/ 54

Page 20: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

ConvNet for Learning Compositions

• Conv/Pool hierarchies: feature composition
Depth: gradually increasing complexity, larger spatial extent
Intuitive processing for modeling hierarchical information
Biological foundations: simple cells, complex cells

[email protected] RCP209 / Deep Learning 20/ 54

Page 21: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

ConvNet for Learning Compositions

Credit: M.A. Ranzato

• Hierarchical compositions
Low level: edges, colors
Mid level: corners, parts
Higher levels: objects, scene concepts

• Distributed representations: sharing
Lower levels: shared by many classes
Higher levels: more class-specific

[email protected] RCP209 / Deep Learning 21/ 54

Page 22: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Representation Learning & Deep Learning

• Multi-class classification, K classes: last hidden layer of size L → K outputs

• Classification layer: linear projection + soft-max activation (sketched below)
In the R^L space: linear separation between the classes
Deep Learning (backprop) supports learning representations that gradually project the data into R^L spaces where linear separation becomes possible
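The classification layer itself is only a few lines; a sketch assuming numpy, with illustrative placeholder weights W and b:

import numpy as np

def classify(h, W, b):
    logits = W @ h + b                 # linear projection: R^L -> R^K
    z = logits - logits.max()          # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()    # soft-max: class probabilities
    return p

L, K = 8, 3
rng = np.random.RandomState(0)
p = classify(rng.randn(L), rng.randn(K, L), np.zeros(K))
print(p, p.sum())                      # probabilities summing to 1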

[email protected] RCP209 / Deep Learning 22/ 54

Page 23: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Learning & Manifold Untangling

• DL: gradually projecting the data into R^L spaces where linear separation is possible
• This is the definition of manifold untangling!

• ConvNets make manifold untangling easier!
The Conv/Pool prior brings stability

[email protected] RCP209 / Deep Learning 23/ 54

Page 24: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Manifold Untangling Visualization

• We want to visualize each layer's activations for each class
• High-dimensional visualization? ⇒ Project to lower (e.g. 2d) dimensions

[email protected] RCP209 / Deep Learning 24/ 54

Page 25: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

t-distributed Stochastic Neighbor Embedding (t-SNE)

• t-SNE [vdMH08]: non-linear projection

• Intuitively: close distances in the initial space ⇒ close distances in the projected (2d) space

Distance preservation
Neighborhood preservation, i.e. preservation of small distances

[email protected] RCP209 / Deep Learning 25/ 54

Page 26: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

t-SNE [vdMH08]

• Similarity between points (x_i, x_j) in the initial space, e.g. R^d:

p_ij = exp(−∣∣x_i − x_j∣∣² / 2σ²) / ∑_{k≠l} exp(−∣∣x_k − x_l∣∣² / 2σ²),  P = {p_ij}_{(i,j)}

• Similarity between points (y_i, y_j) in the projected space, e.g. R²:

q_ij = (1 + ∣∣y_i − y_j∣∣²)^{−1} / ∑_{k≠l} (1 + ∣∣y_k − y_l∣∣²)^{−1},  Q = {q_ij}_{(i,j)}

• Loss function: the Kullback-Leibler divergence KL(P ∣∣ Q):

C = KL(P ∣∣ Q) = ∑_i ∑_j p_ij log(p_ij / q_ij)

(a direct implementation of these quantities is sketched below)
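A numpy/scipy sketch of these three quantities, following the slide's simplified joint formulation with a single shared σ (real t-SNE tunes one σ per point via a perplexity parameter):

import numpy as np
from scipy.spatial.distance import pdist, squareform

def affinities_p(X, sigma=1.0):
    d2 = squareform(pdist(X, 'sqeuclidean'))
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)           # exclude i == j pairs, as in sum over k != l
    return P / P.sum()

def affinities_q(Y):
    d2 = squareform(pdist(Y, 'sqeuclidean'))
    Q = 1.0 / (1.0 + d2)               # Student-t kernel with one degree of freedom
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_loss(P, Q, eps=1e-12):
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps)))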

[email protected] RCP209 / Deep Learning 26/ 54

Page 27: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

t-SNE Visualization: MNIST example

• MNIST dataset: 28 × 28 grayscale images of digits
• 10 classes ⇔ digit ∈ {0, ..., 9}
• Input space dimension: 28² = 784
• Projection into a 2d (or 3d) space for visualization

• t-SNE computes the projection by gradient descent:

∂C/∂y_i = 4 ∑_j (p_ij − q_ij)(y_i − y_j)(1 + ∣∣y_i − y_j∣∣²)^{−1}

• Optimization (projection) for a given, closed dataset ⇒ transductive learning (a library call is sketched below)
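In practice one calls an off-the-shelf implementation; a sketch with scikit-learn's TSNE, using the small 8x8 digits dataset as a stand-in for the 28x28 MNIST of the slides:

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 1797 digits, 64-d inputs
Y = TSNE(n_components=2, perplexity=30,
         init='pca', random_state=0).fit_transform(X)
# N.B.: the embedding exists only for this dataset (transductive)

plt.scatter(Y[:, 0], Y[:, 1], c=y, cmap='tab10', s=5)  # color = class ID
plt.title('t-SNE of digits')
plt.show()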

[email protected] RCP209 / Deep Learning 27/ 54

Page 28: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

t-SNE Visualization: MNIST example

• Application of t-SNE on the MNIST test set (10,000 images)
• Color ⇔ class ID

[email protected] RCP209 / Deep Learning 28/ 54

Page 29: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

t-SNE Visualization: MNIST example

• Classes visually appear in the 2d space, BUT they overlap
• How to measure class separability?

Neighborhood Hit [PNML08]: NH = (# points among the k nearest neighbors of the same class) / (# points among the k nearest neighbors); see the sketch below
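A sketch of the Neighborhood Hit, assuming scikit-learn (k = 6 and the two-blob demo data are illustrative choices):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_hit(Y, labels, k=6):
    # k + 1 neighbors, then drop each point itself (it is its own nearest neighbor)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(Y)
    idx = nn.kneighbors(Y, return_distance=False)[:, 1:]
    return np.mean(labels[idx] == labels[:, None])

rng = np.random.RandomState(0)
Y = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 4])   # two separated blobs
labels = np.array([0] * 50 + [1] * 50)
print(neighborhood_hit(Y, labels, k=6))                   # close to 1.0 here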

[email protected] RCP209 / Deep Learning 29/ 54

Page 30: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

t-SNE Visualization: MNIST example

• How to measure class separability?
Fit an ellipse to each class's points
Non-overlapping ellipses ⇒ linear separability

[email protected] RCP209 / Deep Learning 30/ 54

Page 31: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Outline

1 Deep Learning Strengths

2 Deep Learning Weaknesses

3 Deep Learning History

[email protected] RCP209 / Deep Learning 31/ 54

Page 32: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Weaknesses & Drawbacks

Criticisms at two main levels

1 Modeling level: neural networks ⇔ black boxes
2 Training level: ad hoc choices, required expertise, efficiency, lack of guarantees

[email protected] RCP209 / Deep Learning 32/ 54

Page 33: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Black Boxes

• Lack of explainability: why this decision?
Hidden units are not directly interpretable, unlike e.g. decision trees or expert systems

⇒ Challenges: human-machine interaction, failure analysis

[email protected] RCP209 / Deep Learning 33/ 54

Page 34: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Black Boxes

• Lack of confidence estimates (uncertainty): how (un)certain is a decision?
• Uncertainty estimates are often vital, e.g. in medicine or autonomous driving

[email protected] RCP209 / Deep Learning 34/ 54

Page 35: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Black Boxes

• Lack of theory for architecture design
• How many layers, how many neurons?
• Layer types: fully connected, convolution, pooling?
• Trial and error: optimize the architecture on a validation set
⇒ Ad hoc, no theory to guide you

[email protected] RCP209 / Deep Learning 35/ 54

Page 36: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Training Issues

• Optimization: non-convex objective
No guarantee of reaching the global optimum
The solution depends on the initialization

Importance of (random) initialization ⇒ training reproducibility issues
Expertise: ad hoc hyper-parameter tuning (# epochs, decay, etc.)
Costly tuning

[email protected] RCP209 / Deep Learning 36/ 54

Page 37: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Training Issues

• Optimization: stochastic training
Stochastic training for big data ⇒ gradient approximation ⇒ increased tuning difficulty

[email protected] RCP209 / Deep Learning 37/ 54

Page 38: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Training Issues

• Big Data

• Deep models need huge annotated datasets
⇒ Huge models, huge computational demand
⇒ It long remained impossible to train such models with the available resources

[email protected] RCP209 / Deep Learning 38/ 54

Page 39: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Networks: Training Issues

• Generalization: deep models need huge annotated datasets
• On smaller datasets: inferior predictive performance

Small models: not enough expressive power
Large models: overfit
⇒ Performance can fall below handcrafted features

[email protected] RCP209 / Deep Learning 39/ 54

Page 40: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Neural Network Weaknesses: Conclusion

• Deep Learning weaknesses:
Black-box models
Training challenges at many levels

• The balance between these drawbacks and the strengths of NNs has swung back and forth over DL history (next section)

[email protected] RCP209 / Deep Learning 40/ 54

Page 41: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Outline

1 Deep Learning Strengths

2 Deep Learning Weaknesses

3 Deep Learning History

[email protected] RCP209 / Deep Learning 41/ 54

Page 42: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Deep Learning: Trends and methods in the last four decades

Slide credit: https://www.slideshare.net/deview/251-implementing-deep-learning-using-cu-dnn

[email protected] RCP209 / Deep Learning 42/ 54

Page 43: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

1980s: the first Convolutional Neural Networks

• LeNet 5 Model [LBD+89], trained using back-prop

• Input: 32x32 pixel image; the largest character is 20x20
• 2 successive blocks [Convolution + Sigmoid + Pooling (+ sigmoid)]
Cx: convolutional layer, Sx: subsampling layer

• C5: convolution layer, ∼ fully connected
• 2 fully connected layers Fx

(a PyTorch-style sketch of the architecture follows below)
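A LeNet-5-style sketch in PyTorch (an assumption of this rewrite, not the original implementation). Two common simplifications: plain average pooling replaces the trainable "sum × coefficient + bias" subsampling, and C3 is connected to all 6 S2 maps instead of the paper's partial connection table, so parameter counts differ slightly from the slides:

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # C1: 32x32 -> 6 x 28x28
            nn.Tanh(),
            nn.AvgPool2d(2, stride=2),         # S2: -> 6 x 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # C3: -> 16 x 10x10
            nn.Tanh(),
            nn.AvgPool2d(2, stride=2),         # S4: -> 16 x 5x5
            nn.Conv2d(16, 120, kernel_size=5)  # C5: -> 120 x 1x1
        )
        self.classifier = nn.Sequential(
            nn.Tanh(),
            nn.Flatten(),
            nn.Linear(120, 84),                # F6
            nn.Tanh(),
            nn.Linear(84, n_classes)           # output layer
        )

    def forward(self, x):                      # x: (batch, 1, 32, 32)
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 10])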

[email protected] RCP209 / Deep Learning 43/ 54

Page 44: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

C1 Layer

• Convolutional layer with 6 5x5 filters ⇒ 6 feature maps of size 28x28 (no padding)

• # Parameters: 5² per filter + a bias ⇒ (5 ∗ 5 + 1) ∗ 6 = 156
If it were fully connected: (32 ∗ 32 + 1) ∗ (28 ∗ 28) ∗ 6 ∼ 5 × 10⁶ parameters!

[email protected] RCP209 / Deep Learning 44/ 54

Page 45: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

S2 Layer

• Subsampling layer = pooling layer
• Pooling area: 2x2 in C1
• Pooling stride: 2 ⇒ 6 feature maps of size 14x14
• Pooling type: sum, multiplied by a trainable parameter, plus a bias
⇒ 2 parameters per channel

• Total # Parameters: 2 ∗ 6 = 12

[email protected] RCP209 / Deep Learning 45/ 54

Page 46: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

C3 Layer: Convolutional

• C3: 16 filters ⇒ 16 feature maps of size 10x10 (no padding)

• 5x5 filters connected to a subset of the S2 maps
⇒ maps 0-5 see 3 S2 maps, maps 6-14 see 4, map 15 sees all 6

• # Parameters: (5 ∗ 5 ∗ 3 + 1) ∗ 6 + (5 ∗ 5 ∗ 4 + 1) ∗ 9 + (5 ∗ 5 ∗ 6 + 1) = 456 + 909 + 151 = 1516

[email protected] RCP209 / Deep Learning 46/ 54

Page 47: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

S4 Layer

• Subsampling layer = pooling layer
• Pooling area: 2x2 in C3
• Pooling stride: 2 ⇒ 16 feature maps of size 5x5
• Pooling type: sum, multiplied by a trainable parameter, plus a bias
⇒ 2 parameters per channel

• Total # Parameters: 2 ∗ 16 = 32

[email protected] RCP209 / Deep Learning 47/ 54

Page 48: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

C5 Layer: Convolutional layer

• 120 5x5x16 filters ⇒ spanning the whole depth of S4 (unlike C3)
• Each map in S4 is 5x5 ⇒ a single value per C5 map
• C5: 120 feature maps of size 1x1 (a vector of size 120)
⇒ spatial information is lost; ∼ a fully connected layer

• Total # Parameters: (5 ∗ 5 ∗ 16 + 1) ∗ 120 = 48120

[email protected] RCP209 / Deep Learning 48/ 54

Page 49: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

F6 Layer: Fully Connected layer

• 84 fully connected units
• # Parameters: 84 ∗ (120 + 1) = 10164

F7 Layer (output): Fully Connected layer

• 10 (= # classes) fully connected units
• # Parameters: 10 ∗ (84 + 1) = 850

(an arithmetic check of all per-layer counts follows below)
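A quick arithmetic check of the per-layer counts from the preceding slides (plain Python):

c1 = (5 * 5 + 1) * 6                                    # 156
s2 = 2 * 6                                              # 12 (coeff + bias per map)
c3 = (5*5*3 + 1) * 6 + (5*5*4 + 1) * 9 + (5*5*6 + 1)    # 1516 (partial connectivity)
s4 = 2 * 16                                             # 32
c5 = (5 * 5 * 16 + 1) * 120                             # 48120
f6 = 84 * (120 + 1)                                     # 10164
out = 10 * (84 + 1)                                     # 850
print(c1, s2, c3, s4, c5, f6, out, "total:", c1 + s2 + c3 + s4 + c5 + f6 + out)
# -> 156 12 1516 32 48120 10164 850 total: 60850  (~60,000, matching the next slide)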

[email protected] RCP209 / Deep Learning 49/ 54

Page 50: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

Case Study: LeNet 5 Model

• Evaluation on MNIST
• Total # parameters ∼ 60,000

60,000 original training samples: test error 0.95%
540,000 artificial distortions + the 60,000 originals: test error 0.8%

• Successful deployment for postal code reading in the US

[email protected] RCP209 / Deep Learning 50/ 54

Page 51: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

LeNet 5 Model: Manifold Untangling

(Figure: input space vs. latent space)

[email protected] RCP209 / Deep Learning 51/ 54

Page 52: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

LeNet 5 Model: Manifold Untangling

(Figure: MLP latent space vs. LeNet latent space)

[email protected] RCP209 / Deep Learning 52/ 54

Page 53: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

1980s: a Deep Learning Success

• ConvNets trainable with backprop, able to untangle class manifolds
• Very good performance on digit classification, with industrial transfer
• And yet, the 2nd winter of Deep Learning followed ⇒ coming up next!

Source: Amazon

[email protected] RCP209 / Deep Learning 53/ 54

Page 54: Apprentissage,réseauxdeneuronesetmodèlesgraphiques (RCP209 ...cedric.cnam.fr/vertigo/Cours/ml2/docs/coursDeep3.pdf · Deep Learning (backprop) supports learning representations

DL Strengths DL Weaknesses History

References I

[Bar93] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory 39 (1993), no. 3, 930–945.

[BD11] Yoshua Bengio and Olivier Delalleau, On the expressive power of deep architectures, Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT'11), Springer-Verlag, Berlin, Heidelberg, 2011, pp. 18–36.

[Ben09] Yoshua Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning 2 (2009), no. 1, 1–127.

[Cyb89] George Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (1989), no. 4, 303–314.

[GBC16] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org.

[Has89] Johan Håstad, Almost optimal lower bounds for small depth circuits, Randomness and Computation, JAI Press, 1989, pp. 6–20.

[LBD+89] Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (1989), no. 4, 541–551.

[PNML08] Fernando Vieira Paulovich, Luis Gustavo Nonato, Rosane Minghim, and Haim Levkowitz, Least square projection: A fast high-precision multidimensional projection technique and its application to document mapping, IEEE Transactions on Visualization and Computer Graphics 14 (2008), no. 3, 564–575.

[vdMH08] Laurens van der Maaten and Geoffrey E. Hinton, Visualizing high-dimensional data using t-SNE, Journal of Machine Learning Research 9 (2008), 2579–2605.

[email protected] RCP209 / Deep Learning 54/ 54