Transcript of slides: web.eng.tau.ac.il/deep_learn/wp-content/uploads/2016/11/...
BACKGROUND AND THEORY OF DEEP
LEARNING
DEEP LEARNING SEMINAR
Tel Aviv University
October 30, 2016
RAJA GIRYES SHAI AVIDAN
SEMINAR DETAILS
• Slides in English, talk in Hebrew.
• Other lectures: each week, two pairs of students will present: 40 minutes for the presentation + 10 minutes for discussion.
• Each student is required to complete the Udacity deep learning course. This course teaches how to use TensorFlow.
• Students are expected to be able to learn independently.
• Final project (also in pairs) based on 1-3 recently published papers. Projects will be assigned within a month.
• The project will include:
• A final report (10-20 pages) summarizing these papers, their contributions, and your own findings (open questions, simulation results, etc.).
• A PowerPoint presentation of the project in a mini-workshop on March 20.
HISTORY AND INTRODUCTION TO
DEEP LEARNING
WHAT IS HARDER TO SOLVE?
Solve this:
12456334679234123 × 125074832
79684921 × 24679748
Describe what you see:
[image]
FIRST LEARNING PROGRAM
1956
“field of study that gives computers the ability to learn without being explicitly programmed”. [Arthur Samuel, 1959]
BEST LEARNING SYSTEM IN NATURE
The human brain
IMITATING THE BRAIN
Hubel and Wiesel, 1959
HUMAN VISUAL SYSTEM
In the visual cortex there are two types of neurons:
Simple and complex
IMITATING THE HUMAN BRAIN
Fukushima 1980
CONVOLUTIONAL NEURAL NETWORKS
• Introduction of convolutional neural networks [LeCun et al. 1989]
• Training by backpropagation
FIRST GENERATION OF NEURAL NETWORKS
WHAT ATTRACTS OUR EYES?
DESIGNING HAND-CRAFTED FEATURES
DEEP NETWORK STRUCTURE
• What does each layer of the network learn?
[Krizhevsky, Sutskever & Hinton, 2012]
LAYERS STRUCTURE
• First layers detect simple patterns that correspond to simple objects.
[Zeiler & Fergus, 2014]
LAYERS STRUCTURE
• Deeper layers detect more complex patterns corresponding to more complex objects.
[Zeiler & Fergus, 2014]
LAYERS STRUCTURE
[Zeiler & Fergus, 2014]
WHY DO THINGS WORK BETTER TODAY?
• More data – larger datasets, more access (internet)
• Better hardware (GPU)
• Better learning regularization (dropout)
• The impact and success of deep learning are not unique to image classification.
• But it is still unclear why deep neural networks are so remarkably successful and how they are doing it.
DEEP NEURAL NETWORKS (DNN)
• One layer of a neural net: 𝑉 ∈ ℝ^𝑑 ⟶ 𝜓(𝑉𝑋) ∈ ℝ^𝑚, where 𝑋 is a linear operation and 𝜓 is a non-linear function.
• Concatenation of the layers creates the whole net:
Φ(𝑋₁, 𝑋₂, …, 𝑋𝐾) = 𝜓(𝜓(⋯ 𝜓(𝑉𝑋₁)𝑋₂ ⋯)𝑋𝐾)
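The layer composition above can be sketched numerically. A minimal NumPy illustration; the dimensions, the ReLU choice for 𝜓, and the random weights are assumptions for the example, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(z):
    # Non-linear function: a ReLU, one common choice.
    return np.maximum(z, 0)

d, m, K = 8, 16, 3                      # input dim, layer width, depth (assumed)
V = rng.standard_normal(d)              # input V in R^d
Xs = [rng.standard_normal((d, m))] + \
     [rng.standard_normal((m, m)) for _ in range(K - 1)]  # linear operations X_1..X_K

def Phi(V, Xs):
    """Whole net: psi(psi(...psi(V X_1) X_2 ...) X_K)."""
    h = V
    for X in Xs:
        h = psi(h @ X)
    return h

out = Phi(V, Xs)
print(out.shape)   # (16,)
```

Each loop iteration is one layer 𝜓(∙ 𝑋); stacking 𝐾 of them gives Φ.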
CONVOLUTIONAL NEURAL NETWORKS (CNN)
• In many cases, 𝑋 is selected to be a convolution.
• This operator is shift invariant.
• CNNs are commonly used with images, as images are typically shift invariant.
• One layer: 𝑉 ∈ ℝ^𝑑 ⟶ 𝜓(𝑉𝑋) ∈ ℝ^𝑚, where 𝑋 is a linear (convolution) operation and 𝜓 is a non-linear function.
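The shift property can be checked numerically: a circular convolution (a toy 1-D stand-in for the CNN operator 𝑋) commutes with shifts. A sketch; the signal length and filter are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(32)          # 1-D signal (toy stand-in for an image row)
k = rng.standard_normal(5)           # convolution filter

def circ_conv(v, k):
    # Circular convolution via the FFT: the linear operation X.
    n = len(v)
    kp = np.zeros(n)
    kp[:len(k)] = k
    return np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(kp)))

s = 7                                 # shift amount
a = circ_conv(np.roll(v, s), k)       # shift, then convolve
b = np.roll(circ_conv(v, k), s)       # convolve, then shift
print(np.allclose(a, b))              # True: convolution commutes with shifts
```

Shifting the input and shifting the output give the same result, which is why convolutional layers suit shift-invariant image statistics.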
THE NON-LINEAR PART
• Usually 𝜓 = 𝑔 ∘ 𝑓.
• 𝑓 is the (point-wise) activation function
• 𝑔 is a pooling or an aggregation operator.
ReLU: 𝑓(𝑥) = max(𝑥, 0)
Sigmoid: 𝑓(𝑥) = 1/(1 + 𝑒^(−𝑥))
Hyperbolic tangent: 𝑓(𝑥) = tanh(𝑥)
Given a block 𝑉₁, 𝑉₂, …, 𝑉𝑛:
Max pooling: max𝑖 𝑉𝑖
Mean pooling: (1/𝑛) Σ𝑖=1..𝑛 𝑉𝑖
ℓ𝑝 pooling: (Σ𝑖=1..𝑛 |𝑉𝑖|^𝑝)^(1/𝑝)
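A minimal NumPy sketch of the pieces above: the activation functions 𝑓 and the pooling operators 𝑔. The array sizes and test values are arbitrary:

```python
import numpy as np

# Point-wise activation functions f:
relu    = lambda x: np.maximum(x, 0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
tanh    = np.tanh

# Pooling / aggregation operators g over a block V_1, ..., V_n:
def max_pool(V):
    return np.max(V)

def mean_pool(V):
    return np.mean(V)

def lp_pool(V, p=2):
    # (sum_i |V_i|^p)^(1/p)
    return np.sum(np.abs(V) ** p) ** (1.0 / p)

# psi = g o f: activation followed by pooling.
V = np.array([-1.0, 2.0, 0.5, -0.5])
print(max_pool(relu(V)))              # 2.0
print(mean_pool(relu(V)))             # 0.625
print(lp_pool(np.array([3.0, 4.0])))  # 5.0
```

Composing one 𝑓 with one 𝑔 gives the slide's 𝜓 = 𝑔 ∘ 𝑓.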
A SAMPLE OF EXISTING THEORY FOR
DEEP LEARNING
PRE-2012 THEORY: MAIN FOCUS ON REPRESENTATION POWER
REPRESENTATION POWER
• Neural nets serve as universal approximators for Borel measurable functions [Cybenko 1989, Hornik 1991].
• In particular, let the non-linearity 𝜓 be a bounded, non-constant continuous function, 𝐼𝑑 the 𝑑-dimensional hypercube, and 𝐶(𝐼𝑑) the space of continuous functions on 𝐼𝑑. Then for any 𝑓 ∈ 𝐶(𝐼𝑑) and 𝜖 > 0, there exist 𝑚 > 0, 𝑋 ∈ ℝ^(𝑑×𝑚), 𝐵 ∈ ℝ^𝑚, and 𝑊 ∈ ℝ^𝑚 such that the neural network
𝐹(𝑉) = 𝜓(𝑉𝑋 + 𝐵)𝑊^𝑇
approximates 𝑓 with precision 𝜖:
|𝐹(𝑉) − 𝑓(𝑉)| < 𝜖, ∀𝑉 ∈ 𝐼𝑑
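A quick empirical illustration of this (a sketch, not the constructive proof): approximate 𝑓(𝑣) = sin(2𝜋𝑣) on [0, 1] with a single hidden layer 𝜓(𝑉𝑋 + 𝐵)𝑊^𝑇, drawing 𝑋 and 𝐵 at random and solving for 𝑊 by least squares. All sizes and scales here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
psi = np.tanh                               # bounded, non-constant, continuous

d, m = 1, 200                               # input dim, hidden width (assumed)
V = np.linspace(0, 1, 500).reshape(-1, d)   # sample points in I_d = [0, 1]
f = np.sin(2 * np.pi * V[:, 0])             # target f in C(I_d)

X = rng.standard_normal((d, m)) * 10        # random hidden weights
B = rng.standard_normal(m) * 10
H = psi(V @ X + B)                          # hidden representation psi(VX + B)
W, *_ = np.linalg.lstsq(H, f, rcond=None)   # output weights W by least squares

err = np.max(np.abs(H @ W - f))             # approximation error on the grid
print(err)
```

With a few hundred neurons the error is already tiny, matching the theorem's promise that some finite 𝑚 achieves any 𝜖.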
ESTIMATION ERROR
• The estimation error of a function 𝑓 by a neural network scales as [Barron 1994]:
𝑂(𝐶𝑓/𝑁) + 𝑂((𝑁𝑑/𝐿) log 𝐿)
where 𝐶𝑓 is the smoothness of the approximated function, 𝑁 the number of neurons in the DNN, 𝐿 the number of training examples, and 𝑑 the input dimension.
POST-2012 THEORY: REPRESENTATION POWER IS NOT ENOUGH
DEPTH OF THE NETWORK
• A deep restricted Boltzmann machine can represent distributions that a shallow one can only represent with an exponential number of parameters [Montúfar & Morton, 2015].
• Each DNN layer with ReLU divides the space by a hyper-plane, folding one part of it.
• Thus, the depth of the network folds the space into a number of sets that is exponential in the number of parameters [Montúfar, Pascanu, Cho & Bengio, 2014].
DEPTH EFFICIENCY OF CNN
• Functions realized by a polynomial-size CNN with ReLU and max-pooling require super-polynomial size to be approximated by a shallow network [Telgarsky 2016, Cohen et al., 2016].
• The standard convolutional network design has a learning bias toward the statistics of natural images [Cohen et al., 2016].
ROLE OF POOLING
• The pooling stage provides shift invariance [Boureau et al. 2010], [Bruna, LeCun & Szlam, 2013].
• A connection is drawn between the pooling stage and phase-retrieval methods [Bruna, Szlam & LeCun, 2014].
• This allows calculating Lipschitz constants of each DNN layer 𝜓(∙ 𝑋) and empirically recovering the input of a layer from its output.
• However, the calculated Lipschitz constants are very loose, and no theoretical guarantees are given for the recovery.
STABILITY OF DEEP LEARNING
• A common DNN initialization is with random Gaussian weights.
• DNNs with random Gaussian weights perform a stable embedding of the data.
• It is possible to recover the input from the output.
• DNNs with random Gaussian weights are good for classifying the average points in the data.
• Training separates the boundary points between the different classes in the data. [Giryes et al. 2016]
RECOVERY FROM DNN OUTPUT
[Mahendran and Vedaldi, 2015]
DNN AND HASHING
• A single layer performs locality-sensitive hashing.
• A deep network with random weights may be designed to do better [Choromanska et al., 2016].
• It is possible to train a DNN for hashing, which provides cutting-edge results [Masci et al., 2012], [Lai et al., 2015].
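The single-layer hashing view can be illustrated with the classic random-hyperplane (SimHash) construction: the sign of a random-weight layer's output gives a binary code, and nearby inputs tend to share code bits. This is a generic illustration, not the exact schemes of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)
d, bits = 64, 16
X = rng.standard_normal((d, bits))      # one layer of random Gaussian weights

def hash_code(v):
    # Sign non-linearity turns the layer output into a binary code.
    return (v @ X > 0).astype(int)

v = rng.standard_normal(d)
u = v + 0.05 * rng.standard_normal(d)   # small perturbation of v
w = rng.standard_normal(d)              # unrelated point

def ham(a, b):
    return int(np.sum(a != b))

d_near = ham(hash_code(v), hash_code(u))
d_far  = ham(hash_code(v), hash_code(w))
print(d_near, d_far)   # neighbours collide on most bits; random points do not
```

The Hamming distance between codes tracks the angle between inputs, which is the locality-sensitive property.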
SCATTERING TRANSFORMS
• The scattering transform is a cascade of wavelet-transform convolutions with non-linear modulus and averaging operators.
• Scattering coefficients are stable encodings of geometry and texture [Bruna & Mallat, 2013]
[Figure: original image with 𝑑 pixels; recovery from first scattering moments (𝑂(log 𝑑) coefficients); recovery from 1st & 2nd scattering moments (𝑂(log² 𝑑) coefficients). Images from slides of Joan Bruna in the ICCV 2015 tutorial.]
SCATTERING TRANSFORMS AND DNN
• More layers create features that can be made invariant to increasingly more complex deformations.
• Deep layers in DNN encode complex, class-specific geometry.
• Deeper architectures are able to better capture invariant properties of objects and scenes in images [Bruna & Mallat, 2013], [Wiatowski & Bölcskei, 2016].
SCATTERING TRANSFORMS AS A METRIC
• Scattering transforms may be used as a metric.
• Inverse problems can be solved by minimizing distances in the scattering-transform domain.
• This leads to remarkable results in super-resolution [Bruna, Sprechmann & LeCun, 2016].
SCATTERING SUPER RESOLUTION
[Figure: original | best linear estimate | state of the art | scattering estimate. Images from slides of Joan Bruna in the CVPR 2016 tutorial.]
[Bruna, Sprechmann & LeCun, 2016]
GENERALIZATION ERROR (GE)
• In training, we reduce the classification error ℓtraining on the training data as the number of training examples 𝐿 increases.
• However, we are interested in reducing the error ℓtest on the (unknown) test data as 𝐿 increases.
• The difference between the two is the generalization error:
GE = ℓtest − ℓtraining
• It is important to understand the GE of DNNs.
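The GE can be measured directly on a toy problem. A sketch with a nearest-centroid classifier on two synthetic Gaussian classes; all sizes and the classifier choice are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(L):
    # Two Gaussian classes in R^2, class 1 shifted by (2, 2).
    y = rng.integers(0, 2, L)
    x = rng.standard_normal((L, 2)) + 2.0 * y[:, None]
    return x, y

def error(x, y, c0, c1):
    # Classify each point by its nearest class centroid.
    pred = (np.linalg.norm(x - c1, axis=1) < np.linalg.norm(x - c0, axis=1)).astype(int)
    return np.mean(pred != y)

x_tr, y_tr = make_data(50)        # L training examples
x_te, y_te = make_data(10000)     # held-out test set
c0 = x_tr[y_tr == 0].mean(0)      # "training": fit the centroids
c1 = x_tr[y_tr == 1].mean(0)

l_train = error(x_tr, y_tr, c0, c1)
l_test  = error(x_te, y_te, c0, c1)
print(l_test - l_train)           # the generalization error GE
```

As 𝐿 grows, the centroids estimated from training data approach the true class means and the GE shrinks.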
REGULARIZATION TECHNIQUES
• Weight decay – penalizing DNN weights [Krogh & Hertz, 1992].
• Dropout - randomly drop units (along with their connections) from the neural network during training [Hinton et al., 2012], [Baldi & Sadowski, 2013], [Srivastava et al., 2014].
• DropConnect – dropout extension [Wan et al., 2013]
• Batch normalization [Ioffe & Szegedy, 2015].
• Stochastic gradient descent (SGD) [Hardt, Recht & Singer, 2016].
• Path-SGD [Neyshabur et al., 2015].
• Jacobian based regularization [Sokolić, Giryes, Sapiro & Rodrigues, 2016]
• And more [Rifai et al., 2011], [Salimans & Kingma, 2016], [Sun et al, 2016].
A SAMPLE OF GE BOUNDS
• Using the VC dimension it can be shown that
GE ≤ 𝑂(√(#DNN params ∙ log 𝐿 / 𝐿))
[Shalev-Shwartz and Ben-David, 2014].
• The GE was bounded also by the DNN weights:
GE ≤ (1/√𝐿) ∙ 2^𝐾 ∙ ‖𝑤‖₂ ∙ ∏𝑖 ‖𝑋𝑖‖₂,₂
[Neyshabur et al., 2015].
• The GE can be bounded also by the covering number of the data and the input margin:
GE ≤ √(𝑁𝛾/₂(Υ)/𝐿)
[Sokolić, Giryes, Sapiro & Rodrigues, 2016]
(𝐿 is the number of training examples; 𝑋𝑖 and 𝑤 are network weight matrices.)
SUFFICIENT STATISTIC AND INVARIANCE
• Given a certain task at hand:
• Minimal sufficient statistic guarantees that we can replace raw data with a representation with smallest complexity and no performance loss.
• Invariance guarantees that the statistic is constant with respect to uninformative transformations of the data.
• CNNs are shown to have these properties for many tasks [Soatto & Chiuso, 2016].
• Good structures of deep networks can generate representations that are good for learning with a small number of examples [Anselmi et al., 2016].
INVARIANCE
• Designing invariant DNNs reduces the GE
[Sokolić, Giryes, Sapiro & Rodrigues, 2016]
MINIMIZATION
• The local minima in deep networks are not far from the global minimum.
• Saddle points, rather than local minima, are the main problem of deep-learning optimization.
• Deeper networks have more local minima but fewer saddle points. [Saxe, McClelland & Ganguli, 2014], [Dauphin, Pascanu, Gulcehre, Cho, Ganguli & Bengio, 2014], [Choromanska, Henaff, Mathieu, Ben Arous & LeCun, 2015]
[Choromanska et al., 2015]
GLOBAL OPTIMALITY IN DEEP LEARNING
• Deep learning is a positively homogeneous factorization problem, i.e., ∃𝑝 ≥ 0 such that ∀𝛼 ≥ 0 the DNN obeys
Φ(𝛼𝑋₁, 𝛼𝑋₂, …, 𝛼𝑋𝐾) = 𝛼^𝑝 Φ(𝑋₁, 𝑋₂, …, 𝑋𝐾).
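The homogeneity property is easy to verify for a bias-free ReLU network, where the degree 𝑝 equals the depth 𝐾. A minimal check; the sizes and weights are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0)

K = 3
Xs = [rng.standard_normal((5, 5)) for _ in range(K)]   # bias-free weight matrices
V = rng.standard_normal(5)                             # fixed input

def Phi(Xs):
    h = V
    for X in Xs:
        h = relu(h @ X)
    return h

alpha, p = 2.0, K                    # degree p equals the depth K here
lhs = Phi([alpha * X for X in Xs])   # scale every layer's weights by alpha
rhs = alpha ** p * Phi(Xs)           # scale the output by alpha^p
print(np.allclose(lhs, rhs))         # True
```

The check works because ReLU is positively homogeneous of degree 1, so each layer contributes one factor of 𝛼.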
• With proper regularization, local minima are global.
• If the network is large enough, global minima can be found by local descent.
Guarantees of proposed framework
[Haeffele & Vidal, 2015]
LISTA CONVERGENCE
• Accelerating iterative solvers for inverse problems [Gregor & LeCun, 2010], [Sprechmann et al., 2015], [Remez et al., 2015], [Tompson et al., 2016].
• Convergence analysis [Giryes et al., 2016], [Moreau & Bruna, 2016], [Xin et al., 2016], [Borgerding & Schniter, 2016].
WISH YOU A SUCCESSFUL SEMESTER