Transcript of presentation SpringSchool2019d.pptx [Autosaved]
![Page 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/1.jpg)
Basics of deep learning
By
Pierre-Marc Jodoin and Christian Desrosiers
![Page 2](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/2.jpg)

Let's start with a simple example

From Wikimedia Commons, the free media repository
![Page 3](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/3.jpg)

Let's start with a simple example

Training dataset D of N patients, each with features x = (temp, freq) and label t = diagnostic:

|           | x = (temp, freq) | t = diagnostic |
|-----------|------------------|----------------|
| Patient 1 | (37.5, 72)       | Healthy        |
| Patient 2 | (39.1, 103)      | Sick           |
| Patient 3 | (38.3, 100)      | Sick           |
| (…)       | (…)              | …              |
| Patient N | (36.7, 88)       | Healthy        |

[Scatter plot: temp vs. freq, with Sick and Healthy points]
![Page 4](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/4.jpg)

Let's start with a simple example

[Scatter plot: temp vs. freq, with Sick and Healthy points]

A new patient comes to the hospital. How can we determine whether he is sick or not?
![Page 5](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/5.jpg)

Solution

Divide the feature space into 2 regions: sick and healthy

[Scatter plot: temp vs. freq]
![Page 6](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/6.jpg)

Solution

Divide the feature space into 2 regions: sick and healthy

[Scatter plot: temp vs. freq, now split into a Sick region and a Healthy region]
![Page 7](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/7.jpg)

More formally

$$y(x) = \begin{cases} \text{Healthy} & \text{if } x \text{ is in the blue region} \\ \text{Sick} & \text{otherwise} \end{cases}$$

[Scatter plot: temp vs. freq, with the t = Sick and t = Healthy regions]
![Page 8](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/8.jpg)

How to split the feature space?
![Page 9](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/9.jpg)

Definition … a line!

$$y = mx + b$$

where $m$ is the slope and $b$ the bias.
![Page 10](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/10.jpg)

Definition … a line!

$$y = mx + b, \qquad m = \frac{\Delta y}{\Delta x}$$

Multiplying through by $\Delta x$:

$$\Delta x \, y = \Delta y \, x + b \, \Delta x$$

$$\Delta y \, x - \Delta x \, y + b \, \Delta x = 0$$
![Page 11](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/11.jpg)

Rename variables

$$\Delta y \, x - \Delta x \, y + b \, \Delta x = 0, \qquad \text{with } w_1 = \Delta y, \; w_2 = -\Delta x, \; w_0 = b \, \Delta x$$
![Page 12](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/12.jpg)

Rename variables

$$w_0 + w_1 x + w_2 y = 0$$

Renaming the coordinates $x \to x_1$, $y \to x_2$:

$$w_0 + w_1 x_1 + w_2 x_2 = 0$$
![Page 13](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/13.jpg)

Classification function

$$y_w(x) = w_1 x_1 + w_2 x_2 + w_0$$

The line $y_w(x) = 0$ splits the $(x_1, x_2)$ plane into a region where $y_w(x) > 0$ and a region where $y_w(x) < 0$; the vector $N = (w_1, w_2)$ is normal to the line.
![Page 14](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/14.jpg)

Classification function

$$y_w(x) = w_1 x_1 + w_2 x_2 + w_0$$

Example with $(w_0, w_1, w_2) = (-0.4, -0.1, 0.2)$:

- $(2, 3)$: $w_1 x_1 + w_2 x_2 + w_0 = 0$ (on the line)
- $(1, 6)$: $w_1 x_1 + w_2 x_2 + w_0 = 0.7 > 0$
- $(4, 1)$: $w_1 x_1 + w_2 x_2 + w_0 = -0.6 < 0$
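The example above can be checked numerically. A minimal sketch, assuming the weights reconstructed from the slide, $(w_0, w_1, w_2) = (-0.4, -0.1, 0.2)$:

```python
# Reconstructed example weights from the slide (treat as illustrative values)
w0, w1, w2 = -0.4, -0.1, 0.2

def y(x1, x2):
    """Signed score of the linear classifier: positive on one side of the line."""
    return w1 * x1 + w2 * x2 + w0

print(y(2, 3))   # ~0: the point lies on the line
print(y(1, 6))   # ~0.7: positive side
print(y(4, 1))   # ~-0.6: negative side
```

The sign of the score tells which side of the line a point falls on; the magnitude grows with the distance to the line.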
![Page 15](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/15.jpg)

Classification function

$$y_w(x) = w_1 x_1 + w_2 x_2 + w_0 = w \cdot x'$$

a DOT product, with $w = (w_0, w_1, w_2)$ and $x' = (1, x_1, x_2)$. As before, $N = (w_1, w_2)$ is the normal of the line $y_w(x) = 0$ separating the regions $y_w(x) > 0$ and $y_w(x) < 0$.
![Page 16](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/16.jpg)

Classification function

$$y_w(x) = w_1 x_1 + w_2 x_2 + w_0 = w^T x'$$

a DOT product, with $w = (w_0, w_1, w_2)$ and $x' = (1, x_1, x_2)$.
![Page 17](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/17.jpg)

To simplify notation

$$y_w(x) = w^T x$$

linear classification = dot product with bias included
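The "bias included" trick can be sketched in a few lines: prepend a constant 1 to the input so a single dot product covers both the weights and the bias. The numeric values below are illustrative, not taken from the slides' data.

```python
import numpy as np

w = np.array([-0.4, -0.1, 0.2])       # (w0, w1, w2); illustrative values
x = np.array([1.0, 6.0])              # raw 2-D input (x1, x2)
x_prime = np.concatenate(([1.0], x))  # augmented input (1, x1, x2)

# One dot product computes w1*x1 + w2*x2 + w0
score = np.dot(w, x_prime)
print(score)  # ~0.7
```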
![Page 18](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/18.jpg)

Learning

With the training dataset D, the GOAL is to find the parameters $w_0, w_1, w_2$ that best separate the two classes: $y_w(x) > 0$ on one side, $y_w(x) < 0$ on the other.

[Scatter plot: temp vs. freq, Sick and Healthy points separated by the line $y_w(x) = 0$]
![Page 19](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/19.jpg)
How do we know if a model is good?
![Page 20](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/20.jpg)

Loss function

[Three plots of temp vs. freq with candidate decision lines]

- Good! $L(y_w(x), D) \approx 0$
- Medium: $L(y_w(x), D) > 0$
- BAD! $L(y_w(x), D) \gg 0$
![Page 21](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/21.jpg)

So far…

1. Training dataset: D
2. Classification function (a line in 2D): $y_w(x) = w_1 x_1 + w_2 x_2 + w_0$
3. Loss function: $L(y_w(x), D)$
4. Training: find $w_0, w_1, w_2$ that minimize $L(y_w(x), D)$
![Page 22](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/22.jpg)

Today

Perceptron, Logistic regression, Multi-layer perceptron, Conv Nets
![Page 23](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/23.jpg)

Perceptron

Rosenblatt, Frank (1958), "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain", Psychological Review, v65, No. 6, pp. 386–408
![Page 24](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/24.jpg)

Perceptron (2D and 2 classes)

[Neuron diagram: inputs $1, x_1, x_2$ with weights $w_0, w_1, w_2$ feeding the output $y_w(x)$]

$$y_w(x) = w_0 + w_1 x_1 + w_2 x_2 = w^T x$$

The weight vector $w$ is normal to the decision line: $y_w(x) > 0$ on one side, $y_w(x) = 0$ on the line, $y_w(x) < 0$ on the other side.
![Page 25](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/25.jpg)

Perceptron (2D and 2 classes)

Activation function: sign

$$y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}$$

[Neuron diagram: inputs $1, x_1, x_2$, weights $w_0, w_1, w_2$, dot product $w^T x$, then the sign activation]
![Page 26](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/26.jpg)

Perceptron (2D and 2 classes)

Neuron = dot product + activation function

$$y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}$$
![Page 27](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/27.jpg)

So far…

1. Training dataset: D
2. Classification function (a line in 2D): $y_w(x) = w_1 x_1 + w_2 x_2 + w_0$
3. Loss function: $L(y_w(x), D)$
![Page 28](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/28.jpg)

So far…

1. Training dataset: D
2. Classification function (a line in 2D): $y_w(x) = \operatorname{sign}(w^T x)$, computed by a neuron with inputs $1, x_1, x_2$ and weights $w_0, w_1, w_2$
3. Loss function: $L(y_w(x), D)$
4. Training: find $w_0, w_1, w_2$ that minimize $L(y_w(x), D)$
![Page 29](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/29.jpg)
Linear classifiers have limits
![Page 30](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/30.jpg)
Non-linearly separable training data
Linear classifier = large error rate
![Page 31](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/31.jpg)
Three classical solutions
1. Acquire more observations
2. Use a non-linear classifier
3. Transform the data
Non-linearly separable training data
![Page 32](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/32.jpg)
![Page 33](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/33.jpg)

Acquire more data

The original dataset D with x = (temp, freq):

|           | x = (temp, freq) | t = diagnostic |
|-----------|------------------|----------------|
| Patient 1 | (37.5, 72)       | healthy        |
| Patient 2 | (39.1, 103)      | sick           |
| Patient 3 | (38.3, 100)      | sick           |
| (…)       | (…)              | …              |
| Patient N | (36.7, 88)       | healthy        |

becomes a dataset with a third feature, x = (temp, freq, headache):

|           | x = (temp, freq, headache) | t = Diagnostic |
|-----------|----------------------------|----------------|
| Patient 1 | (37.5, 72, 2)              | healthy        |
| Patient 2 | (39.1, 103, 8)             | sick           |
| Patient 3 | (38.3, 100, 6)             | sick           |
| (…)       | (…)                        | …              |
| Patient N | (36.7, 88, 0)              | healthy        |
![Page 34](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/34.jpg)

Non-linearly separable training data

In 2D the classifier is a line:

$$y_w(x) = w_1 x_1 + w_2 x_2 + w_0 \quad \text{(line)}$$

With a third feature it becomes a plane:

$$y_w(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_0 \quad \text{(plane)}$$

[3D scatter plot of the data with a separating plane]
![Page 35](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/35.jpg)

Perceptron (2D and 2 classes)

$$y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}$$

with decision boundary $w_1 x_1 + w_2 x_2 + w_0 = 0$ (a line).
![Page 36](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/36.jpg)

Perceptron (3D and 2 classes)

Example in 3D: the decision boundary $w_1 x_1 + w_2 x_2 + w_3 x_3 + w_0 = 0$ is a plane, and

$$y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}$$

gives $y_w(x) = +1$ on one side of the plane and $y_w(x) = -1$ on the other.
![Page 37](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/37.jpg)

Perceptron (K-D and 2 classes)

$$y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}, \qquad w^T x = w_1 x_1 + w_2 x_2 + \dots + w_K x_K + w_0 \quad \text{(hyperplane)}$$

[Neuron diagram: inputs $1, x_1, x_2, \dots, x_{K-1}, x_K$ with weights $w_0, w_1, w_2, \dots, w_{K-1}, w_K$]
![Page 38](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/38.jpg)

Learning a machine

The goal: with a set of training data $D = \{(x_1, t_1), (x_2, t_2), \dots, (x_N, t_N)\}$, estimate $w$ so:

$$y_w(x_n) = t_n \quad \forall n$$

In other words, minimize the training loss

$$L(y_w(x), D) = \sum_{n=1}^{N} l(y_w(x_n), t_n)$$

Optimization problem.
![Page 39](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/39.jpg)
![Page 40](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/40.jpg)

[Surface plot of the loss $L(y_w(x), D)$ as a function of the parameters $w_1$ and $w_2$]
![Page 41](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/41.jpg)

Perceptron

Question: how to find the best solution, i.e. where $L(y_w(x), D) \approx 0$?

Random initialization: `[w1,w2]=np.random.randn(2)`
![Page 42](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/42.jpg)

Gradient descent

Question: how to find the best solution, i.e. where $L(y_w(x), D) \approx 0$?

$$w^{[k+1]} = w^{[k]} - \eta \, \nabla L(y_{w^{[k]}}(x), D)$$

where $\nabla L$ is the gradient of the loss function and $\eta$ is the learning rate.
![Page 43](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/43.jpg)

Perceptron Criterion (loss)

Observation: a sample $x_n$ is wrongly classified when

$$w^T x_n > 0 \text{ and } t_n = -1, \qquad \text{or} \qquad w^T x_n < 0 \text{ and } t_n = +1.$$

Consequently, $-w^T x_n t_n$ is ALWAYS positive for wrongly classified samples.
![Page 44](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/44.jpg)

Perceptron Criterion (loss)

$$L(y_w(x), D) = -\sum_{x_n \in V} w^T x_n t_n$$

where $V$ is the set of wrongly classified samples. For the decision line drawn on the slide, $L(y_w(x), D) = 464.15$.
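The perceptron criterion sums $-t_n\,w^T x_n$ over the misclassified samples only. A minimal sketch with toy data (the points and labels below are hypothetical, not the slide's):

```python
import numpy as np

# Augmented inputs (1, x1, x2) and labels in {-1, +1}
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 6.0],
              [1.0, 4.0, 1.0]])
t = np.array([-1.0, 1.0, 1.0])
w = np.array([-0.4, -0.1, 0.2])

scores = X @ w
# t_n * w.x_n <= 0 means wrongly classified (a point on the boundary counts too)
misclassified = t * scores <= 0
loss = -np.sum(t[misclassified] * scores[misclassified])
print(loss)  # always >= 0; correctly classified points contribute nothing
```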
![Page 45](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/45.jpg)

So far…

1. Training dataset: D
2. Linear classification function: $y_w(x) = w_1 x_1 + w_2 x_2 + \dots + w_M x_M + w_0$
3. Loss function: $L(y_w(x), D) = -\sum_{x_n \in V} w^T x_n t_n$
![Page 46](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/46.jpg)

So far…

1. Training dataset: D
2. Linear classification function: $y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}$, computed by a neuron with inputs $1, x_1, x_2, x_3, \dots, x_K$ and weights $w_0, w_1, w_2, w_3, \dots, w_K$
3. Loss function: $L(y_w(x), D) = -\sum_{x_n \in V} w^T x_n t_n$, where $V$ is the set of wrongly classified samples
4. Training: find $w$ that minimizes $L(y_w(x), D)$:

$$w^* = \underset{w}{\operatorname{argmin}} \; L(y_w(x), D)$$
![Page 47](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/47.jpg)

Optimisation

$$w^{[k+1]} = w^{[k]} - \eta^{[k]} \, \nabla L$$

where $\nabla L$ is the gradient of the loss function and $\eta$ the learning rate.

Stochastic gradient descent (SGD):

    Init w
    k = 0
    DO
      k = k + 1
      FOR n = 1 to N
        w ← w − η ∇L(x_n)
    UNTIL every data point is well classified or k == MAX_ITER
![Page 48](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/48.jpg)

Perceptron gradient descent

$$\nabla L(y_w(x), D) = -\sum_{x_n \in V} t_n x_n \qquad \text{for} \qquad L(y_w(x), D) = -\sum_{x_n \in V} w^T x_n t_n$$

Stochastic gradient descent (SGD):

    Init w
    k = 0
    DO
      k = k + 1
      FOR n = 1 to N
        IF t_n w^T x_n ≤ 0 THEN   /* wrongly classified */
          w ← w + η t_n x_n
    UNTIL every data point is well classified OR k == k_MAX

NOTE on the learning rate η:

- Too low => slow convergence
- Too large => might not converge (can even diverge)
- Can decrease at each iteration (e.g. $\eta^{[k]} = cst / k$)
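The SGD loop above can be sketched directly in numpy. A minimal implementation on toy, linearly separable data (the points, labels, seed and learning rate are all hypothetical choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: augmented inputs (1, x1, x2), labels in {-1, +1}
X = np.array([[1.0,  0.0,  0.0],
              [1.0,  1.0,  1.0],
              [1.0, -1.0, -1.0],
              [1.0, -2.0,  0.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w = rng.standard_normal(3)   # random initialization
eta, MAX_ITER = 1.0, 1000

for k in range(MAX_ITER):
    errors = 0
    for x_n, t_n in zip(X, t):
        if t_n * (w @ x_n) <= 0:     # wrongly classified (or on the boundary)
            w = w + eta * t_n * x_n  # perceptron update
            errors += 1
    if errors == 0:                  # every sample well classified
        break

print(np.sign(X @ w))  # matches t once converged
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop terminates after a finite number of updates.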
![Page 49](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/49.jpg)

Similar loss functions

Perceptron criterion:

$$L(y_w(x), D) = -\sum_{x_n \in V} w^T x_n t_n, \qquad V = \text{the set of wrongly classified samples}$$

"Hinge Loss" or "SVM" Loss:

$$L(y_w(x), D) = \sum_{n=1}^{N} \max(0, -t_n w^T x_n)$$

$$L(y_w(x), D) = \sum_{n=1}^{N} \max(0, 1 - t_n w^T x_n)$$
![Page 50](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/50.jpg)

Multiclass Perceptron (2D and 3 classes)

One weight vector per class:

$$y_{w,0}(x) = w_0^T x = w_{0,0} + w_{0,1} x_1 + w_{0,2} x_2$$

$$y_{w,1}(x) = w_1^T x = w_{1,0} + w_{1,1} x_1 + w_{1,2} x_2$$

$$y_{w,2}(x) = w_2^T x = w_{2,0} + w_{2,1} x_1 + w_{2,2} x_2$$

The predicted class is $\operatorname{argmax}_i \; y_{w,i}(x)$; the plane splits into three regions, one where each $y_{w,i}(x)$ is max.

[Network diagram: inputs $1, x_1, x_2$ fully connected to the three outputs $w_0^T x, w_1^T x, w_2^T x$]
![Page 51](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/51.jpg)

Multiclass Perceptron (2D and 3 classes)

In matrix form:

$$y_W(x) = W^T x = \begin{pmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix}$$

The predicted class is $\operatorname{argmax}_i \; y_{W,i}(x)$.
![Page 52](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/52.jpg)

Multiclass Perceptron (2D and 3 classes)

Example: for the input $x = (1.1, -2.0)$, the slide's weight matrix gives the three class scores 2.8, 6.9 and 9.6 for Class 0, Class 1 and Class 2; the argmax therefore predicts Class 2.
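The score-then-argmax prediction can be sketched as follows. The weight matrix below is a hypothetical stand-in (the slide's actual numeric matrix is not legible in this transcript):

```python
import numpy as np

# One row of weights (w_i0, w_i1, w_i2) per class; hypothetical values
W = np.array([[ 1.0, 0.5, -0.5],   # class 0
              [ 0.0, 1.0,  1.0],   # class 1
              [-1.0, 2.0, -2.0]])  # class 2

x = np.array([1.1, -2.0])
x_prime = np.concatenate(([1.0], x))  # augmented input (1, x1, x2)

scores = W @ x_prime            # one score per class
predicted = int(np.argmax(scores))
print(scores, predicted)
```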
![Page 53](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/53.jpg)

Multiclass Perceptron — Loss function

$$L(y_W(x), D) = \sum_{x_n \in V} \left( w_j^T x_n - w_{t_n}^T x_n \right)$$

a sum over all wrongly classified samples of the score of the wrong (predicted) class $w_j^T x_n$ minus the score of the true class $w_{t_n}^T x_n$. Differentiating, each wrongly classified sample contributes $+x_n$ to the gradient with respect to $w_j$ and $-x_n$ with respect to $w_{t_n}$.
![Page 54](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/54.jpg)

Multiclass Perceptron

Stochastic gradient descent (SGD):

    Init W
    k = 0
    DO
      k = k + 1
      FOR n = 1 to N
        j ← argmax_i  w_i^T x_n
        IF j ≠ t_n THEN   /* wrongly classified sample */
          w_j ← w_j − η x_n
          w_{t_n} ← w_{t_n} + η x_n
    UNTIL every data point is well classified or k > K_MAX
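The multiclass SGD step above can be sketched in numpy: the wrongly predicted class is pushed away from $x_n$ and the true class pulled toward it. Data, seed and learning rate are toy, hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy separable data: augmented inputs (1, x1, x2) and integer class labels
X = np.array([[1.0,  2.0,  0.0],
              [1.0, -2.0,  0.0],
              [1.0,  0.0,  2.0]])
t = np.array([0, 1, 2])

W = rng.standard_normal((3, 3))   # one row of weights per class
eta, K_MAX = 1.0, 1000

for k in range(K_MAX):
    errors = 0
    for x_n, t_n in zip(X, t):
        j = int(np.argmax(W @ x_n))   # predicted class
        if j != t_n:                  # wrongly classified sample
            W[j]   -= eta * x_n       # push the wrong class away from x_n
            W[t_n] += eta * x_n       # pull the true class toward x_n
            errors += 1
    if errors == 0:
        break

print(np.argmax(X @ W.T, axis=1))  # class predictions after training
```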
![Page 55](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/55.jpg)

Perceptron

Advantages:
- Very simple
- Does NOT assume the data follows a Gaussian distribution
- If the data is linearly separable, convergence is guaranteed

Limitations:
- Zero gradient for many solutions => several "perfect solutions"
- Data must be linearly separable

[Plots: many "optimal" solutions; a suboptimal solution; non-separable data on which training will never converge]
![Page 56](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/56.jpg)

Two famous ways of improving the Perceptron

1. New activation function + new loss → Logistic regression
2. New network architecture → Multilayer Perceptron / CNN
![Page 57](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/57.jpg)

Logistic regression
![Page 58](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/58.jpg)

Logistic regression (2D, 2 classes)

New activation function: the sigmoid

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

$$y_w(x) = \sigma(w^T x) \in [0, 1]$$

[Neuron diagram: inputs $1, x_1, x_2$, weights $w_0, w_1, w_2$, dot product $w^T x$, sigmoid activation]
![Page 59](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/59.jpg)

Logistic regression (2D, 2 classes)

New activation function: the sigmoid

$$y_w(x) = \sigma(w^T x) = \frac{1}{1 + e^{-w^T x}} \in [0, 1]$$

Neuron = dot product + sigmoid activation.
![Page 60](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/60.jpg)

Logistic regression (2D, 2 classes)

$$y_w(x) = \sigma(w^T x)$$

The output equals $0.5$ on the decision line, tends to $1$ on one side and to $0$ on the other.
![Page 61](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/61.jpg)

Logistic regression (2D, 2 classes)

Example: for the weights $w$ and sample $x_n$ shown on the slide, $w^T x_n = -1.94$, so

$$y_w(x_n) = \frac{1}{1 + e^{1.94}} = 0.125$$

Since 0.125 is lower than 0.5, $x_n$ is behind the plane.
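The slide's numeric check is easy to reproduce: the sigmoid of the pre-activation $-1.94$ is roughly $0.125$.

```python
import numpy as np

def sigmoid(t):
    """Squash a signed score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

print(sigmoid(-1.94))  # ~0.1256: the sample is on the negative side of the plane
print(sigmoid(0.0))    # 0.5: exactly on the decision boundary
```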
![Page 62](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/62.jpg)

Logistic regression (K-D, 2 classes)

Like the Perceptron, logistic regression accommodates K-D vectors:

$$y_w(x) = \sigma(w^T x)$$

[Neuron diagram: inputs $1, x_1, x_2, \dots, x_K$ with weights $w_0, w_1, w_2, \dots, w_K$]
![Page 63: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/63.jpg)
80
xwxy Tw
)(
What is the loss function?
2x
(...)
1
Kw
1w
2w
0w
Kx
xwT
1x
![Page 64](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/64.jpg)

With a sigmoid, we can simulate a conditional probability of $c_1$ GIVEN $x$:

$$y_w(x) = \sigma(w^T x) = P(c_1 \mid x, w)$$
![Page 65](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/65.jpg)

With a sigmoid, we can simulate a conditional probability of $c_1$ GIVEN $x$:

$$y_w(x) = \sigma(w^T x) = P(c_1 \mid x, w), \qquad P(c_0 \mid x, w) = 1 - y_w(x)$$
![Page 66](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/66.jpg)

2-Class Cross entropy

The cost function is $-\ln$ of the likelihood:

$$L(y_w(x), D) = -\sum_{n=1}^{N} \left[ t_n \ln y_w(x_n) + (1 - t_n) \ln\left(1 - y_w(x_n)\right) \right]$$

We can also show that

$$\frac{dL}{dw} = \sum_{n=1}^{N} \left( y_w(x_n) - t_n \right) x_n$$

As opposed to the Perceptron, the gradient does not depend only on the wrongly classified samples.
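The cross-entropy and its gradient can be sketched directly from these formulas. Note the labels here are in $\{0, 1\}$ (as the formula requires), not the perceptron's $\{-1, +1\}$; the data and weights are toy values:

```python
import numpy as np

# Toy data: augmented inputs (1, x1, x2), labels in {0, 1}, arbitrary weights
X = np.array([[1.0,  0.5, 1.0],
              [1.0, -1.0, 0.0]])
t = np.array([1.0, 0.0])
w = np.array([0.1, 0.2, -0.3])

y = 1.0 / (1.0 + np.exp(-(X @ w)))                     # sigmoid outputs
loss = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
grad = (y - t) @ X                                     # sum_n (y_n - t_n) x_n

print(loss, grad)  # every sample contributes to the gradient, not only errors
```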
![Page 67](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/67.jpg)

Logistic Network

Advantages:
- More stable than the Perceptron
- More effective when the data is non-separable

[Plots comparing the decision boundaries found by the Perceptron and by the logistic net]
![Page 68](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/68.jpg)

And for K>2 classes? New activation function: Softmax

$$y_{w,i}(x) = \frac{e^{w_i^T x}}{\sum_c e^{w_c^T x}}$$

Each score $w_i^T x$ is exponentiated and then normalized, giving

$$y_{w,0}(x) = P(c_0 \mid x, W), \quad y_{w,1}(x) = P(c_1 \mid x, W), \quad y_{w,2}(x) = P(c_2 \mid x, W)$$

[Network diagram: inputs $1, x_1, x_2$ fully connected to three outputs $w_0^T x, w_1^T x, w_2^T x$, followed by exp and normalization]
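The softmax formula above can be sketched in a few lines; the input scores are arbitrary example values:

```python
import numpy as np

def softmax(scores):
    """Turn K class scores into a probability vector summing to 1."""
    e = np.exp(scores - np.max(scores))  # shift by the max for numerical stability
    return e / np.sum(e)

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)        # one probability per class, largest score -> largest probability
print(p.sum())  # sums to 1
```

Subtracting the max before exponentiating does not change the result (numerator and denominator are scaled by the same factor) but avoids overflow for large scores.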
![Page 69](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/69.jpg)
![Page 70](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/70.jpg)

And for K>2 classes? Class labels: one-hot vectors (Cifar10)

| label        | t                   |
|--------------|---------------------|
| 'airplane'   | 1 0 0 0 0 0 0 0 0 0 |
| 'automobile' | 0 1 0 0 0 0 0 0 0 0 |
| 'bird'       | 0 0 1 0 0 0 0 0 0 0 |
| 'cat'        | 0 0 0 1 0 0 0 0 0 0 |
| 'deer'       | 0 0 0 0 1 0 0 0 0 0 |
| 'dog'        | 0 0 0 0 0 1 0 0 0 0 |
| 'frog'       | 0 0 0 0 0 0 1 0 0 0 |
| 'horse'      | 0 0 0 0 0 0 0 1 0 0 |
| 'ship'       | 0 0 0 0 0 0 0 0 1 0 |
| 'truck'      | 0 0 0 0 0 0 0 0 0 1 |
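One-hot encoding is mechanical: class k becomes a length-10 vector with a single 1 at position k. A minimal sketch for the CIFAR-10 labels listed above:

```python
import numpy as np

labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
          'dog', 'frog', 'horse', 'ship', 'truck']

def one_hot(class_name):
    """Return the one-hot target vector for a CIFAR-10 class name."""
    t = np.zeros(len(labels))
    t[labels.index(class_name)] = 1.0
    return t

print(one_hot('bird'))  # 1 in position 2, 0 everywhere else
```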
![Page 71](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/71.jpg)

Cross entropy Loss, K>2 classes

$$L(y_W(x), D) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{W,k}(x_n)$$
![Page 72](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/72.jpg)

K-Class cross entropy loss

$$L(y_W(x), D) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{W,k}(x_n)$$

with gradient

$$\nabla_{w_k} L = \sum_{n=1}^{N} \left( y_{W,k}(x_n) - t_{nk} \right) x_n$$
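With one-hot targets, the double sum collapses to $-\ln$ of the probability assigned to the true class of each sample. A sketch with example softmax outputs (not produced by a trained model):

```python
import numpy as np

Y = np.array([[0.7, 0.2, 0.1],   # predicted class probabilities, one row per sample
              [0.1, 0.8, 0.1]])
T = np.array([[1.0, 0.0, 0.0],   # one-hot targets
              [0.0, 1.0, 0.0]])

# Only the true-class entries survive the product with the one-hot targets
loss = -np.sum(T * np.log(Y))
print(loss)  # equals -(ln 0.7 + ln 0.8)
```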
![Page 73](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/73.jpg)

Regularization

Different weights may give the same score. For example, with

$$x = (1.0, 1.0, 1.0), \qquad w_1^T = (1, 0, 0), \qquad w_2^T = (1/3, 1/3, 1/3)$$

we get $w_1^T x = w_2^T x = 1$. Solution: Maximum a posteriori.
![Page 74](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/74.jpg)

Maximum a posteriori — Regularization

$$W^* = \underset{w}{\operatorname{argmin}} \; \underbrace{L(y_w(x), D)}_{\text{Loss function}} + \lambda \underbrace{R(W)}_{\text{Regularization}}$$

where $\lambda$ is a constant and $R(W)$ is in general an L1 or L2 penalty: $\|W\|_1$ or $\|W\|_2$.
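A sketch of why the regularizer breaks the tie from the previous slide: $w_1 = (1, 0, 0)$ and $w_2 = (1/3, 1/3, 1/3)$ give the same score on $x = (1, 1, 1)$, but the L2 penalty prefers the smaller, spread-out weights.

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([1/3, 1/3, 1/3])

print(w1 @ x, w2 @ x)                          # identical scores, ~1.0 each
print(np.sum(w1 ** 2), np.sum(w2 ** 2))        # L2: R(w1) = 1, R(w2) ~ 1/3
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))  # L1: ~1.0 for both
```

Note that under L1 both solutions cost the same here, so the choice of regularizer matters: L2 pushes toward small, diffuse weights while L1 pushes toward sparse ones.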
![Page 75](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/75.jpg)

Wow! Loooots of information!

Let's recap…
![Page 76](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/76.jpg)

Neural networks, 2 classes

Perceptron — sign activation:

$$y_w(x) = \operatorname{sign}(w^T x) \in \{-1, +1\}$$

Logistic regression — sigmoid activation:

$$y_w(x) = \sigma(w^T x) = P(c_1 \mid x)$$

[Two neuron diagrams: inputs $1, x_1, x_2, \dots, x_K$ with weights $w_0, w_1, w_2, \dots, w_K$, differing only in the activation]
K-class neural networks

One output neuron per class, each computing its own score $w_i^T x$ from the same inputs.

Perceptron: predict $\arg\max_i \; y_{W,i}(x)$ with $y_{W,i}(x) = w_i^T x$

Logistic regression (softmax activation): $y_{W,i}(x) = \frac{\exp(w_i^T x)}{\sum_j \exp(w_j^T x)} = P(c_i \mid x)$
Loss functions, 2 classes

Perceptron loss: $L(y_w, D) = -\sum_{x_n \in V} t_n w^T x_n$, where $V$ is the set of wrongly classified samples; equivalently, $L(y_w, D) = \sum_{n=1}^{N} \max(0, -t_n w^T x_n)$

"Hinge loss" or "SVM" loss: $L(y_w, D) = \sum_{n=1}^{N} \max(0, 1 - t_n w^T x_n)$

Cross-entropy loss: $L(y_w, D) = -\sum_{n=1}^{N} \left[ t_n \ln y_w(x_n) + (1 - t_n) \ln(1 - y_w(x_n)) \right]$
Loss functions, K classes

Perceptron loss: $L(y_W, D) = \sum_{x_n \in V} (w_j^T x_n - w_{t_n}^T x_n)$, where $V$ is the set of wrongly classified samples; equivalently, $L(y_W, D) = \sum_{n=1}^{N} \sum_{j \ne t_n} \max(0, w_j^T x_n - w_{t_n}^T x_n)$

"Hinge loss" or "SVM" loss: $L(y_W, D) = \sum_{n=1}^{N} \sum_{j \ne t_n} \max(0, w_j^T x_n - w_{t_n}^T x_n + 1)$

Cross-entropy loss with a softmax: $L(y_W, D) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{W,k}(x_n)$
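A minimal numpy sketch of the K-class hinge ("SVM") loss for one sample, with illustrative score values of my own choosing:

```python
import numpy as np

def multiclass_hinge(scores, t):
    """sum over j != t of max(0, w_j^T x - w_t^T x + 1) for one sample."""
    margins = np.maximum(0.0, scores - scores[t] + 1.0)
    margins[t] = 0.0          # the j == t term is excluded from the sum
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])   # class scores w_j^T x for one sample
loss = multiclass_hinge(scores, t=0)  # true class is class 0
```

Here only class 1 violates the margin (5.1 - 3.2 + 1 = 2.9 > 0), so the loss is 2.9; class 2 is already more than one unit below the true-class score and contributes nothing.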
Maximum a posteriori

$$\arg\min_W \; \frac{1}{N} \sum_{n=1}^{N} l(y_W(x_n), t_n) + \lambda R(W)$$

where $l$ is the loss function, $R(W) = \|W\|_1$ or $\|W\|_2^2$ the regularization term, and $\lambda$ a constant.
Optimisation

Gradient descent: $w^{[k+1]} = w^{[k]} - \eta \nabla L$, where $\nabla L$ is the gradient of the loss function and $\eta$ the learning rate.

Stochastic gradient descent (SGD):

    Init w
    k = 0
    DO
        k = k + 1
        FOR n = 1 to N
            w ← w - η ∇L(x_n)
    UNTIL every data point is well classified or k == MAX_ITER
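The SGD pseudocode above can be sketched for a 2-class logistic regression in numpy. The toy dataset, learning rate, and epoch count are my own assumptions; the per-sample update uses the well-known gradient $(y - t)\,x$ of the cross-entropy with a sigmoid:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 2-class data: x in R^2, t in {0, 1}, bias folded in as a constant 1 input
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])            # prepend the bias input
t = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w = np.zeros(3)
eta = 0.1                                        # learning rate
for epoch in range(100):                         # k in the pseudocode
    for n in rng.permutation(len(X)):            # FOR n = 1 to N (shuffled)
        y = sigmoid(w @ X[n])
        w -= eta * (y - t[n]) * X[n]             # per-sample gradient step

accuracy = np.mean((sigmoid(X @ w) > 0.5) == t)
```

Shuffling the sample order each epoch is a common refinement of the plain FOR loop in the pseudocode.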
Now, let's go DEEPER
Three classical solutions for non-linearly separable training data:

1. Acquire more data
2. Use a non-linear classifier
3. Transform the data
2-D, 2 classes, linear logistic regression

Input layer (3 "neurons": the bias 1, $x_1$, $x_2$), output layer (1 neuron with sigmoid):

$$y_w(x) = \sigma(w^T x) \in \mathbb{R}$$
Let's add 3 neurons

Input layer (3 "neurons"), first layer (3 neurons), each computing:

$$\sigma(w_i^T x) \in \mathbb{R}, \quad i = 0, 1, 2$$
NOTE: The output of the first layer is a vector of 3 real values:

$$\sigma\!\left( \begin{bmatrix} w_{0,0} & w_{0,1} & w_{0,2} \\ w_{1,0} & w_{1,1} & w_{1,2} \\ w_{2,0} & w_{2,1} & w_{2,2} \end{bmatrix} \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \right) \in \mathbb{R}^3$$
In matrix form: the output of the first layer is $\sigma(W^{[0]} x) \in \mathbb{R}^3$.
2-D, 2-class, 1 hidden layer

If we want a 2-class classification via a logistic regression (a cross-entropy loss), we must add an output neuron:

Input layer (3 "neurons") → hidden layer (3 neurons) → output layer (1 neuron):

$$y_W(x) = \sigma(w^{[1]T} \sigma(W^{[0]} x)) \in \mathbb{R}$$
Visual simplification: draw the network as the input $x$ feeding a weight matrix $W^{[0]}$, then a weight vector $w^{[1]}$, computing $y_W(x) = \sigma(w^{[1]T} \sigma(W^{[0]} x))$.
2-D, 2-class, 1 hidden layer

The "1" is a bias neuron. This network contains a total of 13 parameters: 3×3 from the input layer to the hidden layer, and 1×4 from the hidden layer to the output layer.
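The 13-parameter network above can be sketched directly in numpy (the random weight values and the test input are mine, only the shapes follow the slide):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W0 = rng.normal(size=(3, 3))   # hidden layer: 3 neurons x (bias + 2 inputs) = 9 weights
w1 = rng.normal(size=4)        # output neuron: bias + 3 hidden values = 4 weights

def y_W(x1, x2):
    x = np.array([1.0, x1, x2])          # input with its bias neuron
    h = sigmoid(W0 @ x)                  # hidden activations, in R^3
    h = np.concatenate([[1.0], h])       # bias neuron of the hidden layer
    return sigmoid(w1 @ h)               # sigma(w[1]^T sigma(W[0] x))

out = y_W(0.5, -0.3)
n_params = W0.size + w1.size             # 9 + 4 = 13
```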
2-D, 2-class, 1 hidden layer

Increasing the number of neurons = increasing the capacity of the model. With 5 hidden neurons, this network has 5×3 + 1×6 = 21 parameters.
Number of neurons vs. capacity (http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

No hidden neuron: linear classification, underfitting (low capacity)
12 hidden neurons: non-linear classification, good result (good capacity)
60 hidden neurons: non-linear classification, overfitting (too large a capacity)
k-D, 2 classes, 1 hidden layer

Increasing the dimensionality of the data = more columns in $W^{[0]}$. This network has 5×(k+1) + 1×6 parameters.
k-D, 2 classes, 2 hidden layers

Adding a hidden layer = adding a matrix multiplication:

$$y_W(x) = \sigma(w^{[2]T} \sigma(W^{[1]} \sigma(W^{[0]} x)))$$

with $W^{[0]} \in \mathbb{R}^{5 \times (k+1)}$, $W^{[1]} \in \mathbb{R}^{3 \times 6}$, $w^{[2]} \in \mathbb{R}^4$. This network has 5×(k+1) + 6×3 + 1×4 parameters.
k-D, 2 classes, 4 hidden layer network

$$y_W(x) = \sigma(w^{[4]T} \sigma(W^{[3]} \sigma(W^{[2]} \sigma(W^{[1]} \sigma(W^{[0]} x)))))$$

with $W^{[0]} \in \mathbb{R}^{5 \times (k+1)}$, $W^{[1]} \in \mathbb{R}^{3 \times 6}$, $W^{[2]} \in \mathbb{R}^{4 \times 4}$, $W^{[3]} \in \mathbb{R}^{7 \times 5}$, $w^{[4]} \in \mathbb{R}^8$. This network has 5×(k+1) + 6×3 + 4×4 + 7×5 + 1×8 parameters.
NOTE: More hidden layers = deeper network = more capacity.
Multilayer perceptron

The whole network is a function of the input: $y_W(x) = f(x)$.
Example: the input data $x$ flows through the layers; the output of the last layer is the output of the network, $y_W(x)$.
A K-class neural network has K output neurons.
k-D, 4 classes, 4 hidden layer network

$$y_W(x) = \sigma(W^{[4]} \sigma(W^{[3]} \sigma(W^{[2]} \sigma(W^{[1]} \sigma(W^{[0]} x)))))$$

with $W^{[4]} \in \mathbb{R}^{4 \times 8}$; the four output neurons give $y_{W,0}(x)$, $y_{W,1}(x)$, $y_{W,2}(x)$, $y_{W,3}(x)$.
Same network: the four raw output scores can be trained with the hinge loss.
Softmax

$$y_W(x) = \mathrm{softmax}(W^{[4]} \sigma(W^{[3]} \sigma(W^{[2]} \sigma(W^{[1]} \sigma(W^{[0]} x)))))$$

Each output is an exp() normalized over the four classes; train with the cross-entropy loss.
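A numpy sketch of this deep K-class forward pass. The layer shapes follow the slide (with the input dimensionality k and the random weights as my assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
k = 10                                                  # input dimensionality (assumption)
shapes = [(5, k + 1), (3, 6), (4, 4), (7, 5), (4, 8)]   # W[0] ... W[4]
Ws = [rng.normal(scale=0.5, size=s) for s in shapes]

def forward(x):
    h = np.concatenate([[1.0], x])                      # bias input
    for W in Ws[:-1]:
        h = np.concatenate([[1.0], sigmoid(W @ h)])     # hidden layer + its bias neuron
    return softmax(Ws[-1] @ h)                          # 4 class probabilities

p = forward(rng.normal(size=k))
```

Each hidden layer is just "matrix multiply, sigmoid, prepend bias"; only the last layer swaps the sigmoid for a softmax.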
How to make a prediction?
Forward pass

The input $x$ is propagated layer by layer:

1. $\sigma(W^{[0]} x)$
2. $\sigma(W^{[1]} \sigma(W^{[0]} x))$
3. $w^T \sigma(W^{[1]} \sigma(W^{[0]} x))$
4. $y_W(x) = \sigma(w^T \sigma(W^{[1]} \sigma(W^{[0]} x)))$
5. and finally the loss: $l(y_W(x), t) = -\left[ t \ln y_W(x) + (1 - t) \ln(1 - y_W(x)) \right]$
How to optimize the network?

0. From $\arg\min_W \; \frac{1}{N} \sum_{n=1}^{N} l(y_W(x_n), t_n) + \lambda R(W)$, choose a regularization function: $R(W) = \|W\|_1$ or $\|W\|_2^2$.
1. Choose a loss $l(y_W(x_n), t_n)$, for example the hinge loss or the cross-entropy loss. Do not forget to adjust the output layer to the loss you have chosen: cross-entropy => softmax.
2. Compute the gradient of the loss with respect to each parameter and launch a gradient descent algorithm to update the parameters:

$$w_{a,b}^{[c]} \leftarrow w_{a,b}^{[c]} - \eta \, \frac{\partial}{\partial w_{a,b}^{[c]}} \left[ \frac{1}{N} \sum_{n=1}^{N} l(y_W(x_n), t_n) + \lambda R(W) \right]$$
This gradient is computed by backpropagation, illustrated below on the network $y_W(x) = \sigma(w^{[2]T} \sigma(W^{[1]} \sigma(W^{[0]} x)))$.
The forward pass is decomposed into intermediate values:

$$A = W^{[0]} x, \quad B = \sigma(A), \quad C = W^{[1]} B, \quad D = \sigma(C), \quad E = w^{[2]T} D, \quad y_W(x) = \sigma(E)$$

with the loss $l(y_W(x), t) = -\left[ t \ln y_W(x) + (1 - t) \ln(1 - y_W(x)) \right]$.
Chain rule

The gradient $\frac{\partial}{\partial w^{[l]}} \frac{1}{N} \sum_{n=1}^{N} l(y_W(x_n), t_n)$ is computed with the chain rule.
Chain rule recap

With $f(u) = u^2$, $u(v) = 2v$, $v(x) = 1/x$:

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u} \frac{\partial u}{\partial v} \frac{\partial v}{\partial x} = 2u \cdot 2 \cdot \left(-\frac{1}{x^2}\right) = -\frac{8}{x^3}$$
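The chain-rule example can be checked numerically (the evaluation point x = 2 is my choice):

```python
# Chain-rule example: f(u) = u^2, u(v) = 2v, v(x) = 1/x.
def df_dx(x):
    v = 1.0 / x
    u = 2.0 * v
    # df/du * du/dv * dv/dx = 2u * 2 * (-1/x^2)
    return (2.0 * u) * 2.0 * (-1.0 / x**2)

x = 2.0
analytic = df_dx(x)                        # -8 / x^3 = -1.0 at x = 2

# compare against a central finite difference of the composed function
f = lambda xx: (2.0 * (1.0 / xx)) ** 2
h = 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)
```

Both values agree with the closed form $-8/x^3$, which confirms the factorization.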
Applying the chain rule through the intermediate values:

$$\frac{\partial \, l(y_W(x), t)}{\partial W^{[0]}} = \frac{\partial l}{\partial y_W} \frac{\partial y_W}{\partial E} \frac{\partial E}{\partial D} \frac{\partial D}{\partial C} \frac{\partial C}{\partial B} \frac{\partial B}{\partial A} \frac{\partial A}{\partial W^{[0]}}$$
Evaluating these partial derivatives from the loss back towards the input is back propagation.
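A minimal numpy sketch of this backward pass on the A…E decomposition, checked against a finite difference. The layer sizes, the random weights, and the omission of bias neurons are my simplifications:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W0 = rng.normal(size=(3, 2))      # W[0]
W1 = rng.normal(size=(3, 3))      # W[1]
w2 = rng.normal(size=3)           # w[2]
x = np.array([0.5, -1.2])
t = 1.0

def loss(W0):
    A = W0 @ x                    # A = W[0] x
    B = sigmoid(A)                # B = sigma(A)
    C = W1 @ B                    # C = W[1] B
    D = sigmoid(C)                # D = sigma(C)
    E = w2 @ D                    # E = w[2]^T D
    y = sigmoid(E)
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

# forward pass, keeping the intermediates
A = W0 @ x
B = sigmoid(A)
C = W1 @ B
D = sigmoid(C)
E = w2 @ D
y = sigmoid(E)

# backward pass: dl/dW0 = dl/dy dy/dE dE/dD dD/dC dC/dB dB/dA dA/dW0
dE = y - t                        # sigmoid + cross-entropy collapse to y - t
dD = dE * w2                      # dE/dD
dC = dD * D * (1 - D)             # sigmoid derivative
dB = W1.T @ dC                    # dC/dB
dA = dB * B * (1 - B)             # sigmoid derivative
dW0 = np.outer(dA, x)             # dA/dW0

# finite-difference check on one entry of W0
eps = 1e-6
Wp = W0.copy(); Wp[0, 0] += eps
Wm = W0.copy(); Wm[0, 0] -= eps
numeric = (loss(Wp) - loss(Wm)) / (2 * eps)
```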
Activation functions

Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$

3 problems:
• Gradient saturates when input is large
• Not zero centered
• exp() is an expensive operation
![Page 131: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/131.jpg)
Activation functions
Tanh(x)
• Output is zero-centered• Small gradient when input is large
[LeCun et al., 1991]
![Page 132: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/132.jpg)
Activation functions ReLU max 0,x x
ReLU(x)(Rectified Linear Unit)
• Large gradient for x>0• Super fast
• Output non centered at zero• No gradient when x<0?
[Krizhevsky et al., 2012]
![Page 133: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/133.jpg)
Activation functions LReLU max 0.01 ,x x x
Leaky ReLU(x)
• no gradient saturation • Super fast
• 0.01 is an hyperparameter
[Mass et al., 2013][He et al., 2015]
![Page 134: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/134.jpg)
Activation functions PReLU max ,x x x
Parametric ReLU(x)
• no gradient saturation • Super fast
• learn with back prop
[Mass et al., 2013][He et al., 2015]
![Page 135: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/135.jpg)
In practice
• By default, people use ReLU.
• Try Leaky ReLU / PReLU / ELU
• Try tanh but might be sub-optimal
• Do not use sigmoïde except at the output of a 2 class net.
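The activations above fit in a few lines of numpy (the sample vector z is mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):    # alpha = 0.01 is the hyperparameter
    return np.maximum(alpha * x, x)

def prelu(x, alpha):              # here alpha would be learned by backprop
    return np.maximum(alpha * x, x)

z = np.array([-2.0, 0.0, 3.0])
```

For positive inputs all the ReLU variants are the identity; they only differ in the slope they keep on the negative side.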
![Page 136: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/136.jpg)
How to classify an image?
https://ml4a.github.io/ml4a/neural_networks/
![Page 137: presentation SpringSchool2019d.pptx [Autosaved]€¦ · whps iuht gldjqrvwlf +hdowk\ 6lfn 6lfn « « +hdowk\)uht whps 6lfn +hdowk\ 3dwlhqw 3dwlhqw 3dwlhqw 3dwlhqw 1](https://reader034.fdocuments.us/reader034/viewer/2022042600/5f440db7649c6f682457f7e5/html5/thumbnails/137.jpg)
[Diagram: every pixel of the image (Pixel 1, Pixel 2, …) plus a bias is one input neuron, fully connected to the next layer.]
[Diagram: a fully connected first layer on a small image already has many parameters (7,850 in Layer 1).]
[Diagram: a 256×256 image has 65,536 input pixels: too many parameters (655,370 in Layer 1).]
[Diagram: a 256×256×256 input has 16M values: waaay too many parameters (160M in Layer 1).]
Full connections are too many: a 150-D input vector with 150 neurons in Layer 1 => 22,500 parameters!!
No full connection: if each of the 150 neurons in Layer 1 is connected to only 3 inputs => 450 parameters!!
Share weights: a 150-D input vector with 150 neurons in Layer 1, each neuron using the same 3 weights $(w_{10}, w_{11}, w_{12})$ => 3 parameters!!

1. Learning convolution filters!
2. Small number of parameters = can make it deep!
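Weight sharing as described above is exactly a 1-D convolution. A numpy sketch (the weight values and the 10-D input are illustrative):

```python
import numpy as np

w = np.array([0.1, 0.2, 0.3])            # the 3 shared weights (hypothetical values)
x = np.arange(10, dtype=float)           # a 10-D input signal

# each output neuron sees 3 consecutive inputs and uses the SAME weights
manual = np.array([w @ x[i:i + 3] for i in range(len(x) - 2)])

# this sliding dot product is a correlation; np.convolve flips its kernel,
# so we flip w back to make the two computations match
via_convolve = np.convolve(x, w[::-1], mode='valid')
```

However long the input, the layer still has only 3 parameters; only the number of outputs grows.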
[Animation: a filter with 3 shared weights slides along the input vector; each output neuron computes the dot product of the same 3 weights, e.g. (0.1, 0.2, 0.3), with 3 consecutive input values, producing one output at a time (0.25, 0.45, 0.50, …). This sliding dot product is a convolution!]
On an image, each neuron of Layer 1 is connected to 3×3 pixels: Layer 1 has only 9 parameters!!
Feature map
]0[WxF
Convolution operation
[Slide 154]

Five filters produce five feature maps: x ∗ W_0[0], x ∗ W_1[0], x ∗ W_2[0], x ∗ W_3[0], x ∗ W_4[0]
[Slide 155]

5-feature-map convolution layer
[Slides 156-157]

K-feature-map convolution layer
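Generalizing from 5 to K filters is mechanical: stack K filters and apply each one to the input, producing K feature maps. A minimal NumPy sketch (sizes are illustrative assumptions):

```python
import numpy as np

def conv_layer(image, filters):
    """Apply K filters (array of shape K x kh x kw) to one image,
    producing K feature maps (valid convolution, stride 1)."""
    K, kh, kw = filters.shape
    H, W = image.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(image[i:i+kh, j:j+kw] * filters[k])
    return out

image = np.random.rand(8, 8)
filters = np.random.rand(5, 3, 3)         # K = 5 filters
print(conv_layer(image, filters).shape)   # (5, 6, 6): five feature maps
```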
[Slide 158]

Conv layer 1 → Pooling layer
[Slide 159]

Pooling layer

Goals
• Reduce the spatial resolution of feature maps
• Lower memory and computation requirements
• Provide partial invariance to position, scale and rotation

Images: https://pythonmachinelearning.pro/introduction-to-convolutional-neural-networks-for-vision-tasks/
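A common choice is 2×2 max pooling with stride 2, which keeps the maximum of each window and halves the resolution. A minimal sketch (the 2×2/stride-2 configuration is an assumption, not stated on the slide):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Max pooling: keep the maximum of each window, reducing the
    spatial resolution (one source of partial translation invariance)."""
    H, W = fmap.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [3., 2., 1., 0.],
                 [1., 2., 3., 4.]])
print(max_pool(fmap))   # [[6. 8.] [3. 4.]]
```

Note how a small shift of the input often leaves the pooled output unchanged, which is exactly the partial invariance listed in the goals.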
[Slide 160]

Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2
[Slide 161]

Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2 → (…) → Fully connected layers → y_W(x)

2-class CNN, trained with the cross-entropy loss:

l(t, x) = −t ln y_W(x) − (1 − t) ln(1 − y_W(x))
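The 2-class loss here is the binary cross-entropy; a quick sketch, assuming the standard form −t ln y − (1 − t) ln(1 − y):

```python
import numpy as np

def binary_cross_entropy(t, y, eps=1e-12):
    """l(t, y) = -t ln(y) - (1 - t) ln(1 - y) for target t in {0, 1}
    and predicted probability y in (0, 1)."""
    y = np.clip(y, eps, 1 - eps)   # avoid log(0)
    return -t * np.log(y) - (1 - t) * np.log(1 - y)

print(binary_cross_entropy(1.0, 0.9))   # ~0.105: confident and correct
print(binary_cross_entropy(1.0, 0.1))   # ~2.303: confident and wrong
```

The loss is small when the predicted probability agrees with the label and grows without bound as a confident prediction becomes wrong, which is what drives gradient descent toward correct classifications.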
[Slide 162]

Conv layer 1 → Pool layer 1 → Conv layer 2 → Pool layer 2 → (…) → Fully connected layers

K-class CNN with a SOFTMAX output: the class scores w_0(x), w_1(x), w_2(x), … are exponentiated and normalized,

y_k(x) = exp(w_k(x)) / Σ_j exp(w_j(x))

and the network is trained with the cross-entropy loss

l(t, x) = −t ln y_W(x) − (1 − t) ln(1 − y_W(x))
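The softmax step (exponentiate, then normalize) can be sketched directly (the example scores are made up):

```python
import numpy as np

def softmax(scores):
    """Exponentiate each class score and normalize so the outputs
    form a probability distribution (positive, summing to 1)."""
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])     # w_0(x), w_1(x), w_2(x)
probs = softmax(scores)
print(probs, probs.sum())              # three probabilities summing to 1
```

Subtracting the maximum score before exponentiating does not change the result (the factor cancels in the normalization) but prevents overflow for large scores.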
[Slide 163]

Nice example from the literature

S. Banerjee, S. Mitra, A. Sharma, and B. U. Shankar, A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks, arXiv:1806.07589, 2018
[Slide 164]

Learn image-based characteristics

http://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf
[Slide 165]

Batch processing

[Slide 166]

[Figure: a single neuron computing y_w(x) = σ(wᵀx); the figure shows weights w = (2.0, 3.6, 0.5), input x = (0.4, 1.0), and a constant bias input of 1.]
[Slide 167]

[Figure: the weighted sum wᵀx is computed for x_a = (0.4, 1.0) (with bias input 1) and passed through the activation, giving y_w(x_a) = 0.521.]
[Slide 168]

[Figure: the same computation in vector form, y_w(x_a) = σ(wᵀx_a), with w = (2.0, 3.6, 0.5) and x_a = (1, 0.4, 1.0); output 0.521.]
[Slide 169]

[Figure: two inputs, x_a = (1, 0.4, 1.0) and x_b = (1, 2.1, 3.0), processed independently by the same neuron w = (2.0, 3.6, 0.5); outputs y_w(x_a) = 0.521 and y_w(x_b) = 0.99.]
[Slide 170]

Mini-batch processing

[Figure: the two inputs stacked as rows of a matrix X; one matrix-vector product Xw followed by the activation produces both outputs at once: (0.521, 0.99).]
[Slide 171]

Mini-batch processing

[Figure: a mini-batch of four inputs, (0.4, 1.0), (2.1, 3.0), (0.5, 0.2) and (1.1, 1.1), each with bias input 1, processed in a single matrix product; the four outputs are 0.521, 0.99, 0.2 and 0.98.]
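The mini-batch idea reduces to one matrix product: stack the inputs as rows of X and compute all predictions at once. A sketch using the slide's weight and input values (the sigmoid activation and the exact signs of the weights are assumptions, since the extraction garbled them):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight vector, stacked inputs: each row of X is one sample,
# with a leading 1 for the bias term as on the slides.
w = np.array([2.0, 3.6, 0.5])
X = np.array([[1.0, 0.4, 1.0],
              [1.0, 2.1, 3.0],
              [1.0, 0.5, 0.2],
              [1.0, 1.1, 1.1]])

y = sigmoid(X @ w)    # one matrix product -> four predictions
print(y.shape)        # (4,)
```

This is why GPUs like mini-batches: one large matrix multiplication is far more efficient than many small vector products.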
[Slide 172]

Mini-batch processing

Horse / Dog / Truck
[Slide 173]

Mini-batch processing

Mini-batch of 4 images → 4 predictions

Horse / Dog / Truck
[Slide 174]

Classical applications of ConvNets

Classification
[Slide 175]

Classical applications of ConvNets

Classification

S. Banerjee, S. Mitra, A. Sharma, and B. U. Shankar, A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks, arXiv:1806.07589, 2018
[Slide 176]

Classical applications of ConvNets

Image segmentation
[Slide 177]

Classical applications of ConvNets

Image segmentation

Tran, P. V., 2016. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv:1604.00494
[Slide 178]

Classical applications of ConvNets

Image segmentation

Fang Liu, Zhaoye Zhou, et al., Deep convolutional neural network and 3D deformable approach for tissue segmentation in musculoskeletal magnetic resonance imaging, Magnetic Resonance in Medicine, 2018. DOI: 10.1002/mrm.26841
[Slide 179]

Classical applications of ConvNets

Image segmentation

Havaei M., Davy A., Warde-Farley D., Biard A., Courville A., Bengio Y., Pal C., Jodoin P-M., Larochelle H. (2017) Brain Tumor Segmentation with Deep Neural Networks, Medical Image Analysis, Vol. 35, 18-31
[Slide 180]

Classical applications of ConvNets

Localization
[Slide 181]

Classical applications of ConvNets

Localization

S. Banerjee, S. Mitra, A. Sharma, and B. U. Shankar, A CADe System for Gliomas in Brain MRI using Convolutional Neural Networks, arXiv:1806.07589, 2018
[Slide 182]

Conclusion

• Linear classification (1-neuron network)
• Logistic regression
• Multilayer perceptron
• ConvNets
• Many buzzwords:
  – Softmax
  – Loss
  – Batch
  – Gradient descent
  – Etc.