Lecture 6 recap
Neural Network
Depth
Width
Gradient Descent for Neural Networks
(Network diagram: inputs $x_0, x_1, x_2$; hidden units $h_0, \dots, h_3$; outputs $y_0, y_1$; targets $t_0, t_1$)
$y_i = A\Big(b_{1,i} + \sum_j h_j \, w_{1,i,j}\Big), \qquad h_j = A\Big(b_{0,j} + \sum_k x_k \, w_{0,j,k}\Big)$

$L_i = (y_i - t_i)^2$
$\nabla_{w,b} f_{x,t}(w) = \left(\frac{\partial f}{\partial w_{0,0,0}}, \;\dots,\; \frac{\partial f}{\partial w_{l,m,n}}, \;\dots,\; \frac{\partial f}{\partial b_{l,m}}, \;\dots\right)$
Just use a simple activation: $A(x) = \max(0, x)$
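To make the formulas concrete, here is a minimal NumPy sketch of this forward pass and loss; the layer sizes, random weights, and variable names are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def relu(x):
    # A(x) = max(0, x), the activation used on this slide
    return np.maximum(0.0, x)

# Toy sizes matching the diagram: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
x  = rng.standard_normal(3)          # inputs x_0, x_1, x_2
W0 = rng.standard_normal((4, 3))     # weights w_{0,j,k}
b0 = np.zeros(4)                     # biases b_{0,j}
W1 = rng.standard_normal((2, 4))     # weights w_{1,i,j}
b1 = np.zeros(2)                     # biases b_{1,i}
t  = np.array([1.0, 0.0])            # targets t_0, t_1

h = relu(b0 + W0 @ x)                # h_j = A(b_{0,j} + sum_k x_k w_{0,j,k})
y = relu(b1 + W1 @ h)                # y_i = A(b_{1,i} + sum_j h_j w_{1,i,j})
L = np.sum((y - t) ** 2)             # squared-error loss
print(h, y, L)
```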
Stochastic Gradient Descent (SGD)

$\theta^{k+1} = \theta^{k} - \alpha \, \nabla_\theta L(\theta^{k}, x_{\{1..m\}}, y_{\{1..m\}})$

$\nabla_\theta L = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta L_i$ (gradient for the $k$-th batch)

• $k$ now refers to the $k$-th iteration
• $m$ training samples in the current batch
• Plus all variations of SGD: momentum, RMSProp, Adam, …
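A minimal sketch of one mini-batch SGD step under these definitions; the `loss_grad` callback and the linear-regression stand-in below are assumptions for illustration, not part of the lecture.

```python
import numpy as np

def sgd_step(theta, x_batch, y_batch, loss_grad, lr=1e-2):
    """theta^{k+1} = theta^k - alpha * grad_theta L(theta^k, x_{1..m}, y_{1..m})."""
    grad = loss_grad(theta, x_batch, y_batch)   # (1/m) * sum_i grad L_i
    return theta - lr * grad

# Toy usage: gradient of a least-squares loss as a stand-in for loss_grad
def linreg_grad(theta, X, y):
    m = len(y)
    return (2.0 / m) * X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))            # one mini-batch of m = 8 samples
y = X @ np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for k in range(100):                       # k-th iteration (the same toy batch is reused here)
    theta = sgd_step(theta, X, y, linreg_grad, lr=0.1)
print(theta)
```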
Importance of Learning Rate
Over- and Underfitting
Underfitted Appropriate Overfitted
Figure extracted from Deep Learning by Adam Gibson, Josh Patterson, O'Reilly Media Inc., 2017
Over- and Underfitting
Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html
Basic recipe for machine learning
• Split your data: train (60%), validation (20%), test (20%)
• Find your hyperparameters on the validation split
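One possible way to realize the 60/20/20 split, sketched in NumPy; the function name, the shuffling, and the fixed seed are assumptions.

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle, then split into 60% train / 20% validation / 20% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# Hyperparameters are tuned on the validation split; the test split is only used at the very end.
```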
Basic recipe for machine learning
Basically…
Deep learning memes
Fun things…
Deep learning memes
Going Deep into Neural Networks
Simple Starting Points for Debugging
• Start simple!
  – First, overfit to a single training sample
  – Second, overfit to several training samples
• Always try a simple architecture first
  – It will verify that you are learning something
• Estimate timings (how long does each epoch take?)
Neural Network
• Problems of going deeper…
  – Vanishing gradients (repeated multiplications in the chain rule)
• The impact of small decisions (architecture, activation functions...)
• Is my network training correctly?
Neural Networks
1) Output functions
2) Functions in Neurons
3) Input of data
Output Functions
Neural Networks: what is the shape of this function?
Prediction → Loss (Softmax, Hinge)
Sigmoid for Binary Predictions
Output lies between 0 and 1: can be interpreted as a probability
Softmax formulation
• What if we have multiple classes? Use the softmax: exponentiate the scores, then normalize.
  $\text{softmax}(s)_i = \frac{e^{s_i}}{\sum_j e^{s_j}}$
• Softmax loss (Maximum Likelihood Estimate)
Loss Functions
Naïve Losses

L2 loss: $L_2 = \sum_{i=1}^{n} (y_i - f(x_i))^2$
  – Sum of squared differences (SSD)
  – Prone to outliers
  – Compute-efficient (optimization)
  – Optimum is the mean

L1 loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$
  – Sum of absolute differences
  – Robust to outliers
  – Costly to compute
  – Optimum is the median

Example, with 4x4 patches $x_i$ and $y_i$:
  $x_i$: 12 24 42 23 / 34 32 5 2 / 12 31 12 31 / 31 64 5 13
  $y_i$: 15 20 40 25 / 34 32 5 2 / 12 31 12 31 / 31 64 5 13

$L_1(x, y) = 3 + 4 + 2 + 2 + 0 + \dots + 0 = 11$
$L_2(x, y) = 9 + 16 + 4 + 4 + 0 + \dots + 0 = 33$
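A small NumPy sketch of both losses; the toy vectors reuse the first rows of the example above.

```python
import numpy as np

def l2_loss(y, f_x):
    # Sum of squared differences (SSD); optimum is the mean
    return np.sum((y - f_x) ** 2)

def l1_loss(y, f_x):
    # Sum of absolute differences; more robust to outliers, optimum is the median
    return np.sum(np.abs(y - f_x))

y   = np.array([15.0, 20.0, 40.0, 25.0])   # first row of y_i
f_x = np.array([12.0, 24.0, 42.0, 23.0])   # first row of x_i
print(l1_loss(y, f_x))   # 3 + 4 + 2 + 2 = 11
print(l2_loss(y, f_x))   # 9 + 16 + 4 + 4 = 33
```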
Cross-Entropy (Softmax)

Softmax loss: $L_i = -\log\!\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels).
Score function: $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot (x_0, x_1, \dots, x_N)^T$

Suppose: 3 training examples ("cat", "chair", "car" images) and 3 classes. Class scores (cat, chair, car) per example:
  "cat" example:   3.2, 5.1, -1.7
  "chair" example: 1.3, 4.9, 2.0
  "car" example:   2.2, 2.5, -3.1

Worked pipeline for the "cat" example:
  scores (3.2, 5.1, -1.7) → exp (24.5, 164.0, 0.18) → normalize (0.13, 0.87, 0.00) → $-\log(x)$ (2.04, 0.14, 6.94)
  The loss is the $-\log$ of the correct-class probability: $L_1 = 2.04$

Per-example losses: $L_1 = 2.04$, $L_2 = 0.079$, $L_3 = 6.156$

Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.04 + 0.079 + 6.156}{3} = \mathbf{2.76}$
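A short NumPy sketch that reproduces the worked numbers above (a numerically stabilized softmax is assumed; this is a check, not code from the lecture).

```python
import numpy as np

def softmax_cross_entropy(s, y):
    """L_i = -log( exp(s_y) / sum_j exp(s_j) ), with a max-shift for numerical stability."""
    s = s - np.max(s)                       # shifting does not change the softmax
    p = np.exp(s) / np.sum(np.exp(s))
    return -np.log(p[y])

scores = np.array([[3.2, 5.1, -1.7],        # "cat" example, correct class 0
                   [1.3, 4.9,  2.0],        # "chair" example, correct class 1
                   [2.2, 2.5, -3.1]])       # "car" example, correct class 2
labels = np.array([0, 1, 2])

losses = [softmax_cross_entropy(s, y) for s, y in zip(scores, labels)]
print(np.round(losses, 3))   # approx. [2.04, 0.079, 6.156]
print(np.mean(losses))       # approx. 2.76
```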
Hinge Loss (SVM Loss)

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels).
Score function: $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot (x_0, x_1, \dots, x_N)^T$

Suppose: 3 training examples ("cat", "chair", "car" images) and 3 classes, with the same class scores (cat, chair, car) as above:
  "cat" example:   3.2, 5.1, -1.7
  "chair" example: 1.3, 4.9, 2.0
  "car" example:   2.2, 2.5, -3.1

$L_1 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9 + 0 = \mathbf{2.9}$
$L_2 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0 + 0 = \mathbf{0}$
$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 6.3 + 6.6 = \mathbf{12.9}$

Full loss (over all pairs): $L = \frac{1}{N}\sum_{i=1}^{N} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.9 + 0 + 12.9}{3} = \mathbf{5.27}$
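The same three examples with the multiclass SVM loss, as a minimal NumPy sketch of the formula above.

```python
import numpy as np

def multiclass_svm_loss(s, y, margin=1.0):
    """L_i = sum_{j != y} max(0, s_j - s_y + margin)."""
    margins = np.maximum(0.0, s - s[y] + margin)
    margins[y] = 0.0                     # the correct class does not contribute
    return np.sum(margins)

scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9,  2.0],
                   [2.2, 2.5, -3.1]])
labels = np.array([0, 1, 2])

losses = [multiclass_svm_loss(s, y) for s, y in zip(scores, labels)]
print(losses)            # approx. [2.9, 0.0, 12.9]
print(np.mean(losses))   # approx. 5.27
```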
Hinge Loss vs Softmax

Hinge loss: $L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\!\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$

Given the following scores, with correct class $y_i = 0$:

$s = [5, -3, 2]$:
  Hinge: $\max(0, -3 - 5 + 1) + \max(0, 2 - 5 + 1) = 0$
  Softmax: $-\ln\!\left(e^{5} / (e^{5} + e^{-3} + e^{2})\right) = 0.05$
$s = [5, 10, 10]$:
  Hinge: $\max(0, 10 - 5 + 1) + \max(0, 10 - 5 + 1) = 12$
  Softmax: $-\ln\!\left(e^{5} / (e^{5} + e^{10} + e^{10})\right) = 5.70$
$s = [5, -20, -20]$:
  Hinge: $\max(0, -20 - 5 + 1) + \max(0, -20 - 5 + 1) = 0$
  Softmax: $-\ln\!\left(e^{5} / (e^{5} + e^{-20} + e^{-20})\right) \approx 2 \cdot 10^{-11}$

Softmax *always* wants to improve! Hinge loss saturates once the margin is met.
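A small self-contained sketch comparing the two losses on the three score vectors above.

```python
import numpy as np

def hinge(s, y):
    m = np.maximum(0.0, s - s[y] + 1.0)
    m[y] = 0.0
    return m.sum()

def softmax_loss(s, y):
    s = s - s.max()
    return -np.log(np.exp(s[y]) / np.exp(s).sum())

y = 0
for s in [np.array([5.0, -3.0, 2.0]),
          np.array([5.0, 10.0, 10.0]),
          np.array([5.0, -20.0, -20.0])]:
    # Hinge is exactly 0 once the margin is satisfied; softmax keeps pushing the
    # correct-class probability towards 1, so its loss is small but never exactly 0.
    print(s, hinge(s, y), softmax_loss(s, y))
```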
Loss in Compute Graph
Compute graph: the input data $x_i$ and the weights $W$ feed the score function $f(x_i, W)$; the scores together with the labeled data $y_i$ give the data loss (SVM or Softmax), the weights additionally give the regularization loss, and both are combined into the full loss $L$.

SVM: $L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\!\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} L_i + R_2(W)$, e.g., $L_2$-reg: $R_2(W) = \sum_{i=1}^{N} w_i^2$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels).
Score function: $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot (x_0, x_1, \dots, x_N)^T$
Compute Graphs
SVM: $L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\!\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} L_i + R_2(W)$, e.g., $L_2$-reg: $R_2(W) = \sum_{i=1}^{N} w_i^2$
Score function: $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot (x_0, x_1, \dots, x_N)^T$
Compute Graphs
SVM: $L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\!\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} L_i + R_2(W)$, e.g., $L_2$-reg: $R_2(W) = \sum_{i=1}^{N} w_i^2$
Score function: $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot (x_0, x_1, \dots, x_N)^T$

We want to find the optimal $W$, i.e., the weights are the unknowns of the optimization problem.
Compute the gradient w.r.t. $W$: the gradient $\nabla_W L$ is computed via backpropagation.
Weight Regularization & SVM Loss

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0,\; f(x_i; W)_j - f(x_i; W)_{y_i} + 1)$

$L_2$-reg: $R_2(W) = \sum_{i=1}^{N} w_i^2$
$L_1$-reg: $R_1(W) = \sum_{i=1}^{N} |w_i|$

Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} \sum_{j \neq y_i} \max\!\left(0,\; f(x_i; W)_j - f(x_i; W)_{y_i} + 1\right) + \lambda R(W)$

Example:
  $x = [1, 1, 1, 1]$
  $w_1 = [1, 0, 0, 0]$
  $w_2 = [0.25, 0.25, 0.25, 0.25]$
  $w_1^T x = w_2^T x = 1$
  $R_2(w_1) = 1$
  $R_2(w_2) = 0.25^2 + 0.25^2 + 0.25^2 + 0.25^2 = 0.25$
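A quick NumPy check of this example; variable names follow the slide.

```python
import numpy as np

x  = np.ones(4)
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                     # both scores are 1.0
print(np.sum(w1 ** 2), np.sum(w2 ** 2))   # R2(w1) = 1.0, R2(w2) = 0.25
# L2 regularization therefore prefers w2, whose weight is spread over all inputs.
```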
Activation Functions
Neurons
Neural Networks: what is the shape of this function?
Prediction → Loss (Softmax, Hinge)
Activation Functions or Hidden Units
Sigmoid

$\sigma(x) = \frac{1}{1 + e^{-x}}$

• Output lies between 0 and 1: can be interpreted as a probability
• Saturated neurons kill the gradient flow; only a small active region is useful for gradient descent
• Output is always positive
Problem of Positive Output

• We want to compute the gradient w.r.t. the weights
• Because the sigmoid outputs feeding the next layer are all positive, that gradient is going to be either positive or negative for all the weights at once
• More on zero-mean data later
tanh
• Zero-centered
• Still saturates (at both ends)
[LeCun 1991]
Rectified Linear Units (ReLU)

$\text{ReLU}(x) = \max(0, x)$

• Large and consistent gradients
• Does not saturate
• Fast convergence
• What happens if a ReLU always outputs zero? Dead ReLU: no gradient flows back, so the neuron stops learning
• Initializing ReLU neurons with slightly positive biases (e.g., 0.1) makes it likely that they stay active for most inputs
[Krizhevsky 2012]
Leaky ReLU
$\text{LeakyReLU}(x) = \max(0.01\,x, x)$
• Does not die
[Maas 2013]
Parametric ReLU
Does not die
One more parameter to backprop into
[He 2015]
Maxout units

$\text{Maxout}(x) = \max(w_1^T x + b_1,\; w_2^T x + b_2)$

• Piecewise linear approximation of a convex function with $N$ pieces
• Generalization of ReLUs, with linear regimes
• Does not die and does not saturate
• Increases the number of parameters
[Goodfellow 2013]
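A minimal sketch of a maxout unit with two linear pieces, following the formula above; the layer sizes and random weights are assumptions.

```python
import numpy as np

def maxout(x, W1, b1, W2, b2):
    """Maxout(x) = max(w1^T x + b1, w2^T x + b2), taken element-wise per unit."""
    return np.maximum(W1 @ x + b1, W2 @ x + b2)

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)   # 4 maxout units over 3 inputs
W2, b2 = rng.standard_normal((4, 3)), np.zeros(4)   # second linear piece per unit
print(maxout(x, W1, b1, W2, b2))
```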
Quick Guide
• Sigmoid is not really used
• ReLU is the standard choice
• Second choice: the variants of ReLU, or Maxout
• Recurrent nets will require tanh or similar
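For reference, a small NumPy sketch of the activations discussed in this section; the leaky slope of 0.01 is an illustrative default, not a value from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)   # a is a fixed small slope

def prelu(x, a):
    return np.where(x > 0, x, a * x)   # a is learned (backpropagated into)

x = np.linspace(-3, 3, 7)
print(relu(x), leaky_relu(x), sigmoid(x))
```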
Neural Networks: data pre-processing?
Prediction → Loss (Softmax, Hinge)
Data pre-processing
For images: subtract the mean image (AlexNet) or the per-channel mean (VGG-Net).
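A sketch of the two pre-processing variants, assuming a batch of images with shape (N, H, W, C); the function names are placeholders.

```python
import numpy as np

def subtract_mean_image(images):
    """AlexNet-style: subtract the mean image computed over the training set."""
    mean_image = images.mean(axis=0)            # shape (H, W, C)
    return images - mean_image

def subtract_channel_mean(images):
    """VGG-style: subtract one mean value per color channel."""
    channel_mean = images.mean(axis=(0, 1, 2))  # shape (C,)
    return images - channel_mean
```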
Outlook
• Regularization in the optimization: dropout, weight decay, etc.
• Regularization in the architecture: convolutions! (CNNs)
• Initialization of the optimization: how to set the weights at the beginning
• Handling limited training data: data augmentation
Next lecture
• Next lecture: Dec 13th
  – More about training neural networks: regularization, batch normalization, dropout, etc.
  – Followed by CNNs