Lecture 3 recap
Beyond linear
• Linear score function f = Wx
  – On CIFAR-10
  – On ImageNet
Credit: Li/Karpathy/Johnson
Neural Network
Credit: Li/Karpathy/Johnson
Neural Network
Depth
Width
Neural Network
• Linear score function f = Wx
• Neural network is a nesting of "functions"
  – 2-layers: f = W_2 max(0, W_1 x)
  – 3-layers: f = W_3 max(0, W_2 max(0, W_1 x))
  – 4-layers: f = W_4 tanh(W_3 max(0, W_2 max(0, W_1 x)))
  – 5-layers: f = W_5 σ(W_4 tanh(W_3 max(0, W_2 max(0, W_1 x))))
  – … up to hundreds of layers
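As a concrete illustration (not from the slides; dimensions and variable names are made up), the 2-layer score function f = W_2 max(0, W_1 x) in NumPy:

```python
import numpy as np

# Hypothetical sizes: 4-dim input, 8 hidden units, 3 class scores
x = np.random.randn(4)
W1 = np.random.randn(8, 4)
W2 = np.random.randn(3, 8)

h = np.maximum(0, W1 @ x)   # elementwise max(0, .) between the two linear maps
f = W2 @ h                  # final class scores, shape (3,)
```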
Computational Graphs
• Neural network is a computational graph
  – It has compute nodes
  – It has edges that connect nodes
  – It is directional
  – It is organized in "layers"
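One minimal way to make these bullets concrete (my own sketch, not from the slides): a directed graph in which each compute node records the nodes feeding into it.

```python
# A minimal compute-graph representation: nodes plus directed edges
class Node:
    def __init__(self, op, inputs=()):
        self.op = op                 # e.g. "input", "sum", "mult", "relu"
        self.inputs = list(inputs)   # directed edges from input nodes into this node

x = Node("input")
y = Node("input")
z = Node("input")
d = Node("sum", [x, y])
f = Node("mult", [d, z])   # "layers" emerge from how the nodes are wired together
```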
Backprop cont'd
The importance of gradients
• All optimization schemes are based on computing gradients
• One can compute gradients analytically, but what if our function is too complex?
• Break down gradient computation → Backpropagation
  [Rumelhart 1986]
Backprop: Forward Pass
• f(x, y, z) = (x + y) · z      Initialization: x = 1, y = −3, z = 4

[Compute graph: a "sum" node computes d = x + y = −2; a "mult" node then computes f = d · z = −8]
Backprop: Backward Pass
with x = 1, y = −3, z = 4

f(x, y, z) = (x + y) · z
[Compute graph: "sum" node d = x + y = −2, "mult" node f = d · z = −8]

d = x + y  →  ∂d/∂x = 1,  ∂d/∂y = 1
f = d · z  →  ∂f/∂d = z,  ∂f/∂z = d

What is ∂f/∂x, ∂f/∂y, ∂f/∂z?

• At the output: ∂f/∂f = 1
• ∂f/∂z = d = −2
• ∂f/∂d = z = 4
• Chain rule: ∂f/∂y = ∂f/∂d · ∂d/∂y = 4 · 1 = 4
• Chain rule: ∂f/∂x = ∂f/∂d · ∂d/∂x = 4 · 1 = 4
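A minimal Python sketch of this same forward and backward pass (my own illustration, not from the slides; variable names are mine):

```python
# Toy example from the slides: f(x, y, z) = (x + y) * z
x, y, z = 1.0, -3.0, 4.0

# Forward pass
d = x + y            # "sum" node:  d = -2
f = d * z            # "mult" node: f = -8

# Backward pass (chain rule, starting from df/df = 1)
df_df = 1.0
df_dz = d            # local derivative of the mult node w.r.t. z -> -2
df_dd = z            # local derivative of the mult node w.r.t. d ->  4
df_dx = df_dd * 1.0  # dd/dx = 1 -> 4
df_dy = df_dd * 1.0  # dd/dy = 1 -> 4

print(f, df_dx, df_dy, df_dz)   # -8.0 4.0 4.0 -2.0
```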
Compute Graphs -> Neural Networks
• x_i: input variables
• w_{l,m,n}: network weights (note the 3 indices)
  – l: which layer
  – m: which neuron in the layer
  – n: which weight within that neuron
• y_i: computed output (i = output dimension; n_out outputs)
• t_i: ground-truth targets
• L: loss function
Compute Graphs -> Neural Networks

[Compute graph: the inputs x_0, x_1 are each multiplied by a weight (∗ w_0, ∗ w_1) and summed to form the output y_0; an L2 loss node then subtracts the target t_0 and squares the result (loss/cost). Labels: input layer → output layer; t_0 is e.g. a class label / regression target; Input, Weights (the unknowns!), L2 loss function.]

We want to compute gradients w.r.t. all weights w.
Compute Graphs -> Neural Networks

[Same compute graph as before, now with a ReLU activation max(0, x) inserted before the output (note on the slide: "btw. I'm not arguing this is the right choice here").]

We want to compute gradients w.r.t. all weights w.
Compute Graphs -> Neural Networks

[Compute graph with two inputs x_0, x_1 and three outputs y_0, y_1, y_2 compared against targets t_0, t_1, t_2: each output y_i is a weighted sum of the inputs with weights w_{i,0}, w_{i,1}, followed by its own L2 loss node (subtract t_i, square) producing a loss/cost per output.]

We want to compute gradients w.r.t. all weights w.
Compute Graphs -> Neural Networks

[Graph: input layer x_0 … x_n, output layer y_0, y_1, … with targets t_0, t_1, …]

y_i = A(b_i + Σ_k x_k · w_{i,k})        (A: activation function, b_i: bias)
L_i = (y_i − t_i)²
L = Σ_i L_i                             (L2 loss: simply sum up the squares; the energy to minimize is E = L)

We want to compute gradients w.r.t. all weights w *AND* biases b.

∂L/∂w_{i,k} = ∂L/∂y_i · ∂y_i/∂w_{i,k}   -> use the chain rule to compute the partials
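A small NumPy sketch of this single-layer case (my own illustration, not from the slides), using ReLU as the activation A and applying the chain rule above:

```python
import numpy as np

# Hypothetical sizes: 3 inputs, 2 outputs
x = np.array([1.0, -2.0, 0.5])
t = np.array([0.0, 1.0])
W = np.random.randn(2, 3)          # w[i, k]
b = np.zeros(2)

# Forward pass: y_i = A(b_i + sum_k x_k w_{i,k}), with A = ReLU
s = W @ x + b
y = np.maximum(0.0, s)
L = np.sum((y - t) ** 2)           # L2 loss summed over outputs

# Chain rule: dL/dw[i,k] = dL/dy_i * dy_i/ds_i * ds_i/dw[i,k]
dL_dy = 2.0 * (y - t)
dy_ds = (s > 0).astype(float)      # derivative of ReLU
dL_ds = dL_dy * dy_ds
dL_dW = np.outer(dL_ds, x)         # ds_i/dw[i,k] = x_k
dL_db = dL_ds                      # ds_i/db_i = 1
```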
Gradient Descent
[Figure: loss surface with the initialization and the optimum marked]
Gradient Descent
• From derivative to gradient:  df(x)/dx  →  ∇_x f(x)
• Gradient steps in the direction of the negative gradient:
    x' = x − ε ∇_x f(x)
  (∇_x f(x) points in the direction of greatest increase of the function; ε is the learning rate)
Gradient Descent for Neural Networks

∇_w f_{x,t}(w) = [ ∂f/∂w_{0,0,0}, …, ∂f/∂w_{l,m,n} ]ᵀ      (l runs over layers, m over neurons)

For a given training pair {x, t}, we want to update all weights; i.e., we need to compute the derivatives w.r.t. all weights.

Gradient step:  w' = w − ε ∇_w f_{x,t}(w)
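A minimal sketch of this gradient step (my own illustration; lr plays the role of ε, and grad_w is assumed to come from backpropagation for one training pair {x, t}):

```python
import numpy as np

def gradient_step(w, grad_w, lr=0.01):
    # w' = w - eps * grad f_{x,t}(w): move against the gradient, scaled by the learning rate
    return w - lr * grad_w

# Hypothetical usage with a flat parameter vector and a placeholder gradient
w = np.random.randn(10)
grad_w = np.random.randn(10)
w = gradient_step(w, grad_w, lr=0.01)
```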
Gradient Descent for Neural Networks

[Figure: two-layer network with inputs x_0, x_1, x_2, hidden neurons h_0 … h_3, outputs y_0, y_1 and targets t_0, t_1]

h_j = A(b_{0,j} + Σ_k x_k · w_{0,j,k})
y_i = A(b_{1,i} + Σ_j h_j · w_{1,i,j})
L_i = (y_i − t_i)²

∇_w f_{x,t}(w) = [ ∂f/∂w_{0,0,0}, …, ∂f/∂w_{l,m,n}, …, ∂f/∂b_{l,m} ]ᵀ

Keep it simple: A(x) = max(0, x)
Gradient Descent for Neural Networks

[Figure: the same two-layer network]

h_j = A(b_{0,j} + Σ_k x_k · w_{0,j,k})
y_i = A(b_{1,i} + Σ_j h_j · w_{1,i,j})
L_i = (y_i − t_i)²

Backpropagation: just go through it layer by layer

∂L/∂w_{1,i,j} = ∂L/∂y_i · ∂y_i/∂w_{1,i,j}
∂L/∂w_{0,j,k} = ∂L/∂y_i · ∂y_i/∂h_j · ∂h_j/∂w_{0,j,k}

∂L/∂y_i = 2(y_i − t_i)
∂y_i/∂w_{1,i,j} = h_j if the pre-activation is > 0, else 0    (ReLU)
…
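A compact NumPy sketch of exactly this layer-by-layer backward pass (my own illustration, with assumed sizes matching the figure: 3 inputs, 4 hidden neurons, 2 outputs; ReLU activation and summed L2 loss):

```python
import numpy as np

x = np.random.randn(3)
t = np.random.randn(2)
W0, b0 = np.random.randn(4, 3), np.zeros(4)
W1, b1 = np.random.randn(2, 4), np.zeros(2)

# Forward pass
s0 = W0 @ x + b0
h = np.maximum(0, s0)        # h_j = A(b_{0,j} + sum_k x_k w_{0,j,k})
s1 = W1 @ h + b1
y = np.maximum(0, s1)        # y_i = A(b_{1,i} + sum_j h_j w_{1,i,j})
L = np.sum((y - t) ** 2)     # L = sum_i (y_i - t_i)^2

# Backward pass, layer by layer
dL_dy = 2 * (y - t)                # dL/dy_i = 2 (y_i - t_i)
dL_ds1 = dL_dy * (s1 > 0)          # ReLU passes the gradient where the pre-activation is > 0
dL_dW1 = np.outer(dL_ds1, h)       # dL/dw_{1,i,j} = dL/ds_{1,i} * h_j
dL_db1 = dL_ds1
dL_dh = W1.T @ dL_ds1              # accumulate over all outputs i
dL_ds0 = dL_dh * (s0 > 0)
dL_dW0 = np.outer(dL_ds0, x)       # dL/dw_{0,j,k} = dL/ds_{0,j} * x_k
dL_db0 = dL_ds0
```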
Gradient Descent for Neural Networks

[Figure: the same two-layer network]

h_j = A(b_{0,j} + Σ_k x_k · w_{0,j,k})
y_i = A(b_{1,i} + Σ_j h_j · w_{1,i,j})
L_i = (y_i − t_i)²

How many unknown weights?
• Hidden layer: 4 · 3 + 4    (#neurons · #input channels + #biases)
• Output layer: 2 · 4 + 2

Note that some activations also have weights.
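As a quick check (mine, not on the slide), the same count computed in code for the 3 → 4 → 2 fully connected network from the figure:

```python
def num_params(n_in, n_out):
    # one weight per (input, output) pair plus one bias per output neuron
    return n_out * n_in + n_out

hidden = num_params(3, 4)    # 4*3 + 4 = 16
output = num_params(4, 2)    # 2*4 + 2 = 10
total = hidden + output      # 26 unknowns in total
```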
Derivatives of Cross Entropy Loss
[Sadowski]
Derivatives of Cross Entropy Loss
[Sadowski]
Gradients of weights of last layer:
Derivatives of Cross Entropy Loss
[Sadowski]
Gradients of weights of first layer:
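[Sadowski] derives these gradients in full; as a hedged illustration of the last-layer case only (my own code, assuming a softmax output with a one-hot target, which is one common setting for the cross-entropy loss), the familiar result that the gradient w.r.t. the last-layer pre-activations is simply y − t:

```python
import numpy as np

# Hypothetical last layer: hidden activations h, weights W, one-hot target t
h = np.random.randn(4)
W = np.random.randn(3, 4)
t = np.array([0.0, 1.0, 0.0])

z = W @ h
y = np.exp(z - z.max())
y /= y.sum()                      # softmax output
loss = -np.sum(t * np.log(y))     # cross-entropy loss

dL_dz = y - t                     # softmax + cross-entropy combine to this simple form
dL_dW = np.outer(dL_dz, h)        # gradients of the last-layer weights
```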
Derivatives of Neural Networks
Combining nodes: linear activation node + hinge loss + regularization
Credit: Li/Karpathy/Johnson
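As a rough sketch of what such a combined graph computes (my own code with assumed shapes, following the credited Li/Karpathy/Johnson example): a linear score node feeding a multiclass hinge loss plus an L2 regularization term:

```python
import numpy as np

# Hypothetical setup: 4-dim input x, 3 classes, correct class index yi
x = np.random.randn(4)
yi = 1
W = np.random.randn(3, 4)
reg = 0.1

s = W @ x                                  # linear score node
margins = np.maximum(0, s - s[yi] + 1)     # hinge loss per class (margin of 1)
margins[yi] = 0                            # the correct class contributes no loss
data_loss = margins.sum()
loss = data_loss + reg * np.sum(W * W)     # add the L2 regularization node
```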
Can become quite complex…
[Figure: a compute graph spanning l layers with m neurons per layer]
Can become quite complex…
• These graphs can be huge!
The flow of the gradients

[Figure: the activations and the activation function of a single compute node]
The flow of the gradients
• Many, many, many, many of these nodes (NEURONS) form a neural network
• Each one has its own work to do: FORWARD AND BACKWARD PASS
Gradient descent
Gradient Descent
• How to pick a good learning rate?
• How to compute the gradient for a single training pair?
• How to compute the gradient for a large training set?
• How to speed things up?
Next lecture
• This week:
  – No tutorial this Thursday (due to MaiTUM)
  – Exercise 1 will be released on Thursday as planned
• Next lecture on May 7th:
  – Optimization of Neural Networks
  – In particular, introduction to SGD (our main method!)
See you next week!