Lecture 3 recap · Neural Network

Transcript
Page 1: Lecture 3 recap · Neural Network
• Linear score function f = Wx
• Neural network is a nesting of 'functions'
  – 2-layers: f = W2 max(0, W1 x)
  – 3-layers: f = W3 max(0, W2 max(0, W1 x))

Lecture 3 recap

Prof. Leal-Taixé and Prof. Niessner

Page 2:

Beyond linear
• Linear score function f = Wx

Credit: Li/Karpathy/Johnson
(figure: learned weight templates, on CIFAR-10 and on ImageNet)

Page 3:

Neural Network

Credit: Li/Karpathy/Johnson

Page 4:

Neural Network
(figure: network diagram annotated with its depth and width)

Page 5:

Neural Network
• Linear score function f = Wx
• Neural network is a nesting of 'functions'
  – 2-layers: f = W2 max(0, W1 x)
  – 3-layers: f = W3 max(0, W2 max(0, W1 x))
  – 4-layers: f = W4 tanh(W3 max(0, W2 max(0, W1 x)))
  – 5-layers: f = W5 σ(W4 tanh(W3 max(0, W2 max(0, W1 x))))
  – … up to hundreds of layers
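The nesting above can be written out directly. A minimal NumPy sketch — the layer sizes (4 inputs, 5 hidden units, 3 scores) are arbitrary illustrative choices, not from the slides:

```python
import numpy as np

def relu(x):
    # elementwise max(0, x)
    return np.maximum(0, x)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)           # input vector (4 features, chosen arbitrarily)
W1 = rng.standard_normal((5, 4))     # maps 4 inputs to 5 hidden units
W2 = rng.standard_normal((3, 5))     # maps 5 hidden units to 3 scores
W3 = rng.standard_normal((3, 3))     # a third layer on top of the 3 scores

f_linear = W1 @ x                        # f = Wx
f_2layer = W2 @ relu(W1 @ x)             # f = W2 max(0, W1 x)
f_3layer = W3 @ relu(W2 @ relu(W1 @ x))  # f = W3 max(0, W2 max(0, W1 x))
```

Each extra layer wraps the previous score in a nonlinearity; without the max(0, ·) in between, the product W2 W1 would collapse back into a single linear map.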

Page 6:

Computational Graphs
• Neural network is a computational graph
  – It has compute nodes
  – It has edges that connect nodes
  – It is directional
  – It is organized in 'layers'

Page 7:

Backprop cont'd

Page 8:

The importance of gradients
• All optimization schemes are based on computing gradients
• One can compute gradients analytically, but what if our function is too complex?
• Break down the gradient computation → Backpropagation

Rumelhart 1986
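One way to see why breaking the computation down matters: the brute-force alternative is numerical differentiation, which costs two extra function evaluations per parameter. A sketch, using the running example f(x, y, z) = (x + y) · z with the values x = 1, y = -3, z = 4 from the following slides:

```python
def f(x, y, z):
    # running example from the backprop slides
    return (x + y) * z

def numerical_gradient(func, args, eps=1e-6):
    # central differences: perturb each argument in turn
    grads = []
    for i in range(len(args)):
        plus, minus = list(args), list(args)
        plus[i] += eps
        minus[i] -= eps
        grads.append((func(*plus) - func(*minus)) / (2 * eps))
    return grads

grads = numerical_gradient(f, [1.0, -3.0, 4.0])  # ≈ [4.0, 4.0, -2.0]
```

Backpropagation produces the same numbers exactly, in a single backward sweep — which is what makes training with millions of weights feasible.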

Page 9:

Backprop: Forward Pass
• f(x, y, z) = (x + y) · z; initialization x = 1, y = -3, z = 4
• sum node: d = x + y = -2
• mult node: f = d · z = -8

Page 10:

Backprop: Backward Pass
with x = 1, y = -3, z = 4
• f(x, y, z) = (x + y) · z; forward: d = x + y = -2, f = d · z = -8
• d = x + y → ∂d/∂x = 1, ∂d/∂y = 1
• f = d · z → ∂f/∂d = z, ∂f/∂z = d
• What is ∂f/∂x, ∂f/∂y, ∂f/∂z?

Page 11:

Backprop: Backward Pass
with x = 1, y = -3, z = 4
• f(x, y, z) = (x + y) · z; forward: d = x + y = -2, f = d · z = -8
• d = x + y → ∂d/∂x = 1, ∂d/∂y = 1
• f = d · z → ∂f/∂d = z, ∂f/∂z = d
• What is ∂f/∂x, ∂f/∂y, ∂f/∂z?
• Start at the output: ∂f/∂f = 1

Page 12:

Backprop: Backward Pass
with x = 1, y = -3, z = 4
• f(x, y, z) = (x + y) · z; forward: d = x + y = -2, f = d · z = -8
• d = x + y → ∂d/∂x = 1, ∂d/∂y = 1
• f = d · z → ∂f/∂d = z, ∂f/∂z = d
• What is ∂f/∂x, ∂f/∂y, ∂f/∂z?
• ∂f/∂z = d = -2

Page 13:

Backprop: Backward Pass
with x = 1, y = -3, z = 4
• f(x, y, z) = (x + y) · z; forward: d = x + y = -2, f = d · z = -8
• d = x + y → ∂d/∂x = 1, ∂d/∂y = 1
• f = d · z → ∂f/∂d = z, ∂f/∂z = d
• What is ∂f/∂x, ∂f/∂y, ∂f/∂z?
• ∂f/∂d = z = 4

Page 14:

Backprop: Backward Pass
with x = 1, y = -3, z = 4
• f(x, y, z) = (x + y) · z; forward: d = x + y = -2, f = d · z = -8
• d = x + y → ∂d/∂x = 1, ∂d/∂y = 1
• f = d · z → ∂f/∂d = z, ∂f/∂z = d
• What is ∂f/∂x, ∂f/∂y, ∂f/∂z?
• Chain rule: ∂f/∂y = ∂f/∂d · ∂d/∂y → ∂f/∂y = 4 · 1 = 4

Page 15:

Backprop: Backward Pass
with x = 1, y = -3, z = 4
• f(x, y, z) = (x + y) · z; forward: d = x + y = -2, f = d · z = -8
• d = x + y → ∂d/∂x = 1, ∂d/∂y = 1
• f = d · z → ∂f/∂d = z, ∂f/∂z = d
• What is ∂f/∂x, ∂f/∂y, ∂f/∂z?
• Chain rule: ∂f/∂x = ∂f/∂d · ∂d/∂x → ∂f/∂x = 4 · 1 = 4
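The whole worked example from these slides as straight-line code — each backward line is one local derivative combined via the chain rule:

```python
# forward pass
x, y, z = 1.0, -3.0, 4.0
d = x + y            # sum node: d = -2
f = d * z            # mult node: f = -8

# backward pass, in reverse order
df_df = 1.0          # seed: derivative of the output w.r.t. itself
df_dd = z * df_df    # mult node: df/dd = z  -> 4
df_dz = d * df_df    # mult node: df/dz = d  -> -2
df_dx = 1.0 * df_dd  # chain rule through the sum node: dd/dx = 1  -> 4
df_dy = 1.0 * df_dd  # chain rule through the sum node: dd/dy = 1  -> 4
```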

Page 16:

Compute Graphs -> Neural Networks
• x_k: input variables
• w_{l,m,n}: network weights (note the 3 indices)
  – l: which layer
  – m: which neuron in the layer
  – n: which weight in the neuron
• y_i: computed output (i indexes the output dimension)
• t_i: ground-truth targets
• L: loss function

Page 17:

Compute Graphs -> Neural Networks
(figure: input layer x0, x1 → output layer y0, compared against the target t0, e.g., a class label / regression target)
• As a compute graph: x0 · w0 + x1 · w1 → y0, then (y0 - t0)² is the L2 loss / cost
• The inputs are given; the weights are the unknowns!
• We want to compute gradients w.r.t. all weights w

Page 18:

Compute Graphs -> Neural Networks
(figure: same graph as before, now with a ReLU activation max(0, x) inserted after the weighted sum — btw. I'm not arguing this is the right choice here)
• We want to compute gradients w.r.t. all weights w

Page 19:

Compute Graphs -> Neural Networks
(figure: input layer x0, x1 → output layer y0, y1, y2 with targets t0, t1, t2; each output i computes Σ_k x_k · w_{i,k}, then (y_i - t_i)² gives a per-output loss / cost)
• We want to compute gradients w.r.t. all weights w

Page 20:

Compute Graphs -> Neural Networks
(figure: input layer x0 … x_k → output layer y0, y1, … with targets t0, t1, …)
• y_i = A(b_i + Σ_k x_k w_{i,k})    (A: activation function, b_i: bias)
• L_i = (y_i - t_i)²    (L2 loss → simply sum up squares)
• L = Σ_i L_i    (energy to minimize is E = L)
• We want to compute gradients w.r.t. all weights w *AND* biases b
• ∂L/∂w_{i,k} = ∂L/∂y_i · ∂y_i/∂w_{i,k}    → use the chain rule to compute the partials
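The formulas above in code. The concrete values of x, W, b, t below are made up for illustration, and A is taken to be the ReLU from the earlier slides:

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])        # inputs (illustrative values)
W = np.array([[0.3, -0.1, 0.8],
              [0.7, 0.2, -0.5]])      # 2 output neurons, 3 weights each
b = np.array([0.1, -0.2])             # biases
t = np.array([0.5, 1.0])              # targets (illustrative values)

s = b + W @ x                         # s_i = b_i + sum_k x_k w_ik
y = np.maximum(0, s)                  # y_i = A(s_i), with A = ReLU
L = np.sum((y - t) ** 2)              # L = sum_i (y_i - t_i)^2

# chain rule: dL/dw_ik = dL/dy_i * dy_i/ds_i * ds_i/dw_ik
dL_dy = 2 * (y - t)
dy_ds = (s > 0).astype(float)         # ReLU derivative
dL_dW = np.outer(dL_dy * dy_ds, x)    # ds_i/dw_ik = x_k
dL_db = dL_dy * dy_ds                 # ds_i/db_i = 1
```

The bias gradient falls out of exactly the same chain, just with ∂s_i/∂b_i = 1 instead of x_k.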

Page 21:

Gradient Descent
(figure: loss surface with an initialization point descending toward the optimum)

Page 22:

Gradient Descent
• From derivative to gradient: ∇_x f(x)
• Gradient steps in the direction of the negative gradient: x' = x - ε ∇_x f(x)
  (∇_x f(x) points in the direction of greatest increase of the function; ε is the learning rate)

Page 23:

Gradient Descent for Neural Networks
• ∇_w f_{x,t}(w) = [∂f/∂w_{0,0,0}, …, ∂f/∂w_{l,m,n}]ᵀ    (stacked over all layers l and neurons m)
• For a given training pair {x, t}, we want to update all weights; i.e., we need to compute the derivatives w.r.t. all weights
• Gradient step: w' = w - ε ∇_w f_{x,t}(w)
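The gradient step at the bottom of the slide is a single line of code. As a stand-in for a network loss, this sketch minimizes f(w) = ||w||², whose gradient ∇_w f = 2w is known in closed form (the quadratic stand-in and the learning rate 0.1 are illustrative choices, not from the slides):

```python
import numpy as np

def gradient_step(w, grad_w, lr):
    # w' = w - eps * grad_w f(w); lr plays the role of eps
    return w - lr * grad_w

w = np.array([1.0, -2.0])            # initialization
for _ in range(100):
    w = gradient_step(w, 2 * w, lr=0.1)
# w has shrunk toward the optimum at the origin
```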

Page 24:

Gradient Descent for Neural Networks
(figure: inputs x0, x1, x2 → hidden layer h0 … h3 → outputs y0, y1 with targets t0, t1)
• h_j = A(b_{0,j} + Σ_k x_k w_{0,j,k})
• y_i = A(b_{1,i} + Σ_j h_j w_{1,i,j})
• L_i = (y_i - t_i)²
• ∇_w f_{x,t}(w) = [∂f/∂w_{0,0,0}, …, ∂f/∂w_{l,m,n}, …, ∂f/∂b_{l,m}]ᵀ
• A is just simple: A(x) = max(0, x)

Page 25:

Gradient Descent for Neural Networks
(figure: same two-layer network as before)
• h_j = A(b_{0,j} + Σ_k x_k w_{0,j,k}), y_i = A(b_{1,i} + Σ_j h_j w_{1,i,j}), L_i = (y_i - t_i)²
Backpropagation: just go through layer by layer
• ∂L/∂w_{1,i,j} = ∂L/∂y_i · ∂y_i/∂w_{1,i,j}
• ∂L/∂w_{0,j,k} = ∂L/∂y_i · ∂y_i/∂h_j · ∂h_j/∂w_{0,j,k}
• ∂L_i/∂y_i = 2(y_i - t_i)
• ∂y_i/∂w_{1,i,j} = h_j if the pre-activation > 0, else 0
• …
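"Layer by layer" in code: a sketch of the full backward pass for the two-layer network above, with random illustrative data and A = ReLU. The `W1.T @ ...` line performs the sum over i that the second chain-rule expression implies:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(3)                          # 3 inputs
t = rng.standard_normal(2)                          # 2 targets
W0, b0 = rng.standard_normal((4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W1, b1 = rng.standard_normal((2, 4)), np.zeros(2)   # output layer: 2 neurons

# forward
s0 = b0 + W0 @ x
h = np.maximum(0, s0)        # h_j = A(b_0j + sum_k x_k w_0jk)
s1 = b1 + W1 @ h
y = np.maximum(0, s1)        # y_i = A(b_1i + sum_j h_j w_1ij)
L = np.sum((y - t) ** 2)

# backward, layer by layer
dL_dy = 2 * (y - t)                  # dL_i/dy_i = 2(y_i - t_i)
dL_ds1 = dL_dy * (s1 > 0)            # through the output ReLU
dL_dW1 = np.outer(dL_ds1, h)         # dL/dw_1ij = dL/ds1_i * h_j
dL_db1 = dL_ds1
dL_dh = W1.T @ dL_ds1                # dL/dh_j = sum_i dL/ds1_i * w_1ij
dL_ds0 = dL_dh * (s0 > 0)            # through the hidden ReLU
dL_dW0 = np.outer(dL_ds0, x)
dL_db0 = dL_ds0
```

A finite-difference check on any single weight reproduces the same number, which is a standard way to debug a hand-written backward pass.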

Page 26:

Gradient Descent for Neural Networks
(figure: same two-layer network: inputs x0, x1, x2 → hidden h0 … h3 → outputs y0, y1 with targets t0, t1)
• h_j = A(b_{0,j} + Σ_k x_k w_{0,j,k}), y_i = A(b_{1,i} + Σ_j h_j w_{1,i,j}), L_i = (y_i - t_i)²
How many unknown weights?
• Hidden layer: 4 · 3 + 4    (#neurons · #input channels + #biases)
• Output layer: 2 · 4 + 2
• Note that some activations also have weights

Page 27:

Derivatives of Cross Entropy Loss
[Sadowski]

Page 28:

Derivatives of Cross Entropy Loss
[Sadowski]
Gradients of the weights of the last layer:

Page 29:

Derivatives of Cross Entropy Loss
[Sadowski]
Gradients of the weights of the first layer (through the hidden activations h_j):

Page 30:

Derivatives of Neural Networks
Combining nodes: linear activation node + hinge loss + regularization

Credit: Li/Karpathy/Johnson

Page 31:

Can become quite complex…
(figure: the gradient vector laid out over all layers l and neurons m)

Page 32:

Can become quite complex…
• These graphs can be huge!

Page 33:

The flow of the gradients
(figure: activations flowing into an activation function node)

Page 34:

The flow of the gradients
(figure: activations flowing into an activation function node, cont'd)

Page 35:

The flow of the gradients
• Many, many, many of these nodes form a neural network: NEURONS
• Each one has its own work to do: FORWARD AND BACKWARD PASS

Page 36:

Gradient descent

Page 37:

Gradient Descent
(figure: loss surface with an initialization point descending toward the optimum)

Page 38:

Gradient Descent
• From derivative to gradient: ∇_x f(x)
• Gradient steps in the direction of the negative gradient: x' = x - ε ∇_x f(x)
  (∇_x f(x) points in the direction of greatest increase of the function; ε is the learning rate)

Page 39:

Gradient Descent
• How to pick a good learning rate?
• How to compute the gradient for a single training pair?
• How to compute the gradient for a large training set?
• How to speed things up?

Page 40:

Next lecture
• This week:
  – No tutorial this Thursday (due to MaiTUM)
  – Exercise 1 will be released on Thursday as planned
• Next lecture on May 7th:
  – Optimization of Neural Networks
  – In particular, introduction to SGD (our main method!)

Page 41:

See you next week!

