Backpropagation: An efficient way to compute the gradient
Hung-yi Lee
Transcript of "Backpropagation: An efficient way to compute the gradient" by Hung-yi Lee.
![Page 1: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/1.jpg)
Backpropagation: An efficient way to compute the gradient
Hung-yi Lee
![Page 2: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/2.jpg)
Review: Notation

Layer $l-1$ has $N_{l-1}$ nodes (indexed by $j$); layer $l$ has $N_l$ nodes (indexed by $i$).

- $a_i^l$: output of a neuron
- $a^l$: output of a layer
- $z_i^l$: input of activation function
- $z^l$: input of activation function for a layer
![Page 3: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/3.jpg)
Review: Notation

- $w_{ij}^l$: a weight (connecting node $j$ of layer $l-1$ to node $i$ of layer $l$)
- $b_i^l$: a bias
- $b^l$: a bias for all neurons in a layer
- $W^l$: the weights between layer $l-1$ ($N_{l-1}$ nodes) and layer $l$ ($N_l$ nodes)
![Page 4: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/4.jpg)
Review: Relations between Layer Outputs

With $a^{l-1}$ the output of layer $l-1$ (components $a_1^{l-1}, a_2^{l-1}, \ldots, a_j^{l-1}$) and $a^l$ the output of layer $l$:

$$z^l = W^l a^{l-1} + b^l$$
$$a^l = \sigma\left(z^l\right)$$
$$a^l = \sigma\left(W^l a^{l-1} + b^l\right)$$
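The layer-to-layer relation above can be sketched in NumPy. This is a minimal illustration with made-up sizes ($N_{l-1}=3$, $N_l=2$) and made-up weights; the sigmoid is one common choice for $\sigma$, which the slides leave generic:

```python
import numpy as np

def sigmoid(z):
    """One common activation sigma; the slides leave sigma generic."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer: N_{l-1} = 3 inputs, N_l = 2 neurons.
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])        # W^l, shape (N_l, N_{l-1})
b = np.array([0.01, 0.02])             # b^l
a_prev = np.array([1.0, 0.5, -1.0])    # a^{l-1}

z = W @ a_prev + b    # z^l = W^l a^{l-1} + b^l
a = sigmoid(z)        # a^l = sigma(z^l)
```

Row $i$ of $W^l$ holds the weights $w_{ij}^l$ feeding neuron $i$, so the matrix product reproduces $z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$ componentwise.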
![Page 5: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/5.jpg)
Review: Neural Network is a Function

$$y = f(x; \theta) = \sigma\left(W^L \cdots\, \sigma\left(W^2\, \sigma\left(W^1 x + b^1\right) + b^2\right) \cdots + b^L\right)$$

The input $x$ and the output $y$ are vectors. The parameters $\theta = \left\{W^1, b^1, W^2, b^2, \ldots, W^L, b^L\right\}$ are to be learned from training examples.
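The nested function above is just the layer relation applied $L$ times. A minimal sketch (the helper name `network`, the random parameter values, and the sigmoid activation are all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, weights, biases):
    """y = f(x; theta): apply a^l = sigma(W^l a^{l-1} + b^l) layer by layer."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical two-layer network: 3 -> 4 -> 2.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
bs = [np.zeros(4), np.zeros(2)]
y = network(np.ones(3), Ws, bs)
```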
![Page 6: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/6.jpg)
Review: Gradient Descent

- Given training examples $\left(x^1, \hat{y}^1\right), \ldots, \left(x^R, \hat{y}^R\right)$
- Find a set of parameters $\theta^*$ minimizing the error function $C(\theta)$:

$$C(\theta) = \sum_{r=1}^{R} C^r(\theta), \qquad C^r(\theta) = \left\| f\left(x^r; \theta\right) - \hat{y}^r \right\|^2$$

- We have to compute $\dfrac{\partial C^r}{\partial w_{ij}^l}$ and $\dfrac{\partial C^r}{\partial b_i^l}$.
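The error function is concrete enough to compute directly. A small sketch of $C^r$ and the total $C(\theta)$ (the prediction and target values are made up for illustration):

```python
import numpy as np

def example_cost(y_pred, y_true):
    """C^r(theta) = || f(x^r; theta) - yhat^r ||^2 for one training example."""
    diff = y_pred - y_true
    return float(diff @ diff)

C_r = example_cost(np.array([0.8, 0.2]), np.array([1.0, 0.0]))
# (0.8 - 1.0)^2 + (0.2 - 0.0)^2 = 0.08

# Total error C(theta) = sum_r C^r(theta) over all training examples.
preds = [np.array([0.8, 0.2]), np.array([0.1, 0.9])]
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
C_total = sum(example_cost(p, t) for p, t in zip(preds, targets))
```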
![Page 7: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/7.jpg)
Neat Representation

$\dfrac{\partial C^r}{\partial w_{ij}^l}$ is the multiplication of two terms:

$$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l}\,\frac{\partial C^r}{\partial z_i^l}$$

Intuition: a change in the weight first changes the neuron input, which then changes the error, $\Delta w_{ij}^l \to \Delta z_i^l \to \Delta C^r$ (weight $w_{ij}^l$ connects node $j$ of layer $l-1$ to node $i$ of layer $l$).
![Page 8: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/8.jpg)
Neat Representation – First Term

$\dfrac{\partial C^r}{\partial w_{ij}^l}$ is the multiplication of two terms:

$$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l}\,\frac{\partial C^r}{\partial z_i^l}$$

This and the next two slides examine the first term, $\partial z_i^l / \partial w_{ij}^l$.
![Page 9: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/9.jpg)
Neat Representation – First Term

If $l > 1$:

$$z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l \quad\Rightarrow\quad \frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$$
![Page 10: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/10.jpg)
Neat Representation – First Term

If $l = 1$ (layer 1 takes the input $x^r$ with components $x_1^r, x_2^r, \ldots, x_j^r$):

$$z_i^1 = \sum_j w_{ij}^1 x_j^r + b_i^1 \quad\Rightarrow\quad \frac{\partial z_i^1}{\partial w_{ij}^1} = x_j^r$$

If $l > 1$ (from the previous slide):

$$z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l \quad\Rightarrow\quad \frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$$
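The first term is simply the value flowing into the weight, and this is easy to confirm numerically. A sketch with a made-up layer, comparing a finite difference against $a_j^{l-1}$ (when $l = 1$, `a_prev` plays the role of $x^r$):

```python
import numpy as np

# Check dz_i^l / dw_ij^l = a_j^{l-1} with a finite difference.
W = np.array([[0.1, -0.2],
              [0.3,  0.4]])            # hypothetical W^l
b = np.array([0.0, 0.0])               # hypothetical b^l
a_prev = np.array([2.0, -1.0])         # a^{l-1} (or x^r when l = 1)

i, j, eps = 0, 1, 1e-6
W_pert = W.copy()
W_pert[i, j] += eps                    # nudge the single weight w_ij^l
dz = ((W_pert @ a_prev + b)[i] - (W @ a_prev + b)[i]) / eps
# dz matches a_prev[j] = -1.0, since z_i^l is linear in w_ij^l
```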
![Page 11: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/11.jpg)
Neat Representation – Second Term

$\dfrac{\partial C^r}{\partial w_{ij}^l}$ is always the multiplication of two terms:

$$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l}\,\frac{\partial C^r}{\partial z_i^l}$$

Define the second term as

$$\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$$
![Page 12: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/12.jpg)
Neat Representation – Second Term

The $\delta$'s exist at every layer: $\delta_1^l, \delta_2^l, \ldots, \delta_i^l$ at layer $l$; $\delta_1^{l+1}, \delta_2^{l+1}, \ldots, \delta_k^{l+1}$ at layer $l+1$; and $\delta_1^L, \delta_2^L, \ldots, \delta_n^L$ at the output layer $L$.

Two Questions:

1. How to compute $\delta^L$
2. The relation of $\delta^l$ and $\delta^{l+1}$
![Page 13: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/13.jpg)
Neat Representation – Second Term

Question 1: how to compute $\delta^L$. At the output layer, $y_n^r = a_n^L = \sigma\left(z_n^L\right)$ and $z_n^L$ affects $C^r$ only through $y_n^r$, so

$$\delta_n^L = \frac{\partial C^r}{\partial z_n^L} = \frac{\partial y_n^r}{\partial z_n^L}\,\frac{\partial C^r}{\partial y_n^r} = \sigma'\left(z_n^L\right) \frac{\partial C^r}{\partial y_n^r}$$

The factor $\partial C^r / \partial y_n^r$ depends on the definition of the error function.
![Page 14: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/14.jpg)
Neat Representation – Second Term

Collecting the components $\delta_n^L = \sigma'\left(z_n^L\right) \partial C^r / \partial y_n^r$ into a vector:

$$\delta^L = \sigma'\left(z^L\right) \odot \nabla C^r\left(y^r\right)$$

where

$$\sigma'\left(z^L\right) = \begin{bmatrix} \sigma'\left(z_1^L\right) \\ \sigma'\left(z_2^L\right) \\ \vdots \\ \sigma'\left(z_n^L\right) \end{bmatrix}, \qquad \nabla C^r\left(y^r\right) = \begin{bmatrix} \partial C^r / \partial y_1^r \\ \partial C^r / \partial y_2^r \\ \vdots \\ \partial C^r / \partial y_n^r \end{bmatrix}$$
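For the squared error defined earlier, $C^r = \|y^r - \hat{y}^r\|^2$, the gradient is $\nabla C^r(y^r) = 2\left(y^r - \hat{y}^r\right)$, so $\delta^L$ can be computed directly. A sketch with made-up output values and a sigmoid $\sigma$ (an assumption; the slides leave $\sigma$ generic):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """sigma'(z) = sigma(z)(1 - sigma(z)) for the sigmoid."""
    s = sigmoid(z)
    return s * (1.0 - s)

z_L = np.array([0.0, 2.0])           # z^L at the output layer
y = sigmoid(z_L)                      # network output y^r
y_hat = np.array([1.0, 0.0])          # target yhat^r

grad_C = 2.0 * (y - y_hat)            # nabla C^r(y^r) for squared error
delta_L = sigmoid_prime(z_L) * grad_C # delta^L = sigma'(z^L) (.) nabla C^r(y^r)
# delta_L[0] = 0.25 * 2 * (0.5 - 1.0) = -0.25
```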
![Page 15: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/15.jpg)
Neat Representation – Second Term

Question 2: the relation of $\delta^l$ and $\delta^{l+1}$. A change $\Delta z_i^l$ changes $a_i^l$, which changes every $z_k^{l+1}$ in layer $l+1$ ($\Delta z_1^{l+1}, \Delta z_2^{l+1}, \ldots, \Delta z_k^{l+1}$), which changes $C^r$:

$$\delta_i^l = \frac{\partial C^r}{\partial z_i^l} = \frac{\partial a_i^l}{\partial z_i^l} \sum_k \frac{\partial z_k^{l+1}}{\partial a_i^l}\,\frac{\partial C^r}{\partial z_k^{l+1}} = \frac{\partial a_i^l}{\partial z_i^l} \sum_k \frac{\partial z_k^{l+1}}{\partial a_i^l}\,\delta_k^{l+1}$$
![Page 16: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/16.jpg)
Neat Representation – Second Term

Since $z_k^{l+1} = \sum_i w_{ki}^{l+1} a_i^l + b_k^{l+1}$, we have

$$\frac{\partial z_k^{l+1}}{\partial a_i^l} = w_{ki}^{l+1}, \qquad \frac{\partial a_i^l}{\partial z_i^l} = \sigma'\left(z_i^l\right)$$

Therefore

$$\delta_i^l = \sigma'\left(z_i^l\right) \sum_k w_{ki}^{l+1}\,\delta_k^{l+1}$$
![Page 17: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/17.jpg)
Neat Representation – Second Term

$$\delta_i^l = \sigma'\left(z_i^l\right) \sum_k w_{ki}^{l+1}\,\delta_k^{l+1}$$

This can be viewed as a new type of neuron: the inputs $\delta_1^{l+1}, \delta_2^{l+1}, \ldots, \delta_k^{l+1}$ are weighted by $w_{1i}^{l+1}, w_{2i}^{l+1}, \ldots, w_{ki}^{l+1}$, summed, and then multiplied by a constant $\sigma'\left(z_i^l\right)$ to produce the output $\delta_i^l$.
![Page 18: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/18.jpg)
Neat Representation – Second Term

In vector form, $\delta_i^l = \sigma'\left(z_i^l\right) \sum_k w_{ki}^{l+1} \delta_k^{l+1}$ over all neurons of layer $l$ becomes

$$\delta^l = \sigma'\left(z^l\right) \odot \left(W^{l+1}\right)^T \delta^{l+1}$$

where $\sigma'\left(z^l\right)$ stacks $\sigma'\left(z_1^l\right), \sigma'\left(z_2^l\right), \ldots, \sigma'\left(z_i^l\right)$.
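The backward recursion is one matrix-vector product plus an elementwise scaling. A sketch with made-up values, choosing $z^l = 0$ so that the sigmoid's $\sigma'(0) = 0.25$ makes the arithmetic easy to follow (the sigmoid is an assumption; the slides leave $\sigma$ generic):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

W_next = np.array([[0.5, -0.5, 1.0]])   # W^{l+1}, shape (N_{l+1}, N_l)
delta_next = np.array([2.0])             # delta^{l+1}
z_l = np.zeros(3)                        # z^l, so sigma'(z^l) = 0.25

# delta^l = sigma'(z^l) (.) (W^{l+1})^T delta^{l+1}
delta_l = sigmoid_prime(z_l) * (W_next.T @ delta_next)
# delta_l = 0.25 * [1.0, -1.0, 2.0] = [0.25, -0.25, 0.5]
```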
![Page 19: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/19.jpg)
Neat Representation – Second Term

Compare the backward relation with the forward one:

$$\delta^l = \sigma'\left(z^l\right) \odot \left(W^{l+1}\right)^T \delta^{l+1}$$
$$a^{l+1} = \sigma\left(W^{l+1} a^l + b^{l+1}\right)$$

The $\delta$'s propagate from layer $l+1$ back to layer $l$ through $\left(W^{l+1}\right)^T$, just as the $a$'s propagate from layer $l$ forward to layer $l+1$ through $W^{l+1}$.
![Page 20: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/20.jpg)
Neat Representation – Second Term

Putting the two answers together:

1. How to compute $\delta^L$: at the output layer $L$,
$$\delta^L = \sigma'\left(z^L\right) \odot \nabla C^r\left(y^r\right)$$
where $\nabla C^r\left(y^r\right)$ collects $\partial C^r / \partial y_1^r, \partial C^r / \partial y_2^r, \ldots, \partial C^r / \partial y_n^r$.
2. The relation of $\delta^l$ and $\delta^{l+1}$:
$$\delta^l = \sigma'\left(z^l\right) \odot \left(W^{l+1}\right)^T \delta^{l+1}$$

Starting from layer $L$, the $\delta$'s are propagated backward through $\left(W^L\right)^T$, then $\left(W^{L-1}\right)^T$ (giving $\delta^{L-1}$ from $z_1^{L-1}, z_2^{L-1}, \ldots, z_m^{L-1}$), and so on down through $\left(W^{l+1}\right)^T$.
![Page 21: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/21.jpg)
Backpropagation

$$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l}\,\frac{\partial C^r}{\partial z_i^l} = a_j^{l-1}\,\delta_i^l$$

Forward Pass (computes the first term, $a_j^{l-1}$, or $x_j^r$ for $l = 1$):
$$z^1 = W^1 x^r + b^1, \qquad a^l = \sigma\left(z^l\right), \qquad z^{l+1} = W^{l+1} a^l + b^{l+1}$$

Backward Pass (computes the second term, $\delta_i^l$):
$$\delta^L = \sigma'\left(z^L\right) \odot \nabla C^r\left(y^r\right), \qquad \delta^{L-1} = \sigma'\left(z^{L-1}\right) \odot \left(W^L\right)^T \delta^L, \qquad \delta^l = \sigma'\left(z^l\right) \odot \left(W^{l+1}\right)^T \delta^{l+1}$$
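The whole procedure fits in a short function: one forward pass caching $z^l$ and $a^l$, one backward pass propagating $\delta$, then $\partial C^r/\partial w_{ij}^l = a_j^{l-1}\delta_i^l$ as an outer product. This is a minimal sketch assuming a sigmoid $\sigma$ and the squared error from the lecture; the network shape and values are made up, and a finite-difference check confirms one gradient entry:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y_hat, Ws, bs):
    """Gradients dC^r/dW^l and dC^r/db^l for C^r = ||y - yhat||^2."""
    # Forward pass: cache every z^l and a^l.
    a, zs, acts = x, [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        acts.append(a)
    # Backward pass: delta^L, then delta^l = sigma'(z^l) (.) (W^{l+1})^T delta^{l+1}.
    delta = sigmoid_prime(zs[-1]) * 2.0 * (acts[-1] - y_hat)
    gWs, gbs = [], []
    for l in reversed(range(len(Ws))):
        gWs.append(np.outer(delta, acts[l]))  # dC^r/dw_ij^l = a_j^{l-1} delta_i^l
        gbs.append(delta)                     # dC^r/db_i^l  = delta_i^l
        if l > 0:
            delta = sigmoid_prime(zs[l - 1]) * (Ws[l].T @ delta)
    return gWs[::-1], gbs[::-1]

# Hypothetical network 2 -> 3 -> 1 with one training example.
rng = np.random.default_rng(1)
Ws = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
bs = [rng.standard_normal(3), rng.standard_normal(1)]
x, y_hat = np.array([0.5, -0.3]), np.array([1.0])

gWs, gbs = backprop(x, y_hat, Ws, bs)

# Sanity check against a finite difference on one weight.
def total_cost(Ws, bs):
    a = x
    for W, b in zip(Ws, bs):
        a = sigmoid(W @ a + b)
    return float((a - y_hat) @ (a - y_hat))

eps = 1e-6
Ws_pert = [W.copy() for W in Ws]
Ws_pert[0][0, 0] += eps
numeric = (total_cost(Ws_pert, bs) - total_cost(Ws, bs)) / eps
```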
![Page 22: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/22.jpg)
Appendix
![Page 23: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/23.jpg)
![Page 24: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/24.jpg)
A reverse network (formed by new types of neurons)

The backward recursion $\delta^l = \sigma'\left(z^l\right) \odot \left(W^{l+1}\right)^T \delta^{l+1}$ can be drawn as a reverse network built from the new type of neurons: it starts at layer $L$ (the output layer) with $\delta^L = \sigma'\left(z^L\right) \odot \nabla C^r\left(y^r\right)$ and propagates the $\delta$'s back through layer $l+2$ and layer $l+1$ (via $\left(W^{l+2}\right)^T$ and $\left(W^{l+1}\right)^T$) to layer $l$, answering both questions:

1. How to compute $\delta^L$
2. The relation of $\delta^l$ and $\delta^{l+1}$
![Page 25: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/25.jpg)
Review: Gradient Descent

- Start at parameter $\theta^0$
- Compute gradient at $\theta^0$: $g^0$
- Move to $\theta^1 = \theta^0 - \mu g^0$
- Compute gradient at $\theta^1$: $g^1$
- Move to $\theta^2 = \theta^1 - \mu g^1$
- ...

Each movement ($\theta^0 \to \theta^1 \to \theta^2 \to \theta^3$) goes against the gradient ($g^0, g^1, g^2, g^3$).
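The update loop above can be sketched in a few lines. This toy example minimizes a hypothetical 1-D error function $C(\theta) = (\theta - 3)^2$, whose gradient $g = 2(\theta - 3)$ is known in closed form (for a real network, $g$ would come from backpropagation):

```python
theta = 0.0   # theta^0, the starting parameter
mu = 0.1      # learning rate

for _ in range(100):
    g = 2.0 * (theta - 3.0)    # gradient of C(theta) = (theta - 3)^2
    theta = theta - mu * g     # theta^{t+1} = theta^t - mu * g^t
# theta converges toward the minimizer theta* = 3
```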
![Page 26: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/26.jpg)
Neat Representation – First Term

The first term, $a_j^{l-1}$ in $\dfrac{\partial C^r}{\partial w_{ij}^l} = \dfrac{\partial z_i^l}{\partial w_{ij}^l}\,\dfrac{\partial C^r}{\partial z_i^l}$, is obtained by the forward pass. Starting from the input $x^r$ (components $x_1^r, x_2^r, x_3^r, \ldots$):

$$\sigma\left(W^1 x^r + b^1\right) = a^1, \qquad \ldots, \qquad \sigma\left(W^{l-1} a^{l-2} + b^{l-1}\right) = a^{l-1}$$

so layer $l-1$ supplies the outputs $a_1^{l-1}, a_2^{l-1}, \ldots, a_j^{l-1}$ needed for the gradients of layer $l$.
![Page 27: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/27.jpg)
Neat Representation – Second Term

Question 1: how to compute $\delta^L$ ($\delta_1^L, \delta_2^L, \ldots, \delta_n^L$ at layer $L$, the output layer). Since $y_n^r = \sigma\left(z_n^L\right)$,

$$\delta_n^L = \frac{\partial C^r}{\partial z_n^L} = \frac{\partial y_n^r}{\partial z_n^L}\,\frac{\partial C^r}{\partial y_n^r} = \sigma'\left(z_n^L\right) \frac{\partial C^r}{\partial y_n^r}$$

for each output neuron $n$, with the error-function derivatives $\partial C^r / \partial y_1^r, \partial C^r / \partial y_2^r, \ldots, \partial C^r / \partial y_n^r$.
![Page 28: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/28.jpg)
Neat Representation – Second Term

In vector form:

$$\delta^L = \sigma'\left(z^L\right) \odot \nabla C^r\left(y^r\right) = \begin{bmatrix} \sigma'\left(z_1^L\right) \\ \sigma'\left(z_2^L\right) \\ \vdots \\ \sigma'\left(z_n^L\right) \end{bmatrix} \odot \begin{bmatrix} \partial C^r / \partial y_1^r \\ \partial C^r / \partial y_2^r \\ \vdots \\ \partial C^r / \partial y_n^r \end{bmatrix}$$
![Page 29: Backpropagation An efficient way to compute the gradient Hung-yi Lee.](https://reader035.fdocuments.us/reader035/viewer/2022070409/56649e985503460f94b9b6a8/html5/thumbnails/29.jpg)
Reference
• https://theclevermachine.wordpress.com/