Vector Calculus - Marc Deisenroth
Foundations of Machine Learning
African Masters in Machine Intelligence
Vector Calculus
Marc Deisenroth
Quantum Leap Africa, African Institute for Mathematical Sciences, Rwanda
Department of Computing, Imperial College London
September 26, 2018
Reference
Deisenroth et al.: Mathematics for Machine Learning, Chapter 5, https://mml-book.com
Vector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 2
Curve Fitting (Regression) in Machine Learning (1)
[Figure: polynomial of degree 5 fitted to data; axes $x$ and $f(x)$; legend: data, maximum likelihood estimate]
§ Setting: Given inputs $x$, predict outputs/targets $y$
§ Model $f$ that depends on parameters $\theta$. Examples:
  § Linear model: $f(x, \theta) = \theta^\top x$, with $x, \theta \in \mathbb{R}^D$
  § Neural network: $f(x, \theta) = \mathrm{NN}(x, \theta)$
§ Training data, e.g., $N$ pairs $(x_i, y_i)$ of inputs $x_i$ and observations $y_i$
§ Training the model means finding parameters $\theta^*$ such that $f(x_i, \theta^*) \approx y_i$
§ Define a loss function, e.g., $\sum_{i=1}^{N} (y_i - f(x_i, \theta))^2$, which we want to optimize
§ Typically: Optimization based on some form of gradient descent. Differentiation required
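To illustrate why training needs derivatives, here is a minimal sketch of gradient descent on the squared loss of the linear model $f(x, \theta) = \theta^\top x$. The toy data, the step size, the iteration count, and the helper names `predict`, `loss`, `grad_loss`, and `fit` are assumptions made for illustration; they do not come from the slides.

```python
# Gradient descent on the squared loss of a linear model f(x, theta) = theta^T x.
# All data and hyperparameters below are made up for illustration.

def predict(x, theta):
    """Linear model: inner product of parameters and input."""
    return sum(t * xi for t, xi in zip(theta, x))

def loss(theta, xs, ys):
    """Sum of squared errors over the training pairs (x_i, y_i)."""
    return sum((y - predict(x, theta)) ** 2 for x, y in zip(xs, ys))

def grad_loss(theta, xs, ys):
    """Gradient of the loss w.r.t. theta: sum_i -2 (y_i - theta^T x_i) x_i."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = y - predict(x, theta)
        for d, xd in enumerate(x):
            grad[d] += -2.0 * residual * xd
    return grad

def fit(xs, ys, step=0.01, iters=2000):
    """Plain gradient descent starting from theta = 0."""
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        g = grad_loss(theta, xs, ys)
        theta = [t - step * gd for t, gd in zip(theta, g)]
    return theta

# Toy data generated from y = 2 * x1 - 1 * x2, where x2 is a constant feature 1.
xs = [[-2.0, 1.0], [-1.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0 * x[0] - 1.0 * x[1] for x in xs]

theta_star = fit(xs, ys)
print(theta_star)               # close to the generating parameters [2, -1]
print(loss(theta_star, xs, ys)) # close to zero
```

The only place calculus enters is `grad_loss`, which is the derivative of the loss with respect to the parameters. The step size was picked by hand for this data set and would need tuning elsewhere.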
Curve Fitting (Regression) in Machine Learning (2)
§ Training data, e.g., $N$ pairs $(x_i, y_i)$ of inputs $x_i$ and observations $y_i$
§ Training the model means finding parameters $\theta^*$ such that $f(x_i, \theta^*) \approx y_i$
[Figure: polynomial of degree 5 fitted to data; axes $x$ and $f(x)$; legend: data, maximum likelihood estimate]
§ Define a loss function, e.g., $\sum_{i=1}^{N} (y_i - f(x_i, \theta))^2$, which we want to optimize
§ Typically: Optimization based on some form of gradient descent. Differentiation required
Types of Differentiation
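1. Scalar differentiation: $f : \mathbb{R} \to \mathbb{R}$
   $y \in \mathbb{R}$ w.r.t. $x \in \mathbb{R}$
2. Multivariate case: $f : \mathbb{R}^N \to \mathbb{R}$
   $y \in \mathbb{R}$ w.r.t. vector $x \in \mathbb{R}^N$
3. Vector fields: $f : \mathbb{R}^N \to \mathbb{R}^M$
   vector $y \in \mathbb{R}^M$ w.r.t. vector $x \in \mathbb{R}^N$
4. General derivatives: $f : \mathbb{R}^{M \times N} \to \mathbb{R}^{P \times Q}$
   matrix $y \in \mathbb{R}^{P \times Q}$ w.r.t. matrix $X \in \mathbb{R}^{M \times N}$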
Scalar Differentiation $f : \mathbb{R} \to \mathbb{R}$
§ Derivative defined as the limit of the difference quotient
$$f'(x) = \frac{\mathrm{d}f}{\mathrm{d}x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
Slope of the secant line through $f(x)$ and $f(x + h)$
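The limit can be watched numerically. The sketch below evaluates the difference quotient of $\sin$ at an arbitrarily chosen point for shrinking $h$ and compares it with the known derivative $\cos$. The function name `difference_quotient`, the evaluation point, and the list of step sizes are illustrative choices.

```python
import math

def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = math.cos(x0)  # derivative of sin at x0

errors = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(math.sin, x0, h)
    errors.append(abs(approx - exact))
    print(f"h={h:g}  approx={approx:.6f}  error={errors[-1]:.2e}")
```

For these step sizes the error shrinks roughly in proportion to $h$. Much smaller values of $h$ eventually make the result worse again because of floating-point round-off.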
Some Examples
$f(x) = x^n$, $\quad f'(x) = n x^{n-1}$
$f(x) = \sin(x)$, $\quad f'(x) = \cos(x)$
$f(x) = \tanh(x)$, $\quad f'(x) = 1 - \tanh^2(x)$
$f(x) = \exp(x)$, $\quad f'(x) = \exp(x)$
$f(x) = \log(x)$, $\quad f'(x) = \frac{1}{x}$
Rules
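§ Sum Rule
$$\big(f(x) + g(x)\big)' = f'(x) + g'(x) = \frac{\mathrm{d}f(x)}{\mathrm{d}x} + \frac{\mathrm{d}g(x)}{\mathrm{d}x}$$
§ Product Rule
$$\big(f(x)\,g(x)\big)' = f'(x)\,g(x) + f(x)\,g'(x) = \frac{\mathrm{d}f(x)}{\mathrm{d}x}\,g(x) + f(x)\,\frac{\mathrm{d}g(x)}{\mathrm{d}x}$$
§ Chain Rule
$$(g \circ f)'(x) = \big(g(f(x))\big)' = g'(f(x))\,f'(x) = \frac{\mathrm{d}g(f(x))}{\mathrm{d}f}\,\frac{\mathrm{d}f(x)}{\mathrm{d}x}$$
§ Quotient Rule
$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{(g(x))^2} = \frac{\frac{\mathrm{d}f}{\mathrm{d}x}\,g(x) - f(x)\,\frac{\mathrm{d}g}{\mathrm{d}x}}{(g(x))^2}$$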
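As a worked instance of the quotient rule, the derivative of $\tanh$ listed among the examples can be recovered by writing $\tanh(x) = \sinh(x)/\cosh(x)$. This derivation is an added illustration and relies on the standard facts $\sinh' = \cosh$, $\cosh' = \sinh$, and $\cosh^2 - \sinh^2 = 1$.

```latex
\begin{align*}
\tanh'(x)
  &= \left(\frac{\sinh(x)}{\cosh(x)}\right)' \\
  &= \frac{\sinh'(x)\cosh(x) - \sinh(x)\cosh'(x)}{\cosh^2(x)} \\
  &= \frac{\cosh^2(x) - \sinh^2(x)}{\cosh^2(x)} \\
  &= 1 - \tanh^2(x)
\end{align*}
```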
Example: Scalar Chain Rule
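$$(g \circ f)'(x) = \big(g(f(x))\big)' = g'(f(x))\,f'(x) = \frac{\mathrm{d}g}{\mathrm{d}f}\,\frac{\mathrm{d}f}{\mathrm{d}x}$$
Beginner
$$g(z) = 6z + 3, \qquad z = f(x) = -2x + 5$$
$$(g \circ f)'(x) = \underbrace{(6)}_{\mathrm{d}g/\mathrm{d}f}\;\underbrace{(-2)}_{\mathrm{d}f/\mathrm{d}x} = -12$$
Advanced
$$g(z) = \tanh(z), \qquad z = f(x) = x^n$$
$$(g \circ f)'(x) = \underbrace{\big(1 - \tanh^2(x^n)\big)}_{\mathrm{d}g/\mathrm{d}f}\;\underbrace{n x^{n-1}}_{\mathrm{d}f/\mathrm{d}x}$$
Work it out with your neighbors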
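Both chain-rule results can be sanity-checked numerically. The sketch below compares each closed-form derivative with a symmetric difference quotient. The exponent `n = 3`, the evaluation point `x0`, and the helper name `central_difference` are arbitrary choices made for this check.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, used here only as a numerical reference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Beginner example: g(z) = 6z + 3 with z = f(x) = -2x + 5.
def beginner(x):
    return 6.0 * (-2.0 * x + 5.0) + 3.0

beginner_chain = 6.0 * (-2.0)  # dg/df * df/dx = -12

# Advanced example: g(z) = tanh(z) with z = f(x) = x^n.
n = 3  # arbitrary exponent chosen for the check

def advanced(x):
    return math.tanh(x ** n)

def advanced_chain(x):
    # (1 - tanh^2(x^n)) * n * x^(n-1)
    return (1.0 - math.tanh(x ** n) ** 2) * n * x ** (n - 1)

x0 = 0.8
beginner_error = abs(central_difference(beginner, x0) - beginner_chain)
advanced_error = abs(central_difference(advanced, x0) - advanced_chain(x0))
print(beginner_error, advanced_error)  # both should be tiny
```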
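Multivariate Differentiation $f : \mathbb{R}^N \to \mathbb{R}$
$$y = f(x), \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} \in \mathbb{R}^N$$
§ Partial derivative (change one coordinate at a time):
$$\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \dots, x_{i-1}, x_i + h, x_{i+1}, \dots, x_N) - f(x)}{h}$$
§ Jacobian vector (gradient) collects all partial derivatives:
$$\frac{\mathrm{d}f}{\mathrm{d}x} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_N} \end{bmatrix} \in \mathbb{R}^{1 \times N}$$
Note: This is a row vector.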
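The definition translates directly into code: perturb one coordinate, keep the others fixed, and collect the $N$ difference quotients. This is a minimal sketch; the names `partial_derivative` and `gradient`, the step size, and the test function are assumptions made for illustration.

```python
def partial_derivative(f, x, i, h=1e-6):
    """Difference quotient that perturbs only coordinate i of x."""
    shifted = list(x)
    shifted[i] += h
    return (f(shifted) - f(x)) / h

def gradient(f, x, h=1e-6):
    """Row vector [df/dx_1, ..., df/dx_N], stored as a flat list of length N."""
    return [partial_derivative(f, x, i, h) for i in range(len(x))]

# Hypothetical test function f(x) = x_1^2 + 3 x_2, whose gradient is [2 x_1, 3].
def f(x):
    return x[0] ** 2 + 3.0 * x[1]

grad = gradient(f, [1.5, -2.0])
print(grad)  # approximately [3.0, 3.0]
```

A finite step gives only an approximation of the limit, so the printed values agree with the exact gradient up to a small error.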
Example: Multivariate Differentiation
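Beginner
$$f : \mathbb{R}^2 \to \mathbb{R}, \qquad f(x_1, x_2) = x_1^2 x_2 + x_1 x_2^3 \in \mathbb{R}$$
Advanced
$$f : \mathbb{R}^2 \to \mathbb{R}, \qquad f(x_1, x_2) = (x_1 + 2 x_2^3)^2 \in \mathbb{R}$$
Partial derivatives? Work it out with your neighbors
Beginner:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = 2 x_1 x_2 + x_2^3$$
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = x_1^2 + 3 x_1 x_2^2$$
Advanced:
$$\frac{\partial f(x_1, x_2)}{\partial x_1} = 2 (x_1 + 2 x_2^3)\,\overbrace{(1)}^{\frac{\partial}{\partial x_1}(x_1 + 2 x_2^3)}$$
$$\frac{\partial f(x_1, x_2)}{\partial x_2} = 2 (x_1 + 2 x_2^3)\,\underbrace{(6 x_2^2)}_{\frac{\partial}{\partial x_2}(x_1 + 2 x_2^3)}$$
Gradient
$$\frac{\mathrm{d}f}{\mathrm{d}x} = \begin{bmatrix} \frac{\partial f(x_1, x_2)}{\partial x_1} & \frac{\partial f(x_1, x_2)}{\partial x_2} \end{bmatrix} \in \mathbb{R}^{1 \times 2}$$
Beginner:
$$\frac{\mathrm{d}f}{\mathrm{d}x} = \begin{bmatrix} 2 x_1 x_2 + x_2^3 & x_1^2 + 3 x_1 x_2^2 \end{bmatrix}$$
Advanced:
$$\frac{\mathrm{d}f}{\mathrm{d}x} = \begin{bmatrix} 2 (x_1 + 2 x_2^3) & 12 (x_1 + 2 x_2^3)\, x_2^2 \end{bmatrix}$$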
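The two gradients of this example can be checked against symmetric difference quotients. The sketch below is only a sanity check; the evaluation point and all function names are illustrative assumptions.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the row vector df/dx."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Beginner: f(x1, x2) = x1^2 x2 + x1 x2^3
def f_beginner(x):
    x1, x2 = x
    return x1 ** 2 * x2 + x1 * x2 ** 3

def grad_beginner(x):
    x1, x2 = x
    return [2 * x1 * x2 + x2 ** 3, x1 ** 2 + 3 * x1 * x2 ** 2]

# Advanced: f(x1, x2) = (x1 + 2 x2^3)^2
def f_advanced(x):
    x1, x2 = x
    return (x1 + 2 * x2 ** 3) ** 2

def grad_advanced(x):
    x1, x2 = x
    inner = x1 + 2 * x2 ** 3
    return [2 * inner, 12 * inner * x2 ** 2]

point = [0.5, -1.2]  # arbitrary evaluation point
err_beginner = max(
    abs(a - b) for a, b in zip(numerical_gradient(f_beginner, point), grad_beginner(point))
)
err_advanced = max(
    abs(a - b) for a, b in zip(numerical_gradient(f_advanced, point), grad_advanced(point))
)
print(err_beginner, err_advanced)  # both should be tiny
```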
Example: Multivariate Chain Rule
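§ Consider the function
$$L(e) = \tfrac{1}{2}\|e\|^2 = \tfrac{1}{2} e^\top e$$
$$e = y - Ax, \qquad x \in \mathbb{R}^N, \quad A \in \mathbb{R}^{M \times N}, \quad e, y \in \mathbb{R}^M$$
§ Compute the gradient $\frac{\mathrm{d}L}{\mathrm{d}x}$. What is the dimension/size of $\frac{\mathrm{d}L}{\mathrm{d}x}$?
Work it out with your neighbors
$$\frac{\mathrm{d}L}{\mathrm{d}x} = \frac{\partial L}{\partial e}\,\frac{\partial e}{\partial x}$$
$$\frac{\partial L}{\partial e} = e^\top \in \mathbb{R}^{1 \times M} \tag{1}$$
$$\frac{\partial e}{\partial x} = -A \in \mathbb{R}^{M \times N} \tag{2}$$
$$\implies \frac{\mathrm{d}L}{\mathrm{d}x} = e^\top (-A) = -(y - Ax)^\top A \in \mathbb{R}^{1 \times N}$$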
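The result $-(y - Ax)^\top A$ and its size $1 \times N$ can be verified on a small instance. In this sketch the sizes $M = 3$, $N = 2$, the numbers in `A`, `x`, `y`, and the helper names are all made up for illustration.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def loss(A, x, y):
    """L = 0.5 * ||y - A x||^2."""
    e = [yi - axi for yi, axi in zip(y, matvec(A, x))]
    return 0.5 * sum(ei * ei for ei in e)

def grad_loss(A, x, y):
    """Chain-rule result dL/dx = -(y - A x)^T A, a row vector of length N."""
    e = [yi - axi for yi, axi in zip(y, matvec(A, x))]
    n_cols = len(A[0])
    return [-sum(e[m] * A[m][n] for m in range(len(A))) for n in range(n_cols)]

# Made-up problem with M = 3 and N = 2.
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
x = [0.3, -0.7]
y = [1.0, 2.0, -1.0]

analytic = grad_loss(A, x, y)

# Numerical reference: symmetric difference quotient in each coordinate of x.
h = 1e-6
numeric = []
for n in range(len(x)):
    up, down = list(x), list(x)
    up[n] += h
    down[n] -= h
    numeric.append((loss(A, up, y) - loss(A, down, y)) / (2.0 * h))

max_error = max(abs(a - b) for a, b in zip(analytic, numeric))
print(analytic, max_error)
```

The gradient has as many entries as $x$ has components, matching the $1 \times N$ size derived above.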
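Vector Field Differentiation $f : \mathbb{R}^N \to \mathbb{R}^M$
$$y = f(x) \in \mathbb{R}^M, \qquad x \in \mathbb{R}^N$$
$$\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_M(x) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \dots, x_N) \\ \vdots \\ f_M(x_1, \dots, x_N) \end{bmatrix}$$
§ Jacobian matrix (collection of all partial derivatives)
$$\begin{bmatrix} \frac{\mathrm{d}y_1}{\mathrm{d}x} \\ \vdots \\ \frac{\mathrm{d}y_M}{\mathrm{d}x} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M}{\partial x_1} & \cdots & \frac{\partial f_M}{\partial x_N} \end{bmatrix} \in \mathbb{R}^{M \times N}$$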
Example: Vector Field Differentiation
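$$f(x) = Ax, \qquad f(x) \in \mathbb{R}^M, \quad A \in \mathbb{R}^{M \times N}, \quad x \in \mathbb{R}^N$$
$$\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_M(x) \end{bmatrix} = \begin{bmatrix} A_{11} x_1 + A_{12} x_2 + \cdots + A_{1N} x_N \\ \vdots \\ A_{M1} x_1 + A_{M2} x_2 + \cdots + A_{MN} x_N \end{bmatrix}$$
§ Compute the gradient $\frac{\mathrm{d}f}{\mathrm{d}x}$
§ Gradient:
$$f_i(x) = \sum_{j=1}^{N} A_{ij} x_j \implies \frac{\partial f_i}{\partial x_j} = A_{ij}$$
$$\implies \frac{\mathrm{d}f}{\mathrm{d}x} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M}{\partial x_1} & \cdots & \frac{\partial f_M}{\partial x_N} \end{bmatrix} = \begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & & \vdots \\ A_{M1} & \cdots & A_{MN} \end{bmatrix} = A \in \mathbb{R}^{M \times N}$$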
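A numerical Jacobian makes the statement $\mathrm{d}f/\mathrm{d}x = A$ concrete. The sketch below builds the $M \times N$ matrix of partial derivatives entry by entry and compares it with $A$. The matrix values, the evaluation point, and the function names are illustrative assumptions.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def numerical_jacobian(f, x, h=1e-6):
    """M x N nested list whose entry [m][n] approximates d f_m / d x_n."""
    base = f(x)
    columns = []
    for n in range(len(x)):
        up, down = list(x), list(x)
        up[n] += h
        down[n] -= h
        f_up, f_down = f(up), f(down)
        columns.append([(u - d) / (2.0 * h) for u, d in zip(f_up, f_down)])
    # columns[n][m] holds d f_m / d x_n; transpose so that rows are indexed by m.
    return [[columns[n][m] for n in range(len(x))] for m in range(len(base))]

# Made-up matrix with M = 3 target dimensions and N = 2 input dimensions.
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
J = numerical_jacobian(lambda x: matvec(A, x), [0.2, -0.4])

max_error = max(abs(J[m][n] - A[m][n]) for m in range(3) for n in range(2))
print(len(J), len(J[0]), max_error)  # 3 rows, 2 columns, tiny error
```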
Dimensionality of the Gradient
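In general: A function $f : \mathbb{R}^N \to \mathbb{R}^M$ has a gradient that is an $M \times N$ matrix with
$$\frac{\mathrm{d}f}{\mathrm{d}x} \in \mathbb{R}^{M \times N}, \qquad \mathrm{d}f[m, n] = \frac{\partial f_m}{\partial x_n}$$
Gradient dimension: # target dimensions × # input dimensions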
Chain Rule
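$$\frac{\partial}{\partial x}(g \circ f)(x) = \frac{\partial}{\partial x}\big(g(f(x))\big) = \frac{\partial g(f)}{\partial f}\,\frac{\partial f(x)}{\partial x}$$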
Example: Chain Rule
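§ Consider $f : \mathbb{R}^2 \to \mathbb{R}$, $x : \mathbb{R} \to \mathbb{R}^2$
$$f(x) = f(x_1, x_2) = x_1^2 + 2 x_2, \qquad x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} \sin(t) \\ \cos(t) \end{bmatrix}$$
§ What are the dimensions of $\frac{\mathrm{d}f}{\mathrm{d}x}$ and $\frac{\mathrm{d}x}{\mathrm{d}t}$? $1 \times 2$ and $2 \times 1$
§ Compute the gradient $\frac{\mathrm{d}f}{\mathrm{d}t}$ using the chain rule:
$$\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\mathrm{d}f}{\mathrm{d}x}\,\frac{\mathrm{d}x}{\mathrm{d}t} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \end{bmatrix} \begin{bmatrix} \frac{\partial x_1}{\partial t} \\ \frac{\partial x_2}{\partial t} \end{bmatrix} = \begin{bmatrix} 2 \sin t & 2 \end{bmatrix} \begin{bmatrix} \cos t \\ -\sin t \end{bmatrix}$$
$$= 2 \sin t \cos t - 2 \sin t = 2 \sin t\,(\cos t - 1)$$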
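The same computation can be written as a row vector times a column vector and compared with both the closed form and a difference quotient. The evaluation point `t0`, the step `h`, and the function names are arbitrary choices for this sketch.

```python
import math

def f_of_t(t):
    """f(x(t)) with x1 = sin t, x2 = cos t and f(x) = x1^2 + 2 x2."""
    return math.sin(t) ** 2 + 2.0 * math.cos(t)

def df_dt_chain(t):
    """(1 x 2 row vector df/dx) times (2 x 1 column vector dx/dt)."""
    df_dx = [2.0 * math.sin(t), 2.0]
    dx_dt = [math.cos(t), -math.sin(t)]
    return sum(a * b for a, b in zip(df_dx, dx_dt))

t0 = 0.9
h = 1e-6
numeric = (f_of_t(t0 + h) - f_of_t(t0 - h)) / (2.0 * h)
closed_form = 2.0 * math.sin(t0) * (math.cos(t0) - 1.0)

print(df_dt_chain(t0), closed_form, numeric)  # three nearly identical numbers
```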
Derivatives with Respect to Matrices
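§ Recall: A function $f : \mathbb{R}^N \to \mathbb{R}^M$ has a gradient that is an $M \times N$ matrix with
$$\frac{\mathrm{d}f}{\mathrm{d}x} \in \mathbb{R}^{M \times N}, \qquad \mathrm{d}f[m, n] = \frac{\partial f_m}{\partial x_n}$$
Gradient dimension: # target dimensions × # input dimensions
§ This generalizes to when the inputs ($N$) or targets ($M$) are matrices
§ A function $f : \mathbb{R}^{M \times N} \to \mathbb{R}^{P \times Q}$ has a gradient that is a $(P \times Q) \times (M \times N)$ object (tensor)
$$\frac{\mathrm{d}f}{\mathrm{d}X} \in \mathbb{R}^{(P \times Q) \times (M \times N)}, \qquad \mathrm{d}f[p, q, m, n] = \frac{\partial f_{pq}}{\partial X_{mn}}$$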
Example 1: Derivatives with Respect to Matrices
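$$f = Ax, \qquad f \in \mathbb{R}^M, \quad A \in \mathbb{R}^{M \times N}, \quad x \in \mathbb{R}^N$$
$$\begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_M(x) \end{bmatrix} = \begin{bmatrix} A_{11} x_1 + A_{12} x_2 + \cdots + A_{1N} x_N \\ \vdots \\ A_{M1} x_1 + A_{M2} x_2 + \cdots + A_{MN} x_N \end{bmatrix}$$
$$\frac{\mathrm{d}f}{\mathrm{d}A} \in \mathbb{R}^{?}$$
Gradient dimension: # target dim × # input dim = $M \times (M \times N)$
$$\frac{\mathrm{d}f}{\mathrm{d}A} = \begin{bmatrix} \frac{\partial f_1}{\partial A} \\ \vdots \\ \frac{\partial f_M}{\partial A} \end{bmatrix}, \qquad \frac{\partial f_i}{\partial A} \in \mathbb{R}^{1 \times (M \times N)}$$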
Example 2: Derivatives with Respect to Matrices
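$$f_i = \sum_{j=1}^{N} A_{ij} x_j, \qquad i = 1, \dots, M$$
$$\begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_i(x) \\ \vdots \\ f_M(x) \end{bmatrix} = \begin{bmatrix} A_{11} x_1 + A_{12} x_2 + \cdots + A_{1N} x_N \\ \vdots \\ A_{i1} x_1 + A_{i2} x_2 + \cdots + A_{iN} x_N \\ \vdots \\ A_{M1} x_1 + A_{M2} x_2 + \cdots + A_{MN} x_N \end{bmatrix}$$
$$\frac{\partial f_i}{\partial A_{iq}} = \underbrace{x_q}_{\in \mathbb{R}}$$
$$\frac{\partial f_i}{\partial A_{i,:}} = \underbrace{x^\top}_{\in \mathbb{R}^{1 \times 1 \times N}}$$
$$\frac{\partial f_i}{\partial A_{k \neq i,:}} = \underbrace{0^\top}_{\in \mathbb{R}^{1 \times 1 \times N}}$$
$$\frac{\partial f_i}{\partial A} = \underbrace{\begin{bmatrix} 0^\top \\ \vdots \\ x^\top \\ \vdots \\ 0^\top \end{bmatrix}}_{\in \mathbb{R}^{1 \times (M \times N)}}$$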
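The structure of $\partial f_i / \partial A$ can be written out explicitly as a nested list. In the sketch below, slice `T[i]` is an $M \times N$ matrix with $x^\top$ in row $i$ and zeros elsewhere, so the whole object has shape $M \times (M \times N)$. The function name `jacobian_wrt_matrix` and the numbers in `x` are made up for illustration.

```python
def jacobian_wrt_matrix(x, num_rows):
    """Tensor T with T[i][k][q] = d f_i / d A_kq for f = A x.

    The result has shape M x (M x N): slice T[i] is an M x N matrix whose
    i-th row is x^T and whose other rows are zero.
    """
    n = len(x)
    return [[[x[q] if k == i else 0.0 for q in range(n)]
             for k in range(num_rows)]
            for i in range(num_rows)]

x = [2.0, -1.0, 0.5]  # N = 3, made-up values
M = 2
T = jacobian_wrt_matrix(x, M)

print(T[0])  # [[2.0, -1.0, 0.5], [0.0, 0.0, 0.0]]
print(T[1])  # [[0.0, 0.0, 0.0], [2.0, -1.0, 0.5]]
```

Note that the derivative does not depend on the entries of $A$, because $f$ is linear in $A$.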
Vector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 20
![Page 40: Vector Calculus - Marc DeisenrothVector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 3 Curve Fitting (Regression) in Machine Learning (2) Training data, e.g., N pairs](https://reader033.fdocuments.us/reader033/viewer/2022060507/5f1fb2097552db4c54391382/html5/thumbnails/40.jpg)
Example 2: Derivatives with Respect to Matrices
fi “
Nÿ
j“1
Aijxj, i “ 1, . . . , M
»
—
—
—
—
—
—
–
y1...
yi...
yM
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
“
»
—
—
—
—
—
—
–
f1pxq...
fipxq...
fMpxq
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
“
»
—
—
—
—
—
—
–
A11x1 ` A12x2` ¨ ¨ ¨ `A1NxN...
......
...Ai1x1 ` Ai2x2 ¨ ¨ ¨ `AiNxN
......
......
AM1x1 ` AM2x2` ¨ ¨ ¨ `AMNxN
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
B fi
BAiq“ xq
loomoon
PR
B fi
BAi,:“ ?
xJloomoon
PR1ˆ1ˆN
B fi
BAk‰i,:“ ?
0Jloomoon
PR1ˆ1ˆN
B fi
BA“ ?
»
—
—
—
—
—
—
—
—
–
0J
...xJ
...0J
fi
ffi
ffi
ffi
ffi
ffi
ffi
ffi
ffi
fl
loomoon
PR1ˆpMˆNq
Vector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 20
![Page 41: Vector Calculus - Marc DeisenrothVector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 3 Curve Fitting (Regression) in Machine Learning (2) Training data, e.g., N pairs](https://reader033.fdocuments.us/reader033/viewer/2022060507/5f1fb2097552db4c54391382/html5/thumbnails/41.jpg)
Example 2: Derivatives with Respect to Matrices
fi “
Nÿ
j“1
Aijxj, i “ 1, . . . , M
»
—
—
—
—
—
—
–
y1...
yi...
yM
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
“
»
—
—
—
—
—
—
–
f1pxq...
fipxq...
fMpxq
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
“
»
—
—
—
—
—
—
–
A11x1 ` A12x2` ¨ ¨ ¨ `A1NxN...
......
...Ai1x1 ` Ai2x2 ¨ ¨ ¨ `AiNxN
......
......
AM1x1 ` AM2x2` ¨ ¨ ¨ `AMNxN
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
B fi
BAiq“ xq
loomoon
PR
B fi
BAi,:“ xJ
loomoon
PR1ˆ1ˆN
B fi
BAk‰i,:“ ?
0Jloomoon
PR1ˆ1ˆN
B fi
BA“ ?
»
—
—
—
—
—
—
—
—
–
0J
...xJ
...0J
fi
ffi
ffi
ffi
ffi
ffi
ffi
ffi
ffi
fl
loomoon
PR1ˆpMˆNq
Vector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 20
![Page 42: Vector Calculus - Marc DeisenrothVector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 3 Curve Fitting (Regression) in Machine Learning (2) Training data, e.g., N pairs](https://reader033.fdocuments.us/reader033/viewer/2022060507/5f1fb2097552db4c54391382/html5/thumbnails/42.jpg)
Example 2: Derivatives with Respect to Matrices
fi “
Nÿ
j“1
Aijxj, i “ 1, . . . , M
»
—
—
—
—
—
—
–
y1...
yi...
yM
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
“
»
—
—
—
—
—
—
–
f1pxq...
fipxq...
fMpxq
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
“
»
—
—
—
—
—
—
–
A11x1 ` A12x2` ¨ ¨ ¨ `A1NxN...
......
...Ai1x1 ` Ai2x2 ¨ ¨ ¨ `AiNxN
......
......
AM1x1 ` AM2x2` ¨ ¨ ¨ `AMNxN
fi
ffi
ffi
ffi
ffi
ffi
ffi
fl
B fi
BAiq“ xq
loomoon
PR
B fi
BAi,:“ xJ
loomoon
PR1ˆ1ˆN
B fi
BAk‰i,:“ 0J
loomoon
PR1ˆ1ˆN
B fi
BA“ ?
»
—
—
—
—
—
—
—
—
–
0J
...xJ
...0J
fi
ffi
ffi
ffi
ffi
ffi
ffi
ffi
ffi
fl
loomoon
PR1ˆpMˆNq
Vector Calculus Marc Deisenroth @AIMS Rwanda, September 26, 2018 20
Example 2: Derivatives with Respect to Matrices
$$f_i = \sum_{j=1}^{N} A_{ij} x_j, \quad i = 1, \dots, M$$
$$\begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_i(x) \\ \vdots \\ f_M(x) \end{bmatrix} = \begin{bmatrix} A_{11}x_1 + A_{12}x_2 + \cdots + A_{1N}x_N \\ \vdots \\ A_{i1}x_1 + A_{i2}x_2 + \cdots + A_{iN}x_N \\ \vdots \\ A_{M1}x_1 + A_{M2}x_2 + \cdots + A_{MN}x_N \end{bmatrix}$$
$$\frac{\partial f_i}{\partial A_{iq}} = x_q \in \mathbb{R}, \qquad \frac{\partial f_i}{\partial A_{i,:}} = x^\top \in \mathbb{R}^{1\times 1\times N}, \qquad \frac{\partial f_i}{\partial A_{k\neq i,:}} = 0^\top \in \mathbb{R}^{1\times 1\times N}$$
$$\frac{\partial f_i}{\partial A} = \begin{bmatrix} 0^\top \\ \vdots \\ x^\top \\ \vdots \\ 0^\top \end{bmatrix} \in \mathbb{R}^{1\times(M\times N)}$$
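The structure of $\partial f_i/\partial A$ (the row $x^\top$ in position $i$, zero rows elsewhere) can be checked numerically. The following is a minimal NumPy sketch and not part of the original slides; the sizes M = 3, N = 4, the random seed, and the helper name `df_dA` are illustrative assumptions.

```python
import numpy as np

def df_dA(x, M):
    """Jacobian of f = A x with respect to A, stored with shape (M, M, N).

    Entry [i, k, q] holds d f_i / d A_kq, which is x_q if k == i and 0 otherwise.
    (Hypothetical helper, introduced only for this illustration.)
    """
    N = x.shape[0]
    J = np.zeros((M, M, N))
    for i in range(M):
        J[i, i, :] = x  # row i of the i-th slice is x^T, all other rows are 0^T
    return J

# Illustrative sizes and values (assumptions, not from the slides)
M, N = 3, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(M, N))
x = rng.normal(size=N)

J = df_dA(x, M)

# Finite-difference check: perturb one entry A_kq at a time.
# f is linear in A, so the difference quotient is exact up to rounding.
eps = 1e-6
J_fd = np.zeros_like(J)
for k in range(M):
    for q in range(N):
        A_pert = A.copy()
        A_pert[k, q] += eps
        J_fd[:, k, q] = (A_pert @ x - A @ x) / eps

max_err = np.max(np.abs(J - J_fd))
print(J.shape, max_err < 1e-6)
```

Each slice `J[i]` is the $M \times N$ array from the slide for output $f_i$; stacking the $M$ slices gives the full derivative of $f$ with respect to $A$.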
Gradient Computation: Two Alternatives
§ Consider $f: \mathbb{R}^3 \to \mathbb{R}^{4\times 2}$, $f(x) = A \in \mathbb{R}^{4\times 2}$, where the entries $A_{ij}$ depend on a vector $x \in \mathbb{R}^3$
§ We can compute $\frac{\mathrm{d}A(x)}{\mathrm{d}x} \in \mathbb{R}^{4\times 2\times 3}$ in two equivalent ways:
  § Partial derivatives: compute $\frac{\partial A}{\partial x_1}, \frac{\partial A}{\partial x_2}, \frac{\partial A}{\partial x_3} \in \mathbb{R}^{4\times 2}$ and collate them into $\frac{\mathrm{d}A}{\mathrm{d}x} \in \mathbb{R}^{4\times 2\times 3}$
  § Re-shape: re-shape $A \in \mathbb{R}^{4\times 2}$ into $A \in \mathbb{R}^{8}$, compute the gradient $\frac{\mathrm{d}A}{\mathrm{d}x} \in \mathbb{R}^{8\times 3}$, and re-shape it into $\frac{\mathrm{d}A}{\mathrm{d}x} \in \mathbb{R}^{4\times 2\times 3}$
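The two routes can be compared on a concrete function. The sketch below is an illustration under stated assumptions: the map `A_of_x`, the weight matrix `W`, and the row-major re-shaping convention are choices made here, not taken from the slides. Route 1 uses central finite differences for each partial derivative, route 2 uses the analytic Jacobian of the flattened function.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))  # hypothetical weights defining the example function

def A_of_x(x):
    """Example map f: R^3 -> R^{4x2} (an illustrative choice)."""
    return np.tanh(W @ x).reshape(4, 2)

x = rng.normal(size=3)

# Route 1: one partial derivative dA/dx_k in R^{4x2} per input, then collate.
eps = 1e-6
partials = []
for k in range(3):
    e_k = np.zeros(3)
    e_k[k] = eps
    partials.append((A_of_x(x + e_k) - A_of_x(x - e_k)) / (2 * eps))
dA_dx_collated = np.stack(partials, axis=-1)   # shape (4, 2, 3)

# Route 2: re-shape A into a vector in R^8, take the 8x3 Jacobian, re-shape back.
a_flat = np.tanh(W @ x)                        # A re-shaped to R^8 (row-major)
jac_flat = (1.0 - a_flat ** 2)[:, None] * W    # analytic Jacobian, shape (8, 3)
dA_dx_reshaped = jac_flat.reshape(4, 2, 3)     # shape (4, 2, 3)

print(dA_dx_collated.shape, dA_dx_reshaped.shape)
print(np.allclose(dA_dx_collated, dA_dx_reshaped, atol=1e-6))
```

The only thing to keep consistent is the ordering used when flattening and un-flattening; here both re-shapes use NumPy's default row-major order.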
Gradients of a Single-Layer Neural Network
Figure: computation graph $x \to z \to f$; $z$ is computed from $x$ with parameters $A, b$, and $f$ is obtained from $z$ through the nonlinearity $\sigma$.
$$f = \tanh(\underbrace{Ax + b}_{=:\, z \in \mathbb{R}^M}) \in \mathbb{R}^M, \quad x \in \mathbb{R}^N, \quad A \in \mathbb{R}^{M\times N}, \quad b \in \mathbb{R}^M$$
$$\frac{\partial f}{\partial b} = \underbrace{\frac{\partial f}{\partial z}}_{M\times M}\, \underbrace{\frac{\partial z}{\partial b}}_{M\times M} \in \mathbb{R}^{M\times M}, \qquad \frac{\partial f}{\partial b}[i, j] = \sum_{l=1}^{M} \frac{\partial f}{\partial z}[i, l]\, \frac{\partial z}{\partial b}[l, j]$$
$$\frac{\partial f}{\partial A} = \underbrace{\frac{\partial f}{\partial z}}_{M\times M}\, \underbrace{\frac{\partial z}{\partial A}}_{M\times(M\times N)} \in \mathbb{R}^{M\times(M\times N)}, \qquad \frac{\partial f}{\partial A}[i, j, k] = \sum_{l=1}^{M} \frac{\partial f}{\partial z}[i, l]\, \frac{\partial z}{\partial A}[l, j, k]$$
$$\frac{\partial f}{\partial z} = \operatorname{diag}(1 - \tanh^2(z)) \in \mathbb{R}^{M\times M}, \qquad \frac{\partial z}{\partial b} = I \in \mathbb{R}^{M\times M}$$
$$\frac{\partial z}{\partial A} = \begin{bmatrix} x^\top & \cdots & 0^\top & \cdots & 0^\top \\ \vdots & & \vdots & & \vdots \\ 0^\top & \cdots & x^\top & \cdots & 0^\top \\ \vdots & & \vdots & & \vdots \\ 0^\top & \cdots & 0^\top & \cdots & x^\top \end{bmatrix} \in \mathbb{R}^{M\times(M\times N)}$$
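The index sums in the chain rule above map directly onto `numpy.einsum`. The following sketch is an illustration and not part of the slides; the sizes, the random values, and the helper name `layer` are assumptions. It builds the three local Jacobians, contracts them over the shared index $l$, and compares the result with central finite differences.

```python
import numpy as np

# Illustrative sizes and random values (assumptions, not from the slides)
M, N = 3, 4
rng = np.random.default_rng(1)
A = rng.normal(size=(M, N))
b = rng.normal(size=M)
x = rng.normal(size=N)

def layer(A, b):
    """Single layer f = tanh(A x + b) for the fixed input x (hypothetical helper)."""
    return np.tanh(A @ x + b)

z = A @ x + b

# Local Jacobians from the slide
df_dz = np.diag(1.0 - np.tanh(z) ** 2)         # (M, M)
dz_db = np.eye(M)                              # (M, M)
dz_dA = np.einsum("lj,k->ljk", np.eye(M), x)   # (M, M, N): dz_l/dA_jk = [l == j] * x_k

# Chain rule: sum over the shared index l
df_db = df_dz @ dz_db                            # (M, M)
df_dA = np.einsum("il,ljk->ijk", df_dz, dz_dA)   # (M, M, N)

# Central finite differences as an independent check
eps = 1e-6
fd_db = np.zeros((M, M))
for j in range(M):
    d = np.zeros(M)
    d[j] = eps
    fd_db[:, j] = (layer(A, b + d) - layer(A, b - d)) / (2 * eps)

fd_dA = np.zeros((M, M, N))
for j in range(M):
    for k in range(N):
        D = np.zeros((M, N))
        D[j, k] = eps
        fd_dA[:, j, k] = (layer(A + D, b) - layer(A - D, b)) / (2 * eps)

print(df_db.shape, df_dA.shape)
print(np.allclose(df_db, fd_db, atol=1e-6), np.allclose(df_dA, fd_dA, atol=1e-6))
```

Storing $\partial z/\partial A$ as a dense $M \times M \times N$ array is done here only to mirror the slide; it is mostly zeros, so a practical implementation would normally exploit that structure.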
Putting Things Together
§ Inputs $x \in \mathbb{R}^N$
§ Observed outputs $y = f_\theta(z) + \epsilon \in \mathbb{R}^M$, $\epsilon \sim \mathcal{N}(0, \Sigma)$
§ Train single-layer neural network with
$$f_\theta(z) = \tanh(z) \in \mathbb{R}^M, \quad z = Ax + b \in \mathbb{R}^M, \quad \theta = \{A, b\}$$
§ Find $A, b$ such that the squared loss
$$L(\theta) = \tfrac{1}{2}\|e\|^2 \in \mathbb{R}, \quad e = y - f_\theta(z) \in \mathbb{R}^M$$
is minimized
Putting Things Together
Partial derivatives:
$$\frac{\partial L}{\partial A} = \frac{\partial L}{\partial e} \frac{\partial e}{\partial f} \frac{\partial f}{\partial z} \frac{\partial z}{\partial A}, \qquad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial e} \frac{\partial e}{\partial f} \frac{\partial f}{\partial z} \frac{\partial z}{\partial b}$$
$$\frac{\partial L}{\partial e} = e^\top \in \mathbb{R}^{1\times M}, \qquad \frac{\partial e}{\partial f} = -I \in \mathbb{R}^{M\times M}, \qquad \frac{\partial f}{\partial z} = \operatorname{diag}(1 - \tanh^2(z)) \in \mathbb{R}^{M\times M}$$
$$\frac{\partial z}{\partial A} = \begin{bmatrix} x^\top & \cdots & 0^\top & \cdots & 0^\top \\ \vdots & & \vdots & & \vdots \\ 0^\top & \cdots & x^\top & \cdots & 0^\top \\ \vdots & & \vdots & & \vdots \\ 0^\top & \cdots & 0^\top & \cdots & x^\top \end{bmatrix} \in \mathbb{R}^{M\times(M\times N)}, \qquad \frac{\partial z}{\partial b} = I \in \mathbb{R}^{M\times M}$$
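Multiplying these factors from left to right gives the loss gradients. The sketch below is a numerical illustration under assumptions (the sizes, random data, and the helper name `loss` are not from the slides); it evaluates the product of Jacobians and checks it against central finite differences of $L$.

```python
import numpy as np

# Illustrative sizes and random values (assumptions, not from the slides)
M, N = 3, 4
rng = np.random.default_rng(2)
A = rng.normal(size=(M, N))
b = rng.normal(size=M)
x = rng.normal(size=N)
y = rng.normal(size=M)

def loss(A, b):
    """Squared loss L = 0.5 * ||y - tanh(A x + b)||^2 for fixed x, y."""
    e = y - np.tanh(A @ x + b)
    return 0.5 * e @ e

z = A @ x + b
e = y - np.tanh(z)

# Factors of the chain rule, with the shapes given on the slide
dL_de = e[None, :]                             # (1, M)
de_df = -np.eye(M)                             # (M, M)
df_dz = np.diag(1.0 - np.tanh(z) ** 2)         # (M, M)
dz_db = np.eye(M)                              # (M, M)
dz_dA = np.einsum("lj,k->ljk", np.eye(M), x)   # (M, M, N)

dL_dz = dL_de @ de_df @ df_dz                    # (1, M), shared by both gradients
dL_db = dL_dz @ dz_db                            # (1, M)
dL_dA = np.einsum("il,ljk->ijk", dL_dz, dz_dA)   # (1, M, N)

# Central finite differences of the scalar loss as a check
eps = 1e-6
fd_db = np.zeros(M)
for j in range(M):
    d = np.zeros(M)
    d[j] = eps
    fd_db[j] = (loss(A, b + d) - loss(A, b - d)) / (2 * eps)

fd_dA = np.zeros((M, N))
for j in range(M):
    for k in range(N):
        D = np.zeros((M, N))
        D[j, k] = eps
        fd_dA[j, k] = (loss(A + D, b) - loss(A - D, b)) / (2 * eps)

print(np.allclose(dL_db[0], fd_db, atol=1e-6), np.allclose(dL_dA[0], fd_dA, atol=1e-6))
```

Because $\partial z/\partial A$ only places copies of $x^\top$ on its block diagonal, the same result can be written as the outer product `np.outer(dL_dz[0], x)`, which avoids forming the $M \times M \times N$ tensor.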
Gradients of a Multi-Layer Neural Network
Figure: forward pass $x \to f_1 \to \cdots \to f_{K-1} \to f_K \to L$, with edges labeled by the layer parameters $(A_1, b_1)$, $(A_2, b_2)$, …, $(A_{K-2}, b_{K-2})$, $(A_{K-1}, b_{K-1})$.
§ Inputs $x$, observed outputs $y$
§ Train multi-layer neural network with
$$f_0 = x, \qquad f_i = \sigma_i(A_{i-1} f_{i-1} + b_{i-1}), \quad i = 1, \dots, K$$
§ Find $A_j, b_j$ for $j = 0, \dots, K-1$, such that the squared loss
$$L(\theta) = \|y - f_{K,\theta}(x)\|^2$$
is minimized, where $\theta = \{A_j, b_j\}$, $j = 0, \dots, K-1$
Gradients of a Multi-Layer Neural Network
Figure: forward pass $x \to f_1 \to \cdots \to f_{K-1} \to f_K \to L$, with edges labeled by the layer parameters $(A_1, b_1)$, $(A_2, b_2)$, …, $(A_{K-2}, b_{K-2})$, $(A_{K-1}, b_{K-1})$.
$$\frac{\partial L}{\partial \theta_{K-1}} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial \theta_{K-1}}$$
$$\frac{\partial L}{\partial \theta_{K-2}} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial f_{K-1}} \frac{\partial f_{K-1}}{\partial \theta_{K-2}}$$
$$\frac{\partial L}{\partial \theta_{K-3}} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial f_{K-1}} \frac{\partial f_{K-1}}{\partial f_{K-2}} \frac{\partial f_{K-2}}{\partial \theta_{K-3}}$$
$$\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial f_K} \frac{\partial f_K}{\partial f_{K-1}} \cdots \frac{\partial f_{i+2}}{\partial f_{i+1}} \frac{\partial f_{i+1}}{\partial \theta_i}$$
Intermediate derivatives are stored during the forward pass
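The products above share their leading factors, so the gradient for layer $i$ can reuse the running product computed for layer $i+1$. The following is a minimal sketch of that recursion, assuming $\sigma_i = \tanh$ for every layer; the function names, layer sizes, and random data are illustrative assumptions. In this sketch the forward pass stores the layer outputs $f_i$, from which the local derivatives are formed during the backward sweep.

```python
import numpy as np

def forward(params, x):
    """Return [f_0, ..., f_K] with f_0 = x and f_i = tanh(A_{i-1} f_{i-1} + b_{i-1})."""
    fs = [x]
    for A, b in params:
        fs.append(np.tanh(A @ fs[-1] + b))
    return fs

def loss(params, x, y):
    """Squared loss L = ||y - f_K||^2."""
    return np.sum((y - forward(params, x)[-1]) ** 2)

def backward(params, x, y):
    """Gradients (dL/dA_i, dL/db_i) for every layer, computed from the output backwards."""
    fs = forward(params, x)
    grads = [None] * len(params)
    dL_df = -2.0 * (y - fs[-1])                      # dL/df_K
    for i in reversed(range(len(params))):
        A, _ = params[i]
        dL_dz = dL_df * (1.0 - fs[i + 1] ** 2)       # tanh has a diagonal Jacobian
        grads[i] = (np.outer(dL_dz, fs[i]), dL_dz)   # (dL/dA_i, dL/db_i)
        dL_df = A.T @ dL_dz                          # dL/df_i, reused by the earlier layer
    return grads

# Illustrative network with K = 3 layers (assumed sizes and random values)
sizes = [4, 5, 3, 2]
rng = np.random.default_rng(3)
params = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=sizes[0])
y = rng.normal(size=sizes[-1])

grads = backward(params, x, y)

# Central finite-difference check for the first layer's weights A_0
eps = 1e-6
A0, b0 = params[0]
fd_A0 = np.zeros_like(A0)
for j in range(A0.shape[0]):
    for k in range(A0.shape[1]):
        D = np.zeros_like(A0)
        D[j, k] = eps
        plus = [(A0 + D, b0)] + params[1:]
        minus = [(A0 - D, b0)] + params[1:]
        fd_A0[j, k] = (loss(plus, x, y) - loss(minus, x, y)) / (2 * eps)

print(np.allclose(grads[0][0], fd_A0, atol=1e-6))
```

The first layer is the one whose gradient passes through every later Jacobian, so checking $A_0$ exercises the whole chain.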
Example: Linear Regression with Neural Networks
§ Linear regression with a neural network parametrized by $\theta$, $f_\theta$:
$$y = f_\theta(x) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$$
§ Given inputs $x_n$ and corresponding (noisy) observations $y_n$, $n = 1, \dots, N$, find parameters $\theta^*$ that minimize the squared loss
$$L(\theta) = \sum_{n=1}^{N} (y_n - f_\theta(x_n))^2 = \|y - f_\theta(X)\|^2$$
Training Neural Networks as Maximum Likelihood Estimation
§ Training a neural network in the above way corresponds to maximum likelihood estimation:
  § If $y = \mathrm{NN}(x, \theta) + \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$, then the log-likelihood is
$$\log p(y \mid x, \theta) = -\tfrac{1}{2}\|y - \mathrm{NN}(x, \theta)\|^2 + \text{const},$$
where the constant does not depend on $\theta$
  § Find $\theta^*$ by minimizing the negative log-likelihood:
$$\theta^* = \arg\min_\theta\, -\log p(y \mid x, \theta) = \arg\min_\theta\, \tfrac{1}{2}\|y - \mathrm{NN}(x, \theta)\|^2 = \arg\min_\theta\, L(\theta)$$
§ Maximum likelihood estimation can lead to overfitting (interpreting noise as signal)
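Writing out the Gaussian density shows where the quadratic term comes from. The short derivation below is a sketch that assumes $y \in \mathbb{R}^M$, the output dimension used for the single-layer network earlier; the symbol $M$ is an assumption here, since this slide does not state the dimension.

```latex
\begin{aligned}
p(y \mid x, \theta)
  &= \mathcal{N}\bigl(y \mid \mathrm{NN}(x, \theta),\, I\bigr)
   = (2\pi)^{-M/2} \exp\Bigl(-\tfrac{1}{2}\,\|y - \mathrm{NN}(x, \theta)\|^2\Bigr), \\
\log p(y \mid x, \theta)
  &= -\tfrac{M}{2}\log(2\pi) \;-\; \tfrac{1}{2}\,\|y - \mathrm{NN}(x, \theta)\|^2 .
\end{aligned}
```

The first term does not depend on $\theta$, so it drops out of the $\arg\min$. The factor $\tfrac{1}{2}$ rescales the objective without moving its minimizer, which is why minimizing the squared loss $L(\theta)$ yields the same $\theta^*$.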
Example: Linear Regression (1)
§ Linear regression with a polynomial of order $M$:
$$y = f(x, \theta) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$$
$$f(x, \theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_M x^M = \sum_{i=0}^{M} \theta_i x^i$$
§ Given inputs $x_i$ and corresponding (noisy) observations $y_i$, $i = 1, \dots, N$, find parameters $\theta = [\theta_0, \dots, \theta_M]^\top$ that minimize the squared loss (equivalently: maximize the likelihood)
$$L(\theta) = \sum_{i=1}^{N} (y_i - f(x_i, \theta))^2$$
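Because $f(x, \theta)$ is linear in $\theta$, the minimizer of this squared loss can be obtained with a linear least-squares solve. The sketch below is an illustration and not the method used to produce the plots in the slides; the helper names `fit_polynomial` and `predict`, the synthetic data, and the noise level are assumptions.

```python
import numpy as np

def fit_polynomial(x, y, M):
    """Least-squares estimate of theta = [theta_0, ..., theta_M] for f(x, theta) = sum_i theta_i x^i."""
    Phi = np.vander(x, M + 1, increasing=True)   # Phi[n, i] = x_n ** i
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def predict(x, theta):
    """Evaluate the polynomial with coefficients theta at the inputs x."""
    return np.vander(x, len(theta), increasing=True) @ theta

# Synthetic data from a known quadratic plus Gaussian noise (illustrative assumption)
rng = np.random.default_rng(4)
theta_true = np.array([0.5, -1.0, 0.25])
x = np.linspace(-2.0, 2.0, 30)
y = predict(x, theta_true) + 0.1 * rng.normal(size=x.shape)

theta_hat = fit_polynomial(x, y, M=2)
loss_hat = np.sum((y - predict(x, theta_hat)) ** 2)
loss_true = np.sum((y - predict(x, theta_true)) ** 2)

print(theta_hat)
print(loss_hat <= loss_true)
```

The fitted parameters achieve a squared loss no larger than the generating parameters do on the same noisy data, which is exactly what minimizing $L(\theta)$ means; raising `M` lowers the training loss further and is how the fit starts to follow the noise.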
Example: Linear Regression (2)
Figure: polynomial of degree 16 fitted to the data; horizontal axis $x$ from $-5$ to $5$, vertical axis $f(x)$ from $-3$ to $3$; legend: data, maximum likelihood estimate.
§ Regularization, model selection etc. can address overfitting
§ Alternative approach based on integration
Summary
Figures (repeated from earlier slides): collating the partial derivatives $\frac{\partial A}{\partial x_1}, \frac{\partial A}{\partial x_2}, \frac{\partial A}{\partial x_3} \in \mathbb{R}^{4\times 2}$ into $\frac{\mathrm{d}A}{\mathrm{d}x} \in \mathbb{R}^{4\times 2\times 3}$; the multi-layer network $x \to f_1 \to \cdots \to f_{K-1} \to f_K \to L$; the maximum likelihood fit of a polynomial of degree 16.
§ Vector-valued differentiation
§ Chain rule
§ Check the dimension of the gradients