Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural...
Transcript of Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural...
![Page 1: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/1.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 1
Lecture 4: Backpropagation
and Neural Networks (part 1)
Tuesday January 31, 2017
![Page 2: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/2.jpg)
comp150dl
Announcements!- If you are adversely affected by immigration ban, please talk to me about
accommodations
- Send in paper choices by tonight
- Should be able to run Jupyter server on Tufts was and network machines now
- (deep-venv)> pip install --upgrade jupyter
- hw1 deadline in two days — Thurs Feb 2: Don’t forget to read the course notes.
- Redo calculation of dL/dW for hinge loss2
![Page 3: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/3.jpg)
comp150dl
Python/Numpy of the Day
- y_pred = scores.argmax(axis=1)
- inds = np.random.choice(X.shape[0],batch_size)
- randomly select N numbers in a range,
- useful for subsampling
- [:,np.newaxis]
- reshapes matrices of size (N,) to size (N,1)
3
![Page 4: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/4.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 4
want
scores function
SVM loss
data loss + regularization
Where we are...
![Page 5: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/5.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 5
(image credits to Alec Radford)
Optimization
![Page 6: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/6.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 6
Gradient Descent
Numerical gradient: slow :(, approximate :(, easy to write :) Analytic gradient: fast :), exact :), error-prone :(
In practice: Derive analytic gradient, check your implementation with numerical gradient
![Page 7: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/7.jpg)
comp150dl
Hinge Loss Gradient wrt Weights W
• We want the Jacobian Matrix of all gradients
• partial derivatives of all output dimensions by all input dimensions
7http://cs231n.github.io/optimization-1/#analytic
margin size, usually 1.0
For all rows of dW where the row corresponds to the GT value for that training instance, i.e. For all rows of dW where
![Page 8: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/8.jpg)
comp150dl
Softmax Loss Gradient wrt Score S
8eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
* note change of subscripts from last slide
Skipping some steps for space, please see original notes.
![Page 9: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/9.jpg)
comp150dl
Softmax Loss Gradient wrt Score S
9eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
Skipping some steps for space, please see original notes.
![Page 10: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/10.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 10
Computational Graph
x
W
* hinge loss
R
+ Ls (scores)
![Page 11: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/11.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 11
Convolutional Network (AlexNet)
input image weights
loss
![Page 12: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/12.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 12
e.g. x = -2, y = 5, z = -4
![Page 13: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/13.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 13
e.g. x = -2, y = 5, z = -4
Want:
![Page 14: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/14.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 14
e.g. x = -2, y = 5, z = -4
Want:
![Page 15: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/15.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 15
e.g. x = -2, y = 5, z = -4
Want:
![Page 16: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/16.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 16
e.g. x = -2, y = 5, z = -4
Want:
![Page 17: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/17.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 17
e.g. x = -2, y = 5, z = -4
Want:
![Page 18: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/18.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 18
e.g. x = -2, y = 5, z = -4
Want:
![Page 19: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/19.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 19
e.g. x = -2, y = 5, z = -4
Want:
![Page 20: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/20.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 20
e.g. x = -2, y = 5, z = -4
Want:
![Page 21: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/21.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 21
e.g. x = -2, y = 5, z = -4
Want:
Chain rule:
![Page 22: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/22.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 22
e.g. x = -2, y = 5, z = -4
Want:
![Page 23: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/23.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 23
e.g. x = -2, y = 5, z = -4
Want:
Chain rule:
![Page 24: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/24.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 24
f
activations
![Page 25: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/25.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 25
f
activations
“local gradient”
![Page 26: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/26.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 26
f
activations
“local gradient”
gradients
![Page 27: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/27.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 27
f
activations
gradients
“local gradient”
![Page 28: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/28.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 28
f
activations
gradients
“local gradient”
![Page 29: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/29.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 29
f
activations
gradients
“local gradient”
![Page 30: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/30.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 30
Another example:
![Page 31: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/31.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 31
Another example:
![Page 32: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/32.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 32
Another example:
![Page 33: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/33.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 33
Another example:
![Page 34: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/34.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 34
Another example:
![Page 35: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/35.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 35
Another example:
![Page 36: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/36.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 36
Another example:
![Page 37: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/37.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 37
Another example:
![Page 38: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/38.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 38
Another example:
![Page 39: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/39.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 39
Another example:
(-1) * (-0.20) = 0.20
![Page 40: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/40.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 40
Another example:
![Page 41: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/41.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 41
Another example:
[local gradient] x [its gradient] [1] x [0.2] = 0.2 [1] x [0.2] = 0.2 (both inputs!)
![Page 42: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/42.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 42
Another example:
![Page 43: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/43.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 43
Another example:
[local gradient] x [its gradient] x0: [2] x [0.2] = 0.4 w0: [-1] x [0.2] = -0.2
![Page 44: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/44.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 44
sigmoid function
sigmoid gate
![Page 45: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/45.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 45
sigmoid function
sigmoid gate
(0.73) * (1 - 0.73) = 0.2
![Page 46: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/46.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 46
Patterns in backward flow
add gate: gradient distributor max gate: gradient router mul gate: gradient… “switcher”?
![Page 47: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/47.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 47
Gradients add at branches
+
![Page 48: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/48.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 48
Implementation: forward/backward API
Graph (or Net) object. (Rough psuedo code)
![Page 49: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/49.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 49
Implementation: forward/backward API
(x,y,z are scalars)
*
x
y
z
![Page 50: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/50.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 50
Implementation: forward/backward API
(x,y,z are scalars)
*
x
y
z
![Page 51: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/51.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 51
Example: Torch Layers
![Page 52: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/52.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 52
Example: Torch Layers
=
![Page 53: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/53.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 53
Example: Torch MulConstant
initialization
forward()
backward()
![Page 54: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/54.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 54
Example: Caffe Layers
![Page 55: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/55.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 55
Caffe Sigmoid Layer
*top_diff (chain rule)
![Page 56: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/56.jpg)
comp150dl 56
![Page 57: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/57.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 57
Gradients for vectorized code
f
“local gradient”
This is now the Jacobian matrix (derivative of each element of z w.r.t. each element of x)
(x,y,z are now vectors)
gradients
![Page 58: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/58.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 58
Vectorized operations
f(x) = max(0,x) (elementwise)
4096-d input vector
4096-d output vector
![Page 59: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/59.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 59
Vectorized operations
f(x) = max(0,x) (elementwise)
4096-d input vector
4096-d output vector
Q: what is the size of the Jacobian matrix?
Jacobian matrix
![Page 60: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/60.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 60
max(0,x) (elementwise)
4096-d input vector
4096-d output vector
Q: what is the size of the Jacobian matrix? [4096 x 4096!]
Q2: what does it look like?
Vectorized operations
Jacobian matrix
f(x) = max(0,x) (elementwise)
![Page 61: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/61.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 61
max(0,x) (elementwise)
100 4096-d input vectors
100 4096-d output vectors
Vectorized operations
in practice we process an entire minibatch (e.g. 100) of examples at one time:
i.e. Jacobian would technically be a [409,600 x 409,600] matrix :\
f(x) = max(0,x) (elementwise)
![Page 62: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/62.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 62
Assignment: Writing SVM/Softmax Stage your forward/backward computation!
E.g. for the SVM:margins
![Page 63: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/63.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 63
Summary so far
- neural nets will be very large: no hope of writing down gradient formula by hand for all parameters
- backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
- implementations maintain a graph structure, where the nodes implement the forward() / backward().
- forward: compute result of an operation and save any intermediates needed for gradient computation in memory
- backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs.
![Page 64: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/64.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 64
Neural Network so far:
(Before) Linear score function:
![Page 65: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/65.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 65
(Before) Linear score function:
(Now) 2-layer Neural Network
Neural Network so far:
![Page 66: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/66.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 66
(Before) Linear score function:
(Now) 2-layer Neural Network
x hW1 sW2
3072 100 10
Neural Network so far:
![Page 67: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/67.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl
Neural Network so far:
67
(Before) Linear score function:
(Now) 2-layer Neural Network
x hW1 sW2
3072 100 10
![Page 68: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/68.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 68
(Before) Linear score function:
(Now) 2-layer Neural Network or 3-layer Neural Network
Neural Network so far:
![Page 69: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/69.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 69
Full implementation of training a 2-layer Neural Network needs ~11 lines:
from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
Forward pass
Backward pass
backprop of derivative
![Page 70: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/70.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 70
Assignment: Writing 2layer Net Stage your forward/backward computation!
![Page 71: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/71.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 71
![Page 72: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/72.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 72
![Page 73: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/73.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 73
![Page 74: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/74.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 74
sigmoid activation function
![Page 75: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/75.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 75
![Page 76: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/76.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 76
Be very careful with your Brain analogies:
Biological Neurons: - Many different types - Dendrites can perform complex non-
linear computations - Synapses are not a single weight but
a complex non-linear dynamical system
- Rate code may not be adequate
[Dendritic Computation. London and Hausser]
![Page 77: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/77.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 77
Activation Functions
Sigmoid
tanh tanh(x)
ReLU max(0,x)
Leaky ReLU max(0.1x, x)
Maxout
ELU
![Page 78: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/78.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 78
Neural Networks: Architectures
“Fully-connected” layers“2-layer Neural Net”, or “1-hidden-layer Neural Net”
“3-layer Neural Net”, or “2-hidden-layer Neural Net”
![Page 79: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/79.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 79
Example Feed-forward computation of a Neural Network
We can efficiently evaluate an entire layer of neurons.
![Page 80: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/80.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 80
Example Feed-forward computation of a Neural Network
![Page 81: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/81.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 81
Setting the number of layers and their sizes
more neurons = more capacity
![Page 82: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/82.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 82
(you can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)
Do not use size of neural network as a regularizer. Use stronger regularization instead:
![Page 83: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/83.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 83
Summary
- we arrange neurons into fully-connected layers - the abstraction of a layer has the nice property that it
allows us to use efficient vectorized code (e.g. matrix multiplies)
- neural networks are not really neural - neural networks: bigger = better (but might have to
regularize more strongly)
![Page 84: Lecture 4: Backpropagation and Neural Networks (part 1 · Lecture 4: Backpropagation and Neural Networks (part 1) Tuesday January 31, 2017. comp150dl ... - neural nets will be very](https://reader030.fdocuments.us/reader030/viewer/2022041000/5ea0671c05a3323b2a4cd0ab/html5/thumbnails/84.jpg)
* Original slides borrowed from Andrej Karpathy and Li Fei-Fei, Stanford cs231n comp150dl 84
Next Lecture:
More than you ever wanted to know about Neural Networks and how to train them.