ELEC 576: Neural Networks & Backpropagation
Lecture 3
Ankit B. Patel
Baylor College of Medicine (Neuroscience Dept.) Rice University (ECE Dept.)
Outline
• Neural Networks
• Definition of NNs and terminology
• Review of (Old) Theoretical Results about NNs
• Intuition for why compositions of nonlinear functions are more expressive
• Expressive power theorems [McCulloch-Pitts, Rosenblatt, Cybenko]
• Backpropagation Algorithm (Gradient Descent + Chain Rule)
• History of backprop (summary)
• Gradient Descent (review)
• Chain Rule (review)
• Backprop
• Intro to ConvNets
• Convolutional Layer, ReLU, Max-Pooling
Neural Networks
Neural Networks: Definitions
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
[Figure: network diagram labeling the input units, output units, and each unit's net input and (output) activation]
Neural Networks: Activation Functions
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
Neural Networks: Definitions
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
Feedforward Propagation: Scalar Form
[Figure: network diagram labeling input units, hidden units, and output units, with each unit's net input and (output) activation]
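For reference, in the UFLDL tutorial's notation the scalar-form forward pass computes, for unit i of layer l+1 (with a^{(1)} = x and activation function f):

z_i^{(l+1)} = \sum_j W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}, \qquad a_i^{(l+1)} = f\left(z_i^{(l+1)}\right)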
Neural Networks: Definitions
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
Feedforward Propagation: Vector Form
[Figure: the same network diagram, with per-layer vectors for the net input and (output) activation of the input, hidden, and output units]
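The same computation in vector form, with f applied element-wise:

z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, \qquad a^{(l+1)} = f\left(z^{(l+1)}\right)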
Neural Networks: Definitions
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
Deep Feedforward Propagation: Vector Form
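A minimal NumPy sketch of the deep forward pass (the sigmoid activation and the layer sizes are my own illustrative choices, just to make the vector form concrete):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, f=sigmoid):
    """Deep feedforward propagation, vector form.

    weights[l], biases[l] hold W^(l), b^(l); a^(1) = x.
    Returns the list of activations a^(1), ..., a^(nl).
    """
    activations = [x]
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b   # net input: z^(l+1) = W^(l) a^(l) + b^(l)
        activations.append(f(z))      # activation: a^(l+1) = f(z^(l+1))
    return activations

# Example: a 3-4-2 network with random parameters
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
acts = forward(rng.normal(size=3), Ws, bs)
print([a.shape for a in acts])  # [(3,), (4,), (2,)]
```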
Neural Networks: Definitions
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
The Training Objective
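For reference, the UFLDL tutorial's training objective is the average squared error over the m training examples plus a weight-decay (regularization) term:

J(W,b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\left(x^{(i)}\right) - y^{(i)} \right\|^2 + \frac{\lambda}{2} \sum_{l} \sum_{i} \sum_{j} \left( W_{ji}^{(l)} \right)^2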
Expressive Power Theorems
Compositions of Nonlinear Functions Are More Expressive
[Yoshua Bengio]
McCulloch-Pitts Neurons
Expressive Power of McCulloch-Pitts Nets
The Perceptron (Rosenblatt)
Limitations of the Perceptron
• Rosenblatt was overly enthusiastic about the perceptron and made the ill-timed proclamation that:
• "Given an elementary α-perceptron, a stimulus world W, and any classification C(W) for which a solution exists; let all stimuli in W occur in any sequence, provided that each stimulus must reoccur in finite time; then beginning from an arbitrary initial state, an error correction procedure will always yield a solution to C(W) in finite time…" [4]
• In 1969, Marvin Minsky and Seymour Papert showed that the perceptron can only solve linearly separable functions. Of particular interest was the fact that the perceptron cannot solve the XOR and XNOR functions.
• The problems outlined by Minsky and Papert can be solved by deep NNs (see the sketch below). However, many of the artificial neural networks in use today still stem from the early advances of the McCulloch-Pitts neuron and the Rosenblatt perceptron.
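A minimal sketch of that fix (the weights are hand-picked, my own illustrative construction): one hidden layer of two threshold units computes XOR as AND(OR(x1, x2), NAND(x1, x2)), which no single perceptron can do:

```python
import numpy as np

def step(z):
    # McCulloch-Pitts style threshold unit
    return (z >= 0).astype(int)

def xor_net(x):
    """Two-layer threshold network computing XOR (hand-picked weights).

    Hidden unit 1 fires for OR(x1, x2); hidden unit 2 fires for NAND(x1, x2);
    the output unit ANDs them together.
    """
    W1 = np.array([[1.0, 1.0],     # OR:   x1 + x2 - 0.5 >= 0
                   [-1.0, -1.0]])  # NAND: -x1 - x2 + 1.5 >= 0
    b1 = np.array([-0.5, 1.5])
    W2 = np.array([1.0, 1.0])      # AND of the two hidden units
    b2 = -1.5
    h = step(W1 @ x + b1)
    return step(W2 @ h + b2)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(xor_net(np.array(x, dtype=float))))  # 0, 1, 1, 0
```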
Universal Approximation Theorem [Cybenko 1989, Hornik 1991]
• https://en.wikipedia.org/wiki/Universal_approximation_theorem
• Shallow neural networks can represent a wide variety of interesting functions when given appropriate parameters; however, the theorem does not address the algorithmic learnability of those parameters.
• Proved by George Cybenko in 1989 for sigmoid activation functions. [2]
• Kurt Hornik showed in 1991 [3] that it is not the specific choice of the activation function but rather the multilayer feedforward architecture itself that gives neural networks the potential of being universal approximators.
Question (5 min): Why is the theorem true? What is the intuition?
What happens when you go deep? Try iterating f(x) = x^2 vs. f(x) = ax + b (see the sketch below).
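A quick sketch of that exercise (the constants a = 2, b = 1 are hypothetical choices): iterating the nonlinear map f(x) = x^2 produces x^(2^n), whose complexity compounds with depth, while iterating the affine map f(x) = ax + b only ever produces another affine map:

```python
def iterate(f, x, n):
    """Apply f to x, n times: f(f(...f(x)...))."""
    for _ in range(n):
        x = f(x)
    return x

square = lambda x: x ** 2          # nonlinear
affine = lambda x: 2.0 * x + 1.0   # a = 2, b = 1 (hypothetical constants)

# Nonlinear composition: the degree doubles each time -> x^(2^n)
print([iterate(square, 1.1, n) for n in range(4)])
# 1.1, 1.21, 1.4641, 2.14358881

# Affine composition stays affine: f^n(x) = a^n x + b (a^n - 1) / (a - 1)
print(iterate(affine, 1.0, 3), 2.0**3 * 1.0 + 1.0 * (2.0**3 - 1))  # both 15.0
```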
Training Neural Networks Via Gradient Descent
Gradient Descent
[Fei-Fei Li, Andrej Karpathy, Justin Johnson]
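A minimal sketch of the update rule θ ← θ − α ∇J(θ) on a toy quadratic loss (the loss, learning rate, and step count are all hypothetical choices for illustration):

```python
import numpy as np

def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Vanilla gradient descent: theta <- theta - lr * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# Toy loss J(theta) = ||theta - c||^2, with gradient 2 (theta - c)
c = np.array([3.0, -2.0])
grad_J = lambda theta: 2.0 * (theta - c)
print(gradient_descent(grad_J, theta0=[0.0, 0.0]))  # converges toward [3, -2]
```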
Question: What kind of problems might you run into with Gradient Descent? (4 min)
A Global Optimum Is Not Guaranteed
Learning Rate Needs to Be Carefully Chosen
Training Neural Networks: Computing Gradients Efficiently with the Backpropagation Algorithm
Chain Rule
Exercise: Apply the chain rule to a nested function (2 min); a worked instance follows.
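A worked instance (my own illustrative choice of nesting): for J(θ) = (sin(θ²))³, write J = f(g(h(θ))) with h(θ) = θ², g(z) = sin z, f(u) = u³. The chain rule then gives

\frac{dJ}{d\theta} = f'(g(h(\theta))) \cdot g'(h(\theta)) \cdot h'(\theta) = 3\sin^2(\theta^2) \cdot \cos(\theta^2) \cdot 2\theta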
Backpropagation is an efficient way to compute gradients
• a.k.a. Reverse Mode Automatic Differentiation (AD); based on a systematic application of the chain rule. It is fast for low-dimensional outputs: for a single output (e.g., a scalar loss function), the time to compute gradients with respect to ALL inputs is proportional to the time to compute the output itself. An explicit mathematical expression for the output is not required, only an algorithm to compute it.
• It is NOT the same as symbolic differentiation (e.g., Mathematica).
• Numerical/finite differences are slow for high-dimensional inputs (e.g., model parameters): for a single output, the time to compute gradients scales with the number of inputs. Finite differences may also suffer from floating-point precision issues, and require choosing a parameter increment.
https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf
Centered Finite Difference: geometrical interpretation as the slope of a secant line
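For reference, the centered (two-sided) difference approximates f′(x) by the slope of the secant through (x − h, f(x − h)) and (x + h, f(x + h)):

f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}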
The Cheap Gradient Principle in Backpropagation: the time complexity scales with the number of operations performed in the forward pass
https://www.math.uni-bielefeld.de/documenta/vol-ismp/52_griewank-andreas-b.pdf
For a scalar-valued function f with a multidimensional input x:
𝙾𝙿𝚂{∇𝚏(x)} ≤ ω 𝙾𝙿𝚂{𝚏(x)}
with ω = 3 for polynomial operations (𝙾𝙿𝚂 counting the number of multiplications), and ω ∼ 5 in general.
More generally, for an m-dimensional output F(x):
𝙾𝙿𝚂{F′(x)} ≤ 𝚖 ω 𝙾𝙿𝚂{F(x)}
There is no equivalent Cheap Jacobian Principle or Cheap Hessian Principle.
The Spatial Complexity of Backpropagation scales with the number of operations performed in the forward pass
https://www.math.uni-bielefeld.de/documenta/vol-ismp/52_griewank-andreas-b.pdf
𝙼𝙴𝙼{F′(x)} ∼ 𝙾𝙿𝚂{F(x)} ≳ 𝙼𝙴𝙼{F(x)}
There is no Cheap Jacobian Principle or Cheap Hessian Principle, but a Jacobian-vector product can be computed as efficiently as the gradient, and a Hessian-vector product can be computed in O(n) time rather than the O(n²) cost of forming the full Hessian.
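One standard way to see the Hessian-vector product claim (a sketch; this is Pearlmutter's trick, not spelled out on the slide): Hv is itself a gradient, of the scalar function x ↦ ∇f(x)ᵀv, so one extra reverse-mode pass yields it at gradient-like cost. It can also be approximated by a centered difference of gradients:

Hv = \nabla_x \left( \nabla f(x)^\top v \right) \approx \frac{\nabla f(x + \epsilon v) - \nabla f(x - \epsilon v)}{2\epsilon}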
Temporal Complexity in Automatic Differentiation
https://arxiv.org/pdf/1502.05767.pdf
https://www.math.uni-bielefeld.de/documenta/vol-ismp/52_griewank-andreas-b.pdf
For x an n-dimensional input and F(x) an m-dimensional output, with ω < 6:
Reverse Mode: 𝙾𝙿𝚂{F′(x)} ≤ 𝚖 ω 𝙾𝙿𝚂{F(x)}
Forward Mode: 𝙾𝙿𝚂{F′(x)} ≤ 𝚗 ω 𝙾𝙿𝚂{F(x)}
How to Learn NNs? History of the Backpropagation Algorithm (1960-86)
• Introduced by Henry J. Kelley (1960) and Arthur Bryson (1961) in control theory, using dynamic programming
• Simpler derivation using the chain rule by Stuart Dreyfus (1962)
• General method for automatic differentiation by Seppo Linnainmaa (1970)
• Used backprop to tune the parameters of controllers minimizing error: Stuart Dreyfus (1973)
• Backprop brought into the NN world by Paul Werbos (1974)
• Used to learn representations in the hidden layers of NNs by Rumelhart, Hinton & Williams (1986)
Modified from https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf
Backpropagation Example (5 min)
[Figure: output calculation in the forward pass, followed by the backward pass]
Modified from https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf
Backpropagation Example
[Figure: output calculation (forward pass) and gradient calculation (backward pass), linked by the chain rule]
Modified from https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf
Backpropagation Example
The backward pass computes the derivatives of the single output J with respect to all inputs (x_j, θ_j, a, y) efficiently.
[Figure: forward and backward passes]
Modified from https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf
Backpropagation Example
How efficient? The backward pass takes time proportional to the forward pass.
[Figure: forward and backward passes]
Modified from https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf
Backpropagation Example
The values of the derivatives are computed at each step; backprop does not store their mathematical expressions, unlike symbolic differentiation.
[Figure: forward and backward passes]
When is Backpropagation Efficient?
(See https://arxiv.org/pdf/1502.05767.pdf)

| Efficient? | Backprop (Reverse Mode AD) | Finite Differences | Symbolic Differentiation | Forward Mode AD |
| High-dimensional inputs: ∂J({θ_i},{x_j})/∂θ_1, ∂J({θ_i},{x_j})/∂θ_2, … | YES: Cheap Gradient Principle; time cost ~ one forward pass | NO: time cost is multiple forward passes, 2 PER input | May not be: the formula for J can grow exponentially in size, a.k.a. expression swell | NO |
| High-dimensional outputs: ∂J_1({θ_i},{x_j})/∂θ, ∂J_2({θ_i},{x_j})/∂θ, … | NO | May not be | May not be: unless common subexpressions are leveraged | YES |
Pseudo-Code for Backprop: Scalar Form
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
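For reference, the scalar-form equations the pseudocode implements (UFLDL notation, squared-error loss, n_l layers):

\delta_i^{(n_l)} = -\left(y_i - a_i^{(n_l)}\right) f'\!\left(z_i^{(n_l)}\right), \qquad
\delta_i^{(l)} = \Big( \sum_j W_{ji}^{(l)} \delta_j^{(l+1)} \Big) f'\!\left(z_i^{(l)}\right)

\frac{\partial J}{\partial W_{ij}^{(l)}} = a_j^{(l)} \delta_i^{(l+1)}, \qquad
\frac{\partial J}{\partial b_i^{(l)}} = \delta_i^{(l+1)}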
Pseudo-Code for Backprop: Matrix-Vector Form
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
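A minimal NumPy sketch of the matrix-vector form (my own illustrative implementation of the UFLDL equations, assuming sigmoid activations and a squared-error loss), with a centered finite-difference check on one weight:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, weights, biases):
    """Matrix-vector form backprop for the loss J = 0.5 ||a^(nl) - y||^2.

    Returns the gradients dJ/dW^(l) and dJ/db^(l) for every layer.
    """
    # Forward pass, storing activations and net inputs
    a, activations, zs = x, [x], []
    for W, b in zip(weights, biases):
        z = W @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)

    # Backward pass: delta^(nl) = -(y - a^(nl)) * f'(z^(nl))
    fprime = lambda z: sigmoid(z) * (1.0 - sigmoid(z))
    delta = -(y - activations[-1]) * fprime(zs[-1])
    grads_W, grads_b = [], []
    for l in reversed(range(len(weights))):
        grads_W.insert(0, np.outer(delta, activations[l]))  # dJ/dW^(l) = delta^(l+1) a^(l)T
        grads_b.insert(0, delta)                            # dJ/db^(l) = delta^(l+1)
        if l > 0:
            # delta^(l) = (W^(l)T delta^(l+1)) * f'(z^(l))
            delta = (weights[l].T @ delta) * fprime(zs[l - 1])
    return grads_W, grads_b

# Gradient check against a centered finite difference on one weight
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
x, y = rng.normal(size=3), rng.normal(size=2)
gW, gb = backprop(x, y, Ws, bs)

def loss(weights):
    a = x
    for W, b in zip(weights, bs):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

eps = 1e-5
Wp = [W.copy() for W in Ws]; Wp[0][1, 2] += eps
Wm = [W.copy() for W in Ws]; Wm[0][1, 2] -= eps
print(gW[0][1, 2], (loss(Wp) - loss(Wm)) / (2 * eps))  # should agree closely
```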
Gradient Descent for Neural Networks
http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
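For reference, one iteration of batch gradient descent updates every parameter by stepping against its gradient (α is the learning rate):

W_{ij}^{(l)} := W_{ij}^{(l)} - \alpha \frac{\partial J(W,b)}{\partial W_{ij}^{(l)}}, \qquad
b_i^{(l)} := b_i^{(l)} - \alpha \frac{\partial J(W,b)}{\partial b_i^{(l)}}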
Backpropagation: Network View
Another Deeper Example (for practice)
Example
[Fei-Fei Li, Andrej Karpathy, Justin Johnson]
[Figure sequence: a backprop example worked step by step through a computational graph, one node at a time]
Question: What problems might you encounter with deeply nested functions? (3 min)
Visualizing Backprop during Training: Classification with a 2-Layer Neural Network
• http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
• Try playing around with this app to build intuition:
• change datapoints to see how decision boundaries change
• change network layer types, widths, activation functions, etc.
• try shallower vs. deeper networks