13 Neural Networks - Virginia Techcourses.cs.vt.edu/cs4804/Fall18/slide_pdfs/13 Neural...
Neural Networks
Intro to AI Bert Huang
Virginia Tech
Outline
• Biological inspiration for artificial neural networks
• Linear vs. nonlinear functions
• Learning with neural networks: back propagation
https://en.wikipedia.org/wiki/Neuron#/media/File:Chemical_synapse_schema_cropped.jpg
https://en.wikipedia.org/wiki/Neuron#/media/File:GFPneuron.png
Parameterizing p(y|x)
p(y|x) := f
f : R^d → [0, 1]
f(x) := 1 / (1 + exp(−wᵀx))
Logistic Function
σ(x) = 1 / (1 + exp(−x))
Logistic Function
σ(x) = 1 / (1 + exp(−x))
σ(0) = 1 / (1 + exp(−0)) = 1 / (1 + 1) = 0.5
lim_{x→−∞} σ(x) = lim_{x→−∞} 1 / (1 + exp(−x)) = 0.0
lim_{x→∞} σ(x) = lim_{x→∞} 1 / (1 + exp(−x)) = 1 / 1 = 1.0
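These three values can be checked numerically; a minimal sketch in Python (the function name `sigmoid` is my own choice):

```python
import math

def sigmoid(x):
    # Logistic function: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))    # exactly 0.5
print(sigmoid(-20.0))  # very close to 0
print(sigmoid(20.0))   # very close to 1
```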
From Features to Probability
f(x) := 1 / (1 + exp(−wᵀx))
Parameterizing p(y|x)
p(y|x) := f
f : R^d → [0, 1]
f(x) := 1 / (1 + exp(−wᵀx))
[Diagram: inputs x1 … x5 feed through weights w1 … w5 into a single output unit y]
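Under this parameterization, a prediction is just a dot product pushed through the logistic function. A short sketch; the weights and feature values below are made up for illustration:

```python
import math

def predict_proba(w, x):
    # p(y|x) = sigma(w^T x): dot product through the logistic function
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and features for a 5-input unit, as in the diagram
w = [0.5, -1.0, 0.25, 0.0, 2.0]
x = [1.0, 0.5, 2.0, 3.0, -0.5]
p = predict_proba(w, x)  # a probability in (0, 1)
```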
Multi-Layered Perceptron
[Diagram: inputs x1 … x5 feed through weights w1 … w5 into a hidden unit h1]
Multi-Layered Perceptron
[Diagram: inputs x1 … x5 (raw data, e.g., pixel values) feed into hidden units h1, h2 (representation, e.g., shapes, round shadows), which feed into the output y (prediction, e.g., faces)]
Multi-Layered Perceptron
[Diagram: inputs x1 … x5 feed into hidden units h1, h2, which feed into the output y]
h = [h1, h2]ᵀ
h1 = σ(w11ᵀx),  h2 = σ(w12ᵀx)
p(y|x) = σ(w21ᵀh)
p(y|x) = σ(w21ᵀ [σ(w11ᵀx), σ(w12ᵀx)]ᵀ)
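The nested expression for p(y|x) can be evaluated layer by layer. A sketch of the forward pass; the weight vectors w11, w12, w21 and the input are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def mlp_predict(w11, w12, w21, x):
    # Hidden layer: two logistic units, each with its own weight vector
    h = [sigmoid(dot(w11, x)), sigmoid(dot(w12, x))]
    # Output layer: a logistic unit on the hidden activations
    return sigmoid(dot(w21, h))

# Hypothetical weights for a 5-input, 2-hidden-unit network
w11 = [1.0, -1.0, 0.5, 0.0, 0.25]
w12 = [-0.5, 0.5, 1.0, -0.25, 0.0]
w21 = [2.0, -1.5]
x = [1.0, 0.0, 1.0, 1.0, 0.5]
p = mlp_predict(w11, w12, w21, x)
```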
Decision Surface: Logistic Regression
Decision Surface: 2-Layer, 2 Hidden Units
Decision Surface: 2-Layer, More Hidden Units
[Panels: 3 hidden units; 10 hidden units]
Decision Surface: More Layers, More Hidden Units
[Panels: 4 layers, 10 hidden units per layer; 10 layers, 5 hidden units per layer]
Training Neural Networks
min_W L(W) = (1/n) Σ_{i=1}^{n} l(f(x_i, W), y_i)   (average training error)
Gradient Descent: W_j ← W_j − α (∂L/∂W_j)
Gradient Descent
W_j ← W_j − α (∂L/∂W_j)
[Plot: loss L(W) evaluated at three weights W1, W2, W3]
Gradient very positive → take a big step left
Gradient almost zero → take a tiny step left
Gradient slightly negative → take a medium step right
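The update rule above can be sketched on a simple one-dimensional example; the quadratic loss L(W) = (W − 3)² and all constants below are illustrative choices:

```python
def grad_descent_1d(dL, w0, alpha, steps):
    # Repeatedly apply W <- W - alpha * (dL/dW): a large gradient
    # gives a large step, a near-zero gradient gives a tiny step.
    w = w0
    for _ in range(steps):
        w -= alpha * dL(w)
    return w

# Hypothetical loss L(W) = (W - 3)^2, so dL/dW = 2(W - 3);
# the minimum is at W = 3.
dL = lambda w: 2.0 * (w - 3.0)
w_final = grad_descent_1d(dL, w0=0.0, alpha=0.1, steps=100)
```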
Approximate Q-Learning
Q̂(s, a) := g(s, a, θ) := θ1 f1(s, a) + θ2 f2(s, a) + … + θd fd(s, a)
θ_i ← θ_i + α (R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)) ∂g/∂θ_i
θ_i ← θ_i + α (R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)) f_i(s, a)
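A sketch of this update for the linear approximation, where the gradient ∂g/∂θ_i is just the feature value f_i(s, a); the transition values below are made up:

```python
def q_hat(theta, feats):
    # Linear Q approximation: Q(s, a) = sum_i theta_i * f_i(s, a)
    return sum(t * f for t, f in zip(theta, feats))

def td_update(theta, feats_sa, reward, gamma, max_next_q, alpha):
    # TD error: R(s) + gamma * max_a' Q(s', a') - Q(s, a)
    delta = reward + gamma * max_next_q - q_hat(theta, feats_sa)
    # For a linear model, dg/dtheta_i = f_i(s, a)
    return [t + alpha * delta * f for t, f in zip(theta, feats_sa)]

# Hypothetical transition with two features
theta = [0.0, 0.0]
theta = td_update(theta, feats_sa=[1.0, 0.5], reward=1.0,
                  gamma=0.9, max_next_q=0.0, alpha=0.1)
```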
Back Propagation
• Back propagation:
• Compute hidden unit activations: forward propagation
• Compute gradient at output layer: error
• Propagate error back one layer at a time
• Chain rule via dynamic programming
Chain Rule Review
f(g(x))
d f(g(x)) / d x = (d f(g(x)) / d g(x)) · (d g(x) / d x)
[Diagram: x → g(x) → f(g(x))]
Chain Rule on a More Complex Function
h(f(a) + g(b))
d h(f(a) + g(b)) / d a = (d h(f(a) + g(b)) / d (f(a) + g(b))) · (d f(a) / d a)
d h(f(a) + g(b)) / d b = (d h(f(a) + g(b)) / d (f(a) + g(b))) · (d g(b) / d b)
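The chain rule can be checked against a finite-difference estimate; the component functions f, g, h below are arbitrary choices for illustration:

```python
import math

def numeric_deriv(fn, a, eps=1e-6):
    # Central finite difference: (fn(a+eps) - fn(a-eps)) / (2*eps)
    return (fn(a + eps) - fn(a - eps)) / (2 * eps)

# Hypothetical components: f(a) = a^2, g(b) = sin(b), h(u) = exp(u)
f = lambda a: a * a
g = lambda b: math.sin(b)
h = lambda u: math.exp(u)

a, b = 0.7, 1.3
# Chain rule: d h(f(a)+g(b)) / da = h'(f(a)+g(b)) * f'(a)
analytic = math.exp(f(a) + g(b)) * (2 * a)
numeric = numeric_deriv(lambda t: h(f(t) + g(b)), a)
```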
Back to Neural Networks
[Diagram: raw input x → (weights w1) hidden layer h1 → … → hidden layer h_{n−1} → (weights w_n) final output y]
h1 = f(x, w1)
h_{n−1} = f(h_{n−2}, w_{n−1})
y = f(h_{n−1}, w_n)
Back to Neural Networks
[Diagram: the same network, now with a loss L(y) on the final output]
y = f(h_{n−1}, w_n)
d L / d w_n = (d L / d y) · (d y / d w_n)
Intuition of Back Propagation
• At each layer, calculate how much changing the input changes the
final output (derivative of final output w.r.t. layer’s input)
• Calculate directly for last layer
• For preceding layers, use calculation from next layer and work backwards through network
• Use that derivative to find how changing the weights affects the error of the final output
FYI: Matrix Form
h1 = s(W1 x)
h2 = s(W2 h1)
…
h_{m−1} = s(W_{m−1} h_{m−2})
f(x, W) = s(W_m h_{m−1})
J(W) = ℓ(f(x, W))
(You will not be tested on this matrix form in this course.)
FYI: Matrix Gradient Recipe
h1 = s(W1 x)
h2 = s(W2 h1)
…
h_{m−1} = s(W_{m−1} h_{m−2})
f(x, W) = s(W_m h_{m−1})
J(W) = ℓ(f(x, W))
δ_m = ℓ′(f(x, W))
∇_{W_m} J = δ_m h_{m−1}ᵀ
δ_{m−1} = (W_mᵀ δ_m) ⊙ s′(W_{m−1} h_{m−2})
∇_{W_{m−1}} J = δ_{m−1} h_{m−2}ᵀ
δ_i = (W_{i+1}ᵀ δ_{i+1}) ⊙ s′(W_i h_{i−1})
∇_{W_i} J = δ_i h_{i−1}ᵀ
∇_{W_1} J = δ_1 xᵀ
(You will not be tested on this matrix form in this course.)
FYI: Matrix Gradient Recipe
Feed Forward Propagation:
h1 = s(W1 x)
h_i = s(W_i h_{i−1})
f(x, W) = s(W_m h_{m−1})
J(W) = ℓ(f(x, W))
Back Propagation:
δ_m = ℓ′(f(x, W))
δ_i = (W_{i+1}ᵀ δ_{i+1}) ⊙ s′(W_i h_{i−1})
∇_{W_i} J = δ_i h_{i−1}ᵀ
∇_{W_1} J = δ_1 xᵀ
(You will not be tested on this matrix form in this course.)
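One way to turn this recipe into code, assuming s is the logistic function and taking ℓ to be a squared-error loss (a choice the slides leave open). In this sketch ℓ′ is taken through the output activation, so the top δ carries an s′ factor; s′(z) is expressed through the stored activation h = s(z) as h(1 − h). All network sizes and values are hypothetical:

```python
import numpy as np

def s(z):
    # Logistic activation, applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

def forward(Ws, x):
    # Feed-forward: h_0 = x, h_i = s(W_i h_{i-1}); returns all activations
    hs = [x]
    for W in Ws:
        hs.append(s(W @ hs[-1]))
    return hs

def backward(Ws, hs, y):
    # Back propagation for J = 0.5 * ||f - y||^2 (an assumed choice of l).
    # Top-layer delta: dJ/df times s'(z_m), written as h_m * (1 - h_m).
    grads = []
    delta = (hs[-1] - y) * hs[-1] * (1.0 - hs[-1])
    for i in range(len(Ws) - 1, -1, -1):
        # grad_{W_i} J = delta_i h_{i-1}^T
        grads.append(np.outer(delta, hs[i]))
        if i > 0:
            # delta_i = (W_{i+1}^T delta_{i+1}) * s'(W_i h_{i-1})
            delta = (Ws[i].T @ delta) * hs[i] * (1.0 - hs[i])
    return grads[::-1]

# Hypothetical network: 4 inputs -> 3 hidden units -> 2 outputs
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
x = rng.normal(size=4)
y = np.array([1.0, 0.0])
hs = forward(Ws, x)
grads = backward(Ws, hs, y)
```

The delta recursion reuses each layer's computation for the layer below it, which is the dynamic-programming view of the chain rule mentioned earlier.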
Other New Aspects of Deep Learning
• GPU computation
• Differentiable programming
• Automatic differentiation
• Neural network structures
Types of Neural Network Structures
• Feed-forward
• Recurrent neural networks (RNNs)
• Good for analyzing sequences (text, time series)
• Convolutional neural networks (convnets, CNNs)
• Good for analyzing spatial data (images, videos)
Outline
• Biological inspiration for artificial neural networks
• Linear vs. nonlinear functions
• Learning with neural networks: back propagation