13 Neural Networks - Virginia Tech


Neural Networks

Intro to AI

Bert Huang

Virginia Tech

Outline

• Biological inspiration for artificial neural networks

• Linear vs. nonlinear functions

• Learning with neural networks: back propagation

https://en.wikipedia.org/wiki/Neuron#/media/File:Chemical_synapse_schema_cropped.jpg

https://en.wikipedia.org/wiki/Neuron#/media/File:GFPneuron.png

Parameterizing p(y|x)

p(y|x) := f(x)

f : ℝ^d → [0, 1]

f(x) := 1 / (1 + exp(−w^T x))


Logistic Function

σ(x) = 1 / (1 + exp(−x))

lim_{x→−∞} σ(x) = lim_{x→−∞} 1 / (1 + exp(−x)) = 0.0

Logistic Function

σ(x) = 1 / (1 + exp(−x))

σ(0) = 1 / (1 + exp(−0)) = 1 / (1 + 1) = 0.5

lim_{x→∞} σ(x) = lim_{x→∞} 1 / (1 + exp(−x)) = 1 / 1 = 1.0
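
As a quick sanity check (not part of the original slides), here is a minimal numpy sketch of the logistic function reproducing the three values worked out above:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# The three values from the slide: sigma approaches 0 for very negative
# inputs, equals 0.5 at zero, and approaches 1 for very positive inputs.
print(sigmoid(-100.0))  # ~0.0
print(sigmoid(0.0))     # 0.5
print(sigmoid(100.0))   # ~1.0
```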

From Features to Probability

f(x) := 1 / (1 + exp(−w^T x))

Parameterizing p(y|x)

p(y|x) := f(x)

f : ℝ^d → [0, 1]

f(x) := 1 / (1 + exp(−w^T x))

[Diagram: inputs x1 x2 x3 x4 x5, each multiplied by its weight w1 w2 w3 w4 w5 and summed into the single output y]
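
A minimal sketch of this single-layer model, with made-up weights w1..w5 and feature values x1..x5 purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical weights w1..w5 and features x1..x5, invented for this example.
w = np.array([0.5, -1.0, 0.25, 2.0, -0.5])
x = np.array([1.0, 0.0, 3.0, 0.5, 2.0])

# p(y|x) = sigma(w^T x): a weighted sum of the features squashed into [0, 1].
p_y_given_x = sigmoid(w @ x)
print(p_y_given_x)
```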

Multi-Layered Perceptron

[Diagram: inputs x1 x2 x3 x4 x5 with weights w1 w2 w3 w4 w5 feeding a hidden unit h1]

Multi-Layered Perceptron

[Diagram: inputs x1 x2 x3 x4 x5 feed hidden units h1 and h2, which feed the output y. The layers move from raw data to representation to prediction, e.g., pixel values → shapes (round regions, shadows) → faces]

Multi-Layered Perceptron

[Diagram: inputs x1 x2 x3 x4 x5 feed hidden units h1 and h2, which feed the output y]

h = [h1, h2]^T

h1 = σ(w11^T x)    h2 = σ(w12^T x)

p(y|x) = σ(w21^T h)

p(y|x) = σ(w21^T [σ(w11^T x), σ(w12^T x)]^T)
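
A minimal numpy sketch of this two-hidden-unit forward pass, again with hypothetical weight values chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up weights for the two hidden units and the output unit.
x = np.array([1.0, 0.0, 3.0, 0.5, 2.0])        # five inputs x1..x5
w11 = np.array([0.2, -0.4, 0.1, 0.8, -0.3])    # weights into h1
w12 = np.array([-0.5, 0.6, 0.3, 0.1, 0.2])     # weights into h2
w21 = np.array([1.5, -2.0])                    # weights from [h1, h2] to y

# h1 = sigma(w11^T x), h2 = sigma(w12^T x)
h = np.array([sigmoid(w11 @ x), sigmoid(w12 @ x)])

# p(y|x) = sigma(w21^T h): logistic regression applied to the hidden layer.
p_y_given_x = sigmoid(w21 @ h)
print(h, p_y_given_x)
```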

Decision Surface: Logistic Regression

Decision Surface: 2-Layer, 2 Hidden Units

Decision Surface: 2-Layer, More Hidden Units

[Plots: decision surfaces with 3 hidden units and with 10 hidden units]

Decision Surface: More Layers, More Hidden Units

[Plots: 4 layers with 10 hidden units each, and 10 layers with 5 hidden units each]
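
Plots like these can be reproduced, roughly, by evaluating a two-layer network on a grid of 2-D inputs and thresholding its output at 0.5. The sketch below uses random, untrained weights purely to show the kind of nonlinear surface such a network can carve out; change n_hidden to see the surface grow more complex:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_net(X, W1, w2):
    """2-layer net on 2-D inputs: one hidden layer, then a logistic output unit."""
    H = sigmoid(X @ W1.T)   # hidden unit activations, one row per grid point
    return sigmoid(H @ w2)  # p(y|x) for every grid point

rng = np.random.default_rng(0)
n_hidden = 10               # try 2, 3, or 10 and watch the surface change
W1 = rng.normal(scale=3.0, size=(n_hidden, 2))
w2 = rng.normal(scale=3.0, size=n_hidden)

# Evaluate on a coarse grid and print the thresholded decision surface as text.
ticks = np.linspace(-2.0, 2.0, 30)
grid = np.array([[a, b] for b in ticks for a in ticks])
surface = (two_layer_net(grid, W1, w2) > 0.5).reshape(30, 30)
for row in surface:
    print("".join("#" if cell else "." for cell in row))
```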

Training Neural Networks

L(W) = (1/n) Σ_{i=1}^n l(f(xi, W), yi)    (average training error)

min_W L(W)

Wj ← Wj − α (∂L/∂Wj)
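
A minimal sketch of this setup, assuming a logistic model and squared error for the per-example loss l; the partial derivatives are estimated here by finite differences (backpropagation, covered below, computes them much more efficiently):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_loss(W, X, y):
    """L(W): average per-example loss (squared error on a logistic model, for illustration)."""
    p = sigmoid(X @ W)
    return np.mean((p - y) ** 2)

# Tiny made-up training set: four examples with two features each.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

W = np.zeros(2)
alpha, eps = 0.5, 1e-6
for step in range(200):
    # Wj <- Wj - alpha * dL/dWj, with each partial estimated by a finite difference.
    grad = np.zeros_like(W)
    for j in range(len(W)):
        W_plus = W.copy()
        W_plus[j] += eps
        grad[j] = (avg_loss(W_plus, X, y) - avg_loss(W, X, y)) / eps
    W = W - alpha * grad

print(W, avg_loss(W, X, y))
```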

Gradient Descent

Wj ← Wj − α (∂L/∂Wj)

[Plot: the loss L(W) evaluated at three weights W1, W2, W3. Where the gradient is very positive, take a big step left; where it is almost zero, take a tiny step left; where it is slightly negative, take a medium step right.]
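
The same idea in one dimension, with a made-up bowl-shaped loss: the sign of the derivative picks the direction of the step, and its magnitude (times α) picks the step size:

```python
# One-dimensional version of the picture: the sign of dL/dW picks the direction,
# its magnitude (times the learning rate alpha) picks the step size.
def L(w):
    return (w - 2.0) ** 2       # a made-up bowl-shaped loss, minimized at w = 2

def dL_dw(w):
    return 2.0 * (w - 2.0)

alpha = 0.1
for w in (6.0, 2.1, 1.0):       # very positive, almost zero, negative gradient
    print(w, dL_dw(w), w - alpha * dL_dw(w))
```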

Approximate Q-Learning

Q̂(s, a) := g(s, a, θ) := θ1 f1(s, a) + θ2 f2(s, a) + … + θd fd(s, a)

θi ← θi + α (R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)) ∂g/∂θi

θi ← θi + α (R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)) fi(s, a)
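
A sketch of one such update step; the features, reward, and parameter values below are invented purely for illustration:

```python
import numpy as np

def q_hat(theta, features):
    """Linear approximation: Q_hat(s, a) = theta^T f(s, a)."""
    return theta @ features

# Hypothetical numbers, purely to show one update step.
theta = np.array([0.5, -0.2, 0.1])
f_sa = np.array([1.0, 0.0, 2.0])        # features f(s, a) of the visited state-action pair
f_next = [np.array([0.0, 1.0, 1.0]),    # features of (s', a') for each possible action a'
          np.array([1.0, 1.0, 0.0])]
reward, gamma, alpha = 1.0, 0.9, 0.05

# TD error: R(s) + gamma * max_a' Q_hat(s', a') - Q_hat(s, a)
td_error = reward + gamma * max(q_hat(theta, f) for f in f_next) - q_hat(theta, f_sa)

# theta_i <- theta_i + alpha * td_error * f_i(s, a), applied to all components at once
theta = theta + alpha * td_error * f_sa
print(td_error, theta)
```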

Back Propagation

• Back propagation:

• Compute hidden unit activations: forward propagation

• Compute gradient at output layer: error

• Propagate error back one layer at a time

• Chain rule via dynamic programming

Chain Rule Review

f(g(x))

d f(g(x)) / d x = (d f(g(x)) / d g(x)) · (d g(x) / d x)

[Diagram: x → g(x) → f(g(x))]
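
A quick numerical check of the chain rule, choosing f = sin and g(x) = x² for concreteness:

```python
import math

# Concrete choices of f and g, just to check the chain rule numerically.
f = math.sin
g = lambda x: x ** 2

def analytic(x):
    # d f(g(x)) / dx = f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2.0 * x

def numeric(x, eps=1e-6):
    return (f(g(x + eps)) - f(g(x))) / eps

x = 1.3
print(analytic(x), numeric(x))   # the two values should agree closely
```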

Chain Rule on More Complex Function

h(f(a) + g(b))

d h(f(a) + g(b)) / d a = (d h(f(a) + g(b)) / d (f(a) + g(b))) · (d f(a) / d a)

d h(f(a) + g(b)) / d b = (d h(f(a) + g(b)) / d (f(a) + g(b))) · (d g(b) / d b)
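
The same kind of numerical check for the two partial derivatives, with concrete (assumed) choices of h, f, and g:

```python
import math

# Concrete (assumed) choices of h, f, g and their derivatives.
h, f, g = math.tanh, math.sin, math.exp
dh = lambda z: 1.0 - math.tanh(z) ** 2
df, dg = math.cos, math.exp

a, b, eps = 0.4, -0.7, 1e-6
z = f(a) + g(b)

# d h(f(a)+g(b)) / da = h'(f(a)+g(b)) * f'(a), and the analogous product for b.
print(dh(z) * df(a), (h(f(a + eps) + g(b)) - h(z)) / eps)
print(dh(z) * dg(b), (h(f(a) + g(b + eps)) - h(z)) / eps)
```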

Back to Neural Networks

[Diagram: raw input x → (weights w1) → hidden layer h1 → … → hidden layer hn−1 → (weights wn) → final output y]

h1 = f(x, w1)

hn−1 = f(hn−2, wn−1)

y = f(hn−1, wn)

Back to Neural Networks

[Same diagram, now with a loss L(y) attached to the final output y]

d L / d wn = (d L / d y) · (d y / d wn)
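
A small sketch of this last-layer gradient, assuming for concreteness that y = sigmoid(wn^T hn−1) and L(y) = (y − t)²; the finite-difference value should agree with the chain-rule product:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed concrete forms: y = sigmoid(wn^T h_{n-1}) and L(y) = (y - t)^2.
h = np.array([0.3, 0.9])      # activations of the last hidden layer (made up)
wn = np.array([1.0, -0.5])    # output-layer weights (made up)
t = 1.0                       # target value

z = wn @ h
y = sigmoid(z)

# dL/dwn = (dL/dy) * (dy/dwn) = 2(y - t) * sigmoid'(z) * h
grad = 2.0 * (y - t) * y * (1.0 - y) * h

# Finite-difference check of the first component of the gradient.
eps = 1e-6
wn_plus = wn.copy()
wn_plus[0] += eps
y_plus = sigmoid(wn_plus @ h)
print(grad[0], ((y_plus - t) ** 2 - (y - t) ** 2) / eps)
```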

Intuition of Back Propagation

• At each layer, calculate how much changing the input changes the final output (derivative of the final output w.r.t. the layer's input)

• Calculate directly for last layer

• For preceding layers, use calculation from next layer and work backwards through network

• Use that derivative to find how changing the weights affects the error of the final output

FYI: Matrix Form

h1 = s(W1 x)

h2 = s(W2 h1)

…

hm−1 = s(Wm−1 hm−2)

f(x, W) = s(Wm hm−1)

J(W) = ℓ(f(x, W))

(You will not be tested on this matrix form in this course.)

FYI: Matrix Gradient Recipe

h1 = s(W1 x)

h2 = s(W2 h1)

…

hm−1 = s(Wm−1 hm−2)

f(x, W) = s(Wm hm−1)

J(W) = ℓ(f(x, W))

δm = ℓ′(f(x, W))

∇Wm J = δm hm−1^T

δm−1 = (Wm^T δm) ⊙ s′(Wm−1 hm−2)

∇Wm−1 J = δm−1 hm−2^T

δi = (Wi+1^T δi+1) ⊙ s′(Wi hi−1)

∇Wi J = δi hi−1^T

∇W1 J = δ1 x^T

(You will not be tested on this matrix form in this course.)

FYI: Matrix Gradient Recipe

Feed Forward Propagation:

h1 = s(W1 x)

hi = s(Wi hi−1)

f(x, W) = s(Wm hm−1)

J(W) = ℓ(f(x, W))

Back Propagation:

δm = ℓ′(f(x, W))

δi = (Wi+1^T δi+1) ⊙ s′(Wi hi−1)

∇Wi J = δi hi−1^T

∇W1 J = δ1 x^T

(You will not be tested on this matrix form in this course.)
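
Still, for the curious, here is a minimal numpy sketch of this recipe on a tiny made-up network, assuming the logistic function for s and squared error for ℓ. In this sketch the output-layer δ is taken as the gradient of the loss with respect to the last pre-activation (so it folds in s′(Wm hm−1) together with ℓ′), which lets the finite-difference check at the end agree:

```python
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))    # logistic nonlinearity, so s'(z) = s(z)(1 - s(z))

def forward(Ws, x):
    """Feed-forward pass: returns [x, h1, h2, ..., f(x, W)]."""
    hs = [x]
    for W in Ws:
        hs.append(s(W @ hs[-1]))
    return hs

def backward(Ws, hs, t):
    """Delta recursion for squared-error loss J = (f(x, W) - t)^2."""
    grads = [None] * len(Ws)
    # Output layer: this delta folds together l'(f) and s'(Wm hm-1).
    delta = 2.0 * (hs[-1] - t) * hs[-1] * (1.0 - hs[-1])
    grads[-1] = np.outer(delta, hs[-2])
    # Hidden layers: delta_i = (W_{i+1}^T delta_{i+1}) * s'(W_i h_{i-1}).
    for i in range(len(Ws) - 2, -1, -1):
        delta = (Ws[i + 1].T @ delta) * hs[i + 1] * (1.0 - hs[i + 1])
        grads[i] = np.outer(delta, hs[i])
    return grads

# Small made-up network: 3 inputs -> 4 hidden -> 2 hidden -> 1 output.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=(1, 2))]
x, t = rng.normal(size=3), np.array([1.0])

hs = forward(Ws, x)
grads = backward(Ws, hs, t)

# Finite-difference check of one entry of the gradient for W1.
eps = 1e-6
Ws[0][0, 0] += eps
J_plus = np.sum((forward(Ws, x)[-1] - t) ** 2)
Ws[0][0, 0] -= eps
J = np.sum((hs[-1] - t) ** 2)
print(grads[0][0, 0], (J_plus - J) / eps)
```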

Other New Aspects of Deep Learning

• GPU computation

• Differentiable programming

• Automatic differentiation

• Neural network structures

Types of Neural Network Structures

• Feed-forward

• Recurrent neural networks (RNNs)

• Good for analyzing sequences (text, time series)

• Convolutional neural networks (convnets, CNNs)

• Good for analyzing spatial data (images, videos)

Outline

• Biological inspiration for artificial neural networks

• Linear vs. nonlinear functions

• Learning with neural networks: backpropagation