13 Neural Networks - Virginia Tech


Neural Networks

Intro to AI

Bert Huang

Virginia Tech

Outline

• Biological inspiration for artificial neural networks

• Linear vs. nonlinear functions

• Learning with neural networks: back propagation

https://en.wikipedia.org/wiki/Neuron#/media/File:Chemical_synapse_schema_cropped.jpg

https://en.wikipedia.org/wiki/Neuron#/media/File:GFPneuron.png

Parameterizing p(y|x)

p(y|x) := f(x)

f : ℝ^d → [0, 1]

f(x) := 1 / (1 + exp(−w^T x))


Logistic Function

σ(x) = 1 / (1 + exp(−x))

lim_{x→−∞} σ(x) = lim_{x→−∞} 1 / (1 + exp(−x)) = 0.0

Logistic Function

σ(x) = 1 / (1 + exp(−x))

σ(0) = 1 / (1 + exp(−0)) = 1 / (1 + 1) = 0.5

lim_{x→∞} σ(x) = lim_{x→∞} 1 / (1 + exp(−x)) = 1 / 1 = 1.0
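
As a quick sanity check (not part of the original slides), here is a minimal numpy sketch of the logistic function reproducing the three values worked out above:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# The three values from the slide: sigma approaches 0 for very negative
# inputs, equals 0.5 at zero, and approaches 1 for very positive inputs.
print(sigmoid(-100.0))  # ~0.0
print(sigmoid(0.0))     # 0.5
print(sigmoid(100.0))   # ~1.0
```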

From Features to Probability

f(x) := 1 / (1 + exp(−w^T x))

Parameterizing p(y|x)

p(y|x) := f(x)

f : ℝ^d → [0, 1]

f(x) := 1 / (1 + exp(−w^T x))

[Diagram: inputs x1 x2 x3 x4 x5, each multiplied by its weight w1 w2 w3 w4 w5 and summed into the single output y]
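
A minimal sketch of this single-layer model, with made-up weights w1..w5 and feature values x1..x5 purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical weights w1..w5 and features x1..x5, invented for this example.
w = np.array([0.5, -1.0, 0.25, 2.0, -0.5])
x = np.array([1.0, 0.0, 3.0, 0.5, 2.0])

# p(y|x) = sigma(w^T x): a weighted sum of the features squashed into [0, 1].
p_y_given_x = sigmoid(w @ x)
print(p_y_given_x)
```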

Multi-Layered Perceptron

[Diagram: inputs x1 x2 x3 x4 x5 with weights w1 w2 w3 w4 w5 feeding a hidden unit h1]

Multi-Layered Perceptron

[Diagram: inputs x1 x2 x3 x4 x5 feed hidden units h1 and h2, which feed the output y. The layers move from raw data to representation to prediction, e.g., pixel values → shapes (round regions, shadows) → faces]

Multi-Layered Perceptron

[Diagram: inputs x1 x2 x3 x4 x5 feed hidden units h1 and h2, which feed the output y]

h = [h1, h2]^T

h1 = σ(w11^T x)    h2 = σ(w12^T x)

p(y|x) = σ(w21^T h)

p(y|x) = σ(w21^T [σ(w11^T x), σ(w12^T x)]^T)
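
A minimal numpy sketch of this two-hidden-unit forward pass, again with hypothetical weight values chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up weights for the two hidden units and the output unit.
x = np.array([1.0, 0.0, 3.0, 0.5, 2.0])        # five inputs x1..x5
w11 = np.array([0.2, -0.4, 0.1, 0.8, -0.3])    # weights into h1
w12 = np.array([-0.5, 0.6, 0.3, 0.1, 0.2])     # weights into h2
w21 = np.array([1.5, -2.0])                    # weights from [h1, h2] to y

# h1 = sigma(w11^T x), h2 = sigma(w12^T x)
h = np.array([sigmoid(w11 @ x), sigmoid(w12 @ x)])

# p(y|x) = sigma(w21^T h): logistic regression applied to the hidden layer.
p_y_given_x = sigmoid(w21 @ h)
print(h, p_y_given_x)
```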

Decision Surface: Logistic Regression

Decision Surface: 2-Layer, 2 Hidden Units

Decision Surface: 2-Layer, More Hidden Units

[Plots: decision surfaces with 3 hidden units and with 10 hidden units]

Decision Surface: More Layers, More Hidden Units

[Plots: 4 layers with 10 hidden units each, and 10 layers with 5 hidden units each]
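
Plots like these can be reproduced, roughly, by evaluating a two-layer network on a grid of 2-D inputs and thresholding its output at 0.5. The sketch below uses random, untrained weights purely to show the kind of nonlinear surface such a network can carve out; change n_hidden to see the surface grow more complex:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_net(X, W1, w2):
    """2-layer net on 2-D inputs: one hidden layer, then a logistic output unit."""
    H = sigmoid(X @ W1.T)   # hidden unit activations, one row per grid point
    return sigmoid(H @ w2)  # p(y|x) for every grid point

rng = np.random.default_rng(0)
n_hidden = 10               # try 2, 3, or 10 and watch the surface change
W1 = rng.normal(scale=3.0, size=(n_hidden, 2))
w2 = rng.normal(scale=3.0, size=n_hidden)

# Evaluate on a coarse grid and print the thresholded decision surface as text.
ticks = np.linspace(-2.0, 2.0, 30)
grid = np.array([[a, b] for b in ticks for a in ticks])
surface = (two_layer_net(grid, W1, w2) > 0.5).reshape(30, 30)
for row in surface:
    print("".join("#" if cell else "." for cell in row))
```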

Training Neural Networks

L(W) = (1/n) Σ_{i=1}^n l(f(xi, W), yi)    (average training error)

min_W L(W)

Wj ← Wj − α (∂L/∂Wj)
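
A minimal sketch of this setup, assuming a logistic model and squared error for the per-example loss l; the partial derivatives are estimated here by finite differences (backpropagation, covered below, computes them much more efficiently):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_loss(W, X, y):
    """L(W): average per-example loss (squared error on a logistic model, for illustration)."""
    p = sigmoid(X @ W)
    return np.mean((p - y) ** 2)

# Tiny made-up training set: four examples with two features each.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

W = np.zeros(2)
alpha, eps = 0.5, 1e-6
for step in range(200):
    # Wj <- Wj - alpha * dL/dWj, with each partial estimated by a finite difference.
    grad = np.zeros_like(W)
    for j in range(len(W)):
        W_plus = W.copy()
        W_plus[j] += eps
        grad[j] = (avg_loss(W_plus, X, y) - avg_loss(W, X, y)) / eps
    W = W - alpha * grad

print(W, avg_loss(W, X, y))
```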

Gradient Descent

Wj ← Wj − α (∂L/∂Wj)

[Plot: the loss L(W) evaluated at three weights W1, W2, W3. Where the gradient is very positive, take a big step left; where it is almost zero, take a tiny step left; where it is slightly negative, take a medium step right.]
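
The same idea in one dimension, with a made-up bowl-shaped loss: the sign of the derivative picks the direction of the step, and its magnitude (times α) picks the step size:

```python
# One-dimensional version of the picture: the sign of dL/dW picks the direction,
# its magnitude (times the learning rate alpha) picks the step size.
def L(w):
    return (w - 2.0) ** 2       # a made-up bowl-shaped loss, minimized at w = 2

def dL_dw(w):
    return 2.0 * (w - 2.0)

alpha = 0.1
for w in (6.0, 2.1, 1.0):       # very positive, almost zero, negative gradient
    print(w, dL_dw(w), w - alpha * dL_dw(w))
```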

Approximate Q-Learning

Q̂(s, a) := g(s, a, θ) := θ1 f1(s, a) + θ2 f2(s, a) + … + θd fd(s, a)

θi ← θi + α (R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)) ∂g/∂θi

θi ← θi + α (R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)) fi(s, a)
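
A sketch of one such update step; the features, reward, and parameter values below are invented purely for illustration:

```python
import numpy as np

def q_hat(theta, features):
    """Linear approximation: Q_hat(s, a) = theta^T f(s, a)."""
    return theta @ features

# Hypothetical numbers, purely to show one update step.
theta = np.array([0.5, -0.2, 0.1])
f_sa = np.array([1.0, 0.0, 2.0])        # features f(s, a) of the visited state-action pair
f_next = [np.array([0.0, 1.0, 1.0]),    # features of (s', a') for each possible action a'
          np.array([1.0, 1.0, 0.0])]
reward, gamma, alpha = 1.0, 0.9, 0.05

# TD error: R(s) + gamma * max_a' Q_hat(s', a') - Q_hat(s, a)
td_error = reward + gamma * max(q_hat(theta, f) for f in f_next) - q_hat(theta, f_sa)

# theta_i <- theta_i + alpha * td_error * f_i(s, a), applied to all components at once
theta = theta + alpha * td_error * f_sa
print(td_error, theta)
```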

Back Propagation

• Back propagation:

• Compute hidden unit activations: forward propagation

• Compute gradient at output layer: error

• Propagate error back one layer at a time

• Chain rule via dynamic programming

Chain Rule Review

f(g(x))

d f(g(x)) / d x = (d f(g(x)) / d g(x)) · (d g(x) / d x)

[Diagram: x → g(x) → f(g(x))]
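
A quick numerical check of the chain rule, choosing f = sin and g(x) = x² for concreteness:

```python
import math

# Concrete choices of f and g, just to check the chain rule numerically.
f = math.sin
g = lambda x: x ** 2

def analytic(x):
    # d f(g(x)) / dx = f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2.0 * x

def numeric(x, eps=1e-6):
    return (f(g(x + eps)) - f(g(x))) / eps

x = 1.3
print(analytic(x), numeric(x))   # the two values should agree closely
```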

Chain Rule on More Complex Function

h(f(a) + g(b))

d h(f(a) + g(b)) / d a = (d h(f(a) + g(b)) / d (f(a) + g(b))) · (d f(a) / d a)

d h(f(a) + g(b)) / d b = (d h(f(a) + g(b)) / d (f(a) + g(b))) · (d g(b) / d b)
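
The same kind of numerical check for the two partial derivatives, with concrete (assumed) choices of h, f, and g:

```python
import math

# Concrete (assumed) choices of h, f, g and their derivatives.
h, f, g = math.tanh, math.sin, math.exp
dh = lambda z: 1.0 - math.tanh(z) ** 2
df, dg = math.cos, math.exp

a, b, eps = 0.4, -0.7, 1e-6
z = f(a) + g(b)

# d h(f(a)+g(b)) / da = h'(f(a)+g(b)) * f'(a), and the analogous product for b.
print(dh(z) * df(a), (h(f(a + eps) + g(b)) - h(z)) / eps)
print(dh(z) * dg(b), (h(f(a) + g(b + eps)) - h(z)) / eps)
```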

Back to Neural Networks

[Diagram: raw input x → (weights w1) → hidden layer h1 → … → hidden layer hn−1 → (weights wn) → final output y]

h1 = f(x, w1)

hn−1 = f(hn−2, wn−1)

y = f(hn−1, wn)

Back to Neural Networks

[Same diagram, now with a loss L(y) attached to the final output y]

d L / d wn = (d L / d y) · (d y / d wn)
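
A small sketch of this last-layer gradient, assuming for concreteness that y = sigmoid(wn^T hn−1) and L(y) = (y − t)²; the finite-difference value should agree with the chain-rule product:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed concrete forms: y = sigmoid(wn^T h_{n-1}) and L(y) = (y - t)^2.
h = np.array([0.3, 0.9])      # activations of the last hidden layer (made up)
wn = np.array([1.0, -0.5])    # output-layer weights (made up)
t = 1.0                       # target value

z = wn @ h
y = sigmoid(z)

# dL/dwn = (dL/dy) * (dy/dwn) = 2(y - t) * sigmoid'(z) * h
grad = 2.0 * (y - t) * y * (1.0 - y) * h

# Finite-difference check of the first component of the gradient.
eps = 1e-6
wn_plus = wn.copy()
wn_plus[0] += eps
y_plus = sigmoid(wn_plus @ h)
print(grad[0], ((y_plus - t) ** 2 - (y - t) ** 2) / eps)
```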

Intuition of Back Propagation

• At each layer, calculate how much changing the input changes the final output (derivative of the final output w.r.t. the layer's input)

• Calculate directly for last layer

• For preceding layers, use calculation from next layer and work backwards through network

• Use that derivative to find how changing the weights affects the error of the final output

FYI: Matrix Form

h1 = s(W1 x)

h2 = s(W2 h1)

…

hm−1 = s(Wm−1 hm−2)

f(x, W) = s(Wm hm−1)

J(W) = ℓ(f(x, W))

(You will not be tested on this matrix form in this course.)

FYI: Matrix Gradient Recipe

h1 = s(W1 x)

h2 = s(W2 h1)

…

hm−1 = s(Wm−1 hm−2)

f(x, W) = s(Wm hm−1)

J(W) = ℓ(f(x, W))

δm = ℓ′(f(x, W))

∇Wm J = δm hm−1^T

δm−1 = (Wm^T δm) ⊙ s′(Wm−1 hm−2)

∇Wm−1 J = δm−1 hm−2^T

δi = (Wi+1^T δi+1) ⊙ s′(Wi hi−1)

∇Wi J = δi hi−1^T

∇W1 J = δ1 x^T

(You will not be tested on this matrix form in this course.)

FYI: Matrix Gradient Recipe

Feed Forward Propagation:

h1 = s(W1 x)

hi = s(Wi hi−1)

f(x, W) = s(Wm hm−1)

J(W) = ℓ(f(x, W))

Back Propagation:

δm = ℓ′(f(x, W))

δi = (Wi+1^T δi+1) ⊙ s′(Wi hi−1)

∇Wi J = δi hi−1^T

∇W1 J = δ1 x^T

(You will not be tested on this matrix form in this course.)
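
Still, for the curious, here is a minimal numpy sketch of this recipe on a tiny made-up network, assuming the logistic function for s and squared error for ℓ. In this sketch the output-layer δ is taken as the gradient of the loss with respect to the last pre-activation (so it folds in s′(Wm hm−1) together with ℓ′), which lets the finite-difference check at the end agree:

```python
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))    # logistic nonlinearity, so s'(z) = s(z)(1 - s(z))

def forward(Ws, x):
    """Feed-forward pass: returns [x, h1, h2, ..., f(x, W)]."""
    hs = [x]
    for W in Ws:
        hs.append(s(W @ hs[-1]))
    return hs

def backward(Ws, hs, t):
    """Delta recursion for squared-error loss J = (f(x, W) - t)^2."""
    grads = [None] * len(Ws)
    # Output layer: this delta folds together l'(f) and s'(Wm hm-1).
    delta = 2.0 * (hs[-1] - t) * hs[-1] * (1.0 - hs[-1])
    grads[-1] = np.outer(delta, hs[-2])
    # Hidden layers: delta_i = (W_{i+1}^T delta_{i+1}) * s'(W_i h_{i-1}).
    for i in range(len(Ws) - 2, -1, -1):
        delta = (Ws[i + 1].T @ delta) * hs[i + 1] * (1.0 - hs[i + 1])
        grads[i] = np.outer(delta, hs[i])
    return grads

# Small made-up network: 3 inputs -> 4 hidden -> 2 hidden -> 1 output.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=(1, 2))]
x, t = rng.normal(size=3), np.array([1.0])

hs = forward(Ws, x)
grads = backward(Ws, hs, t)

# Finite-difference check of one entry of the gradient for W1.
eps = 1e-6
Ws[0][0, 0] += eps
J_plus = np.sum((forward(Ws, x)[-1] - t) ** 2)
Ws[0][0, 0] -= eps
J = np.sum((hs[-1] - t) ** 2)
print(grads[0][0, 0], (J_plus - J) / eps)
```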

Other New Aspects of Deep Learning

• GPU computation

• Differentiable programming

• Automatic differentiation

• Neural network structures

Types of Neural Network Structures

• Feed-forward

• Recurrent neural networks (RNNs)

• Good for analyzing sequences (text, time series)

• Convolutional neural networks (convnets, CNNs)

• Good for analyzing spatial data (images, videos)

Outline

• Biological inspiration for artificial neural networks

• Linear vs. nonlinear functions

• Learning with neural networks: backpropagation