Transcript of: Intro to Neural Networks and Deep Learning (DeepLearningIntro.pdf, 79 pages)

Page 1:

Intro to Neural Networks and Deep Learning

Jack Lanchantin, Dr. Yanjun Qi

UVA CS 6316

Page 2:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 3:

[Figure: a multi-layer network — input x = (x1, x2, x3) passes through weights W1, W2, w3 to output ŷ]

Page 4:

Logistic Regression

Sigmoid function (aka logistic, logit, "S", soft-step):

P(Y=1|x) = e^(wx+b) / (1 + e^(wx+b))

[Plot: the S-shaped sigmoid curve over input x]
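A minimal NumPy sketch of the sigmoid above; note that e^z / (1 + e^z) is the same function as 1 / (1 + e^(-z)), which is the numerically friendlier form:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1): e^z/(1+e^z) == 1/(1+e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs go toward 0, large positive toward 1,
# and the output is exactly 0.5 at z = 0.
print(sigmoid(0.0))   # 0.5
print(sigmoid(10.0))  # ≈ 0.99995
```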

Page 5:

Expanded Logistic Regression

[Figure: input x = (x1, x2, x3), p = 3, multiplied by weights w1, w2, w3, fed with the bias term b1 (+1 unit) into a summing function Σ producing z, then through the sigmoid function]

z = wᵀx + b        (w: p×1, x: p×1, z: 1×1)
ŷ = sigmoid(z) = e^z / (1 + e^z) = P(Y=1|x, w)
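The expanded picture above, as a sketch in NumPy; the particular weight and input values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# p = 3 input features, as on the slide; the values below are made up.
x = np.array([0.5, -1.0, 2.0])   # input x, shape (p,)
w = np.array([0.1, 0.4, -0.3])   # weights w, shape (p,)
b = 0.2                          # bias

z = w @ x + b                    # summing function: z = w^T x + b, a scalar
y_hat = sigmoid(z)               # sigmoid function: ŷ = P(Y=1 | x, w)
print(z, y_hat)
```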

Page 6:

"Neuron"

[Figure: the same unit as the previous slide — inputs x1, x2, x3 weighted by w1, w2, w3, summed (Σ) with bias b1 to produce z]

Page 7:

Neurons

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Page 8:

Neuron

[Figure: inputs x1, x2, x3 weighted by w1, w2, w3, summed (Σ) to z, then sigmoid to ŷ]

z = wᵀx        (w: p×1, x: p×1, z: 1×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)

From here on, we leave out the bias for simplicity.

Page 9:

"Block View" of a Neuron

[Figure: input x → dot product with w (a parameterized block) → z → sigmoid → output ŷ]

z = wᵀx        (w: p×1, x: p×1, z: 1×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)

Page 10:

Neuron Representation

[Figure: inputs x1, x2, x3 with weights w1, w2, w3 into a summing unit Σ, then a nonlinearity, producing ŷ]

The linear transformation and the nonlinearity together are typically considered a single neuron.

Page 11:

Neuron Representation

[Figure: block view — x → *w → ŷ]

The linear transformation and the nonlinearity together are typically considered a single neuron.

Page 12:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 13:

[Figure: a multi-layer network — input x = (x1, x2, x3) passes through weights W1, W2, w3 to output ŷ]

Page 14:

1-Layer Neural Network (with 4 neurons)

[Figure: input x = (x1, x2, x3) connected through weight matrix W to four units (Σ then sigmoid), producing z and outputs ŷ1 … ŷ4. The linear step plus the sigmoid step make up 1 layer; W is a matrix and z a vector.]

Page 15:

1-Layer Neural Network (with 4 neurons)

[Figure: input x (p = 3) → weight matrix W → four neurons → outputs ŷ1 … ŷ4 (d = 4)]

z = Wᵀx        (x: p×1, Wᵀ: d×p, z: d×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)        (element-wise on vector z; ŷ: d×1)
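The vectorized layer above can be sketched in NumPy; the random weights and input are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p, d = 3, 4                       # 3 inputs, 4 neurons, as on the slide
rng = np.random.default_rng(0)    # made-up parameters for illustration
W = rng.normal(size=(p, d))       # W: p x d, so W.T is d x p
x = rng.normal(size=p)            # x: p x 1

z = W.T @ x                       # z = W^T x, shape (d,)
y_hat = sigmoid(z)                # element-wise sigmoid, shape (d,)
print(y_hat.shape)                # one output per neuron
```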

Page 16:

1-Layer Neural Network (with 4 neurons)

[Figure: the same network, redrawn with W shown as a single matrix block feeding the four Σ units → outputs ŷ1 … ŷ4]

z = Wᵀx        (x: p×1, Wᵀ: d×p, z: d×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)        (element-wise on vector z)

Page 17:

"Block View" of a Neural Network

[Figure: input x → dot product with W → z → sigmoid → output ŷ]

z = Wᵀx        (x: p×1, Wᵀ: d×p, z: d×1)
ŷ = sigmoid(z) = e^z / (1 + e^z)

W is now a matrix; z is now a vector.

Page 18:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 19:

[Figure: a multi-layer network — input x = (x1, x2, x3) passes through weights W1, W2, w3 to output ŷ]

Page 20:

Multi-Layer Neural Network (Multi-Layer Perceptron (MLP) Network)

[Figure: input x = (x1, x2, x3) → hidden layer (weights W1) → output layer (weights w2) → ŷ; the weight subscript represents the layer number]

2-layer NN

Page 21:

Multi-Layer Neural Network (MLP)

[Figure: input x = (x1, x2, x3) → 1st hidden layer (W1) → 2nd hidden layer (W2) → output layer (w3) → ŷ]

3-layer NN

Page 22:

Multi-Layer Neural Network (MLP)

[Figure: x → W1 → h1 → W2 → h2 → w3 → ŷ]

z1 = W1ᵀx      h1 = sigmoid(z1)    (hidden layer 1 output)
z2 = W2ᵀh1     h2 = sigmoid(z2)
z3 = w3ᵀh2     ŷ  = sigmoid(z3)
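The three equations above, as a NumPy sketch; the layer sizes and random weights are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# Made-up sizes: 3 inputs, two hidden layers of 4 units, scalar output.
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 4))
w3 = rng.normal(size=(4, 1))
x = rng.normal(size=3)

z1 = W1.T @ x;  h1 = sigmoid(z1)    # hidden layer 1 output
z2 = W2.T @ h1; h2 = sigmoid(z2)    # hidden layer 2 output
z3 = w3.T @ h2; y_hat = sigmoid(z3) # final output, a probability
print(y_hat)
```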

Page 23:

Multi-Class Output MLP

[Figure: x = (x1, x2, x3) → W1 → h1 → W2 → h2 → W3 → ŷ (vector output)]

z1 = W1ᵀx      h1 = sigmoid(z1)
z2 = W2ᵀh1     h2 = sigmoid(z2)
z3 = W3ᵀh2     ŷ  = sigmoid(z3)

Page 24:

"Block View" of MLP

[Figure: x → *W1 → z1 → h1 (1st hidden layer) → *W2 → z2 → h2 (2nd hidden layer) → *W3 → z3 → ŷ (output layer)]

Page 25:

"Deep" Neural Networks (i.e. > 1 hidden layer)

Researchers have successfully used 1000 layers to train an object classifier.

Page 26:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 27:

[Figure: x = (x1, x2, x3) → W1 → W2 → w3 → ŷ → E(ŷ)]

Page 28:

Binary Classification Loss

[Figure: x = (x1, x2, x3) → network (W1, W2, w3) → ŷ = P(y=1 | X, W)]

E = loss = -log P(Y = ŷ | X = x) = -y log(ŷ) - (1 - y) log(1 - ŷ)

where y is the true output. This example is for a single sample x.
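A sketch of this single-sample binary cross-entropy loss; the probability values below are illustrative:

```python
import numpy as np

def bce_loss(y, y_hat):
    """E = -y log(ŷ) - (1 - y) log(1 - ŷ) for one sample."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A confident, correct prediction gives a small loss;
# a confident, wrong prediction gives a large one.
print(bce_loss(1, 0.9))   # ≈ 0.105
print(bce_loss(1, 0.1))   # ≈ 2.303
```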

Page 29:

Regression Loss

[Figure: x = (x1, x2, x3) → network (W1, W2, w3) → Σ → ŷ]

E = loss = ½ (y - ŷ)²

where y is the true output.

Page 30:

Multi-Class Classification Loss

[Figure: x = (x1, x2, x3) → network (W1, W2, W3) → z1, z2, z3 → Σ/softmax → ŷ1, ŷ2, ŷ3]

"Softmax" function: a normalizing function which converts each class output to a probability:

ŷi = exp(zi) / Σj exp(zj) = P(ŷi = 1 | x)

E = loss = -Σ(j = 1...K) yj ln ŷj        (here K = 3; yj is "0" for all except the true class)
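Softmax plus the cross-entropy loss above, sketched in NumPy; the raw class scores are made up:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(y, y_hat):
    """E = -Σ_j y_j ln ŷ_j, with y one-hot over K classes."""
    return -np.sum(y * np.log(y_hat))

z = np.array([2.0, 1.0, 0.1])      # made-up raw scores for K = 3 classes
y = np.array([1.0, 0.0, 0.0])      # "0" for all except the true class

y_hat = softmax(z)
print(y_hat.sum())                 # outputs form a probability distribution
print(cross_entropy(y, y_hat))     # equals -ln(ŷ of the true class)
```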

Page 31:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 32:

[Figure: x = (x1, x2, x3) → W1 → W2 → w3 → ŷ → E(ŷ)]

Page 33:

Training Neural Networks

How do we learn the optimal weights WL for our task?
● Gradient descent:

WL(t+1) = WL(t) - η ∂E/∂WL(t)        (η is the learning rate)

But how do we get the gradients of lower layers?
● Backpropagation!
  ○ Repeated application of the chain rule of calculus
  ○ Locally minimizes the objective
  ○ Requires all "blocks" of the network to be differentiable

LeCun et al. Efficient BackProp. 1998.

[Figure: x → W1 → W2 → w3 → ŷ]
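A minimal sketch of the gradient-descent update on a toy objective E(w) = (w - 3)², where the gradient is known in closed form; the learning rate is an arbitrary choice:

```python
# Toy objective E(w) = (w - 3)^2, so dE/dw = 2 (w - 3).
def grad_E(w):
    return 2.0 * (w - 3.0)

w = 0.0       # initial weight
eta = 0.1     # learning rate (made-up value)
for t in range(100):
    w = w - eta * grad_E(w)   # w(t+1) = w(t) - η ∂E/∂w(t)
print(w)                      # converges toward the minimum at w = 3
```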

Page 34–46:

Backpropagation Intro

[A worked numeric example of backpropagation on a small expression graph, developed step by step across these slides]

Tells us: by increasing x by a scale of 1, we decrease f by a scale of 4.

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
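A sketch of the kind of worked example these slides step through, using the classic cs231n expression f = (x + y)·z at x = -2, y = 5, z = -4. These particular values are an assumption (the original slide images are not in the transcript), but they do reproduce the "decrease f by a scale of 4" conclusion:

```python
# Small expression graph f = (x + y) * z, differentiated by the chain rule.
# The numeric values are assumed (classic cs231n example), not taken
# verbatim from the slide images.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y          # intermediate node q = 3
f = q * z          # output f = -12

# backward pass (chain rule)
df_dq = z          # ∂f/∂q = z
df_dz = q          # ∂f/∂z = q
df_dx = df_dq * 1  # ∂q/∂x = 1, so ∂f/∂x = ∂f/∂q · ∂q/∂x
df_dy = df_dq * 1  # ∂q/∂y = 1

print(f, df_dx)    # raising x by 1 lowers f by 4
```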

Page 47:

Backpropagation (binary classification example)

Example on a 1-hidden-layer NN for binary classification

[Figure: x = (x1, x2, x3) → W1 → hidden layer → w2 → ŷ = P(y=1 | X, W)]

Page 48–62:

Backpropagation (binary classification example)

[Block diagram, repeated across these slides: x → *W1 → z1 → sigmoid → h1 → *w2 → z2 → sigmoid → ŷ]

E = loss

Gradient descent to minimize the loss: we need to find ∂E/∂w2 and ∂E/∂W1. Exploit the chain rule! Working backward from the output: compute ∂E/∂ŷ, then ∂E/∂z2 = (∂E/∂ŷ)(∂ŷ/∂z2) and ∂E/∂w2 = (∂E/∂z2)(∂z2/∂w2); for the lower layer, reuse the already-computed ∂E/∂z2 to get ∂E/∂h1, then ∂E/∂z1, and finally ∂E/∂W1.
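The derivation above can be sketched end to end in NumPy, with the analytic gradient checked against a numerical one. The sample, label, and weights are made up; the simplification ∂E/∂z2 = ŷ - y holds for the binary cross-entropy loss combined with a sigmoid output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=3)            # made-up input sample
y = 1.0                           # made-up true label
W1 = rng.normal(size=(3, 4))      # hidden-layer weights
w2 = rng.normal(size=4)           # output-layer weights

# forward pass
z1 = W1.T @ x
h1 = sigmoid(z1)
z2 = w2 @ h1
y_hat = sigmoid(z2)
E = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# backward pass: chain rule, reusing dE/dz2 for the lower layer
dE_dz2 = y_hat - y                # BCE + sigmoid output simplifies to this
dE_dw2 = dE_dz2 * h1              # ∂z2/∂w2 = h1
dE_dh1 = dE_dz2 * w2              # ∂z2/∂h1 = w2
dE_dz1 = dE_dh1 * h1 * (1 - h1)   # sigmoid local gradient
dE_dW1 = np.outer(x, dE_dz1)      # ∂z1_j/∂W1[i,j] = x_i

# check one entry against a numerical gradient
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
h1p = sigmoid(W1p.T @ x); y_hatp = sigmoid(w2 @ h1p)
Ep = -(y * np.log(y_hatp) + (1 - y) * np.log(1 - y_hatp))
print(dE_dW1[0, 0], (Ep - E) / eps)   # the two should agree closely
```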

Page 63–64:

"Local-ness" of Backpropagation

[Figure: a block f with input x and output y. During the forward pass the block computes its activations and "local gradients" (∂y/∂x); during the backward pass it multiplies the incoming gradient by these local gradients to pass a gradient back to x.]

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Page 65:

Example: Sigmoid Block

sigmoid(x) = σ(x) = 1 / (1 + e^(-x)), with local gradient dσ(x)/dx = σ(x)(1 - σ(x))

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
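The sigmoid's local gradient σ(x)(1 - σ(x)) can be verified numerically; the sample points below are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # local gradient of the sigmoid block

# Compare against a centered numerical derivative at a few points.
for x in [-2.0, 0.0, 1.5]:
    eps = 1e-6
    num = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
    print(x, sigmoid_grad(x), num)
```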

Page 66:

Deep Learning = Concatenation of Differentiable Parameterized Layers (linear & nonlinearity functions)

[Figure: x → *W1 → z1 → h1 (1st hidden layer) → *W2 → z2 → h2 (2nd hidden layer) → *W3 → z3 → ŷ (output layer)]

We want to find the optimal weights W to minimize some loss function E!

Page 67:

Backprop Whiteboard Demo

[Figure: inputs x1, x2 and a bias unit → weights w1…w4, biases b1, b2 → z1, z2 → h1, h2 → weights w5, w6, bias b3 → ŷ; the stages are labeled f1, f2, f3, f4]

z1 = x1·w1 + x2·w3 + b1
z2 = x1·w2 + x2·w4 + b2

h1 = exp(z1) / (1 + exp(z1))
h2 = exp(z2) / (1 + exp(z2))

ŷ = h1·w5 + h2·w6 + b3

E = (y - ŷ)²

w(t+1) = w(t) - η ∂E/∂w(t)

∂E/∂w = ??
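The whiteboard network above, run numerically; all parameter values and the learning rate are made up, and only the gradient for w5 is shown for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up numeric values for the slide's tiny network.
x1, x2 = 1.0, 2.0
y = 1.0
w1, w2, w3, w4 = 0.1, 0.2, 0.3, 0.4
w5, w6 = 0.5, -0.5
b1, b2, b3 = 0.1, -0.1, 0.2

# forward pass (stages f1..f4)
z1 = x1 * w1 + x2 * w3 + b1
z2 = x1 * w2 + x2 * w4 + b2
h1 = sigmoid(z1)
h2 = sigmoid(z2)
y_hat = h1 * w5 + h2 * w6 + b3
E = (y - y_hat) ** 2

# backward pass for one weight, w5
dE_dyhat = -2.0 * (y - y_hat)
dE_dw5 = dE_dyhat * h1            # ∂ŷ/∂w5 = h1

# one gradient-descent step
eta = 0.1                         # made-up learning rate
w5_new = w5 - eta * dE_dw5
print(E, dE_dw5, w5_new)
```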

Page 68:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 69:

Nonlinearity Functions (i.e. transfer or activation functions)

[Figure: inputs x1, x2, x3 multiplied by weights w1, w2, w3 (x*w), summed (Σ) with bias b1 to z, then passed through the sigmoid function]

Page 70–71:

Nonlinearity Functions (i.e. transfer or activation functions)

[Table: Name | Plot | Equation | Derivative (w.r.t. x) — a comparison of activation functions]

https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions

Page 72:

Nonlinearity Functions (aka transfer or activation functions)

[Table: Name | Plot | Equation | Derivative (w.r.t. x) — one entry is marked "usually works best in practice"]

https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions
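A sketch of a few common rows from such a table — sigmoid, tanh, and ReLU with their derivatives. The transcript does not say which entry the slide marks as working best; in modern practice that is commonly ReLU, but treat that as an assumption here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)   # subgradient 0 at x = 0

x = np.linspace(-3, 3, 7)
print(relu(x))          # zero for negative inputs, identity for positive
print(sigmoid_grad(x))  # peaks at 0.25 at x = 0
```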

Page 73:

Neurons
1-Layer Neural Network
Multi-layer Neural Network
Loss Functions
Backpropagation
Nonlinearity Functions
NNs in Practice

Page 74:

Neural Net Pipeline

[Figure: x → W1 → W2 → w3 → ŷ → E]

1. Initialize weights
2. For each batch of input x samples S:
   a. Run the network "forward" on S to compute outputs and loss
   b. Run the network "backward" using outputs and loss to compute gradients
   c. Update weights using SGD (or a similar method)
3. Repeat step 2 until loss convergence
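The steps above, as a minimal training-loop sketch for a single logistic neuron on made-up, linearly separable data; a real network would add hidden layers, a bias, and a convergence check instead of a fixed epoch count:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))                # made-up inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # made-up true labels

w = np.zeros(2)                              # 1. initialize weights
eta = 0.5
for epoch in range(50):                      # 3. repeat (fixed count here)
    for start in range(0, len(X), 20):       # 2. for each batch S
        S, yS = X[start:start + 20], y[start:start + 20]
        y_hat = sigmoid(S @ w)               # 2a. forward: outputs
        grad = S.T @ (y_hat - yS) / len(S)   # 2b. backward: gradients
        w = w - eta * grad                   # 2c. SGD weight update

acc = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
print(acc)                                   # training accuracy
```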

Page 75:

Non-Convexity of Neural Nets

In very high dimensions, there exist many local minima which are about equally good.

Pascanu et al. On the saddle point problem for non-convex optimization. 2014.

Page 76:

Building Deep Neural Nets

[Figure: composing differentiable blocks f, each mapping an input x to an output y, into a deep network]

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Page 77:

Building Deep Neural Nets

“GoogLeNet” for Object Classification

Page 78:

Block Example Implementation

http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
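A sketch of the block interface the cs231n slides illustrate: each block implements a forward pass (caching what it needs) and a backward pass that multiplies the incoming gradient by its local gradients. The class and variable names here are illustrative, not the slide's exact code:

```python
import numpy as np

class SigmoidBlock:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))   # cache the activation
        return self.out

    def backward(self, dout):
        # local gradient σ(x)(1 - σ(x)) times the incoming gradient
        return dout * self.out * (1.0 - self.out)

class LinearBlock:
    def __init__(self, W):
        self.W = W

    def forward(self, x):
        self.x = x                             # cache the input
        return self.W.T @ x

    def backward(self, dout):
        self.dW = np.outer(self.x, dout)       # gradient for the weights
        return self.W @ dout                   # gradient for the input

# chain blocks: x -> *W -> sigmoid -> ŷ  (made-up weights and input)
lin = LinearBlock(np.array([[0.2], [-0.4], [0.6]]))
sig = SigmoidBlock()
x = np.array([1.0, 2.0, 3.0])
y_hat = sig.forward(lin.forward(x))
dx = lin.backward(sig.backward(np.ones(1)))    # backprop a unit gradient
print(y_hat, dx.shape)
```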

Page 79:

Advantage of Neural Nets

As long as the model is fully differentiable end to end, we can train it to learn features for us automatically.