
Transcript of Neural Networks

Page 1: Neural Networks

Neural Networks

The Elements of Statistical Learning, Chapter 12

Presented by Nick Rizzolo

Page 2: Neural Networks

Modeling the Human Brain

- Input builds up on receptors (dendrites)
- The cell has an input threshold
- Upon breach of the cell's threshold, activation is fired down the axon

Page 3: Neural Networks

“Magical” Secrets Revealed

- Linear features are derived from the inputs
- The target concept(s) are non-linear functions of those features

Page 4: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 5: Neural Networks

Projection Pursuit Regression

- Generalization of a 2-layer regression NN
- Universal approximator
- Good for prediction
- Not good for deriving interpretable models of data

Page 6: Neural Networks

Projection Pursuit Regression

The PPR model maps the inputs to the output through an additive sum of ridge functions:

f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)

where the g_m are the ridge functions and the \omega_m are unit vectors of parameters defining the derived features \omega_m^T X.
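A minimal sketch of evaluating the fitted model, assuming the unit vectors \omega_m and smoothed ridge functions g_m are already available as arrays and callables (all names here are illustrative, not from the slides):

```python
import numpy as np

def ppr_predict(X, omegas, gs):
    """PPR prediction: f(X) = sum_m g_m(omega_m^T X).

    X      : (n, p) input matrix
    omegas : list of M unit vectors, each of shape (p,)
    gs     : list of M ridge functions (callables on 1-D arrays)
    """
    return sum(g(X @ w) for w, g in zip(omegas, gs))

# Toy usage with a placeholder direction and ridge function
X = np.random.randn(100, 2)
omega = np.array([0.6, 0.8])   # unit vector
f_hat = ppr_predict(X, [omega], [np.tanh])
```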

Page 7: Neural Networks

PPR: Derived Features

- The dot product \omega_m^T X is the projection of X onto \omega_m
- The ridge function g_m varies only in the direction \omega_m

Page 8: Neural Networks

PPR: Training

- Minimize the squared error \sum_i \big[ y_i - \sum_m g_m(\omega_m^T x_i) \big]^2
- Consider the M = 1 case:
  - Given \omega, we derive the features v_i = \omega^T x_i and smooth g
  - Given g, we minimize over \omega with Newton's method
- Iterate those two steps to convergence

Page 9: Neural Networks

PPR: Newton's Method

Use derivatives to iteratively improve the estimate of \omega. Linearizing g about the current estimate \omega_{old},

g(\omega^T x_i) \approx g(\omega_{old}^T x_i) + g'(\omega_{old}^T x_i)(\omega - \omega_{old})^T x_i

turns the minimization into a weighted least squares regression on x_i: the weights are g'(\omega_{old}^T x_i)^2 and the target to hit is

\omega_{old}^T x_i + \frac{y_i - g(\omega_{old}^T x_i)}{g'(\omega_{old}^T x_i)}
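A sketch of that weighted least squares step, assuming a fixed ridge function g with derivative g_prime that is nonzero at the current projections (the helper name newton_step_omega is hypothetical):

```python
import numpy as np

def newton_step_omega(X, y, omega, g, g_prime):
    """One Gauss-Newton update of the direction omega with g held fixed."""
    v = X @ omega                       # current projections omega_old^T x_i
    d = g_prime(v)                      # assumed nonzero at these points
    target = v + (y - g(v)) / d         # adjusted response to hit
    w = d ** 2                          # weights g'(v)^2
    Xw = X * w[:, None]                 # W X
    # Weighted least squares: solve (X^T W X) omega = X^T W target
    omega_new = np.linalg.solve(X.T @ Xw, Xw.T @ target)
    return omega_new / np.linalg.norm(omega_new)  # keep omega a unit vector
```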

Page 10: Neural Networks

PPR: Implementation Details

- Suggested smoothing methods: local regression, smoothing splines
- The g_m's can be readjusted with backfitting
- The \omega_m's are usually not readjusted
- “(\omega_m, g_m) pairs added in a forward stage-wise manner”

Page 11: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 12: Neural Networks


Neural Networks
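For reference, the single hidden layer model from the chapter, the subject of this slide's network diagram, is

Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M

T_k = \beta_{0k} + \beta_k^T Z, \quad k = 1, \dots, K

f_k(X) = g_k(T), \quad k = 1, \dots, K

with \sigma the (sigmoid) activation function and g_k the output function.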

Page 13: Neural Networks

NNs: Sigmoid and Softmax

Transforming activations into probabilities:

- Sigmoid: \sigma(v) = \frac{1}{1 + e^{-v}}
- Softmax: g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}

Just like the multilogit model.
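A small sketch of both functions, assuming NumPy (the max-subtraction in softmax is a standard overflow guard, not something from the slides):

```python
import numpy as np

def sigmoid(v):
    """Sigmoid: 1 / (1 + e^{-v})."""
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    """Softmax over the last axis: e^{T_k} / sum_l e^{T_l}."""
    t = t - np.max(t, axis=-1, keepdims=True)  # overflow guard; result unchanged
    e = np.exp(t)
    return e / np.sum(e, axis=-1, keepdims=True)
```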

Page 14: Neural Networks

NNs: Training

- We need an error function to minimize
  - Regression: sum of squared errors
  - Classification: cross-entropy
- Generic approach: gradient descent (a.k.a. back propagation)
  - The error functions are differentiable
  - Forward pass to evaluate activations, backward pass to update weights

Page 15: Neural Networks

NNs: Back Propagation

For squared error, the back propagation equations relate the output-layer errors \delta_{ki} to the hidden-layer errors s_{mi}:

\delta_{ki} = -2(y_{ik} - f_k(x_i)) \, g_k'(\beta_k^T z_i), \qquad s_{mi} = \sigma'(\alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km} \delta_{ki}

Update rules (gradient descent at iteration r, with learning rate \gamma_r):

\beta_{km} \leftarrow \beta_{km} - \gamma_r \sum_i \delta_{ki} z_{mi}, \qquad \alpha_{ml} \leftarrow \alpha_{ml} - \gamma_r \sum_i s_{mi} x_{il}
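A minimal sketch of one batch gradient step for a single-hidden-layer regression network with identity output units (so g_k' = 1); bias terms are omitted and all names are illustrative:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(X, Y, alpha, beta, lr=0.01):
    """One batch back-propagation step, squared error, identity outputs.

    X: (n, p) inputs      alpha: (M, p) hidden-layer weights
    Y: (n, K) targets     beta:  (K, M) output-layer weights
    """
    # Forward pass: evaluate activations
    Z = sigmoid(X @ alpha.T)            # (n, M) hidden activations z_mi
    F = Z @ beta.T                      # (n, K) outputs f_k(x_i)

    # Backward pass: output errors delta_ki, hidden errors s_mi
    delta = -2.0 * (Y - F)              # g_k' = 1 for identity outputs
    s = (delta @ beta) * Z * (1.0 - Z)  # sigma'(v) = sigma(v)(1 - sigma(v))

    # Gradient-descent updates from the rules above
    beta -= lr * (delta.T @ Z)          # sum_i delta_ki z_mi
    alpha -= lr * (s.T @ X)             # sum_i s_mi x_il
    return alpha, beta
```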

Page 16: Neural Networks

NNs: Back Propagation Details

- Those were regression equations; the classification equations are similar
- Learning can be batch or online
- Online learning rates can be decreased during training, ensuring convergence (e.g., \gamma_r \to 0 with \sum_r \gamma_r = \infty and \sum_r \gamma_r^2 < \infty)
- Usually want to start with small weights
- Sometimes impractically slow

Page 17: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 18: Neural Networks

Issues in Training: Overfitting

- Problem: we might reach the global minimum of the training error R(\theta), fitting noise rather than signal
- Proposed solutions:
  - Limit training by watching performance on a test set (early stopping)
  - Weight decay: penalizing large weights

Page 19: Neural Networks


A Closer Look at Weight Decay

The less complicated hypothesis has the lower error rate.
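Concretely, the weight decay penalty from the chapter adds a ridge-like term to the error function and minimizes

R(\theta) + \lambda J(\theta), \qquad J(\theta) = \sum_{km} \beta_{km}^2 + \sum_{ml} \alpha_{ml}^2

where larger values of the tuning parameter \lambda shrink the weights toward zero.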

Page 20: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 21: Neural Networks

Example #1: Synthetic Data

- More hidden nodes -> overfitting
- Multiple initial weight settings should be tried
- The radial function is learned poorly

Page 22: Neural Networks

Example #1: Synthetic Data

- 2 parameters to tune:
  - Weight decay
  - Number of hidden units
- Suggested training strategy (see the sketch below):
  - Fix either parameter at the setting where the model is least constrained
  - Cross-validate the other
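As an illustration only (scikit-learn, not the slides' setup): fix a generous number of hidden units, then cross-validate the weight-decay parameter, which sklearn calls alpha:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Least-constrained choice: plenty of hidden units, decay to be tuned.
net = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs", max_iter=2000)
grid = {"alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1.0]}   # weight-decay strengths
search = GridSearchCV(net, param_grid=grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train assumed given
```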

Page 23: Neural Networks

Example #2: ZIP Code Data

- Handwritten digit data from Yann LeCun
- NNs can be structurally tailored to suit the data
- Weight sharing: multiple units in a given layer apply the same weights

Page 24: Neural Networks

Example #2: 5 Networks

- Net 1: no hidden layer
- Net 2: one hidden layer
- Net 3: 2 hidden layers, local connectivity
- Net 4: 2 hidden layers, local connectivity, 1 layer of weight sharing
- Net 5: 2 hidden layers, local connectivity, 2 layers of weight sharing

Page 25: Neural Networks

Example #2: Results

- Net 5 does best
- A small number of features are identifiable throughout the image

Page 26: Neural Networks

Conclusions

- Neural networks are a very general approach to both regression and classification
- They are an effective learning tool when:
  - The signal-to-noise ratio is high
  - Prediction is desired
  - Formulating a description of the problem's solution is not desired
  - Targets are naturally distinguished by direction as opposed to distance