
Transcript of Neural Networks

Page 1: Neural Networks

Neural Networks

The Elements of Statistical Learning, Chapter 12

Presented by Nick Rizzolo

Page 2: Neural Networks

Modeling the Human Brain

- Input builds up on receptors (dendrites)
- The cell has an input threshold
- Upon breach of the cell's threshold, activation is fired down the axon

Page 3: Neural Networks

“Magical” Secrets Revealed

- Linear features are derived from the inputs
- The target concept(s) are non-linear functions of those features

Page 4: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 5: Neural Networks

Projection Pursuit Regression

- Generalization of a 2-layer regression NN
- Universal approximator
- Good for prediction
- Not good for deriving interpretable models of data

Page 6: Neural Networks

Projection Pursuit Regression

The PPR model maps the inputs to the output through an additive sum of ridge functions:

f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)

where the g_m are the ridge functions and the \omega_m are unit vectors of parameters defining the derived features \omega_m^T X.
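A minimal sketch of evaluating the fitted model, assuming the unit vectors \omega_m and smoothed ridge functions g_m are already available as arrays and callables (all names here are illustrative, not from the slides):

```python
import numpy as np

def ppr_predict(X, omegas, gs):
    """PPR prediction: f(X) = sum_m g_m(omega_m^T X).

    X      : (n, p) input matrix
    omegas : list of M unit vectors, each of shape (p,)
    gs     : list of M ridge functions (callables on 1-D arrays)
    """
    return sum(g(X @ w) for w, g in zip(omegas, gs))

# Toy usage with a placeholder direction and ridge function
X = np.random.randn(100, 2)
omega = np.array([0.6, 0.8])   # unit vector
f_hat = ppr_predict(X, [omega], [np.tanh])
```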

Page 7: Neural Networks

PPR: Derived Features

- The dot product \omega_m^T X is the projection of X onto \omega_m
- The ridge function g_m varies only in the direction \omega_m

Page 8: Neural Networks

PPR: Training

- Minimize the squared error \sum_i \big[ y_i - \sum_m g_m(\omega_m^T x_i) \big]^2
- Consider the M = 1 case:
  - Given \omega, we derive the features v_i = \omega^T x_i and smooth g
  - Given g, we minimize over \omega with Newton's method
- Iterate those two steps to convergence

Page 9: Neural Networks

PPR: Newton's Method

Use derivatives to iteratively improve the estimate of \omega. Linearizing g about the current estimate \omega_{old},

g(\omega^T x_i) \approx g(\omega_{old}^T x_i) + g'(\omega_{old}^T x_i)(\omega - \omega_{old})^T x_i

turns the minimization into a weighted least squares regression on x_i: the weights are g'(\omega_{old}^T x_i)^2 and the target to hit is

\omega_{old}^T x_i + \frac{y_i - g(\omega_{old}^T x_i)}{g'(\omega_{old}^T x_i)}
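A sketch of that weighted least squares step, assuming a fixed ridge function g with derivative g_prime that is nonzero at the current projections (the helper name newton_step_omega is hypothetical):

```python
import numpy as np

def newton_step_omega(X, y, omega, g, g_prime):
    """One Gauss-Newton update of the direction omega with g held fixed."""
    v = X @ omega                       # current projections omega_old^T x_i
    d = g_prime(v)                      # assumed nonzero at these points
    target = v + (y - g(v)) / d         # adjusted response to hit
    w = d ** 2                          # weights g'(v)^2
    Xw = X * w[:, None]                 # W X
    # Weighted least squares: solve (X^T W X) omega = X^T W target
    omega_new = np.linalg.solve(X.T @ Xw, Xw.T @ target)
    return omega_new / np.linalg.norm(omega_new)  # keep omega a unit vector
```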

Page 10: Neural Networks

PPR: Implementation Details

- Suggested smoothing methods: local regression, smoothing splines
- The g_m's can be readjusted with backfitting
- The \omega_m's are usually not readjusted
- “(\omega_m, g_m) pairs added in a forward stage-wise manner”

Page 11: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 12: Neural Networks


Neural Networks
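For reference, the single hidden layer model from the chapter, the subject of this slide's network diagram, is

Z_m = \sigma(\alpha_{0m} + \alpha_m^T X), \quad m = 1, \dots, M

T_k = \beta_{0k} + \beta_k^T Z, \quad k = 1, \dots, K

f_k(X) = g_k(T), \quad k = 1, \dots, K

with \sigma the (sigmoid) activation function and g_k the output function.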

Page 13: Neural Networks

NNs: Sigmoid and Softmax

Transforming activations into probabilities:

- Sigmoid: \sigma(v) = \frac{1}{1 + e^{-v}}
- Softmax: g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}

Just like the multilogit model.
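A small sketch of both functions, assuming NumPy (the max-subtraction in softmax is a standard overflow guard, not something from the slides):

```python
import numpy as np

def sigmoid(v):
    """Sigmoid: 1 / (1 + e^{-v})."""
    return 1.0 / (1.0 + np.exp(-v))

def softmax(t):
    """Softmax over the last axis: e^{T_k} / sum_l e^{T_l}."""
    t = t - np.max(t, axis=-1, keepdims=True)  # overflow guard; result unchanged
    e = np.exp(t)
    return e / np.sum(e, axis=-1, keepdims=True)
```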

Page 14: Neural Networks

NNs: Training

- We need an error function to minimize
  - Regression: sum of squared errors
  - Classification: cross-entropy
- Generic approach: gradient descent (a.k.a. back propagation)
  - The error functions are differentiable
  - Forward pass to evaluate activations, backward pass to update weights

Page 15: Neural Networks

NNs: Back Propagation

For squared error, the back propagation equations relate the output-layer errors \delta_{ki} to the hidden-layer errors s_{mi}:

\delta_{ki} = -2(y_{ik} - f_k(x_i)) \, g_k'(\beta_k^T z_i), \qquad s_{mi} = \sigma'(\alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km} \delta_{ki}

Update rules (gradient descent at iteration r, with learning rate \gamma_r):

\beta_{km} \leftarrow \beta_{km} - \gamma_r \sum_i \delta_{ki} z_{mi}, \qquad \alpha_{ml} \leftarrow \alpha_{ml} - \gamma_r \sum_i s_{mi} x_{il}
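A minimal sketch of one batch gradient step for a single-hidden-layer regression network with identity output units (so g_k' = 1); bias terms are omitted and all names are illustrative:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(X, Y, alpha, beta, lr=0.01):
    """One batch back-propagation step, squared error, identity outputs.

    X: (n, p) inputs      alpha: (M, p) hidden-layer weights
    Y: (n, K) targets     beta:  (K, M) output-layer weights
    """
    # Forward pass: evaluate activations
    Z = sigmoid(X @ alpha.T)            # (n, M) hidden activations z_mi
    F = Z @ beta.T                      # (n, K) outputs f_k(x_i)

    # Backward pass: output errors delta_ki, hidden errors s_mi
    delta = -2.0 * (Y - F)              # g_k' = 1 for identity outputs
    s = (delta @ beta) * Z * (1.0 - Z)  # sigma'(v) = sigma(v)(1 - sigma(v))

    # Gradient-descent updates from the rules above
    beta -= lr * (delta.T @ Z)          # sum_i delta_ki z_mi
    alpha -= lr * (s.T @ X)             # sum_i s_mi x_il
    return alpha, beta
```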

Page 16: Neural Networks

NNs: Back Propagation Details

- Those were regression equations; the classification equations are similar
- Learning can be batch or online
- Online learning rates can be decreased during training, ensuring convergence (e.g., \gamma_r \to 0 with \sum_r \gamma_r = \infty and \sum_r \gamma_r^2 < \infty)
- Usually want to start with small weights
- Sometimes impractically slow

Page 17: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 18: Neural Networks

Issues in Training: Overfitting

- Problem: we might reach the global minimum of the training error R(\theta), fitting noise rather than signal
- Proposed solutions:
  - Limit training by watching performance on a test set (early stopping)
  - Weight decay: penalizing large weights

Page 19: Neural Networks


A Closer Look at Weight Decay

The less complicated hypothesis has the lower error rate.
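Concretely, the weight decay penalty from the chapter adds a ridge-like term to the error function and minimizes

R(\theta) + \lambda J(\theta), \qquad J(\theta) = \sum_{km} \beta_{km}^2 + \sum_{ml} \alpha_{ml}^2

where larger values of the tuning parameter \lambda shrink the weights toward zero.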

Page 20: Neural Networks

Outline

- Projection Pursuit Regression
- Neural Networks proper
- Fitting Neural Networks
- Issues in Training
- Examples

Page 21: Neural Networks

Example #1: Synthetic Data

- More hidden nodes -> overfitting
- Multiple initial weight settings should be tried
- The radial function is learned poorly

Page 22: Neural Networks

Example #1: Synthetic Data

- 2 parameters to tune:
  - Weight decay
  - Number of hidden units
- Suggested training strategy (see the sketch below):
  - Fix either parameter at the setting where the model is least constrained
  - Cross-validate the other
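As an illustration only (scikit-learn, not the slides' setup): fix a generous number of hidden units, then cross-validate the weight-decay parameter, which sklearn calls alpha:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Least-constrained choice: plenty of hidden units, decay to be tuned.
net = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs", max_iter=2000)
grid = {"alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1.0]}   # weight-decay strengths
search = GridSearchCV(net, param_grid=grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train assumed given
```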

Page 23: Neural Networks

Example #2: ZIP Code Data

- Handwritten digit data from Yann LeCun
- NNs can be structurally tailored to suit the data
- Weight sharing: multiple units in a given layer apply the same weights

Page 24: Neural Networks

Example #2: 5 Networks

- Net 1: no hidden layer
- Net 2: one hidden layer
- Net 3: 2 hidden layers, local connectivity
- Net 4: 2 hidden layers, local connectivity, 1 layer of weight sharing
- Net 5: 2 hidden layers, local connectivity, 2 layers of weight sharing

Page 25: Neural Networks

Example #2: Results

- Net 5 does best
- A small number of features are identifiable throughout the image

Page 26: Neural Networks

Conclusions

- Neural networks are a very general approach to both regression and classification
- They are an effective learning tool when:
  - The signal-to-noise ratio is high
  - Prediction is desired
  - Formulating a description of the problem's solution is not desired
  - Targets are naturally distinguished by direction as opposed to distance