Neural Networks
The Elements of Statistical Learning, Chapter 11
Presented by Nick Rizzolo
2
Modeling the Human Brain
Input builds up on receptors (dendrites)
Cell has an input threshold
Upon breach of the cell’s threshold, activation is fired down the axon
3
“Magical” Secrets Revealed
Linear features are derived from inputs
Target concept(s) are non-linear functions of features
4
Outline
Projection Pursuit Regression
Neural Networks proper
Fitting Neural Networks
Issues in Training
Examples
5
Projection Pursuit Regression
Generalization of 2-layer regression NN
Universal approximator
Good for prediction
Not good for deriving interpretable models of data
6
Projection Pursuit Regression
$f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)$
Output: $f(X)$
Inputs: $X = (X_1, \ldots, X_p)$
Ridge functions: $g_m$
Unit vectors: $\omega_m$
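As a minimal sketch of the model on this slide, the Python below evaluates $f(X) = \sum_m g_m(\omega_m^T X)$ for a hypothetical two-term model. The name `ppr_predict`, the two directions, and the quadratic ridge functions are illustrative assumptions, chosen so that the two ridge terms add up to the product $2 X_1 X_2$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ppr_predict(x, directions, ridge_functions):
    """Evaluate f(x) = sum_m g_m(w_m . x) for unit vectors w_m and ridge functions g_m."""
    return sum(g(dot(w, x)) for w, g in zip(directions, ridge_functions))

# Hypothetical model: two unit vectors at 45 degrees and two quadratic ridge functions.
s = 1.0 / math.sqrt(2.0)
directions = [(s, s), (s, -s)]
ridge_functions = [lambda v: v ** 2, lambda v: -(v ** 2)]

# (x1 + x2)^2 / 2 - (x1 - x2)^2 / 2 = 2 * x1 * x2, so this is close to 12 for x = (3, 2).
value = ppr_predict((3.0, 2.0), directions, ridge_functions)
print(value)
```

Neither ridge function depends on a single input alone, yet their sum represents an interaction between the two inputs.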
7
PPR: Derived Features
Dot product $V_m = \omega_m^T X$ is the projection of $X$ onto the unit vector $\omega_m$
Ridge function $g_m(\omega_m^T X)$ varies only in the direction $\omega_m$
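The two statements above can be checked numerically. This is only a sketch: the unit vector `w`, the orthogonal direction `orth`, the point `x`, and the use of `tanh` as the ridge function are all assumed for illustration.

```python
import math

def project(x, w):
    """Dot product w . x: the signed length of the projection of x onto unit vector w."""
    return sum(xi * wi for xi, wi in zip(x, w))

def ridge(x, w, g=math.tanh):
    """A ridge function: g applied to the projection, so x matters only through w . x."""
    return g(project(x, w))

w = (0.6, 0.8)       # unit vector: 0.36 + 0.64 = 1
orth = (-0.8, 0.6)   # a direction orthogonal to w
x = (1.0, 2.0)

# Move x five units orthogonally to w, and five units along w.
moved_orth = tuple(xi + 5.0 * oi for xi, oi in zip(x, orth))
moved_along = tuple(xi + 5.0 * wi for xi, wi in zip(x, w))

print(project(x, w), project(moved_orth, w), project(moved_along, w))
print(ridge(x, w), ridge(moved_orth, w), ridge(moved_along, w))
```

Moving orthogonally to `w` leaves both the projection and the ridge function unchanged. Moving along `w` shifts the projection by the distance moved.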
8
PPR: Training
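Minimize squared error: $\sum_{i=1}^{N} \left[ y_i - \sum_{m=1}^{M} g_m(\omega_m^T x_i) \right]^2$
Consider $M = 1$
Given $\omega$, we derive features $v_i = \omega^T x_i$ and smooth to estimate $g$
Given $g$, we minimize over $\omega$ with Newton’s Method
Iterate those two steps to convergence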
9
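PPR: Newton’s Method
Use derivatives to iteratively improve the estimate of $\omega$ (a Gauss-Newton search: the second derivative of $g$ is dropped):
$g(\omega^T x_i) \approx g(\omega_{old}^T x_i) + g'(\omega_{old}^T x_i)(\omega - \omega_{old})^T x_i$
Least squares regression to hit the target:
$\sum_{i=1}^{N} g'(\omega_{old}^T x_i)^2 \left[ \left( \omega_{old}^T x_i + \frac{y_i - g(\omega_{old}^T x_i)}{g'(\omega_{old}^T x_i)} \right) - \omega^T x_i \right]^2$
The term in parentheses is the target, $g'(\omega_{old}^T x_i)^2$ are the weights, and the regression on $x_i$ has no intercept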
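The least squares step can be sketched in a few lines of Python. This is an illustration under strong assumptions, not the full PPR procedure: the ridge function is taken as known and fixed ($g = \tanh$), the inputs are two-dimensional so the weighted normal equations can be solved by hand, and the data are noise-free. The name `gauss_newton_step` is hypothetical.

```python
import math

def gauss_newton_step(w_old, xs, ys, g, g_prime):
    """One update of the direction w for a fixed ridge function g (2-D inputs only)."""
    # Accumulate the 2x2 weighted normal equations (X'WX) w = X'W t, with no intercept.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, x2), y in zip(xs, ys):
        v = w_old[0] * x1 + w_old[1] * x2    # current derived feature
        slope = g_prime(v)
        target = v + (y - g(v)) / slope      # the target the regression tries to hit
        weight = slope ** 2
        a11 += weight * x1 * x1
        a12 += weight * x1 * x2
        a22 += weight * x2 * x2
        b1 += weight * x1 * target
        b2 += weight * x2 * target
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def g_prime(v):
    return 1.0 - math.tanh(v) ** 2

# Noise-free toy data generated from an assumed true direction (0.6, 0.8).
true_w = (0.6, 0.8)
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]
xs = [(a, b) for a in grid for b in grid]
ys = [math.tanh(true_w[0] * a + true_w[1] * b) for a, b in xs]

w = (1.0, 0.0)   # deliberately wrong starting direction
for _ in range(30):
    w = gauss_newton_step(w, xs, ys, math.tanh, g_prime)
print(w)
```

In full PPR, $g$ would be re-estimated by a smoother between these updates. Holding it fixed here isolates the direction update.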
10
PPR: Implementation Details
Suggested smoothing methods
  Local regression
  Smoothing splines
$g_m$’s can be readjusted with backfitting
$\omega_m$’s usually not readjusted
“($\omega_m$, $g_m$) pairs added in a forward stage-wise manner”
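To make the smoothing step concrete, here is a sketch of the simplest local method: a local-constant (Nadaraya-Watson) kernel smoother applied to the derived features $v_i = \omega^T x_i$. It stands in for the local regression or smoothing spline a real implementation would use. The Gaussian kernel, the bandwidth, the direction, and the quadratic "true" ridge function are all assumptions made for the example.

```python
import math

def kernel_smooth(v0, vs, ys, bandwidth):
    """Local-constant estimate of g(v0): a Gaussian-weighted average of nearby responses."""
    weights = [math.exp(-0.5 * ((v - v0) / bandwidth) ** 2) for v in vs]
    return sum(wt * y for wt, y in zip(weights, ys)) / sum(weights)

# Assumed direction and a grid of 2-D inputs.
w = (0.6, 0.8)
xs = [(a / 4.0, b / 4.0) for a in range(-4, 5) for b in range(-4, 5)]

# Derived features, and noise-free responses from a hypothetical g(v) = v^2.
vs = [w[0] * x1 + w[1] * x2 for x1, x2 in xs]
ys = [v ** 2 for v in vs]

g_hat = kernel_smooth(0.5, vs, ys, bandwidth=0.1)
print(g_hat)   # an estimate of g(0.5) = 0.25, biased slightly by the averaging
```

A smaller bandwidth reduces the bias but averages over fewer points, so with noisy responses the estimate becomes more variable.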
12
Neural Networks
13
NNs: Sigmoid and Softmax
Transforming activation to probability (?)
Sigmoid: $\sigma(v) = \frac{1}{1 + e^{-v}}$
Softmax: $g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}$
Just like multilogit model
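A small sketch of the two functions in plain Python follows. The branch in `sigmoid` and the max-shift in `softmax` are numerical-stability choices added here, not something taken from the slides. The last lines illustrate the link to the logistic model: with two classes, the softmax of activations $(t, 0)$ equals the sigmoid of $t$.

```python
import math

def sigmoid(v):
    """1 / (1 + exp(-v)), written in two branches so that math.exp never overflows."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def softmax(t):
    """Map K activations to K positive values that sum to 1."""
    shift = max(t)   # subtracting the max leaves the ratios unchanged and avoids overflow
    exps = [math.exp(tk - shift) for tk in t]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])
two_class = softmax([1.5, 0.0])
print(probs, sum(probs))
print(two_class[0], sigmoid(1.5))
```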
14
NNs: Training
We need an error function to minimize
  Regression: sum squared error
  Classification: cross-entropy
Generic approach: Gradient Descent (a.k.a. back propagation)
  Error functions are differentiable
  Forward pass to evaluate activations, backward pass to update weights
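The two error functions can be sketched as follows. The sketch assumes targets and network outputs are lists of rows (one row per observation, one column per output unit) and that classification targets are one-hot. The function names are illustrative.

```python
import math

def sum_squared_error(targets, outputs):
    """Regression error: sum over observations i and outputs k of (y_ik - f_k(x_i))^2."""
    return sum((y - f) ** 2
               for y_row, f_row in zip(targets, outputs)
               for y, f in zip(y_row, f_row))

def cross_entropy(targets, probabilities):
    """Classification error: minus the sum of y_ik * log f_k(x_i), for one-hot targets."""
    return -sum(y * math.log(p)
                for y_row, p_row in zip(targets, probabilities)
                for y, p in zip(y_row, p_row)
                if y > 0)   # skip zero targets, which also avoids log(0)

print(sum_squared_error([[1.0, 2.0]], [[1.0, 0.0]]))
print(cross_entropy([[0, 1, 0]], [[0.2, 0.5, 0.3]]))
```

Both are smooth functions of the outputs, which is what makes gradient descent applicable.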
15
NNs: Back Propagation
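With hidden units $z_{mi} = \sigma(\alpha_m^T x_i)$, outputs $f_k(x_i) = g_k(\beta_k^T z_i)$ (bias terms absorbed into $\alpha_m$ and $\beta_k$) and squared error $R_i = \sum_{k=1}^{K} (y_{ik} - f_k(x_i))^2$:
$\partial R_i / \partial \beta_{km} = \delta_{ki} z_{mi}$, where $\delta_{ki} = -2 (y_{ik} - f_k(x_i)) g_k'(\beta_k^T z_i)$
$\partial R_i / \partial \alpha_{ml} = s_{mi} x_{il}$
Back propagation equations: $s_{mi} = \sigma'(\alpha_m^T x_i) \sum_{k=1}^{K} \beta_{km} \delta_{ki}$
Update rules (learning rate $\gamma_r$ at iteration $r$):
$\beta_{km}^{(r+1)} = \beta_{km}^{(r)} - \gamma_r \sum_{i=1}^{N} \partial R_i / \partial \beta_{km}^{(r)}$
$\alpha_{ml}^{(r+1)} = \alpha_{ml}^{(r)} - \gamma_r \sum_{i=1}^{N} \partial R_i / \partial \alpha_{ml}^{(r)}$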
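The following sketch works through back propagation for a single observation. It assumes one hidden layer of sigmoid units, an identity output function, squared error, and explicit bias terms stored as the first entry of each weight row. The function names and all numeric values are hypothetical. The analytic gradient is compared with a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, alpha, beta):
    """Forward pass. Each row of alpha and beta is [bias, weight_1, weight_2, ...]."""
    z = [sigmoid(a[0] + sum(w * xl for w, xl in zip(a[1:], x))) for a in alpha]
    f = [b[0] + sum(w * zm for w, zm in zip(b[1:], z)) for b in beta]
    return z, f

def squared_error(x, y, alpha, beta):
    _, f = forward(x, alpha, beta)
    return sum((yk - fk) ** 2 for yk, fk in zip(y, f))

def backprop(x, y, alpha, beta):
    """Gradients of the squared error with respect to alpha and beta."""
    z, f = forward(x, alpha, beta)
    # Output errors: delta_k = -2 (y_k - f_k), because the output function is the identity.
    delta = [-2.0 * (yk - fk) for yk, fk in zip(y, f)]
    # Hidden errors: s_m = sigma'(.) * sum_k beta_km * delta_k, with sigma' = z (1 - z).
    s = [zm * (1.0 - zm) * sum(beta[k][m + 1] * delta[k] for k in range(len(beta)))
         for m, zm in enumerate(z)]
    grad_beta = [[dk] + [dk * zm for zm in z] for dk in delta]
    grad_alpha = [[sm] + [sm * xl for xl in x] for sm in s]
    return grad_alpha, grad_beta

def gradient_step(params, grads, rate):
    """Update rule: new weight = old weight - rate * gradient."""
    return [[p - rate * d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(params, grads)]

x, y = [0.5, -1.0], [1.0]
alpha = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]   # 2 hidden units, 2 inputs
beta = [[0.05, 0.7, -0.6]]                     # 1 output unit
grad_alpha, grad_beta = backprop(x, y, alpha, beta)

# Finite-difference check on one hidden-layer weight.
eps = 1e-6
plus = [row[:] for row in alpha]
minus = [row[:] for row in alpha]
plus[0][1] += eps
minus[0][1] -= eps
numeric = (squared_error(x, y, plus, beta) - squared_error(x, y, minus, beta)) / (2 * eps)

new_alpha = gradient_step(alpha, grad_alpha, 0.05)
new_beta = gradient_step(beta, grad_beta, 0.05)
print(grad_alpha[0][1], numeric)
```

The hidden-layer errors are computed from the output-layer errors, which is the backward pass. No derivative is recomputed from scratch.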
16
NNs: Back Propagation Details
Those were regression equations; classification equations are similar
Can be batch or online
Online learning rates can be decreased during training, ensuring convergence
Usually want to start weights small
Sometimes impractically slow
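The batch/online distinction and the decreasing learning rate can be illustrated on a deliberately tiny problem. This is an assumption made for clarity, not a network: a single weight $w$ is fitted to observations $x_i$ under the error $R_i(w) = (w - x_i)^2 / 2$. With the rate $\gamma_r = 1/r$, the online updates reproduce the running mean exactly.

```python
def online_fit(xs):
    """Online learning: update after every observation, with learning rate 1/r."""
    w = 0.0
    for r, x in enumerate(xs, start=1):
        gradient = w - x            # dR_i/dw for the single observation x
        w -= (1.0 / r) * gradient   # the rate shrinks as training proceeds
    return w

def batch_step(w, xs, rate):
    """Batch learning: one update using the gradient summed over all observations."""
    return w - rate * sum(w - x for x in xs)

data = [2.0, 4.0, 6.0, 8.0]
w_online = online_fit(data)
w_batch = batch_step(0.0, data, rate=1.0 / len(data))
print(w_online, w_batch)
```

The schedule $\gamma_r = 1/r$ goes to zero while $\sum_r \gamma_r$ diverges and $\sum_r \gamma_r^2$ stays finite. These are the usual stochastic-approximation conditions behind the convergence claim.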
18
Issues in Training: Overfitting
Problem: might reach the global minimum of $R(\theta)$ (the training error)
Proposed solutions:
  Limit training by watching the performance of a held-out (validation) set
  Weight decay: penalizing large weights
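Weight decay can be sketched on a one-weight linear model, a stand-in chosen so the effect can be verified by hand. The penalized error is $\sum_i (y_i - w x_i)^2 + \lambda w^2$, whose minimizer is $\sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function name, learning rate, and toy data are assumptions.

```python
def fit_with_weight_decay(xs, ys, lam, rate=0.01, steps=200):
    """Gradient descent on sum_i (y_i - w x_i)^2 + lam * w^2 for a single weight w."""
    w = 0.0
    for _ in range(steps):
        error_gradient = -2.0 * sum(x * (y - w * x) for x, y in zip(xs, ys))
        decay_gradient = 2.0 * lam * w   # derivative of the penalty lam * w^2
        w -= rate * (error_gradient + decay_gradient)
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data lying exactly on y = 2x
w_plain = fit_with_weight_decay(xs, ys, lam=0.0)     # converges to 28 / 14 = 2
w_decayed = fit_with_weight_decay(xs, ys, lam=14.0)  # converges to 28 / (14 + 14) = 1
print(w_plain, w_decayed)
```

The penalty adds a term proportional to the weight itself to every gradient step, pulling the solution toward zero. Larger $\lambda$ means stronger shrinkage.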
19
A Closer Look at Weight Decay
Less complicated hypothesis has lower error rate
21
Example #1: Synthetic Data
More hidden nodes -> overfitting
Multiple initial weight settings should be tried
Radial function learned poorly
22
Example #1: Synthetic Data
2 parameters to tune:
  Weight decay
  Hidden units
Suggested training strategy:
  Fix either parameter where model is least constrained
  Cross-validate the other
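The cross-validation half of this strategy might look like the sketch below. To stay short it uses a one-weight ridge fit as a stand-in for a network with a fixed number of hidden units, and it searches only over the weight decay value. The helper names, the candidate values, and the data (made-up numbers near $y = 2x$) are all assumptions.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds of range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def fit_ridge_1d(xs, ys, lam):
    """Stand-in model: minimizer of sum (y - w x)^2 + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(xs, ys, lam, k=3):
    """Mean squared validation error of the stand-in model over k folds."""
    total = 0.0
    for train, val in k_fold_splits(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in val)
    return total / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # made-up values, for illustration only
candidates = [0.0, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: cv_error(xs, ys, lam))
print(best_lam)
```

With a real network, `fit_ridge_1d` would be replaced by a full training run at each candidate value, which is why only one of the two parameters is usually searched this way.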
23
Example #2: ZIP Code Data
Yann LeCun
NNs can be structurally tailored to suit the data
Weight sharing: multiple units in a given layer are constrained to use the same weights
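A one-dimensional sketch of the two structural ideas follows. The ZIP code networks operate on two-dimensional images, so this is a simplification, and the function names and numbers are hypothetical. With local connectivity, each unit sees only a small window of the input. With weight sharing, all units apply the same weights to their own window.

```python
def locally_connected(x, patches):
    """Unit i sees only the window starting at x[i] and has its own weights patches[i]."""
    return [sum(w * x[i + j] for j, w in enumerate(patch))
            for i, patch in enumerate(patches)]

def weight_shared(x, kernel):
    """Same local connectivity, but every unit reuses the one kernel."""
    n_units = len(x) - len(kernel) + 1
    return locally_connected(x, [kernel] * n_units)

x = [1.0, 2.0, 3.0, 4.0]
own = locally_connected(x, [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]])   # 3 units, 6 free weights
shared = weight_shared(x, [1.0, -1.0])                              # 3 units, 2 free weights
print(own, shared)
```

The shared version detects the same feature (here, a difference between neighbours) at every position with far fewer parameters.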
24
Example #2: 5 Networks
Net 1: No hidden layer
Net 2: One hidden layer
Net 3: 2 hidden layers
  Local connectivity
Net 4: 2 hidden layers
  Local connectivity
  1 layer weight sharing
Net 5: 2 hidden layers
  Local connectivity
  2 layer weight sharing
25
Example #2: Results
Net 5 does best
Small number of features identifiable throughout image
26
Conclusions
Neural networks are a very general approach to both regression and classification
Effective learning tool when:
  Signal / noise is high
  Prediction is desired
  Formulating a description of a problem’s solution is not desired
  Targets are naturally distinguished by direction as opposed to distance