Artificial Neural Networks: Supervised Models
Robert J. Marks II
University of Washington, Department of Electrical Engineering, CIA Laboratory, Box 352500, Seattle, Washington
[email protected]
Supervised Learning
Given: input (stimulus) / output (response) data
Objective: train a machine to simulate the input/output relationship
Types:
  Classification (discrete outputs)
  Regression (continuous outputs)
Training a Classifier
Figure: training pairs presented to the classifier; each face image is paired with the label "Marks" or "not Marks".
Recall from a Trained Classifier
Figure: a new face image is presented to the trained classifier, which responds "Marks".
Note: The test image does not appear in the training data.
Learning ≠ Memorization
Classifier In Feature Space, After Training
Figure: feature space after training, showing the learned representation boundary against the concept (truth). Legend: training data (Marks and not Marks) and test data (Marks).
Supervised Regression (Interpolation)
Output data is continuous rather than discrete
Example: Load Forecasting

Training (from historical data):
  Input: temperatures, current load, day of week, holiday(?), etc.
  Output: next day’s load

Test:
  Input: forecasted temperatures, current load, day of week, holiday(?), etc.
  Output: tomorrow’s load forecast
Properties of Good Classifiers and Regression Machines

Good accuracy outside of the training set
Explanation facility: generate rules after training
Fast training
Fast testing
Some Classifiers and Regression Machines
Classification and Regression Trees (CART)
Nearest Neighbor Look-Up
Neural Networks:
  Layered Perceptrons (MLPs)
  Recurrent Perceptrons
  Cascade Correlation Neural Networks
  Radial Basis Function Neural Networks
A Model of an Artificial Neuron
Figure: a neuron with input states s_1, ..., s_5 and interconnect weights w_1, ..., w_5 feeding a summing node and a squashing function.

sum = Σ_n w_n s_n
s = state = σ(sum)
σ(·) = squashing function
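Read as a computation, the figure above is just a weighted sum passed through the squashing function. A minimal sketch in Python; the state and weight values are illustrative, not from the slides:

```python
import math

def sigmoid(x):
    """Squashing function: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(states, weights):
    """One artificial neuron: sum = sum_n w_n s_n, then s = sigma(sum)."""
    total = sum(w * s for w, s in zip(weights, states))
    return sigmoid(total)

# Five input states s1..s5 and weights w1..w5 (illustrative values).
states  = [0.2, 0.9, 0.1, 0.5, 0.7]
weights = [0.4, -0.3, 0.8, 0.1, -0.6]
print(neuron(states, weights))
```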
Squashing Functions
Figure: plot of σ(sum) versus sum, rising smoothly and saturating at 1.

sigmoid: σ(x) = 1 / (1 + e^{-x})
A Layered Perceptron
Figure: a layered perceptron; input neurons at the bottom, a hidden layer above, and output neurons at the top, connected by weighted interconnects.
Training
Given training data:
  input vector set: { i_n | 1 ≤ n ≤ N }
  corresponding output (target) vector set: { t_n | 1 ≤ n ≤ N }
Find the weights of the interconnects using training data to minimize error in the test data
Error
Input, target & response:
  input vector set: { i_n | 1 ≤ n ≤ N }
  target vector set: { t_n | 1 ≤ n ≤ N }
  o_n = neural network output when the input is i_n (note: in general o_n ≠ t_n)

Error:
  E = ½ Σ_n || o_n - t_n ||²
Error Minimization Techniques

The error is a function of:
  the fixed training and test data
  the neural network weights

Find weights that minimize the error (standard optimization):
  conjugate gradient descent
  random search
  genetic algorithms
  steepest descent (error backpropagation)
Minimizing Error Using Steepest Descent
The main idea: find the way downhill and take a step.

Figure: a plot of E versus x with the minimum marked.

downhill direction = -dE/dx
μ = step size
update: x ← x - μ dE/dx
Example of Steepest Descent
E(x) = ½ x²; minimum at x = 0

-μ dE/dx = -μx, so the update is x ← x - μx = (1 - μ) x

The solution to the difference equation
  x_p = (1 - μ) x_{p-1}
is x_p = (1 - μ)^p x_0.
For |1 - μ| < 1, x_p → 0 as p → ∞.
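A quick numerical check of this example; the starting point x0, the step size μ, and the number of steps below are illustrative choices:

```python
# Steepest descent on E(x) = x^2 / 2, whose derivative is dE/dx = x.
mu, x0, steps = 0.1, 5.0, 20          # illustrative step size, start, and step count

x = x0
for p in range(1, steps + 1):
    x = x - mu * x                                # x <- x - mu * dE/dx
    assert abs(x - (1 - mu) ** p * x0) < 1e-12    # matches x_p = (1 - mu)^p x_0

print(x)   # approaches the minimum at x = 0, since |1 - mu| < 1
```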
Training the Perceptron
Figure: a single-layer perceptron with inputs i_1, i_2, i_3, i_4, outputs o_1, o_2, and weights w_11 through w_24, where o_n = Σ_{k=1}^{4} w_nk i_k.

E = ½ Σ_{n=1}^{2} (o_n - t_n)²
  = ½ Σ_{n=1}^{2} ( Σ_{k=1}^{4} w_nk i_k - t_n )²

dE/dw_mj = ( Σ_{k=1}^{4} w_mk i_k - t_m ) i_j = (o_m - t_m) i_j
Weight Update
dE/dw_mj = i_j (o_m - t_m)

For m = 2 and j = 4:
  w_24 ← w_24 - μ i_4 (o_2 - t_2)

Figure: the same single-layer perceptron with inputs i_1, ..., i_4, outputs o_1, o_2, and weights w_11 through w_24.
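A minimal sketch of this update for the pictured four-input, two-output layer; the weights, input, target, and step size below are illustrative values, not from the slides:

```python
mu = 0.1                                  # illustrative step size
w = [[0.10, -0.20, 0.05, 0.30],           # w[m][j]: weights feeding output o_1
     [0.40,  0.10, -0.30, 0.20]]          #          and output o_2
i = [0.5, 1.0, -0.5, 2.0]                 # illustrative input vector
t = [1.0, 0.0]                            # illustrative target vector

# Outputs of the single layer: o_m = sum_j w_mj i_j
o = [sum(w[m][j] * i[j] for j in range(4)) for m in range(2)]

# Gradient dE/dw_mj = i_j (o_m - t_m), so w_mj <- w_mj - mu * i_j * (o_m - t_m)
for m in range(2):
    for j in range(4):
        w[m][j] -= mu * i[j] * (o[m] - t[m])
```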
No Hidden Layers = Linear Separation

o = σ( Σ_n w_n i_n )

For a classifier, threshold the output:
  If o > ½, announce class #1
  If o < ½, announce class #2

Classification boundary: o = ½, or Σ_n w_n i_n = 0. This is the equation of a plane!

Figure: a single neuron with inputs i_1, i_2, i_3, weights w_1, w_2, w_3, and output o.
Classification Boundary

Σ_n w_n i_n = 0 defines a line through the origin.

Figure: the boundary line through the origin in the (i_1, i_2) plane.
Adding Bias Term
Figure: a single neuron with inputs i_1, i_2, i_3 and a constant input of 1 (the bias), with weights w_1, w_2, w_3, w_4 and output o.

The classification boundary is still a line, but it need not go through the origin.

Figure: a shifted boundary line in the (i_1, i_2) plane.
The Minsky-Papert Objection
Figure: the exclusive-or (XOR) truth table plotted in the (i_1, i_2) plane; no single line separates the two classes.

The simple operation of the exclusive or (XOR) cannot be resolved using a linear perceptron with bias. More important problems can thus probably not be resolved with a linear perceptron with bias.
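The objection can be illustrated by brute force: the sketch below searches a coarse grid of weights and a bias and finds no linear threshold that reproduces XOR. The grid is an illustrative choice, a demonstration rather than a proof:

```python
import itertools

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, b):
    """True if the threshold rule w1*i1 + w2*i2 + b > 0 reproduces XOR."""
    return all((w1 * i1 + w2 * i2 + b > 0) == bool(target)
               for (i1, i2), target in xor.items())

grid = [k / 4 for k in range(-8, 9)]              # candidate weights and bias
found = any(separates(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)   # False: no line on this grid separates XOR
```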
The Layered Perceptron
Notation (see figure):
  interconnect weights: w_jk(l)
  neuron states: s_j(l)
  input layer: l = 0
  hidden layer: l
  output layer: l = L
Error Backpropagation
Problem: for an arbitrary weight w_jk(l), perform the update

  w_jk(l) ← w_jk(l) - μ dE/dw_jk(l)

A solution: error backpropagation. Apply the chain rule for partial derivatives:

  dE/dw_jk(l) = [dE/ds_j(l)] [ds_j(l)/dsum_j(l)] [dsum_j(l)/dw_jk(l)]
Each Partial is Evaluated (Beautiful Math!!!)

ds_j(l)/dsum_j(l) = d/dsum_j(l) { 1 / (1 + exp[-sum_j(l)]) } = s_j(l) [ 1 - s_j(l) ]

dsum_j(l)/dw_jk(l) = s_k(l-1)

dE/ds_j(l) = δ_j(l) = Σ_n δ_n(l+1) s_n(l+1) [ 1 - s_n(l+1) ] w_nj(l+1)
Weight Update
dE/dw_jk(l) = [dE/ds_j(l)] [ds_j(l)/dsum_j(l)] [dsum_j(l)/dw_jk(l)]
            = δ_j(l) s_j(l) [ 1 - s_j(l) ] s_k(l-1)

w_jk(l) ← w_jk(l) - μ dE/dw_jk(l)
Step #1: Input Data & Feedforward
Figure: a 2-3-2 network with inputs i_1 = s_1(0), i_2 = s_2(0); hidden states s_1(1), s_2(1), s_3(1); and outputs s_1(2) = o_1, s_2(2) = o_2.

The states of all of the neurons are determined by the states of the neurons below them and the interconnect weights.
Step #2: Evaluate the output error, backpropagate to find the δ’s for each neuron

Figure: the same 2-3-2 network; the output neurons hold (o_1, t_1), (o_2, t_2) with δ_1(2), δ_2(2); the hidden neurons hold s_1(1), s_2(1), s_3(1) with δ_1(1), δ_2(1), δ_3(1); the inputs i_1, i_2 carry δ_1(0), δ_2(0).

Each neuron now keeps track of two numbers. The δ’s for each neuron are determined by “backpropagating” the output error towards the input.
Step #3: Update Weights
Figure: the same 2-3-2 network with states and δ’s at every neuron.

Example update:
  w_32(1) ← w_32(1) - μ δ_3(1) s_3(1) [ 1 - s_3(1) ] s_2(0)

Weight updates are performed within the neural network architecture.
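The three steps can be collected into one training routine. Below is a minimal sketch in Python using sigmoid squashing and the w_jk(l), s_j(l), δ_j(l) notation of the preceding slides; the 2-3-2 layer sizes, random seed, and step size are illustrative, and bias inputs are omitted for brevity:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
mu = 0.5                                  # illustrative step size
sizes = [2, 3, 2]                         # a 2-3-2 network, as in the figures
# w[l][j][k] = w_jk(l): weight from neuron k in layer l-1 to neuron j in layer l
w = [None] + [[[random.uniform(-1, 1) for _ in range(sizes[l - 1])]
               for _ in range(sizes[l])] for l in (1, 2)]

def train_pair(i, t):
    # Step #1: feedforward -- each state is set by the states below it and the weights.
    s = [list(i)]
    for l in (1, 2):
        s.append([sigmoid(sum(w[l][j][k] * s[l - 1][k] for k in range(sizes[l - 1])))
                  for j in range(sizes[l])])

    # Step #2: evaluate the output error and backpropagate the deltas.
    delta = [None, None, [s[2][j] - t[j] for j in range(sizes[2])]]  # delta_j(L) = o_j - t_j
    delta[1] = [sum(delta[2][n] * s[2][n] * (1 - s[2][n]) * w[2][n][j]
                    for n in range(sizes[2]))
                for j in range(sizes[1])]

    # Step #3: w_jk(l) <- w_jk(l) - mu * delta_j(l) * s_j(l) * [1 - s_j(l)] * s_k(l-1)
    for l in (1, 2):
        for j in range(sizes[l]):
            grad = delta[l][j] * s[l][j] * (1 - s[l][j])
            for k in range(sizes[l - 1]):
                w[l][j][k] -= mu * grad * s[l - 1][k]
```

A call such as train_pair([0.2, 0.9], [1.0, 0.0]) performs one pattern-mode update; repeating it over a training set, in randomized order, is the one-pair-at-a-time training discussed below.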
Neural Smithing
Bias
Momentum
Batch Training
Learning Versus Memorization
Cross Validation
The Curse of Dimensionality
Variations
Bias
Bias is used with MLPs:
  at the input
  at hidden layers (sometimes)
Momentum

Steepest descent:  w_jk(l) ← w_jk(l) + Δw_jk(l)

With momentum:  w_jk^{m+1}(l) = w_jk^{m}(l) + Δw_jk^{m+1}(l) + η Δw_jk^{m}(l)

The new step is affected by the previous step; m is the iteration number. Convergence is improved.
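A minimal sketch of this update for a single weight; the momentum coefficient name η (eta) and the values below are illustrative:

```python
mu, eta = 0.1, 0.9    # illustrative step size and momentum coefficient

def momentum_step(w, grad, prev_step):
    """w^{m+1} = w^m + Dw^{m+1} + eta * Dw^m, with Dw^{m+1} = -mu * dE/dw."""
    step = -mu * grad
    return w + step + eta * prev_step, step

# Carry the previous step along between iterations (illustrative gradients).
w, prev = 0.5, 0.0
for grad in [0.4, 0.3, 0.1]:
    w, prev = momentum_step(w, grad, prev)
```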
Back Propagation: Batch Training
  Accumulate the error from all training data prior to the weight update
  True steepest descent
  Update weights once each epoch

Training the Layered Perceptron One Data Pair at a Time
  Randomize the data order to avoid structure
  The Widrow-Hoff algorithm
Learning versus Memorization: Both have zero training error
Figure: two curves fit to the same training data. One shows good generalization (learning) and follows the concept (truth); the other shows bad generalization (memorization). Legend: training data and test data.
Alternate View:
Figure: the concept (truth) with two fitted curves; the learning curve follows the concept, while the memorization curve overfits the training points.
Learning versus Memorization (cont.)

Successful learning: recognizing data outside the training set, e.g. data in the test set; i.e., the neural network must successfully classify (interpolate) inputs it has not seen before.

How can we assure learning?
  Cross validation
  Choosing the neural network structure
    Pruning
    Genetic algorithms
Cross Validation
Figure: training error and test error versus training iterations (m). The training error keeps decreasing, while the test error reaches a minimum and then rises; training should stop at that minimum.
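One way to act on this curve is to stop training when the test (validation) error stops improving. A minimal early-stopping sketch; the routine names passed in and the patience of 10 epochs are illustrative, not from the slides:

```python
def train_with_early_stopping(train_one_epoch, test_error, max_epochs=1000):
    """Train while the test error keeps falling; stop near its minimum.

    train_one_epoch(): performs one pass of weight updates over the training data.
    test_error(): returns the current error on the held-out test data.
    Both are caller-supplied (illustrative interface).
    """
    best_error, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        err = test_error()
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= 10:     # test error no longer improving
            break
    return best_epoch, best_error
```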
The Curse of Dimensionality
For many problems, the required number of training data grows exponentially with the dimension of the input.

Example:
• For N = 2 inputs, suppose that 100 = 10² training data pairs suffice.
• For N = 3 inputs, 10³ = 1000 training data pairs are needed.
• In general, 10^N training data pairs are needed for many important problems.
Example: Classifying a circle in a square
Figure: a circle inscribed in a square in the (i_1, i_2) plane; a neural net maps inputs i_1, i_2 to an output o that labels points inside or outside the circle. 100 = 10² training points are shown.
Example: Classifying a sphere in a cube (N = 3)

Figure: a sphere inscribed in a cube in (i_1, i_2, i_3) space; a neural net maps inputs i_1, i_2, i_3 to an output o. Sampling 10 layers, each with 10² points, requires 10³ = 10^N points.
Variations

Architecture variations for MLPs:
  Recurrent Neural Networks
  Radial Basis Functions
  Cascade Correlation
  Fuzzy MLPs
Training algorithms
Applications
Power Engineering
Finance
Bioengineering
Control
Industrial Applications
Politics
Political Applications
Robert Novak syndicated column
Washington, February 18, 1996
UNDECIDED BOWLERS
“President Clinton’s pollsters have identified the voters who will determine whether he will be elected to a second term: two-parent families whose members bowl for recreation.”
“Using a technique they call the ‘neural network,’ Clinton advisors contend that these family bowlers are the quintessential undecided voters. Therefore, these are the people who must be targeted by the president.”
“A footnote: Two decades ago, Illinois Democratic Gov. Dan Walker campaigned heavily in bowling alleys in the belief he would find swing voters there. Walker had national political ambitions but ended up in federal prison.”

Robert Novak syndicated column, Washington, February 18, 1996 (continued)
Finis