Warm-up example (1)
Given the well-known XOR problem and a neural network to approximate it, answer the following questions:
• How many hidden layers would you use?
• How many hidden units per layer?
• How many connections would your net have?
• How would you select the initial weights of the connections?
• When would you stop the iterations of the error back-propagation algorithm?
Warm-up example (2)
What would you do if the trained neural net does not generate the desired outputs and behaves as follows?
• The updated weights after an iteration of the error back-propagation procedure are almost identical to the weights before that iteration, but the output is not the desired one.
• The number of iterations exceeds a pre-defined threshold.
• The output error seems to be increasing instead of decreasing.
Item-by-item learning (sequential)
Without shuffling:

    for epoch = 1:num_epochs
        for t = 1:numSamples
            % forward pass
            % backward pass
        end
    end

With the training data shuffled once per epoch:

    for epoch = 1:num_epochs
        % shuffle training data
        perm = randperm(numSamples);
        x = x(perm);
        d = d(perm);
        for t = 1:numSamples
            % forward pass
            % backward pass
        end
    end
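For concreteness, here is a minimal runnable sketch (plain Matlab, no toolbox) of the shuffled sequential loop above, applied to the XOR warm-up problem. The 2-2-1 tanh/linear architecture, learning rate, and epoch count are illustrative assumptions, not taken from the slides.

    % Item-by-item (online) back propagation for XOR -- a sketch.
    x = [0 0 1 1; 0 1 0 1];              % inputs, one sample per column
    d = [0 1 1 0];                       % XOR targets
    numSamples = size(x, 2);
    W1 = randn(2, 2); b1 = randn(2, 1);  % hidden layer (2 tanh units, assumed)
    W2 = randn(1, 2); b2 = randn(1, 1);  % linear output layer
    LR = 0.1;                            % learning rate (assumed)
    for epoch = 1:2000
        perm = randperm(numSamples);     % shuffle training data
        x = x(:, perm);  d = d(perm);
        for t = 1:numSamples
            h = tanh(W1 * x(:, t) + b1);               % forward pass
            y = W2 * h + b2;
            e  = y - d(t);                             % backward pass
            gh = (W2' * e) .* (1 - h.^2);              % hidden delta
            W2 = W2 - LR * e * h';        b2 = b2 - LR * e;   % update after
            W1 = W1 - LR * gh * x(:, t)'; b1 = b1 - LR * gh;  % every sample
        end
    end
    % after training, W2*tanh(W1*x + b1) + b2 should approximate XOR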
Batch learning

    bs = #  % batch size
    for epoch = 1:num_epochs
        for s = 1:bs:numSamples
            % zero in-batch sums here
            for b = 1:bs
                t = s + b - 1;
                % forward pass
                % backward pass
                % update in-batch sums based on BP (deltas)
            end
            % update weights and biases here
            Wi = Wi - LR * (sumWi / bs);  % etc.
        end
    end
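A minimal runnable instance of this batch scheme, shrunk to a single linear neuron so that the averaged update stays visible on one line; the toy data and the names sumW, sumb are assumptions.

    % Mini-batch gradient descent for y = W*x + b -- a sketch.
    x = linspace(0, 1, 12);  d = 2*x + 1;    % toy targets (assumed)
    W = 0;  b = 0;  LR = 0.5;  bs = 4;  numSamples = numel(x);
    for epoch = 1:200
        for s = 1:bs:numSamples
            sumW = 0;  sumb = 0;             % zero in-batch sums here
            for k = 1:bs
                t = s + k - 1;
                y = W * x(t) + b;            % forward pass
                e = y - d(t);                % backward pass (squared error)
                sumW = sumW + e * x(t);      % update in-batch sums
                sumb = sumb + e;
            end
            W = W - LR * (sumW / bs);        % update weights and biases here
            b = b - LR * (sumb / bs);
        end
    end
    % W and b converge towards 2 and 1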
Generalization
• Overfitting, network pruning
[Figure omitted; (c) The MathWorks, Matlab help]
Strategies
• Regularization
  ° 1) "trainbr" (Bayesian regularization)
    » the Bias/Variance Dilemma
  ° 2) specific adjustment of weights
    » many techniques suggested, e.g. net.performFcn = 'msereg' + corresponding parameters (a sketch follows this list), where
      MSEREG = A * MSE + (1 - A) * MSW
      MSW = (1/N) * SUM(W^2)
    » decreases weights and biases
• Early stopping
  ° 3 sets (training, validation, testing; e.g. 40:30:30)
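A minimal sketch of the 'msereg' adjustment above, in the classic Neural Network Toolbox syntax used elsewhere in these slides; the toy data, network size, and ratio value A = 0.5 are illustrative assumptions.

    p = -1:0.1:1;                              % placeholder data (assumed)
    t = sin(2*pi*p) + 0.1*randn(size(p));
    net = newff(minmax(p), [10 1], {'tansig' 'purelin'}, 'trainlm');
    net.performFcn = 'msereg';                 % MSEREG = A*MSE + (1-A)*MSW
    net.performParam.ratio = 0.5;              % the parameter A
    net = train(net, p, t);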
Early stopping
• After some training epochs, calculate the validation error
  ° with the synaptic weights fixed
• Depending on that error, continue either with training or with testing (a sketch of this rule follows the figure below)
[Figure: MSE of the training and validation samples vs. number of epochs, with the early stopping point marked.]
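A runnable sketch of this rule on a toy task (all names, the data, and the deliberately oversized degree-9 polynomial model are assumptions): after each epoch the validation MSE is computed with the weights fixed, and the weights from the best-validation epoch are kept as the early stopping point.

    x = linspace(-1, 1, 30);
    d = sin(pi*x) + 0.2*randn(1, 30);
    itr = 1:2:30;  ival = 2:2:30;            % training / validation samples
    V = x(:) .^ (0:9);                       % degree-9 polynomial features
    w = zeros(10, 1);  LR = 0.05;  bestVal = Inf;
    for epoch = 1:3000
        e = V(itr, :) * w - d(itr)';         % error on the training sample
        w = w - LR * V(itr, :)' * e / numel(itr);
        valMSE = mean((V(ival, :) * w - d(ival)').^2);  % weights fixed
        if valMSE < bestVal                  % still improving: keep training
            bestVal = valMSE;  bestW = w;    % remember early stopping point
        end
    end
    w = bestW;                               % continue with testing from here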
Bayesian regularization
[Figure omitted; (c) The MathWorks, Matlab help]
Early stopping
[Figure omitted; (c) The MathWorks, Matlab help]
Matlab example 1/4

The goal is to determine serum cholesterol levels from measurements of the spectral content of a blood sample. There are 264 patients for whom we have measurements at 21 wavelengths of the spectrum. For the same patients we also have measurements of hdl, ldl, and vldl cholesterol levels, based on serum separation.
    load choles_all
    [pn,meanp,stdp,tn,meant,stdt] = prestd(p,t);  % normalize to zero mean, unit variance
    [ptrans,transMat] = prepca(pn,0.001);         % PCA; drop components below 0.1% of variance
    [R,Q] = size(ptrans)                          % R = 4, Q = 264
    iitst = 2:4:Q;                                % every 4th sample for testing,
    iival = 4:4:Q;                                % every 4th for validation,
    iitr = [1:4:Q 3:4:Q];                         % the remaining half for training
    val.P = ptrans(:,iival);  val.T = tn(:,iival);
    test.P = ptrans(:,iitst); test.T = tn(:,iitst);
    ptr = ptrans(:,iitr);     ttr = tn(:,iitr);
Matlab example 2/4

    net = newff(minmax(ptr),[5 3],{'tansig' 'purelin'},'trainlm');
    [net,tr] = train(net,ptr,ttr,[],[],val,test);
    TRAINLM, Epoch 0/100, MSE 3.11023/0, Gradient 804.959/1e-10
    TRAINLM, Epoch 15/100, MSE 0.330295/0, Gradient 104.219/1e-10
    TRAINLM, Validation stop.

    plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
    legend('Training','Validation','Test',-1);
    ylabel('Squared Error'); xlabel('Epoch')

    an = sim(net,ptrans);                % simulate on the whole data set
    a = poststd(an,meant,stdt);          % un-normalize the outputs
    for i = 1:3
        figure(i)
        [m(i),b(i),r(i)] = postreg(a(i,:),t(i,:));  % regression analysis per output
    end
Matlab example 3/4
[Figure omitted; (c) The MathWorks, Matlab help]
Matlab example 4/4
[Figure: regression of network outputs against targets. hdl: R = 0.886, ldl: R = 0.862, vldl: R = 0.563. (c) The MathWorks, Matlab help]
Cover’s separability theorem
• A pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space.
[Figure: a scatter of two pattern classes, X and O, in the (x1, x2) plane, with X = (x1, x2).]

Separating surfaces built from increasing numbers of basis functions:

Linear (# of basis functions: 2):
    Φ(X) = [x1, x2]
    a1*x1 + a2*x2 + a0 = 0

Quadratic, no cross term (# of basis functions: 4):
    Φ(X) = [x1^2, x2^2, x1, x2]
    a1*x1^2 + a2*x2^2 + a3*x1 + a4*x2 + a0 = 0

Full quadratic (# of basis functions: 5):
    Φ(X) = [x1^2, x2^2, x1*x2, x1, x2]
    a1*x1^2 + a2*x2^2 + a3*x1*x2 + a4*x1 + a5*x2 + a0 = 0
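As a quick check of the claim, this sketch (all names assumed) maps the four XOR points through the five quadratic basis functions above and finds an exact separating linear surface in the transformed space:

    X = [0 0 1 1; 0 1 0 1];                  % the four XOR points
    y = [-1 1 1 -1];                         % the two classes as +/-1
    Phi = [X(1,:).^2; X(2,:).^2; X(1,:).*X(2,:); X];  % [x1^2 x2^2 x1x2 x1 x2]
    A = [Phi; ones(1, 4)];                   % append the bias term a0
    w = pinv(A') * y';                       % minimum-norm exact solution
    disp(sign(w' * A))                       % prints -1 1 1 -1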
Radial Basis Function (RBF) networks
[Figure: the Gaussian basis function plotted for spreads s = 0.5, 1.0, 1.5.]

radbas(n) = exp(-n^2)

Architecture:
[Figure: RBF network architecture.]
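A small sketch of the spread plot described above; scaling the argument as n/s is an illustrative stand-in for the spread parameter, not the toolbox's exact internal scaling.

    n = -3:0.01:3;
    hold on
    for s = [0.5 1.0 1.5]
        plot(n, exp(-(n./s).^2))             % wider Gaussian for larger s
    end
    legend('s = 0.5', 's = 1.0', 's = 1.5');  hold off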
Structure of RBF Networks
• Input layer
• Hidden layer
  ° hidden units provide a set of basis functions
  ° the higher the dimension, the more likely linear separability (by a linear combination of the basis functions)
• Output layer
  ° linear combination of the hidden functions
XOR example
With two Gaussian basis functions Φ1, Φ2 the four patterns map as follows:

    x1  x2 |  y | Φ1(x)  Φ2(x)
     0   0 |  0 | 0.13   1
     0   1 |  1 | 0.36   0.36
     1   0 |  1 | 0.36   0.36
     1   1 |  0 | 1      0.13

[Figure: network with inputs x1 and x2 feeding the two basis functions Φ1(x), Φ2(x), whose weighted combination ("?") gives the output y'. (c) The MathWorks, Matlab help]

This does the trick: in the (Φ1, Φ2) plane the patterns (0,1) and (1,0) map to the same point, and the two classes become linearly separable.
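The values in this table can be reproduced with the short sketch below; the Gaussian centers (1,1) and (0,0) are inferred from the tabulated values, not stated on the slide.

    X = [0 0 1 1; 0 1 0 1];                  % the four XOR inputs, by column
    phi = @(c) exp(-sum((X - c).^2, 1));     % Gaussian of squared distance to c
    disp([phi([1; 1]); phi([0; 0])])         % rows: 0.1353 0.3679 0.3679 1.0000
                                             %       1.0000 0.3679 0.3679 0.1353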
RBF, well-estimated
RBF in Matlab:

    net = newrbe(P,T,SPREAD)
    net = newrb(P,T,GOAL,SPREAD)
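A hedged usage sketch of the two calls above on a toy 1-D fit; the data and the SPREAD value are illustrative assumptions.

    P = -1:0.1:1;
    T = sin(2*pi*P) + 0.05*randn(size(P));
    net = newrbe(P, T, 0.2);                 % exact fit: one neuron per sample
    plot(P, T, 'o', P, sim(net, P), '-')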
RBF, too few BF
RBF, too small stdev
RBF, too large stdev
NN taxonomy 1/2
1) Paradigm
  ° Supervised
  ° Unsupervised
2) Learning Rule
  ° Error-correction
  ° Memory-based
  ° Hebbian
  ° Competitive
  ° Boltzmann

According to: Jain, A. K. and Mao, J. (1996). Artificial Neural Networks: A Tutorial. IEEE Computer, vol. 29, no. 3, pp. 31-44.
NN taxonomy 2/2
3) Learning Algorithm
  ° Perceptron
  ° Back-propagation (BP)
  ° Kohonen SOM, ...
4) Network Architecture
  ° Feed-forward (FF)
  ° Recurrent (REC)
5) Task
  ° Pattern classification
  ° Time-series modeling, ...