Radial Basis-Function Networks
![Page 1: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/1.jpg)
Radial Basis-Function Networks
![Page 2: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/2.jpg)
Back-Propagation
Stochastic Back-Propagation Algorithm
Step by Step Example
Radial Basis-Function Networks
Gaussian response function
Location of center u
Determining sigma
Why does RBF network work
![Page 3: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/3.jpg)
Back-propagation
The algorithm gives a prescription for changing the weights $w_{ij}$ in any feed-forward network to learn a training set of input-output pairs $\{x^d, t^d\}$
We consider a simple two-layer network
![Page 4: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/4.jpg)
[Figure: two-layer network with inputs $x_k$: $x_1, x_2, x_3, x_4, x_5$]
![Page 5: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/5.jpg)
Given the pattern $x^d$, the hidden unit j receives a net input
$$net_j^d = \sum_{k=1}^{5} w_{jk} x_k^d$$
and produces the output
$$V_j^d = f(net_j^d) = f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)$$
![Page 6: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/6.jpg)
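Output unit i thus receives
$$net_i^d = \sum_{j=1}^{3} W_{ij} V_j^d = \sum_{j=1}^{3} \left(W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right)$$
and produces the final output
$$o_i^d = f(net_i^d) = f\left(\sum_{j=1}^{3} W_{ij} V_j^d\right) = f\left(\sum_{j=1}^{3} \left(W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right)\right)$$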
![Page 7: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/7.jpg)
In our example E becomes
$$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} (t_i^d - o_i^d)^2$$
$$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left(t_i^d - f\left(\sum_{j=1}^{3} W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} x_k^d\right)\right)\right)^2$$
E[w] is differentiable given f is differentiable
Gradient descent can be applied
![Page 8: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/8.jpg)
Consider a network with M layers m = 1, 2, ..., M
$V_i^m$ stands for the output of the ith unit of the mth layer
$V_i^0$ is a synonym for $x_i$, the ith input
The superscript m indexes layers, not patterns
$w_{ij}^m$ means the connection from $V_j^{m-1}$ to $V_i^m$
![Page 9: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/9.jpg)
Stochastic Back-Propagation Algorithm (mostly used)
1. Initialize the weights to small random values
2. Choose a pattern $x^d$ and apply it to the input layer: $V_k^0 = x_k^d$ for all k
3. Propagate the signal through the network
$$V_i^m = f(net_i^m) = f\left(\sum_j w_{ij}^m V_j^{m-1}\right)$$
4. Compute the deltas for the output layer
$$\delta_i^M = f'(net_i^M)(t_i^d - V_i^M)$$
5. Compute the deltas for the preceding layers for m = M, M-1, ..., 2
$$\delta_i^{m-1} = f'(net_i^{m-1}) \sum_j w_{ji}^m \delta_j^m$$
6. Update all connections
$$\Delta w_{ij}^m = \eta \delta_i^m V_j^{m-1}$$
$$w_{ij}^{new} = w_{ij}^{old} + \Delta w_{ij}$$
7. Go to 2 and repeat for the next pattern
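The seven steps above can be sketched in plain Python. This is only an illustrative sketch under some assumptions that the slide does not fix: the sigmoid is used as f, there are no bias terms, the layer sizes 5-3-2 and the two patterns are borrowed from the example in this deck, and the learning rate, seed and number of iterations are arbitrary choices. The names `forward`, `train_step` and `error` are hypothetical helpers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, x):
    """Return the outputs of all layers; V[0] is the input pattern."""
    V = [list(x)]
    for W in weights:  # W[i][j] is the connection from unit j to unit i
        prev = V[-1]
        V.append([sigmoid(sum(w * v for w, v in zip(row, prev))) for row in W])
    return V


def train_step(weights, x, t, eta):
    """Steps 2 to 6 for one pattern (x, t); the weights are changed in place."""
    V = forward(weights, x)
    # Step 4: output-layer deltas; for the sigmoid, f'(net) = V * (1 - V).
    delta = [v * (1 - v) * (ti - v) for v, ti in zip(V[-1], t)]
    # Steps 5 and 6: walk backwards through the layers.
    for m in range(len(weights) - 1, -1, -1):
        W, prev = weights[m], V[m]
        if m > 0:
            # Deltas of the preceding layer, computed with the old weights.
            prev_delta = [
                prev[j] * (1 - prev[j])
                * sum(W[i][j] * delta[i] for i in range(len(W)))
                for j in range(len(prev))
            ]
        for i in range(len(W)):
            for j in range(len(prev)):
                W[i][j] += eta * delta[i] * prev[j]
        if m > 0:
            delta = prev_delta


def error(weights, patterns):
    """Sum-of-squares error E over all patterns."""
    return 0.5 * sum(
        (ti - oi) ** 2
        for x, t in patterns
        for ti, oi in zip(t, forward(weights, x)[-1])
    )


random.seed(0)
sizes = [5, 3, 2]
# Step 1: small random initial weights.
weights = [
    [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes, sizes[1:])
]
patterns = [([1, 1, 0, 0, 0], [1, 0]), ([0, 0, 0, 1, 1], [0, 1])]

error_before = error(weights, patterns)
for _ in range(2000):  # Step 7: repeat with the next (here: random) pattern
    x, t = random.choice(patterns)
    train_step(weights, x, t, eta=0.5)
error_after = error(weights, patterns)
print(error_before, error_after)
```

The deltas of the preceding layer are computed before the weights of the current layer are changed, so every delta in one step refers to the same (old) set of weights.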
![Page 10: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/10.jpg)
Example
w1={w11=0.1,w12=0.1,w13=0.1,w14=0.1,w15=0.1}
w2={w21=0.1,w22=0.1,w23=0.1,w24=0.1,w25=0.1}
w3={w31=0.1,w32=0.1,w33=0.1,w34=0.1,w35=0.1}
W1={W11=0.1,W12=0.1,W13=0.1}
W2={W21=0.1,W22=0.1,W23=0.1}
x1={1,1,0,0,0}; t1={1,0}
x2={0,0,0,1,1}; t2={0,1}
$$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$
$$f'(x) = \sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))$$
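The derivative identity for the sigmoid can be checked numerically. The following is a small sketch, not part of the original slides; `central_difference` is a hypothetical helper and the step size `h` is an arbitrary choice.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # Analytic form from the slide: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x, h=1e-6):
    # Numerical approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 0.2, 3.0):
    print(x, sigmoid_prime(x), central_difference(sigmoid, x))
```

Both columns should agree to many decimal places, which is why back-propagation can reuse the unit outputs instead of evaluating a separate derivative function.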
![Page 11: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/11.jpg)
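$$net_1^1 = \sum_{k=1}^{5} w_{1k} x_k^1, \qquad V_1^1 = f(net_1^1) = \frac{1}{1 + e^{-net_1^1}}$$
$$net_2^1 = \sum_{k=1}^{5} w_{2k} x_k^1, \qquad V_2^1 = f(net_2^1) = \frac{1}{1 + e^{-net_2^1}}$$
$$net_3^1 = \sum_{k=1}^{5} w_{3k} x_k^1, \qquad V_3^1 = f(net_3^1) = \frac{1}{1 + e^{-net_3^1}}$$
$net_1^1$ = 1*0.1 + 1*0.1 + 0*0.1 + 0*0.1 + 0*0.1 = 0.2
$V_1^1$ = f($net_1^1$) = 1/(1+exp(-0.2)) = 0.54983
$V_2^1$ = f($net_2^1$) = 1/(1+exp(-0.2)) = 0.54983
$V_3^1$ = f($net_3^1$) = 1/(1+exp(-0.2)) = 0.54983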
![Page 12: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/12.jpg)
$$net_1^1 = \sum_{j=1}^{3} W_{1j} V_j^1, \qquad o_1^1 = f(net_1^1) = \frac{1}{1 + e^{-net_1^1}}$$
$$net_2^1 = \sum_{j=1}^{3} W_{2j} V_j^1, \qquad o_2^1 = f(net_2^1) = \frac{1}{1 + e^{-net_2^1}}$$
$net_1^1$ = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
$o_1^1$ = f($net_1^1$) = 1/(1+exp(-0.16495)) = 0.54114
$net_2^1$ = 0.54983*0.1 + 0.54983*0.1 + 0.54983*0.1 = 0.16495
$o_2^1$ = f($net_2^1$) = 1/(1+exp(-0.16495)) = 0.54114
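The forward pass for pattern x1 can be reproduced with a few lines of Python. This is a sketch that assumes the example's weights (all 0.1), the sigmoid as f, and no bias terms; the variable names are chosen here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


w = [[0.1] * 5 for _ in range(3)]  # hidden weights w_jk (3 hidden units, 5 inputs)
W = [[0.1] * 3 for _ in range(2)]  # output weights W_ij (2 outputs, 3 hidden units)
x1 = [1, 1, 0, 0, 0]

# Hidden layer: net_j = sum_k w_jk * x_k, V_j = f(net_j)
net_hidden = [sum(w_jk * x_k for w_jk, x_k in zip(row, x1)) for row in w]
V = [sigmoid(n) for n in net_hidden]

# Output layer: net_i = sum_j W_ij * V_j, o_i = f(net_i)
net_out = [sum(W_ij * V_j for W_ij, V_j in zip(row, V)) for row in W]
o = [sigmoid(n) for n in net_out]

print([round(v, 5) for v in V])  # [0.54983, 0.54983, 0.54983]
print([round(v, 5) for v in o])  # [0.54114, 0.54114]
```

Because all weights start at the same value, every hidden unit and every output unit produces the same activation for this pattern.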
![Page 13: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/13.jpg)
We will use stochastic gradient descent with η = 1
$$\Delta W_{ij} = \eta \sum_{d=1}^{m} (t_i^d - o_i^d) f'(net_i^d) \cdot V_j^d$$
$$\Delta W_{ij} = (t_i - o_i) f'(net_i) V_j$$
$$f'(x) = \sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))$$
$$\Delta W_{ij} = (t_i - o_i) \sigma(net_i)(1 - \sigma(net_i)) V_j$$
$$\delta_i = (t_i - o_i) \sigma(net_i)(1 - \sigma(net_i))$$
$$\Delta W_{ij} = \delta_i V_j$$
![Page 14: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/14.jpg)
$$\delta_1 = (t_1 - o_1) \sigma(net_1)(1 - \sigma(net_1)), \qquad \Delta W_{1j} = \delta_1 V_j$$
δ1 = (1 - 0.54114)*(1/(1+exp(-0.16495)))*(1-(1/(1+exp(-0.16495)))) = 0.11394
$$\delta_2 = (t_2 - o_2) \sigma(net_2)(1 - \sigma(net_2)), \qquad \Delta W_{2j} = \delta_2 V_j$$
δ2 = (0 - 0.54114)*(1/(1+exp(-0.16495)))*(1-(1/(1+exp(-0.16495)))) = -0.13437
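The two output-layer deltas can be recomputed as a quick check. This sketch takes the net input 0.16495 and the target t1 = {1, 0} from the example; `deltas_out` is a name introduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


net_out = 0.16495  # net input of both output units for pattern x1
t = [1, 0]         # target t1

o = sigmoid(net_out)
# delta_i = (t_i - o_i) * sigma(net_i) * (1 - sigma(net_i))
deltas_out = [(t_i - o) * o * (1 - o) for t_i in t]

print([round(d, 5) for d in deltas_out])  # [0.11394, -0.13437]
```

The sign of each delta shows the direction of the correction: output 1 is too low for its target 1, output 2 is too high for its target 0.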
![Page 15: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/15.jpg)
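$$\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \cdot W_{ij} f'(net_j) \cdot x_k$$
$$\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \cdot W_{ij} \sigma(net_j)(1 - \sigma(net_j)) \cdot x_k$$
$$\delta_j = \sigma(net_j)(1 - \sigma(net_j)) \sum_{i=1}^{2} W_{ij} \delta_i$$
$$\Delta w_{jk} = \delta_j \cdot x_k$$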
![Page 16: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/16.jpg)
$$\delta_1 = \sigma(net_1)(1 - \sigma(net_1)) \sum_{i=1}^{2} W_{i1} \delta_i$$
$$\delta_2 = \sigma(net_2)(1 - \sigma(net_2)) \sum_{i=1}^{2} W_{i2} \delta_i$$
$$\delta_3 = \sigma(net_3)(1 - \sigma(net_3)) \sum_{i=1}^{2} W_{i3} \delta_i$$
δ1 = 1/(1+exp(-0.2))*(1 - 1/(1+exp(-0.2)))*(0.1*0.11394 + 0.1*(-0.13437))
δ1 = -5.0568e-04
δ2 = -5.0568e-04
δ3 = -5.0568e-04
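The hidden-layer deltas follow from the output-layer deltas by one weighted sum per hidden unit. The sketch below plugs in the rounded values from the example (0.11394 and -0.13437), so the result matches the slide only to the displayed precision; `deltas_hidden` is a name introduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


net_hidden = 0.2                         # net input of every hidden unit for x1
deltas_out = [0.11394, -0.13437]         # output-layer deltas (rounded)
W = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]   # output weights W_ij

s = sigmoid(net_hidden)
# delta_j = sigma(net_j) * (1 - sigma(net_j)) * sum_i W_ij * delta_i
deltas_hidden = [
    s * (1 - s) * sum(W[i][j] * deltas_out[i] for i in range(2))
    for j in range(3)
]

print(deltas_hidden)  # each value is close to -5.0568e-04
```

The hidden deltas are much smaller than the output deltas because the two output errors have opposite signs and nearly cancel in the sum.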
![Page 17: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/17.jpg)
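First adaptation for x1
(one epoch is an adaptation over all training patterns, in our case x1 and x2)
Hidden-layer deltas: δ1 = -5.0568e-04, δ2 = -5.0568e-04, δ3 = -5.0568e-04
Output-layer deltas: δ1 = 0.11394, δ2 = -0.13437
Inputs: x1 = 1, x2 = 1, x3 = 0, x4 = 0, x5 = 0
Hidden outputs: V1 = 0.54983, V2 = 0.54983, V3 = 0.54983
$$\Delta W_{ij} = \delta_i V_j$$
$$\Delta w_{jk} = \delta_j \cdot x_k$$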
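Applying the two update rules to the numbers of this first adaptation gives the new weights. This is a sketch using the rounded values from the example and η = 1; the names `W_new` and `w_new` are introduced here and do not appear in the slides.

```python
eta = 1.0
x = [1, 1, 0, 0, 0]                 # pattern x1
V = [0.54983] * 3                   # hidden outputs
deltas_out = [0.11394, -0.13437]    # output-layer deltas
deltas_hidden = [-5.0568e-04] * 3   # hidden-layer deltas

W = [[0.1] * 3 for _ in range(2)]   # old output weights W_ij
w = [[0.1] * 5 for _ in range(3)]   # old hidden weights w_jk

# W_ij(new) = W_ij(old) + eta * delta_i * V_j
W_new = [[W[i][j] + eta * deltas_out[i] * V[j] for j in range(3)] for i in range(2)]
# w_jk(new) = w_jk(old) + eta * delta_j * x_k
w_new = [[w[j][k] + eta * deltas_hidden[j] * x[k] for k in range(5)] for j in range(3)]

print(W_new[0], W_new[1])
print(w_new[0])
```

Only the hidden weights attached to the active inputs x1 and x2 change; the weights for x3, x4 and x5 stay at 0.1 because their inputs are zero for this pattern.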
![Page 18: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/18.jpg)
Radial Basis-Function Networks
RBF networks train rapidly
No local minima problems
No oscillation
Universal approximators
Can approximate any continuous function
Share this property with feed-forward networks with a hidden layer of nonlinear neurons (units)
Disadvantage
After training they are generally slower to use
![Page 19: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/19.jpg)
![Page 20: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/20.jpg)
Gaussian response function
Each hidden layer unit computes
$$h_i = e^{-\frac{D_i^2}{2\sigma^2}}$$
$$D_i^2 = (\vec{x} - \vec{u}_i)^T (\vec{x} - \vec{u}_i)$$
x = an input vector
$u_i$ = weight vector of hidden layer neuron i
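The Gaussian response of one hidden unit can be written as a short function. This is a minimal sketch; the function name `gaussian_response` and the sample vectors are assumptions made for illustration.

```python
import math


def gaussian_response(x, u, sigma):
    """h = exp(-D^2 / (2 * sigma^2)) with D^2 = (x - u)^T (x - u)."""
    d2 = sum((xk - uk) ** 2 for xk, uk in zip(x, u))
    return math.exp(-d2 / (2.0 * sigma ** 2))


u = [1.0, 2.0]   # center (weight vector) of the hidden unit
sigma = 0.5

print(gaussian_response([1.0, 2.0], u, sigma))  # 1.0, the input sits on the center
print(gaussian_response([1.5, 2.0], u, sigma))  # exp(-0.5), the input is one sigma away
print(gaussian_response([4.0, 6.0], u, sigma))  # practically zero, far from the center
```

The response depends only on the distance between x and u, not on the direction, which is what makes the function radial.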
![Page 21: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/21.jpg)
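The output neuron produces the linear weighted sum
$$o = \sum_{i=0}^{n} w_i \cdot h_i$$
The weights have to be adapted (LMS)
$$\Delta w_i = \eta (t - o) x_i$$
where $x_i$ is the ith input of the output neuron, which here is the hidden-unit output $h_i$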
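The LMS adaptation of the output weights can be sketched for a one-dimensional input. Several things here are assumptions for illustration only: the centers, σ, the learning rate, the three training samples, and the choice to treat h_0 = 1 as a constant bias input (the sum above starts at i = 0, but the slide does not say what h_0 is). The helper names are hypothetical.

```python
import math


def hidden(x, centers, sigma):
    # h_0 = 1 is an assumed bias input; h_1..h_n are Gaussian responses.
    return [1.0] + [math.exp(-(x - u) ** 2 / (2.0 * sigma ** 2)) for u in centers]


def output(w, h):
    # Linear weighted sum o = sum_i w_i * h_i
    return sum(wi * hi for wi, hi in zip(w, h))


def lms_epoch(w, samples, centers, sigma, eta):
    # One pass over the samples with the LMS rule dw_i = eta * (t - o) * h_i
    for x, t in samples:
        h = hidden(x, centers, sigma)
        o = output(w, h)
        for i in range(len(w)):
            w[i] += eta * (t - o) * h[i]


def sse(w, samples, centers, sigma):
    return 0.5 * sum((t - output(w, hidden(x, centers, sigma))) ** 2 for x, t in samples)


centers = [0.0, 0.5, 1.0]
sigma = 0.3
samples = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
w = [0.0] * (len(centers) + 1)

sse_before = sse(w, samples, centers, sigma)
for _ in range(500):
    lms_epoch(w, samples, centers, sigma, eta=0.2)
sse_after = sse(w, samples, centers, sigma)
print(sse_before, sse_after)
```

Only the output weights are trained here; the centers and σ stay fixed, so the problem is linear in the weights.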
![Page 22: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/22.jpg)
The operation of the hidden layer
One dimensional input
$$h = e^{-\frac{(x - u)^2}{2\sigma^2}}$$
![Page 23: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/23.jpg)
Two dimensional input
![Page 24: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/24.jpg)
Every hidden neuron has a receptive field defined by the basis function
At x = u the output is maximal
The output for other values drops as x deviates from u
The output has a significant response to the input x only over a range of values of x called the receptive field
The size of the receptive field is defined by σ
u may be called the mean and σ the standard deviation
The function is radially symmetric around the mean u
![Page 25: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/25.jpg)
Location of centers u
The location of the receptive field is critical
Apply clustering to the training set
Each determined cluster center would correspond to a center u of a receptive field of a hidden neuron
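The slide does not name a clustering method. As one possible choice, the sketch below uses a very small k-means to obtain the centers u; the sample points, the naive initialization with the first k points and the fixed number of iterations are all assumptions made for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    # Naive initialization (assumption): use the first k points as centers.
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Move every center to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centers = kmeans(points, k=2)
print(centers)  # one center near (0.1, 0.1), one near (5.03, 5.0)
```

Each returned center would then serve as the vector u of one hidden neuron, so the number of clusters k fixes the number of hidden units.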
![Page 26: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/26.jpg)
Determining σ
The object is to cover the input space with receptive fields as uniformly as possible
If the spacing between centers is not uniform, it may be necessary for each hidden layer neuron to have its own σ
For hidden layer neurons whose centers are widely separated from others, σ must be large enough to cover the gap
![Page 27: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/27.jpg)
The following heuristic will perform well in practice
For each hidden layer neuron, find the RMS distance between $u_i$ and the center of its N nearest neighbors $c_j$
Assign this value to $\sigma_i$
$$RMS = \sqrt{\frac{1}{n} \cdot \sum_{k=1}^{n} \left(u_k - \frac{\sum_{l=1}^{N} c_{lk}}{N}\right)^2}$$
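One way to read the heuristic is sketched below. It assumes that n is the dimension of the input space, that the N nearest neighbors are the N closest other centers, and that the distance is taken to the mean of those neighbors; the slide's formula is garbled, so this interpretation and the name `rms_widths` are assumptions.

```python
import math


def rms_widths(centers, n_neighbors):
    """Return one sigma per center, following the RMS heuristic as read above."""
    widths = []
    for i, u in enumerate(centers):
        # All other centers, sorted by squared distance to u.
        others = [c for j, c in enumerate(centers) if j != i]
        others.sort(key=lambda c: sum((a - b) ** 2 for a, b in zip(u, c)))
        nearest = others[:n_neighbors]
        # Component-wise mean of the N nearest neighbor centers.
        mean = [sum(col) / len(nearest) for col in zip(*nearest)]
        n = len(u)
        widths.append(math.sqrt(sum((uk - mk) ** 2 for uk, mk in zip(u, mean)) / n))
    return widths


centers = [(0.0,), (1.0,), (3.0,)]
sigmas = rms_widths(centers, n_neighbors=1)
print(sigmas)  # [1.0, 1.0, 2.0]
```

The isolated center at 3.0 receives the largest σ, which matches the remark that widely separated centers need a wider receptive field to cover the gap.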
![Page 28: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/28.jpg)
![Page 29: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/29.jpg)
![Page 30: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/30.jpg)
Why does an RBF network work?
The hidden layer applies a nonlinear transformation from the input space to the hidden space
In the hidden space a linear discrimination can be performed
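A small illustration of this idea is the XOR problem, which is not linearly separable in the input space. The example is not taken from the slides: the two centers, the value of σ and the hand-picked separating line are assumptions chosen to make the effect visible.

```python
import math


def rbf(x, u, sigma):
    d2 = sum((a - b) ** 2 for a, b in zip(x, u))
    return math.exp(-d2 / (2.0 * sigma ** 2))


centers = [(1, 1), (0, 0)]     # assumed centers of the two hidden units
sigma = 1 / math.sqrt(2)       # makes 2 * sigma^2 = 1, so h = exp(-D^2)
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]         # XOR


def classify(x):
    h1, h2 = (rbf(x, u, sigma) for u in centers)
    # Hand-picked linear discriminant in the hidden space: h1 + h2 = 0.9
    return 1 if h1 + h2 < 0.9 else 0


predictions = [classify(x) for x in inputs]
for x, t, p in zip(inputs, targets, predictions):
    hidden = [round(rbf(x, u, sigma), 4) for u in centers]
    print(x, hidden, t, p)
```

In the hidden space the inputs (0, 1) and (1, 0) map to the same point, while (0, 0) and (1, 1) land on the other side of the line, so a single linear output unit is enough.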
![Page 31: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/31.jpg)
![Page 32: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/32.jpg)
![Page 33: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/33.jpg)
![Page 34: Radial Basis-Function Networks](https://reader036.fdocuments.us/reader036/viewer/2022062500/5681584e550346895dc5a90b/html5/thumbnails/34.jpg)
Support Vector Machines