Self-Organizing Map
Learning without Examples
Learning Objectives
1. Introduction
2. Hamming Net and MAXNET
3. Clustering
4. Feature Map
5. Self-organizing Feature Map
6. Conclusion
Introduction
1. Learning without examples
2. Data are input to the system one by one
3. Mapping the data into a one- or higher-dimensional space
4. Competitive learning
Hamming Distance(1/5)
1. Define HD (Hamming Distance) between two binary codes A and B of the same length as the number of places in which a_i and b_i differ.
2. Ex:
   A: [ 1 -1  1  1  1 ]
   B: [ 1  1 -1 -1  1 ]
   => HD(A, B) = 3
Hamming Distance(2/5)
Hamming Distance(3/5)
Hamming Distance(4/5)
3. Or, HD can be defined as the lowest number of edges that must be traversed between the two relevant codes.
4. If A and B are bipolar binary vectors, then the scalar product of A, B:
Hamming Distance(5/5)
A^t B = (bits agreed) − (bits different)
      = (n − HD(A, B)) − HD(A, B)
      = n − 2 · HD(A, B)
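The identity above can be checked directly; a minimal sketch in plain Python, using the A and B from the example on the previous slide:

```python
# Hamming distance between two same-length codes, and the bipolar
# identity A^t B = n - 2 * HD(A, B) from the slide.

def hamming_distance(a, b):
    """Number of places in which a and b differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

A = [1, -1, 1, 1, 1]
B = [1, 1, -1, -1, 1]

hd = hamming_distance(A, B)                 # HD(A, B) = 3
dot = sum(ai * bi for ai, bi in zip(A, B))  # scalar product A^t B
n = len(A)
print(hd, dot, n - 2 * hd)                  # 3 -1 -1
```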
Hamming Net and MAXNET(1/30)
1. For a two-layer classifier of bipolar binary vectors with p classes and p output neurons, the strongest response of a neuron indicates the minimum HD value between the input and the category this neuron represents.
Hamming Net and MAXNET(2/30)
Hamming Net and MAXNET(3/30)
2. The MAXNET operates on the output values of the Hamming network, suppressing all outputs except the maximum one.
3. For the Hamming net in the next figure, we have input vector X
   p classes => p neurons for output
   output vector Y = [y_1, …, y_p]
Hamming Net and MAXNET(4/30)
Hamming Net and MAXNET(5/30)
4. For any output neuron m, m = 1, …, p, we have
   W_m = [w_m1, w_m2, …, w_mn]^t,  m = 1, 2, …, p
   to be the weights between input X and each output neuron.
5. Also, assume that for each class m one has the prototype vector S^(m) as the standard to be matched.
Hamming Net and MAXNET(6/30)
6. For classifying p classes, one can say the m'th output is 1 if and only if
   X = S^(m) => happens only when W^(m) = S^(m)
   The outputs of the classifier are
   X^t S^(1), X^t S^(2), …, X^t S^(m), …, X^t S^(p)
   So when X = S^(m), the m'th output is n and the other outputs are smaller than n.
Hamming Net and MAXNET(7/30)
7. X^t S^(m) = (n − HD(X, S^(m))) − HD(X, S^(m))
   ∴ ½ X^t S^(m) = n/2 − HD(X, S^(m))
   So the weight matrix W_H = ½ S:

   W_H = ½ [ S_1^(1)  S_2^(1)  …  S_n^(1)
             S_1^(2)  S_2^(2)  …  S_n^(2)
              ⋮         ⋮           ⋮
             S_1^(p)  S_2^(p)  …  S_n^(p) ]
Hamming Net and MAXNET(8/30)
8. By giving a fixed bias n/2 to the input, then
   net_m = ½ X^t S^(m) + n/2   for m = 1, 2, …, p
   or
   net_m = n − HD(X, S^(m))
Hamming Net and MAXNET(9/30)
9. To scale the outputs from the range 0~n down to 0~1, one can apply the transfer function
   f(net_m) = net_m / n   for m = 1, 2, …, p
Hamming Net and MAXNET(10/30)
Hamming Net and MAXNET(11/30)
10. So the node with the highest output is the node with the smallest HD between the input and the prototype vectors S^(1), …, S^(p), i.e.
    f(net_m) = 1
    for the other nodes
    f(net_m) < 1
Hamming Net and MAXNET(12/30)
11. MAXNET is employed as a second layer only for the cases where an enhancement of the initial dominant response of the m'th node is required. I.e., the purpose of MAXNET is to make max{ y_1, …, y_p } equal to 1 and the others equal to 0.
Hamming Net and MAXNET(13/30)
Hamming Net and MAXNET(14/30)
12. To achieve this, one can let
    y_i^(k+1) = f( y_i^(k) − ε Σ_{j≠i} y_j^(k) )
    where i = 1, …, p; j = 1, …, p; j ≠ i
    ∵ y_i^(0), the output of the Hamming net, is bounded by 0 ≤ y_i ≤ 1 for i = 1, …, p.
Hamming Net and MAXNET(15/30)
13. So ε is bounded by 0 < ε < 1/p
    (ε: lateral interaction coefficient)
    and

    W_M = [  1  −ε  ⋯  −ε
            −ε   1  ⋯  −ε
             ⋮        ⋱  ⋮
            −ε  −ε  ⋯   1 ]   (p × p)
Hamming Net and MAXNET(16/30)
And
    net^(k) = W_M y^(k)
Hamming Net and MAXNET(17/30)
14. So the transfer function
    f(net) = net,  net ≥ 0
    f(net) = 0,    net < 0
Hamming Net and MAXNET(18/30)
Hamming Net and MAXNET(19/30)
Ex: To have a Hamming net for classifying C, I, T,
then
S^(1) = [  1 1  1  1 -1 -1  1 1  1 ]^t
S^(2) = [ -1 1 -1 -1  1 -1 -1 1 -1 ]^t
S^(3) = [  1 1  1 -1  1 -1 -1 1 -1 ]^t
Hamming Net and MAXNET(20/30)
Hamming Net and MAXNET(21/30)
So
    W_H = ½ [  1  1  1  1 −1 −1  1  1  1
              −1  1 −1 −1  1 −1 −1  1 −1
               1  1  1 −1  1 −1 −1  1 −1 ]
and
    net = W_H X + (n/2) [1 1 1]^t,  n = 9
Hamming Net and MAXNET(22/30)
For
    X = [1 1 1 1 1 1 1 1 1]^t
    net = [7 3 5]^t
And
    Y = f(net) = [7/9 3/9 5/9]^t ≈ [0.777 0.333 0.555]^t
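The forward pass above can be sketched in a few lines (NumPy assumed; the prototypes and test input are the ones from this example):

```python
# Hamming-net forward pass for the C/I/T example (illustrative sketch).
import numpy as np

S = np.array([
    [ 1, 1,  1,  1, -1, -1,  1, 1,  1],  # S^(1): "C"
    [-1, 1, -1, -1,  1, -1, -1, 1, -1],  # S^(2): "I"
    [ 1, 1,  1, -1,  1, -1, -1, 1, -1],  # S^(3): "T"
])
n = S.shape[1]                 # n = 9
W_H = 0.5 * S                  # weight matrix W_H = (1/2) S

X = np.ones(n)                 # test input [1 1 ... 1]^t
net = W_H @ X + n / 2          # net_m = (1/2) X^t S^(m) + n/2
Y = net / n                    # scale from 0..n down to 0..1
print(net)                     # [7. 3. 5.]
print(Y)                       # [0.777... 0.333... 0.555...]
```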
Hamming Net and MAXNET(23/30)
Input to MAXNET and select ε = 0.2 < 1/3 (= 1/p)
So
    W_M = [   1   −1/5  −1/5
            −1/5    1   −1/5
            −1/5  −1/5    1  ]
Hamming Net and MAXNET(24/30)
And
    net^(k) = W_M Y^(k)
    Y^(k+1) = f(net^(k))
Hamming Net and MAXNET(25/30)
k = 0
    net^(0) = [   1   −0.2  −0.2 ] [0.777]   [0.599]
              [ −0.2    1   −0.2 ] [0.333] = [0.067]
              [ −0.2  −0.2    1  ] [0.555]   [0.333]
    Y^(1) = f(net^(0)) = [0.599 0.067 0.333]^t
Hamming Net and MAXNET(27/30)
k = 1
    net^(1) = [0.520 −0.120 0.200]^t
    Y^(2) = [0.520 0 0.200]^t
Hamming Net and MAXNET(28/30)
k = 2
    net^(2) = [0.480 −0.144 0.096]^t
    Y^(3) = [0.480 0 0.096]^t
Hamming Net and MAXNET(29/30)
k = 3
    net^(3) = [0.461 −0.115 0]^t
    Y^(4) = [0.461 0 0]^t
Only node 1 remains active, so the input is assigned to class 1 ("C").
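The iterations above can be reproduced with the recurrence Y^(k+1) = f(W_M Y^(k)); a sketch (NumPy assumed, ε = 0.2 and the initial Y taken from the Hamming-net output of this example):

```python
# MAXNET recurrence for this example: suppress all but the maximum output.
import numpy as np

p, eps = 3, 0.2
W_M = (1 + eps) * np.eye(p) - eps * np.ones((p, p))  # 1 on diagonal, -eps off
y = np.array([7/9, 3/9, 5/9])   # Hamming-net output for X = [1 ... 1]^t

k = 0
while np.count_nonzero(y) > 1:      # stop when a single node stays active
    y = np.maximum(W_M @ y, 0.0)    # f(net) = net for net >= 0, else 0
    k += 1

print(k, int(np.argmax(y)))         # converges in a few steps; winner is node 0 ("C")
```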
Hamming Net and MAXNET(30/30)
Summary: the Hamming network only tells which class the input most likely belongs to; it does not restore the distorted pattern.
Clustering—Unsupervised Learning(1/20)
1. Introduction
   a. to categorize or cluster data
   b. grouping similar objects and separating dissimilar ones
2. Assuming the pattern set {X_1, X_2, …, X_N} is submitted, determine the decision function required to identify possible clusters => by similarity rules
Clustering—Unsupervised Learning(2/20)
Euclidean distance:
    ||X − X_i|| = [ (X − X_i)^t (X − X_i) ]^(1/2)
Angle between X and X_i:
    cos ψ = X^t X_i / ( ||X|| ||X_i|| )
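The two similarity rules can be written as small helpers; a sketch (NumPy assumed, with illustrative vectors that are far apart in distance yet identical in direction):

```python
# Euclidean distance and cosine of the angle, as on the slide.
import numpy as np

def euclidean(x, xi):
    d = x - xi
    return float(np.sqrt(d @ d))    # [(X - X_i)^t (X - X_i)]^(1/2)

def cos_psi(x, xi):
    return float(x @ xi) / (np.linalg.norm(x) * np.linalg.norm(xi))

x  = np.array([3.0, 4.0])
xi = np.array([6.0, 8.0])
print(euclidean(x, xi))             # 5.0  (far apart in distance...)
print(cos_psi(x, xi))               # 1.0  (...but pointing the same way)
```

Which measure is appropriate depends on whether vector length carries information; the winner-take-all scheme below normalizes weights precisely so that the scalar product behaves like the angle measure.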
Clustering—Unsupervised Learning(3/20)
Clustering—Unsupervised Learning(4/20)
3. Winner-Take-All learning
Assume input vectors are to be classified into one of a specified number p of categories according to the clusters detected in the training set {X_1, X_2, …, X_N}.
Clustering—Unsupervised Learning(5/20)
Clustering—Unsupervised Learning(6/20)
Kohonen Network
    Y = f(W X)
for vector size = n:
    W = [ W_1^t
          W_2^t
           ⋮
          W_p^t ]
and
    W_i = [w_i1, w_i2, …, w_in]^t   for i = 1, …, p
Clustering—Unsupervised Learning(7/20)
Prior to the learning, normalization of all weight vectors is required.
So
    Ŵ_i = W_i / ||W_i||
Clustering—Unsupervised Learning(8/20)
The meaning of training is to find Ŵ_m from all Ŵ_i such that
    ||X − Ŵ_m|| = min_{i=1,…,p} ||X − Ŵ_i||
i.e., the m'th neuron with weight vector Ŵ_m is the closest approximation of the current X.
Clustering—Unsupervised Learning(9/20)
From
    ||X − Ŵ_m|| = min_{i=1,…,p} ||X − Ŵ_i||
and
    ||X − Ŵ_i||² = X^t X − 2 Ŵ_i^t X + 1
it follows that
    min_{i=1,…,p} ||X − Ŵ_i||  ⟺  max_{i=1,…,p} Ŵ_i^t X
Clustering—Unsupervised Learning(10/20)
So
    Ŵ_m^t X = max_{i=1,…,p} ( Ŵ_i^t X )
The m'th neuron is the winning neuron and has the largest value of net_i, i = 1, …, p.
Clustering—Unsupervised Learning(11/20)
Thus W_m should be adjusted such that ||X − W_m|| is reduced. ||X − W_m|| can be reduced by moving along the negative gradient:
    ∇_{W_m} ||X − W_m||² = −2(X − W_m)
Or, increase W_m in the (X − W_m) direction.
Clustering—Unsupervised Learning(12/20)
Since each X contributes just a single step, we want only a fraction of (X − W_m); thus
for neuron m:
    ΔW_m = α (X − W_m),  0.1 < α < 0.7
for the other neurons:
    ΔW_i = 0
Clustering—Unsupervised Learning(13/20)
In general:
    W_m^(k+1) = W_m^(k) + α (X − W_m^(k)),  where m: winning neuron, α: learning constant
    W_i^(k+1) = W_i^(k)   for i ≠ m
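One winner-take-all step combining the pieces above (normalize, pick the winner by the maximum scalar product, move only the winner); a sketch with NumPy and illustrative sizes:

```python
# One winner-take-all learning step (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
p, n, alpha = 4, 3, 0.4                            # 0.1 < alpha < 0.7

W = rng.random((p, n))
W = W / np.linalg.norm(W, axis=1, keepdims=True)   # W_hat_i = W_i / ||W_i||

X = np.array([1.0, 0.0, 0.0])
m = int(np.argmax(W @ X))                          # winner: max W_hat_i^t X

before = np.linalg.norm(X - W[m])
W[m] = W[m] + alpha * (X - W[m])                   # W_m^(k+1) = W_m^(k) + alpha (X - W_m^(k))
after = np.linalg.norm(X - W[m])
print(before, after)                               # the winner moved toward X
```

Note that the update shrinks the remaining error by exactly the factor (1 − α), which is why the step is a rotation toward X rather than a jump onto it.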
Clustering—Unsupervised Learning(15/20)
4. Geometrical interpretation
The input vector X is normalized, and the weight Ŵ_m is the winner
=>
    Ŵ_m^t X = max_{i=1,…,p} ( Ŵ_i^t X )
Clustering—Unsupervised Learning(16/20)
Clustering—Unsupervised Learning(17/20)
So the increment ΔW_m = α (X − Ŵ_m) is created, and
    W_m' = Ŵ_m + ΔW_m = Ŵ_m + α (X − Ŵ_m)
Clustering—Unsupervised Learning(18/20)
Thus the weight adjustment is mainly a rotation of the weight vector toward the input vector, without a significant length change. W_m' is no longer of unit length, so before the next training stage W_m' must be normalized again. Ŵ_i is a vector pointing to the center of gravity of each cluster on the unit sphere.
Clustering—Unsupervised Learning(19/20)
5. In the case where some patterns are of known class, then
    ΔW_m = ± α (X − W_m)
with α > 0 for the correct node, α < 0 otherwise. This will accelerate the learning process significantly.
Clustering—Unsupervised Learning(20/20)
6. Another modification is to adjust the weights of both the winner and the losers: leaky competitive learning.
7. Recall Y = f(W X)
8. Initialization of weights:
   randomly choose weights from U(0,1), or use the convex combination
       W_i^t = [1/√n, …, 1/√n]   for i = 1, …, p
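The two initializations in point 8 can be sketched as follows (NumPy assumed; p and n are illustrative sizes):

```python
# Weight initialization for winner-take-all learning (illustrative sketch):
# random weights from U(0,1), or the convex-combination start where every
# component equals 1/sqrt(n).
import numpy as np

p, n = 4, 9
rng = np.random.default_rng(0)

W_random = rng.random((p, n))                   # each w_ij drawn from U(0, 1)
W_convex = np.full((p, n), 1.0 / np.sqrt(n))    # W_i^t = [1/sqrt(n), ..., 1/sqrt(n)]

# the convex-combination vectors are already unit length:
# ||W_i|| = sqrt(n * (1/sqrt(n))^2) = 1
print(np.linalg.norm(W_convex, axis=1))
```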
Feature Map(1/6)
1. Transform from a high-dimensional pattern space to a low-dimensional feature space.
2. Feature extraction:
   two categories: natural structure / no natural structure
   -- depending on whether the pattern is similar to human perception or not
Feature Map(2/6)
Feature Map(3/6)
3. Another important aspect is to represent the feature as naturally as possible.
4. A self-organizing neural array can map features from the pattern space.
Feature Map(4/6)
5. Use a one-dimensional array or a two-dimensional array.
6. Example:
   X: input vector
   i: neuron i
   w_ij: weight from input x_j to neuron i
   y_i: output of neuron i
Feature Map(5/6)
Feature Map(6/6)
7. So define
    y_i = f( S(X, W_i) )
where S(X, W_i) means the similarity between X and W_i
8. Thus, (b) is the result of (a)
Self-organizing Feature Map(1/6)
1. One-dimensional mapping
   set of patterns X^(i), i = 1, 2, …
   use a linear array of neurons i1 > i2 > i3 > …
   so that
       y_{i1} = max y_i(X^(1)),  y_{i2} = max y_i(X^(2)),  i = 1, 2, …
   (each successive pattern produces its strongest response at the corresponding position of the array)
Self-organizing Feature Map(2/6)
2. Thus, this learning is to find the best-matching neuron cells, which can activate their spatial neighbors to react to the same input.
3. Or, to find W_c such that
    ||X − W_c|| = min_i ||X − W_i||
Self-organizing Feature Map(3/6)
4. In case of two winners, choose the lower i.
5. If c is the winning neuron, then define N_c as the neighborhood around c; N_c keeps changing as learning goes on.
6. Then
    Δw_ij = α (x_j − w_ij)   if i is in the neighborhood N_c
    Δw_ij = 0                otherwise
Self-organizing Feature Map(4/6)
7. α(t):
       α(t) = α_0 (1 − t/T)
   where t: current training iteration, T: total # of training steps to be done.
8. α starts from α_0 and is decreased until it reaches the value 0.
Self-organizing Feature Map(5/6)
9. If a rectangular neighborhood is used, then
       c − d < x < c + d
       c − d < y < c + d
   and as learning continues, d is decreased from d_0 to 1:
       d(t) = d_0 (1 − t/T)
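Points 5–9 can be combined into a minimal one-dimensional SOM sketch (NumPy assumed; the array size, T, α_0, and d_0 below are illustrative choices, not values from the slides):

```python
# Minimal 1-D self-organizing feature map (illustrative sketch): winner
# search, a shrinking neighborhood N_c, and alpha(t) = alpha_0 (1 - t/T).
import numpy as np

rng = np.random.default_rng(1)
p, T = 10, 500                  # linear array of 10 neurons, T training steps
alpha0, d0 = 0.5, 3             # illustrative initial rate and radius

W = rng.random(p)               # scalar weights for 1-D inputs

for t in range(T):
    x = rng.random()                         # pattern drawn from U[0, 1]
    c = int(np.argmin(np.abs(x - W)))        # winner: min |x - W_i|
    alpha = alpha0 * (1 - t / T)             # decreasing learning rate
    d = max(1, round(d0 * (1 - t / T)))      # neighborhood radius d_0 -> 1
    for i in range(max(0, c - d), min(p - 1, c + d) + 1):
        W[i] += alpha * (x - W[i])           # update every neuron in N_c
```

Because each update is a convex combination of W_i and an input in [0, 1], the weights stay inside the input range; with the shrinking neighborhood they tend to spread out and order themselves along the array.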
Self-organizing Feature Map-example
10. Example:
    a. Input patterns are chosen randomly from U[0,1].
    b. The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [−1, 1] × [−1, 1].
    c. The points thus describe a geometrically square topology.
    d. Thus, initial weights can be plotted as in the next figure.
    e. A "-" connection line connects two adjacent neurons in the competitive layer.