The Dynamics of Learning Vector Quantization RUG 10012005
The Dynamics of Learning Vector Quantization
Rijksuniversiteit Groningen
Mathematics and Computing Science
Michael Biehl, Anarta Ghosh
TU Clausthal-Zellerfeld
Institute of Computing Science
Barbara Hammer

Introduction
  prototype-based learning from example data: representation, classification
  Vector Quantization (VQ)
  Learning Vector Quantization (LVQ)
The dynamics of learning
  a model situation: randomized data
  learning algorithms for VQ and LVQ
  analysis and comparison: dynamics, success of learning
Summary
Outlook

Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping in clusters of similar data
assignment of feature vector $\boldsymbol{\xi}$ to the closest prototype $\mathbf{w}$
(similarity or distance measure, e.g. Euclidean distance)

unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function

quantization error
$$ H_{VQ} = \sum_{\mu=1}^{P} \sum_{j=1}^{K} \frac{1}{2} \left( \boldsymbol{\xi}^\mu - \mathbf{w}_j \right)^2 \prod_{k \neq j} \Theta\!\left( d_k^\mu - d_j^\mu \right) $$
(sums over data $\mu$ and prototypes $j$; the product of step functions $\Theta$ is 1 only if $\mathbf{w}_j$ is the winner)
here: Euclidean distance, $d_j^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_j \right)^2$
aim: faithful representation (in general $\neq$ clustering)
Result depends on
- the number of prototype vectors
- the distance measure / metric used
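The competitive learning steps and the quantization error can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slides: the toy data (two planar clusters), the initial prototypes, the plain step size `eta` (without the 1/N scaling used later), and all function names are hypothetical choices.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_step(prototypes, xi, eta):
    """One on-line step: identify the winner and move it towards xi (in place)."""
    j = min(range(len(prototypes)), key=lambda k: sq_dist(xi, prototypes[k]))
    w = prototypes[j]
    for i in range(len(w)):
        w[i] += eta * (xi[i] - w[i])
    return j

def quantization_error(prototypes, data):
    """H_VQ: half the squared distance to the winner, summed over all examples."""
    return sum(0.5 * min(sq_dist(xi, w) for w in prototypes) for xi in data)

rng = random.Random(0)
# Hypothetical toy data: two well-separated clusters in the plane.
data = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3)] for c in (0.0, 3.0) for _ in range(100)]
prototypes = [[1.4, 1.5], [1.6, 1.5]]  # both start between the clusters

h_before = quantization_error(prototypes, data)
for _ in range(20):  # 20 sweeps through the data in random order
    for xi in rng.sample(data, len(data)):
        vq_step(prototypes, xi, eta=0.05)
h_after = quantization_error(prototypes, data)
print(h_before, h_after)
```

In this toy setting each prototype is drawn into one cluster, so the quantization error after training should be far below its initial value.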

Learning Vector Quantization (LVQ)
aim: classification of data, learning from examples
Learning: choice of prototypes according to example data
classification: assignment of a vector $\boldsymbol{\xi}$ to the class of the closest prototype $\mathbf{w}$
example situation: 3 classes, 3 prototypes
aim: generalization ability, i.e. correct classification of novel data after training
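The classification rule (assign a vector to the class of the closest prototype) might look as follows in Python. The three prototypes and the labels "A", "B", "C" are made-up values for illustration, and the squared Euclidean distance is assumed.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(xi, prototypes, labels):
    """Return the class label of the prototype closest to xi."""
    winner = min(range(len(prototypes)), key=lambda k: sq_dist(xi, prototypes[k]))
    return labels[winner]

# Hypothetical example: 3 classes, one prototype per class.
prototypes = [[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]]
labels = ["A", "B", "C"]
print(classify([3.5, 0.4], prototypes, labels))  # closest prototype is [4.0, 0.0]
```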

mostly: heuristically motivated variations of competitive learning
prominent example [Kohonen]: "LVQ 2.1"
• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example
known convergence / stability problems, e.g. for infrequent classes

LVQ algorithms
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data; a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain
[Figure: healthy cells and damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms
- are often based on purely heuristic arguments or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
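
model situation: two clusters of N-dimensional data
random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$ P(\boldsymbol{\xi}) = \sum_{\sigma = \pm 1} p_\sigma \, P(\boldsymbol{\xi} \mid \sigma), \qquad P(\boldsymbol{\xi} \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\!\left[ -\frac{1}{2} \left( \boldsymbol{\xi} - \ell \, \mathbf{B}_\sigma \right)^2 \right] $$
orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_\sigma)^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$
prior weights of classes: $p_+$, $p_-$ with $p_+ + p_- = 1$
separation: $\ell$
[Figure: cluster centers $\ell \mathbf{B}_+$ (weight $p_+$) and $\ell \mathbf{B}_-$ (weight $p_-$)]
independent components:
$$ \langle \xi_j \rangle_\sigma = \ell \, (\mathbf{B}_\sigma)_j, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1 \;\;\rightarrow\;\; \langle \boldsymbol{\xi}^2 \rangle_\sigma = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_\sigma = N + \ell^2 $$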
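A minimal sketch of how examples could be drawn from this model, using only the Python standard library. The concrete choice of centers (the first two unit vectors) and the parameter values N = 50, $\ell$ = 1, $p_+$ = 0.6 are assumptions made for illustration; any orthonormal pair would do.

```python
import random

def make_centers(n):
    """One convenient orthonormal pair: B+ = e_1 and B- = e_2 (an assumption)."""
    b_plus = [1.0] + [0.0] * (n - 1)
    b_minus = [0.0, 1.0] + [0.0] * (n - 2)
    return b_plus, b_minus

def sample_example(ell, p_plus, b_plus, b_minus, rng):
    """Draw the class sigma = +1/-1 with prior p_plus, then xi ~ N(ell * B_sigma, 1)."""
    sigma = 1 if rng.random() < p_plus else -1
    center = b_plus if sigma == 1 else b_minus
    xi = [ell * b + rng.gauss(0.0, 1.0) for b in center]
    return xi, sigma

rng = random.Random(1)
n, ell, p_plus = 50, 1.0, 0.6
b_plus, b_minus = make_centers(n)
examples = [sample_example(ell, p_plus, b_plus, b_minus, rng) for _ in range(2000)]

# Empirical checks against the model: <xi^2> = N + ell^2 and class fraction p_+.
mean_sq = sum(sum(x * x for x in xi) for xi, _ in examples) / len(examples)
frac_plus = sum(1 for _, s in examples if s == 1) / len(examples)
print(mean_sq, frac_plus)
```

The empirical mean of $\boldsymbol{\xi}^2$ should come out close to $N + \ell^2 = 51$, and the fraction of class +1 examples close to 0.6.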

high-dimensional data (formally $N \to \infty$)
400 examples $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, N = 200, $\ell$ = 1, $p_+$ = 0.6
[Figure, left: projections into the plane of center vectors $\mathbf{B}_+$, $\mathbf{B}_-$, i.e. $y_+ = \mathbf{B}_+ \cdot \boldsymbol{\xi}^\mu$ vs. $y_- = \mathbf{B}_- \cdot \boldsymbol{\xi}^\mu$; clusters of 240 and 160 examples]
[Figure, right: projections in two independent random directions $\mathbf{w}_{1,2}$, i.e. $x_1 = \mathbf{w}_1 \cdot \boldsymbol{\xi}^\mu$ vs. $x_2 = \mathbf{w}_2 \cdot \boldsymbol{\xi}^\mu$; same 240 and 160 examples]
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

dynamics of on-line training
sequence of independent random data $\boldsymbol{\xi}^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\boldsymbol{\xi}^\mu)$
update of prototype vectors:
$$ \mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, f_S\!\left[ d_S^\mu, d_{-S}^\mu, \sigma^\mu \right] \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right), \qquad d_S^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)^2, \qquad S, \sigma^\mu = \pm 1 $$
- $\eta$: learning rate, step size
- $f_S[\ldots]$: competition, direction of update, etc.
- $\left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)$: change of prototype towards or away from the current data
above examples:
- unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown): $f_S[\ldots] = \Theta\!\left( d_{-S}^\mu - d_S^\mu \right)$
- Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition): $f_S[\ldots] = S \sigma^\mu$, i.e. $+1$ for the correct class and $-1$ for the wrong class
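The generic update with a modulation function $f_S$ could be written roughly as below, for the two-prototype case. The dictionary layout (prototypes keyed by their label +1 or -1) and the function names are illustrative assumptions, not notation from the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def f_vq(s, d, sigma):
    """Winner takes all: 1 if prototype s is the closer one, else 0 (label ignored)."""
    return 1.0 if d[s] < d[-s] else 0.0

def f_lvq21(s, d, sigma):
    """LVQ 2.1: +1 for the prototype with the correct label, -1 for the wrong one."""
    return float(s * sigma)

def update(w, xi, sigma, eta, f):
    """Apply w_S <- w_S + (eta/N) * f_S * (xi - w_S) to both prototypes.

    w is a dict {+1: vector, -1: vector}; a new dict is returned.
    Distances are evaluated with the prototypes before the update.
    """
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (1, -1)}
    return {
        s: [wi + (eta / n) * f(s, d, sigma) * (x - wi) for wi, x in zip(w[s], xi)]
        for s in (1, -1)
    }

w = {1: [0.0, 0.0], -1: [1.0, 1.0]}
xi, sigma = [2.0, 0.0], 1  # example from class +1
print(update(w, xi, sigma, eta=1.0, f=f_lvq21))
print(update(w, xi, sigma, eta=1.0, f=f_vq))
```

With LVQ 2.1 both prototypes move (the correct one towards the example, the wrong one away from it); with the winner-takes-all rule only the closer prototype moves, regardless of the label.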

mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)
$$ R_{S\sigma}^\mu = \mathbf{w}_S^\mu \cdot \mathbf{B}_\sigma, \qquad Q_{ST}^\mu = \mathbf{w}_S^\mu \cdot \mathbf{w}_T^\mu, \qquad S, T, \sigma \in \{-1, +1\} $$
- $R_{S\sigma}$: projections in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane
- $Q_{ST}$: length and relative position of prototypes
random vector $\boldsymbol{\xi}^\mu$ enters only in the form of
- projections: $x_S^\mu = \mathbf{w}_S^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, $\; y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu$
- distances: $d_S^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right)^2 = (\boldsymbol{\xi}^\mu)^2 - 2 x_S^\mu + Q_{SS}^{\mu-1}$
update rule → recursions:
$$ \mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, f_S\!\left[ d_S^\mu, d_{-S}^\mu, \sigma^\mu \right] \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right) $$
$$ \frac{R_{S\sigma}^\mu - R_{S\sigma}^{\mu-1}}{1/N} = \eta \, f_S[\ldots] \left( y_\sigma^\mu - R_{S\sigma}^{\mu-1} \right) $$
$$ \frac{Q_{ST}^\mu - Q_{ST}^{\mu-1}}{1/N} = \eta \, f_S[\ldots] \left( x_T^\mu - Q_{ST}^{\mu-1} \right) + \eta \, f_T[\ldots] \left( x_S^\mu - Q_{ST}^{\mu-1} \right) + \eta^2 f_S[\ldots] \, f_T[\ldots] + O(1/N) $$
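As a small numerical illustration, the characteristic quantities can be computed from given prototypes, and one can check that the distances depend on the example only through the projection $x_S$ and $\boldsymbol{\xi}^2$. The random prototypes, the dimension N = 20, and the choice of centers below are arbitrary assumptions.

```python
import random

def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def order_parameters(w, b):
    """R[(S, sigma)] = w_S . B_sigma and Q[(S, T)] = w_S . w_T for S, T, sigma = +1/-1."""
    R = {(s, sig): dot(w[s], b[sig]) for s in (1, -1) for sig in (1, -1)}
    Q = {(s, t): dot(w[s], w[t]) for s in (1, -1) for t in (1, -1)}
    return R, Q

rng = random.Random(2)
n = 20
# Assumed orthonormal centers: B+ = e_1, B- = e_2.
b = {1: [1.0] + [0.0] * (n - 1), -1: [0.0, 1.0] + [0.0] * (n - 2)}
w = {s: [rng.gauss(0.0, 1.0) for _ in range(n)] for s in (1, -1)}
xi = [rng.gauss(0.0, 1.0) for _ in range(n)]

R, Q = order_parameters(w, b)
for s in (1, -1):
    d_direct = sum((x - wi) ** 2 for x, wi in zip(xi, w[s]))
    d_from_projections = dot(xi, xi) - 2.0 * dot(w[s], xi) + Q[(s, s)]
    print(s, d_direct, d_from_projections)
```

Since $Q_{+-} = Q_{-+}$, the four $R$ values and three distinct $Q$ values are the seven quantities mentioned above.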
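
2. average over the current example
random vector acc. to $P(\boldsymbol{\xi}^\mu \mid \sigma^\mu)$ → in the thermodynamic limit $N \to \infty$ the projections
$$ x_S^\mu = \mathbf{w}_S^{\mu-1} \cdot \boldsymbol{\xi}^\mu, \qquad y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu $$
are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):
$$ \langle x_S \rangle_\sigma = \sum_{j=1}^{N} (\mathbf{w}_S)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (\mathbf{w}_S)_j (\mathbf{B}_\sigma)_j = \ell \, R_{S\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma} = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases} $$
$$ \langle x_S x_T \rangle_\sigma - \langle x_S \rangle_\sigma \langle x_T \rangle_\sigma = Q_{ST}, \qquad \langle x_S y_\tau \rangle_\sigma - \langle x_S \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{S\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho} $$
→ averaged recursions, closed in $\{ R_{S\sigma}, Q_{ST} \}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$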

3. self-averaging properties
characteristic quantities $R_{S\sigma}^\mu$, $Q_{ST}^\mu$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: number of examples (= number of learning steps) per degree of freedom
recursions → coupled ordinary differential equations for $R_{S\sigma}(\alpha)$, $Q_{ST}(\alpha)$
→ evolution of projections

5. learning curve
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples: probability for misclassification of a novel example
$$ \varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_- $$
$$ \varepsilon_g = p_+ \, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell \left( R_{++} - R_{-+} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] + p_- \, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell \left( R_{--} - R_{+-} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] $$
($\Phi$: cumulative distribution function of the standard normal distribution)
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_S[\ldots]$: maximize $d\varepsilon_g / d\alpha$ in magnitude, i.e. the rate at which the error decreases
- ...
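The generalization error can be evaluated from the characteristic quantities with the standard library alone. The sketch below assumes the convention $R_{S\sigma} = \mathbf{w}_S \cdot \mathbf{B}_\sigma$, $Q_{ST} = \mathbf{w}_S \cdot \mathbf{w}_T$ and a standard normal $\Phi$; the dictionary layout and function names are illustrative choices.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def generalization_error(R, Q, ell, p_plus):
    """eps_g = sum over sigma of p_sigma * Phi(argument), for two prototypes.

    R[(S, sigma)] and Q[(S, T)] are dicts keyed by labels +1/-1.
    The prototypes must differ, otherwise the denominator vanishes.
    """
    p = {1: p_plus, -1: 1.0 - p_plus}
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for sig in (1, -1):
        arg = (Q[(sig, sig)] - Q[(-sig, -sig)]
               - 2.0 * ell * (R[(sig, sig)] - R[(-sig, sig)])) / norm
        eps += p[sig] * phi(arg)
    return eps

# Example: prototypes placed exactly at the cluster centers, w_S = ell * B_S.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
print(generalization_error(R, Q, ell, p_plus=0.5))
```

For prototypes at the cluster centers both arguments reduce to $-\ell/\sqrt{2}$, so the result is $\Phi(-\ell/\sqrt{2})$ for any prior weights.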

optimal classification with minimal generalization error
in the model situation (equal variances of clusters): separation of classes by the plane with
$$ p_+ \, P(\boldsymbol{\xi} \mid \sigma = +1) = p_- \, P(\boldsymbol{\xi} \mid \sigma = -1) $$
[Figure: clusters around $\ell \mathbf{B}_-$ and $\ell \mathbf{B}_+$ (here $p_- > p_+$), separating plane, excess error]
[Figure: minimal $\varepsilon_g$ (0 to 0.50) as a function of the prior weight $p_+$ (0 to 1.00) for $\ell$ = 0, 1, 2]
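For this model the minimal error can be worked out in closed form, because only the coordinate along $(\mathbf{B}_+ - \mathbf{B}_-)/\sqrt{2}$ matters. The following sketch is a derivation from the stated model assumptions (unit variance, orthonormal centers), not a formula quoted from the slides; `minimal_error` is a hypothetical helper name.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def minimal_error(p_plus, ell):
    """Smallest achievable eps_g for priors (p_plus, 1 - p_plus) and separation ell.

    Along u = (B+ - B-)/sqrt(2) the class means are +a and -a with a = ell/sqrt(2).
    The optimal threshold t solves p+ * N(t; +a, 1) = p- * N(t; -a, 1).
    Requires 0 < p_plus < 1.
    """
    p_minus = 1.0 - p_plus
    if ell == 0.0:
        return min(p_plus, p_minus)  # clusters coincide: assign to the frequent class
    a = ell / math.sqrt(2.0)
    t = math.log(p_minus / p_plus) / (2.0 * a)
    return p_plus * phi(t - a) + p_minus * phi(-t - a)

for ell in (0.0, 1.0, 2.0):
    print(ell, [round(minimal_error(p, ell), 4) for p in (0.1, 0.3, 0.5)])
```

For equal priors the threshold sits at the midpoint and the error is $\Phi(-\ell/\sqrt{2})$; for unequal priors the error stays below the smaller prior weight.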

"LVQ 2.1": update the correct and wrong winner
$$ \mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, S \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right) $$
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
(analytical) integration for $\mathbf{w}_S(0) = 0$, with $p_\sigma = (1 + m\sigma)/2$ ($m > 0$):
$$ R_{++} = \ell \, \frac{1+m}{2m} \left( 1 - e^{-m\eta\alpha} \right), \qquad R_{+-} = -\ell \, \frac{1-m}{2m} \left( 1 - e^{-m\eta\alpha} \right) $$
$$ R_{-+} = \ell \, \frac{1+m}{2m} \left( 1 - e^{+m\eta\alpha} \right), \qquad R_{--} = -\ell \, \frac{1-m}{2m} \left( 1 - e^{+m\eta\alpha} \right) $$
$$ Q_{++} = \ldots, \qquad Q_{+-} = \ldots, \qquad Q_{--} = \ldots $$
for $\alpha \to \infty$: $R_{-+}, R_{--}, Q_{+-}, Q_{--} \to \infty$ in magnitude, while $R_{++}, R_{+-}, Q_{++}$ remain finite
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$, $Q_{++}$, $Q_{+-}$, $Q_{--}$ vs. $\alpha$ (0 to 10), vertical range -6 to 6; theory and simulation (N = 100), $p_+$ = 0.8, $\ell$ = 1, $\eta$ = 0.5, averages over 100 independent runs]
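The divergence can be checked numerically. Averaging the LVQ 2.1 update over the model density gives, under the conventions assumed here ($R_{S\sigma} = \mathbf{w}_S \cdot \mathbf{B}_\sigma$, $p_\sigma = (1 + m\sigma)/2$), the linear equation $dR_{S\sigma}/d\alpha = \eta S (\ell \sigma p_\sigma - m R_{S\sigma})$. The sketch below compares its closed-form solution with a plain Euler integration; the function names and the step count are illustrative.

```python
import math

def r_exact(S, sigma, alpha, eta, ell, m):
    """Closed-form R_{S sigma}(alpha) for LVQ 2.1 with w_S(0) = 0."""
    p_sigma = (1.0 + m * sigma) / 2.0
    return ell * sigma * p_sigma / m * (1.0 - math.exp(-S * m * eta * alpha))

def r_euler(S, sigma, alpha, eta, ell, m, steps=20000):
    """Euler integration of dR/dalpha = eta * S * (ell * sigma * p_sigma - m * R)."""
    p_sigma = (1.0 + m * sigma) / 2.0
    r, h = 0.0, alpha / steps
    for _ in range(steps):
        r += h * eta * S * (ell * sigma * p_sigma - m * r)
    return r

# Parameters as in the figure: p+ = 0.8 (m = 0.6), ell = 1, eta = 0.5.
eta, ell, m = 0.5, 1.0, 0.6
for S in (1, -1):
    for sigma in (1, -1):
        print(S, sigma, r_exact(S, sigma, 4.0, eta, ell, m),
              r_euler(S, sigma, 4.0, eta, ell, m))
```

For $m > 0$ the projections of $\mathbf{w}_+$ saturate, while those of $\mathbf{w}_-$ grow exponentially in magnitude: the prototype of the less frequent class is repelled more often than it is attracted.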

problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{ p_+, p_- \}$
[Figure: clusters with weights $p_-$ and $p_+ > p_-$]
strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner $\mathbf{w}_S$ ($S = \pm 1$) is updated, according to the class membership
$$ \mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) S \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right) $$
numerical integration for $\mathbf{w}_S(0) = 0$
theory and simulation (N = 200), $p_+$ = 0.2, $\ell$ = 1.2, $\eta$ = 1.2, averaged over 100 indep. runs
[Figure, left: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$, $Q_{++}$, $Q_{+-}$, $Q_{--}$ vs. $\alpha$]
[Figure, right: trajectories of $\mathbf{w}_+$ and $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane, axes $R_{S+}$ and $R_{S-}$, cluster centers $\ell \mathbf{B}_+$ and $\ell \mathbf{B}_-$; (•) mark $\alpha = 20, 40, \ldots, 140$; lines: optimal decision boundary and asymptotic position]

learning curve
[Figure: $\varepsilon_g$ vs. $\alpha$ (0 to 300, $\varepsilon_g$ between 0.14 and 0.26) for $p_+$ = 0.2, $\ell$ = 1.2; labels $\eta$ = 1.2 and $\eta$ = 2.0, 0.4, 0.2]
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate
- variable rate $\eta(\alpha)$
- well-defined asymptotics for $\eta \to 0$ (ODE linear in $\eta$): $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$
[Figure: $\varepsilon_g$ vs. $(\eta\alpha)$ (0 to 50, $\varepsilon_g$ between 0.14 and 0.26) for $\eta \to 0$; the asymptotic error is suboptimal compared with min $\varepsilon_g$]

"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion): update only if the winner is correct
$$ \mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) \delta_{S \sigma^\mu} \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right) $$
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (\mathbf{B}_+ + \mathbf{B}_-)/2$
[Figure: prototypes $\mathbf{w}_+$, $\mathbf{w}_-$ and cluster centers $\ell \mathbf{B}_+$, $\ell \mathbf{B}_-$; $p_+$ = 0.2, $\ell$ = 1.2, $\eta$ = 1.2]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
LVQ+ ≈ VQ within the classes ($\mathbf{w}_S$ is updated only by examples from class $S$)
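The difference between LVQ 1 and LVQ+ lies only in the modulation function. A possible two-prototype sketch follows; the dictionary layout keyed by the labels +1/-1 and the function names are assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def f_lvq1(s, d, sigma):
    """LVQ 1: only the winner moves, towards (correct label) or away (wrong label)."""
    is_winner = 1.0 if d[s] < d[-s] else 0.0
    return is_winner * s * sigma

def f_lvq_plus(s, d, sigma):
    """LVQ+: the winner moves only if its label is correct; no repulsion."""
    is_winner = 1.0 if d[s] < d[-s] else 0.0
    return is_winner * (1.0 if s == sigma else 0.0)

def step(w, xi, sigma, eta, f):
    """w_S <- w_S + (eta/N) * f_S * (xi - w_S) for S = +1, -1; returns a new dict."""
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (1, -1)}
    return {
        s: [wi + (eta / n) * f(s, d, sigma) * (x - wi) for wi, x in zip(w[s], xi)]
        for s in (1, -1)
    }

w = {1: [0.0, 0.0], -1: [1.0, 1.0]}
xi = [2.0, 0.0]  # the winner is the prototype with label -1
print(step(w, xi, 1, 1.0, f_lvq1))       # wrong winner is pushed away
print(step(w, xi, 1, 1.0, f_lvq_plus))   # wrong winner: nothing changes
print(step(w, xi, -1, 1.0, f_lvq_plus))  # correct winner moves towards xi
```

When the winner carries the wrong label, LVQ 1 repels it, whereas LVQ+ leaves both prototypes untouched.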

[Figures: asymptotic $\varepsilon_g$ vs. $p_+$ (asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$), compared with the optimal classification]
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{ p_+, p_- \}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: learning curves $\varepsilon_g$ vs. $\alpha$ for LVQ+ and LVQ1; $p_+$ = 0.2, $\ell$ = 1.0, $\eta$ = 1.0]

Vector Quantization
competitive learning, $\mathbf{w}_S$: winner
$$ \mathbf{w}_S^\mu = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) \left( \boldsymbol{\xi}^\mu - \mathbf{w}_S^{\mu-1} \right) $$
class membership is unknown or identical for all data
numerical integration for $\mathbf{w}_S(0) \approx 0$ ($p_+$ = 0.2, $\ell$ = 1.0, $\eta$ = 1.2)
[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]
system is invariant under exchange of the prototypes → weakly repulsive fixed points

interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: $\varepsilon_g$ vs. $p_+$, asymptotics ($\alpha \to \infty$, $\eta \to 0$, $\eta\alpha \to \infty$)]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\varepsilon_g$

work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes
Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d^{\lambda}(\mathbf{w}_S, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i \left( w_{S,i} - \xi_i \right)^2$ (relevance factors $\lambda_i$ adapted in training)
• applications
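The relevance-weighted distance used in Generalized Relevance LVQ is easy to state in code. Only the distance itself is sketched here; the adaptation of the relevance factors during training is not shown, and the function name is a hypothetical choice.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum over i of lam_i * (w_i - xi_i)^2."""
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))

w, xi = [1.0, 2.0], [0.0, 0.0]
print(relevance_distance(w, xi, [1.0, 1.0]))  # plain squared Euclidean distance
print(relevance_distance(w, xi, [1.0, 0.0]))  # second feature is ignored
```

With all factors equal to 1 the measure reduces to the squared Euclidean distance; a factor of 0 removes the corresponding feature from the comparison.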
The Dynamics of Learning Vector Quantization RUG 10012005
Vector Quantization (VQ)
Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)
IntroductionIntroductionIntroductionIntroduction
The dynamics of learningThe dynamics of learningThe dynamics of learningThe dynamics of learning
a model situation randomized data
learning algorithms for VQ und LVQ
analysis and comparison dynamics success of learning
SummarySummarySummarySummary
OutlookOutlookOutlookOutlook
prototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning from example datarepresentation classificationclassificationclassificationclassification
The Dynamics of Learning Vector Quantization RUG 10012005
Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ) Vector Quantization (VQ)
aim
representation of large amounts
of data by (few) prototype vectorsprototype vectorsprototype vectorsprototype vectors
example
identification and grouping
in clusters clusters clusters clusters of similar data
assignment of feature vector ξξξξto the closest closest closest closest prototypeprototypeprototypeprototype wwww
(similarity or distance measure
eg Euclidean distance )
The Dynamics of Learning Vector Quantization RUG 10012005
unsupervised competitive learningunsupervised competitive learningunsupervised competitive learningunsupervised competitive learning
bull initialize K prototype vectors
bull present a single example
bull identify the closest prototypeie the so-called winner winner winner winner
bull move the winner even closer towards the example
intuitively clear plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) onononon----line gradient descent line gradient descent line gradient descent line gradient descent with respect to
the cost function cost function cost function cost function
The Dynamics of Learning Vector Quantization RUG 10012005
quantization errorquantization errorquantization errorquantization error
( ) ( )microj
microk
K
jk
P
1micro
jmicro
K
1j
VQ ddΘ2
wξH minusminus= prodsumsumne==
microjdprototypes data wj is the winner
here
Euclidean distance
aim faithful representation (in general ne clustering )
Result depends on - the number of prototype vectors
- the distance measure metric used
The Dynamics of Learning Vector Quantization RUG 10012005
Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)
aim
classification classification classification classification of data
learning from examples
LearningLearningLearningLearning choice of prototypes according to example data
example situtation
3 classes3 classes3 classes3 classes
classification
assignment of a vector ξξξξto the class of the closest
prototype w w w w
3 prototypes 3 prototypes 3 prototypes 3 prototypes
aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification
of novel data after training
The Dynamics of Learning Vector Quantization RUG 10012005
prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo
bull present a single example
bull initialize prototype vectors(for different classes)
bull identify the closest correctand the closest wrong prototype
bull move the corresponding winnertowards away from the example
known convergence stability problems
eg for infrequent classes
mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms
- are frequently applied in a variety of problems involving
the classification of structured data a few examples
- appear plausible intuitive flexible
- are fast easy to implement
- real time speech recognition
- medical diagnosis eg from histological data
- texture recognition and classification
- gene expression data analysis
-
The Dynamics of Learning Vector Quantization RUG 10012005
illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
healthy cells damaged cells
prototypes obtained by LVQ (1)
illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms
- are often based on purely heuristic arguments
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure
inappropriate for heterogeneous data
- lack in general a thorough theoretical understanding of
dynamics convergence properties
performance wrt generalization etc
The Dynamics of Learning Vector Quantization RUG 10012005
In the following
analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized high-dimensional data
- essential features of LVQ learning
aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data
random vectors ξξξξ isin ℝN according to σ)P(p )P(
1σσ ξξξξξξξξ sum
plusmn=
=
( )( )
minus=
2
σN2-
2
1exp
2π
1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians
orthonormal center vectors
BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0
prior weights of classes p+ p-p+ + p- = 1
BBBB+
BBBB-
(p+)
(p-)
separation ℓℓ
jj Bσσξ l=
22222l Nξ1ξξ
N
1σσ
+==rarr=minus sum=j
jjj ξξξξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Nrarrinfin)
400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro
By
ξξ ξξ sdot=
minusminus
(240)
(160)
projections into the plane of center vectors B+ B-
microBy ξξξξsdot= ++
micro2
2xξξ ξξ
ww wwsdot
=
(240)(160)
projections in two independent random directions wwww12
micro11x ξξξξwwww sdot=
model for studying typical behavior of LVQ algorithmsnot density-estimation based classification
NoteNoteNoteNote
The Dynamics of Learning Vector Quantization RUG 10012005
dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training
sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
above examples
unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros
micross minusΘ= minus
The Winner Takes It All (classes irrelevantunknown)
Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1
classcorrectclasswrong
+minus=sdot=
here two prototypes noexplicit competition
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+=
( )21minusminus=
plusmn=
micros
micromicrosd
1σS
wwwwξξξξ
update of prototype vectors
The Dynamics of Learning Vector Quantization RUG 10012005
[ ] ( )
[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν
1Οffη QxfηQxfη1N
Ryfη1N
RR
ts1-micro
stmicrost
1-microst
microts
1-microst
microst
1-microsσ
microσs
1-microsσ
microsσ
++minus+minus=minus
minus=minus
2
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions
mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics
( ) ( ) 1221 -micross
micros
micromicros
micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ
micromicromicro1-micros
micros ξByx sdot=sdot= ττξξξξwwwwprojections
distances
random vector ξmicro enters only in the form of
( )11 +minusisinsdot=sdot= σtsmicrot
micros
microstσ
micros
microsσ QBR wwwwwwwwwwww
projections in the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities
( here ℝ2N rarr ℝ7 )
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x ll === sumsum
==
Bww jξ
completely specified in terms of first and second moments (wo indices micro)
in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin
random vector acc to σ)|P( micro rarrξξξξmicromicro
micro1-micros
micros
By
wx
ξξξξ
ξξξξ
sdot=
sdot=
ττ
correlated Gaussianrandom quantities
stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =
ρττρτρ δ yy- yyσσσ
===
=
else
σ ifsσσ
y0
Sl
l δτ
2 average over the current example2 average over the current example2 average over the current example2 average over the current example
rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ
1σσ LL sum
plusmn=
=
The Dynamics of Learning Vector Quantization RUG 10012005
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)
microsσ
microst
R Q
learning dynamics is completely described in terms of averagesaveragesaveragesaverages
3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties
4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time
N
micro α =
of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst
recursions rarr coupled ordinary differential equations
rarr evolution of projections
The Dynamics of Learning Vector Quantization RUG 10012005
probability for misclassification of a novel example
( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε
( ) ( )
Φminus+
Φ=
minusminus+
minus+minus
++
minus+minus
minusminusminus
++minus
minusminus
minusminus+
minus+minus
++
+minusminus
++minus
minusminusminus
+++
QQQ
RR2QQ
QQQ
RR2QQpp
22 2
1
2
1 ll
L
5 learning curve5 learning curve5 learning curve5 learning curve
generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples
- repulsiveattractive fixed points of the dynamics
- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization
-
investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms
- time-dependent learning rate η(α)
- variational optimization wrt fs[]
-
optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions
maximizeα
g
d
d εεεε
The Dynamics of Learning Vector Quantization RUG 10012005
optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error
BBBB-
BBBB+
(p-gtp+ )
(p+)
separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)
excess error
minimal εg as a function
of prior weights ℓ=2
εg
025
050
005 100 p+
ℓ=1
ℓ=0
ℓ
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
( ) 1-micros
micro1-micros
micros Sσ
N
ηwwwwξξξξwwwwwwww minus+=
(analytical)integrationfor wwwws(0) = 0
( ) ( )
( ) ( ) KKll
Kll
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
++minusminusminus
++minus
minus+minus
++
minus+
=+minusminus
minus=
=minusminus
minus=minus+
=
pσ = (1+m σ ) 2 (mgt0)
[Seo Obermeyer] LVQ21 ս cost function
(likelihood ratios)
infinrarrinfinrarrminus+minusminus+minusminusminus
++minus+++
αQQRR
Q R R
with
finite remain
Q ++ R ++ R minus+
R +minus Q minus+
Q minusminus R minusminus
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs
[Figure: clusters with weights $(p_-)$ and $(p_+ > p_-)$]
problem: instability of the algorithm due to repulsion of wrong prototypes
→ trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$
strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization
"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership
$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta(d_{-S}^\mu - d_S^\mu)\, S\,\sigma^\mu\, (\xi^\mu - w_S^{\mu-1})$
($\Theta(\dots)$ selects the winner $w_S$; $S\sigma^\mu = \pm 1$)
numerical integration for $w_s(0) = 0$
[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s\sigma}$ vs. $\alpha$; theory and simulation (N = 200), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs]
[Figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane, axes $R_{S+}$, $R_{S-}$, relative to $\ell B_+$, $\ell B_-$; (•) $\alpha = 20, 40, \dots, 140$; legend: optimal decision boundary, ____ asymptotic position]
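The LVQ 1 prescription can be checked against the theory by direct simulation. The following is a minimal Monte Carlo sketch, assuming the concrete centers $B_+ = e_0$, $B_- = e_1$ and an arbitrary tie-break for the first example; `lvq1_run` and its arguments are illustrative names, and the default N is kept small only for speed.

```python
import random


def lvq1_run(N=50, ell=1.2, p_plus=0.2, eta=1.2, alpha_max=5.0, seed=0):
    """On-line LVQ1 with two prototypes; returns the overlaps R[(s, sigma)], Q[(s, t)]."""
    rng = random.Random(seed)
    axis = {+1: 0, -1: 1}  # assumed concrete centers: B_+ = e_0, B_- = e_1
    w = {+1: [0.0] * N, -1: [0.0] * N}  # w_s(0) = 0
    for _ in range(int(alpha_max * N)):  # alpha = mu / N
        sigma = +1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        xi[axis[sigma]] += ell  # cluster centered at ell * B_sigma
        d = {s: sum((x - v) ** 2 for x, v in zip(xi, w[s])) for s in (+1, -1)}
        S = +1 if d[+1] <= d[-1] else -1  # winner; ties go to w_+ (arbitrary choice)
        step = eta / N * S * sigma  # attract if the winner's label is correct, else repel
        w[S] = [v + step * (x - v) for x, v in zip(xi, w[S])]
    R = {(s, t): w[s][axis[t]] for s in (+1, -1) for t in (+1, -1)}
    Q = {(s, t): sum(a * b for a, b in zip(w[s], w[t])) for s in (+1, -1) for t in (+1, -1)}
    return R, Q


R, Q = lvq1_run()
print(R[(+1, +1)], R[(-1, -1)], Q[(+1, -1)])
```

Averaging `R` and `Q` over many seeds, at larger N, is what the "simulation" points in the figures correspond to.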
learning curve
[Figure: $\varepsilon_g$ vs. $\alpha$ for $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$)]
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate
[Figure: $\varepsilon_g$ (0.14 to 0.26) vs. $\alpha$ (0 to 300) for $\eta = 2.0, 0.4, 0.2$ and $\eta \to 0$]
- variable rate $\eta(\alpha)$
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)
[Figure: $\varepsilon_g$ (0.14 to 0.26) vs. $(\eta\alpha)$ (0 to 50) for $\eta \to 0$, compared with min $\varepsilon_g$: suboptimal]
"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta(d_{-S}^\mu - d_S^\mu)\, \delta_{S,\sigma^\mu}\, (\xi^\mu - w_S^{\mu-1})$
($\Theta(\dots)$: winner; $\delta_{S,\sigma^\mu}$: correct)
LVQ+ ≈ VQ within the classes ($w_S$ updated only from class S)
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\,(B_+ + B_-)/2$
[Figure: $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
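The prescriptions in this deck differ only in the factor that multiplies $(\xi^\mu - w_S^{\mu-1})$. A small sketch of these factors, with `d` a dictionary of the two squared distances keyed by $S = \pm 1$; the function names are illustrative, not from the slides.

```python
def theta(x: float) -> float:
    """Heaviside step function."""
    return 1.0 if x > 0 else 0.0


def f_vq(S: int, sigma: int, d: dict) -> float:
    return theta(d[-S] - d[S])  # winner only, labels ignored


def f_lvq1(S: int, sigma: int, d: dict) -> float:
    return theta(d[-S] - d[S]) * S * sigma  # winner attracted (+1) or repelled (-1)


def f_lvq_plus(S: int, sigma: int, d: dict) -> float:
    return theta(d[-S] - d[S]) * (1.0 if S == sigma else 0.0)  # winner, only if correct


def f_lvq21(S: int, sigma: int, d: dict) -> float:
    return float(S * sigma)  # both prototypes move, no competition


d = {+1: 0.5, -1: 2.0}  # w_+ is the winner
for f in (f_vq, f_lvq1, f_lvq_plus, f_lvq21):
    print(f.__name__, [f(S, -1, d) for S in (+1, -1)])  # example from class sigma = -1
```

In this example the winner $w_+$ carries the wrong label: LVQ 1 repels it, LVQ+ leaves it untouched, and VQ attracts it regardless of the label.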
asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$
- LVQ 2.1: trivial assignment to the more frequent class
- LVQ 1: here, close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: asymptotic $\varepsilon_g$ vs. $p_+$; curves: optimal classification, $\min\{p_+, p_-\}$, LVQ 1, LVQ+]
[Figure: learning curves $\varepsilon_g$ vs. $\alpha$ for LVQ+ and LVQ1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]
Vector Quantization
competitive learning: $w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N}\, \Theta(d_{-S}^\mu - d_S^\mu)\, (\xi^\mu - w_S^{\mu-1})$, with $w_S$ the winner
class membership is unknown or identical for all data
numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+, LVQ1]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]
system is invariant under exchange of the prototypes → weakly repulsive fixed points
interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: $\varepsilon_g$ vs. $p_+$; asymptotics ($\eta \to 0$, $\eta\alpha \to \infty$)]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\varepsilon_g$
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes
Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation
Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology-preserving map
  neighborhood preserving: SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i\, (w_{s,i} - \xi_i)^2$, adapted in training
• applications
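The relevance-weighted distance mentioned for Generalized Relevance LVQ is a one-line change to the squared Euclidean distance. A minimal sketch, with `relevance_distance` an illustrative name; how the factors $\lambda_i$ are normalized or adapted during training is not specified on the slide and is left out here.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)^2."""
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))


w, xi = [0.0, 0.0], [1.0, 2.0]
print(relevance_distance(w, xi, [1.0, 1.0]))  # all features weighted equally
print(relevance_distance(w, xi, [1.0, 0.0]))  # second feature switched off
```

With all $\lambda_i = 1$ the measure reduces to the Euclidean case used in the rest of the analysis.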
unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
intuitively clear, plausible procedure:
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function
quantization error
$H_{VQ} = \sum_{j=1}^{K} \sum_{\mu=1}^{P} (\xi^\mu - w_j)^2 \prod_{k \neq j} \Theta(d_k^\mu - d_j^\mu)$
(sums over prototypes $j$ and data $\mu$; the product of $\Theta$-functions selects the case that $w_j$ is the winner; $d_j^\mu$: here, Euclidean distance)
aim: faithful representation (in general ≠ clustering)
Result depends on
- the number of prototype vectors
- the distance measure / metric used
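Since the product of $\Theta$-functions keeps only the winner, each example contributes its smallest squared distance to $H_{VQ}$. The sketch below evaluates the cost and performs one winner-takes-all step; the names `sq_dist`, `quantization_error` and `wta_step` are illustrative, and the step size `eta` is used here without the $1/N$ scaling of the later analysis.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantization_error(data, prototypes):
    """H_VQ: each example contributes its squared distance to the winner only."""
    return sum(min(sq_dist(xi, w) for w in prototypes) for xi in data)


def wta_step(prototypes, xi, eta):
    """One on-line step: move the winner towards the example xi (in place)."""
    j = min(range(len(prototypes)), key=lambda k: sq_dist(xi, prototypes[k]))
    prototypes[j] = [w + eta * (x - w) for w, x in zip(prototypes[j], xi)]
    return j


data = [[0.0, 0.0], [2.0, 0.0]]
prototypes = [[0.0, 0.0], [3.0, 0.0]]
print(quantization_error(data, prototypes))
wta_step(prototypes, data[1], eta=0.5)
print(quantization_error(data, prototypes))
```

A single step lowers the winner's distance to the presented example; the total cost decreases only on average, which is the sense of "stochastic" gradient descent here.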
Learning Vector Quantization (LVQ)
aim: classification of data, learning from examples
Learning: choice of prototypes according to example data
classification: assignment of a vector $\xi$ to the class of the closest prototype $w$
[Figure: example situation with 3 classes and 3 prototypes]
aim: generalization ability, i.e. correct classification of novel data after training
prominent example [Kohonen]: "LVQ 2.1"
• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example
known convergence / stability problems, e.g. for infrequent classes
mostly: heuristically motivated variations of competitive learning
LVQ algorithms
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...
illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain
[Figure: healthy cells, damaged cells; prototypes obtained by LVQ (1)]
LVQ algorithms
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.
In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$P(\xi) = \sum_{\sigma=\pm 1} p_\sigma\, P(\xi|\sigma)$, $P(\xi|\sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[-\frac{1}{2}\,(\xi - \ell B_\sigma)^2\right]$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\sigma)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes $p_+, p_-$ with $p_+ + p_- = 1$
separation $\ell$
[Figure: clusters with weights $(p_+)$ and $(p_-)$ centered at $\ell B_+$ and $\ell B_-$]
independent components: $\langle \xi_j \rangle_\sigma = \ell\,(B_\sigma)_j$, $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$ → $\langle \xi^2 \rangle_\sigma = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_\sigma = N + \ell^2$
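Drawing examples from this density is straightforward: choose the class with probability $p_\sigma$, then add unit-variance Gaussian noise to the center $\ell B_\sigma$. The sketch below assumes the concrete centers $B_+ = e_0$, $B_- = e_1$ (any orthonormal pair would do) and checks the moment $\langle \xi^2 \rangle = N + \ell^2$ empirically; the function names are illustrative.

```python
import random


def sample_example(N, ell, p_plus, rng):
    """Draw (xi, sigma) from the two-cluster mixture with centers ell*e_0 and ell*e_1."""
    sigma = +1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    xi[0 if sigma == +1 else 1] += ell  # shift by ell * B_sigma
    return xi, sigma


def mean_squared_norm(N=100, ell=1.0, p_plus=0.6, n_samples=2000, seed=0):
    """Empirical average of xi^2, to be compared with N + ell^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi, _sigma = sample_example(N, ell, p_plus, rng)
        total += sum(x * x for x in xi)
    return total / n_samples


print(mean_squared_norm(), "expected about", 100 + 1.0 ** 2)
```

Because the shift $\ell$ is of order one while $\xi^2$ is of order N, the clusters overlap strongly in high dimensions.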
high-dimensional data (formally $N \to \infty$)
400 examples $\xi^\mu \in \mathbb{R}^N$, N = 200, $\ell = 1$, $p_+ = 0.6$
[Figure: projections into the plane of center vectors $B_+$, $B_-$: $y_- = B_- \cdot \xi^\mu$ vs. $y_+ = B_+ \cdot \xi^\mu$; (240) and (160) examples]
[Figure: projections in two independent random directions $w_1$, $w_2$: $x_2 = w_2 \cdot \xi^\mu$ vs. $x_1 = w_1 \cdot \xi^\mu$; (240) and (160) examples]
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification
dynamics of on-line training
sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \dots$, acc. to $P(\xi^\mu)$
update of prototype vectors:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N}\, f_s[d_s^\mu, d_{-s}^\mu, \sigma^\mu]\, (\xi^\mu - w_s^{\mu-1})$, with $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2$ and $S, \sigma^\mu = \pm 1$
- $\eta$: learning rate, step size
- $f_s[\dots]$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data
above examples:
- unsupervised Vector Quantization: $f_s[\dots] = \Theta(d_{-s}^\mu - d_s^\mu)$, "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s[\dots] = S\,\sigma^\mu$, i.e. $+1$ for the correct class and $-1$ for the wrong class; here: two prototypes, no explicit competition
mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)
$R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections in the $(B_+, B_-)$-plane
$Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes ($s, t, \sigma \in \{-1, +1\}$)
update $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N}\, f_s[d_s^\mu, d_{-s}^\mu, \sigma^\mu]\, (\xi^\mu - w_s^{\mu-1})$ → recursions:
$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta\, f_s\, (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$
$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta\, f_s\, (x_t^\mu - Q_{st}^{\mu-1}) + \eta\, f_t\, (x_s^\mu - Q_{st}^{\mu-1}) + \eta^2 f_s f_t + O(1/N)$
random vector $\xi^\mu$ enters only in the form of
- projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\tau^\mu = B_\tau \cdot \xi^\mu$
- distances: $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$
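The recursions follow from taking scalar products of the update with $B_\sigma$ and $w_t$. The sketch below checks them numerically for one explicit update step. It keeps the exact $\eta^2$ term, $\frac{\eta^2}{N} f_s f_t\, (\xi - w_s)\cdot(\xi - w_t)$, which reduces to $\eta^2 f_s f_t + O(1/N)$ because $\xi^2/N \to 1$. The fixed values of $f_s$ and the centers $B_+ = e_0$, $B_- = e_1$ are illustrative assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def recursion_residuals(N=30, eta=0.7, f=(1.0, -1.0), seed=1):
    """Max deviation of one explicit update from the R and Q recursions."""
    rng = random.Random(seed)
    # Index 0 stands for '+', index 1 for '-'; assumed centers B_+ = e_0, B_- = e_1.
    B = [[1.0 if j == t else 0.0 for j in range(N)] for t in range(2)]
    w = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(2)]
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    x = [dot(w[s], xi) for s in range(2)]  # projections x_s
    y = [dot(B[t], xi) for t in range(2)]  # projections y_tau
    w_new = [[v + eta / N * f[s] * (c - v) for v, c in zip(w[s], xi)] for s in range(2)]
    res_R = res_Q = 0.0
    for s in range(2):
        for t in range(2):
            R_old, Q_old = dot(w[s], B[t]), dot(w[s], w[t])
            lhs_R = N * (dot(w_new[s], B[t]) - R_old)
            rhs_R = eta * f[s] * (y[t] - R_old)
            diff = [[c - v for v, c in zip(w[k], xi)] for k in (s, t)]
            lhs_Q = N * (dot(w_new[s], w_new[t]) - Q_old)
            rhs_Q = (eta * f[s] * (x[t] - Q_old) + eta * f[t] * (x[s] - Q_old)
                     + eta ** 2 * f[s] * f[t] * dot(diff[0], diff[1]) / N)
            res_R = max(res_R, abs(lhs_R - rhs_R))
            res_Q = max(res_Q, abs(lhs_Q - rhs_Q))
    return res_R, res_Q


print(recursion_residuals())
```

Both residuals vanish up to floating-point rounding, so the $2N$ prototype components are indeed summarized by four $R_{s\sigma}$ and three independent $Q_{st}$.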
2. average over the current example
random vector acc. to $P(\xi^\mu|\sigma)$; in the thermodynamic limit $N \to \infty$ the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\tau^\mu = B_\tau \cdot \xi^\mu$ are correlated Gaussian random quantities,
completely specified in terms of first and second moments (w/o indices $\mu$):
$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j\, \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j\, (B_\sigma)_j = \ell R_{s\sigma}$
$\langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma}$ ($\ell$ if $\tau = \sigma$, 0 else)
$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}$
$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}$
$\langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = \delta_{\rho\tau}$
→ averaged recursions, closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \dots \rangle = \sum_{\sigma=\pm 1} p_\sigma\, \langle \dots \rangle_\sigma$
3. self-averaging properties
characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: number of examples, number of learning steps per degree of freedom
recursions → coupled ordinary differential equations → evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
probability for misclassification of a novel example:
$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$
$= p_+\, \Phi\left(\frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}}\right) + p_-\, \Phi\left(\frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}}\right)$
with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}$, the standard normal cumulative distribution
5. learning curve: generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples
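Given the characteristic quantities, $\varepsilon_g$ is a direct evaluation of two Gaussian integrals. In the sketch below the dictionaries `R` and `Q` are keyed by pairs of $\pm 1$; this data layout and the function names are illustrative assumptions. The expression is undefined for $w_+ = w_-$, where the denominator vanishes.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def generalization_error(R, Q, p_plus, ell):
    """eps_g from R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t."""
    denom = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for sigma, p in ((+1, p_plus), (-1, 1.0 - p_plus)):
        num = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
               - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += p * phi(num / denom)  # probability that class sigma is assigned to w_{-sigma}
    return eps


# Prototypes placed exactly at the cluster centers: w_s = ell * B_s with ell = 1.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
print(generalization_error(R, Q, 0.5, ell), generalization_error(R, Q, 0.2, ell))
```

For this symmetric configuration both classes are misclassified with the same probability, so the result does not depend on the prior weights.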
unsupervised competitive learningunsupervised competitive learningunsupervised competitive learningunsupervised competitive learning
bull initialize K prototype vectors
bull present a single example
bull identify the closest prototypeie the so-called winner winner winner winner
bull move the winner even closer towards the example
intuitively clear plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) onononon----line gradient descent line gradient descent line gradient descent line gradient descent with respect to
the cost function cost function cost function cost function
The Dynamics of Learning Vector Quantization RUG 10012005
quantization errorquantization errorquantization errorquantization error
( ) ( )microj
microk
K
jk
P
1micro
jmicro
K
1j
VQ ddΘ2
wξH minusminus= prodsumsumne==
microjdprototypes data wj is the winner
here
Euclidean distance
aim faithful representation (in general ne clustering )
Result depends on - the number of prototype vectors
- the distance measure metric used
The Dynamics of Learning Vector Quantization RUG 10012005
Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)Learning Vector Quantization (LVQ)
aim
classification classification classification classification of data
learning from examples
LearningLearningLearningLearning choice of prototypes according to example data
example situtation
3 classes3 classes3 classes3 classes
classification
assignment of a vector ξξξξto the class of the closest
prototype w w w w
3 prototypes 3 prototypes 3 prototypes 3 prototypes
aim generalization abilitygeneralization abilitygeneralization abilitygeneralization ability ie correct classification
of novel data after training
The Dynamics of Learning Vector Quantization RUG 10012005
prominent example [Kohonen] ldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquoldquo LVQ 21 rdquo
bull present a single example
bull initialize prototype vectors(for different classes)
bull identify the closest correctand the closest wrong prototype
bull move the corresponding winnertowards away from the example
known convergence stability problems
eg for infrequent classes
mostly heuristicallyheuristicallyheuristicallyheuristically motivated variations of competitive learningcompetitive learningcompetitive learningcompetitive learning
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms
- are frequently applied in a variety of problems involving
the classification of structured data a few examples
- appear plausible intuitive flexible
- are fast easy to implement
- real time speech recognition
- medical diagnosis eg from histological data
- texture recognition and classification
- gene expression data analysis
-
The Dynamics of Learning Vector Quantization RUG 10012005
illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
healthy cells damaged cells
prototypes obtained by LVQ (1)
illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms
- are often based on purely heuristic arguments
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure
inappropriate for heterogeneous data
- lack in general a thorough theoretical understanding of
dynamics convergence properties
performance wrt generalization etc
The Dynamics of Learning Vector Quantization RUG 10012005
In the following
analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized high-dimensional data
- essential features of LVQ learning
aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data
random vectors ξξξξ isin ℝN according to σ)P(p )P(
1σσ ξξξξξξξξ sum
plusmn=
=
( )( )
minus=
2
σN2-
2
1exp
2π
1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians
orthonormal center vectors
BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0
prior weights of classes p+ p-p+ + p- = 1
BBBB+
BBBB-
(p+)
(p-)
separation ℓℓ
jj Bσσξ l=
22222l Nξ1ξξ
N
1σσ
+==rarr=minus sum=j
jjj ξξξξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Nrarrinfin)
400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro
By
ξξ ξξ sdot=
minusminus
(240)
(160)
projections into the plane of center vectors B+ B-
microBy ξξξξsdot= ++
micro2
2xξξ ξξ
ww wwsdot
=
(240)(160)
projections in two independent random directions wwww12
micro11x ξξξξwwww sdot=
model for studying typical behavior of LVQ algorithmsnot density-estimation based classification
NoteNoteNoteNote
The Dynamics of Learning Vector Quantization RUG 10012005
dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training
sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
above examples
unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros
micross minusΘ= minus
The Winner Takes It All (classes irrelevantunknown)
Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1
classcorrectclasswrong
+minus=sdot=
here two prototypes noexplicit competition
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+=
( )21minusminus=
plusmn=
micros
micromicrosd
1σS
wwwwξξξξ
update of prototype vectors
The Dynamics of Learning Vector Quantization RUG 10012005
[ ] ( )
[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν
1Οffη QxfηQxfη1N
Ryfη1N
RR
ts1-micro
stmicrost
1-microst
microts
1-microst
microst
1-microsσ
microσs
1-microsσ
microsσ
++minus+minus=minus
minus=minus
2
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions
mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics
( ) ( ) 1221 -micross
micros
micromicros
micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ
micromicromicro1-micros
micros ξByx sdot=sdot= ττξξξξwwwwprojections
distances
random vector ξmicro enters only in the form of
( )11 +minusisinsdot=sdot= σtsmicrot
micros
microstσ
micros
microsσ QBR wwwwwwwwwwww
projections in the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities
( here ℝ2N rarr ℝ7 )
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x ll === sumsum
==
Bww jξ
completely specified in terms of first and second moments (wo indices micro)
in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin
random vector acc to σ)|P( micro rarrξξξξmicromicro
micro1-micros
micros
By
wx
ξξξξ
ξξξξ
sdot=
sdot=
ττ
correlated Gaussianrandom quantities
stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =
ρττρτρ δ yy- yyσσσ
===
=
else
σ ifsσσ
y0
Sl
l δτ
2 average over the current example2 average over the current example2 average over the current example2 average over the current example
rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ
1σσ LL sum
plusmn=
=
3. self-averaging properties

characteristic quantities $R_{s\sigma}^{\mu}$, $Q_{st}^{\mu}$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
→ learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu / N$: number of examples (number of learning steps) per degree of freedom

recursions → coupled ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
→ evolution of projections
5. learning curve

probability for misclassification of a novel example:
$$\varepsilon_g = p_+ \big\langle \Theta(d_+ - d_-) \big\rangle_+ + p_- \big\langle \Theta(d_- - d_+) \big\rangle_-$$
$$\varepsilon_g = p_+\, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}} \right] + p_-\, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}} \right]$$
with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$: maximize the decrease of $\varepsilon_g$ with $\alpha$, i.e. $-d\varepsilon_g / d\alpha$
- ...
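The generalization error is a function of the characteristic quantities alone, so it can be evaluated without any high-dimensional data. Below is a small sketch using only the Python standard library; the function names and the index convention (0 for $+1$, 1 for $-1$) are assumptions made for illustration. It follows from the Gaussian statistics of $x_- - x_+$ in the model.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, p_plus, ell):
    """eps_g from order parameters; index 0 stands for +1, index 1 for -1.

    R[s][sigma] = w_s . B_sigma,  Q[s][t] = w_s . w_t
    """
    p = (p_plus, 1.0 - p_plus)
    norm = 2.0 * math.sqrt(Q[0][0] - 2.0 * Q[0][1] + Q[1][1])
    eps = 0.0
    for c in (0, 1):              # c: class of the novel example
        o = 1 - c                 # o: the other class / prototype
        arg = (Q[c][c] - Q[o][o] - 2.0 * ell * (R[c][c] - R[o][c])) / norm
        eps += p[c] * phi(arg)
    return eps

# usage: prototypes placed exactly at the cluster centers, w_s = ell * B_s
ell = 1.0
R_centers = [[ell, 0.0], [0.0, ell]]
Q_centers = [[ell**2, 0.0], [0.0, ell**2]]
eps_centers = generalization_error(R_centers, Q_centers, p_plus=0.5, ell=ell)
```

For prototypes at the cluster centers the result reduces to $\Phi(-\ell/\sqrt{2})$ for any $p_+$, which gives a quick sanity check of the signs.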
optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with
$$p_+\, P(\boldsymbol{\xi}\,|\,\sigma = +1) = p_-\, P(\boldsymbol{\xi}\,|\,\sigma = -1)$$

[Figure: the two clusters around $\mathbf{B}_+$ ($p_+$) and $\mathbf{B}_-$ ($p_- > p_+$) with the separating plane; the excess error is marked.]
[Figure: minimal $\varepsilon_g$ (axis 0 to 0.5) as a function of the prior weight $p_+$ (axis 0 to 1), for $\ell = 0, 1, 2$.]
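The minimal error in the model can be written in closed form by projecting onto the direction $(\mathbf{B}_+ - \mathbf{B}_-)/\sqrt{2}$, where the two clusters are unit-variance Gaussians with means $\pm \ell/\sqrt{2}$. The sketch below is a derivation under that assumption, not a formula quoted from the slides; the function names are illustrative.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def minimal_generalization_error(p_plus, ell):
    """Error of the plane p_+ P(xi|+1) = p_- P(xi|-1) in the two-cluster model."""
    p_minus = 1.0 - p_plus
    if ell == 0.0 or p_plus in (0.0, 1.0):
        # no separation, or only one class present: assign to the frequent class
        return min(p_plus, p_minus)
    delta = ell * math.sqrt(2.0)                 # distance between cluster centers
    theta = math.log(p_minus / p_plus) / delta   # threshold along the projection
    return p_plus * phi(theta - delta / 2.0) + p_minus * phi(-theta - delta / 2.0)

# usage: values of the kind plotted against p_+ for several separations
curve = {ell: [minimal_generalization_error(p, ell) for p in (0.1, 0.3, 0.5)]
         for ell in (0.0, 1.0, 2.0)}
```

For $\ell = 0$ the result is $\min\{p_+, p_-\}$, and for $p_+ = 1/2$ it equals $\Phi(-\ell/\sqrt{2})$.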
"LVQ 2.1": update the correct and the wrong winner
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\big(S \sigma^{\mu}\big)\,\big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\big)$$

(analytical) integration for $\mathbf{w}_s(0) = 0$, with $p_\sigma = (1 + m\sigma)/2$ ($m > 0$):
$$R_{++} = \ell\,\frac{1+m}{2m}\,\big(1 - e^{-\eta m \alpha}\big), \qquad R_{+-} = -\ell\,\frac{1-m}{2m}\,\big(1 - e^{-\eta m \alpha}\big)$$
$$R_{-+} = \ell\,\frac{1+m}{2m}\,\big(1 - e^{+\eta m \alpha}\big), \qquad R_{--} = -\ell\,\frac{1-m}{2m}\,\big(1 - e^{+\eta m \alpha}\big)$$
$$Q_{++} = \ldots, \qquad Q_{+-} = \ldots, \qquad Q_{--} = \ldots$$

for $\alpha \to \infty$: $R_{-+}$, $R_{--}$, $Q_{+-}$, $Q_{--}$ diverge, while $Q_{++}$, $R_{++}$, $R_{+-}$ remain finite

[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)

[Figure: $Q_{++}$, $R_{++}$, $R_{-+}$, $R_{+-}$, $Q_{-+}$, $Q_{--}$, $R_{--}$ versus $\alpha$ (0 to 10, vertical axis -6 to 6); theory and simulation (N = 100), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs.]
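For LVQ 2.1 the averaged recursion for the projections is linear, $dR_{s\tau}/d\alpha = \eta\, s\,(\ell\, \tau\, p_\tau - m R_{s\tau})$, which follows from $\langle \sigma \rangle = m$ and $\langle \sigma\, y_\tau \rangle = \ell\, \tau\, p_\tau$. The sketch below, using only the standard library, compares the resulting closed form with a plain Euler integration; the function names and the step count are illustrative assumptions.

```python
import math

def r_closed_form(s, tau, alpha, eta, ell, m):
    """R_{s tau}(alpha) for LVQ 2.1 with w_s(0) = 0 and p_sigma = (1 + m sigma)/2."""
    p_tau = 0.5 * (1.0 + m * tau)
    return ell * tau * p_tau / m * (1.0 - math.exp(-s * eta * m * alpha))

def r_euler(s, tau, alpha, eta, ell, m, steps=20_000):
    """Euler integration of dR/dalpha = eta * s * (ell * tau * p_tau - m * R)."""
    p_tau = 0.5 * (1.0 + m * tau)
    r, h = 0.0, alpha / steps
    for _ in range(steps):
        r += h * eta * s * (ell * tau * p_tau - m * r)
    return r

# usage with the parameters of the figure: p_+ = 0.8 (m = 0.6), ell = 1, eta = 0.5
params = dict(eta=0.5, ell=1.0, m=0.6)
table = {(s, tau): (r_closed_form(s, tau, 4.0, **params), r_euler(s, tau, 4.0, **params))
         for s in (+1, -1) for tau in (+1, -1)}
```

The projections of $\mathbf{w}_+$ saturate, while those of $\mathbf{w}_-$ grow like $e^{\eta m \alpha}$, which is the instability discussed for this algorithm.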
problem: instability of the algorithm due to repulsion of wrong prototypes
→ trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$

[Figure: sketch of the two clusters, labelled $(p_-)$ and $(p_+ > p_-)$.]

strategies
- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  → slow learning, poor generalization
"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner $\mathbf{w}_s$ ($S = \pm 1$) is updated, according to the class membership
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\Theta\big(d_{-s}^{\mu} - d_s^{\mu}\big)\, S \sigma^{\mu}\,\big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\big)$$

numerical integration for $\mathbf{w}_s(0) = 0$; theory and simulation (N = 200), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 independent runs

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$.]
[Figure: trajectories of $\mathbf{w}_+$ and $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane, together with $\ell\mathbf{B}_+$ and $\ell\mathbf{B}_-$; (•) $\alpha = 20, 40, \ldots, 140$; legend: optimal decision boundary, ____ asymptotic position.]
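A direct simulation of LVQ 1 in the model gives the kind of data that the theory curves are compared with. The following is a minimal single-run sketch, assuming NumPy; the function name, the tie-breaking by `argmin` and the choice of $\mathbf{B}_\pm$ as the first two unit vectors are illustrative assumptions, and no averaging over runs is done here.

```python
import numpy as np

def simulate_lvq1(N=200, p_plus=0.2, ell=1.2, eta=1.2, alpha_max=20.0, seed=0):
    """One on-line LVQ 1 run in the two-cluster model; returns (R, Q) at alpha_max."""
    rng = np.random.default_rng(seed)
    B = np.eye(N)[:2]                 # B[0] = B_+, B[1] = B_-
    W = np.zeros((2, N))              # W[0] = w_+, W[1] = w_-, started at the origin
    labels = np.array([+1, -1])       # class label of prototype / cluster index
    for _ in range(int(alpha_max * N)):
        c = 0 if rng.random() < p_plus else 1         # cluster of the example
        xi = ell * B[c] + rng.normal(size=N)          # example from cluster c
        d = ((xi - W) ** 2).sum(axis=1)               # squared distances d_+, d_-
        s = int(np.argmin(d))                         # winner (ties go to index 0)
        # attract the winner if its label matches the example, repel otherwise
        W[s] += (eta / N) * labels[s] * labels[c] * (xi - W[s])
    return W @ B.T, W @ W.T

R, Q = simulate_lvq1(N=50, alpha_max=5.0)   # small values keep the run short
```

Averaging the returned quantities over many seeds and recording them at several values of $\alpha$ would be the natural next step for a comparison with the integrated differential equations.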
learning curve

[Figure: $\varepsilon_g$ versus $\alpha$ for $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$).]

- role of the learning rate
[Figure: $\varepsilon_g$ (axis 0.14 to 0.26) versus $\alpha$ (0 to 300) for $\eta = 2.0, 0.4, 0.2$ and $\eta \to 0$.]
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- variable rate $\eta(\alpha)$
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)
[Figure: $\varepsilon_g$ (axis 0.14 to 0.26) versus $(\eta\alpha)$ (0 to 50) for $\eta \to 0$; marked: min $\varepsilon_g$, "suboptimal".]
"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion): the winner is updated only if it is correct
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\Theta\big(d_{-s}^{\mu} - d_s^{\mu}\big)\,\delta_{S,\sigma^{\mu}}\,\big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\big)$$

$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\,(\mathbf{B}_+ + \mathbf{B}_-)/2$
[Figure: positions of $\mathbf{w}_+$, $\mathbf{w}_-$ relative to $\ell\mathbf{B}_+$, $\ell\mathbf{B}_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$.]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

LVQ+ ≈ VQ within the classes ($\mathbf{w}_S$ updated only from class $S$)
[Figure: asymptotic $\varepsilon_g$ versus $p_+$ for the different algorithms; asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$.]
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- optimal classification
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification

learning curves
[Figure: $\varepsilon_g$ versus $\alpha$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$.]
Vector Quantization

competitive learning: $\mathbf{w}_s$ is the winner
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\Theta\big(d_{-s}^{\mu} - d_s^{\mu}\big)\,\big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\big)$$
class membership is unknown or identical for all data

numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[Figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ 1.]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ (0 to 300).]

system is invariant under exchange of the prototypes
→ weakly repulsive fixed points
interpretations
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

[Figure: $\varepsilon_g$ versus $p_+$; asymptotics ($\alpha \to \infty$, $\eta \to 0$, $\eta\alpha \to \infty$).]

$p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high generalization error $\varepsilon_g$
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation
Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. the distance measure
  $$d^{\lambda}(\mathbf{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i\, (w_i - \xi_i)^2$$
  with the factors $\lambda_i$ adapted in training
• applications
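The adaptive distance measure mentioned for Generalized Relevance LVQ is a weighted squared Euclidean distance. A minimal sketch, assuming NumPy; the function name is illustrative and the adaptation of the weights during training is not shown.

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)^2."""
    w, xi, lam = (np.asarray(a, dtype=float) for a in (w, xi, lam))
    return float(np.sum(lam * (w - xi) ** 2))

# usage: equal weights give the squared Euclidean distance,
# a zero weight removes the corresponding feature from the comparison
d_euclid = relevance_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
d_first_only = relevance_distance([0.0, 0.0], [3.0, 4.0], [1.0, 0.0])
```

With all weights equal to one the measure reduces to the squared Euclidean distance used elsewhere in the talk.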
quantization error
$$H_{VQ} = \sum_{j=1}^{K} \sum_{\mu=1}^{P} \big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_j\big)^2 \prod_{k \neq j}^{K} \Theta\big(d_k^{\mu} - d_j^{\mu}\big)$$
- the sums run over the prototypes ($j$) and the data ($\mu$)
- the product of step functions selects the terms in which $\mathbf{w}_j$ is the winner
- $d_j^{\mu}$: here, Euclidean distance

aim: faithful representation (in general ≠ clustering)

Result depends on
- the number of prototype vectors
- the distance measure / metric used
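Since the product of step functions picks out the closest prototype for each data point, the quantization error is simply the sum of squared distances to the respective winners. A short sketch, assuming NumPy and squared Euclidean distance; the function name is illustrative.

```python
import numpy as np

def quantization_error(data, prototypes):
    """Sum over all data points of the squared distance to the closest prototype."""
    data = np.asarray(data, dtype=float)              # shape (P, N)
    prototypes = np.asarray(prototypes, dtype=float)  # shape (K, N)
    # d[mu, j] = squared Euclidean distance between example mu and prototype j
    d = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

# usage: three points on a line, two prototypes
data = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
prototypes = [[0.0, 0.0], [10.0, 0.0]]
h_vq = quantization_error(data, prototypes)
```

In this toy case only the middle point contributes, at squared distance 1 from its winner.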
Learning Vector Quantization (LVQ)

aim: classification of data, learning from examples

Learning: choice of prototypes according to example data
classification: assignment of a vector $\boldsymbol{\xi}$ to the class of the closest prototype $\mathbf{w}$

[Figure: example situation with 3 classes and 3 prototypes.]

aim: generalization ability, i.e. correct classification of novel data after training
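The classification rule, assigning a vector to the class of its closest prototype, takes only a few lines. A minimal sketch, assuming NumPy and squared Euclidean distance; the function name and the toy prototypes are illustrative.

```python
import numpy as np

def classify(xi, prototypes, prototype_labels):
    """Return the label of the prototype closest to xi."""
    prototypes = np.asarray(prototypes, dtype=float)
    d = ((prototypes - np.asarray(xi, dtype=float)) ** 2).sum(axis=1)
    return prototype_labels[int(np.argmin(d))]

# usage: three prototypes, one per class
prototypes = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
prototype_labels = ["A", "B", "C"]
label = classify([4.0, 1.0], prototypes, prototype_labels)
```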
mostly heuristically motivated variations of competitive learning

prominent example [Kohonen]: "LVQ 2.1"
• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example

known convergence / stability problems, e.g. for infrequent classes
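The four steps above can be sketched as a single update function. This is a simplified illustration, assuming NumPy and squared Euclidean distance: it applies the plain attract/repel step listed on the slide and omits any window rule; the function name and the in-place update are illustrative choices.

```python
import numpy as np

def lvq21_step(xi, label, prototypes, prototype_labels, eta):
    """One simplified LVQ 2.1 step; modifies `prototypes` in place.

    Moves the closest correct prototype towards xi and the closest
    wrong prototype away from xi. Returns their indices (j, k).
    """
    d = ((prototypes - xi) ** 2).sum(axis=1)
    correct = prototype_labels == label
    j = np.flatnonzero(correct)[np.argmin(d[correct])]    # closest correct
    k = np.flatnonzero(~correct)[np.argmin(d[~correct])]  # closest wrong
    prototypes[j] += eta * (xi - prototypes[j])
    prototypes[k] -= eta * (xi - prototypes[k])
    return int(j), int(k)

# usage: two prototypes with labels +1 and -1, one example of class +1
prototypes = np.array([[0.0, 0.0], [2.0, 0.0]])
prototype_labels = np.array([+1, -1])
j, k = lvq21_step(np.array([1.0, 0.0]), +1, prototypes, prototype_labels, eta=0.1)
```

The repulsive step has no counterweight when one class is rare, which is the origin of the stability problems noted above.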
LVQ algorithms
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data; a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...
illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain
[Figure: healthy cells and damaged cells; prototypes obtained by LVQ (1).]
LVQ algorithms
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.
In the following:

analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples

typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning

aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
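model situation: two clusters of N-dimensional data

random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$P(\boldsymbol{\xi}) = \sum_{\sigma = \pm 1} p_\sigma\, P(\boldsymbol{\xi}\,|\,\sigma), \qquad P(\boldsymbol{\xi}\,|\,\sigma) = \frac{1}{(2\pi)^{N/2}} \exp\!\left[ -\frac{1}{2} \big( \boldsymbol{\xi} - \ell\, \mathbf{B}_\sigma \big)^2 \right]$$

- orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_\sigma)^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$
- prior weights of classes: $p_+$, $p_-$ with $p_+ + p_- = 1$
- separation: $\ell$

[Figure: the two clusters around $\ell\mathbf{B}_+$ ($p_+$) and $\ell\mathbf{B}_-$ ($p_-$), with the separation $\ell$ indicated.]

independent components:
$$\langle \xi_j \rangle_\sigma = \ell\, (\mathbf{B}_\sigma)_j, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1 \;\;\to\;\; \langle \boldsymbol{\xi}^2 \rangle_\sigma = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_\sigma = N + \ell^2$$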
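Sampling from the model is straightforward because the components are independent with unit variance. A minimal sketch, assuming NumPy; the function name and the choice of $\mathbf{B}_\pm$ as the first two unit vectors are illustrative assumptions (any orthonormal pair would do).

```python
import numpy as np

def sample_mixture(n, N, p_plus, ell, rng):
    """Draw n examples from the two-cluster model; returns (xi, sigma, B)."""
    B = np.eye(N)[:2]                                   # B[0] = B_+, B[1] = B_-
    sigma = np.where(rng.random(n) < p_plus, 1, -1)     # class labels +1 / -1
    centers = np.where(sigma[:, None] == 1, B[0], B[1]) # shape (n, N)
    xi = ell * centers + rng.normal(size=(n, N))        # unit-variance noise
    return xi, sigma, B

# usage: check the class frequency and the mean squared norm N + ell^2
rng = np.random.default_rng(0)
xi, sigma, B = sample_mixture(n=20_000, N=30, p_plus=0.6, ell=1.0, rng=rng)
fraction_plus = float((sigma == 1).mean())
mean_sq_norm = float((xi ** 2).sum(axis=1).mean())
```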
high-dimensional data (formally $N \to \infty$)

400 examples $\boldsymbol{\xi}^{\mu} \in \mathbb{R}^N$, N = 200, $\ell = 1$, $p_+ = 0.6$

[Figure: projections into the plane of center vectors $\mathbf{B}_+$, $\mathbf{B}_-$: $y_+ = \mathbf{B}_+ \cdot \boldsymbol{\xi}^{\mu}$ versus $y_- = \mathbf{B}_- \cdot \boldsymbol{\xi}^{\mu}$; 240 and 160 examples in the two classes.]
[Figure: projections in two independent random directions $\mathbf{w}_1$, $\mathbf{w}_2$: $x_1 = \mathbf{w}_1 \cdot \boldsymbol{\xi}^{\mu}$ versus $x_2 = \mathbf{w}_2 \cdot \boldsymbol{\xi}^{\mu}$; 240 and 160 examples.]

Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification
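The contrast between the two projection plots can be reproduced numerically: the class structure shows up in the plane of the center vectors but is hardly visible along random directions. A rough sketch, assuming NumPy; the seed, the unit-normalized random directions and the use of class-mean gaps as a summary are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, ell, p_plus = 200, 400, 1.0, 0.6
B = np.eye(N)[:2]                                    # B[0] = B_+, B[1] = B_-
sigma = np.where(rng.random(n) < p_plus, 1, -1)
xi = ell * np.where(sigma[:, None] == 1, B[0], B[1]) + rng.normal(size=(n, N))

y = xi @ B.T                                         # projections on B_+, B_-
w = rng.normal(size=(2, N))
w /= np.linalg.norm(w, axis=1, keepdims=True)        # two random unit directions
x = xi @ w.T                                         # projections on w_1, w_2

# gap between the class means along B_+ (about ell) and along w_1 (about ell/sqrt(N))
gap_y = y[sigma == 1, 0].mean() - y[sigma == -1, 0].mean()
gap_x = x[sigma == 1, 0].mean() - x[sigma == -1, 0].mean()
```

Scatter plots of the columns of `y` and of `x` would correspond to the two panels described above.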
dynamics of on-line training

sequence of independent random data $\boldsymbol{\xi}^{\mu}$, $\mu = 1, 2, 3, \ldots$, according to $P(\boldsymbol{\xi}^{\mu})$

update of prototype vectors:
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\big[d_s^{\mu}, d_{-s}^{\mu}, \sigma^{\mu}\big]\,\big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\big), \qquad d_s^{\mu} = \big(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\big)^2, \qquad S, \sigma^{\mu} = \pm 1$$
- $\eta / N$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data

above examples:
- unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown): $f_s[\ldots] = \Theta\big(d_{-s}^{\mu} - d_s^{\mu}\big)$
- Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition): $f_s[\ldots] = S \cdot \sigma^{\mu}$, i.e. $+1$ for the correct class and $-1$ for the wrong class
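All algorithms in the talk share the same update and differ only in the modulation function $f_s$. The sketch below, assuming NumPy and the two-prototype setting, collects the four choices (VQ, LVQ 1, LVQ+, LVQ 2.1) in one place; the rule names, the strict inequality for ties and the array layout are illustrative assumptions.

```python
import numpy as np

LABELS = np.array([+1, -1])   # class label S of prototype index 0 (w_+) and 1 (w_-)

def modulation(rule, d, sigma):
    """f_s for both prototypes, given distances d = (d_+, d_-) and the label sigma."""
    winner = (d < d[::-1]).astype(float)     # Theta(d_{-s} - d_s), no update on ties
    agree = LABELS * sigma                   # S * sigma: +1 correct, -1 wrong
    if rule == "vq":
        return winner
    if rule == "lvq1":
        return winner * agree
    if rule == "lvq+":
        return winner * (agree > 0)
    if rule == "lvq2.1":
        return agree.astype(float)
    raise ValueError(f"unknown rule: {rule}")

def update(W, xi, sigma, eta, rule):
    """One on-line step w_s <- w_s + (eta/N) f_s (xi - w_s); returns new prototypes."""
    N = W.shape[1]
    d = ((xi - W) ** 2).sum(axis=1)
    f = modulation(rule, d, sigma)
    return W + (eta / N) * f[:, None] * (xi - W)

# usage: an example of class -1 that lies closest to w_+
W = np.array([[1.0, 0.0], [0.0, 1.0]])
xi = np.array([1.0, 0.2])
d = ((xi - W) ** 2).sum(axis=1)
f_all = {rule: modulation(rule, d, -1) for rule in ("vq", "lvq1", "lvq+", "lvq2.1")}
W_vq = update(W, xi, sigma=-1, eta=1.0, rule="vq")
```

In this example VQ attracts the winner, LVQ 1 repels it, LVQ+ leaves both prototypes unchanged, and LVQ 2.1 repels $\mathbf{w}_+$ while attracting $\mathbf{w}_-$.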
[ ] ( )
[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν
1Οffη QxfηQxfη1N
Ryfη1N
RR
ts1-micro
stmicrost
1-microst
microts
1-microst
microst
1-microsσ
microσs
1-microsσ
microsσ
++minus+minus=minus
minus=minus
2
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions
mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics
( ) ( ) 1221 -micross
micros
micromicros
micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ
micromicromicro1-micros
micros ξByx sdot=sdot= ττξξξξwwwwprojections
distances
random vector ξmicro enters only in the form of
( )11 +minusisinsdot=sdot= σtsmicrot
micros
microstσ
micros
microsσ QBR wwwwwwwwwwww
projections in the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities
( here ℝ2N rarr ℝ7 )
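
2. average over the current example

random vector $\boldsymbol{\xi}^{\mu}$ acc. to $P(\boldsymbol{\xi}^{\mu} \mid \sigma)$ → in the thermodynamic limit $N \to \infty$ the projections
$$x_s^{\mu} = \mathbf{w}_s^{\mu-1}\cdot\boldsymbol{\xi}^{\mu}, \qquad y_{\tau}^{\mu} = \mathbf{B}_{\tau}\cdot\boldsymbol{\xi}^{\mu}$$
are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):
$$\langle x_s \rangle_{\sigma} = \sum_{j=1}^{N} (\mathbf{w}_s)_j \langle \xi_j \rangle_{\sigma} = \ell \sum_{j=1}^{N} (\mathbf{w}_s)_j (\mathbf{B}_{\sigma})_j = \ell\, R_{s\sigma}, \qquad \langle y_{\tau} \rangle_{\sigma} = \ell\, \delta_{\tau\sigma} = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$
$$\langle x_s x_t \rangle_{\sigma} - \langle x_s \rangle_{\sigma}\langle x_t \rangle_{\sigma} = Q_{st}, \qquad \langle x_s y_{\tau} \rangle_{\sigma} - \langle x_s \rangle_{\sigma}\langle y_{\tau} \rangle_{\sigma} = R_{s\tau}, \qquad \langle y_{\tau} y_{\rho} \rangle_{\sigma} - \langle y_{\tau} \rangle_{\sigma}\langle y_{\rho} \rangle_{\sigma} = \delta_{\tau\rho}$$
→ averaged recursions, closed in $\{R_{s\sigma}, Q_{st}\}$, with
$$\langle \ldots \rangle = \sum_{\sigma = \pm 1} p_{\sigma} \langle \ldots \rangle_{\sigma}$$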

3. self-averaging properties

characteristic quantities $R_{s\sigma}^{\mu}$, $Q_{st}^{\mu}$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu / N$: number of examples (learning steps) per degree of freedom
recursions → coupled ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
→ evolution of projections

probability for misclassification of a novel example:
$$\epsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-$$
$$\epsilon_g = p_+\, \Phi\!\left[\frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}}\right] + p_-\, \Phi\!\left[\frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}}\right]$$
with $\Phi$ the cumulative distribution function of the standard normal distribution.
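The expression $\epsilon_g = \sum_{\sigma} p_{\sigma}\, \Phi\big[(Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2\ell (R_{\sigma\sigma} - R_{-\sigma\sigma})) / (2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}})\big]$ can be evaluated directly once the characteristic quantities are known. The sketch below is a minimal version under two assumptions: unit cluster variance, as in the model situation, and $R$, $Q$ stored as dictionaries keyed by label pairs. The function names are illustrative.

```python
import math


def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def generalization_error(R, Q, p_plus, ell):
    """Misclassification probability of a novel example.

    R[(s, sigma)] = w_s . B_sigma, Q[(s, t)] = w_s . w_t.
    Requires distinct prototypes, otherwise the denominator vanishes.
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    # 2 |w_+ - w_-|, written in terms of Q only
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for sig in (+1, -1):
        num = (Q[(sig, sig)] - Q[(-sig, -sig)]
               - 2.0 * ell * (R[(sig, sig)] - R[(-sig, sig)]))
        eps += p[sig] * Phi(num / norm)
    return eps
```

Two limiting cases serve as sanity checks: prototypes placed at the cluster centers with equal priors give $\Phi(-\ell/\sqrt{2})$, and one prototype moved very far away gives the prior weight of its class.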

5. learning curve

generalization error $\epsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions:
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$: maximize $d\epsilon_g / d\alpha$
- ...

optimal classification with minimal generalization error

in the model situation (equal variances of clusters): separation of classes by the plane with
$$p_+\, P(\boldsymbol{\xi} \mid \sigma = +1) = p_-\, P(\boldsymbol{\xi} \mid \sigma = -1)$$
[Figure: clusters around $\mathbf{B}_+$ ($p_+$) and $\mathbf{B}_-$ ($p_- > p_+$) with the optimal separating plane; the excess error of a non-optimal plane is marked.]
[Figure: minimal $\epsilon_g$ (0 to 0.5) as a function of the prior weight $p_+$ (0 to 1), for $\ell = 0, 1, 2$.]
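For this model the minimal generalization error can be written in closed form, which gives a reference curve for the algorithms. The sketch below assumes unit-variance clusters centered at $\ell\mathbf{B}_{\pm}$; equating $p_+ P(\boldsymbol{\xi}|+1) = p_- P(\boldsymbol{\xi}|-1)$ then yields the plane $\ell\,(y_+ - y_-) = \ln(p_-/p_+)$, and $y_+ - y_-$ is Gaussian with mean $\pm\ell$ and variance 2. The function name is illustrative.

```python
import math


def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def minimal_generalization_error(p_plus, ell):
    """Error of the optimal plane p_+ P(xi|+1) = p_- P(xi|-1).

    Assumes unit-variance clusters, 0 < p_plus < 1 and ell > 0.
    """
    if not 0.0 < p_plus < 1.0 or ell <= 0.0:
        raise ValueError("need 0 < p_plus < 1 and ell > 0")
    p_minus = 1.0 - p_plus
    # Threshold on t = y_+ - y_-; an example is assigned to class +1 if t > theta.
    theta = math.log(p_minus / p_plus) / ell
    root2 = math.sqrt(2.0)
    miss_plus = Phi((theta - ell) / root2)      # class +1 example with t < theta
    miss_minus = Phi(-(theta + ell) / root2)    # class -1 example with t > theta
    return p_plus * miss_plus + p_minus * miss_minus
```

The result never exceeds $\min\{p_+, p_-\}$, the error of assigning every example to the more frequent class, and it decreases with growing separation $\ell$.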

"LVQ 2.1": update the correct and wrong winner
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, S\,\sigma^{\mu}\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\right)$$
(analytical) integration for $\mathbf{w}_s(0) = 0$, with $p_{\sigma} = (1 + m\sigma)/2$ ($m > 0$):
$$R_{++}(\alpha) = \ell\,\frac{1+m}{2m}\left(1 - e^{-\eta m \alpha}\right), \qquad R_{+-}(\alpha) = -\ell\,\frac{1-m}{2m}\left(1 - e^{-\eta m \alpha}\right)$$
$$R_{-+}(\alpha) = \ell\,\frac{1+m}{2m}\left(1 - e^{+\eta m \alpha}\right), \qquad R_{--}(\alpha) = -\ell\,\frac{1-m}{2m}\left(1 - e^{+\eta m \alpha}\right)$$
$$Q_{st}(\alpha) = \ldots$$
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
For $\alpha \to \infty$: $R_{-+}, R_{--}, Q_{+-}, Q_{--} \to \infty$, with $R_{++}, R_{+-}, Q_{++}$ remaining finite.
[Figure: characteristic quantities $R_{s\sigma}$, $Q_{st}$ versus $\alpha$ (0 to 10, vertical range -6 to 6); theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs.]
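The instability can be made explicit for the projections $R_{S\tau}$. Averaging the LVQ 2.1 update with $f_s = S\sigma$ over the mixture gives, as a sketch under the model assumptions, $dR_{S\tau}/d\alpha = \eta S\,(\ell\,\tau\,p_{\tau} - m R_{S\tau})$ with $m = p_+ - p_-$, whose solution for $R_{S\tau}(0) = 0$ is evaluated below. The function name and argument order are illustrative, and the derivation is a reconstruction rather than a quotation from the slides.

```python
import math


def lvq21_projections(alpha, eta, m, ell):
    """R[(S, tau)](alpha) for LVQ 2.1, starting from w_s(0) = 0.

    Solves dR/dalpha = eta * S * (ell * tau * p_tau - m * R) with
    p_tau = (1 + m * tau) / 2 and 0 < m < 1.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("need 0 < m < 1")
    R = {}
    for S in (+1, -1):
        for tau in (+1, -1):
            p_tau = 0.5 * (1.0 + m * tau)
            # S = +1 saturates, S = -1 grows exponentially for m > 0
            R[(S, tau)] = ell * tau * p_tau / m * (1.0 - math.exp(-S * eta * m * alpha))
    return R
```

With $p_+ = 0.8$ (so $m = 0.6$), $\ell = 1$ and $\eta = 0.5$, the projections of $\mathbf{w}_+$ stay bounded by $\ell\,p_{\tau}/m$ while those of $\mathbf{w}_-$ grow like $e^{\eta m \alpha}$.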

problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\epsilon_g = \min\{p_+, p_-\}$
[Figure: clusters with prior weights $p_-$ and $p_+ > p_-$.]

strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-S}^{\mu} - d_{S}^{\mu}\right) S\,\sigma^{\mu}\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\right)$$
where $\Theta(\ldots)$ selects the winner $\mathbf{w}_s$ and $S\sigma^{\mu} = \pm 1$.
numerical integration for $\mathbf{w}_s(0) = 0$
[Figure: characteristic quantities $R_{s\sigma}$ and $Q_{++}$, $Q_{+-}$, $Q_{--}$ versus $\alpha$; theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs.]
[Figure: trajectories of $\mathbf{w}_+$, $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane with axes $R_{S+}$, $R_{S-}$ and cluster centers $\ell\mathbf{B}_+$, $\ell\mathbf{B}_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; (____) asymptotic position.]

learning curve
[Figure: $\epsilon_g$ (0.14 to 0.26) versus $\alpha$ (0 to 300) for learning rates labelled $\eta = 2.0, 1.2, 0.4, 0.2$; $p_+ = 0.2$, $\ell = 1.2$.]
- stationary state: $\epsilon_g(\alpha \to \infty)$ grows lin. with $\eta$
- role of the learning rate
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)
- variable rate $\eta(\alpha)$
[Figure: $\epsilon_g$ (0.14 to 0.26) versus $(\eta\alpha)$ (0 to 50) for $\eta \to 0$, compared with min $\epsilon_g$; the asymptotic result is suboptimal.]

"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion)
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-S}^{\mu} - d_{S}^{\mu}\right) \delta_{S\sigma^{\mu}}\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\right)$$
where $\Theta(\ldots)$ selects the winner and $\delta_{S\sigma^{\mu}}$ requires it to be correct.
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\,(\mathbf{B}_+ + \mathbf{B}_-)/2$
[Figure: asymptotic positions of $\mathbf{w}_+$, $\mathbf{w}_-$ relative to $\ell\mathbf{B}_+$, $\ell\mathbf{B}_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$.]
classification scheme and the achieved generalization error are independent of the prior weights $p_{\pm}$ (and optimal for $p_{\pm} = 1/2$)
LVQ+ ≈ VQ within the classes ($\mathbf{w}_S$ updated only from class $S$)
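The two winner-takes-all prescriptions differ only in what happens when the winner carries the wrong label. The following sketch shows both for two prototypes stored in a dictionary keyed by their labels; the function names and the tie-breaking rule (a tie goes to the +1 prototype) are assumptions made for the example.

```python
def squared_distance(xi, w):
    """Squared Euclidean distance d = (xi - w)^2."""
    return sum((x - wj) ** 2 for x, wj in zip(xi, w))


def winner(w, xi):
    """Label of the closest prototype (ties go to +1, an arbitrary choice)."""
    return +1 if squared_distance(xi, w[+1]) <= squared_distance(xi, w[-1]) else -1


def lvq1_step(w, xi, sigma, eta):
    """LVQ 1: move the winner towards (correct) or away from (wrong) xi."""
    s = winner(w, xi)
    step = (eta / len(xi)) * (s * sigma)
    new_w = {k: list(v) for k, v in w.items()}
    new_w[s] = [wj + step * (x - wj) for x, wj in zip(xi, w[s])]
    return new_w


def lvq_plus_step(w, xi, sigma, eta):
    """LVQ+: update the winner only if its label matches the example."""
    if winner(w, xi) != sigma:
        return {k: list(v) for k, v in w.items()}
    return lvq1_step(w, xi, sigma, eta)
```

Dropping the label test altogether, i.e. always moving the winner towards the example, gives plain competitive learning (VQ).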

asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$
[Figure: asymptotic $\epsilon_g$ as a function of $p_+$ for the different algorithms, compared with the optimal classification.]
- LVQ 2.1: trivial assignment to the more frequent class, $\epsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_{\pm}$-independent classification
learning curves
[Figure: $\epsilon_g$ versus $\alpha$ for LVQ+ and LVQ1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$.]

Vector Quantization

competitive learning ($\mathbf{w}_s$: winner):
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-S}^{\mu} - d_{S}^{\mu}\right)\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\right)$$
class membership is unknown or identical for all data
numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ (0 to 300).]
[Figure: $\epsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ1.]
system is invariant under exchange of the prototypes → weakly repulsive fixed points

interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: $\epsilon_g$ versus $p_+$; asymptotics ($\eta \to 0$, $\eta\alpha \to \infty$).]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\epsilon_g$

work in progress, outlook
• regularization of LVQ 2.1: Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Summary
• prototype-based learning: Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes; dynamics of online training
• comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation

Perspectives
• Self-Organizing Maps (SOM)
  - (many) N-dim. prototypes form a (low) d-dimensional grid
  - representation of data in a topology preserving map
  - neighborhood preserving: SOM; distance based: Neural Gas
• Generalized Relevance LVQ [Hammer & Villmann]
  - adaptive metrics, e.g. distance measure $d_{\lambda}(\mathbf{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i\, (w_{s,i} - \xi_i)^2$, with $\lambda$ subject to training
• applications
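The adaptive distance measure $d_{\lambda}(\mathbf{w}, \boldsymbol{\xi}) = \sum_i \lambda_i (w_i - \xi_i)^2$ mentioned under Generalized Relevance LVQ is easy to state in code. The sketch below only evaluates the weighted distance for given relevances; how the $\lambda_i$ are trained is not covered, and the function name is illustrative.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)^2.

    With all lam_i = 1 this reduces to the squared Euclidean distance.
    """
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))
```

Setting a relevance to zero removes the corresponding feature from the comparison, which is how such a metric can suppress uninformative dimensions of heterogeneous data.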

prominent example [Kohonen]: "LVQ 2.1"
• initialize prototype vectors (for different classes)
• present a single example
• identify the closest correct and the closest wrong prototype
• move the corresponding winner towards / away from the example
known convergence / stability problems, e.g. for infrequent classes
mostly heuristically motivated variations of competitive learning

LVQ algorithms
- are frequently applied in a variety of problems involving the classification of structured data; a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - texture recognition and classification
  - gene expression data analysis
  - ...
- appear plausible, intuitive, flexible
- are fast, easy to implement

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
[Figure: healthy cells and damaged cells; prototypes obtained by LVQ (1).]

LVQ algorithms
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, which is inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
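
model situation: two clusters of N-dimensional data

random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$P(\boldsymbol{\xi}) = \sum_{\sigma = \pm 1} p_{\sigma}\, P(\boldsymbol{\xi} \mid \sigma), \qquad P(\boldsymbol{\xi} \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\!\left[-\frac{1}{2}\left(\boldsymbol{\xi} - \ell\,\mathbf{B}_{\sigma}\right)^2\right]$$
- orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_{\sigma})^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$
- prior weights of classes $p_+$, $p_-$ with $p_+ + p_- = 1$
- separation $\ell$
- independent components: $\langle \xi_j \rangle_{\sigma} = \ell\, (\mathbf{B}_{\sigma})_j$ and $\langle \xi_j^2 \rangle_{\sigma} - \langle \xi_j \rangle_{\sigma}^2 = 1$ → $\langle \boldsymbol{\xi}^2 \rangle_{\sigma} = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_{\sigma} = N + \ell^2$
[Figure: two clusters centered at $\ell\mathbf{B}_+$ ($p_+$) and $\ell\mathbf{B}_-$ ($p_-$).]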
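Sampling from this mixture takes one label draw and $N$ independent unit-variance Gaussian components. The sketch below uses only the standard library; the function name and the explicit random generator argument are choices made for the example.

```python
import random


def sample_example(B, p_plus, ell, rng):
    """Draw one labelled example (xi, sigma) from the two-cluster model.

    B is a dict {+1: B_plus, -1: B_minus} of orthonormal center vectors.
    The label is +1 with probability p_plus; given the label, every
    component is Gaussian with mean ell * B_sigma[j] and unit variance.
    """
    sigma = +1 if rng.random() < p_plus else -1
    xi = [ell * b + rng.gauss(0.0, 1.0) for b in B[sigma]]
    return xi, sigma


def unit_vector(n, k):
    """k-th Cartesian unit vector in n dimensions (a simple orthonormal choice)."""
    return [1.0 if j == k else 0.0 for j in range(n)]
```

For example, `B = {+1: unit_vector(200, 0), -1: unit_vector(200, 1)}` with `rng = random.Random(1)` produces data whose clusters are visible only in the projections onto the two center vectors, not in projections onto random directions.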

high-dimensional data (formally: $N \to \infty$)
400 examples $\boldsymbol{\xi}^{\mu} \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.6$
[Figure: projections into the plane of center vectors $\mathbf{B}_+$, $\mathbf{B}_-$, with $y_+ = \mathbf{B}_+ \cdot \boldsymbol{\xi}^{\mu}$ and $y_- = \mathbf{B}_- \cdot \boldsymbol{\xi}^{\mu}$; (240) and (160) examples per class.]
[Figure: projections in two independent random directions $\mathbf{w}_1$, $\mathbf{w}_2$, with $x_1 = \mathbf{w}_1 \cdot \boldsymbol{\xi}^{\mu}$ and $x_2 = \mathbf{w}_2 \cdot \boldsymbol{\xi}^{\mu}$; (240) and (160) examples per class.]
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

dynamics of on-line training

sequence of independent random data $\boldsymbol{\xi}^{\mu}$ ($\mu = 1, 2, 3, \ldots$) acc. to $P(\boldsymbol{\xi}^{\mu})$
update of prototype vectors:
$$\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\left[d_s^{\mu}, d_{-s}^{\mu}, \sigma^{\mu}\right]\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\right), \qquad d_s^{\mu} = \left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1}\right)^2$$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\boldsymbol{\xi}^{\mu} - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data
Here $S = \pm 1$ is the class label of prototype $\mathbf{w}_s$ and $\sigma^{\mu} = \pm 1$ the label of the example.
above examples:
- unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown): $f_s[\ldots] = \Theta\!\left(d_{-s}^{\mu} - d_s^{\mu}\right)$
- Learning Vector Quantization "2.1" (here: two prototypes, no explicit competition): $f_s[\ldots] = S\,\sigma^{\mu} = +1$ (correct class) or $-1$ (wrong class)
LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms
- are frequently applied in a variety of problems involving
the classification of structured data a few examples
- appear plausible intuitive flexible
- are fast easy to implement
- real time speech recognition
- medical diagnosis eg from histological data
- texture recognition and classification
- gene expression data analysis
-
The Dynamics of Learning Vector Quantization RUG 10012005
illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
healthy cells damaged cells
prototypes obtained by LVQ (1)
illustrationillustrationillustrationillustration microscopic images of (pig) semen cells after freezingand storage co Lidia Sanchez-Gonzalez LeonSpain
The Dynamics of Learning Vector Quantization RUG 10012005
LVQ algorithms LVQ algorithms LVQ algorithms LVQ algorithms
- are often based on purely heuristic arguments
or derived from a cost function with unclear
relation to the generalization ability
- almost exclusively use the Euclidean distance measure
inappropriate for heterogeneous data
- lack in general a thorough theoretical understanding of
dynamics convergence properties
performance wrt generalization etc
The Dynamics of Learning Vector Quantization RUG 10012005
In the following
analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized high-dimensional data
- essential features of LVQ learning
aim - contribute to the theoretical understanding- develop efficient LVQ schemes- test in applications
The Dynamics of Learning Vector Quantization RUG 10012005
model situationmodel situationmodel situationmodel situation two clusters of N-dimensional data
random vectors ξξξξ isin ℝN according to σ)P(p )P(
1σσ ξξξξξξξξ sum
plusmn=
=
( )( )
minus=
2
σN2-
2
1exp
2π
1σ)P( ΒΒΒΒξξξξξξξξ lmixture of two Gaussians
orthonormal center vectors
BBBB+ BBBB- isin ℝN ( BBBBσ )2 =1 BBBB+ BBBB- =0
prior weights of classes p+ p-p+ + p- = 1
BBBB+
BBBB-
(p+)
(p-)
separation ℓℓ
jj Bσσξ l=
22222l Nξ1ξξ
N
1σσ
+==rarr=minus sum=j
jjj ξξξξ
independent components
The Dynamics of Learning Vector Quantization RUG 10012005
high-dimensional data (formally Nrarrinfin)
400 examples ξξξξmicro isinℝN N=200 ℓ=1 p+=06micro
By
ξξ ξξ sdot=
minusminus
(240)
(160)
projections into the plane of center vectors B+ B-
microBy ξξξξsdot= ++
micro2
2xξξ ξξ
ww wwsdot
=
(240)(160)
projections in two independent random directions wwww12
micro11x ξξξξwwww sdot=
model for studying typical behavior of LVQ algorithmsnot density-estimation based classification
NoteNoteNoteNote
The Dynamics of Learning Vector Quantization RUG 10012005
dynamics of ondynamics of ondynamics of ondynamics of on----line trainingline trainingline trainingline training
sequence of independent random data ( )123micromicro =ξξξξ acc to ( )microP ξξξξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
above examples
unsupervisedunsupervisedunsupervisedunsupervised Vector QuantizationVector QuantizationVector QuantizationVector Quantization [ ] ( ) dd fmicros
micross minusΘ= minus
The Winner Takes It All (classes irrelevantunknown)
Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo Learning Vector Quantization ldquo21rdquo [ ] σS fs)(1)(1
classcorrectclasswrong
+minus=sdot=
here two prototypes noexplicit competition
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+=
( )21minusminus=
plusmn=
micros
micromicrosd
1σS
wwwwξξξξ
update of prototype vectors
The Dynamics of Learning Vector Quantization RUG 10012005
[ ] ( )
[ ] ( ) [ ] ( ) [ ] [ ] ( )Ν
1Οffη QxfηQxfη1N
Ryfη1N
RR
ts1-micro
stmicrost
1-microst
microts
1-microst
microst
1-microsσ
microσs
1-microsσ
microsσ
++minus+minus=minus
minus=minus
2
[ ] ( ) 1-micros
micromicros-
micross
1-micros
micros σSddf
N
ηwwwwξξξξwwwwwwww minus+= rarrrarrrarrrarr recursionsrecursionsrecursionsrecursions
mathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamicsmathematical analysis of the learning dynamics
( ) ( ) 1221 -micross
micros
micromicros
micromicros Q2xd +minus=minus= minus ξξξξwwwwξξξξ
micromicromicro1-micros
micros ξByx sdot=sdot= ττξξξξwwwwprojections
distances
random vector ξmicro enters only in the form of
( )11 +minusisinsdot=sdot= σtsmicrot
micros
microstσ
micros
microsσ QBR wwwwwwwwwwww
projections in the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities1 description in terms of a few characteristic quantitities
( here ℝ2N rarr ℝ7 )
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x ll === sumsum
==
Bww jξ
completely specified in terms of first and second moments (wo indices micro)
in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin
random vector acc to σ)|P( micro rarrξξξξmicromicro
micro1-micros
micros
By
wx
ξξξξ
ξξξξ
sdot=
sdot=
ττ
correlated Gaussianrandom quantities
stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =
ρττρτρ δ yy- yyσσσ
===
=
else
σ ifsσσ
y0
Sl
l δτ
2 average over the current example2 average over the current example2 average over the current example2 average over the current example
rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ
1σσ LL sum
plusmn=
=
The Dynamics of Learning Vector Quantization RUG 10012005
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)
microsσ
microst
R Q
learning dynamics is completely described in terms of averagesaveragesaveragesaverages
3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties
4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time
N
micro α =
of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst
recursions rarr coupled ordinary differential equations
rarr evolution of projections

5. learning curve
generalization error $\epsilon_g(\alpha)$ after training with $\alpha N$ examples:
probability for misclassification of a novel example
$\epsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$
$\epsilon_g = p_+ \, \Phi\!\left( \frac{Q_{++} - Q_{--} - 2\ell (R_{++} - R_{-+})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right) + p_- \, \Phi\!\left( \frac{Q_{--} - Q_{++} - 2\ell (R_{--} - R_{+-})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$
with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$
- ...
e.g. maximize the decrease of $\epsilon_g$ per learning time, $-\,d\epsilon_g / d\alpha$
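The generalization error can be evaluated directly from the seven characteristic quantities. The sketch below is an illustration under stated assumptions rather than code from the slides: an example from cluster $\sigma$ is misclassified when $d_\sigma > d_{-\sigma}$, and since $x_+ - x_-$ is Gaussian with mean $\ell (R_{+\sigma} - R_{-\sigma})$ and variance $Q_{++} - 2Q_{+-} + Q_{--}$, each class contributes one value of the standard normal distribution function. The dictionary layout of `R` and `Q` and the function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from the order parameters.
    R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t, with s, t, sigma in {+1, -1}.
    Requires w_+ != w_- (otherwise the normalisation below vanishes)."""
    p_minus = 1.0 - p_plus
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    arg_plus = (Q[(1, 1)] - Q[(-1, -1)]
                - 2.0 * ell * (R[(1, 1)] - R[(-1, 1)])) / norm
    arg_minus = (Q[(-1, -1)] - Q[(1, 1)]
                 - 2.0 * ell * (R[(-1, -1)] - R[(1, -1)])) / norm
    return p_plus * phi(arg_plus) + p_minus * phi(arg_minus)
```

For prototypes placed exactly at the cluster centers with $\ell = 1$ and $p_+ = 1/2$ (so $R_{++} = R_{--} = 1$, $Q_{++} = Q_{--} = 1$, all other quantities 0), the expression reduces to $\Phi(-1/\sqrt{2}) \approx 0.24$.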

optimal classification with minimal generalization error
in the model situation (equal variances of clusters):
separation of classes by the plane with $p_+ P(\xi \mid \sigma = +1) = p_- P(\xi \mid \sigma = -1)$
[figure: sketch of the two clusters around $B_+$ and $B_-$ with the separating plane; labels: $(p_+)$, $(p_- > p_+)$, "excess error"]
[figure: minimal $\epsilon_g$ as a function of the prior weight $p_+$ for $\ell = 0, 1, 2$; axes: $p_+$ from 0 to 1, $\epsilon_g$ from 0 to 0.5]
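The minimal generalization error of the model can be written in closed form. The following sketch is a worked illustration, not taken from the slides: projecting onto the direction $(B_+ - B_-)/\sqrt{2}$ reduces the problem to two unit-variance Gaussians centered at $\pm \ell/\sqrt{2}$, and the plane $p_+ P(\xi \mid +1) = p_- P(\xi \mid -1)$ becomes a threshold on that projection. The function name `optimal_error` is hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, ell):
    """Minimal eps_g for two unit-variance Gaussian clusters centred at
    ell * B_+ and ell * B_- (orthonormal B's) with priors p_plus and 1 - p_plus."""
    p_minus = 1.0 - p_plus
    if p_plus <= 0.0 or p_minus <= 0.0:
        return 0.0                      # only one class is present
    if ell == 0.0:
        return min(p_plus, p_minus)     # clusters coincide: guess the frequent class
    a = ell / math.sqrt(2.0)            # half the distance between the centres
    theta = math.log(p_minus / p_plus) / (2.0 * a)  # threshold on the projection
    # class +1 is missed below the threshold, class -1 above it
    return p_plus * phi(theta - a) + p_minus * phi(-theta - a)
```

For $p_+ = 0.2$ and $\ell = 1.2$ this gives roughly 0.136, and for $\ell = 0$ it reduces to $\min\{p_+, p_-\}$.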

"LVQ 2.1": update the correct and wrong winner
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, S \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
(analytical) integration for $w_s(0) = 0$, with $p_\sigma = (1 + m\sigma)/2$ ($m > 0$):
$R_{++} = \ell \, \frac{1+m}{2m} \, (1 - e^{-\eta m \alpha})$, $R_{+-} = -\ell \, \frac{1-m}{2m} \, (1 - e^{-\eta m \alpha})$
$R_{-+} = \ell \, \frac{1+m}{2m} \, (1 - e^{+\eta m \alpha})$, $R_{--} = -\ell \, \frac{1-m}{2m} \, (1 - e^{+\eta m \alpha})$
$Q_{++} = \ldots$
$R_{-+}, R_{--}, Q_{+-}, Q_{--}$ diverge for $\alpha \to \infty$, while $R_{++}, R_{+-}, Q_{++}$ remain finite
[figure: $R_{s\sigma}$ and $Q_{st}$ vs. $\alpha$ (0 to 10), vertical axis from -6 to 6; theory and simulation (N=100), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs]
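The divergence under LVQ 2.1 can be made explicit for the overlaps $R_{s\tau}$. The sketch below is a reconstruction under assumptions, not code from the slides: averaging the LVQ 2.1 update over the model gives $dR_{s\tau}/d\alpha = \eta \, s \, (\ell \tau p_\tau - m R_{s\tau})$ with $m = p_+ - p_-$, which for $R_{s\tau}(0) = 0$ integrates to $R_{s\tau}(\alpha) = \frac{\ell \tau p_\tau}{m} (1 - e^{-s m \eta \alpha})$. The function names are hypothetical, and the Euler integration serves only as an independent check of the closed form.

```python
import math

def r_closed_form(s, tau, alpha, eta, ell, p_plus):
    """R_{s,tau}(alpha) for LVQ 2.1 with w_s(0) = 0.
    Assumes m = p_+ - p_- != 0; s and tau take the values +1 or -1."""
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == +1 else 1.0 - p_plus
    return ell * tau * p_tau / m * (1.0 - math.exp(-s * m * eta * alpha))

def r_euler(s, tau, alpha, eta, ell, p_plus, steps=20000):
    """Same quantity from explicit Euler integration of
    dR/dalpha = eta * s * (ell * tau * p_tau - m * R), starting at R = 0."""
    m = 2.0 * p_plus - 1.0
    p_tau = p_plus if tau == +1 else 1.0 - p_plus
    r = 0.0
    h = alpha / steps
    for _ in range(steps):
        r += h * eta * s * (ell * tau * p_tau - m * r)
    return r
```

For $p_+ > p_-$ the overlaps of $w_+$ saturate (e.g. $R_{++} \to \ell p_+ / m$), whereas those of $w_-$ grow exponentially in magnitude, in line with the repulsion of the prototype that is wrong more often.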

problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\epsilon_g = \min\{p_+, p_-\}$
[figure: sketch of the two clusters; labels: $(p_-)$, $(p_+ > p_-)$]
strategies
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  slow learning, poor generalization

"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner $w_s$ is updated, according to the class membership
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_s^\mu) \, S \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$, with $S \sigma^\mu = \pm 1$
numerical integration for $w_s(0) = 0$
theory and simulation (N=200), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 indep. runs
[figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S+}$, $R_{S-}$ vs. $\alpha$]
[figure: trajectories of $w_+$, $w_-$ in the $(B_+, B_-)$-plane, with the centers $\ell B_+$, $\ell B_-$; (•) $\alpha = 20, 40, \ldots, 140$; legend: optimal decision boundary; ____ asymptotic position]
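The LVQ 1 prescription can be sketched as a single update step plus a small training loop on the two-cluster model. This is an illustrative sketch, not the simulation code behind the slides: it assumes $B_+ = e_1$ and $B_- = e_2$, initializes the prototypes with tiny random values instead of exactly 0 (so that the first winner is well defined), and the names `lvq1_step` and `train_lvq1` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def lvq1_step(w, xi, sigma, eta):
    """One LVQ 1 step. w maps the prototype label s in {+1, -1} to a list of floats.
    Only the winner moves: towards xi if s == sigma, away from xi otherwise.
    Returns the label of the winner."""
    s = +1 if sq_dist(xi, w[+1]) < sq_dist(xi, w[-1]) else -1  # ties go to -1
    rate = eta / len(xi) * s * sigma
    w[s] = [wj + rate * (xj - wj) for wj, xj in zip(w[s], xi)]
    return s

def train_lvq1(n, ell, p_plus, eta, alpha, seed=0):
    """Train on alpha * n examples drawn from the two-cluster model.
    Assumption: B_+ = e_1 and B_- = e_2, so R_{s+} = w[s][0] and R_{s-} = w[s][1]."""
    rng = random.Random(seed)
    w = {s: [1e-3 * rng.gauss(0.0, 1.0) for _ in range(n)] for s in (+1, -1)}
    for _ in range(int(alpha * n)):
        sigma = +1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xi[0 if sigma == +1 else 1] += ell
        lvq1_step(w, xi, sigma, eta)
    return w
```

After training, the first two components of each prototype give its overlaps with $B_+$ and $B_-$, so the order parameters $R_{s\sigma}$ can be read off directly and compared with the integrated differential equations.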

learning curve
[figure: $\epsilon_g$ vs. $\alpha$ for $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$)]
- stationary state
  $\epsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate
[figure: $\epsilon_g$ (0.14 to 0.26) vs. $\alpha$ (0 to 300) for $\eta = 2.0, 0.4, 0.2$ and $\eta \to 0$]
- variable rate $\eta(\alpha)$
- well-defined asymptotics (ODE linear in $\eta$):
  $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$
[figure: $\epsilon_g$ (0.14 to 0.26) vs. $(\eta\alpha)$ (0 to 50) for $\eta \to 0$, compared with min $\epsilon_g$: suboptimal]

"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_s^\mu) \, \delta_{S, \sigma^\mu} \, (\xi^\mu - w_s^{\mu-1})$
($\Theta$: winner, $\delta$: correct)
LVQ+ ≈ VQ within the classes ($w_S$ updated only from class $S$)
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$
[figure: asymptotic positions of $w_+$, $w_-$ relative to $\ell B_+$, $\ell B_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]
classification scheme and the achieved generalization error are
independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
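A single LVQ+ step differs from LVQ 1 only in that a wrong winner is left untouched. The sketch below illustrates this; it is not taken from the slides, and the name `lvq_plus_step` as well as the dictionary layout of the prototypes are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def lvq_plus_step(w, xi, sigma, eta):
    """One LVQ+ step: the winner moves towards xi only if its label equals sigma.
    w maps the prototype label s in {+1, -1} to a list of floats.
    Returns True if an update was made."""
    s = +1 if sq_dist(xi, w[+1]) < sq_dist(xi, w[-1]) else -1  # ties go to -1
    if s != sigma:
        return False  # wrong winner: no repulsion, nothing happens
    rate = eta / len(xi)
    w[s] = [wj + rate * (xj - wj) for wj, xj in zip(w[s], xi)]
    return True
```

Because each prototype is only ever attracted by examples of its own class, the rule acts like vector quantization within the classes.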

[figure: $\epsilon_g$ vs. $p_+$, compared with the optimal classification; asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$]
- LVQ 2.1: trivial assignment to the more frequent class, $\epsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
learning curves
[figure: $\epsilon_g$ vs. $\alpha$ for LVQ+ and LVQ1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]

Vector Quantization
competitive learning: $w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_s^\mu) \, (\xi^\mu - w_s^{\mu-1})$ ($w_s$: winner)
class membership is unknown or identical for all data
numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[figure: $\epsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1]
[figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]
system is invariant under exchange of the prototypes
→ weakly repulsive fixed points
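The exchange symmetry of unsupervised VQ can be seen on a single step: updating the swapped pair of prototypes gives the swapped result. The sketch below is illustrative only; `vq_step` and `swap` are hypothetical names, and the symmetry holds as long as the two distances are not exactly equal (ties are broken in favor of the prototype stored under -1).

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def vq_step(w, xi, eta):
    """One unsupervised VQ step: the closest prototype moves towards xi.
    w is a dict {+1: [...], -1: [...]}; a new dict is returned, labels play no role."""
    s = +1 if sq_dist(xi, w[+1]) < sq_dist(xi, w[-1]) else -1
    rate = eta / len(xi)
    new_w = {k: list(v) for k, v in w.items()}
    new_w[s] = [wj + rate * (xj - wj) for wj, xj in zip(w[s], xi)]
    return new_w

def swap(w):
    """Exchange the two prototypes."""
    return {+1: w[-1], -1: w[+1]}
```

Since the update never looks at a class label, which prototype ends up representing which cluster is decided only by the initialization and the random sequence of examples.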

interpretations
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[figure: $\epsilon_g$ vs. $p_+$; asymptotics ($\eta \to 0$, $\eta\alpha \to \infty$)]
for $p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\epsilon_g$

work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
  variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes
Summary
• prototype-based learning:
  Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes,
  dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \, (\xi_i - w_{s,i})^2$
  relevance factors $\lambda_i$ adapted in training
• applications
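The relevance-weighted distance mentioned for Generalized Relevance LVQ can be sketched in a few lines. This only illustrates the distance measure $\sum_i \lambda_i (\xi_i - w_i)^2$; it is not the GRLVQ training procedure, and the function name `relevance_distance` is hypothetical.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared Euclidean distance sum_i lam_i * (xi_i - w_i)**2.
    lam holds one non-negative relevance factor per input dimension."""
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (x - v) ** 2 for l, x, v in zip(lam, xi, w))
```

Setting all factors to 1 recovers the plain squared Euclidean distance, while a factor of 0 removes the corresponding dimension from the comparison.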

illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon, Spain
[figure: healthy cells, damaged cells; prototypes obtained by LVQ (1)]

LVQ algorithms
- are often based on purely heuristic arguments,
  or derived from a cost function with unclear
  relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
  inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
  dynamics, convergence properties,
  performance w.r.t. generalization, etc.

In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications

model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to $P(\xi) = \sum_{\sigma = \pm 1} p_\sigma P(\xi \mid \sigma)$
mixture of two Gaussians: $P(\xi \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\!\left[ -\frac{1}{2} (\xi - \ell B_\sigma)^2 \right]$
orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\sigma)^2 = 1$, $B_+ \cdot B_- = 0$
prior weights of classes $p_+$, $p_-$: $p_+ + p_- = 1$
separation $\ell$
[figure: the two clusters; labels: $B_+$ $(p_+)$, $B_-$ $(p_-)$, $\ell$]
independent components: $\langle \xi_j \rangle_\sigma = \ell (B_\sigma)_j$, $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$
→ $\langle \xi^2 \rangle_\sigma = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_\sigma = N + \ell^2$
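Drawing examples from the model is straightforward once the center vectors are fixed. The sketch below is a minimal illustration, assuming $B_+ = e_1$ and $B_- = e_2$ (any orthonormal pair would do); the names `sample_example` and `mean_squared_norm` are hypothetical. It also estimates $\langle \xi^2 \rangle$, which should come out close to $N + \ell^2$ for either class.

```python
import random

def sample_example(n, ell, p_plus, rng):
    """Draw (xi, sigma) from the two-cluster mixture.
    Assumption: B_+ = e_1 and B_- = e_2, so the cluster centre ell * B_sigma
    shifts a single component of an otherwise standard normal vector."""
    sigma = +1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xi[0 if sigma == +1 else 1] += ell
    return xi, sigma

def mean_squared_norm(n, ell, p_plus, samples, seed=0):
    """Estimate <xi^2> and the fraction of class +1 examples."""
    rng = random.Random(seed)
    total = 0.0
    n_plus = 0
    for _ in range(samples):
        xi, sigma = sample_example(n, ell, p_plus, rng)
        total += sum(x * x for x in xi)
        if sigma == +1:
            n_plus += 1
    return total / samples, n_plus / samples
```

For large $N$ the squared norm is dominated by the $N$ unit-variance components, which is why the cluster structure is invisible in most directions and shows up only in projections onto $B_+$ and $B_-$.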

high-dimensional data (formally: $N \to \infty$)
400 examples $\xi^\mu \in \mathbb{R}^N$, N=200, $\ell = 1$, $p_+ = 0.6$
[figure: projections into the plane of center vectors $B_+$, $B_-$: $y_+ = B_+ \cdot \xi^\mu$ vs. $y_- = B_- \cdot \xi^\mu$; 240 and 160 examples in the two classes]
[figure: projections in two independent random directions $w_1$, $w_2$: $x_1 = w_1 \cdot \xi^\mu$ vs. $x_2 = w_2 \cdot \xi^\mu$; (240), (160)]
Note: model for studying typical behavior of LVQ algorithms, not density-estimation based classification

dynamics of on-line training
sequence of independent random data $\xi^\mu$, $\mu = 1, 2, 3, \ldots$, acc. to $P(\xi^\mu)$
update of prototype vectors:
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, S, \sigma^\mu] \, (\xi^\mu - w_s^{\mu-1})$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data
with $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2$ and $S, \sigma^\mu = \pm 1$
above examples:
unsupervised Vector Quantization: $f_s = \Theta(d_{-s}^\mu - d_s^\mu)$
  "The Winner Takes It All" (classes irrelevant/unknown)
Learning Vector Quantization "2.1": $f_s = S \cdot \sigma^\mu = +1$ (correct class) or $-1$ (wrong class)
  here: two prototypes, no explicit competition
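The generic update with a pluggable modulation function can be sketched as follows. This is an illustration rather than the authors' code: `online_step`, `f_vq` and `f_lvq21` are hypothetical names, and the modulation function receives the prototype label, both squared distances and the class label of the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def online_step(w, xi, sigma, eta, f):
    """w_s <- w_s + (eta / N) * f(s, d, sigma) * (xi - w_s) for s = +1 and s = -1.
    d maps each s to the squared distance (xi - w_s)^2 before the update.
    Returns the updated prototypes as a new dict."""
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (+1, -1)}
    return {s: [wj + eta / n * f(s, d, sigma) * (xj - wj)
                for wj, xj in zip(w[s], xi)]
            for s in (+1, -1)}

def f_vq(s, d, sigma):
    """Unsupervised VQ: 1 for the winner, 0 otherwise; the label sigma is ignored."""
    return 1.0 if d[s] < d[-s] else 0.0

def f_lvq21(s, d, sigma):
    """LVQ 2.1: +1 for the prototype with the correct label, -1 for the other one."""
    return float(s * sigma)
```

Other prescriptions fit the same interface by combining the winner condition with the label, which is what makes a common analysis of the different algorithms possible.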

mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
$R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections in the $(B_+, B_-)$-plane
$Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes
($s, t, \sigma \in \{-1, +1\}$)
$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_s^\mu, d_{-s}^\mu, S, \sigma^\mu] \, (\xi^\mu - w_s^{\mu-1})$ → recursions:
$R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1} = \frac{\eta}{N} \, f_s \, (y_\sigma^\mu - R_{s\sigma}^{\mu-1})$
$Q_{st}^\mu - Q_{st}^{\mu-1} = \frac{\eta}{N} \left[ f_s \, (x_t^\mu - Q_{st}^{\mu-1}) + f_t \, (x_s^\mu - Q_{st}^{\mu-1}) \right] + \frac{\eta^2}{N} f_s f_t + O(1/N^2)$
random vector $\xi^\mu$ enters only in the form of
projections: $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$, $y_\tau^\mu = B_\tau \cdot \xi^\mu$
distances: $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2 = (\xi^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$
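That the example enters the recursions only through projections can be verified on a single step. The sketch below is an illustration with hypothetical names: it updates two prototype vectors directly and, separately, propagates $R$ and $Q$ using only $x_s$, $x_t$, $y$ and $\xi^2$. The last term of the $Q$ recursion is kept exactly here, as $(\eta/N)^2 f_s f_t (\xi^2 - x_s - x_t + Q_{st})$; for large $N$, where $\xi^2 \approx N$, it reduces to $\eta^2 f_s f_t / N$.

```python
def dot(a, b):
    """Scalar product of two equally long sequences."""
    return sum(u * v for u, v in zip(a, b))

def one_step(w_s, w_t, b, xi, eta, f_s, f_t):
    """Compare a direct vector update with the recursions for R and Q.
    b is one centre vector B_sigma; f_s and f_t are the modulation values
    of the two prototypes for the current example.
    Returns ((R, Q) from the updated vectors, (R, Q) from the recursions)."""
    n = len(xi)
    x_s, x_t, y = dot(w_s, xi), dot(w_t, xi), dot(b, xi)
    r_old, q_old = dot(w_s, b), dot(w_s, w_t)

    # direct update of both prototype vectors
    new_s = [w + eta / n * f_s * (x - w) for w, x in zip(w_s, xi)]
    new_t = [w + eta / n * f_t * (x - w) for w, x in zip(w_t, xi)]
    direct = (dot(new_s, b), dot(new_s, new_t))

    # recursions: only projections and the squared norm of xi are needed
    r_rec = r_old + eta / n * f_s * (y - r_old)
    q_rec = (q_old
             + eta / n * (f_s * (x_t - q_old) + f_t * (x_s - q_old))
             + (eta / n) ** 2 * f_s * f_t * (dot(xi, xi) - x_s - x_t + q_old))
    return direct, (r_rec, q_rec)
```

Both routes agree up to rounding, which is the reason the $2N$-dimensional dynamics can be followed in terms of seven numbers.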
The Dynamics of Learning Vector Quantization RUG 10012005
sσ
N
1jjσjsσ
N
1jjsσs R x ll === sumsum
==
Bww jξ
completely specified in terms of first and second moments (wo indices micro)
in the thermodynamic limit thermodynamic limit thermodynamic limit thermodynamic limit N rarrrarrrarrrarr infininfininfininfin
random vector acc to σ)|P( micro rarrξξξξmicromicro
micro1-micros
micros
By
wx
ξξξξ
ξξξξ
sdot=
sdot=
ττ
correlated Gaussianrandom quantities
stσtσsσt s Q xx- xx = τττ sσσsσ s R yx- yx =
ρττρτρ δ yy- yyσσσ
===
=
else
σ ifsσσ
y0
Sl
l δτ
2 average over the current example2 average over the current example2 average over the current example2 average over the current example
rarrrarrrarrrarr averaged recursionsaveraged recursionsaveraged recursionsaveraged recursions closed in Rsσ Qst pσ
1σσ LL sum
plusmn=
=
The Dynamics of Learning Vector Quantization RUG 10012005
characteristic quantities
- depend on the random sequence of example data
- their variance vanishes with N rarrrarrrarrrarr infininfininfininfin (here prop N-1)
microsσ
microst
R Q
learning dynamics is completely described in terms of averagesaveragesaveragesaverages
3 self3 self3 self3 self----averaging propertiesaveraging propertiesaveraging propertiesaveraging properties
4 continuous learning time4 continuous learning time4 continuous learning time4 continuous learning time
N
micro α =
of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst
recursions rarr coupled ordinary differential equations
rarr evolution of projections
The Dynamics of Learning Vector Quantization RUG 10012005
probability for misclassification of a novel example
( ) ( ) minusminusΘ++minusΘ= +minusminusminus++ ddpddp gεεεε
( ) ( )
Φminus+
Φ=
minusminus+
minus+minus
++
minus+minus
minusminusminus
++minus
minusminus
minusminus+
minus+minus
++
+minusminus
++minus
minusminusminus
+++
QQQ
RR2QQ
QQQ
RR2QQpp
22 2
1
2
1 ll
L
5 learning curve5 learning curve5 learning curve5 learning curve
generalization error generalization error generalization error generalization error εεεεgggg((((αααα)))) after training with α N examples
- repulsiveattractive fixed points of the dynamics
- asymptotic behavior for αrarrinfin- dependence on learning rate separation initialization
-
investigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithmsinvestigation and comparison of given algorithms
- time-dependent learning rate η(α)
- variational optimization wrt fs[]
-
optimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptionsoptimization and development of new prescriptions
maximizeα
g
d
d εεεε
The Dynamics of Learning Vector Quantization RUG 10012005
optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error
BBBB-
BBBB+
(p-gtp+ )
(p+)
separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)
excess error
minimal εg as a function
of prior weights ℓ=2
εg
025
050
005 100 p+
ℓ=1
ℓ=0
ℓ
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
( ) 1-micros
micro1-micros
micros Sσ
N
ηwwwwξξξξwwwwwwww minus+=
(analytical)integrationfor wwwws(0) = 0
( ) ( )
( ) ( ) KKll
Kll
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
++minusminusminus
++minus
minus+minus
++
minus+
=+minusminus
minus=
=minusminus
minus=minus+
=
pσ = (1+m σ ) 2 (mgt0)
[Seo Obermeyer] LVQ21 ս cost function
(likelihood ratios)
infinrarrinfinrarrminus+minusminus+minusminusminus
++minus+++
αQQRR
Q R R
with
finite remain
Q ++ R ++ R minus+
R +minus Q minus+
Q minusminus R minusminus
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs
The Dynamics of Learning Vector Quantization RUG 10012005
(p- )
(p+gt p-)
sssstrategiestrategiestrategiestrategies
- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]
density-estimation based cost function
limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only
if the example is currently misclassified
slow learning poor generalization
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification fuumlr αrarrinfin
εg = max p+p-

"The winner takes it all"

I) LVQ 1 [Kohonen]: only the winner $\mathbf{w}_s$ is updated, according to the class membership

\[ \mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{-s}^{\mu}-d_{s}^{\mu}\right)\,S\sigma^{\mu}\,\left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right), \qquad S\sigma^{\mu}=\pm1 \]

numerical integration for $\mathbf{w}_s(0)=0$

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s+}$, $R_{s-}$ versus $\alpha$; theory and simulation ($N=200$), $p_+=0.2$, $\ell=1.2$, $\eta=1.2$, averaged over 100 indep. runs]

[Figure: trajectories of $\mathbf{w}_+$, $\mathbf{w}_-$ in the $(\mathbf{B}_+,\mathbf{B}_-)$-plane, with cluster centers $\ell\mathbf{B}_+$, $\ell\mathbf{B}_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; ____ asymptotic position]
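A small Monte Carlo run illustrates the LVQ 1 rule. This is a sketch under stated assumptions: the center vectors are taken as the first two unit vectors, the prototypes start from tiny random values rather than exactly zero (so that the first winner is well defined), and the parameters $N=50$, $\ell=2$, $p_+=0.5$, $\eta=0.5$ are illustrative and not those of the slides.

```python
import random

def make_example(N, ell, p_plus, rng):
    """Draw (xi, sigma) from the two-cluster model with B_+ = e_0, B_- = e_1 (assumed)."""
    sigma = 1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    xi[0 if sigma == 1 else 1] += ell
    return xi, sigma

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def winner(w, xi):
    """Label S of the prototype closest to xi."""
    return 1 if sq_dist(xi, w[1]) < sq_dist(xi, w[-1]) else -1

def train_lvq1(N, ell, p_plus, eta, alpha_max, seed=0):
    rng = random.Random(seed)
    # small random start: with w_S = 0 exactly, both distances would tie
    w = {S: [rng.gauss(0.0, 1e-3) for _ in range(N)] for S in (1, -1)}
    for _ in range(int(alpha_max * N)):
        xi, sigma = make_example(N, ell, p_plus, rng)
        S = winner(w, xi)
        g = (eta / N) * S * sigma   # attract if the winner's label is correct, repel otherwise
        w[S] = [wj + g * (xj - wj) for wj, xj in zip(w[S], xi)]
    return w

def empirical_error(w, N, ell, p_plus, n_test, seed=1):
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_test):
        xi, sigma = make_example(N, ell, p_plus, rng)
        wrong += (winner(w, xi) != sigma)
    return wrong / n_test

if __name__ == "__main__":
    w = train_lvq1(N=50, ell=2.0, p_plus=0.5, eta=0.5, alpha_max=30)
    print("R_++ =", w[1][0], " R_+- =", w[1][1])
    print("empirical eps_g =", empirical_error(w, 50, 2.0, 0.5, 2000))
```

With this choice of center vectors, the projections $R_{s\sigma}$ are simply the first two components of each prototype.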

learning curve

[Figure: $\varepsilon_g$ versus $\alpha$ for $\eta=1.2$ ($p_+=0.2$, $\ell=1.2$)]

- stationary state: $\varepsilon_g(\alpha\to\infty)$ grows linearly with $\eta$
- role of the learning rate

[Figure: $\varepsilon_g$ versus $\alpha$ (0 to 300, $\varepsilon_g$ from 0.14 to 0.26) for $\eta = 2.0, 0.4, 0.2$]

- variable rate $\eta(\alpha)$
- $\eta\to0$: well-defined asymptotics (ODE linear in $\eta$)

[Figure: $\varepsilon_g$ versus $(\eta\alpha)$ (0 to 50, $\varepsilon_g$ from 0.14 to 0.26) for $\eta\to0$; min $\varepsilon_g$ for $\eta\to0$, $\alpha\to\infty$, $(\eta\alpha)\to\infty$; suboptimal]

"The winner takes it all"

II) LVQ+ (only positive steps, without repulsion)

\[ \mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{-s}^{\mu}-d_{s}^{\mu}\right)\,\delta_{S\sigma^{\mu}}\,\left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right) \]

($\Theta$: winner, $\delta_{S\sigma^{\mu}}$: correct)

LVQ+ ≈ VQ within the classes ($\mathbf{w}_s$ updated only from class $S$)

$\alpha\to\infty$: asymptotic configuration symmetric about $\ell\,(\mathbf{B}_+ + \mathbf{B}_-)/2$

[Figure: asymptotic positions of $\mathbf{w}_+$, $\mathbf{w}_-$ relative to $\ell\mathbf{B}_+$, $\ell\mathbf{B}_-$; $p_+=0.2$, $\ell=1.2$, $\eta=1.2$]

classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)

- LVQ 2.1: trivial assignment to the more frequent class
- LVQ 1: here: close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification

[Figure: $\varepsilon_g$ versus $p_+$ for the three algorithms, together with the optimal classification and $\min\{p_+,p_-\}$; asymptotics: $\eta\to0$, $(\eta\alpha)\to\infty$]

[Figure: learning curves $\varepsilon_g$ versus $\alpha$ for LVQ+ and LVQ1; $p_+=0.2$, $\ell=1.0$, $\eta=1.0$]

Vector Quantization

competitive learning ($\mathbf{w}_s$: winner)

\[ \mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{-s}^{\mu}-d_{s}^{\mu}\right)\left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right) \]

class membership is unknown or identical for all data

numerical integration for $\mathbf{w}_s(0)\approx0$ ($p_+=0.2$, $\ell=1.0$, $\eta=1.2$)

[Figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ1]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ (0 to 300)]

system is invariant under exchange of the prototypes → weakly repulsive fixed points

interpretations:
- VQ, unsupervised learning: unlabelled data
- LVQ, two prototypes of the same class: identical labels
- LVQ, different classes, but labels are not used in training

[Figure: $\varepsilon_g$ versus $p_+$; asymptotics ($\eta\to0$, $\eta\alpha\to\infty$)]

$p_+\approx0$, $p_-\approx1$:
- low quantization error
- high gen. error $\varepsilon_g$

work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
  variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Summary
• prototype-based learning
  Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes
  dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d^{\lambda}(\mathbf{w}_s,\boldsymbol{\xi}) = \sum_{i=1}^{N}\lambda_i\,(w_{s,i}-\xi_i)^2$, with $\lambda$ adapted in training
• applications
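The relevance-weighted distance above reduces to the squared Euclidean distance when all factors $\lambda_i$ equal 1. A minimal sketch, in which the function name and the plain-list representation of vectors are illustrative assumptions:

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (w_i - xi_i)**2."""
    if not (len(w) == len(xi) == len(lam)):
        raise ValueError("w, xi and lam must have the same length")
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))

if __name__ == "__main__":
    w, xi = [1.0, 2.0], [0.0, 0.0]
    print(relevance_distance(w, xi, [1.0, 1.0]))  # plain squared Euclidean distance
    print(relevance_distance(w, xi, [1.0, 0.0]))  # second feature switched off
```

Setting a factor to zero removes the corresponding feature from the comparison, which is what makes the metric adaptive.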

LVQ algorithms
- are often based on purely heuristic arguments,
  or derived from a cost function with unclear
  relation to the generalization ability
- almost exclusively use the Euclidean distance measure,
  inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of
  dynamics, convergence properties,
  performance w.r.t. generalization, etc.

In the following:
analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications
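
model situation: two clusters of N-dimensional data

random vectors $\boldsymbol{\xi}\in\mathbb{R}^N$ according to a mixture of two Gaussians:

\[ P(\boldsymbol{\xi}) = \sum_{\sigma=\pm1} p_\sigma\,P(\boldsymbol{\xi}|\sigma), \qquad
P(\boldsymbol{\xi}|\sigma) = \frac{1}{(2\pi)^{N/2}}\exp\left[-\frac{1}{2}\left(\boldsymbol{\xi}-\ell\,\mathbf{B}_\sigma\right)^2\right] \]

- orthonormal center vectors: $\mathbf{B}_+,\mathbf{B}_-\in\mathbb{R}^N$, $(\mathbf{B}_\sigma)^2=1$, $\mathbf{B}_+\cdot\mathbf{B}_-=0$
- prior weights of classes: $p_+$, $p_-$ with $p_+ + p_- = 1$
- separation: $\ell$
- independent components: $\langle\xi_j\rangle_\sigma = \ell\,(\mathbf{B}_\sigma)_j$, $\langle\xi_j^2\rangle_\sigma - \langle\xi_j\rangle_\sigma^2 = 1$ → $\langle\boldsymbol{\xi}^2\rangle_\sigma = \sum_{j=1}^{N}\langle\xi_j^2\rangle_\sigma = N+\ell^2$

[Figure: two clusters centered at $\ell\mathbf{B}_+$ (weight $p_+$) and $\ell\mathbf{B}_-$ (weight $p_-$)]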
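The model can be sampled directly. The sketch below draws examples from the two-cluster mixture and checks the stated second moment, $\langle\boldsymbol{\xi}^2\rangle = N+\ell^2$, empirically. Taking $\mathbf{B}_+$ and $\mathbf{B}_-$ as the first two unit vectors is an assumption made only for this illustration; any orthonormal pair would do.

```python
import random

def sample_example(N, ell, p_plus, rng):
    """Draw (xi, sigma): class sigma = +1 with probability p_plus, then xi ~ N(ell * B_sigma, 1)."""
    sigma = 1 if rng.random() < p_plus else -1
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]   # independent unit-variance components
    xi[0 if sigma == 1 else 1] += ell              # shift by ell * B_sigma (B_+ = e_0, B_- = e_1)
    return xi, sigma

def mean_squared_norm(N, ell, p_plus, n_samples, seed=0):
    """Empirical average of xi^2 over n_samples examples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi, _ = sample_example(N, ell, p_plus, rng)
        total += sum(x * x for x in xi)
    return total / n_samples

if __name__ == "__main__":
    N, ell = 50, 1.0
    print(mean_squared_norm(N, ell, p_plus=0.6, n_samples=2000), "vs", N + ell ** 2)
```

The result does not depend on $p_+$, since both clusters have the same squared norm on average.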

high-dimensional data (formally: $N\to\infty$)

400 examples $\boldsymbol{\xi}^{\mu}\in\mathbb{R}^N$, $N=200$, $\ell=1$, $p_+=0.6$

[Figure: projections into the plane of center vectors $\mathbf{B}_+$, $\mathbf{B}_-$: $y_+ = \mathbf{B}_+\cdot\boldsymbol{\xi}^{\mu}$, $y_- = \mathbf{B}_-\cdot\boldsymbol{\xi}^{\mu}$; clusters of (240) and (160) examples]

[Figure: projections in two independent random directions $\mathbf{w}_{1,2}$: $x_1 = \mathbf{w}_1\cdot\boldsymbol{\xi}^{\mu}$, $x_2 = \mathbf{w}_2\cdot\boldsymbol{\xi}^{\mu}$; (240), (160) examples]

Note: model for studying typical behavior of LVQ algorithms,
not: density-estimation based classification

dynamics of on-line training

sequence of independent random data $\boldsymbol{\xi}^{\mu}$, $\mu = 1, 2, 3, \ldots$ acc. to $P(\boldsymbol{\xi}^{\mu})$

update of prototype vectors:

\[ \mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,f_s\!\left[d_s^{\mu}, d_{-s}^{\mu}, S, \sigma^{\mu}\right]\left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right),
\qquad d_s^{\mu} = \left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right)^2, \quad S, \sigma^{\mu} = \pm1 \]

- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data

examples:
- unsupervised Vector Quantization: $f_s[\ldots] = \Theta\!\left(d_{-s}^{\mu}-d_{s}^{\mu}\right)$
  "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s[\ldots] = S\cdot\sigma^{\mu} = +1$ (correct class), $-1$ (wrong class)
  here: two prototypes, no explicit competition
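The generic update with a pluggable modulation function can be sketched as follows. The dictionary layout (prototypes keyed by their label $S=\pm1$), the function names, and the convention that a tie counts as "no winner" are illustrative assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def f_vq(S, d_own, d_other, sigma):
    """Unsupervised VQ: Theta(d_{-S} - d_S); the label sigma is ignored."""
    return 1.0 if d_other > d_own else 0.0

def f_lvq21(S, d_own, d_other, sigma):
    """LVQ 2.1: +1 for the prototype of the correct class, -1 for the wrong one."""
    return float(S * sigma)

def online_step(w, xi, sigma, eta, f):
    """One on-line step; w maps the labels +1 and -1 to prototype vectors (lists)."""
    N = len(xi)
    # distances are evaluated with the prototypes before the update
    d = {S: sq_dist(xi, w[S]) for S in (1, -1)}
    new_w = {}
    for S in (1, -1):
        g = (eta / N) * f(S, d[S], d[-S], sigma)
        new_w[S] = [wj + g * (xj - wj) for wj, xj in zip(w[S], xi)]
    return new_w

if __name__ == "__main__":
    w = {1: [0.0, 0.0], -1: [1.0, 1.0]}
    xi, sigma = [0.2, 0.0], 1
    print(online_step(w, xi, sigma, eta=1.0, f=f_vq))
    print(online_step(w, xi, sigma, eta=1.0, f=f_lvq21))
```

Other prescriptions fit the same interface by swapping the function passed as `f`.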

mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N}\to\mathbb{R}^{7}$)

\[ R_{s\sigma}^{\mu} = \mathbf{w}_s^{\mu}\cdot\mathbf{B}_\sigma, \qquad Q_{st}^{\mu} = \mathbf{w}_s^{\mu}\cdot\mathbf{w}_t^{\mu}, \qquad s, t, \sigma\in\{-1,+1\} \]

- $R_{s\sigma}$: projections in the $(\mathbf{B}_+,\mathbf{B}_-)$-plane
- $Q_{st}$: length and relative position of prototypes

random vector $\boldsymbol{\xi}^{\mu}$ enters only in the form of
- projections: $x_s^{\mu} = \mathbf{w}_s^{\mu-1}\cdot\boldsymbol{\xi}^{\mu}$, $\quad y_\tau^{\mu} = \mathbf{B}_\tau\cdot\boldsymbol{\xi}^{\mu}$
- distances: $d_s^{\mu} = \left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right)^2 = \left(\boldsymbol{\xi}^{\mu}\right)^2 - 2x_s^{\mu} + Q_{ss}^{\mu-1}$

update $\mathbf{w}_s^{\mu} = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\,f_s[\ldots]\left(\boldsymbol{\xi}^{\mu}-\mathbf{w}_s^{\mu-1}\right)$ → recursions:

\[
\begin{aligned}
\frac{R_{s\sigma}^{\mu}-R_{s\sigma}^{\mu-1}}{1/N} &= \eta\,f_s[\ldots]\left(y_\sigma^{\mu}-R_{s\sigma}^{\mu-1}\right)\\
\frac{Q_{st}^{\mu}-Q_{st}^{\mu-1}}{1/N} &= \eta\,f_s[\ldots]\left(x_t^{\mu}-Q_{st}^{\mu-1}\right) + \eta\,f_t[\ldots]\left(x_s^{\mu}-Q_{st}^{\mu-1}\right) + \eta^2 f_s[\ldots]\,f_t[\ldots] + \mathcal{O}\!\left(\frac{1}{N}\right)
\end{aligned}
\]
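The reduction to seven numbers (four $R_{s\sigma}$ and three independent $Q_{st}$, since $Q_{+-}=Q_{-+}$) can be made concrete. The sketch below computes the order parameters for given prototypes and verifies that a distance is recovered from projections alone, $d_s = \boldsymbol{\xi}^2 - 2x_s + Q_{ss}$. The dictionary layout and function names are illustrative assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def order_parameters(w, B):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t for s, t, sigma in {+1, -1}."""
    labels = (1, -1)
    R = {(s, sig): dot(w[s], B[sig]) for s in labels for sig in labels}
    Q = {(s, t): dot(w[s], w[t]) for s in labels for t in labels}
    return R, Q

def distance_from_projections(xi, w_s, Q_ss):
    """Squared distance expressed through xi^2, the projection x_s = w_s . xi, and Q_ss."""
    return dot(xi, xi) - 2.0 * dot(w_s, xi) + Q_ss

if __name__ == "__main__":
    w = {1: [1.0, 2.0, 0.0], -1: [0.0, 1.0, 1.0]}
    B = {1: [1.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0]}
    xi = [0.5, -1.0, 2.0]
    R, Q = order_parameters(w, B)
    direct = sum((x - y) ** 2 for x, y in zip(xi, w[1]))
    print(direct, distance_from_projections(xi, w[1], Q[(1, 1)]))
```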

2. average over the current example

random vector acc. to $P(\boldsymbol{\xi}^{\mu}|\sigma)$ → in the thermodynamic limit $N\to\infty$ the projections

\[ x_s^{\mu} = \mathbf{w}_s^{\mu-1}\cdot\boldsymbol{\xi}^{\mu}, \qquad y_\tau^{\mu} = \mathbf{B}_\tau\cdot\boldsymbol{\xi}^{\mu} \]

are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):

\[
\begin{aligned}
\langle x_s\rangle_\sigma &= \sum_{j=1}^{N}(\mathbf{w}_s)_j\,\langle\xi_j\rangle_\sigma = \ell\sum_{j=1}^{N}(\mathbf{w}_s)_j\,(\mathbf{B}_\sigma)_j = \ell\,R_{s\sigma}, &
\langle y_\tau\rangle_\sigma &= \ell\,\delta_{\tau\sigma} = \begin{cases}\ell & \text{if } \tau=\sigma\\ 0 & \text{else}\end{cases}\\
\langle x_s x_t\rangle_\sigma - \langle x_s\rangle_\sigma\langle x_t\rangle_\sigma &= Q_{st}, &
\langle x_s y_\tau\rangle_\sigma - \langle x_s\rangle_\sigma\langle y_\tau\rangle_\sigma &= R_{s\tau}\\
\langle y_\rho y_\tau\rangle_\sigma - \langle y_\rho\rangle_\sigma\langle y_\tau\rangle_\sigma &= \delta_{\rho\tau}
\end{aligned}
\]

→ averaged recursions, closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle\ldots\rangle = \sum_{\sigma=\pm1} p_\sigma\,\langle\ldots\rangle_\sigma$

3. self-averaging properties

characteristic quantities $R_{s\sigma}^{\mu}$, $Q_{st}^{\mu}$
- depend on the random sequence of example data
- their variance vanishes with $N\to\infty$ (here: $\propto N^{-1}$)

learning dynamics is completely described in terms of averages

4. continuous learning time

$\alpha = \mu/N$: # of examples, # of learning steps per degree of freedom

recursions → coupled ordinary differential equations
→ evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
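As a sketch of this last step: averaging the recursions for $R_{s\sigma}$ and $Q_{st}$ over the current example and taking $N\to\infty$ with $\alpha=\mu/N$ turns the difference quotients into derivatives. Writing $\langle\ldots\rangle=\sum_{\sigma}p_\sigma\langle\ldots\rangle_\sigma$ and using self-averaging to pull $R_{s\sigma}$ and $Q_{st}$ out of the averages, the system takes the generic form below; the explicit averages depend on the chosen $f_s$ and are not spelled out here.

```latex
\begin{aligned}
\frac{dR_{s\sigma}}{d\alpha} &= \eta\left(\langle f_s\,y_\sigma\rangle - \langle f_s\rangle\,R_{s\sigma}\right)\\
\frac{dQ_{st}}{d\alpha} &= \eta\left(\langle f_s\,x_t\rangle - \langle f_s\rangle\,Q_{st}
 + \langle f_t\,x_s\rangle - \langle f_t\rangle\,Q_{st}\right) + \eta^2\,\langle f_s f_t\rangle
\end{aligned}
```

For LVQ 2.1, where $f_s = S\sigma$ involves no Heaviside function, these averages reduce to the first moments $\langle y_\tau\rangle_\sigma$ and $\langle x_s\rangle_\sigma$, which is why that case can be integrated analytically.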

probability for misclassification of a novel example

\[
\begin{aligned}
\varepsilon_g &= p_+\,\langle\Theta(d_+ - d_-)\rangle_+ + p_-\,\langle\Theta(d_- - d_+)\rangle_-\\
&= p_+\,\Phi\!\left[\frac{Q_{++}-Q_{--}-2\ell\,(R_{++}-R_{-+})}{2\sqrt{Q_{++}-2Q_{+-}+Q_{--}}}\right]
 + p_-\,\Phi\!\left[\frac{Q_{--}-Q_{++}-2\ell\,(R_{--}-R_{+-})}{2\sqrt{Q_{++}-2Q_{+-}+Q_{--}}}\right]
\end{aligned}
\]

5. learning curve

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples

investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha\to\infty$
- dependence on learning rate, separation, initialization
- ...

optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$
- ...
maximize $d\varepsilon_g/d\alpha$
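The closed-form expression for $\varepsilon_g$ is straightforward to evaluate once $R_{s\sigma}$ and $Q_{st}$ are known. The sketch below assumes that $\Phi$ denotes the cumulative distribution function of the standard normal distribution, and it stores the order parameters in dictionaries keyed by index pairs; both are choices made for this illustration.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function (assumed meaning of Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from order parameters R[(s, sigma)], Q[(s, t)] with s, t, sigma in {+1, -1}."""
    p = {1: p_plus, -1: 1.0 - p_plus}
    # common denominator; it vanishes if both prototypes coincide
    denom = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (1, -1):
        num = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        eps += p[s] * Phi(num / denom)
    return eps

if __name__ == "__main__":
    ell = 1.0
    # prototypes placed exactly at the cluster centers: w_s = ell * B_s
    R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
    Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell ** 2}
    print(generalization_error(R, Q, ell, p_plus=0.5))
```

With prototypes at the cluster centers the argument of $\Phi$ is $-\ell/\sqrt{2}$ for both classes, so the result does not depend on $p_+$.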

optimal classification with minimal generalization error

separation of classes by the plane with $p_+\,P(\boldsymbol{\xi}|\sigma=+1) = p_-\,P(\boldsymbol{\xi}|\sigma=-1)$
in the model situation (equal variances of clusters)

[Figure: clusters around $\mathbf{B}_-$ $(p_- > p_+)$ and $\mathbf{B}_+$ $(p_+)$ with the separating plane; excess error]

[Figure: minimal $\varepsilon_g$ as a function of the prior weights, plotted versus $p_+$ (0 to 1.0, $\varepsilon_g$ from 0 to 0.50) for $\ell = 0, 1, 2$]
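The minimal error can be evaluated in closed form for this model. The sketch below is a derivation under stated assumptions rather than a formula quoted from the slides: along the unit vector $(\mathbf{B}_+-\mathbf{B}_-)/\sqrt{2}$ the cluster centers sit at $\pm a$ with $a=\ell/\sqrt{2}$, and the condition $p_+P(\boldsymbol{\xi}|+1)=p_-P(\boldsymbol{\xi}|-1)$ fixes the threshold $t=\ln(p_-/p_+)/(2a)$.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(p_plus, ell):
    """Minimal generalization error for two unit-variance clusters at ell*B_+ and ell*B_-."""
    p_minus = 1.0 - p_plus
    if ell == 0.0 or p_plus <= 0.0 or p_minus <= 0.0:
        return min(p_plus, p_minus)        # no separation possible or only one class present
    a = ell / math.sqrt(2.0)               # half the distance between the cluster centers
    t = math.log(p_minus / p_plus) / (2.0 * a)   # position of the separating plane
    # class + is misclassified below t, class - above t
    return p_plus * Phi(t - a) + p_minus * Phi(-(t + a))

if __name__ == "__main__":
    for ell in (0.0, 1.0, 2.0):
        print(ell, [round(optimal_error(p, ell), 4) for p in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

For $\ell=0$ the function returns $\min\{p_+,p_-\}$, the upper envelope in the plot; larger $\ell$ lowers the curve.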
optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error
BBBB-
BBBB+
(p-gtp+ )
(p+)
separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)
excess error
minimal εg as a function
of prior weights ℓ=2
εg
025
050
005 100 p+
ℓ=1
ℓ=0
ℓ
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
( ) 1-micros
micro1-micros
micros Sσ
N
ηwwwwξξξξwwwwwwww minus+=
(analytical)integrationfor wwwws(0) = 0
( ) ( )
( ) ( ) KKll
Kll
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
++minusminusminus
++minus
minus+minus
++
minus+
=+minusminus
minus=
=minusminus
minus=minus+
=
pσ = (1+m σ ) 2 (mgt0)
[Seo Obermeyer] LVQ21 ս cost function
(likelihood ratios)
infinrarrinfinrarrminus+minusminus+minusminusminus
++minus+++
αQQRR
Q R R
with
finite remain
Q ++ R ++ R minus+
R +minus Q minus+
Q minusminus R minusminus
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs
The Dynamics of Learning Vector Quantization RUG 10012005
(p- )
(p+gt p-)
sssstrategiestrategiestrategiestrategies
- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]
density-estimation based cost function
limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only
if the example is currently misclassified
slow learning poor generalization
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification fuumlr αrarrinfin
εg = max p+p-
The Dynamics of Learning Vector Quantization RUG 10012005
ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo
numericalintegrationfor wwwws(0)=0
theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs
Q++
Q--
Q+-
α
wwww++++
wwww----
ℓℓℓℓ BBBB++++
ℓℓℓℓ BBBB----
trajectories in the (B+B- )-plane
(bull) α=2040140
optimal decision boundary____ asymptotic position
RS+
RS-
R--
R-+
R--
R++
winner wwwws plusmn1
I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros
micromicromicroS
microS
1-micros
micros Sσdd
N
ηwwwwξξξξwwwwwwww minusminusΘ+= minus
only the winner is updated according to the class membership
wwww-
The Dynamics of Learning Vector Quantization RUG 10012005
learning curvelearning curvelearning curvelearning curve
α
εg η=12
(p+=02 ℓ=12)
εg (αrarrinfin) grows lin with η
- stationary state
- role of the learning rate
α100 200 300
εg
026
022
018
0140
η
20
04
02
ηrarr0 - variable rate η(α)
- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics
(ODE linear in η)
10
εg
20 30 40 500014
026
022
018
min εg
(η α)
ηrarr0
η rarr0 αrarrinfin
( η α ) rarr infin
suboptimal
The Dynamics of Learning Vector Quantization RUG 10012005
ldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquoldquo The winner takes it all ldquo
II ) LVQ+LVQ+LVQ+LVQ+ ( only positive steps without repulsion)
[ ] ( ) ( ) 1-micros
microS
microσ
microS
microS
1-micros
micros δdd
N
ηwwwwξξξξwwwwwwww minusminusΘ+= minus
winner correct
αrarrinfin asymptotic configuration
symmetric about ℓℓℓℓ (B(B(B(B+++++B+B+B+B----)2)2)2)2
wwww-
wwww+
ℓ ℓ ℓ ℓ BBBB+
ℓ ℓ ℓ ℓ BBBB-
p+=02 ℓ=12 η=12
classification scheme and the
achieved generalization error are
independent of the independent of the independent of the independent of the prior weights prior weights prior weights prior weights ppppplusmnplusmnplusmnplusmn
(and optimal for ppppplusmnplusmnplusmnplusmn = 12 )
LVQ+ asymp VQ within the classes
(ws updated only from class S)
The Dynamics of Learning Vector Quantization RUG 10012005
- LVQ 21
trivial assignment to the
more frequent class
optimal classification
εg
pppp++++
min p+p-
- LVQ 1
here close to optimal
classification
pppp++++
- LVQ+
min-max solution
pplusmn -independent classification
p+=02 ℓ=10 η=10εg
α
learning curveslearning curveslearning curveslearning curves
LVQ+
LVQ1
asymptotics ηrarr0 (ηα)rarrinfin
The Dynamics of Learning Vector Quantization RUG 10012005
Vector QuantizationVector QuantizationVector QuantizationVector Quantization
competitive learning [ ] ( ) 1-micros
micromicroS
microS
1-micros
micros dd
N
ηwwwwξξξξwwwwwwww minusminusΘ+= minus
wwwws winner
class membership is unknown
or identical for all data
numerical integration for wwwws(0)asymp0
( p+=02 ℓ=10 η=12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10
system is invariant under
exchange of the prototypes
rarr weakly repulsive fixed points
The Dynamics of Learning Vector Quantization RUG 10012005
interpretations
- VQ unsupervised learningunlabelled data
- LVQ two prototypes of thesame class identical labels
- LVQ different classes butlabels are not used in training
εg
pppp++++
asymptotics (αrarrηrarr0 ηαrarrinfin)
pppp++++asymp0 asymp0 asymp0 asymp0
pppp----asymp1 asymp1 asymp1 asymp1
- low quantization error
- high gen error εg
The Dynamics of Learning Vector Quantization RUG 10012005
work in progress outlookwork in progress outlookwork in progress outlookwork in progress outlook
bull regularization of LVQ 21 Robust Soft LVQ [Seo Obermayer]
bull model different cluster variances more clustersprototypes
bull optimized procedures learning rate schedules
variational approach density estimation Bayes optimal on-line
bull several classes and prototypes
Summary
bullprototypeprototypeprototypeprototype----based learningbased learningbased learningbased learning
Vector Quantization and Learning Vector Quantization
bulla model scenarioa model scenarioa model scenarioa model scenario two clusters two prototypes
dynamics of online training
bullcomparison of algorithmscomparison of algorithmscomparison of algorithmscomparison of algorithms
LVQ 21 instability trivial (stationary) classification
LVQ 1 close to optimal asymptotic generalization
LVQ + min-max solution wrt asymptotic generalization
VQ symmetry breaking representation

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology-preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_\lambda(\mathbf{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i\, (w_{s,i} - \xi_i)^2$, with the relevances $\lambda_i$ adapted in training
• applications
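The adaptive distance measure named above can be illustrated with a short Python sketch. The function name `relevance_distance` and the example vectors are assumptions made for illustration; how the relevances $\lambda_i$ are trained is not shown here.

```python
import numpy as np


def relevance_distance(w, xi, lam):
    """Return d_lambda(w, xi) = sum_i lam_i * (xi_i - w_i)**2."""
    w = np.asarray(w, dtype=float)
    xi = np.asarray(xi, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return float(np.sum(lam * (xi - w) ** 2))


# Uniform relevances give the plain squared Euclidean distance.
print(relevance_distance([0.0, 0.0], [1.0, 2.0], [1.0, 1.0]))
# A zero relevance switches the corresponding feature off.
print(relevance_distance([0.0, 0.0], [1.0, 2.0], [1.0, 0.0]))
```

With all $\lambda_i = 1$ the measure reduces to the squared Euclidean distance used in the rest of the talk.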
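
model situation: two clusters of N-dimensional data
random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$P(\boldsymbol{\xi}) = \sum_{\sigma = \pm 1} p_\sigma\, P(\boldsymbol{\xi} \mid \sigma), \qquad P(\boldsymbol{\xi} \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[ -\tfrac{1}{2} \left( \boldsymbol{\xi} - \ell\, \mathbf{B}_\sigma \right)^2 \right]$$
orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_\sigma)^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$
prior weights of classes: $p_+$, $p_-$ with $p_+ + p_- = 1$
separation: $\ell$
(figure: two clusters centered at $\ell\,\mathbf{B}_+$ and $\ell\,\mathbf{B}_-$, with weights $p_+$ and $p_-$)
independent components: $\langle \xi_j \rangle_\sigma = \ell\, (\mathbf{B}_\sigma)_j$ and $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1$ → $\langle \boldsymbol{\xi}^2 \rangle_\sigma = \sum_{j=1}^{N} \langle \xi_j^2 \rangle_\sigma = N + \ell^2$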
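A minimal sketch of how data from this model could be generated, assuming NumPy. The function name `sample_mixture` and the choice of the first two unit vectors as $\mathbf{B}_+$, $\mathbf{B}_-$ are illustrative assumptions; any orthonormal pair would serve.

```python
import numpy as np


def sample_mixture(n_examples, N, ell, p_plus, rng):
    """Draw examples xi and labels sigma from the two-cluster model."""
    # Assumed choice: B_+ and B_- are the first two unit vectors (orthonormal).
    B = np.zeros((2, N))
    B[0, 0] = 1.0  # B_+
    B[1, 1] = 1.0  # B_-
    # Class labels sigma = +1 with probability p_plus, otherwise -1.
    sigma = np.where(rng.random(n_examples) < p_plus, 1, -1)
    # Cluster center ell * B_sigma plus unit-variance Gaussian noise.
    centers = np.where(sigma[:, None] == 1, B[0], B[1])
    xi = ell * centers + rng.standard_normal((n_examples, N))
    return xi, sigma, B


rng = np.random.default_rng(0)
xi, sigma, B = sample_mixture(400, 200, 1.0, 0.6, rng)
# Projections into the plane of the center vectors.
y_plus, y_minus = xi @ B[0], xi @ B[1]
print(xi.shape, int(np.sum(sigma == 1)), int(np.sum(sigma == -1)))
```

The class counts fluctuate around $p_\pm$ times the number of examples, and the mean of $\boldsymbol{\xi}^2$ should come out close to $N + \ell^2$.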

high-dimensional data (formally $N \to \infty$)
400 examples $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, $N = 200$, $\ell = 1$, $p_+ = 0.6$
(figure: projections into the plane of center vectors $\mathbf{B}_+$, $\mathbf{B}_-$: $y_+ = \mathbf{B}_+ \cdot \boldsymbol{\xi}^\mu$ versus $y_- = \mathbf{B}_- \cdot \boldsymbol{\xi}^\mu$; class sizes 240 and 160)
(figure: projections in two independent random directions $\mathbf{w}_1$, $\mathbf{w}_2$: $x_1 = \mathbf{w}_1 \cdot \boldsymbol{\xi}^\mu$ versus $x_2 = \mathbf{w}_2 \cdot \boldsymbol{\xi}^\mu$; class sizes 240 and 160)
Note: model for studying the typical behavior of LVQ algorithms, not density-estimation based classification

dynamics of on-line training
sequence of independent random data $\boldsymbol{\xi}^\mu$, $\mu = 1, 2, 3, \ldots$, drawn according to $P(\boldsymbol{\xi}^\mu)$
update of prototype vectors:
$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\big[ d_s^\mu, d_{-s}^\mu, S, \sigma^\mu \big] \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right), \qquad d_s^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)^2, \quad S, \sigma^\mu = \pm 1$$
- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update, etc.
- $(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data
above examples:
- unsupervised Vector Quantization: $f_s[\ldots] = \Theta(d_{-s}^\mu - d_s^\mu)$, "The Winner Takes It All" (classes irrelevant/unknown)
- Learning Vector Quantization "2.1": $f_s[\ldots] = S \cdot \sigma^\mu = +1$ (correct class) or $-1$ (wrong class); here: two prototypes, no explicit competition
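The generic update above can be sketched in a few lines of Python. The names `online_step`, `f_vq`, `f_lvq21` and the row convention (row 0 holds $\mathbf{w}_+$, row 1 holds $\mathbf{w}_-$) are assumptions for illustration, not taken from the slides.

```python
import numpy as np

LABELS = np.array([+1, -1])  # S for prototype w_+ (row 0) and w_- (row 1)


def f_vq(d, sigma):
    """Winner takes all: f_s = Theta(d_{-s} - d_s); the class sigma is ignored."""
    return np.array([d[1] > d[0], d[0] > d[1]], dtype=float)


def f_lvq21(d, sigma):
    """LVQ 2.1: f_s = S * sigma, +1 for the correct and -1 for the wrong prototype."""
    return (LABELS * sigma).astype(float)


def online_step(W, xi, sigma, eta, f):
    """One step w_s <- w_s + (eta / N) * f_s * (xi - w_s) for both prototypes."""
    N = xi.size
    d = np.sum((xi - W) ** 2, axis=1)  # squared distances d_+ and d_-
    return W + (eta / N) * f(d, sigma)[:, None] * (xi - W)


W = np.array([[1.0, 0.0], [0.0, 1.0]])
xi = np.array([1.0, 0.5])
print(online_step(W, xi, +1, 1.0, f_vq))
print(online_step(W, xi, +1, 1.0, f_lvq21))
```

Swapping the modulation function is all that distinguishes the algorithms in this formulation; the step itself is shared.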

mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)
- $R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma$: projections in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane
- $Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu$: length and relative position of prototypes, with $s, t, \sigma \in \{-1, +1\}$
the random vector $\boldsymbol{\xi}^\mu$ enters only in the form of
- projections: $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, $y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu$
- distances: $d_s^\mu = (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})^2 = (\boldsymbol{\xi}^\mu)^2 - 2 x_s^\mu + Q_{ss}^{\mu-1}$
update $\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\big[ d_s^\mu, d_{-s}^\mu, S, \sigma^\mu \big] (\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$ → recursions:
$$\frac{R_{s\sigma}^\mu - R_{s\sigma}^{\mu-1}}{1/N} = \eta\, f_s[\ldots] \left( y_\sigma^\mu - R_{s\sigma}^{\mu-1} \right)$$
$$\frac{Q_{st}^\mu - Q_{st}^{\mu-1}}{1/N} = \eta\, f_s[\ldots] \left( x_t^\mu - Q_{st}^{\mu-1} \right) + \eta\, f_t[\ldots] \left( x_s^\mu - Q_{st}^{\mu-1} \right) + \eta^2 f_s[\ldots]\, f_t[\ldots] + \mathcal{O}(1/N)$$
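A small sketch, assuming NumPy, of how the characteristic quantities can be computed from explicit prototype vectors, and of the identity $d_s = \boldsymbol{\xi}^2 - 2 x_s + Q_{ss}$. The function names and the random test vectors are illustrative assumptions.

```python
import numpy as np


def order_parameters(W, B):
    """R[s, sigma] = w_s . B_sigma and Q[s, t] = w_s . w_t (rows of W and B)."""
    return W @ B.T, W @ W.T


def distances_from_projections(xi, W):
    """d_s = xi^2 - 2 x_s + Q_ss, using only the projections x_s = w_s . xi."""
    x = W @ xi
    Q = W @ W.T
    return xi @ xi - 2.0 * x + np.diag(Q)


rng = np.random.default_rng(0)
N = 20
W = rng.standard_normal((2, N))   # two prototypes
B = np.eye(N)[:2]                 # assumed orthonormal center vectors
xi = rng.standard_normal(N)

R, Q = order_parameters(W, B)
# Four entries of R plus three independent entries of the symmetric Q: 7 numbers.
print(R.shape, Q.shape)
print(np.allclose(distances_from_projections(xi, W), np.sum((xi - W) ** 2, axis=1)))
```

The second print compares the projection-based distances with the direct computation in the full $N$-dimensional space.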
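
2. average over the current example
random vector $\boldsymbol{\xi}^\mu$ according to $P(\boldsymbol{\xi}^\mu \mid \sigma)$ → in the thermodynamic limit $N \to \infty$, the projections $x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$ and $y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu$ are correlated Gaussian random quantities,
completely specified in terms of first and second moments (w/o indices $\mu$):
$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (\mathbf{w}_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (\mathbf{w}_s)_j (\mathbf{B}_\sigma)_j = \ell\, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma} = \begin{cases} \ell & \text{if } \tau = \sigma \\ 0 & \text{else} \end{cases}$$
$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}, \qquad \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}, \qquad \langle y_\tau y_\rho \rangle_\sigma - \langle y_\tau \rangle_\sigma \langle y_\rho \rangle_\sigma = \delta_{\tau\rho}$$
→ averaged recursions, closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \ldots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \ldots \rangle_\sigma$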
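The first and second moments listed above can be collected into a mean vector and a covariance matrix for $(x_+, x_-, y_+, y_-)$. The sketch below assumes NumPy; the name `conditional_moments` and the index convention (index 0 for $+1$, index 1 for $-1$) are assumptions. For Gaussian data the moments can be checked against the linear map $A = (\mathbf{w}_+, \mathbf{w}_-, \mathbf{B}_+, \mathbf{B}_-)$ applied to $\boldsymbol{\xi}$.

```python
import numpy as np


def conditional_moments(R, Q, ell, k):
    """Mean and covariance of (x_+, x_-, y_+, y_-) given class index k (0: +1, 1: -1)."""
    mean = ell * np.concatenate([R[:, k], np.eye(2)[:, k]])
    cov = np.block([[Q, R], [R.T, np.eye(2)]])
    return mean, cov


rng = np.random.default_rng(0)
N, ell = 30, 1.2
W = rng.standard_normal((2, N)) / np.sqrt(N)  # arbitrary prototypes
B = np.eye(N)[:2]                             # assumed orthonormal centers
R, Q = W @ B.T, W @ W.T

mean, cov = conditional_moments(R, Q, ell, 0)
A = np.vstack([W, B])
# For xi ~ N(ell * B_sigma, 1): mean is A (ell B_sigma), covariance is A A^T.
print(np.allclose(mean, A @ (ell * B[0])), np.allclose(cov, A @ A.T))
```

Only $R_{s\sigma}$, $Q_{st}$ and $\ell$ enter, which is why the averaged recursions close in these quantities.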

3. self-averaging properties
characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$:
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
the learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: number of examples (number of learning steps) per degree of freedom
recursions → coupled ordinary differential equations for $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$
→ evolution of projections

probability for misclassification of a novel example:
$$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-$$
$$\varepsilon_g = p_+\, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell\, (R_{++} - R_{-+})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right] + p_-\, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell\, (R_{--} - R_{+-})}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right]$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution
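The generalization error can be evaluated numerically from the characteristic quantities. This is a sketch using only the Python standard library; the function names and the index convention (index 0 for $+1$, index 1 for $-1$, first index of `R` for the prototype) are assumptions. The formula follows from the Gaussian statistics of $x_- - x_+$ given the class.

```python
from math import erf, sqrt


def Phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def generalization_error(R, Q, ell, p_plus):
    """eps_g for two prototypes; R[s][sigma] and Q[s][t] with index 0 = +1, 1 = -1."""
    norm = 2.0 * sqrt(Q[0][0] - 2.0 * Q[0][1] + Q[1][1])
    # Class +1 is misclassified when d_+ > d_-.
    arg_plus = (Q[0][0] - Q[1][1] - 2.0 * ell * (R[0][0] - R[1][0])) / norm
    # Class -1 is misclassified when d_- > d_+.
    arg_minus = (Q[1][1] - Q[0][0] - 2.0 * ell * (R[1][1] - R[0][1])) / norm
    return p_plus * Phi(arg_plus) + (1.0 - p_plus) * Phi(arg_minus)


# Prototypes placed exactly at the cluster centers, w_s = ell * B_s, with ell = 1.
R = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
print(generalization_error(R, Q, 1.0, 0.5))
```

For prototypes at the cluster centers and equal priors, the result equals $\Phi(-\ell/\sqrt{2})$, since the centers are a distance $\sqrt{2}\,\ell$ apart.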
5. learning curve
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples
investigation and comparison of given algorithms:
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions:
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$ (optimize $d\varepsilon_g / d\alpha$, i.e. the fastest decrease of $\varepsilon_g$)
- ...

optimal classification with minimal generalization error
in the model situation (equal variances of clusters): separation of classes by the plane with $p_+\, P(\boldsymbol{\xi} \mid \sigma = +1) = p_-\, P(\boldsymbol{\xi} \mid \sigma = -1)$
(figure: clusters at $\mathbf{B}_-$ with weight $p_- > p_+$ and at $\mathbf{B}_+$ with weight $p_+$; optimal decision plane; excess error)
(figure: minimal $\varepsilon_g$ as a function of the prior weight $p_+$, for $\ell = 0, 1, 2$; $\varepsilon_g$ from 0 to 0.5, $p_+$ from 0 to 1)
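The minimal generalization error for this model can be computed in closed form. The sketch below uses only the standard library and rests on a short derivation that is not on the slide: along the direction $(\mathbf{B}_+ - \mathbf{B}_-)/\sqrt{2}$ the centers are a distance $\Delta = \sqrt{2}\,\ell$ apart, and the optimal plane is shifted by $\ln(p_+/p_-)/\Delta$ from the midpoint. The function name `bayes_error` is an assumption.

```python
from math import erf, log, sqrt


def Phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def bayes_error(p_plus, ell):
    """Minimal eps_g for two unit-variance Gaussian clusters with priors p_plus, 1 - p_plus."""
    p_minus = 1.0 - p_plus
    if p_plus <= 0.0 or p_minus <= 0.0:
        return 0.0  # only one class present
    delta = sqrt(2.0) * ell  # distance between the centers ell*B_+ and ell*B_-
    if delta == 0.0:
        return min(p_plus, p_minus)  # clusters coincide: always guess the frequent class
    shift = log(p_plus / p_minus) / delta
    return p_plus * Phi(-delta / 2.0 - shift) + p_minus * Phi(-delta / 2.0 + shift)


for ell in (0.0, 1.0, 2.0):
    print(ell, [round(bayes_error(p, ell), 3) for p in (0.1, 0.3, 0.5, 0.7, 0.9)])
```

The curves are symmetric about $p_+ = 1/2$ and never exceed $\min\{p_+, p_-\}$, the error of the trivial classification.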

"LVQ 2.1": update the correct and the wrong winner
$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, S \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$
[Seo, Obermayer]: LVQ 2.1 corresponds to a cost function (likelihood ratios)
(analytical) integration for $\mathbf{w}_s(0) = 0$, with $p_\sigma = (1 + m\sigma)/2$ $(m > 0)$:
$$R_{++} = \ell\, \frac{1+m}{2m} \left( 1 - e^{-\eta m \alpha} \right), \qquad R_{+-} = -\ell\, \frac{1-m}{2m} \left( 1 - e^{-\eta m \alpha} \right)$$
$$R_{-+} = \ell\, \frac{1+m}{2m} \left( 1 - e^{+\eta m \alpha} \right), \qquad R_{--} = -\ell\, \frac{1-m}{2m} \left( 1 - e^{+\eta m \alpha} \right), \qquad Q_{st} = \ldots$$
with $\alpha \to \infty$: $R_{-+}$, $R_{--}$, $Q_{+-}$, $Q_{--}$ diverge, while $R_{++}$, $R_{+-}$, $Q_{++}$ remain finite
(figure: theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $\eta = 0.5$, averages over 100 independent runs; $R_{s\sigma}$ and $Q_{st}$ versus $\alpha$ for $0 \le \alpha \le 10$, vertical range from $-6$ to $6$)
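The exponential instability of LVQ 2.1 can be made explicit for the projections $R_{s\tau}$. The sketch below is a derivation under the model assumptions, not a copy of the slide: averaging the update with $f_s = S\sigma$ gives $dR_{s\tau}/d\alpha = \eta\, S\, [\ell\, \tau (1 + m\tau)/2 - m R_{s\tau}]$, which is solved in closed form for $R_{s\tau}(0) = 0$. The function name `lvq21_projections` is an assumption.

```python
from math import exp


def lvq21_projections(alpha, eta, ell, m):
    """Closed-form R[(S, tau)](alpha) for LVQ 2.1 with w_s(0) = 0 and p_sigma = (1 + m*sigma)/2."""
    result = {}
    for S in (+1, -1):
        for tau in (+1, -1):
            # Fixed point of dR/dalpha = eta * S * (ell * tau * (1 + m*tau) / 2 - m * R).
            fixed_point = ell * tau * (1.0 + m * tau) / (2.0 * m)
            # S = +1 relaxes towards the fixed point, S = -1 is driven away from it.
            result[(S, tau)] = fixed_point * (1.0 - exp(-S * eta * m * alpha))
    return result


for alpha in (0.0, 2.0, 10.0):
    R = lvq21_projections(alpha, eta=0.5, ell=1.0, m=0.6)
    print(alpha, {key: round(value, 3) for key, value in R.items()})
```

With $m > 0$ (that is, $p_+ > p_-$) the prototype of the less frequent class is repelled more often than it is attracted, so its projections grow without bound.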

problem: instability of the algorithm due to repulsion of wrong prototypes
→ trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$
(figure: two clusters with weights $p_-$ and $p_+ > p_-$)
strategies:
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, but the system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function;
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified;
  slow learning, poor generalization

"The winner takes it all"
I) LVQ 1 [Kohonen]: only the winner is updated, according to the class membership
$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) S \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$
($\Theta(\ldots)$ selects the winner $\mathbf{w}_s$; $S \sigma^\mu = \pm 1$)
numerical integration for $\mathbf{w}_s(0) = 0$
(figure: theory and simulation ($N = 200$), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 independent runs; $Q_{++}$, $Q_{--}$, $Q_{+-}$, $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$)
(figure: trajectories of $\mathbf{w}_+$ and $\mathbf{w}_-$ in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane, axes $R_{S+}$ and $R_{S-}$; (•) $\alpha = 20, 40, \ldots, 140$; cluster centers $\ell\,\mathbf{B}_+$ and $\ell\,\mathbf{B}_-$; optimal decision boundary; solid line: asymptotic position)
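A minimal sketch of a single LVQ 1 step, assuming NumPy. The function name `lvq1_step`, the `labels` argument holding the prototype classes $S$, and the tie-breaking by `argmin` (first minimum wins) are assumptions for illustration.

```python
import numpy as np


def lvq1_step(W, labels, xi, sigma, eta):
    """Move only the winner: towards xi if its label equals sigma, away otherwise."""
    W = np.array(W, dtype=float)           # work on a copy
    N = xi.size
    d = np.sum((xi - W) ** 2, axis=1)      # squared distances to all prototypes
    winner = int(np.argmin(d))
    sign = 1.0 if labels[winner] == sigma else -1.0   # S * sigma
    W[winner] += (eta / N) * sign * (xi - W[winner])
    return W, winner


W = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = [+1, -1]
xi = np.array([1.0, 0.5])
print(lvq1_step(W, labels, xi, +1, 1.0))   # winner is attracted
print(lvq1_step(W, labels, xi, -1, 1.0))   # winner is repelled
```

In contrast to LVQ 2.1, the prototype that loses the competition is left unchanged, so repulsion only acts on a prototype that is currently close to the example.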

learning curve
(figure: $\varepsilon_g$ versus $\alpha$ for $0 \le \alpha \le 300$, $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$); $\varepsilon_g$ axis from 0.14 to 0.26; curves labelled $\eta = 2.0$, $0.4$, $0.2$)
- role of the learning rate
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- variable rate $\eta(\alpha)$
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)
(figure: $\varepsilon_g$ versus $(\eta\alpha)$ in the limit $\eta \to 0$, for $(\eta\alpha)$ from 0 to 50; $\varepsilon_g$ axis from 0.14 to 0.26; the asymptotic value for $(\eta\alpha) \to \infty$ is suboptimal, i.e. above min $\varepsilon_g$)

"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion): the winner is updated only if it is correct
$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) \delta_{S \sigma^\mu} \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$
LVQ+ ≈ VQ within the classes ($\mathbf{w}_S$ updated only from class $S$)
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell\, (\mathbf{B}_+ + \mathbf{B}_-)/2$
(figure: asymptotic positions of $\mathbf{w}_+$ and $\mathbf{w}_-$ relative to $\ell\,\mathbf{B}_+$ and $\ell\,\mathbf{B}_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$)
the classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
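A minimal sketch of a single LVQ+ step, assuming NumPy. The function name `lvq_plus_step` and the `labels` argument holding the prototype classes $S$ are assumptions for illustration.

```python
import numpy as np


def lvq_plus_step(W, labels, xi, sigma, eta):
    """Attract the winner towards xi only if its label equals sigma; never repel."""
    W = np.array(W, dtype=float)           # work on a copy
    N = xi.size
    d = np.sum((xi - W) ** 2, axis=1)      # squared distances to all prototypes
    winner = int(np.argmin(d))
    if labels[winner] == sigma:            # Kronecker delta of S and sigma
        W[winner] += (eta / N) * (xi - W[winner])
    return W


W = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = [+1, -1]
xi = np.array([1.0, 0.5])
print(lvq_plus_step(W, labels, xi, +1, 1.0))   # correct winner moves towards xi
print(lvq_plus_step(W, labels, xi, -1, 1.0))   # wrong winner: no update at all
```

Because each prototype only ever sees examples of its own class, the prior weights do not enter the stationary configuration, in line with the $p_\pm$-independence stated above.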

(figure: asymptotic $\varepsilon_g$ versus $p_+$; asymptotics $\eta \to 0$, $(\eta\alpha) \to \infty$)
- optimal classification
- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
(figure: learning curves $\varepsilon_g$ versus $\alpha$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$)

Vector Quantization
competitive learning, $\mathbf{w}_s$: winner
$$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^\mu - d_S^\mu \right) \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$$
class membership is unknown or identical for all data
numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
(figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ 1)
(figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$, for $0 \le \alpha \le 300$)
the system is invariant under exchange of the prototypes
→ weakly repulsive fixed points
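The exchange symmetry noted above can be checked directly on a single VQ step: relabelling the two prototypes before the update gives the relabelled result. The sketch assumes NumPy; the function name `vq_step` and the random test vectors are assumptions for illustration.

```python
import numpy as np


def vq_step(W, xi, eta):
    """Unsupervised winner-takes-all step: only the closest prototype moves towards xi."""
    W = np.array(W, dtype=float)           # work on a copy
    N = xi.size
    d = np.sum((xi - W) ** 2, axis=1)      # squared distances to all prototypes
    winner = int(np.argmin(d))
    W[winner] += (eta / N) * (xi - W[winner])
    return W


rng = np.random.default_rng(0)
W = rng.standard_normal((2, 10))
xi = rng.standard_normal(10)

updated = vq_step(W, xi, 1.2)
updated_swapped = vq_step(W[::-1], xi, 1.2)
# Exchanging the prototypes commutes with the update.
print(np.allclose(updated_swapped, updated[::-1]))
```

Since no labels enter the update, nothing distinguishes the two prototypes a priori; which one ends up representing which cluster is decided by the initial conditions and the data sequence.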
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
( ) 1-micros
micro1-micros
micros Sσ
N
ηwwwwξξξξwwwwwwww minus+=
(analytical)integrationfor wwwws(0) = 0
( ) ( )
( ) ( ) KKll
Kll
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
++minusminusminus
++minus
minus+minus
++
minus+
=+minusminus
minus=
=minusminus
minus=minus+
=
pσ = (1+m σ ) 2 (mgt0)
[Seo Obermeyer] LVQ21 ս cost function
(likelihood ratios)
infinrarrinfinrarrminus+minusminus+minusminusminus
++minus+++
αQQRR
Q R R
with
finite remain
Q ++ R ++ R minus+
R +minus Q minus+
Q minusminus R minusminus
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs
The Dynamics of Learning Vector Quantization RUG 10012005
problem: instability of the algorithm due to repulsion of wrong prototypes
trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$
[Figure: prototypes relative to the clusters with weights $p_-$ and $p_+ > p_-$]
strategies
- selection of data in a window close to the current decision boundary:
  slows down the repulsion, but the system remains unstable
- Robust Soft Learning Vector Quantization [Seo & Obermayer]:
  density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified
  slow learning, poor generalization

"The winner takes it all"
I) LVQ 1 [Kohonen]
$\boldsymbol{w}_S^{\mu} = \boldsymbol{w}_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^{\mu} - d_{S}^{\mu} \right) S \sigma^{\mu} \left( \boldsymbol{\xi}^{\mu} - \boldsymbol{w}_S^{\mu-1} \right)$
only the winner $\boldsymbol{w}_S$ is updated (factor $\Theta$), according to the class membership (factor $S\sigma^{\mu} = \pm 1$)
numerical integration for $\boldsymbol{w}_S(0) = 0$
[Figure: order parameters $Q_{ST}$ and $R_{S\sigma}$ vs. $\alpha$; theory and simulation (N = 200), $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$, averaged over 100 independent runs]
[Figure: trajectories of $\boldsymbol{w}_+$ and $\boldsymbol{w}_-$ in the $(\boldsymbol{B}_+, \boldsymbol{B}_-)$-plane, axes $R_{S+}$ and $R_{S-}$, cluster centers $\ell \boldsymbol{B}_+$ and $\ell \boldsymbol{B}_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary and (____) asymptotic position]

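The LVQ 1 prescription can be tried out in a small Monte Carlo simulation. The sketch below is illustrative only and rests on several assumptions that the slide does not fix: $\boldsymbol{B}_+$ and $\boldsymbol{B}_-$ are taken as the first two coordinate unit vectors, clusters have unit variance, and a tie in the distances goes to $\boldsymbol{w}_+$. The name `simulate_lvq1` and its default parameters are hypothetical.

```python
import random


def simulate_lvq1(N=50, p_plus=0.2, ell=1.2, eta=1.2, alpha_max=20.0, seed=0):
    """Online LVQ 1 with two prototypes; returns order parameters (R, Q)."""
    rng = random.Random(seed)
    # Assumption: B_+ = e_0 and B_- = e_1 (orthonormal by construction).
    center = {+1: 0, -1: 1}
    w = {+1: [0.0] * N, -1: [0.0] * N}  # prototypes, w_S(0) = 0
    for _ in range(int(alpha_max * N)):  # alpha = (number of examples) / N
        sigma = +1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        xi[center[sigma]] += ell  # shift the cluster mean to ell * B_sigma
        d = {S: sum((x - v) ** 2 for x, v in zip(xi, w[S])) for S in (+1, -1)}
        S = +1 if d[+1] <= d[-1] else -1  # winner; ties go to w_+ (arbitrary)
        step = eta / N * S * sigma  # attraction if labels agree, else repulsion
        w[S] = [v + step * (x - v) for x, v in zip(xi, w[S])]
    R = {(S, t): w[S][center[t]] for S in (+1, -1) for t in (+1, -1)}
    Q = {(S, T): sum(a * b for a, b in zip(w[S], w[T]))
         for S in (+1, -1) for T in (+1, -1)}
    return R, Q
```

Averaging the returned $R_{S\sigma}$ and $Q_{ST}$ over independent seeds, for a sequence of `alpha_max` values, is one way to produce simulation points to compare with the integrated differential equations.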
learning curve
$\varepsilon_g(\alpha)$ for $\eta = 1.2$ ($p_+ = 0.2$, $\ell = 1.2$)
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- role of the learning rate
- variable rate $\eta(\alpha)$
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in $\eta$)
  the resulting $\varepsilon_g$ is suboptimal compared with $\min \varepsilon_g$
[Figures: $\varepsilon_g$ (axis from 0.14 to 0.26) vs. $\alpha$ (0 to 300), vs. $\eta$ (up to 2.0), and vs. $(\eta\alpha)$ (10 to 50) in the limit $\eta \to 0$]

"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$\boldsymbol{w}_S^{\mu} = \boldsymbol{w}_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^{\mu} - d_{S}^{\mu} \right) \delta_{S\sigma^{\mu}} \left( \boldsymbol{\xi}^{\mu} - \boldsymbol{w}_S^{\mu-1} \right)$
(factor $\Theta$: winner; factor $\delta_{S\sigma^{\mu}}$: correct)
$\alpha \to \infty$: asymptotic configuration symmetric about $\ell (\boldsymbol{B}_+ + \boldsymbol{B}_-)/2$
[Figure: asymptotic positions of $\boldsymbol{w}_+$ and $\boldsymbol{w}_-$ relative to $\ell \boldsymbol{B}_+$ and $\ell \boldsymbol{B}_-$; $p_+ = 0.2$, $\ell = 1.2$, $\eta = 1.2$]
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
LVQ+ ≈ VQ within the classes ($\boldsymbol{w}_S$ updated only from class $S$)

- LVQ 2.1: trivial assignment to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification
[Figure: asymptotic $\varepsilon_g$ vs. $p_+$ for LVQ 2.1, LVQ 1 and LVQ+ compared with the optimal classification; asymptotics $\eta \to 0$, $(\eta\alpha) \to \infty$]
learning curves
[Figure: $\varepsilon_g$ vs. $\alpha$ for LVQ+ and LVQ 1; $p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.0$]

Vector Quantization
competitive learning
$\boldsymbol{w}_S^{\mu} = \boldsymbol{w}_S^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-S}^{\mu} - d_{S}^{\mu} \right) \left( \boldsymbol{\xi}^{\mu} - \boldsymbol{w}_S^{\mu-1} \right)$
$\boldsymbol{w}_S$: winner
class membership is unknown or identical for all data
numerical integration for $\boldsymbol{w}_S(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)
[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ 1]
[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300, values between 0 and 1.0)]
system is invariant under exchange of the prototypes → weakly repulsive fixed points

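The four prescriptions discussed so far (LVQ 2.1, LVQ 1, LVQ+, VQ) differ only in the factor that multiplies $(\boldsymbol{\xi}^{\mu} - \boldsymbol{w}_S^{\mu-1})$. The sketch below collects these factors in one hypothetical helper, `modulation`; the rule names are labels chosen here, and ties $d_+ = d_-$ are treated as "no winner" purely for illustration.

```python
def modulation(rule, S, sigma, d):
    """Factor f_S in w_S <- w_S + (eta/N) * f_S * (xi - w_S).

    S, sigma: prototype label and example label, each +1 or -1.
    d: dict mapping +1 and -1 to the squared distances d_+ and d_-.
    """
    # Theta(d_{-S} - d_S): 1 if w_S is strictly the closer prototype.
    winner = 1.0 if d[-S] - d[S] > 0 else 0.0
    if rule == "lvq2.1":
        return float(S * sigma)  # both prototypes move on every example
    if rule == "lvq1":
        return winner * S * sigma  # winner attracted or repelled
    if rule == "lvq+":
        return winner * (1.0 if S == sigma else 0.0)  # winner moves only if correct
    if rule == "vq":
        return winner  # labels ignored
    raise ValueError("unknown rule: " + str(rule))
```

For an example of class $-1$ that lies closer to $\boldsymbol{w}_+$, LVQ 1 repels $\boldsymbol{w}_+$, LVQ+ leaves both prototypes unchanged, and VQ attracts $\boldsymbol{w}_+$.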
interpretations
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: $\varepsilon_g$ vs. $p_+$; asymptotics ($\eta \to 0$, $\eta\alpha \to \infty$)]
$p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high generalization error $\varepsilon_g$

Summary
• prototype-based learning:
  Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes,
  dynamics of online training
• comparison of algorithms:
  LVQ 2.1: instability, trivial (stationary) classification
  LVQ 1: close to optimal asymptotic generalization
  LVQ+: min-max solution w.r.t. asymptotic generalization
  VQ: symmetry breaking, representation
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
  variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes

Perspectives
• Self-Organizing Maps (SOM)
  (many) N-dim. prototypes form a (low) d-dimensional grid
  representation of data in a topology preserving map
  neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
  adaptive metrics, e.g. distance measure $d_{\lambda}(\boldsymbol{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$, with $\boldsymbol{\lambda}$ adapted in training
• applications

mathematical analysis of the learning dynamics
$\boldsymbol{w}_s^{\mu} = \boldsymbol{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\left[ d_s^{\mu}, d_{-s}^{\mu}, \sigma^{\mu} \right] \left( \boldsymbol{\xi}^{\mu} - \boldsymbol{w}_s^{\mu-1} \right)$
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)
$R_{s\sigma}^{\mu} = \boldsymbol{w}_s^{\mu} \cdot \boldsymbol{B}_\sigma$: projections in the $(\boldsymbol{B}_+, \boldsymbol{B}_-)$-plane
$Q_{st}^{\mu} = \boldsymbol{w}_s^{\mu} \cdot \boldsymbol{w}_t^{\mu}$: length and relative position of prototypes
$s, t, \sigma \in \{-1, +1\}$
the random vector $\boldsymbol{\xi}^{\mu}$ enters only in the form of
projections: $x_s^{\mu} = \boldsymbol{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^{\mu}$, $y_\tau^{\mu} = \boldsymbol{B}_\tau \cdot \boldsymbol{\xi}^{\mu}$
distances: $d_s^{\mu} = \left( \boldsymbol{\xi}^{\mu} - \boldsymbol{w}_s^{\mu-1} \right)^2 = \left( \boldsymbol{\xi}^{\mu} \right)^2 - 2 x_s^{\mu} + Q_{ss}^{\mu-1}$
→ recursions
$\frac{R_{s\sigma}^{\mu} - R_{s\sigma}^{\mu-1}}{1/N} = \eta\, f_s[\ldots] \left( y_\sigma^{\mu} - R_{s\sigma}^{\mu-1} \right)$
$\frac{Q_{st}^{\mu} - Q_{st}^{\mu-1}}{1/N} = \eta\, f_s[\ldots] \left( x_t^{\mu} - Q_{st}^{\mu-1} \right) + \eta\, f_t[\ldots] \left( x_s^{\mu} - Q_{st}^{\mu-1} \right) + \eta^2 f_s[\ldots]\, f_t[\ldots] + O\!\left( \frac{1}{N} \right)$

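A small sketch may help to see why seven numbers suffice. The helpers below (hypothetical names `dot`, `order_parameters`, `squared_distance`) compute $R_{s\sigma}$ and $Q_{st}$ from explicit vectors and evaluate the distance through the projection $x_s$ and the length $Q_{ss}$, following the identity $d_s = \boldsymbol{\xi}^2 - 2 x_s + Q_{ss}$. Plain Python lists stand in for the N-dimensional vectors.

```python
def dot(a, b):
    """Scalar product of two equally long lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def order_parameters(w, B):
    """R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t.

    w and B are dicts mapping the labels +1 and -1 to vectors.
    """
    labels = (+1, -1)
    R = {(s, sig): dot(w[s], B[sig]) for s in labels for sig in labels}
    Q = {(s, t): dot(w[s], w[t]) for s in labels for t in labels}
    return R, Q


def squared_distance(xi, w_s):
    """d_s = xi^2 - 2 x_s + Q_ss, written in terms of projections."""
    x_s = dot(w_s, xi)   # projection of the example on the prototype
    q_ss = dot(w_s, w_s)  # squared length of the prototype
    return dot(xi, xi) - 2.0 * x_s + q_ss
```

Since $Q_{+-} = Q_{-+}$, the dictionaries hold four values of $R$ and three distinct values of $Q$, which is the reduction $\mathbb{R}^{2N} \to \mathbb{R}^{7}$.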
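2. average over the current example
random vector $\boldsymbol{\xi}^{\mu}$ according to $P(\boldsymbol{\xi}^{\mu} \mid \sigma)$ → $x_s^{\mu} = \boldsymbol{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^{\mu}$ and $y_\tau^{\mu} = \boldsymbol{B}_\tau \cdot \boldsymbol{\xi}^{\mu}$ are correlated Gaussian random quantities in the thermodynamic limit $N \to \infty$
completely specified in terms of first and second moments (w/o indices $\mu$):
$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} (w_s)_j (B_\sigma)_j = \ell R_{s\sigma}$
$\langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma}$ ($= \ell$ if $\tau = \sigma$, $0$ else)
$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = Q_{st}$
$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{s\tau}$
$\langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = \delta_{\rho\tau}$
→ averaged recursions, closed in $\{R_{s\sigma}, Q_{st}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$
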
3. self-averaging properties
characteristic quantities $R_{s\sigma}^{\mu}$, $Q_{st}^{\mu}$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: # of examples (# of learning steps) per degree of freedom
recursions → coupled ordinary differential equations
→ evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

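As a sketch of what the coupled differential equations look like, one can average the recursions for $R_{s\sigma}$ and $Q_{st}$ over the current example and let $N \to \infty$ with $\alpha = \mu/N$. Under the assumption that self-averaging allows $R_{s\sigma}$ and $Q_{st}$ to be taken out of the averages, this gives the generic form below; the LVQ 2.1 line is a worked instance with $f_s = s\sigma$, $\langle \sigma \rangle = p_+ - p_- = m$ and $\langle y_\tau \rangle_\sigma = \ell \delta_{\tau\sigma}$.

```latex
\begin{align}
\frac{dR_{s\sigma}}{d\alpha} &= \eta \left( \langle f_s\, y_\sigma \rangle
    - \langle f_s \rangle\, R_{s\sigma} \right), \\
\frac{dQ_{st}}{d\alpha} &= \eta \left( \langle f_s\, x_t \rangle
    - \langle f_s \rangle\, Q_{st}
    + \langle f_t\, x_s \rangle - \langle f_t \rangle\, Q_{st} \right)
    + \eta^2 \langle f_s f_t \rangle, \\[4pt]
\text{LVQ 2.1:} \quad
\frac{dR_{s\tau}}{d\alpha} &= \eta\, s \left( \ell\, \tau\, p_\tau - m\, R_{s\tau} \right).
\end{align}
```

For the winner-takes-all prescriptions the averages involve $\Theta(d_{-s} - d_s)$ and have to be evaluated as Gaussian integrals over $(x_+, x_-, y_+, y_-)$.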
5. learning curve
generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples
probability for misclassification of a novel example:
$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$
$\varepsilon_g = p_+\, \Phi\!\left( \frac{Q_{++} - Q_{--} - 2\ell \left( R_{++} - R_{-+} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right) + p_-\, \Phi\!\left( \frac{Q_{--} - Q_{++} - 2\ell \left( R_{--} - R_{+-} \right)}{2 \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right)$
with $\Phi$ the cumulative distribution function of the standard normal distribution
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate $\eta(\alpha)$
- variational optimization w.r.t. $f_s[\ldots]$
- ...
maximize $d\varepsilon_g / d\alpha$

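The generalization error can be evaluated directly from the seven order parameters. The sketch below assumes unit-variance clusters with orthonormal $\boldsymbol{B}_\pm$, so that $x_{-\sigma} - x_\sigma$ is Gaussian with mean $\ell (R_{-\sigma\sigma} - R_{\sigma\sigma})$ and variance $Q_{++} - 2Q_{+-} + Q_{--}$ for an example of class $\sigma$. The function names are hypothetical, and the dictionaries are keyed by label pairs such as `(1, -1)`.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, p_plus, ell):
    """epsilon_g from order parameters R[(s, sigma)] and Q[(s, t)].

    Raises ZeroDivisionError if the prototypes coincide (w_+ = w_-),
    because the classification is then undefined.
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    # Twice the length of w_+ - w_-.
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (+1, -1):
        # Class s is misclassified when d_s > d_{-s}.
        arg = (Q[(s, s)] - Q[(-s, -s)]
               - 2.0 * ell * (R[(s, s)] - R[(-s, s)])) / norm
        eps += p[s] * phi(arg)
    return eps
```

For prototypes sitting exactly on the cluster centers, $\boldsymbol{w}_\pm = \ell \boldsymbol{B}_\pm$, both arguments equal $-\ell/\sqrt{2}$, so the result does not depend on the prior weights.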
optimal classificationoptimal classificationoptimal classificationoptimal classification with minimal generalization error
BBBB-
BBBB+
(p-gtp+ )
(p+)
separation of classes by the plane with 1)σP(p 1)σP(p +==minus= +minus ξξξξξξξξin the model situation (equal variances of clusters)
excess error
minimal εg as a function
of prior weights ℓ=2
εg
025
050
005 100 p+
ℓ=1
ℓ=0
ℓ
The Dynamics of Learning Vector Quantization RUG 10012005
ldquoLVQ 21ldquo update the correct and wrong winner
( ) 1-micros
micro1-micros
micros Sσ
N
ηwwwwξξξξwwwwwwww minus+=
(analytical)integrationfor wwwws(0) = 0
( ) ( )
( ) ( ) KKll
Kll
αmηαmη
αmηαmη
e12
m1
mRe1
2
m1
mR
Qe12
m1
mRe1
2
m1
mR
++minusminusminus
++minus
minus+minus
++
minus+
=+minusminus
minus=
=minusminus
minus=minus+
=
pσ = (1+m σ ) 2 (mgt0)
[Seo Obermeyer] LVQ21 ս cost function
(likelihood ratios)
infinrarrinfinrarrminus+minusminus+minusminusminus
++minus+++
αQQRR
Q R R
with
finite remain
Q ++ R ++ R minus+
R +minus Q minus+
Q minusminus R minusminus
α 102 4 86
6-
0
6theory and simulation (N=100)p+=08 ℓ=1 η=05averages over 100 independent runs
The Dynamics of Learning Vector Quantization RUG 10012005
(p- )
(p+gt p-)
sssstrategiestrategiestrategiestrategies
- selection of dataselection of dataselection of dataselection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
- Soft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector QuantizationSoft Robust Learning Vector Quantization [Seo amp Obermayer]
density-estimation based cost function
limiting case Learning from mistakes Learning from mistakes Learning from mistakes Learning from mistakes LVQ21-step only
if the example is currently misclassified
slow learning poor generalization
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification fuumlr αrarrinfin
εg = max p+p-
The Dynamics of Learning Vector Quantization RUG 10012005
ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo
numericalintegrationfor wwwws(0)=0
theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs
Q++
Q--
Q+-
α
wwww++++
wwww----
ℓℓℓℓ BBBB++++
ℓℓℓℓ BBBB----
trajectories in the (B+B- )-plane
(bull) α=2040140
optimal decision boundary____ asymptotic position
RS+
RS-
R--
R-+
R--
R++
winner wwwws plusmn1
I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros
micromicromicroS
microS
1-micros
micros Sσdd
N
ηwwwwξξξξwwwwwwww minusminusΘ+= minus
only the winner is updated according to the class membership
wwww-
The Dynamics of Learning Vector Quantization RUG 10012005
learning curvelearning curvelearning curvelearning curve
α
εg η=12
(p+=02 ℓ=12)
εg (αrarrinfin) grows lin with η
- stationary state
- role of the learning rate
α100 200 300
εg
026
022
018
0140
η
20
04
02
ηrarr0 - variable rate η(α)
- wellwellwellwell----defined asymptoticsdefined asymptoticsdefined asymptoticsdefined asymptotics
(ODE linear in η)
10
εg
20 30 40 500014
026
022
018
min εg
(η α)
ηrarr0
η rarr0 αrarrinfin
( η α ) rarr infin
suboptimal
"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$w_S^{\mu} = w_S^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left[d_{-S}^{\mu} - d_{S}^{\mu}\right] \delta_{S,\sigma^{\mu}} \left(\xi^{\mu} - w_S^{\mu-1}\right)$
Θ[...]: w_S is the winner; δ: the winner is correct
α→∞: asymptotic configuration symmetric about ℓ(B+ + B−)/2
[Figure: w+ and w− relative to the cluster centers ℓB+ and ℓB−; p+ = 0.2, ℓ = 1.2, η = 1.2]
classification scheme and the
achieved generalization error are
independent of the prior weights p±
(and optimal for p± = 1/2)
LVQ+ ≈ VQ within the classes
(w_S updated only from class S)
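The LVQ+ prescription can be sketched in Python as follows. This is an illustration under the same assumptions as before (two prototypes in a dict keyed by S = ±1, squared Euclidean distance); the name `lvq_plus_step` is hypothetical.

```python
def lvq_plus_step(prototypes, xi, sigma, eta):
    """Perform one on-line LVQ+ step in place.

    The winner is updated only if its label agrees with the label of the
    example, so there are attractive steps but no repulsion.
    Returns the label of the updated prototype, or None if nothing changed.
    """
    n = len(xi)
    # Squared Euclidean distance of every prototype to the example.
    dist = {s: sum((wi - x) ** 2 for wi, x in zip(w, xi))
            for s, w in prototypes.items()}
    winner = min(dist, key=dist.get)
    if winner != sigma:
        # Winner carries the wrong label: skip the step entirely.
        return None
    w = prototypes[winner]
    prototypes[winner] = [wi + (eta / n) * (x - wi) for wi, x in zip(w, xi)]
    return winner
```

Because w_S only ever moves toward examples of class S, each prototype effectively performs vector quantization within its own class.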
[Figure: asymptotic εg as a function of p+ (η→0, (ηα)→∞), compared with the optimal classification and min{p+, p−}]
- LVQ 2.1
trivial assignment to the
more frequent class
- LVQ 1
here: close to optimal
classification
- LVQ+
min-max solution
p±-independent classification
[Figure: learning curves εg(α) for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, η = 1.0]
Vector Quantization
competitive learning:
$w_S^{\mu} = w_S^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left[d_{-S}^{\mu} - d_{S}^{\mu}\right] \left(\xi^{\mu} - w_S^{\mu-1}\right)$
w_S: winner
class membership is unknown
or identical for all data
numerical integration for w_S(0) ≈ 0
(p+ = 0.2, ℓ = 1.0, η = 1.2)
[Figure: εg as a function of α for VQ, LVQ+ and LVQ1]
[Figure: R++, R+−, R−+, R−− as functions of α (0 to 300)]
system is invariant under
exchange of the prototypes
→ weakly repulsive fixed points
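A short Python sketch of the unsupervised competitive learning step, together with a quantization error. The definition of the quantization error used here (mean squared distance to the closest prototype, without any prefactor) is an assumption; the slides do not spell it out. All function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_step(prototypes, xi, eta):
    """Move the closest prototype toward the example; labels are not used.

    prototypes: list of vectors (lists of floats), modified in place.
    Returns the index of the winner.
    """
    n = len(xi)
    winner = min(range(len(prototypes)), key=lambda k: sq_dist(prototypes[k], xi))
    w = prototypes[winner]
    prototypes[winner] = [wi + (eta / n) * (x - wi) for wi, x in zip(w, xi)]
    return winner


def quantization_error(prototypes, data):
    """Mean squared distance of each data point to its closest prototype (assumed definition)."""
    return sum(min(sq_dist(w, xi) for w in prototypes) for xi in data) / len(data)
```

Swapping the two entries of `prototypes` leaves both functions unchanged in their effect, which reflects the exchange symmetry mentioned on the slide.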
interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: εg as a function of p+; asymptotics (η→0, ηα→∞)]
p+ ≈ 0, p− ≈ 1:
- low quantization error
- high gen. error εg
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes
Summary
• prototype-based learning
Vector Quantization and Learning Vector Quantization
• a model scenario: two clusters, two prototypes
dynamics of online training
• comparison of algorithms:
LVQ 2.1: instability, trivial (stationary) classification
LVQ 1: close to optimal asymptotic generalization
LVQ+: min-max solution w.r.t. asymptotic generalization
VQ: symmetry breaking, representation
Perspectives
• Self-Organizing Maps (SOM)
(many) N-dim. prototypes form a (low) d-dimensional grid
representation of data in a topology preserving map
neighborhood preserving SOM, Neural Gas (distance based)
• Generalized Relevance LVQ [Hammer & Villmann]
adaptive metrics, e.g. distance measure $d_{\lambda}(w_S, \xi) = \sum_{i=1}^{N} \lambda_i \left(w_{S,i} - \xi_i\right)^2$
relevance factors λ_i adapted during training
• applications
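The relevance-weighted distance can be written in a few lines of Python. This is only a sketch of the distance itself; how the relevance factors are adapted during training is not shown on the slide and is not attempted here. The name `relevance_distance` is illustrative.

```python
def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)**2.

    w:   prototype vector
    xi:  feature vector
    lam: non-negative relevance factors, one per dimension
    """
    return sum(l * (wi - x) ** 2 for l, wi, x in zip(lam, w, xi))
```

With all relevance factors equal to 1 this reduces to the squared Euclidean distance; setting a factor to 0 removes that dimension from the comparison.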
5. learning curve
generalization error εg(α) after training with αN examples:
probability for misclassification of a novel example
$\varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_-$
$\varepsilon_g = p_+\,\Phi\!\left[\frac{Q_{++} - Q_{--} - 2\ell\,(R_{++} - R_{-+})}{2\,(Q_{++} - 2Q_{+-} + Q_{--})^{1/2}}\right] + p_-\,\Phi\!\left[\frac{Q_{--} - Q_{++} - 2\ell\,(R_{--} - R_{+-})}{2\,(Q_{++} - 2Q_{+-} + Q_{--})^{1/2}}\right]$
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for α→∞
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate η(α)
- variational optimization w.r.t. f_S[...]
- ...
maximize the decrease of εg per example, $-\,d\varepsilon_g / d\alpha$
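As a hedged illustration, the generalization error can be evaluated from the order parameters in Python. The sketch assumes unit-variance clusters centered at ℓB±, Q_ST = w_S·w_T, R_Sσ = w_S·B_σ, and Φ as the cumulative distribution function of the standard normal; the dict-based interface is an illustrative choice.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(Q, R, p_plus, ell):
    """Evaluate eps_g from order parameters.

    Q: dict with keys (S, T), values w_S . w_T      (labels +1 / -1)
    R: dict with keys (S, sigma), values w_S . B_sigma
    p_plus: prior weight of class +1; ell: cluster separation parameter.
    Requires w_+ != w_- (otherwise the denominator vanishes).
    """
    p_minus = 1.0 - p_plus
    # 2 * |w_+ - w_-|, common denominator of both arguments
    norm = 2.0 * math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    arg_plus = (Q[(1, 1)] - Q[(-1, -1)]
                - 2.0 * ell * (R[(1, 1)] - R[(-1, 1)])) / norm
    arg_minus = (Q[(-1, -1)] - Q[(1, 1)]
                 - 2.0 * ell * (R[(-1, -1)] - R[(1, -1)])) / norm
    return p_plus * phi(arg_plus) + p_minus * phi(arg_minus)
```

For example, if the prototypes sit exactly on the cluster centers and B+ and B− are orthonormal (an additional assumption), both arguments equal −ℓ/√2 and the result does not depend on the prior weights.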
optimal classification with minimal generalization error
separation of classes by the plane with $p_+\,P(\xi \mid \sigma = +1) = p_-\,P(\xi \mid \sigma = -1)$
in the model situation (equal variances of clusters)
[Figure: cluster centers B+ and B− with the optimal separating plane, labels (p+) and (p− > p+); excess error indicated]
[Figure: minimal εg as a function of the prior weight p+ (0 to 1.00) for ℓ = 0, 1, 2; εg axis with ticks 0.25 and 0.50]
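The minimal generalization error for two Gaussian clusters of equal (unit) variance follows from the plane condition above. The Python sketch below projects onto the line connecting the two centers and takes the distance `delta` between the centers as input; for orthonormal B± one would have delta = √2·ℓ, which is an assumption not stated on this slide. Function names are illustrative.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_error(p_plus, delta):
    """Minimal classification error for two unit-variance Gaussian clusters.

    p_plus: prior weight of class +1
    delta:  Euclidean distance between the two cluster centers
    """
    p_minus = 1.0 - p_plus
    if delta == 0.0 or p_plus in (0.0, 1.0):
        # No separation or only one class: best guess is the more frequent class.
        return min(p_plus, p_minus)
    # Projected coordinate x: class -1 ~ N(0, 1), class +1 ~ N(delta, 1).
    # Decide +1 for x > t, where p_+ P(x|+1) = p_- P(x|-1) at x = t.
    t = 0.5 * delta + math.log(p_minus / p_plus) / delta
    return p_plus * phi(t - delta) + p_minus * phi(-t)
```

For unequal priors the threshold t shifts toward the center of the less frequent class, and the error never exceeds min{p+, p−}.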
"LVQ 2.1": update the correct and the wrong winner
$w_S^{\mu} = w_S^{\mu-1} + \frac{\eta}{N}\, S\sigma^{\mu} \left(\xi^{\mu} - w_S^{\mu-1}\right)$
(analytical) integration for w_S(0) = 0, with p_σ = (1 + mσ)/2 (m > 0):
$R_{++} = \ell\,\frac{1+m}{2m}\left(1 - e^{-\eta m \alpha}\right), \quad R_{+-} = -\ell\,\frac{1-m}{2m}\left(1 - e^{-\eta m \alpha}\right)$
$R_{-+} = \ell\,\frac{1+m}{2m}\left(1 - e^{+\eta m \alpha}\right), \quad R_{--} = -\ell\,\frac{1-m}{2m}\left(1 - e^{+\eta m \alpha}\right)$
$Q_{ST} = \ldots$
[Seo, Obermayer]: LVQ 2.1 ↔ cost function (likelihood ratios)
for α→∞: $R_{-+}, R_{--}, Q_{+-}, Q_{--} \to \infty$, while $Q_{++}, R_{++}, R_{+-}$ remain finite
[Figure: Q++, Q+−, Q−−, R++, R+−, R−+, R−− as functions of α (0 to 10), vertical axis from −6 to 6; theory and simulation (N=100), p+ = 0.8, ℓ = 1, η = 0.5, averages over 100 independent runs]
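The LVQ 2.1 update of both prototypes can be sketched in Python as follows, again assuming two prototypes in a dict keyed by S = ±1. The name `lvq21_step` is illustrative.

```python
def lvq21_step(prototypes, xi, sigma, eta):
    """Perform one on-line LVQ 2.1 step in place.

    Both prototypes are updated: the one with the correct label (S == sigma)
    moves toward the example, the one with the wrong label moves away from it.
    """
    n = len(xi)
    for s, w in list(prototypes.items()):
        sign = s * sigma  # +1 for the correct prototype, -1 for the wrong one
        prototypes[s] = [wi + (eta / n) * sign * (x - wi) for wi, x in zip(w, xi)]
```

The repulsive step has no restoring force: the further the wrong prototype is from the data, the larger the step away from it, which is consistent with the instability discussed for this algorithm.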
[Figure: labels (p−) and (p+ > p−)]
strategies:
- selection of data in a window close to the current decision boundary
slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]
density-estimation based cost function
The Dynamics of Learning Vector Quantization RUG 10012005
ldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquoldquo The winner takes it all rdquo
numericalintegrationfor wwwws(0)=0
theory and simulation (N=200)p+=02 ℓ=12 η=12averaged over 100 indep runs
Q++
Q--
Q+-
α
wwww++++
wwww----
ℓℓℓℓ BBBB++++
ℓℓℓℓ BBBB----
trajectories in the (B+B- )-plane
(bull) α=2040140
optimal decision boundary____ asymptotic position
RS+
RS-
R--
R-+
R--
R++
winner wwwws plusmn1
I) LVQ 1LVQ 1LVQ 1LVQ 1 [Kohonen] [ ] ( ) 1-micros
micromicromicroS
microS
1-micros
micros Sσdd
N
ηwwwwξξξξwwwwwwww minusminusΘ+= minus
only the winner is updated according to the class membership
wwww-
The Dynamics of Learning Vector Quantization RUG 10012005
learning curve
[Figure: generalization error ε_g vs. α (0 to 300; ε_g axis from 0.14 to 0.26) for p+ = 0.2, ℓ = 1.2; labels: η = 1.2; η = 2.0, 0.4, 0.2; η→0]
- stationary state: ε_g(α→∞) grows linearly with η
- role of the learning rate
- variable rate η(α)
- well-defined asymptotics: η→0, α→∞, (ηα)→∞ (ODE linear in η)
[Figure: ε_g vs. (ηα) (0 to 50; ε_g axis from 0.14 to 0.26) in the limit η→0; marked: min ε_g; the (ηα)→∞ result is labelled "suboptimal"]
"The winner takes it all"
II) LVQ+ (only positive steps, without repulsion)
$\mathbf{w}_S^{\mu} = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left[d_{-S}^{\mu} - d_{+S}^{\mu}\right] \delta_{S,\sigma^{\mu}}\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_S^{\mu-1}\right)$
($\Theta[\ldots]$: winner; $\delta_{S,\sigma^{\mu}}$: winner is correct)
α→∞: asymptotic configuration symmetric about ℓ(B+ + B-)/2
[Figure: asymptotic positions of w+ and w- relative to the cluster centers ℓB+ and ℓB-; p+ = 0.2, ℓ = 1.2, η = 1.2]
classification scheme and the achieved generalization error are independent of the prior weights p± (and optimal for p± = 1/2)
LVQ+ ≈ VQ within the classes (w_S updated only from class S)
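The difference between LVQ+ and LVQ 1 is the factor that replaces the repulsive step: an update happens only when the winner carries the label of the example. The sketch below is an illustration under assumed names (`lvq_plus_step`, the dictionary layout, the sample numbers), not the authors' implementation.

```python
def lvq_plus_step(prototypes, xi, sigma, eta):
    """One online LVQ+ step: attract the winner if it is correct, else do nothing.

    prototypes: dict mapping the class label S (+1 or -1) to a list of N floats
    xi: example (list of N floats); sigma: its class label (+1 or -1)
    """
    n = len(xi)
    dist = {s: sum((x - wi) ** 2 for wi, x in zip(w, xi))
            for s, w in prototypes.items()}
    winner = min(dist, key=dist.get)
    if winner != sigma:
        # delta_{S, sigma} = 0: the winner is wrong, so there is no step at all.
        return dict(prototypes)
    moved = [wi + (eta / n) * (x - wi) for wi, x in zip(prototypes[winner], xi)]
    return {**prototypes, winner: moved}


# Illustrative values (assumed, not from the slides).
protos = {+1: [1.0, 0.0], -1: [-1.0, 0.0]}
unchanged = lvq_plus_step(protos, xi=[0.5, 0.0], sigma=-1, eta=1.0)
attracted = lvq_plus_step(protos, xi=[0.5, 0.0], sigma=+1, eta=1.0)
```

Because a prototype only ever moves towards examples of its own class, each prototype effectively performs VQ within its class, which matches the remark on the slide.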
[Figure: asymptotic ε_g vs. p+ for the three algorithms, compared with the optimal classification and with min{p+, p-}]
- LVQ 2.1: trivial assignment to the more frequent class
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, p±-independent classification
[Figure: learning curves ε_g vs. α for LVQ+ and LVQ1; p+ = 0.2, ℓ = 1.0, η = 1.0]
asymptotics: η→0, (ηα)→∞
Vector Quantization
competitive learning:
$\mathbf{w}_S^{\mu} = \mathbf{w}_S^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left[d_{-S}^{\mu} - d_{+S}^{\mu}\right]\left(\boldsymbol{\xi}^{\mu} - \mathbf{w}_S^{\mu-1}\right)$
($\mathbf{w}_S$: winner)
class membership is unknown or identical for all data
[Figure: numerical integration for w_S(0) ≈ 0 (p+ = 0.2, ℓ = 1.0, η = 1.2); R++, R+-, R-+, R-- vs. α (0 to 300); ε_g vs. α for VQ, LVQ+ and LVQ1]
system is invariant under exchange of the prototypes
→ weakly repulsive fixed points
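The exchange symmetry of the VQ update can be checked directly on a single step: since no labels enter the rule, swapping the two prototypes in the input only swaps them in the output. The function name `vq_step` and the numbers are assumptions for illustration, and the sketch assumes the example is not exactly equidistant from both prototypes.

```python
def vq_step(prototypes, xi, eta):
    """One online VQ step: move only the closest prototype towards xi.

    prototypes: list of prototype vectors (lists of N floats)
    xi: example (list of N floats); labels play no role
    """
    n = len(xi)
    dists = [sum((x - wi) ** 2 for wi, x in zip(w, xi)) for w in prototypes]
    winner = dists.index(min(dists))
    result = [list(w) for w in prototypes]
    result[winner] = [wi + (eta / n) * (x - wi)
                      for wi, x in zip(prototypes[winner], xi)]
    return result


# Illustrative values (assumed, not from the slides).
a, b = [1.0, 0.0], [-1.0, 0.0]
xi = [0.5, 0.0]
forward = vq_step([a, b], xi, eta=1.0)
swapped = vq_step([b, a], xi, eta=1.0)
# Exchange symmetry: relabelling the prototypes only relabels the result.
symmetric = forward == swapped[::-1]
```

A configuration in which both prototypes coincide is mapped onto itself by this exchange, which is consistent with the weakly repulsive fixed points mentioned on the slide.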
interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training
[Figure: asymptotic ε_g vs. p+; asymptotics (η→0, ηα→∞)]
p+ ≈ 0, p- ≈ 1:
- low quantization error
- high gen. error ε_g
work in progress, outlook
• regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
• model: different cluster variances, more clusters/prototypes
• optimized procedures: learning rate schedules,
  variational approach / density estimation / Bayes optimal on-line
• several classes and prototypes