K-means Based Consensus Clustering
Dr. Junjie Wu, Beihang University
Outline
• Motivations
• Point-to-Centroid Distance
• Utility Functions for KCC
• Experimental Results
• Concluding Remarks
Cluster Analysis
Clustering Algorithms
• Prototype-based: K-means, FCM
• Density-based: DBSCAN, CLIQUE
• Graph-based: AHC, MinMaxCut
• Hybrid: K-means + AHC
Problems with a Single Clusterer
• No perfect one!
• Sensitive to data factors
• Hard to set proper parameters
• May converge to bad saddle points
• Or just too heuristic …
Can we find a new way?
Consensus Clustering
• To find the best partitioning from multiple basic partitionings (an ensemble-classifier way of thinking)
Consensus Clustering, cont'd

  Γ(π, Π) = Σ_{i=1}^{r} w_i U(π, π_i)

• U: utility function
• w_i: weight of π_i
• Γ: consensus function

Example: X = {x_1, x_2, x_3, x_4, x_5, x_6, x_7}, Π = {π_1, π_2, π_3, π_4}, where
  π_1 = (1, 1, 1, 2, 2, 3, 3)^T
  π_2 = (2, 2, 2, 3, 3, 1, 1)^T
  π_3 = (1, 1, 2, 2, 2, 3, 3)^T
  π_4 = (1, 1, 1, 1, 2, 2, 2)^T
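The consensus objective above can be put into code directly. A minimal sketch with the example partitionings, assuming equal weights w_i and plugging in pairwise Rand agreement as a purely illustrative placeholder for U (the talk's actual KCC utilities come later):

```python
import numpy as np

# The four basic partitionings from the example, as label vectors over X = {x_1, ..., x_7}.
pi_1 = np.array([1, 1, 1, 2, 2, 3, 3])
pi_2 = np.array([2, 2, 2, 3, 3, 1, 1])
pi_3 = np.array([1, 1, 2, 2, 2, 3, 3])
pi_4 = np.array([1, 1, 1, 1, 2, 2, 2])
Pi = [pi_1, pi_2, pi_3, pi_4]

def gamma(pi, Pi, U, w=None):
    """Consensus objective: Gamma(pi, Pi) = sum_i w_i * U(pi, pi_i)."""
    w = np.ones(len(Pi)) / len(Pi) if w is None else w
    return sum(wi * U(pi, pi_i) for wi, pi_i in zip(w, Pi))

def rand_utility(a, b):
    """Placeholder U: fraction of point pairs on which a and b agree
    (both together or both apart), i.e. the Rand index."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    n = len(a)
    agree = (same_a == same_b).sum() - n      # drop the diagonal
    return agree / (n * (n - 1))

score = gamma(pi_1, Pi, rand_utility)
```

Note that pi_2 is just a relabeling of pi_1, so any sensible U treats them as identical partitions.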
Consensus Clustering, cont'd
• Advantages
  – Robust: lowers the risks from weird data, improper algorithms and parameters, etc.
  – Novelty: may help find a better structure
  – Concurrency: can run in parallel
  – A must: when only partitioning episodes are available
• Challenge
  – An NP-complete problem
Related Work
• Graph-based algorithms (Strehl et al., JMLR, 2002)
• Co-association matrix method (Fred and Jain, PAMI, 2005)
• Binary matrix method (Topchy et al., ICDM, 2003)
• Other heuristics (Lu et al., AAAI, 2008; …)
Why K-means Consensus Clustering
• Simple
• Robust
• Efficient
From an NP-complete problem to roughly linear K-means: but which utility functions U make this possible?
Main Contributions
• Theory for KCC utility functions
• KCC algorithm for inconsistent samples
• Some empirical findings
  – U_H is a good KCC utility function
  – The RFS strategy is useful in some circumstances
  – Some mutual funds have unethical behaviors
Outline
• Motivations
• Point-to-Centroid Distance
• Utility Functions for KCC
• Experimental Results
• Concluding Remarks
K-means Clustering
• Objective function:

  min Σ_{k=1}^{K} Σ_{x ∈ C_k} w_x f(x, m_k)

• Two-phase iterations
  – Assign each x to the nearest centroid
  – Update the centroids
• The arithmetic-mean centroid is preferred
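The two-phase iteration can be sketched as follows. A minimal illustration using the squared Euclidean f, unit weights, and a farthest-point initialization (the initialization is my assumption, not the talk's setup):

```python
import numpy as np

def kmeans(X, K, n_iter=100):
    """Two-phase K-means with arithmetic-mean centroids (minimal sketch)."""
    # Farthest-point initialization keeps the sketch deterministic.
    m = [X[0].astype(float)]
    for _ in range(1, K):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in m], axis=0)
        m.append(X[np.argmax(d)].astype(float))
    m = np.array(m)
    for _ in range(n_iter):
        # Phase 1: assign each x to the nearest centroid.
        labels = np.argmin(((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Phase 2: update each centroid to the arithmetic mean of its cluster.
        new_m = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else m[k]
                          for k in range(K)])
        if np.allclose(new_m, m):
            break
        m = new_m
    return labels, m
```
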
Point-to-Centroid Distance
• What if we fix the centroid type to the arithmetic mean?
• A definition:

  f(x, y) = φ(x) − φ(y) − (x − y)^T ∇φ(y)

• A theorem: for an f of this form, the arithmetic mean is exactly the optimal centroid in K-means
Examples
• f(x, y) = φ(x) − φ(y) − (x − y)^T ∇φ(y)

  φ(x)      f(x, y)                Name
  ‖x‖²      ‖x − y‖²               Squared Euclidean distance
  −H(x)     D(x‖y)                 KL-divergence
  ‖x‖       ‖x‖ − x^T y / ‖y‖      Cosine distance

• Interestingly, f is not a metric!
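These instances can be checked numerically. A small sketch of my own that builds f from φ and ∇φ, confirms the squared-Euclidean case, and shows the asymmetry that already makes f a non-metric:

```python
import numpy as np

def p2c(phi, grad_phi, x, y):
    """Point-to-centroid distance: f(x, y) = phi(x) - phi(y) - (x - y)^T grad_phi(y)."""
    return phi(x) - phi(y) - (x - y) @ grad_phi(y)

# phi(x) = ||x||^2  ->  f(x, y) = ||x - y||^2 (squared Euclidean distance)
sq, grad_sq = lambda v: v @ v, lambda v: 2 * v

# phi(x) = ||x||    ->  f(x, y) = ||x|| - x^T y / ||y|| (cosine distance row)
nrm = np.linalg.norm
grad_nrm = lambda v: v / np.linalg.norm(v)

x, y = np.array([1.0, 2.0]), np.array([3.0, 1.0])
d_euc = p2c(sq, grad_sq, x, y)        # equals ||x - y||^2 = 5
d_cos = p2c(nrm, grad_nrm, x, y)

# f is not a metric: it is not even symmetric.
asym = d_cos != p2c(nrm, grad_nrm, y, x)
```
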
Outline
• Motivations
• Point-to-Centroid Distance
• Utility Functions for KCC
• Experimental Results
• Concluding Remarks
Definition and Simplification

  max Σ_{i=1}^{r} w_i U(π, π_i)  ⇔  min Σ_{k=1}^{K} Σ_{x_l ∈ C_k} f(x_l^{(b)}, m_k)

i.e., maximizing the weighted utility is equivalent to a K-means problem on the binary dataset {x_l^{(b)}} derived from the basic partitionings.
A Critical Fact
• Consider the contingency table for π and π_i (entries n_{kj}, with marginals n_{k+} and n_{+j})
• The centroid is exactly the normalized row!
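This fact can be verified directly: build the contingency table, one-hot encode π_i into the binary dataset, and compare cluster means with normalized rows. A sketch using π_1 and π_3 from the earlier example:

```python
import numpy as np

pi   = np.array([1, 1, 1, 2, 2, 3, 3])   # candidate partition (pi_1 from the example)
pi_i = np.array([1, 1, 2, 2, 2, 3, 3])   # one basic partitioning (pi_3)
K, K_i = pi.max(), pi_i.max()

# Contingency table: n[k, j] = #points in cluster k of pi and cluster j of pi_i.
n = np.zeros((K, K_i))
for a, b in zip(pi, pi_i):
    n[a - 1, b - 1] += 1

# Binary dataset: row l one-hot encodes pi_i's label for x_l.
Xb = np.eye(K_i)[pi_i - 1]

# The K-means centroid of cluster C_k over the binary dataset ...
centroids = np.array([Xb[pi == k + 1].mean(axis=0) for k in range(K)])

# ... is exactly the normalized row of the contingency table.
assert np.allclose(centroids, n / n.sum(axis=1, keepdims=True))
```
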
A Sufficient Condition

  Σ_{i=1}^{r} w_i U(π, π_i) = Σ_{k=1}^{K} p_{k+} φ(m_k)

  min Σ_{k=1}^{K} Σ_{x_l ∈ C_k} f(x_l^{(b)}, m_k)
    ⇔ min [ Σ_l φ(x_l^{(b)}) − Σ_{k=1}^{K} p_{k+} φ(m_k) ]
    ⇔ max Σ_{k=1}^{K} p_{k+} φ(m_k)
A Sufficient Condition, cont'd

Theorem 2. Suppose φ(m_k) = Σ_{i=1}^{r} w_i ϕ_i(m_{k,i}), 1 ≤ k ≤ K. Then U is a KCC utility function if

  U(π, π_i) = Σ_{k=1}^{K} p_{k+} ϕ_i(⟨ p_{k1}^{(i)}/p_{k+}, …, p_{kj}^{(i)}/p_{k+}, …, p_{kK_i}^{(i)}/p_{k+} ⟩)
Examples

Write P_k^{(i)} = ⟨ p_{k1}^{(i)}/p_{k+}, …, p_{kK_i}^{(i)}/p_{k+} ⟩ and P^{(i)} = ⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩.

  U       ϕ(m_{k,i})                                                       U(π, π_i)                                      f(x_l^{(b)}, m_k)
  U_c     Σ_j (m_{k,ij})² − Σ_j (p_{+j}^{(i)})²                            Σ_k p_{k+} ‖P_k^{(i)}‖² − ‖P^{(i)}‖²           Σ_i w_i ‖x_{l,i}^{(b)} − m_{k,i}‖²
  U_H     Σ_j m_{k,ij} log m_{k,ij} − Σ_j p_{+j}^{(i)} log p_{+j}^{(i)}    MI(π, π_i)                                     Σ_i w_i D(x_{l,i}^{(b)} ‖ m_{k,i})
  U_cos   ‖m_{k,i}‖ − ‖P^{(i)}‖                                            Σ_k p_{k+} ‖P_k^{(i)}‖ − ‖P^{(i)}‖             Σ_i w_i (1 − cos(x_{l,i}^{(b)}, m_{k,i}))
  U_Lp    ‖m_{k,i}‖_p − ‖P^{(i)}‖_p                                        Σ_k p_{k+} ‖P_k^{(i)}‖_p − ‖P^{(i)}‖_p         Σ_i w_i (1 − Σ_j x_{l,ij}^{(b)} (m_{k,ij})^{p−1} / ‖m_{k,i}‖_p^{p−1})

(Instantiations used later: U_c, U_H, U_cos, U_{L5}, U_{L8}.)
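For concreteness, the first two utilities can be computed straight from the contingency table. A small sketch of my own, using the example partitionings π_1 and π_3:

```python
import numpy as np

def joint(pi, pi_i):
    """Joint probability table p[k, j] between two label vectors."""
    p = np.zeros((pi.max(), pi_i.max()))
    for a, b in zip(pi, pi_i):
        p[a - 1, b - 1] += 1
    return p / len(pi)

def U_c(pi, pi_i):
    """Category utility: sum_k p_{k+} ||P_k^(i)||^2 - ||P^(i)||^2."""
    p = joint(pi, pi_i)
    pk, pj = p.sum(axis=1), p.sum(axis=0)
    return sum(pk[k] * ((p[k] / pk[k]) ** 2).sum()
               for k in range(len(pk)) if pk[k] > 0) - (pj ** 2).sum()

def U_H(pi, pi_i):
    """U_H as mutual information between the two partitionings."""
    p = joint(pi, pi_i)
    pk, pj = p.sum(axis=1), p.sum(axis=0)
    mask = p > 0
    return (p[mask] * np.log(p[mask] / np.outer(pk, pj)[mask])).sum()

pi_1 = np.array([1, 1, 1, 2, 2, 3, 3])
pi_3 = np.array([1, 1, 2, 2, 2, 3, 3])
u_c, u_h = U_c(pi_1, pi_3), U_H(pi_1, pi_3)
```
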
Properties

• Non-uniqueness of U_ϕ: many ϕ correspond to one utility function for KCC

  ϕ_s(m_{k,i}) = ϕ(m_{k,i}) − ϕ(⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩)      (a P2C distance)

  U_{ϕ_s}(π, π_i) = Σ_{k=1}^{K} p_{k+} ϕ_s(⟨ p_{k1}^{(i)}/p_{k+}, …, p_{kK_i}^{(i)}/p_{k+} ⟩)      (many-to-one)

  U_{ϕ_s}(π, π_i) = U_ϕ(π, π_i) − ϕ(⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩)      (standard form, or utility gain)
Properties, cont'd

• Non-negativity of the utility gain

  ∵ U_ϕ(π, π_i) = Σ_{k=1}^{K} p_{k+} ϕ(m_{k,i}) ≥ ϕ( Σ_{k=1}^{K} p_{k+} m_{k,i} )      (convexity of ϕ)

    and Σ_{k=1}^{K} p_{k+} ⟨ p_{k1}^{(i)}/p_{k+}, …, p_{kK_i}^{(i)}/p_{k+} ⟩ = ⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩

  ∴ U_{ϕ_s}(π, π_i) = U_ϕ(π, π_i) − ϕ(⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩) ≥ 0
Properties, cont'd

• Utility gain ratio

  ϕ_n(m_{k,i}) = ϕ_s(m_{k,i}) / | ϕ(⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩) |

  U_{ϕ_n}(π, π_i) = Σ_{k=1}^{K} p_{k+} ϕ_n(⟨ p_{k1}^{(i)}/p_{k+}, …, p_{kK_i}^{(i)}/p_{k+} ⟩)
                  = [ U_ϕ(π, π_i) − ϕ(⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩) ] / | ϕ(⟨ p_{+1}^{(i)}, …, p_{+K_i}^{(i)} ⟩) |

  (normalized form, or utility gain ratio)
Inconsistent Samples

• Often the basic partitionings come from inconsistent sample sets
• Adjustments for KCC, with X^{(i)} the sample set behind π_i and n_k^{(i)} the number of points of C_k missing from it:

  m_{k,ij} = ( Σ_{x_l ∈ C_k ∩ X^{(i)}} x_{l,ij}^{(b)} ) / ( n_k − n_k^{(i)} )

  p_{k+}^{(i)} = ( n_k − n_k^{(i)} ) / ( n − n^{(i)} )

  U(π, π_i) = Σ_{k=1}^{K} p_{k+}^{(i)} ϕ(⟨ p_{k1}^{(i)}/p_{k+}^{(i)}, …, p_{kK_i}^{(i)}/p_{k+}^{(i)} ⟩)
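A sketch of the adjustment, assuming missing points are encoded with label 0 (the encoding is mine, not the talk's): the centroid averages only over the covered points C_k ∩ X^{(i)}, and the cluster weight counts covered points only.

```python
import numpy as np

pi   = np.array([1, 1, 1, 2, 2, 3, 3])       # candidate partition
pi_i = np.array([1, 1, 0, 2, 2, 0, 3])       # pi_i covers only 5 of the 7 points
K, K_i = pi.max(), pi_i.max()
n = len(pi)

covered = pi_i > 0
Xb = np.zeros((n, K_i))
Xb[covered, pi_i[covered] - 1] = 1           # one-hot rows for covered points only

# Adjusted centroid: divide by the number of covered points in C_k, not |C_k|.
m = np.array([Xb[(pi == k + 1) & covered].sum(axis=0) /
              max(((pi == k + 1) & covered).sum(), 1) for k in range(K)])

# Adjusted cluster weight: covered points in C_k over covered points overall.
p_kplus = np.array([((pi == k + 1) & covered).sum() for k in range(K)]) / covered.sum()
```
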
Inconsistent Samples, cont'd

• The proof of convergence

  Δ = Σ_{i=1}^{r} Σ_{k=1}^{K} Σ_{x_l ∈ C_k ∩ X^{(i)}} [ f(x_{l,i}^{(b)}, y_{k,i}) − f(x_{l,i}^{(b)}, m_{k,i}) ]
    = Σ_{i=1}^{r} Σ_{k=1}^{K} Σ_{x_l ∈ C_k ∩ X^{(i)}} [ φ(m_{k,i}) − φ(y_{k,i}) − (x_{l,i}^{(b)} − y_{k,i})^T ∇φ(y_{k,i}) ]
    = Σ_{i=1}^{r} Σ_{k=1}^{K} ( n_k − n_k^{(i)} ) f(m_{k,i}, y_{k,i}) ≥ 0      (a P2C distance)

  (The (x_{l,i}^{(b)} − m_{k,i})^T ∇φ(m_{k,i}) term sums to zero, since m_{k,i} is the mean of the covered points.)
Outline
• Motivations
• Point-to-Centroid Distance
• Utility Functions for KCC
• Experimental Results
• Concluding Remarks
Data
Strategies for Basic Clusterings
• For UCI data sets:
  – Random Parameter Selection (RPS) with K-means clustering
  – Random Feature Selection (RFS) with K-means clustering: two features per basic clustering
• For text data sets:
  – Multiple clustering algorithms with CLUTO (5 clustering methods × 5 objective functions)
• Validation measure: normalized Rand index (R_n)
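The RFS strategy above can be sketched as follows; this is a minimal illustration with a tiny inline K-means. The two-features-per-clustering choice follows the slide, everything else (initialization, iteration count) is assumed:

```python
import numpy as np

def lloyd(X, K, rng, n_iter=50):
    """A tiny K-means used only to generate basic partitionings."""
    m = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2), axis=1)
        for k in range(K):
            if np.any(labels == k):
                m[k] = X[labels == k].mean(axis=0)
    return labels + 1                        # labels in 1..K, as in the slides

def rfs_partitionings(X, r, K, seed=0):
    """RFS: each basic clustering runs K-means on two randomly chosen features."""
    rng = np.random.default_rng(seed)
    return [lloyd(X[:, rng.choice(X.shape[1], 2, replace=False)], K, rng)
            for _ in range(r)]

rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(20, 3))
Pi = rfs_partitionings(X_demo, r=4, K=2)
```
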
Experimental Results, #1
• Convergence property of KCC
Experimental Results, #2
• Clustering accuracy of KCC
[Figure: R_n per dataset for U_c, U_H, U_cos, U_{L5}, U_{L8} and their normalized variants]
Experimental Results, #3
• Effects of the RFS strategy
Outline
• Motivations
• Point-to-Centroid Distance
• Utility Functions for KCC
• Experimental Results
• Concluding Remarks
Conclusions
• Studied "K-means based consensus clustering"
  – Gave a sufficient condition
  – Handled inconsistent samples
  – Gave some empirical results for practical use
• Future work
  – Give the necessary condition (almost done!)
  – Applications
Thank You!
http://datamining.buaa.edu.cn