K means BdBased Consensus Cl iCl ustering

38
K B dC Cl i Kmeans BasedConsensus Clustering Dr. Junjie Wu Dr. Junjie Wu Beihang University

Transcript of K means BdBased Consensus Cl iCl ustering

Page 1: K means BdBased Consensus Cl iCl ustering

K B d C Cl iK‐means Based Consensus Clustering

Dr. Junjie WuDr. Junjie WuBeihang University

Page 2: K means BdBased Consensus Cl iCl ustering

OutlineOutline

• Motivations

• Point‐to‐Centroid DistancePoint to Centroid Distance

• Utility Functions for KCC

• Experimental Results

• Concluding remarksConcluding remarks

Page 3: K means BdBased Consensus Cl iCl ustering

Cluster AnalysisCluster Analysis

Page 4: K means BdBased Consensus Cl iCl ustering

Clustering AlgorithmsClustering Algorithms

• Prototype‐based: K‐means, FCM

• Density‐based: DBSCAN, CLIQUEDensity based: DBSCAN, CLIQUE

• Graph‐based: AHC, MinMaxCut

• Hybrid: K‐means + AHC

Page 5: K means BdBased Consensus Cl iCl ustering

Problems with Single ClustererProblems with Single Clusterer

• No perfect one!

• Sensitive to data factorsSensitive to data factors

• Hard to set proper parameters

• May converge to bad saddle points

• Or just too heuristicOr just too heuristic …

Can we find a new way? 

Page 6: K means BdBased Consensus Cl iCl ustering

Consensus ClusteringConsensus Clustering

• To find a best partitioning from multiple basic partitionings (an ensemble‐classifier thinking)p g ( g)

Page 7: K means BdBased Consensus Cl iCl ustering

Consensus Clustering cont’dConsensus Clustering, cont dr

1

( , ) ( , )r

i ii

wUπ π π=

Γ Π =∑1i

• U: utility function• wi: weight of πi

T( )

i g i• Г: consensus function 

1

2

1,1,1,2,2,3,3

2,2,2,3,3,1,1

T

T

π

π

=

=

( )

( )

1 2 3 4 5 6 7{ , , , , , , }X x x x x x x x=3 1,1,2,2,2,3,3

1111 2 2 2

T

T

π

π

=

=

( )

( )4

1 2 3 4

1,1,1,1,2,2,2{ , , , }

ππ π π π

=Π=

( )

Page 8: K means BdBased Consensus Cl iCl ustering

Consensus Clustering cont’dConsensus Clustering, cont d

• Advantages– Robust: lower the risks from weird data, improper , p palgorithms and parameters, etc.

– Novelty: may help find a better structureNovelty:  may help find a better structure

– Concurrency: run in parallell h i i i i d– A Must: only have partitioning episodes 

• Challengeg– NP‐complete problem

Page 9: K means BdBased Consensus Cl iCl ustering

Related WorkRelated Work

• Graph‐based algorithms (Strehl et al., JMLR, 2002))

• Co‐association matrix method (Fred and Jain, PAMI 2005)PAMI, 2005)

• Binary matrix method (Topchy et al., ICDM, 2003)

• Other heuristics (Lu et al AAAI 2008; )• Other heuristics (Lu et al., AAAI, 2008; …)

Page 10: K means BdBased Consensus Cl iCl ustering

Why K means Consensus ClusteringWhy K‐means Consensus Clustering

• Simple

• RobustRobust

• Efficient

NP-complete Roughly linear U?

(K-means)

Page 11: K means BdBased Consensus Cl iCl ustering

Main ContributionsMain Contributions

• Theory for KCC utility functions

• KCC algorithm for inconsistent samplesKCC algorithm for inconsistent samples

• Some empirical findings– UH is a good KCC utility function

– RFS strategy is useful in some circumstancesgy

– Some mutual funds have unethical behaviors

Page 12: K means BdBased Consensus Cl iCl ustering

OutlineOutline

• Motivations

• Point‐to‐Centroid DistancePoint to Centroid Distance

• Utility Functions for KCC

• Experimental Results

• Concluding remarksConcluding remarks

Page 13: K means BdBased Consensus Cl iCl ustering

K means ClusteringK‐means Clustering

• Objective function

i ( )K

f∑∑1

min ( , )k

x kk x C

w f x m= ∈∑∑

• Two phase iterations– Assign x to the nearest centroidAssign x to the nearest centroid

– Update centroidsTh i h i id i f d• The arithmetic mean centroid is preferred

Page 14: K means BdBased Consensus Cl iCl ustering

Point to Centroid DistancePoint‐to‐Centroid Distance

• What if fix centroid type to arithmetic mean?

• A definitionA definition

( , ) ( ) ( ) ( ) ( ).Tf x y x y x y yφ φ φ= − − − ∇

• A theorem

( , ) ( ) ( ) ( ) ( )f y y y yφ φ φ

K‐means

Page 15: K means BdBased Consensus Cl iCl ustering

ExamplesExamples

T• ( , ) ( ) ( ) ( ) ( )Tf x y x y x y yφ φ φ= − − − ∇

φ(x) f(x,y) Name

‖x‖2 ‖x-y‖2 Sqrt Euc. Dist.

H( ) ( ‖ ) KL di-H(x) D(x‖y) KL‐divergence

‖ ‖ ‖ ‖ Cosine Dist

• Interestingly f is not a metric!

‖x‖ ‖x‖-xty/‖y‖

Cosine Dist.

• Interestingly,  f is not a metric!

Page 16: K means BdBased Consensus Cl iCl ustering

OutlineOutline

• Motivations

• Point‐to‐Centroid DistancePoint to Centroid Distance

• Utility Functions for KCC

• Experimental Results

• Concluding remarksConcluding remarks

Page 17: K means BdBased Consensus Cl iCl ustering

Definition and SimplificationDefinition and Simplification

( ) i ( )r K

bf∑ ∑ ∑1 1

max ( , ) min ( , )bl k

bi i l k

i k x C

wU f x mπ π= = ∈

⇔∑ ∑ ∑

Page 18: K means BdBased Consensus Cl iCl ustering

A Critical FactA Critical Fact

• The contingency table for π and πi

n p

Th t id i i ht th li d !• The centroid is right the normalized row! 

Page 19: K means BdBased Consensus Cl iCl ustering

A Sufficient ConditionA Sufficient Condition

r K

1 1( , ) ( )

r K

i i k ki k

wU p mπ π φ+= =

=∑ ∑

2

1 1 1

min ( , ) min ( ) ( ) max ( )bl k

K K KP C Db bl k l k k k kl

k k kx C

f x m x p m p mφ φ φ−

+ += = =∈

⇔ − ⇔∑ ∑ ∑ ∑ ∑

Page 20: K means BdBased Consensus Cl iCl ustering

A Sufficient Condition cont’dA Sufficient Condition, cont dTheorem 2Theorem 2

,1

( ) ( ),1r

k i k ii

m w m k Kφ ϕ=

= ≤ ≤∑

Theorem 2

K( ) ( ) ( )1

1( , ) ( / , , / , , / )

i

Ki i i

i k k k kj k kK kk

U p p p p p p pπ π ϕ+ + + +=

= < >∑

Page 21: K means BdBased Consensus Cl iCl ustering

ExamplesExamples( , )iU π π( )k imϕ ( , )b

l kf x m

CU

( , )i,( )k iϕ

2 ( ) 2,

1 1( )

i iK Ki

k ij jj j

m p+= =

−∑ ∑ ( ) 2 ( ) 2

1 1 1

( / ) ( )i iK KK

i ik kj k j

k j j

p p p p+ + += = =

−∑ ∑ ∑ ( ) 2, ,

1

|| ||r

bi l i k i

i

w x m−∑j j 1 1 1k j j 1i=

HU ( )( , )iMI C C( ) ( )log logi iK K

i ik ij k ij j jm m p p+ +−∑ ∑ ( )

, ,( || )r

bi l i k iw D x m∑( , ), ,

1 1g gk ij k ij j j

j jp p+ +

= =∑ ∑ , ,

1i=∑

( ) ( )|| || || ||i i ( ) 2 ( ) 2i iK KK

i i∑ ∑ ∑r

∑( ) ( )

, 1|| || || , , ||i

i ik i Km p p+ +− < >

cosU ( ) 2 ( ) 2

1 1 1( / ) ( )i i

k kj k jk j j

p p p p+ + += = =

−∑ ∑ ∑ ( ), ,

1

(1 cos( , ))bi l i k i

i

w x m=

−∑

K

( ) ( )

1 1 1( / ) ( )

i iK KKi p i pp p

k kj k jk j j

p p p p+ + += = =

−∑ ∑ ∑LpU ( ) ( ), 1|| || || , , ||

i

i ik i p K pm p p+ +− < >

( ) 1, ,1

11 ,

( )(1 )

(|| || )

iK b pr l ij k ijji p

i k i p

x mw

m

−=

−=

−∑∑

CU HU cosU5LU

8LU

Page 22: K means BdBased Consensus Cl iCl ustering

PropertiesProperties

• Non‐uniqueness of Uϕ

( ) ( ), , 1( ) ( ) ( , , )

i

i is k i k i Km m p p

α

ϕ ϕ ϕ + += − < > Utility function for KCC

( ) ( )1

1( , ) ( ( / ), , ( / ) )

s i

Ki i

i k s k k kK kk

U p p p p pϕ π π ϕ+ + +=

= < >∑ Many to one

( ) ( )1( , ) ( , , )

i

i ii KU p pϕ π π ϕ + += − < >

P2C distance

standard form, or utility gain

Page 23: K means BdBased Consensus Cl iCl ustering

Properties cont’dProperties, cont d

• Non‐negativity of Utility Gain

, ,1 1

( , ) ( ) ( )K K

i k k i k k ik k

U p m p mϕ π π ϕ ϕ+ += =

= ≥∑ ∑∵

( ) ( ) ( ) ( )1 1

1

( , , ) ( , , )i i

Ki i i i

k kK Kk

p p p pϕ ϕ + +=

= < > = < >∑

( ) ( )1( , ) ( , ) ( , , ) 0

s i

i ii i KU U p pϕ ϕπ π π π ϕ + +∴ = − < > ≥

Page 24: K means BdBased Consensus Cl iCl ustering

Properties cont’dProperties , cont d

• Utility Gain Ratio( ) ( )( ) ( )/ | ( ) |i im m p pϕ ϕ ϕ= < >, , 1( ) ( )/ | ( , , ) |

in k i s k i Km m p pϕ ϕ ϕ + += < >

K( ) ( )1

1( ) ( )

( , ) ( ( / ), , ( / ) )n i

Ki i

i k n k k kK kk

i i

U p p p p pϕ π π ϕ+ + +=

= < >∑( ) ( )1

( ) ( )1

( , ) ( , , )| ( , , ) |

i

i

i ii K

i iK

U p pp p

ϕ π π ϕϕ

+ +

+ +

− < >=

< >

normalized form, or utility gain ratio

Page 25: K means BdBased Consensus Cl iCl ustering

Inconsistent SamplesInconsistent Samples

• Often we have basic partitionings from inconsistent sample setsp

• Adjustments for KCCb∑ ( ) ,

, ( )

ibl k k

bl ijx C C

k ij i

xm

n n∈=−

∑ ∩

k kn n( )

( )( )

ii k k

k i

n npn n+

−=

n n+−

( ) ( ) ( ) ( ) ( )1( , ) ( / , , / )

i

Ki i i i i

i k k k kK kU p p p p pπ π ϕ+ + += < >∑1

ik=

Page 26: K means BdBased Consensus Cl iCl ustering

Inconsistent Samples cont’dInconsistent Samples, cont d

• The proof for convergence

( ) ( )r K r K

b bf fΔ ∑∑∑ ∑∑∑( ) ( )

( ) ( )

, , ,1 1 1 1

( , ) ( , )

( , ) ( , )

i ib bl k l kk k

i ib b

b bl i l i k ix C C x C C

i k i k

r K Kb bl i l i k i

f x y f x m

f x y f x m

∈ ∈= = = =

Δ = −

⎡ ⎤= −⎢ ⎥

∑∑∑ ∑∑∑

∑ ∑∑ ∑∑

∩ ∩

∩ ∩( ) ( )

( )

, , ,1 1 1

, ,

( , ) ( , )

( ) ( ) ( ) ( )

i ib bl k l kk k

ibl k k

l i l i k ix C C x C Ci k k

r Kb

k i l ix C C

f x y f x m

m y x y yφ φ φ

∈ ∈= = =

⎢ ⎥⎣ ⎦⎡ ⎤

= − − − ∇⎢ ⎥⎣ ⎦

∑ ∑∑ ∑∑

∑∑

∩ ∩

∩∑1 1

l k ki k= =⎣ ⎦∩

( ),

1 1

( ) ( , )r K

ik k k i

i k

n n f m y= =

= −∑∑ 0≥ P2C distance

Page 27: K means BdBased Consensus Cl iCl ustering

OutlineOutline

• Motivations

• Point‐to‐Centroid DistancePoint to Centroid Distance

• Utility Functions for KCC

• Experimental Results

• Concluding remarksConcluding remarks

Page 28: K means BdBased Consensus Cl iCl ustering

DataData

Page 29: K means BdBased Consensus Cl iCl ustering

Strategies for Basic ClusteringsStrategies for Basic Clusterings

• For UCI data sets:– Random Parameter Selection (RPS) with K‐means ( )clustering:

– Random Feature Selection (RFS) with K‐meansRandom Feature Selection (RFS) with K means clustering: two features for a basic clustering

F t t d t t• For text data sets:–Multiple Clustering Algorithms with CLUTO (5 clustering methods × 5 objective functions)

• Validation measure: normalized Rand index RnValidation measure: normalized Rand index Rn

Page 30: K means BdBased Consensus Cl iCl ustering

Experimental Results #1Experimental Results, #1

• Convergence property of KCC

Page 31: K means BdBased Consensus Cl iCl ustering

Experimental Results #1Experimental Results, #1

• Convergence property of KCC

Page 32: K means BdBased Consensus Cl iCl ustering

Experimental Results #2Experimental Results, #2

• Clustering accuracy of KCC

datasetdataset

cos 5 8 cos 5 8H H L L L L C CU NU U U U NU NU NU U NU

Page 33: K means BdBased Consensus Cl iCl ustering

Experimental Results #2Experimental Results, #2

• Clustering accuracy of KCC

Page 34: K means BdBased Consensus Cl iCl ustering

Experimental Results #3Experimental Results, #3

• Effects of RFS strategy

Page 35: K means BdBased Consensus Cl iCl ustering

Experimental Results #3Experimental Results, #3

• Effects of RFS strategy

Page 36: K means BdBased Consensus Cl iCl ustering

OutlineOutline

• Motivations

• Point‐to‐Centroid DistancePoint to Centroid Distance

• Utility Functions for KCC

• Experimental Results

• Concluding remarksConcluding remarks

Page 37: K means BdBased Consensus Cl iCl ustering

ConclusionsConclusions

• Study “K‐means based consensus clustering”– Give a sufficient condition– Handle the inconsistent samples

Give some empirical results for practical use– Give some empirical results for practical use

• Future work– Give the necessary condition (almost done!)

– ApplicationsApplications

Page 38: K means BdBased Consensus Cl iCl ustering

Thank You!Thank You!

http://datamining buaa edu cnhttp://datamining.buaa.edu.cn