ICML 2004 Tutorial on Spectral Clustering, Part I

Transcript of the ICML 2004 tutorial on spectral clustering, Part I.

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 1

A Tutorial on Spectral Clustering

Chris Ding, Computational Research Division

Lawrence Berkeley National Laboratory, University of California

Supported by Office of Science, U.S. Dept. of Energy

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 2

Some historical notes

• Fiedler, 1973, 1975: graph Laplacian matrix
• Donath & Hoffman, 1973: bounds
• Pothen, Simon, Liou, 1990: spectral graph partitioning (many related papers thereafter)
• Hagen & Kahng, 1992: Ratio Cut
• Chan, Schlag & Zien, 1994: multi-way Ratio Cut
• Chung, 1997: Spectral Graph Theory book
• Shi & Malik, 2000: Normalized Cut

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 3

Spectral Gold-Rush of 2001
9 papers on spectral clustering

• Meila & Shi, AI-Stat 2001. Random walk interpretation of Normalized Cut
• Ding, He & Zha, KDD 2001. Perturbation analysis of the Laplacian matrix on sparsely connected graphs
• Ng, Jordan & Weiss, NIPS 2001. K-means algorithm on the embedded eigenspace
• Belkin & Niyogi, NIPS 2001. Spectral embedding
• Dhillon, KDD 2001. Bipartite graph clustering
• Zha et al, CIKM 2001. Bipartite graph clustering
• Zha et al, NIPS 2001. Spectral relaxation of K-means
• Ding et al, ICDM 2001. MinMaxCut, uniqueness of relaxation
• Gu et al. K-way relaxation of NormCut and MinMaxCut

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 4

Part I: Basic Theory, 1973 – 2001

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 5

Spectral Graph Partitioning

MinCut: min cutsize

cutsize = # of cut edges
Constraint on sizes: |A| = |B|

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 6

2-way Spectral Graph Partitioning

Partition membership indicator:

$$q_i = \begin{cases} +1 & \text{if } i \in A \\ -1 & \text{if } i \in B \end{cases}$$

$$J = \text{CutSize} = \frac{1}{4}\sum_{i,j} w_{ij}\,(q_i - q_j)^2$$

$$= \frac{1}{4}\sum_{i,j} w_{ij}\,[\,q_i^2 + q_j^2 - 2 q_i q_j\,] = \frac{1}{2}\sum_{i,j} [\,d_i\,\delta_{ij} - w_{ij}\,]\,q_i q_j = \frac{1}{2}\, q^T (D - W)\, q$$

Relaxing the indicators $q_i$ from discrete to continuous values, the solution of $\min J(q)$ is given by the eigenvectors of

$$(D - W)\, q = \lambda q$$

(Fiedler, 1973, 1975) (Pothen, Simon, Liou, 1990)
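To make the relaxation concrete, here is a minimal numpy sketch (mine, not from the tutorial) that builds the Laplacian D − W of a small, made-up weighted graph, computes its second eigenvector (the Fiedler vector), and splits the vertices by sign.

```python
import numpy as np

# Symmetric weighted adjacency matrix of a small illustrative graph
# (two triangles joined by one weak edge).
W = np.array([
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0,   1, 0, 1],
    [0, 0, 0,   1, 1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))      # degree matrix
L = D - W                       # graph Laplacian

# Eigenpairs of the symmetric Laplacian, eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(L)
q2 = evecs[:, 1]                # Fiedler vector (second eigenvector)

A = np.where(q2 < 0)[0]         # recover the partition from the sign of q2
B = np.where(q2 >= 0)[0]
print("lambda_2 =", evals[1])
print("A =", A, " B =", B)
```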

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 7

Properties of Graph Laplacian

Laplacian matrix of the graph: $L = D - W$

• L is positive semi-definite: $x^T L x \ge 0$ for any $x$.

• The first eigenvector is $q_1 = (1,\ldots,1)^T = e$, with $\lambda_1 = 0$.

• The second eigenvector $q_2$ is the desired solution.

• The smaller $\lambda_2$, the better the quality of the partitioning. Perturbation analysis gives

$$\lambda_2 = \frac{\text{cutsize}}{|A|} + \frac{\text{cutsize}}{|B|}$$

• Higher eigenvectors are also useful.

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 8

Recovering Partitions

From the definition of the cluster indicators, the partitions A and B are determined by:

$$A = \{\, i \mid q_2(i) < 0 \,\}, \qquad B = \{\, i \mid q_2(i) \ge 0 \,\}$$

However, the objective function $J(q)$ is insensitive to an additive constant $c$:

$$J = \text{CutSize} = \frac{1}{4}\sum_{i,j} w_{ij}\,[\,(q_i + c) - (q_j + c)\,]^2$$

Thus we sort $q_2$ in increasing order and cut at the middle point.

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 9

Multi-way Graph Partitioning

• Recursively applying the 2-way partitioning
  – Recursive 2-way partitioning
  – Using Kernighan-Lin to do local refinements

• Using higher eigenvectors
  – Using $q_3$ to further partition the clusters obtained via $q_2$

• Popular graph partitioning packages
  – Metis, Univ. of Minnesota
  – Chaco, Sandia Nat'l Lab

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 10

2-way Spectral Clustering

• Undirected graphs (pairwise similarities)
• Bipartite graphs (contingency tables)
• Directed graphs (web graphs)

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 11

Spectral Clustering

min cutsize, without explicit size constraints

Need to balance sizes

But where to cut ?

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 12

Clustering Objective Functions

• Ratio Cut

$$J_{\text{Rcut}}(A,B) = \frac{s(A,B)}{|A|} + \frac{s(A,B)}{|B|}$$

• Normalized Cut

$$J_{\text{Ncut}}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B} = \frac{s(A,B)}{s(A,A) + s(A,B)} + \frac{s(A,B)}{s(B,B) + s(A,B)}$$

• Min-Max-Cut

$$J_{\text{MMC}}(A,B) = \frac{s(A,B)}{s(A,A)} + \frac{s(A,B)}{s(B,B)}$$

where

$$s(A,B) = \sum_{i \in A}\sum_{j \in B} w_{ij}, \qquad d_A = \sum_{i \in A} d_i$$
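As a quick illustration (my own sketch, not part of the tutorial), these objectives can be evaluated for any candidate split directly from W; the helper `cut_objectives` below is a hypothetical name introduced here, and the toy graph is made up.

```python
import numpy as np

def cut_objectives(W, A, B):
    """Evaluate J_Rcut, J_Ncut, J_MMC for a 2-way split (A, B) of the vertices."""
    A, B = np.asarray(A), np.asarray(B)
    s_AB = W[np.ix_(A, B)].sum()        # between-cluster similarity s(A,B)
    s_AA = W[np.ix_(A, A)].sum()        # within-cluster similarity s(A,A)
    s_BB = W[np.ix_(B, B)].sum()
    d = W.sum(axis=1)                   # vertex degrees
    d_A, d_B = d[A].sum(), d[B].sum()   # cluster degrees d_A, d_B
    J_rcut = s_AB / len(A) + s_AB / len(B)
    J_ncut = s_AB / d_A + s_AB / d_B
    J_mmc = s_AB / s_AA + s_AB / s_BB
    return J_rcut, J_ncut, J_mmc

# Example: the two "natural" clusters of a small block-structured graph.
W = np.array([[0, 1, 1, 0.1, 0, 0],
              [1, 0, 1, 0,   0, 0],
              [1, 1, 0, 0,   0, 0],
              [0.1, 0, 0, 0, 1, 1],
              [0, 0, 0, 1,   0, 1],
              [0, 0, 0, 1,   1, 0]], dtype=float)
print(cut_objectives(W, A=[0, 1, 2], B=[3, 4, 5]))
```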

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 13

Ratio Cut (Hagen & Kahng, 1992)

Min similarity between A and B:  $s(A,B) = \sum_{i \in A}\sum_{j \in B} w_{ij}$

Objective (size balance):

$$J_{\text{Rcut}}(A,B) = \frac{s(A,B)}{|A|} + \frac{s(A,B)}{|B|} \qquad \text{(Wei \& Cheng, 1989)}$$

Cluster membership indicator:

$$q_i = \begin{cases} +\sqrt{n_2 / (n\, n_1)} & \text{if } i \in A \\ -\sqrt{n_1 / (n\, n_2)} & \text{if } i \in B \end{cases}$$

Normalization: $q^T q = 1, \; q^T e = 0$

Substituting $q$ leads to  $J_{\text{Rcut}}(q) = q^T (D - W)\, q$

Now relax $q$; the solution is the 2nd eigenvector of $L$.

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 14

Normalized Cut (Shi & Malik, 1997)

Min similarity between A and B:  $s(A,B) = \sum_{i \in A}\sum_{j \in B} w_{ij}$

Objective (balance weights):

$$J_{\text{Ncut}}(A,B) = \frac{s(A,B)}{d_A} + \frac{s(A,B)}{d_B}, \qquad d_A = \sum_{i \in A} d_i, \quad d = \sum_{i \in G} d_i$$

Cluster indicator:

$$q_i = \begin{cases} +\sqrt{d_B / (d\, d_A)} & \text{if } i \in A \\ -\sqrt{d_A / (d\, d_B)} & \text{if } i \in B \end{cases}$$

Normalization: $q^T D q = 1, \; q^T D e = 0$

Substituting $q$ leads to  $J_{\text{Ncut}}(q) = q^T (D - W)\, q$

$$\min_q \; q^T (D - W)\, q + \lambda\,(q^T D q - 1)$$

Solution is an eigenvector of  $(D - W)\, q = \lambda D q$
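A minimal sketch (mine, not the tutorial's) of the relaxed Normalized Cut: solve the generalized eigenproblem (D − W) q = λ D q with scipy and split by the sign of the second generalized eigenvector; the example matrix is illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_2way(W):
    """Relaxed 2-way Normalized Cut: solve (D - W) q = lambda D q."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # Generalized symmetric eigenproblem; eigenvalues return in ascending order.
    evals, evecs = eigh(D - W, D)
    q2 = evecs[:, 1]                 # second generalized eigenvector
    A = np.where(q2 < 0)[0]
    B = np.where(q2 >= 0)[0]
    return A, B, evals[1]

W = np.array([[0, 1, 1, 0.2, 0, 0],
              [1, 0, 1, 0,   0, 0],
              [1, 1, 0, 0,   0, 0],
              [0.2, 0, 0, 0, 1, 1],
              [0, 0, 0, 1,   0, 1],
              [0, 0, 0, 1,   1, 0]], dtype=float)
print(ncut_2way(W))
```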

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 15

MinMaxCut (Ding et al 2001)

Min similarity between A and B:  $s(A,B) = \sum_{i \in A}\sum_{j \in B} w_{ij}$

Max similarity within A and B:  $s(A,A) = \sum_{i \in A}\sum_{j \in A} w_{ij}$

$$J_{\text{MMC}}(A,B) = \frac{s(A,B)}{s(A,A)} + \frac{s(A,B)}{s(B,B)}$$

Cluster indicator:

$$q_i = \begin{cases} +\sqrt{d_B / (d\, d_A)} & \text{if } i \in A \\ -\sqrt{d_A / (d\, d_B)} & \text{if } i \in B \end{cases}$$

Substituting,

$$J_{\text{MMC}}(q) = \frac{1 + d_B/d_A}{J_m(q) + d_B/d_A} + \frac{1 + d_A/d_B}{J_m(q) + d_A/d_B} - 2, \qquad J_m(q) = \frac{q^T W q}{q^T D q}$$

Because $dJ_{\text{MMC}} / dJ_m < 0$,  min $J_{\text{MMC}}$ ⇒ max $J_m(q)$

$$W q = \xi D q \quad \Longleftrightarrow \quad (D - W)\, q = \lambda D q$$

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 16

A simple example: 2 dense clusters, with sparse connections between them.

[Figure: adjacency matrix and the eigenvector q2]
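To reproduce the flavor of this example, one can generate a block adjacency matrix with two dense diagonal blocks and sparse off-diagonal connections and check that the second eigenvector separates the blocks; the sketch below is my own, with made-up block sizes and probabilities.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n1, n2 = 20, 20                     # sizes of the two dense clusters (illustrative)
p_in, p_out = 0.6, 0.05             # dense within-cluster, sparse between-cluster

W = (rng.random((n1 + n2, n1 + n2)) < p_out).astype(float)
W[:n1, :n1] = (rng.random((n1, n1)) < p_in)
W[n1:, n1:] = (rng.random((n2, n2)) < p_in)
W = np.triu(W, 1); W = W + W.T      # symmetric, zero diagonal

D = np.diag(W.sum(axis=1))
evals, evecs = eigh(D - W, D)       # relaxed Normalized Cut
q2 = evecs[:, 1]

labels = (q2 >= 0).astype(int)      # split by sign of the second eigenvector
print("cluster sizes:", np.bincount(labels))
print("first block labels:", labels[:n1])
print("second block labels:", labels[n1:])
```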

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 17

Comparison of Clustering Objectives

• If clusters are well separated, all three objectives give very similar and accurate results.

• When clusters are marginally separated, NormCut and MinMaxCut give better results.

• When clusters overlap significantly, MinMaxCut tends to give more compact and balanced clusters.

$$J_{\text{Ncut}} = \frac{s(A,B)}{s(A,A) + s(A,B)} + \frac{s(A,B)}{s(B,B) + s(A,B)}$$

Cluster compactness ⇒ $\max\, s(A,A)$

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 18

2-way Clustering of Newsgroups

Newsgroups                        | RatioCut    | NormCut     | MinMaxCut
Atheism / Comp.graphics           | 63.2 ± 16.2 | 97.2 ± 0.8  | 97.2 ± 1.1
Baseball / Hockey                 | 54.9 ± 2.5  | 74.4 ± 20.4 | 79.5 ± 11.0
Politics.mideast / Politics.misc  | 53.6 ± 3.1  | 57.5 ± 0.9  | 83.6 ± 2.5

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 19

Cluster Balance Analysis I: Random Graph Model

• Random graph: edges are assigned at random with probability p, 0 ≤ p ≤ 1.

• RatioCut and NormCut show no size dependence:

$$J_{\text{Rcut}}(A,B) = \frac{p\,|A|\,|B|}{|A|} + \frac{p\,|A|\,|B|}{|B|} = p\,n = \text{constant}$$

$$J_{\text{Ncut}}(A,B) = \frac{p\,|A|\,|B|}{p\,|A|\,(n-1)} + \frac{p\,|A|\,|B|}{p\,|B|\,(n-1)} = \frac{n}{n-1} = \text{constant}$$

• MinMaxCut favors balanced clusters, |A| = |B|:

$$J_{\text{MMC}}(A,B) = \frac{p\,|A|\,|B|}{p\,|A|\,(|A|-1)} + \frac{p\,|A|\,|B|}{p\,|B|\,(|B|-1)} = \frac{|B|}{|A|-1} + \frac{|A|}{|B|-1}$$
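The size (in)dependence is easy to check numerically; the sketch below (mine, with arbitrary n and p) evaluates the closed-form expressions above for every split size |A| and confirms that only the MinMaxCut value depends on the balance.

```python
import numpy as np

n, p = 100, 0.3                          # illustrative graph size and edge probability
a = np.arange(2, n - 1)                  # candidate sizes |A| (|B| = n - |A|)
b = n - a

J_rcut = p * a * b / a + p * a * b / b                                  # = p * n
J_ncut = p * a * b / (p * a * (n - 1)) + p * a * b / (p * b * (n - 1))  # = n / (n - 1)
J_mmc  = b / (a - 1) + a / (b - 1)                                      # minimized at |A| = |B|

print("Rcut range:", J_rcut.min(), J_rcut.max())      # constant in |A|
print("Ncut range:", J_ncut.min(), J_ncut.max())      # constant in |A|
print("MMC argmin |A| =", a[np.argmin(J_mmc)])        # balanced split n/2
```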

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 20

2-way Clustering of Newsgroups

[Figure: eigenvector, JNcut(i), JMMC(i), cluster balance]

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 21

Cluster Balance Analysis II: Large Overlap Case

Large overlap between the two clusters:

$$f = \frac{s(A,B)}{\tfrac{1}{2}\,[\,s(A,A) + s(B,B)\,]} > 0.5$$

Conditions for skewed cuts:

For both objectives, the condition for a skewed cut bounds $s(A,A)$ by a multiple of $s(A,B)$ whose coefficient depends on $f$; the MinMaxCut condition is far harder to satisfy than the NormCut condition.

Thus MinMaxCut is much less prone to skewed cuts

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 22

Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns of a contingency table (adjacency matrix B)

Examples of bipartite graphs

• Information Retrieval: word-by-document matrix

• Market basket data: transaction-by-item matrix

• DNA Gene expression profiles

• Protein vs protein-complex

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 23

Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (adjacency matrix B)

min between-cluster (cut) edge weights: $s(R_1, C_2)$, $s(R_2, C_1)$

max within-cluster edge weights: $s(R_1, C_1)$, $s(R_2, C_2)$

$$J_{\text{MMC}}(C_1, C_2;\, R_1, R_2) = \frac{s(R_1,C_2) + s(R_2,C_1)}{2\, s(R_1,C_1)} + \frac{s(R_1,C_2) + s(R_2,C_1)}{2\, s(R_2,C_2)}$$

where

$$s_{R_1,C_2}(B) = \sum_{r_i \in R_1}\sum_{c_j \in C_2} b_{ij}$$

(Ding, AI-STAT 2003)

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 24

Bipartite Graph Clustering

Clustering indicators for rows and columns:

$$f_i = \begin{cases} +1 & \text{if } r_i \in R_1 \\ -1 & \text{if } r_i \in R_2 \end{cases} \qquad g_i = \begin{cases} +1 & \text{if } c_i \in C_1 \\ -1 & \text{if } c_i \in C_2 \end{cases}$$

$$B = \begin{pmatrix} B_{R_1,C_1} & B_{R_1,C_2} \\ B_{R_2,C_1} & B_{R_2,C_2} \end{pmatrix}, \qquad W = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}, \qquad q = \begin{pmatrix} f \\ g \end{pmatrix}$$

Substitute and obtain

$$J_{\text{MMC}}(C_1, C_2;\, R_1, R_2) = \frac{s(W_{12})}{s(W_{11})} + \frac{s(W_{12})}{s(W_{22})}$$

f, g are determined by

$$\begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix} = \lambda \begin{pmatrix} D_r & 0 \\ 0 & D_c \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix}$$

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 25

Clustering of Bipartite Graphs

Let

$$\widetilde{B} = D_r^{-1/2} B\, D_c^{-1/2}, \qquad u = D_r^{1/2} f, \qquad v = D_c^{1/2} g$$

We obtain

$$\begin{pmatrix} 0 & \widetilde{B} \\ \widetilde{B}^T & 0 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \lambda \begin{pmatrix} u \\ v \end{pmatrix}$$

Solution is the SVD:

$$\widetilde{B} = \sum_{k=1}^{m} \lambda_k\, u_k v_k^T$$

(Zha et al, 2001, Dhillon, 2001)
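A minimal numpy sketch (not from the slides) of this bipartite procedure: normalize B to B̃ = Dr^{-1/2} B Dc^{-1/2}, take its SVD, map the second singular vectors back through Dr^{-1/2} and Dc^{-1/2}, and split rows and columns at zero. The function name and the toy matrix are illustrative.

```python
import numpy as np

def bipartite_coclustering_2way(B):
    """Co-cluster rows and columns of a nonnegative matrix B via the SVD of
    B_tilde = Dr^{-1/2} B Dc^{-1/2}; split by the sign of the second singular
    vectors (threshold z_r = z_c = 0)."""
    dr = B.sum(axis=1)                   # row "degrees"
    dc = B.sum(axis=0)                   # column "degrees"
    B_tilde = B / np.sqrt(dr)[:, None] / np.sqrt(dc)[None, :]
    U, S, Vt = np.linalg.svd(B_tilde)
    f2 = U[:, 1] / np.sqrt(dr)           # row indicator  f = Dr^{-1/2} u_2
    g2 = Vt[1] / np.sqrt(dc)             # column indicator g = Dc^{-1/2} v_2
    return (f2 < 0).astype(int), (g2 < 0).astype(int)

# Toy word-by-document style matrix with two obvious blocks.
B = np.array([[3, 2, 0, 0],
              [2, 3, 1, 0],
              [0, 0, 2, 3],
              [0, 1, 3, 2]], dtype=float)
rows, cols = bipartite_coclustering_2way(B)
print("row clusters:", rows)
print("column clusters:", cols)
```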

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 26

Clustering of Bipartite Graphs

Recovering row clusters:

$$R_1 = \{\, r_i \mid f_2(i) < z_r \,\}, \qquad R_2 = \{\, r_i \mid f_2(i) \ge z_r \,\}$$

Recovering column clusters:

$$C_1 = \{\, c_i \mid g_2(i) < z_c \,\}, \qquad C_2 = \{\, c_i \mid g_2(i) \ge z_c \,\}$$

$z_r = z_c = 0$ are the natural dividing points, but the relaxation is invariant up to a constant shift.

Algorithm: search for optimal cut points $i_{cut}$, $j_{cut}$, setting $z_r = f_2(i_{cut})$, $z_c = g_2(j_{cut})$, such that $J_{\text{MMC}}(C_1, C_2;\, R_1, R_2)$ is minimized. (Zha et al, 2001)

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 27

Clustering of Directed Graphs

Min directed edge weights between A and B:

$$s(A,B) = \sum_{i \in A}\sum_{j \in B} (w_{ij} + w_{ji})$$

Max directed edge weights within A (and within B):

$$s(A,A) = \sum_{i \in A}\sum_{j \in A} (w_{ij} + w_{ji})$$

• Equivalent to dealing with $\widetilde{W} = W + W^T$
• All spectral methods apply to $\widetilde{W}$
• For example, web graphs are clustered in this way

(He, Ding, Zha, Simon, ICDM 2001)
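In code this amounts to a single symmetrization step before any of the methods above; a tiny sketch (with an illustrative asymmetric weight matrix of my own) is:

```python
import numpy as np
from scipy.linalg import eigh

# Directed (asymmetric) weight matrix, e.g. a tiny web-graph-like example.
W = np.array([[0, 2, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 3],
              [1, 0, 2, 0]], dtype=float)

W_sym = W + W.T                      # undirected weights: w_ij + w_ji
D = np.diag(W_sym.sum(axis=1))
evals, evecs = eigh(D - W_sym, D)    # any spectral method now applies
print("partition by sign of q2:", (evecs[:, 1] >= 0).astype(int))
```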

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 28

K-way Spectral Clustering (K ≥ 2)

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 29

K-way Clustering Objectives

• Ratio Cut

$$J_{\text{Rcut}}(C_1,\ldots,C_K) = \sum_{k<l} \left[ \frac{s(C_k,C_l)}{|C_k|} + \frac{s(C_k,C_l)}{|C_l|} \right] = \sum_{k} \frac{s(C_k,\, G - C_k)}{|C_k|}$$

• Normalized Cut

$$J_{\text{Ncut}}(C_1,\ldots,C_K) = \sum_{k<l} \left[ \frac{s(C_k,C_l)}{d_{C_k}} + \frac{s(C_k,C_l)}{d_{C_l}} \right] = \sum_{k} \frac{s(C_k,\, G - C_k)}{d_{C_k}}$$

• Min-Max-Cut

$$J_{\text{MMC}}(C_1,\ldots,C_K) = \sum_{k<l} \left[ \frac{s(C_k,C_l)}{s(C_k,C_k)} + \frac{s(C_k,C_l)}{s(C_l,C_l)} \right] = \sum_{k} \frac{s(C_k,\, G - C_k)}{s(C_k,C_k)}$$

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 30

K-way Spectral Relaxation

• Prove that the solution lies in the subspace spanned by the first k eigenvectors
• Ratio Cut
• Normalized Cut
• Min-Max-Cut

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 31

K-way Spectral Relaxation

Unsigned cluster indicators:

$$h_1 = (1 \cdots 1,\, 0 \cdots 0,\, \ldots,\, 0 \cdots 0)^T$$
$$h_2 = (0 \cdots 0,\, 1 \cdots 1,\, \ldots,\, 0 \cdots 0)^T$$
$$\cdots$$
$$h_k = (0 \cdots 0,\, 0 \cdots 0,\, \ldots,\, 1 \cdots 1)^T$$

Re-write:

$$J_{\text{Rcut}}(h_1,\ldots,h_k) = \frac{h_1^T (D-W) h_1}{h_1^T h_1} + \cdots + \frac{h_k^T (D-W) h_k}{h_k^T h_k}$$

$$J_{\text{Ncut}}(h_1,\ldots,h_k) = \frac{h_1^T (D-W) h_1}{h_1^T D h_1} + \cdots + \frac{h_k^T (D-W) h_k}{h_k^T D h_k}$$

$$J_{\text{MMC}}(h_1,\ldots,h_k) = \frac{h_1^T (D-W) h_1}{h_1^T W h_1} + \cdots + \frac{h_k^T (D-W) h_k}{h_k^T W h_k}$$

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 32

K-way Ratio Cut Spectral Relaxation

Unsigned cluster indicators:

$$x_k = (0 \cdots 0,\, 1 \cdots 1,\, 0 \cdots 0)^T / n_k^{1/2}$$

Re-write:

$$J_{\text{Rcut}}(x_1,\ldots,x_k) = x_1^T (D-W) x_1 + \cdots + x_k^T (D-W) x_k = \mathrm{Tr}\,\big(X^T (D-W) X\big), \qquad X = (x_1,\ldots,x_k)$$

Optimize:

$$\min_X \; \mathrm{Tr}\,\big(X^T (D-W) X\big) \quad \text{subject to} \quad X^T X = I$$

By K. Fan's theorem, the optimal solution is given by eigenvectors: $X = (v_1, v_2, \ldots, v_k)$, $(D-W)\, v_k = \lambda_k v_k$, with the lower bound

$$\lambda_1 + \cdots + \lambda_k \le \min J_{\text{Rcut}}(x_1,\ldots,x_k)$$

(Chan, Schlag, Zien, 1994)
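The relaxed trace minimization is simply the bottom-k eigenvectors of L = D − W; the sketch below (my own, on an illustrative three-block graph) computes them and the eigenvalue lower bound λ1 + ... + λk.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
# Illustrative graph: three blocks with a little noise between them.
labels = np.repeat(np.arange(k), n // k)
P = np.where(labels[:, None] == labels[None, :], 0.5, 0.05)
W = (rng.random((n, n)) < P).astype(float)
W = np.triu(W, 1); W = W + W.T

L = np.diag(W.sum(axis=1)) - W
evals, evecs = np.linalg.eigh(L)

X = evecs[:, :k]               # relaxed solution: bottom-k eigenvectors, X^T X = I
lower_bound = evals[:k].sum()  # K. Fan bound: lambda_1 + ... + lambda_k <= min J_Rcut
print("Tr(X^T L X) =", np.trace(X.T @ L @ X), " lower bound =", lower_bound)
```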

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 33

K-way Normalized Cut Spectral Relaxation

Unsigned cluster indicators:

$$y_k = D^{1/2} h_k \,/\, \|D^{1/2} h_k\|, \qquad h_k = (0 \cdots 0,\, 1 \cdots 1,\, 0 \cdots 0)^T$$

Re-write (with $\widetilde{W} = D^{-1/2} W D^{-1/2}$):

$$J_{\text{Ncut}}(y_1,\ldots,y_k) = y_1^T (I - \widetilde{W}) y_1 + \cdots + y_k^T (I - \widetilde{W}) y_k = \mathrm{Tr}\,\big(Y^T (I - \widetilde{W}) Y\big)$$

Optimize:

$$\min_Y \; \mathrm{Tr}\,\big(Y^T (I - \widetilde{W}) Y\big) \quad \text{subject to} \quad Y^T Y = I$$

By K. Fan's theorem, the optimal solution is given by eigenvectors: $Y = (v_1, v_2, \ldots, v_k)$, where

$$(I - \widetilde{W})\, v_k = \lambda_k v_k, \qquad (D - W)\, u_k = \lambda_k D u_k, \quad u_k = D^{-1/2} v_k$$

$$\lambda_1 + \cdots + \lambda_k \le \min J_{\text{Ncut}}(y_1,\ldots,y_k)$$

(Gu, et al, 2001)

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 34

K-way Min-Max Cut Spectral Relaxation

Unsigned cluster indicators:

$$y_k = D^{1/2} h_k \,/\, \|D^{1/2} h_k\|$$

Re-write (with $\widetilde{W} = D^{-1/2} W D^{-1/2}$):

$$J_{\text{MMC}}(y_1,\ldots,y_k) = \frac{1}{y_1^T \widetilde{W} y_1} + \cdots + \frac{1}{y_k^T \widetilde{W} y_k} - k$$

Optimize:

$$\min_Y \; J_{\text{MMC}}(Y) \quad \text{subject to} \quad Y^T Y = I, \;\; y_k^T \widetilde{W} y_k > 0$$

Theorem. The optimal solution is given by eigenvectors: $Y = (v_1, v_2, \ldots, v_k)$, $\widetilde{W} v_k = \lambda_k v_k$, with the lower bound

$$\frac{k^2}{\lambda_1 + \cdots + \lambda_k} - k \le \min J_{\text{MMC}}(y_1,\ldots,y_k)$$

(Gu, et al, 2001)
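For both relaxations the computation is the same: form W̃ = D^{-1/2} W D^{-1/2} and take its top-k eigenvectors (equivalently the bottom-k of I − W̃). A sketch under those assumptions, with an illustrative graph and a hypothetical helper name:

```python
import numpy as np

def normalized_topk_eigenvectors(W, k):
    """Top-k eigenvectors of W_tilde = D^{-1/2} W D^{-1/2}; these span the relaxed
    K-way NormCut / MinMaxCut solutions (bottom-k of I - W_tilde for NormCut)."""
    d = W.sum(axis=1)
    W_tilde = W / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    evals, evecs = np.linalg.eigh(W_tilde)        # ascending order
    top = np.argsort(evals)[::-1][:k]
    Y = evecs[:, top]                             # Y^T Y = I
    mmc_lower_bound = k**2 / evals[top].sum() - k
    ncut_lower_bound = k - evals[top].sum()       # = sum of smallest eigenvalues of I - W_tilde
    return Y, ncut_lower_bound, mmc_lower_bound

W = np.array([[0, 1, 1, 0.1, 0, 0],
              [1, 0, 1, 0,   0, 0],
              [1, 1, 0, 0,   0.1, 0],
              [0.1, 0, 0, 0, 1, 1],
              [0, 0, 0.1, 1, 0, 1],
              [0, 0, 0, 1,   1, 0]], dtype=float)
Y, lb_ncut, lb_mmc = normalized_topk_eigenvectors(W, k=2)
print(lb_ncut, lb_mmc)
```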

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 35

K-way Spectral Clustering

• Embedding (similar to the PCA subspace approach)
  – Embed the data points in the subspace of the K eigenvectors
  – Cluster the embedded points using another algorithm, such as K-means (Shi & Malik; Ng et al; Zha et al); see the sketch after this list

• Recursive 2-way clustering (standard graph partitioning)
  – If the desired K is not a power of 2, how to optimally choose the next sub-cluster to split? (Ding et al, 2002)

• Both approaches above do not use the K-way clustering objective functions.

• Refining the obtained clusters using the K-way clustering objective function typically improves the results (Ding et al, 2002).
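A compact sketch of the embedding route (my own, in the Ng/Jordan/Weiss style mentioned above): take the top K eigenvectors of D^{-1/2} W D^{-1/2}, row-normalize, and run K-means on the embedded points. The graph generator and function name are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_kway(W, K):
    """Embed in the top-K eigenvectors of D^{-1/2} W D^{-1/2}, then run K-means
    on the row-normalized embedded points."""
    d = W.sum(axis=1)
    W_tilde = W / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    evals, evecs = np.linalg.eigh(W_tilde)
    Y = evecs[:, np.argsort(evals)[::-1][:K]]           # top-K eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)    # row normalization
    _, labels = kmeans2(Y, K, minit='++')
    return labels

rng = np.random.default_rng(2)
n, K = 30, 3
truth = np.repeat(np.arange(K), n // K)
P = np.where(truth[:, None] == truth[None, :], 0.6, 0.05)
W = (rng.random((n, n)) < P).astype(float)
W = np.triu(W, 1); W = W + W.T

print(spectral_kway(W, K))
```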

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 36

DNA Gene expression

Effects of feature selection: select 900 genes out of 4025 genes

[Figure: genes by tissue-sample expression matrices]

Lymphoma cancer (Alizadeh et al, 2000)

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 37

Lymphoma Cancer Tissue Samples

B-cell lymphoma goes through different stages:

– 3 cancer stages
– 3 normal stages

Key question: can we detect them automatically?

[Figure: PCA 2D display]

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 38

Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 39

Brief summary of Part I

• Spectral graph partitioning as the origin
• Clustering objective functions and their solutions
• Extensions to bipartite and directed graphs
• Characteristics
  – Principled approach
  – Well-motivated objective functions
  – Clear and unambiguous
  – A framework with rich structure and content
  – Everything is proved rigorously (within the relaxation framework, i.e., using a continuous approximation of the discrete variables)

• The above results were mostly done by 2001.
• More to come in Part II.
