Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering

Tie-Yan Liu, WSM Group, Microsoft Research Asia
2005.11.11
Joint work with Bin Gao, Peking University

2005.11.11 Talk at NTU, Tie-Yan Liu

Outline

• Motivation
  - What is high-order heterogeneous co-clustering?
  - Why previous methods cannot work well on this problem

• Consistent Bipartite Graph Co-Partitioning (CGBC)

• Experimental Evaluation

• Conclusions and Future Work

Clustering

• Clustering groups data objects into clusters so that objects in the same cluster are similar to each other.

• Spectral clustering
  - Models the similarity of data objects with an affinity graph and assumes that the best clustering corresponds to the minimal (ratio, normalized, or min-max) graph cut.
  - It can be shown that, once the discrete cluster indicators are relaxed to real values, minimizing the normalized cut amounts to minimizing the objective function

$$
\min_{q \neq 0} \frac{q^T L q}{q^T D q} \quad \text{subject to} \quad q^T D e = 0, \quad e = (1, 1, \ldots, 1)^T,
$$

    where D is the diagonal degree matrix and L the Laplacian of the affinity graph. The solution q is the eigenvector associated with the second smallest eigenvalue of the generalized eigenvalue problem L q = λ D q.
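The relaxed normalized-cut computation can be sketched in Python with NumPy and SciPy. This is a minimal illustration rather than code from the talk: `spectral_bipartition` is a hypothetical helper, and splitting by the sign of q is one common way to turn the eigenvector into two clusters (running k-means on q would be another).

```python
import numpy as np
from scipy.linalg import eigh


def spectral_bipartition(W):
    """Relaxed normalized-cut bipartition of a graph with symmetric affinity matrix W.

    Solves the generalized eigenproblem L q = lambda D q and returns the
    eigenvector of the second smallest eigenvalue plus a sign-based 2-way split.
    """
    d = W.sum(axis=1)
    D = np.diag(d)            # diagonal degree matrix
    L = D - W                 # graph Laplacian
    _, vecs = eigh(L, D)      # eigenvalues ascending; eigenvectors are D-orthonormal
    q = vecs[:, 1]            # eigenvector of the second smallest eigenvalue
    # The first eigenvector is constant, so D-orthogonality gives q^T D e = 0.
    return q, (q > 0).astype(int)


# Two triangles joined by one weak edge.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.01
q, labels = spectral_bipartition(W)
print(labels)  # one triangle gets label 0, the other label 1
```

Because `eigh` returns D-orthonormal eigenvectors, the constraint q^T D e = 0 holds automatically for the second eigenvector of a connected graph.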

Co-Clustering

• Co-clustering groups two types of objects into their own clusters simultaneously.

• Bipartite graph partitioning (Dhillon and Zha)
  - Uses a bipartite graph to model the inter-relationship between the two types of objects: all edges in the bipartite graph are of the same type, so the graph cut is still easy to define.
  - It can be shown that the solutions are the left and right singular vectors associated with the second largest singular value of the normalized inter-relationship matrix

$$
\hat{A} = D_1^{-1/2} A D_2^{-1/2},
$$

    where D_1 and D_2 are the diagonal matrices of the row sums and column sums of A.
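A minimal sketch of this bipartite co-clustering step, assuming a nonnegative inter-relationship matrix A with no all-zero rows or columns. The helper `bipartite_coclustering` is hypothetical; mapping the singular vectors back through D1^{-1/2} and D2^{-1/2} and splitting by sign follows the usual spectral co-clustering recipe and is not stated on the slide.

```python
import numpy as np


def bipartite_coclustering(A):
    """Two-way co-clustering of the rows and columns of a nonnegative matrix A
    via the SVD of A_hat = D1^{-1/2} A D2^{-1/2}."""
    d1 = A.sum(axis=1)                       # row sums: diagonal of D1
    d2 = A.sum(axis=0)                       # column sums: diagonal of D2
    A_hat = A / np.sqrt(np.outer(d1, d2))    # D1^{-1/2} A D2^{-1/2}
    U, s, Vt = np.linalg.svd(A_hat)          # singular values in descending order
    # s[0] == 1 with a trivial singular pair; the second pair carries the partition.
    x = U[:, 1] / np.sqrt(d1)                # row embedding D1^{-1/2} u2
    y = Vt[1, :] / np.sqrt(d2)               # column embedding D2^{-1/2} v2
    return s, (x > 0).astype(int), (y > 0).astype(int)


A = np.array([[5.0, 4.0, 0.1, 0.2],
              [4.0, 5.0, 0.2, 0.1],
              [0.1, 0.2, 5.0, 4.0],
              [0.2, 0.1, 4.0, 5.0]])
s, row_labels, col_labels = bipartite_coclustering(A)
print(row_labels, col_labels)  # rows 0-1 with columns 0-1, rows 2-3 with columns 2-3
```

Since u2 and v2 come from the same singular pair, rows and columns that belong together receive the same sign, so one threshold co-clusters both object types.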

High-order Heterogeneous Co-Clustering (HHCC)

• HHCC groups multiple (≥2) types of objects into clusters simultaneously.
  - “Order” is defined as the number of types of objects.

• If we represent the inter-relationships between data objects as a graph, the edges within each bipartite graph are of the same type, but edges in different bipartite graphs are of different types. This is what “heterogeneous” refers to, in contrast to spectral clustering and bipartite graph co-clustering, where all edges are of a single type.

HHCC Is Not a Rare Problem

• Typical examples
  - Surrounding Text – Web Image – Visual Features
  - User – Query – Click-through

• Many other examples
  - Category – Document – Term; Reader – Newspaper – Article; Passenger – Airplane – Airways; Webpage – Website – Site-group; Article – Magazine – Category; Hardware – Computer – Usage; Software – People – Community

Why Is HHCC a New Problem?

• Although bipartite graph partitioning is a straightforward extension of spectral clustering, extending it to HHCC is non-trivial:
  - Since the HHCC problem contains different types of edges, the cut of high-order data is difficult to define. It is hard to justify assigning weights to heterogeneous edges so as to make their contributions to the graph cut comparable.
  - Simply applying spectral clustering may degrade the high-order problem into a 2-order problem.

An Example of Weighting Heterogeneous Edges

[Figure: embeddings produced by spectral clustering for α = 0.01, α = 1 and α = 100]

No matter how we adjust the weights to balance the different types of edges, we can never cluster X into two groups successfully.

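An Example of Weighting Heterogeneous Edges (Cont.)

• Mathematical proof. Weight the X-Y edges by α and the Y-Z edges by 1 − α, and let P, R and S be the diagonal degree matrices of X, Y and Z. Spectral clustering on the tripartite graph then solves

$$
\begin{pmatrix} 0 & \alpha A & 0 \\ \alpha A^T & 0 & (1-\alpha) B \\ 0 & (1-\alpha) B^T & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= (1-\lambda)
\begin{pmatrix} P & 0 & 0 \\ 0 & R & 0 \\ 0 & 0 & S \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix},
$$

that is,

$$
\begin{aligned}
\alpha P^{-1/2} A y &= (1-\lambda) P^{1/2} x, \\
\alpha R^{-1/2} A^T x + (1-\alpha) R^{-1/2} B z &= (1-\lambda) R^{1/2} y, \\
(1-\alpha) S^{-1/2} B^T y &= (1-\lambda) S^{1/2} z.
\end{aligned}
$$

With u = P^{1/2} x, v = R^{1/2} y and s = S^{1/2} z:

$$
\begin{aligned}
\alpha P^{-1/2} A R^{-1/2} v &= (1-\lambda) u, \\
\alpha R^{-1/2} A^T P^{-1/2} u + (1-\alpha) R^{-1/2} B S^{-1/2} s &= (1-\lambda) v, \\
(1-\alpha) S^{-1/2} B^T R^{-1/2} v &= (1-\lambda) s.
\end{aligned}
$$

Stacking X and Z together:

$$
F = \begin{pmatrix} \alpha P^{-1/2} A R^{-1/2} \\ (1-\alpha) S^{-1/2} B^T R^{-1/2} \end{pmatrix},
\qquad
F v = (1-\lambda) \begin{pmatrix} u \\ s \end{pmatrix},
\qquad
F^T \begin{pmatrix} u \\ s \end{pmatrix} = (1-\lambda) v.
$$

So v and (u; s) are a pair of singular vectors of the single matrix F, whose rows include both X and Z. Whatever the value of α, X and Z are treated as one object type, and the problem reduces to a bipartite (2-order) problem between Y and X ∪ Z.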
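The degradation argument can be checked numerically. This sketch assumes that X-Y edges carry weight α and Y-Z edges carry weight 1 − α, with P, R and S the degree matrices of X, Y and Z, and F formed by stacking α P^{-1/2} A R^{-1/2} on top of (1 − α) S^{-1/2} B^T R^{-1/2}. `tripartite_spectrum` is a hypothetical helper and the random matrices are purely illustrative.

```python
import numpy as np
from scipy.linalg import eigh


def tripartite_spectrum(A, B, alpha):
    """Compare spectral clustering on the weighted X-Y-Z graph with the SVD of F.

    A (m x n) relates X to Y, B (n x t) relates Y to Z; X-Y edges are weighted by
    alpha and Y-Z edges by 1 - alpha (an assumption of this sketch).
    Returns (second smallest generalized eigenvalue, second largest singular value of F).
    """
    m, n = A.shape
    t = B.shape[1]
    N = m + n + t
    M = np.zeros((N, N))
    M[:m, m:m + n] = alpha * A
    M[m:m + n, m + n:] = (1 - alpha) * B
    M = M + M.T                               # symmetric tripartite affinity matrix
    d = M.sum(axis=1)
    lam = eigh(np.diag(d) - M, np.diag(d), eigvals_only=True)   # ascending
    P, R, S = d[:m], d[m:m + n], d[m + n:]    # degrees of X, Y and Z
    F = np.vstack([
        alpha * A / np.sqrt(np.outer(P, R)),          # alpha P^{-1/2} A R^{-1/2}
        (1 - alpha) * B.T / np.sqrt(np.outer(S, R)),  # (1 - alpha) S^{-1/2} B^T R^{-1/2}
    ])
    sigma = np.linalg.svd(F, compute_uv=False)        # descending
    return lam[1], sigma[1]


rng = np.random.default_rng(0)
A = rng.random((4, 5))
B = rng.random((5, 3))
for alpha in (0.1, 0.5, 0.9):
    lam2, sigma2 = tripartite_spectrum(A, B, alpha)
    print(alpha, lam2, 1 - sigma2)   # the last two numbers agree for every alpha
```

For every α the whole spectrum is that of the bipartite matrix F, so changing α only rescales the blocks of one Y versus X ∪ Z problem; it never restores three separate object types.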

Order Degradation

[Figure: a 3-order heterogeneous graph degrading into a 2-order heterogeneous graph]

Our Solution

• We tackle the problems above by proposing a new solution to HHCC: Consistent Bipartite Graph Co-Partitioning (CGBC).

• Where should we get started?
  - Star-structured HHCC
  - The concept of consistency
  - An SDP-based solution

Why “Star-Structured”?

• “Star-structured” means that the heterogeneous graph has a central type of objects that connects to all the other types of objects, and there are no direct connections among the other object types.

• “Star-structured” is the simplest, yet a very common, case of HHCC.
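Whether a set of object types is star-structured can be checked directly from which pairs of types have relations. A small sketch under that reading of the definition; `central_type` is a hypothetical helper, and the example encodes the Web image scenario with Web images as the central type.

```python
from typing import Iterable, Optional, Set, Tuple


def central_type(types: Set[str], relations: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the central type if the relation graph over object types is star-structured.

    Star-structured here means: one type is related to every other type, and
    every relation involves that type (no direct links among the other types).
    """
    edges = {frozenset(r) for r in relations}
    for c in sorted(types):                      # sorted for a deterministic answer
        others = types - {c}
        links_all = all(frozenset((c, o)) in edges for o in others)
        only_star = all(c in e for e in edges)
        if links_all and only_star:
            return c
    return None


types = {"Surrounding text", "Web image", "Visual features"}
print(central_type(types, [("Web image", "Surrounding text"),
                           ("Web image", "Visual features")]))  # Web image
```

Adding a direct Surrounding text to Visual features relation would make the function return `None`, since that edge bypasses the central type.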

Why “Star-Structured”? (Cont.)

[Figure: examples of star-structured data]
  - Surrounding text, Web images, Visual features
  - Author, Conference, Paper, Key word
  - Customer, Shareholder, Shop, Supplier, Advertisement media

The Concept of Consistency

• Divide the star-structured HHCC problem into a set of bipartite sub-problems, each of which has only homogeneous edges.

• Solve each sub-problem separately, to avoid order degradation.

• Add a global consistency constraint on the central type of objects, so as to obtain a feasible cut of the original problem.

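The Concept of Consistency (Cont.)

First bipartite graph (relation matrix A):

$$
M^{(1)} = \begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix}, \quad
D^{(1)} = \operatorname{diag}(M^{(1)} e), \quad
L^{(1)} = D^{(1)} - M^{(1)}, \quad
q = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^{(m+n) \times 1}
$$

Second bipartite graph (relation matrix B):

$$
M^{(2)} = \begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix}, \quad
D^{(2)} = \operatorname{diag}(M^{(2)} e), \quad
L^{(2)} = D^{(2)} - M^{(2)}, \quad
p = \begin{pmatrix} y \\ z \end{pmatrix} \in \mathbb{R}^{(n+t) \times 1}
$$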

Divide the tripartite graph into two bipartite graphs, then partition the two graphs simultaneously and consistently.
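Splitting the tripartite graph into its two bipartite graphs amounts to building M, D and L separately for A and B. A minimal NumPy sketch, assuming A is m × n (X versus Y) and B is n × t (Y versus Z), which matches q ∈ R^{m+n} and p ∈ R^{n+t}; the helper names are hypothetical.

```python
import numpy as np


def bipartite_graph(R):
    """M = [[0, R], [R^T, 0]], D = diag(M e) and L = D - M for a relation matrix R."""
    a, b = R.shape
    M = np.block([[np.zeros((a, a)), R],
                  [R.T, np.zeros((b, b))]])
    D = np.diag(M.sum(axis=1))
    return M, D, D - M


def split_star(A, B):
    """Split the X-Y-Z tripartite graph into the X-Y graph (A, m x n) and the Y-Z graph (B, n x t)."""
    if A.shape[1] != B.shape[0]:
        raise ValueError("A and B must share the n central objects Y")
    return bipartite_graph(A), bipartite_graph(B)


rng = np.random.default_rng(1)
A = rng.random((3, 4))   # m = 3, n = 4
B = rng.random((4, 2))   # n = 4, t = 2
(M1, D1, L1), (M2, D2, L2) = split_star(A, B)
print(L1.shape, L2.shape)  # (7, 7) (6, 6)
```

Each Laplacian only ever sees one edge type, which is exactly what lets the two cuts be defined without weighting heterogeneous edges against each other.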

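Formulating the Optimization Problem

• Minimize the cuts of the two bipartite graphs, subject to the constraint that their partitioning results on the central type of objects are the same.

• Objective function:

$$
\min \frac{q^T L^{(1)} q}{q^T D^{(1)} q}, \qquad \min \frac{p^T L^{(2)} p}{p^T D^{(2)} p}
$$

$$
\text{subject to} \quad q^T D^{(1)} e = 0, \; q \neq 0, \qquad p^T D^{(2)} e = 0, \; p \neq 0,
$$

where

$$
q = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^{(m+n) \times 1}, \qquad
p = \begin{pmatrix} y \\ z \end{pmatrix} \in \mathbb{R}^{(n+t) \times 1}.
$$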

The definitions of q and p encode the consistency between the two graphs: both embeddings share the same y, which forces the partitioning of the central type of objects to be identical in the two graphs.
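To make the consistency requirement concrete, the sketch below evaluates both normalized Rayleigh quotients for candidate embeddings x, y and z, building q and p from the same y. `consistent_cuts` and `laplacian_pair` are hypothetical helpers; they only evaluate the objectives and the constraint residuals and do not optimize anything.

```python
import numpy as np


def laplacian_pair(R):
    """Return (L, D) for the bipartite graph with relation matrix R."""
    a, b = R.shape
    M = np.block([[np.zeros((a, a)), R],
                  [R.T, np.zeros((b, b))]])
    D = np.diag(M.sum(axis=1))
    return D - M, D


def consistent_cuts(A, B, x, y, z):
    """Evaluate both Rayleigh quotients; q = (x; y) and p = (y; z) share the same y."""
    L1, D1 = laplacian_pair(A)
    L2, D2 = laplacian_pair(B)
    q = np.concatenate([x, y])
    p = np.concatenate([y, z])
    r1 = (q @ L1 @ q) / (q @ D1 @ q)
    r2 = (p @ L2 @ p) / (p @ D2 @ p)
    # Residuals of the constraints q^T D1 e = 0 and p^T D2 e = 0.
    residuals = (q @ D1.sum(axis=1), p @ D2.sum(axis=1))
    return r1, r2, residuals


block = np.kron(np.eye(2), np.ones((2, 2)))   # two disconnected 2 x 2 blocks
s = np.array([1.0, 1.0, -1.0, -1.0])
print(consistent_cuts(block, block, s, s, s))  # both cuts are 0 and both constraints hold
```

Changing y alone changes both quotients at once, which is how the shared embedding couples the two otherwise independent bipartite problems.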

How to Solve the Optimization Problem #1: Convert it to a QCQP Problem

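• Simplify the original problem to single-objective programming:

$$
\begin{aligned}
\min \quad & \beta \frac{q^T L^{(1)} q}{q^T D^{(1)} q} + (1-\beta) \frac{p^T L^{(2)} p}{p^T D^{(2)} p} \\
\text{subject to} \quad & q^T D^{(1)} e = 0, \; q \neq 0, \\
& p^T D^{(2)} e = 0, \; p \neq 0, \\
& 0 \le \beta \le 1.
\end{aligned}
$$

• Assistant notations: let ω = (x; y; z) ∈ R^{(m+n+t)×1} and pad the graph matrices with zero blocks (t extra rows and columns for the first graph, m for the second):

$$
\Gamma_1 = \begin{pmatrix} L^{(1)} & 0 \\ 0 & 0 \end{pmatrix}, \quad
\Gamma_2 = \begin{pmatrix} 0 & 0 \\ 0 & L^{(2)} \end{pmatrix}, \quad
\Pi_1 = \begin{pmatrix} D^{(1)} & 0 \\ 0 & 0 \end{pmatrix}, \quad
\Pi_2 = \begin{pmatrix} 0 & 0 \\ 0 & D^{(2)} \end{pmatrix},
$$

so that

$$
q^T L^{(1)} q = \omega^T \Gamma_1 \omega, \quad
q^T D^{(1)} q = \omega^T \Pi_1 \omega, \quad
p^T L^{(2)} p = \omega^T \Gamma_2 \omega, \quad
p^T D^{(2)} p = \omega^T \Pi_2 \omega.
$$

• Sum-of-ratios quadratic fractional programming:

$$
\begin{aligned}
\min \quad & \beta \frac{\omega^T \Gamma_1 \omega}{\omega^T \Pi_1 \omega} + (1-\beta) \frac{\omega^T \Gamma_2 \omega}{\omega^T \Pi_2 \omega} \\
\text{subject to} \quad & \omega^T \Pi_1 e = 0, \; \omega^T \Pi_2 e = 0, \; \omega \neq 0, \; 0 \le \beta \le 1.
\end{aligned}
$$

• Quadratically constrained quadratic programming (QCQP): fixing the denominators to the parameters θ1, θ2 > 0 and letting Γ = (β/θ1) Γ1 + ((1 − β)/θ2) Γ2, with 0 ≤ β ≤ 1, gives

$$
\begin{aligned}
\min \quad & \omega^T \Gamma \omega \\
\text{subject to} \quad & \omega^T \Pi_1 \omega = \theta_1, \; \omega^T \Pi_2 \omega = \theta_2, \\
& \omega^T \Pi_1 e = 0, \; \omega^T \Pi_2 e = 0.
\end{aligned}
$$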

Because each normalized Rayleigh quotient is already a scalar measure of its own graph's structure, a weighted combination of the two Rayleigh quotients is more reasonable than weighting heterogeneous edges directly, and the weight β indicates which graph we should trust more.

A linear combination is only one approach to multi-objective programming; other methods that do not rely on this argument could be used as well.
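The padding that turns L(1), D(1), L(2) and D(2) into Γ1, Π1, Γ2 and Π2 can be sketched as follows. The sketch assumes that the QCQP fixes the denominators ω^T Π1 ω and ω^T Π2 ω to the parameters θ1 and θ2 (the parameters set in step 1 of the final algorithm), so that ω^T Γ ω with Γ = (β/θ1)Γ1 + ((1 − β)/θ2)Γ2 reproduces the sum of ratios; the check sets each θi to the actual denominator of a random ω. All helper names are hypothetical.

```python
import numpy as np


def laplacian_pair(R):
    """Return (L, D) for the bipartite graph with relation matrix R."""
    a, b = R.shape
    M = np.block([[np.zeros((a, a)), R], [R.T, np.zeros((b, b))]])
    D = np.diag(M.sum(axis=1))
    return D - M, D


def extend(A, B):
    """Pad L(1), D(1) ((m+n)-dim) and L(2), D(2) ((n+t)-dim) to (m+n+t)-dim Gamma/Pi matrices."""
    m, n = A.shape
    t = B.shape[1]
    N = m + n + t
    L1, D1 = laplacian_pair(A)
    L2, D2 = laplacian_pair(B)
    G1, P1, G2, P2 = (np.zeros((N, N)) for _ in range(4))
    G1[:m + n, :m + n] = L1      # Gamma_1 acts on q = omega[:m+n]
    P1[:m + n, :m + n] = D1      # Pi_1
    G2[m:, m:] = L2              # Gamma_2 acts on p = omega[m:]
    P2[m:, m:] = D2              # Pi_2
    return G1, P1, G2, P2


def gamma(G1, G2, beta, theta1, theta2):
    """Gamma = beta/theta1 * Gamma_1 + (1 - beta)/theta2 * Gamma_2."""
    return beta / theta1 * G1 + (1 - beta) / theta2 * G2


rng = np.random.default_rng(2)
A, B = rng.random((3, 4)), rng.random((4, 2))
G1, P1, G2, P2 = extend(A, B)
omega = rng.standard_normal(3 + 4 + 2)
beta = 0.3
ratios = (beta * (omega @ G1 @ omega) / (omega @ P1 @ omega)
          + (1 - beta) * (omega @ G2 @ omega) / (omega @ P2 @ omega))
# With theta_i set to omega^T Pi_i omega, the quadratic objective equals the sum of ratios.
G = gamma(G1, G2, beta, omega @ P1 @ omega, omega @ P2 @ omega)
print(np.isclose(omega @ G @ omega, ratios))  # True
```

The identity holds only on the set where the denominators equal θ1 and θ2, which is why those two equalities appear as quadratic constraints in the QCQP.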

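How to Solve the Optimization Problem #2: Convert QCQP to SDP

• Homogenize the QCQP with the vector (1; ω):

$$
\begin{aligned}
\min \quad & \begin{pmatrix} 1 \\ \omega \end{pmatrix}^T \begin{pmatrix} 0 & 0 \\ 0 & \Gamma \end{pmatrix} \begin{pmatrix} 1 \\ \omega \end{pmatrix} \\
\text{subject to} \quad & \begin{pmatrix} 1 \\ \omega \end{pmatrix}^T \begin{pmatrix} 0 & 0 \\ 0 & \Pi_i \end{pmatrix} \begin{pmatrix} 1 \\ \omega \end{pmatrix} = \theta_i, \quad i = 1, 2, \\
& \begin{pmatrix} 1 \\ \omega \end{pmatrix}^T \begin{pmatrix} 0 & \tfrac{1}{2} e^T \Pi_i \\ \tfrac{1}{2} \Pi_i e & 0 \end{pmatrix} \begin{pmatrix} 1 \\ \omega \end{pmatrix} = 0, \quad i = 1, 2.
\end{aligned}
$$

• Semi-definite programming (SDP): replace (1; ω)(1; ω)^T by a matrix variable W and drop the rank-one requirement, keeping only W ⪰ 0:

$$
\begin{aligned}
\min \quad & \begin{pmatrix} 0 & 0 \\ 0 & \Gamma \end{pmatrix} \bullet W \\
\text{subject to} \quad & \begin{pmatrix} 0 & 0 \\ 0 & \Pi_i \end{pmatrix} \bullet W = \theta_i, \quad i = 1, 2, \\
& \begin{pmatrix} 0 & \tfrac{1}{2} e^T \Pi_i \\ \tfrac{1}{2} \Pi_i e & 0 \end{pmatrix} \bullet W = 0, \quad i = 1, 2, \\
& E \bullet W = 1, \quad W \succeq 0,
\end{aligned}
$$

where C • W = tr(C^T W) and E is the matrix whose only nonzero entry is a 1 in the top-left corner.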
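The SDP coefficient matrices can be assembled with NumPy; actually solving the SDP needs a dedicated solver (the talk mentions SDPA), which this sketch does not call. It assumes the standard lifting in which W stands for (1; ω)(1; ω)^T and each linear constraint a^T ω = 0 is written as a quadratic form with a bordered matrix; for a rank-one W the inner products then reproduce the QCQP quantities exactly. All function names are hypothetical, and the Π matrices in the example are illustrative rather than built from real graphs.

```python
import numpy as np


def lift(C):
    """[[0, 0], [0, C]], so that (1; w)^T lift(C) (1; w) = w^T C w."""
    N = C.shape[0]
    out = np.zeros((N + 1, N + 1))
    out[1:, 1:] = C
    return out


def linear_as_quadratic(a):
    """[[0, a^T/2], [a/2, 0]], so that (1; w)^T K (1; w) = a^T w."""
    N = a.shape[0]
    K = np.zeros((N + 1, N + 1))
    K[0, 1:] = a / 2
    K[1:, 0] = a / 2
    return K


def sdp_data(Gamma, Pi1, Pi2):
    """Coefficient matrices of the relaxed SDP in the matrix variable W."""
    N = Gamma.shape[0]
    e = np.ones(N)
    E = np.zeros((N + 1, N + 1))
    E[0, 0] = 1.0
    return {
        "objective": lift(Gamma),
        "norm1": lift(Pi1),                       # <norm1, W> = theta1
        "norm2": lift(Pi2),                       # <norm2, W> = theta2
        "orth1": linear_as_quadratic(Pi1 @ e),    # <orth1, W> = 0
        "orth2": linear_as_quadratic(Pi2 @ e),    # <orth2, W> = 0
        "E": E,                                   # <E, W> = 1
    }


def inner(C, W):
    """Matrix inner product C . W = trace(C^T W)."""
    return float(np.sum(C * W))


rng = np.random.default_rng(3)
N = 5
X = rng.standard_normal((N, N))
Gamma = X @ X.T                                   # any symmetric matrix works here
Pi1 = np.diag([1.0, 2.0, 0.0, 0.0, 0.0])
Pi2 = np.diag([0.0, 0.0, 3.0, 1.0, 2.0])
omega = rng.standard_normal(N)
W = np.outer(np.r_[1.0, omega], np.r_[1.0, omega])   # rank-one feasible point
data = sdp_data(Gamma, Pi1, Pi2)
print(np.isclose(inner(data["objective"], W), omega @ Gamma @ omega))  # True
```

The relaxation is exactly the step of allowing any W ⪰ 0 with W[0, 0] = 1 instead of only rank-one matrices of this form.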

The Final Algorithm (CGBC)

1. Set the parameters β, θ1 and θ2.
2. Given the inter-relation matrices A and B, form the corresponding diagonal matrices and Laplacian matrices D(1), D(2), L(1) and L(2).
3. Extend D(1), D(2), L(1) and L(2) to Π1, Π2, Γ1 and Γ2, and form Γ, so that the coefficient matrices of the SDP problem can be computed.
4. Solve the SDP problem with an iterative solver such as SDPA.
5. Extract ω from W and regard it as the embedding vector of the heterogeneous objects.
6. Run the k-means algorithm on ω to obtain the desired partitioning of the heterogeneous objects.
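Steps 5 and 6 can be sketched as below. Reading ω from the first column of W is an assumption of this sketch: it is exact when W is the rank-one matrix (1; ω)(1; ω)^T and only an approximation for a relaxed solution. `kmeans_1d` is a hypothetical plain Lloyd iteration standing in for any k-means implementation.

```python
import numpy as np


def extract_embedding(W):
    """Read omega from the first column of W, assuming W is close to (1; omega)(1; omega)^T."""
    return W[1:, 0] / W[0, 0]


def kmeans_1d(values, k=2, iters=100):
    """Plain Lloyd's k-means on a 1-D embedding (iters must be at least 1)."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))   # deterministic initial centers
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([values[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


omega = np.array([0.9, 1.1, 1.0, -1.0, -0.8, -1.2])
W = np.outer(np.r_[1.0, omega], np.r_[1.0, omega])
print(kmeans_1d(extract_embedding(W)))  # [1 1 1 0 0 0]
```

Because ω stacks x, y and z, a single k-means run partitions all object types at once, and consistency on Y is inherited from the shared y block.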

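CGBC’s Extension to the k-Star-Structured HHCC

$$
\begin{aligned}
\min \quad & \sum_{i=1}^{k} \beta_i \frac{q^{(i)T} L^{(i)} q^{(i)}}{q^{(i)T} D^{(i)} q^{(i)}} \\
\text{subject to} \quad & q^{(i)T} D^{(i)} e = 0, \; q^{(i)} \neq 0, \quad i = 1, \ldots, k, \\
& \sum_{i=1}^{k} \beta_i = 1, \quad 0 \le \beta_i \le 1,
\end{aligned}
$$

where every q^{(i)} contains the same embedding y of the central type of objects, which keeps the k partitions consistent.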
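The k-star objective can be evaluated the same way as the two-graph one. In this sketch `k_star_objective` is hypothetical, each relation matrix is assumed to be oriented as (type i) × Y, and every q^{(i)} is built as (x_i; y) from the same y, which is how consistency on the central type carries over to k graphs.

```python
import numpy as np


def k_star_objective(relations, xs, y, betas):
    """sum_i beta_i * q_i^T L_i q_i / q_i^T D_i q_i, with q_i = (x_i; y).

    relations[i] is the (m_i x n) relation matrix between object type i and the
    central type Y; xs[i] embeds type i; y is the single embedding shared by Y.
    """
    if not np.isclose(sum(betas), 1.0) or any(w < 0 or w > 1 for w in betas):
        raise ValueError("betas must lie in [0, 1] and sum to 1")
    total = 0.0
    for R, x, beta in zip(relations, xs, betas):
        a, b = R.shape
        M = np.block([[np.zeros((a, a)), R],
                      [R.T, np.zeros((b, b))]])
        d = M.sum(axis=1)
        L = np.diag(d) - M
        q = np.concatenate([x, y])          # consistency: every q_i reuses the same y
        total += beta * (q @ L @ q) / (q @ (d * q))
    return total


R = np.kron(np.eye(2), np.ones((2, 2)))     # two disconnected 2 x 2 blocks
s = np.array([1.0, 1.0, -1.0, -1.0])
print(k_star_objective([R, R, R], [s, s, s], s, [0.2, 0.3, 0.5]))  # 0.0
```

With k = 2 and betas (β, 1 − β) this reduces to the two-graph objective.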

Experiment on Toy Problem

[Figure: relation matrices A and B, and the embedding values of the heterogeneous objects for β = 0, 0.2, 0.4, 0.6, 0.8 and 1.0]

• Based entirely on the first graph: Y(8:12)
• Based entirely on the second graph: Y(12:8)
• A more reasonable cut, based on information from both the first and the second graph

Experiment on Web Image Clustering

Embedding of the Clustering

[Figure: clustering embeddings, panels "Hill vs Owl" and "Flying vs Map"]

Average Performance

[Figure: Performance Comparison]

Conclusions

• We propose a new problem named high-order heterogeneous co-clustering (HHCC).

• We propose a consistent bipartite graph co-partitioning algorithm to solve the HHCC problem with star-structured inter-relationships.

• Experiments on a toy problem and on Web image clustering demonstrate the effectiveness of the proposed algorithm.

References

• Bin Gao, Tie-Yan Liu, et al., Consistent Bipartite Graph Co-Partitioning for Star-Structured High-Order Heterogeneous Data Co-Clustering, in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), pp. 41-50.

• Bin Gao, Tie-Yan Liu, Tao Qin, Qian-Sheng Cheng, Wei-Ying Ma, Web Image Clustering by Consistent Utilization of Low-level Features and Surrounding Texts, in Proceedings of ACM Multimedia 2005.

Contact: http://research.microsoft.com/users/tyliu/