
School of Computer Science, Carnegie Mellon University

BiG-Align: Fast Bipartite Graph Alignment

Danai Koutra Hanghang Tong David Lubensky

IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA

Danai Koutra (CMU) 2

Can we identify users across social networks?

Same or “similar” users?


More applications?

protein-protein alignment

chemical compound comparison

IR: synonym extraction

link prediction & viral marketing

Optical character recognition

Structure matching in DB

wikitranslation

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions


Problem Definition

INPUT: A, B

A (users x groups):
1 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
1 1 0 1

B (users x groups):
1 1 0 0 0
0 1 0 1 0
0 0 1 1 1
0 0 0 1 0
0 0 0 1 0


Problem Definition

INPUT: A, B

OUTPUT: P and Q (permutation matrices)

s.t. $\min_{P,Q} \|PAQ - B\|_F^2$

P (users): permutation of the users in A
Q (groups): permutation of the groups in A

[Figure: A with permuted users/groups, compared against B]

Graph isomorphism: HARD (P or NP-complete?)
Subgraph isomorphism: NP-complete
And now what? Constraints / relaxations


Problem Definition: constraints

INPUT: A, B

OUTPUT: P, Q (correspondence matrices)

s.t. $\min_{P,Q} \|PAQ - B\|_F^2$

CONSTRAINTS:
(a) P_ij, Q_ij = probabilities (not a 1-1 mapping)
(b) sparse matrices P and Q (more efficient for large-scale graphs)

[Figure: adjacency matrices A and B (users u x groups g); P maps the users of A to B, Q maps the groups of A to B]
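As a small illustration of the objective on this slide, the sketch below evaluates ||PAQ - B||_F^2 for a toy 2 x 2 graph. The matrices and the helper names `matmul` and `alignment_cost` are hypothetical and are not taken from the talk.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def alignment_cost(P, A, Q, B):
    """Squared Frobenius norm of PAQ - B."""
    PAQ = matmul(matmul(P, A), Q)
    return sum((p - b) ** 2 for rp, rb in zip(PAQ, B) for p, b in zip(rp, rb))


A = [[1, 0], [1, 1]]       # toy user-by-group adjacency matrix
swap = [[0, 1], [1, 0]]    # permutation that exchanges the two users
identity = [[1, 0], [0, 1]]
B = matmul(swap, A)        # B is A with its users reordered

cost_aligned = alignment_cost(swap, A, identity, B)        # correct user mapping
cost_unaligned = alignment_cost(identity, A, identity, B)  # no reordering at all
```

The correct user permutation drives the cost to zero, while leaving the users unaligned leaves a positive residual.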


What’s different?

• Focus on bipartite graphs
• New optimization problem/constraints

The hope is: the specific graph structure will lead to more accurate graph alignment.

BiG-Align vs. other approaches


Why bipartite graphs?

(1) Ubiquitous: e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs

(2) Coupled alignment: individual (nodes) & community level (communities)

(3) Conversion of unipartite graph to bipartite --> clustering + (2)

(4) General formulation:
(a) match clouds of points (point-feature graph)
(b) tensors (e.g., time-evolving, or other 3rd dimension)

[Figure: tensor with dimensions users, email, time]


BiG-Align: algorithm

DETAILS

[Algorithm figure: alternating, projected gradient descent, repeated until convergence]

• Probabilistic constraint
• Sparsity constraint: $\min_{P,Q} f = \min_{P,Q} \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$
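The algorithm box itself did not survive in this transcript, so the following is only a minimal sketch of alternating, projected gradient descent on the penalized objective f. It makes several assumptions: the gradients are derived directly from f, the projection is taken to be clipping every entry to [0, 1], and the fixed step size, the iteration count, and all function names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def residual(P, Q, A, B):
    PAQ = matmul(matmul(P, A), Q)
    return [[x - b for x, b in zip(rx, rb)] for rx, rb in zip(PAQ, B)]


def objective(P, Q, A, B, lam, mu):
    fit = sum(r * r for row in residual(P, Q, A, B) for r in row)
    return fit + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))


def projected_step(X, G, eta):
    """Gradient step, then clip each entry to [0, 1] (assumed projection)."""
    return [[min(1.0, max(0.0, x - eta * g)) for x, g in zip(rx, rg)]
            for rx, rg in zip(X, G)]


def big_align_sketch(A, B, P, Q, lam=0.1, mu=0.1, eta=0.01, iters=200):
    for _ in range(iters):
        # Update P with Q fixed: grad_P f = 2 (PAQ - B)(AQ)^T + lam
        R = residual(P, Q, A, B)
        GP = [[2 * g + lam for g in row] for row in matmul(R, transpose(matmul(A, Q)))]
        P = projected_step(P, GP, eta)
        # Update Q with P fixed: grad_Q f = 2 (PA)^T (PAQ - B) + mu
        R = residual(P, Q, A, B)
        GQ = [[2 * g + mu for g in row] for row in matmul(transpose(matmul(P, A)), R)]
        Q = projected_step(Q, GQ, eta)
    return P, Q


A = [[1, 0], [1, 1]]
B = [[1, 1], [1, 0]]
P0 = [[0.5, 0.5], [0.5, 0.5]]
Q0 = [[0.5, 0.5], [0.5, 0.5]]
P, Q = big_align_sketch(A, B, P0, Q0)
start = objective(P0, Q0, A, B, 0.1, 0.1)
end = objective(P, Q, A, B, 0.1, 0.1)
```

On this toy input the small fixed step keeps every update a descent step, so `end` is lower than `start`. The fixed number of iterations stands in for a real convergence test.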

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
    Optimizations
• Uni-Align
• Conclusions

BiG-Align: Optimizations

DETAILS

[Algorithm figure: alternating, projected gradient descent, repeated until convergence]

Optimization 1: Structurally equivalent nodes

DETAILS

• Aggregation to super-nodes

Graph A
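One way to picture the aggregation is sketched below, assuming that "structurally equivalent" means users with identical rows in the adjacency matrix (the same set of groups). The function name and the return format are hypothetical, and the talk may treat super-nodes differently, for example by weighting them.

```python
from collections import defaultdict


def aggregate_super_nodes(A):
    """Merge rows of A that have identical neighbor sets into super-nodes."""
    groups = defaultdict(list)
    for node, row in enumerate(A):
        groups[tuple(row)].append(node)
    reduced = [list(pattern) for pattern in groups]  # one row per super-node
    members = list(groups.values())                  # original nodes in each super-node
    return reduced, members


A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
reduced, members = aggregate_super_nodes(A)
```

Users 0 and 1 belong to the same groups, so they collapse into one super-node and the alignment problem shrinks from five user rows to four.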


Optimization 2: Initialization of P and Q

DETAILS

• Why is the initialization important?

[Figure: local minima vs. global minimum]

Optimization 2: Initialization of P and Q

DETAILS

• Social networks are structured: the degree distribution is power-law-like.

[Figure: log(degree) vs. ranked nodes]

Optimization 2: Initialization of P and Q

DETAILS

• Network-inspired initialization

[Figure: degree vs. rank of node, with a knee at rank k. The user degrees of G_A and G_B are listed in decreasing order (example values: 2000, 1500, 1000, 945, 940, 800, 799, 750, 740, 735, 730, …, 3, 2, 1 and 1000, 800, 500, 450, 449, 445, …, 1). P is initialized with a 1-1 matching of the top k nodes and a 1-1 matching of clusters of degrees (cluster 1, cluster 2, …, cluster n).]
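A rough sketch of a degree-based initialization of P in the spirit of this slide follows. Everything beyond "1-1 matching of top k nodes" and "1-1 matching of clusters of degrees" is an assumption: the clusters are formed here as equal-size chunks of the degree ranking, nodes in matched clusters receive uniform weights, and rows of P index the users of B while columns index the users of A. All names are hypothetical.

```python
def rank_by_degree(degrees):
    """Node indices sorted by decreasing degree (ties keep index order)."""
    return sorted(range(len(degrees)), key=lambda i: -degrees[i])


def split(items, parts):
    """Cut a list into `parts` consecutive chunks of near-equal size."""
    n = len(items)
    return [items[i * n // parts:(i + 1) * n // parts] for i in range(parts)]


def degree_based_init(deg_a, deg_b, k, n_clusters):
    P = [[0.0] * len(deg_a) for _ in deg_b]  # rows: users of B, columns: users of A
    order_a, order_b = rank_by_degree(deg_a), rank_by_degree(deg_b)
    # 1-1 matching of the top-k nodes by degree rank.
    for a, b in zip(order_a[:k], order_b[:k]):
        P[b][a] = 1.0
    # 1-1 matching of degree clusters: uniform weights inside matched clusters.
    tail_a, tail_b = split(order_a[k:], n_clusters), split(order_b[k:], n_clusters)
    for cluster_a, cluster_b in zip(tail_a, tail_b):
        for b in cluster_b:
            for a in cluster_a:
                P[b][a] = 1.0 / len(cluster_a)
    return P


P_init = degree_based_init(deg_a=[5, 1, 3, 1], deg_b=[1, 4, 1, 6], k=2, n_clusters=1)
```

Here the two highest-degree users of each graph are matched one to one, and the two remaining low-degree users share their weight evenly.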


Optimization 3: Steps of gradient descent

DETAILS

• Constant step: thrashing or slow convergence


Optimization 3: Steps of gradient descent

DETAILS

• Variable step with line search: strategy for local optimum

$\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P, Q, A, B)$

$\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P, Q, A, B)$

(closed formulas)

• BiG-Align-Exact: computes the steps at every iteration
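The closed formulas g1 and g2 are not spelled out in this transcript. As a hedged illustration, the sketch below derives one such formula for the P update from the penalized objective itself, ignoring the projection. With Q fixed, f(P - ηG) is a quadratic in η, and for G equal to the gradient its minimizer works out to η = ||G||_F^2 / (2 ||GAQ||_F^2). This is a derivation under those assumptions, not necessarily the authors' g1, and all names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def f_of_P(P, Q, A, B, lam):
    """Objective as a function of P (the mu term is constant in P and omitted)."""
    PAQ = matmul(matmul(P, A), Q)
    fit = sum((x - b) ** 2 for rx, rb in zip(PAQ, B) for x, b in zip(rx, rb))
    return fit + lam * sum(map(sum, P))


def grad_P(P, Q, A, B, lam):
    AQ = matmul(A, Q)
    R = [[x - b for x, b in zip(rx, rb)] for rx, rb in zip(matmul(P, AQ), B)]
    return [[2 * g + lam for g in row] for row in matmul(R, transpose(AQ))]


def exact_step_P(P, Q, A, B, lam):
    """Minimizer of eta -> f(P - eta * G), with the projection ignored."""
    G = grad_P(P, Q, A, B, lam)
    D = matmul(matmul(G, A), Q)
    num = sum(g * g for row in G for g in row)
    den = 2 * sum(d * d for row in D for d in row)
    return num / den if den else 0.0


def shifted(P, G, eta):
    return [[p - eta * g for p, g in zip(rp, rg)] for rp, rg in zip(P, G)]


A = [[1, 0], [1, 1]]
B = [[1, 1], [1, 0]]
P = [[0.5, 0.5], [0.5, 0.5]]
Q = [[0.5, 0.5], [0.5, 0.5]]
G = grad_P(P, Q, A, B, 0.1)
eta = exact_step_P(P, Q, A, B, 0.1)
best = f_of_P(shifted(P, G, eta), Q, A, B, 0.1)
```

Moving slightly away from `eta` in either direction increases the objective along the gradient direction, which is what an exact line search should guarantee.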


Optimization 3: Steps of gradient descent

DETAILS

• But: slow change in the steps

[Figure: step size (η), axis ticks from 10^-4 to 5·10^-4, vs. iterations, axis ticks from 10^4 to 3·10^4]

• BiG-Align-Skip: compute η’s every m (=500) iterations
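The skipping schedule can be sketched on a toy problem. The example below runs gradient descent on a hypothetical quadratic f(x) = x0^2 + 1.2 x1^2 and recomputes the exact step only every m iterations, reusing the last value in between. The toy objective, the function names, and m = 5 are illustrative assumptions (the slide uses m = 500), and reusing a step is only safe when the exact steps change slowly, as the plot on the slide suggests.

```python
def descend_with_skipped_line_search(x, grad, exact_step, iters, m):
    """Gradient descent that refreshes the step size only every m iterations."""
    eta, line_searches = 0.0, 0
    for t in range(iters):
        g = grad(x)
        if t % m == 0:  # recompute eta, then reuse it for the next m - 1 iterations
            eta = exact_step(g)
            line_searches += 1
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x, line_searches


H = [2.0, 2.4]  # diagonal Hessian of the toy objective f(x) = x0^2 + 1.2 * x1^2


def grad(x):
    return [h * xi for h, xi in zip(H, x)]


def exact_step(g):
    """Exact line-search step for a quadratic: (g . g) / (g . H g)."""
    num = sum(gi * gi for gi in g)
    den = sum(h * gi * gi for h, gi in zip(H, g))
    return num / den if den else 0.0


x_final, n_searches = descend_with_skipped_line_search([1.0, 1.0], grad, exact_step, iters=10, m=5)
```

Ten iterations need only two line searches here, and the iterate still reaches the minimizer at the origin to within a small tolerance.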

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
    Experiments
• Uni-Align
• Conclusions

Experimental Setup

• Implementation: Matlab
• Dataset: IMDB movie-genre graph and subgraphs (1027 movies x 27 genres)
• Setup:
    random permutations (ground truth)
    noise level: 0 - 20% (simulate real-world applications)
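A setup like this can be simulated with a short sketch: permute the rows and columns of A at random to obtain B, keep the permutations as ground truth, and then inject noise. The noise model here (flip each entry independently with probability `noise`) is an assumption, since the slide only gives the noise range, and the function name is hypothetical.

```python
import random


def permuted_noisy_copy(A, noise, seed=0):
    """Return (B, rows, cols): a shuffled, noisy copy of A and the true permutations."""
    rng = random.Random(seed)
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    rng.shuffle(rows)
    rng.shuffle(cols)
    # Row i of B is row rows[i] of A; column j of B is column cols[j] of A.
    B = [[A[r][c] for c in cols] for r in rows]
    # Assumed noise model: flip each 0/1 entry with probability `noise`.
    B = [[1 - b if rng.random() < noise else b for b in row] for row in B]
    return B, rows, cols


A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
B_clean, rows, cols = permuted_noisy_copy(A, noise=0.0)
B_noisy, _, _ = permuted_noisy_copy(A, noise=0.2)
```

Alignment accuracy can then be measured by comparing a recovered mapping against `rows` and `cols`.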


State-of-the-art

BACKGROUND

① Umeyama’s algorithm [Umeyama88]: SVD-based

② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach

③ Net-Align [Bayati+09]: Belief Propagation

[Figure labels: Bi-partite, Uni-partite]

BiG-Align: Accuracy vs. Runtime

BiG-Align improves both speed and accuracy.

[Figure: accuracy vs. runtime for Umeyama, NetAlign, NMF-based, BiG-Align-Skip, and BiG-Align-Exact; marker size related to graph size]

BiG-Align: Accuracy w.r.t. noise

BiG-Align improves the accuracy for almost all levels of noise.

[Figure: accuracy w.r.t. noise for BiG-Align-Exact, BiG-Align-Skip, NMF-based, NetAlign-deg, NetAlign-full, and Umeyama]


Algorithm: Uni-Align

DETAILS

[Figure: node-feature matrix with n nodes and d features]

• Features: node degree, clustering coeff., …
• $\min_{P} \|PAQ - B\|_F^2$, with Q fixed
• $P = g^*(A, B, S, U)$: closed-form solution
• SVD: $A = U S V^T$
• Complexity: $O(n d^2)$
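The slide does not spell out g*. One closed form that is consistent with its arguments (A, B, S, U) can be derived under stated assumptions: Q is the identity, P is unconstrained, and the n x d feature matrix A has full column rank with thin SVD A = U S V^T. The following is a sketch of that derivation and not necessarily the authors' formula.

```latex
\begin{aligned}
\min_{P}\ \|PA - B\|_F^2
  \quad &\Rightarrow \quad P = B A^{+}
  && \text{(minimum-norm least-squares solution)} \\
A^{+} &= V S^{-1} U^{T}
  && \text{(pseudoinverse from the thin SVD)} \\
V &= A^{T} U S^{-1}
  && \text{(since } A^{T} U = V S\text{)} \\
\Rightarrow\ P &= B\, A^{T} U S^{-2} U^{T}
  && \text{(depends only on } A, B, S, U\text{)}
\end{aligned}
```

Under these assumptions the dominant cost is the thin SVD of the n x d matrix A, which is O(n d^2) and matches the complexity stated on the slide.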

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
    Experiments
• Conclusions

Uni-Align

• Dataset: Facebook friendship graph (64K users)
• Setup: uni-partite --> bi-partite graph via feature extraction:
    node degree
    egonet degree
    edges in egonet
    mean degree of node’s neighbors

[Figure: egonet]
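The four features listed above can be computed from an adjacency structure as sketched below. The sketch assumes an undirected graph stored as a dict of neighbor sets, takes the egonet to be the node plus its neighbors, and interprets "egonet degree" as the number of edges leaving the egonet. That interpretation and the function name are assumptions.

```python
def egonet_features(adj, v):
    """Return [node degree, egonet degree, edges in egonet, mean neighbor degree]."""
    ego = {v} | adj[v]
    # Each edge inside the egonet is seen from both endpoints, hence the // 2.
    edges_inside = sum(1 for u in ego for w in adj[u] if w in ego) // 2
    # Assumed meaning of "egonet degree": edges from the egonet to the rest of the graph.
    edges_leaving = sum(1 for u in ego for w in adj[u] if w not in ego)
    degree = len(adj[v])
    mean_nbr_degree = sum(len(adj[u]) for u in adj[v]) / degree if degree else 0.0
    return [degree, edges_leaving, edges_inside, mean_nbr_degree]


# Toy friendship graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
features = [egonet_features(adj, v) for v in sorted(adj)]
```

Stacking one such feature row per node yields the n x d node-feature matrix used as the bipartite input.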

Uni-Align: Accuracy vs. Runtime

Uni-Align, followed by Net-Align, is more accurate and faster than other approaches.

[Figure: accuracy vs. runtime for NMF-based, NetAlign, Umeyama, and Uni-Align]

Uni-Align: Runtime

Uni-Align is 2x - 31,700x faster depending on graph size.

[Figure: runtime of Umeyama, Uni-Align, NMF-based, NetAlign-deg, and NetAlign-full]


Conclusions

• Formulation: new problem / constraints
• Algorithms:
    BiG-Align: optimized alternating projected gradient descent
    Uni-Align: alignment for uni-partite graphs
• Evaluations: more accurate and efficient

Beyond BiG-Align: Multi-way Linkage

(1) All build upon BiG-Align
(2) Led to 7 patents

S1: Dynamic Graph Linkage
S2: Community-level Linkage
S3: Hetero. Graph Linkage
S4: Multi-relational DB Linkage

Thank you!

http://www.cs.cmu.edu/~dkoutra
danai@cs.cmu.edu