School of Computer Science, Carnegie Mellon University

BiG-Align: Fast Bipartite Graph Alignment

Danai Koutra, Hanghang Tong, David Lubensky

IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA


Can we identify users across social networks?

Same or “similar” users?


More applications?

• protein-protein alignment
• chemical compound comparison
• IR: synonym extraction
• link prediction & viral marketing
• optical character recognition
• structure matching in DB
• wiki translation


RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions


Problem Definition

INPUT: A, B (users × groups matrices)

A =
1 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
1 1 0 1

B =
1 1 0 0 0
0 1 0 1 0
0 0 1 1 1
0 0 0 1 0
0 0 0 1 0



OUTPUT: permutation matrices P and Q that minimize $\|PAQ - B\|_F^2$

• P (users): maps the users of A to the users of B
• Q (groups): maps the groups of A to the groups of B
• PAQ is a permutation of A: P permutes its users and Q permutes its groups
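To make the objective concrete, here is a minimal NumPy sketch that evaluates $\|PAQ - B\|_F^2$ for permutation matrices. The helper name `alignment_cost` and the two permutations are illustrative assumptions, not taken from the talk. A is the 5 × 4 example matrix from the slide, while B is built here as a permuted copy of A rather than the slide's B.

```python
import numpy as np

def alignment_cost(P, A, Q, B):
    """Squared Frobenius norm of PAQ - B (hypothetical helper name)."""
    R = P @ A @ Q - B
    return float(np.sum(R * R))

# Example user-group matrix A from the slide (5 users x 4 groups).
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1]], dtype=float)

# Illustrative permutations (assumed): reverse the users, swap the first two groups.
P_true = np.eye(5)[[4, 3, 2, 1, 0]]
Q_true = np.eye(4)[:, [1, 0, 2, 3]]
B = P_true @ A @ Q_true

cost_true = alignment_cost(P_true, A, Q_true, B)            # the correct alignment
cost_identity = alignment_cost(np.eye(5), A, np.eye(4), B)  # no permutation applied
```

The correct pair (P, Q) drives the cost to zero, while leaving A unpermuted gives a strictly positive cost.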

• Graph isomorphism: HARD (P or NP-complete?)
• Subgraph isomorphism: NP-complete

And now what? Constraints / relaxations.




Problem Definition: constraints

INPUT: A, B

OUTPUT: correspondence matrices P, Q that minimize $\|PAQ - B\|_F^2$

CONSTRAINTS:
(a) P_ij, Q_ij = probabilities (not a 1-1 mapping)
(b) sparse matrices P and Q (more efficient for large-scale graphs)

[Figure: user (u) by group (g) matrices A and B; P (users) maps A to B, Q (groups) maps A to B]



What’s different?

BiG-Align vs. other approaches:

• Focus on bipartite graphs
• New optimization problem/constraints

The hope is that exploiting the specific graph structure will lead to more accurate graph alignment.

Why bipartite graphs?

(1) Ubiquitous: e.g., users-files, authors-papers, customers-products, users-messages/groups, user-movie rating graphs

(2) Coupled alignment: individual (nodes) & community-level (communities)





(3) Conversion of a unipartite graph to a bipartite one --> clustering + (2)

(4) General formulation:
(a) match clouds of points (point-feature graph)
(b) tensors (e.g., time-evolving, or other 3rd dimension)

[Figure: nodes-communities graph; users × email × time tensor]





BiG-Align: algorithm (DETAILS)

Alternating, projected gradient descent, repeated until convergence.

• Probabilistic constraint
• Sparsity constraint:

$\min_{P,Q} f = \min_{P,Q} \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$
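The update equations did not survive on this slide, so the following is only a sketch of alternating projected gradient descent on the objective above, not the authors' implementation. The gradients follow from differentiating f. The constant step `eta`, the uniform initialization, the stopping rule, and the projection by clipping to [0, 1] are all assumptions made for illustration.

```python
import numpy as np

def objective(P, Q, A, B, lam, mu):
    R = P @ A @ Q - B
    return float(np.sum(R * R) + lam * P.sum() + mu * Q.sum())

def big_align_sketch(A, B, lam=0.1, mu=0.1, eta=1e-3, max_iter=500, tol=1e-9):
    """Alternating projected gradient descent (illustrative sketch)."""
    nA, gA = A.shape
    nB, gB = B.shape
    P = np.full((nB, nA), 1.0 / nA)   # assumed uniform start
    Q = np.full((gA, gB), 1.0 / gB)
    prev = objective(P, Q, A, B, lam, mu)
    for _ in range(max_iter):
        # Gradient step in P, then projection onto [0, 1] (probabilities).
        AQ = A @ Q
        grad_P = 2.0 * (P @ AQ - B) @ AQ.T + lam
        P = np.clip(P - eta * grad_P, 0.0, 1.0)
        # Gradient step in Q with the updated P, then projection.
        PA = P @ A
        grad_Q = 2.0 * PA.T @ (PA @ Q - B) + mu
        Q = np.clip(Q - eta * grad_Q, 0.0, 1.0)
        cur = objective(P, Q, A, B, lam, mu)
        if abs(prev - cur) < tol:     # assumed convergence test
            break
        prev = cur
    return P, Q

# The example matrices from the Problem Definition slide.
A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0],
              [1, 0, 1, 0], [1, 1, 0, 1]], dtype=float)
B = np.array([[1, 1, 0, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]], dtype=float)

f_init = objective(np.full((5, 5), 0.2), np.full((4, 5), 0.2), A, B, 0.1, 0.1)
P, Q = big_align_sketch(A, B)
f_final = objective(P, Q, A, B, 0.1, 0.1)
```

Because P and Q stay non-negative, the penalties λΣP_ij and μΣQ_ij act as L1 terms that push small entries to zero, which is how the sparsity constraint enters.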


RoadMap

• Problem Definition
• What’s different?
• BiG-Align
    – Optimizations
• Uni-Align
• Conclusions


BiG-Align: Optimizations (DETAILS)

Optimization 1: Structurally equivalent nodes

• Aggregation to super-nodes

[Figure: Graph A]
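One way to read the aggregation step: nodes with identical rows in A (the same group memberships) are structurally equivalent and can be merged into one super-node before alignment. The sketch below assumes that reading; the function name and the choice to keep one representative row per super-node are hypothetical.

```python
import numpy as np

def aggregate_equivalent_rows(A):
    """Group rows of A that are identical; return one row per group.

    Returns the reduced matrix and, for each super-node, the list of
    original row indices it stands for.
    """
    groups = {}
    for i, row in enumerate(A):
        groups.setdefault(tuple(row.tolist()), []).append(i)
    members = list(groups.values())          # insertion order is preserved
    A_super = np.array([A[m[0]] for m in members])
    return A_super, members

# Example matrix A from the Problem Definition slide: users 0 and 1 are identical.
A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0],
              [1, 0, 1, 0], [1, 1, 0, 1]], dtype=float)
A_super, members = aggregate_equivalent_rows(A)
```

The same routine applied to the transpose of A would merge structurally equivalent groups.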



Optimization 2: Initialization of P and Q (DETAILS)

• Why is the initialization important? The objective has a global minimum and many local minima.


• Social networks are structured: the degree distribution is power-law like.


[Figure: log(degree) vs. ranked nodes]



• Network-inspired initialization (DETAILS)
    – 1-1 matching of the top k nodes, where k is the knee of the degree vs. rank-of-node curve
    – 1-1 matching of clusters of degrees (cluster 1, cluster 2, …, cluster n)

[Figure: initial P built from the sorted user degrees of G_A and G_B; example degree sequences 2000, 1500, 1000, 945, 940, 800, 799, 750, 740, 735, 730, …, 3, 2, 1 and 1000, 800, 500, 450, 449, 445, …, 1]
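A rough sketch of a degree-based initialization of P in this spirit is given below. The slide does not say how the degree clusters are formed, so splitting the remaining ranked nodes into equal-size chunks with `np.array_split` is purely an assumption, as are the function name and the requirement that the caller supplies `k` (the knee).

```python
import numpy as np

def degree_init(deg_A, deg_B, k, n_clusters):
    """Initial P (rows: nodes of B, columns: nodes of A) from node degrees."""
    order_A = np.argsort(-np.asarray(deg_A), kind="stable")  # highest degree first
    order_B = np.argsort(-np.asarray(deg_B), kind="stable")
    P = np.zeros((len(deg_B), len(deg_A)))
    # 1-1 matching of the top-k nodes by degree rank.
    for a, b in zip(order_A[:k], order_B[:k]):
        P[b, a] = 1.0
    # 1-1 matching of degree clusters: uniform probability inside each block.
    clusters_A = np.array_split(order_A[k:], n_clusters)  # assumed clustering
    clusters_B = np.array_split(order_B[k:], n_clusters)
    for ca, cb in zip(clusters_A, clusters_B):
        if len(ca) > 0 and len(cb) > 0:
            P[np.ix_(cb, ca)] = 1.0 / len(ca)
    return P

deg_A = [5, 1, 3, 2, 2]
deg_B = [1, 4, 2, 2, 3]
P0 = degree_init(deg_A, deg_B, k=2, n_clusters=1)
```

In this toy case the two highest-degree nodes of each graph are matched exactly, and the remaining three nodes share probability 1/3 each. Q could be initialized the same way from group degrees.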



Optimization 3: Steps of gradient descent (DETAILS)

• Constant step: thrashing or slow convergence



• Variable step with line search: strategy for local optimum

$\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P, Q, A, B)$

$\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P, Q, A, B)$

(closed formulas)

• BiG-Align-Exact: computes the steps at every iteration
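The closed formulas g1 and g2 are not spelled out on the slide. As a sketch of what an exact line search for η_P can look like, the code below minimizes f(P − ηG) over η, where G is the gradient with respect to P, and the projection is ignored (an assumption). Since f is quadratic in η, setting the derivative to zero gives η = (2⟨R, D⟩ + λΣG) / (2‖D‖²) with R = PAQ − B and D = GAQ. The function names are hypothetical.

```python
import numpy as np

def f(P, Q, A, B, lam, mu):
    R = P @ A @ Q - B
    return float(np.sum(R * R) + lam * P.sum() + mu * Q.sum())

def exact_step_P(P, Q, A, B, lam):
    """Step size minimizing f along the negative gradient in P (no projection)."""
    AQ = A @ Q
    R = P @ AQ - B
    G = 2.0 * R @ AQ.T + lam          # gradient of f with respect to P
    D = G @ AQ
    denom = 2.0 * np.sum(D * D)
    if denom == 0.0:                  # flat direction: no move
        return 0.0, G
    eta = (2.0 * np.sum(R * D) + lam * G.sum()) / denom
    return float(eta), G

A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0],
              [1, 0, 1, 0], [1, 1, 0, 1]], dtype=float)
B = np.array([[1, 1, 0, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]], dtype=float)
P = np.full((5, 5), 0.2)
Q = np.full((4, 5), 0.2)
lam, mu = 0.1, 0.1

eta, G = exact_step_P(P, Q, A, B, lam)
f_star = f(P - eta * G, Q, A, B, lam, mu)
```

The step for Q follows by symmetry, with D = PAG_Q and μ in place of λ.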



• But: the steps change slowly across iterations

[Figure: step size (η), between about 10^-4 and 5·10^-4, vs. iterations, from 10^4 to 3·10^4]

• BiG-Align-Skip: compute the η’s every m (= 500) iterations
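The skip idea can be sketched generically: recompute the exact step only every m iterations and reuse it in between. The driver below is an illustration on a toy quadratic, not the BiG-Align objective; all names are hypothetical. Reusing a stale step is only reasonable when the exact step changes slowly, as the slide observes.

```python
import numpy as np

def descend_with_skipped_steps(x0, grad, exact_step, project, n_iter, m=500):
    """Projected gradient descent that refreshes the step every m iterations."""
    x = np.asarray(x0, dtype=float)
    eta = 0.0
    n_step_evals = 0
    for it in range(n_iter):
        g = grad(x)
        if it % m == 0:               # refresh the (expensive) exact step
            eta = exact_step(x, g)
            n_step_evals += 1
        x = project(x - eta * g)
    return x, n_step_evals

# Toy problem (assumed): minimize 0.5 x^T H x - b^T x on the box [0, 1]^2.
H = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 0.5])

def grad(x):
    return H @ x - b

def exact_step(x, g):
    gHg = g @ H @ g
    return (g @ g) / gHg if gHg > 0 else 0.0

def project(x):
    return np.clip(x, 0.0, 1.0)

x_final, n_evals = descend_with_skipped_steps(
    np.zeros(2), grad, exact_step, project, n_iter=1200, m=500)
```

With 1200 iterations and m = 500, the exact step is evaluated only three times (at iterations 0, 500 and 1000).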


RoadMap

• Problem Definition
• What’s different?
• BiG-Align
    – Experiments
• Uni-Align
• Conclusions


Experimental Setup

• Implementation: Matlab
• Dataset: IMDB movie-genre graph and subgraphs (1027 movies × 27 genres)
• Setup:
    – random permutations (ground truth)
    – noise level: 0-20% (simulate real-world applications)
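A setup of this kind can be sketched as follows: permute the rows and columns of A at random to get a ground-truth alignment, then perturb the result. The slide does not specify the noise model, so flipping each entry independently with probability `noise` is an assumption, as are the function name and the seed handling. The talk's experiments were run in Matlab; Python is used here only for illustration.

```python
import numpy as np

def permuted_noisy_copy(A, noise, seed=0):
    """Return B = noisy(P A Q) plus the ground-truth permutations P and Q."""
    rng = np.random.default_rng(seed)
    n, g = A.shape
    P = np.eye(n)[rng.permutation(n)]       # random row permutation matrix
    Q = np.eye(g)[:, rng.permutation(g)]    # random column permutation matrix
    B = P @ A @ Q
    flip = rng.random(B.shape) < noise      # assumed noise model: entry flips
    B = np.where(flip, 1.0 - B, B)
    return B, P, Q

A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0],
              [1, 0, 1, 0], [1, 1, 0, 1]], dtype=float)
B_clean, P_true, Q_true = permuted_noisy_copy(A, noise=0.0)
B_noisy, _, _ = permuted_noisy_copy(A, noise=0.2)
```

Accuracy can then be measured by comparing the recovered correspondences against `P_true` and `Q_true`.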


State-of-the-art (BACKGROUND)

① Umeyama’s algorithm [Umeyama88]: SVD-based
② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach
③ Net-Align [Bayati+09]: belief propagation

[Slide labels grouping the methods: Bi-partite, Uni-partite]


BiG-Align: Accuracy vs. Runtime

BiG-Align improves both speed and accuracy.

[Figure: accuracy vs. runtime for Umeyama, NetAlign, NMF-based, BiG-Align-Skip and BiG-Align-Exact; marker size is related to graph size]


BiG-Align: Accuracy w.r.t. noise

BiG-Align improves the accuracy for almost all levels of noise.

[Figure: accuracy vs. noise level for BiG-Align-Exact, BiG-Align-Skip, NMF-based, NetAlign-deg, NetAlign-full and Umeyama]





Algorithm: Uni-Align (DETAILS)

• Describe each of the n nodes by d features (node degree, clustering coefficient, …), giving an n × d node-feature matrix
• Solve $\min_P \|PAQ - B\|_F^2$ with Q fixed
• P = g*(A, B, S, U): closed-form solution, using the SVD $A = U S V^T$
• Cost: O(n·d²)
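The exact form of g* is not given on the slide. As a sketch under stated assumptions, take Q to be the identity (features of A and B correspond one to one) and drop any sparsity penalty; the least-squares minimizer of ‖PA − B‖_F² is then P = B A⁺, where the pseudoinverse A⁺ = V S⁻¹ Uᵀ comes from the thin SVD of the n × d matrix A, which costs O(n·d²). The function name and the cutoff `rcond` are hypothetical.

```python
import numpy as np

def least_squares_P(A, B, rcond=1e-10):
    """Minimizer of ||P A - B||_F^2 via the thin SVD A = U S V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rcond * s.max()                      # drop near-zero singular values
    A_pinv = (Vt[keep].T / s[keep]) @ U[:, keep].T  # V S^{-1} U^T
    return B @ A_pinv

# Toy data (assumed): 8 nodes, 3 features; B is a row-permuted copy of A.
rng = np.random.default_rng(0)
A = rng.random((8, 3))
perm = rng.permutation(8)
B = A[perm]
P_hat = least_squares_P(A, B)
```

Note that `P_hat` reproduces B exactly (`P_hat @ A` equals B) but is a dense matrix rather than the permutation itself, so an extra step is needed to read off node correspondences.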


RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
    – Experiments
• Conclusions


Uni-Align

• Dataset: Facebook friendship graph (64K users)
• Setup: uni-partite --> bi-partite graph via feature extraction:
    – node degree
    – egonet degree
    – edges in egonet
    – mean degree of node’s neighbors
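The four features can be computed from a dense adjacency matrix as sketched below. The slide does not define "egonet degree"; here it is assumed to mean the number of edges with exactly one endpoint inside the egonet (the node, its neighbors, and the edges among them). The function name is hypothetical, and the graph is assumed undirected with no self-loops.

```python
import numpy as np

def node_features(adj):
    """Return an n x 4 matrix: degree, egonet degree, edges in egonet,
    mean degree of neighbors (0 for isolated nodes)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Triangles through each node: common neighbors summed over incident edges, halved.
    triangles = (adj * (adj @ adj)).sum(axis=1) / 2.0
    edges_in_ego = deg + triangles
    nbr_deg_sum = adj @ deg
    # Assumed definition: edges leaving the egonet.
    ego_out = (deg + nbr_deg_sum) - 2.0 * edges_in_ego
    mean_nbr_deg = np.divide(nbr_deg_sum, deg, out=np.zeros_like(deg), where=deg > 0)
    return np.column_stack([deg, ego_out, edges_in_ego, mean_nbr_deg])

# Toy graph (assumed): a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
F = node_features(adj)
```

Stacking these columns gives the n × d node-feature matrix that turns the uni-partite graph into a bipartite node-feature graph.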


Uni-Align: Accuracy vs. Runtime

Uni-Align, followed by Net-Align, is more accurate and faster than other approaches.

[Figure: accuracy vs. runtime for NMF-based, NetAlign, Umeyama and Uni-Align]


Uni-Align: Runtime

Uni-Align is 2x to 31,700x faster, depending on graph size.

[Figure: runtime of Umeyama, Uni-Align, NMF-based, NetAlign-deg and NetAlign-full]





Conclusions

• Formulation: new problem / constraints
• Algorithms:
    – BiG-Align: optimized alternating projected gradient descent
    – Uni-Align: alignment for uni-partite graphs
• Evaluations: more accurate and efficient


Beyond BiG-Align: Multi-way Linkage

(1) All build upon BiG-Align
(2) Led to 7 patents

S1: Dynamic Graph Linkage
S2: Community-level Linkage
S3: Heterogeneous Graph Linkage
S4: Multi-relational DB Linkage


Thank you!


http://www.cs.cmu.edu/~dkoutra