
School of Computer Science, Carnegie Mellon University

BiG-Align: Fast Bipartite Graph Alignment

Danai Koutra Hanghang Tong David Lubensky

IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA

Danai Koutra (CMU)

Can we identify users across social networks?

Same or “similar” users?

More applications?

• protein-protein alignment
• chemical compound comparison
• IR: synonym extraction
• link prediction & viral marketing
• optical character recognition
• structure matching in databases
• wiki translation

RoadMap

• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions

Problem Definition

INPUT: A, B

[Figure: example users × groups 0/1 matrices A and B]


OUTPUT: permutation matrices P (users) and Q (groups) such that

$\min_{P,Q} \|PAQ - B\|_F^2$

i.e., PAQ is A with its users (rows) and groups (columns) permuted, and it should match B as closely as possible.

Graph isomorphism: HARD (not known to be in P, and not known to be NP-complete)
Subgraph isomorphism: NP-complete

And now what? Constraints / relaxations.
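To make the objective and its combinatorial cost concrete, here is a small, hypothetical pure-Python sketch (not from the talk; all names are illustrative). Permutation matrices are represented as index lists, so PAQ simply reorders the rows (users) and columns (groups) of A, and an exhaustive search over all n!·m! pairs finds a zero-cost alignment on a toy 3 × 2 example. This toy A has a symmetry, so the recovered pair need not be the planted one.

```python
from itertools import permutations


def permute(A, row_perm, col_perm):
    """Compute PAQ for permutation matrices given as index lists:
    row i of the result is row row_perm[i] of A (users),
    column j is column col_perm[j] of A (groups)."""
    return [[A[r][c] for c in col_perm] for r in row_perm]


def frobenius_sq(X, Y):
    """Squared Frobenius norm ||X - Y||_F^2."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def brute_force_align(A, B):
    """Try all n! * m! (user, group) permutation pairs and keep the cheapest."""
    n, m = len(A), len(A[0])
    best = None
    for rp in permutations(range(n)):
        for cp in permutations(range(m)):
            cost = frobenius_sq(permute(A, rp, cp), B)
            if best is None or cost < best[0]:
                best = (cost, rp, cp)
    return best


A = [[1, 0], [1, 1], [0, 1]]            # 3 users x 2 groups
B = permute(A, (2, 0, 1), (1, 0))       # hidden ground-truth permutation
cost, rp, cp = brute_force_align(A, B)
print(cost)                             # 0: a perfect alignment exists
```

Even 10 users × 5 groups already means 10!·5! = 435,456,000 candidate pairs, which is why exact search is hopeless at scale and the problem needs constraints or relaxations.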


Problem Definition: constraints

INPUT: A, B

OUTPUT: correspondence matrices P, Q such that $\min_{P,Q} \|PAQ - B\|_F^2$

CONSTRAINTS:
(a) $P_{ij}$, $Q_{ij}$ are probabilities (soft correspondences, not a 1-1 mapping)
(b) P and Q are sparse (more efficient for large-scale graphs)

[Figure: A, B and the correspondence matrices P (users) and Q (groups)]


What’s different?

• Focus on bipartite graphs
• New optimization problem/constraints

The hope is that exploiting the specific graph structure will lead to more accurate graph alignment.

[Figure: BiG-Align vs. other approaches]


Why bipartite graphs?

(1) Ubiquitous: e.g., users-files, authors-papers, customers-products, users-messages/groups, user-movie rating graphs


(2) Coupled alignment: at both the individual and the community level


(3) Conversion of a unipartite graph to a bipartite one: clustering, then (2)

(4) General formulation:
(a) matching clouds of points (point-feature graph)
(b) tensors (e.g., time-evolving data, or another third dimension)


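BiG-Align: algorithm (DETAILS)

Alternating, projected gradient descent, repeated until convergence:
• a gradient-descent step on P, then a step on Q (alternating)
• probabilistic constraint: project the entries of P and Q back onto [0, 1]
• sparsity constraint: penalize the sum of the entries of P and Q (since the entries are nonnegative, these sums are L1 norms, which favor sparse P and Q)

$\min_{P,Q} f = \|PAQ - B\|_F^2 + \lambda \sum_{ij} P_{ij} + \mu \sum_{ij} Q_{ij}$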
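The loop above can be sketched in pure Python as follows. This is a minimal, hypothetical illustration, not the authors' Matlab implementation: the function names are made up, the probabilistic constraint is assumed to be enforced by clipping entries to [0, 1], and a fixed step plus a fixed number of iterations stand in for the line search and the convergence test. The gradients are the standard ones for f: $\partial f/\partial P = 2(PAQ - B)Q^T A^T + \lambda \mathbf{1}$ and $\partial f/\partial Q = 2A^T P^T (PAQ - B) + \mu \mathbf{1}$.

```python
def matmul(X, Y):
    """Plain matrix product of two lists-of-lists."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def residual(P, Q, A, B):
    """R = PAQ - B."""
    PAQ = matmul(matmul(P, A), Q)
    return [[v - b for v, b in zip(rv, rb)] for rv, rb in zip(PAQ, B)]


def objective(P, Q, A, B, lam, mu):
    """f = ||PAQ - B||_F^2 + lam * sum(P) + mu * sum(Q)."""
    R = residual(P, Q, A, B)
    return (sum(v * v for row in R for v in row)
            + lam * sum(map(sum, P)) + mu * sum(map(sum, Q)))


def step_and_project(X, G, step, penalty):
    """Gradient step X - step * (2G + penalty), then clip every entry to [0, 1]."""
    return [[min(1.0, max(0.0, x - step * (2 * g + penalty)))
             for x, g in zip(rx, rg)] for rx, rg in zip(X, G)]


def big_align_sketch(A, B, P, Q, lam=0.1, mu=0.1, step=0.005, iters=200):
    """Hypothetical alternating projected gradient descent with a fixed step."""
    for _ in range(iters):
        # P-step: df/dP = 2 (PAQ - B) Q^T A^T + lam
        R = residual(P, Q, A, B)
        P = step_and_project(P, matmul(matmul(R, transpose(Q)), transpose(A)), step, lam)
        # Q-step, using the updated P: df/dQ = 2 A^T P^T (PAQ - B) + mu
        R = residual(P, Q, A, B)
        Q = step_and_project(Q, matmul(matmul(transpose(A), transpose(P)), R), step, mu)
    return P, Q


A = [[1, 0], [1, 1], [0, 1]]            # 3 users x 2 groups
B = [[0, 1], [1, 1], [1, 0]]            # A with its two groups swapped
P0 = [[1 / 3] * 3 for _ in range(3)]    # uninformative start
Q0 = [[0.5] * 2 for _ in range(2)]
P, Q = big_align_sketch(A, B, P0, Q0)
print(objective(P0, Q0, A, B, 0.1, 0.1), objective(P, Q, A, B, 0.1, 0.1))
```

With a small enough step each projected half-step cannot increase f, so the objective decreases from the uniform start; whether P and Q end up close to the true permutations depends on the initialization, which is what Optimization 2 addresses.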


BiG-Align: Optimizations (DETAILS)

Optimization 1: Structurally equivalent nodes (DETAILS)

• Aggregation of structurally equivalent nodes (nodes with identical neighbor sets) into super-nodes

[Figure: graph A]
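A hypothetical sketch of the aggregation step, assuming the graph is given as a 0/1 users × groups matrix: rows with exactly the same neighbor set are merged into one super-node, and the member lists record which original users each super-node stands for. The alignment can then run on the smaller matrix and be expanded back through the members; applying the same function to the transpose would aggregate groups. The function name is illustrative only.

```python
from collections import defaultdict


def aggregate_super_nodes(A):
    """Merge rows (e.g., users) of a 0/1 bipartite adjacency matrix that have
    exactly the same neighbor set into one super-node each.
    Returns the distinct rows and, for each, the original row indices it represents."""
    groups = defaultdict(list)
    for i, row in enumerate(A):
        groups[tuple(row)].append(i)          # identical rows = structurally equivalent
    super_rows = [list(key) for key in groups]  # dicts keep insertion order
    members = list(groups.values())
    return super_rows, members


A = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0], [1, 1, 0]]
rows, members = aggregate_super_nodes(A)
print(rows)     # [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
print(members)  # [[0, 2], [1, 3], [4]]
```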


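Optimization 2: Initialization of P and Q (DETAILS)

• Why is the initialization important? The objective is not jointly convex in P and Q, so gradient descent may stop at a local minimum instead of reaching the global minimum.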

• Social networks are structured: the degree distribution is power-law-like.

• Network-inspired initialization (DETAILS): rank the users of G_A and of G_B by degree; use a 1-1 matching of the top-k nodes, and a 1-1 matching of clusters of nodes with similar degrees (cluster 1, cluster 2, …, cluster n) for the rest.

[Figure: user degrees of G_A and G_B vs. rank of node (e.g., 1000, 800, 500, 450, 449, 445, …)]
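A minimal sketch of the degree-based initialization, under several assumptions that the slides do not spell out: P is oriented so that PA ≈ B (row b of P refers to user b of B), the remaining ranks are cut into at most `n_clusters` contiguous clusters of equal size, and within a matched cluster pair each B user spreads its mass uniformly over the A users. `degree_init`, `k` and `n_clusters` are hypothetical names; Q could be initialized the same way on the transposed matrices.

```python
import math


def degree_init(A, B, k, n_clusters):
    """Hypothetical network-inspired initialization of P, with PA ~ B:
    P[b][a] > 0 means user b of B may correspond to user a of A.
    A, B: users x groups 0/1 matrices with the same number of users."""
    n = len(A)
    # rank users by decreasing degree (sorted is stable, so ties keep index order)
    rank_a = sorted(range(n), key=lambda i: -sum(A[i]))
    rank_b = sorted(range(n), key=lambda i: -sum(B[i]))
    P = [[0.0] * n for _ in range(n)]
    # 1-1 matching of the top-k nodes
    for a, b in zip(rank_a[:k], rank_b[:k]):
        P[b][a] = 1.0
    # 1-1 matching of clusters of consecutive ranks for the remaining nodes
    rest_a, rest_b = rank_a[k:], rank_b[k:]
    size = max(1, math.ceil(len(rest_a) / n_clusters))
    for start in range(0, len(rest_a), size):
        cluster_a = rest_a[start:start + size]
        cluster_b = rest_b[start:start + size]
        for b in cluster_b:
            for a in cluster_a:
                P[b][a] = 1.0 / len(cluster_a)
    return P


A = [[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
B = [A[2], A[0], A[3], A[1]]            # users of A in a different order
P = degree_init(A, B, k=2, n_clusters=1)
for row in P:
    print(row)
# [0.0, 0.0, 1.0, 0.0]
# [1.0, 0.0, 0.0, 0.0]
# [0.0, 0.5, 0.0, 0.5]
# [0.0, 0.5, 0.0, 0.5]
```

Here the two highest-degree users are matched exactly, while the two degree-1 users, which degree alone cannot tell apart, receive uniform probabilities that gradient descent can later refine.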


Optimization 3: Steps of gradient descent (DETAILS)

• Constant step: thrashing or slow convergence
• Variable step with line search (strategy for a local optimum); the optimal steps have closed formulas:

$\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P, Q, A, B)$

$\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P, Q, A, B)$

• BiG-Align-Exact: computes the steps at every iteration
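One way to obtain such closed formulas (our own derivation, ignoring the projection; the paper's $g_1$, $g_2$ may be written differently): since PAQ is linear in P, $f(P - \eta G)$ with $G = \partial f/\partial P$ is a quadratic in η whose slope at 0 is $-\|G\|_F^2$ and whose curvature is $2\|GAQ\|_F^2$, so $\eta_P = \|G\|_F^2 / (2\|GAQ\|_F^2)$. Likewise, with $H = \partial f/\partial Q$, $\eta_Q = \|H\|_F^2 / (2\|PAH\|_F^2)$. The hypothetical pure-Python sketch below computes both steps and checks numerically that $\eta_P$ minimizes f along $-G$.

```python
def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def sq_norm(X):
    """Squared Frobenius norm."""
    return sum(v * v for row in X for v in row)


def objective(P, Q, A, B, lam, mu):
    PAQ = matmul(matmul(P, A), Q)
    R = [[v - b for v, b in zip(rv, rb)] for rv, rb in zip(PAQ, B)]
    return sq_norm(R) + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))


def gradients(P, Q, A, B, lam, mu):
    """G = df/dP = 2 R Q^T A^T + lam,  H = df/dQ = 2 A^T P^T R + mu,  with R = PAQ - B."""
    PAQ = matmul(matmul(P, A), Q)
    R = [[v - b for v, b in zip(rv, rb)] for rv, rb in zip(PAQ, B)]
    G = [[2 * g + lam for g in row] for row in matmul(matmul(R, transpose(Q)), transpose(A))]
    H = [[2 * h + mu for h in row] for row in matmul(matmul(transpose(A), transpose(P)), R)]
    return G, H


def exact_steps(P, Q, A, B, lam, mu):
    """Exact line-search steps along -G (for P) and -H (for Q), before projection.
    Assumes GAQ and PAH are nonzero (otherwise the step is undefined)."""
    G, H = gradients(P, Q, A, B, lam, mu)
    eta_P = sq_norm(G) / (2 * sq_norm(matmul(matmul(G, A), Q)))
    eta_Q = sq_norm(H) / (2 * sq_norm(matmul(matmul(P, A), H)))
    return eta_P, eta_Q, G, H


A = [[1, 0], [1, 1], [0, 1]]
B = [[0, 1], [1, 1], [1, 0]]
P = [[1 / 3] * 3 for _ in range(3)]
Q = [[0.5] * 2 for _ in range(2)]
eta_P, eta_Q, G, H = exact_steps(P, Q, A, B, 0.1, 0.1)


def f_along_P(eta):
    """f(P - eta * G), without projection."""
    return objective([[p - eta * g for p, g in zip(rp, rg)] for rp, rg in zip(P, G)],
                     Q, A, B, 0.1, 0.1)


print(f_along_P(0.5 * eta_P) > f_along_P(eta_P) < f_along_P(1.5 * eta_P))  # True
```

Each exact step costs a few extra matrix products per iteration, which is what motivates reusing the steps across iterations.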

• But the steps change slowly across iterations.

[Figure: step sizes (on the order of 10^-4, e.g., 3·10^-4 and 5·10^-4) over up to about 3·10^4 iterations]

• BiG-Align-Skip: computes the η's only every m (= 500) iterations


Experimental Setup

• Implementation: Matlab
• Dataset: IMDB movie-genre graph and subgraphs (1027 movies × 27 genres)
• Setup: random permutations (ground truth); noise level 0-20% (to simulate real-world applications)
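A hypothetical sketch of this setup in pure Python. The slides only state "random permutations" and a noise level of 0-20%; modeling noise as flipping that fraction of the entries (adding or removing edges) is our assumption, as are the function names and the accuracy measure (fraction of users whose most likely match is the true one).

```python
import random


def make_test_pair(A, noise, seed=0):
    """Hypothetical test-pair generator: shuffle the users (rows) and groups
    (columns) of A, then flip round(noise * n * m) randomly chosen entries."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    row_perm = list(range(n))
    col_perm = list(range(m))
    rng.shuffle(row_perm)
    rng.shuffle(col_perm)
    B = [[A[r][c] for c in col_perm] for r in row_perm]   # B[i] comes from A[row_perm[i]]
    cells = [(i, j) for i in range(n) for j in range(m)]
    for i, j in rng.sample(cells, round(noise * n * m)):
        B[i][j] = 1 - B[i][j]                              # add or remove one edge
    return B, row_perm, col_perm


def accuracy(P, row_perm):
    """Fraction of B's users whose most likely match in P is the true user of A."""
    hits = 0
    for i, row in enumerate(P):
        best = max(range(len(row)), key=lambda j: row[j])
        if best == row_perm[i]:
            hits += 1
    return hits / len(P)


A = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0]]
B, row_perm, col_perm = make_test_pair(A, noise=0.2, seed=1)
P_true = [[1.0 if j == row_perm[i] else 0.0 for j in range(len(A))] for i in range(len(A))]
print(accuracy(P_true, row_perm))  # 1.0
```

Because the generator returns the permutations it applied, any P produced by an alignment method can be scored against the ground truth.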

State-of-the-art (BACKGROUND)

① Umeyama’s algorithm [Umeyama88]: SVD-based
② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach
③ Net-Align [Bayati+09]: belief propagation


BiG-Align: Accuracy vs. Runtime

[Figure: accuracy vs. runtime of Umeyama, NetAlign, NMF-based, BiG-Align-skip and BiG-Align-exact; marker size related to graph size]

BiG-Align improves both speed and accuracy.

BiG-Align: Accuracy w.r.t. noise

[Figure: accuracy vs. noise level of BiG-Align-exact, BiG-Align-skip, NMF-based, NetAlign-deg, NetAlign-full and Umeyama]

BiG-Align improves the accuracy for almost all levels of noise.


Algorithm: Uni-Align (DETAILS)

Each graph is summarized by a node-feature matrix with n nodes and d features (node degree, clustering coefficient, …). Then

$\min_P \|PAQ - B\|_F^2$

has a closed-form solution $P = g^*(A, B, S, U)$, computed from the SVD $A = U S V^T$ in $O(n d^2)$ time.
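The slide gives only the shape of the closed form. One way to obtain it, assuming Q = I because both feature matrices share the same d features (our assumption): the least-squares minimizer of $\|PA - B\|_F^2$ over unconstrained P is $P = BA^+$, and with the thin SVD $A = USV^T$ this equals $BA^TUS^{-2}U^T$, which depends only on A, B, S and U, consistent with $g^*(A, B, S, U)$. The hypothetical pure-Python sketch below swaps the SVD for the normal equations $A^+ = (A^TA)^{-1}A^T$, which give the same result when A has full column rank.

```python
def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small d x d matrices only)."""
    d = len(M)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(d)]
           for i, row in enumerate(M)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def uni_align_sketch(A, B):
    """P = B A^+ with A^+ = (A^T A)^{-1} A^T; A, B are n x d node-feature matrices."""
    At = transpose(A)
    A_pinv = matmul(invert(matmul(At, A)), At)   # d x n
    return matmul(B, A_pinv)                     # n x n, generally dense


A = [[1, 0], [0, 1], [1, 1], [2, 1]]             # 4 nodes x 2 features
B = [A[2], A[0], A[3], A[1]]                     # the same nodes, shuffled
P = uni_align_sketch(A, B)
PA = matmul(P, A)
print(max(abs(x - y) for rx, ry in zip(PA, B) for x, y in zip(rx, ry)) < 1e-9)  # True
```

P reproduces B exactly but is generally dense and not itself a permutation, which fits the slides' pipeline of following Uni-Align with Net-Align. Note also that forming the full n × n P costs more than the O(n d²) SVD.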


Uni-Align

• Dataset: Facebook friendship graph (64K users)
• Setup: the uni-partite graph is converted to a bi-partite (node-feature) graph by feature extraction: node degree, egonet degree, edges in egonet, mean degree of the node’s neighbors

[Figure: a node’s egonet]
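A hypothetical pure-Python sketch of this feature extraction, assuming an undirected graph stored as a dict of neighbor sets. The egonet is the node plus its neighbors; reading "egonet degree" as the number of edges leaving the egonet is our assumption. Stacking the rows gives the n × d node-feature matrix (here d = 4) that Uni-Align aligns.

```python
def egonet_features(adj):
    """adj: dict node -> set of neighbors (undirected, no self-loops).
    Returns node -> [degree, egonet degree, edges in egonet, mean neighbor degree]."""
    features = {}
    for u, nbrs in adj.items():
        ego = nbrs | {u}                                   # the node plus its neighbors
        # edges with both endpoints in the egonet (each seen twice, hence // 2)
        inside = sum(1 for v in ego for w in adj[v] if w in ego) // 2
        # edges with exactly one endpoint in the egonet (assumed "egonet degree")
        leaving = sum(1 for v in ego for w in adj[v] if w not in ego)
        mean_nbr = sum(len(adj[v]) for v in nbrs) / len(nbrs) if nbrs else 0.0
        features[u] = [len(nbrs), leaving, inside, mean_nbr]
    return features


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = {v: set() for e in edges for v in e}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
feats = egonet_features(adj)
print(feats[0])  # [2, 1, 3, 2.5]
```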

Uni-Align: Accuracy vs. Runtime

Uni-Align, followed by Net-Align, is more accurate and faster than other approaches.

[Figure: accuracy vs. runtime of NMF-based, NetAlign, Umeyama and Uni-Align]

Uni-Align: Runtime

Uni-Align is 2x to 31,700x faster, depending on graph size.

[Figure: runtime of Umeyama, Uni-Align, NMF-based, NetAlign-deg and NetAlign-full]

Conclusions

• Formulation: new problem / constraints
• Algorithms:
  - BiG-Align: optimized alternating projected gradient descent
  - Uni-Align: alignment for uni-partite graphs
• Evaluations: more accurate and efficient

Beyond BiG-Align: Multi-way Linkage

(1) All build upon BiG-Align
(2) Led to 7 patents

S1: Dynamic Graph Linkage
S2: Community-level Linkage
S3: Heterogeneous Graph Linkage
S4: Multi-relational DB Linkage

Thank you!

http://www.cs.cmu.edu/~dkoutra
danai@cs.cmu.edu
