School of Computer Science, Carnegie Mellon University
BiG-Align: Fast Bipartite Graph Alignment
Danai Koutra Hanghang Tong David Lubensky
IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA
Danai Koutra (CMU) 2
Can we identify users across social networks?
Same or “similar” users?
More applications?
• protein-protein alignment
• chemical compound comparison
• IR: synonym extraction
• link prediction & viral marketing
• optical character recognition
• structure matching in DB
• wiki translation
RoadMap
• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions
Problem Definition
INPUT: A, B (rows: users, columns: groups)

A =
1 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
1 1 0 1

B =
1 1 0 0 0
0 1 0 1 0
0 0 1 1 1
0 0 0 1 0
0 0 0 1 0

OUTPUT: P and Q (permutation matrices) that minimize $\|PAQ - B\|_F^2$
P (users): maps the users of A to the users of B
Q (groups): maps the groups of A to the groups of B
PAQ: permutation of the users/groups in A

Graph isomorphism: HARD (P or NP-complete?)
Subgraph isomorphism: NP-complete
And now what? Constraints / relaxations
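To make the objective concrete, here is a minimal sketch in plain Python that evaluates $\|PAQ - B\|_F^2$ for permutation matrices. The toy matrix and the helper names (`matmul`, `frobenius_sq`, `permutation_matrix`, `objective`) are illustrative assumptions, not taken from the slides.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def frobenius_sq(X, Y):
    """Squared Frobenius norm of X - Y."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def permutation_matrix(perm):
    """Row i of the result has a single 1 in column perm[i]."""
    n = len(perm)
    return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]

def objective(P, A, Q, B):
    """Alignment cost ||PAQ - B||_F^2."""
    return frobenius_sq(matmul(matmul(P, A), Q), B)

# Toy graph: 3 users x 2 groups (illustrative values, not from the slides).
A = [[1, 0],
     [1, 1],
     [0, 1]]
P_true = permutation_matrix([2, 0, 1])   # relabels the users
Q_true = permutation_matrix([1, 0])      # relabels the groups
B = matmul(matmul(P_true, A), Q_true)    # B is a relabeled copy of A

cost_true = objective(P_true, A, Q_true, B)
cost_identity = objective(permutation_matrix([0, 1, 2]), A,
                          permutation_matrix([0, 1]), B)
print(cost_true, cost_identity)
```

The correct pair of permutations gives zero cost, while leaving A unpermuted gives a positive cost. Searching over all permutations is what makes the exact problem hard, which motivates the relaxations.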
Problem Definition: constraints
INPUT: A, B
OUTPUT: P, Q correspondence matrices that minimize $\|PAQ - B\|_F^2$
CONSTRAINTS:
(a) $P_{ij}$, $Q_{ij}$ = probabilities (not 1-1 mapping)
(b) sparse matrices P and Q (more efficient for large-scale graphs)
[Figure: matrices A and B (users u, groups g); P (users) and Q (groups) map A to B]
What’s different?
BiG-Align vs. other approaches:
• Focus on bipartite graphs
• New optimization problem/constraints
The hope is: the specific graph structure will lead to more accurate graph alignment.
Why bipartite graphs?
(1) ubiquitous: e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs
(2) coupled alignment: individual (nodes) & community-level (communities)
(3) conversion of unipartite graph to bipartite --> clustering + (2)
(4) general formulation:
(a) match clouds of points (point-feature graph)
(b) tensors (e.g. time-evolving, or other 3rd dimension)
[Figure: tensor with axes users, email, time]
BiG-Align: algorithm (DETAILS)
• Alternating, projected gradient descent, repeated until convergence
• Probabilistic constraint
• Sparsity constraint:
$\min_{P,Q} f(P,Q) = \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$
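The loop below is a minimal sketch of alternating, projected gradient descent on the penalized objective, written in plain Python. Several choices are assumptions rather than details from the slides: the projection is a simple clip of every entry to [0, 1], P and Q start from uniform values, the step size `eta` is constant, and a fixed iteration budget stands in for the convergence test. All function names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def residual(P, A, Q, B):
    """R = PAQ - B."""
    PAQ = matmul(matmul(P, A), Q)
    return [[x - b for x, b in zip(rx, rb)] for rx, rb in zip(PAQ, B)]

def objective(P, A, Q, B, lam, mu):
    """f = ||PAQ - B||_F^2 + lam * sum(P) + mu * sum(Q)."""
    fit = sum(r * r for row in residual(P, A, Q, B) for r in row)
    return fit + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))

def projected_step(X, G, eta, penalty):
    """Gradient step on X, then clip to [0, 1] (assumed projection)."""
    return [[min(1.0, max(0.0, x - eta * (2.0 * g + penalty)))
             for x, g in zip(rx, rg)] for rx, rg in zip(X, G)]

def big_align_sketch(A, B, lam=0.1, mu=0.1, eta=0.005, iters=200):
    n_a, m_a, n_b, m_b = len(A), len(A[0]), len(B), len(B[0])
    P = [[1.0 / n_a] * n_a for _ in range(n_b)]   # users of B x users of A
    Q = [[1.0 / m_b] * m_b for _ in range(m_a)]   # groups of A x groups of B
    history = [objective(P, A, Q, B, lam, mu)]
    for _ in range(iters):  # fixed budget stands in for "until convergence"
        # Update P with Q fixed: grad_P = 2 R (AQ)^T + lam
        G = matmul(residual(P, A, Q, B), transpose(matmul(A, Q)))
        P = projected_step(P, G, eta, lam)
        # Update Q with P fixed: grad_Q = 2 (PA)^T R + mu
        G = matmul(transpose(matmul(P, A)), residual(P, A, Q, B))
        Q = projected_step(Q, G, eta, mu)
        history.append(objective(P, A, Q, B, lam, mu))
    return P, Q, history

# Toy input (illustrative): B is A with users and groups relabeled.
A = [[1, 0], [1, 1], [0, 1]]
B = [[1, 0], [0, 1], [1, 1]]
P, Q, history = big_align_sketch(A, B)
print(history[0], history[-1])
```

With a small enough constant step the objective does not increase from one iteration to the next, but the run may stop in a local minimum, so the starting point and the step size both matter.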
BiG-Align: Optimizations (DETAILS)
Alternating, projected gradient descent, repeated until convergence
Optimization 1: Structurally equivalent nodes (DETAILS)
• Aggregation to super-nodes
[Figure: Graph A]
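A minimal sketch of the aggregation step, assuming that two users are structurally equivalent when their rows in the adjacency matrix are identical (same set of groups). The function name and return format are hypothetical. How the merged nodes are weighted inside the optimization is not covered here.

```python
from collections import defaultdict

def aggregate_structurally_equivalent(A):
    """Merge rows of A with identical connection patterns into super-nodes.

    Returns (reduced, members): `reduced` has one row per distinct pattern,
    and members[i] lists the original row indices merged into super-node i.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(A):
        groups[tuple(row)].append(idx)
    reduced = [list(pattern) for pattern in groups]
    members = list(groups.values())
    return reduced, members

# The users x groups matrix A from the Problem Definition slide.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1]]
reduced, members = aggregate_structurally_equivalent(A)
print(len(A), "users ->", len(reduced), "super-nodes", members)
```

Here the first two users share the same groups and collapse into one super-node, so the alignment runs on a smaller matrix.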
Optimization 2: Initialization of P and Q (DETAILS)
• Why is the initialization important?
[Figure: objective with a global minimum and local minima]
• Social networks are structured: the degree distribution is power-law like.
[Figure: log(degree) vs. ranked nodes]
• Network-inspired initialization
[Figure: initial P built from the ranked user degrees of G_A and G_B: 1-1 matching of the top k nodes (k at the knee of the degree vs. rank-of-node plot), then 1-1 matching of clusters of degrees (cluster 1, cluster 2, …, cluster n)]
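The degree-based initialization can be sketched as follows. This is only an illustration under stated assumptions: `k` is passed in rather than detected from the knee of the degree plot, the remaining nodes are split into `n_clusters` contiguous chunks of the ranked list (the slides do not say how the clusters are formed), and nodes in matched clusters share uniform probabilities. The names `degree_init` and `chunk` are hypothetical.

```python
def chunk(items, n_chunks, c):
    """Return the c-th of n_chunks nearly equal contiguous slices of items."""
    lo = c * len(items) // n_chunks
    hi = (c + 1) * len(items) // n_chunks
    return items[lo:hi]

def degree_init(deg_a, deg_b, k, n_clusters):
    """Initial P: rows are users of B, columns are users of A."""
    k = min(k, len(deg_a), len(deg_b))
    order_a = sorted(range(len(deg_a)), key=lambda j: -deg_a[j])
    order_b = sorted(range(len(deg_b)), key=lambda i: -deg_b[i])
    P = [[0.0] * len(deg_a) for _ in deg_b]
    # 1-1 matching of the top-k nodes by degree rank.
    for rank in range(k):
        P[order_b[rank]][order_a[rank]] = 1.0
    # 1-1 matching of degree clusters for the remaining nodes.
    rest_a, rest_b = order_a[k:], order_b[k:]
    for c in range(n_clusters):
        cluster_a = chunk(rest_a, n_clusters, c)
        cluster_b = chunk(rest_b, n_clusters, c)
        for i in cluster_b:
            for j in cluster_a:
                P[i][j] = 1.0 / len(cluster_a)
    return P

# Illustrative degree lists (not the values shown in the figure).
deg_a = [5, 1, 9, 2, 2, 1]
deg_b = [2, 8, 1, 4, 1, 2]
P0 = degree_init(deg_a, deg_b, k=2, n_clusters=2)
for row in P0:
    print(row)
```

The result is sparse and each row reads as a probability distribution over candidate matches, which fits both constraints of the problem definition.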
Optimization 3: Steps of gradient descent (DETAILS)
• Constant step: thrashing or slow convergence
• Variable step with line search: strategy for local optimum
$\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P, Q, A, B)$
$\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P, Q, A, B)$
(closed formulas)
• BiG-Align-Exact: computes the steps at every iteration
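The closed formulas themselves are not given on the slide. As a sketch of where such a formula can come from, assume the penalized objective $f = \|PAQ - B\|_F^2 + \lambda \sum P_{ij} + \mu \sum Q_{ij}$, a step along the negative gradient, and no projection during the line search. The symbols $G$, $R$ and $D$ are notation introduced here for illustration.

```latex
% Assumed notation: G = \nabla_P f, \quad R = PAQ - B, \quad D = GAQ
\begin{aligned}
f(P - \eta G,\, Q) &= \|R - \eta D\|_F^2
    + \lambda \sum_{i,j} \left(P_{ij} - \eta G_{ij}\right)
    + \mu \sum_{i,j} Q_{ij} \\
\frac{\partial f}{\partial \eta} &= -2\,\mathrm{tr}(R^T D)
    + 2\eta\,\|D\|_F^2 - \lambda \sum_{i,j} G_{ij} = 0 \\
\eta_P &= \frac{2\,\mathrm{tr}(R^T D) + \lambda \sum_{i,j} G_{ij}}{2\,\|D\|_F^2}
\end{aligned}
```

Because the objective is quadratic in $\eta$ along a fixed direction, the minimizer has this closed form. The step for Q follows the same pattern with the gradient with respect to Q and $\mu$ in place of $\lambda$.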
Optimization 3: Steps of gradient descent (DETAILS)
• But: slow change in the steps
[Figure: step size (η) vs. iterations; η between 10^-4 and 5·10^-4 over up to 3·10^4 iterations]
• BiG-Align-Skip: compute η’s every m (=500) iterations
Experimental Setup
• Implementation: Matlab
• Dataset: IMDB movie-genre graph and subgraphs (1027 movies x 27 genres)
• Setup:
random permutations (ground truth)
noise level: 0 - 20 % (simulate real-world applications)
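The setup can be mimicked with a short generator: permute the rows and columns of A at random, which gives a known ground truth, then add noise. The noise model here, flipping each entry independently with probability `noise`, is an assumption, since the slide only gives the noise level. The function name is hypothetical and the original experiments were run in Matlab.

```python
import random

def permute_with_noise(A, noise, seed=0):
    """Return (B, row_perm, col_perm): a permuted, noisy copy of A.

    B[i][j] starts as A[row_perm[i]][col_perm[j]]; each entry is then
    flipped with probability `noise` (assumed noise model).
    """
    rng = random.Random(seed)
    row_perm = list(range(len(A)))
    col_perm = list(range(len(A[0])))
    rng.shuffle(row_perm)
    rng.shuffle(col_perm)
    B = [[A[r][c] for c in col_perm] for r in row_perm]
    for i in range(len(B)):
        for j in range(len(B[0])):
            if rng.random() < noise:
                B[i][j] = 1 - B[i][j]
    return B, row_perm, col_perm

A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1]]
B, row_perm, col_perm = permute_with_noise(A, noise=0.1, seed=42)
print(row_perm, col_perm)
```

Accuracy can then be measured by comparing the recovered correspondences against `row_perm` and `col_perm`.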
State-of-the-art (BACKGROUND)
① Umeyama’s algorithm [Umeyama88]: SVD-based
② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach
③ Net-Align [Bayati+09]: Belief Propagation
[Labels on slide: Bi-partite, Uni-partite]
BiG-Align: Accuracy vs. Runtime
BiG-Align improves both speed and accuracy.
[Figure: accuracy vs. runtime for Umeyama, NetAlign, NMF-based, BiG-Align-Skip and BiG-Align-Exact; marker size related to graph size]
BiG-Align: Accuracy w.r.t. noise
BiG-Align improves the accuracy for almost all levels of noise.
[Figure: accuracy w.r.t. noise for BiG-Align-Exact, BiG-Align-Skip, NMF-based, NetAlign-deg, NetAlign-full and Umeyama]
Algorithm: Uni-Align (DETAILS)
• Node-feature matrix: n nodes x d features (node degree, clustering coeff, …)
• $\min_{P} \|PAQ - B\|_F^2$, with Q fixed
• P = g*(A, B, S, U): closed-form solution, using the SVD $A = U S V^T$
• Complexity: $O(n \cdot d^2)$
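The form of g* is not spelled out on the slide. One way to arrive at a closed form of this kind, assuming Q is the identity (the d features are in the same order in both graphs) and A has full column rank, is the least-squares solution via the pseudoinverse:

```latex
% Assumptions: Q = I, A (n x d) has full column rank, A = U S V^T is the thin SVD
\begin{aligned}
P^{*} &= \arg\min_{P} \|PA - B\|_F^2 = B A^{+} \\
A^{+} &= V S^{-1} U^T = A^T U S^{-2} U^T \\
P^{*} &= B A^T U S^{-2} U^T
\end{aligned}
```

The last line depends only on A, B, S and U, and the thin SVD of an n x d matrix costs $O(n \cdot d^2)$, which is consistent with the complexity stated on the slide. The exact expression used by Uni-Align may differ from this sketch.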
Uni-Align
• Dataset: Facebook friendship graph (64K users)
• Setup: uni-partite graph --> bi-partite graph, via feature extraction:
node degree
egonet degree
edges in egonet
mean degree of node’s neighbors
[Figure: egonet]
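The four features can be computed from an adjacency list as sketched below. The slides do not define "egonet degree"; this sketch assumes it is the number of edges with exactly one endpoint inside the egonet (the node plus its neighbors). The function name and the toy graph are illustrative.

```python
def egonet_features(adj):
    """adj maps node -> set of neighbors (undirected graph, no self-loops).

    Returns node -> [degree, egonet degree, edges in egonet,
                     mean degree of the node's neighbors].
    """
    features = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        inside = 0    # edge endpoints seen inside the egonet (each edge twice)
        leaving = 0   # edges with exactly one endpoint in the egonet
        for u in ego:
            for w in adj[u]:
                if w in ego:
                    inside += 1
                else:
                    leaving += 1
        degree = len(nbrs)
        mean_nbr = sum(len(adj[u]) for u in nbrs) / degree if degree else 0.0
        features[v] = [degree, leaving, inside // 2, mean_nbr]
    return features

# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
features = egonet_features(adj)
for node, row in sorted(features.items()):
    print(node, row)
```

Stacking these rows gives the n x d node-feature matrix, which plays the role of the bipartite (node-feature) graph aligned by Uni-Align.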
Uni-Align: Accuracy vs. Runtime
Uni-Align, followed by Net-Align, is more accurate and faster than other approaches.
[Figure: accuracy vs. runtime for NMF-based, NetAlign, Umeyama and Uni-Align]
Uni-Align: Runtime
Uni-Align is 2x - 31,700x faster, depending on graph size.
[Figure: runtime of Umeyama, Uni-Align, NMF-based, NetAlign-deg and NetAlign-full]
Conclusions
• Formulation: new problem / constraints
• Algorithms:
BiG-Align: optimized alternating projected gradient descent
Uni-Align: alignment for uni-partite graphs
• Evaluations: more accurate and efficient
Beyond BiG-Align: Multi-way Linkage
(1) All build upon BiG-Align
(2) Led to 7 patents
S1: Dynamic Graph Linkage
S2: Community-level Linkage
S3: Hetero. Graph Linkage
S4: Multi-relational DB Linkage