Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLconf ATL
Carnegie Mellon University
Making Sense of Large Graphs: Summarization and Similarity
MLconf ’14, Atlanta, GA
Danai Koutra Computer Science Department
Carnegie Mellon University
[email protected] http://www.cs.cmu.edu/~dkoutra
Making sense of large graphs
Danai Koutra (CMU) 2
Human Connectome Project
Scalable algorithms and models for understanding massive graphs.
>1.25B users!
Understanding Large Graphs
Part 1: Summarization
79,870 email accounts 288,364 emails
Ever tried visualizing a large graph?
Enron Summary
[Figure: VoG's top near-bipartite cores in the Enron graph: “Ski excursion” (organizers vs. participants) and “Affair” (commenters vs. CC’ed)]
Problem Definition
Given: a graph
Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.
[Koutra, Kang, Vreeken, Faloutsos. SDM’14]
Lady Gaga Fan Club
Main Ideas
Idea 1: Use well-known structures (vocabulary), e.g., stars, cliques, bipartite cores.
Idea 2: Best graph summary → optimal compression (MDL).
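Minimum Description Length
BACKGROUND
Shortest lossless description: $\min_M \; L(M) + L(D \mid M)$, where $L(M)$ = # bits for M and $L(D \mid M)$ = # bits for the data using M.
Example: a line $a_1 x + a_0$ is cheap to describe but leaves errors to encode; a polynomial $a_{10} x^{10} + a_9 x^9 + \dots + a_0$ can fit more closely but costs more bits for M.
Simple & good explanations (~Occam’s razor).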
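To make the two-part code concrete, here is a toy Python sketch of the line vs. polynomial example. Every encoding choice is a hypothetical assumption for illustration only: L(M) charges 32 bits per coefficient, and L(D|M) quantizes each residual to a step δ = 0.5 and writes it with an Elias gamma code. Only the structure, minimizing L(M) + L(D|M) over candidate models, comes from the slide.

```python
import numpy as np

BITS_PER_COEFF = 32  # hypothetical: fixed-precision cost of one coefficient


def gamma_bits(z: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer z."""
    return 2 * (z.bit_length() - 1) + 1


def residual_bits(residuals, delta: float) -> int:
    """L(D|M): quantize each residual to a multiple of delta, zigzag-map it
    to a non-negative integer and charge its Elias gamma code length."""
    total = 0
    for r in residuals:
        q = int(round(float(r) / delta))      # quantized residual
        z = 2 * q if q >= 0 else -2 * q - 1   # zigzag: 0, -1, 1, -2 -> 0, 1, 2, 3
        total += gamma_bits(z + 1)            # gamma codes need z + 1 >= 1
    return total


def description_length(x, y, degree: int, delta: float = 0.5) -> int:
    """Two-part code L(M) + L(D|M) for a least-squares polynomial fit."""
    poly = np.polynomial.Polynomial.fit(x, y, degree)
    model_bits = BITS_PER_COEFF * (degree + 1)
    return model_bits + residual_bits(y - poly(x), delta)


def best_degree(x, y, degrees) -> int:
    """MDL model selection: the degree with the shortest total description."""
    return min(degrees, key=lambda d: description_length(x, y, d))


x = np.arange(20, dtype=float)
y = 2 * x + 1 + np.where(x % 2 == 0, -0.1, 0.1)  # a line plus small fixed "noise"
print(best_degree(x, y, [1, 10]))  # prints 1
```

With these assumed costs the degree-10 fit pays 352 bits for its coefficients alone, while the line pays 64 bits plus 1 bit per (tiny) residual, so the simpler model wins.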
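Formally: Minimum Graph Description
Given: a graph G with adjacency matrix A, and a vocabulary Ω
Find: a model M such that $\min L(G, M) = \min \{ L(M) + L(E) \}$, where the error matrix $E = M \oplus A$ marks the cells in which the model M and the adjacency A disagree.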
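A crude Python sketch of L(G, M) = L(M) + L(E) for undirected graphs, under a hypothetical encoding (not VoG's actual one, which is more refined): each structure pays 1 bit for its type, log2(n) bits for its size and log2 C(n, size) bits for its node set; the error E = M ⊕ A is charged as a count plus a choice of which node pairs are wrong. Only cliques and stars are modeled here.

```python
from itertools import combinations
from math import comb, log2


def model_edges(structures):
    """Undirected edges implied by the model's structures (the cells of M)."""
    edges = set()
    for kind, nodes in structures:
        if kind == "clique":
            edges |= {frozenset(p) for p in combinations(nodes, 2)}
        elif kind == "star":
            hub, *spokes = nodes
            edges |= {frozenset((hub, s)) for s in spokes}
        else:
            raise ValueError(f"unknown structure type: {kind}")
    return edges


def total_bits(n, graph_edges, structures):
    """Hypothetical L(M) + L(E) for an undirected graph on nodes 0..n-1."""
    # L(M): per structure, 1 bit for its type, log2(n) bits for its size and
    # log2 C(n, size) bits to say which nodes it covers.
    l_model = sum(1 + log2(n) + log2(comb(n, len(nodes)))
                  for _, nodes in structures)
    # E = M xor A: node pairs the model gets wrong (extra or missing edges).
    errors = model_edges(structures) ^ graph_edges
    pairs = n * (n - 1) // 2
    # L(E): log2(pairs + 1) bits for |E|, then log2 C(pairs, |E|) for which pairs.
    l_error = log2(pairs + 1) + log2(comb(pairs, len(errors)))
    return l_model + l_error


n = 20
clique = tuple(range(6))
A = {frozenset(p) for p in combinations(clique, 2)}  # a 6-clique: 15 edges
print(total_bits(n, A, []) > total_bits(n, A, [("clique", clique)]))  # prints True
```

Describing the 15 edges as a single clique structure is far cheaper than listing them all as errors of an empty model, which is the intuition behind summarizing with vocabulary structures.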
VoG: Overview
[Figure: graph ≈ candidate structures; pick best (with some criterion, an argmin) → Summary]
Q: Which structures to pick?
A: Those that minimize the description length of G.
For a set S of candidate structures there are $2^{|S|}$ combinations to consider.
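Checking all $2^{|S|}$ subsets is only feasible for tiny candidate sets. The Python sketch below contrasts exhaustive search with a simple greedy pass: visit candidates from highest local quality down and keep a structure only if the total cost drops. The VoG paper proposes its own selection heuristics; this greedy loop and `toy_cost` (3 bits per structure plus 1 bit per unexplained edge) are simplified, hypothetical stand-ins.

```python
from itertools import combinations


def exhaustive(candidates, cost):
    """Try all 2^|S| subsets of the candidate structures S."""
    best, best_cost = (), cost(())
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            c = cost(subset)
            if c < best_cost:
                best, best_cost = subset, c
    return best, best_cost


def greedy(candidates, cost, quality):
    """Visit candidates from highest quality down; keep one only if the total cost drops."""
    chosen, current = [], cost(())
    for s in sorted(candidates, key=quality, reverse=True):
        trial = cost(tuple(chosen) + (s,))
        if trial < current:
            chosen.append(s)
            current = trial
    return tuple(chosen), current


# Hypothetical toy setting: ten abstract edges and four candidate structures.
EDGES = frozenset(range(10))
CANDIDATES = [frozenset(range(5)), frozenset(range(5, 10)),
              frozenset({0}), frozenset({4, 5})]


def toy_cost(model):
    """3 bits per structure plus 1 bit per edge that no structure explains."""
    covered = frozenset().union(*model)
    return 3 * len(model) + len(EDGES - covered)


print(exhaustive(CANDIDATES, toy_cost)[1], greedy(CANDIDATES, toy_cost, len)[1])  # prints 6 6
```

The exhaustive search evaluates 16 subsets here, the greedy pass only 4 additions; on this toy input both reach the same summary, which is not guaranteed in general.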
Runtime
VoG is near-linear in the number of edges of the input graph.
1.25B users!
Understanding a wiki graph
Nodes: wiki editors
Edges: co-edited
I don’t see anything! :(
Wiki Controversial Article
Stars: admins, bots, heavy users
Bipartite cores: edit wars (Kiev vs. Kyiv; vandals vs. admins)
VoG vs. other methods
| | VoG | Bounded-Error Summarization [Navlakha+’08] | Motif Simplification [Dunne+’13] | Clustering Methods | Cross-Associations [Chakrabarti+’03] |
|---|---|---|---|---|---|
| Variety of Structures | ✔ | ✗ | ✗ | ✗ | ✗ |
| Important Structures | ✔ | ✗ | ✗ | ✗ | ✗ |
| Low Complexity | ✔ | ✗ | ✗ | ✔(?) | ✔ |
| Visualization | ✔ | ✔ | ✔ | ✗ | ✗ |
| Graph Summary | ✔ | ✔ | ✔ | ✗ | ✗ |
Stars, cliques, near-cliques
VoG: summary
• Focus on important, possibly-overlapping structures with known graph-theoretic properties
www.cs.cmu.edu/~dkoutra/SRC/vog.tar
Understanding Large Graphs
Part 2: Similarities
friendship graph ≈ wall posts graph?
1. Behavioral Patterns
VS.
Are the graphs / behaviors similar?
Why graph similarity?
2. Classification
3. Temporal anomaly detection
4. Intrusion detection
[Figure: graphs for Day 1 to Day 4, compared between consecutive days as sim1, sim2, sim3]
Problem Definition: Graph Similarity
• Given: (i) 2 graphs with the same nodes and different edge sets (ii) node correspondence
• Find: similarity score $s \in [0,1]$
Obvious solution?
Edge Overlap (EO): # of common edges (normalized or not)
… but “barbell”…
EO(B10,mB10) == EO(B10,mmB10)
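The barbell problem can be reproduced in a few lines of Python. Assumptions: B10 is two 10-node cliques joined by a single bridge edge, mB10 drops one edge inside a clique and mmB10 drops the bridge (our reading of the figure); the normalized variant uses Jaccard overlap, which is just one possible normalization.

```python
from itertools import combinations


def barbell(k):
    """Two k-cliques (nodes 0..k-1 and k..2k-1) joined by the bridge (k-1, k)."""
    left = {frozenset(p) for p in combinations(range(k), 2)}
    right = {frozenset(p) for p in combinations(range(k, 2 * k), 2)}
    return left | right | {frozenset((k - 1, k))}


def edge_overlap(ea, eb, normalized=False):
    """Number of common edges, optionally divided by the size of the union."""
    common = len(ea & eb)
    return common / len(ea | eb) if normalized else common


b10 = barbell(10)
m_b10 = b10 - {frozenset((0, 1))}    # drop one edge inside a clique
mm_b10 = b10 - {frozenset((9, 10))}  # drop the bridge
print(edge_overlap(b10, m_b10), edge_overlap(b10, mm_b10))  # prints 90 90
```

Both variants lose exactly one of the 91 edges, so edge overlap scores them identically even though removing the bridge disconnects the graph.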
What makes a similarity function good?
• Properties:
  - Intuitive, e.g., properties like “edge importance” and “weight awareness” (✗ for edge overlap)
  - Scalable
MAIN IDEA: DELTACON
① Find the pairwise node influence, SA & SB.
② Find the similarity between SA & SB.
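DETAILS
How? Using Belief Propagation, where A is the adjacency matrix and D the diagonal degree matrix:
$S = [I + \varepsilon^2 D - \varepsilon A]^{-1}$
INTUITION: attenuating neighboring influence. For small ε:
$S \approx [I - \varepsilon A]^{-1} = I + \varepsilon A + \varepsilon^2 A^2 + \dots$
(1-hop, 2-hop, … terms)
Note: $\varepsilon > \varepsilon^2 > \dots$ since $0 < \varepsilon < 1$, so farther neighbors have less influence.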
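A dense NumPy sketch of the affinity matrix S (O(n³) time, so only for small graphs). The default ε = 1/(1 + max degree) is an assumption: it keeps 0 < ε < 1 and makes I + ε²D − εA strictly diagonally dominant, hence invertible. The small-ε check illustrates the expansion above: with ε = 0.01, S agrees with I + εA up to the O(ε²) 2-hop terms.

```python
import numpy as np


def affinity_matrix(A, eps=None):
    """S = [I + eps^2 D - eps A]^{-1} for a symmetric 0/1 adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)               # diagonal of the degree matrix D
    if eps is None:
        # Assumed default: 0 < eps < 1 and the matrix stays strictly
        # diagonally dominant, hence invertible.
        eps = 1.0 / (1.0 + degrees.max())
    M = np.eye(len(A)) + eps**2 * np.diag(degrees) - eps * A
    return np.linalg.inv(M)


path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
eps = 0.01
S = affinity_matrix(path, eps)
# For small eps, S ~ I + eps*A: the 2-hop terms are O(eps^2).
print(np.allclose(S, np.eye(3) + eps * path, atol=1e-3))  # prints True
```

Node 0 influences its neighbor 1 (about ε) much more than node 2, which it reaches only in 2 hops (about ε²).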
OUR SOLUTION: DELTACON
DETAILS
② Find the similarity between SA & SB via the “Root” Euclidean Distance:
$\mathrm{sim}(S_A, S_B) = \dfrac{1}{1 + \sqrt{\sum_{i,j} \left(\sqrt{s_{A,ij}} - \sqrt{s_{B,ij}}\right)^2}}$
… but O(n²) …
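The naive similarity step as a NumPy sketch: compute both affinity matrices densely, take entrywise square roots, and map the Root Euclidean Distance d to 1/(1 + d). Sharing one ε for both graphs (from the larger maximum degree) is our assumption, made so the two matrices are comparable. This is the O(n²)-memory version the slide warns about.

```python
import numpy as np


def affinity(A, eps):
    """S = [I + eps^2 D - eps A]^{-1} (dense: O(n^3) time, O(n^2) memory)."""
    A = np.asarray(A, dtype=float)
    M = np.eye(len(A)) + eps**2 * np.diag(A.sum(axis=1)) - eps * A
    return np.linalg.inv(M)


def deltacon_naive(A1, A2):
    """sim = 1 / (1 + RootED(S_A, S_B)) for two graphs on the same node set."""
    A1 = np.asarray(A1, dtype=float)
    A2 = np.asarray(A2, dtype=float)
    # Assumption: one eps for both graphs, from the larger max degree.
    eps = 1.0 / (1.0 + max(A1.sum(axis=1).max(), A2.sum(axis=1).max()))
    # Entries of S are non-negative; clip tiny negative round-off before sqrt.
    s1 = np.sqrt(np.clip(affinity(A1, eps), 0.0, None))
    s2 = np.sqrt(np.clip(affinity(A2, eps), 0.0, None))
    root_ed = np.sqrt(((s1 - s2) ** 2).sum())
    return 1.0 / (1.0 + root_ed)


triangle = np.ones((3, 3)) - np.eye(3)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(deltacon_naive(triangle, triangle))  # prints 1.0
```

Identical graphs score exactly 1; a triangle compared with a 3-node path scores strictly between 0 and 1.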
Faster?
O(m1 + m2) in the paper :) (m1, m2: the numbers of edges of the two graphs)
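One way to avoid the n × n matrix, sketched here under stated assumptions and as we understand the paper's idea, is to compute affinities between nodes and g random node groups: solve [I + ε²D − εA] X = S0, where column k of S0 is the indicator of group k, so X is only n × g. We solve it with a plain fixed-point iteration (our choice, not necessarily the paper's solver); with a sparse A each iteration costs O(m·g). The group count `g`, the `seed`, and the tolerance are hypothetical parameters.

```python
import numpy as np


def group_affinities(A, groups, g, eps, tol=1e-12, max_iter=10_000):
    """Solve [I + eps^2 D - eps A] X = S0, where column k of S0 marks group k.

    Fixed-point iteration X <- S0 + eps*A@X - eps^2*D@X. Each step costs
    O(m*g) with a sparse A (e.g. a scipy.sparse matrix) and O(n^2*g) if dense.
    """
    n = A.shape[0]
    d = np.asarray(A.sum(axis=1)).ravel()   # node degrees
    S0 = np.zeros((n, g))
    S0[np.arange(n), groups] = 1.0          # one-hot group membership
    X = S0.copy()
    for _ in range(max_iter):
        X_new = S0 + eps * (A @ X) - eps**2 * d[:, None] * X
        if np.abs(X_new - X).max() < tol:
            return X_new
        X = X_new
    raise RuntimeError("fixed-point iteration did not converge")


def deltacon_grouped(A1, A2, g=2, seed=0):
    """Approximate DeltaCon similarity using g random node groups."""
    n = A1.shape[0]
    # Both graphs must share the same (hypothetical, seeded) random grouping.
    groups = np.random.default_rng(seed).integers(0, g, size=n)
    dmax = max(np.asarray(A1.sum(axis=1)).max(), np.asarray(A2.sum(axis=1)).max())
    eps = 1.0 / (1.0 + dmax)
    x1 = np.sqrt(np.clip(group_affinities(A1, groups, g, eps), 0.0, None))
    x2 = np.sqrt(np.clip(group_affinities(A2, groups, g, eps), 0.0, None))
    return 1.0 / (1.0 + np.sqrt(((x1 - x2) ** 2).sum()))


path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.ones((3, 3)) - np.eye(3)
print(deltacon_grouped(path, path))  # prints 1.0
```

With ε = 1/(1 + max degree) the iteration is a contraction, so it converges; the grouping is drawn once from a fixed seed so both graphs are projected onto the same groups.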
Temporal Anomaly Detection
• Nodes: email accounts of employees
• Edges: email exchange
[Figure: daily graphs Day 1 to Day 5, compared between consecutive days as sim1 to sim4]
[Figure: similarity between consecutive days; annotated event: Feb 4: Lay resigns]
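The temporal pipeline compares each day with the next and looks for dips. In this Python sketch, `consecutive_similarities` accepts any similarity function (DeltaCon in the talk; a simple edge-set Jaccard here so the example is self-contained), and `flag_drops` marks dips more than k median absolute deviations below the median. The MAD rule and k = 3 are hypothetical choices; the slide only shows the similarity curve.

```python
from statistics import median


def consecutive_similarities(graphs, sim):
    """sim_t = sim(G_t, G_{t+1}) for each pair of consecutive days."""
    return [sim(g1, g2) for g1, g2 in zip(graphs, graphs[1:])]


def flag_drops(sims, k=3.0):
    """Indices t whose similarity falls more than k MADs below the median.

    A flagged t means the transition from day t to day t+1 looks anomalous.
    """
    med = median(sims)
    mad = median(abs(s - med) for s in sims) or 1e-12  # avoid dividing by zero
    return [t for t, s in enumerate(sims) if (med - s) / mad > k]


def jaccard(ea, eb):
    """Edge-set Jaccard similarity, a stand-in for DeltaCon in this example."""
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0


print(flag_drops([0.9, 0.91, 0.89, 0.9, 0.5, 0.9]))  # prints [4]
```

The median/MAD pair is robust to the very dips being searched for, so a single sharp drop does not inflate the threshold.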
Brain-Connectivity Graph Clustering
• 114 brain graphs
  - Nodes: 70 cortical regions
  - Edges: connections
• Attributes: gender, IQ, age…
t-test p-value = 0.0057
Graph Understanding via …
• … Summarization …
  - VoG: to spot the important graph structures
• … Comparison …
  - DeltaCon: to find the similarity between aligned networks
  - BiG-Align, Uni-Align: to align bi/uni-partite graphs efficiently
Thank you!
www.cs.cmu.edu/~dkoutra/pub.htm [email protected]