Fast counting of triangles in large networks without counting: Algorithms and laws
CHARALAMPOS E. TSOURAKAKIS, SCHOOL OF COMPUTER SCIENCE, CARNEGIE MELLON UNIVERSITY
ICDM, Dec. '08
C. E. Tsourakakis
Triangle related problems
Given an undirected, simple graph G(V,E), a triangle is a set of three vertices such that any two of them are connected by an edge of the graph.
Related problems (listed in order of increasing generality):
  Decide if a graph is triangle-free.
  Count the total number of triangles Δ(G).
  Count the number of triangles Δ(v) that vertex v participates in.
  List the triangles that each vertex v participates in.
Our focus: the two counting problems, Δ(G) and Δ(v).
Why is Triangle Counting important? From the Graph Mining Perspective
Clustering coefficient
Transitivity ratio
Social Network Analysis fact: “Friends of friends are friends” [WF94]
Other applications include:
  Hidden Thematic Structure of the Web [EM02]
  Motif Detection, e.g. biological networks [YPSB05]
  Web Spam Detection [BPCG08]
[Figure: a triangle on vertices A, B, C]
Outline
Related Work
Proposed Method
  Theorems
  Algorithms
  Explaining efficiency
Experiments
Triangle-related Laws
Triangles in Kronecker Graphs
Conclusions
Related Work
Dense graphs
                    Fast              Low space
  Time complexity   $O(n^{2.37})$     $O(n^3)$
  Space complexity  $O(n^2)$          $O(m) = O(n^2)$
Sparse graphs
                    Fast                                  Low space
  Time complexity   $O(m^{0.7} n^{1.2} + n^{2+o(1)})$     e.g. $O(n\,d_{\max}^2)$
  Space complexity  $O(n^2)$ (eventually)                 $O(m)$
Theorem [EigenTriangle]
Theorem 1
Δ(G) = # triangles in graph G(V,E)
$\lambda_1, \lambda_2, \ldots, \lambda_{|V|}$ = eigenvalues of adjacency matrix $A_G$
$\Delta(G) = \frac{1}{6} \sum_{i=1}^{|V|} \lambda_i^3$
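As a sanity check of Theorem 1, the following minimal sketch compares the spectral formula with a brute-force count. It assumes NumPy and a dense eigensolver, which is only practical for small graphs. The 5-vertex example graph and all function names are illustrative and do not come from the talk.

```python
import numpy as np


def triangles_from_spectrum(adj):
    """Total triangle count via Delta(G) = (1/6) * sum_i lambda_i^3."""
    lam = np.linalg.eigvalsh(np.asarray(adj, dtype=float))
    return float(np.sum(lam ** 3) / 6.0)


def triangles_brute_force(adj):
    """Reference count: test every vertex triple directly."""
    n = len(adj)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        for c in range(b + 1, n)
        if adj[a][b] and adj[b][c] and adj[a][c]
    )


# Hypothetical example: triangles {0,1,2} and {1,2,3}, plus a pendant vertex 4.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
ADJ = [[0] * 5 for _ in range(5)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

spectral = triangles_from_spectrum(ADJ)
exact = triangles_brute_force(ADJ)
```

Both values agree up to floating-point rounding, because the sum of the cubed eigenvalues equals the trace of $A_G^3$, which counts each triangle six times.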
Theorem [EigenTriangleLocal]
Theorem 2
Δ(i) = # triangles vertex i participates in
$u_j$ = j-th eigenvector of $A_G$, $u_{ij}$ = i-th entry of $u_j$
$\Delta(i) = \frac{1}{2} \sum_{j=1}^{|V|} \lambda_j^3 u_{ij}^2$
[Figure: example vertex i with Δ(i) = 2]
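The local formula of Theorem 2 can be sketched in the same way. This is a minimal illustration assuming NumPy and a dense eigensolver. The optional parameter `k` (keep only the `k` eigenvalues of largest magnitude) and the example graph are assumptions introduced here, not part of the talk.

```python
import numpy as np


def local_triangles(adj, k=None):
    """Per-vertex triangle counts from eigenvalues and eigenvectors.

    With k=None all eigenpairs are used and the result is exact up to
    rounding; otherwise only the k eigenvalues of largest magnitude are kept.
    """
    lam, vecs = np.linalg.eigh(np.asarray(adj, dtype=float))
    keep = len(lam) if k is None else k
    order = np.argsort(-np.abs(lam))[:keep]
    lam, vecs = lam[order], vecs[:, order]
    # vecs[i, j] is the i-th entry of the j-th kept eigenvector.
    return 0.5 * (vecs ** 2) @ (lam ** 3)


# Hypothetical example: triangles {0,1,2} and {1,2,3}, plus a pendant vertex 4.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
ADJ = [[0] * 5 for _ in range(5)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

counts = local_triangles(ADJ)
```

Summing the per-vertex counts and dividing by 3 recovers the total number of triangles, since every triangle is counted once at each of its three vertices.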
EigenTriangle Algorithm (interactively)
I want to compute the number of triangles!
Use Lanczos to compute the first two eigenvalues please!
Is the cube of the second one significantly smaller than the cube of the first?
NO
Iterate then!
After some iterations… (hopefully few!)
Compute the k-th eigenvalue. Is $|\lambda_k|^3$ much smaller than $\sum_{i=1}^{k-1} \lambda_i^3$?
YES!
Algorithm terminates! The estimated # of Δs is the sum of cubes of λi’s divided by 6!
EigenTriangle Algorithm
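The interactive walk-through can be sketched roughly as follows. This is not the author's pseudocode: a dense NumPy eigensolver stands in for Lanczos, the eigenvalues are consumed in order of decreasing magnitude, and the stopping threshold `tol` and the choice to include the last computed eigenvalue in the sum are assumptions made for illustration.

```python
import numpy as np


def eigen_triangle(adj, tol=0.05):
    """Estimate the triangle count from the top eigenvalues (by magnitude).

    Returns (estimate, number_of_eigenvalues_used).
    """
    # Stand-in for Lanczos: compute the full spectrum once, then walk
    # through it from the largest magnitude downwards.
    lam = np.linalg.eigvalsh(np.asarray(adj, dtype=float))
    lam = lam[np.argsort(-np.abs(lam))]

    total, used = 0.0, 0
    for value in lam:
        total += value ** 3
        used += 1
        # Stop once the newest cube is small relative to the running sum
        # (always look at two eigenvalues or more before stopping).
        if used >= 2 and abs(value) ** 3 <= tol * abs(total):
            break
    return total / 6.0, used


# Complete graph on 5 vertices: 10 triangles, eigenvalues 4 and -1 (four times).
K5 = np.ones((5, 5)) - np.eye(5)
estimate, used = eigen_triangle(K5)
exact_estimate, _ = eigen_triangle(K5, tol=0.0)
```

On this small example the sketch stops after two eigenvalues with an estimate of 10.5, against an exact count of 10. With `tol=0.0` every eigenvalue is used and the exact count is recovered.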
EigenTriangleLocal Algorithm
Why are these two algorithms efficient on power law networks?
Typical Spectra of Power Law Networks
[Figure: eigenvalue spectra of two networks; panels: Airports, Political blogs]
1st Reason: Top Eigenvalues of Power-Law Graphs
Very important for us because:
  Few eigenvalues contribute a lot!
  Cubes amplify this even more!
  Lanczos converges fast due to large spectral gaps [GL89]!
Faloutsos, Faloutsos and Faloutsos [FFF99] were among the first to observe that the top eigenvalues follow a power law.
Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] gave an explanation of this fact.
2nd Reason: Bulk of eigenvalues
Almost symmetric around 0!
Sum of cubes almost cancels out!
[Figure: eigenvalue spectrum of the Political Blogs network; annotations: "Omit!", "Keep only 3!"]
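The cancellation argument can be written out as a short worked equation. The labels "top" and "bulk" are introduced here for illustration only. They split the spectrum into the few large eigenvalues and the remaining, nearly symmetric ones.

```latex
\sum_{i=1}^{|V|} \lambda_i^3
  = \sum_{i \in \mathrm{top}} \lambda_i^3 + \sum_{i \in \mathrm{bulk}} \lambda_i^3,
\qquad
\mu^3 + (-\mu)^3 = 0
\;\Longrightarrow\;
\Delta(G) \approx \frac{1}{6} \sum_{i \in \mathrm{top}} \lambda_i^3 .
```

If the bulk were exactly symmetric around 0, its eigenvalues would pair up as $\pm\mu$ and contribute nothing to the sum of cubes. In practice the symmetry is only approximate, so the truncated sum is an estimate.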
Datasets
Nodes     Edges     Description
~75K      ~405K     Epinions network
~404K     ~2.1M     Flickr
~27K      ~341K     Arxiv Hep-Th
~1K       ~17K      Political blogs
~13K      ~148K     Reuters news
~3M       35M       Wikipedia 2006-Sep-05
~3.15M    ~37M      Wikipedia 2006-Nov-04
~13.5K    ~37.5K    AS Oregon
~23.5K    ~47.5K    CAIDA AS 2004 to 2008 (means over 151 timestamps)
Dataset categories: Social Networks, Co-authorship network, Information Networks, Web Graphs, Internet Graphs
Largest graph: ~3.15M nodes, ~37M edges
Competitor: Node Iterator
Node Iterator algorithm: for each node, look at its neighbors, then check how many edges exist among them.
Complexity: $O(n\,d_{\max}^2)$
We report the results as the speedup vs. Node Iterator.
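A minimal sketch of the Node Iterator baseline, in plain Python. The dictionary-of-sets graph representation, the function name, and the example graph are assumptions chosen for illustration.

```python
from itertools import combinations


def node_iterator(neighbors):
    """Count triangles per node and in total.

    neighbors: dict mapping each node to the set of its neighbors
    (undirected, simple graph).
    """
    per_node = {}
    for v, nbrs in neighbors.items():
        # Check every pair of neighbors of v: O(d(v)^2) pairs per node.
        per_node[v] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in neighbors[a]
        )
    # Each triangle is seen once from each of its three vertices.
    return per_node, sum(per_node.values()) // 3


# Hypothetical example: triangles {0,1,2} and {1,2,3}, plus a pendant vertex 4.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
NEIGHBORS = {v: set() for v in range(5)}
for u, v in EDGES:
    NEIGHBORS[u].add(v)
    NEIGHBORS[v].add(u)

per_node, total = node_iterator(NEIGHBORS)
```

The inner loop visits all pairs of neighbors of a node, which is where the $d_{\max}^2$ factor in the complexity bound comes from.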
Results: #Eigenvalues vs. Speedup
[Figure: speedup plotted against the number of eigenvalues]
Results: #Edges vs. Speedup
[Figure: speedup plotted against the number of edges; annotation: "Observe the trend"]
Some interesting observations
Typical rank needed for at least 95% accuracy: 6.2.
Speedups are between 33.7x and 1159x. The mean speedup is 250x.
Notice the increasing speedup as the size of the network grows.
Evaluating the Local Counting Method
[Figure: x-axis: triangles node i participates in; y-axis: triangles node i participates in according to our estimation]
#Eigenvalues vs. ρ for three networks
2-3 eigenvalues: almost ideal results!
Triangle Participation Power Law (TPPL)
[Figure: EPINIONS. x-axis: δ = #Triangles; y-axis: count of nodes participating in δ triangles]
[Figures: HEP_TH (coauthorship); Flickr]
Degree Triangle Power Law (DTPL)
[Figure: EPINIONS. x-axis: d, all degrees appearing in the graph; y-axis: mean #Δs over all nodes with degree d]
[Figures: Flickr; Reuters]
Observations on TPPL & DTPL
TPPL:
  Many nodes, few triangles
  Few nodes, many triangles
DTPL:
  Power law fits nicely to the Degree-Triangle plot.
  Slope is the opposite of the slope of the degree distribution (slope complementarity).
Kronecker graphs
Kronecker graphs are a model for generating graphs that mimic properties of real-world networks. The basic operation is the Kronecker product [LCKF05].
Initiator graph, adjacency matrix A[0]:
0 1 1
1 0 1
1 1 0
Kronecker product: A[0] → A[1] → A[2] → ... (repeat k times) → adjacency matrix A[k]
Triangles in Kronecker Graphs
Theorem [KroneckerTRC]
Let B = A[k] be the k-th Kronecker product and Δ(G_A), Δ(G_B) the total number of triangles in G_A, G_B. Then the following equality holds:
$\Delta(G_B) = 6^{k}\,\Delta(G_A)^{k+1}, \quad k \ge 0$
Conclusions
Triangles can be approximated with high accuracy in power law networks using a small, constant number of eigenvalues.
The method is easily parallelizable (matrix-vector multiplications only) and converges fast due to large spectral gaps.
New triangle-related power laws.
Closed formula for triangles in Kronecker graphs.
Future Work
Import in HADOOP
PEGASUS (Peta-Graph Mining)
On-going work with U Kang and Christos Faloutsos in collaboration with Yahoo! Research.
Acknowledgements
Christos Faloutsos and Ioannis Koutis, for the helpful discussions
Maria Tsiarli, for the PEGASUS logo
References
[WF94] Wasserman, Faust: “Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)”
[EM02] Eckmann, Moses: “Curvature of co-links uncovers hidden thematic layers in the World Wide Web”
[YPSB05] Ye, Peyser, Spencer, Bader: “Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast”
[BPCG08] Becchetti, Boldi, Castillo, Gionis: “Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs”
[LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: “Realistic, Mathematically Tractable Graph Generation and Evolution using Kronecker Multiplication”
[FFF99] Faloutsos, Faloutsos, Faloutsos: “On power-law relationships of the Internet topology”
[MP02] Mihail, Papadimitriou: “On the Eigenvalue Power Law”
[CLV03] Chung, Lu, Vu: “Spectra of Random Graphs with given expected degrees”
[GL89] Golub, Van Loan: “Matrix Computations”
For more references, paper and slides: http://www.cs.cmu.edu/~ctsourak
Questions?