A space efficient streaming algorithm for triangle counting using the birthday paradox
Madhav Jha
(Penn State → Sandia National Labs)
Joint work with C. Seshadhri (Sandia National Labs) and Ali Pinar (Sandia National Labs)
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a
wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National
Nuclear Security Administration under contract DE-AC04-94AL85000.
Real-world graphs: An Example
Co-authorship network: two authors are connected if they publish at least one paper together.
[Figure: co-authorship network, annotated "This work!"]

| Graph [SNAP] | # nodes (n) | # edges (m) | # triangles (T) |
|---|---|---|---|
| Ca-HepPh | 12K | 118K | 3.35M |

Source: http://academic.research.microsoft.com
Real-world graphs
1. Graphs are everywhere.
2. Real-world graphs are huge. (Lots of vertices and edges.)
3. Real-world graphs have lots of triangles.

| Graph [SNAP] | # nodes (n) | # edges (m) | # triangles (T) |
|---|---|---|---|
| web-BerkStan | 0.6M | 6M | 64M |
| orkut | 3M | 223M | 627M |
| Ca-HepPh | 12K | 118K | 3.35M |
| cit-Patents | 3M | 16M | 7M |

Photo credit: a) facebook.com b) http://academic.research.microsoft.com
Transitivity: Triangle “density”
• A wedge is a length 2 path. Namely, a “potential” triangle.
• Transitivity = τ = 3 × #Triangles / #Wedges = fraction of closed wedges
[Figure: example graph. E-F-G is an open wedge; B-C-D is a closed wedge.]

| Graph [SNAP] | # nodes (n) | # edges (m) | # triangles (T) | Transitivity |
|---|---|---|---|---|
| web-BerkStan | 0.6M | 6M | 64M | 0.007 |
| orkut | 3M | 223M | 627M | 0.041 |
| Ca-HepPh | 12K | 118K | 3.35M | 0.39 |
| cit-Patents | 3M | 16M | 7M | 0.0674 |

[Seshadhri Pinar Kolda 2013] gave an algorithm for computing transitivity given access to the entire graph. This algorithm is the starting point of our work.
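
To make the definition of transitivity concrete, here is a minimal Python sketch that computes it exactly when the whole graph fits in memory. The function name `transitivity` and the four-edge toy graph are illustrative assumptions; this is not the [Seshadhri Pinar Kolda 2013] algorithm, and the input is assumed to be a simple undirected graph (no self-loops, no repeated edges).

```python
from collections import defaultdict
from math import comb


def transitivity(edges):
    """Return (triangles, wedges, tau) for a simple undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # A wedge is a path of length 2, so a vertex of degree d is the
    # centre of C(d, 2) wedges.
    wedges = sum(comb(len(nbrs), 2) for nbrs in adj.values())

    # Every common neighbour of an edge's endpoints closes a triangle;
    # summing over all edges counts each triangle three times.
    triangles = sum(len(adj[u] & adj[v]) for u, v in edges) // 3

    tau = 3 * triangles / wedges if wedges else 0.0
    return triangles, wedges, tau


# Toy graph (assumed for illustration): triangle A-B-C plus a pendant edge C-D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(transitivity(toy_edges))  # (1, 5, 0.6)
```

In the toy graph, three of the five wedges belong to the single triangle, so the fraction of closed wedges is 3/5.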
Why Count Triangles in Graphs?
• Useful in Social Science for positing various theses on behavior [Burt 09], [Coleman 88], [Welles, Devender, Contractor 10], [Portes 88]
• Applied to spam detection [Becchetti Boldi Castillo Gionis 08]
• Relevant for finding topics on WWW [Eckmann Moses 02]
• Proposed as a guide for community structure
• Stated as a core feature for graph models [Vivar Banks 11]
– Cornerstone for Block Two-level Erdos-Renyi (BTER) [Seshadhri Pinar Kolda 12]
• Good descriptor of the underlying graph [Durak Seshadhri Pinar Kolda 12]
• Rich set of algorithmic results spanning various models: (exact/approximate/deterministic/randomized/…) × (streaming, map-reduce, parallel, etc.)
• Very well-studied: [Ahn Guha McGregor 2012], [Durak Pinar Kolda Seshadhri 2012], [Pagh Tsourakakis 2012], [Suri Vassilvitskii 2011], [Tsourakakis Kolountzakis Miller 2011], [Chu Cheng 2011], [Yoon Kim 2011], [Kolountzakis Miller Peng Tsourakakis 2010], [Avron 2010], [Tsourakakis Drineas Michelakis Koutis Faloutsos 2009], [Tsourakakis Kang Miller Faloutsos 2009], [Latapy 2008], [Becchetti Boldi Castillo Gionis 2008], [Tsourakakis 08], [Buriol Frahling Leonardi Marchetti-Spaccamela Sohler 2006], [Jowhari Ghodsi 2005], [Schank Wagner 2005], [Bar-Yossef Kumar Sivakumar 2002], …
Graph as stream of edges
• Real-world graphs have a natural time-stamp
• Many massive graphs come from modeling interactions in a dynamic system.
– People call each other on the phone
– exchange emails
– become part of a tight unit (e.g., co-authoring a paper)
– computers exchange messages
• These interactions manifest as a stream of edges.
• The edges appear with timestamps, or one at a time.
Photo credit: facebook.com
Graph as stream of edges
Edge stream: C-D, B-C, A-E, K-F, B-D, D-E, B-E, A-D, E-F, A-F
[Animation over several slides: edges are read one at a time under limited memory. The "Graph seen so far" panel grows to cover vertices A, B, C, D, E, F and K, and the "Triangles so far" counter rises from 1 to 4.]
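
For reference, the exact running count in this animation can be reproduced when memory is not limited. The sketch below is a baseline under stated assumptions, not the space-efficient algorithm of the talk: it stores every edge, it assumes the edges arrive in the order they are listed on the slide, and the helper name `running_triangle_counts` is hypothetical.

```python
from collections import defaultdict


def running_triangle_counts(stream):
    """Yield the exact number of triangles after each edge of the stream."""
    adj = defaultdict(set)  # stores the whole graph seen so far
    triangles = 0
    for u, v in stream:
        # The new edge (u, v) closes one triangle per common neighbour.
        triangles += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)
        yield triangles


# Edge stream read off the slide; the arrival order is an assumption.
stream = [("C", "D"), ("B", "C"), ("A", "E"), ("K", "F"), ("B", "D"),
          ("D", "E"), ("B", "E"), ("A", "D"), ("E", "F"), ("A", "F")]
counts = list(running_triangle_counts(stream))
print(counts[-1])  # 4
```

The memory used by this baseline grows linearly with the number of edges, which is exactly what the limited-memory setting rules out.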
Our Contributions: Theoretical
Theorem:
A single-pass streaming algorithm (for an arbitrarily ordered edge stream)
which stores only $O(\sqrt{n})$ edges (for most real-world graphs),
requires nearly constant update time per edge, and
estimates # triangles and transitivity.
Analysis based on the classic Birthday Paradox.
Our Contributions: Practical
• Accurate triangle estimates in low space
Example: On the Orkut graph (200M edges and 0.627B triangles), our algorithm stores only 40K edges (0.02% of the graph) and reports 0.658B triangles (less than 5% relative error).
• Accurate transitivity estimates
[Figure: Estimating transitivity on a variety of datasets. (Our algorithm stores only 40K edges in all these runs.)]
• Realtime tracking
[Figure: Realtime tracking of # triangles on the cit-Patents graph (16M edges), storing only 60K edges from the past.]
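
The Orkut numbers quoted on this slide can be sanity-checked with a few lines of arithmetic. The helper name `relative_error` is an assumption; the inputs are the values quoted on the slide.

```python
def relative_error(estimate, truth):
    """Absolute error as a fraction of the true value."""
    return abs(estimate - truth) / truth


true_triangles = 0.627e9      # Orkut triangle count quoted on the slide
reported = 0.658e9            # estimate quoted on the slide
stored, total_edges = 40_000, 200_000_000

print(f"relative error: {relative_error(reported, true_triangles):.2%}")
# relative error: 4.94%
print(f"fraction of edges stored: {stored / total_edges:.2%}")
# fraction of edges stored: 0.02%
```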
Data Structures of the Algorithm
Input parameters: $s_e$ and $s_w$.
• edge_reservoir[]: an array of size $s_e$ that stores edges
• wedge_reservoir[]: an array of size $s_w$ that stores wedges
• isClosed[]: a Boolean array of size $s_w$ (for example, 1 0 1)
The Algorithm
Incoming edge: $e_t$
• Update edge_reservoir
• Update wedge_reservoir
• Update isClosed
Let $p$ be the fraction of 1’s in isClosed[]. Output:
1. Transitivity: est-$\tau_t$ = $3p$
2. Triangles: est-$T_t$ = est-$\tau_t$ × normalizing-factor
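
The output step can be written down directly. This is only a sketch of the read-out: the slides do not spell out the normalizing factor, so it is taken here as a caller-supplied argument, and the function name `read_estimates` and the 16-entry example array are assumptions.

```python
def read_estimates(is_closed, normalizing_factor):
    """Turn the isClosed[] bits into (est_tau, est_T) as on the slide."""
    if not is_closed:
        raise ValueError("isClosed[] must contain at least one entry")
    p = sum(bool(b) for b in is_closed) / len(is_closed)  # fraction of 1's
    est_tau = 3 * p                          # est-tau_t = 3p
    est_T = est_tau * normalizing_factor     # est-T_t = est-tau_t x factor
    return est_tau, est_T


# Hypothetical state: 2 of 16 sampled wedges are marked closed.
bits = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(read_estimates(bits, normalizing_factor=1000.0))  # (0.375, 375.0)
```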
Updates to edge_reservoir very rare!
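
One way to see why reservoir updates become rare is standard reservoir sampling (Algorithm R), sketched below. This is an assumption for illustration: the slides do not say which reservoir variant the algorithm uses, and the names `reservoir_sample`, `k` and `seed` are hypothetical. Under this scheme the t-th stream item triggers an update with probability k/t, which shrinks as the stream grows.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform sample of k items; also count reservoir writes."""
    rng = random.Random(seed)
    reservoir, updates = [], 0
    for t, item in enumerate(stream, start=1):
        if t <= k:
            reservoir.append(item)        # fill phase: always store
            updates += 1
        else:
            j = rng.randrange(t)          # uniform in 0..t-1
            if j < k:                     # happens with probability k/t
                reservoir[j] = item
                updates += 1
    return reservoir, updates


sample, updates = reservoir_sample(range(100_000), k=100)
print(len(sample), updates)  # updates is typically far below 100000
```

For a stream of length n the expected number of writes is roughly k(1 + ln(n/k)), so most arriving items leave the reservoir untouched.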
The Algorithm
[Figure: edge_reservoir[] holding $s_e$ edges, with incoming edge $e_t$]
How many wedges are there in a random pool of $s_e$ edges?
The Birthday Paradox to Rescue
Idea: Fundamentally, a wedge is a collision of two edges!
Birthday Paradox ⇒ $s_e$ edges give rise to $s_e^2 \cdot \Pr[\text{a single collision}]$ wedges.
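
The collision view can be checked numerically. The sketch below assumes a simple graph with at least two edges and a uniform sample of `s` distinct edges; the slides do not specify the sampling model, so this is a simplification. Two distinct edges collide, that is, form a wedge, exactly when they share a vertex, so by linearity of expectation the expected number of wedges in the sample is C(s, 2) times the probability that one random pair collides, which is on the order of s² times that probability. All function names and the toy edge list are hypothetical.

```python
import random
from collections import Counter
from math import comb


def count_wedges(edges):
    """Number of edge pairs sharing a vertex in a simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(comb(d, 2) for d in degree.values())


def expected_wedges(edges, s):
    """C(s, 2) * Pr[one random pair of distinct edges collides]."""
    m = len(edges)  # assumed to be at least 2
    pr_collision = count_wedges(edges) / comb(m, 2)
    return comb(s, 2) * pr_collision


def sampled_wedges(edges, s, rng):
    """Wedges formed inside one uniform sample of s distinct edges."""
    return count_wedges(rng.sample(edges, s))


# Toy graph (assumed for illustration): ten edges on seven vertices.
edges = [("C", "D"), ("B", "C"), ("A", "E"), ("K", "F"), ("B", "D"),
         ("D", "E"), ("B", "E"), ("A", "D"), ("E", "F"), ("A", "F")]
rng = random.Random(1)
trials = [sampled_wedges(edges, 5, rng) for _ in range(2000)]
# Prints the predicted mean, then the empirical mean over 2000 samples.
print(expected_wedges(edges, 5), sum(trials) / len(trials))
```

The two printed values should be close to each other; the quadratic growth in `s` is what lets a small pool of edges contain many wedges.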
Experimental Results
Our Algorithm vs Buriol et al.
We fix $s_e$ = 20K and vary $s_w$.
Space used in our algorithm: $s_e + s_w$
Space used in Buriol et al.: number of edges sampled
[Figure: two panels. Dataset: web-NotreDame; Dataset: amazon0505]
Note: The results for Buriol et al. are consistent with the analysis and experiments of their paper.
Accuracy of Transitivity Estimate
[Figure: transitivity (y-axis) per dataset (x-axis); $s_e$ = 20K and $s_w$ = 20K]
Accuracy of Triangles Estimate
[Figure: relative error (y-axis) per dataset (x-axis), with the 5% and 8% levels marked; $s_e$ = 20K and $s_w$ = 20K]
Note: web-BerkStan has very low transitivity (0.007). Therefore, its relative error is high.
Convergence of Estimates
Dataset: amazon0505
[Figure: two panels, each plotting estimates against $s_w$ for various $s_e$]
Future Work
• Can we go below the $O(\sqrt{n})$ space bound?
• Can we prove a lower bound on the space required by a 1-pass streaming algorithm to estimate triangle counts?
• Can we extend this approach to handle edge deletions?
• Can we compute (and track) degree-wise clustering coefficient?