Chronos: A Graph Engine for Temporal Graph Analysis
Wentao Han (1, 3), Youshan Miao (2, 3), Kaiwei Li (1, 3),
Ming Wu (3), Fan Yang (3), Lidong Zhou (3),
Vijayan Prabhakaran (3), Wenguang Chen (1), Enhong Chen (2)
(1) Tsinghua University
(2) University of Science and Technology of China
(3) Microsoft Research
• Real-world graphs evolve – temporal graphs
• Temporal graph properties bring more insights
[Figure: a social graph evolving from 2012 to 2014 (temporal graphs), with a plot of user ranking versus year]
Temporal ranks can reveal differences that a single snapshot does not show.
Temporal Graph Analysis
• Computing properties on a series of graph snapshots
[Figure: graph snapshots at t0, t1, and t2 each pass through static graph analysis to produce graph properties, such as the user ranking per year from 2012 to 2014]
Temporal Graph Analysis
• Existing graph engines target static graph analysis
• A possible solution: computing snapshot by snapshot, one static-analysis task per snapshot (Task 1, Task 2, Task 3)
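The snapshot-by-snapshot approach can be sketched as a loop that runs an unmodified static analysis once per snapshot. This is a minimal Python sketch, assuming each snapshot is given as an edge list over vertex IDs 0..n-1; weakly connected components (WCC) via union-find stand in for the static analysis, and the names `wcc_labels` and `snapshot_by_snapshot` are hypothetical.

```python
def wcc_labels(num_vertices, edges):
    """Static WCC: label each vertex with the smallest vertex ID in its component."""
    parent = list(range(num_vertices))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:  # edge direction is ignored for weak connectivity
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller root so every root is its component's minimum ID.
            parent[max(ru, rv)] = min(ru, rv)
    return [find(x) for x in range(num_vertices)]


def snapshot_by_snapshot(num_vertices, snapshots, analysis):
    """Baseline: one independent static-analysis task per snapshot."""
    return [analysis(num_vertices, edges) for edges in snapshots]


snapshots = [
    [(0, 1)],                  # snapshot at t0
    [(0, 1), (2, 3)],          # snapshot at t1
    [(0, 1), (2, 3), (1, 2)],  # snapshot at t2
]
print(snapshot_by_snapshot(4, snapshots, wcc_labels))
# → [[0, 0, 2, 3], [0, 0, 2, 2], [0, 0, 0, 0]]
```

Each task reads and writes its own snapshot's data independently, even where consecutive snapshots are nearly identical.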
Performance Issues
Revisit: Static Graph Analysis
• Propagation-based graph computation model: local computation on each vertex, then data propagation along its edges
[Figure: a scan over the edge array (v1 → v2, v1 → v3, ..., v3 → v5, ...) reads and updates entries of the vertex data array (v1, v2, v3, ...)]
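The propagation model above can be sketched as one round of push-style PageRank over a vertex data array and an edge array. This is a minimal Python sketch, assuming edges are stored as (source, target) pairs and ignoring dangling vertices; PageRank is only one algorithm the model covers, and `pagerank_push_step` is a hypothetical name.

```python
def pagerank_push_step(rank, out_degree, edges, damping=0.85):
    """One propagation round: scan the edge array, push data to target vertices."""
    n = len(rank)
    # Start each vertex's new value from the teleport term.
    new_rank = [(1.0 - damping) / n] * n
    for src, dst in edges:  # sequential scan over the edge array
        # Data propagation: read the source's entry and update the target's
        # entry, which may sit anywhere in the vertex data array.
        new_rank[dst] += damping * rank[src] / out_degree[src]
    return new_rank


edges = [(0, 1), (1, 2), (2, 0)]  # edge array, sorted by source
out_degree = [1, 1, 1]
rank = [1 / 3] * 3                 # vertex data array
rank = pagerank_push_step(rank, out_degree, edges)
```

The edge array is read sequentially, while the writes to `new_rank[dst]` jump around the vertex data array; those scattered target accesses are where cache misses arise.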
• Accessing the target vertex's data during propagation can cause a cache miss
Revisit: Static Graph Analysis
• In parallel: partition the graph and its computation among CPU cores
• A cross-partition edge (e.g., from a vertex on Core 0 to a vertex on Core 1) requires inter-core communication
[Figure: the vertex data array and edge array split between Core 0 and Core 1]
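The cost of partitioning can be sketched by counting intra-partition and cross-partition edges. This Python sketch assumes simple range partitioning by vertex ID, which the slides do not specify; `range_partition` and `count_edges` are hypothetical names.

```python
def range_partition(num_vertices, num_cores):
    """Assign contiguous ranges of vertex IDs to cores (one simple choice)."""
    chunk = -(-num_vertices // num_cores)  # ceiling division
    return [v // chunk for v in range(num_vertices)]


def count_edges(edges, owner):
    """Return (intra-partition edges, cross-partition edges)."""
    intra = sum(1 for u, v in edges if owner[u] == owner[v])
    # Each cross-partition edge forces the source's core to hand data
    # to the core that owns the target vertex.
    return intra, len(edges) - intra


owner = range_partition(6, 2)  # → [0, 0, 0, 1, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(count_edges(edges, owner))  # → (4, 2)
```

Here two of the six propagations cross between cores; a better partitioning would keep that count lower.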
Temporal Graph Analysis: Snapshot by Snapshot
• Computing on multiple graph snapshots multiplies the cost: N snapshots ⇒ N cache misses ⇒ N inter-core communications
[Figure: separate vertex data arrays for Snapshot 1 (v1, v2, v3, ...), Snapshot 2 (v1', v2', v3', ...), and Snapshot 3 (v1'', v2'', v3'', ...)]
Observations
• Real-world graphs often evolve gradually, so their snapshots are similar
[Figure: Snapshots 1, 2, and 3 of a five-vertex graph (v1 to v5); primes (', '') mark each vertex's copy in later snapshots]
Observations
• Similar propagations across snapshots
[Figure: the same edges among v1 to v5 carry propagations in Snapshots 1, 2, and 3]
Idea
• Group propagations by source & target, not by snapshot
[Figure: propagations from v1 to v4, v3, v5, and v2 in Snapshots 1 to 3, scheduled in steps by source-target pair rather than by snapshot]
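The two scheduling orders can be sketched by listing the same propagations as (snapshot, source, target) triples and sorting them differently. This Python sketch only illustrates the ordering idea; the names are hypothetical, and `pair_switches` is an assumed proxy for locality (how often consecutive propagations move to a different source-target pair).

```python
def snapshot_major(per_snapshot_edges):
    """Snapshot-by-snapshot order: all propagations of snapshot 0, then 1, ..."""
    return [(s, src, dst)
            for s, edges in enumerate(per_snapshot_edges)
            for src, dst in edges]


def grouped_by_edge(per_snapshot_edges):
    """Grouped order: the same (source, target) across snapshots back to back."""
    return sorted(snapshot_major(per_snapshot_edges),
                  key=lambda p: (p[1], p[2], p[0]))


def pair_switches(order):
    """How often consecutive propagations move to a different (source, target)."""
    return sum(1 for a, b in zip(order, order[1:]) if a[1:] != b[1:])


snapshots = [[(1, 2), (1, 3)]] * 3  # three identical snapshots
print(pair_switches(snapshot_major(snapshots)))   # → 5
print(pair_switches(grouped_by_edge(snapshots)))  # → 1
```

Both orders perform exactly the same propagations; only the sequence changes, so consecutive steps touch the same source and target data.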
Chronos: Data Layout
• Place together the data of the same vertex across multiple snapshots, so that it fits in a cache line
[Figure: snapshot-by-snapshot vertex data arrays (Snapshot 1: v1, v2, v3, ...; Snapshot 2: v1', v2', v3', ...; Snapshot 3: v1'', v2'', v3'', ...) versus the Chronos vertex data array with time locality (v1 v1' v1'', v2 v2' v2'', v3 v3' v3'', ...), covering Snapshots 1, 2, 3]
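The benefit of the layout can be sketched by computing array indices under both layouts and counting the cache lines touched when one vertex is visited in every snapshot. This Python sketch assumes 8-byte vertex values, 64-byte cache lines, and arrays aligned to a line boundary; the function names are hypothetical.

```python
VALUES_PER_LINE = 8  # assumed: 8-byte vertex values, 64-byte cache lines


def snapshot_major_index(v, s, num_vertices, num_snapshots):
    """Snapshot-by-snapshot layout: one whole array per snapshot."""
    return s * num_vertices + v


def vertex_major_index(v, s, num_vertices, num_snapshots):
    """Chronos-style layout: a vertex's values for all snapshots are adjacent."""
    return v * num_snapshots + s


def lines_touched(index_fn, v, num_vertices, num_snapshots):
    """Distinct cache lines read to visit vertex v in every snapshot."""
    return len({index_fn(v, s, num_vertices, num_snapshots) // VALUES_PER_LINE
                for s in range(num_snapshots)})


print(lines_touched(snapshot_major_index, 5, 1000, 4))  # → 4
print(lines_touched(vertex_major_index, 5, 1000, 4))    # → 1
```

With 4 snapshots, the snapshot-by-snapshot layout spreads vertex 5 over 4 cache lines, while the vertex-major layout keeps all 4 values in one line.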
Chronos: Propagation Scheduling
• Locality-Aware Batch Scheduling (LABS):
  • Batching propagations across snapshots
[Figure: the edge array groups v1 → v2, v1' → v2', v1'' → v2'' (vertex 1 → vertex 2 across snapshots), then v1 → v3, v1' → v3', v1'' → v3'' (vertex 1 → vertex 3 across snapshots); the scan reads the vertex data array, where each batch's data fit in a cache line]
• N propagations ⇒ 1 cache miss: after the first access, the rest of the batch hits the same cache line
• N propagations ⇒ 1 inter-core communication: propagations from Core 0 to Core 1 are accessed in a batch
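The batched scan can be sketched by combining the vertex-major layout with temporal edges. This is a minimal Python sketch, not the Chronos implementation: it assumes each temporal edge carries a bitmask of the snapshots it exists in (as in the in-memory design in the backup slides), uses a push-style PageRank round as the propagation, and all names are hypothetical. The per-snapshot version is included only to check that batching changes the access order, not the result.

```python
def labs_pagerank_step(rank, out_degree, temporal_edges, num_vertices,
                       num_snapshots, damping=0.85):
    """One PageRank round for a whole batch of snapshots at once.

    rank and out_degree are vertex-major: entry v * num_snapshots + s holds
    vertex v in snapshot s. Each temporal edge is (src, dst, mask); bit s of
    mask is set if the edge exists in snapshot s.
    """
    S = num_snapshots
    new_rank = [(1.0 - damping) / num_vertices] * (num_vertices * S)
    for src, dst, mask in temporal_edges:
        base_src, base_dst = src * S, dst * S  # both blocks are contiguous
        for s in range(S):  # the batch: one propagation per snapshot
            if (mask >> s) & 1:
                new_rank[base_dst + s] += (damping * rank[base_src + s]
                                           / out_degree[base_src + s])
    return new_rank


def per_snapshot_step(rank, out_degree, edges, damping=0.85):
    """Reference: the same round for a single snapshot."""
    n = len(rank)
    new_rank = [(1.0 - damping) / n] * n
    for src, dst in edges:
        new_rank[dst] += damping * rank[src] / out_degree[src]
    return new_rank


def interleave(per_snapshot_values):
    """Convert [snapshot][vertex] lists into one vertex-major list."""
    return [vals[v] for v in range(len(per_snapshot_values[0]))
            for vals in per_snapshot_values]


snapshot_edges = [
    [(0, 1), (1, 2), (2, 0)],
    [(0, 1), (1, 2), (2, 0), (0, 2)],  # one edge added in snapshot 1
]
temporal_edges = [(0, 1, 0b11), (1, 2, 0b11), (2, 0, 0b11), (0, 2, 0b10)]
degrees = [[1, 1, 1], [2, 1, 1]]
ranks = [[1 / 3] * 3, [1 / 3] * 3]

batched = labs_pagerank_step(interleave(ranks), interleave(degrees),
                             temporal_edges, 3, 2)
one_by_one = interleave([per_snapshot_step(r, d, e)
                         for r, d, e in zip(ranks, degrees, snapshot_edges)])
```

Each temporal edge is read once per batch, and the source and target entries for all snapshots sit next to each other, which is the locality LABS aims for.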
LABS: The Key of Chronos
• A graph layout
  • Place together vertex/edge data across snapshots
• A scheduling mechanism
  • Batch propagations across snapshots
• Efficient
  • Reduced cache misses / inter-core communication
Experimental Evaluation
• Large temporal graphs

Graph   | # of Vertices | # of Edge Events | Time Span | Source
Wiki    | 1.9 M         | 40.0 M           | 6 years   | Wikipedia graph from KONECT
Twitter | 7.5 M         | 61.6 M           | 3 months  | Provided by Twitter
Weibo   | 27.7 M        | 4.9 B            | 3 years   | Crawled from Sina Weibo
Web     | 133.6 M       | 7.2 B            | 12 months | Web graph from DELIS

• Various graph algorithms
  • PageRank
  • Weakly-connected components (WCC)
  • Single-source shortest path (SSSP)
  • Maximal independent set (MIS)
  • Sparse matrix-vector multiplication (SpMV)
• Settings
  • CPU: 2.4GHz, 16-core
  • RAM: 128GB
  • Disk: 1TB SSD
Chronos: Single-Thread Effectiveness
[Figure: temporal graph analysis on Wiki; speedup of WCC, PageRank, and SSSP versus batch size (0 to 32); baseline (speedup 1): snapshot by snapshot]
• 5 to 9x speedup
Chronos: Single-Thread Effectiveness
• Reduced cache misses (Cache Miss Reduction; number of misses in millions)

Metric         | BatchSize=1 | BatchSize=4 | BatchSize=16 | BatchSize=32 | Reduction
L1d cache miss | 8,759       | 3,865       | 1,107        | 687          | 92%
LLC cache miss | 3,462       | 1,003       | 287          | 160          | 95%
dTLB miss      | 649         | 584         | 265          | 196          | 70%
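The reduction percentages follow from the BatchSize=1 and BatchSize=32 values. A short Python check, assuming the values are matched to the metrics as in the table reconstructed from the chart above (the assignment of chart values to metrics is an inference from the extracted figure).

```python
# Cache misses in millions at BatchSize=1 and BatchSize=32, as read from the chart.
misses = {"L1d": (8759, 687), "LLC": (3462, 160), "dTLB": (649, 196)}

# Reduction = 1 - misses(BatchSize=32) / misses(BatchSize=1)
reductions = {name: 1 - after / before for name, (before, after) in misses.items()}
for name, r in reductions.items():
    print(f"{name}: {r:.0%}")
# L1d: 92%
# LLC: 95%
# dTLB: 70%
```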
Chronos: Multi-Core Performance
[Figure: PageRank on Wiki; speedup versus number of cores (1 to 16) for snapshot-by-snapshot and Chronos]
• More than 10x faster
Chronos: Multi-Core Performance
• Reduced inter-core communications (number of communications in millions)

Number of Cores | No LABS | LABS   | Reduction
2               | 977.64  | 23.08  | 98%
4               | 2,471.6 | 58.56  | 98%
8               | 4,244.2 | 105.2  | 98%
More in Paper:
• Graph computation modes: push mode, pull mode, stream mode
  • All benefit from LABS
More in Paper:
• Incremental graph computation
  • Leveraging the previous snapshot’s result
  • Computing only the changed part
  • Can be enhanced with LABS
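Incremental computation can be sketched with WCC when snapshots only add edges: the union-find forest from the previous snapshot is reused, and only the newly added edges are processed. This Python sketch is an illustration of the general idea, not the paper's incremental algorithm; it handles edge insertions only (deletions need a different technique), and `IncrementalWCC` is a hypothetical name.

```python
class IncrementalWCC:
    """WCC labels maintained across snapshots when edges are only added."""

    def __init__(self, num_vertices):
        self.parent = list(range(num_vertices))

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add_edges(self, new_edges):
        # Only the edges added since the previous snapshot are processed;
        # the union-find forest is the previous snapshot's result.
        for u, v in new_edges:
            ru, rv = self._find(u), self._find(v)
            if ru != rv:
                self.parent[max(ru, rv)] = min(ru, rv)

    def labels(self):
        return [self._find(x) for x in range(len(self.parent))]


wcc = IncrementalWCC(4)
history = []
for delta in ([(0, 1)], [(2, 3)], [(1, 2)]):  # edges added per snapshot
    wcc.add_edges(delta)
    history.append(wcc.labels())
print(history)  # → [[0, 0, 2, 3], [0, 0, 2, 2], [0, 0, 0, 0]]
```

The labels match what a from-scratch computation on each cumulative snapshot would produce, while each step touches only the changed part.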
Conclusion
• Temporal graph analysis
  • An emerging class of applications
• Chronos
  • Supports efficient analysis of temporal graphs
  • Joint design of data layout and scheduling
  • Leverages the temporal similarity of graphs
  • Exploits data locality, especially in the time dimension
Thank You!
Questions?
BACKUP
• Experiment environment details
• Real graphs' similarities over time
• Batch size discussion
• LABS locking
• LABS with incremental computation
• LABS on cluster
• Related work
Experiment Setup
CPU: 2.4GHz Intel Xeon E5-2665, 16-core
RAM: 128GB
Disk: 1TB SSD (RAID 0 over 3 × 372GB SSDs¹)
Network: InfiniBand (DDR, 40Gb/s)
Cluster size: 4
¹ SSD model: TOSHIBA MK4001GRZB
Temporal Distributions of Graphs
• Edges increase gradually
[Figure: cumulative number of edges (0% to 100%) versus ratio of the time range (6% to 100%) for wiki and a second graph]
On-disk Temporal Graph
• Ci: checkpoint of vi, i.e., its edges without time information
• ai,j: the j-th activity of vi, i.e., an edge change, e.g., <addE, (v0, v3, w), t2>
[Figure: the graph is stored as snapshot groups (Snapshot Group 0, Snapshot Group 1, ...) located through a time index; within a snapshot group, a vertex index locates each vertex's edge data: C0, a0,1, ..., a0,t (edge activities of v0), then C1, a1,1, ..., a1,t (edge activities of v1), ...]
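Reading one vertex's edges at a time point from this layout can be sketched as replaying its activities on top of its checkpoint. A minimal Python sketch under stated assumptions: activities are sorted by time, the checkpoint holds the vertex's edges at the start of its snapshot group, and `Activity`, `edges_at`, and the example values are hypothetical; the on-disk encoding itself is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One edge change of a vertex, e.g. <addE, (v0, v3, w), t2>."""
    op: str        # "addE" or "delE"
    dst: int
    weight: float
    time: int


def edges_at(checkpoint, activities, t):
    """Edges of one vertex at time t: start from its checkpoint, then replay
    its activities (sorted by time) up to and including t."""
    edges = dict(checkpoint)  # target vertex -> edge weight
    for a in activities:
        if a.time > t:
            break
        if a.op == "addE":
            edges[a.dst] = a.weight
        elif a.op == "delE":
            edges.pop(a.dst, None)
    return edges


c0 = {1: 1.0}  # hypothetical checkpoint of v0: one edge v0 -> v1
a0 = [Activity("addE", 3, 0.5, 2), Activity("delE", 1, 0.0, 5)]
print(edges_at(c0, a0, 1))  # → {1: 1.0}
print(edges_at(c0, a0, 3))  # → {1: 1.0, 3: 0.5}
print(edges_at(c0, a0, 6))  # → {3: 0.5}
```

Because a vertex's checkpoint and activities are stored contiguously, one sequential read recovers its edges at any time point within the group.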
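LABS: In-memory Design
• Edge array: the edges of each vertex (located through a vertex index) are stored as temporal edges, e.g., (v1) → v2 with bitset 110 and (v1) → v3 with bitset 111; the bitset indicates which snapshots the edge exists in
• Vertex data array: the data of each vertex (located through a vertex index) are stored together across snapshots: v1 v1' v1'', v2 v2' v2'', ...
• A temporal edge logically equals its per-snapshot edges, e.g., v1 → v2, v1' → v2', v1'' → v2''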
Temporal Graph Re-construction
• User input time points: 0, 10, 20
• Scan the graph activity log [Type, Endpoints, Time]:
  addE, v0->v1, 0
  addE, v0->v2, 15
  addE, v0->v3, 6
  delE, v0->v3, 8
• Temporal edges [Endpoints, BitSet]:
  v0->v1, 111
  v0->v2, 001
  (v0->v3 is added at 6 and deleted at 8, so it exists at none of the time points and is omitted)
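The reconstruction above can be sketched as one pass over the time-sorted log that applies activities up to each requested time point and records which edges are present. This Python sketch reproduces the slide's example; the tuple format of the log and the names `reconstruct` and `as_bitset` are assumptions.

```python
def reconstruct(log, time_points):
    """Build temporal edges {(src, dst): bitmask} from an activity log.

    Bit k of a mask is set if the edge exists at the k-th (sorted) time point.
    Edges that exist at none of the time points are dropped.
    """
    events = sorted(log, key=lambda e: e[3])  # stable: same-time order kept
    present, masks, i = set(), {}, 0
    for k, tp in enumerate(sorted(time_points)):
        # Apply every activity up to and including this time point.
        while i < len(events) and events[i][3] <= tp:
            op, src, dst, _ = events[i]
            if op == "addE":
                present.add((src, dst))
            elif op == "delE":
                present.discard((src, dst))
            i += 1
        for edge in present:
            masks[edge] = masks.get(edge, 0) | (1 << k)
    return masks


def as_bitset(mask, n):
    """Render a mask with the first time point leftmost, as on the slide."""
    return "".join("1" if (mask >> k) & 1 else "0" for k in range(n))


log = [("addE", 0, 1, 0), ("addE", 0, 2, 15), ("addE", 0, 3, 6), ("delE", 0, 3, 8)]
temporal = reconstruct(log, [0, 10, 20])
print({e: as_bitset(m, 3) for e, m in temporal.items()})
# → {(0, 1): '111', (0, 2): '001'}
```

The log is scanned only once regardless of how many time points are requested.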
Chronos System Overview
• On-disk temporal graph: contains all the graph-evolving activities
• The user inputs multiple time points; Chronos scans the activities (log) and reconstructs the graph snapshots
• In-memory temporal graph: contains only the snapshots of interest (vertex data v1 v1' v1'', v2 v2' v2'', ...; temporal edges (v1) → v2 111, (v1) → v3 111, ...)
• Temporal properties are computed on the in-memory temporal graph
Greater Batch Size of LABS
• Pros
  • Possible to further reduce cache misses / inter-core communication
• Cons
  • Bit-width limit of the _BitScanForward64 instruction (64 bits)
  • Less snapshot similarity within a batch
  • Diminishing returns: few cache misses / inter-core communications remain to be reduced
  • False sharing with locking
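Iterating over the snapshots in which a temporal edge exists amounts to a forward bit scan over its bitmask, which is why one machine word bounds the batch size. A Python sketch of the idea, assuming one 64-bit mask per temporal edge; Python's `int.bit_length` stands in for the `_BitScanForward64` intrinsic, and `snapshots_in` is a hypothetical name.

```python
MAX_BATCH = 64  # one 64-bit word per temporal edge, the operand width of _BitScanForward64


def snapshots_in(mask):
    """Yield the snapshot indices whose bits are set, lowest first."""
    if not 0 <= mask < (1 << MAX_BATCH):
        raise ValueError("a batch larger than 64 snapshots needs more than one word")
    while mask:
        lowest = mask & -mask              # isolate the lowest set bit
        yield lowest.bit_length() - 1      # its index, like a forward bit scan
        mask ^= lowest                     # clear it and continue


print(list(snapshots_in(0b1011)))  # → [0, 1, 3]
```

Each scan step skips directly to the next snapshot containing the edge, so absent snapshots cost nothing; batches wider than 64 snapshots would need several words per edge.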
Compute Snapshot by Snapshot (another way)
• Snapshot-Parallelism: compute different snapshots in parallel
[Figure: vertex data arrays for Snapshots 1 to 3, either partitioned across Core 0 and Core 1 or assigned to Cores 0, 1, and 2; with 3 snapshots: ⇒ 3 cache misses ⇒ 3 inter-core comm.]

Parallelization: Summary

                          | Partition-Parallelism | Snapshot-Parallelism | LABS-Parallelism
Cache misses              | More                  | More                 | Less
Inter-core communications | More                  | No                   | Less
• Partition-Parallelism: computing the partitions of the same snapshot in parallel
• Snapshot-Parallelism: computing snapshots in parallel
• LABS-Parallelism: computing LABS-batched partitions in parallel
• Good partitioning: number of intra-partition edges > number of inter-partition edges
LABS Performance on Multi-Core
• LABS-Parallelism outperforms Partition-Parallelism and Snapshot-Parallelism
[Figure: PageRank on Wiki; speedup versus number of cores (1 to 16) for Partition-Parallelism, LABS-Parallelism, and Snapshot-Parallelism; baseline: single core]
LABS Performance on Cluster
• A small cluster of 4 machines
• Less benefit than in the single-machine test
  • The high network overhead hides part of the benefit of LABS

Execution time (s) | Baseline | LABS
PageRank           | 7,318    | 2,002
WCC                | 6,405    | 1,250
SSSP               | 518      | 48

• Up to 10x speedup
Reduced Lock Contention
• LABS amortizes the lock cost across snapshots
• PageRank on the Wiki graph; lock time in seconds:

Number of Cores | No LABS | LABS | Reduction
2               | 28.85   | 1.32 | 95%
4               | 34.25   | 1.34 | 96%
8               | 47.54   | 1.85 | 96%
16              | 96.73   | 4.02 | 96%

• LABS reduces locking time by more than 95%
LABS with Incremental Computation
• Traditional incremental computing: [Figure: Snapshots 0, 1, 2, 3, each computed incrementally from the previous one]
• Incremental computing with LABS: [Figure: Snapshots 0 to 3, with incremental computing combined with LABS (BatchSize = 3)]
Gain of Incremental LABS
[Figure: improvement (%) of WCC and SSSP versus batch size (1 to 100, log scale), ranging from 0% to 70%; baseline: traditional incremental computation]
Related work
• Existing graph engines: static graph engines
  • Pregel (SIGMOD’10), PowerGraph (OSDI’12), GraphLab (VLDB’12), Grace (ATC’12), X-Stream (SOSP’13), …
• Active studies on changes and new concepts in evolving graphs
  • Densification law, “shrinking diameters” (KDD’05)
  • PageRank (CIKM’07), Facebook user activities (EuroSys’09), centrality in evolving graphs (MLG’10), retweeting after N friends’ retweets (WWW’11), rumor detection (SOMA’10), …