PAGE: A Partition Aware Graph Computation Engine
![Page 1: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/1.jpg)
PAGE: A Partition Aware Graph Computation Engine
Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
EECS, Peking University, China
![Page 2: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/2.jpg)
Agenda
• Background
• Design of PAGE
• Experiment result
• Conclusion
![Page 3: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/3.jpg)
Background
• Prevalent large scale graphs
– Social networks
– Web graph
– …
• Graph computing systems
– Pregel (Google)
– Giraph (Apache)
– GPS (Stanford)
– GraphLab (CMU)
– …
![Page 4: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/4.jpg)
Background
• Graph Partitioning
– Offline approach
  • METIS (Karypis Lab)
– Online approach
  • Streaming partitioning
  • Linear Deterministic Greedy (LDG) algorithm (I. Stanton)
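As a rough illustration of the streaming approach, the sketch below follows the commonly cited LDG rule: each arriving vertex goes to the partition that already holds most of its neighbors, weighted by that partition's remaining capacity. The function name `ldg_partition`, the adjacency-dict input, the use of dict order as stream order, and the tie-breaking toward the least-loaded partition are assumptions made for this example, not details from the slides.

```python
import math

def ldg_partition(adjacency, num_parts):
    """Assign vertices to partitions in stream order with the LDG heuristic.

    adjacency: dict mapping vertex -> iterable of neighbor vertices
               (dict order is treated as the stream order; an assumption).
    Returns a dict mapping vertex -> partition index.
    """
    capacity = math.ceil(len(adjacency) / num_parts)  # per-partition capacity C
    sizes = [0] * num_parts
    assignment = {}
    for v, neighbors in adjacency.items():
        # Count the already placed neighbors of v in each partition.
        placed = [0] * num_parts
        for u in neighbors:
            if u in assignment:
                placed[assignment[u]] += 1
        # Score = placed neighbors * (1 - size / C); full partitions are skipped.
        best, best_key = None, None
        for p in range(num_parts):
            if sizes[p] >= capacity:
                continue
            score = placed[p] * (1.0 - sizes[p] / capacity)
            key = (score, -sizes[p])  # ties go to the least-loaded partition
            if best_key is None or key > best_key:
                best, best_key = p, key
        assignment[v] = best
        sizes[best] += 1
    return assignment

# Two triangles joined by the single edge 2-3.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
parts = ldg_partition(graph, 2)
```

On this toy graph each triangle ends up in its own partition, so only the bridging edge is cut.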
Problem: Existing graph computation systems cannot efficiently integrate high-quality graph partitioning.
![Page 5: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/5.jpg)
Inefficient partition integrating
[Figure: average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost, and local comm. cost.]
Higher-quality graph partitioning leads to worse overall performance.
Graph partitioning quality improves from left to right.
Running PageRank on Giraph with six graph partitions of different quality.
![Page 6: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/6.jpg)
Motivation of the PAGE
Call for a novel graph computation engine to efficiently integrate graph partitioning with various qualities.
A Novel Graph Computation Engine
High-Quality Graph Partition
Low-Quality Graph Partition
![Page 7: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/7.jpg)
Agenda
• Background
• Design of PAGE
• Experiment result
• Conclusion
![Page 8: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/8.jpg)
Message processor
[Figure: a Message Processor containing several Message Process Units; messages (msg.) are grouped into Message Blocks, each starting with a Header.]
![Page 9: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/9.jpg)
Inefficient partition integrating
[Figure: average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost, and local comm. cost.]
The local message processing cost dominates the overall cost.
Existing systems cannot provide enough local message processing capacity.
Running PageRank on Giraph with six graph partitions of different quality.
![Page 10: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/10.jpg)
Overview of the PAGE
[Figure: three PAGE workers (worker1, worker2, worker3), each with Partition Aware, Comm., and Computation modules, on top of a Distributed In-Memory Partitioned Graph.]
PAGE applies an adaptive tuning mechanism and new cooperation methods.
![Page 11: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/11.jpg)
Newly Designed PAGE Worker
• Partition Aware
– Monitor
– DCCM
• Communication
– Sender
– Receiver
– Dual Concurrent MP (Remote MP, Local MP)
• Computation
![Page 12: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/12.jpg)
Dual Concurrent Message Processor
[Figure: a Dual Concurrent MP made up of a Remote MP and a Local MP.]
• First type of concurrency
– A remote MP and a local MP are embedded
• Second type of concurrency
– Each message processor contains a set of message process units
• The concurrency is automatically determined by the system itself.
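The two levels of concurrency can be pictured with a small thread-pool sketch. This is not PAGE's implementation: the class names `MessageProcessor` and `DualConcurrentMP`, the message-block layout (a list of `(target_vertex, value)` pairs), and the sum combiner are all assumptions chosen to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor

class MessageProcessor:
    """A message processor backed by a pool of message process units (threads)."""

    def __init__(self, name, num_units):
        self.name = name
        self.pool = ThreadPoolExecutor(max_workers=num_units,
                                       thread_name_prefix=name)

    def submit_block(self, block):
        # One message process unit handles one block of (target, value) messages.
        return self.pool.submit(self._process_block, block)

    @staticmethod
    def _process_block(block):
        # Combine messages per target vertex (a sum, as PageRank would use).
        combined = {}
        for target, value in block:
            combined[target] = combined.get(target, 0.0) + value
        return combined

    def shutdown(self):
        self.pool.shutdown(wait=True)


class DualConcurrentMP:
    """Holds a local MP and a remote MP that work side by side."""

    def __init__(self, num_local_units, num_remote_units):
        self.local = MessageProcessor("local", num_local_units)
        self.remote = MessageProcessor("remote", num_remote_units)

    def process(self, local_blocks, remote_blocks):
        # First level: local and remote blocks go to separate processors.
        futures = [self.local.submit_block(b) for b in local_blocks]
        futures += [self.remote.submit_block(b) for b in remote_blocks]
        # Merge the per-block results into one inbox per target vertex.
        inbox = {}
        for f in futures:
            for target, value in f.result().items():
                inbox[target] = inbox.get(target, 0.0) + value
        return inbox

    def shutdown(self):
        self.local.shutdown()
        self.remote.shutdown()


# A well-partitioned graph produces mostly local messages, so more local units.
dual = DualConcurrentMP(num_local_units=3, num_remote_units=1)
inbox = dual.process(
    local_blocks=[[(1, 0.5), (2, 0.25)], [(1, 0.25)]],
    remote_blocks=[[(2, 0.5)]],
)
dual.shutdown()
```

Here the unit counts are fixed by hand; in PAGE they are chosen by the system rather than by the user.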
![Page 13: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/13.jpg)
Dynamic Concurrency Control Model
• The DCCM determines the proper parameters, such as nmp, nmpl, nmpr.
• The DCCM is built on top of two heuristic rules.
– Ability Lower-bound.
– Workload Balance Ratio.
• Monitor
– Tracks the necessary metrics
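The slide names the two rules without giving formulas, so the following is only one plausible reading, not the authors' model. It assumes that nmp is the total number of message process units, that nmpl and nmpr are the local and remote shares, that the "ability lower-bound" means total processing capacity must cover the incoming message rate, and that the "workload balance ratio" splits units in proportion to local versus remote load. The function name and the three rate parameters are hypothetical.

```python
import math

def choose_concurrency(local_msgs_per_s, remote_msgs_per_s, unit_msgs_per_s):
    """Return (nmp, nmpl, nmpr) under an assumed reading of the two rules."""
    total = local_msgs_per_s + remote_msgs_per_s
    # Assumed "ability lower-bound": enough units to keep up with all
    # incoming messages, and at least one unit per processor.
    nmp = max(2, math.ceil(total / unit_msgs_per_s))
    # Assumed "workload balance ratio": split the units in proportion to the
    # local and remote load, keeping at least one unit on each side.
    nmpl = round(nmp * local_msgs_per_s / total) if total > 0 else 1
    nmpl = min(max(nmpl, 1), nmp - 1)
    nmpr = nmp - nmpl
    return nmp, nmpl, nmpr

# Mostly local traffic (good partition) versus mostly remote traffic.
good_partition = choose_concurrency(900, 100, 250)
poor_partition = choose_concurrency(100, 900, 250)
```

With these made-up rates the same total of four units is split three-to-one toward the local processor for the good partition and one-to-three for the poor one, which is the kind of adaptation the Monitor's metrics would drive.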
![Page 14: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/14.jpg)
Agenda
• Background
• Design of PAGE
• Experiment result
• Conclusion
![Page 15: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/15.jpg)
Environment & Datasets
• Experiment Environment
– A 24-node cluster
• Dataset: uk-2007-05-u
– Undirected
– Vertex #: 105,153,952
– Edge #: 6,603,753,128
• Benchmark: PageRank

Partition qualities (balance factor: < 1%)

| Scheme | Edge Cut |
| --- | --- |
| Random | 98.52% |
| LDG1 | 82.88% |
| LDG2 | 75.69% |
| LDG3 | 66.37% |
| LDG4 | 56.34% |
| METIS | 3.48% |
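For readers unfamiliar with the metric, edge cut is the fraction of edges whose endpoints fall in different partitions. The sketch below computes it for a toy graph. The slides do not define the balance factor, so `max_imbalance` (relative excess of the largest partition over the ideal size) is an assumed definition, and both function names are made up for this example.

```python
def edge_cut_ratio(edges, assignment):
    """Fraction of edges whose endpoints lie in different partitions."""
    edges = list(edges)
    if not edges:
        return 0.0
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)

def max_imbalance(assignment, num_parts):
    """Relative excess of the largest partition over the ideal size n / k.

    This is an assumed definition of a balance factor, not one from the slides.
    """
    sizes = [0] * num_parts
    for p in assignment.values():
        sizes[p] += 1
    ideal = len(assignment) / num_parts
    return max(sizes) / ideal - 1.0

# Two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
clustered = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}    # one triangle per partition
alternating = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}  # vertices spread by parity

good_cut = edge_cut_ratio(edges, clustered)    # 1 of 7 edges is cut
poor_cut = edge_cut_ratio(edges, alternating)  # 5 of 7 edges are cut
```

Both assignments are perfectly balanced, yet their edge cuts differ widely. Since every cut edge implies a remote message, a lower edge cut shifts the workload toward local messages.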
![Page 16: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/16.jpg)
Partition Awareness in PAGE
[Figure: two panels, PAGE and Giraph, each plotting average time (s/iteration) against partition scheme for overall cost, sync remote comm. cost, and sync local comm. cost.]
![Page 17: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/17.jpg)
Comparison with the naive solution
[Figure: average time (s/iteration) for each partition scheme, comparing Giraph, Giraph-GPSop, and PAGE.]
* Giraph-GPSop is the naive solution.
![Page 18: PAGE: A Partition Aware Graph Computation Engine](https://reader035.fdocuments.us/reader035/viewer/2022062501/56816637550346895dd9a5c2/html5/thumbnails/18.jpg)
Contribution & Conclusion
• We identify the problem of partition-unaware inefficiency.
• We build a new partition-aware graph computation engine, PAGE.
• We design a Dynamic Concurrency Control Model based on several heuristic rules to better capture the characteristics of the graph partition.
• Finally, we demonstrate PAGE's robustness and efficiency across graph partitions of different quality.