Graphalytics: Benchmarking Graph-Processing Platforms
LDBC TUC Meeting, Barcelona, Spain, March 2015
GRAPHALYTICS: A Big Data Benchmark for Graph-Processing Platforms
Mihai Capotă, Yong Guo, Ana Lucia Varbanescu, Alexandru Iosup, Tim Hegeman, Jorai Rijsdijk, Jose Larriba Pey, Arnau Prat, Peter Boncz, Hassan Chafi
http://bl.ocks.org/mbostock/4062045
GRAPHALYTICS was made possible by a generous contribution from Oracle.
(TU) Delft – the Netherlands – Europe
• the Netherlands: pop. 16.5 M
• Delft: founded 13th century, pop. 100,000
• TU Delft: founded 1842, pop. 13,000
[Map: Barcelona and Delft]
The Parallel and Distributed Systems Group at TU Delft
Home page
• www.pds.ewi.tudelft.nl
Publications
• see PDS publication database at publications.st.ewi.tudelft.nl
• Johan Pouwelse: P2P systems, File-sharing, Video-on-demand
• Henk Sips: HPC systems, Multi-cores, P2P systems
• Dick Epema: Grids/Clouds, P2P systems, Video-on-demand, e-Science
• Ana Lucia Varbanescu (now UvA): HPC systems, Multi-cores, Big Data/graphs
• Alexandru Iosup: Grids/Clouds, P2P systems, Big Data/graphs, Online gaming
VENI VENI VENI
@large
Winners of the IEEE TCSC Scale Challenge 2014
Graphs at the Core of Our Society: The LinkedIn Example
[Chart: LinkedIn membership over time; labels: 69M (May 2010), 100M (Mar 2011), Feb 2012]
Sources: Vincenzo Cosenza, The State of LinkedIn, http://vincos.it/the-state-of-linkedin/ via Christopher Penn, http://www.shiftcomm.com/2014/02/state-linkedin-social-media-dark-horse/
A very good resource for matching the workforce with prospective employers
Vital for your company’s life, as your Head of HR would tell you
Vital for the prospective employees
The data deluge: large-scale graphs
• 270M MAU, 200+ avg. followers: >54B edges
• 1.2B MAU, 0.8B DAU, 200+ avg. followers: >240B edges
• 300M users: ??? edges
(MAU/DAU = monthly/daily active users)
• Per company per day: 40-60 posts, 500-700 comments
• Oracle: 1.2M followers, 132k employees
• Compute-intensive workload: more complex analysis, ?x slower
• Dataset-dependent workload: unfriendly graphs, ??x slower
• Data-intensive workload: 10x graph size, 100x to 1,000x slower
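The edge counts above are consistent with a simple estimate: active users times average followers. A minimal sketch of that arithmetic, assuming each follow relationship is one directed edge, plus the growth exponent implied by "10x graph size, 100x to 1,000x slower" if runtime is modelled as size^k (a modelling assumption for illustration, not a claim made in the slides):

```python
import math


def estimate_edges(users: float, avg_followers: float) -> float:
    """Rough edge count, assuming every follow relationship is one directed edge."""
    return users * avg_followers


def implied_exponent(size_factor: float, slowdown: float) -> float:
    """Exponent k such that slowdown == size_factor ** k, assuming runtime ~ size**k."""
    return math.log10(slowdown) / math.log10(size_factor)


if __name__ == "__main__":
    print(f"{estimate_edges(270e6, 200):.3g} edges")  # 5.4e+10, i.e. ~54B
    print(f"{estimate_edges(1.2e9, 200):.3g} edges")  # 2.4e+11, i.e. ~240B
    # 10x more data but 100x to 1,000x slower implies k between 2 and 3
    print(implied_exponent(10, 100), implied_exponent(10, 1000))
```

Under this model any super-linear k compounds quickly: k = 2 already turns a 10x larger graph into a 100x longer run.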
Graphs at the Core of Our Society: The LinkedIn Example
[Chart: LinkedIn membership through Apr 2014; labels: 300,000,000 and 100+ million]
• 3-4 new users every second, but fewer visitors (and page views)
• Great, if you can process this graph: opinion mining, hub detection, etc.
• Questions of customer retention, of (lost) customer influence, of ...
Periodic and/or continuous analytics at full scale
The “sorry, but…” moment
Supporting multiple users: 10x number of users, ????x slower
Graph Processing @large
A Graph Processing Platform
• Algorithm
• ETL
• Active Storage (filtering, compression, replication, caching)
• Distribution to processing platform
Streaming and interactive processing are not considered in this presentation.
Ideally, N cores/disks make processing Nx faster.
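The ideal "Nx faster" can be checked against measurements with the conventional strong-scaling definitions: speedup S(N) = T(1)/T(N) and efficiency E(N) = S(N)/N, where E = 1.0 is the ideal case. A minimal sketch; the timings in the usage block are hypothetical, not results from the slides:

```python
def speedup(t_one: float, t_n: float) -> float:
    """Strong-scaling speedup S(N) = T(1) / T(N) for a fixed input."""
    return t_one / t_n


def efficiency(t_one: float, t_n: float, n: int) -> float:
    """Parallel efficiency E(N) = S(N) / N; 1.0 is the ideal 'Nx faster' case."""
    return speedup(t_one, t_n) / n


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from the slides
    timings = {1: 120.0, 2: 65.0, 4: 36.0, 8: 20.0}
    for n, t in timings.items():
        print(f"N={n}: speedup {speedup(timings[1], t):.2f}, "
              f"efficiency {efficiency(timings[1], t, n):.2f}")
```

With these made-up numbers, 8 cores give a 6x speedup, i.e. 75% efficiency rather than the ideal 100%.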
Graph-Processing Platforms
• Platform: the combined hardware, software, and programming system that is being used to complete a graph processing task
[Figure: logos of graph-processing platforms, including Trinity]
Which to choose? What to tune?
What is the performance of graph-processing platforms?
• Graph500
  • Single application (BFS), single class of synthetic datasets
• Few existing platform-centric comparative studies
  • Prove the superiority of a given system, limited set of metrics
• GreenGraph500, GraphBench, XGDBench
  • Representativeness, systems covered, metrics, …
Metrics diversity, graph diversity, algorithm diversity
Graphalytics = comprehensive benchmarking suite for graph processing across all platforms
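For contrast with Graphalytics' broader scope, Graph500's single kernel is a BFS, reported as traversed edges per second (TEPS). A minimal single-machine sketch, assuming an undirected graph stored as adjacency lists (each edge listed at both endpoints) and counting the edges inside the reached component once; Graph500 itself times many BFS runs from different roots and validates each result, which this sketch omits:

```python
import time
from collections import deque


def bfs_levels(adj: dict[int, list[int]], source: int) -> dict[int, int]:
    """Queue-based BFS returning the hop distance of every vertex reachable from source."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def traversed_edges(adj: dict[int, list[int]], reached) -> int:
    """Undirected edges inside the reached component; each appears twice in adj."""
    return sum(len(adj.get(u, [])) for u in reached) // 2


def teps(adj: dict[int, list[int]], source: int) -> float:
    """Traversed edges per second for one timed BFS run."""
    start = time.perf_counter()
    level = bfs_levels(adj, source)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return traversed_edges(adj, level) / elapsed


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1], 4: [5], 5: [4]}
    print(bfs_levels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
    print(f"{teps(adj, 0):.3g} TEPS")
```

A single number like this hides exactly the metric, graph, and algorithm diversity listed above.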
Graphalytics = A Challenging Benchmarking Process
• Methodological challenges
  • Challenge 1. Evaluation process
  • Challenge 2. Selection and design of performance metrics
  • Challenge 3. Dataset selection and analysis of coverage
  • Challenge 4. Algorithm selection and analysis of coverage
• Practical challenges
  • Challenge 5. Scalability of evaluation, selection processes
  • Challenge 6. Portability
  • Challenge 7. Result reporting
Y. Guo, A. L. Varbanescu, A. Iosup, C. Martella, T. L. Willke: Benchmarking graph-processing platforms: a vision. ICPE 2014: 289-292
Graphalytics = Many Classes of Algorithms
• Literature survey of metrics, datasets, and algorithms
  • 10 top research conferences: SIGMOD, VLDB, HPDC …
  • Keywords: graph processing, social network
  • 2009–2013, 124 articles
| Class | Examples | % |
|---|---|---|
| Graph Statistics | Diameter, PageRank | 16.1 |
| Graph Traversal | BFS, SSSP, DFS | 46.3 |
| Connected Component | Reachability, BiCC | 13.4 |
| Community Detection | Clustering, Nearest Neighbor | 5.4 |
| Graph Evolution | Forest Fire Model, PAM | 4.0 |
| Other | Sampling, Partitioning | 14.8 |
Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS’14.
Future work
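The Connected Component class in the table rests on a basic labelling step: map each vertex to a representative of its (weakly) connected component. A minimal single-machine sketch using union-find with path halving; union-find is our choice of technique here, and a vertex-centric platform would instead express the same computation as iterative label propagation. Labels are the smallest vertex id in each component:

```python
def connected_components(num_vertices: int, edges: list[tuple[int, int]]) -> list[int]:
    """Label each vertex with the smallest vertex id in its connected component."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # Path halving: point every other node on the path at its grandparent
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller root, so every root is the minimum id of its tree
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv
    return [find(x) for x in range(num_vertices)]


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (4, 3)]))  # [0, 0, 0, 3, 3, 5]
```

Edge direction is ignored, so on a directed graph this computes weakly connected components.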
Graphalytics = Real & Synthetic Datasets
• https://snap.stanford.edu/
• http://www.graph500.org/
• The Game Trace Archive, http://gta.st.ewi.tudelft.nl/ (Y. Guo and A. Iosup. The Game Trace Archive, NETGAMES 2012.)
• Interaction graphs (possible work)
• LDBC Social Network Generator
Graphalytics = Advanced Harness
Cloud support: technically feasible, methodologically difficult
Graphalytics = Enhanced LDBC Datagen
• A battery of graphs covering a rich set of configurations
• Datagen extensions for:
  • More diverse degree distributions
  • Clustering coefficient and assortativity
LDBC D3.3.34, http://ldbcouncil.org/sites/default/files/LDBC_D3.3.34.pdf, and Orri Erling et al. The LDBC Social Network Benchmark: Interactive Workload, SIGMOD’15
Ongoing work
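To check whether a generated graph actually hits a target configuration, the two properties named above can be measured directly. A minimal sketch using the textbook definitions (average local clustering coefficient; degree assortativity as the Pearson correlation of endpoint degrees over edges), assuming an undirected graph as adjacency sets; this is not the Datagen implementation:

```python
from itertools import combinations


def local_clustering(adj: dict[int, set[int]], v: int) -> float:
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj: dict[int, set[int]]) -> float:
    """Mean local clustering coefficient over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def degree_assortativity(adj: dict[int, set[int]]) -> float:
    """Pearson correlation of endpoint degrees, each edge counted in both directions."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    if n == 0:
        return float("nan")  # no edges
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when all degrees are equal (regular graph)
    return cov / (sx * sy)


if __name__ == "__main__":
    # Triangle 0-1-2 plus a pendant vertex 3 attached to 0
    tri = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(average_clustering(tri))  # 7/12, about 0.583
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    print(degree_assortativity(star))  # about -1.0: hubs attach only to leaves
```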
Graphalytics = Advanced Monitoring & Logging System
[Figure: monitored metrics plotted over time]
• Diverse metrics: CPU, IOPS, Network, Memory use, …
• Automatic analysis matching the programming model
Ongoing work
A. Iosup et al., Towards Benchmarking IaaS and PaaS Clouds for Graph Analytics. WBDB 2014
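One way to read "automatic analysis matching the programming model": for a BSP/Pregel-style platform, slice the monitored time series at superstep boundaries and summarise each metric per superstep. A hypothetical sketch; the `Sample` record, the metric names, and the phase boundaries are assumptions for illustration, not the Graphalytics monitoring format:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    t: float       # seconds since job start
    metric: str    # e.g. "cpu", "iops", "net", "mem" (hypothetical names)
    value: float


def per_phase_stats(samples, phases):
    """Return {phase: {metric: (mean, peak)}} using samples with start <= t < end."""
    stats = {}
    for name, start, end in phases:
        buckets = defaultdict(list)
        for s in samples:
            if start <= s.t < end:
                buckets[s.metric].append(s.value)
        stats[name] = {m: (sum(vals) / len(vals), max(vals)) for m, vals in buckets.items()}
    return stats


if __name__ == "__main__":
    samples = [Sample(0.5, "cpu", 40.0), Sample(0.5, "mem", 1.0),
               Sample(1.5, "cpu", 90.0), Sample(2.5, "cpu", 10.0)]
    phases = [("superstep-0", 0.0, 1.0), ("superstep-1", 1.0, 3.0)]
    print(per_phase_stats(samples, phases))
    # {'superstep-0': {'cpu': (40.0, 40.0), 'mem': (1.0, 1.0)}, 'superstep-1': {'cpu': (50.0, 90.0)}}
```

Per-superstep summaries make it easier to attribute, say, a network spike to a message-heavy iteration rather than to the run as a whole.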
Graphalytics = Choke-Point Analysis
• Choke points are crucial technological challenges that platforms are struggling with
• Examples
  • Network traffic
  • Access locality
  • Skewed execution
• Challenge: select benchmark workloads based on real-world scenarios, but make sure they cover the important choke points
near-future work
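The "skewed execution" choke point can be quantified. A minimal sketch, assuming vertices are assigned to workers by `u % num_workers` and that per-worker work is proportional to owned out-edges (both simplifications chosen for illustration), measures load imbalance as max/mean. On a graph with one high-degree hub, the worker that owns the hub becomes the straggler every other worker waits for:

```python
def partition_load(edges, num_workers: int) -> list[int]:
    """Out-edges per worker when vertex u is owned by worker u % num_workers."""
    load = [0] * num_workers
    for u, _v in edges:
        load[u % num_workers] += 1
    return load


def imbalance(load: list[int]) -> float:
    """Max-to-mean ratio: 1.0 is perfectly balanced; higher means stragglers."""
    mean = sum(load) / len(load)
    return max(load) / mean


if __name__ == "__main__":
    # One hub (vertex 0) with 9 out-edges plus a small cycle 1 -> 2 -> 3 -> 1
    edges = [(0, i) for i in range(1, 10)] + [(1, 2), (2, 3), (3, 1)]
    load = partition_load(edges, 4)
    print(load, imbalance(load))  # [9, 1, 1, 1] 3.0
```

In a synchronous model, an imbalance of 3.0 means the slowest worker does three times the average work per step, so three quarters of the aggregate capacity sits idle.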
Graphalytics = Advanced Software Engineering Process
• All significant modifications to Graphalytics are peer-reviewed by developers
• Internal release to LDBC partners (Feb 2015)
• Public release, announced first through LDBC (Apr 2015*)
• Jenkins continuous integration server
• SonarQube software quality analyzer
https://github.com/mihaic/graphalytics/
Graphalytics in Practice
• Missing results = failures of the respective systems
• 5 classes of algorithms
• 10 platforms tested with a prototype implementation
• 6 real-world datasets + 2 synthetic generators
• Many more metrics supported
• Data ingestion not included here!
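The reporting convention above (a missing result means the system failed) can be sketched as a harness loop over the platform × algorithm × dataset matrix. This is a hypothetical sketch, not the Graphalytics harness API: the `Runner` callable, its signature, and the result keys are assumptions for illustration. Only the call to the runner is timed, so a real harness would load the graph before starting the clock, in line with "data ingestion not included":

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical: a runner executes one (algorithm, dataset) job and raises on failure.
Runner = Callable[[str, str], None]
Key = Tuple[str, str, str]


def run_matrix(platforms: Dict[str, Runner], algorithms: Iterable[str],
               datasets: Iterable[str]) -> Dict[Key, Optional[float]]:
    """Time every (platform, algorithm, dataset) job; None marks a failed run."""
    algorithms, datasets = list(algorithms), list(datasets)
    results: Dict[Key, Optional[float]] = {}
    for platform, run in platforms.items():
        for algorithm in algorithms:
            for dataset in datasets:
                start = time.perf_counter()
                try:
                    run(algorithm, dataset)
                except Exception:
                    results[(platform, algorithm, dataset)] = None  # missing = failure
                    continue
                results[(platform, algorithm, dataset)] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    def ok(algorithm: str, dataset: str) -> None:
        pass

    def fragile(algorithm: str, dataset: str) -> None:
        if dataset == "big":
            raise RuntimeError("simulated crash")

    res = run_matrix({"ok": ok, "fragile": fragile}, ["bfs", "cc"], ["small", "big"])
    for key, seconds in sorted(res.items()):
        print(key, "FAILED" if seconds is None else f"{seconds:.6f}s")
```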
Key Findings So Far
Towards Benchmarking Graph-Processing Platforms
• Performance is a function of (Dataset, Algorithm, Platform, Deployment)
• Previous performance studies lead to tunnel vision
• Platforms have their specific drawbacks (crashes, long execution time, tuning, etc.)
• Best-performing system depends on stakeholder needs
• Some platforms can scale up reasonably with cluster size (horizontally) or number of cores (vertically)
• Strong vs. weak scaling is still a challenge; workload scaling is tricky
• Single-algorithm benchmarks do not capture workflows or multi-tenancy
Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis, IPDPS’14.
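Weak scaling grows the input together with the number of nodes, and that is where workload scaling gets tricky. A minimal sketch, assuming Graph500-style generators in which a graph of a given scale has 2**scale vertices: keeping the vertices per node constant is only possible for power-of-two node counts. Weak-scaling efficiency is the conventional T(1 node, base input) / T(N nodes, N-times-larger input); the function names are ours:

```python
import math


def weak_scaling_scale(base_scale: int, nodes: int) -> int:
    """Graph500-style scale (2**scale vertices) keeping vertices per node constant."""
    extra = math.log2(nodes)
    if not extra.is_integer():
        # Power-of-two graph sizes only divide evenly over power-of-two node counts
        raise ValueError(f"{nodes} nodes: no integer scale keeps the load per node constant")
    return base_scale + int(extra)


def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    """E_weak(N) = T(1 node, base input) / T(N nodes, N-times-larger input); 1.0 is ideal."""
    return t_base / t_n


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8):
        print(nodes, "nodes -> scale", weak_scaling_scale(20, nodes))
    # Hypothetical timings: 100 s on 1 node, 125 s on 8 nodes with an 8x larger graph
    print(weak_scaling_efficiency(100.0, 125.0))  # 0.8
```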
Thank you for your attention! Comments? Questions? Suggestions?
Alexandru Iosup
Contributors
http://graphalytics.ewi.tudelft.nl https://github.com/mihaic/graphalytics/
PELGA 2015, May 15 http://sites.google.com/site/pelga2015/
A few extra slides
Discussion
• How much preprocessing should we allow in the ETL phase?
• How to choose a metric that captures the preprocessing?
Discussion
• How should we assess the correctness of algorithms that produce approximate results?
• Are sampling algorithms acceptable as a trade-off between time to benchmark and the quality of the benchmarking result?
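One possible answer to the first question is to accept an output if it covers the same vertices as a reference output and every per-vertex value lies within a tolerance. A minimal sketch using `math.isclose`; the tolerance values are illustrative assumptions, and choosing them per algorithm is exactly the open question:

```python
import math


def approx_match(reference: dict[int, float], candidate: dict[int, float],
                 rel_tol: float = 1e-4, abs_tol: float = 1e-12) -> bool:
    """True if both outputs cover the same vertices and every value is within tolerance."""
    if reference.keys() != candidate.keys():
        return False
    return all(math.isclose(reference[v], candidate[v], rel_tol=rel_tol, abs_tol=abs_tol)
               for v in reference)


if __name__ == "__main__":
    ref = {0: 0.25, 1: 0.75}  # e.g. PageRank values from a trusted implementation
    print(approx_match(ref, {0: 0.250001, 1: 0.749999}))  # True: tiny numerical drift
    print(approx_match(ref, {0: 0.3, 1: 0.7}))            # False: genuinely different
```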
Discussion
• How should we set up the platforms? Should we allow algorithm-specific platform setups, or should we require a single setup for all algorithms?
Discussion
• Towards full use cases, full workflows, and inter-operation of big data processing systems
• How to benchmark the entire chain needed to produce useful results, perhaps even the human in the loop?
A. Iosup, T. Tannenbaum, M. Farrellee, D. H. J. Epema, M. Livny: Inter-operating grids through Delegated MatchMaking. Scientific Programming 16(2-3): 233-253 (2008)