Post on 18-Nov-2014
http://bigdata.com http://mapgraph.io
SYSTAP, LLC
Graphs
Graph Databases
Graph Analytics on GPUs
SYSTAP™, LLC© 2006-2014 All Rights Reserved
9/19/2014
Graphs
• This talk is about recent advances in large-scale graph processing on GPUs.
  – The motivation is extreme performance.
  – Everything we do (as a company) is focused on graphs.
• Graph database
• Graph processing
• Common characteristics:
  – Irregular data shape, irregular access patterns, and irregular parallelism.
• A lot of data can be mapped onto graphs
  – Sparse matrices and graphs are very close data structures.
  – Graphs, as we deal with them, have attributes on vertices and edges.
• A lot of algorithms can be mapped onto graphs
  – Including many machine learning algorithms.
http://www.bigdata.com/blog
SYSTAP, LLC

Graph Database
• High performance, scalable
  – 50B edges/node
  – High-level query language
  – Efficient graph traversal
  – High 9s solution
• Open Source
  – Subscriptions

GPU Analytics
• Extreme performance
  – 5-100x faster than GraphLab
  – 10,000x faster than graph databases
• DARPA funding
• Disruptive technology
  – Early adopters
  – Huge ROIs
• Open Source

Small Business, Founded 2006, 100% Employee Owned
Related “Graph” Technologies
• Bigdata: Graph Query (RDF/SPARQL)
  – Embedded, HA, Single Server, Scale-Out
• MapGraph: Graph Traversal & Mining
  – Single GPU, 2D Cluster
• Redpoint: “Graph Database”
  – Redpoint repositions existing technology, adding interoperability for Blueprints and Gremlin.
• MapGraph compares favorably with high-end hardware solutions from YARC, Oracle, and SAP, but is open source and uses commodity hardware.
• Pair up bigdata and MapGraph
[Other diagram labels: SPARQL, Scale-Out, STTR]
Embedded, Single Server, HA, Scale-out
• RDF/SPARQL
• Property graphs
  – Blueprints, Gremlin, Rexster
• REST API (NSS)
• Extension points
  – Stored queries for custom application logic on the server.
  – Custom services & indices
  – Custom functions
  – Vertex-centric programs
• Embedded Server
  – JVM, Journal
• Standalone Server
  – WAR, Journal
High Availability
• Shared nothing architecture
  – Same data on each node
  – Coordinate only at commit
  – Transparent load balancing
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single node disaster recovery
• Online Backup
  – Online snapshots (full backups)
  – HA Logs (incremental backups)
• Point in time recovery (offline)
[Diagram: a quorum (k=3, size=3) of three HAService instances, labeled leader and follower]
Embedded, Single Server, HA, Scale-out
[Diagram: scale-out architecture, with the following labels]
• Distributed Index Management and Query
  – Client Services, Data Services
• Management Functions
  – Registrar, Zookeeper, Shard Locator, Transaction Mgr, Load Balancer
• RDF Data and SPARQL Query
  – Unified API, Application Clients
• Formats
  – SPARQL XML, SPARQL JSON
  – RDF/XML, N-Triples, N-Quads, Turtle, TriG, RDF/JSON
And now on GPUs
Similar models, different problems
• Graph query and graph analytics (traversal/mining)
  – Related data models
  – Very different computational requirements
• Many technologies are a bad match or limited solution
  – Key-value stores (Bigtable, Accumulo, Cassandra, HBase)
  – Map-reduce
• Anti-pattern
  – Dump all data into “big bucket”
Storage and computation patterns must be correctly matched for high performance.
Optimize for the right problem
• Graph analytics
  – Parallelism: work must be distributed and balanced.
  – Memory bandwidth: memory, not disk, is the bottleneck.
  – 2D partitioning: O(log(N)) communications pattern (versus O(N*N)).
    • 1D design loses locality when updating link weights for reverse indices.
• Storage and computation patterns must be correctly matched for high performance.
[Figure panels: BFS, PR]
GPUs – A Game Changer for Graph Analytics
• Graphs are a hard problem
  – Non-locality
  – Data-dependent parallelism
  – Memory, PCIe bus, and network are bottlenecks
• Recent performance gains driven by innovations in bottom-up search, data layout, and partitioning.
• GPUs deliver effective parallelism
  – 10x CPU FLOPS
  – 10x CPU/RAM bandwidth
• Significant speedups over CPU
  – 3 GTEPS on one GPU
  – 32 GTEPS on 64 GPU cluster
[Figure: Breadth-First Search on Graphs, 10x Speedup on GPUs. x-axis: Average Traversal Depth (1 to 100,000); y-axis: Million Traversed Edges per Second (0 to 3,500); series: NVIDIA Tesla C2050, Multicore per socket, Sequential]
GPU Hardware Trends
• K40 GPU (today)
  – 12G RAM/GPU
  – 288 GB/s bandwidth
  – PCIe Gen 3
• Pascal GPU (Q1 2016)
  – 24G RAM/GPU
  – 1 TB/s bandwidth
  – Unified memory across CPU, GPUs
Full Bandwidth Access to CPU RAM
Architecture shapes performance
• The data was a scale-free graph with 2.7M vertices and 5.6M edges
  – MapGraph used a larger version of the graph (24M vertices, 25M edges)
• The query was a 5-degree subgraph (depth-limited BFS)
• Two main takeaways
  – Horizontal scaling for Titan is very expensive: wrong abstraction.
  – GPUs are ridiculously fast.

platform    load (s)    query (ms)    comments
Titan       497.00      935           4 node cluster using Cassandra
Neo4j       608.00      668           single node community edition
bigdata     396.00      281           single node (open source)
MapGraph    0.08        27            NVIDIA K20 GPU
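The benchmark query above is described only as a depth-limited BFS. A minimal CPU sketch of that kind of query follows; the adjacency-dictionary representation, the function name `depth_limited_bfs`, and the toy chain graph are assumptions for illustration and are not taken from any of the benchmarked systems.

```python
from collections import deque


def depth_limited_bfs(adj, source, max_depth):
    """Return {vertex: depth} for every vertex within max_depth hops of source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == max_depth:
            continue  # do not expand vertices that sit on the depth limit
        for v in adj.get(u, ()):
            if v not in depth:  # first visit gives the shortest hop count
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


# Hypothetical toy graph: a directed chain 0 -> 1 -> ... -> 7.
chain = {i: [i + 1] for i in range(7)}
reached = depth_limited_bfs(chain, 0, 5)
```

With a limit of 5, the traversal stops expanding at vertex 5, so vertices 6 and 7 are never visited.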
MapGraph
Graph Processing on GPUs
http://MapGraph.io
Think Like a Vertex
• Simple APIs

    pageRank(Message m) {
        total = m.value();                // sum of incoming rank contributions
        vertex.val = .15 + .85 * total;   // teleport term plus damped contributions
        for (nbr : out_neighbors) {
            SendMsg(nbr, vertex.val / num_out_nbrs);
        }
    }

• Lots of algorithms
  – BFS, SSSP, PageRank, Connected Components, Louvain Modularity, Jaccard Distance, k-means clustering, Betweenness Centrality, Personalized PageRank, Loopy Belief Propagation, Graph search (crisp and approximate), etc.
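The vertex program on this slide can be simulated on a CPU with synchronous supersteps. The sketch below is an assumption-laden illustration, not MapGraph code: the function name `pagerank`, the dictionary graph representation, and the fixed iteration count are all hypothetical, and it uses the unnormalized update `0.15 + 0.85 * total` with a damping factor of 0.85.

```python
def pagerank(out_neighbors, iterations=50, damping=0.85):
    """Synchronous vertex-centric PageRank (unnormalized form).

    out_neighbors maps every vertex to the list of vertices it links to;
    every vertex must appear as a key.
    """
    value = {v: 1.0 for v in out_neighbors}
    for _ in range(iterations):
        # Message phase: each vertex sends value / out_degree along its out-edges.
        inbox = {v: 0.0 for v in out_neighbors}
        for v, nbrs in out_neighbors.items():
            for nbr in nbrs:
                inbox[nbr] += value[v] / len(nbrs)
        # Vertex program: teleport term plus damped sum of received messages.
        value = {v: (1.0 - damping) + damping * inbox[v] for v in out_neighbors}
    return value


# Hypothetical toy graph: a directed 3-cycle, where every vertex is symmetric.
ring = {0: [1], 1: [2], 2: [0]}
ranks = pagerank(ring)
```

On the symmetric ring every vertex keeps the same value, which makes it a convenient sanity check. Dangling vertices are not redistributed in this sketch; they simply send no messages.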
GAS – a Graph-Parallel Abstraction
• Graph-parallel, vertex-centric API à la GraphLab
• “Think like a vertex”
• Gather: collect information about my neighborhood
• Apply: update my value
• Scatter: signal adjacent vertices
• Can write all sorts of graph algorithms this way
  – BFS, PageRank, Connected Components, Triangle Counting, Max Flow, etc.
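To make the three phases concrete, here is a small sketch of connected components written in a Gather/Apply/Scatter style. It is a sequential CPU illustration under stated assumptions: the function name `connected_components`, the undirected adjacency dictionary, and the use of minimum-label propagation are choices made for this example and do not reflect the MapGraph or GraphLab APIs.

```python
def connected_components(neighbors):
    """Label each vertex with the smallest vertex id in its component.

    neighbors maps each vertex to its adjacent vertices (undirected graph,
    so every edge appears in both directions).
    """
    label = {v: v for v in neighbors}
    active = set(neighbors)  # all vertices start active
    while active:
        proposed = {}
        for v in active:
            # Gather: collect the smallest label in the neighborhood.
            gathered = min((label[u] for u in neighbors[v]), default=label[v])
            # Apply: the new value is the smaller of own and gathered label.
            proposed[v] = min(label[v], gathered)
        next_active = set()
        for v, new_label in proposed.items():
            if new_label < label[v]:
                label[v] = new_label
                # Scatter: signal adjacent vertices that my value changed.
                next_active.update(neighbors[v])
        active = next_active
    return label


# Hypothetical toy graph: a path 0-1-2 and a separate edge 3-4.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
components = connected_components(graph)
```

Only vertices signaled during Scatter run in the next round, so the active set shrinks as labels converge. That shrinking frontier is the data-dependent parallelism mentioned earlier in the talk.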
MapGraph
• High-level graph processing framework
• High programmability
  – GPU architecture, optimization techniques, CUDA
• High performance
  – Comparable to low-level approach
Single GPU MapGraph (BFS)

Dataset     #vertices    #edges        Max Degree    Milliseconds
Webbase     1,000,005    3,105,536     23            1.2
Delaunay    2,097,152    6,291,408     4,700         24.5
Bitcoin     6,297,539    28,143,065    4,075,472     345.3
Wiki        3,566,907    45,030,389    7,061         51.0
Kron        1,048,576    89,239,674    131,505       47.7

[Figure: MTEPS per dataset (axis 0 to 2,000). Webbase 154.0, Delaunay 513.6, Bitcoin 74.8, Wiki 821.3, Kron 1870.9]
BFS Results: MapGraph vs GraphLab
[Figure: MapGraph Speedup vs GraphLab (BFS). x-axis: Webbase, Delaunay, Bitcoin, Wiki, Kron; y-axis: Speedup (0.10 to 1,000.00); series: GL-2, GL-4, GL-8, GL-12, MPG]
PageRank: MapGraph vs GraphLab
[Figure: MapGraph Speedup vs GraphLab (PageRank). x-axis: Webbase, Delaunay, Bitcoin, Wiki, Kron; y-axis: Speedup (0.10 to 100.00); series: GL-2, GL-4, GL-8, GL-12, MPG]
Graph Mining on GPU Clusters
• 2D partitioning (aka vertex cuts)
• Minimizes the communication volume.
• Batch parallel Gather in row, Scatter in column.
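The idea behind 2D partitioning can be sketched as a mapping from edges to cells of a processor grid. The code below is a generic illustration under assumptions, not MapGraph's actual data layout: the function name `partition_2d`, the contiguous block assignment of vertex ids, and the square grid are all hypothetical choices.

```python
def partition_2d(edges, num_vertices, grid):
    """Assign each directed edge (u, v) to a cell of a grid x grid layout.

    Vertex ids 0..num_vertices-1 are split into `grid` contiguous blocks.
    Edge (u, v) goes to cell (block of u, block of v).
    """
    block = -(-num_vertices // grid)  # ceiling division: vertices per block
    cells = {(r, c): [] for r in range(grid) for c in range(grid)}
    for u, v in edges:
        cells[(u // block, v // block)].append((u, v))
    return cells


# Hypothetical toy graph on 4 vertices, partitioned over a 2 x 2 grid.
edges = [(0, 1), (0, 3), (2, 1), (3, 2)]
cells = partition_2d(edges, num_vertices=4, grid=2)
```

Under this layout, the out-edges of any vertex fall in a single grid row and its in-edges in a single grid column. A cell therefore exchanges data only with the other cells in its row and column, not with every cell in the grid, which is what keeps the communication volume down.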
Accelerated Graph Analytics
Scale 25 Traversal
• Work spans multiple orders of magnitude.
Strong Scaling
• Speedup on a constant problem size with more GPUs
• Problem scale 25
  – 2^25 vertices (33,554,432)
  – 2^30 directed edges (1,073,741,824)

Strong scaling
GPUs    GTEPS    Time (s)
16      14.3     0.075
25      16.4     0.066
36      18.1     0.059
64      22.7     0.047
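The GTEPS column can be approximately recomputed from the edge count and the reported times, and the same numbers give a rough parallel efficiency relative to the 16-GPU run. This is a sketch for checking the arithmetic: the helper name `gteps` and the efficiency definition (speedup divided by the increase in GPU count) are assumptions, and the recomputed rates differ slightly from the slide because the times are rounded to three decimals.

```python
EDGES = 1_073_741_824  # directed edges at scale 25, as stated on the slide

# (GPUs, time in seconds) taken from the strong-scaling table.
runs = [(16, 0.075), (25, 0.066), (36, 0.059), (64, 0.047)]


def gteps(edges, seconds):
    """Billions of traversed edges per second."""
    return edges / seconds / 1e9


base_gpus, base_time = runs[0]
report = []
for gpus, seconds in runs:
    speedup = base_time / seconds             # relative to the 16-GPU run
    efficiency = speedup / (gpus / base_gpus)  # 1.0 would be perfect scaling
    report.append((gpus, gteps(EDGES, seconds), speedup, efficiency))

for gpus, rate, speedup, efficiency in report:
    print(f"{gpus:3d} GPUs  {rate:5.1f} GTEPS  speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
```

Going from 16 to 64 GPUs quadruples the hardware but gives only about a 1.6x speedup, which matches the later point that scaling is communications bound.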
Weak Scaling
• Scaling the problem size with more GPUs

Weak scaling
GPUs    Scale    Vertices       Edges            Time (s)    GTEPS
1       21       2,097,152      67,108,864       0.0254      3
4       23       8,388,608      268,435,456      0.0429      6
16      25       33,554,432     1,073,741,824    0.0715      15
64      27       134,217,728    4,294,967,296    0.1478      29
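The problem sizes in the weak-scaling table follow a simple pattern: 2^scale vertices and 32 directed edges per vertex. The sketch below reproduces the Vertices and Edges columns and shows that the work per GPU stays constant. The constant `EDGE_FACTOR` is inferred from the table (67,108,864 / 2,097,152 = 32) rather than stated in the talk, and the helper name `problem_size` is hypothetical.

```python
EDGE_FACTOR = 32  # directed edges per vertex, inferred from the table


def problem_size(scale):
    """Return (vertices, directed edges) for a given problem scale."""
    vertices = 2 ** scale
    return vertices, EDGE_FACTOR * vertices


# (GPUs, scale, time in seconds) taken from the weak-scaling table.
rows = [(1, 21, 0.0254), (4, 23, 0.0429), (16, 25, 0.0715), (64, 27, 0.1478)]

summary = []
for gpus, scale, seconds in rows:
    vertices, edges = problem_size(scale)
    edges_per_gpu = edges // gpus
    rate = edges / seconds / 1e9  # GTEPS
    summary.append((gpus, vertices, edges, edges_per_gpu, rate))
```

Each configuration works out to 2^26 = 67,108,864 edges per GPU, so with perfect weak scaling the time would stay flat. The table's times instead rise from 0.0254 s to 0.1478 s as GPUs are added.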
Highlights
• For algorithms on large graphs
  – Memory is the bottleneck
    • CPUs quickly saturate the memory bus.
    • CPU cache thrashing limits scaling for graph traversal.
    • Continued performance gains for CPUs focus on reducing the number of visited edges to reduce bandwidth.
    • Hybrid CPU/GPU architectures offload either small-degree vertices (to reduce cache thrashing) or high-degree vertices (if the algorithm is FLOPS bound on the CPU, e.g., betweenness centrality).
  – Many-core is the future.
  – GPUs are primarily known for their FLOPS, but they have high memory bandwidth and can deliver effective parallelism on parallel graph problems (with sophisticated kernels).
• Scaling to very large graphs on large compute clusters
  – Communications bound.
    • Communication must be constant for perfect scaling.
  – Hybrid partitioning seeks to reduce the number and size of messages, and to optimize for asynchronous communications and degree-aware layouts for bottom-up search to reduce memory bandwidth.
Bryan Thompson
SYSTAP, LLC
bryan@systap.com