Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú...

24
Graph Analytics using Vertica Relational Database Alekh Jindal - Samuel Madden, Malú Castellanos - Meichun Hsu 1

Transcript of Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú...

Page 1: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Graph Analytics using Vertica Relational Database

Alekh Jindal - Samuel Madden, Malú Castellanos - Meichun Hsu

1

Page 2: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Introduction● High demand for graph analytics

● Popularity of distributed graph computing systems○ Vertex-centric systems: Pregel, Giraph, GraphLab

● Question:

Are traditional relational database systems not good enough for graph analytics?

2

Page 3: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

IntroductionLimitations of distributed graph computing systems:

● Data is initially collected and stored in a relational database● Graph processing is slow for very large graphs

○ Users have to choose a subgraph to run the algorithm ● Preparation might include operations that relational databases are

optimized for.○ Pre-processing or post-processing

● Some graph algorithms compute aggregates over a large neighbourhood ○ Hard to express in vertex-centric systems

3

Page 4: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Goal● Show how vertex-centric graph processing can be translated,

optimized and run on Vertica ○ SSSP, PageRank, Connected Components

● Compare Performance with two vertex-centric distributed systems for graph analysis (Giraph and GraphLab)

● Vertica → Enterprise column-store database management system○ Supports parallel processing

4

Page 5: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Vertex-Centric Model● The user provides a vertex.compute function (UDF):

○ The UDF will be executed at each node.

○ Will update the node’s state.

○ And communicate the changes to the neighbours.

5

Page 6: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Giraph Physical Plan1. Input Superstep: Workers reading

the data, building “Server Data stores”

2. Intermediate step: Run UDF, shuffle messages, wait for everyone, synchronize.

3. Output Superstep: Produce the output.

6

Page 7: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Giraph Logical PlanSame query plan but in relational logic:

1. V join E2. (V join E) join M: messages from

previous superstep3. Run UDF4. Produce new state for vertex (V’)

and messages for the next superstep (M’).

7

Page 8: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Overview● Translation to SQL queries

● Query Optimization

● Query Execution

● Extending Vertica

8

Page 9: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Translation to SQL1) Eliminate the message table

9

Page 10: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Translation to SQL2) Translate vertex compute function

Logical plan

10

Page 11: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Query Optimizations1) Update Vs. Replace

11

Page 12: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Query Optimizations2) Incremental Evaluation

12

Page 13: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Query Optimizations2) Join Elimination

Join Elimination in PageRank

13

Page 14: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Query Execution● Physical Design

○ Encoding and compression, sort orders, multiple table projections

● Join Optimization○ Join directly over compressed data, choose from hash join and merge join

● Query Pipelining○ Avoids materializing intermediate output and repeated access to disk

● Intra-query Parallelism○ Process subgraphs in parallel across cpu cores using GroupBy

14

Page 15: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Query Execution Plan of SSSPDifferent from Giraph execution pipeline:

1. Filter unnecessary tuples as early as possible.

2. Fully pipelines the execution flow.

3. Picks the best join execution strategy.

15

Page 16: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Extending Vertica● Running unmodified vertex programs

○ As table UDFs without translating to relational operators

16

Page 17: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Extending Vertica● Avoiding Intermediate Disk I/O

○ Load and store graph in shared memory, higher memory footprint

17

Page 18: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

ExperimentsSetup:

● Cluster of 4 machines● 48 GB memory● 1.4 TB Disk

Dataset:

18

Page 19: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

ExperimentsData Preparation:

Runtime:

19

Page 20: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

ExperimentsMemory Usage (PageRank):

20

Page 21: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

ExperimentsIn memory Graph Analysis:

21

Page 22: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

ExperimentsMixed Graph and Relational Analysis :

More Complicated Graph Processing:

22

Page 23: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Conclusion● Vertica can be tuned to offer good end-to-end performance on graph

queries (because it is optimized for scans, joins and aggregates).

● Users can trade memory with reduced I/O cost in iterative graph analysis.

● Relational databases can combine graph processing with relational analysis as pre-processing or post-processing steps.

● Features of relational databases can be combined with graph processing systems and it might be a good idea to stitch these systems together.

23

Page 24: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity

Thank you for your attention.

24