
  • Data-Intensive Distributed Computing

    Part 8: Analyzing Graphs, Redux (1/2)

    This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

    CS 431/631 451/651 (Winter 2019)

    Adam Roegiest Kira Systems

    March 21, 2019

    These slides are available at http://roegiest.com/bigdata-2019w/

  • Graph Algorithms, again? (srsly?)

  • What makes graphs hard?

    Irregular structure → Fun with data structures!

    Irregular data access patterns → Fun with architectures!

    Iterations → Fun with optimizations!

  • Characteristics of Graph Algorithms

    Parallel graph traversals

    Local computations

    Message passing along graph edges

    Iterations

  • Visualizing Parallel BFS

    [Figure: example graph with nodes n0–n9, illustrating the BFS frontier expanding in parallel]

  • PageRank: Defined

    Given page x with inlinks t1…tn, where
    C(t) is the out-degree of t
    α is the probability of a random jump
    N is the total number of nodes in the graph

    PR(x) = α (1/N) + (1 − α) Σᵢ PR(tᵢ)/C(tᵢ)
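
    As a quick illustration, here is a minimal, self-contained sketch of this formula for a hypothetical page x with two inlinks. The values of α, N, the inlink names, and their scores are made up for the example and are not from the slides.

```scala
object PageRankFormula {
  def main(args: Array[String]): Unit = {
    val alpha = 0.15                              // probability of a random jump (illustrative)
    val n = 5                                     // N: total number of nodes (illustrative)
    val outDegree = Map("t1" -> 2, "t2" -> 3)     // C(t): out-degree of each inlink
    val pr = Map("t1" -> 0.2, "t2" -> 0.3)        // current PageRank of each inlink

    // PR(x) = alpha * (1/N) + (1 - alpha) * sum_i PR(t_i) / C(t_i)
    val prX = alpha / n + (1 - alpha) * pr.map { case (t, p) => p / outDegree(t) }.sum

    println(f"PR(x) = $prX%.4f")
  }
}
```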

  • PageRank in MapReduce

    [Figure: adjacency lists n1 [n2, n4], n2 [n3, n5], n3 [n4], n4 [n5], n5 [n1, n2, n3]; Map distributes each node's PageRank mass along its out-edges, Reduce groups the incoming mass by destination node and rebuilds the adjacency lists]
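
    To make the map and reduce steps concrete, here is a hedged sketch of the per-node logic in plain Scala, not an actual Hadoop job. The Vertex type and the use of Either to carry both the graph structure and the mass messages through the same shuffle are illustrative choices, and the random-jump factor and dangling-mass handling are omitted.

```scala
case class Vertex(id: String, pagerank: Double, adjacency: List[String])

// map: emit the graph structure plus one "mass" message per out-link
def mapVertex(v: Vertex): Seq[(String, Either[List[String], Double])] = {
  val structure: (String, Either[List[String], Double]) = (v.id, Left(v.adjacency))
  val messages = v.adjacency.map { dst =>
    (dst, Right(v.pagerank / v.adjacency.size): Either[List[String], Double])
  }
  structure +: messages
}

// reduce: sum the incoming mass and reattach the adjacency list
def reduceVertex(id: String, values: Seq[Either[List[String], Double]]): Vertex = {
  val adjacency = values.collectFirst { case Left(adj) => adj }.getOrElse(Nil)
  val mass      = values.collect { case Right(m) => m }.sum
  Vertex(id, mass, adjacency)
}
```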

  • PageRank vs. BFS

                PageRank    BFS
    Map         PR/N        d+1
    Reduce      sum         min
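
    The table suggests both algorithms share one traversal skeleton, differing only in the value sent along each edge and in how incoming values are combined. A rough, non-authoritative sketch of that shared skeleton; the traverse helper and its signature are invented for illustration.

```scala
// Shared skeleton: send a value along every edge, then aggregate per destination.
def traverse[V](
    vertices: Seq[(String, V, List[String])],   // (id, current value, out-neighbors)
    emit: (V, Int) => V,                        // value to send, given value and out-degree
    aggregate: (V, V) => V                      // how to combine incoming values
): Map[String, V] =
  vertices
    .flatMap { case (_, value, nbrs) => nbrs.map(dst => dst -> emit(value, nbrs.size)) }
    .groupBy(_._1)
    .map { case (id, msgs) => id -> msgs.map(_._2).reduce(aggregate) }

// PageRank step: emit PR / out-degree, aggregate with sum.
//   traverse(verts, (pr: Double, deg: Int) => pr / deg, (a: Double, b: Double) => a + b)
// BFS step: emit d + 1, aggregate with min.
//   traverse(verts, (d: Int, _: Int) => d + 1, (a: Int, b: Int) => math.min(a, b))
```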

  • Characteristics of Graph Algorithms

    Parallel graph traversals

    Local computations

    Message passing along graph edges

    Iterations

  • BFS

    [Figure: each BFS iteration is one MapReduce job (map, reduce) reading from and writing to HDFS, followed by a convergence check]

  • PageRank

    [Figure: each PageRank iteration is a MapReduce job plus an additional map-only pass, reading from and writing to HDFS, followed by a convergence check]

  • MapReduce Sucks

    Hadoop task startup time

    Stragglers

    Needless graph shuffling

    Checkpointing at each iteration

  • Let’s Spark!

    [Figure: the iterative dataflow as a chain of map/reduce stages, with HDFS reads and writes between every stage]

  • [Figure: the same chain of map/reduce stages, but with the intermediate HDFS reads and writes between stages removed; only the initial input comes from HDFS]

  • [Figure: the same dataflow, with each stage's two inputs labeled: adjacency lists and PageRank mass]

  • [Figure: each iteration redrawn as a join of the adjacency lists with the PageRank mass, followed by a map]

  • [Figure: the Spark dataflow; the adjacency lists are joined with the PageRank vector, a flatMap distributes the mass, and a reduceByKey produces the next PageRank vector, with HDFS touched only at the start and end]

  • [Figure: the same Spark dataflow, with the adjacency lists cached in memory across iterations (Cache!)]
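
    Putting the last few slides together, here is a minimal Spark sketch of this dataflow, essentially the textbook Spark PageRank: the adjacency lists are cached once, and each iteration is a join, a flatMap to spread mass, and a reduceByKey to collect it. The input path, the 10 iterations, and the simplified 0.15 + 0.85 × sum damping (rather than the α/N form defined earlier) are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession

object SparkPageRank {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PageRank").getOrCreate()
    val sc = spark.sparkContext

    // Input lines: "srcId dstId", one edge per line (assumed format and path).
    val links = sc.textFile("hdfs:///graph/edges.txt")
      .map { line => val Array(src, dst) = line.split("\\s+"); (src, dst) }
      .groupByKey()
      .cache()                                 // graph structure never changes

    var ranks = links.mapValues(_ => 1.0)      // uniform initial ranks

    for (_ <- 1 to 10) {
      val contribs = links.join(ranks).flatMap {
        case (_, (neighbors, rank)) => neighbors.map(dst => (dst, rank / neighbors.size))
      }
      ranks = contribs.reduceByKey(_ + _)
        .mapValues(sum => 0.15 + 0.85 * sum)   // simplified random-jump factor
    }

    ranks.take(10).foreach(println)
    spark.stop()
  }
}
```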

  • MapReduce vs. Spark

    PageRank Performance (time per iteration, in seconds, vs. number of machines):

                  30 machines    60 machines
    Hadoop        ~171 s         ~80 s
    Spark         ~72 s          ~28 s

    Source: http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-part-2-amp-camp-2012-standalone-programs.pdf

  • Characteristics of Graph Algorithms

    Parallel graph traversals

    Local computations

    Message passing along graph edges

    Iterations

    Even faster?

  • Big Data Processing in a Nutshell

    Partition

    Replicate

    Reduce cross-partition communication

  • Simple Partitioning Techniques

    Hash partitioning

    Range partitioning on some underlying linearization
    Web pages: lexicographic sort of domain-reversed URLs
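
    For concreteness, a small sketch of the two schemes using Spark's built-in partitioners. The example keys are made-up domain-reversed URLs, and the partition count of 8 is arbitrary.

```scala
import org.apache.spark.{HashPartitioner, RangePartitioner, SparkContext}

def partitionExamples(sc: SparkContext): Unit = {
  // Keyed RDD of (domain-reversed URL, adjacency list); contents are illustrative.
  val pages = sc.parallelize(Seq(
    ("ca.uwaterloo.cs", List("org.apache.spark")),
    ("org.apache.spark", List("org.apache.hadoop")),
    ("org.apache.hadoop", List("ca.uwaterloo.cs"))
  ))

  // Hash partitioning: spreads keys uniformly, ignores any locality in the keys.
  val hashed = pages.partitionBy(new HashPartitioner(8))

  // Range partitioning: keys are sorted lexicographically, so pages from the same
  // domain (after domain reversal) tend to land in the same partition.
  val ranged = pages.partitionBy(new RangePartitioner(8, pages))

  println(s"hash: ${hashed.partitioner}, range: ${ranged.partitioner}")
}
```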

  • “Best Practices”

    Lin and Schatz. (2010) Design Patterns for Efficient Graph Algorithms in MapReduce.

    PageRank over webgraph (40m vertices, 1.4b edges)

    How much difference does it make?

  • How much difference does it make?

    PageRank over webgraph (40m vertices, 1.4b edges)

    [Figure builds: +18%, −15%, −60%; intermediate pairs 1.4b → 674m → 86m]

    Lin and Schatz. (2010) Design Patterns for Efficient Graph Algorithms in MapReduce.

  • Schimmy Design Pattern

    Basic implementation contains two dataflows:
    Messages (actual computations)
    Graph structure (“bookkeeping”)

    Schimmy: separate the two dataflows, shuffle only the messages
    Basic idea: merge join between graph structure and messages

    Lin and Schatz. (2010) Design Patterns for Efficient Graph Algorithms in MapReduce.

    [Figure: relations S and T, both sorted by the join key; then S1/T1, S2/T2, S3/T3, consistently partitioned and sorted by the join key, so each pair of partitions can be merge-joined independently]
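
    A rough sketch of the merge join the pattern relies on, under the assumption that both inputs arrive sorted by vertex id. Types and names are illustrative; in the actual Hadoop implementation the reducer reads its graph partition directly from HDFS while only the messages come through the shuffle.

```scala
def mergeJoin(
    structure: Iterator[(String, List[String])],   // graph partition S_i, sorted by vertex id
    messages:  Iterator[(String, Double)]          // message stream T_i, sorted by vertex id
): Iterator[(String, List[String], Double)] = {
  val msgs = messages.buffered
  structure.map { case (id, adj) =>
    var mass = 0.0
    // Both sides are sorted, so a single forward pass over the messages suffices.
    while (msgs.hasNext && msgs.head._1 <= id) {
      val (mid, m) = msgs.next()
      if (mid == id) mass += m                      // drop messages with no matching vertex
    }
    (id, adj, mass)
  }
}
```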

  • [Figure: the Spark join / flatMap / reduceByKey dataflow from before, shown again; the join already keeps the graph structure separate from the PageRank messages]

  • How much difference does it make?

    PageRank over webgraph (40m vertices, 1.4b edges)

    [Figure builds: +18%, −15%, −60%, and finally −69%; intermediate pairs 1.4b → 674m → 86m]

    Lin and Schatz. (2010) Design Patterns for Efficient Graph Algorithms in MapReduce.

  • Simple Partitioning Techniques

    Hash partitioning

    Range partitioning on some underlying linearization
    Web pages: lexicographic sort of domain-reversed URLs

    Social networks: sort by demographic characteristics

  • Ugander et al. (2011) The Anatomy of the Facebook Social Graph.

    Analysis of 721 million active users (May 2011)

    54 countries w/ >1m active users, >50% penetration

    Country Structure in Facebook

  • Simple Partitioning Techniques

    Hash partitioning

    Range partitioning on some underlying linearization
    Web pages: lexicographic sort of domain-reversed URLs

    Social networks: sort by demographic characteristics

    Geo data: space-filling curves

  • Aside: Partitioning Geo-data

  • Geo-data = regular graph

  • Space-filling curves: Z-Order Curves
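
    As a sketch of the idea, a Z-order (Morton) key interleaves the bits of the two coordinates, so sorting by this key and then range-partitioning tends to keep spatially close points in the same partition. The 16-bit coordinate width is an arbitrary choice for the example.

```scala
// Interleave the bits of x and y: even bit positions hold x, odd positions hold y.
def zOrder(x: Int, y: Int, bits: Int = 16): Long = {
  var z = 0L
  for (i <- 0 until bits) {
    z |= ((x >> i) & 1L) << (2 * i)        // even bit positions: x
    z |= ((y >> i) & 1L) << (2 * i + 1)    // odd bit positions:  y
  }
  z
}

// Example: sorting points by zOrder(x, y) groups spatial neighbors together.
// Seq((3, 5), (2, 4), (10, 1)).sortBy { case (x, y) => zOrder(x, y) }
```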

  • Space-filling curves: Hilbert Curves

  • Simple Partitioning Techniques

    Hash partitioning

    Range partitioning on some underlying linearization
    Web pages: lexicographic sort of domain-reversed URLs

    Social networks: sort by demographic characteristics

    Geo data: space-filling curves

    But what about graphs in general?

  • Source: http://www.flickr.com/photos/fusedforces/4324320625/

  • General-Purpose Graph Partitioning

    Graph coarsening

    Recursive bisection

  • Karypis and Kumar. (1998) A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs.

    General-Purpose Graph Partitioning

  • Karypis and Kumar. (1998) A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs.

    Graph Coarsening

  • Chicken-and-Egg

    To coarsen the graph you need to identify dense local regions

    To identify dense local regions quickly, you need to traverse local edges
    But to traverse local edges efficiently, you need the local structure!

    To efficiently partition the graph, you need to already know what the partitions are!

    Industry solution?

  • Big Data Processing in a Nutshell

    Partition

    Replicate

    Reduce cross-partition communication

  • Partition

  • Partition

    What’s the fundamental issue?

  • Characteristics of Graph Algorithms

    Parallel graph traversals

    Local computations

    Message passing along graph edges

    Iterations

  • Partition

    [Figure: a partitioned graph; edges within a partition are fast, edges that cross partitions are slow]

  • State-of-the-Art Distributed Graph Algorithms

    Fast asynchronous