Distributed Graph Processing

Abhishek Verma
CS425

Guess the Graph?


- The Internet
- Human Brain
- LinkedIn Social Network

Graph Problems
- Finding shortest paths: routing Internet traffic and UPS trucks
- Finding minimum spanning trees: Google Fiber
- Finding max flow: scheduling
- Bipartite matching: dating websites
- Identifying special nodes and communities: spread of diseases, terrorists

Outline
- Single Source Shortest Path (SSSP)
- SSSP MapReduce implementation
- Pregel
- Computation model
- SSSP example
- Writing a Pregel program
- System design
- MapReduce vs. Pregel for graph processing

Single Source Shortest Path (SSSP)
- Problem: find the shortest path from a source node to all target nodes

- Solution for a single-processor machine: Dijkstra's algorithm

Example: SSSP with Dijkstra's Algorithm
(A sequence of slides steps through Dijkstra's algorithm on a five-node example graph A-E, updating the tentative distances from source node A at each step; the figures are omitted here.)
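As a reference point for the distributed versions that follow, here is a minimal single-machine Dijkstra sketch over the same five-node graph; the edge weights are taken from the adjacency list shown with the MapReduce example, while the container choices and variable names are illustrative.

    // Minimal single-machine Dijkstra sketch for the five-node example graph.
    // Edge weights follow the adjacency list used in the MapReduce example;
    // container choices and variable names are illustrative.
    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    int main() {
        const int INF = 1 << 30;
        // adj[u] = list of (neighbor, weight); indices 0..4 correspond to A..E.
        std::vector<std::vector<std::pair<int, int>>> adj = {
            {{1, 10}, {3, 5}},           // A: (B,10), (D,5)
            {{2, 1},  {3, 2}},           // B: (C,1),  (D,2)
            {{4, 4}},                    // C: (E,4)
            {{1, 3},  {2, 9}, {4, 2}},   // D: (B,3),  (C,9), (E,2)
            {{0, 7},  {2, 6}}            // E: (A,7),  (C,6)
        };
        std::vector<int> dist(adj.size(), INF);
        dist[0] = 0;                                   // source is A
        using Entry = std::pair<int, int>;             // (distance, node)
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
        pq.push({0, 0});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d > dist[u]) continue;                 // stale queue entry
            for (auto [v, w] : adj[u]) {
                if (d + w < dist[v]) {                 // relax edge (u, v)
                    dist[v] = d + w;
                    pq.push({dist[v], v});
                }
            }
        }
        for (int u = 0; u < (int)adj.size(); ++u)
            std::printf("%c: %d\n", 'A' + u, dist[u]);
        return 0;
    }

On this graph the sketch prints the shortest distances from A: 0 (A), 8 (B), 9 (C), 5 (D), 7 (E).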

Single Source Shortest Path (SSSP)
- Problem: find the shortest path from a source node to all target nodes
- Solution on a single-processor machine: Dijkstra's algorithm
- Distributed solution: parallel breadth-first search
  - MapReduce
  - Pregel

MapReduce Execution Overview
(Architecture figure omitted.)

Example: SSSP Parallel BFS in MapReduce
The example graph (source A) can be represented as an adjacency matrix or an adjacency list.

Adjacency list:
  A: (B, 10), (D, 5)
  B: (C, 1), (D, 2)
  C: (E, 4)
  D: (B, 3), (C, 9), (E, 2)
  E: (A, 7), (C, 6)

Adjacency matrix (row = source node, column = destination node, entry = edge weight):

        A    B    C    D    E
   A        10         5
   B              1    2
   C                        4
   D         3    9         2
   E    7         6

Map input: each node with its current tentative distance from source A (0 for A, infinity for the others) and its adjacency list. (Graph figure omitted.)

Map output: for each node, the node's own record (distance plus adjacency list) and a candidate distance for every neighbor (the node's current distance plus the edge weight). Map output is flushed to local disk.

Reduce input: all candidate distances for a node, grouped by node.

Reduce output: each node with the minimum of its candidate distances and its adjacency list; this is flushed to the DFS and becomes the map input for the next iteration.

The next iteration repeats the same map and reduce phases, propagating distances one more hop from A. (The per-step graph figures and the remaining iterations are omitted; a sketch of one iteration follows below.)
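Since the per-slide key-value figures did not survive the transcript, here is a minimal self-contained sketch of the iteration just described, written as plain map and reduce functions over the example graph. It is an in-process simulation, not code for an actual MapReduce framework; the type and function names (NodeRecord, map_node, reduce_node) are illustrative.

    // One iteration of parallel BFS for SSSP, expressed as map and reduce steps.
    // Plain in-process simulation, not a real MapReduce framework; names are
    // illustrative.
    #include <climits>
    #include <cstdio>
    #include <map>
    #include <utility>
    #include <vector>

    struct NodeRecord {
        int dist;                                   // current tentative distance
        std::vector<std::pair<char, int>> adj;      // (neighbor, edge weight)
    };

    // Map: emit the node's own record plus a candidate distance per neighbor.
    void map_node(char id, const NodeRecord& rec,
                  std::multimap<char, NodeRecord>& out) {
        out.insert({id, rec});                      // pass the graph structure through
        for (auto [nbr, w] : rec.adj)
            if (rec.dist != INT_MAX)
                out.insert({nbr, NodeRecord{rec.dist + w, {}}});  // candidate distance
    }

    // Reduce: keep the minimum candidate distance and re-attach the adjacency list.
    NodeRecord reduce_node(const std::vector<NodeRecord>& values) {
        NodeRecord result{INT_MAX, {}};
        for (const auto& v : values) {
            if (v.dist < result.dist) result.dist = v.dist;
            if (!v.adj.empty()) result.adj = v.adj; // the structural record
        }
        return result;
    }

    int main() {
        std::map<char, NodeRecord> graph = {
            {'A', {0,       {{'B', 10}, {'D', 5}}}},
            {'B', {INT_MAX, {{'C', 1},  {'D', 2}}}},
            {'C', {INT_MAX, {{'E', 4}}}},
            {'D', {INT_MAX, {{'B', 3},  {'C', 9}, {'E', 2}}}},
            {'E', {INT_MAX, {{'A', 7},  {'C', 6}}}},
        };
        for (int iter = 0; iter < 4; ++iter) {      // enough iterations for this graph
            std::multimap<char, NodeRecord> shuffled;
            for (const auto& [id, rec] : graph) map_node(id, rec, shuffled);
            for (auto& [id, rec] : graph) {
                std::vector<NodeRecord> values;
                auto range = shuffled.equal_range(id);
                for (auto it = range.first; it != range.second; ++it)
                    values.push_back(it->second);
                rec = reduce_node(values);          // reduce output = next map input
            }
        }
        for (const auto& [id, rec] : graph) std::printf("%c: %d\n", id, rec.dist);
        return 0;
    }

Running four iterations converges to the distances 0, 8, 9, 5, 7 for A through E, the same values Dijkstra's algorithm produces; a real deployment would instead stop once an iteration no longer improves any distance.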

Example: SSSP Parallel BFS in MapReduce
Each iteration's reduce output is flushed to the DFS, and the iterations repeat until no distance changes. (Final graph figure omitted.)

Problems with MapReduce
- MapReduce cannot express iterative algorithms efficiently: every iteration repartitions all the data across the CPUs, with a barrier between iterations, so one slow processor stalls the whole job.

MapAbuse: Iterative MapReduce
- Only a subset of the data needs computation in each iteration, yet every iteration still processes all of it.

MapAbuse: Iterative MapReduce
- The system is not optimized for iteration: every iteration pays a startup penalty and a disk penalty in addition to the barrier.

Pregel Model
- Based on the Bulk Synchronous Parallel model

- Alternating compute and communicate phases

Computation Model
- Input → supersteps (a sequence of iterations) → output


Computation Model
- "Think like a vertex"
- Inspired by Valiant's Bulk Synchronous Parallel model (1990)
- Source: http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Computation Model
- Superstep: the vertices compute in parallel; each vertex
  - Receives messages sent in the previous superstep
  - Executes the same user-defined function
  - Modifies its value or the values of its outgoing edges
  - Sends messages to other vertices (to be received in the next superstep)
  - Mutates the topology of the graph
  - Votes to halt if it has no further work to do
- Termination condition: all vertices are simultaneously inactive and there are no messages in transit (see the control-loop sketch below)
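To make the superstep semantics above concrete, here is a small self-contained sketch of the control loop: double-buffered message queues, vote-to-halt, reactivation on message receipt, and the termination rule. All names (VertexState, Message, run_supersteps) are illustrative stand-ins, not part of the Pregel API.

    // Superstep-loop sketch of the computation model described above.
    // Single-process simulation; the names are illustrative.
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <vector>

    struct VertexState {
        int value = 0;
        bool active = true;
        std::vector<int> inbox;   // messages delivered from the previous superstep
    };

    struct Message { int target; int payload; };

    // The user-defined function run on every active vertex in a superstep.
    // It may read the inbox, update the value, vote to halt by clearing
    // `active`, and return messages for delivery in the next superstep.
    using ComputeFn = std::function<std::vector<Message>(int id, VertexState&)>;

    void run_supersteps(std::vector<VertexState>& vertices, ComputeFn compute) {
        std::vector<std::vector<int>> pending(vertices.size());  // next inboxes
        int superstep = 0;
        while (true) {
            // Message delivery reactivates a halted vertex.
            bool any_active = false;
            for (VertexState& v : vertices) {
                if (!v.inbox.empty()) v.active = true;
                any_active = any_active || v.active;
            }
            // Termination: every vertex inactive and no messages in transit.
            if (!any_active) break;
            // Superstep: all active vertices run the same user-defined function.
            for (std::size_t i = 0; i < vertices.size(); ++i) {
                if (!vertices[i].active) continue;
                for (const Message& m : compute((int)i, vertices[i]))
                    pending[m.target].push_back(m.payload);
                vertices[i].inbox.clear();
            }
            // Barrier: swap in the messages sent during this superstep.
            for (std::size_t i = 0; i < vertices.size(); ++i) {
                vertices[i].inbox.swap(pending[i]);
                pending[i].clear();
            }
            std::printf("superstep %d finished\n", superstep++);
        }
    }

    int main() {
        // Toy example: pass an incrementing token around a 4-vertex ring,
        // stopping once it reaches 10.
        std::vector<VertexState> ring(4);
        ring[0].inbox.push_back(0);   // seed the token at vertex 0
        run_supersteps(ring, [](int id, VertexState& s) {
            std::vector<Message> out;
            for (int token : s.inbox) {
                s.value = token;
                if (token < 10) out.push_back({(id + 1) % 4, token + 1});
            }
            s.active = false;         // vote to halt until a new message arrives
            return out;
        });
        return 0;
    }

The toy compute function forwards a counter around the ring; each vertex votes to halt after sending, and the run terminates once the counter stops being forwarded and no vertex remains active.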

Intermission

Example: SSSP Parallel BFS in Pregel
(A sequence of slides steps through the supersteps on the same five-node graph: in each superstep, a vertex that receives a smaller candidate distance updates its value and sends new candidate distances to its neighbors; once no vertex receives an improvement, every vertex votes to halt and the computation ends. Figures omitted.)

C++ API: Writing a Pregel Program
- Subclass the predefined Vertex class
- Override the virtual compute method ("Override this!"), which reads the incoming messages and emits outgoing messages

Example: Vertex Class for SSSP
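The vertex code on this slide did not survive the transcript. The sketch below is modeled on the SSSP example in the Pregel paper (Malewicz et al., 2010): it assumes the framework-provided Vertex<VertexValue, EdgeValue, MessageValue> base class, MessageIterator, MutableValue(), SendMessageTo(), and VoteToHalt(), plus an INF constant and an IsSource() helper, so it compiles only inside a Pregel-like system.

    // Sketch of an SSSP vertex program in the style of the Pregel C++ API,
    // modeled on the example in the Pregel paper. Vertex<>, MessageIterator,
    // OutEdgeIterator, SendMessageTo, VoteToHalt, INF, and IsSource are
    // assumed to come from a Pregel-like framework.
    class ShortestPathVertex : public Vertex<int /*vertex value*/,
                                             int /*edge value*/,
                                             int /*message value*/> {
     public:
      void Compute(MessageIterator* msgs) override {
        // Tentative distance: 0 at the source, "infinity" elsewhere,
        // improved by any candidate distances received this superstep.
        int mindist = IsSource(vertex_id()) ? 0 : INF;
        for (; !msgs->Done(); msgs->Next())
          mindist = std::min(mindist, msgs->Value());

        if (mindist < GetValue()) {
          *MutableValue() = mindist;
          // Relax every outgoing edge: send (my distance + edge weight)
          // as a candidate distance to each neighbor.
          OutEdgeIterator iter = GetOutEdgeIterator();
          for (; !iter.Done(); iter.Next())
            SendMessageTo(iter.Target(), mindist + iter.GetValue());
        }
        // Always vote to halt; a message in a later superstep reactivates us.
        VoteToHalt();
      }
    };

Each superstep, a vertex takes the minimum over its incoming candidate distances; if that improves its current value, it relaxes all outgoing edges by sending new candidates, and in every case it votes to halt until a better candidate arrives.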

System Design
- The Pregel system also uses the master/worker model
  - Master: maintains the workers, recovers from worker faults, and provides a web UI for monitoring job progress
  - Worker: processes its task and communicates with the other workers
- Persistent data is stored as files on a distributed storage system (such as GFS or BigTable)
- Temporary data is stored on local disk

Execution of a Pregel Program
- Many copies of the program begin executing on a cluster
- The master assigns a partition of the input to each worker
- Each worker loads its vertices and marks them as active
- The master instructs each worker to perform a superstep
  - Each worker loops through its active vertices and runs the compute function for each one
  - Messages are sent asynchronously but are delivered before the end of the superstep
  - This step is repeated as long as any vertices are active or any messages are in transit
- After the computation halts, the master may instruct each worker to save its portion of the graph

Fault Tolerance
- Checkpointing: the master periodically instructs the workers to save the state of their partitions (e.g., vertex values, edge values, incoming messages) to persistent storage
- Failure detection: regular ping messages
- Recovery: the master reassigns graph partitions to the currently available workers, and the workers all reload their partition state from the most recent available checkpoint

Experiments
- Environment
  - H/W: a cluster of 300 multicore commodity PCs
  - Data: binary trees, log-normal random graphs (general graphs)

- Naïve SSSP implementation: all edge weights = 1, no checkpointing

Experiments
- SSSP on a 1-billion-vertex binary tree: varying the number of worker tasks

Experiments
- SSSP on binary trees: varying graph sizes on 800 worker tasks

Experiments
- SSSP on random graphs: varying graph sizes on 800 worker tasks

Differences from MapReduce
- Graph algorithms can be written as a series of chained MapReduce invocations
- Pregel
  - Keeps vertices and edges on the machine that performs the computation
  - Uses network transfers only for messages
- MapReduce
  - Passes the entire state of the graph from one stage to the next
  - Needs to coordinate the steps of a chained MapReduce

MapReduce vs. Graph Processing
- When to use MapReduce vs. graph processing frameworks?
- The two models can be reduced to each other
  - MR on Pregel: every vertex sends to every other vertex
  - Pregel on MR: similar to PageRank on MR

- Use graph frameworks for sparse graphs and matrices
  - Communication is limited to a small neighborhood (partitioning)
  - Faster than the shuffle/sort/reduce of MR
  - Local state is kept between iterations (side effects)

MapReduce vs. Graph Processing (2)
- Use MR for all-to-all problems
  - Distributed joins
  - When there is little or no state between tasks (otherwise it has to be written to disk)
- Graph processing is a middle ground between message passing (MPI) and data-intensive computing

Backup Slides

Pregel vs. GraphLab
- Distributed system models: asynchronous and synchronous
- Well-known tradeoff between the two models
  - Synchronous: concurrency control and failure handling are easy; poor performance
  - Asynchronous: concurrency control and failure handling are hard; good performance
- Pregel is a synchronous system
  - No concurrency control, no worry about consistency
  - Fault tolerance: checkpoint at each barrier
- GraphLab is an asynchronous system
  - Consistency of updates is harder (sequential, vertex consistency)
  - Fault tolerance is harder (needs a consistent snapshot)

Pregel vs. GraphLab (2)
- The synchronous model requires waiting for every node
  - Good for problems that require it
  - Bad when waiting for stragglers or under load imbalance
- The asynchronous model can make faster progress
  - Some problems work in the presence of drift
  - Fast, but harder to reason about the computation
  - Can load-balance in scheduling to deal with load skew
