Giraph - University of Notre Dame · 2018. 10. 2. · •Developed by Apache •Based off...

Post on 17-Aug-2020


Giraph

Neil Butcher

Background
• Giraph: a scalable platform for implementing graph algorithms
• Developed by Apache
• Based off 'Pregel'
• Utilizes the Hadoop MapReduce framework to target graph problems
• Open source


Advantages of Solving Problems with Giraph
• Message-based communication: no locks
• Global synchronization: no semaphores
• Simple to program
• Massively parallel: task-based programming
• Fault tolerant: saves intermediate results


Giraph Algorithms: Basic Idea
• Algorithms are written from the perspective of a vertex
• Vertices send messages to each other to share pertinent information


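How it Works
• The 'compute' function has the ability to:
– Modify the state of the vertex and its outgoing edges
– Send messages to other vertices
– Receive messages sent in the previous superstep
• Things that happen during a superstep:
– The 'compute' function is invoked on each active vertex: every vertex in the first superstep, and afterwards each vertex that received a message in the previous superstep or has not yet voted to halt
– The next superstep begins only after all vertices have completed their work
– If all vertices have halted and no messages are in flight, the program halts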
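The superstep rules above can be sketched as a small loop. This is a plain Python simulation of the model, not Giraph's Java API; the names `run_supersteps`, `propagate_max`, and the four-vertex graph are hypothetical and chosen only for illustration. For simplicity, every vertex is assumed to vote to halt after each call to its compute function, so only vertices with incoming messages run in the next superstep.

```python
from collections import defaultdict

def run_supersteps(values, edges, compute, max_supersteps=50):
    """Call `compute` on active vertices until no messages are in flight."""
    inbox = defaultdict(list)
    active = set(values)              # every vertex is active in superstep 0
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = defaultdict(list)

        def send(target, message):
            outbox[target].append(message)

        for v in sorted(active):
            values[v] = compute(v, values[v], edges.get(v, []),
                                inbox[v], send, superstep)
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
        active = set(outbox)          # only vertices with mail are woken up
        superstep += 1
    return values, superstep

def propagate_max(vertex_id, value, out_edges, messages, send, superstep):
    """Vertex program: adopt the largest value seen and tell the neighbors."""
    new_value = max([value] + messages)
    if superstep == 0 or new_value > value:
        for target in out_edges:
            send(target, new_value)
    return new_value

# Hypothetical chain a - b - c - d with one starting value per vertex.
start_values = {"a": 3, "b": 6, "c": 2, "d": 1}
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final_values, steps = run_supersteps(dict(start_values), chain, propagate_max)
```

After the loop ends, every vertex in this example holds the largest starting value, because a vertex only sends again when its own value has grown.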

Single Source Shortest Path Algorithm
• Read updates from other vertices, find the minimum
• Send distance to other vertices
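The two annotated steps can be sketched as follows. This is a plain Python simulation, not the code from the slide and not Giraph's Java API; the function name `sssp` and the example graph are hypothetical. It assumes non-negative edge weights and that every vertex appears as a key in `edges`.

```python
import math
from collections import defaultdict

def sssp(edges, source):
    """edges: {vertex: [(neighbor, weight), ...]}. Returns distances from source."""
    distance = {v: math.inf for v in edges}
    inbox = defaultdict(list)
    active = set(edges)                       # all vertices run in superstep 0
    while active:
        outbox = defaultdict(list)
        for v in active:
            # Read updates from other vertices, find the minimum.
            candidate = min(inbox[v], default=math.inf)
            if v == source:
                candidate = min(candidate, 0)
            if candidate < distance[v]:
                distance[v] = candidate
                # Send distance to other vertices.
                for neighbor, weight in edges[v]:
                    outbox[neighbor].append(candidate + weight)
        # Superstep barrier: deliver messages, wake only vertices that got mail.
        inbox, active = outbox, set(outbox)
    return distance

# Hypothetical weighted, directed graph.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 6)],
    "C": [("D", 3)],
    "D": [],
}
distances = sssp(graph, "A")
```

A vertex sends messages only when its distance improves, so the computation stops by itself once no shorter path is found anywhere.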

Single Source Shortest Path Example (five figure-only slides stepping through the example)

More Complex Example: PageRank
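PageRank follows the same vertex-centric pattern: in each superstep a vertex sums the rank shares it received, updates its own rank, and sends an equal share to each out-neighbor. Below is a minimal sketch in plain Python, not Giraph's Java API. The damping factor of 0.85, the fixed number of supersteps, the function name `pagerank`, and the example graph are assumptions, not taken from the slides. It also assumes every vertex has at least one outgoing edge.

```python
from collections import defaultdict

def pagerank(edges, supersteps=30, damping=0.85):
    """edges: {vertex: [out_neighbor, ...]}, each vertex with >= 1 out-edge."""
    n = len(edges)
    rank = {v: 1.0 / n for v in edges}
    inbox = defaultdict(list)
    for step in range(supersteps):
        outbox = defaultdict(list)
        for v in edges:
            if step > 0:
                # Combine the shares received in the previous superstep.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            # Split the current rank evenly across outgoing edges.
            for neighbor in edges[v]:
                outbox[neighbor].append(rank[v] / len(edges[v]))
        inbox = outbox                # barrier between supersteps
    return rank

# Hypothetical three-page web graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

Unlike the shortest-path example, this sketch runs for a fixed number of supersteps rather than waiting for messages to stop.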


Giraph Job Lifetime


Implementing Algorithm in Giraph
• Define a Vertex class
– Subclass of existing implementations
• Define a VertexInputFormat to read the graph
• Define a VertexOutputFormat that defines how to extract the result based on the Vertex's final state
• Many other features can be utilized to improve performance


Aggregators
• Each vertex can contribute values that can be read by all vertices in the following superstep
• Can maintain values (sum, min, max, accumulate, user defined, etc.)
• Aggregators must be registered on the master
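The behaviour described above can be sketched with a sum aggregator whose merged value only becomes visible after the superstep barrier. This is a plain Python illustration, not Giraph's aggregator classes; `SumAggregator`, `degree_shares`, the aggregator name `"total_degree"`, and the example graph are all hypothetical.

```python
class SumAggregator:
    """Collects contributions in one superstep, exposes the total in the next."""

    def __init__(self):
        self.previous = 0    # value vertices can read during this superstep
        self._current = 0    # contributions still being collected

    def aggregate(self, value):
        self._current += value

    def end_superstep(self):
        # At the barrier the collected total becomes readable and the
        # accumulator is reset.
        self.previous, self._current = self._current, 0

def degree_shares(edges):
    # Registration step: the set of aggregators is fixed before the run starts.
    aggregators = {"total_degree": SumAggregator()}
    share = {}
    for superstep in range(2):
        for v, out in edges.items():
            if superstep == 0:
                aggregators["total_degree"].aggregate(len(out))
            else:
                share[v] = len(out) / aggregators["total_degree"].previous
        for agg in aggregators.values():
            agg.end_superstep()
    return share

shares = degree_shares({"a": ["b", "c"], "b": ["c"], "c": []})
```

In superstep 0 each vertex contributes its out-degree; in superstep 1 each vertex reads the global total and computes its own fraction of it.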


Combiners
• User-defined function to combine messages before they are sent or delivered
• Saves on network traffic and memory
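As a rough illustration, a combiner for shortest paths only needs to keep the smallest distance per destination, since the receiver would take the minimum anyway. The sketch below is plain Python, not Giraph's combiner interface; `CombiningOutbox` and the sample messages are hypothetical.

```python
class CombiningOutbox:
    """Buffers at most one message per target by combining as messages arrive."""

    def __init__(self, combine):
        self.combine = combine   # binary function, e.g. min for shortest paths
        self.buffer = {}
        self.sent = 0            # messages handed to the outbox

    def send(self, target, message):
        self.sent += 1
        if target in self.buffer:
            self.buffer[target] = self.combine(self.buffer[target], message)
        else:
            self.buffer[target] = message

outbox = CombiningOutbox(min)
for target, dist in [("d", 7), ("c", 3), ("d", 7), ("d", 6)]:
    outbox.send(target, dist)

combined = outbox.buffer
```

Four messages were produced but only two entries remain to be transferred, one per destination vertex.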


Checkpointing
• Can be expensive but necessary
• Ensures no single point of failure
• Store work at user-defined intervals
• Restart on failure
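The interval-and-restart idea can be sketched with an in-memory copy standing in for a checkpoint written to durable storage. This is a plain Python illustration under that assumption, not how Giraph stores checkpoints; `run_with_checkpoints`, `fail_at`, and the counter state are hypothetical.

```python
import copy

def run_with_checkpoints(state, step_fn, total_supersteps, interval, fail_at=None):
    """Run supersteps, saving state every `interval` steps; recover once on failure."""
    checkpoint = (0, copy.deepcopy(state))
    superstep = 0
    restarts = 0
    while superstep < total_supersteps:
        if superstep % interval == 0:
            # Store work at the user-defined interval.
            checkpoint = (superstep, copy.deepcopy(state))
        if superstep == fail_at and restarts == 0:
            # Simulated failure: drop in-memory state, reload the last checkpoint.
            restarts += 1
            superstep, saved = checkpoint
            state = copy.deepcopy(saved)
            continue
        state = step_fn(state)
        superstep += 1
    return state, restarts

def increment(state):
    return {"x": state["x"] + 1}

clean, _ = run_with_checkpoints({"x": 0}, increment, 10, interval=4)
recovered, restarts = run_with_checkpoints({"x": 0}, increment, 10,
                                           interval=4, fail_at=6)
```

The run that fails at superstep 6 rolls back to the checkpoint taken at superstep 4 and repeats two supersteps, which is the cost traded against checkpointing more often.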


ZooKeeper Responsibilities: Computation State
• Handles partition/worker mapping
• Global state
• Checkpoint paths, aggregator values, statistics


Master Responsibilities: Coordination
• Assigns partitions to workers
– Hash mapping is the default
– Can be user defined
• Monitors workers
• Coordinates supersteps (ending, starting, etc.)
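Hash mapping can be sketched as taking a hash of the vertex id modulo the number of partitions, then spreading partitions over workers. The plain Python below is only an illustration: the CRC32 hash is a stand-in chosen because it is stable across runs, the round-robin partition-to-worker step is an assumption, and the function names are hypothetical rather than Giraph's.

```python
import zlib

def assign_partition(vertex_id, num_partitions):
    """Map a vertex id to a partition with a stable hash (stand-in choice)."""
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_partitions

def assign_worker(partition, num_workers):
    """Spread partitions over workers round-robin (assumed policy)."""
    return partition % num_workers

vertex_ids = ["v%d" % i for i in range(100)]
placement = {
    v: assign_worker(assign_partition(v, num_partitions=8), num_workers=3)
    for v in vertex_ids
}
```

A user-defined mapping would replace `assign_partition`, for example to keep vertices that exchange many messages on the same worker.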


Worker Responsibilities: Vertices
• Workers are assigned vertices
• Perform compute
• Pass messages between vertices
• Compute local aggregation values
