Post on 17-Aug-2020
Giraph
Neil Butcher
Background
• Giraph: a scalable platform for implementing graph algorithms
• Developed by Apache
• Based off ‘Pregel’
• Utilizes the Hadoop MapReduce framework to target graph problems
• Open Source
Advantages of Solving Problems with Giraph
• Message-based communication: no locks
• Global synchronization: no semaphores
• Simple to program
• Massively parallel: task-based programming
• Fault tolerant: saves intermediate results
Giraph Algorithms: Basic Idea
• Algorithms are written from the perspective of a vertex
• Vertices send messages to each other to share pertinent information
How it Works
• The ‘compute’ function has the ability to:
– Modify the state of the vertex and its outgoing edges
– Send messages to other vertices
– Receive messages sent in the previous superstep
• Things that happen during a superstep:
– A ‘compute’ function is invoked on each vertex that received a message in the previous superstep
– The next superstep begins only after all vertices have completed their work
– If no messages are in flight, the program halts
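The superstep loop described above can be sketched as a small single-process simulation. This is plain Python, not the Giraph API: `run_supersteps`, the `compute` signature, and the `propagate_max` example are hypothetical names chosen for illustration, and the sketch assumes every vertex runs once in the first superstep.

```python
from collections import defaultdict


def run_supersteps(graph, values, compute, max_supersteps=50):
    """Simulate synchronous supersteps.

    graph:  {vertex: [out-neighbors]}
    values: {vertex: initial state}
    compute(vertex, value, neighbors, messages, send) -> new value
    """
    values = dict(values)
    inbox = {v: [] for v in graph}
    active = set(graph)  # assumption: all vertices run in the first superstep
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = defaultdict(list)

        def send(target, message):
            outbox[target].append(message)

        for v in sorted(active):
            values[v] = compute(v, values[v], graph[v], inbox[v], send)

        # Barrier: messages become visible only in the next superstep.
        inbox = {v: outbox.get(v, []) for v in graph}
        # Only vertices that received a message are invoked next time.
        active = {v for v in graph if inbox[v]}
        superstep += 1
    return values, superstep


def propagate_max(vertex, value, neighbors, messages, send):
    """Example compute: every vertex converges to the largest value seen."""
    new_value = max([value] + messages)
    if new_value > value or not messages:  # value changed, or first superstep
        for n in neighbors:
            send(n, new_value)
    return new_value


graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
final, steps = run_supersteps(graph, {"a": 3, "b": 6, "c": 2}, propagate_max)
```

The loop ends on its own once a superstep produces no messages, which mirrors the halting rule on this slide.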
Single Source Shortest Path Algorithm
Read updates from other vertices, find minimum
Send distance to other vertices
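The two steps above can be sketched as a vertex-centric loop. This is a plain-Python simulation rather than Giraph code: the function name `sssp` and the dictionary-based graph layout are assumptions, and the sketch assumes non-negative edge weights and that every vertex appears as a key.

```python
import math
from collections import defaultdict


def sssp(edges, source):
    """edges: {vertex: {neighbor: weight}}. Returns distances from source."""
    dist = {v: math.inf for v in edges}
    inbox = {source: [0]}  # seed the source with distance 0
    while inbox:  # halt when no messages are in flight
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            # Read updates from other vertices, find minimum
            best = min(messages)
            if best < dist[v]:
                dist[v] = best
                # Send distance to other vertices
                for neighbor, weight in edges[v].items():
                    outbox[neighbor].append(best + weight)
        inbox = outbox  # barrier: delivered in the next superstep
    return dist


example = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 6},
    "C": {"D": 3},
    "D": {},
    "E": {},  # unreachable from A
}
distances = sssp(example, "A")
```

A vertex stays silent when the incoming minimum does not improve its distance, so message traffic dies out once all distances are final.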
Single Source Shortest Path Example
More Complex Example: PageRank
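As a rough illustration of how PageRank fits the same model, here is a minimal plain-Python sketch, not the Giraph implementation. It assumes a fixed number of supersteps, a damping factor of 0.85, and that every vertex has at least one outgoing edge; the function name `pagerank` and its parameters are hypothetical.

```python
def pagerank(graph, supersteps=30, damping=0.85):
    """graph: {vertex: [out-neighbors]}, each vertex with >= 1 out-edge."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Each vertex sends rank / out-degree along every outgoing edge.
        incoming = {v: 0.0 for v in graph}
        for v, neighbors in graph.items():
            share = rank[v] / len(neighbors)
            for target in neighbors:
                incoming[target] += share
        # Each vertex sums its messages and updates its own rank.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in graph}
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Unlike the shortest-path case, every vertex sends a message in every superstep, so termination here comes from the superstep limit rather than from messages running out.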
Giraph Job Lifetime
Implementing an Algorithm in Giraph
• Define a Vertex class
– Subclass of existing implementations
• Define a VertexInputFormat to read the graph
• Define a VertexOutputFormat that defines how to extract the result based on the Vertex final state
• Many other features can be utilized to improve performance
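To make the roles of the input and output formats concrete, here is a small plain-Python sketch. It does not use Giraph's classes: the tab-separated line layout (`id<TAB>value<TAB>n1,n2,...`) and the names `read_vertex` and `write_vertex` are assumptions made up for this example.

```python
def read_vertex(line):
    """Input side: parse one hypothetical line 'id<TAB>value<TAB>n1,n2,...'."""
    vertex_id, value, neighbors = line.rstrip("\n").split("\t")
    return vertex_id, float(value), [n for n in neighbors.split(",") if n]


def write_vertex(vertex_id, value):
    """Output side: emit the final state of one vertex as 'id<TAB>value'."""
    return f"{vertex_id}\t{value}"


parsed = read_vertex("a\t1.5\tb,c\n")
line_out = write_vertex("a", 1.5)
```

The vertex logic itself never touches the file layout, which is why the reader, the compute code, and the writer can be defined separately.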
Aggregators
• Each vertex can store values that can be read by all vertices in the following superstep
• Can maintain values (sum, min, max, accumulate, user defined, etc.)
• Aggregators must be registered on the master
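The behaviour above can be sketched with a tiny registry. This is an illustration in plain Python, not Giraph's aggregator API: the classes `Aggregator` and `Master` and their method names are hypothetical.

```python
import operator


class Aggregator:
    def __init__(self, combine, initial):
        self.combine = combine
        self.initial = initial
        self.current = initial  # accumulated during this superstep
        self.visible = initial  # what vertices read (previous superstep)

    def aggregate(self, value):
        self.current = self.combine(self.current, value)

    def end_superstep(self):
        # Publish this superstep's result and reset for the next one.
        self.visible, self.current = self.current, self.initial


class Master:
    def __init__(self):
        self.aggregators = {}

    def register(self, name, combine, initial):
        self.aggregators[name] = Aggregator(combine, initial)


master = Master()
master.register("sum", operator.add, 0)
master.register("max", max, float("-inf"))

for contribution in (3, 5):
    master.aggregators["sum"].aggregate(contribution)
    master.aggregators["max"].aggregate(contribution)

seen_during_superstep = master.aggregators["sum"].visible
for aggregator in master.aggregators.values():
    aggregator.end_superstep()
```

Values contributed in one superstep stay hidden until the barrier, so every vertex reads the same aggregate in the next superstep.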
Combiners
• User-defined function to combine messages before they are sent or delivered
• Saves network traffic and memory
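A minimal sketch of the idea, in plain Python rather than Giraph's combiner interface: messages bound for the same target are folded into one before they leave the sender. The function name `combine_messages` is hypothetical, and `min` is used as the combiner because shortest-path vertices only need the smallest incoming distance.

```python
def combine_messages(outgoing, combiner):
    """outgoing: list of (target, message). Returns {target: combined message}."""
    combined = {}
    for target, message in outgoing:
        if target in combined:
            combined[target] = combiner(combined[target], message)
        else:
            combined[target] = message
    return combined


outgoing = [("b", 5), ("b", 3), ("c", 7), ("b", 9)]
to_send = combine_messages(outgoing, min)
```

Here four queued messages shrink to two, one per target vertex. This only works when the combining function is safe to apply in any order and grouping, as `min` and sum are.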
Checkpointing
• Can be expensive but necessary
• Ensures no single point of failure
• Stores work at user-defined intervals
• Restarts on failure
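The interval-and-restart pattern can be sketched in a few lines. This is not how Giraph stores checkpoints: the file naming, the use of `pickle`, and the helper names below are assumptions made for illustration.

```python
import os
import pickle
import tempfile


def should_checkpoint(superstep, interval):
    """Checkpoint every `interval` supersteps (a user-defined setting)."""
    return interval > 0 and superstep % interval == 0


def save_checkpoint(directory, superstep, state):
    path = os.path.join(directory, f"checkpoint_{superstep:06d}.pkl")
    with open(path, "wb") as f:
        pickle.dump({"superstep": superstep, "state": state}, f)
    return path


def load_latest_checkpoint(directory):
    """Return the most recent checkpoint, or None if there is none."""
    names = sorted(n for n in os.listdir(directory) if n.startswith("checkpoint_"))
    if not names:
        return None
    with open(os.path.join(directory, names[-1]), "rb") as f:
        return pickle.load(f)


def demo(total_supersteps=7, interval=3):
    with tempfile.TemporaryDirectory() as directory:
        state = {"a": 0}
        for superstep in range(1, total_supersteps + 1):
            state["a"] = superstep  # stand-in for real vertex work
            if should_checkpoint(superstep, interval):
                save_checkpoint(directory, superstep, state)
        # After a failure, work would resume from this point.
        return load_latest_checkpoint(directory)
```

With an interval of 3 and a failure after superstep 7, only superstep 7 has to be redone. A shorter interval loses less work but writes state more often, which is the cost trade-off the slide points to.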
ZooKeeper Responsibilities: Computation State
• Handles partition/worker mapping
• Global state
• Checkpoint paths, aggregator values, statistics
Master Responsibilities: Coordination
• Assigns partitions to workers
– Hash mapping is the default
– Can be user defined
• Monitors workers
• Coordinates supersteps (ending, starting, etc.)
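Hash-based assignment can be sketched as follows. This is a generic illustration, not Giraph's actual partitioner: the CRC32 hash, the round-robin worker policy, and both function names are assumptions.

```python
import zlib


def hash_partition(vertex_id, num_partitions):
    """Map a vertex id to a partition with a hash that is stable across runs."""
    # zlib.crc32 is deterministic, unlike Python's built-in hash() on strings.
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, workers):
    """Hypothetical policy: hand partitions to workers round-robin."""
    return {p: workers[p % len(workers)] for p in range(num_partitions)}


partition_of = {v: hash_partition(v, 4) for v in ("a", "b", "c", "d", "e")}
owner = assign_partitions(4, ["worker-0", "worker-1"])
```

Because the mapping depends only on the vertex id, any worker can work out where to send a message without asking the master. A user-defined scheme would replace `hash_partition`, for example to keep densely connected vertices on the same worker.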
Worker Responsibilities: Vertices
• Workers are assigned vertices
• Perform compute
• Pass messages between vertices
• Compute local aggregation values