GraphLab: A New Framework For Parallel Machine Learning
Amir H. Payberah ([email protected])
Amirkabir University of Technology (Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) GraphLab 1393/9/8 1 / 42
Reminder
Data-Parallel Model for Large-Scale Graph Processing
- The platforms that have worked well for developing parallel applications are not necessarily effective for large-scale graph problems.
Graph-Parallel Processing
- Restricts the types of computation.
- New techniques to partition and distribute graphs.
- Exploits graph structure.
- Executes graph algorithms orders of magnitude faster than more general data-parallel systems.
Data-Parallel vs. Graph-Parallel Computation
Pregel
- Vertex-centric
- Bulk Synchronous Parallel (BSP)
- Runs as a sequence of iterations (supersteps)
- A vertex in superstep S can:
  • read messages sent to it in superstep S-1.
  • send messages to other vertices, to be received in superstep S+1.
  • modify its state.
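The superstep loop above can be sketched as a minimal sequential simulation of a BSP engine. All names here (`bsp_run`, `vertex_program`) are illustrative, not Pregel's actual API:

```python
from collections import defaultdict

def bsp_run(graph, vertex_program, max_supersteps=30):
    """Minimal sequential sketch of a BSP (Pregel-style) engine.

    graph: dict vertex -> list of out-neighbors.
    vertex_program(v, state, msgs) -> (new_state, {dst: msg}, halt):
      it sees messages sent to it in superstep S-1, and its outgoing
      messages are delivered in superstep S+1.
    """
    state = {v: None for v in graph}
    inbox = defaultdict(list)       # messages available at superstep S
    active = set(graph)             # all vertices start active
    for _ in range(max_supersteps):
        outbox = defaultdict(list)  # messages delivered at superstep S+1
        for v in sorted(active):
            new_state, outgoing, halt = vertex_program(v, state[v], inbox[v])
            state[v] = new_state
            for dst, msg in outgoing.items():
                outbox[dst].append(msg)
            if halt:
                active.discard(v)
        active |= set(outbox)       # an incoming message reactivates a vertex
        inbox = outbox
        if not active:              # barrier; terminate when all halt
            break
    return state
```

A vertex program that propagates the maximum vertex id around a cycle, for example, converges in a few supersteps and then every vertex votes to halt.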
Pregel Limitations
- Inefficient if different regions of the graph converge at different speeds.
- Can suffer if one task is more expensive than the others.
- The runtime of each phase is determined by the slowest machine.
Data Model
- A directed graph that stores the program state, called the data graph.
Vertex Scope
- The scope of vertex v is the data stored in v, in all adjacent vertices, and on all adjacent edges.
Programming Model (1/3)
- Rather than adopting message passing as in Pregel, GraphLab allows the user-defined function of a vertex to read and modify any of the data in its scope.
Programming Model (2/3)
- Update function: a user-defined function similar to Compute in Pregel.
- Can read and modify the data within the scope of a vertex.
- Schedules the future execution of other update functions.
Programming Model (3/3)
- Sync function: similar to aggregate in Pregel.
- Maintains global aggregates.
- Runs periodically in the background.
Execution Model
- Each task in the set of tasks T is a tuple (f, v) consisting of an update function f and a vertex v.
- After executing an update function, the modified data in the scope Sv is written back to the data graph.
Example: PageRank
GraphLab_PageRank(i)
  // compute sum over neighbors
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * w_ji
  // update the PageRank
  R[i] = 0.15 + total
  // trigger neighbors to run again
  foreach (j in out_neighbors(i)):
    signal vertex-program on j
R[i] = 0.15 + Σ_{j ∈ Nbrs(i)} w_ji · R[j]
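This vertex program can be simulated sequentially with a FIFO signal queue. A minimal sketch, assuming illustrative names (`in_nbrs`, `out_nbrs`, `weights`) rather than GraphLab's actual API:

```python
from collections import deque

def graphlab_pagerank(in_nbrs, out_nbrs, weights, tol=1e-4):
    """Sequential simulation of the GraphLab_PageRank update function.

    in_nbrs/out_nbrs: dict vertex -> list of neighbors.
    weights[(j, i)] holds w_ji, the weight of edge j -> i.
    A vertex signals its out-neighbors only when its rank changes by
    more than tol, so stable regions of the graph stop computing early.
    """
    R = {v: 1.0 for v in in_nbrs}
    queue = deque(sorted(in_nbrs))          # initial tasks: every vertex
    queued = set(queue)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        total = sum(R[j] * weights[(j, i)] for j in in_nbrs[i])
        new_rank = 0.15 + total             # R[i] = 0.15 + sum_j w_ji R[j]
        changed = abs(new_rank - R[i]) > tol
        R[i] = new_rank
        if changed:                         # signal vertex-program on j
            for j in out_nbrs[i]:
                if j not in queued:
                    queue.append(j)
                    queued.add(j)
    return R
```

The dynamic scheduling is the point of contrast with Pregel's supersteps: already-converged vertices are simply never re-signaled, instead of participating in every global iteration.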
Data Consistency (1/3)
- Overlapped scopes: a race condition in the simultaneous execution of two update functions.
- Full consistency: during the execution of f(v), no other function reads or modifies data within the scope of v.
Data Consistency (2/3)
- Edge consistency: during the execution of f(v), no other function reads or modifies any of the data on v or on any of the edges adjacent to v.
Data Consistency (3/3)
- Vertex consistency: during the execution of f(v), no other function will be applied to v.
Sequential Consistency (1/2)
- Proving the correctness of a parallel algorithm: sequential consistency.
- Sequential consistency: for every parallel execution, there exists a sequential execution of update functions that produces an equivalent result.
Sequential Consistency (2/2)
- A simple method to achieve serializability is to ensure that the scopes of concurrently executing update functions do not overlap:
  • The full consistency model is used.
  • The edge consistency model is used and update functions do not modify data in adjacent vertices.
  • The vertex consistency model is used and update functions only access local vertex data.
Consistency vs. Parallelism
[Low, Y., GraphLab: A Distributed Abstraction for Large Scale Machine Learning (Doctoral dissertation, University of California), 2013.]
GraphLab Implementation
- Shared memory implementation
- Distributed implementation
Task Schedulers (1/2)
- In what order should the tasks (vertex, update-function pairs) be called?
  • A collection of base schedulers, e.g., round-robin and synchronous.
  • Set scheduler: enables users to compose custom update schedules.
Task Schedulers (2/2)
- How are new tasks added to the queue?
  • FIFO: permits task creation but does not permit task reordering.
  • Prioritized: permits task reordering at the cost of increased overhead.
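The two queue policies can be sketched as follows. These classes are illustrative, not GraphLab's actual scheduler implementations:

```python
import heapq
from collections import deque

class FIFOScheduler:
    """Tasks run in creation order; duplicate signals for a vertex are
    collapsed, but tasks are never reordered."""
    def __init__(self):
        self.q, self.queued = deque(), set()
    def add_task(self, vertex, priority=0.0):
        if vertex not in self.queued:   # ignore a duplicate signal
            self.q.append(vertex)
            self.queued.add(vertex)
    def pop(self):
        v = self.q.popleft()
        self.queued.discard(v)
        return v
    def __bool__(self):
        return bool(self.q)

class PrioritizedScheduler:
    """Highest-priority task runs first; the reordering costs an
    O(log n) heap operation per signal."""
    def __init__(self):
        self.heap = []
    def add_task(self, vertex, priority=0.0):
        heapq.heappush(self.heap, (-priority, vertex))
    def pop(self):
        return heapq.heappop(self.heap)[1]
    def __bool__(self):
        return bool(self.heap)
```

The prioritized variant pays its extra overhead only when priorities actually matter, e.g., when updating high-residual vertices first speeds convergence.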
Consistency
- Implemented in C++, using PThreads for parallelism.
- Consistency is enforced with read-write locks.
- Vertex consistency:
  • Central vertex (write lock)
- Edge consistency:
  • Central vertex (write lock)
  • Adjacent vertices (read locks)
- Full consistency:
  • Central vertex (write lock)
  • Adjacent vertices (write locks)
- Deadlocks are avoided by acquiring locks sequentially, following a canonical order.
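A sketch of the per-model lock sets and the canonical acquisition order. Plain `threading.Lock`s stand in for the read-write locks here, so the read/write distinction within the scope is dropped; all names are illustrative:

```python
import threading

LOCKS = {}  # vertex -> lock, created on demand

def scope_lock_order(v, neighbors, model):
    """Vertices that must be locked to run f(v) under each model."""
    if model == "vertex":
        need = {v}                      # central vertex only
    elif model in ("edge", "full"):
        need = {v} | set(neighbors[v])  # central vertex + adjacent vertices
    else:
        raise ValueError(model)
    return sorted(need)                 # canonical order avoids deadlock

def run_update(v, neighbors, model, update):
    order = scope_lock_order(v, neighbors, model)
    for u in order:                     # always acquire in ascending id
        LOCKS.setdefault(u, threading.Lock()).acquire()
    try:
        update(v)                       # f(v) runs with its scope locked
    finally:
        for u in reversed(order):
            LOCKS[u].release()
```

Because every thread acquires locks in the same global (ascending-id) order, no cycle of waiting threads can form, which is the standard argument for deadlock freedom.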
GraphLab Implementation
- Shared memory implementation
- Distributed implementation
Distributed Implementation
- Graph partitioning: how to efficiently load, partition, and distribute the data graph across machines?
- Consistency: how to achieve consistency in the distributed setting?
- Fault tolerance
Graph Partitioning
Graph Partitioning - Phase 1 (1/2)
- Two-phase partitioning.
- Partition the data graph into k parts, called atoms.
  • k ≫ number of machines
- Meta-graph: the graph of atoms (one vertex per atom).
- Atom weight: the amount of data the atom stores.
- Edge weight: the number of edges crossing the atoms.
Graph Partitioning - Phase 1 (2/2)
- Each atom is stored as a separate file on a distributed storage system, e.g., HDFS.
- Each atom file is a simple binary that stores the interior and the ghosts of the partition.
- Ghost: the set of vertices and edges adjacent to the partition boundary.
Graph Partitioning - Phase 2
- The meta-graph is very small.
- Compute a fast balanced partition of the meta-graph over the physical machines.
- Assign graph atoms to machines accordingly.
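The atom-to-machine step can be sketched as a greedy balanced placement. This is a simplification: it balances only atom (data) weights and ignores the meta-graph edge weights that a real partitioner would also try to minimize; the function name is hypothetical:

```python
import heapq

def assign_atoms(atom_weights, num_machines):
    """Greedy balanced placement of atoms onto machines.

    atom_weights: dict atom -> amount of data it stores.
    Returns dict atom -> machine index.
    """
    heap = [(0, m) for m in range(num_machines)]  # (current load, machine)
    heapq.heapify(heap)
    placement = {}
    # place the heaviest atoms first, each on the least-loaded machine
    for atom, w in sorted(atom_weights.items(), key=lambda kv: -kv[1]):
        load, m = heapq.heappop(heap)
        placement[atom] = m
        heapq.heappush(heap, (load + w, m))
    return placement
```

Because k ≫ number of machines, even this simple greedy pass tends to even out the load: many small atoms fill the gaps left by the large ones.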
Consistency
Consistency
- To achieve a serializable parallel execution of a set of dependent tasks:
  • Chromatic engine
  • Distributed locking engine
Consistency - Chromatic Engine
- Construct a vertex coloring: assign a color to each vertex such that no adjacent vertices share the same color.
- Edge consistency: synchronously execute all update tasks associated with vertices of the same color before proceeding to the next color.
- Full consistency: no vertex may share a color with any of its distance-two neighbors.
- Vertex consistency: assign all vertices the same color.
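A greedy coloring plus color-by-color rounds can be sketched as follows. A valid (not necessarily minimal) coloring suffices; fewer colors just means fewer synchronization barriers per sweep:

```python
def greedy_coloring(adj):
    """Greedy vertex coloring: no two adjacent vertices share a color.

    adj: dict vertex -> list of neighbors (symmetric adjacency).
    """
    color = {}
    for v in sorted(adj):                        # deterministic order
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:                        # smallest free color
            c += 1
        color[v] = c
    return color

def color_rounds(color):
    """Group vertices by color: under edge consistency, each group can
    be updated in parallel before a barrier moves to the next color."""
    rounds = {}
    for v, c in color.items():
        rounds.setdefault(c, []).append(v)
    return [sorted(rounds[c]) for c in sorted(rounds)]
```

On a path 1-2-3, for example, the endpoints get one color and the middle vertex another, so the engine alternates between two parallel rounds.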
Consistency - Distributed Locking Engine
I Associate a readers-writer lock with each vertex.
I Vertex consistency
• Central vertex (write-lock)
I Edge consistency
• Central vertex (write-lock), adjacent vertices (read-locks)
I Full consistency
• Central vertex (write-lock), adjacent vertices (write-locks)
I Deadlocks are avoided by acquiring locks sequentially following a canonical order.
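The canonical-order rule can be sketched as follows. Plain mutexes stand in for readers-writer locks here, and all names (`locked_update`, `pagerank_like`) are hypothetical; GraphLab's real engine is distributed, this runs on one machine.

```python
# Sketch of deadlock-free locking for an edge-consistency scope:
# lock the central vertex and its neighbors, always in ascending
# vertex-id order, so no two concurrent updates can deadlock.
import threading

def locked_update(v, neighbors, locks, update):
    scope = sorted(set(neighbors[v]) | {v})   # canonical order
    for u in scope:
        locks[u].acquire()
    try:
        update(v)                             # safe to read/write v's scope
    finally:
        for u in reversed(scope):
            locks[u].release()

graph = {0: [1, 2], 1: [0], 2: [0]}
locks = {v: threading.Lock() for v in graph}
state = {v: 1.0 for v in graph}

def pagerank_like(v):
    state[v] = 0.15 + 0.85 * sum(state[u] for u in graph[v])

threads = [threading.Thread(target=locked_update,
                            args=(v, graph, locks, pagerank_like))
           for v in graph]
for t in threads: t.start()
for t in threads: t.join()
```

Because every update sorts its scope before acquiring, any two overlapping scopes contend on their shared vertices in the same order, which rules out circular wait.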
Fault Tolerance
Fault Tolerance - Synchronous
I The system periodically signals all computation activity to halt.
I It then synchronizes all caches (ghosts) and saves to disk all data modified since the last snapshot.
I Simple, but eliminates the system's advantage of asynchronous computation.
Fault Tolerance - Asynchronous
I Based on the Chandy-Lamport algorithm.
I The snapshot function is implemented as an update function on vertices.
I The snapshot update takes priority over all other update functions.
I Edge consistency is used on all update functions.
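The idea of the snapshot as a prioritized update function can be illustrated with a toy sequential scheduler. This is a heavily simplified sketch with hypothetical names (`snapshot_update`, `pending`); the real Chandy-Lamport protocol also records in-flight channel messages, which edge consistency lets GraphLab sidestep.

```python
# Simplified sketch of the Chandy-Lamport snapshot expressed as a
# vertex update function: record local state on first visit, then
# propagate the "marker" by scheduling the update on all neighbors.
def snapshot_update(v, vertex_state, snapshot, schedule):
    if v in snapshot:              # marker already seen: nothing to do
        return
    snapshot[v] = vertex_state[v]  # record this vertex's state
    for u in graph[v]:             # forward the marker to neighbors
        schedule(u)

graph = {0: [1], 1: [0, 2], 2: [1]}
vertex_state = {0: 'a', 1: 'b', 2: 'c'}
snapshot, pending = {}, [0]        # initiate the snapshot at vertex 0
while pending:                     # a toy sequential scheduler
    v = pending.pop()
    snapshot_update(v, vertex_state, snapshot, pending.append)
```

Since the snapshot update outranks all other updates, the recorded states form a consistent cut even while regular computation continues on vertices the marker has not yet reached.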
Summary
GraphLab Summary
I Asynchronous model
I Vertex-centric
I Communication: distributed shared memory
I Three consistency levels: full, edge-level, and vertex-level
I Partitioning: two-phase partitioning
I Consistency: chromatic engine (graph coloring), distributed locking engine (readers-writer locks)
I Fault tolerance: synchronous, asynchronous (Chandy-Lamport)
GraphLab Limitations
I Poor performance on natural graphs, whose skewed degree distributions make balanced partitioning difficult.
Questions?