Distributed Systems CS 15-440

Programming Models – Part IV

Lecture 16, Nov 4, 2013

Mohammad Hammoud
Today…

Last Session: Programming Models – Part III: MapReduce

Today's Session: Programming Models – Part IV: Pregel & GraphLab

Announcements:
- Project 3 is due on Saturday, Nov 9, 2013 by midnight
- PS3 is due on Wednesday, Nov 13, 2013 by midnight
- Quiz 2 is on Nov 20, 2013
- The Final Exam is on Dec 8, 2013 at 9:00 AM in room 2051
- The last day of classes will be Wednesday, Dec 4, 2013 (we will hold an overview session)
Objectives

Discussion on Programming Models:
- Why parallelize our programs?
- Parallel computer architectures
- Traditional models of parallel programming
- Types of parallel programs
- Message Passing Interface (MPI)
- MapReduce, Pregel and GraphLab (last 3 sessions; cont'd today)
The Pregel Analytics Engine

Pregel:
- Motivation & Definition
- The Computation & Programming Models
- Input and Output
- Architecture & Execution Flow
- Fault-Tolerance
Motivation for Pregel

How to implement algorithms that process Big Graphs?
- Create a custom distributed infrastructure for each new algorithm: difficult!
- Rely on existing distributed analytics engines like MapReduce: inefficient and cumbersome!
- Use a single-computer graph algorithm library like BGL, LEDA, NetworkX, etc.: usually Big Graphs are too large to fit on a single machine!
- Use a parallel graph processing system like Parallel BGL or CGMGraph: not suited for large-scale distributed systems!
What is Pregel?

Pregel is a large-scale graph-parallel distributed analytics engine.

Some characteristics:
- In-memory (opposite to MapReduce)
- High scalability
- Automatic fault-tolerance
- Flexibility in expressing graph algorithms
- Message-passing programming model
- Tree-style, master-slave architecture
- Synchronous

Pregel is inspired by Valiant's Bulk Synchronous Parallel (BSP) model.
The BSP Model

[Figure: The BSP model. In each super-step (Super-Step 1, Super-Step 2, Super-Step 3), CPUs process their data in parallel; a barrier separates consecutive super-steps, and the computation iterates.]
Entities and Super-Steps

The computation is described in terms of vertices, edges, and a sequence of super-steps.

You give Pregel a directed graph consisting of vertices and edges:
- Each vertex is associated with a modifiable user-defined value
- Each edge is associated with a source vertex, a value, and a destination vertex

During a super-step S:
- A user-defined function F is executed at each vertex V
- F can read messages sent to V in super-step S − 1 and send messages to other vertices that will be received at super-step S + 1
- F can modify the state of V and its outgoing edges
- F can alter the topology of the graph
Topology Mutations

The graph structure can be modified during any super-step:
- Vertices and edges can be added or deleted

Mutating graphs can create conflicting requests, where multiple vertices at a super-step might try to alter the same edge/vertex.

Conflicts are avoided using partial orderings and handlers:
- Partial orderings:
  - Edges are removed before vertices
  - Vertices are added before edges
  - Mutations performed at super-step S are only effective at super-step S + 1
  - All mutations precede calls to actual computations
- Handlers:
  - Among multiple conflicting requests, one request is selected arbitrarily
Algorithm Termination

Algorithm termination is based on every vertex voting to halt:
- In super-step 0, every vertex is active
- All active vertices participate in the computation of any given super-step
- A vertex deactivates itself by voting to halt and enters an inactive state
- A vertex returns to the active state if it receives an external message

A Pregel program terminates when all vertices are simultaneously inactive and there are no messages in transit.

[Figure: Vertex state machine. "Vote to Halt" moves a vertex from Active to Inactive; "Message Received" moves it back to Active.]
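The super-step and vote-to-halt mechanics can be sketched as a toy single-process simulator (a Python sketch for illustration only; the `Vertex` and `run` names and their shapes are made up here, not Pregel's API):

```python
class Vertex:
    def __init__(self, vid, value, out_edges):
        self.id = vid
        self.value = value          # modifiable user-defined value
        self.out_edges = out_edges  # ids of destination vertices
        self.active = True          # every vertex starts active (super-step 0)
        self.outbox = []            # (dest_id, message) pairs

    def send(self, dest, msg):
        self.outbox.append((dest, msg))

    def vote_to_halt(self):
        self.active = False

def run(vertices, compute, max_steps=100):
    inbox = {vid: [] for vid in vertices}
    for step in range(max_steps):
        # An inactive vertex returns to the active state on receiving a message.
        for v in vertices.values():
            if inbox[v.id]:
                v.active = True
        # Terminate when all vertices are inactive and no messages are in transit.
        if not any(v.active for v in vertices.values()):
            return step
        for v in vertices.values():
            if v.active:
                compute(v, inbox[v.id], step)
                inbox[v.id] = []
        # Messages sent in super-step S are received in super-step S + 1.
        new_inbox = {vid: [] for vid in vertices}
        for v in vertices.values():
            for dest, msg in v.outbox:
                new_inbox[dest].append(msg)
            v.outbox = []
        inbox = new_inbox
    return max_steps

# Max-value propagation on a 3-vertex ring (made-up example graph):
def max_compute(v, msgs, step):
    best = max([v.value] + msgs)
    if step == 0 or best > v.value:
        v.value = best
        for d in v.out_edges:
            v.send(d, best)
    else:
        v.vote_to_halt()

vs = {1: Vertex(1, 3, [2]), 2: Vertex(2, 6, [3]), 3: Vertex(3, 2, [1])}
run(vs, max_compute)  # all vertices converge to the value 6, then halt
```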
Finding the Max Value in a Graph

[Figure: Finding the max value in a graph with initial vertex values 3, 6, 2, and 1. Blue arrows are messages; blue vertices have voted to halt. Over super-steps S to S + 3, the value 6 propagates until every vertex holds 6 and all vertices have halted.]
The Programming Model

Pregel adopts the message-passing programming model:
- Messages can be passed from any vertex to any other vertex in the graph
- Any number of messages can be passed
- The message order is not guaranteed
- Messages will not be duplicated

Combiners can be used to reduce the number of messages passed between super-steps.

Aggregators are available for reduction operations (e.g., sum, min, and max).
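The combiner idea can be sketched in a few lines (a hypothetical Python stand-in, not Pregel's actual C++ `Combiner` API; the function names are illustrative): messages queued for the same destination are merged on the sending side, so only one message per destination crosses the network between super-steps.

```python
def combine_sum(msgs):
    """Combiner usable when the algorithm only needs the sum of incoming messages."""
    return sum(msgs)

def flush_outbox(outbox, combiner):
    """Merge queued (dest, msg) pairs into one combined message per destination."""
    by_dest = {}
    for dest, msg in outbox:
        by_dest.setdefault(dest, []).append(msg)
    return {dest: combiner(msgs) for dest, msgs in by_dest.items()}

# Three messages headed to v2 collapse into one before being sent.
outbox = [("v2", 1.0), ("v2", 2.5), ("v3", 4.0), ("v2", 0.5)]
combined = flush_outbox(outbox, combine_sum)
# combined == {"v2": 4.0, "v3": 4.0}
```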
The Pregel API in C++

A Pregel program is written by sub-classing the Vertex class:

```cpp
// The template arguments define the types for vertices, edges, and messages.
template <typename VertexValue,
          typename EdgeValue,
          typename MessageValue>
class Vertex {
 public:
  // Override Compute() to define the computation at each super-step.
  virtual void Compute(MessageIterator* msgs) = 0;

  const string& vertex_id() const;
  int64 superstep() const;

  // Get and modify the value of the current vertex.
  const VertexValue& GetValue();
  VertexValue* MutableValue();
  OutEdgeIterator GetOutEdgeIterator();

  // Pass messages to other vertices.
  void SendMessageTo(const string& dest_vertex,
                     const MessageValue& message);
  void VoteToHalt();
};
```
Pregel Code for Finding the Max Value

```cpp
class MaxFindVertex : public Vertex<double, void, double> {
 public:
  virtual void Compute(MessageIterator* msgs) {
    double currMax = GetValue();
    SendMessageToAllNeighbors(currMax);
    for ( ; !msgs->Done(); msgs->Next()) {
      if (msgs->Value() > currMax)
        currMax = msgs->Value();
    }
    if (currMax > GetValue())
      *MutableValue() = currMax;
    else
      VoteToHalt();
  }
};
```
Input, Graph Flow and Output

- The input graph in Pregel is stored in a distributed storage layer (e.g., GFS or Bigtable)
- The input graph is divided into partitions consisting of vertices and outgoing edges
  - The default partitioning function is hash(ID) mod N, where N is the number of partitions
- Partitions are stored in node memories for the duration of the computation (hence, an in-memory model, not a disk-based one)
- Outputs in Pregel are typically graphs isomorphic (or mutated) with respect to input graphs
  - Yet outputs can also be aggregated statistics mined from input graphs (depending on the graph algorithm)
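The default hash(ID) mod N partitioning can be sketched as follows (the vertex IDs and N = 4 are made-up illustrative values; `crc32` stands in for a stable hash, since Python's built-in `hash()` of a string is salted per process):

```python
import zlib

N = 4  # number of partitions

def partition_of(vertex_id, n_partitions=N):
    # hash(ID) mod N: any worker can recompute the mapping locally.
    return zlib.crc32(vertex_id.encode()) % n_partitions

# Each vertex, together with its outgoing edges, lands in exactly one partition.
parts = {v: partition_of(v) for v in ["v0", "v1", "v2", "v3", "v4"]}
assert all(0 <= p < N for p in parts.values())
```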
The Architectural Model

Pregel assumes a tree-style network topology and a master-slave architecture.

[Figure: A core switch connects two rack switches; Worker1–Worker5 and the Master hang off the rack switches. The master pushes work (i.e., partitions) to all workers, and workers send completion signals back.]

When the master receives the completion signal from every worker in super-step S, it starts super-step S + 1.
The Execution Flow

Steps of program execution in Pregel:
1. Copies of the program code are distributed across all machines
   - One copy is designated as the master; every other copy is deemed a worker/slave
2. The master partitions the graph and assigns each worker one or more partitions, along with portions of the input "graph data"
3. Every worker executes the user-defined function on each vertex
4. Workers can communicate among each other
5. The master coordinates the execution of super-steps
6. The master calculates the number of inactive vertices after each super-step and signals workers to terminate if all vertices are inactive (and no messages are in transit)
7. Each worker may be instructed to save its portion of the graph
Fault Tolerance in Pregel

- Fault-tolerance is achieved through checkpointing
  - At the start of a super-step, the master may instruct the workers to save the states of their partitions to stable storage
- The master uses "ping" messages to detect worker failures
- If a worker fails, the master re-assigns the corresponding vertices and input graph data to another available worker and restarts the super-step
  - The available worker re-loads the partition state of the failed worker from the most recent available checkpoint
How Does Pregel Compare to MapReduce?
Pregel versus MapReduce

| Aspect | Hadoop MapReduce | Pregel |
|---|---|---|
| Programming Model | Shared-Memory (abstraction) | Message-Passing |
| Computation Model | Synchronous | Synchronous |
| Parallelism Model | Data-Parallel | Graph-Parallel |
| Architectural Model | Master-Slave | Master-Slave |
| Task/Vertex Scheduling Model | Pull-Based | Push-Based |
| Application Suitability | Loosely-Connected/Embarrassingly Parallel Applications | Strongly-Connected Applications |
The GraphLab Analytics Engine

GraphLab:
- Motivation & Definition
- The Programming Model
- Input, Output & Components
- The Architectural Model
- The Computation Model
- Fault-Tolerance
Motivation for GraphLab

There is exponential growth in the scale of Machine Learning and Data Mining (MLDM) algorithms.

Designing, implementing, and testing MLDM algorithms at large scale is challenging due to:
- Synchronization
- Deadlocks
- Scheduling
- Distributed state management
- Fault-tolerance

Interest in analytics engines that can execute MLDM algorithms automatically and efficiently is increasing:
- MapReduce is inefficient with iterative jobs (common in MLDM algorithms)
- Pregel cannot run asynchronous problems (common in MLDM algorithms)
What is GraphLab?

GraphLab is a large-scale graph-parallel distributed analytics engine.

Some characteristics:
- In-memory (opposite to MapReduce and similar to Pregel)
- High scalability
- Automatic fault-tolerance
- Flexibility in expressing arbitrary graph algorithms (more flexible than Pregel)
- Shared-memory abstraction (opposite to Pregel but similar to MapReduce)
- Peer-to-peer architecture (dissimilar to Pregel and MapReduce)
- Asynchronous (dissimilar to Pregel and MapReduce)
Input, Graph Flow and Output

GraphLab assumes problems modeled as graphs. It adopts two phases: the initialization phase and the execution phase.

[Figure: In the initialization phase, a (MapReduce) graph builder parses and partitions raw graph data from a distributed file system into an atom collection and constructs an atom index; atom files are written back to the distributed file system. In the GraphLab execution phase, the cluster loads the atom index and atom files from the distributed file system; the index drives monitoring and atom placement, and GL engine instances communicate over TCP RPC.]
Components of the GraphLab Engine: The Data-Graph

The GraphLab engine incorporates three main parts:
1. The data-graph, which represents the user program state at a cluster machine

[Figure: A data-graph consisting of vertices connected by edges.]
Components of the GraphLab Engine: The Update Function

The GraphLab engine incorporates three main parts:
2. The update function, which involves two main functions:
   2.1. Altering data within the scope of a vertex
   2.2. Scheduling future update functions at neighboring vertices

The scope of a vertex v (i.e., Sv) is the data stored in v and in all of v's adjacent edges and vertices.
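A rough sketch of such an update function, in Python for brevity (the `Scope` class, the dict-based vertex data, the averaging computation, and the plain-list scheduler are all illustrative stand-ins, not GraphLab's actual API):

```python
class Scope:
    """Data visible to an update function at vertex v: v's own data,
    the data on v's adjacent edges, and v's neighboring vertices."""
    def __init__(self, vertex_data, edge_data, neighbor_data):
        self.vertex = vertex_data       # mutable data stored in v
        self.edges = edge_data          # data on v's adjacent edges
        self.neighbors = neighbor_data  # data of adjacent vertices

def update(scope, scheduler, tolerance=1e-3):
    # 2.1: alter data within the scope of the vertex.
    old = scope.vertex["value"]
    if scope.neighbors:
        scope.vertex["value"] = sum(scope.neighbors.values()) / len(scope.neighbors)
    # 2.2: schedule future update functions at neighboring vertices,
    # but only if this vertex's value actually changed.
    if abs(scope.vertex["value"] - old) > tolerance:
        scheduler.extend(scope.neighbors)

# Averaging the (made-up) neighbor values changes v, so v's neighbors
# are rescheduled; the engine repeats until the scheduler is empty.
scope = Scope({"value": 0.0}, {}, {"a": 2.0, "b": 4.0})
scheduler = []
update(scope, scheduler)
```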
[Figure: The update function runs at a vertex v, alters data within Sv, and places v's neighbors on the scheduler ("Schedule v").]
Components of the GraphLab Engine: The Update Function
The GraphLab engine incorporates three main parts:2. The update function, which involves two main functions:
2.1- Altering data within a scope of a vertex
2.2- Scheduling future update functions at neighboring vertices
CPU 1CPU 1
CPU 2CPU 2
ee ff gg
kkjjiihh
ddccbbaa bb
iihh
aa
ii
bb ee ff
jj
cc
Sch
edule
rSch
edule
r
The process repeats until the scheduler is emptyThe process repeats until the scheduler is empty
Components of the GraphLab Engine: The Sync Operation

The GraphLab engine incorporates three main parts:
3. The sync operation, which maintains global statistics describing the data stored in the data-graph
   - Global values maintained by the sync operation can be written by all update functions across the cluster machines
   - The sync operation is similar to Pregel's aggregators
   - A mutual-exclusion mechanism is applied by the sync operation to avoid write-write conflicts
   - For scalability reasons, the sync operation is not enabled by default
The Architectural Model

GraphLab adopts a peer-to-peer architecture:
- All engine instances are symmetric
- Engine instances communicate using the Remote Procedure Call (RPC) protocol over TCP/IP
- The first triggered engine has the additional responsibility of being a monitoring/master engine

Advantages:
- Highly scalable
- Precludes centralized bottlenecks and single points of failure

Main disadvantage:
- Complexity
The Programming Model

GraphLab offers a shared-memory programming model:
- It allows scopes to overlap and vertices to read/write from/to their scopes
Consistency Models in GraphLab

GraphLab guarantees sequential consistency:
- It provides the same result as a sequential execution of the computational steps

User-defined consistency models:
- Full consistency
- Vertex consistency
- Edge consistency
[Figure: The three consistency models shown on a chain of vertices 1–5 with vertex data D1–D5 and edge data D1↔2 through D4↔5. Full consistency gives an update function read/write access to its entire scope; edge consistency gives write access to the vertex and its adjacent edges but only read access to neighboring vertices; vertex consistency gives write access to the vertex's own data only.]
The Computation Model

GraphLab employs an asynchronous computation model. It suggests two asynchronous engines:
- The chromatic engine
- The locking engine

The chromatic engine executes vertices partially asynchronously:
- It applies vertex coloring (i.e., no adjacent vertices share the same color)
- All vertices with the same color are executed before proceeding to a different color

The locking engine executes vertices fully asynchronously:
- Data on vertices and edges are susceptible to corruption
- It applies a permission-based distributed mutual-exclusion mechanism to avoid read-write and write-write hazards
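The vertex coloring the chromatic engine relies on can be sketched with a simple greedy coloring (illustrative only; GraphLab's engine computes the coloring in a distributed setting): each vertex takes the smallest color unused by its neighbors, so same-colored vertices are never adjacent and can safely be updated in parallel.

```python
def greedy_color(adj):
    """adj: {vertex: set of neighbors}. Returns {vertex: color}."""
    colors = {}
    for v in adj:
        taken = {colors[n] for n in adj[v] if n in colors}
        c = 0
        while c in taken:  # smallest color not used by any colored neighbor
            c += 1
        colors[v] = c
    return colors

# A 4-cycle (made-up example graph): adjacent vertices always differ in color.
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
colors = greedy_color(adj)
assert all(colors[v] != colors[n] for v in adj for n in adj[v])
```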
Fault-Tolerance in GraphLab

GraphLab uses distributed checkpointing to recover from machine failures. It suggests two checkpointing mechanisms:
- Synchronous checkpointing (it suspends the entire execution of GraphLab)
- Asynchronous checkpointing
How Does GraphLab Compare to MapReduce and Pregel?
GraphLab vs. Pregel vs. MapReduce

| Aspect | Hadoop MapReduce | Pregel | GraphLab |
|---|---|---|---|
| Programming Model | Shared-Memory | Message-Passing | Shared-Memory |
| Computation Model | Synchronous | Synchronous | Asynchronous |
| Parallelism Model | Data-Parallel | Graph-Parallel | Graph-Parallel |
| Architectural Model | Master-Slave | Master-Slave | Peer-to-Peer |
| Task/Vertex Scheduling Model | Pull-Based | Push-Based | Push-Based |
| Application Suitability | Loosely-Connected/Embarrassingly Parallel Applications | Strongly-Connected Applications | Strongly-Connected Applications (more precisely, MLDM apps) |
Next Class
Fault-Tolerance
Back-up Slides
PageRank

PageRank is a link analysis algorithm:
- The rank value indicates the importance of a particular web page
- A hyperlink to a page counts as a vote of support
- A page that is linked to by many pages with high PageRank receives a high rank itself
- A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to the document with the 0.5 PageRank
PageRank (Cont'd)

Iterate:

R[i] = α + (1 − α) · Σ_{j ∈ N(i)} W_ji · R[j]

Where:
- α is the random reset probability
- L[j] is the number of links on page j (so W_ji = 1/L[j])

[Figure: An example graph of six linked pages, numbered 1–6.]
PageRank Example in GraphLab

The PageRank algorithm is defined as a per-vertex operation working on the scope of the vertex. This enables dynamic computation: only vertices whose ranks are still changing get rescheduled.

```
pagerank(i, scope) {
  // Get neighborhood data (R[i], W_ji, R[j]) from scope
  // Update the vertex data:
  //   R[i] = α + (1 − α) · Σ_{j ∈ N(i)} W_ji · R[j]
  // Reschedule neighbors if needed
  if R[i] changes then reschedule_neighbors_of(i);
}
```
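This per-vertex update can be made runnable as a small synchronous Python sketch (the link graph, the 0.15 reset probability, and the fixed iteration count are made-up illustrative values; a GraphLab engine would instead schedule vertices dynamically and asynchronously):

```python
ALPHA = 0.15  # random reset probability

def pagerank(links, iters=50):
    """links: {page: list of pages it links to}. Returns {page: rank}."""
    ranks = {p: 1.0 for p in links}
    for _ in range(iters):
        new = {}
        for i in links:
            # In-neighbors j of i contribute W_ji * R[j], with W_ji = 1/L[j].
            s = sum(ranks[j] / len(links[j]) for j in links if i in links[j])
            new[i] = ALPHA + (1 - ALPHA) * s
        ranks = new
    return ranks

links = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
r = pagerank(links)
# Page 3, which is linked to by pages 1, 2, and 4, ends up with the highest rank;
# page 4, with no in-links, stays at the reset probability α.
```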