Distributed Algorithms (CAS 769)
Week 1: Introduction, Logical clocks, Snapshots
Dr. Borzoo Bonakdarpour
Department of Computing and Software
McMaster University
Dr. Borzoo Bonakdarpour Distributed Algorithms (CAS 769) - McMaster University 1/44
Presentation outline
Introduction
Logical Clocks
Snapshots (Global States)
Acknowledgments
Most of the contents of these slides are obtained from the following books:
• Distributed Algorithms: An Intuitive Approach - Wan Fokkink
• Elements of Distributed Computing - Vijay K. Garg
Distributed Systems
Some Definitions
There is no universally accepted definition of a distributed system.
What makes a system distributed?
One man’s constant is another man’s variable.
- Alan Perlis
A distributed system is a system where I can’t get my work done because a computer has failed that I’ve never even heard of.
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.
- Leslie Lamport
Distributed Systems
Some Definitions
A distributed system is one that
• has multiple machines
• is connected by a network
• is cooperating on some task
Communication in Distributed Systems
• Message passing
• Shared memory
Distributed Systems
We begin with message passing systems.
Preliminaries
Message Passing Framework
In a message passing framework, a distributed system
• consists of a finite graph of N processes (a process is a running program, and each process has its own local state)
• Each process carries a unique ID
• Processes communicate through FIFO channels
Characteristics of Communication
• Communication is asynchronous; i.e., sending a message and receiving it are distinct events
• Delay in channels is arbitrary but finite
• There are no garbled, duplicated, or lost messages
Preliminaries
Other Assumptions
• Absence of a shared clock
• Absence of shared memory
• Absence of accurate failure detection
Example
{x1=0}
Process P1(){
  e^1_0 : send(P2,m1);
  e^1_1 : x1=5;
  e^1_2 : x1=10;
  e^1_3 : recv(m2);
}
{x2=0}
Process P2(){
  e^2_0 : recv(m1);
  e^2_1 : x2=15;
  e^2_2 : x2=20;
  e^2_3 : send(P1,m2);
}
Dr. Borzoo Bonakdarpour Distributed Algorithms (CAS 769) - McMaster University 9/44
Preliminaries
Transition Systems
The behavior of a distributed algorithm, which runs on a distributed system, is often captured by a transition system, which consists of:
• A set C of configurations (i.e., the composition of the local states of its processes plus the messages in transit)
• A binary transition relation → on C
• A set I ⊆ C of initial configurations
A configuration γ is terminal if there does not exist γ′ ∈ C such that γ → γ′.
An execution of the distributed system is a sequence γ̄ = γ0 γ1 γ2 · · · such that:
• γ0 ∈ I
• for all i ≥ 0, we have γi → γi+1
A configuration δ is reachable if there is a γ0 ∈ I and a finite execution γ0 γ1 · · · γk such that γk = δ.
Example
For example, in the distributed algorithm on Slide 9:
• Configuration (x1 = 0, x2 = 0) is the only initial configuration.
• Configuration (x1 = 10, x2 = 20) is the only terminal configuration.
• (x1 = 0, x2 = 0) → (x1 = 5, x2 = 0) → (x1 = 10, x2 = 0) → (x1 = 10, x2 = 15) → (x1 = 10, x2 = 20) is a valid execution.
• And so is (x1 = 0, x2 = 0) → (x1 = 5, x2 = 0) → (x1 = 5, x2 = 15) → (x1 = 10, x2 = 15) → (x1 = 10, x2 = 20).
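The notions of execution, terminal configuration, and reachable configuration can be made concrete for the two-process example with a small Python sketch. The encoding of a configuration as a tuple (pc1, pc2, x1, x2, messages in transit) and the names `successors` and `reachable` are assumptions made for illustration; they do not come from the slides. The search terminates here only because this particular configuration space is finite.

```python
from collections import deque

# Hypothetical encoding: (pc1, pc2, x1, x2, transit), where pc_i is the index
# of the next event of process P_i and transit holds the messages in transit.
INITIAL = (0, 0, 0, 0, frozenset())

def successors(cfg):
    """Yield every configuration reachable from cfg in one transition."""
    pc1, pc2, x1, x2, transit = cfg
    # Process P1: send(P2, m1); x1 = 5; x1 = 10; recv(m2)
    if pc1 == 0:
        yield (1, pc2, x1, x2, transit | {"m1"})
    elif pc1 == 1:
        yield (2, pc2, 5, x2, transit)
    elif pc1 == 2:
        yield (3, pc2, 10, x2, transit)
    elif pc1 == 3 and "m2" in transit:
        yield (4, pc2, x1, x2, transit - {"m2"})
    # Process P2: recv(m1); x2 = 15; x2 = 20; send(P1, m2)
    if pc2 == 0 and "m1" in transit:
        yield (pc1, 1, x1, x2, transit - {"m1"})
    elif pc2 == 1:
        yield (pc1, 2, x1, 15, transit)
    elif pc2 == 2:
        yield (pc1, 3, x1, 20, transit)
    elif pc2 == 3:
        yield (pc1, 4, x1, x2, transit | {"m2"})

def reachable(initial):
    """Breadth-first search over the transition relation."""
    seen, queue = {initial}, deque([initial])
    while queue:
        cfg = queue.popleft()
        for nxt in successors(cfg):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

configs = reachable(INITIAL)
# A configuration is terminal if it has no successor.
terminal = [c for c in configs if not list(successors(c))]
```

In this encoding, `terminal` contains exactly one configuration, with x1 = 10 and x2 = 20, and a configuration with x1 = 5 and x2 = 15 is among the reachable ones, matching the second execution listed above.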
Preliminaries
Question: Is configuration reachability decidable?
Preliminaries
A transition between two configurations is associated with an event.
A process can perform an internal (i.e., change of local state of a process), send, or receive event.
A process is called an initiator if its first event is either internal or send.
An assertion is a predicate on the configurations of an algorithm (e.g., x ≥ y + 1). We use assertions to define safety properties.
An assertion P is an invariant if:
• P(γ) for all γ ∈ I, and
• if γ → γ′ and P(γ), then P(γ′).
Example
For example, in the distributed algorithm on Slide 9:
• Instruction x1 = 5 is an internal event.
• Process P1 is an initiator.
• (x1 ≤ 100 ∧ x2 ≤ 50) is an invariant.
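The two conditions in the definition of an invariant can be checked mechanically when the transition relation is finite and given explicitly. The sketch below is illustrative only: the function name `is_invariant` and the abstraction of configurations to pairs (x1, x2) are assumptions, and the transitions are the ones appearing in the two executions listed for the example.

```python
def is_invariant(P, initial, transitions):
    """Check both conditions of the definition on an explicit, finite relation."""
    holds_initially = all(P(c) for c in initial)
    is_preserved = all(P(d) for (c, d) in transitions if P(c))
    return holds_initially and is_preserved

# Configurations abstracted to (x1, x2); messages in transit are omitted.
initial = {(0, 0)}
transitions = {
    ((0, 0), (5, 0)), ((5, 0), (10, 0)), ((10, 0), (10, 15)),
    ((5, 0), (5, 15)), ((5, 15), (10, 15)), ((10, 15), (10, 20)),
}

def bounded(c):
    return c[0] <= 100 and c[1] <= 50   # x1 <= 100 and x2 <= 50

def small_x1(c):
    return c[0] <= 5                    # broken by the step x1 = 10

print(is_invariant(bounded, initial, transitions))
print(is_invariant(small_x1, initial, transitions))
```

The first assertion is preserved by every listed transition; the second holds initially but fails on the transition from (5, 0) to (10, 0).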
Preliminaries
Properties
A property is a set of executions.
Safety Properties
A safety property typically expresses that something bad will never happen. For example:
• The temperature of a boiler never reaches 100 degrees.
• If an interrupt occurs, a message will be printed within one second.
Formally, a safety property is a set S of infinite executions where:
∀γ̄ ∉ S : ∃α ≤ γ̄ : ∀γ̄′ : α ≤ γ̄′ ⇒ γ̄′ ∉ S
where α ≤ γ̄ denotes the fact that the finite execution α is a prefix of γ̄.
Preliminaries
Liveness Properties
A liveness property typically expresses that something good will eventually happen. Formally, if L is a liveness property, then the following holds:
∀α : ∃γ̄ : αγ̄ ∈ L
where α is a finite execution and γ̄ is an infinite execution.
Examples of liveness properties:
• Non-starvation.
• If an interrupt occurs, a message will be printed.
Logical Clocks
Causal Order
In an asynchronous distributed system, in each configuration, different events can occur in different processes.
Such occurrences of events are independent.
The causal order ≺ is a binary relation on events in an execution, such that a ≺ b iff event a happened before event b; i.e., the events in the execution cannot be reordered so that a happens after b.
Causal Order
Causal Order (Happened Before)
Formally, the causal order (also called happened before) ≺ is the smallest binary relation, where
• if a and b are events at the same process and a occurs before b, then a ≺ b,
• if a is a send event and b the corresponding receive event, then a ≺ b, and
• if a ≺ b and b ≺ c, then a ≺ c.
Notice that the happened before relation is a partial order.
We write a ⪯ b if either a ≺ b or a = b.
If a ⊀ b and b ⊀ a, then we say a and b are concurrent events.
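The three rules defining ≺ translate directly into a small computation: start from the per-process order and the send/receive pairs, then close the relation under transitivity. The following Python sketch does this for the two-process example (P1 sends m1 to P2, and P2 later sends m2 to P1). The function names and the event labels such as `e1_0` are assumptions for illustration.

```python
from itertools import product

def happened_before(process_events, messages):
    """Return the set of pairs (a, b) with a ≺ b.

    process_events: one list of events per process, in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    events = [e for evs in process_events for e in evs]
    hb = set(messages)                       # rule 2: send ≺ receive
    for evs in process_events:               # rule 1: local order
        for i in range(len(evs)):
            for j in range(i + 1, len(evs)):
                hb.add((evs[i], evs[j]))
    # Rule 3: transitive closure (Warshall); k is the outermost loop variable.
    for k, a, b in product(events, repeat=3):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb

def concurrent(a, b, hb):
    """Distinct events that are unordered in both directions."""
    return a != b and (a, b) not in hb and (b, a) not in hb

p1 = ["e1_0", "e1_1", "e1_2", "e1_3"]       # send m1, x1=5, x1=10, recv m2
p2 = ["e2_0", "e2_1", "e2_2", "e2_3"]       # recv m1, x2=15, x2=20, send m2
hb = happened_before([p1, p2], [("e1_0", "e2_0"), ("e2_3", "e1_3")])

print(("e1_0", "e2_3") in hb)               # ordered through m1
print(concurrent("e1_1", "e2_2", hb))       # the two assignments are unordered
```

The cubic closure is fine for a handful of events; it is not meant as an efficient way to track causality at run time.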
Computation
A permutation of concurrent events in an execution does not affect the resultof the execution.
[Figure: space-time diagram of processes P1 and P2 with events e^1_0, e^1_1, e^1_2, e^1_3 and e^2_0, e^2_1, e^2_2, e^2_3.]
Computation
The set of all permutations forms the computation lattice.
[Figure: computation lattice for P1 and P2, with nodes labeled by pairs of events, from {} to (e^1_3, e^2_3).]
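One way to see where the nodes of such a lattice come from is to enumerate every combination of per-process prefixes and keep those in which no message is received without having been sent. The Python sketch below does this for the two-process example (P1 sends m1 as its first event and receives m2 as its last; P2 receives m1 first and sends m2 last). The representation of a node as a tuple of prefix lengths and the name `consistent_cuts` are assumptions for illustration.

```python
from itertools import product

def consistent_cuts(lengths, messages):
    """Enumerate tuples (c_0, ..., c_{n-1}), where c_p is the number of
    events of process p included, such that every included receive event
    has its send event included as well.

    messages: list of ((sender, send_index), (receiver, recv_index)),
    with 0-based process and event indices.
    """
    cuts = []
    for cut in product(*(range(n + 1) for n in lengths)):
        if all(s_idx < cut[s_proc]
               for (s_proc, s_idx), (r_proc, r_idx) in messages
               if r_idx < cut[r_proc]):
            cuts.append(cut)
    return cuts

lengths = (4, 4)                             # four events per process
messages = [((0, 0), (1, 0)),                # m1: P1's event 0 -> P2's event 0
            ((1, 3), (0, 3))]                # m2: P2's event 3 -> P1's event 3
cuts = consistent_cuts(lengths, messages)
print(len(cuts))
```

Under this encoding the enumeration yields 17 tuples, from (0, 0) for the empty node to (4, 4) for the node containing all events.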
Happened before Vs. Physical Time
Question: If a safety property holds in the happened before relation, does it hold in physical time as well?
Logical Clocks
Since a physical shared clock does not exist in a distributed system, we use logical clocks.
A logical clock C maps occurrences of events in a computation to a partially ordered set such that
a ≺ b ⇒ C(a) < C(b)
Lamport’s clock LC assigns to each event a the length k of a longest causality chain a1 ≺ · · · ≺ ak = a in the computation.
Obviously,
a ≺ b ⇒ LC(a) < LC(b)
Logical Clocks
Algorithm for Handling Lamport’s clocks
Consider an event a, and let k be the clock value of the previous event at the same process (k = 0 if there is no such previous event).
• If a is an internal or send event, then
LC(a) = k + 1
• If a is a receive event and b the corresponding send event, then
LC(a) = max{k, LC(b)} + 1
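The two update rules can be sketched as a small Python class and applied to the two-process example (P1: send m1, x1=5, x1=10, recv m2; P2: recv m1, x2=15, x2=20, send m2). The class and method names are assumptions for illustration; the timestamp of the send event is assumed to travel with the message.

```python
class LamportClock:
    def __init__(self):
        self.value = 0                      # k = 0 before the first event

    def tick(self):
        """Internal or send event: LC(a) = k + 1."""
        self.value += 1
        return self.value

    def receive(self, sent_value):
        """Receive event: LC(a) = max{k, LC(b)} + 1, where b is the send."""
        self.value = max(self.value, sent_value) + 1
        return self.value

c1, c2 = LamportClock(), LamportClock()

m1_ts = c1.tick()                           # P1: send(P2, m1)
c1.tick()                                   # P1: x1 = 5
c1.tick()                                   # P1: x1 = 10
c2.receive(m1_ts)                           # P2: recv(m1)
c2.tick()                                   # P2: x2 = 15
c2.tick()                                   # P2: x2 = 20
m2_ts = c2.tick()                           # P2: send(P1, m2)
last = c1.receive(m2_ts)                    # P1: recv(m2)

print(m1_ts, m2_ts, last)
```

In this run the receive of m2 gets the value 6, the length of the longest causality chain ending in it: the send of m1, the four events of P2, and the receive itself.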
Vector Clocks
The vector clock VC has the property
a ≺ b ⇔ VC(a) < VC(b)
Let a distributed system consist of processes p0, . . . , pN−1. The vector clock assigns to events in a computation values in ℕ^N, whereby this set is provided with a partial order defined by:
(k0, . . . , kN−1) ≤ (l0, . . . , lN−1) ⇔ ki ≤ li, for all i ∈ {0, . . . , N − 1}
The vector clock is defined as follows: VC(a) = (k0, . . . , kN−1), where ki is the length of a longest causality chain a^i_1 ≺ · · · ≺ a^i_{ki} of events at process pi with a^i_{ki} ⪯ a.
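The slides define the vector clock declaratively. A common way to maintain it at run time, which is an assumption here rather than something stated above, is for each process to increment its own component on every event and, on a receive, to first take the componentwise maximum with the vector carried by the message. The Python sketch below applies this rule to the two-process example, with P1 as index 0 and P2 as index 1; all names are illustrative.

```python
class VectorClock:
    def __init__(self, n, i):
        self.i = i                          # index of the owning process
        self.v = [0] * n

    def tick(self):
        """Internal or send event: increment the own component."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, sent):
        """Receive event: componentwise max with the sender's vector, then tick."""
        self.v = [max(a, b) for a, b in zip(self.v, sent)]
        self.v[self.i] += 1
        return tuple(self.v)

def leq(u, v):
    return all(a <= b for a, b in zip(u, v))

def less(u, v):
    return leq(u, v) and u != v

def concurrent(u, v):
    return not leq(u, v) and not leq(v, u)

vc1, vc2 = VectorClock(2, 0), VectorClock(2, 1)
e1_0 = vc1.tick()                           # P1: send(P2, m1)
e1_1 = vc1.tick()                           # P1: x1 = 5
e1_2 = vc1.tick()                           # P1: x1 = 10
e2_0 = vc2.receive(e1_0)                    # P2: recv(m1)
e2_1 = vc2.tick()                           # P2: x2 = 15
e2_2 = vc2.tick()                           # P2: x2 = 20
e2_3 = vc2.tick()                           # P2: send(P1, m2)
e1_3 = vc1.receive(e2_3)                    # P1: recv(m2)

print(e2_0, e1_3, concurrent(e1_1, e2_1))
```

The final event receives (4, 4), which agrees with the definition: four events of each process are causally before or equal to it. The two assignments x1 = 5 and x2 = 15 get incomparable vectors, so they are concurrent.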
Example
Demonstrate the evolution of the vector clock for this computation:
[Figure: space-time diagram of processes P1 and P2 with events e^1_0, e^1_1, e^1_2, e^1_3 and e^2_0, e^2_1, e^2_2, e^2_3.]
Snapshots (Global States)
Snapshot (Global State)
Definitions
A snapshot cannot be defined based on physical time (e.g., as the composition of all local states at the same time instant).
We use the happened before relation to compute concurrent local states and, hence, snapshots.
A (global) snapshot of an execution of a distributed algorithm is a configuration of this execution, consisting of the local states of the processes and the messages in transit.
Intuitively, a snapshot is consistent if it represents a configuration of the current execution or a configuration of an execution in the same computation.
Snapshots are useful to determine stable properties of a distributed system (i.e., properties that, once they become true, remain true), e.g., deadlock, termination, loss of a token, etc.
Snapshot
The Challenge
Why is it difficult to compute a snapshot of a distributed system at run time?
Taking a global snapshot is like taking a picture of the sky: the scene is so big that it cannot be captured by a single photograph. The challenge is that taking multiple photographs at the same time is not quite possible.
Snapshot
Terminology
Suppose we design an algorithm that takes a snapshot of another distributed algorithm. We call the messages of the underlying algorithm basic messages and the messages of the snapshot algorithm control messages.
An event is called presnapshot if it occurs at a process before the local snapshot at this process is taken.
Otherwise it is called postsnapshot.
Consistent Snapshot
A snapshot is consistent if
• for each presnapshot event a, all events that are causally before a are also presnapshot,
• a basic message is included in a channel state iff the corresponding send event is presnapshot while the corresponding receive event is postsnapshot.
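Both conditions can be checked on a recorded execution. Since the presnapshot events of each process form a prefix of its local order, the first condition reduces to checking that no message is received presnapshot but sent postsnapshot. The Python sketch below uses this observation on the two-process example (m1 from P1 to P2, m2 from P2 to P1); the data layout and function names are assumptions for illustration.

```python
def is_pre(cut, event):
    """cut[p] is the number of presnapshot events at process p."""
    proc, idx = event
    return idx < cut[proc]

def is_consistent(cut, messages):
    """No message may be received presnapshot unless it was sent presnapshot."""
    return all(is_pre(cut, send) for send, recv in messages if is_pre(cut, recv))

def channel_state(cut, messages):
    """Names of messages sent presnapshot and received postsnapshot."""
    return [name for name, (send, recv) in messages.items()
            if is_pre(cut, send) and not is_pre(cut, recv)]

# (process, 0-based event index) for the send and receive of each message.
messages = {
    "m1": (("P1", 0), ("P2", 0)),
    "m2": (("P2", 3), ("P1", 3)),
}
pairs = list(messages.values())

good = {"P1": 1, "P2": 0}                   # P1 has sent m1, P2 has done nothing
bad = {"P1": 0, "P2": 1}                    # P2 received m1 that was never sent

print(is_consistent(good, pairs), channel_state(good, messages))
print(is_consistent(bad, pairs))
```

For the first cut, m1 is in transit and therefore belongs to the channel state; the second cut violates the first condition.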
Example
[Figure: space-time diagram of processes P1, P2, and P3 exchanging messages m1, m2, and m3, with two cuts G1 and G2.]
G1 is not a consistent snapshot, but G2 is.
Chandy-Lamport Algorithm
Assumption
All channels are FIFO.
Challenges
• All recorded local states are mutually concurrent.
• The state of every channel is captured correctly.
Chandy-Lamport Algorithm
Solution
• We associate with each process a variable called color that is either red or white. All processes are initially white.
• Intuitively, the computed global snapshot corresponds to the state of the system just before the processes turn red.
• The algorithm relies on special control messages called markers.
• Once a process turns red, it sends a marker along all its outgoing channels before it sends out any other message.
• A process turns red on receiving a marker if it has not already done so.
• No white process receives a basic message from a red process. Why?
• This guarantees that local states are mutually concurrent. Why?
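The marker rules can be sketched as a small single-threaded Python simulation in which every channel is a FIFO queue. This is an illustrative sketch under assumptions, not a definitive implementation: the class `Process`, the three-process topology (channels AB, AC, and CB), the event order, and the use of a message log as the local state are all hypothetical. A red process records the basic messages that arrive on an incoming channel until the marker arrives on that channel.

```python
from collections import deque

MARKER = "<mkr>"

class Process:
    def __init__(self, name, network):
        self.name = name
        self.network = network              # (src, dst) -> deque (FIFO channel)
        self.state = []                     # hypothetical local state: received log
        self.red = False
        self.recorded_state = None
        self.channel_state = {}             # incoming channel -> recorded messages
        self.open_channels = set()          # incoming channels awaiting a marker

    def send(self, dst, msg):
        self.network[(self.name, dst)].append(msg)

    def take_snapshot(self):
        """Turn red: record the local state, then send markers on all outgoing channels."""
        self.red = True
        self.recorded_state = list(self.state)
        self.open_channels = {c for c in self.network if c[1] == self.name}
        self.channel_state = {c: [] for c in self.open_channels}
        for c in self.network:
            if c[0] == self.name:
                self.network[c].append(MARKER)

    def deliver(self, channel):
        """Receive the message at the head of an incoming channel."""
        msg = self.network[channel].popleft()
        if msg == MARKER:
            if not self.red:
                self.take_snapshot()
            self.open_channels.discard(channel)   # this channel's state is final
        else:
            if self.red and channel in self.open_channels:
                self.channel_state[channel].append(msg)   # a wr message
            self.state = self.state + [msg]

network = {("A", "B"): deque(), ("A", "C"): deque(), ("C", "B"): deque()}
A, B, C = (Process(n, network) for n in "ABC")

A.send("C", "m1")                           # sent while A is white
A.take_snapshot()                           # A initiates: markers on AB and AC
B.deliver(("A", "B"))                       # marker: B turns red
C.send("B", "m2")                           # C is still white
C.deliver(("A", "C"))                       # m1 arrives before the marker (FIFO)
C.deliver(("A", "C"))                       # marker: C turns red, marker on CB
B.deliver(("C", "B"))                       # m2 reaches red B before C's marker
B.deliver(("C", "B"))                       # marker closes channel CB

print(B.channel_state, C.channel_state, C.recorded_state)
```

In this run B records channel AB as empty and channel CB as containing m2, and C records channel AC as empty, since m1 was received while C was still white and is therefore part of C's recorded local state instead.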
Chandy-Lamport Algorithm
Classification of Basic Messages
• (ww messages) These messages are sent by a white process to a white process. They correspond to the messages sent and received before the global snapshot.
• (rr messages) These messages are sent by a red process to a red process. They correspond to the messages sent and received after the global snapshot.
• (rw messages) These messages cross the global snapshot in the backward direction. Such a message would make the snapshot inconsistent. It is not possible to have such messages if markers are used. Why?
• (wr messages) These messages cross the global snapshot in the forward direction and form part of the state of the channel in the snapshot, because they are in transit when the snapshot is taken.
Chandy-Lamport Algorithm
[Figure: space-time diagram of processes P1 and P2 showing ww, wr, rw, and rr messages relative to the snapshot.]
Chandy-Lamport Algorithm (Example)
[Figure: processes A, B, and C; channel labels: m1, ⟨mkr⟩ and ⟨mkr⟩.]
Chandy-Lamport Algorithm (Example)
[Figure: processes A, B, and C; channel labels: m1, ⟨mkr⟩; ⟨mkr⟩; m2.]
B computes the state of channel AB as {}.
Chandy-Lamport Algorithm (Example)
[Figure: processes A, B, and C; channel labels: m1; ⟨mkr⟩, m2.]
C computes the state of channel AC as {}.
Chandy-Lamport Algorithm (Example)
[Figure: processes A, B, and C; channel labels: m1; {m2}.]
B computes the state of channel CB as {m2}.
Chandy-Lamport Algorithm (Example)
Question: Is the computed snapshot a configuration of the actual execution?
Lai-Yang Algorithm
Assumptions
This algorithm does not assume FIFO channels.
But it assumes message piggybacking.
Lai-Yang Algorithm
The Algorithm
• Any initiator can decide to take a local snapshot.
• As long as a process has not taken a local snapshot, it appends false to its outgoing basic messages.
• When a process has taken its local snapshot, it appends true to each outgoing basic message.
• When a process that hasn’t yet taken a snapshot receives a message with true or a control message (see next slide) for the first time, it takes a local snapshot of its state before reception of this message.
• A process q computes as channel state of pq the basic messages without the tag true that it receives via pq after its local snapshot.
Lai-Yang Algorithm
The Algorithm
Question: How does q know when it can determine the channel state of pq?
p sends a control message to q, informing q how many basic messages without the tag true p sent into pq.
These control messages also ensure that all processes eventually take a local snapshot.
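The piggybacked tag and the counting control message can be sketched in Python as follows. This is a hedged illustration: the class `LYProcess`, the tuple layout of messages, and the completion test in `channel_done` (false-tagged messages received before the snapshot plus those recorded after it equal the announced count) are assumptions about one way q can use the count, and the local state is again a hypothetical message log. Messages are delivered out of order on purpose, since FIFO is not assumed.

```python
class LYProcess:
    def __init__(self, name, outgoing, incoming):
        self.name = name
        self.state = []                              # hypothetical local state
        self.taken = False
        self.recorded_state = None
        self.sent_false = {q: 0 for q in outgoing}   # false-tagged messages sent per channel
        self.recv_false_pre = {p: 0 for p in incoming}
        self.channel_state = {p: [] for p in incoming}
        self.expected = {}                           # sender -> announced count

    def send_basic(self, q, payload):
        """Piggyback the tag: false before the local snapshot, true after it."""
        if not self.taken:
            self.sent_false[q] += 1
        return ("basic", self.name, payload, self.taken)

    def take_snapshot(self):
        """Record the local state; return one control message per outgoing channel."""
        self.taken = True
        self.recorded_state = list(self.state)
        return {q: ("control", self.name, n) for q, n in self.sent_false.items()}

    def receive(self, msg):
        """Process a message; return any control messages this triggers."""
        out = {}
        if msg[0] == "control":
            if not self.taken:
                out = self.take_snapshot()
            self.expected[msg[1]] = msg[2]
            return out
        _, sender, payload, tag = msg
        if tag and not self.taken:
            out = self.take_snapshot()               # snapshot precedes this reception
        if not tag:
            if self.taken:
                self.channel_state[sender].append(payload)
            else:
                self.recv_false_pre[sender] += 1
        self.state.append(payload)
        return out

    def channel_done(self, p):
        got = self.recv_false_pre[p] + len(self.channel_state[p])
        return p in self.expected and got == self.expected[p]

p = LYProcess("p", outgoing=["q"], incoming=[])
q = LYProcess("q", outgoing=[], incoming=["p"])

m1 = p.send_basic("q", "m1")                # tagged false
ctrl = p.take_snapshot()                    # p announces one false-tagged message
m2 = p.send_basic("q", "m2")                # tagged true

q.receive(m2)                               # overtakes m1: q snapshots first
q.receive(ctrl["q"])
done_early = q.channel_done("p")            # m1 is still missing
q.receive(m1)                               # recorded in channel state of pq
print(done_early, q.channel_done("p"), q.channel_state["p"])
```

Here q takes its snapshot on seeing the true-tagged m2, learns from the control message that one false-tagged message was sent, and finalizes the state of channel pq as [m1] once m1 arrives.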