Distributed algorithms - Chapter 7 : Failure Detectors...
1
Distributed Algorithms
Master 2 IFI, CSSR
Chapter 7 : Failure Detectors, Consensus, Self-Stabilization
Francesco Bongiovanni INRIA Sophia Antipolis Research Center
OASIS Team
[email protected]
Web site: deptinfo.unice.fr/~baude/AlgoDist
Nov. 2009
2
Acknowledgement
The slides for this lecture are based on ideas and materials from the following sources:
- Introduction to Reliable Distributed Programming, Rachid Guerraoui and Luís Rodrigues, 2006, 300 p., ISBN: 3-540-28845-7 (+ teaching material)
- ID2203 Distributed Systems Advanced Course, by Prof. Seif Haridi, KTH – Royal Institute of Technology (Sweden)
- CS5410/514: Fault-tolerant Distributed Computer Systems Course, by Prof. Ken Birman, Cornell University
- Distributed Systems: An Algorithmic Approach, Sukumar Ghosh, 2006, 424 p., ISBN: 1-584-88564-5 (+ teaching material)
Various research papers
3
Outline
1. Failure Detectors
- Definition
- Properties: completeness and accuracy
- Classes of FDs
- Two algorithms: PFD and EPFD
- Leader Election vs Failure Detector
2. Consensus
- Definition
- Properties
- Types of Consensus: regular and uniform
- Algorithm: hierarchical consensus
3. Self-stabilization
- Principle
- Example: Dijkstra's Token ring
4
Failure detectors
5
System models
Synchronous distributed system:
- each message is received within bounded time
- each step in a process takes lb < time < ub
- each local clock's drift has a known bound
Asynchronous distributed system:
- no bounds on process execution
- no bounds on message transmission delays
- arbitrary clock drifts
the Internet is an asynchronous distributed system
6
Failure model
First we must decide: what do we mean by failure? There are different types of failures.
[Figure: nested failure classes, from outermost to innermost: Arbitrary (Byzantine), Crashes and recoveries, Omissions, Crashes]
Crash-stop (fail-stop): a process halts and does not execute any further operations.
Crash-recovery: a process halts, but then recovers (reboots) after a while.
Crash-stop failures can be detected in synchronous systems
Next: detecting crash-stop failures in asynchronous systems
7
What's a Failure Detector?
[Figure: two processes Pi and Pj. Pj suffers a crash failure; Pi needs to know about Pj's failure.]
10
1. Ping-ack protocol
[Figure: Pi sends a ping to Pj and Pj answers with an ack. Pi needs to know about Pj's failure.]
- Pi queries Pj once every T time units (ping)
- Pj replies (ack)
- if Pj does not respond within T time units, Pi marks Pj as failed
If Pj fails, then within T time units Pi will send it a ping message, and will time out within another T time units.
Worst-case detection time = 2T
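The 2T bound can be checked with a small calculation. The sketch below is only an illustration under simplifying assumptions that are not in the slides: Pi pings at times 0, T, 2T, ..., message delays are negligible, and a ping sent at or after the crash is never answered. The function name `ping_ack_detection_delay` is hypothetical.

```python
import math


def ping_ack_detection_delay(crash_time: int, T: int) -> int:
    """Delay between Pj's crash and the moment Pi marks Pj as failed.

    Assumptions (illustrative only): pings are sent at 0, T, 2T, ...,
    message delay is negligible, and Pj never answers a ping that is
    sent at or after crash_time.
    """
    # The first ping sent at or after the crash is the first unanswered one.
    first_unanswered_ping = math.ceil(crash_time / T) * T
    # Pi waits T time units for the ack before marking Pj as failed.
    detection_time = first_unanswered_ping + T
    return detection_time - crash_time


if __name__ == "__main__":
    T = 10
    for crash in (10, 11, 19):
        print(crash, ping_ack_detection_delay(crash, T))
```

Under these assumptions the delay is close to 2T when Pj crashes just after answering a ping, and equal to T when it crashes exactly when a ping is sent.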
11
2. Heart-beating protocol
[Figure: Pj sends heartbeats to Pi. Pi needs to know about Pj's failure.]
- Pj maintains a sequence number
- Pj sends Pi a heartbeat with an incremented sequence number every T' (= T) time units
- if Pi has not received a new heartbeat for the past T time units, Pi declares Pj as failed
If Pj has sent x heartbeats by the time it fails, then Pi will time out within (x+1)*T time units in the worst case, and will detect Pj as failed.
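The monitoring side of the heart-beating protocol can be sketched as follows. This is a minimal illustration, not the lecture's code: the class name `HeartbeatMonitor`, its methods, and the explicit `now` argument (used instead of a real clock) are all assumptions made for the example.

```python
class HeartbeatMonitor:
    """Pi's view of Pj in a heart-beating protocol (illustrative sketch)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout      # T: maximum silence tolerated by Pi
        self.last_seq = 0           # highest sequence number seen so far
        self.last_arrival = 0.0     # arrival time of that heartbeat

    def on_heartbeat(self, seq: int, now: float) -> None:
        # Only a heartbeat with a new (larger) sequence number counts.
        if seq > self.last_seq:
            self.last_seq = seq
            self.last_arrival = now

    def suspects(self, now: float) -> bool:
        # Pj is declared failed after T time units without a new heartbeat.
        return now - self.last_arrival >= self.timeout


if __name__ == "__main__":
    T = 5
    monitor = HeartbeatMonitor(timeout=T)
    x = 3  # Pj sends x heartbeats (at T, 2T, 3T) and then crashes
    for seq in range(1, x + 1):
        monitor.on_heartbeat(seq, now=seq * T)
    print(monitor.suspects(now=19))           # still within the timeout
    print(monitor.suspects(now=(x + 1) * T))  # timeout reached at (x+1)*T
```

With x = 3 and T = 5, the monitor does not suspect Pj at time 19 and does suspect it at time 20 = (x+1)*T, matching the bound stated above.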
12
Failure Detectors
Abstracting time
A FD provides information (not necessarily fully accurate) about which processes have crashed
- Use failure detectors to encapsulate timing assumptions
- Black box giving suspicions regarding node failures
- Accuracy of suspicions depends on model strength
13
Failure Detectors
Basic properties
Completeness
Every crashed process is suspected
Accuracy
No correct process is suspected
Both properties come in two flavours: Strong and Weak
14
Failure Detectors
Strong Completeness: Every crashed process is eventually suspected by every correct process
Weak Completeness: Every crashed process is eventually suspected by at least one correct process
Strong Accuracy: No correct process is ever suspected
Weak Accuracy: There is at least one correct process that is never suspected
15
Failure Detectors
Classes of FDs

Completeness   Accuracy:
               Strong       Weak        Eventually Strong        Eventually Weak
Strong         Perfect P    Strong S    Eventually Perfect ◊P    Eventually Strong ◊S
Weak           Q            Weak W      ◊Q                       Eventually Weak ◊W

[Figure labels: Synchronous Systems, Asynchronous Systems]
16
Perfect Failure Detector (P)
17
Perfect Failure Detector (P)
18
Correctness of P
PFD1 (strong completeness)
- A crashed node doesn't send <heartbeat>
- Eventually every node will notice the absence of <heartbeat>
PFD2 (strong accuracy)
- Assuming local computation is negligible
- Maximum time between 2 heartbeats: γ + δ time units
- If alive, all nodes will receive the heartbeat in time
- No inaccuracy
19
Eventually Perfect Failure Detector
20
Eventually Perfect Failure Detector
21
Correctness of EPFD
EPFD1 (strong completeness)
- Same as before
EPFD2 (eventual strong accuracy)
- Each time p is inaccurately suspected by a correct q, the timeout T is increased at q
- Eventually the system becomes synchronous, and T becomes larger than the unknown bound δ (T > γ + δ)
- q will receive the heartbeat on time, and never suspect p again
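The accuracy argument can be illustrated with a deliberately simplified model, which is an assumption of this sketch rather than the algorithm from the slides: q observes a sequence of gaps between consecutive heartbeats from a correct process p, and every gap longer than the current timeout counts as one false suspicion followed by a timeout increase. The function `epfd_trace` and its parameters are hypothetical names.

```python
from typing import List, Tuple


def epfd_trace(gaps: List[float], initial_timeout: float,
               increase: float) -> Tuple[float, int]:
    """Return (final timeout, number of false suspicions) at q.

    gaps[i] is the time q waited between two consecutive heartbeats of a
    correct process p (simplified model, for illustration only).
    """
    timeout = initial_timeout
    false_suspicions = 0
    for gap in gaps:
        if gap > timeout:
            # q suspected p, then the heartbeat arrived after all:
            # the suspicion was wrong, so q becomes more patient.
            false_suspicions += 1
            timeout += increase
    return timeout, false_suspicions


if __name__ == "__main__":
    # After an unstable start the gaps are bounded by 7 (the unknown bound).
    gaps = [3, 9, 7, 7, 7, 7, 7]
    print(epfd_trace(gaps, initial_timeout=2, increase=2))  # (8, 3)
```

In this run the timeout grows from 2 to 8 after three false suspicions; once it exceeds the bound of 7, p is never suspected again.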
22
Leader Election
23
Leader Election vs Failure Detection
Failure detection captures failure behavior
- Detect failed nodes
Leader election (LE) also captures failure behavior
- Detect correct nodes (a single one, the same for all)
Formally, leader election is an FD
- Always suspects all nodes except one (the leader)
- Ensures some properties regarding that node
24
Leader Election vs Failure Detection
We’ll define two leader election algorithms:
- Leader election (LE), which “matches” P
- Eventual leader election (Ω), which “matches” eventual P
25
Matching LE and P
P’s properties
- P always eventually detects failures (strong completeness)
- P never suspects correct nodes (strong accuracy)
Completeness of LE
- Informally: eventually ditch crashed leaders
- Formally: eventually every correct node trusts some correct node
Accuracy of LE
- Informally: never ditch a correct leader
- Formally: no two correct nodes trust different correct nodes
Is this really accuracy? Yes!
- Assume two nodes trust different correct nodes
- One of them must eventually switch, i.e. leave a correct node
26
LE desirable properties
LE always eventually detects failures
- Eventually every correct node trusts some correct node
LE is always accurate
- No two correct nodes trust different correct nodes
But the above two permit the following
But P1 is “inaccurately” leaving a correct leader
27
LE desirable properties
To avoid “inaccuracy” we add
Local Accuracy: If a node is elected leader by pi, all previously elected leaders by pi have crashed
Not allowed, as P1 is correct
28
Leader election - interface
29
Leader election - algorithm
30
Matching Ω and EPFD
Eventual P weakens P by only providing eventual accuracy
Weaken LE to Ω by only guaranteeing eventual agreement
LE Properties:
LE1 (eventual completeness). Eventually every correct node trusts some correct node
LE2 (agreement). No two correct nodes trust different correct nodes (for Ω this becomes eventual agreement: it holds only eventually)
LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pi have crashed
31
Eventual Leader election - interface
32
Eventual Leader election - algorithm
See in the book...
33
Consensus (agreement)
34
Consensus
In the consensus problem, the processes propose values and have to agree on one among these values
Solving consensus is key to solving many problems in distributed computing (e.g., total order broadcast, atomic commit, terminating reliable broadcast)
[Figure: processes A, B and C]
35
Consensus – canonical application
- a set of servers implements a distributed database
- a subset of servers participate in a particular transaction
- some of the servers may fail
- the remaining servers must agree on whether to install the results of the transaction to the database or discard them
38
Consensus – basic properties
Termination: Every correct node eventually decides
Agreement: No two correct processes decide differently
Validity: Any value decided is a value proposed
Integrity: A node decides at most once
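The four properties can be stated as checks over a finished execution. The sketch below is only an illustration: the function `check_consensus` and the way an execution is encoded (a dict of proposals, a dict of decision lists, and the set of correct processes) are assumptions, not part of the lecture.

```python
from typing import Dict, Hashable, List, Set


def check_consensus(proposals: Dict[int, Hashable],
                    decisions: Dict[int, List[Hashable]],
                    correct: Set[int]) -> Dict[str, bool]:
    """Check the basic consensus properties on one recorded execution."""
    decided_by_correct = {v for p in correct for v in decisions.get(p, [])}
    all_decided = {v for values in decisions.values() for v in values}
    return {
        # Every correct node has decided by the end of the execution.
        "termination": all(len(decisions.get(p, [])) >= 1 for p in correct),
        # Correct nodes decided on at most one distinct value.
        "agreement": len(decided_by_correct) <= 1,
        # Every decided value was proposed by some node.
        "validity": all_decided <= set(proposals.values()),
        # No node decided more than once.
        "integrity": all(len(values) <= 1 for values in decisions.values()),
    }


if __name__ == "__main__":
    proposals = {1: "a", 2: "b", 3: "c"}
    # Node 1 decided "a" and then crashed; nodes 2 and 3 are correct.
    decisions = {1: ["a"], 2: ["b"], 3: ["b"]}
    print(check_consensus(proposals, decisions, correct={2, 3}))
```

In the sample execution all four checks pass, because agreement here only constrains correct processes, even though the crashed node 1 decided a different value.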
39
FLP impossibility result
Consensus in Asynchronous System
Impossibility of consensus in the fail-silent model
FLP (Fischer, Lynch and Paterson, 1985): consensus is impossible in the fail-silent model with deterministic processes, even if only one process crashes
No way to satisfy agreement (safety) and termination (liveness) together
40
How to solve consensus in asynchronous systems with crashes ?
How to solve consensus in the presence of crashes?
Either we relax our system model, that is, we assume partial synchrony
Or we modify the specifications:
- Constraining the set of inputs
- Changing the termination property: terminates with some probability
Or...
41
How to solve consensus in asynchronous systems with crashes ?
Intuitively, consensus is impossible to solve because:
1) the decision depends on one process
2) we have no idea if this process is alive (we have to wait for its message) or dead.
Thus we add to the asynchronous system what it needs in order to solve consensus: failure detectors
42
(regular) Consensus
43
(regular) Consensus
Sample execution
Question : does it satisfy consensus ?
44
Uniform consensus
45
Uniform consensus
Question: Does it satisfy uniform consensus ?
46
Hierarchical consensus
Use perfect fd (P) and best-effort bcast (BEB)
Each node stores its proposal in proposal
- Possible to adopt another proposal by changing proposal
- Store identity of last adopted proposer in lastprop
Loop through rounds 1 to N. In round i:
- node i is leader: it broadcasts its proposal v and decides v
- other nodes adopt i’s proposal v and remember lastprop = i, or detect the crash of i
47
Hierarchical consensus idea
Basic idea of hierarchical consensus
- There must be a first correct leader p
- p decides its value v and bcasts v
- BEB ensures all correct nodes get v
- Every correct node adopts v
- Future rounds will only propose v
48
Problem with orphan messages...
Only adopt from node i if i>lastProp?
49
Invariant to avoid orphans
Leader in round r might crash, but much later affect some node in round > r
Invariant: adopt the proposal only if proposer p is ranked lower in the hierarchy (has a larger id) than lastprop; otherwise p has crashed and should be ignored
50
Execution without failure...
51
Execution with failure...
Is it uniform ?
52
Hierarchical consensus Impl. (1)
Last adopted proposal and Last adopted proposer id
53
Hierarchical consensus Impl. (2)
Annotations on the algorithm:
- Set node’s initial proposal, unless it has already adopted another node’s
- If I am leader
- Trigger once per round
- Trigger if I have a proposal
- Permanently decide
- Next round if deliver or crash
- Invariant: only adopt “newer” than what you have
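The round structure can be tried out with a small simulation. This is a sketch under assumptions that are not in the slides: rounds run one after another, messages are delivered within the round, a leader listed in `crash_plan` decides and then crashes while broadcasting so that only the listed nodes receive its value, and the function name `hierarchical_consensus` is hypothetical.

```python
from typing import Dict, Hashable, Optional, Set


def hierarchical_consensus(
    proposals: Dict[int, Hashable],
    crash_plan: Optional[Dict[int, Set[int]]] = None,
) -> Dict[int, Hashable]:
    """Simulate rounds 1..N; return the value decided by each node.

    crash_plan[i] is the set of nodes that still receive leader i's
    broadcast before i crashes in its own round (illustrative model).
    """
    crash_plan = crash_plan or {}
    nodes = sorted(proposals)
    proposal = dict(proposals)            # current proposal of each node
    lastprop = {n: 0 for n in nodes}      # id of the last adopted proposer
    decided: Dict[int, Hashable] = {}

    for leader in nodes:                  # round i: node i is the leader
        value = proposal[leader]
        decided[leader] = value           # the leader decides its proposal
        receivers = crash_plan.get(leader, set(nodes))
        for n in nodes:
            # Only later nodes adopt, and only from a "newer" proposer.
            if n > leader and n in receivers and leader > lastprop[n]:
                proposal[n] = value
                lastprop[n] = leader
        # Nodes that received nothing rely on P to detect the crash
        # of the leader and simply move on to the next round.
    return decided


if __name__ == "__main__":
    proposals = {1: "a", 2: "b", 3: "c"}
    print(hierarchical_consensus(proposals))
    print(hierarchical_consensus(proposals, crash_plan={1: set()}))
```

Without failures every node decides node 1's value. If node 1 crashes before anyone receives its broadcast, nodes 2 and 3 both decide "b" while the crashed node 1 has decided "a", so this run satisfies regular agreement among correct nodes but not uniform agreement.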
54
Correctness
Validity
- Always decide own proposal or adopted value
Integrity
- Rounds increase monotonically
- A node only decides once, in the round in which it is leader
Termination
- Every correct node makes it to the round it is leader in
- If some leader fails, completeness of P ensures progress
- If leader correct, validity of BEB ensures delivery
55
Correctness (2)
Agreement
- No two correct nodes decide differently
- Take correct leader with minimum id i
- By termination it will decide v
- It will BEB v
- Every correct node gets v and adopts it
- No older proposals can override the adoption
- All future proposals and decisions will be v
How many failures can it tolerate? N-1
56
Self-stabilization
57
Recall
Main challenges in distributed systems:
- Failures
- Concurrency
In presence of (permanent) failures, a robust algorithm guarantees:
- Liveness properties are eventually achieved
- Safety properties are never violated
58
Self-Stabilization
Self-stabilization is a different approach to fault tolerance
- it considers transient (temporary) failures
- it is more optimistic
If a bad thing happens (safety is violated), the system will recover within a finite time, and will behave nicely afterwards.
59
Definition
“A system is self-stabilizing when, regardless of its initial state, it is guaranteed to arrive at a legitimate state in a finite number of steps.” [1]
Edsger W. Dijkstra
[1] Edsger W. Dijkstra, Self-stabilizing systems in spite of distributed control, Communications of the ACM, v.17 n.11, pp. 643-644, Nov. 1974
60
Self-Stabilization
System S is self-stabilizing with respect to predicate P that identifies the legitimate states, if:
Convergence: Starting from any arbitrary configuration, S is guaranteed to reach a configuration satisfying P, within a finite number of state transitions.
Closure: P is closed under the execution of S. That is, once in a legitimate state, it will stay in a legitimate state.
61
Some advantages of Self-Stabilizing systems
- No need for consistent initialization. Starting in any arbitrary state, the system will converge to a legitimate state.
- Possibility of sequential composition without the need for termination detection.
62
A self-stabilizing algorithm: Dijkstra's Token ring
63
Dijkstra's Token ring
A single token circulates over the ring and grants privilege to the process holding it.
N+1 processes: P0, P1, ..., Pn (with n = N)
Connected in a ring
Predecessor of Pi: pred(Pi) = P_{(i-1) mod (N+1)}
Successor of Pi: succ(Pi) = P_{(i+1) mod (N+1)}
64
Token Ring stabilization
- Pi has a local variable Xi
- Xi can take values from 0 to K-1 (K >= N)
- Each process can read the value of its predecessor (Shared Memory Model)
- There is a scheduler, which selects a process at each step, in a random but fair manner.
65
Token Ring stabilization
Transition rule for P1 to Pn:
if Xi != X_{i-1} then Xi := X_{i-1}
Transition rule for P0:
if X0 = Xn then X0 := (X0 + 1) mod K
67
Token Ring stabilization
If the guard of your transition rule holds: you have the token.
Fire: change your state.
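The two transition rules are short enough to simulate directly. The sketch below is an illustration under stated assumptions: the state is a Python list `X` with `X[i]` standing for Xi, the scheduler picks uniformly at random among the processes whose guard holds, and the helper names `privileged`, `fire` and `stabilize` are hypothetical.

```python
import random
from typing import List


def privileged(X: List[int]) -> List[int]:
    """Indices of the processes whose guard holds (they hold a token)."""
    n = len(X) - 1
    tokens = [0] if X[0] == X[n] else []
    tokens += [i for i in range(1, n + 1) if X[i] != X[i - 1]]
    return tokens


def fire(X: List[int], i: int, K: int) -> None:
    """Apply the transition rule of process i (its guard must hold)."""
    if i == 0:
        X[0] = (X[0] + 1) % K      # rule for P0
    else:
        X[i] = X[i - 1]            # rule for P1..Pn: copy the predecessor


def stabilize(X: List[int], K: int, rng: random.Random,
              max_steps: int = 10_000) -> int:
    """Fire randomly chosen privileged processes until one token is left."""
    steps = 0
    while len(privileged(X)) != 1 and steps < max_steps:
        fire(X, rng.choice(privileged(X)), K)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    X = [3, 1, 4, 1, 0]            # arbitrary initial state, N = 4, K = 5
    print("tokens at start:", privileged(X))
    steps = stabilize(X, K=5, rng=rng)
    print("legitimate after", steps, "steps:", X, privileged(X))
```

At least one guard always holds: if no Pi with i >= 1 differs from its predecessor, all values are equal and P0 is privileged, so the scheduler always has a process to pick. Once a single token remains, each firing hands it to the successor, which can be observed by calling `fire` repeatedly after `stabilize` returns.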
73
Legitimate or illegitimate?
79
Proof of closure
If there is only a single token in the ring, when the machine that owns the token fires, it loses the token and will give it to its successor, and to no one else.
This single token is handed over along the ring.
83
Proof of convergence
Lemma 1. P0 eventually receives the token.
- Assume it does not have the token, i.e. X0 != Xn.
- Let j be the minimum value such that Xj != X0.
- For all i < j: Xi = X0, so Xj != X_{j-1}, so Pj is privileged.
- Pj will fire, thus increasing j if j < N, or making X0 = Xn if j = N.
→ P0 will eventually receive the token.
84
Proof of convergence
- Initially all the process states are white.
- Whenever P0 fires, we colour its state.
- Whenever a state is copied from a coloured state, it gets the colour.
- Whenever a state is checked upon a coloured state, it gets the colour.
90
Proof of convergence
Lemma 2. After at most N firings at P0, all the local states are coloured.
- Assume h is the number of times that P0 fires while Pn is white.
- For each firing, X0 has to be the same as Xn.
- So, Xn has taken h distinct values.
- These values can only be copied from other nodes in the ring.
- At the time of the first firing, we can have at most N distinct values in the ring.
- Therefore, h is bounded by N.
91
Proof of convergence
If P0 initially starts at state K-1, the first N firings of P0 have created states 0 to N-1. (K >= N)
When P0 is in state N-1:
- All the nodes are coloured.
- Scanning from P0 to Pn, the states of the nodes are in non-increasing order.
- The next firing at P0 will happen when X0 = Xn = N-1.
→ At the time of the Nth firing of P0, all the states are N-1.
92
Proof of convergence
93
Proof of correctness
Starting in an arbitrary state, we ended up in a legitimate state.
=> Convergence
We also showed that, once in a legitimate state, we will remain in a legitimate state.
=> Closure