Fast Leader (Full) Recovery despite Dynamic Faults
Ajoy K. Datta
Stéphane Devismes
Lawrence L. Larmore
Sébastien Tixeuil
Joint Work
ICDCN, 04/01/2013, Mumbai
Ajoy K. Datta & Lawrence L. Larmore
Sébastien Tixeuil
Self-Stabilization [Dijkstra,74]
A fault = a process state corruption
Recover after any number of
transient faults
Price of the Versatility
1. Several impossibility results
– E.g., Leader Election and Token Circulation in anonymous networks
2. The stabilization time usually depends on global parameters (diameter, size of the network, …)
When only a few faults hit the system
• Self-Stabilization: Ω(D) rounds
• Stronger forms:
– Fault Containment [Ghosh et al., Dist. Comp. 2007]
– k-adaptive Self-Stabilization [Burman et al., OPODIS’05]
• Weakened forms:
– k-stabilization [Beauquier et al., PODC’98]
Fault-Containment
• Pros
– Self-stabilizing
– If f ≤ k faults, stabilization time in O(f) rounds
– Containment radius
– Fault gap is small
• Cons (currently)
– k=1, or
– Surrounded by a majority of correct processes, or
– Synchronous setting, or
– Probabilistic recovery
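Fault gap
• The minimum time between consecutive faulty transitions needed to keep an O(f) recovery time
[Figure: timeline of legitimate and illegitimate configurations. When consecutive faulty transitions are separated by at least the fault gap, recovery takes O(f) rounds.]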
[Figure: timeline of legitimate and illegitimate configurations. When consecutive faulty transitions are separated by less than the fault gap, recovery takes Ω(D) rounds.]
Time-Adaptive Self-stabilization
• Self-Stabilization
• If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults):
– “output” stabilization in O(f) rounds
Output vs. State Stabilization
[Figure: after f ≤ k faults the configuration is illegitimate. The output is correct again within O(f) rounds, while the state reaches a legitimate configuration only after Ω(D) rounds.]
The fault gap depends on global parameters
k-Stabilization (first definition)
If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers.
Otherwise, no guarantee.
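
The precondition of this first definition can be sketched in a few lines. This is only an illustration: configurations are modeled as plain sequences of process states, and the names `hamming_distance` and `within_k_static_faults` are hypothetical, not taken from the talk.

```python
from typing import Hashable, Iterable, Sequence


def hamming_distance(config: Sequence[Hashable], legit: Sequence[Hashable]) -> int:
    """Number of processes whose state differs between two configurations."""
    if len(config) != len(legit):
        raise ValueError("configurations must describe the same set of processes")
    return sum(a != b for a, b in zip(config, legit))


def within_k_static_faults(
    config: Sequence[Hashable],
    legitimate_configs: Iterable[Sequence[Hashable]],
    k: int,
) -> bool:
    """True if some legitimate configuration is at Hamming distance f <= k."""
    return any(hamming_distance(config, legit) <= k for legit in legitimate_configs)
```

Under this definition, recovery is only promised for configurations where `within_k_static_faults` holds.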
• Pros
– Can solve more problems than self-stabilization
– Usually, only-k-dependent stabilization time
– Usually, only-k-dependent fault gap
• Cons
– Not self-stabilizing
– Static faults: f ≤ k faults should occur in a single transition
Our definition of k-stabilization
• Faulty transition = one process state corruption
• Dynamic faults:
– If f ≤ k faulty transitions occur in an arbitrary manner, the system eventually recovers
[Figure: timeline of legitimate and illegitimate configurations with faults occurring one at a time (1 fault, 1 fault, 1 fault), f ≤ k faults in total.]
Our contribution
• Leader recovery protocol
– On an anonymous (yet oriented) ring
– Asynchronous atomic read/write
– k-stabilizing if n ≥ 18k + 1
– Stabilization time O(k²) rounds
– Log(k) bits per process
– This problem is unsolvable in the self-stabilizing setting
The system starts in a legitimate configuration where one process is elected
Some faulty transitions occur in an arbitrary manner
Fault propagation
If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds
Fault gap
[Figure: timeline of legitimate and illegitimate configurations. Each burst of f ≤ k faulty transitions is followed by recovery in O(k²) rounds; the gap between bursts is labeled 0.]
Main ideas of the algorithm
Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}
[Figure: ring of processes labeled with their votes: 0, then 1, 2, 3, …, 3k in one direction and -1, -2, -3, … in the other, and ⊥ elsewhere.]
Interval of relevance: 6k+1 votes
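
A minimal sketch of this vote layout, assuming processes are indexed 0..n-1 for illustration only (the ring itself is anonymous) and assuming a vote is the signed offset from a process to the elected one. The sign convention and the names `legitimate_votes` and `designated` are assumptions, not the paper's notation.

```python
from typing import List, Optional

BOTTOM: Optional[int] = None  # stands for the ⊥ vote


def legitimate_votes(n: int, k: int, leader: int) -> List[Optional[int]]:
    """Votes of all n processes in a legitimate configuration.

    The 6k+1 processes within distance 3k of the leader store a relative
    address in {-3k..3k}; every other process stores ⊥.
    """
    if n < 6 * k + 1:
        raise ValueError("ring too small to hold an interval of 6k+1 votes")
    votes: List[Optional[int]] = [BOTTOM] * n
    for offset in range(-3 * k, 3 * k + 1):
        # The process located `offset` steps before the leader votes `offset`.
        votes[(leader - offset) % n] = offset
    return votes


def designated(n: int, p: int, vote: Optional[int]) -> Optional[int]:
    """Index of the process that p's vote points at, or None for ⊥."""
    return None if vote is None else (p + vote) % n
```

With this convention every non-⊥ vote in a legitimate configuration designates the same process, which is what makes counting votes meaningful later on.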
After k faults
[Figure: the ring after k faults. Some votes are corrupted (a 2 becomes 0) and neighboring votes change in turn (0 becomes 1, -1 becomes 0).]
At most 3k processes change their votes
Always a majority of votes for the previous leader
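
The majority claim follows from simple counting: the interval of relevance holds 6k+1 votes and at most 3k of them change, so at least 3k+1 still point at the previous leader. The small check below only restates that arithmetic; the function name `surviving_votes` is hypothetical.

```python
def surviving_votes(k: int) -> int:
    """Lower bound on unchanged votes in the interval of relevance."""
    interval = 6 * k + 1   # votes in the interval of relevance
    changed = 3 * k        # at most 3k processes change their votes
    return interval - changed


for k in range(1, 6):
    interval = 6 * k + 1
    left = surviving_votes(k)
    # A strict majority means more than half of the 6k+1 votes.
    print(f"k={k}: interval={interval}, surviving>={left}, majority={2 * left > interval}")
```

The bound 3k+1 is the same threshold used when a query returns.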
Rumors
[Figure: a process with Vote = 1 and Rumor = 1.]
In a legitimate state, Vote = Rumor for every process
Main idea:
• Vote: hard to change
• Rumor: easy to change
[Figure: a process whose Vote and Rumor differ (values 1 and 2).]
If Rumor ≠ Vote
• If Rumor ≠ ⊥
– Candidate ← Rumor
• Else
– Candidate ← Vote
• Initiate Query(Candidate)
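
The candidate selection above can be written as a small pure function. This is a sketch only: ⊥ is modeled as `None`, and the name `choose_candidate` is hypothetical.

```python
from typing import Optional

Address = Optional[int]  # None stands for ⊥


def choose_candidate(vote: Address, rumor: Address) -> Address:
    """Return the candidate to query, or None when Vote = Rumor (no query).

    When Vote and Rumor differ, the rumor is preferred unless it is ⊥,
    so the returned candidate is never ⊥ in that case.
    """
    if rumor == vote:
        return None
    return rumor if rumor is not None else vote
```

For example, a process with Vote = 1 and Rumor = 2 queries candidate 2, while a process with Vote = 1 and Rumor = ⊥ queries candidate 1.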
Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
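
What a query computes can be sketched from a global point of view. This is an assumption-laden simplification: in the protocol the query is a message traveling around the ring, whereas here processes are indexed 0..n-1, a vote is read as a signed offset to the designated process, and `count_votes` is a hypothetical name.

```python
from typing import List, Optional


def count_votes(votes: List[Optional[int]], candidate: int, k: int) -> int:
    """Count votes designating `candidate` within its interval of relevance.

    Assumes n >= 6k+1 so the 6k+1 visited positions are all distinct.
    """
    n = len(votes)
    count = 0
    for d in range(-3 * k, 3 * k + 1):
        q = (candidate + d) % n          # process visited by the query
        vote = votes[q]
        if vote is not None and (q + vote) % n == candidate:
            count += 1
    return count


# Example ring: n = 19 processes, k = 1, process 0 elected.
ring: List[Optional[int]] = [None] * 19
for offset in range(-3, 4):
    ring[(0 - offset) % 19] = offset
```

On this example ring, every vote in the interval designates process 0, so corrupting a single vote lowers the count for process 0 by one.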
Query Return
• If at least 3k+1 votes for the Candidate
– If Rumor ≠ ⊥ and Rumor ≠ Candidate
  • Initiate a Denial of Rumor in its interval of relevance
– Vote ← Candidate
– Rumor ← Candidate
• Else
– If Rumor = Candidate, then Rumor ← ⊥
– Initiate a Denial of Candidate in its interval of relevance
– If Vote = Candidate, then Vote ← ⊥
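
The rules applied when a query returns can be sketched as a local state update. This is an illustration under assumptions: ⊥ is modeled as `None`, denials are merely recorded in a list instead of being sent around the ring, and `ProcessState` and `on_query_return` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Address = Optional[int]  # None stands for ⊥


@dataclass
class ProcessState:
    vote: Address
    rumor: Address
    denials: List[int] = field(default_factory=list)  # denials initiated so far


def on_query_return(state: ProcessState, candidate: int, votes_for: int, k: int) -> None:
    """Update Vote and Rumor once Query(candidate) comes back with its count."""
    if votes_for >= 3 * k + 1:
        # The candidate is confirmed: kill any competing rumor, then adopt it.
        if state.rumor is not None and state.rumor != candidate:
            state.denials.append(state.rumor)
        state.vote = candidate
        state.rumor = candidate
    else:
        # The candidate is rejected: forget it and deny it.
        if state.rumor == candidate:
            state.rumor = None
        state.denials.append(candidate)
        if state.vote == candidate:
            state.vote = None
```

The threshold 3k+1 is a strict majority of the 6k+1 votes in the interval of relevance.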
Query Tracks
Other tracks
• Denial (to kill a rumor)
• To manage lost queries
– Probe wave
– Report
(see the paper)
Deadlock Prevention
• Each two neighboring processes share a resource
– Think of a chopstick between 2 philosophers
• Only a process that holds both its left and right resources can initiate a query
• So, at any time, at most n/2 initiated queries are pending
• Now, we can have up to 9k rogue queries, i.e., non-initiated queries
• So, n > n/2 + 9k, that is, n ≥ 18k + 1
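
Both counting steps can be checked by brute force on small rings. The sketch assumes a simplified model in which each shared resource is held by exactly one of its two endpoints at any time; the names `max_initiators` and `min_ring_size` are hypothetical.

```python
from itertools import product


def max_initiators(n: int) -> int:
    """Largest number of processes that can hold both adjacent resources.

    Resource e sits between processes e and e+1 (mod n). holder[e] == 0
    means process e holds it, holder[e] == 1 means process e+1 holds it.
    """
    best = 0
    for holder in product((0, 1), repeat=n):
        count = 0
        for p in range(n):
            left = (p - 1) % n   # resource shared with the left neighbor
            right = p            # resource shared with the right neighbor
            if holder[left] == 1 and holder[right] == 0:
                count += 1
        best = max(best, count)
    return best


def min_ring_size(k: int) -> int:
    """Smallest n satisfying n > n/2 + 9k, written as 2n > n + 18k."""
    n = 1
    while not (2 * n > n + 18 * k):
        n += 1
    return n
```

Two neighbors can never both hold their shared resource, which is why the first function never exceeds n/2; the second function reproduces the bound n ≥ 18k + 1.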
Conclusion
• Less restrictive definition of k-stabilization
• Using this definition, we solve a problem having no self-stabilizing solution:
– Leader recovery protocol
  • On an anonymous (yet oriented) ring
  • Only-k-dependent complexity:
    – Stabilization time O(k²) rounds
    – Log(k) bits per process
Thank You!