UPV / EHU
Distributed Algorithms for Failure Detection and Consensus in
Crash, Crash-Recovery and Omission Environments
Mikel Larrea
Distributed Systems Group
University of the Basque Country, UPV/EHU
Mikel Larrea − Mannheim, May 2011
Context and Seminal Papers
• In the Consensus problem, all correct processes propose a value and must reach a unanimous and irrevocable decision on some proposed value
• [FLP85] M. Fischer, N. Lynch, M. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 1985
• [CT96] T. Chandra, S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 1996
• [CHT96] T. Chandra, V. Hadzilacos, S. Toueg. The weakest failure detector for solving consensus. Journal of the ACM, 1996
Motivation
Motivation++
(Zurich, July 2010)
Crash Failure Detectors [CT96]
Strengthening Completeness
Guest Stars: P and Omega
• P (the eventually perfect class, ◇P in [CT96]): strong completeness, eventual strong accuracy
  – Eventually every process that crashes is permanently suspected by every correct process
  – There is a time after which correct processes are not suspected by any correct process
• Omega satisfies the following property:
  – There is a time after which all the correct processes always trust the same correct process
• What is a correct process?
  – It depends on the failure model :-)
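The two properties above can be illustrated with a small timeout-based detector. This is only a sketch under stated assumptions, not one of the algorithms from the talk: it assumes the crash model, heartbeats whose delay is eventually bounded, and integer logical time. The class name `TimeoutDetector` and its methods are hypothetical. A peer that stays silent is eventually suspected forever (completeness); each false suspicion raises that peer's timeout, so a slow but correct peer eventually stops being suspected (eventual accuracy). The `leader` method shows one common way to derive an Omega-style output: trust the smallest identifier that is not currently suspected.

```python
class TimeoutDetector:
    """Timeout-based failure detector run by one process (illustrative sketch)."""

    def __init__(self, peers, initial_timeout=2):
        self.last_heard = {q: 0 for q in peers}             # time of last heartbeat
        self.timeout = {q: initial_timeout for q in peers}  # per-peer timeout
        self.suspected = set()

    def on_heartbeat(self, q, now):
        self.last_heard[q] = now
        if q in self.suspected:
            # False suspicion: trust q again and wait longer next time.
            self.suspected.discard(q)
            self.timeout[q] += 1

    def tick(self, now):
        # Suspect every peer whose last heartbeat is older than its timeout.
        for q, heard in self.last_heard.items():
            if now - heard > self.timeout[q]:
                self.suspected.add(q)

    def leader(self, self_id):
        # Omega-style output: smallest id not suspected (a process trusts itself).
        candidates = [q for q in self.last_heard if q not in self.suspected]
        return min(candidates + [self_id])


# Process 2 monitors peers 0 and 1 (assumed scenario).
# Peer 0 has crashed and never sends; peer 1 is slow and sends every 4 time units.
d = TimeoutDetector(peers=[0, 1], initial_timeout=2)
for now in range(1, 21):
    if now % 4 == 0:
        d.on_heartbeat(1, now)
    d.tick(now)
```

In this run, peer 1 is wrongly suspected once, its timeout grows from 2 to 3, and from then on only the crashed peer 0 remains suspected.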
FD-based Consensus
Fault-tolerant Architecture
Outline
• Part I: Crash Environments
  – (Near-) Communication-efficient algorithms for P
  – Communication-optimal algorithms for P
• Part II: Crash-Recovery Environments
  – Implementing Omega with/without stable storage
  – Communication-efficient algorithms for Omega
  – From Omega to P
  – Fault-tolerant aggregator election and data aggregation in wireless sensor networks
• Part III: Omission Environments
  – Secure failure detection and consensus in TrustedPals
  – Communication-efficient algorithm for P
Part I:
P in Crash Environments
Joint work with Roberto Cortiñas, Alberto Lafuente, Iratxe Soraluze, Joachim Wieland
The First P Algorithm [CT96]
Part I. Summary of Results
• Efficient implementations of P
  – Nearly communication-efficient algorithms (n+C links are used forever)
    • Q-based, transformations
  – Communication-efficient algorithms (n links)
    • Pure ring-based, optimizations
• Optimal implementations of P
  – Communication-optimal algorithms (C links)
    • RBcast-based, one-to-one, one-to-all
Reliable Broadcast [CT96]
“All correct processes deliver the same set of messages”
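The agreement property quoted above is commonly obtained by relaying: on first receipt of a message, a process forwards it to everyone and then delivers it. The sketch below simulates that idea in memory. It assumes the crash model, reliable links between processes that stay up, and messages that are unique (in practice they would be tagged with the sender's id and a sequence number). The names `Proc`, `Network` and `send_to_all` are hypothetical.

```python
from collections import deque


class Proc:
    def __init__(self, pid, net):
        self.pid, self.net = pid, net
        self.delivered = []
        self.crashed = False

    def send_to_all(self, msg, only=None):
        # `only` lets the demo model a sender that crashes mid-broadcast.
        targets = range(len(self.net.procs)) if only is None else only
        for dest in targets:
            self.net.queue.append((dest, msg))

    def on_receive(self, msg):
        if msg in self.delivered:
            return                    # duplicate: already relayed and delivered
        self.send_to_all(msg)         # relay first ...
        self.delivered.append(msg)    # ... then deliver


class Network:
    def __init__(self, n):
        self.queue = deque()
        self.procs = [Proc(i, self) for i in range(n)]

    def run(self):
        while self.queue:
            dest, msg = self.queue.popleft()
            p = self.procs[dest]
            if not p.crashed:         # messages to crashed processes are dropped
                p.on_receive(msg)


# Process 0 reaches only process 1 before crashing (assumed scenario).
net = Network(4)
net.procs[0].send_to_all("m", only=[1])
net.procs[0].crashed = True
net.run()
```

Because process 1 relays before delivering, processes 2 and 3 also deliver "m" even though the original sender crashed after a single send.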
P in Crash Environments
• [WLL07] J. Wieland, M. Larrea, A. Lafuente. An evaluation of ring-based algorithms for the Eventually Perfect failure detector class. 15th International Conference on Parallel, Distributed and Network-based Processing, 2007
• [LSCL08] M. Larrea, I. Soraluze, R. Cortiñas, A. Lafuente. An Evaluation of Communication-Optimal P Algorithms. 16th International Conference on Parallel, Distributed and Network-based Processing, 2008
Joint work with José Javier Astrain, Ernesto Jiménez,
Cristian Martín, Iratxe Soraluze
Part II:
Omega in Crash-Recovery Environments
Part II. Summary of Results
• Redefinition of Omega
  – Take into account unstable processes
  – Take into account the availability of stable storage
• Implementation of Omega
  – With and without stable storage
  – Efficient algorithms
• From Omega to P
• Fault-tolerant aggregator election and data aggregation in wireless sensor networks
From Omega to P
Joint work with Roberto Cortiñas, Felix Freiling, Marjan
Ghajar-Azadanlou, Alberto Lafuente, Lucia Penso, Iratxe Soraluze
Part III:
P in Omission Environments
Part III. Summary of Results
• Reduction from Byzantine to omission
  – Processes are equipped with tamper-proof security modules (e.g., smartcards)
    • Actually, omission + buffering/timing attacks
• Omission models
  – send | receive | general
  – permanent | transient
  – non-selective | selective
Part III. Summary of Results
• Impossibility result
  – P is impossible to implement in the (transient) general omission model
• Redefinition and implementation of P
  – In-connected and out-connected processes
  – All-to-all communication, sequence numbers, connectivity matrix
• P-based Consensus
  – Termination: every in-connected process eventually decides
  – Adaptation of Chandra-Toueg’s algorithm
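To make the in-connected and out-connected notions concrete, here is an offline reachability sketch over a connectivity matrix. It rests on assumptions that the slides do not spell out: a process is taken to be in-connected if some correct process can reach it over links that lose no messages, and out-connected if it can reach some correct process over such links. The boolean matrix `ok` and the function names are hypothetical, and the set of correct processes is given as input purely for illustration. This is not the detection algorithm itself, which works with sequence numbers exchanged all-to-all.

```python
def reachable_from(sources, ok):
    """Processes reachable from `sources` along links with ok[i][j] == True."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        i = stack.pop()
        for j, link_ok in enumerate(ok[i]):
            if link_ok and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def in_connected(correct, ok):
    # Reachable from a correct process over loss-free links (assumed definition).
    return reachable_from(correct, ok)


def out_connected(correct, ok):
    # Able to reach a correct process: search the reversed graph.
    n = len(ok)
    transposed = [[ok[j][i] for j in range(n)] for i in range(n)]
    return reachable_from(correct, transposed)


# Assumed scenario with 4 processes; 0 and 1 are correct.
# Process 2 suffers send omissions (nothing it sends arrives);
# process 3 suffers receive omissions (nothing sent to it arrives).
n = 4
ok = [[not (i == 2 or j == 3) for j in range(n)] for i in range(n)]
correct = {0, 1}
```

In this scenario process 2 still hears from the correct processes, so it is in-connected but not out-connected, while process 3 is out-connected but not in-connected.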
Thank you
[email protected]