07 Ali Lecture
7/28/2019 07 Ali Lecture
Distributed Systems Overview
Ali Ghodsi
-
Replicated State Machine (RSM)
Distributed Systems 101
Fault-tolerance (partial, byzantine, recovery, ...)
Concurrency (ordering, asynchrony, timing, ...)
Generic solution for distributed systems:
Replicated State Machine approach
Represent your system with a deterministic state machine
Replicate the state machine
Feed input to all replicas in the same order
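A minimal sketch of the three steps on this slide, assuming a toy counter as the state machine. The names CounterMachine and replicate are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CounterMachine:
    """A deterministic state machine: same inputs, same order, same state."""
    value: int = 0
    log: list = field(default_factory=list)

    def apply(self, command: tuple) -> int:
        op, arg = command
        if op == "add":
            self.value += arg
        elif op == "mul":
            self.value *= arg
        else:
            raise ValueError(f"unknown command: {op}")
        self.log.append(command)
        return self.value


def replicate(commands: list, n_replicas: int = 3) -> list:
    """Feed the same ordered command sequence to every replica."""
    replicas = [CounterMachine() for _ in range(n_replicas)]
    for command in commands:
        for replica in replicas:
            replica.apply(command)
    return replicas


commands = [("add", 2), ("mul", 5), ("add", 1)]
same_order_states = [r.value for r in replicate(commands)]

# The same commands in a different order can lead to a different state,
# which is why every replica must see the input in the same order.
reordered = CounterMachine()
for command in [("mul", 5), ("add", 2), ("add", 1)]:
    reordered.apply(command)
```

All replicas end in the same state, while the reordered replica diverges.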
-
Total Order Reliable Broadcast
aka Atomic Broadcast
Reliable broadcast
Either all correct nodes get the message or none do
(even if src fails)
Atomic Broadcast
Reliable broadcast that guarantees:
All messages delivered in the same order
Replicated state machine trivial with atomic broadcast
-
Consensus?
Consensus problem
All nodes propose a value
All correct nodes must agree on one of the values
Must eventually reach a decision (availability)
Atomic Broadcast => Consensus
Broadcast proposal, decide on first received value
Consensus => Atomic Broadcast
Unreliably broadcast message to all
1 consensus per round:
propose set of messages seen but not delivered
Each round deliver one decided message
Atomic Broadcast equivalent to Consensus
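The construction of atomic broadcast from consensus can be sketched as a round loop. This is an illustration under assumptions: consensus below is a hypothetical stand-in that hands every node the same decision (it is not a real protocol), and picking the smallest message of the decided set is just one deterministic choice.

```python
def consensus(proposals: dict) -> frozenset:
    """Stand-in for one consensus instance (an assumption of this sketch):
    every node gets the same decision, and it is one of the proposals."""
    non_empty = [p for _, p in sorted(proposals.items()) if p]
    return non_empty[0] if non_empty else frozenset()


def atomic_broadcast(seen: dict) -> dict:
    """seen maps node id -> set of messages that node has received.
    The broadcast is unreliable, so the sets may differ between nodes."""
    delivered = {node: [] for node in seen}
    while True:
        # Each node proposes the messages it has seen but not delivered.
        proposals = {
            node: frozenset(msgs - set(delivered[node]))
            for node, msgs in seen.items()
        }
        decided = consensus(proposals)  # one consensus per round
        if not decided:
            return delivered
        # Deterministic pick, so all nodes deliver the same message.
        message = min(decided)
        for node in seen:
            delivered[node].append(message)


result = atomic_broadcast({1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}})
```

Every node ends up with the same delivery sequence, even though no node initially saw all three messages.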
-
Consensus impossible
No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model
1-crash-robust
Up to one node may crash
Asynchronous model
No global clock
No bounded message delay
Life after impossibility of consensus? What to do?
-
Solving Consensus with Failure Detectors
Black box that tells us if a node has failed
Perfect failure detector
Completeness
It will eventually tell us if a node has failed
Accuracy (no lying)
It will never tell us a node has failed if it hasn't
Perfect FD => Consensus
xi := input
for r := 1 to N do
  if r = p then
    forall j do send xi to j;
    decide xi
  if collect x from r then
    xi := x;
end
decide xi
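The rotating-coordinator loop above can be simulated in a few lines. This is a sketch under a simplified crash model that is an assumption of the example: a node can only crash during its own coordinator round, after its broadcast has reached some subset of receivers, and the perfect failure detector lets everyone else move on. The function name and the crashes parameter are hypothetical.

```python
def rotating_coordinator(inputs, crashes=None):
    """inputs: node id (1..N) -> proposed value.
    crashes: node id -> set of receivers its broadcast still reaches
    before it crashes in its own round (an empty set means nobody).
    Returns node id -> decision for the nodes that never crash."""
    crashes = crashes or {}
    x = dict(inputs)             # x[i] is node i's current estimate
    alive = set(inputs)
    for r in sorted(inputs):     # round r is coordinated by node r
        if r in crashes:
            receivers = crashes[r] & alive
            alive.discard(r)     # coordinator crashes mid-broadcast
        else:
            receivers = set(alive)
        for j in receivers:
            x[j] = x[r]          # j collects the coordinator's value
        # Nodes outside receivers keep their estimate: the perfect FD
        # tells them the coordinator failed, so they stop waiting.
    return {i: x[i] for i in alive}


inputs = {1: "a", 2: "b", 3: "c", 4: "d"}
no_crash = rotating_coordinator(inputs)
one_crash = rotating_coordinator(inputs, {1: {2}})
two_crashes = rotating_coordinator(inputs, {1: set(), 2: {3}})
```

Once a round is coordinated by a node that does not crash, every surviving node holds the same estimate, and later rounds cannot change it.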
-
Solving Consensus
Consensus => Perfect FD?
No. Don't know if a node actually failed or not!
What's the weakest FD to solve consensus?
Least assumptions on top of asynchronous model!
-
Enter Omega
Leader Election
Eventually every correct node trusts some correct node
Eventually no two correct nodes trust different correct nodes
Failure detection and leader election are the same
Failure detection captures failure behavior
Detect failed nodes
Leader election also captures failure behavior
Detect correct nodes (a single & same for all)
Formally, leader election is an FD
Always suspects all nodes except one (leader)
Ensures some properties regarding that node
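One way to picture the link between the two abstractions is a sketch that derives a leader from a suspicion set and then turns the leader back into a failure-detector output. The lowest-id rule and both function names are assumptions of this example, not something the slide specifies.

```python
def omega_leader(nodes: set, suspected: set) -> int:
    """Trust the lowest-id node that is not currently suspected.
    Any deterministic rule shared by all nodes would work as well."""
    trusted = nodes - suspected
    if not trusted:
        raise RuntimeError("every node is suspected")
    return min(trusted)


def as_failure_detector(nodes: set, leader: int) -> set:
    """Leader election seen as an FD: suspect everyone but the leader."""
    return nodes - {leader}


nodes = {1, 2, 3, 4}

# Before suspicions stabilize, two nodes may trust different leaders.
early_view_a = omega_leader(nodes, suspected={1})
early_view_b = omega_leader(nodes, suspected=set())

# Once both agree that node 1 has crashed, they trust the same node.
late_view_a = omega_leader(nodes, suspected={1})
late_view_b = omega_leader(nodes, suspected={1})
suspects = as_failure_detector(nodes, late_view_a)
```

The temporary disagreement is allowed: the properties only have to hold eventually.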
-
Weakest Failure Detector for Consensus
Omega is the weakest failure detector for consensus
How to prove it?
Easy to implement in practice
-
High Level View of Paxos
Elect a single proposer using Omega
Proposer imposes its proposal on everyone
Everyone decides
Done!
Problem with Omega
Several nodes might initially be proposers (contention)
Solution is abortable consensus
Proposer attempts to enforce decision
Might abort if there is contention (safety)
Omega ensures eventually 1 proposer succeeds (liveness)
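A compact in-memory sketch of one abortable attempt, in the style of single-decree Paxos. It assumes no message loss, no crashes, and proposers that pick distinct ballot numbers. Acceptor and propose are illustrative names, not an API from the lecture.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0           # highest ballot promised so far
        self.accepted = (0, None)   # (ballot, value) last accepted

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted    # promise, reporting any accepted value
        return None                 # reject: a higher ballot is active

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """One attempt. Returns the decided value, or None if it aborts."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [p for p in replies if p is not None]
    if len(promises) < majority:
        return None                 # abort because of contention
    _, prev_value = max(promises, key=lambda p: p[0])
    if prev_value is not None:
        value = prev_value          # adopt a value that may be decided
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 5, "x")   # succeeds and decides "x"
stale = propose(acceptors, 3, "y")   # lower ballot: aborts
retry = propose(acceptors, 7, "y")   # succeeds, but must keep "x"
```

Aborting keeps the decision safe under contention. Liveness comes from the leader election, which eventually leaves a single proposer retrying with higher ballots.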
-
Replicated State Machine
Paxos approach (Lamport)
Client sends input to leader
Leader executes Paxos instance to agree on command
Well-understood, many papers, optimizations
Viewstamped Replication approach (Liskov)
Have one leader that writes commands to a quorum
(no Paxos)
When failures happen, use Paxos to agree
Less understood (Mazières tutorial)
-
Paxos Siblings
Cheap Paxos (LM04)
Fewer messages
Directly contact a quorum (e.g. 3 nodes out of 5)
If fail to get response from 3, expand to 5
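The contact pattern described here (try a minimal quorum first, widen only on failure) can be sketched as follows. The responds callback is a hypothetical stand-in for a network round trip, and collect_quorum is an illustrative name. This shows only the quorum-widening idea, not the full Cheap Paxos protocol.

```python
def collect_quorum(nodes, responds, quorum_size=None):
    """Contact a minimal quorum first. If too few answer, contact the
    remaining nodes as well. Returns (ok, acks, contacted)."""
    if quorum_size is None:
        quorum_size = len(nodes) // 2 + 1      # majority, e.g. 3 of 5
    first, rest = nodes[:quorum_size], nodes[quorum_size:]
    contacted = list(first)
    acks = [n for n in first if responds(n)]
    if len(acks) < quorum_size:
        contacted += rest                      # expand to all nodes
        acks += [n for n in rest if responds(n)]
    return len(acks) >= quorum_size, acks, contacted


nodes = [1, 2, 3, 4, 5]
all_up = collect_quorum(nodes, lambda n: True)
one_down = collect_quorum(nodes, lambda n: n != 2)
three_down = collect_quorum(nodes, lambda n: n not in {2, 3, 4})
```

In the failure-free case only three of the five nodes receive a message, which is where the message savings come from.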
Fast Paxos (L06)
Reduce from 3 message delays to 2 message delays
Clients optimistically write to a quorum
Requires recovery
-
Paxos Siblings
Gaios/SMARTER (Bolosky11)
Make logging to disk efficient for crash-recovery
Uses pipelining and batching
Generalized Paxos (LM05)
Exploits commutative operations for the replicated state machine
-
Atomic Commit
Atomic Commit
Commit iff no failures and everyone votes commit
Else abort
Consensus on Transaction Commit (LG04)
One Paxos instance for every RM (resource manager)
Only commit if every instance said Commit
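The commit rule can be written down directly. A minimal sketch: atomic_commit encodes the rule stated on this slide, and paxos_commit combines the outcomes of the per-participant consensus instances. Both function names and the string values are assumptions of this example, and the consensus instances themselves are not modelled.

```python
def atomic_commit(votes, failed=frozenset()):
    """votes: participant -> "commit" or "abort".
    failed: participants detected as failed before the outcome is fixed.
    Commit iff there are no failures and everyone votes commit."""
    if failed:
        return "abort"
    if all(vote == "commit" for vote in votes.values()):
        return "commit"
    return "abort"


def paxos_commit(instance_decisions):
    """instance_decisions: participant -> value decided by that
    participant's own consensus instance. The transaction commits
    only if every instance decided "commit"."""
    decided = instance_decisions.values()
    return "commit" if all(d == "commit" for d in decided) else "abort"


unanimous = atomic_commit({"A": "commit", "B": "commit"})
one_no = atomic_commit({"A": "commit", "B": "abort"})
with_failure = atomic_commit({"A": "commit", "B": "commit"}, failed={"B"})
per_instance = paxos_commit({"A": "commit", "B": "commit", "C": "abort"})
```

Running one consensus instance per participant means each vote is itself fault-tolerant, so the outcome does not depend on a single coordinator staying up.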
-
Reconfigurable Paxos
Change the set of nodes
Replace failed nodes
Add/remove nodes (change size of quorum)
Lamport's idea
Part of the state of the state machine: set of nodes
SMART (EuroSys06)
Many problems (e.g. {A,B,C}->{A,B,D} and A fails)
Basic idea: run multiple Paxos instances side by side