07 Ali Lecture

Transcript of 07 Ali Lecture

  • 7/28/2019 07 Ali Lecture

    1/15

    Distributed Systems Overview

    Ali Ghodsi

    [email protected]

  • 2/15

    Replicated State Machine (RSM)

    Distributed Systems 101

    Fault-tolerance (partial, byzantine, recovery, ...)

    Concurrency (ordering, asynchrony, timing, ...)

    Generic solution for distributed systems:

    Replicated State Machine approach

    Represent your system with a deterministic state machine

    Replicate the state machine

    Feed input to all replicas in the same order
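The three-step recipe above can be sketched in a few lines of Python. The key-value machine below is a hypothetical example (not from the lecture); the point is that a deterministic `apply`, fed the same commands in the same order, leaves every replica in the same state.

```python
# A hypothetical deterministic key-value state machine (illustrative,
# not from the lecture). Determinism means: no clocks, randomness, or
# I/O inside apply(), so identical inputs yield identical states.

class KVStateMachine:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)

# The same command log, fed in the same order to every replica ...
log = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None)]

replica_a, replica_b = KVStateMachine(), KVStateMachine()
for cmd in log:
    replica_a.apply(cmd)
    replica_b.apply(cmd)

# ... leaves the replicas in identical states.
assert replica_a.state == replica_b.state == {"y": 2}
```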

  • 3/15

    Total Order Reliable Broadcast

    aka Atomic Broadcast

    Reliable broadcast

    Either all correct nodes get the message or none does

    (even if the source fails)

    Atomic Broadcast

    Reliable broadcast that guarantees:

    All messages delivered in the same order

    Replicated state machine trivial with atomic broadcast

  • 4/15

    Consensus?

    Consensus problem

    All nodes propose a value

    All correct nodes must agree on one of the proposed values

    Must eventually reach a decision (availability)

    Atomic Broadcast ⇒ Consensus

    Broadcast your proposal; decide on the first value delivered

    Consensus ⇒ Atomic Broadcast

    Unreliably broadcast each message to all

    1 consensus per round:

    propose the set of messages seen but not yet delivered

    Each round, deliver the decided messages in a deterministic order

    Consensus is equivalent to Atomic Broadcast
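The Consensus-to-Atomic-Broadcast direction can be sketched as below. `stub_consensus` is a stand-in, not a real protocol: returning the first nonempty proposal (by node id) merely models the guarantee that every node learns the *same* decided set, which is what makes all delivery orders identical.

```python
# Sketch of the Consensus => Atomic Broadcast reduction: each round,
# every node proposes its messages seen but not yet delivered, and all
# nodes deliver the decided set in a fixed (sorted) order.

def stub_consensus(proposals):
    # Placeholder for a real consensus protocol: all nodes must get the
    # same answer, modelled here by picking the first nonempty proposal.
    nonempty = [n for n in sorted(proposals) if proposals[n]]
    return proposals[nonempty[0]] if nonempty else set()

def atomic_broadcast_rounds(received, rounds=3):
    # received[node] = set of messages that node has seen so far
    delivered = {node: [] for node in received}
    for _ in range(rounds):
        # Each node proposes the messages it has seen but not delivered.
        proposals = {node: received[node] - set(delivered[node])
                     for node in received}
        decided = stub_consensus(proposals)
        for node in received:
            # Same decided set + same deterministic (sorted) order
            # => the same delivery order at every node.
            delivered[node].extend(sorted(decided - set(delivered[node])))
    return delivered

seen = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"a", "c"}}
out = atomic_broadcast_rounds(seen)
assert out[0] == out[1] == out[2] == ["a", "b", "c"]
```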

  • 5/15

    Consensus impossible

    No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model

    1-crash-robust

    Up to one node may crash

    Asynchronous model

    No global clock

    No bounded message delay

    Life after impossibility of consensus? What to do?

  • 6/15

    Solving Consensus with Failure Detectors

    Black box that tells us if a node has failed

    Perfect failure detector

    Completeness: it will eventually tell us if a node has failed

    Accuracy (no lying): it will never tell us a node has failed if it hasn't

    Perfect FD ⇒ Consensus

    xi := input
    for r := 1 to N do
        if r = p then
            forall j do send <xi> to j;
        if collect <x> from r then
            xi := x;
    end
    decide xi
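A single-process simulation of this rotating-coordinator algorithm, with one simplifying assumption (mine, not the slide's): a node in `crashed` fails before its round starts, so the perfect FD lets everyone skip it cleanly; mid-round crashes, which the real algorithm also handles, are not modelled here.

```python
# Simulation of rotating-coordinator consensus with a perfect failure
# detector. Assumption: crashes happen before the crashed node's round,
# so its round is skipped by every correct node.

def rotating_coordinator(inputs, crashed):
    n = len(inputs)
    x = dict(inputs)            # x[i]: node i's current estimate
    for r in range(n):          # round r: node r is the coordinator
        if r in crashed:
            continue            # perfect FD: all nodes detect the crash
        v = x[r]                # coordinator broadcasts <x_r> to all
        for i in range(n):
            if i not in crashed:
                x[i] = v        # collect <v> from r, adopt it
    return {i: x[i] for i in range(n) if i not in crashed}

# Node 0 crashed, so every correct node decides node 1's input.
decisions = rotating_coordinator({0: "a", 1: "b", 2: "c"}, crashed={0})
assert decisions == {1: "b", 2: "b"}
```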

  • 7/15

    Solving Consensus

    Consensus ⇒ Perfect FD?

    No. We don't know whether a node actually failed or not!

    What's the weakest FD that solves consensus?

    Least assumptions on top of asynchronous model!

  • 8/15

    Enter Omega

    Leader Election

    Eventually every correct node trusts some correct node

    Eventually no two correct nodes trust different correct nodes

    Failure detection and leader election are the same problem

    Failure detection captures failure behavior: detect failed nodes

    Leader election also captures failure behavior: detect correct nodes (a single one, the same for all)

    Formally, leader election is an FD that always suspects all nodes except one (the leader)

    Ensures some properties regarding that node

  • 9/15

    Weakest Failure Detector for Consensus

    Omega is the weakest failure detector for consensus

    How to prove it?

    Easy to implement in practice
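One common way Omega is approximated in practice — an assumed design for illustration, not taken from the slides — is heartbeats plus a timeout: trust the lowest-id node from which a sufficiently recent heartbeat has been seen. Once timeouts stop mis-suspecting nodes, all correct nodes converge on the same correct leader.

```python
# Heartbeat-based sketch of an Omega-style leader elector (assumed
# design): the lowest node id among recently-heard-from nodes leads.

import time

class OmegaDetector:
    def __init__(self, node_ids, timeout):
        self.timeout = timeout
        self.last_heartbeat = {n: float("-inf") for n in node_ids}

    def on_heartbeat(self, node, now=None):
        self.last_heartbeat[node] = time.monotonic() if now is None else now

    def leader(self, now=None):
        now = time.monotonic() if now is None else now
        alive = [n for n, t in self.last_heartbeat.items()
                 if now - t <= self.timeout]
        return min(alive) if alive else None   # lowest alive id leads

fd = OmegaDetector(node_ids=[1, 2, 3], timeout=2.0)
fd.on_heartbeat(2, now=10.0)
fd.on_heartbeat(3, now=10.0)
assert fd.leader(now=11.0) == 2   # node 1 silent; node 2 is lowest alive
fd.on_heartbeat(3, now=12.5)
assert fd.leader(now=13.0) == 3   # node 2's heartbeats stopped
```

The "eventually" in Omega's guarantee maps onto the timeout: during unstable periods the timeout may mis-suspect and leadership may flap, but once the system stabilizes all nodes settle on the same node.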

  • 10/15

    High Level View of Paxos

    Elect a single proposer using Ω

    Proposer imposes its proposal on everyone

    Everyone decides. Done!

    Problem with Ω: several nodes might initially be proposers (contention)

    Solution is abortable consensus

    Proposer attempts to enforce a decision

    Might abort if there is contention (safety)

    Ω ensures eventually 1 proposer succeeds (liveness)
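A minimal illustration of the abortable idea, using a single acceptor — deliberately *not* full Paxos, which needs quorums of acceptors and a rule for picking values across them. The point shown: an attempt aborts when a higher ballot is already active (safety under contention), and a retry with a higher ballot succeeds.

```python
# Single-acceptor sketch of abortable consensus (NOT full Paxos).

class Acceptor:
    def __init__(self):
        self.promised = -1      # highest ballot promised so far
        self.accepted = None    # (ballot, value) once a value is accepted

    def prepare(self, ballot):
        if ballot <= self.promised:
            return (False, None)        # abort: contention detected
        self.promised = ballot
        return (True, self.accepted)    # report any accepted value

    def accept(self, ballot, value):
        if ballot < self.promised:
            return False                # abort
        self.accepted = (ballot, value)
        return True

acc = Acceptor()
ok, prior = acc.prepare(2)      # proposer A, ballot 2
assert ok and prior is None
ok, _ = acc.prepare(1)          # proposer B, lower ballot: aborted
assert not ok
assert acc.accept(2, "v")       # A's value is accepted
ok, prior = acc.prepare(3)      # B retries with a higher ballot ...
assert ok and prior == (2, "v") # ... and must re-propose "v"
```

Ω supplies the liveness half: once contention dies down and a single proposer remains, its ballots stop being aborted.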


  • 11/15

    Replicated State Machine

    Paxos approach (Lamport)

    Client sends input to the leader

    Leader runs a Paxos instance to agree on the command

    Well understood; many papers and optimizations

    View-stamp approach (Liskov)

    Have one leader that writes commands to a quorum (no Paxos)

    When failures happen, use Paxos to agree

    Less understood (Mazieres tutorial)

  • 12/15

    Paxos Siblings

    Cheap Paxos (LM04)

    Fewer messages

    Directly contact a quorum (e.g. 3 nodes out of 5)

    If responses from those 3 fail to arrive, expand to all 5

    Fast Paxos (L06)

    Reduces 3 message delays to 2

    Clients optimistically write directly to a quorum

    Requires recovery

  • 13/15

    Paxos Siblings

    Gaios/SMARTER (Bolosky11)

    Make logging to disk efficient for crash-recovery

    Uses pipelining and batching

    Generalized Paxos (LM05)

    Commutative operations for the replicated state machine

  • 14/15

    Atomic Commit

    Atomic Commit

    Commit IFF no failures and everyone votes commit

    Else Abort

    Consensus on Transaction Commit (LG04)

    One Paxos instance for every resource manager (RM)

    Only commit if every instance said Commit
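The commit rule above fits in one line (the names below are illustrative, assumed here): commit iff every participant voted Commit; a missing vote, i.e. a failed participant, counts the same as an Abort vote.

```python
# Tiny sketch of the atomic-commit decision rule (illustrative names).

def atomic_commit_decision(votes, participants):
    # votes: dict mapping participant -> "commit" or "abort";
    # a participant absent from votes has failed before voting.
    ok = all(votes.get(p) == "commit" for p in participants)
    return "commit" if ok else "abort"

assert atomic_commit_decision({"rm1": "commit", "rm2": "commit"},
                              ["rm1", "rm2"]) == "commit"
# rm2 crashed before voting: its vote is missing, so we must abort.
assert atomic_commit_decision({"rm1": "commit"},
                              ["rm1", "rm2"]) == "abort"
```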

  • 15/15

    Reconfigurable Paxos

    Change the set of nodes

    Replace failed nodes

    Add/remove nodes (changes the quorum size)

    Lamport's idea: make the set of nodes part of the state machine's state

    SMART (EuroSys06): handles the many problems that arise (e.g. {A,B,C} -> {A,B,D} while A fails)

    Basic idea: run multiple Paxos instances side by side