07 Ali Lecture

Transcript of 07 Ali Lecture

  • 7/28/2019 07 Ali Lecture

    1/15

    Distributed Systems Overview

    Ali Ghodsi

    [email protected]

  • 2/15

    Replicated State Machine (RSM)

    Distributed Systems 101

    Fault-tolerance (partial, byzantine, recovery, ...)

    Concurrency (ordering, asynchrony, timing, ...)

    Generic solution for distributed systems:

    Replicated State Machine approach

    Represent your system with a deterministic state machine

    Replicate the state machine

    Feed input to all replicas in the same order
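The three-step recipe above can be sketched in a few lines of Python. The key-value machine below is a hypothetical example (not from the lecture); the point is that a deterministic `apply`, fed the same commands in the same order, leaves every replica in the same state.

```python
# A hypothetical deterministic key-value state machine (illustrative,
# not from the lecture). Determinism means: no clocks, randomness, or
# I/O inside apply(), so identical inputs yield identical states.

class KVStateMachine:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)

# The same command log, fed in the same order to every replica ...
log = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None)]

replica_a, replica_b = KVStateMachine(), KVStateMachine()
for cmd in log:
    replica_a.apply(cmd)
    replica_b.apply(cmd)

# ... leaves the replicas in identical states.
assert replica_a.state == replica_b.state == {"y": 2}
```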

  • 3/15

    Total Order Reliable Broadcast

    aka Atomic Broadcast

    Reliable broadcast

    Either all correct nodes get the message or none does

    (even if the source fails)

    Atomic Broadcast

    Reliable broadcast that guarantees:

    All messages delivered in the same order

    Replicated state machine trivial with atomic broadcast

  • 4/15

    Consensus?

    Consensus problem

    All nodes propose a value

    All correct nodes must agree on one of the proposed values

    Must eventually reach a decision (availability)

    Atomic Broadcast ⇒ Consensus

    Broadcast your proposal; decide on the first value delivered

    Consensus ⇒ Atomic Broadcast

    Unreliably broadcast each message to all

    1 consensus per round:

    propose the set of messages seen but not yet delivered

    Each round, deliver the decided messages in a deterministic order

    Consensus is equivalent to Atomic Broadcast
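The Consensus-to-Atomic-Broadcast direction can be sketched as below. `stub_consensus` is a stand-in, not a real protocol: returning the first nonempty proposal (by node id) merely models the guarantee that every node learns the *same* decided set, which is what makes all delivery orders identical.

```python
# Sketch of the Consensus => Atomic Broadcast reduction: each round,
# every node proposes its messages seen but not yet delivered, and all
# nodes deliver the decided set in a fixed (sorted) order.

def stub_consensus(proposals):
    # Placeholder for a real consensus protocol: all nodes must get the
    # same answer, modelled here by picking the first nonempty proposal.
    nonempty = [n for n in sorted(proposals) if proposals[n]]
    return proposals[nonempty[0]] if nonempty else set()

def atomic_broadcast_rounds(received, rounds=3):
    # received[node] = set of messages that node has seen so far
    delivered = {node: [] for node in received}
    for _ in range(rounds):
        # Each node proposes the messages it has seen but not delivered.
        proposals = {node: received[node] - set(delivered[node])
                     for node in received}
        decided = stub_consensus(proposals)
        for node in received:
            # Same decided set + same deterministic (sorted) order
            # => the same delivery order at every node.
            delivered[node].extend(sorted(decided - set(delivered[node])))
    return delivered

seen = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"a", "c"}}
out = atomic_broadcast_rounds(seen)
assert out[0] == out[1] == out[2] == ["a", "b", "c"]
```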

  • 5/15

    Consensus impossible

    No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model

    1-crash-robust

    Up to one node may crash

    Asynchronous model

    No global clock

    No bounded message delay

    Life after impossibility of consensus? What to do?

  • 6/15

    Solving Consensus with Failure Detectors

    Black box that tells us if a node has failed

    Perfect failure detector

    Completeness: it will eventually tell us if a node has failed

    Accuracy (no lying): it will never tell us a node has failed if it hasn't

    Perfect FD ⇒ Consensus

    xi := input
    for r := 1 to N do
        if r = p then
            forall j do send <xi> to j;
        if collect <x> from r then
            xi := x;
    end
    decide xi
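A single-process simulation of this rotating-coordinator algorithm, with one simplifying assumption (mine, not the slide's): a node in `crashed` fails before its round starts, so the perfect FD lets everyone skip it cleanly; mid-round crashes, which the real algorithm also handles, are not modelled here.

```python
# Simulation of rotating-coordinator consensus with a perfect failure
# detector. Assumption: crashes happen before the crashed node's round,
# so its round is skipped by every correct node.

def rotating_coordinator(inputs, crashed):
    n = len(inputs)
    x = dict(inputs)            # x[i]: node i's current estimate
    for r in range(n):          # round r: node r is the coordinator
        if r in crashed:
            continue            # perfect FD: all nodes detect the crash
        v = x[r]                # coordinator broadcasts <x_r> to all
        for i in range(n):
            if i not in crashed:
                x[i] = v        # collect <v> from r, adopt it
    return {i: x[i] for i in range(n) if i not in crashed}

# Node 0 crashed, so every correct node decides node 1's input.
decisions = rotating_coordinator({0: "a", 1: "b", 2: "c"}, crashed={0})
assert decisions == {1: "b", 2: "b"}
```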

  • 7/15

    Solving Consensus

    Consensus ⇒ Perfect FD?

    No. We don't know whether a node actually failed or not!

    What's the weakest FD that solves consensus?

    Least assumptions on top of asynchronous model!

  • 8/15

    Enter Omega

    Leader Election

    Eventually every correct node trusts some correct node

    Eventually no two correct nodes trust different correct nodes

    Failure detection and leader election are the same problem

    Failure detection captures failure behavior: detect failed nodes

    Leader election also captures failure behavior: detect correct nodes (a single one, the same for all)

    Formally, leader election is an FD that always suspects all nodes except one (the leader)

    Ensures some properties regarding that node

  • 9/15

    Weakest Failure Detector for Consensus

    Omega is the weakest failure detector for consensus

    How to prove it?

    Easy to implement in practice
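One common way Omega is approximated in practice — an assumed design for illustration, not taken from the slides — is heartbeats plus a timeout: trust the lowest-id node from which a sufficiently recent heartbeat has been seen. Once timeouts stop mis-suspecting nodes, all correct nodes converge on the same correct leader.

```python
# Heartbeat-based sketch of an Omega-style leader elector (assumed
# design): the lowest node id among recently-heard-from nodes leads.

import time

class OmegaDetector:
    def __init__(self, node_ids, timeout):
        self.timeout = timeout
        self.last_heartbeat = {n: float("-inf") for n in node_ids}

    def on_heartbeat(self, node, now=None):
        self.last_heartbeat[node] = time.monotonic() if now is None else now

    def leader(self, now=None):
        now = time.monotonic() if now is None else now
        alive = [n for n, t in self.last_heartbeat.items()
                 if now - t <= self.timeout]
        return min(alive) if alive else None   # lowest alive id leads

fd = OmegaDetector(node_ids=[1, 2, 3], timeout=2.0)
fd.on_heartbeat(2, now=10.0)
fd.on_heartbeat(3, now=10.0)
assert fd.leader(now=11.0) == 2   # node 1 silent; node 2 is lowest alive
fd.on_heartbeat(3, now=12.5)
assert fd.leader(now=13.0) == 3   # node 2's heartbeats stopped
```

The "eventually" in Omega's guarantee maps onto the timeout: during unstable periods the timeout may mis-suspect and leadership may flap, but once the system stabilizes all nodes settle on the same node.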

  • 10/15

    High Level View of Paxos

    Elect a single proposer using Ω

    Proposer imposes its proposal on everyone

    Everyone decides. Done!

    Problem with Ω: several nodes might initially be proposers (contention)

    Solution is abortable consensus

    Proposer attempts to enforce a decision

    Might abort if there is contention (safety)

    Ω ensures eventually 1 proposer succeeds (liveness)
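A minimal illustration of the abortable idea, using a single acceptor — deliberately *not* full Paxos, which needs quorums of acceptors and a rule for picking values across them. The point shown: an attempt aborts when a higher ballot is already active (safety under contention), and a retry with a higher ballot succeeds.

```python
# Single-acceptor sketch of abortable consensus (NOT full Paxos).

class Acceptor:
    def __init__(self):
        self.promised = -1      # highest ballot promised so far
        self.accepted = None    # (ballot, value) once a value is accepted

    def prepare(self, ballot):
        if ballot <= self.promised:
            return (False, None)        # abort: contention detected
        self.promised = ballot
        return (True, self.accepted)    # report any accepted value

    def accept(self, ballot, value):
        if ballot < self.promised:
            return False                # abort
        self.accepted = (ballot, value)
        return True

acc = Acceptor()
ok, prior = acc.prepare(2)      # proposer A, ballot 2
assert ok and prior is None
ok, _ = acc.prepare(1)          # proposer B, lower ballot: aborted
assert not ok
assert acc.accept(2, "v")       # A's value is accepted
ok, prior = acc.prepare(3)      # B retries with a higher ballot ...
assert ok and prior == (2, "v") # ... and must re-propose "v"
```

Ω supplies the liveness half: once contention dies down and a single proposer remains, its ballots stop being aborted.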


  • 11/15

    Replicated State Machine

    Paxos approach (Lamport)

    Client sends input to the leader

    Leader runs a Paxos instance to agree on the command

    Well understood; many papers and optimizations

    View-stamp approach (Liskov)

    Have one leader that writes commands to a quorum (no Paxos)

    When failures happen, use Paxos to agree

    Less understood (Mazieres tutorial)

  • 12/15

    Paxos Siblings

    Cheap Paxos (LM04)

    Fewer messages

    Directly contact a quorum (e.g. 3 nodes out of 5)

    If responses from those 3 fail to arrive, expand to all 5

    Fast Paxos (L06)

    Reduces 3 message delays to 2

    Clients optimistically write directly to a quorum

    Requires recovery

  • 13/15

    Paxos Siblings

    Gaios/SMARTER (Bolosky11)

    Make logging to disk efficient for crash-recovery

    Uses pipelining and batching

    Generalized Paxos (LM05)

    Commutative operations for the replicated state machine

  • 14/15

    Atomic Commit

    Atomic Commit

    Commit IFF no failures and everyone votes commit

    Else Abort

    Consensus on Transaction Commit (LG04)

    One Paxos instance for every resource manager (RM)

    Only commit if every instance said Commit
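The commit rule above fits in one line (the names below are illustrative, assumed here): commit iff every participant voted Commit; a missing vote, i.e. a failed participant, counts the same as an Abort vote.

```python
# Tiny sketch of the atomic-commit decision rule (illustrative names).

def atomic_commit_decision(votes, participants):
    # votes: dict mapping participant -> "commit" or "abort";
    # a participant absent from votes has failed before voting.
    ok = all(votes.get(p) == "commit" for p in participants)
    return "commit" if ok else "abort"

assert atomic_commit_decision({"rm1": "commit", "rm2": "commit"},
                              ["rm1", "rm2"]) == "commit"
# rm2 crashed before voting: its vote is missing, so we must abort.
assert atomic_commit_decision({"rm1": "commit"},
                              ["rm1", "rm2"]) == "abort"
```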

  • 15/15

    Reconfigurable Paxos

    Change the set of nodes

    Replace failed nodes

    Add/remove nodes (changes the quorum size)

    Lamport's idea: make the set of nodes part of the state machine's state

    SMART (EuroSys06): handles the many problems that arise (e.g. {A,B,C} -> {A,B,D} while A fails)

    Basic idea: run multiple Paxos instances side by side