CS 294-8 Consensus cs.berkeley/~yelick/294
CS294, Yelick Consensus, p1
CS 294-8 Consensus
http://www.cs.berkeley.edu/~yelick/294
CS294, Yelick Consensus, p2
Agenda
• Overview and Administrivia
• Specifications and verification
• Consensus
• Practical Issues in Consensus
• Note: due in part to an unreliable network and lack of reliability in ppt, most slides are stolen from Lamport.
CS294, Yelick Consensus, p3
Administrivia
• So far: readings in distributed and fault tolerant systems
• Next: specifying and reasoning about these systems
• Readings for next few weeks will be set by Thursday
• For Thursday:
  – SmartBridge (for talk)
  – Frangipani (for discussion)
CS294, Yelick Consensus, p4
Course Overview
• So far: reading “systems” papers
• Next few weeks: reading papers on algorithms and proofs
• Why?
  – I know my algorithm works, but…
  – I found a missing case when I was implementing…
  – My advisor (or the PC) doesn’t believe me…
CS294, Yelick Consensus, p6
Highly Available Computing
• High availability means either perfection or redundancy.
  – The system can work even when some parts are broken.
• The simplest redundancy is replication:
  – Several copies of each part.
  – Each non-faulty copy does the same thing.
• Every computing system works as a state machine.
• So a replicated state machine can do highly available computing.
CS294, Yelick Consensus, p7
Replicated State Machines
• If a state machine is deterministic, then feeding two copies the same inputs will produce the same outputs and states.
  – We call each copy a process.
  – So all we need is to agree on the inputs.
• Examples:
  – Replicated storage with Read(a) and Write(a, d) steps.
  – Airplane flight control system with ReadInstrument(i) and RaiseFlaps(d) steps.
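The replicated-storage example can be sketched in a few lines of Python. This is only an illustration, not code from the slides: the class name `StorageMachine`, the tuple encoding of steps, and the helper `run_replicas` are all assumptions made for the sketch. It shows that deterministic copies fed the same input sequence end in the same state with the same outputs.

```python
from dataclasses import dataclass, field


@dataclass
class StorageMachine:
    """A deterministic state machine: storage with Read(a) and Write(a, d) steps."""
    cells: dict = field(default_factory=dict)

    def step(self, op):
        """Apply one input and return its output."""
        kind = op[0]
        if kind == "Write":
            _, addr, data = op
            self.cells[addr] = data
            return "ok"
        if kind == "Read":
            _, addr = op
            return self.cells.get(addr)  # None if the address was never written
        raise ValueError(f"unknown step: {kind}")


def run_replicas(inputs, n_replicas=3):
    """Feed the same agreed input sequence to every copy (process)."""
    replicas = [StorageMachine() for _ in range(n_replicas)]
    outputs = [[replica.step(op) for op in inputs] for replica in replicas]
    return replicas, outputs


# Hypothetical agreed input sequence.
inputs = [("Write", "x", 3), ("Read", "x"), ("Write", "y", 7), ("Read", "z")]
replicas, outputs = run_replicas(inputs)

# Every copy produced the same outputs and reached the same state.
same_outputs = all(out == outputs[0] for out in outputs)
same_states = all(r.cells == replicas[0].cells for r in replicas)
```

The sketch takes agreement on `inputs` for granted; producing that agreed sequence despite failures is exactly the job of consensus.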
CS294, Yelick Consensus, p8
State Machine Approach
• A distributed system is:
  – A finite set of processes
• A process is:
  – A set of states, with one initial state
  – A set of events or actions
• An execution is a possibly infinite sequence of alternating states/actions
  s0 -a0-> s1 -a1-> s2 -a2-> ...
CS294, Yelick Consensus, p9
Properties
• A stuttering transition has the form s → s
• A property is a set of executions closed under stuttering [Abadi, Lamport 1990]
  – The clock still ticks after a program terminates
  – Stuttering is also useful in mapping between levels of abstraction
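Closure under stuttering can be illustrated on finite state sequences. The following is a minimal sketch under that simplifying assumption (real executions may be infinite), and the helper names `destutter`, `stutter_equivalent`, `eventually_reaches` and `has_three_states` are made up for the example.

```python
def destutter(states):
    """Collapse runs of repeated states, i.e. drop stuttering steps s -> s."""
    out = []
    for s in states:
        if not out or out[-1] != s:
            out.append(s)
    return out


def stutter_equivalent(a, b):
    """Two finite state sequences differ only by stuttering steps."""
    return destutter(a) == destutter(b)


def eventually_reaches(target):
    """A predicate on state sequences that is insensitive to stuttering."""
    return lambda states: target in destutter(states)


def has_three_states(states):
    """NOT closed under stuttering: adding a stuttering step changes the answer."""
    return len(states) == 3


run = [0, 1, 2]
stuttered = [0, 0, 1, 1, 1, 2]  # same run with extra stuttering steps

reaches_two = eventually_reaches(2)
```

Here `reaches_two` gives the same answer on `run` and `stuttered`, while `has_three_states` does not, so only the former could define a property in the sense above.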
CS294, Yelick Consensus, p10
Safety Properties
• Informally: a safety property is one that says something bad doesn’t happen
• Formally: a property P is a safety property iff:
  – If σ is in P then any finite prefix of σ is in P
• Additionally,
  – If σ is not in P then there is some finite prefix of σ that is not in P
    • There is a point at which an illegal transition occurred
  – Safety properties can be finitely refuted.
CS294, Yelick Consensus, p11
Liveness Properties
• Informally: a liveness property says something good eventually happens
• Formally: a property P is a liveness property iff:
  – Every finite behavior is a prefix of some behavior in P
• Additionally,
  – Can always “complete” a finite behavior into one that is in P
  – Liveness properties cannot be finitely refuted.
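The difference in refutability can be sketched as follows. The example assumes a behavior is a list of states, each state being the set of processes currently in a critical section. The safety property is the invariant "at most one process is inside" (an invariant is just one simple kind of safety property), and the liveness property is "process q is eventually inside". All names here are hypothetical.

```python
def shortest_bad_prefix(states, invariant):
    """Return the shortest finite prefix that refutes the invariant, or None."""
    for i, state in enumerate(states):
        if not invariant(state):
            return states[: i + 1]
    return None


def mutual_exclusion(state):
    """Safety: at most one process is in the critical section."""
    return len(state) <= 1


def eventually_inside(proc):
    """Liveness (checked on a finite behavior): proc is inside at some point."""
    return lambda states: any(proc in state for state in states)


def complete(prefix, proc):
    """Extend any finite behavior into one satisfying eventually_inside(proc)."""
    return prefix + [{proc}]


trace = [set(), {"p"}, set(), {"q"}, {"p", "q"}, {"q"}]
bad = shortest_bad_prefix(trace, mutual_exclusion)

q_eventually = eventually_inside("q")
prefix_without_q = [set(), {"p"}, set()]
completed = complete(prefix_without_q, "q")
```

The violation of mutual exclusion is pinned to a finite prefix (`bad` ends at the first state with two processes inside), whereas `prefix_without_q` does not refute the liveness property because `complete` can always extend it into a behavior that satisfies it.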
CS294, Yelick Consensus, p12
Safety and Liveness
• Every property (i.e., every set of behaviors) is the conjunction of:
  – A safety property and
  – A liveness property
• Due to Alpern and Schneider, based on basic results from topology
CS294, Yelick Consensus, p13
Visible Behavior
• A specification identifies a subset of its actions (or its state variables) as externally visible.
• A state machine defines a set of allowable executions:
  – state: a set of values, usually divided into named variables.
  – actions: named changes in the state; internal and external.
• They may be nondeterministic
  – In fact, Lampson encourages this in specs to allow flexibility in implementations
CS294, Yelick Consensus, p14
Implements
• Y implements X if
  – every external behavior of Y is an external behavior of X.
• This expresses the idea that Y implements X if you can’t tell Y apart from X by looking only at the external actions
• Examples: abstract data types, databases, distributed systems
• Note: Lampson implicitly deals with finite behaviors, and therefore states the liveness property separately. (Doesn’t treat liveness in the proofs.)
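For small finite machines, the "implements" relation can be checked by brute force up to a bounded number of steps. The sketch below is an assumption-laden illustration, not a method from the slides: a machine is encoded as a tuple `(initial_state, transitions, external_actions)`, and the buffer machines `X`, `Y`, `Z` are invented examples. It covers only finite behaviors, so it says nothing about liveness.

```python
def external_traces(machine, depth):
    """External action sequences of all executions with at most `depth` steps."""
    init, transitions, external = machine
    traces = set()
    frontier = {(init, ())}
    for _ in range(depth + 1):
        traces.update(trace for _, trace in frontier)
        next_frontier = set()
        for state, trace in frontier:
            for src, action, dst in transitions:
                if src == state:
                    # Internal actions are hidden from the external behavior.
                    shown = trace + ((action,) if action in external else ())
                    next_frontier.add((dst, shown))
        frontier = next_frontier
    return traces


def implements_bounded(y, x, depth):
    """Every external behavior of y (up to `depth` steps) is one of x."""
    return external_traces(y, depth) <= external_traces(x, depth)


EXTERNAL = frozenset({"put", "get"})

# Spec X: a one-slot buffer that alternates put and get.
X = ("empty", (("empty", "put", "full"), ("full", "get", "empty")), EXTERNAL)

# Y: same external behavior, with an internal "copy" step in between.
Y = ("empty",
     (("empty", "put", "staged"), ("staged", "copy", "full"), ("full", "get", "empty")),
     EXTERNAL)

# Z: a buggy machine that also allows get on an empty buffer.
Z = ("empty",
     (("empty", "put", "full"), ("full", "get", "empty"), ("empty", "get", "empty")),
     EXTERNAL)
```

With these definitions, `implements_bounded(Y, X, 6)` holds because the internal `copy` step is invisible, while `implements_bounded(Z, X, 6)` fails because the external behavior starting with `get` is not a behavior of `X`.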
CS294, Yelick Consensus, p16
Use of Consensus
• Agreeing on some value is called consensus.
• A replicated state machine needs to agree on a sequence of values:
  – Input 1: Write(x, 3)
  – Input 2: Read(x)
  – . . .
CS294, Yelick Consensus, p17
Paxos Assumptions
• Each legislator has
  – A ledger (stable storage)
  – An hourglass for time
• Communication
  – Point-to-point, fully connected network
  – Unreliable: loss and delay allowed
• Failures
  – Legislators may come and go (processor failure)
  – They are honest: no Byzantine failures
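The slides list only the assumptions, not the protocol itself. As a rough illustration of how these assumptions are used, here is a heavily simplified, in-memory sketch of the commonly taught single-decree (one value) form of Paxos. It is not taken from the slides: the names `Acceptor`, `propose` and `reachable` are assumptions, message loss is modeled only by choosing which acceptors a proposer can reach, and timing (the hourglass) is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """One legislator; its fields stand in for the ledger (stable storage)."""
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots, report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return None

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value, reachable):
    """Run one ballot against the reachable acceptors; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    promises = []
    for i in reachable:
        reply = acceptors[i].prepare(ballot)
        if reply is not None:
            promises.append(reply[1])
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, adopt the one with the highest ballot.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda bv: bv[0])[1]
    acks = sum(acceptors[i].accept(ballot, value) for i in reachable)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "Write(x, 3)", reachable=[0, 1])
second = propose(acceptors, 2, "Read(x)", reachable=[1, 2])
stale = propose(acceptors, 1, "Write(y, 0)", reachable=[0, 1, 2])
```

In this run the first ballot chooses `Write(x, 3)` with only two of three acceptors reachable. The second proposer learns of that value from acceptor 1 and re-proposes it instead of its own, and the stale ballot is rejected, so the agreed value never changes once chosen.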
CS294, Yelick Consensus, p19
Summary
• How to build a highly available system using consensus.
  – Run a replicated deterministic state machine, and get consensus on each input.
  – Use leases to replace most of the consensus steps with actions by one process.
• The most fault-tolerant algorithm for consensus without real-time guarantees.
  – Lamport’s “Paxos” algorithm, based on
• How to design and understand a concurrent, fault-tolerant system.
  – Write a simple spec as a state machine.
  – Define abstract function and show simulation.