Paxos and Replicated State Machine (RSM)


Transcript of Paxos and Replicated State Machine (RSM)

Page 1: Paxos  and Replicated State Machine (RSM)

Paxos and Replicated State Machine (RSM)

Page 2: Paxos  and Replicated State Machine (RSM)

Outline

Basic Concepts of Replicated State Machine

Paxos Made Simple

Page 3: Paxos  and Replicated State Machine (RSM)

Replicated State Machine

We can replicate data, but how can we guarantee that it is replicated correctly?

How can we replicate computation?

By using a replicated state machine.

Suppose each process executes operations deterministically (or can be made deterministic).

If a group of servers starts from the same initial state and executes the same sequence of operations, every server ends in the same final state.
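The claim above can be illustrated with a minimal sketch. The `Counter` state machine, its `add`/`set` operations, and the sample log are hypothetical names chosen only for this illustration; they are not part of the lecture or the lab.

```python
from dataclasses import dataclass, field


@dataclass
class Counter:
    """A tiny deterministic state machine: a dictionary of integer counters."""
    state: dict = field(default_factory=dict)

    def apply(self, op):
        kind, key, amount = op
        if kind == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif kind == "set":
            self.state[key] = amount
        else:
            raise ValueError(f"unknown operation: {kind}")


# The agreed sequence of operations (assumed sample data).
log = [("set", "x", 10), ("add", "x", 5), ("add", "y", 1)]

# Three replicas start from the same (empty) state and apply the same log.
replicas = [Counter() for _ in range(3)]
for replica in replicas:
    for op in log:
        replica.apply(op)
same_order_agrees = all(r.state == replicas[0].state for r in replicas)

# A replica that applies the same operations in a different order can diverge.
reordered = Counter()
for op in [log[1], log[0], log[2]]:
    reordered.apply(op)
reordered_differs = reordered.state != replicas[0].state
```

The diverging replica is why the rest of the lecture focuses on agreeing on the order of operations, not only on their content.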

Page 4: Paxos  and Replicated State Machine (RSM)

Replicated State Machine

So we can start a group of processes and make them execute the same sequence of operations. Running multiple processes in this way provides fault tolerance.

Many uses: lock servers (as in the lab), reliable databases, reliable replicated file systems.

Page 5: Paxos  and Replicated State Machine (RSM)

What is crucial for RSMs?

The members of an RSM must agree on the order of the operations.

Thus, when there are two or more alternative operations, the system must decide which one is chosen.

Decide means: every member of the RSM agrees to perform one specific operation, and only that operation.

So the problem reduces to a consensus problem:
◦ How can a group of processes agree on “something”? Here, “something” means the operations (values!) that the RSM will execute.

The fundamental question is how a group of processes agrees on a single value.

Page 6: Paxos  and Replicated State Machine (RSM)

Consensus: decide a single value

FLP tells us that, in an asynchronous system, no deterministic algorithm can guarantee distributed consensus if even one process may fail.
◦ Michael J. Fischer, Nancy Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382, April 1985.
◦ FLP holds for the general environment (a fully asynchronous network).
◦ In practice, the situation is not that bad. We have spent a lot of money on clusters and networks; they should not behave that poorly!

Paxos targets a partially synchronous environment (a practical assumption), where it can eventually reach consensus.
◦ For example, it works in a cluster environment.
◦ Paxos can be used to decide a value among a group of processes (nodes, computers).
◦ Leslie Lamport, Paxos Made Simple, 01 Nov 2001.

Page 7: Paxos  and Replicated State Machine (RSM)

Paxos

Paxos makes a group of processes agree on the same value despite process failures, network failures, and network delays.

Thus, it can be used as the building block for an RSM.
◦ All processes in the group decide the same value for the next operation.

To keep the problem meaningful, we rule out the trivial solution: hard-wiring a predefined value into each process and having them all simply accept that value.

Page 8: Paxos  and Replicated State Machine (RSM)

The overall structure of Paxos

Assume a collection of processes that can propose values. (Where do values come from? Someone has to make proposals. We call these processes Proposers.)

Someone must accept or reject each proposal. Whether a value is chosen depends on whether the acceptors accept or reject it. We call these processes Acceptors.

If a value has been chosen, then processes should be able to learn the chosen value. (Eventually someone will know that the system has decided on a value. We call the processes that learn the current state of the system Learners.)

There is no oracle: each process can act only by following the steps of the algorithm, using its local state and the messages it receives from others.

Proposers, Acceptors, and Learners (the agents) are the three roles in the consensus algorithm. In an implementation, a single process may act as more than one agent.

Page 9: Paxos  and Replicated State Machine (RSM)

Again, the goal

The goal is to ensure that some proposed value is eventually chosen and that, once a value has been chosen, processes can eventually learn it.

(Chosen means that a single value has been decided, i.e. locked in the system.)

Eventually means that there is no known bound on how long it takes for a value to be chosen.

Page 10: Paxos  and Replicated State Machine (RSM)

Recall the two properties of distributed algorithms

Safety (Correctness)
◦ Bad things never happen.
◦ No process in the group decides a value different from the others.
◦ The value must be meaningful. (Not a NOP for every operation! Only a proposed value can be chosen.)

Liveness
◦ Good things eventually happen.
◦ Eventually, the processes all agree on a single value.

Page 11: Paxos  and Replicated State Machine (RSM)

Safety requirements for consensus:

Only a value that has been proposed may be chosen

Only a single value is chosen

A process never learns that a value has been chosen unless it actually has been

Liveness means that a single value is eventually chosen; we leave this issue to the end of this lecture.

Page 12: Paxos  and Replicated State Machine (RSM)

Assumptions of Paxos

Agents can communicate with one another by sending messages.

Agents operate at arbitrary speed, may fail by stopping, and may restart. Since all agents may fail after a value is chosen and then restart, a solution is impossible unless an agent that has failed and restarted can remember some information (for example, by writing it to a hard disk).

Messages can take arbitrarily long to be delivered, can be duplicated, and can be lost, but they are not corrupted (no Byzantine faults, no code penetration).

This model fits some practical environments, such as clusters in a data center.

Page 13: Paxos  and Replicated State Machine (RSM)

Consensus is hard

Network failure

Process failure

Network delay

Membership change: processes join and leave the system

Page 14: Paxos  and Replicated State Machine (RSM)

A single acceptor?

How?
◦ A proposer sends a proposal to the acceptor, and the acceptor chooses the first proposed value that it receives.

Does it work?
◦ No, the single acceptor can fail.

So, if an algorithm is to work, it must use multiple acceptors.

A proposer sends a proposed value to a set of acceptors. An acceptor may (or may not) accept the proposed value.

Chosen (Decided): The value is chosen when a large enough set of acceptors have accepted it.

Page 15: Paxos  and Replicated State Machine (RSM)

How large is large enough?

Chosen (Decided): The value is chosen when a large enough set of acceptors have accepted it.

To ensure that only a single value is chosen, we can let a large enough set consist of any majority of the agents.

Because any two majorities have at least one acceptor in common, this works if an acceptor can accept at most one value.

“Majority” is only one way to define “large enough”; other definitions work as long as any two such sets have at least one acceptor in common.
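The statement that any two majorities share at least one acceptor can be checked by brute force for a small group. The sketch below assumes five acceptors with made-up names; `majorities` is a helper written only for this illustration.

```python
from itertools import combinations


def majorities(acceptors):
    """Yield every subset that contains more than half of the acceptors."""
    quorum = len(acceptors) // 2 + 1
    for size in range(quorum, len(acceptors) + 1):
        for subset in combinations(acceptors, size):
            yield set(subset)


acceptors = ["a1", "a2", "a3", "a4", "a5"]   # assumed sample group
all_majorities = list(majorities(acceptors))

# Every pair of majorities has a non-empty intersection.
every_pair_intersects = all(q1 & q2 for q1, q2 in combinations(all_majorities, 2))

# Sets that are not majorities give no such guarantee.
disjoint_minorities = {"a1", "a2"} & {"a3", "a4"}
```

Because the shared acceptor can accept at most one value under the rule on this slide, two different values can never both reach a majority.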

Page 16: Paxos  and Replicated State Machine (RSM)

First requirement we should meet

Suppose we are very lucky: there is no failure, no message loss, and no network delay. We want a value to be chosen even if only one value is proposed by a single proposer. (Everything works well, so of course we can expect this.)

This suggests the requirement (if an algorithm is really to work):

P1. An acceptor must accept the first proposal that it receives.

Page 17: Paxos  and Replicated State Machine (RSM)

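But…

If several proposers propose different values at about the same time, every acceptor may have accepted a value while no single value is accepted by a majority of them.

Even with only two proposed values, if each is accepted by about half of the acceptors, the failure of a single acceptor could make it impossible to learn which of the values was chosen.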

Page 18: Paxos  and Replicated State Machine (RSM)

Proposal

P1 and the requirement that a value is chosen only when it is accepted by a majority of acceptors imply that an acceptor must be allowed to accept more than one proposal.

An algorithm that is to work must therefore allow multiple proposals. We can distinguish the proposals by tagging each one with a natural number.

A proposal: <proposal_number, proposal_value>

Different proposals have different numbers (but may have the same value).

Page 19: Paxos  and Replicated State Machine (RSM)

Different proposals from different proposers

This is one method; you can define others. Which method does the lab use?
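As an illustration only, here is one common way to make sure that different proposers never use the same proposal number: interleave the number space by proposer id. This scheme is an assumption of this sketch and is not necessarily the method shown on the slide or used in the lab; `proposal_number` and `next_number_above` are hypothetical helpers.

```python
def proposal_number(round_index, proposer_id, num_proposers):
    """Return a number that is unique to the pair (round_index, proposer_id)."""
    if not 0 <= proposer_id < num_proposers:
        raise ValueError("proposer_id must be in range(num_proposers)")
    return round_index * num_proposers + proposer_id


def next_number_above(seen, proposer_id, num_proposers):
    """Return a number owned by proposer_id that is strictly greater than `seen`."""
    round_index = seen // num_proposers + 1
    return proposal_number(round_index, proposer_id, num_proposers)


# With 3 proposers, proposer 0 owns 0, 3, 6, ...; proposer 1 owns 1, 4, 7, ...
numbers = [proposal_number(r, p, 3) for r in range(4) for p in range(3)]
retry = next_number_above(7, 2, 3)   # proposer 2 saw number 7 and retries higher
```

Numbers from different proposers never collide, and a proposer that learns of a higher number can always jump above it.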

Page 20: Paxos  and Replicated State Machine (RSM)

Proposal Chosen?

A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors. In that case, we say that the proposal (both its number and its value) has been chosen.

We have not discussed any algorithm yet. Imagine that a given acceptor can accept multiple proposals. Then we can allow multiple proposals to be chosen.

However:

P2. If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v. (Once a proposal is chosen, its value must not be overturned by the later execution of the algorithm. In a distributed environment, the algorithm might run forever until somebody tells it to stop.)

Since numbers are totally ordered, condition P2 guarantees the crucial safety property that only a single value is chosen.

Page 21: Paxos  and Replicated State Machine (RSM)

Strengthening P2 requirements:

To be chosen, a proposal must be accepted by at least one acceptor. So we can satisfy P2 by satisfying:
◦ P2a. If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.

But a proposer might propose a different value after value v has been chosen with proposal number n, and by P1 an acceptor that has not yet received any proposal would have to accept it, violating P2a. Further strengthening:
◦ P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.

So, if we satisfy P2b, we satisfy P2a and therefore P2.

What does P2b mean?
◦ A proposer must not choose its proposal arbitrarily. It has to do something before making its proposal. Learning the history is easy; predicting the future is difficult.

Page 22: Paxos  and Replicated State Machine (RSM)

How to get P2b?

Assume that some proposal with number m and value v is chosen, and show that any proposal issued with number n > m also has value v. Using induction on n, we may assume that every proposal issued with a number in m … (n-1) has value v.

For the proposal numbered m to be chosen, there must be some set C consisting of a majority of acceptors such that every acceptor in C accepted it.

Thus:
◦ Every acceptor in C has accepted a proposal with number in m … (n-1), and every proposal with number in m … (n-1) accepted by any acceptor has value v.

Page 23: Paxos  and Replicated State Machine (RSM)

How to make the proposals?

Since any set S consisting of a majority of acceptors contains at least one member of C, we can conclude that a proposal numbered n has value v by ensuring that the following invariant is maintained:

P2c. For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.

So the proposer must learn something first and only then make its proposal.

Page 24: Paxos  and Replicated State Machine (RSM)

Paxos Algorithm

So far we have not discussed any algorithm. Let's look at Paxos now.

Proposer: prepares proposals

Acceptor: accepts or rejects (does not accept) proposals

Learner: learns the current status of the system. If the value is decided, it notifies whoever is supposed to know the value

Page 25: Paxos  and Replicated State Machine (RSM)

Step 1: Prepare

[Figure: Proposer 1 sends PREPARE j and Proposer 2 sends PREPARE k, with k > j, to three acceptors.]

(a) A proposer selects a proposal number n and sends a PREPARE request with number n to a majority of acceptors.

Page 26: Paxos  and Replicated State Machine (RSM)

Step 2: Promise

[Figure: the three acceptors answer with PROMISE j, PROMISE k, and PROMISE k, where k > j.]

PROMISE n: the acceptor will accept only proposals numbered n or higher.

Proposer 1 is ineligible because a quorum has voted for a higher number than j.

(b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

P1a. An acceptor can accept a proposal numbered n iff it has not responded to a prepare request having a number greater than n.

Page 27: Paxos  and Replicated State Machine (RSM)

Step 3: Accept!

[Figure: Proposer 2 sends ACCEPT! (v_k, k) to the three acceptors. Proposer 1 is disqualified; Proposer 2 offers a value.]

(a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an ACCEPT request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.
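The value-selection rule in step (a) can be sketched as a small function. The `Proposal` type and the `choose_value` name are assumptions of this sketch; each entry of `promises` stands for one acceptor's reply and is either its highest-numbered accepted proposal or `None`.

```python
from typing import List, NamedTuple, Optional


class Proposal(NamedTuple):
    number: int
    value: str


def choose_value(promises: List[Optional[Proposal]], own_value: str) -> str:
    """Pick the value for the ACCEPT request from a majority of promise replies."""
    accepted = [p for p in promises if p is not None]
    if not accepted:
        # No acceptor in the majority has accepted anything: any value is allowed.
        return own_value
    # Otherwise the proposer must adopt the value of the highest-numbered proposal.
    return max(accepted, key=lambda p: p.number).value


free_choice = choose_value([None, None], "x")
forced_choice = choose_value([None, Proposal(3, "a"), Proposal(5, "b")], "x")
```

In the second call the proposer wanted "x" but has to propose "b", which is how the proposer keeps invariant P2c.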

Page 28: Paxos  and Replicated State Machine (RSM)

Step 4: Accepted

[Figure: the acceptors reply Accepted k. A quorum has accepted value v_k; it is now a fact.]

(b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a PREPARE request having a number greater than n.
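The two acceptor rules (promise in Step 2, accept in Step 4) fit in a few lines of state. This is a minimal in-memory sketch, not the lab's implementation: the class, method names, and reply tuples are assumptions, a real acceptor would write its state to stable storage before replying, and this version remembers only the highest-numbered accepted proposal.

```python
from typing import NamedTuple


class Proposal(NamedTuple):
    number: int
    value: str


class Acceptor:
    def __init__(self):
        self.promised = 0      # highest PREPARE number answered so far
        self.accepted = None   # highest-numbered Proposal accepted so far

    def on_prepare(self, n):
        """Promise not to accept proposals numbered below n, or ignore the request."""
        if n > self.promised:
            self.promised = n
            return ("PROMISE", n, self.accepted)
        return None

    def on_accept(self, proposal):
        """Accept unless a PREPARE with a greater number was already answered."""
        if proposal.number >= self.promised:
            if self.accepted is None or proposal.number > self.accepted.number:
                self.accepted = proposal
            return ("ACCEPTED", proposal.number)
        return None


acceptor = Acceptor()
first = acceptor.on_prepare(1)                 # promise, nothing accepted yet
second = acceptor.on_prepare(2)                # higher number: promise again
stale = acceptor.on_accept(Proposal(1, "a"))   # ignored: already promised 2
fresh = acceptor.on_accept(Proposal(2, "b"))   # accepted
later = acceptor.on_prepare(3)                 # reports the accepted proposal
```

The last reply carries `Proposal(2, "b")`, so a later proposer learns the earlier value and has to reuse it.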

Page 29: Paxos  and Replicated State Machine (RSM)

Learning values

[Figure: a learner asks the acceptors “v?” and receives V_k. If a learner interrogates the system, a quorum will respond with fact V_k.]

A learner sends a LEARN request to all (or a majority) of the acceptors. The acceptors respond with their accepted proposals. If a proposal has been accepted by a majority of acceptors, that proposal is the decided one.
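The learner's check can be sketched as a vote count over the replies. In this illustration each reply is assumed to be the set of `(number, value)` proposals that one acceptor has accepted; `chosen_proposal` is a hypothetical helper name.

```python
from collections import Counter


def chosen_proposal(replies, num_acceptors):
    """Return a proposal accepted by a majority of all acceptors, or None."""
    votes = Counter()
    for accepted in replies:
        votes.update(set(accepted))   # one vote per acceptor per proposal
    for proposal, count in votes.items():
        if count > num_acceptors // 2:
            return proposal
    return None


# Three acceptors: two have accepted (2, "b"), so it is chosen.
decided = chosen_proposal([{(1, "a"), (2, "b")}, {(2, "b")}, set()], 3)

# No proposal has a majority yet, so the learner must ask again later.
undecided = chosen_proposal([{(1, "a")}, {(2, "b")}, set()], 3)
```

The majority is counted against the total number of acceptors, not against the number of replies, so missing replies can delay learning but cannot produce a wrong answer.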

Page 30: Paxos  and Replicated State Machine (RSM)

Proposer Code

struct proposal {number, value}    // number n >= 1

proposer_make_proposal(n, pvalue)
    send(PREPARE, n) to a majority of accepters;
    wait until [received (ACK_PREPARE, n, proposal) from a majority of accepters];
    received_proposals = [all received proposals];
    // null if no accepter reported an accepted proposal
    old_max_proposal = the proposal in received_proposals with the maximal proposal number;

    if old_max_proposal == null
        newproposal = (n, pvalue);
    else if old_max_proposal.number > n
        abandon_making_proposal; return;    // abandon
    else
        newproposal = (n, old_max_proposal.value);

    send(ACCEPT, newproposal) to a majority of accepters;    // or all accepters

Page 31: Paxos  and Replicated State Machine (RSM)

Accepter response to PREPARE

old_prepare_number;     // highest PREPARE number responded to so far
accepted_proposals;     // set of proposals accepted so far

accepter_on_receive_prepare(PREPARE, number, proposer)
    if number > old_prepare_number
        old_prepare_number = number;
        old_max_proposal = the proposal in accepted_proposals with the maximal proposal number;
        send(ACK_PREPARE, number, old_max_proposal) to proposer;
    else
        either also send back the old_max_proposal or just ignore the message

Page 32: Paxos  and Replicated State Machine (RSM)

Accepter response to ACCEPT

accepter_on_receive_accept(ACCEPT, proposal, proposer)
    if proposal.number ≥ old_prepare_number
        accepted_proposals = accepted_proposals ∪ {proposal};
    else
        either send back the old_max_proposal or just ignore the message

Page 33: Paxos  and Replicated State Machine (RSM)

Learner

repeat
    send(LEARN) to all accepters;
    accepted_proposals = all proposals replied;
until there exists a proposal that has been accepted by a majority of accepters

that proposal is chosen

Page 34: Paxos  and Replicated State Machine (RSM)

Accepter response to LEARN

accepter_on_receive_learn(learner)
    send(ACK-LEARN, accepted_proposals) to learner

Page 35: Paxos  and Replicated State Machine (RSM)

Why is Paxos correct?

The key is “do not overturn a value once it is chosen”. The proposer follows the algorithm strictly and makes its proposal based on the history it has collected.

To prove that Paxos is correct, we should prove:

P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.

This means that once a value is decided, no further action may destroy the decision.
◦ If proposal <m,v> is decided, the prepare phase restricts any later proposer to proposals with value v, because the proposer must use the value returned by the acceptors. A proposer can issue a proposal only after receiving responses from a majority of acceptors, and that majority necessarily intersects the majority that accepted <m,v>.
◦ This is what P2c says.
◦ So Paxos does not destroy a value once it is decided, i.e. it meets the safety requirement. What about the liveness requirement? Will Paxos really reach consensus among a group of processes?

Page 36: Paxos  and Replicated State Machine (RSM)

Progress (liveness)

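Even two proposers can drive the system into livelock: each keeps issuing PREPARE requests with ever higher numbers, which causes the acceptors to reject the other's pending ACCEPT request, so nothing useful is ever done.

So there should be only one active proposer, which can be considered the leader of the group.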
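The livelock can be reproduced with a small single-threaded simulation. The interleaving below is a deliberately adversarial schedule assumed for this sketch: two proposers alternate, and each new PREPARE reaches the acceptors just before the rival's ACCEPT. The `Acceptor`, `duel`, and `single_leader` names are hypothetical.

```python
class Acceptor:
    """Minimal acceptor: remembers its highest promise and last accepted proposal."""

    def __init__(self):
        self.promised = 0
        self.accepted = None

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True
        return False

    def accept(self, n, value):
        if n >= self.promised:
            self.accepted = (n, value)
            return True
        return False


def majority(votes):
    return sum(votes) > len(votes) // 2


def duel(steps):
    """Proposers A (odd numbers) and B (even numbers) keep preempting each other."""
    acceptors = [Acceptor() for _ in range(3)]
    successful_accepts = 0
    pending = None   # proposal whose ACCEPT phase has not been sent yet
    for n in range(1, steps + 1):
        value = "A" if n % 2 else "B"
        got_promises = majority([a.prepare(n) for a in acceptors])
        # The rival's ACCEPT for the older number arrives too late.
        if pending is not None and majority([a.accept(*pending) for a in acceptors]):
            successful_accepts += 1
        pending = (n, value) if got_promises else None
    return successful_accepts, acceptors


def single_leader():
    """With one proposer, PREPARE followed by ACCEPT succeeds immediately."""
    acceptors = [Acceptor() for _ in range(3)]
    promised = majority([a.prepare(1) for a in acceptors])
    return promised and majority([a.accept(1, "A") for a in acceptors])


duel_accepts, duel_acceptors = duel(100)
leader_succeeded = single_leader()
```

Safety is never violated in the duel (no acceptor accepts anything), but no progress is made either, which is the motivation for electing a leader.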

Page 37: Paxos  and Replicated State Machine (RSM)

Leader Election (1)

leader;           // leader process, initialized to p2, the process with smallest id
proposer_self;    // each proposer has its own id

proposer_start_leader_election
    repeat periodically forever
        send(ELECTION) to all proposers;
        wait for a while, receiving leader election messages;    // "a while" can be 2x(largest latency)
        active_proposers = all proposers that sent back the ACK-ELECTION message;
        leader = the proposer in active_proposers with minimal proposer_id;

Page 38: Paxos  and Replicated State Machine (RSM)

Leader Election (2)

proposer_on_receive_election(proposer)
    send(ACK-ELECTION, proposer_self) to proposer
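The election rule (the smallest id among the proposers that answered) can be sketched without any networking. Here `reachable` stands for the set of proposers whose ACK-ELECTION arrived within the waiting period; the function name and the sample ids are assumptions of this sketch.

```python
def elect_leader(self_id, all_proposers, reachable):
    """Return the minimal id among the proposers that answered ELECTION."""
    # A proposer always counts itself as active; others only if they replied in time.
    active = {p for p in all_proposers if p == self_id or p in reachable}
    return min(active)


proposers = [1, 2, 3]

# Everybody answers: all proposers pick the same leader.
all_up = {p: elect_leader(p, proposers, {1, 2, 3}) for p in proposers}

# Proposer 1 has crashed: the survivors move to the next smallest id.
after_crash = {p: elect_leader(p, proposers, {2, 3}) for p in [2, 3]}

# Proposer 3 temporarily misses the reply from proposer 1: views can disagree.
split_view = (elect_leader(2, proposers, {1, 2, 3}), elect_leader(3, proposers, {2, 3}))
```

The last case shows that an election based on timeouts can briefly yield two leaders. That only affects progress: the prepare and accept rules keep Paxos safe even when several proposers are active.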

Page 39: Paxos  and Replicated State Machine (RSM)

Leader Election (3, leader code)

current_proposal_number;

proposer_make_proposals
    repeat forever
        wait for a while;    // 3x(maximal latency)
        if leader == proposer_self
            stop the existing proposer_make_proposal;
            current_proposal_number = current_proposal_number + np;
            start a new call of proposer_make_proposal;

Page 40: Paxos  and Replicated State Machine (RSM)

Discussion

1. Can these two proposals be considered the same: <100, “hello”> and <200, “hello”>? Consider a group of only three acceptors and use it to illustrate the principle of Paxos proposals.

2. If everything is OK, what about the performance of Paxos? How can a batch of operations be handled together?

Page 41: Paxos  and Replicated State Machine (RSM)

Thank you! Any Questions?
