Dept. of Computer Science & Engineering, The Chinese University of Hong Kong Paxos: The Part-Time Parliament CHEN Xinyu 2011-04-04

Dept. of Computer Science & Engineering, The Chinese University of Hong Kong

Paxos: The Part-Time Parliament

CHEN Xinyu

2011-04-04

Outline

Background

The single-decree protocol

Fault-tolerant distributed system

Conclusion

The Parliament

The primary task was to determine the law

A sequence of passed decrees

A decree was passed if and only if a majority of legislators voted for the decree

Constraints

The acoustics of the Chamber were poor
    Legislators communicated only by messenger

Part-time: no one in Paxos was willing to devote his life to the Parliament

Legislator
    Continually wandered in and out of the parliamentary Chamber
    No secretary
        Each legislator maintained a ledger in which he recorded the numbered sequence of decrees that were passed

Messenger
    Messages may be delayed, lost, or duplicated

Preconditions

Mutual trust
    Legislators were willing to pass any decree that was proposed
    Messengers did not garble messages

When legislators and messengers remained in the Chamber
    Legislators reacted promptly to any messages
    Messengers delivered messages in a timely fashion

Resources for each legislator
    A sturdy ledger
        Record the decrees
        Write notes to remind himself of the current progress
    Enough funds to hire as many messengers as he needed
    Timers

The Single-Decree Protocol

A decree was chosen through a series of numbered ballots

In each ballot, a legislator had the choice only of voting for the decree or not voting

Each ballot was associated with a set of legislators called a quorum

A ballot succeeded if and only if every legislator in the quorum voted for the decree

Quorum size (simple majority of N legislators): ⌊N/2⌋ + 1

Requirements

Consistency
    No two ledgers could contain contradictory information

Progress
    If a majority of the legislators were in the Chamber, and no one entered or left the Chamber for a sufficiently long period of time, then any decree proposed by a legislator in the Chamber would be passed, and every decree that had been passed would appear in the ledger of every legislator in the Chamber

Achieving Consistency

Each ballot has a unique ballot number

The quorums of any two ballots have at least one legislator in common

For every ballot B, if any legislator in B’s quorum voted in an earlier ballot, then the decree of B equals the decree of the latest of those earlier ballots
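
The three conditions above can be checked mechanically on a recorded set of ballots. The following is a minimal sketch, not part of the original slides: the `Ballot` record and the `satisfies_conditions` function are hypothetical names introduced only for illustration, and decrees are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ballot:
    number: int        # ballot number
    decree: str        # decree voted on in this ballot
    quorum: frozenset  # legislators asked to vote
    voters: frozenset  # members of the quorum who actually voted


def satisfies_conditions(ballots):
    """Return True if the ballots meet the three consistency conditions."""
    ordered = sorted(ballots, key=lambda b: b.number)
    # Condition 1: every ballot has a unique number.
    if len({b.number for b in ordered}) != len(ordered):
        return False
    for i, b in enumerate(ordered):
        # Condition 2: the quorums of any two ballots overlap.
        if any(not (b.quorum & later.quorum) for later in ordered[i + 1:]):
            return False
        # Condition 3: if a member of b's quorum voted earlier, b carries
        # the decree of the latest such earlier ballot.
        earlier = [e for e in ordered[:i] if e.voters & b.quorum]
        if earlier and earlier[-1].decree != b.decree:
            return False
    return True
```

For example, if legislator 2 voted for decree "B" in ballot 2, a later ballot whose quorum contains legislator 2 must also be for "B"; the checker rejects any other decree for that ballot.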

A Sequence of Ballots

[Figure: table of five example ballots, numbered 1 to 5, with columns: Ballot #, Decree, Quorum and voters]

If a ballot B is successful, then any later ballot is for the same decree as B

For every ballot B, if any legislator in B’s quorum voted in an earlier ballot, then the decree of B equals the decree of the latest of those earlier ballots

Three Roles

Proposer: a legislator who initiated a ballot
    How to choose a ballot’s number, decree, and quorum?
    Notes:
        pnumber: the largest ballot number that he has proposed
        pdecree: the proposed decree for the ballot pnumber

Acceptor: a legislator in the quorum
    How to decide whether or not to vote?
    Notes:
        number: the largest ballot number that he has received
        vnumber: the largest ballot number in which he has cast a vote
        vdecree: the decree he voted to accept during the ballot vnumber

Learner: a legislator in Parliament or a citizen

Proposer

Ballot number
    Assign each legislator a unique id l between 0 and N-1 (N legislators in total)
    The smallest ballot number s larger than any he has seen such that s mod N = l

Quorum
    A simple majority
    A weighted majority
        Any set of legislators whose total weight was more than half the total weight of all legislators
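
The ballot-numbering rule and the weighted-majority test can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and it assumes a legislator who has seen no ballot yet passes `largest_seen = -1`.

```python
def next_ballot_number(legislator_id, n, largest_seen):
    """Smallest s > largest_seen such that s mod n == legislator_id."""
    s = largest_seen + 1
    # Advance s to the next value congruent to legislator_id modulo n.
    return s + (legislator_id - s) % n


def is_weighted_majority(members, weights):
    """True if members hold more than half of the total weight."""
    total = sum(weights.values())
    return 2 * sum(weights[m] for m in members) > total
```

Because each legislator only ever picks numbers congruent to his own id modulo N, two different legislators can never choose the same ballot number. A simple majority is the special case in which every weight equals 1.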

Phase 1: Prepare

Phase 1a: Proposer → Acceptor
    pnumber = …
    pdecree = …
    msg: prepare(pnumber)

Phase 1b: Proposer ← Acceptor
    if (pnumber > number)
        number = pnumber
        msg: promise(number, vnumber, vdecree)
        He promised that he would not cast a vote for a decree with ballot number less than number
    else if (pnumber < number) && different proposers
        msg: reject(number)
    else if (pnumber == number)
        ignore
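
The acceptor's side of Phase 1 can be sketched as a small state machine. This is a simplified illustration, not the presenter's code: the `Acceptor` class and tuple-shaped messages are assumptions, the initial `number` of -1 is chosen so that ballot 0 is accepted, and the "different proposers" check from the reject branch is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    number: int = -1               # largest ballot number received so far
    vnumber: Optional[int] = None  # ballot number of the last vote cast
    vdecree: Optional[str] = None  # decree voted for in ballot vnumber

    def on_prepare(self, pnumber: int) -> Optional[Tuple]:
        """Phase 1b: answer prepare(pnumber) from a proposer."""
        if pnumber > self.number:
            # Promise not to vote in any ballot numbered below pnumber.
            self.number = pnumber
            return ("promise", self.number, self.vnumber, self.vdecree)
        if pnumber < self.number:
            return ("reject", self.number)
        return None  # pnumber == number: ignore the message
```

Returning the previous vote (`vnumber`, `vdecree`) inside the promise is what lets the proposer learn about decrees that may already have been chosen.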

Phase 2: Propose

Phase 2a: Proposer → Acceptor
    msg: promise(number, vnumber, vdecree)
    if (pnumber == number) && majority(number)
        if (vdecree != null)
            pdecree = vdecree with the largest vnumber (only one such value)
        msg: propose(pnumber, pdecree)

Phase 2b:
    if (pnumber ≥ number) && (vnumber pnumber)
        number = vnumber = pnumber
        vdecree = pdecree
        Learner ← Acceptor
            msg: vote(vnumber, vdecree)
    else if (pnumber < number)
        Proposer ← Acceptor
            msg: reject(number)
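
Both halves of Phase 2 can be sketched as follows. This is a hedged illustration with hypothetical names (`choose_decree`, `Acceptor.on_propose`). For Phase 2b it tests only the ballot-number comparison `pnumber >= number`, which is the acceptance rule of standard Paxos; any additional check on `vnumber` is left out.

```python
def choose_decree(promises, own_decree):
    """Phase 2a: pick the decree to propose after a majority has promised.

    Each promise is a (vnumber, vdecree) pair; both are None if the
    acceptor has not voted yet.
    """
    voted = [(vn, vd) for vn, vd in promises if vd is not None]
    if voted:
        # Adopt the decree of the highest-numbered earlier vote.
        return max(voted, key=lambda p: p[0])[1]
    return own_decree  # no earlier vote: the proposer is free to choose


class Acceptor:
    def __init__(self):
        self.number = -1     # largest ballot number received (assumed start)
        self.vnumber = None  # ballot number of the last vote cast
        self.vdecree = None  # decree voted for in ballot vnumber

    def on_propose(self, pnumber, pdecree):
        """Phase 2b: vote unless a higher-numbered ballot was promised."""
        if pnumber >= self.number:
            self.number = self.vnumber = pnumber
            self.vdecree = pdecree
            return ("vote", self.vnumber, self.vdecree)  # sent to learners
        return ("reject", self.number)                   # sent to the proposer
```

Adopting the decree with the largest `vnumber` is how the proposer enforces the third consistency condition: a new ballot repeats the decree of the latest earlier ballot in which a quorum member voted.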

Phase 3: Learn

Phase 3: Learner
    if majority(vnumber)
        Legislator: update his ledger with vdecree
        Citizen: informed of vdecree
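
A learner only needs to count votes per ballot. The sketch below is an assumption-laden illustration: the `Learner` class is hypothetical, the quorum is taken to be a simple majority of `n` legislators, and voters are tracked in a set so that duplicated messages are not counted twice.

```python
from collections import defaultdict


class Learner:
    def __init__(self, n):
        self.n = n                     # total number of legislators
        self.votes = defaultdict(set)  # (vnumber, vdecree) -> voter ids
        self.ledger = None             # decree learned, once chosen

    def on_vote(self, voter, vnumber, vdecree):
        """Record a vote; learn the decree once a majority voted in one ballot."""
        self.votes[(vnumber, vdecree)].add(voter)  # a set absorbs duplicates
        if 2 * len(self.votes[(vnumber, vdecree)]) > self.n:
            self.ledger = vdecree
        return self.ledger
```

Votes are grouped by ballot number rather than by decree alone, since a decree is passed only when a majority votes for it within a single ballot.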

Example 1

[Figure: message sequence among legislators 1, 2, and 3. Messages shown: prepare(0), promise(0, -, null), propose(0, ), vote(0, )]

Message formats: prepare(pnumber), promise(number, vnumber, vdecree), propose(pnumber, pdecree), vote(vnumber, vdecree), reject(number)

Example 2

[Figure: message sequence among legislators 1 to 5 and two citizens. Messages shown: prepare(0), promise(0, -, null), propose(0, ), vote(0, ), prepare(4), promise(4, -, null), prepare(9), promise(9, 0, ), promise(9, -, null), propose(9, ), vote(9, )]

Livelock

[Figure: message sequence among legislators 1 to 5 illustrating livelock. Messages shown: prepare(0), promise(0, -, null), propose(0, ), vote(0, ), prepare(4), promise(4, -, null), propose(4, ), vote(4, ), reject(4), prepare(5), promise(5, -, null), promise(5, 0, ), reject(5), prepare(9)]
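
The livelock pattern, in which two proposers keep outbidding each other, can be reproduced with a deliberately simplified simulation. This sketch is an assumption, not the scenario drawn on the slide: every acceptor receives every message, proposers have ids 0 and 4 out of N = 5, and the rival's prepare always arrives before the current proposer's propose. All names are hypothetical.

```python
def next_ballot(legislator_id, n, largest_seen):
    """Smallest s > largest_seen such that s mod n == legislator_id."""
    s = largest_seen + 1
    return s + (legislator_id - s) % n


class Acceptor:
    def __init__(self):
        self.number = -1  # largest ballot number received so far

    def on_prepare(self, pnumber):
        if pnumber > self.number:
            self.number = pnumber

    def on_propose(self, pnumber):
        """Return True if the acceptor votes in ballot pnumber."""
        if pnumber >= self.number:
            self.number = pnumber
            return True
        return False


def duel(rounds, n=5, ids=(0, 4)):
    """Alternate two proposers; return (ballot numbers used, ballots passed)."""
    acceptors = [Acceptor() for _ in range(n)]
    cur, other = ids
    ballot = {cur: next_ballot(cur, n, -1)}
    for acc in acceptors:
        acc.on_prepare(ballot[cur])
    trace, passed = [ballot[cur]], 0
    for _ in range(rounds):
        # The rival prepares a higher ballot before cur can propose.
        ballot[other] = next_ballot(other, n, ballot[cur])
        for acc in acceptors:
            acc.on_prepare(ballot[other])
        trace.append(ballot[other])
        # cur's propose now arrives too late, so no acceptor votes for it.
        votes = sum(acc.on_propose(ballot[cur]) for acc in acceptors)
        if 2 * votes > n:
            passed += 1
        cur, other = other, cur
    return trace, passed
```

Under these assumptions `duel(3)` uses ballot numbers 0, 4, 5, 9 and passes no ballot: consistency is never violated, but no progress is made while both proposers stay active.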

President Selection

The progress condition would be met if only a single proposer, who did not leave the Chamber, was initiating ballots

Having multiple presidents could only impede progress

It could not cause inconsistency

Fault-Tolerant Distributed System

A single server: lower availability

Multiple server replicas

Legislators: multiple non-reliable server replicas
    Proposer: on behalf of client
    Acceptor: working server replica
    Learner: all server replicas

Messenger: non-reliable communication path
    Non-Byzantine faults (lost, out of order, duplicated)

Decree: user command submitted to server replicas

Law (a numbered sequence of passed decrees): server replica state
    State needs to be consistent among replicas

Ledger: stable storage
    Save messages before being sent out

Conclusion

Paxos: a consensus protocol proposed by Leslie Lamport in 1989

Quorum (majority)

Phase 1 (Prepare): no decree proposed

Used in
    Google Chubby lock
    Hadoop ZooKeeper (Zab)
    Scalien Keyspace (key-value NoSQL)
    Oracle Berkeley DB replication
    …
