A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for...

40
1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee: Dr. Binoy Ravindran (chair) Dr. Haibo Zeng Dr. Robert Broadwater

Transcript of A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for...

Page 1: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

1

ALow-latencyConsensusAlgorithmforGeographically

ReplicatedSitesMasterofScienceThesisDefense

BalajiArun

Committee:Dr.Binoy Ravindran (chair)Dr.Haibo ZengDr.RobertBroadwater

Page 2: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

2

Agenda• Introduction• Motivation• ThesisContribution:CAESAR• Evaluation• Conclusion

Page 3: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

3

Buildingonlineservicestoday…

Page 4: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

4

DesiredProperties• Availability– FaultTolerance• Low-latency• High-throughput• StrongConsistency

Page 5: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

5

DesiredProperties• Availabilityunderfaults• Low-latency• High-throughput• StrongConsistency

Replication

DistributedSystem

Page 6: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

6

ReplicatingStateful Systems• Stateful systems• e.g.databases,in-memorycaches

• Requiresheavycoordinationamongnodes• tomaintainconsistency

• Expensiveinwidearea• duetohighlatencylinks

Page 7: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

7

CAPTheorem[Brewer1997]

Availability

PartitionTolerance

Consistency

Page 8: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

8

CAPTheorem[Brewer1997]

Availability

PartitionTolerance

Consistency

Page 9: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

9

CAPTheorem[Brewer1997]

Availability

PartitionTolerance

Consistency

Page 10: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

10

CAPTheorem[Brewer1997]

Availability

PartitionTolerance

Consistency

Page 11: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

11

ThesisContribution

Availability

PartitionTolerance

Consistency Availability

PartitionTolerance

ConsistencyUnderfaults*

*onlywhenmajorityofnodesinthesystemfail

Page 12: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

12

StateMachineReplication[F.B.Schneider1990]• Executecommands inthesameorderinallreplicas.

C1

C2 C3

Page 13: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

13

StateMachineReplication• Executecommandsinthesameorderinallreplicas

C2

C1

C3C2C2 C3 C3

C1C1

Consensus

Page 14: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

14

Paxos [Lamport 2001]

• Agreementprotocol• Choosecommandperslot• 2RTTs• Oneforslotownership• Onefordecidingcommand

• SurvivesFcrashesin2F+1nodes

1 2 3 4 5 …

C1 C2 C3

Letmeownslot1

Letmeownslot1

Letmeownslot1

Page 15: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

15

Paxos• Agreementprotocol• Choosecommandperslot• 2RTTs• Oneforslotownership• Onefordecidingcommand

• SurvivesFcrashesin2F+1nodes

1 2 3 4 5 …

C1 C2 C3

VoteOrange VoteBlueVoteBlue

Page 16: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

16

Paxos• Agreementprotocol• Choosecommandperslot• 2RTTs• Oneforslotownership• Onefordecidingcommand

• SurvivesFcrashesin2F+1nodes

1 2 3 4 5 …

C1 C3

C2

Page 17: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

17

Paxos variants• Onereplicadecides:• Multi-Paxos,FastPaxos

• Roundrobinapproach:• Mencius[Mao‘2008]

1 2 3 4 5 …

C1 C2 C3

1 2 3 4 5 …

C1 C2 C3C4 C5

Page 18: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

18

GeneralizedConsensus[Lamport 2005]

• Orderonlynon-commutativecommands• e.g.samekeyinKey-Valuestore.

• EPaxos [Moraru ‘13],M2Paxos[Peluso ’16]• Bestperformanceundernoconflicts(FastDecision,1RTT)• Performancedegradesunderconflicts(SlowDecision,1+RTT)

• Thesiscontribution:CAESAR

Page 19: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

19

üIntroductionüMotivation

CAESAR• Evaluation• Conclusion

Page 20: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

20

Overview• ImplementsGeneralizedConsensus• Useslogicaltimestampingtoordercommands(likeMencius)• Exploitsquorumsbygatheringdependenciesforcommands(likeEPaxos)• Deploysanovelwait conditiontoboostfastdecisions• underconflicts

Page 21: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

21

Systemmodel• Nodescommunicatethroughmessagepassing• Usestwoquorumtypes• ClassicQuorum: !" +1

• FastQuorum: #!$• Fourphases• FastPropose• SlowPropose• Retry• Stable

Page 22: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

22

UniformReliableBroadcast

PROPOSE:

PROPOSE:C

STABLE:

STABLE:C

OK:C

OK:OK:

OK:C

p0

p1

p2

p3

p4c&

c& c&

c&

c&c&

c&

c&

c&

c

c

c

c

c

Page 23: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

23

PROPOSE:|4

PROPOSE:C|0

STABLE:|4|{C}

STABLE:C|0|{}

OK:C|{}

OK:|{C}OK:|{}

OK:C|{}

p0

p1

p2

p3

p4 c&

c

c& c&c&

c&

c&

c&

c&

c&

c

c

c

c

CAESAR:BasicProtocol

Page 24: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

24

PROPOSE:|4

PROPOSE:C|0

STABLE:|4|{C}

STABLE:C|0|{}

OK:C|{}

OK:|{C} OK:|{}

OK:C|{}

p0

p1

p2

p3

p4

WAIT

c&c& c&

c&

c&

c&

c&

c&

c&

c

c

c

c

c

CAESAR:WaitCondition(fastpath)

Page 25: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

25

PROPOSE:|4

PROPOSE:C|0

STABLE:|4|{}

RETRY:C|5|{}

OK:C|{}

OK:|{} OK:|{}

NACK:C|{} OK:C|{}

STABLE:C|5|{}p0

p1

p2

p3

p4

WAIT

c&c&

c&

c&

c&

c&

c& c&

c&c&

c&

c&

c&

c

c

c

c

c

CAESAR:WaitCondition(slowpath)

Page 26: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

26

üIntroductionüMotivationüTheContribution:Caesar

Evaluation• Conclusion

Page 27: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

27

Implementation• CAESAR• ImplementedinJava8• AdoptedandmodifiedJPaxos

• usednetwork,messagingandstatemachineabstractions

• Competitors• EPaxos,M2Paxos,Mencius,Multi-Paxos publiclyavailable

• Benchmark• FullyreplicatedKey-Valuestorebenchmark

Page 28: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

28

Deployment

Ireland

Frankfurt

Mumbai

Ohio

Virginia

• AmazonEC2• m4.2xlargeinstances(8vCPUs,32GBMemory)

Page 29: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

29

Client-perceivedperformance

Page 30: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

30

Latency

50

70

90

110

130

150

170

190

210

230

250

0% 2% 10% 30% 50% 100%

Latency(msec)

Virginia

50

70

90

110

130

150

170

190

210

230

250

0% 2% 10% 30% 50% 100%

Ohio

50

70

90

110

130

150

170

190

210

230

250

0% 2% 10% 30% 50% 100%

Frankfurt

50

70

90

110

130

150

170

190

210

230

250

0% 2% 10% 30% 50% 100%

Ireland

50

70

90

110

130

150

170

190

210

230

250

0% 2% 10% 30% 50% 100%

Mumbai

50

70

90

110

130

150

170

190

210

230

250

0% 2% 10% 30% 50% 100%

Ireland

Caesar EPaxos M2Paxos

• Closed-looprequestinjection• 10clientspernode

Conflict%

Page 31: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

31

Systemperformance

Page 32: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

32

Throughput

0

10

20

30

40

50

0% 2% 10% 30% 50% 100%

Throug

hput(1000x

tps) EPaxos CaesarM2Paxos Multi-Paxos-IRMulti-Paxos-IN Mencius

0

100

200

300

400Throug

hput(1000x

tps)

• Openlooprequestinjection

NetworkBatchingEnabled

NetworkBatchingDisabled

17%

24%

45%

Conflict%

Page 33: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

33

SlowPathsvsEPaxos

0% 20% 40% 60% 80%

100%

0% 2% 10% 30% 50% 100%

%ofSlowPaths EPaxos Caesar

1.64%0.4%

44.5%

14.7%6.7%2.0%

29.83%

50.0%

9.84%

100%

Conflict%

Page 34: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

34

üIntroductionüMotivationüTheContribution:CaesarüEvaluation

Conclusion

Page 35: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

35

Conclusion• CAESARprovidesallthedesiredpropertiesforbuildingtoday’sservices.

HighAvailabilityunder faultsStateMachineReplication andConsensus

Strongconsistency

Low-latency Fast Paths

High-throughput Generalizedconsensusandsimpleexecutionphase

Page 36: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

36

FutureWork• CAESARcanexhibitalatencydistributionwithalargestandarddeviation• duetomultiplephasesandwaitcondition

• ThisisachallengeformeetingServiceLevelAgreement(SLAs)• Futureworkshouldminimizelargetaillatencies

Page 37: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

37

Submission

SubmittedtoDSN2017

Page 38: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

38

Sourcecode• Opensource@https://www.github.com/ibalajiarun/caesar

Page 39: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

39

References• Lamport,Leslie,“Paxos madesimple,”ACMSigact News,2001.• I.Moraru,D.G.Andersen,andM.Kaminsky,“ThereisMoreConsensusinEgalitarianParliaments,”inProceedingsoftheTwenty-FourthACMSymposiumonOperatingSystemsPrinciples,ser.SOSP’13.ACM,2013,pp.358–372.

• Y.Mao,F.P.Junqueira,andK.Marzullo,“Mencius:BuildingEfficientReplicatedStateMachinesforWANs,”inProceedingsofthe8thUSENIXConferenceonOperatingSystemsDesignandImplementation,ser.OSDI’08.USENIXAssociation,2008,pp.369–384.

• L.Lamport,“GeneralizedConsensusandPaxos,”MicrosoftResearch,Tech.Rep.MSR- TR-2005-33,March2005.

• S.Peluso,A.Turcu,R.Palmieri,G.Losa,andB.Ravindran,“Makingfastconsensusgenerallyfaster,”inDSN,2016.

• F.B.Schneider,“Implementingfault-tolerantservicesusingthestatemachineapproach:Atutorial,”ACMComput.Surv.,vol.22,no.4,pp.299–319,Dec.1990.[Online].Available:http://doi.acm.org/10.1145/98163.98167

Page 40: A Low-latency Consensus Algorithm for Geographically ...1 A Low-latency Consensus Algorithm for Geographically Replicated Sites Master of Science Thesis Defense Balaji Arun Committee:

40

Thanksto:Myadvisor,Dr.BinoyRavindran,

Roberto,Sebastiano

Dr.Broadwater,Dr.Zeng

and

AllmyfellowSSRGians(presentandpast)