
Clock-RSM: Low-Latency Inter-Datacenter State Machine Replication Using Loosely Synchronized Physical Clocks

Jiaqing Du, Daniele Sciascia, Sameh Elnikety, Willy Zwaenepoel, Fernando Pedone

EPFL, University of Lugano, Microsoft Research

Replicated State Machines (RSM)

• Strong consistency
  – Execute same commands in same order
  – Reach same state from same initial state

• Fault tolerance
  – Store data at multiple replicas
  – Failure masking / fast failover


Geo-Replication

[Diagram: five geo-distributed data centers, each holding a replica]

• High latency among replicas
• Messaging dominates replication latency


Leader-Based Protocols

• Order commands by a leader replica
• Require extra ordering messages at followers

[Diagram: a client request arrives at a follower; ordering at the leader, replication, and an extra ordering message back to the follower all happen before the client reply, giving high latency for geo-replication]

Clock-RSM

• Orders commands using physical clocks
• Overlaps ordering and replication


[Diagram: the replica receiving the client request performs ordering and replication in one overlapped step before the client reply, giving low latency for geo-replication]

Outline

• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion



Properties and Assumptions

• Provides linearizability
• Tolerates failure of a minority of replicas
• Assumptions
  – Asynchronous FIFO channels
  – Non-Byzantine faults
  – Loosely synchronized physical clocks


Protocol Overview

[Diagram: two replicas each receive a client request and timestamp their command with the local clock (cmd1.ts = Clock(), cmd2.ts = Clock()); both cmd1 and cmd2 are then logged at every replica]

Major Message Steps

• Prep: Ask everyone to log a command
• PrepOK: Tell everyone after logging a command

[Diagram: five replicas R0 through R4; R0 receives a client request, assigns cmd1.ts = 24, and sends Prep to all replicas; each replica broadcasts PrepOK after logging cmd1; meanwhile R4 receives another client request and assigns cmd2.ts = 23; R0 must decide when cmd1 is committed]
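To make the message flow concrete, here is a minimal Python sketch of these two steps: timestamping a command with the local clock, broadcasting Prep, and acknowledging with PrepOK. The class, the network.send interface, and tie-breaking equal timestamps by replica id are illustrative assumptions, not the paper's pseudocode.

    import time

    class Replica:
        """Sketch of the Prep/PrepOK message steps (hypothetical API)."""

        def __init__(self, my_id, peer_ids, network):
            self.my_id = my_id
            self.peer_ids = peer_ids      # ids of the other replicas
            self.network = network        # assumed to offer send(dst, msg)
            self.log = {}                 # (ts, origin) -> command
            self.acks = {}                # (ts, origin) -> replicas known to have logged it

        def clock(self):
            # Loosely synchronized physical clock (e.g. disciplined by NTP).
            return time.time()

        def on_client_request(self, cmd):
            ts = self.clock()             # commands are ordered by (ts, origin id) here
            self._log(ts, self.my_id, cmd)
            for peer in self.peer_ids:    # Prep: ask everyone to log the command
                self.network.send(peer, ("Prep", ts, self.my_id, cmd))

        def on_prep(self, ts, origin, cmd):
            self._log(ts, origin, cmd)
            for peer in self.peer_ids:    # PrepOK: tell everyone after logging it
                self.network.send(peer, ("PrepOK", ts, origin, self.my_id))

        def on_prep_ok(self, ts, origin, sender):
            self.acks.setdefault((ts, origin), set()).add(sender)

        def _log(self, ts, origin, cmd):
            self.log[(ts, origin)] = cmd
            self.acks.setdefault((ts, origin), set()).add(self.my_id)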

Commit Conditions

• A command is committed if
  – It is replicated by a majority
  – All commands ordered before it are committed

• Wait until three conditions hold (sketched below)
  C1: Majority replication
  C2: Stable order
  C3: Prefix replication

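A minimal Python sketch of how a replica might test the three conditions before executing a command; the data structures (log, acks, latest_ts) and helper names are illustrative assumptions, not the paper's own code.

    def majority(n_replicas):
        """Smallest group size that is more than half of n_replicas."""
        return n_replicas // 2 + 1

    def is_committed(cmd_key, log, acks, latest_ts, n_replicas):
        """cmd_key = (timestamp, origin id); commands are ordered by this key.

        log       -- dict: cmd_key -> command logged at this replica
        acks      -- dict: cmd_key -> set of replica ids known to have logged it
        latest_ts -- dict: peer id -> latest timestamp received from that peer
        """
        ts, _origin = cmd_key

        # C1: majority replication: more than half of the replicas logged cmd.
        if len(acks.get(cmd_key, ())) < majority(n_replicas):
            return False

        # C2: stable order: a greater timestamp has arrived from every other
        # replica, so no command that sorts before cmd can still show up.
        if len(latest_ts) < n_replicas - 1:
            return False
        if any(peer_ts <= ts for peer_ts in latest_ts.values()):
            return False

        # C3: prefix replication: every command ordered before cmd is itself
        # replicated by a majority.
        return all(len(acks.get(k, ())) >= majority(n_replicas)
                   for k in log if k < cmd_key)

In the example of the following slides, cmd1 (ts = 24) commits at R0 once a majority has acknowledged it, every peer has sent a timestamp greater than 24, and cmd2 (ts = 23) is itself majority-replicated.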

C1: Majority Replication

• More than half of the replicas log cmd1

[Diagram: R0 assigns cmd1.ts = 24 and sends Prep; once R1 and R2 reply with PrepOK, cmd1 is replicated by R0, R1, R2, which takes 1 RTT between R0 and the closest majority]

C2: Stable Order

• Replica knows all commands ordered before cmd1
  – Receives a greater timestamp from every other replica

[Diagram: cmd1.ts = 24 at R0; after R0 has received a message (Prep, PrepOK, or ClockTime) carrying timestamp 25 from each of R1 through R4, cmd1 is stable at R0, which takes 0.5 RTT between R0 and its farthest peer]
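The figure's legend lists Prep, PrepOK, and ClockTime messages: a replica with nothing to propose still broadcasts its current clock reading, so peers can keep advancing their stable point. A small sketch of how that point could be tracked; the class and method names are assumptions for illustration.

    class StableOrderTracker:
        """Tracks the latest timestamp received from each peer (via Prep,
        PrepOK, or ClockTime) and derives the point below which the command
        order is stable at this replica. Illustrative sketch only."""

        def __init__(self, peer_ids):
            self.latest_ts = {p: 0.0 for p in peer_ids}

        def on_message(self, sender, ts):
            # Every Prep / PrepOK / ClockTime carries the sender's clock value.
            self.latest_ts[sender] = max(self.latest_ts[sender], ts)

        def stable_point(self):
            # Commands timestamped below every peer's latest value can no longer
            # be preceded by a command this replica has not yet seen (FIFO channels).
            return min(self.latest_ts.values())

In the figure, cmd1 (ts = 24) becomes stable at R0 once stable_point() reaches 25, i.e. after a message with a larger timestamp has arrived from each of R1 through R4, which takes about 0.5 RTT to the farthest peer.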

C3: Prefix Replication

• All commands ordered before cmd1 are replicated by a majority


[Diagram: cmd2.ts = 23 originates at R4 and is ordered before cmd1.ts = 24 at R0; R0 can commit cmd1 only after learning that cmd2 is replicated by a majority (R1, R2, R3), which takes about 1 RTT: R4 to a majority plus the majority to R0]

Overlapping Steps


[Diagram: for cmd1 (ts = 24) at R0, the Prep/PrepOK exchanges that establish majority replication, stable order, and prefix replication all run in parallel, so the latency of cmd1 is about 1 RTT to the majority before the client reply]

Commit Latency

Step                  Latency
Majority replication  1 RTT (majority1)
Stable order          0.5 RTT (farthest)
Prefix replication    1 RTT (majority2)

Overall latency = MAX{ 1 RTT (majority1), 0.5 RTT (farthest), 1 RTT (majority2) }


If 0.5 RTT (farthest) < 1 RTT (majority), then overall latency ≈ 1 RTT (majority).
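As a worked example of the formula, a small Python helper; the RTT figures below are purely illustrative, not measurements from the paper.

    def clock_rsm_commit_latency(rtt_majority1, rtt_farthest, rtt_majority2):
        """Commit latency at the originating replica, per the formula above:
        MAX of majority replication, stable order, and prefix replication."""
        return max(rtt_majority1, 0.5 * rtt_farthest, rtt_majority2)

    # Illustrative numbers only: with a 100 ms RTT to the closest majority and
    # a 180 ms RTT to the farthest peer, 0.5 * 180 = 90 < 100, so the overall
    # latency is about 1 RTT to the majority.
    print(clock_rsm_commit_latency(100, 180, 100))   # -> 100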


Topology Examples

[Diagram: two example placements of five replicas R0 through R4; in each, the client request arrives at R0, and R0's closest majority (Majority1) and farthest replica are marked]

Outline

• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion


Paxos 1: Multi-Paxos

• Single leader orders commands
  – Logical clock: 0, 1, 2, 3, ...

[Diagram: a client request at follower R0 is forwarded to the leader (R2); the leader sends Prep, collects PrepOK from a majority, and sends Commit back to R0, which then replies to the client]

Latency at followers: 2 RTTs (leader & majority)

Paxos 2: Paxos-bcast

• Every replica broadcasts PrepOK
  – Trades off message complexity for latency

[Diagram: the client request at follower R0 is forwarded to the leader (R2); the leader sends Prep, and every replica broadcasts PrepOK, so R0 learns the outcome directly from a majority and replies to the client]

Latency at followers: 1.5 RTTs (leader & majority)

Clock-RSM vs. Paxos

• With realistic topologies, Clock-RSM has
  – Lower latency at Paxos follower replicas
  – Similar / slightly higher latency at Paxos leader


Protocol     Latency
Clock-RSM    All replicas: 1 RTT (majority), if 0.5 RTT (farthest) < 1 RTT (majority)
Paxos-bcast  Leader: 1 RTT (majority); Follower: 1.5 RTTs (leader & majority)
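A back-of-the-envelope comparison in Python, under the simplifying assumption that inter-replica RTTs are roughly comparable; the millisecond figures are hypothetical, not the paper's measurements.

    def paxos_bcast_latency(rtt, at_leader):
        """Per the table above: about 1 RTT (majority) at the leader and
        about 1.5 RTTs (leader & majority) at a follower."""
        return rtt if at_leader else 1.5 * rtt

    def clock_rsm_latency(rtt_majority, rtt_farthest):
        """Same at every replica, provided 0.5 RTT (farthest) < 1 RTT (majority)."""
        return max(rtt_majority, 0.5 * rtt_farthest)

    # Illustrative only: with ~100 ms majority RTTs and a 180 ms farthest RTT,
    # Clock-RSM commits in ~100 ms everywhere, while Paxos-bcast takes ~100 ms
    # at the leader but ~150 ms at a follower.
    print(clock_rsm_latency(100, 180))                 # -> 100
    print(paxos_bcast_latency(100, at_leader=True))    # -> 100
    print(paxos_bcast_latency(100, at_leader=False))   # -> 150.0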

Outline

• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion


Experiment Setup

• Replicated key-value store
• Deployed on Amazon EC2

[Map: five Amazon EC2 regions: California (CA), Virginia (VA), Ireland (IR), Singapore (SG), Japan (JP)]


Latency (1/2)

• All replicas serve client requests


Overlapping vs. Separate Steps

[Diagram: the same five regions, CA, VA (Paxos leader), IR, SG, JP; Clock-RSM's latency is the max of its three overlapped steps, while Paxos-bcast's latency is the sum of its separate steps]

Latency (2/2)

• Paxos leader is changed to CA


Throughput

• Five replicas on a local cluster
• Message batching is key

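Batching here means packing many pending commands into a single Prep message (with one PrepOK acknowledging the whole batch), which amortizes per-message overhead at high load. A minimal sketch of such a batcher; the names and the flush policy are assumptions for illustration, not the paper's implementation.

    class PrepBatcher:
        """Collects commands and flushes them as one Prep message when the
        batch is full; flush() could also be driven by a periodic timer
        (the timer is omitted in this sketch)."""

        def __init__(self, send_prep, max_batch=64):
            self.send_prep = send_prep    # callable sending one Prep for a list of commands
            self.max_batch = max_batch
            self.pending = []

        def add(self, cmd):
            self.pending.append(cmd)
            if len(self.pending) >= self.max_batch:
                self.flush()

        def flush(self):
            if self.pending:
                self.send_prep(self.pending)
                self.pending = []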

Also in the Paper

• A reconfiguration protocol
• Comparison with Mencius
• Latency analysis of protocols


Conclusion

• Clock-RSM: low-latency geo-replication
  – Uses loosely synchronized physical clocks
  – Overlaps ordering and replication

• Leader-based protocols can incur high latency
