Tolerating Latency in Replicated State Machines through Client Speculation

27
Tolerating Latency in Replicated State Machines through Client Speculation April 22, 2009 Benjamin Wester 1 , James Cowling 2 , Edmund B. Nightingale 3 , Peter M. Chen 1 , Jason Flinn 1 , Barbara Liskov 2 University of Michigan 1 , MIT CSAIL 2 , Microsoft Research 3

description

Tolerating Latency in Replicated State Machines through Client Speculation. April 22, 2009 Benjamin Wester 1 , James Cowling 2 , Edmund B. Nightingale 3 , Peter M. Chen 1 , Jason Flinn 1 , Barbara Liskov 2 University of Michigan 1 , MIT CSAIL 2 , Microsoft Research 3. - PowerPoint PPT Presentation

Transcript of Tolerating Latency in Replicated State Machines through Client Speculation

Page 1: Tolerating Latency in Replicated State Machines through Client Speculation

Tolerating Latency in Replicated State Machines through Client Speculation

April 22, 2009

Benjamin Wester1, James Cowling2, Edmund B. Nightingale3,

Peter M. Chen1, Jason Flinn1, Barbara Liskov2

University of Michigan1, MIT CSAIL2, Microsoft Research3

Page 2: Tolerating Latency in Replicated State Machines through Client Speculation

Simple Service Configuration

NSDI'09 Benjamin Wester University of Michigan CSE

2

x=011

++x++x

x=1

Page 3: Tolerating Latency in Replicated State Machines through Client Speculation

Replicated State Machines (RSM)

3

x=1

x=1

x=1

++x++x

22

22

22

22

x=1

x=2

x=2

x=2

x=2

++x++x

++x++x

++x++x

• Agree on request• All non-faulty replies are

identical

NSDI'09 Benjamin Wester University of Michigan CSE

Page 4: Tolerating Latency in Replicated State Machines through Client Speculation

RSMs have high latency

4

222222

x=2

x=2

x=2

x=2

1. Need many replies

2. Agreement

3. Geographic Distribution

NSDI'09 Benjamin Wester University of Michigan CSE

Page 5: Tolerating Latency in Replicated State Machines through Client Speculation

5

Hide the Latency

• Use speculative execution inside RSM• Speculate before consensus is reached

– Without faults, any reply predicts consensus value– Let client continue after receiving one reply

NSDI'09 Benjamin Wester University of Michigan CSE

Page 6: Tolerating Latency in Replicated State Machines through Client Speculation

6

Overview

• Introduction

• Improving RSMs with speculation

• Application to PBFT

• Performance

• Conclusion

NSDI'09 Benjamin Wester University of Michigan CSE

Page 7: Tolerating Latency in Replicated State Machines through Client Speculation

Speculative Execution in RSM

• Continue processing while waiting

7

Blocked

Take Checkpoint

x=1

x=1

Predict: 1Speculate! Commit

x=1

x=1

x=2

x=2

Rollback

NSDI'09 Benjamin Wester University of Michigan CSE

Page 8: Tolerating Latency in Replicated State Machines through Client Speculation

Critical path: first reply

• Completion latency less relevant• First reply latency sets critical path

– Speed

– Accuracy

• Other desirable properties– Throughput

– Stability under contention

– Smaller number of replicas

8

1111

NSDI'09 Benjamin Wester University of Michigan CSE

Page 9: Tolerating Latency in Replicated State Machines through Client Speculation

Requests while speculative

1. Hold request– Bad performance

2. Distributed commit/rollback– State tracking complex

9

while !check_lottery():submit_tps()

buy_corvette()

win?

win?

yesyes

Predict win? = yes

buybuy

What do we do with this?

NSDI'09 Benjamin Wester University of Michigan CSE

Page 10: Tolerating Latency in Replicated State Machines through Client Speculation

buybuy

Resolve speculations on the replicas

• Explicitly encode dependencies as predicates• No special request handling needed• Replicas need to log past replies• Local decision at replicas matches client

10

buy

yesyes

if win?=yes: buyif win?=yes: buy

yesyes

NSDI'09 Benjamin Wester University of Michigan CSE

while !check_lottery():submit_tps()

buy_corvette()

win?

win?

win? = yes

keys

keys

buy = keys

Predict win? = yes

Page 11: Tolerating Latency in Replicated State Machines through Client Speculation

11

Overview

• Introduction

• Improving RSMs with speculation

• Application to PBFT

• Performance

• Conclusion

NSDI'09 Benjamin Wester University of Michigan CSE

Page 12: Tolerating Latency in Replicated State Machines through Client Speculation

Practical BFT

12

f=1

primary

client

-CS[Castro and Liskov 1999]

NSDI'09 Benjamin Wester University of Michigan CSE

Page 13: Tolerating Latency in Replicated State Machines through Client Speculation

13

Additional Details

• Tentative execution– PBFT/PBFT-CS complete in 4 phases

• Read-only optimization– Accurate answer from backup replica

• Failure threshold– Bound worst-case failure

• Correctness

NSDI'09 Benjamin Wester University of Michigan CSE

Page 14: Tolerating Latency in Replicated State Machines through Client Speculation

14

Overview

• Introduction

• Improving RSMs with speculation

• Application to PBFT

• Performance

• Conclusion

NSDI'09 Benjamin Wester University of Michigan CSE

Page 15: Tolerating Latency in Replicated State Machines through Client Speculation

Benchmarks

• Shared counter– Simple checkpoint– No computation

• NFS: Apache httpd build– Complex checkpoint– Significant computation

15NSDI'09 Benjamin Wester University of Michigan CSE

Page 16: Tolerating Latency in Replicated State Machines through Client Speculation

16

Topology

2.5 or 15 msPrimary

1. Primary-local2. Primary-remote3. Uniform

NSDI'09 Benjamin Wester University of Michigan CSE

Page 17: Tolerating Latency in Replicated State Machines through Client Speculation

17

Base case: no replication

2.5 or 15 ms

1. Primary-local2. Primary-remote3. Uniform

NSDI'09 Benjamin Wester University of Michigan CSE

Page 18: Tolerating Latency in Replicated State Machines through Client Speculation

Shared Counter

18

Primary-local topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 19: Tolerating Latency in Replicated State Machines through Client Speculation

Shared Counter

19

Primary-local topology

[Kotla et al. 07]

NSDI'09 Benjamin Wester University of Michigan CSE

Page 20: Tolerating Latency in Replicated State Machines through Client Speculation

Shared Counter

20

Uniform & Primary-remote topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 21: Tolerating Latency in Replicated State Machines through Client Speculation

Shared Counter

21NSDI'09 Benjamin Wester University of Michigan CSE

Uniform & Primary-remote topology

Page 22: Tolerating Latency in Replicated State Machines through Client Speculation

NFS: Apache build

22

Primary-local topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 23: Tolerating Latency in Replicated State Machines through Client Speculation

NFS: Apache build

23

Uniform topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 24: Tolerating Latency in Replicated State Machines through Client Speculation

NFS: Apache build

24

Primary-remote topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 25: Tolerating Latency in Replicated State Machines through Client Speculation

NFS: With Failure

25

(1% fail)

Primary-local topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 26: Tolerating Latency in Replicated State Machines through Client Speculation

Throughput (Shared Counter)

26

LAN topology

NSDI'09 Benjamin Wester University of Michigan CSE

Page 27: Tolerating Latency in Replicated State Machines through Client Speculation

27

Conclusion

• Integrate client speculation within RSMs• Predicated requests: performance without complexity• Clients less sensitive to latency between replicas• 5x speedup over non-speculative protocol

Makes WAN deployments more practical

NSDI'09 Benjamin Wester University of Michigan CSE