Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I....

33
Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya

Transcript of Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I....

Page 1: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

1

Byzantine Fault Tolerance

CS 425: Distributed SystemsFall 2011

Material drived from slides by I. Gupta and N.Vaidya

Page 2: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

2

Reading List

• L. Lamport, R. Shostak, M. Pease, “The Byzantine Generals Problem,” ACM ToPLaS 1982.

• M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” OSDI 1999.

Page 3: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

Byzantine Generals Problem

A sender wants to send message to n-1 other peers

• Fault-free nodes must agree

• Sender fault-free agree on its message

• Up to f failures

Page 4: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

Byzantine Generals Problem

A sender wants to send message to n-1 other peers

• Fault-free nodes must agree

• Sender fault-free agree on its message

• Up to f failures

Page 5: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

321

v

vv

Byzantine Generals Algorithm

5

value v

Faulty peer

Page 6: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

321

v

vv

6

value v

v v

Byzantine Generals Algorithm

Page 7: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

321

v

vv

7

value v

v v

?

?

Byzantine Generals Algorithm

Page 8: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

321

v

vv

8

value v

v

v v

?

v?

Byzantine Generals Algorithm

Page 9: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

32

v

vv

9

value v

x

v v

?

v?[v,v,?]

[v,v,?]

S

1

Byzantine Generals Algorithm

Page 10: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

32

v

vv

10

value v

x

v v

?

v?v

v

S

1Majorityvote resultsin correctresult atgood peers

Byzantine Generals Algorithm

Page 11: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

321

v

wx

11

Faulty source

Byzantine Generals Algorithm

Page 12: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

32

v

w

x

12

w w1

Byzantine Generals Algorithm

Page 13: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

32

v

wx

13x

w w

v

xv

1

Byzantine Generals Algorithm

Page 14: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

32

v

wx

14x

w w

v

xv

1[v,w,x]

[v,w,x]

[v,w,x]

Byzantine Generals Algorithm

Page 15: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

S

32

v

wx

15x

w w

v

xv

1[v,w,x]

[v,w,x]

[v,w,x]

Vote resultidentical atgood peers

Byzantine Generals Algorithm

Page 16: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

Known Results

• Need 3f + 1 nodes to tolerate f failures

• Need Ω(n2) messages in general

16

Page 17: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

Ω(n2) Message Complexity

• Each message at least 1 bit

• Ω(n2) bits “communication complexity” to agree on just 1 bit value

17

Page 18: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

18

Practical Byzantine Fault Tolerance

• Computer systems provide crucial services• Computer systems fail– Crash-stop failure– Crash-recovery failure– Byzantine failure

• Example: natural disaster, malicious attack, hardware failure, software bug, etc.

• Need highly available service Replicate to increase availability

Page 19: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

19

Challenges

Request A Request B

Client Client

Page 20: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

20

Requirements

• All replicas must handle same requestsdespite failure.

• Replicas must handle requests in identical order despite failure.

Page 21: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

21

Challenges

2: Request B

1: Request A

Client Client

Page 22: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

22

State Machine Replication

2: Request B

1: Request A

2: Request B

1: Request A

2: Request B

1: Request A

2: Request B

1: Request A

Client Client

How to assign sequence number to requests?

Page 23: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

23

Primary Backup Mechanism

Client Client

2: Request B

1: Request A

What if the primary is faulty?Agreeing on sequence number

Agreeing on changing the primary (view change)

View 0

Page 24: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

24

Normal Case Operation

• Three phase algorithm:– PRE-PREPARE picks order of requests– PREPARE ensures order within views– COMMIT ensures order across views

• Replicas remember messages in log• Messages are authenticated– .σk denotes a message sent by k

Page 25: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

25

Pre-prepare Phase

Primary: Replica 0

Replica 1

Replica 2

Replica 3

Request: m

PRE-PREPARE, v, n, mσ0

Fail

Page 26: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

26

Prepare PhaseRequest: m

PRE-PREPARE

Primary: Replica 0

Replica 1

Replica 2

Replica 3 Fail

Accepted PRE-PREPARE

Page 27: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

27

Prepare PhaseRequest: m

PRE-PREPARE

Primary: Replica 0

Replica 1

Replica 2

Replica 3 Fail

PREPARE, v, n, D(m), 1σ1

Accepted PRE-PREPARE

Page 28: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

28

Prepare PhaseRequest: m

PRE-PREPARE

Primary: Replica 0

Replica 1

Replica 2

Replica 3 Fail

PREPARE, v, n, D(m), 1σ1

Accepted PRE-PREPARE

Collect PRE-PREPARE + 2f matching PREPARE

Page 29: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

29

Commit PhaseRequest: m

PRE-PREPARE

Primary: Replica 0

Replica 1

Replica 2

Replica 3 Fail

PREPARE

COMMIT, v, n, D(m)σ2

Page 30: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

30

Commit Phase (2)Request: m

PRE-PREPARE

Primary: Replica 0

Replica 1

Replica 2

Replica 3 Fail

PREPARE COMMIT

Collect 2f+1 matching COMMIT: execute and reply

Page 31: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

31

View Change

• Provide liveness when primary fails– Timeouts trigger view changes– Select new primary (= view number mod 3f+1)

• Brief protocol– Replicas send VIEW-CHANGE message along with

the requests they prepared so far– New primary collects 2f+1 VIEW-CHANGE messages– Constructs information about committed requests

in previous views

Page 32: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

32

View Change Safety

• Goal: No two different committed request with same sequence number across views

Quorum for Committed Certificate (m, v, n)

At least one correct replica has Prepared Certificate (m, v, n)

View Change Quorum

Page 33: Byzantine Fault Tolerance CS 425: Distributed Systems Fall 2011 1 Material drived from slides by I. Gupta and N.Vaidya.

33

Related WorksFault Tolerance

Fail Stop Fault Tolerance

Paxos1989 (TR)

VS ReplicationPODC 1988

Byzantine Fault Tolerance

Byzantine Agreement

RampartTPDS 1995

SecureRingHICSS 1998

PBFT OSDI ‘99

BASETOCS ‘03

Byzantine Quorums

Malkhi-ReiterJDC 1998

PhalanxSRDS 1998

FleetToKDI ‘00

Q/USOSP ‘05

Hybrid Quorum

HQ Replication OSDI ‘06