HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel...
HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems
James Cowling¹, Daniel Myers¹, Barbara Liskov¹, Rodrigo Rodrigues², Liuba Shrira³
¹MIT CSAIL
²INESC-ID and Instituto Superior Técnico
³Brandeis University
Byzantine Fault Tolerance
› Reliable client-server distributed systems
» Server replicated across group of replica machines
› General operations
› Bounded number f of Byzantine replicas
› Must ensure correct system state
» Consistent ordering of client operations
State of the Art
› Approaches:
» State Machine Replication – BFT
  - 3f+1 replicas
» Byzantine Quorums – Q/U
  - 5f+1 replicas
  - Increased performance
  - Degradation when writes contend
Contributions
› Low-overhead Byzantine Fault Tolerance
» Performance of Byzantine Quorums without 5f+1 replicas or contention degradation
› Hybrid Quorum scheme for Byzantine Fault Tolerance
» Quorum approach in normal-case
» Use Byzantine agreement to resolve write contention
Outline
› Current Approaches
› HQ Replication
› BFT Improvements
› Performance Evaluation
› Conclusions
State Machine Replication
› BFT - Castro and Liskov TOCS ’02
» Operations ordered by primary
» Agreed upon by replicas
[Diagram: message flow among Client, Primary, and Replicas 2-4 through the phases Request, Pre-Prepare, Prepare, Commit, Reply]
Byzantine Quorums
› Q/U - Abd-El-Malek et al. SOSP ’05
› Client-controlled protocol
» Replicas order operations independently
› Optimistic
» Best case: one-phase protocol
» Worst case: unbounded
  - Randomized backoff
[Diagram: Client exchanging Update and Reply messages with Replicas 1-6]
Advantages/Disadvantages
BFT
› Good
» 3f+1 replicas
» Bounded number of phases
› Bad
» Higher latency
» Quadratic communication
Q/U
› Good
» Best-case performance
  - One-phase write
  - Low replica load
› Bad
» 5f+1 replicas
» Degraded performance when writes contend
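The replica-count trade-off can be made concrete with a little arithmetic. The following is a minimal sketch using the 3f+1 and 5f+1 figures from the slides; the function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def bft_replicas(f: int) -> int:
    """Group size for BFT state machine replication: 3f+1."""
    return 3 * f + 1


def qu_replicas(f: int) -> int:
    """Group size for Q/U Byzantine quorums: 5f+1."""
    return 5 * f + 1


def replica_table(max_f: int) -> List[Tuple[int, int, int]]:
    """Rows of (f, BFT group size, Q/U group size) for f = 1..max_f."""
    return [(f, bft_replicas(f), qu_replicas(f)) for f in range(1, max_f + 1)]


if __name__ == "__main__":
    for f, bft, qu in replica_table(5):
        print(f"f={f}: BFT needs {bft} replicas, Q/U needs {qu}")
```

For f=1 this gives 4 and 6 replicas, which agrees with the replica counts drawn in the two protocol diagrams.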
HQ Replication
› 3f+1 replicas
› Supports general operations
› No all-to-all communication in normal-case
› BFT used to resolve contention
HQ Replication
[Diagram: Client and Replicas 1-4 exchanging Write1, Write1 OK, Write2, Write2 OK]
› One-phase read
› Two-phase write
High-level Write Protocol
› Two-phase write protocol
› Phase 1:
» Client obtains timestamp grant from each replica
› Phase 2:
» Client forms certificate from 2f+1 matching grants
» Sends to replicas to complete write
Grants
› Promise to execute operation at given sequence number
» Assuming agreement from quorum
› Grant
» Client ID
» Object ID
» Hash over requested operation
» Sequence Number (timestamp)
» Replica signature
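The grant fields listed above map naturally onto a small record type. This is a sketch only: the field names, the use of SHA-256 for the operation hash, the explicit `replica_id` field, and the placeholder signature bytes are all assumptions made for illustration and are not taken from the slides.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    client_id: int
    object_id: int
    op_hash: str      # hash over the requested operation
    seq_no: int       # sequence number (timestamp)
    replica_id: int   # assumption: identifies the signing replica
    signature: bytes  # replica signature (placeholder bytes in this sketch)

    def ordering_key(self):
        """Fields that must agree across replicas for grants to match."""
        return (self.client_id, self.object_id, self.op_hash, self.seq_no)


def hash_operation(op: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash the protocol uses.
    return hashlib.sha256(op).hexdigest()


def make_grant(client_id: int, object_id: int, op: bytes,
               seq_no: int, replica_id: int) -> Grant:
    # A real replica would sign the grant; a fixed tag stands in here.
    sig = f"sig-by-replica-{replica_id}".encode()
    return Grant(client_id, object_id, hash_operation(op), seq_no, replica_id, sig)
```

Two grants from different replicas are distinct objects but share an `ordering_key()` when they promise the same client, object, operation, and sequence number.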
Certificates
› Certificate
» Quorum (2f+1) matching grants
› Proves quorum of replicas agree to ordering of operation
» Uniquely identify client, operation and sequential ordering
» Existence of certificate precludes existence of conflicting certificate
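A sketch of how a client might assemble a certificate, and of why two conflicting certificates cannot coexist: any two quorums of 2f+1 out of 3f+1 replicas overlap in at least f+1 replicas, so at least one non-faulty replica is in both, and a non-faulty replica does not grant the same sequence number to two different operations. The simplified `Grant` tuple and all function names are hypothetical, and signatures are omitted.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Grant(NamedTuple):
    client_id: int
    seq_no: int
    op: str
    replica_id: int


def quorum_size(f: int) -> int:
    return 2 * f + 1


def form_certificate(grants: List[Grant], f: int) -> Optional[List[Grant]]:
    """Return 2f+1 matching grants from distinct replicas, or None."""
    by_key = defaultdict(dict)
    for g in grants:
        # Keep at most one grant per replica for each (client, seq, op).
        by_key[(g.client_id, g.seq_no, g.op)][g.replica_id] = g
    for matching in by_key.values():
        if len(matching) >= quorum_size(f):
            ordered = sorted(matching.values(), key=lambda g: g.replica_id)
            return ordered[:quorum_size(f)]
    return None


def min_quorum_overlap(f: int) -> int:
    """Smallest possible overlap of two 2f+1 quorums among 3f+1 replicas."""
    return 2 * quorum_size(f) - (3 * f + 1)  # equals f + 1
```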
Replica State
› Multiple independent objects
› State per-object
» Certificate supporting most recent write
» Operation status
  - Active – Write in progress, outstanding grant
  - Quiescent – No current write operation
Write Phase 1
› Client sends write request to replicas
» If quiescent, replica assigns new grant to client
» If active, replica sends currently outstanding grant
› Several Possibilities
» All grants match
» Grants for different client
» Grants conflict
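The Phase 1 rules and the three possible outcomes can be sketched as a per-object replica handler plus a client-side classifier. This is a simplified model under stated assumptions: grants are plain `(client_id, seq_no, operation)` tuples without signatures, the client is assumed to hold exactly a quorum of replies, and the names `ObjectState` and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

GrantT = Tuple[int, int, str]  # (client_id, seq_no, operation)


@dataclass
class ObjectState:
    seq_no: int = 0
    outstanding: Optional[GrantT] = None  # None means the object is quiescent

    def handle_write_request(self, client_id: int, op: str) -> GrantT:
        if self.outstanding is None:
            # Quiescent: assign a new grant at the next sequence number.
            self.outstanding = (client_id, self.seq_no + 1, op)
        # Active: the currently outstanding grant is returned unchanged.
        return self.outstanding


def classify(grants: List[GrantT], client_id: int) -> str:
    """Classify a quorum of Phase 1 replies from the client's point of view."""
    distinct = set(grants)
    if len(distinct) > 1:
        return "conflict"        # grants conflict: request resolution
    owner = next(iter(distinct))[0]
    if owner == client_id:
        return "match"           # all grants match: proceed to Phase 2
    return "other-client"        # grants for a different client: writeback
```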
Isolated Write
[Animation: client 1 writes to replicas 1-3 with no competing client]
› Initial state at each replica
» State: Quiescent, Client: ?, Seq No: 0, Operation: ?
› Client 1 sends Write A to every replica
› Each replica becomes Active and issues a grant
» State: Active, Client: 1, Seq No: 1, Operation: A
» Replica i returns Grant <1,1,A>i
› Matching grants: Phase 2 write
» Client 1 sends Cert {G1,G2,G3} to every replica
› Each replica executes A and becomes Quiescent
» State: Quiescent, Client: 1, Seq No: 1, Operation: A
» Each replica returns Result A
› Client 1 obtains the result: Write Complete
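The isolated write can be replayed end to end with a small in-memory model. This is a sketch under several assumptions: f=1, three replicas respond, grants are `(client, seq_no, op, replica_id)` tuples with no cryptography, and "executing" an operation just appends it to a log. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

F = 1
QUORUM = 2 * F + 1


@dataclass
class Replica:
    rid: int
    state: str = "Quiescent"
    seq_no: int = 0
    grant: Optional[tuple] = None           # (client, seq_no, op)
    log: List[str] = field(default_factory=list)

    def phase1(self, client: int, op: str) -> tuple:
        if self.state == "Quiescent":
            self.state = "Active"
            self.grant = (client, self.seq_no + 1, op)
        return self.grant + (self.rid,)     # tag the grant with its issuer

    def phase2(self, cert: List[tuple]) -> str:
        keys = {g[:3] for g in cert}
        signers = {g[3] for g in cert}
        if len(keys) != 1 or len(signers) < QUORUM:
            raise ValueError("not a valid certificate")
        _, seq_no, op = next(iter(keys))
        self.log.append(op)                 # "execute" the operation
        self.seq_no = seq_no
        self.state = "Quiescent"
        return f"Result {op}"


def isolated_write(replicas: List[Replica], client: int, op: str) -> str:
    grants = [r.phase1(client, op) for r in replicas]    # Phase 1
    results = [r.phase2(grants) for r in replicas]       # Phase 2
    return results[0]
```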
Incomplete Write
[Animation: client 1 stalls after Phase 1; client 2 completes client 1's write before its own]
› Initial state at each replica
» State: Quiescent, Client: ?, Seq No: 0, Operation: ?
› Client 1 sends Write A to every replica
› Each replica becomes Active and issues a grant to client 1
» State: Active, Client: 1, Seq No: 1, Operation: A
» Replica i returns Grant <1,1,A>i
› Client 1 slow or failed
› Client 2 sends Write B to every replica
› Replicas active: Return current grant
» Replica i returns Grant <1,1,A>i to client 2
› Grants for different client: Perform Writeback
» Client 2 sends Cert {G1,G2,G3}, Write B to every replica
› Each replica executes A
» State: Quiescent, Client: 1, Seq No: 1, Operation: A
› Each replica then issues a grant for Write B
» State: Active, Client: 2, Seq No: 2, Operation: B
» Replica i returns Grant <2,2,B>i to client 2
› Matching grants: Phase 2 write
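The writeback path can be sketched in the same style. Assumptions are significant here: the certificate is collapsed to the single matching `(client, seq_no, op)` tuple it certifies, no signatures or quorum counting are modeled, and conflicting grants are simply rejected because they belong to contention resolution. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Replica:
    rid: int
    seq_no: int = 0
    grant: Optional[tuple] = None   # outstanding (client, seq_no, op); None if quiescent
    executed: List[str] = field(default_factory=list)

    def write_request(self, client: int, op: str) -> tuple:
        if self.grant is None:
            self.grant = (client, self.seq_no + 1, op)
        return self.grant

    def complete(self, cert: tuple) -> None:
        """Phase 2 or writeback: execute the certified operation once."""
        _, seq_no, op = cert
        if seq_no == self.seq_no + 1:
            self.executed.append(op)
            self.seq_no = seq_no
            self.grant = None

    def writeback(self, cert: tuple, client: int, op: str) -> tuple:
        self.complete(cert)                    # finish the stalled write
        return self.write_request(client, op)  # then grant the new one


def write_with_writeback(replicas: List[Replica], client: int, op: str) -> tuple:
    grants = {r.write_request(client, op) for r in replicas}
    if len(grants) != 1:
        raise RuntimeError("conflicting grants need contention resolution")
    (grant,) = grants
    if grant[0] != client:
        # Grants name another client: complete its write, then retry ours.
        grants = {r.writeback(grant, client, op) for r in replicas}
        (grant,) = grants
    for r in replicas:
        r.complete(grant)
    return grant
```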
Write Contention
[Animation: clients 1 and 2 write concurrently to replicas 1-3]
› Initial state at each replica
» State: Quiescent, Client: ?, Seq No: 0, Operation: ?
› Client 1 sends Write A; client 2 sends Write B
» Write A reaches replicas 1 and 2 first: State: Active, Client: 1, Seq No: 1, Operation: A
» Write B reaches replica 3 first: State: Active, Client: 2, Seq No: 1, Operation: B
› Replicas return their outstanding grants
» Grant <1,1,A>1, Grant <1,1,A>2, Grant <2,1,B>3
› Conflicting grants: Request resolution
» Resolve Request with Cert {G1,G2,G3} sent to every replica
› Contention Resolution
» Each replica executes A, then executes B
» State: Quiescent, Client: 2, Seq No: 2, Operation: B
› Replicas return Result A; client 1 obtains its result
› Replicas return Result B; client 2 obtains its result
Contention Resolution
› BFT module used to resolve contention
» Establish sequential order on contending ops
› On receiving resolve request:
» Freeze local object state
» Send state to primary
› Primary runs BFT on combined state
› Replicas execute contending operations
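The resolution steps can be illustrated with a toy model. This does not implement BFT: the agreement step is replaced by a single function, `primary_order`, and the tie-break it uses (sorting contending operations by client ID) is an assumption made only so the example is deterministic. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Replica:
    rid: int
    seq_no: int = 0
    grant: Optional[tuple] = None        # outstanding (client, seq_no, op)
    frozen: bool = False
    executed: List[str] = field(default_factory=list)

    def on_resolve_request(self) -> tuple:
        self.frozen = True               # freeze local object state
        return (self.rid, self.seq_no, self.grant)   # state sent to primary

    def apply_order(self, ordered_ops: List[Tuple[int, str]]) -> None:
        for _, op in ordered_ops:        # execute contending ops in order
            self.seq_no += 1
            self.executed.append(op)
        self.grant = None
        self.frozen = False


def primary_order(states: List[tuple]) -> List[Tuple[int, str]]:
    """Stand-in for the BFT step: derive one order for the contending ops."""
    contending = {(g[0], g[2]) for _, _, g in states if g is not None}
    return sorted(contending)            # assumed tie-break: by client ID


def resolve(replicas: List[Replica]) -> List[Tuple[int, str]]:
    states = [r.on_resolve_request() for r in replicas]
    order = primary_order(states)
    for r in replicas:
        r.apply_order(order)
    return order
```

With replicas 1 and 2 holding a grant for client 1's operation A and replica 3 holding a grant for client 2's operation B, every replica executes A and then B and ends at sequence number 2.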
Additional Details
› Read protocol
› State transfer
› Multi-object transactions
› Performance enhancements
Performance Enhancements
› Preferred quorums
» Core protocol run by only 2f+1 replicas
› Symmetric-key cryptography
» Authenticators instead of signatures
  - Collection of 3f+1 MACs <m_{i,1}, m_{i,2}, …, m_{i,n}>
» Lower CPU overhead
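An authenticator is a vector of MACs, one per replica, each computed under the key the sender shares with that replica. The sketch below uses Python's standard `hmac` module; HMAC-SHA256, the key layout, and the function names are assumptions for illustration, not details from the slides.

```python
import hashlib
import hmac
from typing import Dict


def make_authenticator(message: bytes, keys: Dict[int, bytes]) -> Dict[int, bytes]:
    """One MAC per replica, each under the key shared with that replica."""
    return {
        rid: hmac.new(key, message, hashlib.sha256).digest()
        for rid, key in keys.items()
    }


def verify_entry(message: bytes, authenticator: Dict[int, bytes],
                 rid: int, key: bytes) -> bool:
    """A replica checks only its own entry, using its shared key."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(authenticator.get(rid, b""), expected)


if __name__ == "__main__":
    f = 1
    # Hypothetical pairwise keys for replicas 1..3f+1.
    keys = {rid: bytes([rid]) * 16 for rid in range(1, 3 * f + 2)}
    auth = make_authenticator(b"grant", keys)
    print(len(auth), verify_entry(b"grant", auth, 2, keys[2]))
```

Each replica can verify its own entry but cannot check the others, which is one reason authenticators are cheaper than signatures yet weaker as transferable proof.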
BFT Improvements
› Preferred quorums
» Reduces degree of quadratic communication
› Single MAC per message
» Significant improvements over authenticators
Non-Contention Message Overhead
Messages sent/received at each replica per write request
Non-Contention Bandwidth Use
Total bandwidth at each replica per write request
Experimental Setup
› HQ and BFT prototypes deployed on Emulab
» Up to 16 replicas (f=5), 200 clients (4 per machine)
› New BFT codebase
› Implement counter service
» Negligible operation payload
» Multiple objects
  - Private non-contention objects
  - Shared contention object
Non-contention Throughput
Maximum operation throughput
Resilience to Contention
Throughput degradation with increasing write-contention
BFT Batching
› BFT allows batching at primary
› Greatly reduces internal protocol communication
› Increased delay
[Diagram: BFT message flow between Client, Primary, and Replicas 1 to 3 across the Request, Pre-Prepare, Prepare, Commit, and Reply phases, annotated "once per batch"]
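A rough sketch of why batching reduces internal protocol communication. The message counts below assume a simplified all-to-all agreement pattern (primary sends Pre-Prepare to every backup, backups multicast Prepare, all replicas multicast Commit) and assume every replica replies to the client; the function names are hypothetical and the model is illustrative, not a measurement of the BFT prototype.

```python
def agreement_messages(n: int) -> int:
    """Messages in one agreement round under the simplified model above."""
    pre_prepare = n - 1          # primary -> each backup
    prepare = (n - 1) * (n - 1)  # each backup -> every other replica
    commit = n * (n - 1)         # every replica -> every other replica
    return pre_prepare + prepare + commit


def messages_per_request(f: int, batch_size: int) -> float:
    """Average messages per client request when agreement runs once per batch."""
    n = 3 * f + 1
    # Assumption: one Request and n Replies per operation are never amortized.
    per_request = 1 + n
    return agreement_messages(n) / batch_size + per_request


if __name__ == "__main__":
    for batch_size in (1, 10, 100):
        print(batch_size, messages_per_request(f=1, batch_size=batch_size))
```

Under these assumptions the agreement cost shrinks in proportion to the batch size, while the cost paid in exchange is the delay of waiting for a batch to fill.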
![Page 66: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/66.jpg)
Batched Performance
Effect of BFT batching on maximum write throughput
![Page 67: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/67.jpg)
Recommendations
› Use Q/U when» Latency critical» Contention low» 5f+1 replicas acceptable
› Use HQ when» Low latency important» Moderate contention
› Use BFT when» Contention high» Throughput more important than latency
![Page 68: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/68.jpg)
Conclusions
› First Byzantine Quorum protocol with 3f+1 replicas» Supports general operations» Resilient to Byzantine clients
› Introduced Hybrid technique» Resolve contention without performance degradation» Applicable to general quorum systems
› Found optimized BFT to perform well under high load
![Page 69: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/69.jpg)
Questions?
![Page 70: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/70.jpg)
Further Details
› HQ Replication: Properties and optimizations» James Cowling, Daniel Myers, Barbara Liskov, Rodrigo Rodrigues and Liuba Shrira. Technical Memo In Prep., MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, 2006.
› Contact:» [email protected]» http://people.csail.mit.edu/cowling/
![Page 71: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/71.jpg)
Write-back Operation
› Write certificate paired with a subsequent request
› Used to ensure progress with slow replicas or clients» Completes phase 2 for a slow client» Advances state of slow replicas
› Replica processes write phase 2 based on certificate, then the paired request
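The ordering on this slide (certificate first, then the paired request) can be sketched as follows. This is a minimal model, not the HQ implementation: `WriteCertificate`, `Grant`, `Replica`, and the integer timestamps are hypothetical stand-ins, certificate validation is omitted, and a replica that is more than one write behind is assumed to catch up by some other means.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteCertificate:
    timestamp: int  # timestamp the object has once this write is applied
    operation: str  # the operation a quorum of grants authorised


@dataclass(frozen=True)
class Grant:
    request_id: str
    timestamp: int  # timestamp the replica promises to the request


@dataclass
class Replica:
    timestamp: int = 0
    log: list = field(default_factory=list)

    def apply_certificate(self, cert: WriteCertificate) -> None:
        # Phase 2: apply the certified write unless it was already applied.
        if cert.timestamp == self.timestamp + 1:
            self.log.append(cert.operation)
            self.timestamp = cert.timestamp

    def handle_phase1(self, request_id: str) -> Grant:
        # Phase 1: grant the next timestamp to the request.
        return Grant(request_id, self.timestamp + 1)

    def handle_write_back(self, cert: WriteCertificate, request_id: str) -> Grant:
        # Write-back: finish the pending write first, then serve the paired request.
        self.apply_certificate(cert)
        return self.handle_phase1(request_id)
```

Applying the same certificate twice is harmless in this sketch, which is what lets any client complete phase 2 on behalf of a slow one.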
![Page 72: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/72.jpg)
![Page 73: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/73.jpg)
Backups…
![Page 74: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/74.jpg)
Slow Replicas
› Some grants in quorum have old timestamp
› Perform write-back to slow replicas, using certificate provided with highest grant» Brings replicas up to date and solicits new grants
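One way a client might pick out the slow replicas from a quorum of grants is sketched below. The `Grant` fields and `plan_write_backs` are hypothetical names for illustration, and the certificate is modelled as an opaque string rather than a set of signed grants.

```python
from typing import List, NamedTuple, Tuple


class Grant(NamedTuple):
    replica_id: int
    timestamp: int
    certificate: str  # opaque stand-in for the certificate sent with the grant


def plan_write_backs(grants: List[Grant]) -> Tuple[str, List[int]]:
    """Return the certificate to write back and the replicas that need it."""
    highest = max(grants, key=lambda g: g.timestamp)
    # Replicas whose grant carries an older timestamp are treated as slow.
    slow = [g.replica_id for g in grants if g.timestamp < highest.timestamp]
    return highest.certificate, slow


if __name__ == "__main__":
    quorum = [Grant(0, 5, "cert@4"), Grant(1, 5, "cert@4"), Grant(2, 3, "cert@2")]
    print(plan_write_backs(quorum))
```

After the write-back, the slow replicas are expected to return fresh grants, so the client can retry assembling a quorum with matching timestamps.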
![Page 75: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/75.jpg)
Why 3f+1?
› 3f+1 replicas» f of which can be faulty
› 2f+1 agree on any ordering» f of these may be Byzantine» The remaining f may be slow
› Maximum of 2f can respond with old system state, but not 2f+1
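The counting argument on this slide can be checked numerically. The sketch below uses hypothetical helper names; it computes the quorum size, the minimum overlap of two quorums, and the largest number of replicas that could report old state, then brute-forces the overlap for f=1.

```python
from itertools import combinations


def quorum_facts(f: int) -> dict:
    n = 3 * f + 1          # total replicas
    q = 2 * f + 1          # quorum size
    return {
        "n": n,
        "quorum": q,
        # Two quorums of size q drawn from n replicas share at least 2q - n.
        "min_overlap": 2 * q - n,
        # f Byzantine replicas may lie, and the f outside the quorum may be slow.
        "max_stale": f + (n - q),
    }


def brute_force_min_overlap(f: int) -> int:
    facts = quorum_facts(f)
    quorums = list(combinations(range(facts["n"]), facts["quorum"]))
    return min(len(set(a) & set(b)) for a in quorums for b in quorums)


if __name__ == "__main__":
    print(quorum_facts(5))             # f=5 gives the 16-replica configuration
    print(brute_force_min_overlap(1))
```

Because the overlap is f+1, any two quorums share at least one correct replica, and because at most 2f replicas can be stale, no full quorum of 2f+1 can vouch for old state.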
![Page 76: HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems James Cowling 1, Daniel Myers 1, Barbara Liskov 1 Rodrigo Rodrigues 2, Liuba.](https://reader035.fdocuments.us/reader035/viewer/2022062517/56649ec75503460f94bd3f28/html5/thumbnails/76.jpg)
› Won’t HQ have a higher rate of contention than Q/U, since it is two-phase (higher latency)?» No. The contention window lasts only from when the first replica receives the phase 1 request until the last replica receives it. It is therefore independent of the two-phase structure, and actually smaller than in Q/U.
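A small sketch of the contention window described here, assuming per-replica arrival times of the phase 1 request in milliseconds. Treating two writes as potentially contending when their windows overlap is an assumption made for illustration, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def contention_window(arrivals_ms: Sequence[int]) -> Tuple[int, int]:
    """Span from the first to the last replica receiving the phase 1 request."""
    return min(arrivals_ms), max(arrivals_ms)


def may_contend(a_arrivals_ms: Sequence[int], b_arrivals_ms: Sequence[int]) -> bool:
    """Assumed rule: two writes may contend only if their windows overlap."""
    a_start, a_end = contention_window(a_arrivals_ms)
    b_start, b_end = contention_window(b_arrivals_ms)
    return a_start <= b_end and b_start <= a_end


if __name__ == "__main__":
    print(may_contend([0, 2, 3], [2, 4, 5]))
    print(may_contend([0, 1, 2], [3, 4, 5]))
```

Note that the window depends only on phase 1 delivery times, so the duration of phase 2 does not enter the calculation.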