Antidio Viguria, Ann Krueger
A Nonblocking Quorum Consensus Protocol for Replicated Data
Divyakant Agrawal and Arthur J. Bernstein
Paper Presentation: Dependable Distributed Systems
Agenda
- Introduction
- Nonblocking Quorum Overview
- Blocking versus Nonblocking
- Propagation Mechanism
- Application in multi-robot systems
  - Task allocation
  - Fault tolerance for distributed task allocation algorithms
- Conclusions
Introduction
- Motivation for data replication
  - Fault tolerance
  - Lower delays in accessing data
- Gifford's (blocking) quorum protocol
  - qr[x] + qw[x] > n[x]
  - Read and write operations are blocked until a current copy is identified
  - The quorum protocol was developed for full-file access
  - Adopting it for databases is inefficient
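The intersection condition qr[x] + qw[x] > n[x] can be checked mechanically. The following is a minimal sketch, assuming plain integer quorum sizes; the function name `read_write_quorums_intersect` is hypothetical and does not come from the paper.

```python
def read_write_quorums_intersect(qr: int, qw: int, n: int) -> bool:
    """Return True if every read quorum must share a copy with every write quorum."""
    if not (1 <= qr <= n and 1 <= qw <= n):
        raise ValueError("quorum sizes must lie between 1 and n")
    # Pigeonhole argument: two subsets of n copies with sizes qr and qw
    # are forced to overlap exactly when qr + qw > n.
    return qr + qw > n


if __name__ == "__main__":
    print(read_write_quorums_intersect(qr=2, qw=4, n=5))  # overlap guaranteed
    print(read_write_quorums_intersect(qr=2, qw=3, n=5))  # overlap not guaranteed
```

With n = 5, the pair (qr = 2, qw = 4) satisfies the condition, while (qr = 2, qw = 3) does not, because a read quorum could then miss every copy that received the latest write.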
Nonblocking Quorum Overview
- Assume a background mechanism that propagates updates to all copies
- Optimistic assumption: “Every copy of an object is current with high probability.”
  - Significantly reduces delay
  - If a copy turns out to be obsolete, the transaction can be rolled back
Nonblocking Quorum: Read an object x
1. Send a read request to qr[x] nodes
2. Continue computation using the value in the first reply
3. Quorum accumulation proceeds in the background; the replies of the quorum are collected concurrently with the execution of the transaction
4. If the value used was obsolete, the transaction is rolled back
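The read steps can be sketched as a small sequential simulation. This is an illustration under stated assumptions, not the paper's implementation: replies are modeled as an already-ordered list, each reply carries a version number, and a value counts as obsolete when some other reply in the quorum reports a higher version. The names `Reply` and `nonblocking_read` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    node: str
    version: int
    value: object


def nonblocking_read(replies_in_arrival_order, qr):
    """Return (value_used, must_roll_back) for one optimistic read.

    The transaction proceeds with the first reply; the remaining replies of
    the quorum are only used to verify that the first one was current.
    """
    quorum = replies_in_arrival_order[:qr]
    if len(quorum) < qr:
        raise ValueError("not enough replies to form a read quorum")
    first = quorum[0]
    # Assumption: the current copy is the one with the highest version number.
    newest = max(quorum, key=lambda r: r.version)
    return first.value, first.version < newest.version


if __name__ == "__main__":
    stale_first = [Reply("a", 2, "old"), Reply("b", 3, "new")]
    print(nonblocking_read(stale_first, qr=2))
```

In a real system the verification would run concurrently with the transaction; here it is collapsed into one call to keep the example short.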
Nonblocking Quorum: Write an object x
1. Send a write request to qw[x] nodes
2. Continue computation
3. Replies are gathered concurrently and used to calculate the version number
4. Transaction commit is delayed until the required quorum is assembled
qw[x] > n[x]/2    (1)
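The write steps can be sketched in the same style. The slide only says that the replies are used to calculate the version number; the rule used here (one more than the highest version reported by the write quorum) is an assumption borrowed from Gifford-style voting, and `nonblocking_write` is a hypothetical name.

```python
def nonblocking_write(reported_versions, qw):
    """Decide whether a write can commit and which version it gets.

    reported_versions: version numbers returned by nodes, in arrival order.
    Returns (can_commit, new_version); new_version is None while the
    write quorum is still incomplete.
    """
    if len(reported_versions) < qw:
        # Commit stays delayed until qw replies have been assembled.
        return False, None
    # Assumed rule: the new version supersedes every version seen in the quorum.
    return True, max(reported_versions[:qw]) + 1


if __name__ == "__main__":
    print(nonblocking_write([4, 5], qw=3))     # quorum incomplete
    print(nonblocking_write([4, 5, 4], qw=3))  # quorum assembled
```

The transaction keeps computing while replies arrive; only the commit decision waits for the second case.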
Performance Comparison
- Assumptions:
  - Transaction T executes without conflicts
  - Delays are due to quorum accumulation and rollback
  - qr[x], qw[x], n[x] are the same for each object x
- Blocking latency: LB = k * D
- Nonblocking latency: LNB ≈ (k-1)*p*D + D for k*c « D
- LNB improves on LB if p is small
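The two latency formulas are easy to compare numerically. The slide does not define k, D, and c; this sketch assumes k is the number of accesses in the transaction, D is the delay to assemble a quorum, and c is the computation time per access (ignored here because k*c « D). The function names are hypothetical.

```python
def blocking_latency(k: int, D: float) -> float:
    """LB = k * D: every access waits for a full quorum."""
    return k * D


def nonblocking_latency(k: int, D: float, p: float) -> float:
    """LNB ≈ (k - 1) * p * D + D, valid when computation time is negligible."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p is a probability and must lie in [0, 1]")
    return (k - 1) * p * D + D


if __name__ == "__main__":
    k, D = 5, 100.0
    for p in (0.0, 0.25, 1.0):
        print(p, blocking_latency(k, D), nonblocking_latency(k, D, p))
```

Under these assumptions, p = 0 gives LNB = D, and p = 1 gives LNB = k * D, which equals LB. The nonblocking protocol therefore only pays off to the extent that p stays small.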
Optimistic Assumption
- The assumption that every copy is current most of the time cannot be justified in general.
- A mechanism that propagates the updates made at the qw[x] write-quorum copies to all copies of x would decrease the value of p by increasing the number of current copies of x.
Propagation Mechanism
- Broadcast mechanism to propagate updates from the write quorum to all copies
  - Need not be reliable
  - X messages contain the new value of the object
  - Some messages will be received by nodes among X which are already current
Propagation Mechanism
- General model for spread of information
[Figure: copies of x labeled xnew and xold]
Effect of Propagation
- p is the probability that the first reply is obsolete
- p -> 0 as the number of propagation cycles between successive logical accesses increases
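The trend of p toward 0 can be illustrated with a toy spread-of-information simulation. This is a sketch under strong assumptions that are not taken from the paper: in each cycle every current copy sends the new value to one uniformly random copy, and the first reply to a read comes from a uniformly random copy, so the fraction of obsolete copies stands in for p. The name `obsolete_fraction_by_cycle` is hypothetical.

```python
import random


def obsolete_fraction_by_cycle(n_copies, write_quorum, cycles, seed=0):
    """Return the fraction of obsolete copies after 0, 1, ..., `cycles` cycles."""
    rng = random.Random(seed)
    # Copies that already hold the new value (initially the write quorum).
    current = set(range(write_quorum))
    history = [1 - len(current) / n_copies]
    for _ in range(cycles):
        # Each current copy picks one random target; targets that are
        # already current simply waste the message.
        targets = {rng.randrange(n_copies) for _ in range(len(current))}
        current |= targets
        history.append(1 - len(current) / n_copies)
    return history


if __name__ == "__main__":
    for cycle, frac in enumerate(obsolete_fraction_by_cycle(20, 5, 8)):
        print(cycle, round(frac, 2))
```

The obsolete fraction never increases from one cycle to the next in this model, which matches the qualitative claim that more propagation cycles between accesses make a stale first reply less likely.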
Effect of Propagation
- To justify the optimistic assumption that every copy is current most of the time, we must integrate a propagation mechanism such that the average number of propagation cycles between successive logical accesses is sufficiently large.
Propagation mechanism
- Log: local copy organized as an ordered sequence of event records
- Gossip messages: used to keep copies of the log up to date
- Each site maintains a timetable with events and the timestamps when they occurred
- A site uses the timetable to decide which portion of its log it should send to another site and which should be discarded
- For a particular event record, a site keeps its local copy as long as it is not certain that all other sites are aware of this event
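The log, timetable, and gossip rules can be sketched as follows. The code follows the general shape of a replicated log with a two-dimensional timetable; that shape is an assumption about the underlying scheme, and the names `Event`, `Site`, `make_gossip`, and `receive_gossip` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    site: int     # site where the event occurred
    time: int     # logical timestamp at that site
    payload: str


class Site:
    def __init__(self, site_id, n_sites):
        self.id, self.n, self.clock = site_id, n_sites, 0
        self.log = set()
        # table[k][j]: latest timestamp of site j that this site believes site k knows.
        self.table = [[0] * n_sites for _ in range(n_sites)]

    def knows(self, k, event):
        """True if, as far as this site can tell, site k has seen `event`."""
        return self.table[k][event.site] >= event.time

    def record(self, payload):
        self.clock += 1
        self.table[self.id][self.id] = self.clock
        event = Event(self.id, self.clock, payload)
        self.log.add(event)
        return event

    def make_gossip(self, target):
        # Send only the records the target is not known to have.
        records = {e for e in self.log if not self.knows(target, e)}
        return records, [row[:] for row in self.table]

    def receive_gossip(self, sender, records, sender_table):
        new = {e for e in records if not self.knows(self.id, e)}
        self.log |= new
        for j in range(self.n):  # this site now knows whatever the sender knew
            self.table[self.id][j] = max(self.table[self.id][j], sender_table[sender][j])
        for k in range(self.n):  # merge second-hand knowledge
            for j in range(self.n):
                self.table[k][j] = max(self.table[k][j], sender_table[k][j])
        # Discard records that every site is known to have.
        self.log = {e for e in self.log
                    if not all(self.knows(k, e) for k in range(self.n))}
        return new


if __name__ == "__main__":
    a, b = Site(0, 2), Site(1, 2)
    a.record("T1 commit: x=1")
    b.receive_gossip(0, *a.make_gossip(1))
    a.receive_gossip(1, *b.make_gossip(0))
    print(len(a.log), len(b.log))
```

With only two sites, the receiver can discard the record immediately, and the originator discards it once a gossip message comes back. With more sites, a record stays in the log until the timetable shows that everyone has it.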
Propagation mechanism
- Runs periodically in the background
- Two properties are guaranteed:
  - Propagation property: assumes that site failures and network partitions are not permanent
  - Causality property
- For this application:
  - Event: describes a transaction commit and contains the new versions of the objects written
  - Implicit communication: gossip messages
  - Explicit communication: unicast (information pertinent to the quorum only)
Nonblocking quorum protocol with the propagation mechanism
- Atomic transactions: finish with a commit or an abort
- A two-phase commit protocol guarantees the atomicity of transactions that involve multiple sites
- The concurrency control protocol permits only recoverable executions
  - The copies of a data object are not modified until a transaction commits
- The main difference from the original protocol appears when a transaction decides to commit
  - Coordinator: site at which a transaction is initiated
  - Participants: other sites that participate in the execution of the transaction
Two-phase commit protocol
- 1st phase
  - Commitment is delayed until T completes its verification for every object
  - T explicitly sends a “prepare” message to all participants
  - If any participant fails, T aborts by sending explicit messages
  - Otherwise, T is committed
- 2nd phase
  - The coordinator sends a commit message to all participants using implicit communication
    - The coordinator's copy of the log is updated
    - A site updates its database information when it receives a commit gossip message
    - The commit record is discarded from the log when all sites have learned about T's commitment
- Commit messages are sent to every site
  - More overhead (can be reduced using optimizations)
  - More robust than the original protocol: no special termination protocol is needed when failures occur
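The commit decision can be sketched as a single function. This is a simplified illustration, not the protocol from the paper: participant responses are passed in as a dictionary, a failed participant is modeled as `None`, and the implicit commit message is modeled as a record appended to the coordinator's log for later gossiping. The name `run_commit` is hypothetical.

```python
def run_commit(txn_id, participant_votes, coordinator_log):
    """Run one commit attempt.

    participant_votes: participant name -> True (ready), False (refuses),
    or None (participant failed or did not answer).
    Returns (outcome, explicit_messages).
    """
    # 1st phase: explicit "prepare" messages go to every participant.
    explicit = [("prepare", p, txn_id) for p in participant_votes]
    if all(vote is True for vote in participant_votes.values()):
        # 2nd phase: the commit is not unicast; it is written to the log
        # and reaches the other sites through gossip messages.
        coordinator_log.append(("commit", txn_id))
        return "committed", explicit
    # Any failure or refusal leads to explicit abort messages.
    explicit += [("abort", p, txn_id) for p in participant_votes]
    return "aborted", explicit


if __name__ == "__main__":
    log = []
    print(run_commit("T1", {"s2": True, "s3": True}, log))
    print(run_commit("T2", {"s2": True, "s3": None}, log))
    print(log)
```

Only the successful transaction leaves a commit record in the log, which is the record the gossip mechanism would later spread and eventually discard.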
Application in multi-robot systems
[Diagram labels: Cooperation, Coordination, Time, Space, Task Allocation]
- The nonblocking quorum protocol can be used for real-time applications such as robotics
- Use this algorithm to increase dependability in multi-robot systems
Task Allocation
- Problem: given a number of robots and tasks, which robot should execute which task in order to minimize a parameter (traveled distance, mission time, etc.)?
- Illustrative example:
  - 3 robots
  - 3 tasks (go to a certain point)
  - Minimize the global distance and time of the mission
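For a 3-robot, 3-task instance, the optimum can be found by exhaustive search. The sketch below assumes one task per robot, straight-line travel, and total distance as the cost; the coordinates are invented for illustration and `best_assignment` is a hypothetical name.

```python
from itertools import permutations
from math import dist


def best_assignment(robots, tasks):
    """Try every one-to-one assignment and return (total_distance, mapping).

    robots, tasks: name -> (x, y) position; both must have the same size.
    """
    if len(robots) != len(tasks):
        raise ValueError("this sketch assumes one task per robot")
    robot_names = list(robots)
    best = None
    for perm in permutations(tasks):
        total = sum(dist(robots[r], tasks[t]) for r, t in zip(robot_names, perm))
        if best is None or total < best[0]:
            best = (total, dict(zip(robot_names, perm)))
    return best


if __name__ == "__main__":
    robots = {"Robot A": (0, 0), "Robot B": (10, 0), "Robot C": (0, 10)}
    tasks = {"Task 1": (1, 0), "Task 2": (9, 0), "Task 3": (0, 9)}
    print(best_assignment(robots, tasks))
```

The search visits every permutation of the tasks, so its cost grows factorially with the team size. That is fine for three robots but does not scale.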
[Figure: illustrative example with Robot A, Robot B, Robot C and Task 1, Task 2, Task 3]
Different approaches
- Centralized
  - Exponential computational complexity (NP-hard)
  - Slow response in dynamic environments
  - Single point of failure
- Decentralized
  - Tolerates dynamic environments
  - No single point of failure
  - More complex; solutions are efficient but not optimal
  - The most successful approach so far is based on the CNP (Contract Net Protocol)
How does it work?
- Based on roles:
  - Director of the auction
  - Bidders
- Simplest protocol:
  - The director announces a task with an associated minimum bid.
  - Bidders send their bids.
  - The director selects the best bid.
  - The director allocates the task to the best bidder.
- The roles are played dynamically by the different robots.
- Each robot only knows the tasks allocated to it.
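One auction round of the simplest protocol can be sketched as follows. The slide does not say how bids are compared; this sketch assumes that a bid is a cost (for example, distance to the task), that lower is better, and that a bid only counts if it beats the value announced by the director. The name `run_auction` is hypothetical.

```python
def run_auction(announced_bid, bids):
    """Return the winning bidder, or None if nobody beats the announced bid.

    announced_bid: cost threshold announced by the director with the task.
    bids: bidder name -> offered cost (lower is better, by assumption).
    """
    valid = {robot: cost for robot, cost in bids.items() if cost < announced_bid}
    if not valid:
        # No bidder improves on the announcement; the task stays unallocated.
        return None
    # The director selects the best (lowest-cost) bid.
    return min(valid, key=valid.get)


if __name__ == "__main__":
    print(run_auction(12.0, {"Robot B": 7.5, "Robot C": 9.0}))
    print(run_auction(5.0, {"Robot B": 7.5, "Robot C": 9.0}))
```

Because any robot can act as director for one task and as bidder for another, the same function would run on every robot in a decentralized setting.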
[Figure sequence: illustrative example with Robot A, Robot B and Robot C; Task 2, Task 1 and Task 3 appear in turn]
What happens when a robot fails?
- Types of failures
  - Failure that does not allow the robot to execute the task (motors stalled, video cameras broken, etc.)
  - Failure in communications, i.e., the robot is not able to communicate with the rest of the robots
  - Robot is dead (for example, failure in the main power supply)
- Focus on robot deaths and failures in communication
- When a robot fails, the tasks allocated to it are lost. Mission failed!
Possible solution
- Use well-known algorithms already tested for dependable distributed systems
- Use quorums to replicate the tasks allocated to each robot
  - All robots work as clients and servers
  - Data is only read when there is a failure, so the number of writes is much larger than the number of reads
  - r + w > n (n = number of robots)
  - For this application, choose r >> w to minimize the overhead
  - Write once, read all
- Overhead is very important
  - Use dynamic quorums
  - Support just one crash failure: r + w = n + 2
- Take into account that a team of robots can split up and join frequently
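The choice r + w = n + 2 can be checked by brute force: it forces any read quorum and any write quorum to share at least two robots, so one shared copy survives a single crash. The sketch below enumerates all quorum pairs for small teams; `min_overlap` is a hypothetical name and the team sizes are illustrative.

```python
from itertools import combinations


def min_overlap(n: int, r: int, w: int) -> int:
    """Smallest intersection between any read quorum and any write quorum."""
    robots = range(n)
    return min(
        len(set(read_q) & set(write_q))
        for read_q in combinations(robots, r)
        for write_q in combinations(robots, w)
    )


if __name__ == "__main__":
    n = 4
    for r, w in [(4, 1), (4, 2), (3, 3)]:
        print(f"n={n} r={r} w={w} r+w-n={r + w - n} min overlap={min_overlap(n, r, w)}")
```

The enumeration agrees with the counting argument that two quorums of sizes r and w out of n robots overlap in at least r + w - n members. With r + w = n + 1 the overlap can be a single robot, and that robot may be the one that crashed.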
[Figure sequence: illustrative example with Robot A, Robot B, Robot C and Task 1, Task 2, Task 3]
Mission completed successfully!!
Conclusions
- Nonblocking quorums are better than blocking quorums (in terms of latency) when a propagation mechanism is used.
- The propagation mechanism lowers the probability that the first reply is obsolete.
- Algorithms used in distributed systems can be applied in other fields such as robotics.
- A quorum protocol could be a good solution for making multi-robot systems more fault tolerant.
Questions?
Thank you for your attention!