Coordination and Agreement. Topics Distributed Mutual Exclusion Leader Election.
Coordination and Agreement
Topics: Distributed Mutual Exclusion, Leader Election
Failure Assumptions
Each pair of processes is connected by reliable channels.
Underlying network components may suffer failures, but reliable protocols recover.
• A reliable channel eventually delivers each message, with no bound on the delay, as in an asynchronous system.
• Delivery may take a while.
• There could be a network partition or asymmetric connectivity.
More Failure Assumptions
Processes may only fail by crashing.
No arbitrary (Byzantine) failures.
Failure detectors.
Figure 11.1: A network partition (crashed router)
Distributed Mutual Exclusion
The critical section problem.
Need a solution based only on message passing.
Example: users that update a file.
NFS is stateless; UNIX provides the file-locking service lockd.
More interesting: no server, just a collection of peer processes.
• Ethernet: who gets to transmit?
Algorithms for Mutual Exclusion
N asynchronous processes that do not share variables.
Processes do not fail.
Message delivery is reliable.
• Every message that is sent is eventually delivered exactly once.
Conditions:
ME1: At most one process may execute in the CS at any time.
ME2: Requests to enter and exit the CS eventually succeed.
• No deadlock, no starvation.
Might want ME3: If one request to enter the CS happened-before another, then entry to the CS is granted in that order.
Criteria for Evaluating Algorithms
Bandwidth consumed: number of messages sent in each enter and exit operation.
Client delay incurred by a process at each entry and exit.
Algorithm’s effect on the throughput of the system.
Synchronization delay between one process exiting the CS and the next entering it.
Simplest: Centralized Server
Figure: a server managing a mutual exclusion token for a set of processes p1 to p4. Messages: 1. Request token, 2. Release token, 3. Grant token. The server holds a queue of requests.
Evaluation of Centralized
Satisfies ME1 and ME2, but not ME3.
Messages:
Entering the critical section:
• Two messages (request/grant).
• Delay of a round trip.
Exiting:
• One release message.
• No delay.
Server may become a performance bottleneck.
Synchronization delay: a round trip.
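The server's bookkeeping can be sketched as follows. This is a minimal illustration, assuming a single-threaded server that handles one message at a time; the class name TokenServer and its methods are made up for the example and are not from the slides.

```python
from collections import deque

class TokenServer:
    """Toy central server: one token plus a FIFO queue of waiting requests."""

    def __init__(self):
        self.holder = None      # process currently holding the token
        self.queue = deque()    # pending requests, in arrival order

    def request(self, pid):
        """Handle a request message; return True if the token is granted now."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)  # token is busy: queue the request, send no reply
        return False

    def release(self, pid):
        """Handle a release message; return the next holder, or None."""
        assert self.holder == pid, "only the holder may release the token"
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder

server = TokenServer()
log = [server.request("p1"), server.request("p2"), server.request("p3")]
next_holder = server.release("p1")  # the token passes to the head of the queue
```

Requests are served in the order they reach the server, which need not match their happened-before order. That is why ME3 is not guaranteed.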
Figure: a ring of processes p1, p2, ..., pn transferring a mutual exclusion token
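Evaluation of Ring
Satisfies ME1 and ME2, but not ME3.
Continuously consumes bandwidth, even when no process wants the CS.
Delay:
• Entry: 0 to N message transmissions.
• Exit: 1 message.
Synch delay: 1 to N message transmissions.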
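One trip of the token round the ring can be simulated in a few lines. This is a rough sketch, assuming processes are numbered 0 to n-1 clockwise and that a process which wants the CS enters and exits before forwarding the token; the function name ring_entry_order is hypothetical.

```python
def ring_entry_order(n, token_at, wanting):
    """Pass the token once round a ring of n processes, starting at token_at.

    Returns (order in which waiting processes entered the CS, messages sent).
    """
    order, messages = [], 0
    pos = token_at
    for _ in range(n):
        if pos in wanting:
            order.append(pos)   # holder enters the CS, exits, then forwards
        pos = (pos + 1) % n     # forward the token to the next neighbor
        messages += 1
    return order, messages

order, messages = ring_entry_order(5, 2, {0, 3})
idle_order, idle_messages = ring_entry_order(5, 0, set())
```

Entry order follows ring position rather than request time, so ME3 does not hold. The second call shows that the ring still spends N messages per lap when nobody wants the CS.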
Multicast and Logical Clocks
Basic idea: processes that want entry multicast a request and enter only when all other processes have replied.
The conditions under which a process replies ensure ME1 through ME3.
Messages are of the form <T, pi>: T is the sender’s Lamport timestamp and pi is the sender’s identifier.
States: RELEASED / WANTED / HELD
Ricart and Agrawala’s algorithm
On initialization
    state := RELEASED;
To enter the section
    state := WANTED;
    Multicast request to all processes including self;    (request processing deferred here)
    T := request’s timestamp;
    Wait until (number of replies received = (N - 1));
    state := HELD;
On receipt of a request <Ti, pi> at pj (i ≠ j)
    if (state = HELD or (state = WANTED and (T, pj) < (Ti, pi)))
    then
        queue request from pi without replying;
    else
        reply immediately to pi;
    end if
To exit the critical section
    state := RELEASED;
    reply to any queued requests;
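Evaluation:
Entry: 2(N-1) messages (N-1 requests plus N-1 replies).
Synch delay: only one message transmission time. The previous two algorithms needed more: a round trip for the central server, and up to N transmissions for the ring.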
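The request and reply rules can be exercised in a small single-threaded simulation. This is only a sketch: the Process and Network classes are invented for illustration, the network is assumed reliable and FIFO, and a process does not send its request to itself. The starting clock values are chosen so that two concurrent requests carry timestamps 41 and 34; assigning 41 to p1 and 34 to p2 is an assumption of the example.

```python
from collections import deque

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"

class Process:
    def __init__(self, pid, clock):
        self.pid, self.clock = pid, clock
        self.state = RELEASED
        self.request = None   # (T, pid) of our own outstanding request
        self.replies = 0
        self.deferred = []    # requesters queued without a reply

class Network:
    """Hypothetical in-memory network: reliable, FIFO, one message at a time."""

    def __init__(self, procs):
        self.procs = {p.pid: p for p in procs}
        self.inbox = deque()  # (kind, sender, receiver, timestamp pair)
        self.sent = 0
        self.entered = []     # order in which processes entered the CS

    def send(self, kind, src, dst, stamp=None):
        self.inbox.append((kind, src, dst, stamp))
        self.sent += 1

    def want(self, pid):
        p = self.procs[pid]
        p.state, p.replies = WANTED, 0
        p.clock += 1
        p.request = (p.clock, pid)
        for q in self.procs:
            if q != pid:
                self.send("request", pid, q, p.request)

    def exit(self, pid):
        p = self.procs[pid]
        p.state = RELEASED
        for q in p.deferred:  # reply to any queued requests
            self.send("reply", pid, q)
        p.deferred = []

    def run(self):
        while self.inbox:
            kind, src, dst, stamp = self.inbox.popleft()
            p = self.procs[dst]
            if kind == "request":
                p.clock = max(p.clock, stamp[0]) + 1  # Lamport clock update
                if p.state == HELD or (p.state == WANTED and p.request < stamp):
                    p.deferred.append(src)            # queue without replying
                else:
                    self.send("reply", dst, src)
            else:
                p.replies += 1
                if p.state == WANTED and p.replies == len(self.procs) - 1:
                    p.state = HELD
                    self.entered.append(dst)

net = Network([Process(1, 40), Process(2, 33), Process(3, 0)])
net.want(1)   # p1 requests with timestamp 41
net.want(2)   # p2 requests concurrently with timestamp 34
net.run()     # p2 has the smaller (T, pid) pair, so it enters first
first = list(net.entered)
net.exit(2)   # p2 leaves and answers the request it had queued from p1
net.run()
```

Python compares the (T, pid) tuples lexicographically, which matches the ordering used in the receipt rule. Each of the two entries costs 2(N-1) = 4 messages here.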
Figure 11.5: Multicast synchronization
Diagram: processes p1, p2 and p3; two concurrent requests carry timestamps 41 and 34, and Reply messages flow back to the requesters.
Maekawa’s Algorithm
To enter the critical section, it is not necessary for all peers to grant access.
A process only needs permission from a subset of its peers, as long as all subsets overlap.
Think of processes “voting for each other” to enter the CS.
Processes ensure ME1 by casting their votes for only one candidate at a time.
Associate a voting set Vi with each process pi:
• pi is an element of Vi.
• The intersection of Vi and Vj is nonempty for all i, j.
• Each voting set has the same size K; each pi is contained in M voting sets.
K can be made about sqrt(N), with M = K. Voting sets of about twice that size are easy to construct.
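One simple way to build overlapping voting sets, not spelled out on the slide, is to place the processes in a sqrt(N) by sqrt(N) grid and let each voting set be a process's row plus its column. The sketch below assumes N is a perfect square and numbers the processes from 0; the function name grid_voting_sets is hypothetical. The sets have size 2 sqrt(N) - 1, so this is the easy construction, not the optimal one.

```python
import math
from itertools import combinations

def grid_voting_sets(n):
    """Voting set of process i = its row plus its column in a square grid."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes n is a perfect square")
    sets = []
    for i in range(n):
        row, col = divmod(i, side)
        row_members = {row * side + c for c in range(side)}
        col_members = {r * side + col for r in range(side)}
        sets.append(row_members | col_members)
    return sets

sets = grid_voting_sets(16)
# Any row meets any column, so every pair of voting sets shares a process.
overlap_ok = all(sets[i] & sets[j] for i, j in combinations(range(16), 2))
```

For N = 16 each set has 7 members, and each process belongs to 7 sets, so K = M here as well.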
Maekawa’s algorithm
On initialization
    state := RELEASED;
    voted := FALSE;
For pi to enter the critical section
    state := WANTED;
    Multicast request to all processes in Vi;
    Wait until (number of replies received = K);
    state := HELD;
On receipt of a request from pi at pj
    if (state = HELD or voted = TRUE)
    then
        queue request from pi without replying;
    else
        send reply to pi;
        voted := TRUE;
    end if
For pi to exit the critical section
    state := RELEASED;
    Multicast release to all processes in Vi;
On receipt of a release from pi at pj
    if (queue of requests is non-empty)
    then
        remove head of queue (from pk, say);
        send reply to pk;
        voted := TRUE;
    else
        voted := FALSE;
    end if
Discussion of Maekawa’s Algorithm
Achieves safety property ME1.
Deadlock prone.
Can you give an example?
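Deadlock Example
3 processes: p1, p2, p3.
V1 = {p1, p2}, V2 = {p2, p3}, V3 = {p3, p1}.
If all 3 concurrently request entry to the CS, each process can cast its one vote before the others’ requests arrive, for example by voting for itself:
• p1’s vote is taken, so p3 is held off.
• p2’s vote is taken, so p1 is held off.
• p3’s vote is taken, so p2 is held off.
Each candidate holds one of the two votes it needs, so no one has a quorum.
Can be made deadlock-free.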
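The stuck state can be reproduced by tallying votes. This is a minimal sketch, assuming that every process also votes on its own request (so a candidate needs all K = 2 votes of its voting set) and that a voter grants its single vote to the first request it sees. The function name cast_votes and the arrival orders are made up for the example.

```python
voting_sets = {1: {1, 2}, 2: {2, 3}, 3: {3, 1}}

def cast_votes(voting_sets, arrival_order):
    """Each voter grants its single vote to the first request that reaches it.

    arrival_order maps a voter to the candidates, in the order their
    requests arrive. Returns the set of votes each candidate has collected.
    """
    votes = {candidate: set() for candidate in voting_sets}
    for voter, candidates in arrival_order.items():
        first = candidates[0]
        # A voter is only ever asked by candidates whose voting set contains it.
        assert voter in voting_sets[first]
        votes[first].add(voter)
    return votes

# Each process sees its own request first, so it votes for itself.
votes = cast_votes(voting_sets, {1: [1, 3], 2: [2, 1], 3: [3, 2]})
blocked = [c for c in voting_sets if votes[c] != voting_sets[c]]
```

Every candidate ends with one vote out of two, and every voter has already voted, so all three wait forever.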
Leader Election
Choose a unique process to perform a particular role.
Essential that all processes agree on the choice.
Leader Election
A process calls the election: it initiates a run of the algorithm.
An individual process does not call more than one election at a time, but N processes could call N concurrent elections.
• Very important that the choice of elected process is unique.
At any point in time a process is either a participant or a non-participant.
Without loss of generality, require that the elected process be the one with the largest identifier.
Leader Election: Requirements
During any particular run of the algorithm:
E1 (safety): A participant process either has not yet defined the leader or has elected P, where P is the non-crashed process with the largest identifier at the end of the run.
E2 (liveness): All processes participate and eventually either select a leader or crash.
Measure by:
• Total number of messages sent.
• Turnaround time: the number of serialized message transmission times between initiation and termination of a single run.
Ring-based election algorithm
Motivated by token ring.
Initially every process is a non-participant.
Any process can begin an election.
It marks itself a participant, places its identifier in an election message, and sends it to its clockwise neighbor.
When a process receives an election message, it compares the identifier in the message with its own.
• If the received identifier is greater, it forwards the message.
• If the received identifier is smaller and the receiver is not a participant, it substitutes its own identifier in the message and forwards it.
• If the received identifier is smaller and the receiver is already a participant, it does not forward the message.
In any case, if it forwards a message, it marks itself as a participant.
If the received identifier is that of the receiver itself, this process’s identifier must be the greatest, and it becomes the coordinator.
The coordinator marks itself a non-participant once more and sends an elected message to its neighbor, announcing its election and enclosing its identity.
When a process receives an elected message, it marks itself a non-participant, sets its elected variable, and forwards the message, unless it is the new coordinator.
Does it work?
E1: yes. For any two processes, the one with the larger identifier will not pass on the other’s identifier. It is therefore impossible for both to receive their own identifier back.
E2: follows from the guaranteed traversals of the ring.
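Performance
If only a single process starts an election…
Worst case: its anti-clockwise neighbor has the highest identifier.
• N-1 messages are required to reach this neighbor.
• The neighbor won’t announce its election until its identifier has gone round the ring: another N messages.
• N messages for the elected announcement.
• Total: 3N-1 messages.
Turnaround time is also 3N-1, since the messages are sent sequentially.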
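The 3N-1 count can be checked with a short simulation of a single election. This is a sketch under stated assumptions: identifiers are unique, exactly one process starts the election, nobody crashes, and ids[i] sends clockwise to ids[(i + 1) % n]. The function name ring_election is hypothetical.

```python
def ring_election(ids, starter):
    """Simulate one ring election begun by the process at index `starter`.

    Returns (identifier of the coordinator, total messages sent).
    """
    n = len(ids)
    participant = [False] * n
    messages = 0

    # Election phase: the starter sends its own identifier clockwise.
    participant[starter] = True
    pos, carried = (starter + 1) % n, ids[starter]
    messages += 1
    while carried != ids[pos]:
        # With a single starter the "already a participant, do not forward"
        # case never arises, so each receiver forwards the larger identifier.
        carried = max(carried, ids[pos])
        participant[pos] = True
        pos = (pos + 1) % n
        messages += 1
    coordinator = pos  # this process received its own identifier back

    # Elected phase: the announcement travels once round the ring.
    participant[coordinator] = False
    pos = (coordinator + 1) % n
    messages += 1
    while pos != coordinator:
        participant[pos] = False
        pos = (pos + 1) % n
        messages += 1
    return ids[coordinator], messages

# Worst case: the starter's anti-clockwise neighbor holds the highest identifier.
leader, worst = ring_election([5, 1, 2, 3, 9], starter=0)
# Cheaper case: the starter itself holds the highest identifier.
_, cheaper = ring_election([9, 1, 2], starter=0)
```

With N = 5 the worst case costs 4 + 5 + 5 = 14 messages, which is 3N-1. When the starter already has the highest identifier the cost drops to 2N.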
Figure 11.7: A ring-based election in progress
Diagram: a ring of processes with identifiers 1, 3, 4, 9, 15, 17, 24 and 28; the election message currently carries 24.
Note: The election was started by process 17. The highest process identifier encountered so far is 24. Participant processes are shown darkened.
Bully Algorithm
Allows processes to crash during an election.
Message delivery between processes is reliable.
Assumes the system is synchronous; uses timeouts to detect a process failure.
Also assumes that each process knows which processes have higher identifiers, and that it can communicate with all such processes.
3 types of messages:
Election
• Announces an election.
Answer
• Sent in response to an election message.
Coordinator
• Announces the identity of the elected process.
A process begins an election when it notices, through timeouts, that the coordinator has failed.
A process that knows it has the highest identifier can elect itself coordinator by sending a coordinator message to all lower-numbered processes.
A process with a lower identifier begins an election by sending an election message to those processes that have a higher identifier, and awaits an answer message in response.
• If none arrives within time T, the process considers itself the coordinator and sends coordinator messages to all lower-numbered processes.
• Otherwise, the process waits a while longer for a coordinator message to arrive from the new coordinator. If none arrives, it starts a new election.
If a process receives a coordinator message, it notes the sender as the coordinator.
If a process receives an election message, it sends back an answer message and begins another election, unless it has begun one already.
When a process is started to replace a crashed process, it begins an election.
Evaluation
Clearly meets the liveness condition E2, by the assumption of reliable message delivery.
Meets E1 if no process is replaced by one with the same identifier: it is impossible for two processes to both decide they are coordinator, since the process with the lower identifier will discover that the other exists and defer to it.
O(N^2) messages in the worst case.
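The message flow can be sketched as a sequential simulation. Assumptions, all for illustration only: timeouts are modeled by a crashed process simply never answering, no process crashes or recovers during the run, and messages sent to crashed processes are still counted. The function name bully_election is hypothetical.

```python
def bully_election(all_ids, alive, starter):
    """Run one bully election begun by `starter`.

    all_ids: identifiers of every process, crashed or not.
    alive:   identifiers of the processes that are still running.
    Returns (identifier of the new coordinator, total messages sent).
    """
    messages = 0
    started = set()          # processes that have already begun an election
    coordinator = None
    pending = [starter]
    while pending:
        p = pending.pop(0)
        if p in started:
            continue         # "unless it has begun one already"
        started.add(p)
        higher = [q for q in all_ids if q > p]
        messages += len(higher)                 # election messages
        answering = [q for q in higher if q in alive]
        messages += len(answering)              # answer messages
        if answering:
            pending.extend(answering)           # each answerer starts its own election
        else:
            # No answer within time T: p considers itself the coordinator.
            lower = [q for q in all_ids if q < p]
            messages += len(lower)              # coordinator messages
            coordinator = p
    return coordinator, messages

# p4 has crashed; p1 notices and starts an election.
after_p4 = bully_election([1, 2, 3, 4], alive={1, 2, 3}, starter=1)
# p3 has crashed as well, so p2 ends up as coordinator.
after_p3 = bully_election([1, 2, 3, 4], alive={1, 2}, starter=1)
```

When the lowest-numbered process starts the election, every surviving higher process begins its own, which is where the O(N^2) worst case comes from.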
Figure 11.8: The Bully Algorithm
The election of coordinator p2, after the failure of p4 and then p3.
Diagram: four stages over processes p1 to p4, showing election and answer messages (Stages 1 and 2), a timeout (Stage 3), and the eventual coordinator message (Stage 4). C marks the coordinator.