November 2005 Distributed systems: distributed algorithms

Distributed Systems:

Distributed algorithms

Overview of chapters

• Introduction
• Co-ordination models and languages
• General services
• Distributed algorithms
  – Ch 10 Time and global states, 11.4-11.5
  – Ch 11 Coordination and agreement, 12.1-12.5

• Shared data

• Building distributed services

This chapter: overview
• Introduction
• Logical clocks
• Global states
• Failure detectors
• Mutual exclusion
• Elections
• Multicast communication
• Consensus and related problems

Logical clocks
• Problem: ordering of events
  – requirement for many algorithms
  – physical clocks cannot be used
• use causality:
  – within a single process: observation
  – between different processes: sending of a message happens before receiving the same message

Logical clocks (cont.)

• Formalization: happens-before relation (→)
• Rules:
  – if x happens before y in any process p, then x → y
  – for any message m: send(m) → receive(m)
  – if x → y and y → z, then x → z

• Implementation: logical clocks
• Logical clock
  – counter appropriately incremented
  – one counter per process
• Physical clock
  – counts oscillations occurring in a crystal at a definite frequency
• Rules for incrementing local logical clock
  1 for each event (including send) in process p: Cp := Cp + 1
  2 when a process sends a message m, it piggybacks on m the value of Cp
  3 on receiving (m, t), a process q
    • computes Cq := max(Cq, t)
    • applies rule 1: Cq := Cq + 1
    Cq is the logical time for event receive(m)
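The three rules can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the class name `LamportClock` and its method names are assumptions chosen for this example.

```python
class LamportClock:
    """Logical clock of one process (hypothetical helper for illustration)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: increment for every event, including send.
        self.time += 1
        return self.time

    def send(self):
        # Rule 2: the send event is timestamped; the value is piggybacked on m.
        return self.tick()

    def receive(self, t):
        # Rule 3: take the maximum of both clocks, then apply rule 1.
        self.time = max(self.time, t)
        return self.tick()


p, q = LamportClock(), LamportClock()
p.tick()                 # local event at p: Cp = 1
t = p.send()             # send(m) at p: Cp = 2, t = 2 travels with m
q_time = q.receive(t)    # receive(m) at q: Cq = max(0, 2) + 1 = 3
```

The receive timestamp is always larger than the piggybacked send timestamp, which is what makes C(send(m)) < C(receive(m)) hold.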

• Logical timestamps: example
[Figure: process timelines with a logical timestamp on each event]
• C(x): logical clock value for event x
• Correct usage: if x → y then C(x) < C(y)
• Incorrect usage: if C(x) < C(y) then x → y

• Solution: Logical vector clocks

• Vector clocks for N processes:
  – at process Pi: Vi[j] for j = 1, 2, …, N
  – Properties:
    if x → y then V(x) < V(y)
    if V(x) < V(y) then x → y

• Rules for incrementing logical vector clock
  1 for each event (including send) in process Pi: Vi[i] := Vi[i] + 1
  2 when a process Pi sends a message m, it piggybacks on m the value of Vi
  3 on receiving (m, t), a process Pi
    • applies rule 1
    • Vi[j] := max(Vi[j], t[j]) for j = 1, 2, …, N
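A small Python sketch of these rules, together with the comparison V(x) < V(y), may help. The function names (`vc_event`, `vc_receive`, `vc_less`, `concurrent`) are illustrative assumptions, and processes are indexed from 0 rather than 1.

```python
def vc_event(v, i):
    """Rule 1: process i increments its own component (returns a new vector)."""
    v = list(v)
    v[i] += 1
    return v


def vc_receive(v, i, t):
    """Rule 3: apply rule 1, then take the component-wise maximum with t."""
    v = vc_event(v, i)
    return [max(a, b) for a, b in zip(v, t)]


def vc_less(a, b):
    """V(x) < V(y): every component is <= and the vectors differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """Neither event happened before the other."""
    return a != b and not vc_less(a, b) and not vc_less(b, a)


p1_a = vc_event([0, 0, 0], 0)            # (1,0,0)
p1_b = vc_event(p1_a, 0)                 # (2,0,0), sent to the second process
p2_a = vc_receive([0, 0, 0], 1, p1_b)    # (2,1,0)
p2_b = vc_event(p2_a, 1)                 # (2,2,0), sent to the third process
p3_a = vc_event([0, 0, 0], 2)            # (0,0,1)
p3_b = vc_receive(p3_a, 2, p2_b)         # (2,2,2)
```

Unlike scalar logical clocks, the comparison works in both directions: `vc_less(p1_a, p3_b)` holds because the first event causally precedes the last, while `concurrent(p1_a, p3_a)` holds because neither vector dominates the other.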

• Logical vector clocks: example
[Figure: process timelines against physical time, with vector timestamps (1,0,0), (2,0,0); (2,1,0), (2,2,0); (0,0,1), (2,2,2)]

Global states
• Detect global properties
[Figure: a. Garbage collection (object reference, message, garbage object); b. Deadlock (p1 and p2 joined by wait-for edges); c. Termination (passive processes, activate message)]

Global states (cont.)

• Local states & events
  – Process Pi:
    e_i^k: events
    s_i^k: state, before event k
  – History of Pi:
    h_i = <e_i^0, e_i^1, e_i^2, …>
  – Finite prefix of history of Pi:
    h_i^k = <e_i^0, e_i^1, e_i^2, …, e_i^k>
• Global states & events
  – Global history
    H = h_1 ∪ h_2 ∪ h_3 ∪ … ∪ h_n
  – Global state (when?)
    S = (s_1^p, s_2^q, …, s_n^u)
    consistent?
  – Cut of the system's execution
    C = h_1^{c_1} ∪ h_2^{c_2} ∪ … ∪ h_n^{c_n}

• Example of cuts:
[Figure: process timelines against physical time, showing a consistent cut and an inconsistent cut]

• Finite prefix of history of Pi:
  h_i^k = <e_i^0, e_i^1, e_i^2, …, e_i^k>
• Cut of the system's execution
  C = h_1^{c_1} ∪ h_2^{c_2} ∪ … ∪ h_n^{c_n}
• Consistent cut C
  ∀ e ∈ C: f → e ⇒ f ∈ C
• Consistent global state
  corresponds to consistent cut

• Model execution of a (distributed) system
  S0 → S1 → S2 → S3 → …
  – Series of transitions between consistent states
  – Each transition corresponds to one single event
    • Internal event
    • Sending message
    • Receiving message
  – Simultaneous events ⇒ order events
• Definitions:
  – Run = ordering of all events (in a global history) consistent with each local history's ordering
  – Linearization = consistent run = run that is also consistent with →
  – S' reachable from S ⇔ ∃ linearization: … S … S' …

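• Kinds of global state predicates:
  – Stable
    predicate = true in S ⇒ ∀ S', S → … → S': predicate = true in S'
  – Safety
    predicate = undesirable property
    S0 = initial state of system
    ∀ S, S0 → … → S: predicate = false in S
  – Liveness
    predicate = desirable property
    S0 = initial state of system
    for every linearization starting in S0, ∃ S, S0 → … → S: predicate = true in S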

• Snapshot algorithm of Chandy & Lamport
  – Record consistent global state
  – Assumptions:
    • Neither channels nor processes fail
    • Channels are unidirectional and provide FIFO-ordered message delivery
    • Graph of channels and processes is strongly connected
    • Any process may initiate a global snapshot
    • Processes may continue their execution during the snapshot

• Snapshot algorithm of Chandy & Lamport
  – Elements of algorithm
    • Players: processes Pi with
      – Incoming channels
      – Outgoing channels
    • Marker messages
    • 2 rules
      – Marker receiving rule
      – Marker sending rule
  – Start of algorithm
    • A process acts as if it had received a marker message

Marker receiving rule for process pi

On pi's receipt of a marker message over channel c:
  if (pi has not yet recorded its state) it
    records its process state now;
    records the state of c as the empty set;
    turns on recording of messages arriving over other incoming channels;
  else
    pi records the state of c as the set of messages it has received over c since it saved its state.
  end if

Marker sending rule for process pi

After pi has recorded its state, for each outgoing channel c:
  pi sends one marker message over c (before it sends any other message over c).
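The two marker rules can be tried out in a small single-threaded simulation. The sketch below is an illustration under stated assumptions, not the original formulation: channels are FIFO `deque` objects, the `Process` class and `apply_message` function are hypothetical, and the scenario is a two-process trading example with accounts and widgets, with message contents chosen for this sketch.

```python
from collections import deque

MARKER = "M"


def apply_message(state, msg):
    """Application logic of the example (an assumption): payments and widgets."""
    kind, amount = msg
    state["account" if kind == "payment" else "widgets"] += amount


class Process:
    def __init__(self, state, incoming, outgoing):
        self.state = state
        self.incoming = incoming       # names of incoming channels
        self.outgoing = outgoing       # list of deques (FIFO outgoing channels)
        self.recorded_state = None
        self.channel_state = {}        # incoming channel name -> recorded messages
        self.recording = set()         # incoming channels still being recorded

    def record(self, marker_channel=None):
        self.recorded_state = dict(self.state)   # record process state now
        for c in self.incoming:
            self.channel_state[c] = []            # the marker channel stays empty
            if c != marker_channel:
                self.recording.add(c)             # record the other channels
        for channel in self.outgoing:             # marker sending rule
            channel.append(MARKER)

    def receive(self, c, msg):
        if msg == MARKER:                         # marker receiving rule
            if self.recorded_state is None:
                self.record(marker_channel=c)
            else:
                self.recording.discard(c)         # state of c is now complete
            return
        if c in self.recording:
            self.channel_state[c].append(msg)
        apply_message(self.state, msg)


c1, c2 = deque(), deque()              # c1: p2 -> p1, c2: p1 -> p2
p1 = Process({"account": 1000, "widgets": 0}, ["c1"], [c2])
p2 = Process({"account": 50, "widgets": 2000}, ["c2"], [c1])

p1.record()                            # p1 initiates the snapshot
p1.state["account"] -= 100             # p1 orders 10 widgets and pays $100
c2.append(("payment", 100))
p2.state["widgets"] -= 5               # p2 ships five widgets for an earlier order
c1.append(("widgets", 5))
p1.receive("c1", c1.popleft())         # five widgets reach p1 (recorded on c1)
p2.receive("c2", c2.popleft())         # the marker reaches p2, which records
p1.receive("c1", c1.popleft())         # p2's marker reaches p1
p2.receive("c2", c2.popleft())         # the order finally reaches p2
```

The recorded snapshot is p1 = <$1000, 0>, p2 = <$50, 1995>, c1 = <(five widgets)> and c2 = <>. No single instant of the run looked like this, yet the state is consistent: the five widgets are counted exactly once, in the channel.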

• Example:
[Figure: processes p1 and p2 connected by channels c2 (from p1 to p2) and c1 (from p2 to p1). p1: account $1000, widgets (none); p2: account $50, widgets 2000. M = marker message.]

  Global state   p1           c2                     c1               p2
  1. S0          <$1000, 0>   (empty)                (empty)          <$50, 2000>
  2. S1          <$900, 0>    (Order 10, $100), M    (empty)          <$50, 2000>
  3. S2          <$900, 0>    (Order 10, $100), M    (five widgets)   <$50, 1995>
  4. S3          <$900, 5>    (Order 10, $100)       (empty)          <$50, 1995>

  Recorded channel states: C2 = <>, C1 = <(five widgets)>

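• Observed state
  – Corresponds to consistent cut
  – Reachable!
[Figure: the actual execution e0, e1, … leads from Sinit to Sfinal, with recording beginning and ending in between. A linearization with pre-snap events e'0, e'1, …, e'R-1 and post-snap events e'R, e'R+1, … passes through the recorded state.]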

Failure detectors
• Properties
  – Unreliable failure detector: answers with
    • Suspected
    • Unsuspected
  – Reliable failure detector: answers with
    • Failed
    • Unsuspected
• Implementation
  – Every T sec: multicast by P of "P is here"
  – Maximum on message transmission time:
    • Asynchronous system: estimate E
    • Synchronous system: absolute bound A
  – No "P is here" within T + E sec ⇒ P is Suspected
  – No "P is here" within T + A sec ⇒ P has Failed
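A heartbeat-based detector along these lines fits in a few lines of Python. This is a sketch only: the class `HeartbeatDetector` and its parameters are assumptions for illustration, and time is passed in explicitly so the behaviour is deterministic.

```python
class HeartbeatDetector:
    """Timeout-based failure detector (hypothetical helper for illustration)."""

    def __init__(self, period, max_delay, synchronous=False):
        # Timeout is T + E (estimate, asynchronous) or T + A (bound, synchronous).
        self.timeout = period + max_delay
        self.synchronous = synchronous
        self.last_seen = {}

    def heartbeat(self, process, now):
        # Called when a "P is here" message from `process` arrives at time `now`.
        self.last_seen[process] = now

    def status(self, process, now):
        last = self.last_seen.get(process, float("-inf"))
        if now - last <= self.timeout:
            return "Unsuspected"
        # Only a synchronous system justifies the definite answer "Failed".
        return "Failed" if self.synchronous else "Suspected"


unreliable = HeartbeatDetector(period=5, max_delay=2)
unreliable.heartbeat("P", now=0)
early = unreliable.status("P", now=6)     # within T + E
late = unreliable.status("P", now=8)      # beyond T + E

reliable = HeartbeatDetector(period=5, max_delay=2, synchronous=True)
reliable.heartbeat("P", now=0)
verdict = reliable.status("P", now=8)     # beyond T + A
```

In the asynchronous case E is only an estimate, so a "Suspected" process may simply be slow. That is why the detector is called unreliable.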

Mutual exclusion
• Problem: how to give a single process temporarily a privilege?
  – Privilege = the right to access a (shared) resource
  – resource = file, device, window, …
• Assumptions
  – clients execute the mutual exclusion algorithm
  – the resource itself might be managed by a server
  – Reliable communication

Mutual exclusion (cont.)

• Basic requirements:
  – ME1: at most one process may use the shared resource at any time (Safety)
  – ME2: a process requesting access to the shared resource is eventually granted it (Liveness)
  – ME3: Access to the shared resource should be granted in happened-before order (Ordering or fairness)

• Solutions:– central server algorithm

– distributed algorithm using logical clocks

– ring-based algorithm

– voting algorithm

• Evaluation– Bandwidth (= #messages to enter and exit)

– Client delay (incurred by a process at enter and exit)

– Synchronization delay (delay between exit and enter)

central server algorithm
• Central server offering 2 operations:
  – enter()
    • if resource free
      then operation returns without delay
      else request is queued and return from operation is delayed
  – exit()
    • if request queue is empty
      then resource is marked free
      else return for a selected request is executed
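The server side of enter() and exit() can be sketched as follows. The class `LockServer` is an assumption for illustration, and selecting the oldest queued request (FIFO) is one possible choice. The slides leave the selection policy open.

```python
from collections import deque


class LockServer:
    """Central server granting one resource (hypothetical helper for illustration)."""

    def __init__(self):
        self.holder = None
        self.queue = deque()

    def enter(self, client):
        """Return True if access is granted now; otherwise queue the request."""
        if self.holder is None:
            self.holder = client
            return True
        self.queue.append(client)
        return False

    def exit(self, client):
        """Release the resource; return the next client granted, or None."""
        if client != self.holder:
            raise ValueError("only the current holder may exit")
        # Assumption: the oldest queued request is selected.
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


server = LockServer()
first = server.enter("P1")      # resource free: granted immediately
second = server.enter("P2")     # resource held: request is queued
granted = server.exit("P1")     # queue not empty: P2's enter() now returns
```

Because the queue is ordered by arrival at the server and not by happened-before order, this sketch shares the slides' verdict that ME3 is not satisfied.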

central server algorithm
• Example:
[Figure sequence: clients send enter() and exit() to the server; the server's queue holds the enter() requests that arrive while the resource is in use.]

central server algorithm
• Evaluation:
  – ME3 not satisfied!
  – Performance:
    • single server is performance bottleneck
    • Enter critical section: 2 messages
    • Synchronization: 2 messages between exit of one process and enter of next
  – Failure:
    • Central server is single point of failure
    • what if a client, holding the resource, fails?
    • Reliable communication required

ring-based algorithm
• All processes arranged in a
  – unidirectional
  – logical
  ring
• token passed in ring
• process with token has access to resource

ring-based algorithm

P2 can use resource

P2 stopped using resource and forwarded token

P3 doesn’t need resource and forwards token

ring-based algorithm
• Evaluation:
  – ME3 not satisfied
  – efficiency
    • high when high usage of resource
    • high overhead when very low usage
  – failure
    • Process failure: loss of ring!

distributed algorithm using logical clocks
• Distributed agreement algorithm
  – multicast requests to all participating processes
  – use resource when all other participants agree (= reply received)
• Processes
  – keep logical clock; included in all request messages
  – behave as finite state machine:
    • released
    • wanted
    • held

distributed algorithm using logical clocks
• Ricart and Agrawala's algorithm: process Pj
  – on initialization:
    • state := released;
  – to obtain resource:
    • state := wanted;
    • T = logical clock value for next event;
    • multicast request <T, Pj> to other processes;
    • wait for n-1 replies;
    • state := held;

distributed algorithm using logical clocks
• Ricart and Agrawala's algorithm: process Pj
  – on receipt of request <Ti, Pi>:
    • if (state = held) or (state = wanted and (T, Pj) < (Ti, Pi))
      then queue request from Pi
      else reply immediately to Pi
  – to release resource:
    • state := released;
    • reply to any queued requests;
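The decision taken on receipt of a request is the heart of the algorithm, and a short Python sketch makes the tie-breaking explicit. The class `RAProcess` and its method names are assumptions for illustration. Message passing and reply counting are left out, and process ids are plain integers so that the pair (T, id) can be compared as a tuple.

```python
class RAProcess:
    """State machine of one process (hypothetical helper for illustration)."""

    def __init__(self, pid):
        self.pid = pid
        self.state = "released"
        self.request = None      # own pending request (T, pid)
        self.queue = []          # deferred requests

    def want(self, clock_value):
        """To obtain the resource: returns the request to multicast."""
        self.state = "wanted"
        self.request = (clock_value, self.pid)
        return self.request

    def on_request(self, req):
        """Return True to reply immediately, False if the request is queued."""
        if self.state == "held" or (self.state == "wanted" and self.request < req):
            self.queue.append(req)
            return False
        return True

    def release(self):
        """Returns the queued requests that must be answered now."""
        self.state = "released"
        pending, self.queue = self.queue, []
        return pending


p1, p2, p3 = RAProcess(1), RAProcess(2), RAProcess(3)
r1 = p1.want(41)                       # <41, P1>
r2 = p2.want(34)                       # <34, P2>, concurrent with r1
p1_replies_to_p2 = p1.on_request(r2)   # (41,1) < (34,2) is false: reply
p2_replies_to_p1 = p2.on_request(r1)   # (34,2) < (41,1) is true: queue
p3_replies = (p3.on_request(r1), p3.on_request(r2))
p2.state = "held"                      # P2 has its n-1 = 2 replies
deferred = p2.release()                # P1's request is answered on release
```

The lower timestamp wins. The process id only matters when two requests carry the same logical clock value.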

distributed algorithm using logical clocks• Ricart and Agrawala’s algorithm: example

– 3 processes

– P1 and P2 will request it concurrently

– P3 not interested in using resource

distributed algorithm using logical clocks
• Ricart and Agrawala's algorithm: example
[Figure sequence: P1 (wanted) multicasts request <41,P1> and P2 (wanted) multicasts request <34,P2>, while P3 stays released; further messages <43,P3> and <45,P3> from P3 are shown. P2's request has the lower timestamp, so P2 places P1 in its queue and moves to held with P1 still queued. After P2 releases the resource, the queues are empty again.]

distributed algorithm using logical clocks
• Evaluation
  – Performance:
    • expensive algorithm: 2 * (n - 1) messages to get resource
    • Client delay: round trip
    • Synchronization delay: 1 message to pass section to another process
    • does not solve the performance bottleneck
  – Failure
    • each process must know all other processes

voting algorithm
• Approach of Maekawa:
  – Communication with subset of partners should suffice
  – Candidate collects sufficient votes
• Voting set: Vi = voting set for pi
  – ∀ i, j: Vi ∩ Vj ≠ ∅
  – |Vi| = K
  – Pj contained in M voting sets
  – Optimal solution
    • K ~ √N
    • M = K
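One simple way to obtain voting sets with these properties is to place the processes in a √N by √N grid and let Vi be the row and column of pi. The sketch below uses this well-known approximation, which is not the optimal construction: it gives K = 2√N - 1, not K ~ √N. The function name `grid_voting_sets` is an assumption, and N must be a perfect square.

```python
import math


def grid_voting_sets(n):
    """Voting set of process i = its row plus its column in a k x k grid."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("this sketch assumes N is a perfect square")
    sets = []
    for i in range(n):
        row, col = divmod(i, k)
        same_row = {row * k + c for c in range(k)}
        same_col = {r * k + col for r in range(k)}
        sets.append(same_row | same_col)
    return sets


voting_sets = grid_voting_sets(9)       # processes 0..8 in a 3 x 3 grid
sizes = {len(v) for v in voting_sets}   # every |Vi| equals K = 2 * 3 - 1
```

Any two sets intersect because the row of one process always crosses the column of the other, and every process appears in exactly M = K sets.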

voting algorithm
Maekawa's algorithm

On initialization
  state := RELEASED;
  voted := FALSE;

For pi to enter the critical section
  state := WANTED;
  Multicast request to all processes in Vi – {pi};
  Wait until (number of replies received = (K – 1));
  state := HELD;

On receipt of a request from pi at pj (i ≠ j)
  if (state = HELD or voted = TRUE)
  then
    queue request from pi without replying;
  else
    send reply to pi;
    voted := TRUE;
  end if

voting algorithm
Maekawa's algorithm (cont.)

For pi to exit the critical section
  state := RELEASED;
  Multicast release to all processes in Vi – {pi};

On receipt of a release from pi at pj (i ≠ j)
  if (queue of requests is non-empty)
  then
    remove head of queue – from pk, say;
    send reply to pk;
    voted := TRUE;
  else
    voted := FALSE;
  end if

voting algorithm
• Evaluation
  – Properties
    • ME1: OK
    • ME2: not OK, deadlock possible
      solution: process requests in happened-before order
    • ME3: OK
  – Performance
    • Bandwidth: on enter 2√N messages + on exit √N messages
    • Client delay: round trip
    • Synchronization delay: round trip
  – Failure:
    • Crash of a process in another voting set can be tolerated

• Discussion
  – algorithms are expensive and not practical
  – algorithms are extremely complex in the presence of failures
  – better solution in most cases:
    • let the server, managing the resource, perform concurrency control
    • gives more transparency for the clients

Elections
• Problem statement:
  – select a process from a group of processes
  – several processes may start election concurrently
• Main requirement:
  – unique choice
  – Select process with highest id
    • Process id
    • <1/load, process id>

Elections (cont.)

• Basic requirements:
  – E1: participant pi sets elected_i = ⊥ (not yet defined) or elected_i = P, where P is the process with the highest id (Safety)
  – E2: all processes pi participate and set elected_i ≠ ⊥, or crash (Liveness)

Elections (cont.)

• Solutions:
  – Bully election algorithm
  – Ring based election algorithm
• Evaluation
  – Bandwidth (~ total number of messages)
  – Turnaround time (the number of serialized message transmission times between initiation and termination of a single run)

Elections (cont.)

Bully election
• Assumptions:
  – each process has identifier
  – processes can fail during an election
  – communication is reliable
• Goal:
  – surviving member with the largest identifier is elected as coordinator

Elections (cont.)

Bully election
• Roles for processes:
  – coordinator
    • elected process
    • has highest identifier, at the time of election
  – initiator
    • process starting the election for some reason

Elections (cont.)

Bully election
• Three types of messages:
  – election message
    • sent by an initiator of the election to all other processes with a higher identifier
  – answer message
    • a reply message sent by the receiver of an election message
  – coordinator message
    • sent by the process becoming the coordinator to all other processes with lower identifiers

Elections (cont.)

Bully election
• Algorithm:
  – send election message:
    • process doing it is called initiator
    • any process may do it at any time
    • when a failed process is restarted, it starts an election, even though the current coordinator is functioning (bully)
  – a process receiving an election message
    • replies with an answer message
    • will start an election itself (why?)

Elections (cont.)

Bully election
• Algorithm:
  – actions of an initiator
    • when not receiving an answer message within a certain time (2 Ttrans + Tprocess): becomes coordinator
    • when having received an answer message (a process with a higher identifier is active) and not receiving a coordinator message (after x time units): will restart elections

Elections (cont.)

Bully election
• Example: election of P2 after failure of P3 and P4
[Figure sequence showing election and answer messages:]
  1. P1 initiator
  2. P2 and P3 reply and start election
  3. Election messages of P2 arrive
  4. P3 replies
  5. P3 fails
  6. Timeout at P1: election starts again
  7. P2 replies and starts election
  8. P2 receives no replies ⇒ coordinator

Elections (cont.)

Bully election
• Evaluation
  – Correctness: E1 & E2 ok, if
    • Reliable communication
    • No process replaces crashed process
  – Correctness: no guarantee for E1, if
    • Crashed process is replaced by process with same id
    • Assumed timeout values are inaccurate (= unreliable failure detector)
  – Performance
    • Worst case: O(n²) messages
    • Optimal: bandwidth: n-2 messages, turnaround: 1 message

Elections (cont.) Ring based election
• Assumptions:
  – processes arranged in a logical ring
  – each process has an identifier: i for Pi
  – processes remain functional and reachable during the algorithm
• Messages:
  – forwarded over logical ring
  – 2 types:
    • election: used during election, contains identifier
    • elected: used to announce new coordinator
• Process States:
  – participant
  – non-participant

• Algorithm
  – process initiating an election (initiator)
    • becomes participant
    • sends election message to its neighbour
  – upon receiving an election message, a process compares identifiers:
    • Received: identifier in message
    • own: identifier of process
  – 3 cases:
    • Received > own
      – message forwarded
      – process becomes participant
    • Received < own and process is non-participant
      – substitutes own identifier in message
      – message forwarded
      – process becomes participant
    • Received = own
      – identifier must be greatest
      – process becomes coordinator
      – new state: non-participant
      – sends elected message to neighbour

• Algorithm
  – receive elected message
    • participant:
      – new state: non-participant
      – forwards message
    • coordinator:
      – election process completed
[Figure: ring example ending with P5 as coordinator]
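The rules for election and elected messages can be checked with a small simulation of a single election on a ring. This is a sketch under stated assumptions: the function `ring_election` is hypothetical, the ring is a list of distinct identifiers in forwarding order, there is one initiator, and no process fails.

```python
def ring_election(ring_ids, initiator):
    """Simulate one election; return (elected value per process, message count)."""
    n = len(ring_ids)
    participant = [False] * n
    elected = [None] * n

    # The initiator becomes participant and sends an election message.
    participant[initiator] = True
    kind, value = "election", ring_ids[initiator]
    pos, messages = (initiator + 1) % n, 1

    while True:
        own = ring_ids[pos]
        if kind == "election":
            if value > own:
                participant[pos] = True          # forward unchanged
            elif value < own:
                if participant[pos]:
                    break                        # message is extinguished
                value = own                      # substitute own identifier
                participant[pos] = True
            else:
                participant[pos] = False         # own id came back: coordinator
                elected[pos] = own
                kind = "elected"
        else:
            if value == own:
                break                            # elected message is back: done
            participant[pos] = False
            elected[pos] = value
        pos = (pos + 1) % n                      # forward to the neighbour
        messages += 1

    return elected, messages


# Worst case: the highest identifier sits just before the initiator in the ring.
worst_elected, worst_messages = ring_election([3, 1, 4, 2, 5], initiator=0)
# Best case: the process with the highest identifier starts the election itself.
best_elected, best_messages = ring_election([3, 1, 4, 2, 5], initiator=4)
```

For n = 5 the two runs use 14 and 10 messages, matching the 3n - 1 and 2n message counts for the worst and best case.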

• Evaluation
  – Why is the condition "Received < own and process is non-participant" necessary? (see the full algorithm above)
  – Number of messages:
    • worst case: 3 * n - 1
    • best case: 2 * n
  – concurrent elections: messages are extinguished
    • as soon as possible
    • before winning result is announced

Multicast communication
• Essential property:
  – 1 multicast operation ≠ multiple sends
    • Higher efficiency
    • Stronger delivery guarantees
• Operations: g = group, m = message
  – X-multicast(g, m)
  – X-deliver(m)
    • ≠ receive(m)
  – X: additional property (Basic, Reliable, FIFO, …)

Multicast communication (cont.) IP multicast
• Datagram operations
  – with multicast IP address
• Failure model: cf. UDP
  – Omission failures
  – No ordering or reliability guarantees

Multicast communication (cont.) Basic multicast
• = IP multicast + delivery guarantee if multicasting process does not crash
• Straightforward algorithm (with reliable send):
  To B-multicast(g, m):
    ∀ p ∈ g: send(p, m)
  On receive(m) at p:
    B-deliver(m)
• Ex. practical algorithm using IP-multicast

Multicast communication (cont.) Reliable multicast
• Properties:
  – Integrity (safety)
    • A correct process delivers a message at most once
  – Validity (liveness)
    • Correct process p multicasts m ⇒ p delivers m
  – Agreement (liveness)
    • ∃ correct process p delivering m ⇒ all correct processes will deliver m
  – Uniform agreement (liveness)
    • ∃ process p (correct or failing) delivering m ⇒ all correct processes will deliver m

• 2 algorithms:
  1. Using B-multicast
  2. Using IP-multicast + piggybacked acks

Algorithm 1 with B-multicast
• Correct?
  – Integrity
  – Validity
  – Agreement
• Efficient?
  – NO: each message transmitted |g| times

Algorithm 1 with B-multicast
[Figure: message processing. Incoming messages enter a hold-back queue; when delivery guarantees are met they move to the delivery queue and are delivered.]

Algorithm 2 with IP-multicast

Data structures at process p:
  S_p^g: sequence number
  R_q^g: sequence number of the latest message it has delivered from q

On initialization:
  S_p^g = 0; R_q^g = -1

For process p to R-multicast message m to group g
  IP-multicast(g, <m, S_p^g, <q, R_q^g>>)
  S_p^g ++

Algorithm 2 with IP-multicast

On IP-deliver(<m, S, <q, R>>) at q from p
  if S = R_p^g + 1
  then R-deliver(m)
       R_p^g ++
       check hold-back queue
  else if S > R_p^g + 1
  then store m in hold-back queue
       request missing messages
  endif
  if R > R_q^g for any piggybacked pair <q, R> then request missing messages endif

• Correct?– Integrity: seq numbers + checksums

– Validity: if missing messages are detected

– Agreement: if copy of message remains available

• 3 processes in group: P, Q, R
• State of process:
  – S: Next_sequence_number
  – Rq: Already_delivered from Q
  – Stored messages
• Presentation (own S | latest delivered from the others | stored messages), e.g.:
  P: 2 | Q: 3, R: 5 | < >

Algorithm 2 with IP-multicast: example

• Initial state:
  P: 0 | Q: -1, R: -1 | < >
  Q: 0 | P: -1, R: -1 | < >
  R: 0 | P: -1, Q: -1 | < >
• First multicast by P:
  P: 1 | Q: -1, R: -1 | < mp0 >
  Q: 0 | P: -1, R: -1 | < >
  R: 0 | P: -1, Q: -1 | < >
  message in transit: P: mp0, 0, <Q: -1, R: -1>
• Arrival of multicast by P at Q (new state):
  P: 1 | Q: -1, R: -1 | < mp0 >
  Q: 0 | P: 0, R: -1 | < mp0 >
  R: 0 | P: -1, Q: -1 | < >
• Multicast by Q:
  P: 1 | Q: -1, R: -1 | < mp0 >
  Q: 1 | P: 0, R: -1 | < mp0, mq0 >
  R: 0 | P: -1, Q: -1 | < >
  message in transit: Q: mq0, 0, <P: 0, R: -1>
• Arrival of multicast by Q:
  P: 1 | Q: 0, R: -1 | < mp0, mq0 >
  Q: 1 | P: 0, R: -1 | < mp0, mq0 >
  R: 0 | P: -1, Q: 0 | < mq0 >
• When to delete stored messages?

Multicast communication (cont.) Ordered multicast
• FIFO
  if a correct process P: multicast(g, m); multicast(g, m');
  then for all correct processes: deliver(m') ⇒ deliver(m) before deliver(m')
• Causal
  if multicast(g, m) → multicast(g, m')
  then for all correct processes: deliver(m') ⇒ deliver(m) before deliver(m')
• Total
  if ∃ p: deliver(m) before deliver(m')
  then for all correct processes: deliver(m') ⇒ deliver(m) before deliver(m')
• FIFO-Total = FIFO + Total
• Causal-Total = Causal + Total
• Atomic = reliable + Total

[Figure: message deliveries at P1, P2 and P3.] Notice the consistent ordering of totally ordered messages T1 and T2, the FIFO-related messages F1 and F2 and the causally related messages C1 and C3, and the otherwise arbitrary delivery ordering of messages.

Multicast communication (cont.) FIFO multicast
• Alg. 1: R-multicast using IP-multicast
  – Correct?
    • Sender assigns S_p^g
    • Receivers deliver in this order
• Alg. 2 on top of any B-multicast

Algorithm 2 on top of any B-multicast

S_p^g: sequence number
R_q^g: sequence number of the latest message it has delivered from q

On initialization:
  S_p^g = 0; R_q^g = -1

For process p to FO-multicast message m to group g
  B-multicast(g, <m, S_p^g>)
  S_p^g ++

Algorithm 2 on top of any B-multicast

On B-deliver(<m, S>) at q from p
  if S = R_p^g + 1
  then FO-deliver(m)
       R_p^g ++
       check hold-back queue
  else if S > R_p^g + 1
  then store m in hold-back queue
  endif
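The receiver side of this FIFO algorithm can be sketched as follows. The class `FifoReceiver` is an assumption for illustration; B-multicast itself is not modelled, the test simply calls `b_deliver` in the order in which messages happen to arrive.

```python
class FifoReceiver:
    """Receiver-side FIFO ordering (hypothetical helper for illustration)."""

    def __init__(self):
        self.latest = {}       # per sender: last delivered sequence number (R)
        self.hold_back = {}    # (sender, sequence number) -> message
        self.delivered = []

    def b_deliver(self, sender, seq, m):
        r = self.latest.get(sender, -1)        # R starts at -1
        if seq == r + 1:
            self._fo_deliver(sender, m)
            # Check the hold-back queue for messages that are now in order.
            while (sender, self.latest[sender] + 1) in self.hold_back:
                nxt = self.hold_back.pop((sender, self.latest[sender] + 1))
                self._fo_deliver(sender, nxt)
        elif seq > r + 1:
            self.hold_back[(sender, seq)] = m  # arrived too early
        # seq <= r: duplicate of an already delivered message, ignored

    def _fo_deliver(self, sender, m):
        self.delivered.append(m)
        self.latest[sender] = self.latest.get(sender, -1) + 1


receiver = FifoReceiver()
receiver.b_deliver("p", 1, "second")   # held back: message 0 is missing
receiver.b_deliver("p", 2, "third")    # held back as well
receiver.b_deliver("p", 0, "first")    # releases all three in sender order
```

Sequence numbers are kept per sender, so messages from different senders are never delayed by one another. That is exactly what separates FIFO ordering from total ordering.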

Multicast communication (cont.) TOTAL multicast
• Basic approach:
  – Sender: assign totally ordered identifiers instead of process ids
  – Receiver: deliver as for FIFO ordering
• Alg. 1: use a (single) sequencer process
• Alg. 2: participants collectively agree on the assignment of sequence numbers

Multicast communication (cont.) TOTAL multicast: sequencer process
• Correct?
• Problems?
  – A single sequencer process
    • bottleneck
    • single point of failure

Multicast communication (cont.) TOTAL multicast: ISIS algorithm
• Approach:
  – Sender:
    • B-multicasts message
  – Receivers:
    • Propose sequence numbers to sender
  – Sender:
    • uses returned sequence numbers to generate agreed sequence number
[Figure: 1 Message, 2 Proposed Seq, 3 Agreed Seq]

A_p^g: largest agreed sequence number
P_p^g: largest proposed sequence number by p

On initialization:
  P_p^g = 0

For process p to TO-multicast message m to group g
  B-multicast(g, <m, i>)   / i = unique id for m

On B-deliver(<m, i>) at q from p
  P_q^g = max(A_q^g, P_q^g) + 1
  send(p, <i, P_q^g>)
  store <m, i, P_q^g> in hold-back queue

On receive(i, P) at p from q
  wait for all replies; a is the largest reply
  B-multicast(g, <"order", i, a>)

On B-deliver(<"order", i, a>) at q from p
  A_q^g = max(A_q^g, a)
  attach a to message i in hold-back queue
  reorder messages in hold-back queue (increasing sequence numbers)
  while message m in front of hold-back queue has been assigned an agreed sequence number
  do remove m from hold-back queue
     TO-deliver(m)
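The receiver side of the ISIS scheme can be sketched in Python as below. The class `IsisReceiver` is an assumption for illustration, and so is the tie-breaking rule: when two messages end up with the same agreed number, this sketch orders them by message id, a detail the pseudocode above leaves open. A_q^g is assumed to start at 0.

```python
class IsisReceiver:
    """Receiver-side total ordering (hypothetical helper for illustration)."""

    def __init__(self):
        self.agreed = 0        # A: largest agreed sequence number seen
        self.proposed = 0      # P: largest sequence number proposed here
        self.hold_back = {}    # message id -> [sequence number, is_agreed, message]
        self.delivered = []

    def on_message(self, msg_id, m):
        """B-deliver of <m, i>: returns the proposal to send back to the sender."""
        self.proposed = max(self.agreed, self.proposed) + 1
        self.hold_back[msg_id] = [self.proposed, False, m]
        return self.proposed

    def on_order(self, msg_id, a):
        """B-deliver of <"order", i, a>: attach a, then deliver what is ready."""
        self.agreed = max(self.agreed, a)
        entry = self.hold_back[msg_id]
        entry[0], entry[1] = a, True
        while self.hold_back:
            # Front of the queue: smallest sequence number, ties broken by id.
            head = min(self.hold_back, key=lambda i: (self.hold_back[i][0], i))
            if not self.hold_back[head][1]:
                break          # front message has no agreed number yet
            self.delivered.append(self.hold_back.pop(head)[2])


q1, q2 = IsisReceiver(), IsisReceiver()
proposals_m1 = [q1.on_message("m1", "first")]        # q1 sees m1, then m2
proposals_m2 = [q1.on_message("m2", "second")]
proposals_m2.append(q2.on_message("m2", "second"))   # q2 sees m2, then m1
proposals_m1.append(q2.on_message("m1", "first"))
a1, a2 = max(proposals_m1), max(proposals_m2)        # senders pick the largest reply
q1.on_order("m1", a1)
q1.on_order("m2", a2)
q2.on_order("m2", a2)                                # order messages arrive reversed
q2.on_order("m1", a1)
```

Although the two receivers saw both the messages and the order announcements in different sequences, they deliver in the same order. The price is the three serial message rounds per multicast.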

• Correct?
  – Processes will agree on sequence number for a message
  – Sequence numbers are monotonically increasing
  – No process can prematurely deliver a message
• Performance
  – 3 serial messages!
• Total ordering
  – ≠ FIFO
  – ≠ causal

Multicast communication (cont.) Causal multicast
• Limitations:
  – Causal order only by multicast operations
  – Non-overlapping, closed groups
• Approach:
  – Use vector timestamps
  – Timestamp = count number of multicast messages

Multicast communication (cont.) Causal multicast: vector timestamps

Meaning?

Multicast communication (cont.) Causal multicast: vector timestamps
• Correct?
  – Message timestamp
    m: V, m': V'
  – Given multicast(g, m) → multicast(g, m'), prove V < V'

Consensus & related problems
• System model
  – N processes pi
  – Communication is reliable
  – Processes may fail
    • Crash
    • Byzantine
  – No message signing
    • Message signing limits the harm a faulty process can do
• Problems
  – Consensus
  – Byzantine generals
  – Interactive consistency

Consensus

• Problem statement
  – ∀ pi: undecided state & pi proposes vi
  – Message exchanges
  – Finally: each process pi sets decision variable di, enters decided state and may not change di
• Requirements:
  – Termination: eventually each correct process pi sets di
  – Agreement: pi and pj correct & in decided state ⇒ di = dj
  – Integrity: correct processes all propose same value d ⇒ any process in decided state has chosen d

Consensus

• Simple algorithm (no process failures)
  – Collect processes in group g
  – For each process pi:
    • R-multicast(g, vi)
    • Collect N values
    • d = majority(v1, v2, ..., vN)
• Problems with failures:
  – Crash: detected? not in asynchronous systems
  – Byzantine? Faulty process can send around different values
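For the failure-free case, the decision step is just a deterministic function applied to the same N values at every process. A minimal sketch follows; the function name `majority` matches the slide, while returning a default when no value has a strict majority is an assumption of this example.

```python
from collections import Counter


def majority(values, default=None):
    """Return the value proposed by more than half of the processes, else default."""
    if not values:
        return default
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


# Every process collects the same N values through R-multicast,
# so every process computes the same decision d.
collected = ["attack", "retreat", "attack"]
decisions = [majority(collected) for _ in range(3)]
no_majority = majority(["attack", "retreat"])
```

Agreement follows because all processes evaluate the same function on the same input. Integrity follows because a value proposed by all processes is trivially the majority.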

Byzantine generals
• Problem statement
  – Informal:
    • agree to attack or to retreat
    • commander issues the order
    • lieutenants are to decide to attack or retreat
    • all can be 'treacherous'
  – Formal
    • One process proposes value
    • Others are to agree
• Requirements:
  – Termination: each process eventually decides
  – Agreement: all correct processes select the same value
  – Integrity: commander correct ⇒ other correct processes select value of commander

Interactive consistency

• Problem statement
  – Correct processes agree upon a vector of values (one value for each process)
• Requirements:
  – Termination: each process eventually decides
  – Agreement: all correct processes select the same vector
  – Integrity: if pi correct then all correct processes decide on vi as the i-th component of their vector
