Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Distributed Synchronization: outline ...

50
Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Distributed Synchronization: outline Introduction Causality and time o Lamport timestamps o Vector timestamps o Causal communication Snapshots This presentation is based on the book: “Distributed operating-systems & algorithms” by Randy Chow and Theodore Johnson

Transcript of Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Distributed Synchronization: outline ...

Operating Systems, 2011, Danny Hendler & Amnon Meisels 1

Distributed Synchronization: outline

Introduction

Causality and timeo Lamport timestamps

o Vector timestamps

o Causal communication

Snapshots

This presentation is based on the book: “Distributed operating-systems & algorithms” by Randy Chow and Theodore Johnson

2Operating Systems, 2011, Danny Hendler & Amnon Meisels

A distributed system is a collection of independent computational nodes, communicating over a network, that is abstracted as a single coherent systemo Grid computingo Cloud computing (“software as a

service”)o Peer-to-peer computingo Sensor networkso …

A distributed operating system allows sharing of resources and coordination of distributed computation in a transparent manner

Distributed systems

(a), (b) – a distributed system(c) – a multiprocessor

3Operating Systems, 2011, Danny Hendler & Amnon Meisels

Underlies distributed operating systems and algorithms

Processes communicate via message passing (no shared memory)

Inter-node coordination in distributed systems challenged byo Lack of global stateo Lack of global clocko Communication links may failo Messages may be delayed or losto Nodes may fail

Distributed synchronization supports correct coordination in distributed systemso May no longer use shared-memory based locks and semaphores

Distributed synchronization

4Operating Systems, 2011, Danny Hendler & Amnon Meisels

Distributed Synchronization: outline

Introduction

Causality and timeo Lamport timestamps

o Vector timestamps

o Causal communication

Snapshots

5Operating Systems, 2011, Danny Hendler & Amnon Meisels

Eventso Sending a messageo Receiving a messageo Timeout, internal interrupt

Processors send control messages to each othero send(destination, action; parameters)

Processes may declare that they are waiting for events:o Wait for A1, A2, …., An

A1(source; parameters) code to handle A1

. . An(source; parameters) code to handle An

Distributed computation model

6Operating Systems, 2011, Danny Hendler & Amnon Meisels

A distributed system has no global state nor global clock no global order on all events may be determined

Each processor knows total orders on events occurring in it

There is a causal relation between the sending of a message and its receipt

Causality and events ordering

Lamport's happened-before relation H

1. e1 <p e2 e1 <H e2 (events within same processor are ordered)

2. e1 <m e2 e1 <H e2 (each message m is sent before it is received)

3. e1 <H e2 AND e2 <H e3 e1 <H e3 (transitivity)

Leslie Lamport (1978): “Time, clocks, and the ordering of events in a distributed system”

7Operating Systems, 2011, Danny Hendler & Amnon Meisels

Causality and events ordering (cont'd)TimeP1 P2 P3

e1

e2

e3

e4

e5

e6

e7

e8

e1 <H e7 ? Yes. e1 <H e3 ? Yes. e1 <H e8 ? Yes.

e5 <H e7 ? No.

8Operating Systems, 2011, Danny Hendler & Amnon Meisels

1 Initially my_TS=0

2 Upon event e,3 if e is the receipt of message m4 my_TS=max(m.TS, my_TS)5 my_TS++6 e.TS=my_TS7 If e is the sending of message m8 m.TS=my_TS

Lamport's timestamp algorithm

To create a total order, ties are broken by process ID

9Operating Systems, 2011, Danny Hendler & Amnon Meisels

Lamport's timestamps (cont'd)TimeP1 P2 P3

e1

e2

e3

e4

e5

e6

e7

e8

1.1

2.1

3.1

1.3

2.2

3.2

4.3

1.2

10Operating Systems, 2011, Danny Hendler & Amnon Meisels

Distributed Synchronization: outline

Introduction

Causality and timeo Lamport timestamps

o Vector timestamps

o Causal communication

Snapshots

11Operating Systems, 2011, Danny Hendler & Amnon Meisels

Lamport's timestamps define a total ordero e1 <H e2 e1.TS < e2.TS

o However, e1.TS < e2.TS e1 <H e2 does not hold, in general.

Vector timestamps - motivation

Definition: Message m1 casually precedes message m2 (written as m1 <c m2) if

s(m1) <H s(m2) (sending m1 happens before sending m2)

Definition: causality violation occurs if m1 <c m2 but r(m2) <p r(m1).

In other words, m1 is sent to processor p before m2 but is received after it.

Lamport's timestamps do not allow to detect (nor prevent) causality violations.

12Operating Systems, 2011, Danny Hendler & Amnon Meisels

Causality violations – an exampleP1 P2 P3

Migrate O to P2

Where is O?

M1

On p2

Where is O?

I don’t know

M2

M3

ERROR!

Causality violation between… M1 and M3.

13Operating Systems, 2011, Danny Hendler & Amnon Meisels

Vector timestamps1 Initially my_VT=[0,…,0]

2 Upon event e,3 if e is the receipt of message m4 for i=1 to M5 my_VT[i]=max(m.VT[i], my_VT[i])6 My_VT[self]++7 e.VT=my_VT8 if e is the sending of message m9 m.VT=my_VT

For vector timestamps it does hold that: e1 <VT e2 e1 <H e2

e1.VT ≤V e2.VT if and only if e1.VT[i] ≤ e2.VT[i], for every i=1,…,M

e1.VT <V e2.VT if and only if e1.VT ≤V e2.VT and e1.VT ≠ e2.VT

14Operating Systems, 2011, Danny Hendler & Amnon Meisels

Vector timestamps (cont'd)P1 P2 P3 P4

1 1

1

1

22

232

334

54

3

5

4

6e

e.VT=(3,6,4,2)

15Operating Systems, 2011, Danny Hendler & Amnon Meisels

VTs can be used to detect causality violations P1 P2 P3

Migrate O to P2

Where is O?

M1

On p2

Where is O?

I don’t know

M2

M3

ERROR!

Causality violation between… M1 and M3.

(1,0,0) (0,0,1)

(2,0,1)

(3,0,1) (3,0,2)

(3,0,3)(3,1,3)

(3,2,3)

(3,3,3)

(3,2,4)

16Operating Systems, 2011, Danny Hendler & Amnon Meisels

Distributed Synchronization: outline

Introduction

Causality and timeo Lamport timestamps

o Vector timestamps

o Causal communication

Snapshots

17Operating Systems, 2011, Danny Hendler & Amnon Meisels

A processor cannot control the order in which it receives messages…But it may control the order in which they are delivered to applications

Preventing causality violations

Application

FIFO ordering

Message passing

Deliver in FIFO order

Assigntimestamps

Message arrival order

x1

Deliverx1

x2

Deliverx2

x4

Bufferx4

x5

Bufferx5

x3

Delieverx3 x4 x5

Protocol for FIFO message delivery(as in, e.g., TCP)

18Operating Systems, 2011, Danny Hendler & Amnon Meisels

Preventing causality violations (cont'd) Senders attach a timestamp to each message

Destination delays the delivery of out-of-order messages

Hold back a message m until we are assured that no message m’ <H m will be deliveredo For every other process p, maintain the earliest timestamp of a message m

that may be delivered from po Do not deliver a message if an earlier message may still be delivered from

another process

19Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

For each processor p, the earliest timestamp with which a message from p may still be delivered

20Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

For each process p, the messages from p that were received but were not delivered yet

21Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

List of messages to be delivered as a result of m’s receipt

22Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

If no blocked messages from p, update earliest[p]

23Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

m is now the most recent message from p not yet delivered

24Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

If there is a process k with delayed messages for which it is now safe to deliver its earliest message…

25Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

Remove k'th earliest message from blocked queue, make sure it is delivered

26Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

If there are additional blocked messages of k, update earliest[k] to be the timestamp of the earliest such message

27Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

Otherwise the earliest message that may be delivered from k would have previous timestamp with the k'th component incremented

28Operating Systems, 2011, Danny Hendler & Amnon Meisels

Algorithm for preventing causality violations1 earliest[1..M] initially [ <1,0,…0>, <0,1,…,0>, …, <0,0,…,1> ]2 blocked[1…M] initially [ {}, …, {} ]

3 Upon the receipt of message m from processor p4 Delivery_list = {}5 If (blocked[p] is empty)6 earliest[p]=m.timestamp7 Add m to the tail of blocked[p]8 While ( k such that blocked[k] is non-empty AND

i {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k])9 remove the message at the head of blocked[k], put it in delivery_list10 if blocked[k] is non-empty11 earliest[k] m'.timestamp, where m' is at the head of blocked[k]12 else13 increment the k'th element of earliest[k]14 End While 15 Deliver the messages in delivery_list, in causal order

Finally, deliver set of messages that will not cause causality violation (if there are any).

29Operating Systems, 2011, Danny Hendler & Amnon Meisels

Execution of the algorithm as multicast

P1 P2 P3

(0,1,0)

Since the algorithm is “interested” only in causal order of sending events, vector timestamp is incremented only upon send events.

(1,1,0)

p3 state

earliest[1] earliest[2] earliest[3]

(1,0,0) (0,1,0) (0,0,1)m1

m2

(1,1,0) (0,1,0) (0,0,1)

Upon receipt of m2

Upon receipt of m1

(1,1,0) (0,1,0) (0,0,1)deliver m1

(1,1,0) (0,2,0) (0,0,1)deliver m2

(2,1,0) (0,2,0) (0,0,1)

30Operating Systems, 2011, Danny Hendler & Amnon Meisels

Introduction

Causality and timeo Lamport timestamps

o Vector timestamps

o Causal communication

Snapshots

Distributed Synchronization: outline

31Operating Systems, 2011, Danny Hendler & Amnon Meisels

Assume we would like to implement a distributed deadlock-detector We record process states to check if there is a waits-for-cycle

Snapshots motivation – phantom deadlock

P1 r1 r2 P2

request

OK

request

OK

release request

OK

12

3

4

p2

r1

r2

p1

Actual waits-for graph

p2

r1

r2

p1

Observed waits-for graph

request

Global system state:o S=(s1, s2, …, sM) – local processor states

o The contents Li,j =(m1, m2, …, mk) of each communication channel Ci,j

(channels assumed to be FIFO)These are messages sent but not yet received

Global state must be consistento If we observe in state si that pi received message m from pk, then in

observation sk ,k must have sent m.

o Each Li,j must contain exactly the set of messages sent by pi but not yet received by pj, as reflected by si, sj.

What is a snapshot?

• Snapshot state much be consistent, one that might have existed during the computation.

• Observations must be mutually concurrent that is, no observation casually precedes another observation

Upon joining the algorithm a process records its local state

The process that initiates the snapshot sends snapshot tokens to its neighbors (before sending any other messages)o Neighbors send them to their neighbors – broadcast

Upon receiving a snapshot token:o a process must record its state prior to receiving additional messageso Must then send tokens to all its other neighbors

How shall we record sets Lp,q?o q receives token from p and that is the first time q receives token

Lp,q={ }o q receives token from p but q received token before

Lp,q={all messages received by q from p since q received token}

Snapshot algorithm – informal description

34Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – data structures The algorithm supports multiple ongoing snapshots – one per process

Different snapshots from same process distinguished by version number

Per process variables

integer my_version initially 0integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M]processor_state S[1…M]channel_state [1..M][1..M]

The version number of my current snapshot

35Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – data structures The algorithm supports multiple ongoing snapshots – one per process

Different snapshots from same process distinguished by version number

Per process variables

integer my_version initially 0integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M]processor_state S[1…M]channel_state [1..M][1..M]

current_snap[r] contains version number of snapshot initiated by processor r

36Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – data structures The algorithm supports multiple ongoing snapshots – one per process

Different snapshots from same process distinguished by version number

Per process variables

integer my_version initially 0integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M]processor_state S[1…M]channel_state [1..M][1..M] tokens_received[r]

contains the number of tokens received for the snapshot initiated by processor r

37Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – data structures The algorithm supports multiple ongoing snapshots – one per process

Different snapshots from same process distinguished by version number

Per process variables

integer my_version initially 0integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M]processor_state S[1…M]channel_state [1..M][1..M]

processor_state[r] contains the state recorded for the snapshot of processor r

38Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – data structures The algorithm supports multiple ongoing snapshots – one per process

Different snapshots from same process distinguished by version number

Per process variables

integer my_version initially 0integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M]processor_state S[1…M]channel_state [1..M][1..M]

channel_state[r][q] records the state of channel from q to current processor for the snapshot initiated by processor r

39Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

40Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

41Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Increment version number of this snapshot, record my local state

42Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Send snapshot-token on all outgoing channels, initialize number of received tokens for my snapshot to 0

43Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Upon receipt from q of TOKEN for snapshot (r,version)

44Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

If this is first token for snapshot (r,version)

45Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Record local state for r'th snapshotUpdate version number of r'th snapshot

46Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Set of messages on channel from q is emptySend token on all outgoing channelsInitialize number of received tokens to 1

47Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Not the first token for (r,version)

48Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

Yet another token for snapshot (r,version)

49Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

These messages are the state of the channel from q for snapshot (r,version)

50Operating Systems, 2011, Danny Hendler & Amnon Meisels

Snapshot algorithm – pseudo-codeexecute_snapshot()

Wait for a snapshot request or a token

Snapshot request:1 my_version++, current_snap[self]=my_version2 S[self] my_state3 for each outgoing channel q, send(q, TOEKEN; self, my_version)4 tokens_received[self]=0

TOKEN(q; r,version):5. If current_snap[r] < version6. S[r] my state7. current_snap[r]=version8. L[r][q] empty, send token(r, version) on each outgoing channel9. tokens_received[r] 110. else11. tokens_received[r]++12. L[r][q] all messages received from q since first receiving token(r,version)13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

If all tokens of snapshot (r,version) arrived, snapshot computation is over