Distributed Snapshots: Determining Global States of Distributed Systems
Transcript of Distributed Snapshots: Determining Global States of Distributed Systems
![Page 1: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/1.jpg)
Distributed Snapshots: Determining Global States of Distributed Systems
Joshua Eberhardt
Research Paper: Kanianthra Mani Chandy and Leslie Lamport
![Page 2: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/2.jpg)
Background
What is a distributed system?
  Set of autonomous computers
  Communication network
  Software that integrates it into a single entity
![Page 3: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/3.jpg)
Figure 1
![Page 4: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/4.jpg)
Overview
Introduction
Model of a Distributed System
Global-state Detection Algorithm
Motivation
Termination
Stability Detection
![Page 6: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/6.jpg)
Processes in Distributed Systems
A process is an instance of a computer program being executed.
Processes in a distributed system communicate by sending and receiving messages.
A process can record its own state and the messages it sends and receives.
![Page 7: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/7.jpg)
Global States and Processes
To determine a global state, a process p must enlist the cooperation of the other processes, which record their own local states and send them to p.
The main problem is to devise an algorithm by which processes record a global state.
![Page 8: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/8.jpg)
Global State Detection Problems
Let y be a predicate function defined over the global states of a distributed system D. (In other words, y(S) is true or false for a global state S of D.)
The predicate y is a stable property of D if y(S) implies y(S’) for all global states S’ of D reachable from global state S of D.
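To make the definition concrete, here is a minimal Python sketch that checks stability over a small, explicitly enumerated state graph. The graph, the state names, and the helpers `reachable` and `is_stable` are hypothetical illustrations and do not come from the paper; a real distributed system cannot enumerate its global states this way.

```python
from collections import deque

def reachable(graph, start):
    """Return every state reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_stable(graph, y):
    """y is stable if y(S) implies y(S') for every S' reachable from S."""
    states = set(graph) | {t for succs in graph.values() for t in succs}
    return all(y(s2) for s in states if y(s) for s2 in reachable(graph, s))

# Hypothetical toy system: once it is "terminated" it stays terminated.
toy_graph = {
    "working": ["working", "terminated"],
    "terminated": ["terminated"],
}

def terminated(state):
    return state == "terminated"

def working(state):
    return state == "working"

print(is_stable(toy_graph, terminated))  # "terminated" never becomes false again
print(is_stable(toy_graph, working))     # "working" can become false later
```

In this toy graph, "terminated" is stable and "working" is not, which matches the intuition behind termination detection.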
![Page 9: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/9.jpg)
Going Further
Many distributed system problems can be formulated as the general problem of devising an algorithm by which a process in a distributed system can determine whether a stable property y holds.
Examples
  Deadlock Detection
  Termination Detection
![Page 10: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/10.jpg)
Structure of Distributed Algorithms
Structured as a sequence of phases, each consisting of a
  Transient Part, followed by a
  Stable Part
Stability needs to be detected so that one phase can be terminated and the next initiated.
Termination of a Computational Phase vs. Termination of a Computation
![Page 11: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/11.jpg)
Termination Phase
The overall problem can be partitioned into the problems of detecting the termination of one phase and initiating a new phase.
Example of a stable property
  The kth computational phase has terminated, where k = 1, 2, 3, …
Thus a method for detecting stable properties lets us detect the termination of the kth phase for any given k.
![Page 12: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/12.jpg)
Overview
Introduction
Model of a Distributed System
Global-state Detection Algorithm
Motivation
Termination
Properties
Stability Detection
![Page 13: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/13.jpg)
Channels
A distributed system consists of a finite set of processes and a finite set of channels.
Properties of channels
  Infinite buffers
  Error-free
  Deliver messages in the order sent
![Page 14: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/14.jpg)
Linking the Terms
State of a channel
  Sequence of messages sent along the channel, excluding the messages already received along it.
Process
  Defined by a set of states, an initial state (from this set), and a set of events.
Event
  An atomic action that may change the state of a process and the state of at most one channel incident on the process.
![Page 15: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/15.jpg)
Figure 2: Distributed system with processes p, q, r and channels C1, C2, C3, C4.
![Page 16: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/16.jpg)
Events
Can be defined by
  Process p in which the event occurs
  State s of p before the event
  State s’ of p after the event
  Channel c (if any) whose state is altered by the event
  Message M (if any) sent or received along channel c
Based on these definitions, we can define an event e as a 5-tuple: <p, s, s’, M, c>
![Page 17: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/17.jpg)
Expanding to Global States
A global state of a distributed system is a set of component process and channel states.
In the initial global state, every process is in its initial state and every channel holds the empty sequence.
Occurrences of events may change the global state.
![Page 18: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/18.jpg)
Events and Global States
Remember e = <p, s, s’, M, c>
We say e can occur in a global state S if and only if:
  The state of p in S is s.
  If c is directed towards p, then the state of c in S is a sequence of messages with M at the head.
If c is directed away from p, the state of c in S is unconstrained (the event appends M at the tail of c).
![Page 19: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/19.jpg)
Going Further
Define a function next, where next(S, e) is the global state immediately after the occurrence of event e in global state S.
The value of next(S, e) is defined only if event e can occur in global state S.
In next(S, e), the state of p is s’. If c is directed towards p, M is deleted from the head of c; if c is directed away from p, M is appended at the tail of c. All other process and channel states are unchanged.
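The event tuple and the next function can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Event` class, the `CHANNELS` table, the state names `has_token` and `no_token`, and the representation of a global state as a pair of dictionaries are all hypothetical choices, not notation from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    p: str             # process in which the event occurs
    s: str             # state of p before the event
    s_next: str        # state of p after the event (s' in the slides)
    M: Optional[str]   # message sent or received, None if no channel is involved
    c: Optional[str]   # channel altered by the event, None if no channel is involved

# Hypothetical topology: channel name -> (sender, receiver).
CHANNELS = {"c": ("p", "q"), "c_prime": ("q", "p")}

def can_occur(S, e):
    """S is a pair (process_states, channel_states) of plain dicts."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is not None and CHANNELS[e.c][1] == e.p:
        # c is directed towards p: M must be at the head of c.
        return bool(chans[e.c]) and chans[e.c][0] == e.M
    return True

def next_state(S, e):
    """Global state immediately after e occurs in S; defined only if e can occur."""
    if not can_occur(S, e):
        raise ValueError("event cannot occur in this global state")
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:
            chans[e.c] = chans[e.c][1:]        # receive: delete M from the head
        else:
            chans[e.c] = chans[e.c] + (e.M,)   # send: append M at the tail
    return procs, chans

S0 = ({"p": "has_token", "q": "no_token"}, {"c": (), "c_prime": ()})
send = Event("p", "has_token", "no_token", "token", "c")
recv = Event("q", "no_token", "has_token", "token", "c")
S1 = next_state(S0, send)
S2 = next_state(S1, recv)
print(S1)
print(S2)
```

Channel states are tuples so that every call returns a fresh global state and the earlier states stay available for comparison.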
![Page 20: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/20.jpg)
Computational Model
Let seq = (e_i: 0 ≤ i ≤ n) be a sequence of events in component processes of a distributed system.
S_{i+1} = next(S_i, e_i) for 0 ≤ i ≤ n, where S_0 is the initial global state.
We say seq is a computation of the system iff e_i can occur in S_i for 0 ≤ i ≤ n.
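The definition of a computation amounts to replaying the events from the initial global state and checking each one before applying it. A minimal sketch follows; the function `is_computation` and the toy system, whose global state is just a count of messages in flight, are assumptions made for illustration.

```python
def is_computation(seq, S0, can_occur, next_state):
    """Return True iff every event can occur in the global state reached so far."""
    S = S0
    for e in seq:
        if not can_occur(S, e):
            return False
        S = next_state(S, e)
    return True

# Hypothetical toy system: the global state is the number of messages in flight.
def can_occur(S, e):
    return e == "send" or (e == "receive" and S > 0)

def next_state(S, e):
    return S + 1 if e == "send" else S - 1

print(is_computation(["send", "receive", "send"], 0, can_occur, next_state))
print(is_computation(["receive", "send"], 0, can_occur, next_state))
```

The second sequence is rejected because a receive cannot occur while the channel is empty.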
![Page 21: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/21.jpg)
Example: Single Token Conservation (Deterministic)
Simple distributed system
State Transition Diagram of a Process
![Page 22: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/22.jpg)
Example: Single Token Conservation (Deterministic)
![Page 23: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/23.jpg)
Example: Message Passing (Nondeterministic)
New State Transition Diagrams
![Page 24: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/24.jpg)
Example: Message Passing (Nondeterministic)
More than one event can occur in the initial global state, and different choices lead to different subsequent global states.
![Page 25: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/25.jpg)
Overview
Introduction
Model of a Distributed System
Global-state Detection Algorithm
Motivation
Termination
Properties
Stability Detection
![Page 26: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/26.jpg)
Motivation
How it works:
  Each process records its own state, and the two processes that a channel is incident on cooperate in recording the channel state.
  The algorithm is to be superimposed on the underlying computation: it runs concurrently with that computation without altering it.
For the examples that follow, assume that the state of a channel can be recorded instantaneously.
Let c be a channel from p to q.
![Page 27: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/27.jpg)
Single Token Example
Assume the state of process p is recorded in global state “in p”.
Now assume that the global state transitions to “in c”, and that the states of c, c’, and q are recorded in global state “in c”.
The recorded global state shows two tokens!
This inconsistency arises because the state of p was recorded before p sent the message along c and the state of c was recorded after p sent the message.
![Page 28: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/28.jpg)
Notation
Let n be the number of messages sent along c before p’s state is recorded.
Let n’ be the number of messages sent along c before c’s state is recorded.
In our example, the inconsistency arises because n < n’ (0 < 1).
![Page 29: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/29.jpg)
Another scenario
Suppose the state of c is recorded in global state “in p”.
The system then transitions to the global state “in c”, and the states of c’, p and q are recorded in global state “in c”.
The recorded state shows no tokens in the system!
This inconsistency arises when the state of c is recorded before p sends a message along c and the state of p is recorded after p sends the message along c. In other words, n > n’ (1 > 0).
To maintain consistency, n = n’.
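Both inconsistent recordings can be reproduced by counting tokens in a recorded state. The sketch below is a hypothetical encoding of the single-token system: the table `GLOBAL_STATES`, the name `c_prime` for channel c’, and the function `tokens_in_snapshot` are assumptions made for illustration.

```python
# Where the token is in each global state of the single-token system.
# Components: processes p and q, channel c (p to q), channel c_prime (q to p).
GLOBAL_STATES = {
    "in p": {"p": 1, "c": 0, "q": 0, "c_prime": 0},
    "in c": {"p": 0, "c": 1, "q": 0, "c_prime": 0},
}

def tokens_in_snapshot(recorded_in):
    """recorded_in maps each component to the global state in which it was recorded."""
    return sum(GLOBAL_STATES[g][comp] for comp, g in recorded_in.items())

# p recorded before the send (n = 0), c recorded after it (n' = 1).
first = {"p": "in p", "c": "in c", "q": "in c", "c_prime": "in c"}
# c recorded before the send (n' = 0), p recorded after it (n = 1).
second = {"p": "in c", "c": "in p", "q": "in c", "c_prime": "in c"}
# Every component recorded in the same global state (n = n').
consistent = {comp: "in c" for comp in first}

print(tokens_in_snapshot(first), tokens_in_snapshot(second), tokens_in_snapshot(consistent))
# prints "2 0 1"
```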
![Page 30: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/30.jpg)
In Relation to Messages Received
Let m be the number of messages received along c before q’s state is recorded.
Let m’ be the number of messages received along c before c’s state is recorded.
For consistency, m = m’.
In every state, the number of messages received along a channel cannot exceed the number of messages sent along that channel. In other words, n ≥ m and n’ ≥ m’.
![Page 31: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/31.jpg)
Bank Example
![Page 32: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/32.jpg)
Bank Example
![Page 33: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/33.jpg)
Bank Example
![Page 34: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/34.jpg)
Important Details to Note
The state of channel c that is recorded must be the sequence of messages sent along the channel before the sender’s state is recorded, excluding the messages received along the channel before the receiver’s state is recorded.
If n’ = m’, the recorded state of c must be the empty sequence.
If n’ > m’, the recorded state of c must be the (m’ + 1)st, …, n’th messages sent by p along c.
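With the messages sent along c kept in a list, the rule above is a slice. The function name and the sample messages below are hypothetical; the sketch only illustrates the indexing.

```python
def recorded_channel_state(sent, n_prime, m_prime):
    """Messages (m'+1)..n' of `sent`: sent before the sender recorded its state
    but not yet received when the receiver recorded its state."""
    if not 0 <= m_prime <= n_prime <= len(sent):
        raise ValueError("need 0 <= m' <= n' <= number of messages sent")
    return sent[m_prime:n_prime]

sent_along_c = ["a", "b", "c", "d", "e"]   # hypothetical messages sent by p along c
print(recorded_channel_state(sent_along_c, n_prime=4, m_prime=1))  # ['b', 'c', 'd']
print(recorded_channel_state(sent_along_c, n_prime=3, m_prime=3))  # []
```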
![Page 35: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/35.jpg)
Markers
From these conditions we can devise an algorithm by which q can record the state of the channel c.
Process p sends a marker along c after the nth message it sends along c and before sending any further messages along c.
The state of c is the sequence of messages received along c after q records its own state and before q receives the marker along c.
To ensure n ≥ m, q must record its state, if it has not already done so, after receiving a marker along c and before q receives further messages along c.
![Page 36: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/36.jpg)
Algorithm Outline
Marker Sending Rule for a Process p
For each channel c, incident on and directed away from p:
  p sends a marker along c after p records its state and before p sends further messages along c.
![Page 37: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/37.jpg)
Algorithm Outline
Marker Receiving Rule for a Process q
On receiving a marker along a channel c:
  if q has not recorded its state then
    q records its state;
    q records the state of c as the empty sequence
  else
    q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
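The two rules can be exercised in a small single-threaded simulation. The sketch below is an illustration under stated assumptions, not the paper's presentation: the bank-transfer workload, the `Process` class, the function `run_snapshot`, the fully connected three-process topology, and the random scheduler are all hypothetical, and balances may go negative because the toy only tracks conservation of money. FIFO channels are modelled with `collections.deque`.

```python
import random
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance        # underlying computation: a bank balance
        self.out = {}                 # receiver name -> outgoing FIFO channel
        self.inc = {}                 # sender name -> incoming FIFO channel
        self.recorded_state = None    # process state captured by the snapshot
        self.channel_state = {}       # sender name -> recorded channel contents
        self.recording = set()        # incoming channels still being recorded

    def record_state(self):
        """Record own state, then apply the marker-sending rule."""
        self.recorded_state = self.balance
        self.channel_state = {src: [] for src in self.inc}
        self.recording = set(self.inc)
        for channel in self.out.values():
            channel.append(MARKER)    # sent before any further message

    def send(self, dst, amount):
        self.balance -= amount
        self.out[dst].append(amount)

    def receive(self, src):
        """Deliver the message at the head of the channel from src."""
        msg = self.inc[src].popleft()
        if msg == MARKER:             # marker-receiving rule
            if self.recorded_state is None:
                self.record_state()   # channel from src stays recorded as empty
            self.recording.discard(src)
        else:
            self.balance += msg
            if src in self.recording: # after own recording, before the marker
                self.channel_state[src].append(msg)

    def done(self):
        return self.recorded_state is not None and not self.recording

def run_snapshot(seed=0, names=("p", "q", "r"), start=100):
    rng = random.Random(seed)
    procs = {n: Process(n, start) for n in names}
    for a in names:                   # fully connected, one channel per direction
        for b in names:
            if a != b:
                procs[a].out[b] = procs[b].inc[a] = deque()

    def random_transfer():
        a, b = rng.sample(names, 2)
        procs[a].send(b, rng.randint(1, 10))

    for _ in range(5):                # some traffic before the snapshot starts
        random_transfer()
    procs[names[0]].record_state()    # the first process initiates
    extra = 10                        # bounded traffic during the snapshot
    while not all(p.done() for p in procs.values()):
        pending = [(b, a) for b in names for a in procs[b].inc if procs[b].inc[a]]
        if extra > 0 and rng.random() < 0.3:
            random_transfer()
            extra -= 1
        else:
            b, a = rng.choice(pending)
            procs[b].receive(a)
    in_processes = sum(p.recorded_state for p in procs.values())
    in_channels = sum(sum(msgs) for p in procs.values()
                      for msgs in p.channel_state.values())
    return in_processes, in_channels

print(run_snapshot())
```

Whatever interleaving the scheduler picks, the recorded balances plus the recorded in-transit amounts add up to the initial total of 300, which is the consistency the marker rules are meant to provide.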
![Page 38: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/38.jpg)
Overview
Introduction
Model of a Distributed System
Global-state Detection Algorithm
Motivation
Termination
Properties
Stability Detection
![Page 39: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/39.jpg)
Termination of the Algorithm
The marker receiving and sending rules guarantee that if a marker is received along every channel, then each process will record its state and the states of all incoming channels.
![Page 40: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/40.jpg)
Finite Time
To ensure that the global state recording algorithm terminates in finite time, each process must ensure that
  No marker remains forever in an incident input channel.
  It records its state within finite time of initiation of the algorithm.
![Page 41: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/41.jpg)
Finite Time
If process p records its state and there is a channel from p to q, then q will record its state in finite time.
Termination in finite time is ensured if, for every process q, q spontaneously records its state or there is a path to q from a process p that spontaneously records its state.
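The path condition is a reachability check on the directed graph of channels. The sketch below assumes the two per-process conditions from the previous slide hold; the function name and the example topologies are hypothetical.

```python
from collections import deque

def all_processes_will_record(channels, initiators):
    """channels: iterable of (sender, receiver) pairs; initiators: processes
    that record their state spontaneously. True iff every process is an
    initiator or is reachable from one along directed channels."""
    succ = {}
    procs = set(initiators)
    for src, dst in channels:
        succ.setdefault(src, []).append(dst)
        procs.update((src, dst))
    seen = set(initiators)
    queue = deque(initiators)
    while queue:
        p = queue.popleft()
        for q in succ.get(p, ()):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen == procs

# Hypothetical topologies.
ring = [("p", "q"), ("q", "r"), ("r", "p")]
one_way = [("p", "q"), ("r", "q")]
print(all_processes_will_record(ring, ["p"]))      # True: markers travel round the ring
print(all_processes_will_record(one_way, ["p"]))   # False: no path from p to r
```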
![Page 42: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/42.jpg)
Overview
Introduction
Model of a Distributed System
Global-state Detection Algorithm
Motivation
Termination
Stability Detection
![Page 43: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/43.jpg)
Stability Detection
Motivation
  It is a paradigm for many practical problems, such as distributed deadlock detection.
Can be defined as follows
  Input: A stable property y
  Output: A Boolean value definite with the property (y(Si) → definite) and (definite → y(Sf)), where Si represents the global state when the algorithm is initiated and Sf represents the global state when it terminates.
![Page 44: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/44.jpg)
What this means
The input of the algorithm is the definition of the function y.
During execution of the algorithm, the value y(S) for any global state S may be determined by a process in the system.
By saying that the output of the algorithm is the Boolean value definite, we mean that a designated process p enters and thereafter remains in some special state to signal that definite = true, or in some other special state to signal that definite = false.
![Page 45: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/45.jpg)
Definite value
Definite = true
  Implies the stable property holds when the algorithm terminates.
Definite = false
  Implies the stable property doesn’t hold when the algorithm is initiated.
![Page 46: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/46.jpg)
Solution
begin
  record a global state S*;
  definite := y(S*);
end.
Correctness of the stability detection algorithm
  S* is reachable from Si
  Sf is reachable from S* (Theorem)
  y(S) → y(S’) for all S’ reachable from S (definition of stable property)
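The solution is short enough to transcribe directly. In the sketch below, `detect_stability` takes a snapshot procedure and a predicate; the predicate `terminated`, the state name `idle`, and the two fixed stand-in snapshots are hypothetical. The predicate is stable only under the usual assumption that an idle process becomes active again only by receiving a message.

```python
def terminated(snapshot):
    """Hypothetical stable property y: every process is idle and no
    messages are in transit."""
    procs, chans = snapshot
    all_idle = all(state == "idle" for state in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_idle and all_empty

def detect_stability(record_global_state, y):
    """begin record a global state S*; definite := y(S*) end."""
    s_star = record_global_state()
    return y(s_star)

# Stand-ins for a real snapshot algorithm: they return fixed recorded states.
def snapshot_a():
    return ({"p": "idle", "q": "idle"}, {"c": [], "c_prime": []})

def snapshot_b():
    return ({"p": "idle", "q": "idle"}, {"c": ["work"], "c_prime": []})

print(detect_stability(snapshot_a, terminated))   # True
print(detect_stability(snapshot_b, terminated))   # False: a message is still in transit
```

The second snapshot shows why channel states matter: both processes look idle, yet the message in c can still reactivate q.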
![Page 47: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/47.jpg)
Conclusion
Distributed systems underlie many applications in use today, especially database applications.
It’s important to know how the processes interact with each other and to know the global state of the system to ensure it is consistent.
![Page 48: Distributed Snapshots: Determining Global States of Distributed Systems](https://reader035.fdocuments.us/reader035/viewer/2022062315/568161d4550346895dd1d587/html5/thumbnails/48.jpg)
References
Chandy, K. M. and Lamport, L. Distributed Snapshots: Determining Global States of Distributed Systems.
http://www.eecs.ucf.edu/~dcm/Teaching/COT4810-Spring2011/Literature/ChandyAndLamport.pdf
Llewellyn, M. Intro to OS: (Distributed Process Management).
http://www.cs.ucf.edu/courses/cop4600/sum2010/distributed%20process%20management%20-%20part%202%20(12).pdf