OS 18

7/29/2019 OS 18

1/38

Chapter 18

Distributed ProcessManagement

Dave BremerOtago Polytechnic, N.Z.2008, Prentice Hall

Operating Systems:Internals and Design Principles, 6/E

William Stallings

Roadmap

Process Migration

Distributed Global States

Distributed Mutual Exclusion

Distributed Deadlock

7/29/2019 OS 18

2/38

Process Migration

Transfer of sufficient amount of the stateof a process from one computer to another

The process executes on the targetmachine

Motivation

Process migration is desirable indistributed computing for several reasonsincluding:

Load sharing

Communications performance

Availability

Utilizing special capabilities

7/29/2019 OS 18

3/38

Load Sharing

Move processes from heavily loaded tolightly load systems

Significant improvements are possible

Must be careful that the communicationsoverhead does not exceed the performance

gained.

Communicationsperformance Processes that interact intensively can be

moved to the same node to reducecommunications cost

May be better to move process to the datathan vice versa

Especially when the data is larger than thesize of the process

7/29/2019 OS 18

4/38

Availability and

Special Capabilities Availability

Long-running process may need to movebecause of faults or down time

OS must have advance notice of fault

Utilizing special capabilities

Process can take advantage of unique

hardware or software capabilities

Migration issues

For process migration to work we need tosatisfy a few issues:

Who initiates the migration?

What is involved in a Migration?

What portion of the process is migrated?

What happens to outstanding messages andsignals?

7/29/2019 OS 18

5/38

Who Initiates Migration?

Depends on the goal or reason formigration

OS initiates

if the goal is load balancing.

May be transparent to process

Process initiates

If the goal is to access a particular resource Process must be aware of the distributed

system

What is Involvedin Migration? Must destroy the process on the source

system and create it on the target system

Process movement, not replication.

Process image and process control blockand any links must be moved

7/29/2019 OS 18

6/38

Example of

Process Migration

What is migrated?

Moving the process control block is simple

Several strategies exist for moving theaddress space and data including:

Eager (All)

Precopy

Eager (dirty) Copy-on-reference

Flushing

7/29/2019 OS 18

7/38

Eager (all)

Transfer entire address space

No trace of process is left behind

If address space is large and if the process

does not need most of it, then this approachmy be unnecessarily expensive (takingminutes)

Checkpoint/restart capability is useful.

Precopy

Process continues to execute on thesource node while the address space iscopied

Pages modified on the source during precopyoperation have to be copied a second time

Reduces the time that a process is frozen and

cannot execute during migration

7/29/2019 OS 18

8/38

Eager (dirty)

Transfer only that portion of the addressspace that is in main memory and havebeen modified

Any additional blocks of the virtual addressspace are transferred on demand

The source machine is involvedthroughout the life of the process

Maintains page and/or segment table entries.

Copy-on-reference

Variation of Eager(All)

Pages are only brought over whenreferenced

Has lowest initial cost of process migration

7/29/2019 OS 18

9/38

Flushing

Pages are cleared from main memory byflushing dirty pages to disk

Pages are accessed as needed from disk

Relieves the source of holding any pages ofthe migrated process in main memory

Choosing a Strategy

If the process is not using much addressspace while on the target machine thenbetter to use

Eager (All)

Precopy

Eager (dirty)

Otherwise use

Copy-on-reference

Flushing

7/29/2019 OS 18

10/38

What Happens to

Messages and Signals? Need to have a way to temporarily store

outstanding messages and signals duringthe migration activity and then direct themto the new destination.

May need to maintain forwarding details at theinitial site to ensure outstanding messagesand signals get through

Decision to Migrate

Decision to migrate may be made by asingle entity

OS may decide based on load monitoringmodule

Process may decide based on resource

needs

Some systems let the target systemparticipate in the decision.

Negotiated migration

7/29/2019 OS 18

11/38

Migration by Negotiation

Charlotte is an example system.

Migration policy is responsibility of aStarter utility

Starter utility is also responsible for long-termscheduling and memory allocation

Migration decision must be reached jointlyby two Starter processes

one on the source and one on the destination

Negotiation of ProcessMigration

7/29/2019 OS 18

12/38

Eviction

Destination system may refuse to acceptthe migration of a process to itself

If a workstation is idle, process may havebeen migrated to it

Once the workstation is active, it may benecessary to evict the migrated processes toprovide adequate response time

Preemptive vs.Nonpreemptive Transfers Previous points related to preemptive

processes

Process has been created and may havebegun executing

Nonpreemptive process transfers involveprocesses which have not yet begun

So have no state to transfer

Useful in load balancing.

7/29/2019 OS 18

13/38

Roadmap

Process Migration




Distributed Goal State

Operating system cannot know the currentstate of all process in the distributedsystem

A process can only know the current stateof all processes on the local system

Remote processes only know stateinformation that is received by messages

7/29/2019 OS 18

14/38

Example

Bank account is distributed over twobranches

The total amount in the account is the sumat each branch

At 3 PM the account balance isdetermined

Messages are sent to request theinformation

Example 1

7/29/2019 OS 18

15/38

Example 2

If at the time of balance determination, thebalance from branch A is in transit tobranch B

The result is a false reading

Example 3

All messages in transit must be examinedat time of observation

Total consists of balance at both branchesand amount in message

7/29/2019 OS 18

16/38

Some Terms

Channel

Exists between two processes if theyexchange messages

State

Sequence of messages that have been sentand received along channels incident with theprocess

Some Terms

Snapshot

Records the state of a process

Global state

The combined state of all processes

Distributed Snapshot

A collection of snapshots, one for eachprocess

7/29/2019 OS 18

17/38

Inconsistent

Global State

Consistent Global State

7/29/2019 OS 18

18/38

Distributed

Snapshot Algorithm Records a consistent global state

Assumes messages are delivered in orderthat they were sent

And no messages are lost

TCP satisfies requirements

Uses a special control message

Marker

Roadmap

Process Migration




7/29/2019 OS 18

19/38

Concurrent

Process Problems Two main problems with concurrent

processing are:

Mutual Exclusion

Deadlock

Ch5/6 looked at issues within a singlecomputer

Different issues arise when the processors do

not share common memory or clock

Critical Section

Non-sharable resources are criticalresources

Code accessing it is a critical section of aprogram

Mutual exclusion must be enforced:

only one process at a time is allowed in itscritical section

7/29/2019 OS 18

20/38

Distributed Mutual

Exclusion Requirements1. Mutual exclusion must be enforced

2. A process halting in noncritical sectionmust not interfere with other processes

3. A process requiring access to a criticalsection should not be delayed indefinitely

Deadlock and starvation are not allowed

Distributed MutualExclusion Requirements4. If no process is in a critical section, any

process may enter without delay

5. No assumption about the number ofprocessors

6. A process remains inside its critical

section for a finite time only.

7/29/2019 OS 18

21/38

Mutual Exclusion

Centralized Algorithmfor Mutual Exclusion One node is designated as the control

node

This node control access to all sharedobjects

Only the control node makes resource-

allocation decision

7/29/2019 OS 18

22/38

Properties of a

Centralized Method Only the control node makes resource-

allocation decisions.

All information is located in the controlnode

If control node fails, mutual exclusion breaksdown

Distributed Algorithm

1. All nodes have equal amount ofinformation, on average

2. Each node has only a partial picture ofthe total system and must makedecisions based on this information

3. All nodes bear equal responsibility for thefinal decision

7/29/2019 OS 18

23/38

Distributed Algorithm

4. All nodes expend equal effort, onaverage, in effecting a final decision

5. Failure of a node, in general, does notresult in a total system collapse

6. There exits no system-wide commonclock with which to regulate the time ofevents

Ordering of Events in aDistributed System Events must be order to ensure mutual

exclusion and avoid deadlock

But clocks are not synchronized

Communication delays affect timingdecisions

7/29/2019 OS 18

24/38

Timestamping

Each system on the network maintains acounter which functions as a clock

Each site has a numerical identifier

When a message is received, thereceiving system sets is clock to one morethan the maximum of its current value andthe incoming timestamp

Timestampingis Ordering Messages Each system ihas a counter Ci functioning

as a clock

Each message increments Ci

Messages sent in the form (m, Ti, i) where

m = contents of the message

Ti= timestamp for this message, set to equalCi i = numerical identifier of this system in the

distributed system

7/29/2019 OS 18

25/38

Receiving System

The receiving system j increments itsclock to one more than the maximum of itscurrent value and the incoming timestamp:

Cj 1 + max[Cj, Ti ]

If i sends x, and j sends y then x comesbefore y

1.If Ti < Tj or

2.If Ti = Tj AND i < j

Timestamping

7/29/2019 OS 18

26/38

Timestamping

Distributed Queue

An early approach to distributed mutualexclusion uses a distributed queue

Relied on several assumptions

1. Each node (1..N) has 1 process to managemutual exclusion

2. Messages arrive in the same order as sent(this assumption relaxed in later versions)

3. Every message is successfully received

4. Network is fully connected

7/29/2019 OS 18

27/38

State Diagram

Token-Passing Approach

Pass a token among the participatingprocesses

The token is an entity that at any time isheld by one process

The process holding the token may enter itscritical section without asking permission

When a process leaves its critical section, itpasses the token to another process

7/29/2019 OS 18

28/38

Token-Passing

Token Passing

7/29/2019 OS 18

29/38

Roadmap

Process Migration




Deadlock

The permanent blocking of a set ofprocesses that either compete for systemresources or communicate with oneanother.

Definition valid for distributed systems asmuch as for single

More complex as no node as knowledge ofthe state of the overall system

7/29/2019 OS 18

30/38

Types of Deadlock

Those related to resource allocation

Multiple processes attempt to access thesame resource

Communication of messages

Each process in a set is waiting for amessage from another process in the set,

No process sends a message.

Deadlock inResource Allocation Deadlock required:

Mutual exclusion

Hold and wait

No preemption

Circular wait

In a distributed system there is no controlprocess with an up-to-date knowledge ofthe global state of the system.

7/29/2019 OS 18

31/38

Phantom Deadlock

Deadlock Prevention

Circular-wait condition can be preventedby defining a linear ordering of resourcetypes

Hold-and-wait condition can be preventedby requiring that a process request all ofits required resource at one time, andblocking the process until all requests canbe granted simultaneously

7/29/2019 OS 18

32/38

Deadlock

Prevention Methods

Deadlock Avoidance

Distributed deadlock avoidance isimpractical

Every node must keep track of the globalstate of the system

The process of checking for a safe global

state must be mutually exclusive

Checking for safe states involvesconsiderable processing overhead

7/29/2019 OS 18

33/38

Deadlock Detection

Processes are allowed to obtain freeresources as they wish, and the existenceof a deadlock is determined after the fact.

Each site only knows about its ownresources

But deadlock may involve distributedresources.

DeadlockDetection Strategies

7/29/2019 OS 18

34/38

Deadlock Detection

Centralized control

one site is responsible for deadlock detection

Hierarchical control

sites organized in a tree structure

Distributed control

all processes cooperate in the deadlockdetection function

DistributedDeadlock Detection

7/29/2019 OS 18

35/38


Detection States

Deadlock inMessage Communication Types

Mutual Waiting deadlock

Unavailability of Message Buffers

7/29/2019 OS 18

36/38

Mutual Waiting

Deadlock Each process in a group is waiting for

another member of the group

And no messages are in transit

Unavailability ofMessage Buffers Common in packet-switching networks

Simplest forms

Store and forward deadlock

For each node, the queue to the adjacentnode in one direction is full with packets

destined for the next node beyond

7/29/2019 OS 18

37/38

Direct Store-and-Forward

Deadlock

Indirect Store-and-ForwardDeadlock

7/29/2019 OS 18

38/38

Structured Buffer Pool

Communication Deadlock

OS 18

Documents

Transcript of OS 18