OS 18

download OS 18

of 38

Transcript of OS 18

  • 7/29/2019 OS 18

    1/38

    Chapter 18

    Distributed ProcessManagement

    Dave BremerOtago Polytechnic, N.Z.2008, Prentice Hall

    Operating Systems:Internals and Design Principles, 6/E

    William Stallings

    Roadmap

    Process Migration

    Distributed Global States

    Distributed Mutual Exclusion

    Distributed Deadlock

  • 7/29/2019 OS 18

    2/38

    Process Migration

    Transfer of sufficient amount of the stateof a process from one computer to another

    The process executes on the targetmachine

    Motivation

    Process migration is desirable indistributed computing for several reasonsincluding:

    Load sharing

    Communications performance

    Availability

    Utilizing special capabilities

  • 7/29/2019 OS 18

    3/38

    Load Sharing

    Move processes from heavily loaded tolightly load systems

    Significant improvements are possible

    Must be careful that the communicationsoverhead does not exceed the performance

    gained.

    Communicationsperformance Processes that interact intensively can be

    moved to the same node to reducecommunications cost

    May be better to move process to the datathan vice versa

    Especially when the data is larger than thesize of the process

  • 7/29/2019 OS 18

    4/38

    Availability and

    Special Capabilities Availability

    Long-running process may need to movebecause of faults or down time

    OS must have advance notice of fault

    Utilizing special capabilities

    Process can take advantage of unique

    hardware or software capabilities

    Migration issues

    For process migration to work we need tosatisfy a few issues:

    Who initiates the migration?

    What is involved in a Migration?

    What portion of the process is migrated?

    What happens to outstanding messages andsignals?

  • 7/29/2019 OS 18

    5/38

    Who Initiates Migration?

    Depends on the goal or reason formigration

    OS initiates

    if the goal is load balancing.

    May be transparent to process

    Process initiates

    If the goal is to access a particular resource Process must be aware of the distributed

    system

    What is Involvedin Migration? Must destroy the process on the source

    system and create it on the target system

    Process movement, not replication.

    Process image and process control blockand any links must be moved

  • 7/29/2019 OS 18

    6/38

    Example of

    Process Migration

    What is migrated?

    Moving the process control block is simple

    Several strategies exist for moving theaddress space and data including:

    Eager (All)

    Precopy

    Eager (dirty) Copy-on-reference

    Flushing

  • 7/29/2019 OS 18

    7/38

    Eager (all)

    Transfer entire address space

    No trace of process is left behind

    If address space is large and if the process

    does not need most of it, then this approachmy be unnecessarily expensive (takingminutes)

    Checkpoint/restart capability is useful.

    Precopy

    Process continues to execute on thesource node while the address space iscopied

    Pages modified on the source during precopyoperation have to be copied a second time

    Reduces the time that a process is frozen and

    cannot execute during migration

  • 7/29/2019 OS 18

    8/38

    Eager (dirty)

    Transfer only that portion of the addressspace that is in main memory and havebeen modified

    Any additional blocks of the virtual addressspace are transferred on demand

    The source machine is involvedthroughout the life of the process

    Maintains page and/or segment table entries.

    Copy-on-reference

    Variation of Eager(All)

    Pages are only brought over whenreferenced

    Has lowest initial cost of process migration

  • 7/29/2019 OS 18

    9/38

    Flushing

    Pages are cleared from main memory byflushing dirty pages to disk

    Pages are accessed as needed from disk

    Relieves the source of holding any pages ofthe migrated process in main memory

    Choosing a Strategy

    If the process is not using much addressspace while on the target machine thenbetter to use

    Eager (All)

    Precopy

    Eager (dirty)

    Otherwise use

    Copy-on-reference

    Flushing

  • 7/29/2019 OS 18

    10/38

    What Happens to

    Messages and Signals? Need to have a way to temporarily store

    outstanding messages and signals duringthe migration activity and then direct themto the new destination.

    May need to maintain forwarding details at theinitial site to ensure outstanding messagesand signals get through

    Decision to Migrate

    Decision to migrate may be made by asingle entity

    OS may decide based on load monitoringmodule

    Process may decide based on resource

    needs

    Some systems let the target systemparticipate in the decision.

    Negotiated migration

  • 7/29/2019 OS 18

    11/38

    Migration by Negotiation

    Charlotte is an example system.

    Migration policy is responsibility of aStarter utility

    Starter utility is also responsible for long-termscheduling and memory allocation

    Migration decision must be reached jointlyby two Starter processes

    one on the source and one on the destination

    Negotiation of ProcessMigration

  • 7/29/2019 OS 18

    12/38

    Eviction

    Destination system may refuse to acceptthe migration of a process to itself

    If a workstation is idle, process may havebeen migrated to it

    Once the workstation is active, it may benecessary to evict the migrated processes toprovide adequate response time

    Preemptive vs.Nonpreemptive Transfers Previous points related to preemptive

    processes

    Process has been created and may havebegun executing

    Nonpreemptive process transfers involveprocesses which have not yet begun

    So have no state to transfer

    Useful in load balancing.

  • 7/29/2019 OS 18

    13/38

    Roadmap

    Process Migration

    Distributed Global States

    Distributed Mutual Exclusion

    Distributed Deadlock

    Distributed Goal State

    Operating system cannot know the currentstate of all process in the distributedsystem

    A process can only know the current stateof all processes on the local system

    Remote processes only know stateinformation that is received by messages

  • 7/29/2019 OS 18

    14/38

    Example

    Bank account is distributed over twobranches

    The total amount in the account is the sumat each branch

    At 3 PM the account balance isdetermined

    Messages are sent to request theinformation

    Example 1

  • 7/29/2019 OS 18

    15/38

    Example 2

    If at the time of balance determination, thebalance from branch A is in transit tobranch B

    The result is a false reading

    Example 3

    All messages in transit must be examinedat time of observation

    Total consists of balance at both branchesand amount in message

  • 7/29/2019 OS 18

    16/38

    Some Terms

    Channel

    Exists between two processes if theyexchange messages

    State

    Sequence of messages that have been sentand received along channels incident with theprocess

    Some Terms

    Snapshot

    Records the state of a process

    Global state

    The combined state of all processes

    Distributed Snapshot

    A collection of snapshots, one for eachprocess

  • 7/29/2019 OS 18

    17/38

    Inconsistent

    Global State

    Consistent Global State

  • 7/29/2019 OS 18

    18/38

    Distributed

    Snapshot Algorithm Records a consistent global state

    Assumes messages are delivered in orderthat they were sent

    And no messages are lost

    TCP satisfies requirements

    Uses a special control message

    Marker

    Roadmap

    Process Migration

    Distributed Global States

    Distributed Mutual Exclusion

    Distributed Deadlock

  • 7/29/2019 OS 18

    19/38

    Concurrent

    Process Problems Two main problems with concurrent

    processing are:

    Mutual Exclusion

    Deadlock

    Ch5/6 looked at issues within a singlecomputer

    Different issues arise when the processors do

    not share common memory or clock

    Critical Section

    Non-sharable resources are criticalresources

    Code accessing it is a critical section of aprogram

    Mutual exclusion must be enforced:

    only one process at a time is allowed in itscritical section

  • 7/29/2019 OS 18

    20/38

    Distributed Mutual

    Exclusion Requirements1. Mutual exclusion must be enforced

    2. A process halting in noncritical sectionmust not interfere with other processes

    3. A process requiring access to a criticalsection should not be delayed indefinitely

    Deadlock and starvation are not allowed

    Distributed MutualExclusion Requirements4. If no process is in a critical section, any

    process may enter without delay

    5. No assumption about the number ofprocessors

    6. A process remains inside its critical

    section for a finite time only.

  • 7/29/2019 OS 18

    21/38

    Mutual Exclusion

    Centralized Algorithmfor Mutual Exclusion One node is designated as the control

    node

    This node control access to all sharedobjects

    Only the control node makes resource-

    allocation decision

  • 7/29/2019 OS 18

    22/38

    Properties of a

    Centralized Method Only the control node makes resource-

    allocation decisions.

    All information is located in the controlnode

    If control node fails, mutual exclusion breaksdown

    Distributed Algorithm

    1. All nodes have equal amount ofinformation, on average

    2. Each node has only a partial picture ofthe total system and must makedecisions based on this information

    3. All nodes bear equal responsibility for thefinal decision

  • 7/29/2019 OS 18

    23/38

    Distributed Algorithm

    4. All nodes expend equal effort, onaverage, in effecting a final decision

    5. Failure of a node, in general, does notresult in a total system collapse

    6. There exits no system-wide commonclock with which to regulate the time ofevents

    Ordering of Events in aDistributed System Events must be order to ensure mutual

    exclusion and avoid deadlock

    But clocks are not synchronized

    Communication delays affect timingdecisions

  • 7/29/2019 OS 18

    24/38

    Timestamping

    Each system on the network maintains acounter which functions as a clock

    Each site has a numerical identifier

    When a message is received, thereceiving system sets is clock to one morethan the maximum of its current value andthe incoming timestamp

    Timestampingis Ordering Messages Each system ihas a counter Ci functioning

    as a clock

    Each message increments Ci

    Messages sent in the form (m, Ti, i) where

    m = contents of the message

    Ti= timestamp for this message, set to equalCi i = numerical identifier of this system in the

    distributed system

  • 7/29/2019 OS 18

    25/38

    Receiving System

    The receiving system j increments itsclock to one more than the maximum of itscurrent value and the incoming timestamp:

    Cj 1 + max[Cj, Ti ]

    If i sends x, and j sends y then x comesbefore y

    1.If Ti < Tj or

    2.If Ti = Tj AND i < j

    Timestamping

  • 7/29/2019 OS 18

    26/38

    Timestamping

    Distributed Queue

    An early approach to distributed mutualexclusion uses a distributed queue

    Relied on several assumptions

    1. Each node (1..N) has 1 process to managemutual exclusion

    2. Messages arrive in the same order as sent(this assumption relaxed in later versions)

    3. Every message is successfully received

    4. Network is fully connected

  • 7/29/2019 OS 18

    27/38

    State Diagram

    Token-Passing Approach

    Pass a token among the participatingprocesses

    The token is an entity that at any time isheld by one process

    The process holding the token may enter itscritical section without asking permission

    When a process leaves its critical section, itpasses the token to another process

  • 7/29/2019 OS 18

    28/38

    Token-Passing

    Token Passing

  • 7/29/2019 OS 18

    29/38

    Roadmap

    Process Migration

    Distributed Global States

    Distributed Mutual Exclusion

    Distributed Deadlock

    Deadlock

    The permanent blocking of a set ofprocesses that either compete for systemresources or communicate with oneanother.

    Definition valid for distributed systems asmuch as for single

    More complex as no node as knowledge ofthe state of the overall system

  • 7/29/2019 OS 18

    30/38

    Types of Deadlock

    Those related to resource allocation

    Multiple processes attempt to access thesame resource

    Communication of messages

    Each process in a set is waiting for amessage from another process in the set,

    No process sends a message.

    Deadlock inResource Allocation Deadlock required:

    Mutual exclusion

    Hold and wait

    No preemption

    Circular wait

    In a distributed system there is no controlprocess with an up-to-date knowledge ofthe global state of the system.

  • 7/29/2019 OS 18

    31/38

    Phantom Deadlock

    Deadlock Prevention

    Circular-wait condition can be preventedby defining a linear ordering of resourcetypes

    Hold-and-wait condition can be preventedby requiring that a process request all ofits required resource at one time, andblocking the process until all requests canbe granted simultaneously

  • 7/29/2019 OS 18

    32/38

    Deadlock

    Prevention Methods

    Deadlock Avoidance

    Distributed deadlock avoidance isimpractical

    Every node must keep track of the globalstate of the system

    The process of checking for a safe global

    state must be mutually exclusive

    Checking for safe states involvesconsiderable processing overhead

  • 7/29/2019 OS 18

    33/38

    Deadlock Detection

    Processes are allowed to obtain freeresources as they wish, and the existenceof a deadlock is determined after the fact.

    Each site only knows about its ownresources

    But deadlock may involve distributedresources.

    DeadlockDetection Strategies

  • 7/29/2019 OS 18

    34/38

    Deadlock Detection

    Centralized control

    one site is responsible for deadlock detection

    Hierarchical control

    sites organized in a tree structure

    Distributed control

    all processes cooperate in the deadlockdetection function

    DistributedDeadlock Detection

  • 7/29/2019 OS 18

    35/38

    Distributed Deadlock

    Detection States

    Deadlock inMessage Communication Types

    Mutual Waiting deadlock

    Unavailability of Message Buffers

  • 7/29/2019 OS 18

    36/38

    Mutual Waiting

    Deadlock Each process in a group is waiting for

    another member of the group

    And no messages are in transit

    Unavailability ofMessage Buffers Common in packet-switching networks

    Simplest forms

    Store and forward deadlock

    For each node, the queue to the adjacentnode in one direction is full with packets

    destined for the next node beyond

  • 7/29/2019 OS 18

    37/38

    Direct Store-and-Forward

    Deadlock

    Indirect Store-and-ForwardDeadlock

  • 7/29/2019 OS 18

    38/38

    Structured Buffer Pool

    Communication Deadlock