Post on 27-Nov-2014
INDEX
What is a Distributed Operating System?
Evolution
Distributed Communicating System Models
Why are they so popular?
Issues in designing DOS
Distributed Computing Environment
1.1 What is a Distributed Operating System
Advancements in microelectronics and communication technology
have resulted in the increasing use of multiple processors
instead of a single processor.
Multiple-processor architectures are of two types:
Tightly Coupled Systems
Known as parallel processing systems
Single system-wide primary memory shared by all processors
Loosely Coupled Systems
Known as distributed computing systems
No shared memory; each processor has its own local
memory
Fig 1.1 Difference between tightly coupled and loosely coupled systems
A distributed computing system is a collection of
processors interconnected by a communication
network, in which each processor has its own local
memory and other peripherals, and
communication between any two processors of the
system takes place by message passing over the
communication network.
To a processor, its own resources are local, whereas the other
processors and their resources are remote.
A processor together with its resources is referred to as a
node, site, or machine of the distributed
computing system.
1.2 Evolution of Distributed Computing System
Early computers were very expensive and large in size,
and job setup time was a serious problem in those days.
During 1950/1960, advancement in technology introduced
batch processing.
Batching similar jobs, automatic sequencing of jobs, off-line
processing through buffering and spooling, and
multiprogramming all increased CPU utilization.
But multiple users still could not directly interact with the
computer system and share its resources.
In 1970, the time-sharing concept was introduced.
Multiple users (at terminals) could now simultaneously
execute interactive jobs and share the resources of the
computer system.
Parallel advances in hardware reduced the size and
increased the processing speed of computers, giving
more processing capability. These systems were called
minicomputers.
The advent of time-sharing systems was the first step toward
distributed computing systems because it provided sharing of
resources and access to remote computers.
The merging of computer and networking technologies
gave birth to distributed computing systems in the late 1970s.
1.3 Distributed Computing System Models
The various models of distributed computing systems are:
Minicomputer model
Workstation model
Workstation-server model
Processor-pool model
Hybrid model
1.3.1 Minicomputer Model
It is an extension of centralized time-sharing system.
In this model, a few minicomputers are interconnected by a
communication network.
Each such minicomputer has multiple users logged on to it.
Several interactive terminals are connected to each
minicomputer.
Each user is logged on to one specific minicomputer, and the network
allows a user to access remote resources that are available on
some machine other than the one onto which the user is logged.
This model is useful when resource sharing with remote machines
is desired.
E.g. ARPAnet
Fig 1.2 A distributed computing system based on minicomputer model
1.3.2 Workstation Model
In this model, several workstations are interconnected by a
communication network.
Each workstation has its own disk and serves as a single-
user computer.
This model is used to interconnect workstations by a high-
speed LAN so that idle workstations may be used to process
jobs of users who are logged onto other workstations and do
not have sufficient power at their own workstations to get
their jobs processed efficiently.
Fig 1.3 A distributed computing system based on workstation model
In this model, a user logs onto his or her own workstation. When the system
finds that the user's workstation does not have sufficient processing power for
a submitted job, it transfers one or more of the processes from the user's
workstation to some other workstation that is currently idle, gets the
processes executed there, and finally returns the result of execution to
the user's workstation.
E.g. the Sprite system and an experimental system developed at Xerox
PARC.
Major issues with this model:
How does the system find an idle workstation?
How is a process migrated to an idle workstation?
What happens to a remote process if the workstation's owner returns and the machine is no longer idle?
For the third issue, the possible solutions are:
Allow the remote process to share the workstation's resources with its owner's processes
Kill the remote process
Migrate the remote process back to its home workstation.
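The placement logic of the workstation model can be sketched as below. This is a minimal, in-process sketch, not any real system's mechanism (idle-workstation discovery and actual process migration are far more involved); the `Workstation` class and `submit` helper are hypothetical names.

```python
# Hypothetical sketch: run a job at the home workstation if it is idle,
# otherwise "migrate" it to an idle workstation and return the result home.

class Workstation:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity   # jobs it can run at once
        self.running = 0

    def is_idle(self):
        return self.running < self.capacity

    def execute(self, job):
        self.running += 1
        try:
            return job()           # run the job on this workstation
        finally:
            self.running -= 1

def submit(home, pool, job):
    """Run `job` at home if possible, else migrate it to an idle workstation."""
    if home.is_idle():
        return home.execute(job), home.name
    for ws in pool:
        if ws is not home and ws.is_idle():
            # "migration": execute remotely; the result returns to the home node
            return ws.execute(job), ws.name
    raise RuntimeError("no idle workstation available")
```

For example, a job submitted from a busy home workstation would run on an idle remote one, and the result would come back to the submitter.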
1.3.3 Workstation-Server Model
The workstation model is a network of personal workstations, each with
its own disk and local file system.
A workstation with its own local disk is called a diskful workstation,
and a workstation without a local disk is called a diskless workstation.
A distributed system that uses the workstation-server model consists of
a few minicomputers and several workstations interconnected by a
communication network.
When diskless workstations are used, the minicomputers implement the file
system and also provide special services such as database
service, print service, etc.
Fig 1.4 A distributed computing system based on workstation-server model
In this model, there are specialized machines for running server
processes that manage and provide access to shared resources.
A user logs onto a workstation called his or her
home workstation. Normal computation activities required by the
user's processes are performed at the user's home workstation,
but requests for special services are sent to a server providing
that type of service, which performs the user's requested activity and
returns the result of the request processing to the user's workstation.
E.g. V-system
Some advantages over the workstation model:
It is much cheaper to use a few minicomputers with large, fast disks than a
large number of diskful workstations.
Backup and hardware maintenance are easier than with many small disks
scattered all over the network.
Since all files are managed by servers, users have the flexibility to use
any workstation and access their files in the same manner irrespective of which
workstation they are currently logged onto.
A request-response protocol is used to access the services of the
server machines, so no process migration, which is a very complex
mechanism, is required.
A user has guaranteed response time because workstations are not used for
executing remote processes.
Request-Response Protocol
It is also known as client-server model of communication.
In this model, a client process (on a workstation) sends a request to a
server process (on a minicomputer) to get some service, such as reading
a block of a file.
The server executes the request and sends back a reply to the client that
contains the result of request processing.
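The request-response exchange above can be sketched in a few lines. This is an in-process illustration only: a real system carries these messages over a network, while here the "network" is a plain function call, and the message fields (`op`, `file`, `size`) are made-up names.

```python
# Minimal sketch of the request-response (client-server) protocol:
# the client builds a request, the server executes it and sends a reply.

def file_server(request):
    """Server side: execute the request and build a reply message."""
    files = {"notes.txt": b"distributed systems"}
    if request["op"] == "read_block":
        data = files.get(request["file"])
        if data is None:
            return {"status": "error", "reason": "no such file"}
        return {"status": "ok", "data": data[:request["size"]]}
    return {"status": "error", "reason": "unknown operation"}

def client_read(file, size):
    """Client side: send a request, wait for the reply, return the result."""
    reply = file_server({"op": "read_block", "file": file, "size": size})
    if reply["status"] != "ok":
        raise IOError(reply["reason"])
    return reply["data"]
```

The key property is that every interaction is a paired request and reply, which is why no process migration is needed in this model.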
1.3.4 Processor-Pool Model
In this model, processors are pooled together to be shared by the
users as needed.
The pool consists of a large number of microcomputers and minicomputers attached to the network.
Each processor has its own memory to load and run a system program or an application program of distributed computing system.
The processors in the pool have no terminals attached directly to them, and users access the system from terminals that are attached to the network via special devices.
A special server (called a run server) manages and allocates the processors in the pool to different users on demand basis.
Fig 1.5 A distributed computing system based on processor-pool model
In this model there is no concept of a home machine.
It allows better utilization of the available processing power of a distributed computing system.
Greater flexibility: the system's services can be easily expanded without the need to install any more computers.
E.g. Amoeba and the Cambridge Distributed Computing System.
It is not suitable for high-performance interactive applications.
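The run server's job of handing out and reclaiming pool processors can be sketched as follows; the `RunServer` class and its method names are illustrative, not taken from Amoeba or any real system.

```python
# Hedged sketch of a processor-pool "run server": it allocates free
# processors to users on demand and reclaims them when a job finishes.

class RunServer:
    def __init__(self, processors):
        self.free = list(processors)   # processors currently unallocated
        self.allocated = {}            # user -> processors given to that user

    def allocate(self, user, n=1):
        """Give `n` free processors to `user`, or fail if the pool is short."""
        if len(self.free) < n:
            raise RuntimeError("not enough free processors")
        given = [self.free.pop() for _ in range(n)]
        self.allocated.setdefault(user, []).extend(given)
        return given

    def release(self, user):
        """Return all of `user`'s processors to the pool."""
        self.free.extend(self.allocated.pop(user, []))
```

Because allocation is on demand and per job, the whole pool's processing power can be thrown at whichever users currently need it.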
1.3.5 Hybrid Model
Of the four models, the workstation-server model is the most widely used
because a large number of computer users perform tasks such as editing, sending mail, and executing small programs.
Processor-pool model is also useful for massive computation.
A hybrid model is a combination of these two.
It is the workstation-server model but with a pool of processors.
Processors in the pool can be dynamically allocated for computations that are too large for workstations or that require several computers concurrently for efficient execution.
1.4 Why are Distributed Computing Systems Gaining Popularity?
The two main demerits of a DCS are its complexity and the difficulty of building distributed software.
But the advantages of a DCS outweigh its disadvantages.
They are:
Inherently Distributed Applications
Information Sharing among Distributed Users
Resource Sharing
Better price-performance ratio
Shorter Response Times and higher throughput
Higher Reliability
Extensibility
Better Flexibility
1.4.1 Inherently Distributed Applications
Several applications are inherently distributed.
E.g. in an employee database of a nationwide organization, the data
for an employee are generated at the employee's branch office;
there is a global need to view the entire database as well as a local need
for frequent and immediate access to locally generated data at
each branch office.
Some other examples are online reservation, online banking, etc.
1.4.2 Information Sharing among distributed users
A DS provides an efficient person-to-person communication facility by
sharing information over great distances.
Even though the users are geographically separated from each other, they can
work in cooperation by transferring the files of a project, logging
onto each other's remote computers to run programs, and
exchanging messages by email.
The use of a distributed computing system by a group of users to
work cooperatively is known as "Computer-Supported
Cooperative Working (CSCW)".
1.4.3 Resource Sharing
Using a DS, both hardware and software resources can easily be shared.
Hardware resources include printers, scanners, hard disks, etc.
Software resources include files, databases, etc.
1.4.4 Better Price-Performance Ratio
With the increasing power and decreasing price of processors,
and the increasing speed of networks, a DS can potentially have a better price-
performance ratio.
Another reason for a more cost-effective price-performance ratio is
the efficient sharing of information and resources among multiple
computers.
1.4.5 Shorter Response Time
The two most important performance metrics are response time and
throughput.
In a DS, multiple processors are utilized in such a way that they
provide responses in a short time period and with a large amount
of output.
A DS with a fast network is increasingly being used as a parallel
computer to solve single complex problems.
To improve performance, a DS uses load-distribution techniques.
1.4.6 Higher Reliability
Reliability refers to the degree of tolerance against errors and component failures in a system.
A reliable system prevents loss of information even in the event of component failures.
The multiplicity of storage devices and processors in a DS allows the maintenance of multiple copies of critical information within the system and the execution of computations on multiple nodes to protect them against catastrophic failures.
An important aspect of reliability is availability which means the fraction of time for which a system is available for use.
In a DS, a few parts of the system can be down without interrupting the jobs of the users.
In processor pool model, if some of the processors are down at any time, the system can continue to function normally.
1.4.7 Extensibility and Incremental Growth
In a DS, it is possible to easily extend the power and functionality of
the system by simply adding additional resources
(hardware/software).
Properly designed DSs that have the properties of extensibility and
incremental growth are called open distributed systems.
1.4.8 Better Flexibility
A distributed computing system may have a pool of different types of computers.
Different types of computers are usually more suitable for performing different types of computation, so each job can be run on the most suitable machine.
1.5 What is a Distributed Operating System?
It is a program that controls the resources of a computer system
and provides its users with an interface or virtual machine that is
more convenient to use than the bare machine.
Two main tasks are:
To present users with a virtual machine that is easier to program than the
underlying hardware.
To manage the various resources of the system.
OSs used for distributed environments are of two types:
Network Operating System
Distributed Operating System
The three most common features for differentiating between the two types of OS are:
1) System Image :
In case of Network Operating System, the users view the distributed computing
system as a collection of distinct machines connected by a communication
subsystem.
In the case of a distributed OS, the system hides the existence of multiple computers
and provides a single system image to its users.
In a network OS, a user can run a job on any machine of the distributed
computing system, but he/she is aware of the machine on which the job is
executed.
In a distributed OS, the system dynamically and automatically allocates jobs to the
various machines of the system for processing.
With network OS, a user is required to know the location of a
resource to access it, and different sets of system calls have to be
used for accessing local and remote resources.
With a distributed OS, users need not keep track of the locations of
various resources to access them, and the same set of system
calls is used for accessing both remote and local resources.
2) Autonomy
In a network OS, each computer of the system has its own local OS, and there is no
coordination at all among the computers, except for the rule that two
processes on different computers must use a mutually agreed-upon protocol when they communicate with each other.
In this type of OS, all computers work independently.
In a distributed OS, there is a single system-wide OS, and each computer of the
distributed computing system runs a part of this global OS.
Here, all computers work in close cooperation with each other for the effective
and efficient utilization of the various resources.
Here, the kernel (exposing a set of system calls) manages and controls the hardware of each
computer to provide the facilities and resources that are accessed by
other programs.
3) Fault Tolerance
With a network OS, there is very little fault-tolerance capability.
If 10% of the machines are down, at least 10% of the users are unable to work.
With Distributed OS, most of the users are mostly unaffected by the failed
machines and can continue to perform their work.
A distributed computing system that uses a network OS is
referred to as a Network System.
Whereas a distributed computing system that uses a distributed
OS is referred to as a true distributed System.
1.6 Issues in Designing a Distributed OS
Designing a distributed OS is very difficult.
It is designed with the assumption that complete information
about the system environment will never be available.
In a distributed system, the resources are physically separated,
there is no common clock among the multiple processors,
delivery of messages is delayed, and messages could even be lost.
The designer of distributed OS has to focus on the following
design issues for good design.
A distributed system is more difficult to design than a centralized system.
The design issues are:
Transparency
Reliability
Flexibility
Performance
Scalability
Heterogeneity
Security
Emulation of existing operating systems
1.6.1 Transparency
One of the main goals of a distributed OS is to make the existence of
multiple computers invisible and provide a single system image to all its users,
so that all users see a virtual uniprocessor. The eight forms of transparency are:
Access Transparency
Location Transparency
Replication Transparency
Failure Transparency
Migration Transparency
Concurrency Transparency
Performance Transparency
Scaling Transparency
Access Transparency
Users should not be able to recognize whether a resource (hardware/
software) is remote or local; a user can access a remote resource in the same way as a local one.
Location Transparency
(1) Name transparency:
The name of a resource should not reveal any hint of the physical location of the resource.
Resource names must be unique system-wide.
(2) User mobility:
No matter which machine a user is logged onto, he or she should be able to access a resource with the same name.
Replication Transparency
Most DOSs have the provision to create replicas of files and other resources on
different nodes.
Both the existence of replicas and the replication activity should be transparent
to the users.
Also, replication-control decisions, such as the number of copies, the locations of
replicas, and creation/deletion times, should be made entirely automatically by the
system in a user-transparent manner.
Failure Transparency
Users should not be aware of partial failures in the system, such as a communication
link failure, a machine failure, or a storage device crash.
The design of a completely failure-transparent system is practically impossible.
Migration Transparency
The aim is to ensure that the movement of an object is handled
automatically by the system in a user-transparent manner.
Three important issues:
Migration decisions, such as which object should be moved from where to
where, should be made automatically by the system.
Migration of an object from one node to another should not require any change in
its name.
When the migrating object is a process, the interprocess communication mechanism should ensure that messages sent to it are delivered regardless of its current location.
Concurrency Transparency
For concurrency transparency, the resource sharing mechanism of the
distributed OS must have the following four properties:
An event-ordering property ensures that all access requests to various
system resources are properly ordered to provide a consistent view to all
users of the system.
A mutual-exclusion property ensures that at any time at most one process accesses
a shared resource, which must not be used simultaneously by multiple
processes if program operation is to be correct.
A no-starvation property ensures that if every process that is granted a
resource, which must not be used simultaneously by multiple processes,
eventually releases it, then every request for that resource is eventually granted.
A no-deadlock property ensures that a situation will never occur in which competing processes prevent their mutual progress even though no single one requests more resources than available in the system.
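The mutual-exclusion property above is the easiest of the four to demonstrate in code. The sketch below uses a single-machine lock as a stand-in; a distributed OS needs a distributed mutual-exclusion algorithm, but the guarantee is the same: at most one process touches the shared resource at a time, so the final state is consistent.

```python
# Illustration of the mutual-exclusion property: the lock ensures at most
# one thread updates the shared counter at a time, so no updates are lost.

import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:            # mutual exclusion around the shared resource
            counter += 1      # read-modify-write is now indivisible

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the interleaved read-modify-write sequences could lose updates, which is exactly the inconsistency the property rules out.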
Performance Transparency
The aim of this transparency is to allow the system to be automatically
reconfigured to improve performance as loads vary dynamically in the system.
The processing capability of the system should be uniformly distributed among
the currently available jobs in the system.
Scaling Transparency
The aim of scaling transparency is to allow the system to expand in scale
without disrupting the activities of the users.
1.6.2 Reliability
A DOS can be more reliable than a centralized OS because of the
existence of multiple copies of resources.
For higher reliability, the fault-handling mechanisms of a DOS must
be designed properly to avoid faults, to tolerate faults, and to
detect and recover from faults.
A fault is a mechanical or algorithmic defect that may generate an error.
System failures are of two types
Fail-stop : The system stops functioning after
changing to a state in which its failure can be
detected.
Byzantine : The system continues to function but
produces wrong result.
The commonly used methods for dealing with these issues are:
Fault avoidance
Fault tolerance
Fault detection and recovery
Fault Avoidance
This deals with designing the components of the system in such a way that the
occurrence of faults is minimized.
Fault Tolerance
This is the ability of a system to continue functioning in the event of partial
system failure.
Some concepts that improve fault tolerance are:
Redundancy techniques: avoid a single point of failure by replicating critical hardware and software components, so that if one fails the others can be used.
Distributed control: for better reliability, a DOS must employ distributed control mechanisms; there should be multiple control servers rather than a single one.
Fault Detection & Recovery
This method uses hardware and software mechanisms to determine the
occurrence of a failure and then to restore the system to a state acceptable for
continued operation.
Atomic transactions: an atomic transaction is a computation consisting of a collection of operations
that take place indivisibly in the presence of failures and concurrent
computations; that is, either all of the operations are performed successfully or
none of them are.
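The all-or-nothing property can be sketched as below, assuming a simple in-memory account store (the shadow-copy approach and the helper names are illustrative; real transaction systems use logging, locking, or commit protocols).

```python
# Sketch of an atomic transaction: operations are applied to a shadow
# copy, and the real state is replaced only if every operation succeeds.

def run_transaction(state, operations):
    """Apply all operations or none; return the resulting state."""
    working = dict(state)          # shadow copy; real state stays untouched
    try:
        for op in operations:
            op(working)
    except Exception:
        return state               # abort: discard all partial updates
    return working                 # commit: all updates become visible at once

def debit(account, amount):
    def op(s):
        if s[account] < amount:
            raise ValueError("insufficient funds")
        s[account] -= amount
    return op

def credit(account, amount):
    def op(s):
        s[account] += amount
    return op
```

A failed transfer thus leaves the accounts exactly as they were; no partial debit is ever visible.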
Stateless servers: a stateless server does not depend on the history of requests
serviced between client and server; that history does not affect the execution
of the next service request.
Acknowledgement- and timeout-based retransmission of messages:
In a DOS, a node or communication-link failure may interrupt a
communication event, so messages may be lost.
The receiver must therefore send an acknowledgement for every message.
A reliable IPC mechanism should also be capable of detecting and handling lost
and duplicate messages.
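The retransmission-plus-duplicate-detection idea can be sketched as follows. This is a simulation, not real networking: the lossy channel is a seeded random draw, and the class and function names are made up for illustration.

```python
# Sketch of acknowledgement- and timeout-based retransmission: the sender
# resends until acknowledged, and the receiver uses sequence numbers to
# discard duplicate deliveries caused by the retransmissions.

import random

class Receiver:
    def __init__(self):
        self.delivered = []
        self.seen = set()

    def receive(self, seq, payload):
        if seq not in self.seen:       # duplicate detection by sequence number
            self.seen.add(seq)
            self.delivered.append(payload)
        return ("ack", seq)            # always (re)acknowledge

def send_reliable(receiver, seq, payload, loss_rate=0.5, max_tries=50):
    rng = random.Random(42)            # deterministic "lossy" channel
    for _ in range(max_tries):
        if rng.random() < loss_rate:   # message lost in transit;
            continue                   # timeout fires, so retransmit
        ack = receiver.receive(seq, payload)
        if ack == ("ack", seq):
            return True                # acknowledged: stop retransmitting
    return False                       # gave up after max_tries
```

Retransmitting an already-delivered message is harmless here because the receiver recognizes the repeated sequence number, which is exactly the duplicate handling the notes call for.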
The main drawback of increasing reliability is the potential loss of execution-time
efficiency due to the extra overhead involved in these techniques.
1.6.3 Flexibility
The design of a DOS should be flexible for the following reasons:
Ease of modification
Ease of enhancement
The most important design factor is the model used for the kernel. Two models exist:
Monolithic kernel: most services, such as process management, memory management, and file management, are provided by the kernel.
Microkernel: the kernel is a small nucleus of software that provides only minimal facilities; the only services provided by the kernel are IPC, low-level device management, and some memory management.
Fig 1.6 Modes of Kernel
The microkernel has some advantages:
A microkernel is easy to design, implement, and install.
Since most of the services are implemented as user-level server
processes, it is easy to modify the design or add new services.
Disadvantage:
Sometimes poor performance:
each server is an independent process with its own address
space, so the servers have to use some form of message-based IPC
for communication.
Message passing between the server processes and the microkernel
requires context switches, resulting in additional performance
overhead.
1.6.4 Performance
Some design principles for better performance are as follows :
Batch if possible
Cache whenever possible
Minimize copying of data
Minimize network traffic
Take advantage of parallelism for multiprocessing
1.6.5 Scalability
Scalability means the capability of a system to adapt to an increased service load.
A DS will grow with time, since it is very common to add new machines or an entire
sub-network to the system to take care of an increased workload.
Some principles to follow when designing a scalable DS:
Avoid centralized entities
The use of centralized entities such as a single central file server or a
single database makes the distributed system non-scalable.
Avoid centralized algorithms
A centralized algorithm operates by collecting information from all nodes,
processing this information on a single node, and then distributing the
result to the other nodes.
Perform most operations on client workstations
1.6.6 Heterogeneity
A heterogeneous DS consists of interconnected sets of dissimilar hardware or
software systems.
It is more difficult to design heterogeneous systems.
In a heterogeneous DS, some form of data translation is necessary for interaction
between two incompatible nodes.
To reduce the complexity of the translation process, some intermediate standard data
format should be used.
1.6.7 Security
The various resources of a computer system must be protected against destruction and
unauthorized access.
Providing security in a DS is very difficult.
In a DS, the client-server model of communication is used.
Additional requirements for security in a DOS:
It should be possible for the sender of a message to know that the message was
received by the intended receiver.
It should be possible for the receiver of a message to know that the message was
sent by the intended sender.
It should be possible for both the sender and receiver of a message to be guaranteed
that the contents of the message were not changed while it was in transfer.
When a client is sending a message, an intruder may pretend to be an authorized client or may
change the message contents during transmission.
Cryptography is the only known practical method for dealing with these security issues.
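The third requirement above, that message contents must not be changed in transfer, can be illustrated with a keyed hash (HMAC) from Python's standard library; the shared key and the message are of course made up, and real systems also need key distribution, which this sketch assumes away.

```python
# Sketch of a cryptographic integrity check: sender and receiver share a
# key; any change to the message in transit makes the HMAC check fail.

import hmac
import hashlib

SECRET = b"shared-key"                  # assumed pre-shared between the parties

def send(message: bytes):
    """Sender: attach a keyed tag to the message."""
    tag = hmac.new(SECRET, message, hashlib.sha256).digest()
    return message, tag                 # what travels over the network

def verify(message: bytes, tag: bytes) -> bool:
    """Receiver: recompute the tag and compare in constant time."""
    expected = hmac.new(SECRET, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

An intruder who alters the message cannot produce a matching tag without the key, so tampering is detected; the same construction also gives the receiver evidence that the message came from a key holder.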
1.6.8 Emulation of Existing OS
For commercial success, it is important that a newly designed DOS be able to
emulate existing popular OSs, so that existing software can continue to run on it.
With this property, new software can be written using the system-call interface of the
new OS to take full advantage of its special features of distribution.