A survey of permission-based distributed mutual exclusion ... · PDF fileA survey of...

23
A survey of permission-based distributed mutual exclusion algorithms P.C. Saxena a , J. Rai b, * a School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India b Department of Mathematics, P.G.D.A.V. College (Eve), University of Delhi, New Delhi, India Received 24 May 2002; received in revised form 8 October 2002; accepted 11 October 2002 Abstract The problem of mutual exclusion in distributed systems has attracted considerable attention over the last two decades. The mutual exclusion problem requires that, at a time, only one of the contending processes be allowed to enter its critical section (CS). A number of solutions have been provided to the mutual exclusion problem in distributed systems. Different algorithms have used different techniques to achieve mutual exclusion and have different performances. Depending on the technique used, these algorithms have been classified as token-based and permission-based algorithms. In this paper, we present a survey of various permission-based distributed mutual exclusion (PBDME) algorithms and their comparative performance. D 2002 Elsevier Science B.V. All rights reserved. Keywords: Distributed systems; Mutual exclusion; k-Mutual exclusion; Voting; Coteries 1. Motivation and problem description A distributed system is a collection of autonomous computers connected via a communication network. There is no common memory and processes commu- nicate through message passing. One of the most important purposes of the distributed systems is to provide an efficient and convenient environment for sharing of resources. They also provide computational speedup and better reliability. A thorough discussion on various concepts and design issues of distributed systems can be found in Refs. [10,23,60]. In a system consisting of a number of processes, each process has a segment of code, called a critical section, in which the process may be changing common variables, updating a table, writing a file and so on [86]. A distributed system may have thousands of data items and scores of databases residing in many widely dispersed sites. Moreover, many users can have simultaneous access to this data. Therefore, it is required that processes at differ- ent sites cooperate with one another and have some sort of coordination among them. In many applica- tions, there are critical operations that should not be performed by different processes concurrently. It may sometimes be required that a typical resource is used by only one process at a time. Or, if a data item is 0920-5489/02/$ - see front matter D 2002 Elsevier Science B.V. All rights reserved. doi:10.1016/S0920-5489(02)00105-8 * Corresponding author. Mailing address: C-3/319 C, SFS Flats, Pankha Road, Janak Puri, New Delhi 110058, India. Tel.: +91-11- 5532457. E-mail addresses: prem _ [email protected] (P.C. Saxena), [email protected] (J. Rai). www.elsevier.com/locate/csi Computer Standards & Interfaces 25 (2003) 159 – 181

Transcript of A survey of permission-based distributed mutual exclusion ... · PDF fileA survey of...

A survey of permission-based distributed

mutual exclusion algorithms

P.C. Saxenaa, J. Raib,*

aSchool of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, IndiabDepartment of Mathematics, P.G.D.A.V. College (Eve), University of Delhi, New Delhi, India

Received 24 May 2002; received in revised form 8 October 2002; accepted 11 October 2002

Abstract

The problem of mutual exclusion in distributed systems has attracted considerable attention over the last two decades. The

mutual exclusion problem requires that, at a time, only one of the contending processes be allowed to enter its critical section

(CS). A number of solutions have been provided to the mutual exclusion problem in distributed systems. Different algorithms

have used different techniques to achieve mutual exclusion and have different performances. Depending on the technique used,

these algorithms have been classified as token-based and permission-based algorithms. In this paper, we present a survey of

various permission-based distributed mutual exclusion (PBDME) algorithms and their comparative performance.

D 2002 Elsevier Science B.V. All rights reserved.

Keywords: Distributed systems; Mutual exclusion; k-Mutual exclusion; Voting; Coteries

1. Motivation and problem description

A distributed system is a collection of autonomous

computers connected via a communication network.

There is no common memory and processes commu-

nicate through message passing. One of the most

important purposes of the distributed systems is to

provide an efficient and convenient environment for

sharing of resources. They also provide computational

speedup and better reliability. A thorough discussion

on various concepts and design issues of distributed

systems can be found in Refs. [10,23,60].

In a system consisting of a number of processes,

each process has a segment of code, called a critical

section, in which the process may be changing

common variables, updating a table, writing a file

and so on [86]. A distributed system may have

thousands of data items and scores of databases

residing in many widely dispersed sites. Moreover,

many users can have simultaneous access to this

data. Therefore, it is required that processes at differ-

ent sites cooperate with one another and have some

sort of coordination among them. In many applica-

tions, there are critical operations that should not be

performed by different processes concurrently. It may

sometimes be required that a typical resource is used

by only one process at a time. Or, if a data item is

0920-5489/02/$ - see front matter D 2002 Elsevier Science B.V. All rights reserved.

doi:10.1016/S0920-5489(02)00105-8

* Corresponding author. Mailing address: C-3/319 C, SFS Flats,

Pankha Road, Janak Puri, New Delhi 110058, India. Tel.: +91-11-

5532457.

E-mail addresses: [email protected] (P.C. Saxena),

[email protected] (J. Rai).

www.elsevier.com/locate/csi

Computer Standards & Interfaces 25 (2003) 159–181

yhf
高亮

replicated at various sites, we expect that only one of

the sites updates it at a time. This gives rise to the

problem of mutual exclusion, which requires that

only one of the contending processes be allowed, at

a time, to enter its critical section (CS). Thus, mutual

exclusion is crucial for the design of distributed

systems.

An important application of distributed systems,

which uses mutual exclusion and needs special

mention, is in the field of replicated databases.

Replication is the maintenance of on-line copies of

data and other resources [23]. A replicated database

is a distributed database in which some data items

are stored redundantly at multiple sites [12]. In

replicated databases, a data item or a file is repli-

cated at a number of sites with independent failure

modes. This results in increased availability, better

fault-tolerance capability and improved response

time. However, maintaining the consistency of the

data is a major issue. For example, owing to node or

link failures, the network may partition into isolated

groups of nodes. In such a case, it is not desirable

that users at isolated groups update the database

independently. If a node is to perform updates, it

must ensure that no other node (in any group,

whatsoever) is doing this activity. Thus, mutual

exclusion is needed for updating files in replicated

databases. Protocols that control access to replicated

data and ensure data consistency in case of network

partitioning are called replica control protocols. A

transaction (an access request to data) is the basic

unit of user computation in a database. It executes in

three steps [18]: it reads a portion of the database

into a local workspace; then it performs some local

computation on it; and finally, it writes some values

into the database. Read and write operations per-

formed by a transaction are of interest to us. All

replica control protocols require that mutual exclu-

sion must be guaranteed between two write opera-

tions and a read and write operation. We have no

intention of going into the finer details of replicated

databases and related issues. A reference to repli-

cated databases and replica control protocols has

been made, simply to emphasize the role of mutual

exclusion in this important application of distributed

systems. A discussion of the issues involved in

maintaining the consistency of replicated database

systems, the basic techniques for managing repli-

cated data and their relative merits can be found in

Refs. [5,30,82].

Another important application of mutual exclu-

sion is in the field of distributed shared memory.

Distributed shared memory (DSM) is an abstraction

used for sharing data between computers that do

not share physical memory. Processes access dis-

tributed shared memory through reads and updates.

Any process may access the variables stored in the

DSM in the system. However, if more than one

process is allowed to update these variables, incon-

sistencies may arise. Therefore, it is required that

only one process is permitted to execute CS (a

fragment of code whose simultaneous execution by

more than a process may cause some sort of

inconsistency). Thus, applications based on DSM

may require mutual exclusion. A thorough discus-

sion on distributed shared memory and related

topics can be found in Refs. [10,23]. Other areas

where mutual exclusion is desirable include atomic

commitment.

The algorithms designed to ensure mutual exclu-

sion in distributed systems are termed distributed

mutual exclusion (DME) algorithms. A number of

distributed mutual exclusion algorithms have been

proposed. They have been classified as token-based

algorithms or permission-based algorithms depending

on the technique used to achieve mutual exclusion.

In this work, we present a review of permission-

based distributed mutual exclusion (PBDME) algo-

rithms that have emerged during the last two

decades or so. The organization of the paper is as

follows. In Section 2, we trace the history of

mutual exclusion which dates back to 1965. Section

3 deals with various performance metrics on the

basis of which various DME algorithms can be

compared and with the classification of DME

algorithms. In Section 4, we give the system model

which most of the DME algorithms use and in

Section 5, the concepts of logical clocks and time-

stamps as given by Lamport for ordering messages.

In Section 6, we discuss vote assignments and

coteries, the basic tools for enforcing mutual exclu-

sion in PBDME algorithms and related concepts.

Sections 7 and 8 have been devoted exclusively to

voting-based algorithms and coterie-based algo-

rithms, respectively. In Section 9, we briefly discuss

multiple-mutual exclusion or the k-mutual exclusion

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181160

yhf
高亮
yhf
高亮
yhf
高亮
yhf
高亮

problem and take account of some of the solutions

that have been proposed in order to solve the

problem. We conclude our discussion in Section

10 where we briefly mention various possible

choices a system designer has while trying to select

an algorithm which best suits the application on

hand. Each section has been divided into subsec-

tions so as to make the discussion comprehendible.

A large number of algorithms that have contributed

to the development of subject have been included.

We have not discussed the algorithms in detail.

Instead, we have tried to bring out the essential

features, which make a particular algorithm distinct

from others. We begin with a historical background

of mutual exclusion.

2. History of mutual exclusion

The origin of the mutual exclusion problem can

be traced back to 1965 when Dijkstra [29] described

and solved the mutual exclusion problem. Dijkstra

stated that any solution to the mutual exclusion

problem must satisfy 4 constraints. Dijkstra’s algo-

rithm guaranteed mutual exclusion and was dead-

lock-free, but it did not guarantee fairness. That is, it

was possible that a process seeking access to its

critical section waited indefinitely while others

entered and exited their critical sections frequently.

Knuth [47] proposed the first fair solution to the

mutual exclusion problem. In addition to the four

constraints specified by Dijkstra, Knuth’s algorithm

also satisfied:

(i) No assumption is made about the instructions or

the numbers of processors supported by the

machine except that basic instructions such as

read, write or test a memory location are executed

atomically.

(ii) No assumption is made about the execution speed

of the competing processes, except that it is

nonzero.

(iii) When a process is in the noncritical section, it

cannot prevent another from entering the critical

section.

(iv) It must not be possible for a process seeking access

to a critical section to be delayed indefinitely.

(v) A process attempting to enter its critical section

will be able to do so within a finite time.

Thereafter, a number of algorithms were proposed

which guaranteed mutual exclusion and were dead-

lock-free and fair. Each of these algorithms aimed at

improving performance in terms of synchronization

delay, the period of time between the instant a site

invokes mutual exclusion and the instant when it

enters CS. Some of these algorithms are De Bruijns’s

algorithm [11], Eisenberg and McGuire’s algorithm

[32] and Peterson’s algorithm [68]. All these algo-

rithms were designed for the centralized systems, the

systems possessing a central memory that all pro-

cesses can access simultaneously for reading and

writing. Moreover, their basic principle was the same:

they all used one or more shared variables to achieve

mutual exclusion, just similar to the case of single-

processor systems, where critical regions are protected

using semaphores, monitors and similar constructs.

An elegant discussion on the solutions of the mutual

exclusion problem in centralized systems and a lucid

account of the development of the subject can be

found in Ref. [70].

3. Distributed mutual exclusion algorithms

Due to lack of a common shared memory, the

problem of mutual exclusion becomes much more

complex in the case of distributed systems as com-

pared to the centralized systems and needs special

treatment. Of course, when we use the term ‘distrib-

uted system’, we mean message-passing networks in

which every process has a local memory and there is

no central controlling unit. The systems in which there

exists a central control unit (process) to which all sites

submit request messages for entry into CS and send

release messages after exiting CS are not included

because, in effect, such systems are closer to the

centralized systems. Moreover, in such systems, the

failure of a single process (the centralized king) leads

to the total failure of the system.

A number of algorithms have been proposed to

solve the mutual exclusion problem in distributed

systems. A good DME algorithm, besides providing

mutual exclusion, should take care that there are no

deadlocks and starvation does not occur. A deadlock

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 161

is a state of the system when no node is in its CS and

yet no requesting node can proceed to its CS. Star-

vation is said to occur when some node waits indef-

initely to enter its CS, even though other nodes are

entering and exiting their critical sections.

3.1. Performance metrics of distributed mutual

exclusion algorithms

The performance of a distributed mutual exclusion

algorithm can be evaluated in terms of a number of

metrics. Message complexity (MC) and synchroniza-

tion delay (D) are two parameters, which can be used to

compare the performance of various DME algorithms.

Message complexity of a DME algorithm is the

number of messages exchanged by a process per CS

entry. Synchronization delay is the average time delay

in granting CS, which is the period of time between the

instant a site invokes mutual exclusion and the instant

when it enters CS. The execution time of the instruc-

tions in the algorithm is assumed to be negligible as

compared to the message transition time. Other criteria

that can be used to compare the performance of DME

algorithms are fault tolerance and availability. The

fault tolerance of a DME algorithm is the maximal

number of nodes that can fail before it becomes

impossible for any node to enter its CS. Its availability

is the probability that the critical section can be entered

in the presence of failures. In fact, availability of a

DME algorithm is a measure of its fault tolerance. It is

probabilistic in nature.

Thus, a good DME algorithm must be safe (it shall

ensure mutual exclusion), live (the system should

make progress towards the execution of CS and a

deadlock situation shall not occur) and fair (it shall not

be biased against or in favour of a node and each

request shall eventually be satisfied). It should have

low message complexity and synchronization delay.

High fault tolerance and availability add to the qual-

ities of a good DME algorithm.

3.2. Classification of distributed mutual exclusion

algorithms

A number of DME algorithms have been proposed,

all aiming at improved performance with respect to one

performance metric or the other. Based on the techni-

que used, DME algorithms can be classified as token-

based algorithms and permission-based algorithms as

suggested by Raynal [73], or as token-based algorithms

and non-token-based algorithms as suggested by Sin-

ghal [81]. Moreover, these algorithms can be static or

dynamic, logical structure-based or broadcast-based. A

DME algorithm is static if it does not remember the

history of CS execution. In a static algorithm, nodes

need not keep information about the current state of the

system. On the other hand, in the case of a dynamic

algorithm, nodes keep track of the current state of the

system and the algorithm has an inbuilt mechanism to

make decision on the basis of this knowledge. In

logical structure-based algorithms, the sites in the

system are assumed to be arranged in a logical config-

uration like tree, ring, etc., and messages are passed

from one site to another along the edges of the logical

structure imposed. In the case of broadcast-based

algorithms, no such structure is assumed and the

requesting site sends messages to other sites in parallel,

i.e., the message is broadcasted.

Token-based algorithms achieve mutual exclusion

through a privilege message called a token, which is

shared among the sites. A site can enter CS if and only

if it is in possession of the token. Since the token is

unique, mutual exclusion is guaranteed. Fair schedul-

ing of token among competing sites, detecting the loss

of token and regenerating a unique token are some of

the major design issues of the token-based mutual

exclusion algorithms. Although token-based algo-

rithms are generally faster than the non-token-based

algorithms, produce lesser message traffic and are not

deadlock prone, their resiliency to failures is poor [37].

If the site holding the token fails, complex token-

regeneration protocols have to be executed, and when

this site recovers, it has to undergo a recovery phase

during which it is informed that the token it was

holding prior to its failure has been invalidated.

Token-based algorithms can be broadcast-based or

logical structure-based, static or dynamic. Some exam-

ples of token-based algorithms are Helary et al.’s [40]

and Suzuki and Kasami’s [89] algorithms (broadcast-

based, static), Singhal’s [78] and Yan et al.’s [100]

algorithms (broadcast-based, dynamic), Raymond’s

[72] and Neilson andMizuno’s [63] algorithms (logical

structure-based, static) and Chang et al.’s [26], Helary

et al.’s [39] and Naimi et al.’s [66] algorithms (logical

structure-based, dynamic). A token-based delay opti-

mal mutual exclusion algorithm for arbitrary network

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181162

topologies is proposed in Ref. [85]. A performance

comparison of token- and tree-based distributed mutual

exclusion algorithms can be found in Ref. [84]. A

simulation study for token-based DME algorithms

has been carried out by Chang Ref. [19].

Permission-based algorithms require rounds of

message exchange among the nodes to obtain the

permission to execute CS. The basic idea on which

permission-based algorithms are based is as follows:

When a process wants to enter its CS, it asks other

nodes for their permission. A process on receiving a

request grants its permission if it is not interested in

CS. If it is interested in CS, the priority of the incoming

request is established against its own request. Gener-

ally, priority decisions are made using timestamps.

Permission-based algorithms can be further subdi-

vided into voting-based algorithms and coterie-based

algorithms. In voting-based algorithms, each site in the

system is assigned a vote (a nonnegative integer). A

site seeking access to CS must obtain permission from

an appropriate number of nodes, i.e., a number of

nodes whose total votes constitute a majority of the

total number of votes assigned to the system. In

coterie-based algorithms, a collection of sets (of the

nodes of the system), called a coterie, is attached to the

system. A node seeking access to CS must obtain

permission from each and every node of a set from the

coterie. Each of these two categories may further be

subdivided into static and dynamic algorithms. Fig. 1

depicts the classification of DME algorithms that has

emerged during nearly two decades of the develop-

ment of the subject. We shall organize our discussion

on various DME algorithms as per this classification in

the sections to follow. We discuss vote assignments

and coteries and related concepts in detail in Section 6.

However, before we proceed further, we give, in the

next two sections, the system model which most of the

DME algorithms use and the concepts of logical clocks

and timestamps as given by Lamport for ordering

messages.

4. System model

We present here a model of the system, which, in

general, has been used by almost every DME algo-

rithm. A distributed system of N nodes consists of N

geographically dispersed autonomous sites connected

via a communication network. The sites communi-

cate only through message passing and do not share

a common memory. The communication network is

assumed to be logically fully connected, i.e., every

node can communicate with every other node. The

communication medium is reliable, i.e., all messages

sent are eventually delivered and the sites do not

crash. Message propagation delay is finite but unpre-

dictable. Messages between any two processes are

assumed to arrive in the order they are sent. There is

no global clock. Byzantine failures [57] do not occur

and failures are fail-stop [93].

5. Message ordering

The order of events occurring in different processes

can be very important in certain distributed applica-

tions. However, clocks in different computers may

count time at different rates and may thus diverge.

Although many synchronization techniques exist,

clocks across a distributed system cannot be perfectly

synchronized. Therefore, we cannot, in general, use

physical time to find out the order of various events

occurring in a distributed system. To overcome this

problem, Lamport introduced the concept of logical

clocks and used it to generate timestamps. Since almost

every mutual exclusion algorithm has used timestamps

to order messages, to decide priorities and to resolve

conflict between processes seeking concurrent access

Fig. 1. Classification tree for DME algorithms.

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 163

to CS, we find it appropriate to put here in brief the

concept of timestamps as given by Lamport [56].

Each process Pi maintains a logical clock Ci , a

function which assigns a number Ci (a) to any event a

occurring in the process Pi. Each process Pi increments

Ci between any two successive events. Every message

m generated from the process Pi contains a timestamp

Tm. A timestamp, Tm, is a nonnegative integer whose

value equals Ci (a), where event a is the sending of

message m by the process Pi. On receiving a message

m, a process Pj sets its clock Cj to a value, which is

greater than or equal to the maximum of its present

value and Tm, the timestamp on the message received.

Ties are broken using process identities.

In the section that follows, we discuss vote assign-

ments and coteries, two important tools that are

generally used to achieve mutual exclusion in distrib-

uted systems and some related concepts.

6. Techniques for mutual exclusion in PBDME

algorithms

Mutual exclusion in permission-based DME algo-

rithms can be achieved either by using voting or

through coteries. Thomas [94] was the first to intro-

duce the concept of voting. A nonnegative integer vi is

attached to each node i of the set U={1, 2,. . ., N}. Thecollection f={v1, v2,. . ., vN} is called a vote assign-

ment for the system. The simplest way to assign votes

is to assign one vote each to a process. This is called

uniform voting. However, different votes can be

assigned to different processes. This is known as

weighted voting and was introduced by Gifford

[36]. If all the votes are assigned to a single node,

the assignment is said to be singleton. For a vote

assignment f, Totf =PN

i¼1 vi gives the total number of

votes in the system. The number of majority votes in

the system is given by

Majf ¼1=2Totf þ 1; if Totf is even

1=2ðTotf þ 1Þ; if Totf is odd

8<:

A process can enter its critical section only if it has

obtained permission from a number of nodes whose

total number of votes is at least Majf. Since a node

grants permission to only one node at a time, only one

node can gather a majority of votes at a time andmutual

exclusion is ensured.

In Ref. [38], Garcia-Molina and Barbara intro-

duced the concept of a coterie as a powerful tool for

enforcing mutual exclusion in distributed systems. A

coterie C, on a nonempty set U, is a collection of

nonempty subsets of U satisfying:

(i) Minimality condition: S, TaCZ SKT, TKC

(ii) Intersection property: S\T p F bS, TaC

Elements of a coterie are called quorum groups or

quorum sets or just quorums. If a node wants to

perform a restricted operation, it must obtain permis-

sion from each and every node of some quorum in the

assigned coterie. Since a node grants permission to

only one node at a time and since any two quorums in a

coterie have at least one node in common, mutual

exclusion is guaranteed. A number of techniques have

been proposed to construct coteries [2,8,21,58,59].

A coterie C on a set U is said to be dominated by a

coterie D ( p C) on U if every quorum of C has a

quorum of D as its subset, i.e.,

bGaC; aHaD : HpG

A coterie C is nondominated (ND) if there does

not exist any coterie that dominates it. Suppose that a

coterie D dominates a coterie C. Then, if a quorum

group of C survives in a partition due to node or link

failures, a quorum group in D will certainly survive.

Therefore, we can say that an ND coterie provides

more protection against partitions as compared to the

coteries it dominates.

Thus, in short, we can say that, in permission-based

DME algorithms, a node that wishes to access CS

seeks permission from other nodes of the system.

Mutual exclusion is enforced either using voting

schemes or through coteries. Based on the technique

used, permission-based algorithms can be divided into

two categories:

5 voting-based algorithms

5 coterie-based algorithms

While comparing the two strategies of achieving

mutual exclusion, Garcia-Molina and Barbara [38]

observed that the two approaches are not equivalent.

Whereas every vote assignment gives rise to a coterie,

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181164

there are coteries that cannot be obtained from vote

assignments. A coterie, which has a corresponding vote

assignment, is called a vote-assignable coterie. The

authors prove that for systems with five or fewer nodes,

the two concepts turn out to be equivalent, but for

systems with more than five nodes, there exist coteries,

which are not vote-assignable. Thus, coteries are more

general than vote assignments as they provide us with

choices,whichwouldnot have beenobtained usingvote

assignments (for a system of n nodes, the number of

distinct vote assignments is not greater than 2n2

, while

the number of distinct coteries is greater than 22cn

for

some constant c [38]).

Though at the outset, the two approaches for

obtaining mutual exclusion in distributed systems

may seem to be similar but they are different, and

the difference between the two is quite important. In

algorithms based on coteries, in order to perform a

restricted operation, a node must seek permission from

a group of nodes, which form a quorum group of a pre-

assigned coterie. Permission from any arbitrary set of

nodes will not do. On the other hand, in algorithms

based on voting, a node is not concerned about which

nodes contribute to its required number of votes. Its

only concern is the number of votes it has to collect. It

sends requests and as soon as it gets permission from

nodes whose total votes constitute majority, it enters

CS. Thus, though (as observed above) coteries are

more general than vote assignments, vote assignments

are more flexible as compared to coteries. Therefore,

the difference between the two approaches is quite

significant and an attempt to classify permission-based

algorithms into voting-based algorithms and coterie-

based algorithms is justified.

In this section, we have restricted ourselves to the

discussion of some very basic concepts, which are

crucial for developing a fair understanding of the

subject. A number of concepts may still be required

to make our discussion meaningful and understand-

able. We shall define them as and when needed. We

now proceed to discuss various PBDME algorithms in

the next two sections.

7. Voting-based algorithms

Voting-based algorithms use majority voting to

achieve mutual exclusion. A process seeking entry into

CS sends messages and on receiving replies (permis-

sions) from various nodes, sums up the number of votes

of the nodes granting permission and its own votes. As

soon as the number of votes gathered becomes more

than or equal to Majf, the process can enter its CS.

Voting can be static or dynamic. In static voting,

the nodes do not keep information about the current

state of the system and the votes, once assigned, are

not changed as the algorithm evolves. In dynamic

voting, the nodes keep track of the current state of the

system and in the case of network partitioning, due to

link or node failures, new votes may be assigned so as

to make at least one partitioned group of nodes active

and to keep the system working. We first take up static

algorithms.

7.1. Static algorithms

The earliest voting-based distributed mutual exclu-

sion algorithms were given by Thomas [94] and

Gifford [36]. These algorithms were static in nature.

The votes were fixed a priori. While Thomas used

uniform voting, Gifford used weighted voting. In

uniform voting, each node is assigned a single vote

(or the same number of votes) and a node, which

obtains permission from a majority of nodes, is

allowed to enter CS. The technique used by Thomas

is known as the majority voting. In weighted voting,

different votes may be assigned to different nodes.

Gifford used weighted voting for managing replicated

data. A process is permitted to perform a write (read)

operation only if it has collected a pre-specified

number of votes called write (read) quorum. For a

vote assignment f, the read quorum r and the write

quorum w are chosen so as to satisfy r +w>Totf and

2w>Totf. This ensures mutual exclusion as simulta-

neous occurrence of two write operations or a read

and a write operation is ruled out. The technique used

by Gifford is called the quorum consensus method.

In fact, the majority voting of Thomas is a special case

of the quorum consensus method of Gifford.

Agrawal and Abbadi [3] impose a logical structure

on the set of copies of an object and develop a

protocol that uses the information available in the

logical structure to reduce the communication require-

ments for read and write operations and call it Tree

Quorum Protocol. It is a generalization of the static

voting protocol with two degrees of freedom for

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 165

choosing quorum. Instead of requiring each write

operation to write copies at all levels of the tree, it

is required to write only a quorum w of levels. As a

result, read operations are required to access a quorum

of levels so that there is an intersection of a level

between a read and a write operation. Also, instead of

requiring each operation to access a majority of

children, operations are required to access a quorum

of children such that read and write operations have an

intersection of at least one child. These generaliza-

tions result in quorums with two degrees of freedom:

the number of levels and the number of children.

Kumar [1] observed that a major problem with the

quorum consensus method of Gifford [36] for syn-

chronizing access by read and write operations to

copies of a replicated object is that the number of

copies required for a quorum increases linearly with

the total number of copies present of the object as the

minimum quorum size is [(N + 1)/2]. He proposed that

the copies of an object in a database be organized

logically into a multilevel tree of depth m with root at

level 0. The physical nodes be identified by the leaves

of this tree and higher level nodes of the tree shall

correspond to logical groups. The quorum is defined

by collecting a majority of votes at each level. This

reduces the quorum size required for synchronizing

operations on N copies to N 0.63. The hierarchical

quorum consensus algorithm (HQC algorithm) turned

out to be more general than ordinary voting, as there

are quorums, based on this algorithm, which do not

have an equivalent single-level representation.

In Ref. [4], the concept of multidimensional voting

was introduced. In multidimensional voting, the vote

assignment to each node and the quorum are k-dimen-

sional vectors of nonnegative integers. For a distrib-

uted system of N nodes, numbered 1, 2,. . ., N, themultidimensional vote assignment VN,k is an N� k

matrix, where vij represents the vote assigned to node

i in the jth dimension. Clearly, vijz 0 for i = 1, 2,. . ., Nand j = 1, 2,. . ., k. The votes assigned in various

dimensions are independent of each other. The quo-

rum-assignment q!k=( q1, q2,. . ., qk) is a k-dimensional

integer vector where qj>0 for j= 1, 2,. . ., k. In addition,

a number l, 1V lV k, is defined. It is the number of

dimensions of vote assignment for which the quorum

must be satisfied. Thus, there are two levels of require-

ment: the vote level and the dimension level. At the

vote level, the number of votes must be greater than or

equal to the quorum requirement in that direction and

at the dimension level, the number of dimensions for

which a quorum is collected must be greater than or

equal to l. Multidimensional voting with quorum

requirement in l out of k dimensions is denoted by

MD(l, k)-voting. In this paper, Ahmad et al. show that

though, in general, a coterie may not be vote-assign-

able, every coterie can be represented by an MD(l, k)

vote and quorum assignment. To support their result by

examples, they obtain an MD(1, 5) vote and quorum

assignment for tree-based coterie of Agrawal and

Abbadi [2] for a tree of depth three with seven nodes,

and give an example of a coterie for N = 6 which is not

vote-assignable ordinarily but has a corresponding

multidimensional vote assignment MD(1, 4). The

authors show that the multidimensional voting can

also be applied for reading and writing replicated data

and to partially replicated data and obtain a multi-

dimensional vote and quorum assignment for the 3� 2

grid protocol of Cheung et al. [21]. They also introduce

the concept of nested multidimensional voting and

apply it to obtain a generalization of the Hierarchical

Quorum Consensus method given by Kumar [1].

7.2. Dynamic algorithms

Dynamic algorithms use the knowledge of the

current state of the system to choose new vote assign-

ments. Though voting is resilient to partitions in the

sense that the system keeps on working despite

partitioning as long as there is at least one group

whose nodes have a majority of votes, there may arise

situations when none of the partitioned groups can

collect the required number of votes and the system

may enter a halted state. Halted states are undesirable

as they reduce system availability. Dynamic algo-

rithms help in improving performance by reducing

the incidence of halted states.

In Ref. [15], Barbara et al. suggest two methods of

dynamic vote reassignment: the group consensus

method and the autonomous reassignment method. In

the group consensus method, the nodes in the major-

ity-partitioned group agree upon the new vote assign-

ment, using either a distributed algorithm or by

electing a coordinator to perform the task. In the

autonomous reassignment method, each node makes

decision about changing its votes and picks up a new

vote value on its own. However, before the change is

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181166

made final, the node must collect a majority of votes.

While reassignment by the group consensus method is

more accurate and more resilient to future failures, its

implementation is complicated. On the other hand, the

autonomous reassignment method is quicker, simpler

and more flexible, although the global vote assignment

achieved by this method may not be as good as the one

achieved through group consensus [15].

Jajodia and Mutchler [46] propose two generaliza-

tions to voting schemes: dynamic voting and dynamic

voting with linear ordered copies (dynamic-linear vot-

ing, in short). In dynamic voting, the number of sites

necessary for carrying out an update is a function of the

number of up-to-date copies in existence at the time of

the update and not of the total number of copies, as in

the case of static algorithms. Changes to the quorum

occur dynamically, without any manual intervention.

Each copy of the file maintains two variables: a version

number, VN, and an update sites cardinality, SC, which

is the number of sites that participated in the most

recent update to the file. When a site wants to commit

an update, it sends messages to all the sites in the

partition to which it belongs. On receiving replies, it

learns about the VN (the most recent version number of

the copies available in the partition) and SC, the

number of copies in the partition with that version

number. It commits an update if the partition, to which

it belongs, is the distinguished partition. A partition P is

the distinguished partition if and only if it contains

more than half of SC sites with version number VN.

Dynamic-linear voting extends dynamic voting by

adding a third variable, the distinguished site, DS,

which is used to break the tie when a partition contains

exactly half of the sites with the up-to-date version of

the file. When the number of sites participating in an

update is even, each site sets its DS entry to one of the

participating sites. If, in a partition, the number of sites

with the most recent VN is exactly SC/2, an update is

permitted if and only if the site contains the distin-

guished site DS. In such a case, an update would not

have been permitted in dynamic voting. Moreover,

dynamic-linear voting accepts an update whenever

dynamic voting does.

7.3. Availability analysis for vote assignments

Availability is a very important performance meas-

ure for distributed mutual exclusion algorithms and

needs special attention. Due to node or communication

link failures, a network of computers may partition

into isolated disjoint groups of nodes (called parti-

tions). In the case of voting-based DME algorithms, a

node seeking entry to its CS has to collect a majority of

votes. However, due to network partitioning, it may so

happen that no partition is available so that the re-

quired quorum of votes can be collected. Availability

of a vote assignment tells us about the possibility of

finding an active partition (with total votes more than

or equal to the required quorum). In a highly available

vote assignment, the probability of finding such a

partition, and thus the probability of a restricted

operation being performed, is very high. Therefore,

relative performance of various vote assignments can

be compared on the basis of their availability. A lot of

work has been done to find the availability of various

vote assignments and to find vote assignments optimal

with respect to availability.

In Ref. [14], Barbara and Garcia-Molina study the

impact of the network topology and the number of

nodes on the availability of a system. Heuristics to

assign votes, so that the probability of having an active

group is maximized, are suggested in this work. These

heuristics take into consideration the reliabilities of

nodes and links and the network topology and give

good results in most cases. The authors use a proba-

bilistic metric to study vote assignments and derive

analytical expressions for system reliability in the case

of three homogeneous topologies—fully connected,

Ethernet and ring network, and find that, in the case of

these three topologies, for different link and node

reliabilities, different vote assignments (uniform, non-

uniform or singleton) are viable.

In Ref. [13], Barbara and Garcia-Molina study the

influence of vote assignments on node and edge

vulnerability and show that the vulnerability of a

distributed system can be improved by the use of good

assignments. The node vulnerability of a system is

the minimum number of crashed nodes that produce a

halted state, and its edge vulnerability is the minimum

number of link failures that produce a halted state. The

highest node vulnerability for a system with N nodes is

achieved by an (N + 1)/2 uniform vote assignment if N

is odd, and by an N/2 pseudo-uniform vote assignment

if N is even. In the case of edge vulnerability, the best

vote assignment cannot be determined without refer-

ring explicitly to a topology. For a ring of N nodes

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 167

(N>2) and any nondominated coterie (vote assign-

ment) other than the singleton coterie assigned to it,

its edge vulnerability is 3. For a completely connected

graph of N nodes, the best coterie to assign in terms of

edge vulnerability is an [N/2] uniform or pseudo-uni-

form coterie (N>2).

Jajodia and Mutchler [46] compare the availabil-

ities of various voting schemes under two models: the

site model and the link model. In the site model, only

sites fail and links are infallible. Any site that has not

failed can communicate with every other site that is

up. In the link model, the network is modeled as a

connected graph. Only the links are subject to failures

and repairs; the sites do not fail. Under the site model,

the availability of the dynamic-linear algorithm turns

out to be greater than that of the dynamic voting and

the static voting algorithms, when the number of sites

in the model is more than three. Also, under the link

model, the availability of the dynamic-linear voting

turns out to be larger than the best static algorithm.

Thus, the authors deduce that the dynamic-linear

voting has greater availability than any static algo-

rithm. Moreover, it is as easy to implement as static

voting and its message complexity is slightly more

than any static algorithm.

In Ref. [20], Cheung et al. address the problem of

optimizing performance for reading and writing repli-

cated data and provide a method to enumerate vote

and quorum assignments in the general read/write

case when the read and write quorums may be differ-

ent. They give an efficient method for finding the

availability for any vote and quorum assignment when

the availability of the nodes and read/write trans-

actions are given and use system availability to

illustrate how the set of vote and quorum assignments

can be used to find the optimal assignment for reading

and writing replicated data.

Spasojevic and Berman [83] prove that though

coteries are more general than vote assignments,

voting is an optimal static pessimistic scheme for

fully connected networks with negligible link failures

and for Ethernet systems. However, it may not be

optimal for general systems. They actually present a

network for which no voting scheme yields optimal

availability. The authors also propose an efficient

algorithm for computing optimal vote assignment

for fully connected networks when relative frequen-

cies of read and write operations are known.

Thus, we have seen that a number of voting-based

algorithms are available, and one may choose among

these the one that best suits the system requirement.

Voting-based algorithms are easy to implement and

provide high degree of flexibility. Availability can be

taken as a criterion to compare various voting-based

algorithms. Other things being equal, a vote assign-

ment with higher availability shall be chosen. One

may choose between static and dynamic algorithms.

Dynamic algorithms provide higher availability by

adapting to changing system conditions. System avail-

ability in the case of dynamic vote assignments is,

certainly, better than that in the case of static vote

assignments and of the two dynamic vote reassign-

ment methods given by Barbara et al. [15]; the

autonomous reassignment method is a viable alter-

native to the group consensus method since it yields

almost as much availability as the group consensus

approach and is much faster [15]. Prior knowledge of

the reliabilities of nodes in a system can be helpful in

making a decision about the selection of the algorithm

to be implemented. For example, of the various static

algorithms available, the HQC algorithm has better

availability than that of the majority voting for p < 0.5

(where p is the probability that a node of the system is

up), and for p>0.8, the majority voting algorithm does

better than the HQC algorithm [1]. Network topology

may also play an important role in selecting an

algorithm. For example, Barbara and Garcia-Molina

[14] show that for homogeneous, fully connected

systems with perfect links and for Ethernet systems,

if p = 0.5, then the uniform assignment is better than

any non-uniform assignment, whereas for the ring

network, there may be non-uniform assignments that

are superior to the uniform assignment even if p>0.5.

They observe that distribution of votes may not be

advisable (i.e., assigning all the votes to a single node

is preferable) in the ring networks or other networks

with low connectivity, but may prove to be advanta-

geous in improving system availability if a few

reliable links are added to these networks. Even while

dealing with replicated databases, high availability is

desirable because an algorithm with a high availability

is more likely to satisfy a request for an update. The

number of copies replicated may be important. For

example, for large number of copies, the Tree Quorum

Protocol provides higher availability of operations at

substantially lower rates [3].

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181168

8. Coterie-based algorithms

In coterie-based algorithms, a node trying to enter

CS has to seek permission from a group of nodes, and

for mutual exclusion to hold, such a group is picked

from a coterie. A number of coterie-based algorithms

have been proposed. Different algorithms have used

different methods to construct quorum groups and as a

result, they show different performances.

The algorithm of Ricart and Agrawala [74] is one of

the earliest known DME algorithms and needs special

mention. In this algorithm, a node i seeking entry to CS

would send requests to all the remaining nodes. On

receiving a request, a node j sends a reply immediately

if it is not interested in CS. If the node j is itself in-

terested in CS, it compares the priority of the incoming

request with its own priority, and the process with

higher priority gets the right to enter its CS. By sending

a reply message to a node i, the node j does not lock

itself for the node i; it may send reply messages to

other nodes as well. It is based on self-conflict, while

most of the algorithms developed later look for the

conflict among various requesting sites. This puts

Ricart and Agrawala’s algorithm in a class apart from

other algorithms. The algorithm is fully distributed,

symmetric and static in nature. It has message com-

plexity 2(N� 1) and synchronization delay, D = 1.

This is an improvement over Lamport’s algorithm

[56] which is perhaps the first DME algorithm in

message passing models and has message complexity

3(N� 1).

Carvalho and Roucairol [24] observed that the

message complexity of Ricart and Agrawala’s algo-

rithm could be improved by a slight modification. A

node seeking entry into CS for the first time sends

request messages to all the remaining nodes as in the

case of Ricart and Agrawala’s algorithm. However,

subsequently, it sends request messages only to those

nodes to which it had sent back its permission (owing

to a request message from them). The permission of

those nodes from which it had received a reply

message, and has not sent back its permission, is taken

for granted. This algorithm is dynamic in nature and its

message complexity varies from 0 to 2(N� 1).

These two algorithms use ‘self-conflict only’ cri-

teria for granting permissions. We now go through

various coterie-based algorithms in which nodes look

beyond self-conflict while granting permission to other

nodes. As in the case of voting-based algorithms,

coterie-based algorithms can also be static or dynamic,

depending on the fact whether the nodes keep track of

the current state of the system or not.

8.1. Static algorithms

The first ever coterie-based algorithm was given by

Maekawa [59]. It can be regarded as a stepping-stone

in the development of DME algorithms for a number

of reasons:

5 For the first time, it was assumed that nodes are

logically arranged.

5 Unlike Ricart and Agrawala’s algorithm, Maeka-

wa’s algorithm goes beyond self-conflict. On

receiving a request message, a node would send

its permission to the requesting node only if it has

not already granted permission to some other node.

It grants permission to only one node at a time,

locks itself exclusively for that node and only after

receiving a RELEASE message from the node, it

grants permission to another node: the node with

the highest priority among the requesting nodes.

5 The logical grid used by Maekawa and its

modifications have been frequently used in a

number of algorithms developed later.

The quorums in Maekawa’s algorithm are obtained

as follows. Each node i obtains permission from a set

of nodes Si. For mutual exclusion to hold, the con-

dition

(a) Si\Sj p F bi, j, 1V i, jVN is a must. In addition,

Maekawa assumed that

(b) iaSi, bia{1, 2,. . ., N}

(c) ASiA = k, bia{1, 2,. . ., N}

(d) For every j, jaSi, for exactly m values of i.

With these assumptions, the algorithm becomes truly

fully distributed. Every node has to do the same

amount of work (each node has to send and receive

the same number of messages for achieving mutual

exclusion due to condition (c)) and bears equal

responsibility (each node is an arbitrator for exactly

m nodes, as a result of condition (d)). Using results

from projective geometry, Maekawa showed that the

existence of Si’s satisfying conditions (a)–(d) is guar-

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 169

anteed and the size of each Si is equal toffiffiffiffiN

p. He also

gave a simpler method to obtain quorums Si’s. If the

set of nodes is logically arranged in the form of a grid,

Si, the quorum group for the node i, is the set of all the

nodes contained in the column and the row through

the node i. This gives the size of each Si as 2ffiffiffiffiN

p�1.

The algorithm has message complexity cffiffiffiffiN

pwhere c

varies from 3 to 5 and synchronization delay D = 2.

Singhal [79] observed that Maekawa’s algorithm is

prone to deadlock and the reason for deadlock is that in

this algorithm, a site granting permission to another

site locks itself for that site in an exclusive mode. He

suggested that the locks on sites must be made mutable

to avoid deadlocks. If a higher priority request arrives

at a site after the site has been locked by a lower

priority request, it converts the exclusive lock into a

shared lock. The lower priority request preempts the

lock if it has not yet been able to lock all the sites in its

request set: the set of nodes to which it is supposed to

send request messages. A request succeeds in locking a

site, if its priority is higher than the priority of all the

requests, which currently hold the lock on it. This

makes the algorithm deadlock-free but increases the

message complexity from o(ffiffiffiffiN

p) to o(N). The syn-

chronization delay is equal to 1. A technique for the

formation of optimal request sets is also discussed in

this paper.

Agrawal and Abbadi [2] gave another logical

structure-based algorithm called the tree quorum algo-

rithm. The sites are logically arranged into a tree with a

well-defined root. A quorum is constructed by select-

ing any path starting from the root and ending with any

of the leaves. If a path is not available as a result of the

failure of a node i, then i is substituted with two paths

(m paths, when the algorithm is extended to a tree in

which every node has degree m), both (all) starting

with the children of i and terminating with leaves.

Quorums (quorum groups) obtained using this algo-

rithm are called tree quorums. The collection of all

the tree quorums gives the coterie generated by this

algorithm. In a failure-free environment, the message

complexity of the tree quorum algorithm is [log N]

which is an improvement over Maekawa’s algorithm

(MC=ffiffiffiffiN

p). The algorithm exhibits the property of

graceful degradation: in a failure-free environment,

message complexity is low; as the number of node-

failures increases, the message complexity goes on

increasing. The penalty for failures closer to the root is

more severe than the failures in the vicinity of the

leaves. This is a quorum-construction algorithm and

other measures need to be taken to avoid deadlock and

starvation [2].

Another advantage of Agrawal and Abbadi’s algo-

rithm over Maekawa’s algorithm is that while in

Maekawa’s algorithm each node has only one fixed

request set attached to it, a node in Agrawal and

Abbadi’s algorithm may choose its request set from

among a number of quorums in the coterie. The failure

of a single site in its request set, Ri, may render node i

unable to form a quorum group in the case of Maeka-

wa’s algorithm; the tree quorum algorithm always

guarantees the formation of a quorum group as long

as the number of failures is less than [log N] sites.

Since all coterie-based algorithms require that a

node seeks permission from some nodes before enter-

ing CS and informs some set of nodes after complet-

ing the execution of CS, processes need to keep

information about other processes. Sanders [77]

observed this and introduced the concept of informa-

tion structure as a unifying principle behind various

DME algorithms. The information structure of a

DME algorithm consists of Ri, the request set for the

process i, and Ii, the inform set for the process i. The

request set for the process i consists of those nodes of

the system from which it is supposed to receive

permission before it enters the critical section,

whereas its inform set consists of those nodes of

the system, to which it must inform whenever it

changes its state from in-CS to not-in-CS or vice

versa. Also, Sti denotes the status set that is main-

tained by the node i and is the set of the nodes about

which the node i maintains information. Obviously,

iaIjZ jaSti. Sanders gave a generalized DME algo-

rithm (GEN ISDME algorithm) based on information

structure and showed that for a DME algorithm to be

correct, its information structure must satisfy

ðaÞ iaRi biaU

ðbÞ IioRi biaU

ðcÞ bi; jaU ; i pj; fIi \ Ij pFg or fiaRj ^ jaRigð8:1:1Þ

where U is the set of nodes in the system. The

performance of a DME algorithm can be studied as a

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181170

function of its information structure. Moreover, the

information structure of an algorithm can be static or

dynamic.

Although, in principle, the GEN ISDME algorithm

is supposed to resolve all conflicts, there may arise

situations when it is deadlocked. This may be caused

by unpredictable communication delays. To overcome

this difficulty, Sanders proposed a deadlock detecting

general information structure-based distributed mutual

exclusion algorithm (DDGEN ISDME algorithm),

which detects and avoids potential deadlocks using a

deadlock recovery scheme, similar to the one used by

Maekawa [59]. The modified algorithm has the same

necessary and sufficient conditions on the information

structures in order to ensure mutual exclusion.

Bonollo and Sonenberg [17] showed that by

restricting the class of information structure used by

Sanders, a deadlock-free ISDME algorithm (DFGEN

ISDME) could be obtained which does not require

any deadlock recovery scheme. The information struc-

ture conditions that must hold are

ðaÞ iaIi biaU

ðbÞ IioRi biaU

ðcÞ ðiaIj _ jaIiÞ or fiaRj ^ jaRig bi; jaU ; i pj

ð8:1:2Þ

First part of condition (c) requires that either i or j is in

Ii\Ij, whereas in the case of Sanders’ theorem, it was

sufficient to have some kaIi\Ij. The class of informa-

tion structure allowed by these conditions is a subset

of the class of information structure allowed by

Sanders’ ISDME theorem. The algorithm has D = 1

and MC=(2a + 3b)/N. In the best case, the smallest

value of MC is 3(N� 1)/2 which is the same as that of

Singhal’s algorithm [79], and in the worst case, the

largest value of MC is 2(N� 1) which is the same as

that of Ricart and Agrawala’s algorithm [74].

Before we proceed further, let us take a look at

some of the algorithms mentioned above, their infor-

mation structures and their interrelationship. Both

Maekawa’s algorithm and Sander’s ISDME algorithm

were deadlock-prone and needed deadlock recovery

schemes. Singhal provided a deadlock-free Maekawa-

type algorithm (an algorithm in which request sets

satisfy iaRi, biaU and Ri\Rj pF, bi, jaU), which

did not require a deadlock recovery scheme. Using

almost similar technique, Bonollo and Sonenberg gave

deadlock-free ISDME algorithm, which did not re-

quire a deadlock recovery scheme. In terms of infor-

mation structures, conditions for Maekawa’s algorithm

are

Ri \ Rj pF;bi; jaU ð8:1:3Þ

and for Singhal’s deadlock-free Maekawa-type algo-

rithm are

iaRj _ jaRi bi; jaU ; i pj ð8:1:4Þ

The constraint (8.1.4) imposed by deadlock-free Mae-

kawa-type algorithm is stronger than the constraint

(8.1.3) required for Maekawa’s algorithm and is

weaker than what is required in Ricart and Agrawala’s

algorithm, viz.,

iaRj ^ jaRi bi; jaU ; ipj ð8:1:5Þ

We have already seen that the class of information

structures (8.1.2), allowed by the DFGEN ISDME

algorithm of Bonollo and Sonenberg, is a subset of the

class of information structures (8.1.1) allowed by the

DDGEN ISDME algorithm of Sanders. Moreover, the

class of information structures (8.1.2) is a superset of

the class of information structures (8.1.4) admitted by

the deadlock-free Maekawa-type algorithm of Sin-

ghal. Thus, the DFGEN ISDME algorithm of Bonollo

and Sonenberg may be regarded as representing a

continuum from Singhal’s deadlock-free Maekawa-

type algorithm on one hand and to the Ricart and

Agrawala’s algorithm on the other [74].

Maekawa’s logical arrangement of nodes into a grid

came out to be quite handy for constructing quorums

(and thus coteries), and a number of quorum construc-

tion techniques have been proposed based on this.

Cheung et al. [21] present the grid protocol for

synchronizing reads and writes in a replicated data-

base. The nodes storing copies of data are assumed to

be logically arranged into a rectangular grid. Read and

write operations are required to lock groups of nodes

(quorum groups) in such a way that two write oper-

ations or a read and a write operation cannot be

executed simultaneously. This is achieved as follows.

Choosing one node from each column of the grid

forms a read quorum group. Such a group of nodes

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 171

is called a C-cover (column cover) as it intersects with

every column of the grid. A write quorum group is

formed by choosing a C-cover along with all the nodes

in any one column of the grid. This ensures that any

two write quorum groups and a write and a read

quorum group always have a node in common. For

even distribution of load among the nodes, the choices

of read and write quorums are made using random

permutations of rows and columns. The protocol

provides high data availability and low response time.

The algorithm dynamically searches for a quorum but

all possible quorums are fixed a priori. A number of

quorums are available for each node.

In Ref. [6], Kumar and Cheung combined the ideas

in Refs. [21] and [1] and organized nodes in an H-grid

(hierarchical grid) using a multilevel hierarchy with

physical nodes at level 0 and logical groups at higher

levels. The write quorums can be obtained by collect-

ing all objects in any one column of the grid at the level

L. At the level 1, this procedure translates into collect-

ing level 0 objects, which are physical nodes. The set

of nodes so collected gives a quorum of size o(ffiffiffiffiN

p).

Agrawal et al. [8] used a modified grid to con-

struct what they called billiard quorums. Instead of

using horizontal and vertical line segments of rows

and columns in the grid scheme of Maekawa, they

used paths that resemble billiard paths. This reduced

quorum size toffiffiffi2

p ffiffiffiffiN

p. However, the size of each

quorum has to be an odd integer, and the method

does not satisfy the equal responsibility property of

Maekawa.

Luk and Wong [58] suggest that if, instead of using

a square grid, we organize the nodes logically into a

triangular grid, then any two quorums will meet only

in one node. They constructed quorums using this

triangular grid and the quorum size turned out to be

approximately equal toffiffiffi2

p ffiffiffiffiN

pwhich is better than

that of Maekawa and almost as good as that of

Agrawal et al. [8]. The quorums can be row-based

or column-based, and if the two schemes are used

alternately for each new request, the equal responsi-

bility property is satisfied and each requesting site can

have two quorums.

Rangarajan et al. [76] divided N sites into N/G

groups of G sites each and called each such group a

subgroup. They constructed N/G quorum groups such

that each quorum group is made up offfiffiffiNG

qsubgroups

with each subgroup containing G sites. The quorum

groups are constructed using finite projective planes

just as was done in Maekawa’s algorithm. The only

difference being that now, instead of sites, subgroups

are used to construct quorum groups. There is only

one quorum group per site. In fact, each quorum

group caters to G sites. Intersection between any pair

of quorum groups is exactly one subgroup. The

algorithm has a message complexity oðffiffiffiffiffiffiffiffiffiffiffiffiffiffiN logN

and achieves asymptotically high resiliency to failures

for a much lower message overhead than majority

voting.

Cao et al. [25] suggest an algorithm, which reduces

synchronization delay and still has low message

complexity. The basic idea is as follows. A site i

can enter its CS only if it has got permission from all

the sites in its quorum group. However, instead of first

sending a RELEASE message to unlock the arbiter

site, which in turn sends a REPLY message to the next

site to enter the CS, a site exiting the CS directly sends

a REPLY message to the site which will next enter

CS. This reduces synchronization delay from 2 to 1.

This is a coterie-based algorithm but is independent of

the coterie being used, i.e., we can use any coterie we

like. The message complexity is o(ck), where c is a

constant lying between 3 and 6 and k is the average

size of the quorum. For example, k isffiffiffiffiN

pif we use

quorums formed by Maekawa’s algorithm and it is log

N if we use those formed by Agrawal and Abbadi’s

tree quorum algorithm. Use of the logical structure

depends on the coterie formation scheme being used.

Moreover, the resiliency to site and communication

link failures can be increased if a fault-tolerant coterie

is used.

Neilson and Mizuno [64] gave a method of pro-

ducing new coteries from known coteries with the

help of a function, which they called the coterie join

algorithm. The method is as follows.

For any set of nodes U1, U2 with U1\U2 =F and

for xaU1, define U3 =U1� {x}[U2. For any coterie

C1 on U1 and C2 on U2, we can obtain a coterie C3 on

U3 by retaining all quorums of C1 not containing x

and by replacing each occurrence of x in a quorum of

C1 by a quorum of C2. They proved that the new

coterie C3 thus obtained is nondominated whenever

both C1 and C2 are nondominated. They emphasize

that since it may be cumbersome to compute and store

all the quorums in the composite coterie, what needs

to be stored is (i) each of the input coterie and (ii) the

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181172

information about how the composite coterie is to be

constructed.

8.2. Dynamic algorithms

Most of the distributed mutual exclusion algorithms

are static in nature, as their information structures do

not change during the execution of the algorithm.

These algorithms do not take advantage of the

dynamic nature of the system. Dynamic ISDME algo-

rithms can adapt to fluctuating system conditions and

exploit them to optimize the performance. Singhal [80]

presents one such algorithm whose information struc-

ture evolves with time as sites learn about the system

through messages. The algorithm assigns an initial

information structure and the rules for changing the

information structure are such that the mutual exclu-

sion conditions are always satisfied. It gets rid of the

queue (in Maekawa’s and other similar algorithms),

which stores blocked requests. Instead, the algorithm

uses dynamically changing inform sets which lead to

more elegant presentation of dynamic ISDME algo-

rithms. Initially, the request set for the node i is set as

Ri={1, 2,. . ., i}. The cardinality of Ri decreases in

stepwise manner from left to right. The rules for

information exchange and updating request sets and

inform sets are such that the staircase pattern is

preserved in the system even after the sites have

executed CS a number of times. Deadlocks are avoided

by assigning a globally unique timestamp to each

request, which determines its priority. In this paper, a

simulation study is also carried out to compare the

algorithm with that of Ricart and Agrawala. The

message complexity of the algorithm is obtained as

N� 1 for low to medium traffic, and its average

message complexity for heavy traffic is 3(N� 1)/2.

This improvement over the Ricart and Agrawala’s

algorithm is achieved without sacrificing delay. More-

over, the dynamic nature of its information structure

does not complicate the algorithm’s recovery from

various types of failures.

Rabinovich and Lazowska [75] propose a mech-

anism that allows dynamic adjustment of quorum

sets when quorums are defined based on a logical

network structure. This improves the availability of

the replica control protocols. The basic principle is

as follows. Given an ordered set of nodes, one can

devise a rule that unambiguously imposes a desired

logical structure on this set. The read and write

operations rely on this rule for determining quorum

groups and not on the knowledge of the static

structure of the network. In this protocol, each node

is assigned a name and all names are linearly

ordered. Among all the nodes replicating a data item,

a set of nodes is identified as the current epoch and,

at any time, the data item may have only one current

epoch associated with it. Originally, all replicas of

the data item form the current epoch and at least a

write quorum of nodes from an existing epoch are

required to be included in the new epoch. The

authors use this technique to obtain a dynamic grid

protocol from the grid protocol of Cheung et al. [21]

and show that the dynamic grid protocol increases

the availability manifolds as compared to the static

grid protocol while preserving its good load sharing

and message traffic characteristics.

8.3. Availability of coteries

Availability is an important criterion, which can be

used to test the quality of a coterie. The availability of

a coterie is the availability of at least one of its quorum

group when required. When a node needs to enter CS,

it should be able to find at least one quorum group, all

of whose vertices are operational and connected. It is

defined as the probability that the set of operating

nodes has a connected subgraph consisting only of

fault-free nodes such that some quorum of the coterie

is included in the subgraph. The availability of a

coterie C with respect to G is denoted by AG(C).

Attempts have been made to find coteries optimal

with respect to availability. Ibaraki et al. [44] used the

concept of G-domination (domination with respect to

a graph) of coteries to calculate the availability of

coteries on rings and trees and obtained the following

results:

5 An optimal coterie for a graph G is always an

optimal coterie for one of its biconnected compo-

nent.

5 A singleton coterie is optimal if G is a tree.

5 A singleton coterie or a three-majority coterie

maximizes availability if G is a ring.

In Ref. [31], Diks et al. prove that for arbitrary

networks with low node reliability ( p= 1/2), the single-

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 173

ton coterie is always optimal. They also obtain optimal

coteries for complete multipartite networks with p = 1/2

and vote-assignable optimal coteries for (1,m)-, (2,m)-

bipartite networks for node reliability p = 1/2. A graph

G is multipartite if it can be decomposed into a

number of disjoint subsets V1, V2,. . ., Vk such that no

edge inG joins the vertices in the same subset. If Vi has

mi points, i= 1, 2,. . ., k, it is said to be (m1, m2,. . ., mk)-

multipartite. It is said to be bipartite if k = 2.

In Ref. [69], Peleg and Wool show that if p denotes

the probability that a node fails, then for 0 < p < 1/2,

the least available ND coterie is the singleton coterie

and the most available coterie is the majority coterie,

whereas if the nodes are fail-prone (1/2 < p < 1), the

best strategy is to pick a single centralized king. If piis different for different nodes and pi< 1/2 for every i,

then voting defines the optimal availability quorum

system. If 0 < pi < 1/2 for every i, then the optimal

weights are given by wi = log2 (1� pi)/pi [83,95].

Amir and Wool [9] complete this line of research

when they consider the most general case in which for

some or all values of i, piz 1/2. The authors

5 prove that any element with piz 1/2 is dummy and

can be assigned weight wi= 0;

5 show that if piz 1/2 bi, then a monarchy is an

optimal quorum system with one of the least

unreliable processor as the king;

5 present an algorithm for calculating the optimal

weights for a given quorum system.

8.4. Communication delay: an important perform-

ance metric for coteries on graphs

Communication delay is another important metric

on the basis of which we can test the quality of a

coterie. Fu et al. [35] and Fu [33] observed that

generally quorum size has been used to estimate the

communication cost of a coterie without considering

the actual distance between different sites. They

define two metrics to measure the communication

delay (or the communication cost) of coteries, which

take into account the network topology. The metrics

are max_delay and mean_delay and are defined as

follows.

Let G=(V, E) be a network and let C be a coterie on

G (on its node set V). Let dist(a, b) denote the distance

between any two nodes a and b in G. It is equal to the

minimum number of hops necessary for moving from

a to b through the edges in G. For any saG, and for

any QaC, we find the distance of s from every point

of Q and denote the maximum of these distances by

a(s, Q). That is,

aðs;QÞ ¼ maxfdistðs; vÞ : vaQg

The value of a(s, Q) is obtained for every quorum Q

in C, and the minimum of these values is called the

delay of the node s from the coterie C. It is denoted

by delay(s, C ). Thus,

delayðs;CÞ ¼ minfaðs;QÞ : QaCg¼ min

QaCmaxvaQ

distðs; vÞ

A quorum Q in C for which a(s, Q) is minimum is

called an optimal quorum for the node s. Therefore,

if a quorum Q in C is an optimal quorum for the node

s, then delay(s, C ) = a(s, Q). For the coterie C, its

max_delay is defined as

max�delayðCÞ ¼ maxfdelayðs;CÞ : saVgand its mean_delay as

mean�delayðCÞ ¼ 1

AVA

XsaV

delayðs;CÞ

A coterie on G with a minimal value of delay taken

over all coteries on G is called a delay optimal coterie

on G. Thus, if C(G) denotes the set of all coteries on

G, a coterie C* is said to be a max_delay optimal

coterie on G if

max�delayðC*Þ ¼ minfmax�delayðCÞ : CaCðGÞg

It is said to be mean_delay optimal if

mean�delayðC*Þ¼minfmean�delayðCÞ : CaCðGÞg

Fu et al. [35] and Fu [33] use these metrics to obtain

delay optimal coteries for a number of network top-

ologies like trees, rings, hypercubes and grids. They

also investigate cost optimal bicoteries/wr-coteries for

replicated databases on a clustered network. Saxena et

al. [88] obtain delay optimal coteries on the k-dimen-

sional folded Petersen Graph introduced by Ref. [67].

The same authors in Ref. [87] propose a technique to

obtain delay optimal coteries on regular symmetric

networks. In fact, the technique presented in this paper

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181174

also works for general graphs. Tsuchiya et al. [97]

propose a polynomial time algorithm to find max_de-

lay optimal coteries for general topology networks.

Kobayashi et al. [55] propose an algorithm to find

mean_delay optimal coteries for general topology net-

works.

8.5. Nondominated coteries

As defined earlier, a coterie is nondominated if

there is no coterie which dominates it. The importance

of nondominated coteries can be seen from the fact

that they provide more protection against partitions as

compared to the coteries they dominate. Also, the

message complexity of a nondominated coterie is

expected to be less than the coterie it dominates

because most of its quorums are subsets of the

quorums of the coterie it dominates.

Attempts have been made to find all possible

coteries and ND coteries. While Garcia and Barbara

[38] used hypergraphs to study coteries; Ibaraki and

Kameda [43] used Boolean functions. While Ibaraki

and Kameda [43] used the self-duality of a function

associated with a coterie to characterize ND coteries,

Bioch and Ibaraki [16] used the concept of almost

self-duality of Boolean functions to check whether or

not a given positive function is Boolean and gave an

algorithm to generate all positive self-dual functions

and thus all ND coteries.

Harada and Yamashita [41] characterize G-ND

coteries in graph theoretic terms and derive a necessary

and sufficient condition for the majority coterie on a

graph to be G-ND and show that if G= (V, E ) is a bi-

connected graph with minimum degree d(G)=(3jV j)/4,then the majority coterie C on G is G-ND. Kuo and

Huang [54] used availability to determine whether a

given coterie is ND or not. They showed that

5 A coterie C is ND if and only if AC ( p) = 0.5 when

p = 0.5.

5 The majority coterie is ND when N is odd.

5 The tree coteries are all ND.

5 The grid coterie in an m� n grid is dominated if

mz 1, n > 1.

Therefore, if we have a general method of finding

availability of a given coterie, we will be able to decide

whether the given coterie is nondominated or not. A

step in this direction has been taken by Tsuchiya and

Kikuno [96] where they present a technique for finding

availability of coteries on general topology networks.

Finding all possible ND coteries is a major research

issue.

Thus, we have seen that quite a number of tech-

niques have been suggested for obtaining coteries.

Different coteries have different quorum sizes and

thus varying message complexities. Selecting an

appropriate coterie from which to choose quorums

is an important issue in designing a distributed

system. Certainly, nondominated coteries are pre-

ferred as they have better performance in terms of

message complexity and availability. Two given

coteries can be compared on the basis of a number

of metrics like message complexity, synchronization

delay, fault tolerance, availability, communication

delay, etc. We present below, in Table 1, the trade-

off between message complexity and synchronization

delay of some of the well-known coterie-based

algorithms. Similarly, Table 2 depicts the comparison

of some permission-based algorithms vis-a-vis token-

Table 1

Tradeoff between message complexity and synchronization delay of various coterie-based DME algorithms

Algorithm Ricart and

Agrawala

[74]

Carvalho and

Roucairol

[24]

Maekawa

[59]

Bonollo and

Sonenberg

[17]

Kumar and

Cheung [6]

Cao

et al.’s

[25]

Singhal’s

dynamic

information

structure

[80]

Singhal’s

deadlock-

free [79]

Rangarajan

et al. [76]

Agrawal and

Abbadi [2]

Message

complexity

2(N� 1) 0 to 2(N� 1)ffiffiffiffiN

p(2a+ 3b)/N

ffiffiffiffiN

po(ck) 0 to 2(N� 1) 3(N� 1)/2 oð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiN logN

pÞ o(log N)

Synchronization

delay

1 1 2 1 2 1 1 1 2 2

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 175

based algorithms in terms of their message complex-

ities.

9. The k-mutex problem

In a distributed system, it may be possible some-

times to allow more than one node to access their

critical sections simultaneously. For example, a dis-

tributed system may have k identical copies of a

resource. Since each of these copies could be accessed

by only one node at a time, up to k nodes may be

allowed to use the resource simultaneously. This gives

rise to the k-mutual exclusion (or the k-mutex or the

multiple mutual exclusion) problem which states that

out of a number of processes seeking access to the

critical section, at most k are permitted to enter it

simultaneously.

Quite a few solutions have been provided to the k-

mutex problem. These solutions can be categorized as

token-based or permission-based algorithms, as in the

case of DME algorithms. In this section, we briefly

discuss some of these solutions. Note that any algo-

rithm designed to solve the k-mutex problem must

ensure that a node seeking entry to the critical section

is permitted to do so if less than k nodes are in CS. It is

denied access if k processes are already executing CS.

Raymond [71] was the first to provide a solution to

the k-mutex problem. The solution was permission-

based and was similar to the one provided by Ricart

and Agrawala [74] for the distributed mutual exclu-

sion problem. As in the case of Ricart and Agrawala’s

algorithm, a node seeking access to CS sends a

REQUEST message to all the remaining N� 1 nodes

of the system, but it enters CS as soon as it gets

REPLY messages from N� k nodes. Moreover, since

every REQUEST message eventually generates a

REPLY message and since the requesting node enters

the CS on receiving just N� k permissions, it will

receive the remaining k� 1 REPLY messages either

when it is executing CS or after it has left CS or

during the wait for another entry to CS. Therefore, it is

quite possible that a REPLY message to an earlier

request for CS is confused with that of the current

request for CS. To avoid such a situation, a sequence

number is attached to each attempt to enter CS. The

algorithm has message complexity varying between

2N� k� 1 and 2(N� 1).

Srimani and Reddy [90] proposed a token-based

solution to the k-mutex problem. They introduce k

tokens (PRIVILEGE messages) in the system. A node

seeking access to CS sends request messages to all the

nodes of the system and enters the CS only after it

obtains the token. Existence of k tokens in the system

enables up to k processes to access their critical

sections simultaneously. The worst-case message

complexity of the algorithm is N + k� 1 in the system.

The algorithm is analogous to that of Suzuki and

Kasami [89] for the distributed mutual exclusion

problem, in which the system has a unique token.

Special provision has been made in the algorithm to

take care of any possible starvation problem arising

due to the presence of multiple tokens.

In Ref. [50], Kakugawa et al. provide a set-theo-

retic solution to the k-mutex problem. They introduce

the concept of a k-coterie on a nonempty set V as a

collection C of nonempty, minimal subsets of V, called

quorums, and chosen in such a way that it is always

possible to find, starting from any quorum of C, a

group of k mutually disjoint quorums in C, and it is

never possible to find k + 1 mutually disjoint quorums

in C. Formally, a k-coterie is defined as follows.

A nonempty collection C={Q1, Q2,. . ., Qn} of

nonempty subsets of V is called a k-coterie on V if

the following conditions hold:

(i) Non-intersection property: let r be any positive

integer, r< k. For any collection of r members

Table 2

Comparison of some permission-based algorithms with token-based algorithms

Permission based algorithms Token-based algorithms

Coterie-based Voting-based

Algorithm Billiard

quorum’s [8]

Hierarchical

grid [6]

Majority

voting [94]

HQC [1] Suzuki and

Kasami’s [89]

Raymond’s

[72]

Naimi

et al.’s [66]

Singhal’s

[78]

Message

complexity

ffiffiffi2

p ffiffiffiffiN

poð

ffiffiffiffiN

pÞ [(N+ 1)/2] N 0.63 N o(log N) o(log N) N/2 to N

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181176

Q1,Q2,. . .,Qr ofC satisfyingQi\Qj =F for 1V i,

jV r, i p j, there exists a member QaC such that

Q\Qi =F bi, 1V iV r;

(ii) Intersection property: every collection Q1,Q2,. . .,Qk + 1 of members of C contains a pair Qi, Qj of

distinct elements such that Qi\Qj p F and;

(iii) Minimality property: Qi, QjaCZ QiKQj^QjKQi.

Members of a k-coterie are called quorums. The non-

intersection property not only guarantees the existence

of at least one set of r mutually non-intersecting

quorums in C for every integer r (rV k), but also

ensures that every set of less than k mutually disjoint

quorums in C can be extended to a set of k mutually

disjoint quorums in C. This ensures simultaneous

entry of up to k processes in the CS. The intersection

property ensures that C cannot have k + 1 non-inter-

secting quorums. Since a node grants permission to

only one process at a time and it is not possible to find

more than k non-intersecting quorums in C, more than

k processes cannot be in the CS at the same time. The

authors also define the k-majority coterie and the k-

singleton coterie and discuss the availability of these

coteries. The same authors, in Ref. [51], present a

detailed distributed k-mutual exclusion algorithm

based on k-coteries. The algorithm differs from the

coterie-based algorithms like that of Maekawa [59]

and Cheung et al. [21] in the sense that whereas in

these coterie-based algorithms a quorum from which a

node will seek permission is determined statically, in

Kakugawa et al.’s algorithm, a node seeking access to

CS searches for a quorum dynamically until it suc-

ceeds in obtaining permission from each and every

node of some quorum of the k-coterie or till the

possibility of finding such a quorum is ruled out.

Priority issues in this and all of the k-mutex algo-

rithms are resolved using time-stamps. The message

complexity of the algorithm varies between 3c and

6N, where c is the maximum quorum-size and N is the

system-size.

A number of techniques for the construction of k-

coteries have been proposed. In Ref. [52], Kuo and

Huang propose a scheme for constructing uniform k-

coteries on the n-complete graph with quorum-size

o(ffiffiffiffiN

p). The same authors in Ref. [53] present a

geometric approach for constructing coteries and k-

coteries. In Ref. [34], Fujita introduces the notion of a

weighted k-quorum system and using this proposes a

scheme for solving the k-mutex problem as a general-

ization of the quorum-based scheme for the distrib-

uted mutual exclusion problem. In Ref. [22], Chang

and Chen use the logical grid structure of Cheung et

al. [21] and propose a generalized grid quorum pro-

tocol for k-mutual exclusion, which shows good

performance in terms of quorum size and availability.

A sequence of quorum-based protocols Majr, where r

is a positive integral divisor of k, can be found in Ref.

[7], which are a generalization of the majority quo-

rums for k-mutual exclusion.

In Ref. [65], Neilsen and Mizuno introduce the

concept of a nondominated k-coterie and propose

techniques to construct nondominated k-coteries using

the majority voting of Gifford [36] and the coterie join

algorithm [64]. Harada and Yamashita [42] define

(nondominated) tree structured k-coteries as an exten-

sion of the tree coterie of Agrawal and Abbadi [2]

discussed earlier in Section 8.1. These tree-structured

coteries are expected to have high availability.

A method to obtain new k-coteries from given k-

coteries is also suggested by Saxena and Rai [92]. In

Ref. [91], the same authors define the communication

cost of a k-coterie. This performance metric takes into

account the network topology and is a generalization

of the communication cost of a coterie as defined by

Fu et al. [35]. The metric shall prove to be useful for

comparing the performance of various k-coteries. The

authors also show that the communication cost of a k-

coterie cannot exceed that of the k-coterie it domi-

nates.

Thus, we find that a number of solutions have been

proposed for solving the k-mutex problem, a general-

ization of the distributed mutual exclusion problem. In

this section, we have had a glimpse of some of the

solutions provided so far. A lot more literature is

available on k-mutual exclusion [27,28,45,48,49,61,

62,96,98,99].

10. Conclusion

Having gone through a number of permission-

based DME algorithms, we find that different algo-

rithms show different performance capabilities with

respect to different metrics. These algorithms can be

evaluated on the basis of performance measures like

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 177

message complexity, synchronization delay, availabil-

ity, communication delay, etc. An algorithm may be

optimal with respect to one of these parameters but

may show poor performance as regards another.

Which algorithm to choose for a particular application

is an important decision a system designer has to

make. Besides this, a designer, while choosing an

algorithm, has to cope with its implementation aspect

as well.

A choice can be made between voting-based algo-

rithms and coterie-based algorithms. Coteries are

more general and easy to enumerate. However, the

implementation of the algorithms based on coteries is

complex because a coterie can be exponential in size.

On the other hand, voting is more flexible and easier

to implement as each node has to maintain only its

own vote assignment and an operation can proceed if

the number of votes collected by it is more than or

equal to the required quorum. One can choose

between static and dynamic algorithms. Dynamic

algorithms optimize performance by adapting to fluc-

tuating system conditions. One would, certainly, like

to choose an algorithm, which is reasonably reliable

and has high availability. The higher the availability,

the better the algorithm.

Moreover, network topology may play an impor-

tant role in selecting the mutual exclusion technique to

be employed. For some networks like fully connected

networks and for Ethernet systems, voting gives

optimal availability as shown by Spasojevic and Ber-

man [83]. However, for general networks, coteries

may give better availability. At times, actual distances

between sites in a network may also be important as

suggested by Fu [33].

Thus, in a nutshell, we can say that it is not

possible to single out an algorithm as the best algo-

rithm for achieving mutual exclusion. A number of

algorithms are available from which we can choose

the one which best suits the application in hand,

keeping an eye on the system requirements, network

topology and the performance measure that needs to

be optimized.

In this paper, we have presented an overview of

permission-based DME algorithms. The generalized

k-mutual exclusion problem has also been discussed

in brief because a discussion on distributed mutual

exclusion would have been incomplete without a

reference to the multiple mutual exclusion. We do

not claim our survey to be exhaustive, but we have

tried to include a good number of algorithms that

have contributed to the development of the subject.

We have refrained from getting involved in the

intricacies of these algorithms. Rather, we have tried

to concentrate on the comparative study of these

algorithms, bringing out their common features and

listing out their differences. Wherever possible, we

have listed out important results that may be useful to

those working in this area. This, we hope, will come

out to be quite handy for a quick reference to

permission-based distributed mutual exclusion algo-

rithms and for developing reasonable understanding

on the subject.

Acknowledgements

The authors are thankful to the reviewers for their

valuable comments. It would not have been possible

for us to put the paper in its present form without their

useful suggestions.

References

[1] A. Kumar, Hierarchical quorum consensus: a new algorithm

for managing replicated data, IEEE Transactions on Com-

puters 40 (9) (1991) 996–1004.

[2] D. Agrawal, A.El. Abbadi, An efficient and fault tolerant

solution for distributed mutual exclusion, ACM Transactions

on Computer Systems 9 (1) (1991) 1–20.

[3] D. Agrawal, A.El. Abbadi, The generalized tree quorum

protocol: an efficient approach for managing replicated data,

ACM Transactions on Database Systems 17 (4) (1992)

689–717.

[4] M. Ahmad, M.H. Ammar, S.Y. Cheung, Multi-dimensional

voting, ACM Transactions on Computer Systems 9 (4)

(1991) 399–431.

[5] M. Ahmad, M.H. Ammar, S.Y. Cheung, Replicated data

management in distributed systems, Readings in Distributed

Computing Systems, IEEE Computer Society Press, Los

Alamitos, CA, 1994, pp. 572–591.

[6] A. Kumar, S.Y. Cheung, A high availability MN hierarchical

grid algorithm for replicated data, Information Processing

Letters 40 (1991) 311–316.

[7] D. Agrawal, O. Egecioglu, A.E. Abbadi, Analysis of quo-

rum-based protocols for distributed (k+ 1)-exclusion, IEEE

Transactions on Parallel and Distributed Systems 1 (5) (1997,

May) 533–537.

[8] D. Agrawal, O. Egecioglu, A.El. Abbadi, Billiard quorums on

the grid, Information Processing Letters 64 (1997) 9–16.

[9] Y. Amir, A. Wool, Optimal availability quorum systems:

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181178

theory and practice, Information Processing Letters 65 (1998)

223–228.

[10] J. Bacon, Concurrent Systems, 2nd ed., Addison Wesley,

Harlow, England, 1998.

[11] J.G. De Bruijn, Additional comments on a problem in con-

current programming control, Communications of the ACM

10 (3) (1967) 137–138.

[12] P.A. Bernstein, N. Goodman, An algorithm for concurrency

control and recovery for replicated database, ACM Trans-

actions on Database Systems 9 (4) (1984) 596–615.

[13] D. Barbara, H. Garcia-Molina, The vulnerability of vote-

assignments, ACM Transactions on Computer Systems 4

(3) (1986) 187–213.

[14] D. Barbara, H. Garcia-Molina, The reliability of voting mech-

anisms, IEEE Transactions on Computers c-36 (10) (1987)

1197–1208.

[15] D. Barbara, H. Garcia-Molina, A. Spauster, Increasing avail-

ability under mutual exclusion constraints with dynamic vote

reassignment, ACM Transactions on Computer Systems 7

(4) (1989) 394–428.

[16] J.C. Bioch, T. Ibaraki, Generating and approximating non-

dominated coteries, IEEE Transactions on Parallel and Dis-

tributed Systems 6 (9) (1995) 905–914.

[17] U. Bonollo, E.A. Sonenberg, Deadlock-free information struc-

ture distributedmutual exclusion algorithms, Australian Com-

puter Science Communications (ACSC’96) 18 (1) (1996)

234–241.

[18] P.A. Bernstein, D.W. Shipman, W.S. Wong, Formal aspects

of serializability in database concurrency control, IEEE

Transactions on Software Engineering SE 5 (3) (1979)

203–216.

[19] Y.I. Chang, A simulation study on distributed mutual exclu-

sion, Journal of Parallel Distributed Computing 33 (1996)

107–121.

[20] S.Y. Cheung, M. Ahmad, M.H. Ammar, Optimizing vote and

quorum assignments for reading and writing replicated data,

IEEE Transactions on Knowledge and Data Engineering 1

(3) (1989) 387–397.

[21] S.Y. Cheung, M.H. Ammar, M. Ahamad, The grid protocol:

a high performance scheme for maintaining replicated data,

IEEE Transactions on Knowledge and Data Engineering 4

(6) (1992) 582–592.

[22] Y.I. Chang, B.H. Chen, A generalized grid quorum strategy

for k-mutual exclusion in distributed systems, Information

Processing Letters 80 (4) (2001, Nov. 30) 205–212.

[23] G. Coulouris, J. Dollimore, T. Kindberg, Distributed Systems:

Concepts and Design, 2nd ed., AddisonWesley, Harlow, Eng-

land, 1994.

[24] O.S.F. Carvalho, C. Roucairol, On the mutual exclusion in

computer networks, Communications of the ACM 26 (2)

(1983) 146–147.

[25] G. Cao, M. Singhal, Y. Deng, N. Rishe, W. Sun, A delay-

optimal quorum-based mutual exclusion scheme with fault-

tolerance capability, Proceedings of the 18th International

Conference on Distributed Computing Systems (ICDCS’98),

IEEE Computer Society, Amsterdam, 1998, pp. 444–451.

[26] Y.I. Chang, M. Singhal, M.T. Liu, An improved o(log(n))

mutual exclusion algorithm for distributed systems, Interna-

tional Conference on Parallel Processing (1990) 295–302.

[27] E.H. Choi, T. Tsuchiya, T. Kikuno, On the availability of k-

coteries in networks with unreliable nodes and links, Pro-

ceedings of the Fourth International Workshop on Object-

Oriented Real-Time Dependable Systems, Jan. 1999, IEEE,

Santa Barbara, California, 1999, (January), pp. 148–155.

[28] E.H. Choi, T. Tsuchiya, T. Kikuno, New constructions for

nondominated k-coteries, IEICE Transactions on Information

and Systems E83D (7) (2000, Jul.) 1526–1532.

[29] E.W. Dijkstra, Co-operating sequential processes, in: F.

Genuys (Ed.), Programming Languages, Academic Press,

New York, 1965, pp. 43–112.

[30] S.B. Davidson, H. Garcia-Molina, D. Skeen, Consistency in

partitioned networks, Computing Surveys 17 (3) (1985)

341–369.

[31] K. Diks, E. Kranakis, D. Krizanc, B. Mans, A. Pelc, Optimal

coteries and voting schemes, Information Processing Letters

51 (1994) 1–6.

[32] M.A. Eisenberg, M.R. McGuire, Further comments on Dijk-

stra’s concurrent programming control problem, Communi-

cations of the ACM 15 (11) (1972) 999.

[33] A.W. Fu, Delay optimal quorum consensus for distributed

systems, IEEE Transactions on Parallel and Distributed Sys-

tems 8 (1) (1997) 59–69.

[34] S. Fujita, A quorum based k-mutual exclusion by weighted k-

quorum systems, Information Processing Letters 67 (4)

(1998, Aug. 31) 191–197.

[35] A.W. Fu, M.H. Wong, T.W. Lau, G.F. Ng, Cost Optimal

Coteries. Technical Report CS-TR-94-07, Chinese Univer-

sity of Hong Kong, 1994.

[36] D.K. Gifford, Weighted voting for replicated data, Proceed-

ings of 7th Symposium on Operating Systems, ACM, New

York, 1979, pp. 150–162.

[37] S. Gupta, Delay optimal mutual exclusion algorithms in dis-

tributed systems. PhD Thesis, School of Computer and Sys-

tems Sciences, Jawahar Lal Nehru University, New Delhi,

1998.

[38] H. Garcia-Molina, D. Barbara, How to assign votes in a

distributed system, Journal for the Association for Comput-

ing Machinery 32 (4) (1985) 841–860.

[39] M. Helary, A. Mostefaoui, M. Raynal, A general scheme for

token and tree based distributed mutual exclusion algorithms,

IEEE Transactions on Parallel and Distributed Systems 5

(11) (1994) 1185–1196.

[40] M. Helary, N. Plouzeau, M. Raynal, A distributed algorithm

for mutual exclusion in an arbitrary network, The Computer

Journal 31 (4) (1988) 289–295.

[41] T. Harada, M. Yamashita, Non-dominated Coteries on

Graphs, IEEE Transactions on Parallel and Distributed Sys-

tems 8 (6) (1997) 667–672.

[42] T. Harada, M. Yamashita, Coterie join operation and tree

structured k-coteries, IEEE Transactions on Parallel and Dis-

tributed Systems 12 (9) (2001, September) 865–874.

[43] T. Ibaraki, T. Kameda, A theory of coteries: mutual exclusion

in distributed systems, IEEE Transactions on Parallel and

Distributed Systems 4 (7) (1991) 779–793.

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 179

[44] T. Ibaraki, H. Nagamochi, T. Kameda, Optimal coteries for

rings and related networks, Distributed Computing 8 (1995)

191–201.

[45] J.-R. Jiang, S.-T. Huang, Y.-C. Kuo, Cohorts structures for

fault-tolerant k entries to a critical section, IEEE Transactions

on Computers 46 (2) (1997, Feb.) 222–228.

[46] S. Jajodia, D. Mutchler, Dynamic voting algorithms for main-

taining the consistency of a replicated data, ACM Transac-

tions on Database Systems 15 (2) (1990) 230–280.

[47] D.E. Knuth, Additional comments on a problem in concur-

rent programming control, Communications of the ACM 9

(5) (1966) 321–322.

[48] Y.C. Kuo, k-arbiter join operation, Journal of Information

Science and Engineering 17 (4) (2001, Jul.) 615–631.

[49] Y.-C. Kuo, Composite k-arbiters, IEEE Transactions on Par-

allel and Distributed Systems 12 (11) (2001, November)

1134–1145.

[50] H. Kakugawa, S. Fujita, M. Yamashita, T. Ae, Availability of

k-coterie, IEEE Transactions on Computers 42 (5) (1993,

May) 553–558.

[51] H. Kakugawa, S. Fujita, M. Yamashita, T. Ae, A distributed

k-mutual exclusion algorithm using k-coterie, Information

Processing Letters 49 (1994) 213–218.

[52] Y. Kuo, S. Huang, A simple scheme to construct k-coteries

with OðffiffiffiffiN

pÞ uniform quorum sizes, Information Processing

Letters 59 (1996) 31–36.

[53] Y. Kuo, S. Huang, A geometric approach for constructing

coteries and k-coteries, IEEE Transactions on Parallel and

Distributed Systems 8 (4) (1997, April) 402–411.

[54] Y. Kuo, S. Huang, Recognizing ND-coteries and wr-coteries

by availability, IEEE Transactions on Parallel and Distrib-

uted Systems 9 (8) (1998) 721–728.

[55] N. Kobayashi, T. Tsuchiya, T. Kikuno, Minimizing the mean

delay of quorum-based mutual exclusion schemes, The Jour-

nal of Systems and Software 58 (2001) 1–9.

[56] L. Lamport, Time, clocks and the ordering of events in a

distributed computing, Communications of the ACM 21 (7)

(1978) 558–565.

[57] L. Lamport, R. Shostak, M. Paese, Byzantine generals prob-

lem, ACM Transactions on Programming Languages and

Systems 4 (3) (1982) 382–401.

[58] W. Luk, T. Wong, Two new quorum based algorithms for

distributed mutual exclusion, Proceedings of the 17th Inter-

national Conference on Distributed Computing Systems,

ICDCS’ 97, Baltimore, MD, USA, IEEE (1997, May),

pp. 100–106.

[59] M. Maekawa, A MN algorithm for mutual exclusion in de-

centralized systems, ACM Transactions on Computer Sys-

tems 3 (2) (1985) 145–159.

[60] S. Mullender, Distributed Systems, 2nd ed., Addison-Wesley,

Harlow, England, 1993.

[61] Y. Manabe, R. Baldoni, M. Raynal, S. Aoyagi, k-Arbiter: a

safe and general scheme for h-out of-kmutual exclusion, The-

oretical Computer Science 193 (1–2) (1998, Feb.) 97–112.

[62] M.L. Neilsen, Properties of nondominated k-coteries, Journal

of Systems and Software 37 (1) (1997, Apr.) 91–96.

[63] L.N. Neilson, M. Mizuno, A DAG based algorithm for dis-

tributed mutual exclusion, International Conference on Dis-

tributed Computer Systems (1991) 354–360.

[64] L.N. Neilson, M. Mizuno, Coteries join algorithm, IEEE

Transactions on Parallel and Distributed Systems 3 (5)

(1992) 582–590.

[65] M.L. Neilsen, M. Mizuno, Nondominated k-coteries for mul-

tiple mutual exclusion, Information Processing Letters 50

(1994) 247–252.

[66] M. Naimi, M. Trehel, A. Arnold, A log(n) distributed mutual

exclusion algorithm based on path reversal, Journal of Paral-

lel and Distributed Computing 34 (1996) 1–13.

[67] S. Ohring, S.K. Das, Folded Petersen cube networks: new

competitors for the hypercubes, IEEE Transactions on Paral-

lel and Distributed Systems 7 (2) (1996) 151–168.

[68] G.L. Peterson, Myths about the mutual exclusion problem,

Information Processing Letters 12 (3) (1981) 115–116.

[69] D. Peleg, A. Wool, The availability of quorum systems, In-

formation and Computation 123 (1995) 210–223.

[70] M. Raynal, Algorithms for Mutual Exclusion, North Oxford

Academic, London, 1986.

[71] K. Raymond, A distributed algorithm for multiple entries to a

critical section, Information Processing Letters 30 (1989)

189–193.

[72] K. Raymond, A tree based algorithm for distributed mutual

exclusion algorithms, ACM Transactions on Computer Sys-

tems 7 (1) (1989) 61–77.

[73] M. Raynal, A simple taxonomy for distributed mutual ex-

clusion algorithms, ACM Operating Systems Review 23 (2)

(1991) 47–51.

[74] G. Ricart, A.K. Agrawala, An optimal algorithm for mutual

exclusion in computer networks, Communications of the

ACM 24 (1) (1981) 9–17.

[75] M. Rabinowich, E.D. Lazowska, Improving fault-tolerance

and supporting partial writes in structured coterie protocols

for replicated objects, Proceedings ACM SIGMOD (1992)

226–235.

[76] S. Rangarajan, S. Setia, S.K. Tripathi, A fault-tolerant algo-

rithm for replicated data management, IEEE Transactions on

Parallel and Distributed Systems 6 (12) (1995) 1271–1282.

[77] B.A. Sanders, The information structure of distributed mu-

tual exclusion algorithms, ACM Transactions on Computer

Systems 5 (3) (1987) 284–299.

[78] M. Singhal, A heuristically aided algorithm for mutual ex-

clusion in distributed systems, IEEE Transactions on Com-

puters 38 (8) (1989) 651–661.

[79] M. Singhal, A class of deadlock-free Maekawa-type algo-

rithms for mutual exclusion in distributed systems, Distrib-

uted Computing 4 (1991) 131–138.

[80] M. Singhal, A dynamic information-structure mutual exclu-

sion algorithm for distributed systems, IEEE Transactions on

Parallel and Distributed Systems 3 (1) (1992) 121–125.

[81] M. Singhal, A taxonomy of distributed mutual exclusion,

Journal of Parallel and Distributed Computing 18 (1993)

94–101.

[82] S.H. Son, Synchronization of replicated data in distributed

systems, Information Systems 12 (20) (1987) 191–202.

[83] M. Spasojevic, P. Berman, Voting as the optimal static pes-

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181180

simistic scheme for managing replicated data, IEEE Trans-

actions on Parallel and Distributed Systems 5 (1) (1994)

64–73.

[84] P.C. Saxena, S. Gupta, A performance comparison of token

and tree based mutual exclusion algorithms on chordal rings,

Journal of Networks and Computer Applications 21 (1998)

187–201.

[85] P.C. Saxena, S. Gupta, A token-based delay optimal algo-

rithm for mutual exclusion in distributed systems, Computer

Standards and Interfaces 21 (1999) 33–50.

[86] A. Silberschatz, P. Galvin, Operating System Concepts, 5th

ed., Addison-Wesley, Harlow, England, 1998.

[87] P.C. Saxena, S. Gupta, J. Rai, Delay optimal coteries on

regular symmetric networks, Computer Standards and Inter-

faces 23 (2001) 341–353.

[88] P.C. Saxena, S. Gupta, J. Rai, A delay optimal coterie on the

k-dimensional folded Petersen Graph. Appearing.

[89] I. Suzuki, T. Kasami, A distributed mutual exclusion algo-

rithm, ACM Transactions on Computer Systems 3 (4) (1985)

344–349.

[90] P.K. Srimani, R.L.N. Reddy, Another distributed algorithm

for multiple entries to a critical section, Information Process-

ing Letters 41 (1992) 51–57.

[91] P.C. Saxena, J. Rai, Communication Cost of k-coteries, sub-

mitted for publication.

[92] P.C. Saxena, J. Rai, Cross Product of Coteries, submitted for

publication.

[93] R. Schlichting, F. Schneider, Fail-stop processors: an ap-

proach to designing fault-tolerant computing systems,

ACM Transactions on Computer Systems 1 (3) (1983, Au-

gust) 222–238.

[94] T.H. Thomas, A majority consensus approach to concurrency

control, ACM Transactions on Database Systems 4 (2)

(1979) 180–209.

[95] Z. Tong, R.Y. Kain, Vote assignments in weighted voting

mechanisms, Proceedings of 7th IEEE Symposium on Reli-

able Distributed Systems, SRDS’ 88, Ohio State University,

Columbus, Ohio, USA, IEEE Computer Society Press, 1988,

pp. 138–143.

[96] T. Tsuchiya, T. Kikuno, Availability evaluation of quorum-

based mutual exclusion schemes in general topology net-

works, The Computer Journal 42 (7) (1999) 613–622.

[97] T. Tsuchiya, M. Yamaguchi, T. Kikuno, Minimizing the max-

imum delay for reaching consensus in quorum-based mutual

exclusion, IEEE Transactions on Parallel and Distributed

Systems 10 (4) (1999) 337–345.

[98] S.-M. Yuan, H.-K. Chang, Comments on availability of k-

coterie, IEEE Transactions on Computers 43 (12) (1994)

1457.

[99] M. Yamashita, T. Kameda, Modeling k-coteries by well-cov-

ered graphs, Networks 34 (3) (1999, Oct.) 221–228.

[100] Y. Yan, X. Zhang, H. Yang, A fast token chasing mutual

exclusion algorithm in arbitrary network topologies, Journal

of Parallel and Distributed Computing 35 (1996) 156–172.

Prof. P.C. Saxena is a professor of com-

puter science in the School of Computer

and Systems Sciences, Jawaharlal Nehru

University, NewDelhi, India. He has super-

vised nine PhDs in the area of Database

Management, Networking andMultimedia.

He has guided 55 M.Tech. Dissertations.

His research interests include DBMS, Data

Communication, Networking, Distributed

Computing and Multimedia.

Dr. Jagmohan Rai is a Reader in the

Department of Mathematics, P.G.D.A.V.

College (Eve), University of Delhi, New

Delhi, India. He received MSc degree in

Mathematics in 1982 and M.Phil degree in

Mathematics (specialization: Operator

Theory) in 1985, both from university of

Delhi. He has recently completed his PhD

in Computer Science from the School of

Computer and Systems Sciences, Jawahar-

lal Nehru University, New Delhi, India.

His research interests include Distributed Mutual Exclusion, Net-

works, Distributed Computing and Multimedia.

P.C. Saxena, J. Rai / Computer Standards & Interfaces 25 (2003) 159–181 181