Fault Tolerant Distributed Systems A Survey by Nirmit Desai.

Fault Tolerant Distributed Systems

A Survey

by

Nirmit Desai

10/17/2002 Fault tolerant DS - a survey 2

The track

Review the approaches to fault tolerance (coupled with mutex algorithms in some cases, 1 - 3), Get an idea of the terminology and field

#1: Revannaswamy, Bhatt - ‘97

#2: Chang, Singhal, Liu - ‘90

#3: Helary, Mostefaoui - ‘94

#4: Stumm, Zhou - ‘90

#5: Nett, Mock, Theisohn - ‘97

#6: Ballarini, Bernardi, Donatelli - ‘02

#7: Hardekopf, Kwiat, Upadhyaya - ‘01

#8: Injong Rhee - ‘95

#9: Goyer, Momtahan, Selic - ‘90

#10: Frank Mueller - ‘01

Summary

Questions


#1: Revannaswamy, Bhatt - ‘97

O(log N), N = Number of nodes in the spanning tree

Communicate with neighbours only (Raymonds algorithm)

Tolerant to single link / single node failures

Eliminate the failed component and obtain different tree structure (network is biconnected)

Mechanism to detect the failures is assumed to exist

Only “Branches” are used for message exchanges


#1 Contd - Reconfiguaration

Repeat the steps below till all connected OR insufficient chords

At the end of reconfiguration, all data-structures are reset and algorithm is restarted, older token holder is pre-empted and token is transferred to a newly elected leader


#2: Chang, Singhal, Liu - ‘90

1. O((2+K) * logN), Each node has K alternate paths to token node

2. Works with DAG, no cycles should ever develop

3. FIFO violation for greedy strategy


#2 Contd. - Fault Tolerance

Mechanism to detect the failures is assumed to exist

Single link / single node failure and recovery

Any path constructed by taking outgoing edges always leads to the token holder

Studies all the states of the system at which the failure may occur and gives solutions to each of the states for tolerance

At recovery, the node asks its neighbours about its state in order to reconstruct the data structures

Fairness compromized at recovery - queue order is arbitrary


#3: Helary, Mostefaoui - ‘94

Worst case O(lg n) for mutex, average O(lg n) for fault-tolerance

Fairness compromized at recovery - queue order is arbitrary

Best of both the worlds: Naimi (dynamic), Raymond(static)

Node topology preserves binomial tree (open cube) structure

Assumes finite message delay for detection of node failure


#3 Contd. - Fault Tolerance

Handles multiple and simultaneous node failures, no link failures

Due to bounded message delays and estimated critical section lengths, a node can suspect a failure by timeouts.

Enquire appropriate nodes about their state and deduce whether the suspicion is correct.

In case of failure, some data structures are updated, tree is reconfigured


#4: Stumm, Zhou

Extension of coherent DSM algorithms for fault tolerance

Tolerance to single host failures, no link failures

Availability of broadcast (multicast) communication is assumed

Communication cost assumed to be larger than memory access costs

Critical state (DSM data, state info) replicated to multiple hosts

Central server algo: central server manages data (request/response) Backup the central server state, fault detection by timeouts

Full replication algo: all data on each host, local reads, sequenced writes (by sequencers), simple recovery by obtaining latest copy from other hosts


#4 Contd.

Migration algo: Data always migrated to the accessing host, can be integrated with the VMM of the host OS, maintain sequential consistency

Read replication algo: Similar to migration but no migration for read accesses

Comparisons: Fault tolerant versions of central server and full replication

do not introdue a substantial overhead Central server is better for infrequent shared data accesses Full replication is better for sparse write accesses


#5: Nett, Mock, Theisohn

Specification and realization of a generalized distributed dependency management system for fault tolerance

Key observation: Management of dynamically evolving dependencies is a key problem for fault tolerance

Dependency graph: nodes as sites, edges as dependencies

Dependency set: set of nodes reachable from start nodes

Traversal of dependency set yields solution: at each node The node is added to the dep. set ? Search continues ? Which dep. types to follow now ? In which direction ?


#5 Contd - Example

Stability: completed actions have stable dependenciesThe proposed system supports only applications that have stable dependency statesDoes our application conform to this requirement ? Next slide


#5 Contd - Language

Generate a state machine with this language and, verify for stability


#5 Contd - Example

Send messages and generate global dependency graph to look for inconsistencies


#6: Ballarini, Bernardi, Donatelli

Formal and theoretical discussion of synchronization problem in distributed systems


#7: Hardekopf, Kwiat, Upadhyaya

Security and fault tolerance for aerospace distributed communications by means of voting mechanism

Basic voting protocol: Initiator sends a request to one of the voters The voter multicasts a request to other voters Voters decide locally and send their vote to the client client waits for f+1 alike votes where f is the degree of fault

tolerance

Problems: not scalable, assumes exact voting

Proposes a model of set of voters, interface module and user where majority of voters are assumed to be always correct !!


#7 Contd - The algorithm

Voters are authenticated when they commitO(1) messages with any number of votersHashing and digital signatures for WAN

Each voter will follow these steps If no other voter has committed an answer, commit self-answer and skip If some voter has committed and the result conflict, then commit self-answer After everyone had a chance, determine the majority result If majority is in conflict, then go to step 1.

Interface module will receive first commit, buffer it and start the timer. If new commit is received old is over-written and timer is restarted. Otherwise the buffered result is output to the client


#8: Injong Rhee

Fault tolerance in distributed resource allocation by locality

Failure locality = max. number of processes dead due to failure Depends on degree of distributedness of algorithm

Tight lower bound on failure locality is where, is the maximum number of conflicting processesDining philosopher problemhas the optimal failure locality

2

Proposed algorithm breaks the chain by not allowing a process to wait for a process which is also blocked, achieves 2


#8 Contd - Model

Each process is modeled in steps: send, receive, local, fail and local state

Guarded command set: [receive -> Action,...]

Configuration C is a set of current local

Network is a process too

Enabling and execution of guarded sommand set

System is initial configuration C0 and execution of guarded command set of all the processes

No constraints on relative timing of execution of process steps

Priorities by logical clocks (with waiting factor)

Centralized resource managers manage exclusion by responding to the requests for resource (with preemption)


#9: Goyer, Momtahan, Selic

Hierarchical control strategy for fault tolerance (centralized app)

Simplicity, efficiency, predictability

Transient and hard failures - recovers or doesn’t recover ?

Soft QoS guarantees in terms of response times


#10: Frank Mueller

Extension of Naimi’s protocol - single node failures

Backup communication in form of ring structure

Fault handler node: Find out the faulty node by timeout, collect fault information (from 2 nodes) and take appropriate action


Some ideas - for our protocol

The first node experiencing timeout may send a message to every other node asking for their states

The node experiencing timeout must be: waiting for a grant/token to arrive OR waiting for a release to arrive

In case of grant/token, propagate the message to the parent chain to detect the pooint of failure

In case of release, send the message down the chain of children to see who is supposed to send a release

Once detected the point, recalculate the state of the affected nodes - the ones falling on the chain

Assuming no link failures for now, multiple failures can be detected


SummaryApproaches fall into following categories: Embedded fault tolerance into the base algorithms - inspiring,

concrete, but hard to import Generalized approaches - demonstrate strength of applicability

but no concrete algorithmic description Voting approaches - can be used to reach consensus, useful in

case of multiple failures Centralized approach - simple, predictable but may be not

desirable Replication based approaches - simple, incur overhead,

sometimes the only solution (esp for complex algorithms)

No prominent solution for detecting failure - even for very specific problems/algorithms


Questions

??

Fault Tolerant Distributed Systems A Survey by Nirmit Desai.

Documents

Transcript of Fault Tolerant Distributed Systems A Survey by Nirmit Desai.