Fundamentals

Post on 10-Sep-2014


CS60002: Distributed Systems

Textbook etc.

No single textbook. Will follow, for some time, "Advanced Concepts in Operating Systems" by Mukesh Singhal and Niranjan G. Shivaratri, supplemented by copies of papers.

Will give materials from other books, papers, etc. from time to time.

Introduction

Distributed System

A broad definition: a set of autonomous processes that communicate among themselves to perform some task.

Modes of communication:
- Message passing
- Shared memory

This definition also includes a single machine with multiple communicating processes.

A more common definition: a network of autonomous computers that communicate by message passing to perform some task.

A practical distributed system may have both:
- Computers that communicate by messages
- Processes/threads on a computer that communicate by messages or shared memory

Advantages

- Resource sharing
- Higher throughput
- Handling inherent distribution in the problem structure
- Fault tolerance
- Scalability

Representing Distributed Systems

Graph representation:
- Nodes = processes
- Edges = communication links
- Links can be bidirectional (undirected graph) or unidirectional (directed graph)
- Links can have weights to represent different things (e.g., delay, length, bandwidth, ...)
- Links in the graph may or may not correspond to physical links
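As a concrete illustration of this representation, a weighted graph can be held as an adjacency map; the node names and weights below are invented for this sketch, with weights standing for link delay:

```python
# A weighted, undirected graph as an adjacency dict: nodes are processes,
# edges are communication links, and weights here model link delay (ms).
# The node names and delays are made up for illustration.
graph = {
    "A": {"B": 10, "C": 25},
    "B": {"A": 10, "C": 5},
    "C": {"A": 25, "B": 5},
}

# For an undirected graph, each edge appears once in each direction
# with the same weight.
symmetric = all(graph[u][v] == graph[v][u] for u in graph for v in graph[u])
```

A directed graph would simply drop the symmetry requirement.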

Why are They Harder to Design?

Lack of global shared memory:
- Hard to find the global system state at any point

Lack of global clock:
- Events cannot be started at the same time
- Events cannot easily be ordered in time

Hard to verify and prove:
- Arbitrary interleaving of actions makes the system hard to verify
- The same problem exists for multi-process programs on a single machine, but it is harder here due to communication delays

Example: Lack of Global Memory

Problem of distributed search:
- A set of elements is distributed across multiple machines
- A query for an element X arrives at some machine A
- Need to search for X in the whole system

The sequential algorithm is very simple: search and update are done on a single array on a single machine, and the number of elements is kept in a single variable.

A distributed algorithm has more hurdles to clear:
- How to send the query to all other machines? Do all machines even know all other machines?
- How to get back the result of the search at each machine?
- Handling updates (both add/delete of elements at a machine and add/remove of machines) adds more complexity

Main problem: there is no one place (global memory) that a machine can look up to see the current system state (what machines, what elements, how many elements).

Example: Lack of Global Clock

Problem of distributed replication:
- 3 machines A, B, C have copies of a data item X, say initialized to 1
- Queries/updates can happen at any machine
- The copies must be made consistent within a short time after an update at any one machine

Naïve algorithm:
- On an update, a machine sends the updated value to the other replicas
- A replica, on receiving an update, applies it

(Diagrams: two runs in which the replicas exchange the updates X=2 and X=3 in different real-time orders.)

What should this node do now? Reject X=2, right? But then consider the second scenario: the node receives exactly the same messages in the same order, yet there it should accept X=2. The problem could be solved easily if all nodes had a synchronized global clock.

Models for Distributed Algorithms

Informally, a model specifies the guarantees one can assume the underlying system will (or will not!) give.

- Topology: completely connected, ring, tree, arbitrary, ...
- Communication: shared memory / message passing (Reliable? Delay? FIFO? Broadcast/multicast? ...)
- Synchronous / asynchronous
- Failure possible or not. What all can fail? Failure models (crash, omission, Byzantine, timing, ...)
- Unique ids
- Other knowledge: number of nodes, diameter

Fewer assumptions => weaker model. A distributed algorithm needs to specify the model on which it is supposed to work; the model may not always match the underlying physical system.

(Diagram: the model assumed by an algorithm sits above the physical system; the gap between the assumed model and the available system must be bridged by implementing the model with hardware/software.)

Complexity Measures

- Message complexity: total number of messages sent
- Communication complexity / bit complexity: total number of bits sent
- Time complexity: for synchronous systems, the number of rounds; for asynchronous systems, different definitions exist
- Space complexity: total number of bits needed for storage at all the nodes

Example: Distributed Search Again

Assume that all elements are distinct, and that the network is represented by a graph G with n nodes and m edges.

Model 1: asynchronous, completely connected topology, reliable communication.

Algorithm:
- Send the query to all neighbors
- Wait for replies from all, or until one node says Found
- A node, on receiving a query for X, does a local search for X and replies Found / Not found

Worst case message complexity = 2(n − 1) per query.

Model 2: asynchronous, completely connected topology, unreliable communication.

Algorithm:
- Send the query to all neighbors
- Wait for replies from all, or until one node says Found
- A node, on receiving a query for X, does a local search for X and replies Found / Not found
- If no reply arrives within some time, send the query again

Problems!
- How long to wait? There is no bound on message delay!
- A message can be lost again and again, so this still does not solve the problem
- In fact, the problem is impossible to solve (the algorithm may not terminate)!!

Model 3: synchronous, completely connected topology, reliable communication.

- Maximum one-way message delay = α
- Maximum search time at each machine = β

Algorithm:
- Send the query to all neighbors
- Wait for replies for T = 2α + β, or until one node says Found
- A node, on receiving a query for X, does a local search for X and replies Found if found; it does not reply if not found
- If no reply is received within T, return "Not found"

Message complexity = n − 1 if not found, n if found. Message complexity is reduced, possibly at the cost of more time.

Model 4: asynchronous, reliable communication, but not completely connected.

How to send the query to all? Algorithm (first attempt):
- Querying node A sends the query for X to all its neighbors
- Any other node, on receiving the query for X, first searches for X. If found, it sends back "Found" to A. If not, it sends back "Not found" to A, and also forwards the query to all its neighbors other than the one it received the query from (flooding)
- Eventually all nodes get the query and reply

Message complexity: O(nm) (why?)

But are we done? Suppose X is not there. A gets many "Not found" messages. How does it know whether all nodes have replied? (Termination detection)

Let us change (strengthen) the model. Suppose A knows n, the total number of nodes:
- A can now count the number of messages received. Terminate on at least one "Found" message, or on n "Not found" messages
- Message complexity: O(nm)

Suppose A knows an upper bound on the network diameter, and the system is synchronous:
- Can be done with only O(m) messages

Can you do it without changing the model?

So which model to choose? Ideally, one as close to the available physical system as possible:
- The algorithm can then directly run on the system
- Otherwise, the model should be implementable on the physical system with additional hardware/software (e.g., reliable communication (say TCP) over an unreliable physical system)

Sometimes, start with a strong model, then weaken it:
- It is easier to design algorithms on a stronger model (more guarantees from the system)
- This helps in understanding the behavior of the system
- That knowledge can then be used to design algorithms on the weaker model

Some Fundamental Problems

- Ordering events in the absence of a global clock
- Capturing the global state
- Mutual exclusion
- Leader election
- Clock synchronization
- Termination detection
- Building structures: spanning tree, shortest path tree, ...

Ordering of Events and Logical Clocks

Ordering of Events

Lamport's Happened Before relationship: for two events a and b, a → b (a happened before b) if
- a and b are events in the same process and a occurred before b, or
- a is the send event of a message m and b is the corresponding receive event at the destination process, or
- a → c and c → b for some event c

a → b implies a is a potential cause of b. Causal ordering captures these potential dependencies; the Happened Before relationship causally orders events:
- If a → b, then a causally affects b
- If neither a → b nor b → a, then a and b are concurrent (a || b)

Logical Clock

- Each process i keeps a clock Ci
- Each event a in i is timestamped C(a), the value of Ci when a occurred
- Ci is incremented by 1 for each event in i
- In addition, if a is the send of message m from process i to j, then on receive of m, Cj = max(Cj, C(a) + 1)

Points to note:
- The increment amount can be any positive number, not necessarily 1
- If a → b, then C(a) < C(b)
- → is an irreflexive partial order
- Total ordering is possible by arbitrarily ordering concurrent events by process numbers (assuming process numbers are unique)
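The clock rules above can be sketched as a small Python class (a minimal single-machine illustration; the class and method names are invented here):

```python
class LamportClock:
    """Sketch of Lamport's logical clock for one process."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        # Ci is incremented for each event (any positive amount works; 1 is usual)
        self.time += 1
        return self.time

    def send_event(self):
        # A send is an event; the returned value is the timestamp C(a) carried by m
        self.time += 1
        return self.time

    def receive_event(self, msg_time):
        # On receive of m: Cj = max(Cj, C(a) + 1)
        self.time = max(self.time, msg_time + 1)
        return self.time
```

Breaking ties on equal timestamps by process id (e.g., comparing (time, pid) tuples) gives the total order mentioned above.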

Limitation of Lamport’s Clock

a → b implies C(a) < C(b)

BUT

C(a) < C(b) doesn’t imply a → b !!

So not a true clock !!

Solution: Vector Clocks

- Ci is a vector of size n (the number of processes)
- C(a) is similarly a vector of size n
- Update rules:
  - Ci[i]++ for every event at process i
  - If a is the send of message m from i to j with vector timestamp tm, then on receive of m: Cj[k] = max(Cj[k], tm[k]) for all k

For events a and b with vector timestamps ta and tb:
- ta = tb iff for all i, ta[i] = tb[i]
- ta ≠ tb iff for some i, ta[i] ≠ tb[i]
- ta ≤ tb iff for all i, ta[i] ≤ tb[i]
- ta < tb iff (ta ≤ tb and ta ≠ tb)
- ta || tb iff (neither ta < tb nor tb < ta)

a → b iff ta < tb. Events a and b are causally related iff ta < tb or tb < ta; otherwise they are concurrent.
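The update rules and the comparison tests can be sketched in Python (names are illustrative; this version also counts the receive itself as an event at the receiver, a common formulation):

```python
class VectorClock:
    """Sketch of a vector clock; pid is this process's index, n the no. of processes."""
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def tick(self):
        # Ci[i]++ for every event at process i; returns the event's timestamp
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, tm):
        # Cj[k] = max(Cj[k], tm[k]) for all k, then count the receive as an event
        self.v = [max(a, b) for a, b in zip(self.v, tm)]
        self.v[self.pid] += 1
        return list(self.v)

def happened_before(ta, tb):
    # a -> b iff ta < tb: componentwise ta <= tb, and ta != tb
    return all(x <= y for x, y in zip(ta, tb)) and ta != tb

def concurrent(ta, tb):
    # ta || tb iff neither ta < tb nor tb < ta
    return not happened_before(ta, tb) and not happened_before(tb, ta)
```

Unlike Lamport clocks, comparing two vector timestamps fully decides whether the events are causally related or concurrent.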

Causal ordering of messages: Application of vector clocks

Delivery in causal order: if send(m1) → send(m2), then every recipient of both messages m1 and m2 must "deliver" m1 before m2.

“deliver” – when the message is actually given to the application for processing

Birman-Schiper-Stephenson Protocol

- To broadcast m from process i, increment Ci[i] and timestamp m with VTm = Ci
- When j ≠ i receives m, j delays delivery of m until
  - Cj[i] = VTm[i] − 1, and
  - Cj[k] ≥ VTm[k] for all k ≠ i
- Delayed messages are queued at j, sorted by vector time; concurrent messages are sorted by receive time
- When m is delivered at j, Cj is updated according to the vector clock rule
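The delay condition at j can be written as a single predicate (a sketch; the function name is illustrative):

```python
def bss_can_deliver(Cj, VTm, i):
    """Birman-Schiper-Stephenson delivery test at process j (sketch).

    Cj is j's vector clock, VTm the vector timestamp of broadcast m,
    and i the sender's index. m is deliverable only when it is the next
    broadcast from i (Cj[i] = VTm[i] - 1) and j has already delivered
    everything i had delivered when it sent m (Cj[k] >= VTm[k], k != i).
    """
    if Cj[i] != VTm[i] - 1:
        return False
    return all(Cj[k] >= VTm[k] for k in range(len(Cj)) if k != i)
```

A receiver simply re-tests its queued messages with this predicate every time its clock advances.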

Problem of Vector Clock

Message size increases since each message needs to be tagged with the vector

Size can be reduced in some cases by only sending values that have changed

Capturing Global State

Global State Collection

Applications: Checking “stable” properties, checkpoint & recovery,…

Issues:
- Need to collect both node and channel states
- The system cannot be stopped
- There is no global clock

But what is global state??

Some Notations

- LSi : local state of process i
- send(mij) : send event of message mij from process i to process j
- rec(mij) : similar, receive instead of send
- time(x) : time at which state x was recorded
- time(send(m)) : time at which send(m) occurred

send(mij) ∈ LSi iff time(send(mij)) < time(LSi)

rec(mij) ∈ LSj iff time(rec(mij)) < time(LSj)

transit(LSi, LSj) = { mij | send(mij) ∈ LSi and rec(mij) ∉ LSj }

inconsistent(LSi, LSj) = { mij | send(mij) ∉ LSi and rec(mij) ∈ LSj }

Global state: a collection of local states, GS = {LS1, LS2, ..., LSn}

GS is consistent iff for all i, j, 1 ≤ i, j ≤ n, inconsistent(LSi, LSj) = Φ

GS is transitless iff for all i, j, 1 ≤ i, j ≤ n, transit(LSi, LSj) = Φ

GS is strongly consistent if it is consistent and transitless. Note that channel state may be specified explicitly in a global state, or implicitly in node states using transit().

Chandy-Lamport’s Algorithm

Uses special marker messages

One process acts as initiator, starts the state collection by following the marker sending rule below

Marker sending rule for process P:P records its state; then for each outgoing channel C from P on which a marker has not been sent already, P sends a marker along C before any further message is sent on C

When Q receives a marker along a channel C:

If Q has not recorded its state then Q records the state of C as empty; Q then follows the marker sending rule

If Q has already recorded its state, it records the state of C as the sequence of messages received along C after Q’s state was recorded and before Q received the marker along C
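A single-threaded sketch of one process's marker rules (illustrative names; channels are plain identifiers, messages are numbers added to a numeric application state, and marker sending to outgoing channels is not modelled):

```python
class SnapshotProcess:
    """One process's Chandy-Lamport rules (sketch, not a full distributed run)."""
    def __init__(self, state, in_channels):
        self.state = state                          # application state
        self.recorded_state = None                  # None = not yet recorded
        self.channel_state = {c: None for c in in_channels}
        self.buffer = {c: [] for c in in_channels}  # msgs seen after recording

    def record(self):
        # Marker sending rule: record own state (markers then go out, not shown)
        if self.recorded_state is None:
            self.recorded_state = self.state

    def on_message(self, channel, msg):
        # Buffer messages that arrive after recording but before that
        # channel's marker: they are in transit w.r.t. the snapshot
        if self.recorded_state is not None and self.channel_state[channel] is None:
            self.buffer[channel].append(msg)
        self.state += msg                           # apply to application state

    def on_marker(self, channel):
        if self.recorded_state is None:
            self.record()                           # first marker seen:
            self.channel_state[channel] = []        # that channel is empty
        else:
            # channel state = messages received since recording, pre-marker
            self.channel_state[channel] = list(self.buffer[channel])
```

The FIFO assumption is what guarantees that everything sent before the marker arrives before it, so the buffer exactly captures the channel state.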

Points to Note

- Markers sent on a channel distinguish the messages sent on the channel before the sender recorded its state from the messages sent after
- The state collected may not be any state that actually happened in reality, but rather a state that "could have" happened
- Requires FIFO channels
- The network should be strongly connected (it obviously also works for connected, undirected networks)
- Message complexity O(|E|), where |E| = number of links

Lai and Young’s Algorithm

- Similar to Chandy-Lamport's, but does not require FIFO
- A boolean value X at each node: False indicates the state is not recorded yet, True indicates recorded
- The value of X is piggybacked on every application message
- The value of X distinguishes pre-snapshot and post-snapshot messages, similar to the marker
- Requires a log of all messages sent before the state is recorded

Mutual Exclusion

Mutual Exclusion

Very well-understood in shared memory systems

Requirements:
- At most one process in the critical section (safety)
- If more than one process is requesting, someone enters (liveness)
- A requesting process enters within a finite time (no starvation)
- Requests are granted in order (fairness)

Classification of Distributed Mutual Exclusion Algorithms

Non-token based / permission based:
- A node takes permission from all or a subset of the other nodes before entering the critical section
- Permission from all processes: e.g., Lamport, Ricart-Agrawala, Roucairol-Carvalho
- Permission from a subset: e.g., Maekawa

Token based:
- A single token in the system; a node enters the critical section if it has the token
- Algorithms differ in how the token is circulated: e.g., Suzuki-Kasami

Some Complexity Measures

- Number of messages per critical section entry
- Synchronization delay
- Response time
- Throughput

Lamport’s Algorithm

Every node i has a request queue qi that keeps requests sorted by logical timestamp (total ordering enforced by including the process id in the timestamp).

To request the critical section:
- Send timestamped REQUEST (tsi, i) to all other nodes
- Put (tsi, i) in its own queue

On receiving a request (tsi, i):
- Send a timestamped REPLY to the requesting node i
- Put request (tsi, i) in the queue

To enter the critical section:
- i enters the critical section if (tsi, i) is at the top of its own queue, and i has received a message (any message) with timestamp larger than (tsi, i) from ALL other nodes

To release the critical section:
- i removes its request from its own queue and sends a timestamped RELEASE message to all other nodes
- On receiving a RELEASE message from i, i's request is removed from the local request queue

Some points to note

- The purpose of the REPLY messages from node i to j is to ensure that j knows of all requests of i made prior to the sending of the REPLY (and therefore of any request of i with timestamp lower than j's request)
- Requires FIFO channels
- 3(n − 1) messages per critical section invocation
- Synchronization delay = maximum message transmission time
- Requests are granted in order of increasing timestamps

Ricart-Agrawala Algorithm

An improvement over Lamport's. Main idea: node j need not send a REPLY to node i if j has a request with timestamp lower than i's request (since i cannot enter before j anyway in this case).

- Does not require FIFO
- 2(n − 1) messages per critical section invocation
- Synchronization delay = maximum message transmission time
- Requests are granted in order of increasing timestamps

To request the critical section:
- Send timestamped REQUEST message (tsi, i)

On receiving request (tsi, i) at j:
- Send REPLY to i if j is neither requesting nor executing the critical section, or if j is requesting and i's request timestamp is smaller than j's request timestamp. Otherwise, defer the request.

To enter the critical section:
- i enters the critical section on receiving REPLY from all nodes

To release the critical section:
- Send REPLY to all deferred requests
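The reply decision at j is the heart of the algorithm; here is a sketch of it as a pure function (the state names and the (time, pid) timestamp encoding are choices made for this illustration; tuple comparison gives the required total order):

```python
def ra_should_reply(j_state, j_ts, i_ts):
    """Ricart-Agrawala reply decision at node j (sketch).

    j_state is 'idle', 'requesting', or 'in_cs'; timestamps are
    (lamport_time, pid) tuples, so tuple comparison breaks ties by pid.
    Returns True to REPLY immediately, False to defer the request.
    """
    if j_state == 'idle':
        return True                      # j neither requesting nor in CS
    if j_state == 'requesting' and i_ts < j_ts:
        return True                      # i's request is older: i goes first
    return False                         # defer: j is in CS or has priority
```

Deferred requests are answered in a batch when j releases the critical section.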

Roucairol-Carvalho Algorithm

An improvement over Ricart-Agrawala. Main idea: once i has received a REPLY from j, it does not need to send a REQUEST to j again unless it sends a REPLY to j (in response to a REQUEST from j).

- The number of messages required varies between 0 and 2(n − 1), depending on the request pattern
- Worst case message complexity is still the same

Maekawa’s Algorithm

- Permission is obtained from only a subset of the other processes, called the Request Set (or quorum)
- A separate request set Ri for each process i

Requirements:
- For all i, j: Ri ∩ Rj ≠ Φ
- For all i: i ∈ Ri
- For all i: |Ri| = K, for some K
- Any node i is contained in exactly D request sets, for some D

K = D = sqrt(N) for Maekawa's algorithm.

A simple version

To request the critical section:
- i sends a REQUEST message to all processes in Ri

On receiving a REQUEST message:
- Send a REPLY message if no REPLY message has been sent since the last RELEASE message was received, and update status to indicate that a REPLY has been sent. Otherwise, queue up the REQUEST.

To enter the critical section:
- i enters the critical section after receiving REPLY from all nodes in Ri

To release the critical section:
- Send a RELEASE message to all nodes in Ri
- On receiving a RELEASE message, send REPLY to the next node in the queue and delete that node from the queue. If the queue is empty, update status to indicate that no REPLY message has been sent.

- Message complexity: 3·sqrt(N)
- Synchronization delay = 2 × (maximum message transmission time)
- Major problem: DEADLOCK is possible
- Three more types of messages (FAILED, INQUIRE, YIELD) are needed to handle deadlock; message complexity can then be 5·sqrt(N)
- How to build the request sets?
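One standard way to build the request sets (a sketch, assuming n is a perfect square) is to arrange the processes in a √n × √n grid and take Ri = row(i) ∪ column(i): any row meets any column, so any two quorums intersect. Note this gives |Ri| = 2√n − 1, which is O(√n) as required:

```python
def grid_quorums(n):
    """Build grid-based Maekawa request sets for n = k*k processes (sketch).

    Process i sits at row i // k, column i % k; its quorum is its whole
    row plus its whole column, so any two quorums share at least one node.
    """
    k = int(round(n ** 0.5))
    assert k * k == n, "this sketch assumes n is a perfect square"
    quorums = []
    for i in range(n):
        r, c = divmod(i, k)
        row = {r * k + j for j in range(k)}
        col = {j * k + c for j in range(k)}
        quorums.append(row | col)
    return quorums
```

Maekawa's original construction, based on finite projective planes, achieves the tighter |Ri| ≈ √n bound.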

Token based Algorithms

- A single token circulates; a node enters the CS when the token is present
- No FIFO required
- Mutual exclusion is obvious
- Algorithms differ in how to find and get the token
- Sequence numbers, rather than timestamps, are used to differentiate between old and current requests

Suzuki-Kasami Algorithm

- Broadcast a request for the token
- The process with the token sends it to the requestor if it does not need it

Issues:
- Current vs. outdated requests
- Determining the sites with pending requests
- Deciding which site to give the token to

The token carries:
- A FIFO queue Q of requesting processes
- LN[1..n], where LN[j] is the sequence number of the request that node j executed most recently

The request message:
- REQUEST(i, k): request message from node i for its kth critical section execution

Other data structures:
- RNi[1..n] at each node i, where RNi[j] is the largest sequence number received so far by i in a REQUEST message from j

To request the critical section:
- If i does not have the token, increment RNi[i] and send REQUEST(i, RNi[i]) to all nodes
- If i has the token already, enter the critical section if the token is idle (no pending requests); otherwise follow the rule for releasing the critical section

On receiving REQUEST(i, sn) at j:
- Set RNj[i] = max(RNj[i], sn)
- If j has the token and the token is idle, send it to i if RNj[i] = LN[i] + 1. If the token is not idle, follow the rule for releasing the critical section.

To enter the critical section:
- Enter the CS if the token is present

To release the critical section:
- Set LN[i] = RNi[i]
- For every node j not in Q (in the token), add j to Q if RNi[j] = LN[j] + 1
- If Q is non-empty after the above, delete the first node from Q and send the token to that node

Points to note:
- Number of messages: 0 if the node already holds the token, n otherwise
- Synchronization delay: 0 (node has the token) or maximum message delay (token is elsewhere)
- No starvation

Raymond’s Algorithm

Forms a (logical) directed tree with the token-holder as root.

- Each node has a variable Holder that points to its parent on the path to the root; the root's Holder variable points to itself
- Each node i has a FIFO request queue Qi

To request the critical section:
- Send REQUEST to the parent on the tree, provided i does not currently hold the token and Qi is empty. Then place the request in Qi.

When a non-root node j receives a request from i:
- Place the request in Qj
- Send REQUEST to the parent if no previous REQUEST has been sent

When the root r receives a REQUEST:
- Place the request in Qr
- If the token is idle, follow the rule for releasing the critical section (shown below)

When a node receives the token:
- Delete the first entry from the queue
- Send the token to that node (possibly itself) and set the Holder variable to point to that node
- If the queue is non-empty, send a REQUEST message to the parent (the node pointed at by the Holder variable)

To execute the critical section:
- Enter if the token is received and the node's own entry is at the top of the queue; delete the entry from the queue

To release the critical section:
- If the queue is non-empty, delete the first entry from the queue, send the token to that node, and make the Holder variable point to that node
- If the queue is still non-empty, send a REQUEST message to the parent (the node pointed at by the Holder variable)

Points to note:
- Average message complexity: O(log n)
- Synchronization delay: (T log n)/2, where T = maximum message delay

Leader Election

Leader Election in Rings

Models:
- Synchronous or asynchronous
- Anonymous (no unique ids) or non-anonymous (unique ids)
- Uniform (no knowledge of n, the number of processes) or non-uniform (knows n)

Known impossibility result: there is no deterministic, synchronous, non-uniform leader election protocol for anonymous rings.

Election in Asynchronous Rings

Lelann-Chang-Roberts Algorithm:
- Send own id to the node on the left
- If an id is received from the right, forward it to the left node only if the received id is greater than own id; else ignore it
- If a node receives its own id, it declares itself the "leader"

Properties:
- Works on unidirectional rings
- Worst case message complexity = O(n²)
- Average case message complexity = O(n log n)
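The algorithm can be simulated round by round on a unidirectional ring (a sketch; the message count it reports assumes ids move synchronously in rounds):

```python
def lcr_leader(ids):
    """Simulate Lelann-Chang-Roberts; ids[i] is the unique id at ring position i.

    Each process sends its id to its left neighbour (position i+1 here);
    a received id is forwarded only if larger than the receiver's own id,
    and a process that receives its own id back is the leader.
    """
    n = len(ids)
    messages = 0
    pending = [[ids[i]] for i in range(n)]   # ids to send from position i
    leader = None
    while leader is None:
        nxt = [[] for _ in range(n)]
        for i in range(n):
            for v in pending[i]:
                messages += 1
                j = (i + 1) % n              # the "left" neighbour
                if v == ids[j]:
                    leader = v               # own id came back around
                elif v > ids[j]:
                    nxt[j].append(v)         # forward larger ids, drop smaller
        pending = nxt
    return leader, messages
```

The maximum id always wins; the O(n²) worst case occurs when the ids are arranged in decreasing order around the ring.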

Hirschberg-Sinclair Algorithm:
- Operates in phases; requires a bidirectional ring
- In the kth phase, a process sends its own id to up to 2^k processes on each side of itself (it sends directly only to its next neighbors, with the id and k carried in the message)
- If an id is received, it is forwarded only if the received id is greater than the receiver's own id; else it is ignored
- The last process in the chain sends a reply to the originator if its own id is less than the received id
- Replies are always forwarded
- A process goes to the (k+1)th phase only if it receives a reply from both sides in the kth phase
- A process receiving its own id declares itself the "leader"

Message complexity: O(n log n). Lots of other algorithms exist for rings.

Lower bound result: any comparison-based leader election algorithm in a ring requires Ω(n log n) messages. What if the algorithm is not comparison-based?

Leader Election in Arbitrary Networks

FloodMax:
- Synchronous, round-based
- In each round, each process sends the maximum id seen so far (not necessarily its own) to all its neighbors
- After a number of rounds equal to the diameter, if the maximum id seen = own id, the process declares itself leader
- Complexity = O(d·m), where d = diameter of the network and m = number of edges
- Does not extend trivially to the asynchronous model

There are also variations that build different types of spanning trees with no pre-specified roots; the root chosen at the end is the leader.

Clock Synchronization

Clock Synchronization

- Multiple machines have physical clocks. How can we keep them more or less synchronized?
- Internal vs. external synchronization
- Perfect synchronization is not possible because of communication delays
- Even synchronization within a bound cannot be guaranteed with certainty, because of the unpredictability of communication delays
- But it is still useful!! Examples: Kerberos, GPS

How clocks work

- Computer clocks are crystals that oscillate at a certain frequency
- Every H oscillations, the timer chip interrupts once (a clock tick). The number of interrupts per second is typically 18.2, 50, 60, or 100; it can be higher, and is settable in some cases
- The interrupt handler increments a counter that keeps track of the number of ticks since a reference point in the past (the epoch)
- Knowing the number of ticks per second, we can calculate the year, month, day, time of day, etc.

Clock Drift

- Unfortunately, the period of crystal oscillation varies slightly
- If it oscillates faster, there are more ticks per real second, so the clock runs faster; similarly for slower clocks
- For machine p, when the correct reference time is t, let the machine clock show the time as C = Cp(t)
- Ideally, Cp(t) = t for all p, t
- In practice, 1 − ρ ≤ dC/dt ≤ 1 + ρ, where ρ = maximum clock drift rate, usually around 10^-5 for cheap oscillators
- Drift causes skew between clocks (the difference in the clock values of two machines)

Resynchronization

- Periodic resynchronization is needed to offset skew
- If two clocks are drifting in opposite directions, the maximum skew after time t is 2ρt
- If the application requires that clock skew < δ, then the resynchronization period must satisfy r < δ/(2ρ)
- Usually ρ and δ are known
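For example, with the ρ quoted above and an assumed application tolerance δ of 100 ms:

```python
# Sketch: required resynchronization period from r < delta / (2*rho),
# assuming two clocks drifting apart in opposite directions at rate rho each.
rho = 1e-5            # max drift rate (typical cheap oscillator)
delta = 0.1           # assumed: application tolerates at most 100 ms of skew
resync_period = delta / (2 * rho)   # seconds; resync at least this often
```

This gives a resynchronization period of 5000 seconds, i.e., the clocks must be resynchronized at least every ~83 minutes.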

Cristian’s Algorithm

- One machine acts as the time server
- Each machine periodically (within the resynchronization period r) sends a message asking for the current time
- The time server replies with its time
- The sender sets its clock to the reply

Problems:
- Message delay
- The time server's time may be less than the sender's current time

Handling message delay: try to estimate how long the message carrying the time server's time took to reach the sender.
- Measure the round trip time and halve it
- Make multiple measurements of the round trip time, discard values that are too high, and take the average of the rest
- Or make multiple measurements and take the minimum
- Use knowledge of the processing time at the server, if known

Handling fast clocks:
- Do not set the clock backwards; slow it down over a period of time to bring it in tune with the server's clock
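The round-trip estimates above can be sketched as follows (function names are invented; times are in seconds):

```python
def cristian_estimate(send_time, recv_time, server_time):
    """Cristian's adjustment (sketch): assume the server's reply was
    generated halfway through the measured round trip."""
    rtt = recv_time - send_time
    return server_time + rtt / 2

def best_estimate(samples):
    """From several (send, recv, server_time) samples, trust the one with
    the smallest round trip: it suffered the least queuing delay."""
    return cristian_estimate(*min(samples, key=lambda s: s[1] - s[0]))
```

The error of a single estimate is bounded by half the round trip time (minus any known minimum delay), which is why small-RTT samples are preferred.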

Berkeley Algorithm

- Centralized as in Cristian's, but the time server is active
- The time server asks for the time of the other machines at periodic intervals
- The time server averages the times and sends the new time to the machines
- The machines set their time (advancing immediately or slowing down gradually) to the new time
- Transmission delay is estimated as before

External Synchronization

Clocks must be synchronized with real time

Cristian’s algorithm can be used if the time server is synchronized with real time somehow

Berkeley algorithm cannot be used

But what is “real time” anyway?

Measurement of time

Astronomical:
- Traditionally used
- Based on the earth's rotation around its axis and around the sun
- Solar day: the interval between two consecutive transits of the sun
- Solar second: 1/86,400 of a solar day
- The period of the earth's rotation varies, so the solar second is not stable
- Mean solar second: average the length of a large number of solar days, then divide by 86,400

Atomic:
- Based on the transitions of the Cesium 133 atom; 1 sec = time for 9,192,631,770 transitions
- About 50+ labs maintain Cesium clocks
- International Atomic Time (TAI): the mean number of ticks of these clocks since Jan 1, 1958; highly stable
- But slightly out of sync with the mean solar day (since the solar day is getting longer)
- A leap second is inserted occasionally to bring it back in sync (so far 32, all positive)
- The resulting clock is called UTC: Universal Coordinated Time

UTC time is broadcast from different sources around the world, e.g.:
- National Institute of Standards & Technology (NIST): runs radio stations, the most famous being WWV; anyone with a proper receiver can tune in
- United States Naval Observatory (USNO): supplies time to all defense sources, among others
- National Physical Laboratory in the UK
- GPS satellites
- Many others

NTP: Network Time Protocol
- A protocol for time synchronization in the internet
- Hierarchical architecture: primary time servers (stratum 1) synchronize to national time standards via radio, satellite, etc.; secondary servers and clients (stratum 2, 3, ...) synchronize to primary servers in a hierarchical manner (stratum 2 servers sync with stratum 1, stratum 3 with stratum 2, etc.)
- Reliability is ensured by redundant servers
- Communication is by multicast (usually among LAN servers), symmetric mode (usually among multiple geographically close servers), or client-server (to higher stratum servers)
- Complex algorithms combine and filter the times
- Synchronization to within tens of milliseconds is possible for most machines
- But it is just a best-effort service, with no guarantees
- See http://www.ntp.org for more details

Termination Detection

Termination Detection

Model:
- Processes can be active or idle
- Only active processes send messages
- An idle process can become active on receiving a computation message
- An active process can become idle at any time
- Termination: all processes are idle and no computation messages are in transit
- A global snapshot can also be used to detect termination

Huang’s Algorithm

- One controlling agent, which has weight 1 initially
- All other processes are idle initially and have weight 0
- The computation starts when the controlling agent sends a computation message to a process
- An idle process becomes active on receiving a computation message
- B(DW): a computation message with weight DW; can be sent only by the controlling agent or an active process
- C(DW): a control message with weight DW, sent by an active process to the controlling agent when the process is about to become idle

Let the current weight at a process be W.

1. On sending B(DW):
- Find W1, W2 such that W1 > 0, W2 > 0, W1 + W2 = W
- Set W = W1 and send B(W2)

2. On receiving B(DW):
- W += DW
- If idle, become active

3. On sending C(DW):
- Send C(W) to the controlling agent
- Become idle

4. On receiving C(DW):
- W += DW
- If W = 1, declare "termination"
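A single-threaded sketch of the weight bookkeeping (exact fractions avoid floating-point loss; the class and function names are illustrative):

```python
from fractions import Fraction

class HuangAgent:
    """Controlling agent of Huang's algorithm (sketch).

    It starts with weight 1, gives away part of its weight with every
    computation message, and declares termination once control messages
    have returned all the weight.
    """
    def __init__(self):
        self.weight = Fraction(1)
        self.terminated = False

    def send_computation(self):
        # Rule 1: split W into W1 + W2 (here W1 = W2 = W/2) and send B(W2)
        half = self.weight / 2
        self.weight -= half
        return half

    def receive_control(self, dw):
        # Rule 4: W += DW; if W = 1, declare termination
        self.weight += dw
        if self.weight == 1:
            self.terminated = True

def run_and_go_idle(dw):
    # Rules 2 and 3 for a process that sends no further messages: it becomes
    # active with weight dw, then returns its whole weight in C(DW)
    return dw
```

Because weights are only ever split and returned, the total weight in the system is invariantly 1, which is what makes the W = 1 test sound.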

Building Spanning Trees

Building Spanning Trees

Applications:
- Broadcast
- Convergecast
- Leader election

Two variations:
- From a given root r
- The root is not given a priori

Flooding Algorithm

- Starts from a given root r
- r initiates by sending message M to all neighbors and sets its own parent to nil
- Every other node, on receiving M from i for the first time, sets its parent to i and sends M to all neighbors except i; any M received after that is ignored
- The tree built is an arbitrary spanning tree
- Message complexity = 2m − (n − 1), where m = number of edges
- Time complexity??
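The algorithm can be sketched as a breadth-first simulation that also counts messages (adjacency-dict input; in a real system the sends happen concurrently):

```python
from collections import deque

def flood_spanning_tree(adj, root):
    """Simulate flooding from root; returns (parent map, messages sent).

    Every node, on first receiving M, adopts the sender as parent and
    sends M to all neighbours except that parent; later copies of M are
    ignored by the receiver but still count as messages sent.
    """
    parent = {root: None}
    messages = 0
    frontier = deque([root])
    while frontier:
        i = frontier.popleft()
        for j in adj[i]:
            if j == parent[i]:
                continue                  # M is not sent back to the parent
            messages += 1                 # i sends M to j
            if j not in parent:           # first time j sees M
                parent[j] = i
                frontier.append(j)
    return parent, messages
```

On a triangle (n = 3, m = 3), for example, this sends 2m − (n − 1) = 4 messages: the root sends deg(r) messages and every other node sends deg − 1.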

Constructing a DFS tree with given root

- A plain parallelization of the sequential algorithm by introducing synchronization
- Each node i has a set unexplored, initially containing all neighbors of i
- A node i (initiated by the root) considers the nodes in unexplored one by one, sending a neighbor j a message M and then waiting for a response (parent or reject) before considering the next node in unexplored
- If j has already received M from some other node, j sends a reject to i
- Else, j sets i as its parent and considers the nodes in its own unexplored set one by one
- j sends a parent message to i only when it has considered all nodes in its unexplored set
- i then considers the next node in its unexplored set
- The algorithm terminates when the root has received a parent or reject message from all its neighbors
- Worst case number of messages = 4m
- Time complexity: O(m)