Autonomous routing algorithms for networks with wide ...modiano/papers/CV_C_123.pdf · Wajahat...

AUTONOMOUS ROUTING ALGORITHMS FOR NETWORKS WITHWIDE-SPREAD FAILURES

Wajahat Khan, Long Bao Le and Eytan Modiano

Communications and Networking Research GroupMassachusetts Institute of Technology

Cambridge, MA, USA 02139Emails: {wajahat, longble, modiano} @mit.edu

Abstract- We study end-to-end delay performance ofdif-ferent routing algorithms in networks with random failures.Specifically, we compare delay performances of DifferentialBacklog (DB) and Shortest Path (SP) routing algorithmsand show that DB routing outperforms SP routing in termsof throughput when the network is heavily loaded and/or thefailure rate is high while SP routing achieves better delayperformance in the low load regime. Then, we investigatedelay performance of a hybrid routing algorithm that combines principles of both SP and DB routing algorithms andshow that it outperforms both of these routing algorithms.Finally, we demonstrate improvements in delay performanceofDB routing through the use ofa digitalfountain approachwhich was originally proposed for multicast applications.In addition, our results show that there exists an optimalcoding rate where digitalfountain based DB routing achievesminimum end-to-end delay. To the best of our knowledge,this is the first work which investigates delay performanceof DB routing and its enhanced versions for networks withlink failures.

Index Terms-Differential backlog routing, shortest pathrouting, digital fountain, end-to-end delay, throughput region

I. INTRODUCTION

A robust communications network must have built-inredundancies to recover from failures. However, survivable topology design forms a necessary but not sufficientcondition for failure recovery. Recovering from a failureusually involves re-routing of data along pre-plannedback-up paths (typically used for recovery from singlelink failures) [1], [2], [3]. However, in the event of acatastrophic failure, the use of pre-planned back-up pathsis not practical. When dealing with networks on thescale of a nation-wide communications infrastructure, itis impossible to pre-plan against all the possible link

This work was supported by NSERC Postdoctoral Fellowshipand by ARO Muri grant number W9llNF-08-l-0238, DTRA grantnumber HDTRAl-07-l-0004, NSF grant numbers CNS-062678l andCNS-083096l.

978-1-4244-5239-2/09/$26.00 ©2009 IEEE

failures. For example, the number of possible failurescenarios for a graph with just thirty links is over abillion. Also, for such an approach to work, every nodein the network must know the state of the entire networkwhich may not be achieved in most practical networks.Hence, autonomous re-routing algorithms which candynamically respond to failure(s) and require minimalnetwork-state knowledge are a key solution for networkswith failures.

Traditional approaches to autonomous routing havebeen based on shortest path algorithms such as OpenShortest Path First (OSPF) [4]. These algorithms havetheir drawbacks when responding to changes in networktopologies (which includes failures) both in terms ofefficient capacity utilization and speed of recovery. Thetime complexity of computing all-pair shortest pathsusing a simple algorithm like Bellman-Ford algorithmis O(N3 ) where N is the number of nodes [5]. Forlarge networks this computation over short time periodsbecomes not only challenging but practically impossible.Distributed implementations which are necessary in anation-wide network have yet greater time complexity.Our goal in this paper is to develop novel approachesto the autonomous re-routing problem for rapid andefficient failure recovery.

The DB routing algorithm was originally proposedby Tassiulas and Ephremides in the context of wirelessnetworks which was shown to achieve the maximumthroughput performance [6]. DB routing can be easilyimplemented in a distributed manner in wired networks,such as optical backbone networks, since each nodeonly needs to know the states of its neighbors. Thethroughput-optimal performance and ease of implementation of DB routing makes it an obvious candidate forautonomous re-routing. Although recent testbed implementation shows that DB routing achieves good throughput and fairness performance, its delay performance isnot well understood [11].

In this paper, we investigate delay performance of DB

Paper ID# 901159.PDF

routing and its enhanced variants in an optical backbonenetwork of major US service providers. Specifically,we compare end-to-end delay performance of DB andSP routing in this backbone network with and withoutfailures. We show that SP routing indeed achieves lowerdelay in the low load condition but it has much lowerthroughput performance compared to the DB routing. Wethen compare delay performance of a hybrid (HybridDB)routing algorithm which combines the principles of DBand SP routing algorithms by incorporating routing costsinto the "differential backlog" routing metric. We showthat the HybridDB routing algorithm indeed outperformsboth DB and SP routing algorithms while achievingthe maximum throughput performance. Given the factthat long end-to-end delay of a few packets from a fileresults in significant increase of file transfer delay, wepropose to combine the digital fountain approach withDB routing to reduce to the file transfer delay. Our resultsshow that digital fountain based DB routing successfullyimproves end-to-end file transfer delay and there existsan optimal coding rate that achieves the minimum delayperformance.

This paper is organized as follows. In section II, webriefly review the SP and DB routing algorithms. We alsodescribe the HybridDB routing and the digital fountainbased DB routing algorithms. In section III, we describeour simulation setup and present extensive simulation results which compare performance of the aforementionedrouting algorithms. Conclusions are stated in section IV.

II. SYSTEM MODEL AND ROUTING ALGORITHMS

In this section, we describe the system model, reviewthe SP and DB routing algorithms and present twoenhanced routing algorithms, namely HybridDB anddigital fountain based DB routing algorithms whoseperformances are investigated in the next section.

A. System Models

We consider a wired network (e.g., optical backbonenetworks) which is formed by a set of nodes and links.Assume that the network has N nodes. It is furtherassumed that time is slotted. Assume that files arrive randomly over time which are destined to randomly chosendestination nodes. For any such arriving file, a routingalgorithm is employed for end-to-end data delivery. Inthe following, we refer to a combination of such a file,its source and destination nodes as a data session. Wewill consider four different routing algorithms, namelySP, DB, HybridDB and digital fountain based DB routingalgorithms. Among these routing algorithms, SP is a

single-path routing policy while the others are multipathrouting policies.

Data packets of different sessions are buffered at eachnetwork node according to their destinations. Specifically, each node maintains N - 1 FIFO queues to bufferdata packets which are destined to other N - 1 nodes.A destination node of any file/session waits until itreceives a sufficient number of packets to reconstructthe underlying file. After the file is reconstructed, it isdelivered to the higher layers and exits the network.

We are interested in end-to-end packet and file transferdelay performances of the aforementioned routing algorithms where end-to-end packet/file delay is measuredfrom the time instant a packet/file enters the network tothe time instant it is successfully received/reconstructedat the destination node. Note that a file exits the networkonly after the destination node receives a sufficientnumber of packets to reconstruct it. It is assumed thatnetwork links are either in "WORKING" or "FAILED"states in any time slot. A "WORKING" link can transmitone packet/time slot while a "FAILED" link cannottransmit any packet until its state is changed to the"WORKING" state. We assume that network failuresonly impact network links. Because data packets arebuffered at node queues, there is no need for errorrecovery at the link or transport layer.

B. Routing Algorithms

1) SP Routing Algorithm: SP routing is by far themost common class of algorithms employed in modemnetworks. As might be implied from its name, SP routingconsiders all the possible paths between a source anddestination and chooses the one with minimum cost. Thecosts are typically a function of length, congestion ormonetary costs of a link. Bellman-Ford and Dijkstra [5]are two widely used shortest path algorithms. The bestknown running times of these algorithms are O(NE)and O(N log N + E) respectively, where N representsthe number of nodes and E represents the number oflinks in a network. In this paper, we assume that routingcosts of the SP routing algorithm are simply the numberof hops along a route (SP routing algorithm is simplyminimum-hop routing).

2) DB Routing Algorithm: The underlying idea behind DB routing is to use all network resources to distribute data, differentiated by destination (data destinedto one particular destination is called the correspondingcommodity). Routing decisions are made at the beginning of each time slot. To describe the DB routingin more details, let U~c) (t) be the number of packetswaiting at node a destined for node c in time slot t. For

2 of 6


each pair of directly connected nodes, let us say nodes aand b, the commodity c~b (t) with the highest differentialbacklog which is calculated as

c~b(t) = argmax {U~c)(t) - U~C)(t)} (1)CE{l, ... ,N}

is transmitted over the link (a, b) in time slot t where{I, ..., N} is the set of network nodes. In the case wherethere are multiple commodities achieving the maximumdifferential backlog, one of them is chosen randomly.Also, if the number of packets at a particular queue forsome commodity is smaller than that required by therouting algorithm, outgoing links of the correspondingnode are randomly chosen to transmit available packets.

It has been shown that the DB routing algorithmachieves maximum throughput performance [6], [7].Specifically, let throughput region be the union of allflow/sessions arrival rate vectors such that there existssome routing algorithm which can stabilize all the network queues (i.e., their queue lengths are bounded). Itcan be shown that DB routing algorithm stabilizes allarrival rate vectors which lie strictly inside the throughput region [6], [7]. Although DB routing is throughputoptimal, its delay performance is not well-understood.In fact, some recent works show that the maximumweight scheduling (a version of DB routing for singlehop wireless traffic flows) achieves order optimal delayin a wireless cellular network and most practical wirelessad hoc networks [9], [10]. However, its actual delayperformance was not investigated in these papers.

3) HybridDB Routing Algorithm: It has been recognized that although DB routing algorithm is throughputoptimal, its delay performance may not be very good inlow network load conditions [7], [11]. This is becausethe DB routing algorithm exploits all possible routesincluding loops in the network to achieve the maximumthroughput. In the low load regime, the DB routingalgorithm tends to use long routes [11] which may hurtthe delay performance. In fact, routing data along shortroutes may achieve good delay performance while stillmaintaining queue stability (i.e., bounded queue length)in the low network load. Therefore, an adaptive DBrouting which uses long routes only when necessarycan potentially achieves both good throughput and delayperformances.

One way to exploit advantages of both DB and SProuting algorithms is to incorporate routing cost into thedifferential backlog metric presented in (1) so that data isrouted along short routes when the network load is low.Specifically, we propose the HybridDB routing algorithmwhich chooses a commodity for link (a, b) in time slot

t as follows [7]:

c~b(t) = argmax {U~c)(t) - U~c)(t)CE{l, ... ,N}

+a (Va(c)(t) - ~(C)(t))} (2)

where Vi(c) is the routing cost to deliver data from nodei to node c and a is a weighting factor. If the underlyingSP routing is minimum-hop routing then Vi(c) is simplythe number of hops on the minimum-hop route fromnode i to node c and vlc

)(t) - ~(c) (t) == 1. In addition,the larger the weighting factor a the more likely shortroutes are selected for data delivery.

4) Digital Fountain Based DB Routing Algorithm:For end-to-end delivery from a source node to the corresponding destination node, a long file must be brokeninto a large number of packets. Because DB routingmay use a long route to deliver a packet, it is highlylikely that it takes a long time before the destinationnode receives all required packets to reconstruct the file.Hence, the end-to-end file transfer delay could be potentially reduced if the destination node can reconstructthe file by using only a subset of the original packets.In fact, the digital fountain coding technique enables usto achieve this goal. The digital fountain approach wasoriginally proposed by Byers, Luby and Mitzenmacherfor the asynchronous multicast application [12].

In an ideal digital fountain approach, x packets brokenfrom a file are encoded into n > x packets which arethen transmitted over the network. A receiver whichreceives any x distinct packets out of the n transmittedpackets in any order can reconstruct the original file.This ideal digital fountain approach can be realized byusing the classical Reed Solomon (RS) erasure code.However, it has been indicated in [12] that the idealdigital fountain using RS erasure code has several implementation limitations. The authors of [12] have alsoproposed several codes including Tornado codes [13] andLuby Transform codes [14] which can approximate theideal digital fountain and are easy to implement. Theapproximate digital fountain solution usually requires thenumber of received packets to be slightly larger thanthe number of original packets before a receiver canreconstruct the original file.

To make our investigation independent of code designissues, we assume an ideal digital fountain solutionin this paper. It can be observed that digital fountainapproach can be jointly employed with any routingalgorithms presented in the previous subsections. This isbecause the digital fountain approach is only needed toencode data at a source node and to reconstruct originalfiles at a destination node.

3 of 6


leoO

leoO

ShortestPath-File ShortestPath-Packet

DifferentialBacklog-File ... +.Different ial Backlog-Packet +

1400

1200

70

60

HybridDB :alpha=5 -File -.,..,-HybridDB :alpha=5 -Pacl<:el

DifferenlialBacklog(alpha=O)-Flle ... +- ..Differen tial Backlog (alpha =O)-Packet

Shortest Path-FileShortest Path-Packet

1000

eoo

eoo

400

200

B 50

!! 40

i! 30

t ...20

403525201510O '-==--.L-_~_~_~_~_~_~_-----'

o10

Fig. I. File and packet delays of SP and DB versus failure rate (forA=1/4, p= 1/4, simulation time : I million slots)

Fig. 2. Delay performance of SP, DB, and HybridDB versus networkload (for p= 1/4, simulation time: 10 million slots)

III . PERFORMANCE EVALUATION

In this section, we investigate and compare delayperformances of the routing algorithms presented inthe previous section. All numerical results are obtainedfor an optical backbone network of major US serviceproviders with 29 bidirectional links and 13 nodes [8].

250

200

HybridDB :alpha=5-File -HybridDB :alpha=5 -Packet

Differen tial Backlog (alpha =O)-File .

DifferenlialBaCkI09J~~~:s~Ot:~~~

ShorteslPath-Packel

.: /

40

20

35

. /

15

2520

10

1510

.........

..............

.... .......................

HybridDBalpha=5-File -;.;-HybridDB :alpha=5-Packet

Differential Backlog(alpha=O)-File ... + ..

Differential BaCklo9J~~~:s~O~tt~~~

Shortest Path-Packet

O '-------~---~---~---~-----'o

."..--/ /

~/"'''''''''. '''''... " ... - .

50

~:

300 ,-------~ ---::c --~---~___,

250

50

I 200

Ii 150

§sj 100

I1, 50

Fig. 3. File and packet delays of SP, DB, and HybridDB versusfailure rates (for A=1/10, p= 1/4, simulation time : 10 million slots)

Fig. 4. File and packet delays of SP, DB, and HybridDB versusfailur e rates (for A=1/4, p= 1/4, simulation time: 10 million slots)

A. Simulation Settings and Parameters

Recall that time is slotted. Files arrive at the beginningof each slot at each node according to a Poisson processwith arrival rate A files per slot. A file is equally likelyto be destined to any node in the network except thesource node. The number of packets in each file is ageometric random variable with parameter p. In order toinvestigate the impacts of code rate on delay performanceof different routing algorithms using the digital fountainsolution, we fix file sizes at x packets (in Figs. 6, 7).Also, for a file of x packets, n = rx / fl packets aregenerated where f is the code rate. At the destinationnode, a file reconstruction is assumed to be performedsuccessfully when any x of the n = rx / fl packetsoriginally transmitted by source node have been received.

Nodes keep track of the number of packets they havereceived for each file destined to them. Since there isno packet loss, redundant packets in a file do get to thenodes sometimes whereby they are discarded. A file iscalled active if its destination has not received all thepackets required to reconstruct that file. A list of allactive files, is maintained at each destination node.

For DB-based routing algorithms, after all arrivalsfor a slot take place, each link is marked with thecommodity that has the maximum differential backlogacross that link, with ties broken randomly. For linkswith a positive differential backlog, one packet of themarked commodity is transmitted across the link. If there

4 of 6


- ~ - - - - ><- - - - -.,>;- - - --

HybridDB :alpha=5 ,O%Failures-File -BHybndDB:alpha=5, 0% Failures-Packet

HybridDB:alpha=5. 5% Failures-FileHybridDB:alpha=5 , 5% Failures-Packet

06 ,0% Failures-File --x-DB. 0% Failures-Packet

DB, 5% Failures-File .. +.DB, 5% Failures-Packet +

120

20

40

I 100

,i 60

ji~ 60

i

. ,..

DB, 0% Failures-File -';-.DB. 0% Failures-Packet

OB,5%Failures-File .+..DB, 5% Failures-Packet +

.... .......

60

40

140 ,-------~--~--~---~-______,

1.1 1.2 1.3 1.4 1.51.1 1.2 1.3 1.4 1.5

Fig. 5. Delay performance of Digital Fountain appro ach in DBversus code rate (for ..\=1125, x=20, simulation time : I million slots)

Fig. 6. Delay performance of Digital Fountain approach in DB andHybridDB versus code rate with and without failures (for ..\=1125,x=20, simulation time: 1 million slots)

40353025201510

HybridOBwithout Digital Fountain-File --e-

HybridDB wilh Oi ~~?~~~~t:~i~~i~~ ::~~~ali~-ii:~~HybndDB with Dlgrta~ Fountain (coding rate = 1/1.2)-Packet

DB~~~~~o~it ~~T~~Jn~~~~~~~~~ -+-DB with Digrtal Fountain (CO~i ng rate = 1/1.2)-File .. +.

DB With Digrtal Fountain (coding rate = 1'1 .2)-Packet ... -/

d

o'------~-~-~-~-~-~-~-------'o

400

100

500

~!! 300

i!~ 200

l

Fig. 7. Delays of Digital Fountain approach in SP, DB andHybridDB versus failure rate with different values of code rate (for..\=1150, x = 20, simulation time: I million slots)

it does not achieve the full throughput of the network.Therefore, DB routing achieves much greater throughputthan SP routing. This can be confirmed by noticing thatfile transfer delay under SP routing increases sharplywhen failure probability is larger than 15% while rapidincrease in file transfer delay under DB routing onlyoccurs for failure probability larger than 35%.

In Figs. 2, 3, 4, we compare delay performancesof SP, DB, and HybridDB routing algorithms. Thesefigures show that HybridDB routing achieves better delayperformance than DB routing. It can also be seen fromFig. 2 that DB and HybridDB routing algorithms havesimilar throughput performance. This can be interpretedas follows . HybridDB routing can exploit all possiblepaths to achieve maximum throughput when the networkload is high but it tends to use shorter and more directpaths to deliver data in the low load condition. Fig. 3shows that for networks with link failures , the delayimprovement of HybridDB routing compared to DBrouting becomes less significant when the failure rate is

B. Comparison of Sp' DB, and HybridDB Routing

In Fig . 1, we show delay performances of SP and DBrouting algorithms versus the steady state probability of"FAILED" state (called failure probability in the following) . It can be observed that in the SP routing, packetsare routed deterministically toward their destinations;hence SP routing behaves better in terms of delays whenthe network load is low (i.e., low failure probability).However, since SP routing does not utilize all possiblerouting paths in the network to perform load balancing,

is a shortage of commodity packets at a network node ,the competing links are randomly served. All packettransmissions are completed by the end of a slot.

At the beginning of each slot, a "WORKING" linkchanges its state to "FAILED" with probability PI anda "FAILED" state changes it state to "WORKING" withprobability Pw (i.e ., link status is modeled as a two stateMarkov chain). All links start in the "WORKING" state.Hence, the steady state probability, 1rI ' of being in the"FAILED" state is given as

1r1 - PI (3)PI +Pw

For SP routing, we use Dijkstras All-pair Shortest-Pathalgorithm. As each link can transfer up to one packet ineach slot, the cost of all links are set to be equal toI. Each node maintains an address classifier to routepackets to outgoing links based on their destinations.Queues at network nodes buffer new arrivals to the nodesas well as routed packets from other parts of the network.The address classifiers are updated every time slot toaccount for any changes in network topology resultingfrom any link failures or activations. When a link fails ,all packets residing in its queue get rerouted to the linksource. End-to-end delay are averaged over all sessions.

5 of 6


high. In addition, SP routing has good delay performancein the low load condition but it achieves lower throughputcompared to the other two routing algorithms.

C. Routing Performance with Digital Fountain Solution

In Fig. 5, we show delay performance of DB routingwith the digital fountain solution versus code factorwhich is equal to l/code rate for networks with andwithout failures. It can be observed from this figure thatend-to-end file transfer delay of DB routing decreaseswhen the code factor increases from 1; then it increaseswhen the code factor reaches a certain optimal value.In addition, end-to-end packet delay always increaseswhen the code factor increases for both networks withand without failures. These results can be interpretedas follows. With the increase in the code factor, sourcenodes add increasing amount of redundant informationinto the original data. Addition of more redundant information would reduce the end-to-end file transfer delayfor low code factors because a receiver can reconstructan original file by using a smaller fraction of the encodedfile. However, adding more redundant information alsoincreases network load and, therefore, congestion inthe network which would potentially increase queueingdelay. Therefore, there exists an optimal code factorwhich depends on network load and failure rates. Incontrast, packet delay always increases with increasedcode factor because individual packets do not benefitfrom the use of coding.

In Fig. 6, we compare delay performances of DB andHybridDB routing algorithms with the digital fountainsolution for different values of code factors. This figureshows that by employing the digital fountain solution,HybridDB routing can further improve delay performance compared to the original DB routing. Also, thereexists an optimal code factor for the HybridDB routingwhich is smaller than the optimal code factor underthe DB routing. This is because the HybridDB routingalgorithm tends to use shorter routes which makes thenetwork "more congested" compared to the DB routingalgorithm. Finally, we compare delay performances ofDB and HybridDB routing algorithms with and withoutthe digital fountain solution versus failure rates in Fig.7. Again, HybridDB routing using the digital fountainsolution achieves the best file transfer delay performancecompared to other algorithms.

IV. CONCLUSIONS

In this paper, we investigated delay performances ofthe DB routing algorithm and its enhanced versions

for networks with link failures. Specifically, we haveshown through extensive numerical investigation that SProuting achieves good delay performance in the lownetwork load regime but it has very low throughputperformance compared to the DB routing algorithm. Inaddition, by combining the principles of both SP and DBrouting algorithms, HybridDB routing has better delayperformance than DB routing while achieving similarthroughput performance. Moreover, the digital fountainapproach can be combined with DB or HybridDB routingto further improve end-to-end file transfer delay. Finally,there exists an optimal code factor which results in theminimum end-to-end file transfer delay which dependson the network load and failure probability.

REFERENCES

[1] R. Bhandari, Survivable Networks: Algorithms for Diverse Routing, Kluwer Academic Publishers, 1999.

[2] E. Modiano and A. Narula-Tam, "Survivable lightpath routing:A new approach to the design of WDM-based networks," IEEEJ. Sel. Areas Commun., vol. 20, no. 4, pp. 800-809, 2002.

[3] A. Narula-Tam, E. Modiano, and A. Brzezinski, "Physical topology design for survivable routing of logical rings in WDM-basednetworks," IEEE J. Sel. Areas Commun., vol. 22, no. 8, pp. 15251538, 2004.

[4] Christian Huitema, Routing in the Internet, Prentice-Hall, Inc.,Upper Saddle River, NJ, USA, 1995.

[5] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,Introduction to Algorithms, Second Edition. The MIT Press,September 2001.

[6] L. Tassiulas and A. Ephremides, "Stability properties of constrained queueing systems and scheduling policies for maximumthroughput in multihop radio networks," IEEE Trans. Aut. Control, vol. 37, no. 12, pp. 1936-1948, Dec. 1992.

[7] M. J. Neely, E. Modiano, and C. E. Rohrs, "Dynamic powerallocation and routing for time varying wireless networks," IEEEJ. Sel. Areas Commun., vol. 23, no. 1, pp. 89-103, Jan. 2005.

[8] W. F. Khan, "Autonomous routing algorithms for networks withwide-spread failures: A case for differential backlog routing."Master thesis, Massachusetts Institute of Technology, 2008.

[9] M. Neely, "Delay analysis for max weight opportunistic scheduling in wireless systems," Allerton 2008, Sept. 2008.

[10] L. B. Le, K. Jagannathan, and E. Modiano, "Delay analysisof maximum weight scheduling in wireless ad hoc networks,"CISS'2009, Mar. 2009.

[11] A. Warrier, S. Janakiraman, and I. Rhee, "DiffQ: Practicaldifferential backlog congestion control for wireless networks,"INFOCOM 2009.

[12] J. W. Byers, M. Luby, and M. Mitzenmacher, "A digital fountainapproach to asynchronous reliable multicast," IEEE J. Sel. AreasCommun., vol. 20, no. 8, pp. 1528-1540, 2002.

[13] M. Luby, "Information additive code generator and decoder forcommunications systems," U.S. Pat. No. 307 487, Oct. 2001.

[14] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman,"Efficient erasure correcting codes," IEEE Trans. Inf. Theory, vol.47, pp. 569-584, Feb. 2001.

6 of 6

Autonomous routing algorithms for networks with wide ...modiano/papers/CV_C_123.pdf · Wajahat...

Documents

Transcript of Autonomous routing algorithms for networks with wide ...modiano/papers/CV_C_123.pdf · Wajahat...