Dual link failure resiliency through backup link mutual exclusion

13
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008 157 Dual-Link Failure Resiliency Through Backup Link Mutual Exclusion Srinivasan Ramasubramanian, Member, IEEE, and Amit Chandak Abstract—Networks employ link protection to achieve fast re- covery from link failures. While the first link failure can be pro- tected using link protection, there are several alternatives for pro- tecting against the second failure. This paper formally classifies the approaches to dual-link failure resiliency. One of the strategies to recover from dual-link failures is to employ link protection for the two failed links independently, which requires that two links may not use each other in their backup paths if they may fail simul- taneously. Such a requirement is referred to as backup link mu- tual exclusion (BLME) constraint and the problem of identifying a backup path for every link that satisfies the above requirement is referred to as the BLME problem. This paper develops the nec- essary theory to establish the sufficient conditions for existence of a solution to the BLME problem. Solution methodologies for the BLME problem is developed using two approaches by: 1) for- mulating the backup path selection as an integer linear program; 2) developing a polynomial time heuristic based on minimum cost path routing. The ILP formulation and heuristic are applied to six networks and their performance is compared with approaches that assume precise knowledge of dual-link failure. It is observed that a solution exists for all of the six networks considered. The heuristic approach is shown to obtain feasible solutions that are resilient to most dual-link failures, although the backup path lengths may be significantly higher than optimal. In addition, the paper illustrates the significance of the knowledge of failure location by illustrating that network with higher connectivity may require lesser capacity than one with a lower connectivity to recover from dual-link fail- ures. Index Terms—Backup link mutual exclusion, dual-link failures, link protection, optical networks. I. INTRODUCTION T HE ever-increasing transmission speed in the communi- cation networks calls for efficient fault-tolerant network design. Today’s backbone networks employ optical communi- cation technology involving wavelength division multiplexing (WDM). A link between two nodes comprises of multiple fibers carrying several tens of wavelengths with transmission speed on a wavelength at 40 Gb/s. Due to the large volume of information transported, it is necessary to reduce the resource unavailability Manuscript received October 28, 2004; revised March 27, 2006, and October 7, 2006; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor N. Taft. This work was supported by the National Science Foundation under Grant 0325979, Grant 0435490, and Grant EEC-0333046. Parts of this paper were previously published in the Proceedings of the Second International Conference on Broadband Networks (BROADNETS), Boston, MA, October, 2005. S. Ramasubramanian is with the Department of Electrical and Computer En- gineering, University of Arizona, Tucson, AZ 85721 USA (e-mail: srini@ece. arizona.edu). A. Chandak is with the Switch Fabric and Line Card Group, Cisco Systems, San Jose, CA 95134 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TNET.2007.900368 time due to failures. Hence, efficient and fast recovery tech- niques from node and link failures are mandated in the design of high-speed networks. As link failures are the most common case of the failures seen in the networks, this paper restricts its scope to link failures alone. Optical networks of today operate in a circuit-switched manner as optical header processing and buffering technolo- gies are still in the early stages of research for wide-scale commercial deployment. Protecting the circuits or connections established in such networks against single-link failures may be achieved in two ways: path protection or link protection. Path protection attempts to restore a connection on an end-to-end basis by providing a backup path in case the primary (or working) path fails. The backup path assignment may be either independent or dependent on the link failure in the network. For example, a backup path that is link-disjoint with the primary path allows recovery from single-link failures without the pre- cise knowledge of failure location. On the other hand, more than one backup path may be assigned for a primary path and the connection is reconfigured on the backup path corresponding to the failure scenario that resulted in the primary path failure. The former is referred to as failure-independent path protection (FIPP) while the latter is referred to as failure-dependent path protection (FDPP). Link protection recovers from a single link failure by rerouting connections around the failed link. Such a recovery may be achieved transparent to the source and destination of the connections passing through the failed link. Link protection at the granularity of a fiber switches all of the connections on a fiber to a separate (spare) fiber on the backup path. The time needed to detect the fault, communicate to the end-nodes, re-initiate connection requests on the backup paths, and recon- figure the switches at the intermediate nodes could sometimes cause the layers above the optical layer to resort to their own restoration solutions. Link protection reduces the communica- tion requirement as compared to path protection, thus providing fast recovery. However, the downside of link protection is that its capacity requirement is higher than that of path protection, specifically when protection is employed at the connection granularity [2]. Algorithms for protection against link failures have tradition- ally considered single-link failures [3]–[5] (for a detailed de- scription on protection approaches, refer to [6]). However, dual- link failures are becoming increasingly important due to two reasons. First, links in the networks share resources such as con- duits or ducts and the failure of such shared resources result in the failure of multiple links. Second, the average repair time for a failed link is in the order of a few hours to few days [6], and this repair time is sufficiently long for a second failure to occur. Al- though algorithms developed for single-link failure resiliency is 1063-6692/$25.00 © 2007 IEEE

description

 

Transcript of Dual link failure resiliency through backup link mutual exclusion

Page 1: Dual link failure resiliency through backup link mutual exclusion

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008 157

Dual-Link Failure Resiliency Through Backup LinkMutual Exclusion

Srinivasan Ramasubramanian, Member, IEEE, and Amit Chandak

Abstract—Networks employ link protection to achieve fast re-covery from link failures. While the first link failure can be pro-tected using link protection, there are several alternatives for pro-tecting against the second failure. This paper formally classifies theapproaches to dual-link failure resiliency. One of the strategies torecover from dual-link failures is to employ link protection for thetwo failed links independently, which requires that two links maynot use each other in their backup paths if they may fail simul-taneously. Such a requirement is referred to as backup link mu-tual exclusion (BLME) constraint and the problem of identifying abackup path for every link that satisfies the above requirement isreferred to as the BLME problem. This paper develops the nec-essary theory to establish the sufficient conditions for existenceof a solution to the BLME problem. Solution methodologies forthe BLME problem is developed using two approaches by: 1) for-mulating the backup path selection as an integer linear program;2) developing a polynomial time heuristic based on minimum costpath routing. The ILP formulation and heuristic are applied to sixnetworks and their performance is compared with approaches thatassume precise knowledge of dual-link failure. It is observed that asolution exists for all of the six networks considered. The heuristicapproach is shown to obtain feasible solutions that are resilient tomost dual-link failures, although the backup path lengths may besignificantly higher than optimal. In addition, the paper illustratesthe significance of the knowledge of failure location by illustratingthat network with higher connectivity may require lesser capacitythan one with a lower connectivity to recover from dual-link fail-ures.

Index Terms—Backup link mutual exclusion, dual-link failures,link protection, optical networks.

I. INTRODUCTION

THE ever-increasing transmission speed in the communi-cation networks calls for efficient fault-tolerant network

design. Today’s backbone networks employ optical communi-cation technology involving wavelength division multiplexing(WDM). A link between two nodes comprises of multiple fiberscarrying several tens of wavelengths with transmission speed ona wavelength at 40 Gb/s. Due to the large volume of informationtransported, it is necessary to reduce the resource unavailability

Manuscript received October 28, 2004; revised March 27, 2006, and October7, 2006; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor N.Taft. This work was supported by the National Science Foundation under Grant0325979, Grant 0435490, and Grant EEC-0333046. Parts of this paper werepreviously published in the Proceedings of the Second International Conferenceon Broadband Networks (BROADNETS), Boston, MA, October, 2005.

S. Ramasubramanian is with the Department of Electrical and Computer En-gineering, University of Arizona, Tucson, AZ 85721 USA (e-mail: [email protected]).

A. Chandak is with the Switch Fabric and Line Card Group, Cisco Systems,San Jose, CA 95134 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNET.2007.900368

time due to failures. Hence, efficient and fast recovery tech-niques from node and link failures are mandated in the designof high-speed networks. As link failures are the most commoncase of the failures seen in the networks, this paper restricts itsscope to link failures alone.

Optical networks of today operate in a circuit-switchedmanner as optical header processing and buffering technolo-gies are still in the early stages of research for wide-scalecommercial deployment. Protecting the circuits or connectionsestablished in such networks against single-link failures may beachieved in two ways: path protection or link protection. Pathprotection attempts to restore a connection on an end-to-endbasis by providing a backup path in case the primary (orworking) path fails. The backup path assignment may be eitherindependent or dependent on the link failure in the network. Forexample, a backup path that is link-disjoint with the primarypath allows recovery from single-link failures without the pre-cise knowledge of failure location. On the other hand, more thanone backup path may be assigned for a primary path and theconnection is reconfigured on the backup path correspondingto the failure scenario that resulted in the primary path failure.The former is referred to as failure-independent path protection(FIPP) while the latter is referred to as failure-dependent pathprotection (FDPP).

Link protection recovers from a single link failure byrerouting connections around the failed link. Such a recoverymay be achieved transparent to the source and destination ofthe connections passing through the failed link. Link protectionat the granularity of a fiber switches all of the connectionson a fiber to a separate (spare) fiber on the backup path. Thetime needed to detect the fault, communicate to the end-nodes,re-initiate connection requests on the backup paths, and recon-figure the switches at the intermediate nodes could sometimescause the layers above the optical layer to resort to their ownrestoration solutions. Link protection reduces the communica-tion requirement as compared to path protection, thus providingfast recovery. However, the downside of link protection is thatits capacity requirement is higher than that of path protection,specifically when protection is employed at the connectiongranularity [2].

Algorithms for protection against link failures have tradition-ally considered single-link failures [3]–[5] (for a detailed de-scription on protection approaches, refer to [6]). However, dual-link failures are becoming increasingly important due to tworeasons. First, links in the networks share resources such as con-duits or ducts and the failure of such shared resources result inthe failure of multiple links. Second, the average repair time fora failed link is in the order of a few hours to few days [6], and thisrepair time is sufficiently long for a second failure to occur. Al-though algorithms developed for single-link failure resiliency is

1063-6692/$25.00 © 2007 IEEE

Page 2: Dual link failure resiliency through backup link mutual exclusion

158 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

shown to cover a good percentage of dual-link failures [7]–[10],these cases often include links that are far away from each other.Considering the fact that these algorithms are not developed fordual-link failures, they may serve as an alternative to recoverfrom independent dual-link failures. However, reliance on suchapproaches may not be preferable when the links close to oneanother in the network share resources, leading to correlated linkfailures.

Dual-link failures may be modeled as shared risk link group(SRLG) failures. A connection established in the network maybe given a backup path under every possible SRLG failure. Thisapproach assumes a precise knowledge of failure locations to re-configure the failed connections on their backup paths. An alter-native is to protect the connections using link protection, whereonly the nodes adjacent to the failed link (and those involved inthe backup path of the link) will perform the recovery. The focusof this paper is to protect end-to-end connections from dual-linkfailures using link protection.

A. Dual-Link Failure Resiliency With Link Protection

Assume that two links, and , failed one after the other(even if they occur together, assume that one failed first fol-lowed by the other) in a network. The backup path of the firstfailed link is analogous to a connection (at the granularity of afiber) established between two nonadjacent nodes in the networkwith link removed. The connection is required to be protectedagainst a single-link failure. Therefore, strategies developed forprotecting connections against single link failures may be di-rectly applied for dual-link failures that employ link protectionto recover from the first failure.

Dual-link failure resiliency strategies are classified based onthe nature in which the connections are recovered from first andsecond failures. The recovery from the first link failure is as-sumed to employ link protection strategy. Fig. 1 shows an ex-ample network where link 1-2 is protected by the backup path1-3-4-2. The second protection strategy will refer to the mannerin which the backup path of the first failed link is recovered.

Link Protection—Failure Independent Protection (LP-FIP):One approach to dual-link failure resiliency using link protec-tion is to compute two link-disjoint backup paths for every link.Given a three-edge-connected1 network, there exists three link-disjoint paths between any two nodes [11]. Thus, for any twoadjacent nodes, there exists two link-disjoint backup paths forthe link connecting the two nodes. Let and denote thetwo link-disjoint backups for link . Without loss of generality,assume that, on failure of link , a connection routed along link

will be rerouted on . If any link in the backup path fails,the backup path of will be reconfigured to . Hence, the nodesconnected to link must have the knowledge of the failure in itsbackup paths (not necessarily the location).

Fig. 2 shows the backup paths assigned by LP-FIP for link 1-2under dual-link failures. The backup path is identical under anysecond failure that affects the path 1-3-4-2. When the secondfailure occurs, a failure notification must be sent to nodes 1 and2, although this notification need not explicitly mention whichlink failed in path 1-3-4-2.

1A k-edge-connected is simply referred to as a k-connected graph in the restof this paper.

Fig. 1. Link 1-2 protected by backup path 1-3-4-2 when failed.

Link Protection—Failure Dependent Protection (LP-FDP):For every second failure that affects the backup path, a backuppath under dual-link failure is provided. This backup path iscomputed by eliminating the two failed links from the networkand computing shortest path between the specific node pairs.Fig. 3 shows the backup path assigned for link 1-2 underdual-link failures. It may be observed that the backup pathassignment is different for different failures that affect the path.When a second link failure occurs, a failure notification must besent to node 1, explicitly mentioning the failure location in thepath 1-3-4-2. It is fairly straight forward to see that the averagebackup path length under dual-link failures using LP-FDP willbe lesser than that using LP-FIP. Every link is assigned onebackup path for single link failure and multiple backup paths(depending on the number of links in the backup path for thesingle link failure) under dual-link failures.

Link Protection—Link Protection (LP-LP): Notification ofthe second failed link to different nodes for them to reconfiguretheir backup paths may result in a high recovery time. In orderto avoid notification to the other nodes and reconfiguring atthe end of the paths, link protection may be adopted to recoverfrom the second link failure as well. Fig. 4 shows how thebackup path 1-3-4-2 is reconfigured after the second failure.Under this strategy, every link will have only one backup path(for all failure scenarios). In order for this strategy to work, thebackup path under the second failure must not pass throughthe first failed link! This constraint is referred to as the backuplink mutual exclusion (BLME) constraint. Note that the aboveapproach would fail if the backup path for link 3-4 is 3-1-2-4for the example in Fig. 4.

Assume we have a network , where denotes theset of nodes and denotes the set of links, and a set of dual-linkfailures where each element contains at most two linksthat can fail and that the failure does not disconnect the network.Then, for every link , identify a backup path , such that,if link is in and both links and can fail together, thenthe backup path of link does not use link .

The problem is concisely described as follows. Givenand , identify such that as

B. Background and Prior Work

A network must be three-connected for it to be resilient toany two arbitrary link failures, irrespective of the protectionstrategy employed. In [12] and [13], a heuristic solution to the

Page 3: Dual link failure resiliency through backup link mutual exclusion

RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME 159

Fig. 2. Dual-link failure resiliency using LP-FIP. Backup path after the second failure remains the same irrespective of the failure.

Fig. 3. Dual-link failure resiliency using LP-FDP.

Fig. 4. Dual-link failure resiliency using LP-LP.

BLME problem for arbitrary dual-link failures is developed.The heuristic solution, referred to as the Maximum ArbitraryDouble-Link Protection Algorithm (MADPA), is shown to findbackup paths satisfying the BLME constraint for all links in twonetworks out of the three networks considered in their paper;however, it is not guaranteed to find a solution even if one exists.In [14], a polynomial time algorithm is developed for solutionto the BLME problem considering only adjacent link failures.To the best of our knowledge, there is no prior work that estab-lishes the existence of a solution to the BLME problem.

C. Contribution

This paper develops the necessary theory to prove the suf-ficient conditions for the existence of a solution to the BLMEproblem. Solution methodologies to the BLME problem are de-veloped using two approaches by: 1) formulating the BLMEproblem as an integer linear program (ILP) and 2) developinga polynomial time heuristic algorithm. The tradeoffs involvedin solutions that have and do not have the precise knowledgeof the failure location are compared by applying the solutiontechniques to six networks. In addition, the paper also estab-lishes the potential benefits of a four-connected network over athree-connected network when the knowledge of the failure lo-cation is available. This paper assumes that every link employsone primary fiber and the failure recovery using link protectionis performed at the granularity of a fiber. Hence, issues relatingto specific connections flowing through a link (such as wave-length continuity constraint) are implicitly taken care of. It is

also assumed that the failure of a link results in the failure of allthe three fibers on that link.

D. Organization

The remainder of this paper is organized as follows. Section IIdevelops the necessary theory for the BLME problem and es-tablishes the sufficient condition for the existence of a solution.Section III describes the formulation of the BLME problem asan ILP with some specific insights into the objective functionand formulation for networks that may not be three-connected.Section IV develops a polynomial time approximation algo-rithm, called Iterative Minimum Cost Path heuristic. Section Vdiscusses the possible loop formation under the BLME ap-proach and identifies a solution to resolve such loop formations.Section VI describes the results obtained by applying the ILPformulation and heuristic to six networks. Section VII showsthe advantages of designing a four-connected network over athree-connected network given that the location of two failuresare known. Section VIII concludes the paper.

II. THEORY

Consider a network represented as a graph , whereand denote a set of nodes and undirected links, respectively.The nodes are numbered from 1 through . A link isassumed to be bi-directional. Let and denote the identifiersof the nodes connected by link such that . Letrepresent the set of directional links, or arcs, in the network. Anarc from node to is denoted as .

Page 4: Dual link failure resiliency through backup link mutual exclusion

160 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

Fig. 5. (a) Example network. (b) Network after failure of link 1.

The failure of link is assumed to affect the arcs in both direc-tions. Let denote the set of dual-link failures to be tolerated.An element consists of exactly two undirected links or,correspondingly, four directed arcs.

A. Sufficient Condition for Existence of a Solution

Three-edge-connectivity is a necessary condition for a net-work to be resilient to dual-link failures. It is also sufficient thata network is three-edge-connected in order to obtain a solutionfor BLME problem, proved as follows.

Assume that the given network is divided into auxiliarygraphs. An auxiliary graph is constructed by removing link

from the original network: . In eachauxiliary graph , the goal is to identify a path from node

to . Let be a binary variable that indicates whether linkis present in the backup path of link : 1 if true, 0 otherwise.Let be the Boolean function that denotes the connectivity

between nodes and in the auxiliary graph , representedas a function of the variable set .

Consider the example network shown in Fig. 5(a). The aux-iliary graph corresponding to link 1 is shown in Fig. 5(b). TheBoolean function representing the connectivity between nodesA and B is shown in (1) and (2) at the bottom of the page insum-of-product and product-of-sum forms, respectively.

It may be observed that is a unate function and, hence, hasa trivial solution. If the function evaluates to 1 (true) for some

input combination of , then there exists a pathbetween the nodes and in .

Observation 1: The connectivity functions and are in-dependent of each other for any two distinct links , i.e.,the functions and do not have any variables in common.

Observation 2: If for some , then the networkis one-edge connected. The failure of link disconnects the net-work. Conversely, if a network is at least two-edge-connected,then , .

The different connectivity functions are related to eachother through the BLME constraint. The BLME constraintcorresponding to a dual-link failure is written in thesum-of-product and product-of-sum forms, as shown (3) and(4), respectively.

The BLME problem is then written as a Boolean satisfiabilityproblem, denoted by , as shown in (5). It is observed that isa function of the set of variables . If theBoolean function is identically 0 for all input combinations,then the BLME problem does not have a feasible solution. If

evaluates to a non-zero function, then there exists an inputcombination for which the function evaluates to 1 (true).

Theorem 1: If a network is at least two-edge-connected and, then there exists a dual-link failure that discon-

nects the graph.Proof: Given that the network is at least two-edge-con-

nected, , . Clearly, , . Therefore:1) ;2) .

Hence, for , the conjunction of a combination of connec-tivity terms with one of the BLME constraints results in anidentically zero-function.

A dual-link failure scenario involving links and has theBLME constraint as shown in (3). If the BLME constraint corre-sponding to dual-link failure combines with the conjunction ofthe connectivity functions resulting in an identically zero-func-tion, then the conjunction of the connectivity terms must takethe form as shown in (6). Note that the first term of (6) cancelsthe BLME constraint involving the links and resulting in azero function for .

(1)

(2)

(3)

(4)

(5)

(6)

(7)

Page 5: Dual link failure resiliency through backup link mutual exclusion

RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME 161

Fig. 6. ILP formulation of backup path selection with BLME constraint.

For any two distinct links and , and are independentof each other. Hence, (6) implies that and must be of theform

The above equations imply that, upon failure of link , any pathfrom nodes and must traverse link and, on failure oflink , any path from to must traverse link . Linksand are mutually dependent on each other for their backuppaths. Therefore, the dual-link failure involving links anddisconnects the network.

Corollary: Given a three-edge-connected network, there ex-ists a solution to the BLME problem under any arbitrary twolink failures.

Proof: The corollary follows from Theorem 1 by consid-ering all dual-link combinations in . Assume that the BLMEproblem does not have a solution for a three-edge-connectednetwork. Hence, . By Theorem 1, there exists a dual-linkfailure that disconnects the network. However, no twolink failures can disconnect the network as the network is three-edge-connected, resulting in a contradiction. Hence, .Clearly, (identically 1 for all input combinations). Hence,for a three-edge-connected network, evaluates to a nontrivial(non-zero, non-unity) Boolean function and thus must evaluateto 1 for some input combination.

III. INTEGER LINEAR PROGRAM FORMULATION

The BLME problem is formulated as an Integer Linear Pro-gram (ILP) using undirected links. The central idea behind thisILP formulation is to view the network as distinct graphs.Each graph, denoted as , will provide a backup path for link .Equivalently, each graph will have a ring traversing throughlink . Let denote the existence of a failure such that

and ; 1 if true, 0 otherwise.Let be a binary variable that indicates whether link is

present in graph : 1 if present, 0 otherwise. Similarly, letbe a binary variable that indicates whether node is present ingraph or not: set to 1 if present, 0 otherwise. Let denotewhether node is attached to link or not: 1 if true, 0 otherwise.

The formulation of backup path selection for all links satis-fying the BLME constraint is shown in Fig. 6. The objectivefunction is set to minimize the sum of the backup path lengthsof all links or, equivalently, the average backup path length of a

link under a single-link failure. The average backup path lengthunder single-link failure, denoted by , is computed as

(8)

The constraints ensures that a graph must contain aring with link present in the ring by forcing the correspondinglink variables to take a value of 1. The constraint en-sures that, for two links and that belong to , if link ispresent in , then is not present in graph if the two links

and may be unavailable at the same time. Otherwise, such arestriction is not imposed.

The constraint RC ensures that every graph has a ring andevery node that is present in a graph must have exactly twooutgoing (or incoming) links. The above constraint introduces

additional variables to the formulation, however,they are strongly correlated to the link variables .

The variables employed in the formulation are limited to takebinary values using Bounds.

A. Restricting to at Most One Ring in a Graph

The ring constraints ensure that there will be at least one ringwhich includes a particular link in that graph, however, it doesnot ensure that there will be exactly one ring in the graph. Theformulation in Fig. 6 restricts the number of rings formed in agraph to at most one implicitly through the objective functionby optimizing on the number of links in the backup path.

There are scenarios in which the formation of exactly onering in a graph may need to be explicitly mentioned. First, forlarge problems, the ILP may not result in an optimal solution;however, it may provide intermediate feasible solutions that aresignificantly far away from optimal. Even if the objective func-tion implicitly restricts the formation of only one ring per graph,suboptimal solutions may still have multiple rings in a graph.Second, if the goal is to just obtain any feasible solution, thenone might specify an objective function that is simply a con-stant, such as . The objective function always takesthe value ; therefore, any feasible solution is an optimal solu-tion. Solutions with such objectives may also result in multiplerings in a graph.

A graph that has exactly one ring present in it has a path fromevery node in the graph to every other node in the graph. A graphthat has multiple node-disjoint rings would fail this requirement.Alternatively, if we elect at most one master node that is presentin the graph, then every node that is present in the graph musthave a path to the master node. Given the nature of formulation,

Page 6: Dual link failure resiliency through backup link mutual exclusion

162 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

Fig. 7. Flow constraints to ensure formation of single ring in a graph, if the objective function does not capture the requirement inherently.

it is obvious that a graph must at least have nodes andpresent in it. Let the master node be . Every node except themaster node is assumed to source one unit capacity of traffic thatsinks at the master node. The constraints shown in Fig. 7 achievethis purpose, where is an arbitrarily large number andis the capacity of flow routed on arc in graph . The con-straint SRC 1 specifies that the master node sinksflows. SRC 2 specifies that every node except the master mustsource one unit of flow if they are present, otherwise none. Theconstraint SRC 3 specifies that the paths found for these flowsmay use only those links that are present in the graph. Everylink present in the graph is assumed to have an arbitrarily largecapacity .

The variable is formulated as real-valued, however,it will be restricted to integers due to the above constraints. Asevery graph is assumed to have a counter-rotating ring present, itis sufficient to specify the constraint for one direction. Note thatan explicit representation of this constraint requires flow-basedformulation that requires consideration of “directional links.”

B. Formulation for Networks That are Not Three-Connected

For a network to be resilient to any two link failures, the net-work must be three-edge-connected. However, some real-lifenetworks may not be three-edge-connected; hence, certain dual-link failures may disconnect the network. For example, in theNJ-LATA network of Fig. 10(f), the network will be discon-nected under three dual-link failures involving link pairs 1 and2, 11 and 16, and 22 and 23. If a dual-link failure involving twolinks and disconnects the network, it will not be possible tofind backups for links and satisfying the BLME constraintas they will have to include each other. The BLME constraintis relaxed for these failure scenarios by not considering thesedual-link failures. In other words, is assigned 0 (assumednot to fail) even though the links can be unavailable at the sametime, if their joint failure will leave the network disconnected.

C. Minimizing Spare Capacity

If the objective is to minimize the spare capacity allocatedin the network, then links may be removed from the given net-work until the transformed network just meets the necessaryconditions for the existence of a solution. For a given network

, let denote the number of link-disjoint paths be-tween the nodes and . If there exists more than three link-dis-joint paths, then is truncated to three, as three-edge-con-nectivity is a sufficient condition for the existence of a solutionto the BLME problem. Let denote the transformedgraph, where , such that the connectivity between allnode pairs is maintained in graph . Given , the spare ca-pacity on a link may be computed as follows: 1) if

a link is required to maintain three-connectivity between twonode pairs in , then it requires two spare fibers; 2) if a link isrequired to maintain two-connectivity between two node pairs,then it requires one spare fiber; and 3) if a link is required tomaintain (one-)connectivity between two node pairs, it does notrequire any spare fibers. The links removed from the given net-work need not be equipped with spare fibers.

If a given network is three-edge-connected, then the objec-tive is to keep the graph three-edge-connected with the min-imum number of links, which is a well-known to be NP-Hard.In such a reduced network, every link will employ two sparefibers. When a network is less than three-edge connected, thena link may still require two spare fibers even though a dual-linkfailure involving may disconnect the graph. For example, con-sider the NJ-LATA network shown in Fig. 10(f). There are threedual-link failures that disconnect the graph. However, links 1and 2 are required to maintain three-connectivity between nodes2 and 3. Hence, two spare fibers are required on links 1 and 2.On the other hand, the removal of one of the other four links (11,16, 22, and 23) does not violate the three-connectivity of othernodes. Hence, links 11, 16, 22, and 23 may be equipped withonly one spare fiber as long as they are not required to main-tain the connectivity between the node pairs when other linksare removed. Note that if link 20 is removed, then links 22 and23 are required to maintain three-connectivity between nodes 9and 10, hence will require two fibers on each. The increase inspare capacity required in other links due to removal of a linkdepends on the network topology. It is not necessary that min-imizing the spare capacity implies minimizing in a graphthat is less than three-connected (or vice versa). If a networkis at least three-connected, then minimizing spare capacity isequivalent of minimizing the number of links to keep the graphthree-connected.

IV. HEURISTIC APPROACH

As ILP solution times for large networks may be prohibitivelyhigh, a heuristic approach is also developed. The heuristic solu-tion is based on iterative computation of minimum cost routing.The network is treated as an undirected graph . A set of auxil-iary graphs corresponding to failure of a link is created:

. In each auxiliary graph , the objec-tive is to obtain a path between the nodes that were originallyconnected by link . Let denote the path selected in auxiliarygraph . If a link is a part of the path selected on graph ,then the path in graph must avoid the use of link . This isaccomplished by imposing a cost on the links in the auxiliarygraphs and having the path selection approach select the min-imum cost path. Let denote the cost of link on graphsuch that it indicates that graph contains link and the two

Page 7: Dual link failure resiliency through backup link mutual exclusion

RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME 163

Fig. 8. Steps involved in the IMCP heuristic solution.

links and may be unavailable simultaneously. Hence, thecost values are binary in nature.

The cost of a path in an auxiliary graph is the sum of the costof links in it. At any given instant during the computation, thetotal cost of all the paths is the sum of the cost of the pathsacross all auxiliary graphs. It may be observed that the total costmust be an even number, as every link in a path that has acost of 1 implies that link in path would also have a costof 1. For a given network, the minimum value of the total costwould then be two times the number of dual-link failure sce-narios that would have the network disconnected. If denotesthe number of dual-link failure scenarios that would disconnectthe graph, then the termination condition for the heuristic isgiven by . Fig. 8 shows the steps involved in the Iter-ative Minimum Cost Path (IMCP) heuristic.

The complexity of an iteration of the IMCP heuristic isdictated by the (backup) path-selection step (Step 4.2.), whosecomplexity is . The complexity of an iter-ation of IMCP (decided by Step 4) is .The number of iterations required to obtain a solution dependson the connectivity of the network. Although there is a termi-nating condition specified for this algorithm, the algorithm isnot guaranteed to terminate for certain networks. Hence, onecan limit the number of iterations to be performed to a fixedthreshold. If the algorithm is restricted to a maximum ofiterations, the IMCP heuristic is pseudopolynomial with a worstcase complexity of .

Note that the weights of the links in the auxiliary graphs areinitialized to zero (Step 2 of the IMCP heuristic). The weightsmay be initialized to a small positive value to obtain backuppaths with shorter path lengths. Such a modification, however,would result in a trade-off between the average backup pathlength and the number of dual-link failures tolerated.

V. LOOP FORMATION

The backup path of a link after its failure is analogous toa connection established in a network which is protected usinglink protection (for the second failure). Hence, all of the prop-erties of a link protection strategy for a connection in a regularnetwork is valid in dual-link failure resiliency using link protec-tion. Loop formation is one among them!

Fig. 9. Illustration of loop formation in dual-link failure resiliency with BLME.A solid line indicates a single-hop path. A dotted line indicates a single- ormulti-hop path. (a) Looping with a link traversed in opposite directions. (b)Looping with links 2-3 and 6-7 are traversed twice in the same direction, thusrequiring pruning.

Consider a failed link (connecting nodes 1 and 8) whosebackup is established along a path where the nodes in the pathare numbered from 1 through 8. Fig. 9 shows two kinds of loopformation.

In Fig. 9(a), link 4-5 is protected by the backup path 4-7-6-5.Upon failure of link 4-5, the backup between nodes 1 and 8 ismodified as 1-2-3-4-7-6-5-6-7-8, resulting in the loop 7-6-5-6-7.While this loop could be pruned using signaling, it is only nec-essary to reduce path delay. The backup path for 4-5 will routeboth its primary fiber and the secondary fiber of link 1-8 throughthe path 4-7-6-5, hence two spare fibers will be used in each ofthese links. The loop formation involves the same links, howeverthe links are traversed in opposite direction. As every link needsto be equipped with two spare fibers in each direction (for bi-di-rectional connectivity), there will not be a resource contention.

Fig. 9(b) shows another kind of loop formation where there isa potential resource contention. The backup path of link 4-5 is4-2-3-6-7-5. Upon failure of link 4-5, the backup between nodes1 and 8 is modified as 1-2-3-4-2-3-6-7-5-6-7-8, resulting in twoloops 2-3-4-2 and 7-5-6-7. Note that links 2-3 and 6-7 are tra-versed in the same direction. If such a loop formation is allowed,then links 2-3 and 6-7 must be equipped with three spare fibersas the backup path for link 4-5 would be switching two fibers.As the network is assumed to have at most two link failures,it must be sufficient to equip every link with two spare fibers.

Page 8: Dual link failure resiliency through backup link mutual exclusion

164 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

Fig. 10. Networks considered for performance evaluation. (a) ARPANET (20 nodes, 32 links). (b) NSFNET (14 nodes, 23 links). (c) Node-16 (16 nodes, 24 links).(d) Node-28 (28 nodes, 42 links). (e) Mesh-4 � 4 (16 nodes, 32 links). (f) NJ-LATA (11 nodes, 23 links).

Therefore, if the links have only two spare fibers, pruning of thebackup paths cannot be avoided. The pruned backup path be-tween nodes 1 and 8 after link 4-5 failure would be 1-2-3-6-7-8,while the backup path of link 4-5 would be 4-2-3-6-7-5.

The looping problem described here is similar to those en-countered in any link protection mechanism. The paths can bepruned using a signaling mechanism that would be required toestablish the backup paths. Note that the signaling cannot becompletely avoided as a link can serve as backup for more thantwo other links, hence protection switches cannot be configuredprior to failure.

VI. PERFORMANCE EVALUATION

The performance of the ILP and heuristic algorithm de-veloped in this paper are evaluated by applying them to sixnetworks as shown in Fig. 10: (a) ARPANET, (b) NSFNET,2(c) Node-16, (d) Node-28, (e) Mesh-4 4, and (f) NJ-LATA. Allnetworks except NJ-LATA are three-connected. The NJ-LATAnetwork is not three-connected as nodes 1, 6, and 11 havedegree 2 and is considered “as is” for performance evaluation.The formulation for the NJ-LATA network has been modifiedas outlined in Section III-B. The Node-16 and Node-28 net-works are hypothetical networks used to test the limits of theILP. All of the nodes in these two networks have exactly threelinks connected to them, thus these two networks are minimally3-connected.

2The NSFNET network considered here has been modified from the originalnetwork with the addition of link numbered 23 to keep the network three-con-nected.

Dual-link failure scenarios occur in networks due to two rea-sons, as mentioned in Section I. First, link resources such as con-duit or duct are shared by multiple links for ease of layout. Suchsharing of resources is typically limited to links that are close toeach other, such as adjacent links. Hence, dual-link failure sce-narios under such shared resource failure typically affect onlynearby links. The second case of dual-link failure scenario isdue to the time required to repair a failed link. Before a failedlink is repaired, another link in the network could fail; however,such failures are typically rare. If it can be assumed that mostof the dual-link failures may be caused because of failures ofshared resources, then it is of interest to identify backup pathassignments by considering only failures of nearby links.

Three kinds of dual-link failures are considered: 1) any arbi-trary two link failures; 2) links that are one node away; and 3)links that are two nodes away. Note that any dual-link failurethat will disconnect the network is not considered in computingthe number of failures that can be tolerated.

A. Performance Metrics

The performance metrics considered specifically for the ILPsolutions are: 1) solution time and 2) optimality bound. The op-timality bound is relevant in scenarios where the ILP could notobtain optimal solution, but has a feasible solution with a knownbound on optimality. The ILP is solved using the CPLEX 8.1solver [15] on a single-processor Pentium4 2.53 GHz computerwith 512 MB of RDRAM. The optimality bounds reported inthis paper are those provided by the CPLEX solver. The metricthat is considered specifically for heuristic is the number of

Page 9: Dual link failure resiliency through backup link mutual exclusion

RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME 165

TABLE IILP RESULTS FOR TOLERATING ANY TWO-LINK FAILURES

dual-link failures that can be tolerated, as the heuristic is notguaranteed to recover from all dual-link failures.

In addition to the metrics that are specific to the ILP andheuristic, certain common metrics for both the approaches arealso evaluated: 1) average and maximum backup path lengthfor a single link failure; 2) average and maximum backup pathlength under dual-link failure scenario; and 3) total spare ca-pacity required.

The backup path length under a single link failure scenariois simply the average of the length of the backup paths of allthe links. If denotes the length of the backup path of link ,then the average backup path length under a single link failurescenario, denoted by , is obtained as

(9)

The maximum backup path length under single-link failure sce-nario is obtained as

(10)

The backup path of a link under dual-link failure scenario iscomputed as: 1) the backup path length of the link alone if thebackup does not use the second link and 2) it is computed afterpruning if the backup employs the second failed link. Letdenote the length of the backup path of link when links and

have failed. The average and maximum backup path lengthunder dual-link failures, denoted by and , respectively,are computed as

(11)

(12)

The additional capacity requirement is computed as discussedin [12] as follows.

• If a link is not in the backup of any other link, i.e.,, then no additional capacity is required on that

link.• A link will require 200% additional capacity under two

conditions: 1) link appears in the backup path for twolinks and that could be unavailable simultaneously,i.e.,

or 2) link is in the backup of at least one link , link isalso in the backup of at least one link , and links and

can both be unavailable at any given time as

• In all other cases, the link requires one additional fiber,which is a 100% additional capacity.

B. ILP Results

Table I shows the results for the six networks to be resilient toany arbitrary two-link failure obtained using ILP with the objec-tive to optimize the average backup path length under single-linkfailure scenario. The CPLEX program terminated due to insuffi-cient memory for Node-28 and Mesh-4 4 networks. While thisis indicative of the complexity of the problem, feasible solutionswere obtained as intermediate values. The best value obtainedbefore termination is reported for these networks. It is observedthat the solution time increases with increase in the network sizebut decreases with an increase in average node degree. Note thatNSFNET and NJ-LATA networks both have 23 links, however,the solution times are significantly different due to their connec-tivity. For scenarios where an optimal solution is not found, thevalue of optimality bound indicates the worst case deviation ofbest value from the optimal. It is to be noted that, although anoptimal solution may not be obtained, a feasible solution is ob-tained for all the networks considered, confirming the existenceof a solution.

It is observed that a 200% additional capacity (two sparefibers) is required in all of the links of all of the networks exceptNJ-LATA. Such a requirement can be immediately deducedfrom the connectivity of the network. For example, whenevera link is necessary3 to keep the network three-connected, thensuch a link must have two spare fibers. Thus, the networksNode-16 and Node-28 will require 200% additional capacityeven when only adjacent links may fail together. Such a 200%requirement in capacity may be reduced only on those linkswhose removal does not affect the three-connectivity prop-erty of the network. For example, the link between WA andUT in the NSFNET may be removed without affecting thethree-connectivity property of the network. However, such asolution would have an increased average backup path length.For the NJ-LATA network, two of the three link pairs whose

3Removal of the link will result in the network not being three-connected.

Page 10: Dual link failure resiliency through backup link mutual exclusion

166 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

TABLE IIILP RESULTS FOR TOLERATING ANY TWO-LINK FAILURES WITH THE OBJECTIVE FUNCTION SET TO BE CONSTANT

(INDEPENDENT OF THE VARIABLES) TO OBTAIN ANY FEASIBLE SOLUTION

TABLE IIIILP RESULTS FOR TOLERATING TWO ADJACENT LINK FAILURES

failure disconnects the network were not present in more thanone backup path, thus a reduction of four fibers was obtained.The high connectivity in the NJ-LATA network results in thisreduction even when the objective function is not set to mini-mize capacity, which is purely coincidental. Such a reductioncannot be guaranteed for all networks. It is also observed thatthe average backup path length under dual-link failures (forNode-16 and Node-28 networks) may be lower than the averagebackup path length under single-link failure scenarios due topath pruning.

Typical of any ILP formulation, the solution time increaseswith the network size. In order to obtain any solution that wouldsatisfy the constraints, the ILP is solved with an objective func-tion whose value is a constant, such as . The resultsfor the six networks are shown for this scenario in Table II. Itis observed that the solution times are significantly reduced ascompared with that needed to solve for minimizing the averagebackup path length. The deviation of the average backup pathunder single-link failures computed with the trivial objectivefunction from the best/optimal values shown in Table I is ob-served to be between 23% to 68% for the different networks.These results indicate that attempting to find any solution at thecost of reduced solution time will result in backup paths whoseaverage length is significantly higher than that of the optimal.It is also observed that the average backup path length underdual-link failures is lesser than that under single-link failuresfor more networks, which is a result of path pruning.

Table III shows the results obtained from ILP consideringonly adjacent link failures. The number of dual-link failure sce-narios in such a case decreases, thus leading to a significant im-provement in solution time and average hop length. The CPLEXprogram terminated due to insufficient memory for Node-28network. The best integer solution obtained at the time of ter-mination is reported in the table. The backup path length undera single link failure scenario is considerably reduced when com-pared to the results for any arbitrary link failures (10.191 to6.786 for Node-28).

The backup path length under dual-link failures decreasesfor adjacent failures compared with arbitrary failures, and sodoes the number of dual-link failures in the network. If the re-duction in the backup path length dominates the reduction inthe number of dual-link failures, then the average backup pathlength for adjacent link failures will be lesser than that of arbi-trary link failures, otherwise it will be higher. The former is thecase for Node-16 and Node-28 networks but latter is the casefor NSFNET, ARPANET, NJ-LATA networks. Mesh-4 4 re-mains unaffected.

C. Heuristic Results

Tables IV–VI show the results obtained using the IMCPheuristic approach for tolerating any arbitrary two link failures,dual-link failure within a distance of two, and two adjacentlink failures, respectively. Interestingly, this simple heuristicobtains a solution that can tolerate most dual-link failures for

Page 11: Dual link failure resiliency through backup link mutual exclusion

RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME 167

TABLE IVPERFORMANCE COMPARISON OF IMCP HEURISTIC AND LP-FDP FOR TOLERATING ANY TWO LINK FAILURES

TABLE VPERFORMANCE COMPARISON OF IMCP HEURISTIC AND LP-FDP FOR RECOVERING FROM DUAL-LINK FAILURES WITHIN DISTANCE 2

TABLE VIPERFORMANCE COMPARISON OF IMCP HEURISTIC AND LP-FDP FOR TOLERATING TWO ADJACENT LINK FAILURES

the networks considered; the solution cannot recover from fourdual-link failures for Node-28 for any arbitrary two link failuresand one for any adjacent failures.

The heuristic produces a solution in relatively less number ofiterations for five of the six scenarios. A maximum of 30 iter-ations were performed. While the objective of the heuristic isto obtain a feasible solution, it is not guaranteed to find a solu-tion (as seen in the Node-28 network scenario for any arbitrarytwo link failure scenario). The results presented in this paper areobtained by iterating over a given ordered set of links. In ourstudy where we considered iterating the auxiliary graphs in thereverse order, the heuristic solution could not find a feasible so-lution for Node-16 and Node-28 network scenarios for arbitrary

two link failure scenarios. The number of iterations required toarrive at the solution depends on a lot of parameters, specificallythe order in which the auxiliary graphs are considered and theweights employed. Comparing the results of the heuristic to thatof the ILP, it is observed that the heuristic can be as bad as 60%above optimal for average backup path lengths.

Comparing with the MADPA heuristic in [12] and [13], theIMCP heuristic obtains a solution that recovers from all dual-link failures for ARPANET while MADPA tolerates only 490(out of 496) dual-link failure combinations. Comparison withthe average path length obtained by MADPA is not performedas the authors of the work indicated that the results presented in[12] and [13] were obtained before pruning.

Page 12: Dual link failure resiliency through backup link mutual exclusion

168 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 1, FEBRUARY 2008

D. Comparison With LP-FDP

The tables also compare the heuristic results (LP-LP) withthe LP-FDP approach which assigns failure-dependent backuppaths for every link. First, LP-FDP recovers from all dual-linkfailures that do not disconnect the network. It can be observedthat the benefits of knowing the link failures is significantly highunder dual-link failure scenarios. The optimal (or best) valuesof the backup path lengths obtained by optimizing the averagebackup path lengths under LP-LP are found to be twice as highas the backup path lengths obtained under LP-FDP. Note thatthis increased backup path length for LP-LP may also result inan increased recovery time from the second failure compare toLP-FDP. While the significant advantages of employing LP-LPrecovery strategy in the network is that every link will have onlyone backup path and the failure need not be explicitly notifiedto nodes that are not present in the backup path, the increasedbackup path length under both single- and dual-failure scenariosmay pose significant operational challenges.

VII. CAPACITY OPTIMIZATION FOR DUAL-LINK FAILURE

RESILIENCY IN HIGHLY CONNECTED NETWORKS

The ILP developed in this paper has an objective of reducingthe average path length while the IMCP heuristic attempts tojust find any feasible solution. Often network designers are in-terested in minimizing network capacity. While the ILP couldbe modified to minimize total capacity, a direct way to reducethe capacity would be to reduce the network to a minimallythree-connected network with every link having one workingand two spare fibers. This capacity calculation is only appli-cable for the LP-LP strategy. The ILP or IMCP heuristic maybe employed on the minimally three-connected graph to obtainbackup paths. The average backup path lengths (under singleand double link failures) will increase as the reduced networkwill have lower connectivity.

It is possible to reduce the network capacity given the preciseknowledge of the dual-link failure (even under any two arbitrarylink failures). For example, assume that every link is assigneda backup path for every dual-link failure (similar to LP-FDP).Under a dual-link failure involving links and , if the backuppaths (for ) and (for ) are link-disjoint, then every linkon these two backup paths will require only one spare fiber forthis joint failure. If such a requirement is satisfied for all dual-link failures, then every link requires only one spare capacity.If a network is minimally three-connected, then every link willrequire two spare fibers.

If a network is (even minimally) four-connected, then theabove requirement can be satisfied for all the links in the net-work, as shown by the following theorem.

Theorem: Given a four-connected network witha dual-link failure involving any two arbitrary links (con-necting nodes and ) and (connecting and ), thereexists backup paths for link and for link such that

and are link-disjoint.Proof: After two link failures, the resultant network

is either: 1) at least three-connectedor 2) two-connected.

Case 1: If is three-connected, then by a variant ofMenger’s theorem [11] there exists three link-disjoint pathsfrom a source to three different destinations , , and .

Fig. 11. A four-connected network where removal of links ` and ` results ina two-connected network.

Setting , , , and , there existsthree link-disjoint paths , , and .By combining the paths , and , a path

(through ) is obtained that is link-disjoint with.

Case 2: If is two-connected, there exists a cut-set4 in thegraph as shown in Fig. 11.

Given that the original network is four-connected, there existsfour link-disjoint paths between any two nodes. Therefore, thereexists four link-disjoint paths from to . The four link-dis-joint paths must be of the form

As the above four paths between and are link-disjoint,the paths , , and must also bemutually link-disjoint. By combining the paths and

, a path can be formed that is link-disjointwith the path . Using a similar argument, two link-dis-joint paths and may be obtained.

Note that the paths and are obviouslylink-disjoint with the paths and as they be-long to two disjoint cut-sets. Using the above link-disjoint paths,backup paths for links and may be obtained as

The above two backup paths are link-disjoint as the individualpath segments are mutually link-disjoint.

Incorporating such a constraint into LP-FDP will result inincreased backup path lengths and requires further study. Thecapacity savings obtained in the network with the knowledge offailure locations could be significant. For example, consider anetwork with nodes (assume to be even) that can be eitherminimally three-connected or minimally four-connected. Fora network to be minimally three (four)-edge-connected, everynode must have a degree of at least three (four). The minimumnumber of links required to form a minimally three-connectednetwork is . These links must be equipped with one

4The figure depicts only the nodes and links that are of interest. Some nodesand links that lie within the sets to keep the network four-connected are notshown.

Page 13: Dual link failure resiliency through backup link mutual exclusion

RAMASUBRAMANIAN AND CHANDAK: DUAL-LINK FAILURE RESILIENCY THROUGH BLME 169

working fiber and two spare fibers to recover from any two linkfailures (irrespective of whether failure locations are knownor not). Thus, a total of capacity is required. On theother hand, the minimum number of links required to form aminimally four-connected graph is , where each link willbe equipped with one working and one spare fiber. Thus, atotal of capacity is required.5 The MESH-4 4 network(16 nodes and 32 links) considered in this study is a minimallyfour-connected network that can be reduced to a minimallythree-connected network by removing eight links. The capacityprovisioned under minimally three-connected and minimallyfour-connected scenarios would be 72 (24 working 48 spare)and 64 (32 working 32 spare) fiber links, respectively.

The advantages of a minimally four-connected networkover a minimally three-connected network are: 1) reductionin total capacity by up to 11% (and possibly associated linkcosts like line amplifiers); 2) higher connectivity (resulting inshorter paths, faster recovery notification); and 3) up to 33%increased primary capacity even with the 11% reduction intotal capacity. The drawbacks of the four-connected networkare: 1) possible increase in network cost due to longer links;2) increased switching requirement at nodes (due to increasein node degree); and 3) increased link failure rate due to in-crease in the number of links by up to 33%. Although precisequantification of the tradeoffs involved in the design will varyamong networks, it is worth noting that designing a networkwith increased network connectivity may not necessarily be anexpensive proposition.

VIII. CONCLUSION

This paper formally classifies the approaches for providingdual-link failure resiliency. Recovery from a dual-link failureusing an extension of link protection for single link failureresults in a constraint, referred to as BLME constraint, whosesatisfiability allows the network to recover from dual-linkfailures without the need for broadcasting the failure location toall nodes. The paper develops the necessary theory for derivingthe sufficiency condition for a solution to exist, formulates theproblem of finding backup paths for links satisfying the BLMEconstraint as an ILP, and further develops a polynomial timeheuristic algorithm. The formulation and heuristic are appliedto six different networks and the results are compared. Theheuristic is shown to obtain a solution for most scenarios with ahigh failure recovery guarantee, although such a solution mayhave longer average hop lengths compared with the optimalvalues. The paper also establishes the potential benefits ofknowing the precise failure location in a four-connected net-work that has lower installed capacity than a three-connectednetwork for recovering from dual-link failures.

REFERENCES

[1] A. Chandak and S. Ramasubramanian, “Dual-link failure resiliencythrough backup link mutual exclusion,” in Proc. IEEE Int. Conf. Broad-band Networks, Boston, MA, Oct. 2005, pp. 258–267.

5Note that this computation does not take into account the lengths of the link.The cost of increasing the connectivity could be higher if the additional linksare longer, resulting in lower capacity savings.

[2] J. Doucette and W. D. Grover, “Comparison of mesh protection andrestoration schemes and the dependency on graph connectivity,” inProc. 3rd Int. Workshop Design of Reliable Communication Networks(DRCN 2001), Budapest, Hungary, Oct. 2001, pp. 121–128.

[3] M. Medard, S. G. Finn, and R. A. Barry, “WDM loop-back recoveryin mesh networks,” in Proc. IEEE INFOCOM, 1999, pp. 752–759.

[4] S. S. Lumetta, M. Medard, and Y. C. Tseng, “Capacity versus ro-bustness: A tradeoff for link restoration in mesh networks,” J. Lightw.Technol., vol. 18, no. 12, pp. 1765–1775, Dec. 2000.

[5] G. Ellinas, G. Halemariam, and T. Stern, “Protection cycles in WDMnetworks,” IEEE J. Sel. Areas Commun., vol. 8, no. 10, pp. 1924–1937,Oct. 2000.

[6] W. D. Grover, Mesh-Based Survivable Networks: Options and Strate-gies for Optical, MPLS, SONET and ATM Networking. Upper SaddleRiver, NJ: Prentice-Hall, 2003.

[7] M. Fredrick, P. Datta, and A. K. Somani, “Sub-graph routing: A novelfault-tolerant architecture for shared-risk link group failures in WDMoptical networks,” in Proc. 4th Int. Workshop Design of Reliable Com-munication Networks (DRCN 2003), Banff, AB, Canada, Oct. 2003,pp. 296–303.

[8] M. Clouqueur and W. Grover, “Availability analysis of span-restor-able mesh networks,” IEEE J. Sel. Areas Commun., vol. 20, no. 4, pp.810–821, May 2002.

[9] M. Clouqueur and W. D. Grover, “Mesh-restorable networks withcomplete dual-failure restorability and with selectively enhanceddual-failure restorability properties,” in Proc. OPTICOMM, 2002, pp.1–12.

[10] J. Doucette and W. D. Grover, “Shared-risk logical san groups in span-restorable optical networks: Analysis and capacity planning model,”Photon. Netw. Commun., vol. 9, no. 1, pp. 35–53, Jan. 2005.

[11] J. A. Bondy and U. S. R. Murthy, Graph Theory With Applications.New York: Elsevier, 1976.

[12] H. Choi, S. Subramaniam, and H. Choi, “On double-link failure re-covery in WDM optical networks,” in Proc. IEEE INFOCOM, 2002,pp. 808–816.

[13] H. Choi, S. Subramaniam, and H. Choi, “Loopback recovery fromdouble-link failures in optical mesh networks,” IEEE/ACM Trans.Netw., vol. 12, no. 6, pp. 1119–1130, Dec. 2004.

[14] H. Choi, S. Subramaniam, and H.-A. Choi, “Loopback recovery fromneighboring double-link failures in WDM mesh networks,” Inf. Sci. J.,vol. 149, no. 1, pp. 197–209, Jan. 2003.

[15] CPLEX Solver. [Online]. Available: http://www.cplex.com

Srinivasan Ramasubramanian (S’99–M’02)received the B.E. (Hons.) degree in electrical andelectronics engineering from Birla Institute of Tech-nology and Science (BITS), Pilani, India, in 1997,and the Ph.D. degree in computer engineering fromIowa State University, Ames, in 2002.

Since August 2002, he has been an Assistant Pro-fessor with the Department of Electrical and Com-puter Engineering, University of Arizona, Tucson. Heis a codeveloper of the Hierarchical Modeling andAnalysis Package (HIMAP), a reliability modeling

and analysis tool, which is currently being used at Boeing, Honeywell, andseveral other companies and universities. His research interests include archi-tectures and algorithms for optical and wireless networks, multipath routing,fault tolerance, system modeling, and performance analysis. He has served asthe TPC Co-Chair of BROADNETS 2005 conference and is an editor of theSpringer Wireless Networks Journal.

Amit Chandak received the B.Tech. degree in elec-trical engineering from the Indian Institute of Tech-nology, Bombay, India, in 2002, and the M.S. degreein electrical and computer engineering from the Uni-versity of Arizona, Tucson, in 2004.

From April 2005 to February 2007, he was aNetwork Software Engineer with Intel Corporation,where he was involved with the fields of networkprocessors and virtualization technology. SinceFebruary 2007, he has been a Software Engineerwith Cisco Systems, San Jose, CA, where he is part

of the Switch Fabric and Line Card group for CRS-1 platform. His interestsinclude networking protocols, embedded systems, and fault tolerance.