Design, Implementation and Evaluation of an IP Fast ReRoute ...



Budapest University of Technology and Economics

Faculty of Electrical Engineering and Informatics

Dept. of Telecommunications and Media Informatics

Design, Implementation and Evaluation of an

IP Fast ReRoute Prototype

Authors:

Péter Szilágyi, Zoltán Tóth

Supervisors:

Gábor Enyedi (BME–TMIT), Gábor Rétvári (BME–TMIT), András Császár (Ericsson Hungary Ltd.)

Budapest, 2008


Contents

Abstract
Kivonat (Abstract in Hungarian)
1 Introduction
2 Failure recovery in traditional IP networks
    2.1 Routing protocols
    2.2 Open Shortest Path First (OSPF)
    2.3 Failure handling and recovery using OSPF
3 IP Fast ReRoute
    3.1 Equal Cost Multiple Path (ECMP)
    3.2 Loop-free Alternates (LFA)
    3.3 Multiple Routing Configuration (MRC)
    3.4 Failure Insensitive Routing (FIR)
    3.5 Not-via
        3.5.1 Not-via addresses
        3.5.2 Repair path computation
        3.5.3 The LAN problem
    3.6 Lightweight Not-via
        3.6.1 Introduction
        3.6.2 Redefining the semantics of not-via addresses
        3.6.3 Repair path computation
        3.6.4 Removing corner cases
    3.7 Summary
4 Design
    4.1 Overview
    4.2 Notations and terminology
    4.3 Routing table calculations
        4.3.1 Not-via
        4.3.2 Lightweight Not-via
    4.4 Routing table management
    4.5 Transient-to-persistent failure switch-over
5 Prototype implementation
    5.1 Architecture
    5.2 Software components
    5.3 Address management
        5.3.1 Not-via
        5.3.2 Lightweight Not-via
    5.4 Messages and states
    5.5 Pitfalls
6 Evaluation
    6.1 Testbed
    6.2 Failure recovery time
        6.2.1 Not-via
        6.2.2 OSPF
        6.2.3 Results
    6.3 Management cost
7 Conclusion


Abstract

The emerging services of the Internet are increasingly turning into an integral part of our economy, science and everyday life. New real-time applications that have appeared in the past few years, such as IP telephony, video telephony and IPTV, have gained mainstream usage and success, and are now beginning to stress IP networks by demanding high Quality of Service (QoS). However, these requirements usually cannot be met by contemporary IP networks. One reason, among others, is the obsolete resilience mechanism of pure IP-based networks, which causes significant packet loss and intolerable delay in case of a failure.

Traditional IP routing protocols (such as OSPF) use a global, reactive approach to failure recovery, which leads to slow reconvergence and may even cause transient routing loops. To overcome these limitations, the IETF has developed a fast, full-IP resilience mechanism, the IP Fast ReRoute (IPFRR) framework. IPFRR adopts a local and proactive failure mitigation scheme: only routers adjacent to the failure are involved in the repair process, rerouting packets so that they avoid the failed component. Backup paths are computed and installed in advance, thus reducing the repair process to a routing change that can be performed locally.

In this work, our aim was to compare IPFRR to standard IP resilience. We examined two IPFRR methods: Not-via, an IETF draft, and the emerging Lightweight Not-via, a revised version of the former method developed at our department. Our motivation was to confirm that (1) IPFRR is substantially faster in repairing failures, and (2) the revised Not-via is more efficient and carries less management burden. To achieve this goal, we designed an IPFRR prototype, implemented our design and deployed the resulting prototype implementation in a real Linux-based testbed. Our prototype uses Bidirectional Forwarding Detection (BFD) for fast failure detection, communicates with OSPF to query topology information, and supports globally synchronized transient-to-persistent failure switch-over. We conducted a series of measurements to evaluate recovery time and packet loss. To assess the implied management cost, we measured the total number of routing entries and the time needed for routing table calculations. The results indicate that IPFRR, in combination with BFD and OSPF, provides a particularly viable routing solution suitable even for carrier-grade applications.


Kivonat (Abstract in Hungarian, translated to English)

The Internet, through its continuously expanding services, is increasingly becoming an integral part of the economy, telecommunications and everyday life. Real-time applications that have appeared in the last few years and demand high Quality of Service, such as IP telephony, video telephony or IPTV, put a new kind of pressure on IP networks. However, today's IP networks mostly fail to meet these expectations. The reason, among others, lies in the obsolete failure recovery mechanism of purely IP-based networks, which results in significant packet loss and unacceptable delay in case of a failure.

Traditional IP routing protocols (e.g. OSPF) apply a global, reactive approach to repairing failures. This leads to slow convergence of the topology and, in some cases, to packets getting caught in transient loops. To eliminate these drawbacks, the IETF has developed a fully IP-based fast failure recovery mechanism, the IP Fast ReRoute (IPFRR) framework. IPFRR applies a local, proactive approach: only routers adjacent to the failure take part in the repair, redirecting packets onto a path that avoids the failed component.

In our work we compared the failure recovery capability of IPFRR with that of traditional IP networks. We examined two IPFRR methods: Not-via, which has IETF draft status, and an improved version of it, the Lightweight Not-via algorithm developed at our department. Our goal was to verify that (1) IPFRR repairs failures substantially faster, and (2) Lightweight Not-via operates more efficiently with lower management cost. To achieve this goal, we designed and developed an IPFRR prototype, then deployed the resulting system on our Linux-based test network. The prototype uses Bidirectional Forwarding Detection (BFD) for fast failure detection, cooperates with the OSPF protocol to query the network topology, and supports switching over from a transient failure to a persistent one. We performed measurements on the system to determine the failure recovery time and the packet loss. To estimate the management cost, we compared the number of routing entries created and the time needed to compute the routing tables. Based on the results, IPFRR complemented with the BFD and OSPF protocols provides a routing solution with outstandingly strong failure protection that is applicable even in service provider networks.


1 Introduction

Services provided over the Internet are evolving fast, and they tend to be used as the main medium for everyday communication. Besides today's most popular applications like e-mail or instant messaging, there is a growing demand for services offered by multimedia applications, such as VoIP, IPTV or video telephony. These applications require that the connections provided by the underlying network keep latency as low as is feasible.

Applications that need to communicate between hosts scattered around the world have no choice but to use the Internet Protocol (IP), since this ubiquitous protocol is the one used for connecting separate networks. However, new technologies bring additional requirements that cannot be completely fulfilled by IP networks, including the Internet. The reason is that conventional IP routers only provide best-effort service, with no guarantee for any QoS parameter. For real-time applications, the most important of these parameters are the packet delay and, closely related, the failure recovery time.

In large networks, failures inevitably occur from time to time, so striving for a failure-free configuration is not an option for network operators. Instead, failures must be recognized and handled by the network, preferably in a way that is transparent to applications and causes the smallest possible glitch in traffic. As a rule of thumb, the network should adapt to any change in the topology within 50 ms in order to provide good-quality real-time service. Unfortunately, failure recovery mechanisms in traditional IP networks are slow: the recovery time may be on the order of seconds or even worse. Everyday applications (e.g. e-mail, web browsing, data transfer) usually tolerate even that amount of delay, but multimedia applications require substantially faster response for enjoyable quality.

Another aspect is that permanent failures are not very frequent in real networks. Instead, most failures are transient, causing only short, temporary interruptions in the traffic. Proper handling of this kind of failure is difficult, because the network has to adapt to topology changes twice within a short interval: first when the connection is lost, and second when it is re-established. When a failure appears, it is impossible to predict how long it will last, thus routers cannot distinguish between permanent and transient failures in advance. Therefore, new resiliency methods are needed that can handle both kinds of failures in a fast and efficient way.

One possible solution to this challenge is IP Fast ReRoute (IPFRR) [1]. The basic idea is that the network is prepared for all possible node or link failures by calculating and storing alternative routes to every destination. Routers can then react to topology changes faster, because when a failure occurs they can use one of their backup routes for packet forwarding. In case of transient failures, the routers involved in repairing the failure switch back to the normal routes as soon as the failure is over.

There are several IPFRR techniques published in the literature that approach fast failure recovery in different ways. One of the most viable solutions is Not-via, which introduces special IP addresses with complex semantics in order to explicitly mark the failed component for routers that are not adjacent to the failure. Unfortunately, even in average-sized networks a huge number of extra addresses is needed, resulting in significantly increased address management cost. Additionally, one shortest path calculation is needed for each potentially failed component, putting a heavy load on processing units.

A project at the Department of Telecommunications and Media Informatics (BME–TMIT) aimed at creating an improved version of Not-via, called Lightweight Not-via, which reduces the number of additional addresses and comes with lower computational cost compared to the original Not-via.

We had two tasks within this area:

1. Implement both Not-via and Lightweight Not-via within an IPFRR prototype, and deploy the implementation in a real Linux-based testbed.

2. Compare Not-via and Lightweight Not-via to each other in terms of management cost, and to conventional IP routing protocols in terms of failure recovery time. The measurements should be done using real traffic generated in the testbed.

The rest of this report is organized as follows. In Section 2, we discuss conventional IP routing protocols and resiliency methods, first in general, then focusing on OSPF. In Section 3, we explain the idea behind IPFRR and review the most important IPFRR methods, including a more detailed explanation of the two techniques used in our prototype, Not-via and Lightweight Not-via. Section 4 presents the design principles of the prototype and formalizes the Not-via and Lightweight Not-via algorithms in pseudo-code. In Section 5, we discuss the implementation details, including a more elaborate view of the architecture, message handling and some pitfalls we encountered during this phase. Section 6 covers the evaluation of the prototype and presents the measured recovery times and management costs along with some interpretation of the results. Finally, in Section 7 we conclude our work.


2 Failure recovery in traditional IP networks

The conventional IP protocol has no built-in mechanism for failure recovery. However, IP-based networks can still handle failures occurring in lower layers by changing the network routes appropriately. In this section, we describe how static routing works, then introduce dynamic routing protocols in general. After that, we review their set-up and operation by discussing the Open Shortest Path First (OSPF) [2] protocol, the most common routing protocol used in IP-based networks. Finally, we analyse the failure recovery ability of OSPF and give some reasons why it is not suitable for real-time applications.

The routing procedure in IP networks is based on computing the next-hop routers towards each possible destination so that packets are forwarded on an optimal (shortest path) route. Packets are passed to the appropriate next-hop router based on their destination address, a mapping that is distributed in the forwarding tables of routers. Routers therefore need to cooperate in order to forward packets on the optimal path to any destination in the network.

Early on, the forwarding tables were filled manually by the network operator; this is called static routing. In this case, when a failure occurs, the network elements cannot adapt automatically to the topology change, thus traffic that was going through the failed component is completely lost until the network operator becomes aware of the failure and fixes it by reconfiguring the rest of the routers. Static routing can be a simple and fast solution in small and reliable networks. On the other hand, using it in large or unreliable networks is difficult and cumbersome; besides, mistakes made by the network operator in the manual configuration phase are hard to discover.

The resiliency of static routing is clearly unacceptable for real-time applications. Using static routing in complex, unreliable networks makes it impossible to meet the QoS requirements raised by different kinds of network applications, especially real-time applications [3]. This is the reason why, in most cases, dynamic routing protocols are used in contemporary IP networks.

The main task of dynamic routing protocols is to fill in and maintain the forwarding tables automatically by following the changes (including any failures) in the network topology.

In IP networks, routes are distributed among the forwarding tables of the routers, hence filling in these tables consistently requires routers to cooperate with each other. This is done by the routing protocol, which advertises topology information to the routers; these in turn maintain a consistent view of the network and compute their forwarding tables independently, based on their knowledge of the topology.

The main benefit of dynamic routing is that it requires only minimal configuration at startup by the network operator. After that, the routers are able to compute and maintain their forwarding tables automatically, minimizing the probability of failures caused by misconfiguration. Compared to static routing, dynamic routing protocols recover from failures faster, without the need for human interaction. However, they have to take care not to flood the links with topology advertisement packets and other signaling traffic. Furthermore, routers consume more computing resources for route calculations.

In the following subsections, we briefly describe the different types of routing protocols and discuss the mechanism used by OSPF in particular.

2.1 Routing protocols

The first routing protocols were made for the ARPANET [4]. It was a homogeneous network, where the operators had full access to the control plane of each device. Today's common network, the Internet, is fundamentally different. It consists of a multitude of separate networks from all around the world, run by different service providers and governments. Also, the number of network devices is significantly higher than that of the ARPANET. Given that enormous heterogeneity, it is impossible (and inadvisable, due to distinct policies) to advertise each topology change throughout the whole network. Instead, in terms of routing, the Internet is divided into individual Autonomous Systems (AS). Interior Gateway Protocols (IGP) are used within the borders of an AS, while different Autonomous Systems are connected using Exterior Gateway Protocols (EGP), most commonly the Border Gateway Protocol (BGP) [5].

Interior Gateway Protocols can be further divided into two groups: distance vector protocols and link-state protocols. Both have their advantages, but henceforth we are only interested in link-state protocols, since they are able to recover from failures more effectively. The main principle of link-state protocols is to ensure that each router in the network has the same view of the network topology at any time.

2.2 Open Shortest Path First (OSPF)

The goal during the design of Open Shortest Path First (OSPF) [2] was to create a dynamic routing protocol that reacts quickly to any change in the network topology without overloading the network with signaling traffic. OSPF is an IP-based protocol, meaning that topology information packets are sent directly over IP.

Routers running OSPF maintain a link-state database containing information regarding the state of routers and links in the network and the costs of traversing links from routers to IP subnets or between two routers (in case of point-to-point connections). This database is used when filling in the forwarding tables, hence it is important to keep it up to date and identical across routers. Accordingly, the routing protocol has three main tasks: network discovery (including routers and the connections between them), topology advertisement, and maintenance of the link-state database and forwarding tables.

OSPF uses the Hello Protocol for network discovery. Each router exchanges Hello packets with all of its neighbors periodically. The connection between two routers is considered alive when both routers receive Hello packets from the other, and dead when the neighbor fails to respond to a certain number of consecutive Hello packets. The packets also contain the list of neighbors of the originating router, so asymmetric connections can also be discovered.

Figure 1: Routing hierarchy of OSPF Areas. [Diagram labels: OSPF Autonomous System; Area 0 (backbone), Area 1, Area 2; ABR; ASBR]

OSPF sends Link State Advertisements (LSA) to distribute topology information among routers. The two basic LSA types are router-LSA and network-LSA. Router-LSAs are sent by all routers in the network, and they contain information on all interfaces of the originating router. Network-LSAs carry a list of routers attached to a certain broadcast (or non-broadcast multiple access) network. One network-LSA is originated for each broadcast network by the Designated Router of that network, which is also elected by the Hello protocol. There are further types of LSAs, with different flooding scopes and semantics; refer to [2] for details.

Routers running OSPF forward packets towards the destination along the shortest path, which is calculated by running Dijkstra's algorithm. If more than one shortest path exists, the traffic may be distributed equally among them.

Without routing hierarchy, link-state protocols advertise status information throughout the whole network. The bigger the network, the more signaling and computational overhead arises. For scalability purposes, OSPF supports dividing the network into logical Areas (groups of routers), as seen in Fig. 1. At least one Area is required (Area 0), which is called the backbone. If other Areas are present, each of them must have at least one dedicated router called Area Border Router (ABR), which is connected directly to one or more backbone routers. Routers with a connection to the outside of their AS are called Autonomous System Boundary Routers (ASBR). In a hierarchy like this, signaling traffic is reduced, since router- and network-LSAs are flooded only within their respective Area. Naturally, new LSA types (like summary-LSA or AS-external-LSA) are needed to advertise condensed routing information from one Area or AS to another [2].
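
The shortest path computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's code: the function name `shortest_paths`, the dictionary-based graph representation and the four-node topology are assumptions made for the example. Besides the distances, the sketch records the first hop of each shortest path, which is what a forwarding table needs.

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm from source.

    graph maps each node to {neighbor: link cost}. Returns (dist, next_hop),
    where next_hop[d] is the neighbor of source on a shortest path to d.
    """
    dist = {source: 0}
    next_hop = {}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry, a shorter path was found already
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                # Direct neighbors are their own next hop; others inherit it.
                next_hop[neighbor] = neighbor if node == source else next_hop[node]
                heapq.heappush(queue, (nd, neighbor))
    return dist, next_hop

# Hypothetical topology, chosen only for illustration.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, next_hop = shortest_paths(graph, "A")
print(dist)      # distances from A
print(next_hop)  # forwarding table of A
```

In this example router A reaches every destination through B, because the direct link A-C (cost 4) is more expensive than the detour via B (cost 3).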


2.3 Failure handling and recovery using OSPF

In the previous subsection, we discussed the basics of OSPF; now we turn to its failure recovery mechanism. Failures can be defined as topology changes, thus failure recovery means adaptation to the new topology by updating the forwarding tables. First, routers must detect and identify the failed component, then send a notification to the others so that all routers become aware of the change. Finally, the right changes have to be made to the forwarding tables.

As we mentioned earlier, OSPF uses the Hello protocol for failure detection. When a router misses a certain number of consecutive Hello packets from one of its neighbors, the link between them is considered broken. The speed of failure detection is determined by two parameters: HelloInterval, the time between sending two Hello packets, and RouterDeadInterval, the time after which the link to a neighbor is considered dead when no Hello packets are received. The latter shows how long a router will wait for an unreachable neighbor to come back before giving up and advertising the lost connection; in other words, this is the time during which a failure is considered transient. Its recommended value is at least twice the HelloInterval.

The minimal value for HelloInterval and RouterDeadInterval is one second, which is an unacceptable lower bound on failure recovery time for real-time applications. OSPF provides another mechanism for link failure detection besides the Hello protocol, which is faster; however, it depends on the driver of the network interface card, thus it is not applicable in every network. The essence of fast link failure detection is that the interface card immediately notifies the OSPF process when the status of the physical connection changes (performing a so-called layer-2 upcall), eliminating the need to wait until RouterDeadInterval expires before handling the failure. After the detection of a failure, adjacent routers report the topology change via LSAs. Each router updates the routes in its forwarding table by running Dijkstra's algorithm on the new topology, and continues forwarding packets along the new shortest paths.

After this overview of the reactive, global failure recovery mechanism used by OSPF, we now outline the advantages and disadvantages of this approach. One unquestionable advantage of OSPF is its robustness: it can recover from any kind and number of failures, provided that the network remains connected. Packets are always forwarded along the shortest paths according to the current topology. On the other hand, it is possible that during the failure recovery procedure routers have different views of the network because of the propagation delay and processing time of LSAs and shortest path computations. Therefore, besides the packet loss, micro-loops might appear, which degrade QoS further by overloading the network and increasing the recovery time.
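
To make the role of the two timers concrete, the following Python sketch models Hello-based detection on a single neighbor. It is a simplification under stated assumptions: the class `NeighborMonitor` and the function `detection_delay_bounds` are hypothetical names, transmission delays are ignored, and the values of 1 s and 2 s are example settings, not measured or recommended ones.

```python
class NeighborMonitor:
    """Tracks one neighbor and declares it dead after dead_interval of silence."""

    def __init__(self, dead_interval):
        self.dead_interval = dead_interval
        self.last_hello = 0.0

    def hello_received(self, now):
        # Every received Hello restarts the inactivity timer.
        self.last_hello = now

    def is_dead(self, now):
        return now - self.last_hello >= self.dead_interval


def detection_delay_bounds(hello_interval, dead_interval):
    """Bounds on the time between a failure and its detection."""
    # Failure just before the next Hello was due: almost one full
    # hello_interval of the dead interval has already elapsed.
    best = dead_interval - hello_interval
    # Failure right after a Hello arrived: the whole dead interval must pass.
    worst = dead_interval
    return best, worst


monitor = NeighborMonitor(dead_interval=2.0)
monitor.hello_received(now=0.0)
print(monitor.is_dead(now=1.5))  # still within the dead interval
print(monitor.is_dead(now=2.0))  # dead interval expired
print(detection_delay_bounds(hello_interval=1.0, dead_interval=2.0))
```

Even with these aggressive example settings, detection alone takes between one and two seconds, well above the 50 ms recovery target.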


3 IP Fast ReRoute

In the previous section, we showed that the failure recovery algorithms used by traditional IP networks are not able to satisfy the resiliency requirements raised by multimedia and real-time applications. The two main reasons for this were advertising the failure globally to all routers, resulting in high propagation delay and convergence time, and the reactive approach to failure handling, that is, the repair of a failure starts only after the failure has already occurred.

The most promising solutions addressing both problems are provided by the IP Fast ReRoute (IPFRR) techniques. These protocols suppress the global failure signaling found in conventional routing protocols (at least for as long as the failure is considered transient). Additionally, preparations to handle failures are made in advance, before they actually show up, resulting in a significantly faster, more responsive, proactive approach.

According to the concept of IPFRR, when a failure occurs, only local routers are involved in temporarily rerouting the traffic to an alternative backup path. Such routes are computed for all possible failures during the setup phase, so that a backup path is immediately available when needed, minimizing the time elapsed until routing is restored in the network. After the failure is over (in case of transient failures), routers involved in the repair process can switch back to the normal routes. When the failure becomes permanent (after it has persisted for a certain amount of time), it is necessary to inform other routers about the topology change.

When using IPFRR, only routers adjacent to the failed link or node know about the failure at all. Other routers use their normal routes for packet forwarding, which may cause problems if a router passes a packet back to where it received it from, believing that a usable path leads that way. Therefore, IPFRR techniques must ensure that no forwarding loops can form based on such mistaken beliefs.

In the following subsections, we review the most important IPFRR algorithms suitable for IP networks. Fast rerouting techniques were first used in MPLS (MultiProtocol Label Switching) and optical SONET/SDH (Synchronous Optical NETworking, Synchronous Digital Hierarchy) networks. A similar method was also designed for Ethernet networks [6]. These systems are able to recover from any failure within 50 milliseconds, which is the recovery time targeted by IP networks as well.

3.1 Equal Cost Multiple Path (ECMP)

One of the oldest and simplest IP Fast ReRoute techniques is Equal Cost Multiple Path (ECMP) [7], which is an extension enabled in the majority of today's networks. ECMP is usable when more than one shortest path is available towards a destination. The traffic is distributed equally among the paths by default, offering increased bandwidth. Additionally, when a failure occurs on one path, routers balance traffic among the remaining routes.

Figure 2: Equal Cost Multiple Path. [Diagram: nodes S, R1, R2, R3, R4, R5 and D with link costs; primary and secondary paths marked]

ECMP is easy to implement, but it works only when multiple paths of equal cost are available between the source and destination. In Fig. 2, node S has three ECMP paths to node D: via R1, R2 and R3. Suppose that link S → R2 on the primary path is down. In this case, node S continues to use its two secondary paths when forwarding packets to node D. However, node R2 has only one shortest path to node D, thus ECMP would not be able to handle the failure of link R2 → R5.
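
The behaviour described for node S and node R2 can be reproduced with a short Python sketch. The topology below is an assumption: it is not the one drawn in Fig. 2, only a hypothetical graph with the same properties (three equal-cost paths from S to D, a single shortest path from R2). The helper names `distances_to` and `ecmp_next_hops` are likewise illustrative.

```python
import heapq

def distances_to(graph, target):
    """Shortest distance from every node to target (links are symmetric)."""
    dist = {target: 0}
    queue = [(0, target)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            if d + cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = d + cost
                heapq.heappush(queue, (d + cost, neighbor))
    return dist

def ecmp_next_hops(graph, node, dest, failed_links=frozenset()):
    """Neighbors of node on a shortest path to dest, skipping failed local links."""
    dist = distances_to(graph, dest)
    return sorted(
        n for n, cost in graph[node].items()
        if (node, n) not in failed_links and cost + dist[n] == dist[node]
    )

# Hypothetical topology: every S-to-D path below costs 15.
graph = {
    "S": {"R1": 5, "R2": 5, "R3": 5},
    "R1": {"S": 5, "D": 10},
    "R2": {"S": 5, "R5": 5},
    "R3": {"S": 5, "D": 10},
    "R5": {"R2": 5, "D": 5},
    "D": {"R1": 10, "R3": 10, "R5": 5},
}
all_hops = ecmp_next_hops(graph, "S", "D")
after_failure = ecmp_next_hops(graph, "S", "D", failed_links={("S", "R2")})
r2_hops = ecmp_next_hops(graph, "R2", "D", failed_links={("R2", "R5")})
print(all_hops, after_failure, r2_hops)
```

The distances are computed on the failure-free topology on purpose: the repair is local, so only the router adjacent to the failed link removes that next hop from its set. The empty result for R2 shows the limitation named in the text.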

3.2 Loop-free Alternates (LFA)

The principles of Loop-free Alternates (LFA) [8] are similar to those of ECMP, but LFA covers more cases. The primary path is always the shortest one, and all other paths along which the next-hop is closer to the destination than the sender are potential secondary paths. There is no global signaling and failures are not advertised throughout the whole network, but loops still cannot appear, since each router forwards packets to its neighbors so that the distance to the destination decreases in every step.

According to simulations, LFA is usable only in 75-80% of all failure cases [9]. If a node has only one neighbor closer to the destination than itself, LFA is unable to find a secondary path.

Figure 3: Loop-free Alternates. [Diagram: nodes S, R1, R2, R3 and D with link costs of 1 and 10; primary and secondary paths marked]

As shown in Fig. 3, the next-hop from S towards D along the shortest path is R2. When link S → R2 is down, the second shortest path from S is via R1. However, R1 does not know anything about the failure, and its shortest path to D is through S; hence, it cannot be used as a backup next-hop, as it would pass packets back to S. Instead, S forwards packets towards R3 in order to avoid the loop. If the weight of link S → R3 were 1, LFA would not be able to find an alternate path to D.
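
The selection of a loop-free neighbor can be sketched in Python as follows. Two assumptions should be stated clearly. First, the check used here is the basic loop-free inequality of RFC 5286, dist(N, D) < dist(N, S) + dist(S, D), which accepts a neighbor N whose own shortest path to D does not lead back through S. Second, the link weights are a hypothetical assignment that reproduces the behaviour described in the text; they are not read from Fig. 3.

```python
import heapq

def build_graph(edges):
    """Build an undirected weighted graph from (node, node, cost) triples."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def distances_from(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            if d + cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = d + cost
                heapq.heappush(queue, (d + cost, neighbor))
    return dist

def loop_free_alternates(graph, s, dest, primary):
    """Neighbors of s, other than the primary next hop, that will not loop back."""
    dist = {node: distances_from(graph, node) for node in graph}
    return sorted(
        n for n in graph[s]
        if n != primary and dist[n][dest] < dist[n][s] + dist[s][dest]
    )

def topology(s_r3_cost):
    # Hypothetical weights consistent with the description of Fig. 3.
    return build_graph([
        ("S", "R1", 1), ("S", "R2", 1), ("S", "R3", s_r3_cost),
        ("R1", "D", 10), ("R2", "D", 1), ("R3", "D", 10),
    ])

alternates = loop_free_alternates(topology(10), "S", "D", primary="R2")
alternates_cheap = loop_free_alternates(topology(1), "S", "D", primary="R2")
print(alternates, alternates_cheap)
```

With a weight of 10 on link S-R3, R3 reaches D directly (cost 10) rather than through S (cost 12), so it qualifies, while R1 prefers the path through S (cost 3 instead of 10) and is rejected. With a weight of 1, R3 would also route through S, and no alternate remains.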

3.3 Multiple Routing Configuration (MRC)

The Multiple Routing Configuration (MRC) [10] technique is based on creating different views (routing configurations) of the network topology by altering the link weights. A shortest path computation is done on all of these configurations. In case of a node failure, routers choose a routing configuration where the weights of the incoming and outgoing links of the failed node are set to infinity, so that no shortest path goes through it to any destination. These nodes are so-called isolated nodes, with prohibited links. It is obvious that only packets originated from or destined to an isolated node go through a prohibited link. Handling link failures is similar to node failures: the network routers isolate the failed link by setting its weight to infinity, thus no traffic would traverse it.

Routers must compute and store a separate routing configuration for each network element being isolated (one at a time), so that later every potential failure will have an associated configuration. The MRC technique is easy to understand but hard to implement, because it needs modifications in the IP stack: routers use the Type of Service field of the IPv4 header to signal the effective routing configuration that is currently used for packet forwarding. Relying on this field may collide with other protocols that also alter those bits, and there is no standardized field in the IPv4 header for this kind of marking.

3.4 Failure Insensitive Routing (FIR)

Routers using OSPF or one of the IPFRR techniques discussed so far forward packets based solely on the destination address field of the IP header. The idea of Failure Insensitive Routing (FIR) [11] is that, besides the destination address, routers consider also the interface through which the packet has been received, a condition that may carry extra information about the current status of the network. When a router receives a packet from a neighbor which is normally the next-hop along the shortest path towards the destination, it infers that there must be a failure in the network. FIR, being a Fast ReRoute technique, does not use global signaling, so in case of a failure, only adjacent routers know directly about it. Others infer the existence of the failure from the unusual direction of the incoming packet. In order to avoid loops, these packets are forwarded on an alternative route, which is computed and stored in advance.

The advantage of FIR is that it does not require any modifications in the IP stack, and, just like MRC, it is able to handle any single node or link failure, while still being loop-free. However, the algorithm used for the alternative route computation is complicated, and it is possible that the detour is much longer than strictly necessary.

3.5 Not-via

Not-via uses IP-in-IP tunneling to steer packets around the failed component. If a router encounters a failed next-hop, it encapsulates the original packet to a special not-via address¹. This new address specifies the destination of the encapsulated packet, but also serves as an explicit failure indicator by denoting which router should be avoided on the repair path. The not-via address is advertised by the router to which the encapsulated packet is destined. Upon receiving such a packet, this router decapsulates it and forwards the original packet to its destination.

Normally, when there is a node failure along the shortest path to a destination, the router before the failed node should tunnel packets to the so-called next-next-hop node, which is the node that would receive the packet from the next-hop node along the forwarding path. This method ensures that the failed node is avoided and the traffic is routed back to the original shortest path as soon as possible, which is an elegant way of handling link and node failures consistently.

Figure 4: Illustration of the repair mechanism used by Not-via

In Fig. 4, S is about to send a packet destined to D, normally using P as the next-hop. However, due to the failure of P, S encapsulates the packet to the not-via address B_P. This address indicates that the packet should be delivered to the next-next-hop node B not via the next-hop P. B will decapsulate the packet and forward it to D.

If a packet is already in a tunnel because of a previous failure, encountering a new one causes the packet to be dropped instead of being encapsulated again. This rule prevents the formation of forwarding loops.

3.5.1 Not-via addresses

As previously mentioned in Section 3.5, the semantics of a not-via address is twofold. On the one hand, similarly to the meaning of normal IP addresses, it specifies the destination node to which a packet with that address should be delivered. On the other hand, it also acts as an explicit failure indicator by flagging the component in the network that is believed to have failed, and hence should be avoided in the forwarding process.

If a network consists solely of point-to-point links, then the failed component indicated by a not-via address is always a router. In more complicated scenarios, however, when there are also LANs in the network, it is necessary to protect against the failure of these components too. LANs introduce further not-via addresses, which are discussed in detail in Section 3.5.3.

¹ When referring to an IP address used for repair purposes, it is written in lowercase (not-via). References to the algorithm itself are always capitalized (Not-via).

3.5.2 Repair path computation

In the scenario depicted in Fig. 4, router S encapsulates a packet originally destined to D to the not-via address B_P due to the failure of P. This packet can reach B only if all routers on the repair path from S to B_P are aware of B_P and know which next-hop to use when forwarding a packet to that address. The required information can be obtained by removing P from the topology, running a shortest path first algorithm, and recording the next-hop on the resultant shortest path to B.

Since a router does not normally know which repair paths it resides on, it has to prepare for handling all not-via addresses that happen to be in the network. Therefore, each router has to remove all other routers in turn and calculate the shortest paths in the remaining graph to those not-via addresses that indicate the failure of the removed node, that is, to all neighbors of the failed node.
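The computation just described (remove the avoided node, run a shortest path first calculation, record the next-hop) can be sketched as below. The topology is hypothetical and only loosely modelled on the scenario of Fig. 4; the function names `first_hops` and `notvia_next_hop` are assumptions made for this illustration.

```python
import heapq

def first_hops(graph, source, removed=()):
    """Dijkstra from `source`, ignoring the nodes listed in `removed`.

    Returns {destination: first hop on a shortest path from source}.
    """
    dist, hop = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            if nbr in removed:
                continue
            if d + weight < dist.get(nbr, float("inf")):
                dist[nbr] = d + weight
                # Neighbors of the source are their own first hop;
                # everyone else inherits the first hop of its parent.
                hop[nbr] = nbr if node == source else hop[node]
                heapq.heappush(heap, (d + weight, nbr))
    return hop

def notvia_next_hop(graph, router, target, avoided):
    """Next-hop at `router` for the address 'target not via avoided'."""
    return first_hops(graph, router, removed={avoided}).get(target)

# Hypothetical topology: primary path S-P-B-D, detour S-A-C-B.
GRAPH = {
    "S": {"P": 1, "A": 1},
    "P": {"S": 1, "B": 1},
    "A": {"S": 1, "C": 1},
    "C": {"A": 1, "B": 1},
    "B": {"P": 1, "C": 1, "D": 1},
    "D": {"B": 1},
}

print(first_hops(GRAPH, "S")["D"])             # normal next-hop towards D
print(notvia_next_hop(GRAPH, "S", "B", "P"))   # next-hop of S for B_P
print(notvia_next_hop(GRAPH, "A", "B", "P"))   # next-hop of A for B_P
```

Every router on the detour runs the same calculation from its own position, which is why each router has to repeat it for every other router taken as the removed node.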

By default, Not-via assumes that if the connection to a next-hop is down, the node itself must have failed, and accordingly attempts to avoid the router entirely on the repair path. There are, however, two corner cases, when some nodes are reachable only through the failed one. In these cases, the sole chance to repair to those destinations is by assuming that only the link to the next-hop has failed.

Figure 5: Two corner cases requiring link repair: (a) bridge problem, (b) last-hop problem

The first case is the so-called bridge problem. Suppose that S can reach D only through node P, as shown in Fig. 5/a. Now S assumes that only the link S → P went down, not P itself, thus it encapsulates packets originally destined to D to P_S, thereby avoiding the traversal of the failed link.

The second case calling for link repair is the last-hop problem, depicted in Fig. 5/b. Here the failed node is the destination itself. Now S assumes that, similarly to the bridge problem, the link S → D leading to the destination has failed, and tries to repair it by encapsulating packets to D_S.

In the previous two cases, it is possible that, although Not-via assumes it is dealing with a link failure, the case is in fact a node failure after all; so the packets, although avoiding the link, eventually run into the failed node, where they get discarded. Since repairing a link is considered only if using the failed next-hop is essential in order to deliver a packet to its destination (or it is the very destination that has failed), these losses cannot be avoided even in theory. However, attempting to repair the link failure ensures that if a path exists to the destination, the packets will arrive there.
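The choice between node repair and the two link-repair corner cases can be summarised in a short sketch. The topology is hypothetical, and `repair_address` is an illustrative helper that returns a pair (target, avoided) standing for the address "target not via avoided"; when the avoided element is S itself, the pair denotes link repair.

```python
import heapq

def shortest_path(graph, src, dst, removed=()):
    """Dijkstra; returns the node list of a shortest path, or None if unreachable."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            if nbr not in removed and d + weight < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = d + weight, node
                heapq.heappush(heap, (d + weight, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def repair_address(graph, s, dest):
    """Pick the not-via address S tunnels to when its next-hop towards dest is down.

    Assumes dest differs from s and is reachable in the failure-free topology.
    """
    path = shortest_path(graph, s, dest)
    next_hop = path[1]
    if next_hop == dest:
        return (dest, s)        # last-hop problem: repair the link S -> dest
    if shortest_path(graph, s, dest, removed={next_hop}) is None:
        return (next_hop, s)    # bridge problem: repair the link S -> next-hop
    return (path[2], next_hop)  # default: next-next-hop not via next-hop

# Hypothetical topology: E hangs off P only, so P is a bridge towards E.
GRAPH = {
    "S": {"P": 1, "A": 1},
    "P": {"S": 1, "B": 1, "E": 1},
    "A": {"S": 1, "B": 2},
    "B": {"P": 1, "A": 2, "D": 1},
    "D": {"B": 1},
    "E": {"P": 1},
}

for destination in ("D", "E", "P"):
    print(destination, repair_address(GRAPH, "S", destination))
```

In the second and third case the packet is delivered only if P is in fact alive, which matches the observation that such losses cannot be avoided.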

3.5.3 The LAN problem

So far, networks were basically assumed to consist solely of point-to-point links and routers. Now let us examine how the Not-via algorithm can be extended to cover networks with LANs as well.

When dealing with routers, the failure of a node unambiguously defines the failed component: it is the router itself. With LANs, there are several approaches to consider when attempting to circumscribe the failed components, depending on how far we are willing to go regarding the accuracy of failure identification.

The price of finer granularity is a larger number of not-via addresses, but in exchange the failure can be signaled more precisely. Coarser models require fewer not-via addresses, but in some cases they may consider unnecessarily large parts of the network dysfunctional. It might even happen that, by using a coarser approach, no repair path exists to a destination at all, while sticking to a finer model would have allowed for avoiding the actually failed components.

A LAN is represented by a pseudo-node L. It may be thought of as an abstraction of a device operating at the data link layer (e.g. a switch), or the sum of all layer-2 connections of the routers it connects together.

The Not-via draft [12] introduces three different LAN protection methods. Simple LAN Repair packs the LAN pseudo-node and all attached routers into one shared risk group, that is, if either L or one of the attached routers fails, all members of the group are assumed to have failed.

LAN Component Repair confines the spread of the LAN shared risk group to the pseudo-node L and all links connected to it. This approach yields finer failure perception compared to the previous one, since the layer-2 connections inside the LAN become separated from the attached router nodes. On the other hand, some more not-via addresses need to be introduced in order to further differentiate the failed component.

The most accurate repair method, LAN Repair Using Diagnostics, requires the use of some kind of diagnostics technique that enables each router to detect the failure of the connection to any neighbor router reliably and independently. An available technology suitable for this purpose is Bidirectional Forwarding Detection (BFD) [13], which monitors the connectivity between two adjacent routers. The routers may either be connected by a point-to-point link, or reside on the same LAN.

Correlating the connectivity information of different routers through the same LAN can be used to deduce the state of the LAN itself. In Fig. 6, if S notes that the connection is lost only to one of its neighbors through L, it assumes that L itself is operational and the peering router has failed. However, when the BFD session breaks with two or more routers on L, then S regards this as the failure of L but not any of the router nodes².

² The scope of Not-via covers only the repair of single link or node failures, so it is not a limitation of the algorithm if that guess is wrong, since there would be more than one node failure already.

Figure 6: Not-via addresses used with LAN Repair Using Diagnostics

In case of LAN Repair Using Diagnostics, the number of not-via addresses advertised by a router S equals the number of adjacent routers (in layer 3) plus the number of adjacent LANs. To be more specific, a certain router A needs to advertise one not-via address to protect each of its neighbors, connected either by a point-to-point link or through a LAN, and one further not-via address for each LAN that A is attached to.

For our prototype, we incorporated the LAN Repair Using Diagnostics method, because this is the most sophisticated failure detection model supporting the unified handling of LAN and node failures.

Using BFD as diagnostics brings a further benefit. BFD can operate fully in the IP layer and monitor the state of the forwarding engine too, as opposed to e.g. layer-2 upcall, which only reports the state obtained from the network interface driver. Also, an open source BFD implementation exists as a Linux kernel module.

3.6 Lightweight Not-via

3.6.1 Introduction

In this subsection, we introduce the Lightweight Not-via method, which was developed at BME–TMIT. The implementation and evaluation of this technique is one of the most important goals established within this report.

The power of Not-via comes from explicitly marking the failed component by tunneling packets to special not-via addresses, so that all routers can compute an optimal detour for each failure scenario. In order to achieve this, Not-via requires a large number of additional addresses, and runs a shortest path tree calculation for each potential node failure. This makes Not-via in its original form resource hungry and burdens it with significant address management overhead. Lightweight Not-via [14] is a revised version of Not-via, aiming to reduce (or, in some cases, completely eliminate) the extra addresses needed for failure handling, and to lower the complexity of computing backup paths. This algorithm is based on redundant trees, a concept that must be understood before we introduce Lightweight Not-via itself.

Figure 7: Sample network for Lightweight Not-via: the arrows mark the shortest path to C, along with the primary and secondary redundant trees rooted at C.

Redundant trees have long been applied to protection and restoration in optical [15, 16] and wireless ad-hoc [17] networks. The reason for this is that redundant trees, basically a pair of directed spanning trees, have the appealing property that a single node or link failure destroys connectivity through only one of the trees, leaving the path along the other tree intact. The concept was first applied to IP Fast ReRoute in [18]. In contrast, Lightweight Not-via applies redundant trees directly to Not-via itself. Organizing the detours over redundant trees gives rise to an easily implementable and deployable revised Not-via scheme: it significantly decreases the number of not-via addresses, with clever modifications it reduces computational complexity to linear, and it eliminates most of Not-via’s corner cases without introducing new ones.

Consider the sample network in Fig. 7 showing the shortest path tree and two other trees directed towards node C. These two directed trees, the primary and the secondary tree, are such that the unique path from any node to the root node C in the primary tree is node-disjoint from the path in the secondary tree. For instance, the path from A to C along the primary tree is A → D → L → E → C, which is surely disjoint from the path along the secondary tree, A → B → C. Trees possessing this property are called redundant trees.

Lightweight Not-via is defined over redundant trees in the following way. Suppose A has a packet to send to node C. As long as its default next-hop, B, is alive, A simply passes the packet to B. If, however, B goes down, A must find a backup path, or at least a next-hop that can push the packet further, towards C. So it encapsulates the packet and sends it along the primary tree to D. Assuming that D computed the exact same redundant tree to C (which is not hard to ensure), D will pass the packet through LAN L and node E to C, where it gets decapsulated and sent further. If, instead, it is now node E that has to get a packet to C and it finds that connectivity to C went away, both its shortest path and its primary backup path are affected by the failure. In this case, the packet is encapsulated to the secondary backup path and sent through L to B. Note that the secondary backup path cannot be impacted by the failure in this case, as it is node-disjoint from the primary path. Finally, a packet forwarded along the primary path gets rerouted to the secondary path should it encounter a failure on its path (this might be the very same failure that pushed the packet to the detour in the first place) but not vice versa.
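The forwarding rule at a single router can be sketched as a small fallback chain. The per-router next-hop entries below are read off the walk-through above for destination C; the function name `forward` and the representation of a packet by the tree it currently travels on are assumptions made for this illustration.

```python
# Order in which a packet may escalate: default -> primary -> secondary.
# A packet never moves back to an earlier entry of this list.
ORDER = ["default", "primary", "secondary"]

def forward(next_hops, current, failed):
    """Pick (tree, next_hop) for a packet at one router, or None to drop it.

    next_hops: {"default": ..., "primary": ..., "secondary": ...} towards
               one destination, as seen by this router.
    current:   the tree the packet arrived on.
    failed:    set of neighbors this router currently considers down.
    """
    for tree in ORDER[ORDER.index(current):]:
        if next_hops[tree] not in failed:
            return tree, next_hops[tree]
    return None  # already on the last resort and it is broken: drop

# Next-hops towards C as described in the text (B is reached by E across LAN L).
AT_A = {"default": "B", "primary": "D", "secondary": "B"}
AT_E = {"default": "C", "primary": "C", "secondary": "B"}

print(forward(AT_A, "default", failed=set()))    # normal forwarding
print(forward(AT_A, "default", failed={"B"}))    # detour on the primary tree
print(forward(AT_E, "default", failed={"C"}))    # default and primary both hit
print(forward(AT_E, "primary", failed={"C"}))    # primary -> secondary
print(forward(AT_E, "secondary", failed={"B"}))  # no further fallback
```

Because the chain only moves forward, a packet that has already reached the secondary tree is dropped on a further failure instead of looping.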

3.6.2 Redefining the semantics of not-via addresses

In Lightweight Not-via, a node v has only three addresses: a default routable IP address, denoted by D_v, an IP address P_v that belongs to the primary tree and an IP address S_v that belongs to the secondary tree. The address space is, correspondingly, split into three disjoint zones: a default routable address zone D, a primary backup address zone P and a secondary backup zone S. There are distinct entries in the routing table for all three addresses for each node, and there is a common understanding between routers as to which address belongs to which zone. A router, therefore, always unambiguously knows along which path it received a packet.
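One way to realise such a common understanding is to carve the zones out of disjoint prefixes, as in the following sketch. The three prefixes are purely hypothetical and are not the ones used by the prototype.

```python
import ipaddress

# Hypothetical zone prefixes (an assumption for illustration only).
ZONES = {
    "default":   ipaddress.ip_network("10.0.0.0/16"),
    "primary":   ipaddress.ip_network("10.1.0.0/16"),
    "secondary": ipaddress.ip_network("10.2.0.0/16"),
}

def zone_of(address):
    """Return the name of the zone that the destination address belongs to."""
    addr = ipaddress.ip_address(address)
    for name, network in ZONES.items():
        if addr in network:
            return name
    raise ValueError(f"{address} is outside all configured zones")

# The same node v would own one address per zone, e.g. D_v, P_v and S_v:
for destination in ("10.0.0.7", "10.1.0.7", "10.2.0.7"):
    print(destination, zone_of(destination))
```

With this layout the destination address of a received packet alone tells a router whether the packet travels on the default path, the primary tree or the secondary tree.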

3.6.3 Repair path computation

In order for a router to participate in the forwarding process, it needs to compute the next-hop corresponding to any of the potential destination addresses it can find in a packet. With Lightweight Not-via, the next-hop corresponding to the default routable address of some node v, D_v, is obtained from the shortest path tree, and can be computed for all nodes in one pass, spawning a single instance of Dijkstra’s algorithm. The next-hops for the primary and the secondary backup addresses, P_v and S_v, are obtained from computing a pair of redundant trees to v. Theoretically, one needs to compute a separate pair of redundant trees to each destination node, or at least find the corresponding next-hops; however, this can be reduced to linear time using a simple distributed algorithm [14].

3.6.4 Removing corner cases

Not-via has some subtle details, making it more difficult to implement correctly and to understand in operation. However, redefining Not-via in terms of redundant trees removes most of the corner cases. For instance, LANs no longer need special treatment, nor do multiple not-via addresses need to be assigned. Additionally, it is trivial to modify an arbitrary redundant tree algorithm to produce maximal redundant trees in non-2-connected networks, thereby solving the bridge problem as well (see e.g., [19]). Finally, the last-hop problem is treated identically to Not-via by simply repairing to the next-hop (as a matter of fact, we have already seen this when we examined the case of E sending a packet to C and losing connectivity to it).

3.7 Summary

Lightweight Not-via has several advantages over traditional Not-via. First, in Not-via, a not-via address covers only a single failure scenario. After redefining Not-via in terms of redundant trees, a not-via address protects many components: the primary backup address protects components along the default path and the secondary backup protects the primary backup. Consequently, the number of necessary addresses is reduced to 2 per node, a constant per router (note that 2 not-via addresses per node is the absolute minimum achievable with the original Not-via, only realizable in point-to-point rings). This obviously alleviates the pain of assigning not-via addresses, helps shrink routing tables and reduces the number of tunnels, that is, it mitigates many of the address management issues traditional Not-via raises.

Considering the network in Fig. 7, Not-via requires 17 not-via addresses³ in addition to the normal IP addresses to be able to operate properly. On the other hand, Lightweight Not-via needs no additional addresses, provided that each router interface has an IP address and routers have a routable loopback address, which is a well established assumption in most practical cases.
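The count of 17 can be reproduced with a few lines of arithmetic, using the rule that a router advertises one not-via address per adjacent router and one per attached LAN. The link list below is an assumption: it is inferred from the paths named in Section 3.6.1 (A → D → L → E → C, A → B → C, and E reaching B through L) and not read from Fig. 7 directly.

```python
# Assumed topology of Fig. 7: four point-to-point links and one LAN.
PTP_LINKS = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "E")]
LANS = {"L": {"B", "D", "E"}}

def count_notvia_addresses(ptp_links, lans):
    """Return (ptp, via_lan, lan_itself) address counts for classic Not-via."""
    # Each point-to-point link needs one address at either end.
    ptp = 2 * len(ptp_links)
    # On a LAN, every attached router protects every other attached router.
    via_lan = sum(len(m) * (len(m) - 1) for m in lans.values())
    # Every attached router also advertises one address protecting the LAN.
    lan_itself = sum(len(m) for m in lans.values())
    return ptp, via_lan, lan_itself

ptp, via_lan, lan_itself = count_notvia_addresses(PTP_LINKS, LANS)
print(ptp, via_lan, lan_itself, ptp + via_lan + lan_itself)
```

Under this assumed topology the three terms come out as 8, 6 and 3, matching the breakdown given for the figure.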

Our first task was to implement Not-via and Lightweight Not-via in the same IPFRR framework, so that later we can perform measurements concerning management cost and failure recovery time. Before going into details about the actual implementation, we introduce the design principles of the prototype in the next section, formalize Not-via and Lightweight Not-via in terms of pseudo-code, and also say a few words about the handling of persistent failures.

³ Going into details, there are 8 addresses to protect neighbors connected by point-to-point links, 6 addresses to protect neighbors through LAN L and 3 to protect L, which sum up to 17.

4 Design

So far, we have discussed the resiliency method utilized by OSPF, a conventional IP routing protocol, and reviewed several IPFRR methods, including Not-via and Lightweight Not-via, which form the basis of our IPFRR prototype. In this section, we turn to our own work. Our task was to implement Not-via and Lightweight Not-via, and compare them to each other and to OSPF by measuring the achievable failure recovery time and consumed management cost. We received the high-level description of both IPFRR algorithms (see [12] for Not-via and [14] for Lightweight Not-via), thus our first step was to turn these abstract descriptions into a detailed, ready-to-implement prototype design. In this section, we discuss this design process, while the details of the implementation and the evaluation are covered in the next two sections.

4.1 Overview

An IPFRR algorithm running on its own provides fast repair for any potential single node or link failure in the network. Now we argue that it is still insufficient to run only a fast reroute algorithm, and explain how this component should be extended in order to become a complete IPFRR prototype.

Spending too much time in rerouted state after repairing a failure comes with a serious drawback. As long as the local backup paths are in use instead of the default ones, the network is vulnerable to any further failure. Hence, it is desirable to reclassify long-standing failures (those that last more than, say, half a minute) as a persistent topology change, and start a global state synchronization among all routers. This way short-lived, temporary failures are repaired locally by the fast reroute mechanism, but long-term, persistent changes are signaled and treated globally. After all routers become aware of the topology change, they recompute the backup paths based upon the new topology and become ready to repair further failures again.

Figure 8: Prototype components (IPFRR, OSPF, BFD)

In order to achieve this functionality, IPFRR should be supported by two additional components, as shown in Fig. 8. A fast failure detection mechanism is required for notifying routers of adjacent failures. We chose Bidirectional Forwarding Detection (BFD) [13] for this purpose. BFD is a protocol intended to detect faults in the bidirectional path between two forwarding engines by exchanging Hello packets with the opposite router, similarly to OSPF, but usually much faster (in the order of milliseconds). If a certain number of packets are lost, the connection is considered broken. This loss count and the transmission interval of the packets can be configured, and together they determine the achievable failure detection time. It is the responsibility of the prototype to properly set up and tear down BFD connections.
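As a rough worked example of how the two parameters interact, the detection time can be approximated as the product of the transmission interval and the loss count. The function name and the sample values are assumptions chosen for illustration; they are not the settings used in the prototype.

```python
def detection_time_ms(tx_interval_ms, loss_count):
    """Approximate time until a session is declared down.

    The session is considered broken after `loss_count` consecutive
    packets, sent every `tx_interval_ms` milliseconds, have been missed.
    """
    return tx_interval_ms * loss_count

# Hypothetical parameter combinations.
for interval, count in ((10, 3), (50, 3), (100, 5)):
    print(f"{interval} ms x {count} -> {detection_time_ms(interval, count)} ms")
```

Shorter intervals and smaller loss counts detect failures sooner, at the price of more control traffic and a higher chance of reporting a failure when only a few packets were lost.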

Since IPFRR has no inherent support for topology discovery, it should query this information from an already running IGP, in our case, OSPF. This connection is also used for signaling and conducting transient-to-persistent failure switch-over through the network.

In the rest of this section, we go into the details of Not-via and Lightweight Not-via, then organize their routing tables into a working IPFRR system. Finally, we deal with the problem of transient-to-persistent failure switch-over.

4.2 Notations and terminology

Henceforth, the Not-via and Lightweight Not-via methods are discussed as if they were being executed on a dedicated router S. This section introduces some notations needed to formalize the algorithms.

• B: Base topology containing all routers, LANs and links in functional state.

• R: Set of router nodes in topology B.

• L: Set of LAN pseudo-nodes in topology B.

• SPT(T): Runs an SPT calculation from router S in topology T and caches the result for further use.

• nextHop(T, N), nextNextHop(T, N): Return the next-hop and next-next-hop node from S to destination node N in topology T.

• nextHopP(N), nextHopS(N): Return the next-hop node from S to destination node N along the primary and secondary redundant trees rooted at N (used only with Lightweight Not-via).

• isBridge(N): True if node N is a bridge in topology B, that is, the network would fall apart into disjoint components if N was removed.

• isPtP(A → B): True if the link A → B is a direct link between two routers, that is, it provides a point-to-point connection; otherwise false.

• getLANbetween(A, B): Returns the LAN pseudo-node to which both routers A and B are attached (should be called only if it is certain that such a LAN node exists).

• LANcontains(L, R): True if router R is attached to LAN L; otherwise false.

• adjacent(N): True if S and N are adjacent nodes. Two routers are adjacent if they are either connected by a point-to-point link or there is a LAN to which both routers are attached (network-layer connectivity). A router and a LAN are adjacent if the router has an interface attached to the LAN.

When referring to the semantics of a not-via address, it is written in the form X-Notvia-Y, which means router X without going through the router or LAN Y. (Note that X is always a router, whereas Y can be a LAN as well.)

4.3 Routing table calculations

Router S has to forward packets according to different rules in the following cases:

A) there are no failures (topology B)

B) an adjacent router F* failed (topology B \ {F*}, and, if S has a point-to-point connection to F*, also topology B \ {S → F*})

C) an adjacent LAN L* failed (topology B \ {L*})

We decided to consider all these different cases in the packet forwarding process by mapping them to distinct routing tables. When a particular failure occurs, we simply switch to a different routing table, precomputed well in advance specifically for the failure at hand. This provides fast and reliable handling of failures. Fortunately, the Linux kernel is able to handle 255 distinct routing tables, so we can be sure that we do not run out of routing tables when considering the potential failures adjacent to S.

Calculating the routing tables is different for the original and the Lightweight Not-via. The former uses SPTs to determine next-hops, the latter has a pair of redundant trees rooted at each destination for this purpose. Our task was to map the forwarding process given by next-hops to static routing tables and predefined tunnels for both algorithms. The following two subsections, 4.3.1 and 4.3.2, contain the formalized algorithms in pseudo-code form, first for Not-via, then for Lightweight Not-via.

4.3.1 Not-via

In order to complete the routing tables, the next-hop (and sometimes also the next-next-hop) nodes have to be known to all possible destinations, in all possible topologies. Hence, S has to compute and cache the SPTs rooted at itself, in all topologies. The next-hops and next-next-hops can be queried later from these trees, when needed by the Not-via algorithm. The SPT calculations are formalized in Algorithm 1.

Algorithm 1 Not-via: SPT calculations

    SPT(B)
    foreach N ∈ (L ∪ R) \ {S} do
        SPT(B \ {N})
    end for
    foreach router X ∈ R : isPtP(S → X) do
        SPT(B \ {S → X})
    end for
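To make the amount of work explicit, the following sketch enumerates the topologies for which S has to run an SPT calculation. The function name, the descriptor format and the sample node names are assumptions made for this illustration; the actual SPT computation is left out.

```python
def spt_topologies(routers, lans, ptp_neighbors, s):
    """List one descriptor per SPT that router `s` computes in Algorithm 1.

    routers, lans:  sets of node names (R and L in the notation above).
    ptp_neighbors:  routers connected to `s` by a point-to-point link.
    """
    topologies = [("base", None)]                 # SPT(B)
    for node in sorted((routers | lans) - {s}):   # SPT(B \ {N})
        topologies.append(("without node", node))
    for nbr in sorted(ptp_neighbors):             # SPT(B \ {S -> X})
        topologies.append(("without link", (s, nbr)))
    return topologies

# Hypothetical network: three routers, one LAN, one point-to-point link at S.
todo = spt_topologies({"S", "A", "B"}, {"L"}, {"A"}, "S")
for entry in todo:
    print(entry)
print(len(todo), "SPT calculations")
```

The number of calculations thus grows with the number of routers and LANs in the whole network plus the number of point-to-point links attached to S.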

The last-hop and bridge problems, introduced in Fig. 5, require the repair of certain link failures as well. Therefore, all point-to-point links connecting to S necessitate an SPT with the link in question removed from topology B.

A) Default routing table. The default table is used when S is not aware of any direct failures in the network, that is, BFD sessions with all adjacent routers are up. However, it does not mean there are no failures in the entire network at all, since S will not be notified of the failure of non-adjacent nodes. Thus S may receive packets destined to those not-via addresses whose repair path it resides on, and it has to be prepared to properly forward these packets too. Therefore, S has to maintain routes in its default table to all not-via addresses that are advertised by non-adjacent routers.

Algorithm 2 Not-via: default routing table

     1: foreach N ∈ (R \ {S}) ∪ L do
     2:     if N ∈ R then
     3:         A ← loopback address of router N
     4:     else
     5:         A ← network prefix of LAN N
     6:     end if
     7:     table(default).route.add(A via nextHop(B, N))
     8: end for
     9: foreach X-Notvia-Y do
    10:     if (S = X) ∨ (S = Y) ∨ adjacent(Y) then
    11:         continue
    12:     end if
    13:     table(default).route.add(X-Notvia-Y via nextHop(B \ {Y}, X))
    14: end for

Now let us explain the corresponding code snippet listed in Algorithm 2. The first foreach block (lines 1–8) traverses all router and LAN nodes in the network except S itself, and adds the routing entries for the appropriate normal addresses to the default table. The second block (lines 9–14) traverses all not-via addresses using the generic selector X-Notvia-Y. Since the destination of all not-via addresses like S-Notvia-* is S itself (S = X), and packets destined to not-via addresses like *-Notvia-S never reach S (S = Y), there is no need to maintain routes to those addresses. Also, there is no need to route not-via addresses which signal the failure of an adjacent component, since S would be aware of these failures and handle them in separate routing tables. In all other cases, packets have to be forwarded on the shortest path in topology B with the failed node Y removed.
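A compact Python rendering of Algorithm 2 may help to follow the two blocks. It is a sketch under simplifying assumptions: the routing table is a plain dictionary, a not-via address X-Notvia-Y is the tuple (X, Y), LANs would be modelled as ordinary pseudo-nodes listed in `lans`, and the sample ring topology contains no LAN at all.

```python
import heapq

def first_hops(graph, source, removed=()):
    """Dijkstra from `source` without the nodes in `removed`; {dest: first hop}."""
    dist, hop = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            if nbr not in removed and d + weight < dist.get(nbr, float("inf")):
                dist[nbr] = d + weight
                hop[nbr] = nbr if node == source else hop[node]
                heapq.heappush(heap, (d + weight, nbr))
    return hop

def adjacent(graph, lans, s, y):
    """Direct neighbor, or a router sharing a LAN pseudo-node with s."""
    if y in graph[s]:
        return True
    return any(y in graph[lan] for lan in graph[s] if lan in lans)

def default_table(graph, lans, s, notvia_addresses):
    table = {}
    normal = first_hops(graph, s)
    for node in graph:                       # lines 1-8: normal addresses
        if node != s:
            table[node] = normal[node]
    for x, y in notvia_addresses:            # lines 9-14: not-via addresses
        if s == x or s == y or adjacent(graph, lans, s, y):
            continue
        table[(x, y)] = first_hops(graph, s, removed={y}).get(x)
    return table

# Hypothetical four-router ring; every router protects each of its neighbors.
GRAPH = {
    "S": {"A": 1, "C": 2},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "S": 2},
}
NOTVIA = [(x, y) for x in GRAPH for y in GRAPH[x]]

for destination, next_hop in default_table(GRAPH, set(), "S", NOTVIA).items():
    print(destination, "via", next_hop)
```

In this ring, only the addresses flagging the failure of B (the one node not adjacent to S) end up in the default table of S, next to the three normal routes.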

B) Routing table used in case of adjacent node failure. Let us first define a helper function notviaNextHop (see Algorithm 3) that returns the next-hop used to forward a packet destined to a given not-via address in topology B.

The routing of a not-via address depends on whether the failed component Y flagged by the not-via address equals S or not. If not, then the packet simply has to be passed to the next-hop used in topology B \ {Y}. If S = Y, the packets cannot have come from another router, since packets addressed to *-Notvia-S would have avoided S; thus the packets must have been encapsulated by S itself. In this case, the link S → X should be skipped if the connection is point-to-point; otherwise the intermediate LAN pseudo-node L should be skipped.

Algorithm 3 Not-via: next-hop to a not-via address in topology B

function notviaNextHop(X-Notvia-Y)
    if S ≠ Y then
        NH ← nextHop(B \ {Y}, X)
    else
        if isPtP(S → X) then
            NH ← nextHop(B \ {S → X}, X)
        else
            L ← getLANbetween(S, X)
            NH ← nextHop(B \ {L}, X)
        end if
    end if
    return NH
end function
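As an illustration of notviaNextHop, the following Python sketch mirrors the three cases on a graph in which LANs are modeled as pseudo-nodes. It is a sketch under stated assumptions: paths are hop-count shortest paths, a direct edge between two routers stands for a point-to-point link, and the function and parameter names are hypothetical.

```python
from collections import deque

def next_hop(graph, src, dst, no_nodes=(), no_edges=()):
    """First hop of a hop-count shortest path, avoiding given nodes and directed edges."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr in parent or nbr in no_nodes or (node, nbr) in no_edges:
                continue
            parent[nbr] = node
            queue.append(nbr)
    if dst == src or dst not in parent:
        return None
    while parent[dst] != src:
        dst = parent[dst]
    return dst

def notvia_next_hop(graph, lans, s, x, y):
    """Next-hop at router s for the address X-Notvia-Y."""
    if s != y:
        # Ordinary case: shortest path with the flagged component removed.
        return next_hop(graph, s, x, no_nodes={y})
    if x in graph[s]:
        # S encapsulated the packet itself; skip the point-to-point link S -> X.
        return next_hop(graph, s, x, no_edges={(s, x)})
    # Otherwise skip the LAN pseudo-node that connects S and X.
    lan = next(l for l in sorted(lans) if s in graph[l] and x in graph[l])
    return next_hop(graph, s, x, no_nodes={lan})

# Hypothetical triangle with point-to-point links only.
triangle = {"S": {"X", "M"}, "M": {"S", "X"}, "X": {"S", "M"}}
# Hypothetical topology where S and X share LAN L1 and both connect to Z.
with_lan = {"S": {"L1", "Z"}, "X": {"L1", "Z"}, "Z": {"S", "X"}, "L1": {"S", "X"}}

ptp_result = notvia_next_hop(triangle, set(), "S", "X", "S")
lan_result = notvia_next_hop(with_lan, {"L1"}, "S", "X", "S")
```

In both example topologies the direct connection between S and X is avoided, so the packet leaves S towards the third router (M and Z, respectively).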

The next two algorithms are used to complete the routing table used when an adjacent router node F* fails. Algorithm 4 is used for routing not-via addresses, Algorithm 5 is used for normal addresses. Both algorithms should be run with each adjacent router substituted in place of F*, one by one.

For routing not-via addresses, we have to examine first which node would be used as next-hop if S knew of no failures: this is obtained by calling notviaNextHop. If this node equals the failed adjacent node F*, then the packet should be dropped by adding a blackhole target, since packets already on a detour cannot be tunneled once more (in order to avoid loops). Otherwise, the returned next-hop is alive and can be used to forward packets destined to the not-via address in question.

For routing normal addresses, we have to examine if the next-hop used in topology B equals the failed node F*. If not, then the failure does not affect the packet forwarding and the next-hop queried from the appropriate SPT can be used.

If the next-hop is known to be failed, some sort of repair should take place to avoid the failed component. Now we have to distinguish between two major cases: whether there is a bridge or last-hop problem, in which cases only the intermediate link or LAN pseudo-node has to be skipped, or there are no such problems, in which case the entire next-hop should be avoided.

Let us examine the case with no bridge or last-hop problem more thoroughly. Now packets simply have to be encapsulated to the next-next-hop node not via the failed next-hop. This always works with loopback addresses, since they belong to router nodes, but LAN addresses (e.g. an address of a router interface attached to a LAN) need a small amendment here. For interface addresses only the aggregated network prefix is stored in the routing tables, making the actual destination node in the graph a LAN pseudo-node. Therefore, whenever the failed next-hop NH is attached to this destination LAN L, as depicted in Fig. 9, no next-next-hop exists. The solution is to encapsulate packets to the predecessor node P of L, which is the last router on the repair path to L in topology B \ {NH}.

Figure 9: Repairing to the predecessor node of L

Algorithm 4 Not-via: routing not-via addresses when node F* is failed

foreach X-Notvia-Y do
    if S = X then
        continue
    end if
    NH ← notviaNextHop(X-Notvia-Y)
    if NH = F* then
        table(F*).route.add(X-Notvia-Y is unreachable)
    else
        table(F*).route.add(X-Notvia-Y via NH)
    end if
end for

Algorithm 5 Not-via: routing normal addresses when node F* is failed

foreach N ∈ (R \ {S}) ∪ L do
    if N ∈ R then
        A ← loopback address of router N
    else
        A ← network prefix of LAN N
    end if
    NH ← nextHop(B, N)
    if NH ≠ F* then
        table(F*).route.add(A via NH)
    else
        if isBridge(F*) ∨ (F* = N) then
            if isPtP(S → F*) then
                table(F*).route.add(tunnel A to F*-Notvia-S)
            else
                L ← getLANbetween(S, F*)
                table(F*).route.add(tunnel A to F*-Notvia-L)
            end if
        else
            if (N ∈ L) ∧ LANcontains(N, NH) then
                NNH ← predHop(B \ {NH}, N)
            else
                NNH ← nextNextHop(B, N)
            end if
            table(F*).route.add(tunnel A to NNH-Notvia-NH)
        end if
    end if
end for
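The predecessor-based repair used for LAN destinations (Fig. 9 and the predHop call in Algorithm 5) can be sketched in Python as follows. The topology below is a hypothetical one that only resembles the situation of Fig. 9, paths are hop-count shortest paths, and the helper names are assumptions made for this example.

```python
from collections import deque

def shortest_path(graph, src, dst, removed=()):
    """Hop-count shortest path from src to dst as a node list, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in sorted(graph[node]):
            if nbr not in parent and nbr not in removed:
                parent[nbr] = node
                queue.append(nbr)
    if dst not in parent:
        return None
    path = []
    while dst is not None:
        path.append(dst)
        dst = parent[dst]
    return path[::-1]

def pred_hop(graph, s, lan, failed_nh):
    """Last router before the LAN on the repair path in the topology without failed_nh."""
    path = shortest_path(graph, s, lan, removed={failed_nh})
    return path[-2] if path and len(path) >= 2 else None

# Hypothetical topology: NH and P are both attached to LAN L; S reaches P via Q.
topo = {"S": {"NH", "Q"}, "NH": {"S", "L"}, "Q": {"S", "P"},
        "P": {"Q", "L"}, "L": {"NH", "P"}}

predecessor = pred_hop(topo, "S", "L", "NH")
repair_target = f"{predecessor}-Notvia-NH"
```

Here the normal next-hop towards L is NH; once NH fails, the repair path S, Q, P, L is used and packets for the LAN prefix are tunneled to P-Notvia-NH.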

C) Routing table used in case of adjacent LAN failure

The next two algorithms calculate the routing table used when an adjacent LAN L* is failed. Algorithm 6 is responsible for routing not-via addresses, Algorithm 7 handles normal addresses. Both algorithms should be run with each adjacent LAN node substituted in place of L*, one by one.

Algorithm 6 Not-via: routing not-via addresses when LAN L* is failed

foreach X-Notvia-Y do
    if S = X then
        continue
    end if
    NH ← notviaNextHop(X-Notvia-Y)
    if (L* ≠ Y) ∧ LANcontains(L*, NH) then
        table(L*).route.add(X-Notvia-Y is unreachable)
    else
        table(L*).route.add(X-Notvia-Y via NH)
    end if
end for

For routing not-via addresses, we have to examine which node would be the next-hop if there was no adjacent failure. If this node was reached through L*, the failure of L* would require a change in packet forwarding, so the packet should be dropped instead of being forwarded, because not-via addresses are not allowed to be tunneled once more. Otherwise, the next-hop returned by notviaNextHop can be used.

Packets sent to normal addresses should be encapsulated to the next-hop not via the failed LAN if the next-hop used without the failure would be reached through L*. Otherwise, the failure of L* does not affect the forwarding process.

4.3.2 Lightweight Not-via

Next, we turn to formalizing Lightweight Not-via. As we shall see, its design is cleaner than that of Not-via discussed in the previous subsection, and is free of all of its corner cases.


Algorithm 7 Not-via: routing normal addresses when LAN L* is failed

foreach N ∈ (R \ {S}) ∪ L do
    if N ∈ R then
        A ← loopback address of router N
    else
        A ← network prefix of LAN N
    end if
    NH ← nextHop(B, N)
    if LANcontains(L*, NH) then
        table(L*).route.add(tunnel A to NH-Notvia-L*)
    else
        table(L*).route.add(A via NH)
    end if
end for

In order to complete the routing tables for Lightweight Not-via, the next-hop (and sometimes also the next-next-hop) nodes have to be known to all D, P and S addresses in the network. One SPT should be computed to obtain the next-hops towards all D addresses. A single pair of redundant trees, the primary and the secondary tree, is also computed, both trees rooted at a certain node R having e.g. the highest D address among all nodes. Then, for all other nodes N ≠ R, these trees are rewired with N set as root, and the corresponding next-hops along the rebased trees yield nextHopP(N) and nextHopS(N). The SPT and redundant tree calculations are formalized in Algorithm 8.

Algorithm 8 Lightweight Not-via: SPT and redundant tree calculations

SPT(B)
compute a pair of redundant trees rooted at R
foreach N ∈ L ∪ R \ {S, R} do
    rebase trees with N set as root
    nextHopP(N) ← next-hop along the rebased primary tree
    nextHopS(N) ← next-hop along the rebased secondary tree
end for

A) Default routing table

Now that we know the next-hops for all possible addresses, as calculated in Algorithm 8, completing the default table with Lightweight Not-via is quite easy. All kinds of nodes (routers and LANs) have D addresses, so we iterate through all nodes and just add a route to them via the corresponding next-hop along the shortest path. For router nodes, the P and S addresses also have to be routed, using the next-hops along the primary and secondary trees. The code is shown in Algorithm 9.


Algorithm 9 Lightweight Not-via: default routing table

foreach N ∈ (R \ {S}) ∪ L do
    table(default).route.add(D_N via nextHop(N))
    if N ∈ R then
        table(default).route.add(P_N via nextHopP(N))
        table(default).route.add(S_N via nextHopS(N))
    end if
end for
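A minimal Python sketch of the default table of Lightweight Not-via is given below. It assumes that the next-hops along the SPT and along the two redundant trees have already been computed and are handed over as plain dictionaries; the function name, the dictionary layout and the example values are hypothetical.

```python
def lightweight_default_table(routers, lans, s, nh, nh_p, nh_s):
    """Default table of router s: D addresses for all nodes, P and S for routers only."""
    table = {}
    for n in sorted((routers - {s}) | lans):
        table[f"D_{n}"] = nh[n]          # shortest-path next-hop
        if n in routers:
            table[f"P_{n}"] = nh_p[n]    # next-hop along the primary tree
            table[f"S_{n}"] = nh_s[n]    # next-hop along the secondary tree
    return table

# Hypothetical precomputed next-hops as seen from router "S".
routers = {"S", "A", "B"}
lans = {"L1"}
nh = {"A": "A", "B": "A", "L1": "A"}
nh_p = {"A": "A", "B": "A"}
nh_s = {"A": "B", "B": "B"}

default = lightweight_default_table(routers, lans, "S", nh, nh_p, nh_s)
```

The resulting table holds three routes per remote router and a single D route per LAN, which is why its size grows linearly with the number of nodes.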

B) Routing table used in case of adjacent node failure

First let us deal with the D addresses, since both routers and LANs have that kind of address. If the next-hop along the shortest path is alive, then this next-hop can be used for packet forwarding. Otherwise, packets have to be tunneled to the next-next-hop: either the primary or the secondary tree rooted at the next-next-hop is used, whichever does not contain the failed node F*. (If P_NNH is failed, then S_NNH must be alive, since the two trees are node disjoint.) The pseudo-code for routing D addresses is shown in Algorithm 10.

Algorithm 10 Lightweight Not-via: routing D addresses when node F* is failed

foreach N ∈ (R \ {S}) ∪ L do
    NH ← nextHop(N)
    if NH ≠ F* then
        table(F*).route.add(D_N via NH)
    else
        NNH ← nextNextHop(B, N)
        if nextHopP(NNH) ≠ N then
            table(F*).route.add(tunnel D_N to P_NNH)
        else
            table(F*).route.add(tunnel D_N to S_NNH)
        end if
    end if
end for
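The routing of D addresses after a node failure can be sketched in Python as follows. The sketch follows the rule stated in the prose (tunnel to the next-next-hop along whichever tree does not pass through the failed node) and assumes precomputed next-hop and next-next-hop dictionaries. Leaving out destinations that have no next-next-hop is an assumption of this example, and all names and values are hypothetical.

```python
def d_routes_on_node_failure(nodes, failed, nh, nnh, nh_p):
    """Routes for D addresses in the table activated when `failed` goes down."""
    table = {}
    for n in nodes:
        if nh[n] != failed:
            table[f"D_{n}"] = ("via", nh[n])           # failure does not affect n
            continue
        target = nnh.get(n)
        if target is None:
            continue                                   # assumption: nothing to repair to
        if nh_p[target] != failed:
            table[f"D_{n}"] = ("tunnel", f"P_{target}")  # primary tree avoids the failure
        else:
            table[f"D_{n}"] = ("tunnel", f"S_{target}")  # fall back to the secondary tree
    return table

# Hypothetical view of router S whose neighbor F has failed.
nh = {"C": "F", "D": "E", "G": "F"}      # shortest-path next-hops
nnh = {"C": "C", "G": "H"}               # next-next-hops behind F
nh_p = {"C": "F", "H": "E"}              # primary-tree next-hops towards C and H

repaired = d_routes_on_node_failure(["C", "D", "G"], "F", nh, nnh, nh_p)
```

Destination D keeps its ordinary route, traffic to G is tunneled to P_H, and traffic to C is tunneled to S_C because the primary tree towards C leads through the failed node.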

For routing P and S addresses, only routers should be considered as destinations. If the next-hop towards the P_R address of a router R is alive, the packet should be forwarded via the next-hop to R along the primary tree rooted at R. If the desired next-hop is failed, then the next-hop along the secondary tree must be alive and the packet destined to P_R should be tunneled to S_R and passed to the next-hop along the secondary tree. Note that in this case the original packet is encapsulated twice, as it was already in a tunnel when it arrived. This is not a problem, however, because the destination addresses in both outer IP headers belong to the same node, which will simply recognize and remove both headers.

Finally, S addresses are either forwarded along the corresponding secondary tree or, if the next-hop to be used is failed, dropped in order to prevent forwarding loops. The pseudo-code for routing P and S addresses is shown in Algorithm 11.

Algorithm 11 Lightweight Not-via: routing P and S addresses when node F* is failed

foreach R ∈ (R \ {S}) do
    if nextHopP(R) ≠ F* then
        table(F*).route.add(P_R via nextHopP(R))
    else
        table(F*).route.add(tunnel P_R to nextHopS(R))
    end if
    if nextHopS(R) ≠ F* then
        table(F*).route.add(S_R via nextHopS(R))
    else
        table(F*).route.add(S_R is unreachable)
    end if
end for

C) Routing table used in case of adjacent LAN failure

Due to the redefined semantics of not-via addresses introduced by Lightweight Not-via, LAN failures can be treated very similarly to router failures. The only difference is that, instead of testing if a router equals the failed node F*, now we have to check if it is attached to the failed LAN L*. The other operations, conditional branches and actions to be performed are the same. For the sake of brevity, we do not publish the pseudo-code for this case; it would be almost an exact copy of the code used for adjacent node failures.

4.4 Routing table management

Now that we know how to calculate the routing tables and set up tunnels for both Not-via and Lightweight Not-via, we explain how to insert them into the prototype. Only one routing table can be active at a time, therefore, we set up preferences among the precalculated routing tables, and the one with the highest preference will always be effective. For this purpose, only two levels are needed: HI_PREF for the active table, and LO_PREF for all other inactive tables.

We assume that there are no failures in the network at startup, so the default table will get HI_PREF and the others (which are used when a router or LAN fails) start at LO_PREF. In case of a failure, the preferences have to be altered to activate the proper table prepared to handle the situation. For this purpose, we introduce the procedure installRoutingTable (see Algorithm 12) that raises the preference of the table used when a given node N* fails and lowers the preference of all other tables.

As it has been mentioned previously, the prototype uses BFD for failure detection. After calculating the routing tables, S sets up a BFD session with each adjacent router in the network. Whenever a BFD session goes down or up with a given router, the BFD module sends a notification to the IPFRR process.


Algorithm 12 Install routing table used when node N* fails

procedure installRoutingTable(N*)
    table(∀N : adjacent(N) ∧ N ≠ N*).preference ← LO_PREF
    table(N*).preference ← HI_PREF
end procedure
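The two-level preference scheme behind installRoutingTable can be sketched in a few lines of Python. The numeric values of HI_PREF and LO_PREF and the dictionary that stands in for the set of precalculated tables are assumptions of this example; the sketch does not touch any kernel routing table.

```python
HI_PREF, LO_PREF = 1, 0   # hypothetical numeric values for the two levels

def install_routing_table(preferences, active):
    """Make `active` the only table with high preference."""
    for name in preferences:
        preferences[name] = LO_PREF
    preferences[active] = HI_PREF

# At startup the default table is active; tables for R1 and L1 are precalculated.
prefs = {"default": HI_PREF, "R1": LO_PREF, "L1": LO_PREF}
install_routing_table(prefs, "R1")       # neighbor R1 is reported down
```

Because every call first lowers all preferences, exactly one table is effective after each switch, regardless of which table was active before.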

Although IPFRR aims to repair only one failure at a time, the prototype should maintain a sane behavior even if multiple failures occur. This was kept in mind during the design process. The number of failures S is aware of is stored in a variable called bfd_failures. Additionally, for each node N in the network S stores the number of failed BFD sessions that terminate at N (in case of routers) or go through N (in case of LANs). This number can either be 0 or 1 for routers, but for LANs it may go as far as the number of routers adjacent to S (in case all neighbors are connected to S through one single LAN, and they all fail).

Algorithm 13 Data structures for tracking BFD sessions

int bfd_failures ← 0
NodeMap<int> failedSession

When a node fails or gets repaired, the BFD module returns an identifier of the peer router to which the BFD session went down or came back. The tasks to be done upon these events are given in two procedures: bfdDn in Algorithm 14, and bfdUp in Algorithm 15.

Algorithm 14 BFD session with peer R goes down

procedure bfdDn(R)
    bfd_failures ← bfd_failures + 1
    failedSession[R] ← 1
    if isPtP(S → R) then
        installRoutingTable(R)
    else
        L ← getLANbetween(S, R)
        failedSession[L] ← failedSession[L] + 1
        if failedSession[L] = 1 then
            installRoutingTable(R)
        else
            installRoutingTable(L)
        end if
    end if
end procedure

If the BFD connection breaks with an adjacent router R, the number of known failures is incremented and the BFD session to R is marked as failed. If S is connected to R with a point-to-point link, it is clear that R itself has failed and the corresponding routing table is activated. Otherwise, the BFD session must go through an intermediate LAN L located between S and R, thus the number of failed sessions through L should also be incremented. If this is the second or a later failure through L, it is considered a LAN failure (according to the used LAN-repair method). Otherwise, it is the first failure over L and the routing table for R is used.

Algorithm 15 BFD session with peer R comes up

procedure bfdUp(R)
    bfd_failures ← bfd_failures − 1
    failedSession[R] ← 0
    if ¬isPtP(S → R) then
        L ← getLANbetween(S, R)
        failedSession[L] ← failedSession[L] − 1
    end if
    if bfd_failures > 0 then
        foreach M ∈ (R \ {S}) ∪ L do
            int f ← failedSession[M]
            if (f > 0) ∧ (M ∈ R ∨ f > 1) then
                installRoutingTable(M)
            end if
        end for
    else
        installRoutingTable(default)
    end if
end procedure

If the BFD connection to router R comes back, the number of known failures is decremented and the BFD session to R is marked as up. If S is connected to R through a LAN L, the number of failed sessions through L is decremented too. If there are still known failures, S searches for a node M which has failed BFD connections. If M is either a router or a LAN with more than one failed BFD session⁴, the routing table for M is selected and installed.
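The bookkeeping performed by bfdDn and bfdUp can be tried out with the following Python sketch. It only tracks which table would be active instead of installing anything. The class name, the lan_between mapping (a peer that is missing from it is assumed to be reached over a point-to-point link) and the example neighbors are hypothetical.

```python
from collections import defaultdict

class BfdTracker:
    """Tracks failed BFD sessions of router S and the table that should be active."""

    def __init__(self, lan_between):
        self.lan_between = lan_between          # peer router -> LAN between S and peer
        self.bfd_failures = 0
        self.failed_session = defaultdict(int)  # node -> number of failed sessions
        self.active = "default"

    def bfd_down(self, peer):
        self.bfd_failures += 1
        self.failed_session[peer] = 1
        lan = self.lan_between.get(peer)
        if lan is None:                         # point-to-point: the router failed
            self.active = peer
            return
        self.failed_session[lan] += 1
        # First failure over the LAN is treated as a router failure,
        # any further one as a failure of the LAN itself.
        self.active = peer if self.failed_session[lan] == 1 else lan

    def bfd_up(self, peer):
        self.bfd_failures -= 1
        self.failed_session[peer] = 0
        lan = self.lan_between.get(peer)
        if lan is not None:
            self.failed_session[lan] -= 1
        if self.bfd_failures == 0:
            self.active = "default"
            return
        for node, count in self.failed_session.items():
            is_lan = node in self.lan_between.values()
            if count > 0 and (not is_lan or count > 1):
                self.active = node

# R1 and R2 are reached through LAN L1.
tracker = BfdTracker({"R1": "L1", "R2": "L1"})
history = []
for event, peer in [("down", "R1"), ("down", "R2"), ("up", "R1"), ("up", "R2")]:
    (tracker.bfd_down if event == "down" else tracker.bfd_up)(peer)
    history.append(tracker.active)
```

The recorded sequence shows the intended escalation: a single lost session over L1 is repaired as a router failure, a second one as a LAN failure, and the table falls back step by step as the sessions return.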

4.5 Transient-to-persistent failure switch-over

The infrastructure introduced so far is enough to repair a single failure. However, if the failure is not a transient drop-out but a more persistent change (e.g. an excavator has torn up a bundle of fibers), the repair alone would leave the network vulnerable to further failures. To deal with this problem we resort to a well-established instrument of network protocols: timers.

⁴ This ensures that we do not repair a LAN containing only one failure, because in this case we assume the failure of the router, not the LAN.


When there are no known failures in the network, there are no active timers either. If a router notices that a BFD session went down, it starts a timer T. Its timeout is the interval that separates transient failures from persistent ones. If the failure disappears before T expires, it is a transient failure; if T expires and the failure still exists, it is considered to be persistent. The actual timeout value should be configurable, but based on the average lifetime of network failures it is best set to around 30–60 seconds.

If T is running, and an additional failure occurs or a failure disappears while there are still failed nodes, T is reset to its initial value. This means that a non-empty set of failures is considered persistent only if the set of failed nodes has been steady for a full timeout interval.
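The timer rules above can be condensed into a small Python sketch. To keep it deterministic, the sketch takes the current time as an explicit argument instead of using a real timer; the class name, the method names and the 30-second default are assumptions chosen for illustration.

```python
class PersistenceTimer:
    """Decides when a steady, non-empty set of failures becomes persistent."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.failed = set()
        self.deadline = None            # None means that T is not running

    def on_change(self, now, failed_nodes):
        """Call whenever the set of failed nodes changes."""
        self.failed = set(failed_nodes)
        # Any change restarts T; an empty failure set stops it.
        self.deadline = now + self.timeout if self.failed else None

    def expired(self, now):
        return self.deadline is not None and now >= self.deadline

timer = PersistenceTimer(timeout=30.0)
timer.on_change(0.0, {"R1"})            # first failure starts T
early = timer.expired(29.0)
timer.on_change(20.0, {"R1", "R2"})     # a second failure resets T
after_reset = timer.expired(45.0)
persistent = timer.expired(50.0)
```

With these inputs the first failure alone would have become persistent at t = 30 s, but the second failure at t = 20 s postpones the decision to t = 50 s.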

When T expires at a router, it knows that there is a persistent failure in the network. However, routers not adjacent to the failure still know nothing about it, thus a global signaling of the failure is initiated in order to communicate to other routers that a persistent failure occurred and recalculation of all routing tables is necessary. If a router receives this signal, it retrieves the topology from OSPF to obtain the up-to-date connectivity information, recalculates the IPFRR routing tables based on this topology, and continues the forwarding process using the new default routing table.


5 Prototype implementation

In the previous section, we discussed the design of the prototype: we introduced the components needed besides IPFRR itself, formalized Not-via and Lightweight Not-via, and outlined the transient-to-persistent failure switch-over mechanism. In this section, we dig into the implementation details of the prototype, show a more elaborate view of the architecture, give some specific information on the software components used, the internal messages and signals, and describe a few pitfalls that we encountered during the implementation process.

5.1 Architecture

The architecture of the prototype implementation is depicted in Fig. 10. The three main modules are the IPFRR module, which is driven by our own Not-via and Lightweight Not-via implementations, the OSPF module, which supplies topology information and is realized by the Quagga Routing Suite [20], and the failure detection module, provided by the kbfd Linux kernel module [21].

Figure 10: Interfaces between prototype components (OSPF (quagga), IPFRR (Not-via, Lightweight Not-via), BFD (kbfd kernel module) and the Linux kernel; interfaces: ospfapi for topology queries and opaque-LSAs, netlink for managing routes and BFD sessions)

The communication between the modules is shown by the arrows. The IPFRR module uses the Linux kernel netlink interface to manage BFD sessions in the kbfd module and also for manipulating routing tables in the kernel. IPFRR queries topology information from the OSPF module through its ospfapi interface, which is also used for sending opaque-LSAs⁵ [22] for signaling between IPFRR processes running on different routers. Finally, the OSPF module itself also creates its own routes in the Linux kernel through its netlink interface (the reason for this is uncovered in Subsection 5.4).

⁵ Opaque LSAs provide a generalized mechanism to allow for the extensibility of OSPF. They consist of a standard LSA header followed by application-specific data, which can be used to distribute arbitrary information between processes by standard OSPF link-state database flooding mechanisms.


Figure 11: Not-via source modules (main, ip, ip-netlink, bfd, bfd-netlink, spt, graph, ospfclient, addr)

5.2 Software components

The IPFRR module (which consists of the Not-via and Lightweight Not-via implementations) is written in C++ using the LEMON graph library [23]. LEMON is a C++ template library aimed at combinatorial optimization tasks, especially those working with graphs and networks. The parts using the Linux kernel netlink interface are written in C.

Fig. 11 shows the source code modules for the Not-via implementation. The arrows indicate code dependencies. Modules bfd and ip are responsible for managing BFD sessions and calculating routing tables, respectively. Each has a -netlink variant as well, which implements the netlink socket communication towards the kernel. Module ospfclient queries and processes topology information from OSPF and creates a LEMON graph representation that the graph module can store and handle. Module spt computes SPTs for all topologies required by the Not-via algorithm. Finally, addr is a class for representing and manipulating IP addresses in CIDR form.

In order to provide fast failure detection in the prototype, we use the kbfd kernel module, which supports Linux kernels only up to version 2.6.18. The PC routers were running Debian Linux, with either kernel 2.6.18 included in the distribution or a custom kernel 2.6.24, so we had to port the module to support 2.6.24 as well. Additionally, we found and fixed a bug in kbfd that caused a kernel panic when BFD sessions were deleted, a fix without which the proper handling of BFD sessions is inconceivable. A new netlink socket call to support the setting of the DiffServ CodePoint was also added to kbfd in order to prioritize BFD control packets over background streams (the implementation uses value 0x30, CS6 Network Control, in accordance with [24]).

project   description of the patch
kbfd      ⋄ port to recent Linux kernel versions
          ⋄ fix crashing bug that caused kernel panic when deleting BFD sessions
          ⋄ add support for setting the DiffServ CodePoint of BFD packets
quagga    ⋄ support for C++ compiler
          ⋄ raise privilege for creating netlink socket towards kbfd

Table 1: Patches contributed to quagga and kbfd
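The relation between the quoted code point and the byte that appears in the IP header can be checked with a few lines of Python. This is plain arithmetic on the value 0x30 given above and says nothing about how kbfd stores or applies the setting.

```python
DSCP_CS6 = 0x30                       # value quoted above for BFD control packets

# Class selector code points have the form n << 3, so 0x30 corresponds to CS6.
class_selector = DSCP_CS6 >> 3

# The DSCP occupies the upper six bits of the former TOS byte (the DS field),
# so the byte seen in the IP header is the code point shifted left by two bits.
ds_field_byte = DSCP_CS6 << 2

summary = f"CS{class_selector}: DSCP {DSCP_CS6} -> DS byte 0x{ds_field_byte:02x}"
```

The distinction matters when a tool expects the whole DS byte (0xc0) rather than the six-bit code point (0x30).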

The prototype uses the latest version of quagga from its CVS repository, which should be compiled with support for opaque-LSA in order to be used with our implementation (e.g., use the --enable-opaque-lsa configure switch when compiling quagga from source). Since quagga is written in C and we used C++ for the majority of the code for our prototype (including the ospfclient module which links to quagga’s ospfapi implementation), we had to patch quagga so that it can be compiled with a C++ compiler. Our patch has been accepted and merged into upstream by the quagga maintainers, providing official support for C++ in the leading open source routing software suite.

BFD support in quagga is not necessary for the prototype, because BFD sessions are entirely managed by our software. However, if we would like to compare the failure recovery time of IPFRR to that of OSPF, we have to utilize the same failure detection method for evaluating both protocols. For this, we need a modified quagga that can cooperate with kbfd. There is a patch available for download from the author of kbfd that adds support for kbfd in quagga [21]. We slightly modified this patch to increase the interoperability between the patched quagga and Linux: the privilege of the quagga process is raised during the creation of the netlink socket towards kbfd, because only privileged processes are allowed to register a netlink socket in a multicast group, which is needed in order for quagga to receive BFD up/down events broadcast by the kbfd kernel module.

All modifications to quagga and kbfd are tracked at our website, available at http://opt.tmit.bme.hu/~kbfd/. Table 1 summarizes our contributions to these projects.

5.3 Address management

The private address space 10.0.0.0/8 is used by the prototype. Loopbacks are assigned addresses from the range 10.0.0.1–10.0.0.254 with netmask /32 (e.g. 10.0.0.x/32). Interface addresses are taken from subnets of the form 10.0.x.0/24. These are configured and distributed by OSPF based on the settings in quagga’s zebra.conf file on each machine. Network pseudo-nodes are given the aggregated prefix of the attached router interface addresses. The management of not-via addresses is specific to each IPFRR algorithm and is detailed in Subsections 5.3.1 and 5.3.2.


5.3.1 Not-via

The Not-via draft does not specify the means of allocating or advertising not-via addresses. In our prototype, we decided to allocate the separate address space 10.1.0.0/16 for not-via addresses and define the addresses in the form 10.1.x.y/32, where x and y are derived from the loopback and network addresses of routers and LANs, respectively.

For example, given two routers A and B with loopback addresses 10.0.0.1/32 for A and 10.0.0.2/32 for B and a LAN L with network address 10.0.3.0/24, the following not-via addresses would be assigned: A-Notvia-B = 10.1.1.2/32, B-Notvia-A = 10.1.2.1/32, A-Notvia-L = 10.1.1.3/32 and finally B-Notvia-L = 10.1.2.3/32.
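The address derivation in this example can be reproduced with Python's ipaddress module. The exact rule is inferred from the example only: the sketch assumes that a router contributes the last octet of its loopback address and a LAN the third octet of its /24 prefix. The function names are hypothetical.

```python
import ipaddress

def index_of(addr):
    """Loopback 10.0.0.x/32 -> x; LAN prefix 10.0.y.0/24 -> y (assumed rule)."""
    net = ipaddress.ip_network(addr)
    octets = net.network_address.packed     # four bytes of the IPv4 address
    return octets[3] if net.prefixlen == 32 else octets[2]

def notvia_address(x_addr, y_addr):
    """Address X-Notvia-Y in the form 10.1.x.y/32."""
    return ipaddress.ip_network(f"10.1.{index_of(x_addr)}.{index_of(y_addr)}/32")

router_a = "10.0.0.1/32"
router_b = "10.0.0.2/32"
lan_l = "10.0.3.0/24"

a_notvia_b = str(notvia_address(router_a, router_b))
b_notvia_a = str(notvia_address(router_b, router_a))
a_notvia_l = str(notvia_address(router_a, lan_l))
b_notvia_l = str(notvia_address(router_b, lan_l))
```

Under this rule a router and a LAN must not share the same index, otherwise two different not-via addresses would collide; how the prototype guarantees this is not stated here.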

5.3.2 Lightweight Not-via

Lightweight Not-via redefines the semantics of not-via addresses so that a node has only three addresses: a default routable IP address, an IP address that belongs to the primary tree and an IP address for the secondary tree.

As a convention, all routers follow the policy that the loopback address (advertised as the router-ID by OSPF) is the default, the lowest interface address is the primary backup and the largest one is the secondary backup address. Note, however, that only the IP addresses are determined like this; the netmask for these addresses is always /32 so that they will be more specific than the corresponding interface address. This is necessary as a P or S address must be routed according to a redundant tree, which may require a next-hop different from the one used for the interface address in a LAN.
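The selection convention can be sketched in Python with the ipaddress module. The function name and the example addresses are hypothetical; the point of the example is that the addresses are compared numerically and are all installed as /32 host routes.

```python
import ipaddress

def lightweight_addresses(loopback, interface_addrs):
    """Pick the D, P and S addresses of a router following the stated convention."""
    ifaces = sorted(ipaddress.ip_address(a) for a in interface_addrs)

    def as_host(ip):
        return str(ipaddress.ip_network(f"{ip}/32"))   # always a /32 host route

    return {
        "D": as_host(ipaddress.ip_address(loopback)),  # default: the loopback
        "P": as_host(ifaces[0]),                       # lowest interface address
        "S": as_host(ifaces[-1]),                      # largest interface address
    }

addrs = lightweight_addresses("10.0.0.1", ["10.0.3.1", "10.0.12.1", "10.0.7.1"])
```

Note that 10.0.12.1 is the largest address here even though it would sort first as a string, which is why the sketch compares address objects rather than text.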

5.4 Messages and states

Although the prototype design provided a way to handle transient-to-persistent failure switch-overs, the actual implementation has to deal with a problem that an abstract design does not suffer from: the execution time of the algorithms and routing table operations. When a router receives a signal to synchronize its topology with OSPF and recompute its IPFRR routing tables, it has to provide routing even during the recalculation process. This means that there needs to be a separate active routing table besides the IPFRR tables, because those will be completely cleared and reestablished in case of a persistent failure.

The solution is to make use of a special routing table maintained by the quagga routing daemon. By default, it is suppressed by the active IPFRR table using routing table preferences. Quagga constantly runs the OSPF Hello protocol, exchanges packets between adjacent routers and closely follows the topology changes introduced by network component failures. By the time the T timer expires (see Section 4.5), the routing table of quagga becomes a usable and consistent table for the new topology at each router in the network.

Until now, we have only referred to the signaling process that communicates the occurrence of persistent failures throughout the network. For this purpose, the prototype implementation uses the opaque-LSA extension of OSPF. When a router’s timer expires, it originates an opaque-LSA, which will be distributed by quagga to all other routers. The opaque-LSA carries a payload indicating the failed component to filter opaque-LSAs originated by different routers for the same failure (which might happen if their timers expire almost at the same time). Utilizing opaque-LSAs gives way to another useful feature: if a new router is added to a network already running IPFRR, it can send an opaque-LSA to announce its presence and make other routers reconfigure their topologies to include the newcomer. This process is exactly the same as recovering from a persistent failure, and it is implemented in the prototype.
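The filtering of duplicate signals can be illustrated with the following Python sketch. It assumes that the payload can be reduced to an identifier of the failed component and that duplicates are recognized by remembering the identifiers already handled; the class and method names are hypothetical, and the sketch does not reflect the actual LSA payload format.

```python
class PersistentFailureSignals:
    """Remembers which failed components have already triggered a recalculation."""

    def __init__(self):
        self.seen = set()

    def on_opaque_lsa(self, failed_component):
        """Return True if recalculation should start, False for a duplicate report."""
        if failed_component in self.seen:
            return False
        self.seen.add(failed_component)
        return True

    def recalculation_done(self, failed_component):
        """Forget the component so that a later failure of it is handled again."""
        self.seen.discard(failed_component)

signals = PersistentFailureSignals()
# Two neighbors of the failed router R3 report it almost at the same time.
decisions = [signals.on_opaque_lsa("R3"), signals.on_opaque_lsa("R3")]
```

Only the first report starts a recalculation; the second one, originated by another neighbor of the same failed component, is ignored.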

To summarize the sent and received messages and provide a good overview of different states, Fig. 12 shows the life-cycle of an IPFRR process running on a router.

5.5 Pitfalls

A thorough design can greatly help the implementation process, but there are always some unforeseen circumstances that emerge only when an idea is put into practice. Here we would like to shed light on a few of them.

When working with a lot of PCs with 3–4 line-cards (occasionally from different vendors) in each, one has to be very lucky not to experience hardware failures or misbehaving udev⁶ rules. We also had our share of these delights, but managed to get over them with humble patience.

Filling the routing tables is not enough to make a router out of a Linux box; the forwarding process has to be enabled by writing 1 to /proc/sys/net/ipv4/ip_forward. Another thing to note about routing is that the kernel caches next-hops to recently seen destination addresses in order to speed up packet forwarding. This cache should be flushed whenever the active routing table is changed, because some destinations are reached via a different next-hop in the new routing table.

Some routes must be added to the special table main before they can be added to any other table. Lightweight Not-via creates routes to all destinations via the loopback address of the corresponding next-hop router to ensure that the packet is passed to the next-hop directly, without traversing the primary or secondary redundant tree. These routes can only be created if routes exist in the main table that specify the interface through which the next-hop is reachable.

BFD sessions are also established between the loopback addresses of adjacent routers. If BFD was routed similarly to ordinary traffic, in case of a failure BFD packets would be tunneled and forwarded along the appropriate redundant tree (as it is with all traffic sent to D addresses). This is not acceptable, since this way BFD packets would not even traverse the link they are intended to monitor. The solution is to create a separate routing table solely for BFD packets and set up a rule so that only these packets will be processed by that table (in our implementation, the filtering is done on the DSCP header field that we set for BFD packets).

⁶ The Linux udev device manager dynamically provides the nodes for the devices actually present on a system. It provides a set of rules that name and create device nodes and run configured programs to set up and configure the devices.

Figure 12: SDL diagram of prototype message handling and state transition (states: idle, fast reroute; inputs: bfdDown, bfdUp, timerExpired, opaque-LSA; actions: call bfdDn, call bfdUp, startTimer, stopTimer, synch topology with OSPF, calculate and recalculate routing tables)

6 Evaluation

Finally, we discuss the last phase of our work, the evaluation of our prototype implementation. We compare the failure recovery time of our prototype to that of OSPF, and also compare Not-via to Lightweight Not-via in terms of additional addresses, routing table management and computational cost.

6.1 Testbed

The topology of the testbed used for evaluating our prototype can be seen in Fig. 13. It consists of 9 PC routers running Debian GNU/Linux, connected over Ethernet. We used the DBS (Distributed Benchmark System) [25] tool for performance measurement.

Figure 13: Topology of the test network used for measurement (diagram not reproduced; it shows the nodes src, dest, or2, or3, or4, ot1, ot2, ot3 and ot4, and marks the original shortest path, the tunneled repair path and the new shortest path)

DBS can benchmark TCP and UDP streams in complex scenarios, generating traffic between several nodes. The setup used for the prototype evaluation contained only two measurement nodes, src and dest. Node src sent UDP packets to dest with configurable traffic parameters. We settled on the following configuration: the traffic from src to dest consisted of UDP packets with 256 bytes of payload, sent at a rate of 1 packet/ms, allowing for 1 ms precision in time measurement. Nodes src and dest recorded the timestamp and sequence number of each packet they sent or received. This requires that the clocks of the two nodes be synchronized, which was achieved by running the Network Time Protocol [26] between src and dest.
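As a quick sanity check of this configuration, the offered load and the nominal packet count of a run can be worked out directly. The sketch below counts payload bits only (UDP/IP and Ethernet headers are ignored), and the 90 s run length is taken from the time span of the Not-via measurement rather than from a stated DBS setting.

```python
PAYLOAD_BYTES = 256        # UDP payload per packet (from the text)
PACKETS_PER_SECOND = 1000  # 1 packet/ms (from the text)


def payload_rate_mbps(payload_bytes: int, packets_per_second: int) -> float:
    """Payload bit rate in Mbit/s (10**6 bit/s), headers excluded."""
    return payload_bytes * 8 * packets_per_second / 1e6


def packets_sent(duration_s: float, packets_per_second: int) -> int:
    """Nominal number of packets emitted during a run of the given length."""
    return round(duration_s * packets_per_second)


print(payload_rate_mbps(PAYLOAD_BYTES, PACKETS_PER_SECOND))  # 2.048
print(packets_sent(90, PACKETS_PER_SECOND))                  # 90000
```

The roughly 2 Mbit/s payload rate and the 90,000 packets of a 90 s run are in line with the throughput range of Fig. 16 and the sent-packet counts of Table 2.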

The duration of the DBS measurement period can be configured in advance. After it is over, DBS collects the measured data at one of the nodes in order to determine the average throughput, delay, packet loss and other traffic parameters.

The BFD parameters were set to send BFD control packets every 3 ms, and 3 lost packets were considered to be the sign of a failure. This yields a failure detection time of at least 6 ms and at most 9 ms. Setting BFD to send packets more frequently started to cause packet loss (BFD is transmitted over UDP), so that BFD sessions became unstable and continuously changed state between up and down. The BFD settings we used provided the lowest failure detection time that our testing environment could handle.
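The 6 to 9 ms range follows from the send interval and the loss threshold. A minimal sketch of this arithmetic, assuming a simplified model in which the session is declared down exactly when the third consecutive control packet fails to arrive (jitter and processing delay are ignored; the function name is ours):

```python
def bfd_detection_bounds_ms(tx_interval_ms: float, detect_mult: int):
    """Return (best, worst) time from link failure to detection.

    Worst case: the link fails right after a control packet got through,
    so detect_mult full intervals pass before the session goes down.
    Best case: the link fails just before the next packet was due, so
    almost one interval of the detection window had already elapsed.
    """
    worst = detect_mult * tx_interval_ms
    best = (detect_mult - 1) * tx_interval_ms
    return best, worst


print(bfd_detection_bounds_ms(3, 3))  # (6, 9)
```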

Our prototype is capable of handling transient-to-persistent failure switch-over: after a failure has existed for a given time, it is considered to be a persistent change and the failed component is entirely removed from the topology by rebuilding the IPFRR routing tables based on the new topology obtained from OSPF. This switch-over timeout was T = 15 s during our measurements (which may be a bit low for real-world usage, but it speeds up the evaluation process).

6.2 Failure recovery time

Our first series of measurements targeted the failure recovery time of Not-via and OSPF. For this purpose, we created a link failure between nodes ot1 and ot4, as depicted in Fig. 13. The schedule of the link failure was a bit different for Not-via and OSPF, as were the actions that the two protocols took to recover from it; let us discuss them separately.

6.2.1 Not-via

The events during a Not-via measurement cycle are shown in Fig. 14.

Figure 14: Timeline of events throughout a Not-via measurement cycle (diagram not reproduced; time axis in seconds: 0 start measurement; 10 (1) unplug link, immediate fast reroute; 25 (2) switch-over to new topology; 30 switch-over done; 50 recover link; 65 (3) switch-over to original topology; 70 switch-over done)

Initially, the measurement packets sent by DBS traveled from src to dest on the original shortest path, as depicted in Fig. 13. Ten seconds after DBS was started, we unplugged the link between ot1 and ot4. This resulted in a fast reroute, which took place as soon as the BFD sessions over this link signaled the failure. After the fast reroute, the traffic went from src to ot1 on the original shortest path; ot1 repaired to ot4-Notvia-ot1 (note the bridge problem at ot4) and the traffic was tunneled to ot4, which decapsulated the packets and passed them to dest.

15 seconds later, T expired in ot1 and ot4. The link failure was reclassified as a persistent change and opaque-LSAs were sent to initiate a global topology switch-over. This finished in at most 5 seconds. After that, the traffic from src to dest traveled on the new shortest path. At 50 s, we plugged the link back in, which caused the BFD sessions between ot1 and ot4 to go live. However, since the current routing tables did not contain the failed link (the change had been made persistent), the link could not be brought back into use by a fast reroute. Instead, the T timer was started again in ot1 and ot4, and a global topology synchronization was initiated after T had expired, which diverted the traffic back to the original shortest path.
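The schedule of one Not-via measurement cycle can be reproduced from the timeout alone. The following sketch is a simplified model, not the prototype's state machine: BFD detection and the fast reroute are treated as instantaneous, the duration of each switch-over is not modeled, and the function name and event strings are ours.

```python
SWITCH_OVER_TIMEOUT_S = 15  # T used during the measurements (from the text)


def notvia_timeline(unplug_s, replug_s, timeout_s=SWITCH_OVER_TIMEOUT_S):
    """List (time, event) pairs for one unplug/replug cycle."""
    if replug_s < unplug_s + timeout_s:
        raise ValueError(
            "this sketch only covers a replug after the failure became persistent"
        )
    return [
        (unplug_s, "bfdDown: fast reroute, start timer T"),
        (unplug_s + timeout_s, "T expired: switch-over to new topology"),
        (replug_s, "bfdUp: link not in current tables, start timer T"),
        (replug_s + timeout_s, "T expired: switch-over to original topology"),
    ]


for t, event in notvia_timeline(unplug_s=10, replug_s=50):
    print(f"{t:>4} s  {event}")
```

With the link unplugged at 10 s and plugged back at 50 s, the two switch-overs start at 25 s and 65 s, matching the timeline of Fig. 14.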

6.2.2 OSPF

We performed this evaluation by running only OSPF on the routers. For this, we used quagga with all kbfd-related patches applied, as discussed in Section 5.2. The BFD parameters were identical to the settings used with Not-via. The events during an OSPF measurement cycle are shown in Fig. 15.

Figure 15: Timeline of events throughout an OSPF measurement cycle (diagram not reproduced; time axis in seconds: 0 start measurement; 10 (1) unplug link, switch-over to new topology; 40 (2) recover link, switch-over to original topology)

With OSPF, there is no fast reroute part, since OSPF reacts to topology changes by initiating global topology synchronization (or, in other words, all failures are considered persistent changes). The same is true for link recovery: OSPF immediately signals the topology change to all routers in the network.

6.2.3 Results

Here we present our measurement results for Not-via and OSPF. Although the IPFRR results were measured using the Not-via algorithm, we would have obtained the same results with Lightweight Not-via, because once the routing tables are calculated, the operation of the prototype is the same.

We conducted 10 measurements for both Not-via and OSPF. We publish the fast reroute recovery and transient-to-persistent switch-over times along with the statistics of sent and lost packets collected by the benchmark system. The results for Not-via are summarized in Table 2, the data obtained by using OSPF is in Table 3. Both tables contain the minimum, maximum and average values for the measured quantities as well.

Our results indicate that the fast reroute repair time achieved with Not-via is 12 ms on average, and 20 ms at maximum. Consequently, the number of lost packets during fast reroute is 13 on average and 22 at maximum, which is not a surprise considering the 1 packet/ms rate at which the measurement packets were sent. With conventional OSPF, the recovery time varied between 243 and 251 ms, with 248 ms on average. The packet loss is in line with the switch-over time. The full potential of IP Fast ReRoute clearly manifests itself: repairing a failure in no more than 20 ms is an achievement that enables pure IP-based networks to form the basis of carrier-grade applications.

On the other hand, the results for transient-to-persistent failure switch-overs show that with Not-via they took close to 2 s at maximum, whereas OSPF could reroute almost at once the second time, when the cable was plugged back in, with little or no packet loss (at most 8 packets in Table 3). This does not mean that OSPF inevitably performs better here, though: for another failure case we might just as well have measured delays and packet losses similar to those of the first switch-over. The failure scenario is simply such that OSPF happens to switch the nodes back to the original topology in an order that avoids the formation of micro-loops.

Although the switch-over time is high for Not-via, the maximum packet loss during the first and second switch-over is hardly more than two and four times the loss measured during the fast reroute, respectively. This shows that the network suffered only small drop-outs when the individual nodes along the forwarding path switched to another routing table. Note that providing a micro-loop free switch-over was not targeted by our prototype at all, so this is by no means a shortcoming of the implementation itself. It is clear, though, that there is room for improvement in that area and, considering the second switch-over of OSPF, it could very well be solved in practice.

To provide an even better understanding of the events during the measurement with Not-via, we present two graphs: Fig. 16 shows the throughput of measurement packets, and Fig. 17 complements it by highlighting the lost packets. The impact of the fast reroute and of the two switch-overs is clearly visible on both graphs. It is worth correlating the timing of these events with the timeline shown in Fig. 14.

      overall            (1) unplug link,      (2) switch-over to     (3) switch-over to
                         fast reroute          new top.               orig. top.
no.   sent     lost      recovery   lost       switch-over  lost      switch-over  lost
      packets  packets   time [ms]  packets    time [ms]    packets   time [ms]    packets
1     90004    130       12         13         763          48        888          69
2     90005    135       20         22         1892         36        867          77
3     90004    149       12         13         1984         54        940          82
4     90004    130       16         14         1944         52        944          64
5     90010    102       4          5          794          32        899          65
6     90004    139       16         18         819          48        899          73
7     90011    113       8          10         795          40        904          63
8     90004    132       12         13         778          45        871          74
9     90004    126       12         13         1972         40        851          73
10    90005    115       8          9          1960         48        831          58
min   90004    102       4          5          763          32        831          58
max   90011    149       20         22         1984         54        944          82
avg   90006    127       12         13         1370         44        889          70

Table 2: Recovery and switch-over times measured with Not-via
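The summary rows of Table 2 and the claim that packet loss tracks the outage time at 1 packet/ms can be rechecked from the per-run values. The sketch below copies four columns of the table; the helper name summarize is ours.

```python
# Per-run values from Table 2: (recovery time [ms], packets lost during
# fast reroute, first switch-over time [ms], second switch-over time [ms]).
RUNS = [
    (12, 13, 763, 888), (20, 22, 1892, 867), (12, 13, 1984, 940),
    (16, 14, 1944, 944), (4, 5, 794, 899), (16, 18, 819, 899),
    (8, 10, 795, 904), (12, 13, 778, 871), (12, 13, 1972, 851),
    (8, 9, 1960, 831),
]


def summarize(values):
    """Return (min, max, arithmetic mean) of a sequence of numbers."""
    values = list(values)
    return min(values), max(values), sum(values) / len(values)


recovery = summarize(run[0] for run in RUNS)
lost = summarize(run[1] for run in RUNS)
first_switch = summarize(run[2] for run in RUNS)

# At 1 packet/ms, the loss count should stay close to the outage in ms.
max_gap = max(abs(run[1] - run[0]) for run in RUNS)

print(recovery, lost, first_switch, max_gap)
```

In every run the number of packets lost during the fast reroute differs from the recovery time in milliseconds by at most 2, as expected at this sending rate.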

      overall            (1) switch-over to     (2) switch-over to
                         new top.               orig. top.
no.   sent     lost      switch-over  lost      switch-over  lost
      packets  packets   time [ms]    packets   time [ms]    packets
1     50006    251       250          251       0            0
2     50007    255       246          248       4            7
3     49999    247       246          247       0            0
4     50004    251       251          251       0            0
5     49999    250       250          247       0            3
6     50006    241       243          241       0            0
7     50006    252       251          252       0            0
8     50004    246       250          246       0            0
9     50001    255       246          247       4            8
10    50007    251       250          251       0            0
min   49999    241       243          241       0            0
max   50007    255       251          252       4            8
avg   50004    250       248          248       1            2

Table 3: Switch-over times measured with OSPF

Figure 16: Throughput from node src to dest measured with Not-via (plot not reproduced; x axis: Time [s], 0 to 90; y axis: Throughput [Mbps], 0.2 to 2.4; series: sent packets and received packets)

Figure 17: Packet loss measured with Not-via (plot not reproduced; x axis: Time [s], 0 to 90; y axis: Sequence number (×10^7), 0 to 2.5; series: sent packets, received packets and lost packets)

6.3 Management cost

After measuring the recovery times of Not-via and OSPF, we turned to comparing the management costs of Not-via and Lightweight Not-via. One very important constituent of the management burden of Not-via comes from the fact that it uses significantly more IP addresses, which all need to be assigned, distributed and subjected to routing table calculations. The first round of our measurements was aimed at determining how many additional addresses Not-via and Lightweight Not-via actually use.

We chose a series of commonplace network topologies of increasing size: the Abilene, NSF and AT&T topologies from [27]; the German (Germany), Italian (Italy) and European (Cost266) backbone topologies from [28]; an extended 50-node version of the German backbone (Germany50, [27]); plus two random network topologies, one of 75 nodes (Top75) and one of 100 nodes (Top100), both generated by the BRITE tool [29] using the router-level Waxman model (m = 4). Since the type of the links is not specified in these topologies, we repeated the measurements first with every link set as a point-to-point link, and then with every fifth node substituted with a LAN connecting the neighbors of the node.

The number of not-via addresses the original Not-via needs for each topology is given in Fig. 18. The most important observation is that the address pool size increases drastically with the size of the network, especially in the presence of LANs. Quantitatively, the number of addresses scales linearly with the number of point-to-point links in the network, and quadratically in the presence of LANs. In contrast, Lightweight Not-via needs only two additional addresses per router or, when there are unique interface addresses available on at least two interfaces, it does not need any additional addresses at all.
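The linear scaling for point-to-point links can be made concrete with a rough estimate. The sketch below assumes one not-via address per link direction (that is, two per link); this rule of thumb is our assumption, and it reproduces most but not all of the values reported in Table 4 (for Germany it gives 52, while the table reports 50). The Lightweight Not-via figure is the upper bound of two addresses per router stated in the text.

```python
# (nodes, point-to-point links, not-via addresses reported in Table 4
# for the case without LANs)
TOPOLOGIES = {
    "Abilene": (12, 15, 30),
    "Germany": (17, 26, 50),
    "Cost266": (37, 57, 114),
    "Top100": (100, 400, 800),
}


def notvia_estimate(links: int) -> int:
    """Assumed rule of thumb: one not-via address per link direction."""
    return 2 * links


def lightweight_upper_bound(nodes: int) -> int:
    """At most two additional addresses per router (from the text)."""
    return 2 * nodes


for name, (nodes, links, reported) in TOPOLOGIES.items():
    print(name, notvia_estimate(links), reported, lightweight_upper_bound(nodes))
```

The estimate grows with the number of links, whereas the Lightweight Not-via bound grows only with the number of routers, which is why the gap widens in the densely connected Top75 and Top100 topologies.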

Obviously, configuring several thousands of not-via addresses by hand is next to impossible, and it remains cumbersome and prone to human errors even using some centralized network management software. The problem is worsened by the need to retain the compound semantics of not-via addresses in a consistent manner all over the network. But it is not only central network management that is overwhelmed by the sheer volume of not-via addresses: just dealing with so many addresses can overload even the IP routers themselves. Every single not-via address handed out in the network comes at a high price: a distinct forwarding table entry must be computed and configured, an IP-in-IP tunnel needs to be set up for those addresses that can potentially be local or remote endpoints of detours, etc.

Table 4 gives an idea of the magnitude of the address management load. It contains the total number of not-via addresses in the network, the number of forwarding table entries, the execution time of the calculation of the forwarding tables and the configuration of the forwarding engine at a randomly selected node in different topologies. Results are given for Not-via and Lightweight Not-via both with and without LANs. Note that it is only necessary to execute all these steps when the topology changes persistently, but even in this case managing so many forwarding entries can be a tedious task. The time spent by a router from computing the next-hops until the forwarding entries are all downloaded into the forwarding engine is displayed in Fig. 19.

The time needed to compute the forwarding tables for the original Not-via grows dramatically with network size, to the point that it takes tens of milliseconds for larger topologies. Our measurements indicated a clearly visible improvement with Lightweight Not-via in this regard. However, forwarding table calculation time becomes negligible when compared to the amount of related management work: configuring the forwarding engine with several thousand entries easily bogs down a router for half a second or even more. While this observation might be surprising, it is in line with the rest of the literature [30].

These measurement results cast Not-via in a completely different light: although the computational complexity of Not-via is substantial, it is the extra management burden caused by the extension of the address pool that dominates its complexity. Our measurements reproduce this burden clearly even in small and medium-sized topologies, and we expect it to become prohibitive in larger networks. On the other hand, it is exactly here that the advantages of Lightweight Not-via manifest themselves: the time of computing the next-hops and configuring the forwarding engine decreases by up to an order of magnitude, into the range of at most a few hundred milliseconds, which falls well within the time range in which contemporary IP routers perform ordinary shortest path routing [30].

Not-via

Topology               w/o LANs        w/ LANs
Name       node  link  addr    route   addr    route   calc   conf
Abilene    12    15    30      196     33      260     0.6    9.2
Germany    17    26    50      318     68      371     1.1    12.3
AT&T       22    39    76      460     92      470     1.8    16.2
NSF        26    44    86      542     120     612     2.2    22.8
Italy      33    56    112     703     145     780     4.1    29.6
Cost266    37    57    114     736     150     1078    4.9    39.5
Germany50  50    88    176     1110    242     2576    9.4    94.9
Top75      75    300   600     3330    969     9098    28.8   364.3
Top100     100   400   800     4457    1895    17490   45.8   682.1

Lightweight Not-via

Topology               w/o LANs        w/ LANs
Name       node  link  addr    route   addr    route   calc   conf
Abilene    12    15    0-24    187     0-20    180     0.5    5.3
Germany    17    26    0-34    272     0-28    238     0.7    6.7
AT&T       22    39    0-44    357     0-36    324     1.1    8.1
NSF        26    44    0-52    425     0-42    370     1.3    10.9
Italy      33    56    0-66    544     0-52    478     1.9    14.8
Cost266    37    57    0-74    612     0-60    628     2.3    19.5
Germany50  50    88    0-100   833     0-80    1102    4.3    30.2
Top75      75    300   0-150   1258    0-120   1860    17.7   47.1
Top100     100   400   0-200   1683    0-160   2490    30.8   66.2

Table 4: Total number of not-via addresses in the network (addr), the forwarding table entries (route), the execution time of the calculation of the forwarding tables (calc, [ms]) and the configuration of the forwarding engine (conf, [ms]) at a randomly selected node in different topologies.
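To make the relation between the calc and conf columns easier to see, the sketch below recomputes two derived quantities from a few rows of Table 4: the share of the configuration step in the total Not-via time, and the ratio of the Not-via total to the Lightweight Not-via total. Only table values are used; the function names are ours.

```python
# (calc [ms], conf [ms]) pairs copied from the calc and conf columns of Table 4.
NOTVIA = {
    "Abilene": (0.6, 9.2), "Germany50": (9.4, 94.9),
    "Top75": (28.8, 364.3), "Top100": (45.8, 682.1),
}
LIGHTWEIGHT = {
    "Abilene": (0.5, 5.3), "Germany50": (4.3, 30.2),
    "Top75": (17.7, 47.1), "Top100": (30.8, 66.2),
}


def total_ms(entry):
    """Time from next-hop computation until the entries are installed."""
    calc, conf = entry
    return calc + conf


def conf_share(entry):
    """Fraction of the total time spent configuring the forwarding engine."""
    calc, conf = entry
    return conf / (calc + conf)


for name in NOTVIA:
    speedup = total_ms(NOTVIA[name]) / total_ms(LIGHTWEIGHT[name])
    print(f"{name:10s} total speed-up {speedup:4.1f}x, "
          f"conf share of Not-via time {conf_share(NOTVIA[name]):.0%}")
```

For every listed topology the configuration step accounts for over 90% of the Not-via time, and for Top100 the configuration time alone differs by a factor of about ten between the two algorithms.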

Figure 18: Number of not-via addresses for different commonplace network topologies with only point-to-point links and with every fifth node substituted by a LAN. Note the logarithmic scale on the x axis. (Plot not reproduced; x axis: topology with number of nodes, from Abilene (12) to Top100 (100), every 5th node considered a LAN; y axis: number of not-via addresses, 0 to 2000; series: Point-to-point and LAN.)

Figure 19: Execution time of computing the forwarding tables and configuring the forwarding engine for Not-via (first column) and Lightweight Not-via (second column). (Plot not reproduced; x axis: topology with number of nodes, from Abilene (12) to Top100 (100), every 5th node considered a LAN; y axis: Time [ms], 0 to 450; series: forwarding table computation and configuration of the forwarding engine.)

7 Conclusion

IP Fast ReRoute is one of the last technological components missing from the IP protocol suite on its way to becoming a mature carrier-grade transport technology. In this report, we presented the design and implementation process of a full-fledged IPFRR prototype based on the Not-via and Lightweight Not-via fast reroute algorithms. We conducted a series of measurements in order to compare the recovery time achievable with our prototype implementation to that of OSPF, a traditional dynamic routing protocol. Also, we compared Not-via to Lightweight Not-via in terms of address management cost and consumed processing resources.

At the beginning of our work, we had the high-level description of Not-via and Lightweight Not-via at our disposal, and our task was to turn them into a formalized prototype design that also incorporates a transient-to-persistent failure switch-over mechanism, a substantial component of a deployable IPFRR system, yet one that is omitted from most of the IPFRR literature. From the experience we gained from the prototype implementation, we can state that it was more difficult to properly deal with all of Not-via's subtleties and corner cases compared to the cleaner design of Lightweight Not-via. The difference is obvious even by simply looking at the pseudo-code listings of the two algorithms in Section 4.3.

After the implementation, our second task was to perform measurements aimed at evaluating the recovery time and management burden implied by IP Fast ReRoute. As for the recovery time, we found that IP Fast ReRoute really lived up to our expectations. With the help of BFD, we were able to go not just below the 50 ms recovery time required by real-time applications, but down to 20 ms or less. Additionally, this recovery time is independent of the network topology and the location of the failure. On the other hand, OSPF repairs in the order of hundreds of milliseconds, and even this repair time varies across different failures and topologies.

Next, we turned to investigating the management cost at which Not-via and Lightweight Not-via operate. The first and most visible manifestation of this cost is the number of additional addresses that Not-via used up. It means that a completely new not-via address pool needs to be handed out, distributed and managed in the network, with the complex semantics communicated consistently between routers and the respective forwarding table entries installed into the forwarding engine. We had to hard-wire a policy concerning not-via addresses into our prototype, which did fine within the context of our testbed, but it is by no means portable. In contrast, Lightweight Not-via required no additional addresses at all. Also, the simpler semantics of not-via addresses let us completely avoid the introduction of awkward prerequisites to be able to manage not-via addresses; a stock OSPF implementation was enough to advertise all necessary IP addresses.

Not-via calculates an SPT for each potentially failed node, whereas Lightweight Not-via only computes one SPT and a pair of redundant trees, which is significantly faster in larger networks. While it is an important difference, the majority of the computational overhead comes from another aspect of management cost, which is the number of routing entries created by Not-via. As was shown in Table 4, Lightweight Not-via performs better in this area as well by installing far fewer routes, which substantially decreases the time needed to load them into the forwarding engine.
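The per-failure SPT computation of Not-via can be sketched as one shortest-path run per potentially failed node, each on the topology with that node removed. The code below is a minimal illustration with a textbook Dijkstra on a hypothetical five-node ring with unit link costs; it is not the prototype's implementation, and the redundant tree construction of Lightweight Not-via is not shown.

```python
import heapq


def dijkstra(graph, source, excluded=None):
    """Shortest-path distances from source, ignoring the node excluded."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if v == excluded:
                continue
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def notvia_spf_runs(graph, router):
    """One SPF per potentially failed node, each without that node."""
    return {failed: dijkstra(graph, router, excluded=failed)
            for failed in graph if failed != router}


# Hypothetical ring a-b-c-d-e-a with unit link costs.
RING = {
    "a": {"b": 1, "e": 1},
    "b": {"a": 1, "c": 1},
    "c": {"b": 1, "d": 1},
    "d": {"c": 1, "e": 1},
    "e": {"d": 1, "a": 1},
}

runs = notvia_spf_runs(RING, "a")
print(len(runs))                  # 4 SPF runs at router a
print(dijkstra(RING, "a")["c"])   # 2: normal path a-b-c
print(runs["b"]["c"])             # 3: with b removed, a-e-d-c
```

The number of SPF runs at each router grows with the number of nodes in this model, whereas a fixed number of tree computations does not, which is the difference the paragraph above refers to.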

On the whole, we successfully fulfilled all goals set at the beginning of our report. As a result, we have a working IPFRR prototype, which is able to repair a single link or node failure within 20 ms, operating fully in the IP layer. Also, we confirmed that Lightweight Not-via is a notable alternative to the original Not-via: it significantly reduces the complexity of implementation and largely eliminates additional not-via addresses, thereby considerably lowering the overall management cost as well.

The transient-to-persistent failure switch-over is still open for further study and improvement, especially by aiming for micro-loop free reconfiguration of the network. This area was not in the scope of our current work, but could be an interesting topic for future investigation.

References

[1] M. Shand and S. Bryant. IP Fast Reroute Framework. Internet Draft, available online: http://tools.ietf.org/html/draft-ietf-rtgwg-ipfrr-framework-08, February 2008.

[2] J. Moy. OSPF Version 2. Internet Engineering Task Force: RFC 2328, April 1998.

[3] T. Szigeti and C. Hattingh. Quality of Service Design Overview. Available online: http://www.informit.com/articles/article.aspx?p=357102&rl=1.

[4] A. Khanna and J. Zinky. The revised ARPANET routing metric. Available online: http://pdos.csail.mit.edu/decouto/papers/khanna89.pdf.

[5] Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). Internet Engineering Task Force: RFC 1771, March 1995.

[6] J. Farkas, Cs. Antal, and L. Westberg. Fast failure handling in ethernet networks. IEEE International Conference on Communications, June 2006.

[7] D. Thaler. Multipath Issues in Unicast and Multicast Next-Hop Selection. Internet Engineering Task Force: RFC 2991, Nov 2000.

[8] Alia Atlas. Loop-Free Alternates for IP/LDP Local Protection. Internet Draft, available online: http://tools.ietf.org/html/draft-ietf-rtgwg-ipfrr-spec-base-00, March 2005.

[9] S. Previdi. IP fast reroute technologies. Asia Pacific Regional Internet Conference on Operational Technologies (APRICOT), March 2006.

[10] Amund Kvalbein, Audun Fosselie Hansen, Tarik Cicic, Stein Gjessing, and Olav Lysne. Fast IP Network Recovery Using Multiple Routing Configurations. In INFOCOM, 2006.

[11] S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang, and C-N. Chuah. Fast Local Rerouting for Handling Transient Link Failures. Accepted for publication in IEEE/ACM Transactions on Networking, available online: http://arena.cse.sc.edu/papers/fir.ton.pdf, Dec 2006.

[12] S. Bryant, M. Shand, and S. Previdi. IP Fast Reroute Using Not-via Addresses. Internet Draft, available online: http://www.ietf.org/internet-drafts/draft-ietf-rtgwg-ipfrr-notvia-addresses-02.txt, February 2008.

[13] D. Katz and D. Ward. BFD for IPv4 and IPv6 (Single Hop). Internet Draft, available online: http://www.ietf.org/internet-drafts/draft-ietf-bfd-v4v6-1hop-08.txt, March 2008.

[14] G. Enyedi, G. Retvari, P. Szilagyi, and A. Csaszar. IP Fast ReRoute: Lightweight Not-via without Additional Addresses. Submitted to: INFOCOM 2009, 2009.

[15] M. Medard, R. A. Barry, S. G. Finn, and R. G. Gallager. Redundant trees for preplanned recovery in arbitrary vertex-redundant or edge-redundant graphs. IEEE/ACM Transactions on Networking, 7(5):641–652, Oct 1999.

[16] M. Medard, R. A. Barry, S. G. Finn, Wenbo He, and S. S. Lumetta. Generalized loop-back recovery in optical mesh networks. IEEE/ACM Transactions on Networking, 10(1):153–164, Feb 2002.

[17] S. Ramasubramanian, M. Harkara, and M. Krunz. Distributed linear time construction of colored trees for disjoint multipath routing. 5th International IFIP-TC6 Networking Conference, May 2006.

[18] T. Cicic, A. F. Hansen, and O. K. Apeland. Redundant trees for fast IP recovery. In Broadnets, pages 152–159, 2007.

[19] G. Enyedi, G. Retvari, and A. Csaszar. A linear time maximal redundant tree algorithm. IEEE ICC, 2008. Available online: http://opti.tmit.bme.hu/~enyedi/ipfrr/.

[20] GNU Quagga routing software. Available online: http://www.quagga.net.

[21] BFD implementation in Linux Kernel Module (kbfd). Available online: http://kbfd.sourceforge.net.

[22] R. Coltun. The OSPF Opaque LSA Option. Internet Engineering Task Force: RFC 2370, July 1998.

[23] Library of Efficient Models and Optimization in Networks (LEMON). Available online: https://lemon.cs.elte.hu/trac.

[24] J. Babiarz, K. Chan, and F. Baker. Configuration Guidelines for DiffServ Service Classes. Internet Engineering Task Force: RFC 4594, August 2006.

[25] DBS: Distributed Benchmark System. Available online: http://www.ai3.net/products/dbs/.

[26] D. L. Mills. Network Time Protocol (Version 3) Specification, Implementation and Analysis. Internet Engineering Task Force: RFC 1305, March 1992.

[27] Survivable fixed telecommunication Network Design library (SNDlib). Available online: http://sndlib.zib.de.

[28] M. L. Garcia-Osma. TID scenarios for advanced resilience. Tech. Rep., The NOBEL Project, Work Package 2, Activity A.2.1, Advanced Resilience Study Group, Sep 2005.

[29] A. Medina, A. Lakhina, I. Matta, and J. Byers. BRITE: Boston university Representative Internet Topology gEnerator. Available online: http://www.cs.bu.edu/brite, 2005.

[30] A. Shaikh and A. Greenberg. Experience in black-box OSPF measurement. In IMW ’01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pages 113–125, 2001.