
Multi-topology protection: promises and problems

G. Apostolopoulos

Institute of Computer Science, Foundation of Research and Technology Hellas (FORTH)

MT protection: promises and problems. Simula Research Lab, Oslo, April 20 2007

Basic concept of MT protection

Based on the IETF-proposed MT extensions to IGPs

Routers have multiple routing tables

Need to pick a routing table for each incoming packet

Different addresses

Various types of packet marking

Use MT to repair failures

When a link/node fails, affected traffic is locally switched to a pre-computed “backup” topology

Each destination in the FIB has a backup next-hop that is activated when a local link fails

Traffic reaches the destination over the backup topology without loops
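
The local repair step above can be sketched as follows. This is a minimal illustration, not an implementation from the talk: the names `FibEntry`, `Router`, `forward`, and the use of topology ID 0 for the primary topology are all assumptions, and the FIB is an exact-match dictionary rather than a longest-prefix structure.

```python
from dataclasses import dataclass

PRIMARY = 0  # hypothetical topology ID (packet mark) for the primary topology


@dataclass
class FibEntry:
    next_hop: str         # next-hop in the primary topology
    backup_topology: int  # pre-computed backup topology for this destination
    backup_next_hop: str  # next-hop to use when the primary next-hop is down


class Router:
    def __init__(self, primary_fib, backup_fibs):
        self.primary_fib = primary_fib  # dest -> FibEntry
        self.backup_fibs = backup_fibs  # topology ID -> {dest -> next-hop}
        self.failed_neighbors = set()   # neighbors behind a failed local link

    def forward(self, dest, mark=PRIMARY):
        """Return (next_hop, mark) for a packet addressed to dest."""
        if mark != PRIMARY:
            # Packet was already repaired upstream: keep it on its backup topology.
            return self.backup_fibs[mark][dest], mark
        entry = self.primary_fib[dest]
        if entry.next_hop in self.failed_neighbors:
            # Local failure: activate the backup next-hop and mark the packet.
            return entry.backup_next_hop, entry.backup_topology
        return entry.next_hop, PRIMARY
```

A router with `failed_neighbors` empty forwards unmarked packets normally; once the neighbor behind the primary next-hop is added to `failed_neighbors`, the same call returns the backup next-hop together with the backup topology mark.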


MT protection

[Figure: primary and backup topologies between source s and destination d. Traffic is marked to send it to the backup topology, and it reaches the destination over the backup topology.]


Advantages of MT

Fast local repair of failure

Can repair all possible failures (single link or node)

Multiple failures can be detected and addressed

No need to distinguish between link and node failures

ECMP, SRLG, LAN failures, and multi-homed prefixes can be handled easily

No need for tunneling

But must mark packets instead

Can optimize how traffic is routed after the failure by manipulating link weights on the backup topologies

Failure may not last long but even so traffic impact is undesirable


But there are issues

Basic operation and optimization have been worked out

Rough overview of some remaining issues:

How to differentiate traffic

Premium versus regular and BE

How to use MT in a real network: multiple areas, inter-AS, hot-potato routing with BGP transit traffic

How to return to normal after the failure is repaired

Operational issues

How complex is it to configure? How expensive is it to monitor/troubleshoot? Incremental deployment?

How to optimize link weights

What to optimize for? Need to know the traffic matrix


Traffic differentiation

Premium and regular/BE traffic

I may be willing to preempt non-premium traffic to make sure “premium” traffic is still ok

Standard practice with existing CSPF-FRR architectures

Different topologies for each traffic class

One of the envisioned uses of MT anyway

Scalability?

Traffic optimization goals may be different now

Have to consider the interaction between the traffic types

Minimize effect on premium traffic

Do not starve BE traffic


How to return to normal after repair?

Failure is repaired and IGP is re-converging

How to switch traffic back to the initial (no-failure) topology without micro-loops

How to avoid micro-loops in general

After each IGP convergence event

Solution

Use a “fixed” topology, the same as the primary topology

Continue routing traffic over the fixed/backup topology

Let IGP converge in the primary topology

After “convergence is complete”, switch all traffic to the primary topology


Converge in a separate topology

[Figure: fixed, backup, and primary topologies between s and d. Traffic is switched to the primary topology after the IGP has converged.]


How to tell when IGP has converged

Use a “convergence” timer in IGP

Start it when a change that will require IGP re-convergence is detected

All traffic is forwarded over fixed/backup topologies

Must move traffic from primary to fixed

During convergence

New routes are installed in the primary topology

After the timer expires (after IGP has converged)

Switch all fixed and backup traffic to the primary topology

Since no topology is in flux, no micro-loops will occur from switching topologies
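
The timer-driven switch described above can be sketched as a small state holder. This is only an illustration under assumptions: the class name `TopologySelector`, the `hold_time` parameter, and the choice to restart the timer on every further change are hypothetical, and the talk does not say how the timer value would be chosen.

```python
class TopologySelector:
    """Decides which topology forwards traffic around an IGP convergence event."""

    def __init__(self, hold_time):
        self.hold_time = hold_time  # assumed upper bound on IGP convergence time
        self.timer_expiry = None    # None means no convergence in progress

    def on_topology_change(self, now):
        # A change that requires re-convergence was detected: (re)start the timer.
        self.timer_expiry = now + self.hold_time

    def forwarding_topology(self, now, repaired=False):
        # Clear the timer once it has expired.
        if self.timer_expiry is not None and now >= self.timer_expiry:
            self.timer_expiry = None
        if self.timer_expiry is not None:
            # Primary is in flux: repaired traffic stays on its backup
            # topology, everything else uses the fixed topology.
            return "backup" if repaired else "fixed"
        # No topology is in flux, so all traffic can use the primary.
        return "primary"
```

With `hold_time=5.0` and a change detected at time 1.0, lookups before time 6.0 return `"fixed"` (or `"backup"` for repaired traffic), and lookups from 6.0 onward return `"primary"`.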


MT with multiple areas

Some of the destinations may be summary routes coming from outside areas

Need to map these summary routes to backup topologies in order to compute the backup next-hop for them

IGP can do this mapping

Link and non-ABR node failures

Backup topologies for each area cover failures inside the area

No need to coordinate with other areas

Need to unmark the repaired packets when they leave the area

Remote area does not know about local backup topologies


MT with areas

ABR failures

Failure affects two areas

Need a primary and a backup ABR for each summary route

Simple case: primary and backup ABR connect to the same area

Can handle with local backup topology

Unmark the packet when it leaves the area

Hard case: primary and backup ABR connect to different areas

Need to coordinate backup topologies among these areas; otherwise packets may loop


Multi-area MT example

[Figure: Route-1. The packet is unmarked when it leaves the area and reaches the destination without issue.]


Multi-area MT example

[Figure: Route-2. The packet is unmarked when it leaves the area but will not reach the destination; this case needs coordination between areas.]


Other reasons for looking at all areas together

SRLGs may be different in each area

E.g. area 1 cannot use ABR 2 as backup due to SRLG constraints in area 2

May be necessary if I want to optimize routing

Backup topologies for different areas will have to coordinate their link weights for the most effective routing after a failure

But it may be too expensive to optimize such a large topology


Inter-AS traffic

Cover failures of border routers and peering links

Peering links do not belong to the IGP

Need extensions to let the IGP know about these links

Stub links

Stub (potentially multi-homed) ISP and outgoing traffic

Similar to the area problem

IGP can compute the backup topologies

Only a few need to be computed, independent of the number of BGP prefixes

Need to map BGP prefixes to these topologies to compute their backup next-hops

Should not have to import all BGP routes into IGP

Repaired packets need to be unmarked as they leave the AS


Inter-AS operation

[Figure: inter-AS operation. Labels: Prefix, Special node.]


BGP-IGP interactions

How to map the BGP routes to the backup topologies and compute the backup next-hops for the BGP routes

Backup topologies are computed by IGP

Prefix reachability is controlled by BGP policy decisions

One approach

BGP will have to tell the RIB which two border routers can be used for reaching a prefix

BGP must have a concept of a “backup” border router for each prefix

IGP will tell RIB about the backup topologies

RIB will compute the backup next-hop for BGP routes on their way to the FIB
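
One way to picture the approach above is the sketch below, in which the RIB combines what BGP and the IGP report. Everything here is an assumption made for illustration: the function name `build_bgp_fib`, the dictionary layouts, and in particular the idea that the backup entry protects against the failure of the primary border router by steering traffic toward the backup border router over a backup topology.

```python
def build_bgp_fib(bgp_exits, igp_primary, igp_backup):
    """Resolve BGP prefixes into FIB entries with a backup next-hop.

    bgp_exits:   prefix -> (primary_border, backup_border), reported by BGP
    igp_primary: border router -> next-hop in the primary topology (from IGP)
    igp_backup:  (primary_border, backup_border) -> (topology ID, next-hop),
                 the IGP's backup route toward backup_border
    """
    fib = {}
    shared = {}  # one resolved entry per border-router pair
    for prefix, (primary_br, backup_br) in bgp_exits.items():
        pair = (primary_br, backup_br)
        if pair not in shared:
            topology, backup_next_hop = igp_backup[pair]
            shared[pair] = {
                "next_hop": igp_primary[primary_br],
                "backup_topology": topology,
                "backup_next_hop": backup_next_hop,
            }
        # Prefixes with the same exits point at the same entry, so the work
        # depends on the number of border-router pairs, not of BGP prefixes.
        fib[prefix] = shared[pair]
    return fib
```

In this sketch BGP routes are never imported into the IGP; the RIB only needs the pair of border routers per prefix and the IGP's routes toward those routers.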


MT with hot-potato BGP traffic

Problem: Changes in the IGP weights/topology can cause massive shifts in transit BGP traffic

MT can help

By avoiding micro-loops during IGP convergence

By creating a BGP forwarding topology that is engineered and protected with MT and insensitive to some of the changes in the IGP layer

This topology can be applied to only selected transit BGP prefixes

Optimization of traffic routing after failures becomes quite useful now


Other concerns

What is the administrative overhead of MT?

What is the performance overhead of MT?

Storage?

IGP signaling?


Administrative overhead

Need to manage multiple IGP topologies

OSS tools will need to be extended

If backup topologies are optimized then I need to manage multiple sets of IGP link weights

Quite a bit of effort

But done by automated offline tools anyway

Troubleshooting and monitoring

Over which topology is this prefix routed? What is the connectivity status of topology T?

All tools (ping, traceroute) may have to be upgraded depending on the topology de-multiplexing method

Does not look too good, but compare:

With a full mesh of statically configured and optimized LSPs for TE

With statically configured FRR tunnels

Incremental deployment is tricky!

May not be able to guarantee protection from all failures if not supported by all nodes


So how bad is scalability?

Simulations show that

Can repair failures with 3-4 backup topologies

Can optimize routing after failure with 6-8 topologies

How much does each topology cost?

SPF computation

One SPF per topology, few topologies so not an issue

IGP signaling

No extra cost, single adjacency for all topologies

IGP RIB space

Separate routing tables for each topology

Can share next-hops

System RIB space

Separate routing table for each topology

Only for IGP routes

BGP routes will not have to be replicated

Can share next-hops
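
The storage argument above (one table per topology, but shared next-hops) can be illustrated with a small sketch. The class name `SharedNextHopFib` and its methods are hypothetical, and an exact-match dictionary stands in for a real longest-prefix-match structure.

```python
class SharedNextHopFib:
    """Per-topology routing tables whose entries index one shared next-hop list."""

    def __init__(self):
        self.next_hops = []  # shared next-hop structures, stored once
        self._index = {}     # next-hop -> position in self.next_hops
        self.tables = {}     # topology ID -> {prefix -> next-hop index}

    def _intern(self, next_hop):
        # Store each distinct next-hop only once and return its index.
        if next_hop not in self._index:
            self._index[next_hop] = len(self.next_hops)
            self.next_hops.append(next_hop)
        return self._index[next_hop]

    def install(self, topology, prefix, next_hop):
        self.tables.setdefault(topology, {})[prefix] = self._intern(next_hop)

    def lookup(self, topology, prefix):
        return self.next_hops[self.tables[topology][prefix]]
```

Each extra topology then costs one small index per prefix, while the next-hop structures themselves (interface, address, rewrite information) are not replicated.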


Example FIB structure

[Figure: example FIB structure. A topology hash and a prefix lookup select entries in a primary table and a backup table, which point into shared next-hop structures. The example shows 3 topologies and 4 next-hops for ECMP, with the input 23.45.6.2, 1.]


MT with MPLS

Use MPLS labels as de-multiplexors

Build an MPLS forwarding plane for each topology using LDP

For VPNs/BGP-free cores

Simple LDP extensions

Essentially MT-LDP

No need to encapsulate traffic on a failure

Simpler than RSVP-TE/FRR, with less signaling overhead

Configuration overhead is not clear though

Depends a lot on the OSS tools used


MT with multicast

Has become interesting with the advent of IPTV etc.

IETF discusses methods to extend LDP for P2MP LSPs and MPLS-FRR for P2MP LSPs

MT protection can be easily extended to be used there using P2MP MPLS labels and P2MP extensions to LDP

And we can still optimize traffic after failure


Dynamic TE

Traffic matrices can change significantly

DDoS attacks, diurnal patterns, failures

Adjust routing

Does not have to happen extremely fast

Ideally this should happen automatically

CSPF-MPLS has ways to cope with this

Traffic flows inside LSPs

May need a full mesh though

If a link gets overloaded, can try to shift some LSPs away from it

Doing this automatically can lead to oscillations


MT dynamic TE

When a link is overloaded, traffic needs to be shifted away from it

Option A: Create topology T1, re-optimize weights in T1 and shift all traffic to T1

Needs a coordinated switch, large impact on the network

Better when change in traffic patterns is permanent

Option B: Shift only some (S,D) pairs to T1

No need for coordinated switch and smaller impact to network

Better for temporary changes in the traffic patterns

Optimization problem

What traffic do I send into T1?

What link weights do I use for T1?

Can I do something that is adaptive/feedback based?
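
As one possible starting point for the Option B question of which traffic to send into T1, the sketch below uses a simple greedy rule: move the largest (S,D) demands crossing the overloaded link until its load falls to a target utilization. This is not a method from the talk, which leaves the optimization problem open; the function name `pairs_to_shift`, the `target_util` parameter, and the greedy rule itself are assumptions, and the sketch ignores where the shifted traffic lands in T1.

```python
def pairs_to_shift(demands, link_load, capacity, target_util=0.75):
    """Pick (S,D) pairs to move from the overloaded link into topology T1.

    demands:   {(s, d): rate} for the pairs currently crossing the link
    link_load: current load on the link, in the same unit as the rates
    capacity:  link capacity, in the same unit
    """
    excess = link_load - target_util * capacity
    chosen = []
    # Greedy: largest demands first, so few pairs are touched.
    for pair, rate in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if excess <= 0:
            break
        chosen.append(pair)
        excess -= rate
    return chosen
```

A feedback-based variant could call this repeatedly with measured loads, although, as with automatic LSP shifting, doing so without damping risks oscillations.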


Immediate work items

MT with multiple traffic classes

New optimization constraints (interaction among traffic types)

P2MP protection

Optimization issues

What are the right optimization goals?

Combine P2MP and P2P backup for reaching the optimal solution

Dynamic TE with MT

Optimization for both the (S,D) pairs and the link weights

Adaptive based on congestion feedback?


Other interesting items

MT in Ethernet networks

There have been some related proposals already

Use MT routing to handle rapid changes in links like those in a wireless network

Link fading could be considered a partial link failure

Deal with inaccurate traffic matrices

Solutions that adapt to changing traffic matrices

Algorithms that find good routings even when given inaccurate traffic matrices

There has been significant work on the traffic matrix estimation and inaccuracy problem