Evaluating Potential Routing Diversity for Internet Failure Recovery



1/21

Evaluating Potential Routing Diversity for Internet Failure Recovery

*Chengchen Hu, +Kai Chen, +Yan Chen, *Bin Liu

*Tsinghua University, +Northwestern University

2/21

Internet Failures

Failure is part of everyday life in IP networks
  e.g., 675,000 excavation accidents in 2004 [Common Ground Alliance]
  Network cable cuts every few days …
Real-world emergencies or disasters can lead to substantial Internet disruption
  Earthquakes
  Storms
  Terrorist incidents: 9/11
  …

3/21

Example: Taiwan earthquake incident

Large earthquakes hit south of Taiwan on 26 December 2006

Only two of the nine cross-sea cables were not affected

Abundant physical-level connectivity was available, but it took too long for ISPs to find and use it.

figures cited from "Aftershocks from the Taiwan Earthquakes: Shaking up Internet transit in Asia, NANOG42"

4/21

How reliable is the Internet?

The Internet is not as reliable as people expect! [Wu, CoNEXT’07]
  32% of ASes are vulnerable to a single critical customer-provider link cut
  93.7% of Tier-1 ISPs’ single-homed customers are lost from the peered ISP due to Tier-1 depeering

Our question: can we find more resources to increase Internet reliability, especially when an Internet emergency happens?

5/21

Basic Idea

Two places where we can find more routing diversity:
  Internet eXchange Points (IXPs)
    Co-location where multiple ASes exchange their traffic
    Participant ASes in an IXP may not be connected via BGP
  Internet valley-free routing policy
    AS relationships: customer-provider, peering, sibling
    Peering relaxation (PR): allow one AS to carry traffic from the other to its provider
    Mentioned in [Wu, CoNEXT’07], but without evaluation

Our main focus: How much can we gain from these two potential resources, i.e., IXP and PR?
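The valley-free rule and its peering relaxation can be illustrated with a small path checker. This is only a sketch under stated assumptions: the function name `is_valley_free`, the link labels `c2p`, `p2p` and `p2c`, and the reading of PR as "one peer link may be followed by a customer-to-provider step, once per path" are all choices made here for illustration, not definitions from the slides. Sibling links are left out.

```python
def is_valley_free(path, rel, peering_relaxation=False):
    """Check an AS path against the valley-free policy.

    path: list of AS identifiers, source first.
    rel:  dict mapping a directed hop (a, b) to "c2p" (a is a customer of b),
          "p2p" (a and b are peers) or "p2c" (a is a provider of b).
    peering_relaxation: ASSUMPTION, if True one peer link may be followed
          by an uphill (c2p) hop, i.e. the peer carries the traffic to its
          own provider.
    """
    kinds = [rel[(a, b)] for a, b in zip(path, path[1:])]
    descending = False    # True once the path has passed its "peak"
    relaxed_used = False  # the relaxation is granted at most once per path
    for i, kind in enumerate(kinds):
        if kind == "c2p":
            if descending:
                return False  # going uphill after the peak is a valley
        elif kind == "p2p":
            if descending:
                return False  # at most one peer link, and only at the peak
            nxt = kinds[i + 1] if i + 1 < len(kinds) else None
            if peering_relaxation and not relaxed_used and nxt == "c2p":
                relaxed_used = True  # treat this peer hop as a temporary uphill hop
            else:
                descending = True
        else:  # "p2c"
            descending = True
    return True
```

With these assumptions, a path that crosses a peer link and then climbs to that peer's provider is rejected by the plain policy and accepted once `peering_relaxation=True`.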

6/21

Dataset for Evaluation

Most complete AS topology graph
  BGP data
    Route Views, RIPE/RIS, Abilene, CERNET BGP View
  P2P traceroute
    Traceroute data from 992,000 IPs in over 3,700 ASes
  In total, 120K AS links with AS relationships
    http://aqualab.cs.northwestern.edu/projects/SidewalkEnds.html [Chen et al, CoNEXT’09]
IXP data
  PCH + Peeringdb + Euro-IX (~200 IXPs)
  3468 participant ASes

7/21

Failure Models

Tier-1 depeering
  Real example: Cogent and Level3 depeering
Tier-1 provider-customer link teardown
  Reported in NANOG forum
Mixed types of link breakdown
  9/11, Taiwan earthquakes, 2003 Northeast blackout

8/21

Evaluation Metrics

Recovery Ratio
  # of recovered <src-dst> AS pairs versus total # of affected <src-dst> AS pairs
Path Diversity
  # of increased link-disjoint AS paths between affected <src-dst> AS pairs
Shifted Path
  # of link-disjoint AS paths shifted onto a normal link after we use IXP or PR resources
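The recovery ratio defined above can be sketched in a few lines. The function names `reachable` and `recovery_ratio` are hypothetical, and the sketch assumes plain undirected reachability over the surviving and newly enabled links; it ignores routing policy, so it gives an upper bound on the policy-compliant ratio rather than the evaluation actually reported in the talk.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over an undirected AS graph given as (a, b) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def recovery_ratio(affected_pairs, links_after_recovery):
    """Recovered <src-dst> pairs divided by all affected <src-dst> pairs."""
    if not affected_pairs:
        return 0.0
    recovered = sum(
        1 for src, dst in affected_pairs
        if reachable(links_after_recovery, src, dst)
    )
    return recovered / len(affected_pairs)
```

For example, if two AS pairs were cut off by a failure and the links available after enabling IXP or PR resources reconnect only one of them, `recovery_ratio` returns 0.5.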

9/21

Results: Tier-1 Depeering

36 experiments for 9 Tier-1 ASes
Recovery ratio: most of the lost AS pairs can be recovered

10/21

Results: Tier-1 Depeering

Path diversity: multiple AS paths between lost AS pairs

11/21

Results: Tier-1 Depeering

Shifted path
  On average, 3.75 to 17.2 across all 36 experiments
  Moderate traffic load shifted onto the unaffected links

12/21

Economic model

B pays A for recovery
Business model
  Risk alliance (like airlines): price is determined beforehand
  Pay by bandwidth & duration, or by bits (95th percentile)

[Figure: A and B linked as peers, by P-C links, or through an IXP]

13/21

Communication channel

Search for peers
  Have direct connections to peers
Search for co-located ASes in the same IXP
  ASes are connected by switches in modern IXPs
  Messages are broadcast with the help of the switches
  Message confidentiality with public key crypto

14/21

Automatic communication

Query message (failed AS)
  Who is connected to specific destination ASes?
Reply message (surviving AS)
  I can provide BW1 bandwidth to the destination AS
ACK (failed AS)
  I would like to buy BW2 (<= BW1)
Set up BGP sessions
Withdraw BGP sessions
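The query / reply / ACK exchange above might be modelled as follows. Everything here is a hypothetical sketch: the class names, field names and the helpers `answer` and `acknowledge` are invented for illustration, the slides do not specify a message format, and the BGP session setup and teardown steps are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Query:
    failed_as: int        # AS that lost connectivity
    destinations: list    # destination ASes it wants to reach again


@dataclass
class Reply:
    helper_as: int        # surviving AS offering transit
    destination: int
    offered_bw: float     # BW1 on the slide


@dataclass
class Ack:
    failed_as: int
    helper_as: int
    destination: int
    requested_bw: float   # BW2 on the slide, never more than BW1


def answer(query, helper_as, spare_bw):
    """Surviving AS: reply for every queried destination it has spare bandwidth to.

    spare_bw maps destination AS -> spare bandwidth (hypothetical input).
    """
    return [
        Reply(helper_as, dst, spare_bw[dst])
        for dst in query.destinations
        if spare_bw.get(dst, 0.0) > 0.0
    ]


def acknowledge(query, reply, demand):
    """Failed AS: buy what it needs, capped at the offered bandwidth."""
    return Ack(query.failed_as, reply.helper_as, reply.destination,
               min(demand, reply.offered_bw))
```

The cap in `acknowledge` is what enforces BW2 <= BW1; the AS numbers used in any trial run would come from the private range and are placeholders.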

15/21

Check available connectivity & bandwidth

Connectivity
  traceroute
Available bandwidth
  Maximum capacity is already known
  Estimate the amount which has been used
    Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg, “Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads,” ACM SIGMETRICS, 2003.
  Subtract
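The "subtract" step is simple enough to show directly. In this sketch the function name `available_bandwidth` and the dictionary inputs are assumptions; how the used amount is estimated is left to the cited traffic-matrix method and is taken here as a given input.

```python
def available_bandwidth(capacity, estimated_load):
    """Spare bandwidth per link: known maximum capacity minus estimated usage.

    capacity:       dict link -> maximum capacity
    estimated_load: dict link -> estimated bandwidth already in use
    Both use the same unit (for example Gbit/s).
    """
    spare = {}
    for link, cap in capacity.items():
        used = estimated_load.get(link, 0.0)
        # Clamp at zero: an estimate can overshoot the real capacity.
        spare[link] = max(cap - used, 0.0)
    return spare
```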

16/21

Optimal selection of helper ISPs

From a single victim ISP perspective
  Buy transit from a minimal number of ASes
  Recover all the (prioritized) traffic
  Least cost

17/21

Selection heuristic

Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$
$x_{ij}$ is how much bandwidth AS j could provide to $D_i$

18/21

Selection heuristic

Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$
Score each (helper) AS j with $\sum_i \min(x_{ij}/B_i, 1)$
Select the AS with the largest score (select the one with the lowest price if scores are equal)

[Figure: worked example; surviving labels "3 2.3" and "5 2.1"]

19/21

Selection heuristic

Update
Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$

[Figure: worked example with $\{D_i\}$ and $\{B_i\}$ updated]

20/21

Selection heuristic

Rescore and select
Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$

[Figure: worked example after rescoring; surviving labels "1 0.3" and "0.10"]

21/21

Summary

First work to evaluate the potential routing diversity via IXP and PR with the most complete AS topology graph.
40%-80% of affected <Src, Dst> AS pairs can be recovered via IXP and PR with multiple paths and moderate shifted paths.
Points out a new avenue for Internet failure recovery.
Possible and practical mechanisms to utilize potential routing diversity.

Look forward to feedback and collaborations from IXP/ISPs!

22/21

Thank you!

Q&A

23/21

Backup

24/21

Failure Models

Tier-1 depeering
  Real example: Cogent and Level3 depeering
Tier-1 provider-customer link teardown
  Reported in NANOG forum
Mixed types of link breakdown
  9/11, Taiwan earthquakes, 2003 Northeast blackout

25/21

Results: Tier-1 provider-customer links teardown

Recovery ratio
Path diversity
  4.64 for 10 Tier-1 provider-customer links teardown
  4.54 for 20 Tier-1 provider-customer links teardown
Shifted path
  The average numbers of shifted paths when 10, 20 and 30 links are damaged are 3.4, 4.0 and 4.2, respectively.

26/21

Results: Mixed types of links breakdown

Taiwan earthquake, 9 big victim ASes
Recovery ratio

27/21

Results: Mixed types of links breakdown

Path diversity

28/21

Results: Mixed types of links breakdown

Shifted path

29/21

System framework

Adding an Emergency Recovery (ER) module in a router’s control plane
Setting up the communications between ER and the Intra-TE Resource Management modules.

30/21

Building communication channel

An example

31/21

Optimal selection of ISPs to help

From global view
  Min. shift path or tuned AS-links
  s.t. recover all the (prioritized) traffic we could
  or
  Max. recovery ratio
  s.t. shift path or tuned AS-links
From a single ISP
  Min. cost for the ISP
  s.t. recover all the (prioritized) traffic we could
  or
  Max. recovery ratio
  s.t. cost for the ISP
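The first single-ISP objective (minimum cost, subject to recovering all traffic) can be written as a small integer program. This is one possible formalization, not one given on the slides: the binary variable $y_j$ (helper AS j is bought or not) and the price $c_j$ are symbols introduced here, while $x_{ij}$ and $B_i$ are the offered bandwidth and the bandwidth demand used by the selection heuristic.

```latex
\begin{aligned}
\min_{y}\quad & \sum_j c_j \, y_j \\
\text{s.t.}\quad & \sum_j x_{ij} \, y_j \ge B_i \quad \forall i, \\
& y_j \in \{0, 1\} \quad \forall j.
\end{aligned}
```

Setting every $c_j = 1$ turns the objective into "buy transit from a minimal number of ASes". Under this formulation the problem resembles a covering problem, which is a plausible reason for falling back on a greedy heuristic.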

32/21

Selection heuristic

Lost connectivity to $\{D_i\}$, with bandwidth demand $\{B_i\}$
$x_{ij}$ is how much bandwidth AS j could provide to $D_i$
Score each (helper) AS j with $\sum_i \min(x_{ij}/B_i, 1)$
Select the helper AS with the largest score (select the one with the lowest price if scores are equal)
Update $\{D_i\}$ by deleting the recovered ASes
Update $\{B_i\}$ by subtracting the recovered bandwidth
Rescore and select the next helper AS
Iterate until all are recovered
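The iteration described above (score, select, update, rescore) can be sketched as a greedy loop. The function name `select_helpers`, the dictionary layout and the stopping rule when no remaining helper contributes anything are assumptions made for this illustration; the score is the per-helper sum over destinations of min(x_ij / B_i, 1), with price as the tie-breaker.

```python
def select_helpers(demand, offers, price):
    """Greedy helper selection.

    demand: dict destination D_i -> lost bandwidth B_i
    offers: dict helper AS j -> dict destination D_i -> bandwidth x_ij
    price:  dict helper AS j -> price, used only to break score ties
    Returns (chosen helpers in order, demand still unrecovered).
    """
    remaining = {d: b for d, b in demand.items() if b > 0}
    candidates = set(offers)
    chosen = []

    while remaining and candidates:
        def score(j):
            # Sum over still-unrecovered destinations of min(x_ij / B_i, 1).
            return sum(min(offers[j].get(d, 0.0) / b, 1.0)
                       for d, b in remaining.items())

        # Largest score wins; the lowest price breaks ties.
        best = max(sorted(candidates), key=lambda j: (score(j), -price[j]))
        if score(best) == 0:
            break  # ASSUMPTION: stop when nobody can help any further
        chosen.append(best)
        candidates.remove(best)

        # Update {B_i} by subtracting recovered bandwidth and drop
        # destinations from {D_i} once they are fully recovered.
        for d in list(remaining):
            remaining[d] -= offers[best].get(d, 0.0)
            if remaining[d] <= 1e-9:
                del remaining[d]

    return chosen, remaining
```

Capping each term at 1 keeps a helper with a large surplus toward one destination from outscoring a helper that covers several destinations fully, which appears to be the intent of the min in the score.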