Modeling Inter-Domain Routing Protocol Dynamics ISMA 2000 December 6, 2000
Modeling Inter-Domain Routing Protocol Dynamics
ISMA 2000, December 6, 2000
In collaboration with Abha Ahuja, Roger Wattenhofer, Srinivasan Venkatachary, Madan Musuvathi
Craig Labovitz, Merit Network/Microsoft Research
2
Routing Dynamics
Goal: Develop a model of Internet inter-domain routing protocol dynamics. Easy, right?
Subgoals
– Model impact of failures and topological changes on end-to-end paths
– Predict/measure reliability of inter-AS links, routers, etc.
– Compare steady-state topology to topologies under failure
– Figure out where all of those darn BGP updates come from
3
Stuff
• Old stuff
– Measurements of BGP updates and convergence
– Model BGP convergence (upper and lower bounds)
• New Stuff
– Protocol timer trade-offs
– Improvements to BGP (BGP-CT)
4
Data Sets & Tools
• Default-free BGP peering sessions
– (routeviews.merit.edu, 2 Equinix probes, 1 Mae-West, several iBGP probes, Merit RSNG route servers)
– Daily tables and all BGP updates/events sent to RS over last five years
– Daily default-free dumps (and all updates/events) for 20-30 peers for last two years
• Fault injection probes (OSPF/BGP)
• Analysis/Tools
– MRT/Perl (playing with SSFNet)
– RouteTracker (whois.routetracker.net)
5
Internet BGP Update Volume
• Withdraws in millions until 2/1998 due to withdraw looping/Cisco bug. Dramatic drop after IOS release
• Announcements growing after 6/98 due to MED policy and convergence?
6
MTTF of Backbone Networks
• Informally: How long before a network is unreachable?
• Majority of Internet routes unreachable within 30 days
7
Mean Time to Fail-Over
• How long before traffic is re-routed?
• Majority of Internet routes which possess backup paths fail over every 3 days
8
Internet Route Repair
• How long before a network is reachable again?
• Long-tailed distribution with plateau at 30 minutes. Why this plateau?
9
BGP Convergence
• If complete graph, N! theoretical upper bound and 30*(N-3) lower bound
• In practice, Internet has hierarchy and customer/provider/sibling relationships. Bounded by length of longest possible path
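To give a feel for how far apart the two complete-graph bounds are, here is a minimal sketch that evaluates both for a few values of N. The slide does not state units, so reading N! as a count and the factor 30 as a per-round timer interval in seconds is an assumption; the function and constant names are illustrative only.

```python
from math import factorial

# Assumption: the 30 in 30*(N-3) is a timer interval expressed in seconds.
ASSUMED_INTERVAL_SECONDS = 30


def upper_bound(n: int) -> int:
    """N! upper bound quoted on the slide for a complete graph of N ASes."""
    return factorial(n)


def lower_bound_seconds(n: int) -> int:
    """30*(N-3) lower bound quoted on the slide, read here as seconds."""
    return ASSUMED_INTERVAL_SECONDS * (n - 3)


# Print both bounds side by side for a few mesh sizes.
for n in (4, 5, 10):
    print(f"N={n}: upper={upper_bound(n)}, lower={lower_bound_seconds(n)} s")
```

Even at N=10 the factorial term is in the millions while the lower bound is a few minutes, which is why the hierarchy point in the second bullet matters in practice.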
10
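BGP Convergence Example
[Figure: AS0, AS1, AS2, AS3 and route R, with BGP table entries for R; * marks the selected route]
*B R via 3, B R via 03, B R via 23
*B R via 3, B R via 03, B R via 13
*B R via 3, B R via 13, B R via 23
[Second set of tables, labeled AS0 AS1 AS2]
** **B R via 203
*B R via 013, B R via 103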
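The steady-state table entries in this example can be reproduced with a short sketch. It assumes AS0 through AS3 form a full mesh and that route R sits behind AS3; neither detail is stated in the transcript. The names `NEIGHBORS`, `BEST_PATH`, and `learned_paths` are hypothetical.

```python
# Assumption: AS0..AS3 are fully meshed and R is reached through AS3.
NEIGHBORS = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [0, 1, 2],
}

# Path each AS currently selects toward R (the starred "via 3" entries).
BEST_PATH = {0: (0, 3), 1: (1, 3), 2: (2, 3), 3: (3,)}


def learned_paths(asn):
    """Paths to R that `asn` hears from its neighbors, shortest first."""
    paths = []
    for peer in NEIGHBORS[asn]:
        path = BEST_PATH[peer]
        if asn in path:  # loop detection: ignore paths that already contain us
            continue
        paths.append(path)
    return sorted(paths, key=lambda p: (len(p), p))


def fmt(path):
    """Render an AS path in the slide's 'via 03' notation."""
    return "via " + "".join(str(a) for a in path)


for asn in (0, 1, 2):
    print(f"AS{asn}:", ", ".join(fmt(p) for p in learned_paths(asn)))
```

Each AS ends up holding the direct path plus one two-hop alternative per other neighbor. Those alternatives are the stale state that can be explored one by one once the direct path is withdrawn.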
11
Observed Fault Injection Topologies
• In steady-state, topologies between ISP1, ISP2, ISP3 similar – all direct BGP peers of ISP4.
• Repeatedly withdrew single-homed route (R1, R2, R3)
[Figure: at MAE-WEST, ISP 1, ISP 2, and ISP 3 each connect to ISP 4; routes R1, R2, and R3 are shown in steady state and after withdraw]
12
Comparing ISP Convergence Latencies
• CDF of faults injected into three Mae-West providers and observed at ISP router in Japan
• Significant variations between providers
13
ISP1-ISP4 Paths During Failure
• Only one back up path (length 3)
[Figure: ISP 1, ISP 4, ISP 5, and route R1; steady-state path, fault location, and backup path P2]
96% Average: 92 (min/max 63/140) seconds
Announce AS4 AS5 AS1 (44 seconds)
Withdraw (92 seconds)
4% Average: 32 (min/max 27/38) seconds
Withdraw (32 seconds)
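As a worked check on the ISP1-ISP4 numbers, the two observed sequences can be combined into one overall average by weighting each group's mean by its share of events. This is a derived figure, not one reported on the slide, and `weighted_mean` is an illustrative helper.

```python
# (share of fault events, average seconds until final withdraw) for ISP1-ISP4
observed = [(0.96, 92.0), (0.04, 32.0)]


def weighted_mean(groups):
    """Average of group means, weighted by each group's share of events."""
    total_share = sum(share for share, _ in groups)
    return sum(share * avg for share, avg in groups) / total_share


overall = weighted_mean(observed)
print(f"{overall:.1f} seconds")  # prints 89.6 seconds
```

The same calculation cannot be completed for the ISP2 and ISP3 cases, because their "Other" groups have no reported average.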
14
ISP2-ISP4 Paths During Failure
[Figure: ISP 2, ISP 4, ISP 5, ISP 6, Vagabond, ISP 10, ISP 11, ISP 12, ISP 13, and route R2; steady-state path, fault location, and backup paths P2, P3, and P4]
63% Average: 79 (min/max 44/208) seconds
Announce AS4 AS5 AS2 (35 seconds)
Withdraw (79 seconds)
7% Average: 88 (min/max 80/94) seconds
Announce AS4 AS5 AS2 (33 seconds)
Announce AS4 AS6 AS5 AS2 (61 seconds)
Withdraw (88 seconds)
7% Average: 54 (min/max 29/9) seconds
Withdraw (54 seconds)
23% Other
15
ISP3-ISP4 Paths During Failure
[Figure: ISP 1, ISP 3, ISP 4, ISP 5, ISP 7, ISP 8, ISP 9, and route R3; steady-state path, fault location, and backup paths P2 through P7]
36% Average: 110 (min/max 78/135) seconds
Announce AS4 AS5 AS (52 seconds)
Withdraw (110 seconds)
35% Average: 107 (min/max 91/133) seconds
Announce AS4 AS1 AS3 (39 seconds)
Announce AS4 AS5 AS3 (68 seconds)
Withdraw (107 seconds)
2% Average: 140 (min/max 120/142) seconds
Announce AS4 AS5 AS8 AS7 AS3 (27 seconds)
Announce AS4 AS5 AS9 AS8 AS7 AS3 (86 seconds)
Withdraw (140 seconds)
27% Other
16
Race Conditions and Paths
• T(shortest path) <= Tdown <= T(longest path)
[Figure: nodes A and B]
17
Relationship Between Backup Paths and Convergence
• Convergence related to length of longest possible backup ASPath between two nodes
Longest Observed ASPath Between AS Pair
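The idea that convergence tracks the longest possible backup ASPath can be illustrated by searching a small AS graph for the longest loop-free path between two nodes. The sketch below is a brute-force depth-first search. The two adjacency maps are hypothetical, inferred only from the AS paths announced in the ISP1-ISP4 and ISP2-ISP4 experiments, and routing policy (customer/provider filtering) is ignored.

```python
def longest_simple_path(graph, src, dst):
    """Return the longest loop-free path from src to dst (brute-force DFS)."""
    best = []

    def dfs(node, path):
        nonlocal best
        if node == dst:
            if len(path) > len(best):
                best = list(path)
            return
        for nxt in graph[node]:
            if nxt not in path:  # an AS path never revisits an AS
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    return best


# Hypothetical adjacency: AS4 reaches AS1 directly or through AS5.
isp1_case = {4: [1, 5], 5: [4, 1], 1: [4, 5]}

# Hypothetical adjacency: AS4 reaches AS2 directly, through AS5, or through AS6 then AS5.
isp2_case = {4: [2, 5, 6], 2: [4, 5], 5: [4, 2, 6], 6: [4, 5]}

print(longest_simple_path(isp1_case, 4, 1))
print(longest_simple_path(isp2_case, 4, 2))
```

Exhaustive search is exponential in the worst case, so this only suits toy graphs; it is meant to show the quantity being measured, not how the measurements were taken.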
18
Towards Fast BGP Convergence
Four possible solutions
• No transit/One-hop topology (peer and filter everyone)
• Turn off/Change MinRouteAdver timer
• “Tag” BGP updates and provide hint so nodes can detect bogus state information
• Entirely new protocol
19
255 AS Topology
[Figure: cumulative distribution plot. x-axis: Seconds (0 to 450); y-axis: Cumulative Percentage (0 to 100); series: MRA, CT; MRA, No CT; No MRA, No CT]
20
255 AS Topology
[Figure: cumulative distribution plot. x-axis: Number of Messages (0 to 1800); y-axis: Cumulative Percentage (0 to 100); series: MRA, CT; MRA, No CT; No MRA, No CT]
21
BGP-CT
• Incremental addition to BGP4
– Capability negotiation
– Tags carried in as multi-protocol NLRI extension
– Invalidate alternative paths if match tag (and other necessary conditions met)
• Details
– New state machine additions (temporary invalidation)
– Works with iBGP
– Implemented in MRT and deployed on CAIRN
– Improves BGP convergence by an order of magnitude in most cases (in a few cases, behavior is worse)