Delayed Internet Routing Convergence Craig Labovitz, Abha Ahuja, Abhijit Bose, Farnam Jahanian Presented By Harpal Singh Bassali

Date posted: 21-Dec-2015

Delayed Internet Routing Convergence

Craig Labovitz, Abha Ahuja, Abhijit Bose, Farnam Jahanian

Presented By

Harpal Singh Bassali

Introduction

Conventional Wisdom - Rapid restoration and rerouting in the event of link or router failure.

Actual convergence time of the order of minutes!!

What happens to the data packets until then?

Loss of connectivity

Packet loss

Latency

Infrastructure

Used both passive data collection and fault-injection machines.

Data collected over a 2 year period.

Injected over 250,000 routing faults from diverse locations.

Used RouteView probes to monitor BGP updates in core Internet routers.

Active probe machines measured end-to-end performance by sending ICMP echo messages to random web sites.

Infrastructure

Taxonomy

Tup : A previously unavailable route is announced as available.

Tdown : A previously available route is withdrawn.

Tshort : An active route with a long ASPath is implicitly replaced by a new route with a shorter ASPath.

Tlong : An active route with a short ASPath is implicitly replaced by a new route with a longer ASPath.
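
The four event types can be told apart from the AS path before and after the change. The following is a minimal sketch, not the authors' tooling: the function name `classify_event` and the convention that `None` means "no route" are assumptions made for illustration.

```python
from typing import Optional, Sequence


def classify_event(old_path: Optional[Sequence[int]],
                   new_path: Optional[Sequence[int]]) -> Optional[str]:
    """Label a routing change as Tup, Tdown, Tshort or Tlong.

    old_path / new_path are AS paths (lists of AS numbers); None means
    that no route was available (assumed convention for this sketch).
    Returns None when the change fits none of the four categories.
    """
    if old_path is None and new_path is None:
        return None          # nothing was announced or withdrawn
    if old_path is None:
        return "Tup"         # previously unavailable route is announced
    if new_path is None:
        return "Tdown"       # previously available route is withdrawn
    if len(new_path) < len(old_path):
        return "Tshort"      # implicit replacement by a shorter ASPath
    if len(new_path) > len(old_path):
        return "Tlong"       # implicit replacement by a longer ASPath
    return None              # same length: outside the taxonomy


print(classify_event(None, [1, 2]))        # route announced
print(classify_event([1, 2], [1, 3, 2]))   # fail-over to a longer path
```

Paths of equal length fall outside the taxonomy as stated on the slide, so the sketch returns `None` for them rather than guessing a label.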

Routing Measurements

Latency Vs Number of BGP updates

Observations

Long-tailed distribution.

20% of Tlong and 40% of Tdown events take more than 3 minutes to converge.

(Tshort, Tup) and (Tlong, Tdown) form equivalence classes.

A 20 second separation between Tlong and Tdown.

Tdown and Tlong had twice as many update messages as Tshort and Tup.

Strong correlation between number of updates and latency.

Routing Measurements

Latency Vs Type of BGP update

Observations

Significant variation in convergence latencies for the ISPs.

No correlation between convergence latency and geographic or network distance.

Factors contributing to Internet fail-over delay are independent of network load and congestion.

End-to-End Measurements

Observations

Packet Loss Vs Type of BGP update

Less than 1% packet loss throughout the 10 minute period.

Tlong event has 17% and Tshort event has 32% packet loss.

Wider curve of Tlong due to the slower speed of routing table convergence.

Observations

Latency Vs Type of BGP update

Wider curve of Tlong due to the slower speed of routing table convergence.

The Tup event had all its packets within 1 minute.

BGP Convergence

Upper Bound on Convergence

Assumptions

Each AS is a single node.

We have a complete graph of ASes.

Exclude the analysis of MinRouteAdver.

Model the BGP processing as a single linear, global queue.

BGP Convergence

Upper Bound on Convergence

Results

If loop detection is performed at both the sender and the receiver side, all mutual dependencies can be discovered and eliminated in a single round.

Convergence latency is independent of geographic and network distance.

Variations in convergence latency are directly related to topological factors like the length and number of possible paths between ASes.
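
The sender-side and receiver-side loop checks mentioned above can be sketched as two small predicates over an AS path. This is an illustration only, assuming an AS path is a list of AS numbers; the function names are hypothetical and not taken from any BGP implementation.

```python
from typing import Sequence


def receiver_accepts(as_path: Sequence[int], local_as: int) -> bool:
    """Receiver-side check: reject a path that already contains our own AS."""
    return local_as not in as_path


def sender_should_announce(as_path: Sequence[int], neighbor_as: int) -> bool:
    """Sender-side check: do not announce a path to a neighbor that is
    already on it, since that neighbor would reject it on receipt."""
    return neighbor_as not in as_path


path = [3, 2, 1]                          # route to AS 1 via AS 3 and AS 2
print(receiver_accepts(path, 2))          # False: AS 2 sees itself on the path
print(sender_should_announce(path, 2))    # False: no point sending it to AS 2
print(sender_should_announce(path, 4))    # True: AS 4 is not on the path
```

With only the receiver-side check, the looping path is still sent and then discarded. The sender-side check avoids sending it at all, which is the saving the slide refers to.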

The Impact of Internet Policy and Topology on Delayed Routing Convergence

Craig Labovitz, Roger Wattenhofer, Srinivasan Venkatachary and Abha Ahuja

Major Results

Internet fail-over convergence = (30 * n) seconds, where n is the length of the longest backup path between source and destination.

Customers of bigger ISPs exhibit faster convergence.

Errant paths are frequently explored during delayed convergence.
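
A quick worked example of the fail-over bound, reading it as 30 times n with the result in seconds. The unit and the names `SECONDS_PER_HOP` and `failover_bound_seconds` are assumptions made for this sketch.

```python
SECONDS_PER_HOP = 30  # assumption: the constant 30 in the bound is in seconds


def failover_bound_seconds(longest_backup_path_len: int) -> int:
    """Estimate fail-over convergence as 30 * n, where n is the length
    of the longest backup path between source and destination."""
    if longest_backup_path_len < 0:
        raise ValueError("path length must be non-negative")
    return SECONDS_PER_HOP * longest_backup_path_len


for n in (2, 5, 8):
    print(n, failover_bound_seconds(n))
```

For a longest backup path of 5 ASes the estimate is 150 seconds, which is consistent with convergence times on the order of minutes.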

Methodology

Injected BGP route transitions into more than 10 geographically and topologically diverse providers.

A set of probe machines actively injected faults at random intervals of roughly 2 hours.

Generated faults over a six month period.

Cooperating providers treated the address space as a customer with respect to policy and filtering.

Logged periodic routing table snapshots and all BGP updates from an additional 20 ISPs.

Inter-provider Relationships

Peer : Bilateral exchange of customer and backbone routing information. Routes learnt from other peers and upstream providers are not exchanged.

Customer/Transit : The customer announces its backbone and downstream routes to an upstream provider.

Backup transit : A peer relationship in which a provider only provides transit after detection of a fault. Both are peers in steady-state but after a failure, the backup transit peer begins advertising its now downstream peer’s backbone and customer routes.
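
The three relationships above amount to an export rule: which routes an AS passes on to which neighbor. The sketch below encodes only the rules stated on this slide. The labels (`"self"`, `"customer"`, `"peer"`, `"upstream"`, `"backup_peer"`) and the function `should_export` are hypothetical names chosen for illustration, and export toward customers is left out because the slide does not describe it.

```python
# Route sources whose routes may be passed on to peers and upstream providers:
# the AS's own backbone routes and routes learned from its customers.
EXPORTABLE_SOURCES = {"self", "customer"}
KNOWN_SOURCES = {"self", "customer", "peer", "upstream", "backup_peer"}
KNOWN_NEIGHBORS = {"peer", "upstream"}  # export to customers is not modeled


def should_export(learned_from: str, send_to: str,
                  backup_active: bool = False) -> bool:
    """Decide whether a route learned from `learned_from` is announced
    to a neighbor of type `send_to`.

    backup_active=True models the state after a detected fault, when a
    backup transit peer is treated as a downstream customer.
    """
    if learned_from not in KNOWN_SOURCES:
        raise ValueError(f"unknown route source: {learned_from}")
    if send_to not in KNOWN_NEIGHBORS:
        raise ValueError(f"unknown neighbor type: {send_to}")
    if learned_from == "backup_peer":
        # Steady state: an ordinary peer, so its routes are not re-exported.
        # After a failure: its routes are advertised like customer routes.
        return backup_active
    return learned_from in EXPORTABLE_SOURCES


print(should_export("customer", "peer"))                            # True
print(should_export("peer", "peer"))                                # False
print(should_export("backup_peer", "upstream", backup_active=True)) # True
```

Keeping the backup case as a single flag makes the steady-state and post-failure behavior easy to compare, at the cost of ignoring how the fault is detected.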

Relationships

Convergence Topologies

Observations

Conclusions

Vagabond paths are responsible for delays in convergence.

The more densely a router is peered, the more time it takes to converge.

MinRouteAdver is responsible for significant additional latency during delayed convergence.

Topology Impact on Convergence

Observations

Long-tailed distribution due to vagabond paths.

ISP3 exhibits significantly slower convergence times.

Average convergence latency for a route failure corresponds to the longest possible backup path allowed by policy and topology.

Observations (contd.)

Latency Vs Longest ASPath explored

Observations (contd.)

Provider Type Vs Observed ASPath length

Conclusions

Customers sensitive to fail-over latency should multi-home to larger providers.

Smaller providers should limit their number of transit and backup transit interconnections.

The large number of vagabond paths suggests a need for better route validation and authentication mechanisms.