Unicast Routing and BGP - Colorado State Universitycs557/slides/03.pdf · Unicast Routing and BGP...
Transcript of Unicast Routing and BGP - Colorado State Universitycs557/slides/03.pdf · Unicast Routing and BGP...
Unicast Routing and BGP
CSU CS557, Spring 2018 Instructor: Lorenzo De Carli
(Slides by Christos Papadopoulos, remixed by Lorenzo De Carli)
What is this lecture about?
• A general introduction to routing • A review of the BGP routing protocol • A discussion of BGP limitations
(assigned reading)
Forwarding V.S. Routing• Forwarding: the process of moving packets from
input to output based on: – the forwarding table – information in the packet
• Routing: process by which the forwarding table is built and maintained: – one or more routing protocols – procedures (algorithms) to convert routing info to
forwarding table
Forwarding Examples
• To forward unicast packets a router uses: – destination IP address – longest matching prefix in forwarding table
• To forward multicast packets: – source + destination IP address and incoming
interface – longest and exact match algorithms
Factors Affecting Routing
4
36
21
9
1
1D
A
FE
B
C
• Routing algorithms view the network as a graph
• Problem: find lowest cost path between two nodes
• Factors – static: topology – dynamic: load – policy
Distance Vector Protocols
• Employed in the early Arpanet • Distributed next hop computation
– adaptive
• Unit of information exchange – vector of distances to destinations
• Distributed Bellman-Ford Algorithm
Distributed Bellman-FordStart Conditions: Each router starts with a vector of (zero) distances to all directly attached networks
Send step: Each router advertises its current vector to all neighboring routers.
Receive step: Upon receiving vectors from each of its neighbors, router computes its own distance to each neighbor. Then, for every network X, router finds that neighbor who is closer to X than any other neighbor. Router updates its cost to X. After doing this for all X, router goes to send step.
Example - Initial Distances
A
B
E
C
D
Info at node
ABCD
A B C
0 7 ~
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
~
~20
E 1 8 ~ 2
1
8~20
E
E Receives D’s Routes
A
B
E
C
D
Info at node
ABCD
A B C
0 7 ~
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
~
~20
E 1 8 ~ 2
1
8~20
E
E Updates Cost to C
A
B
E
C
D
Info at node
ABCD
A B C
0 7 ~
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
~
~20
E 1 8 4 2
1
8~20
E
A Receives B’s Routes
A
B
E
C
D
Info at node
ABCD
A B C
0 7 ~
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
~
~20
E 1 8 4 2
1
8~20
E
A Updates Cost to C
A
B
E
C
D
Info at node
ABCD
A B C
0 7 8
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
~
~20
E 1 8 4 2
1
8~20
E
A Receives E’s Routes
A
B
E
C
D
Info at node
ABCD
A B C
0 7 8
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
~
~20
E 1 8 4 2
1
8~20
E
A Updates Cost to C and D
A
B
E
C
D
Info at node
ABCD
A B C
0 7 5
7 0 1~ 1 0~ ~ 2
7
1
1
2
28
Distance to node
D
3
~20
E 1 8 4 2
1
8~20
E
Final Distances
A
B C
D
Info at node
ABCD
A B C
0 6 5
6 0 15 1 03 3 2
7
1
1
2
28
Distance to node
D
3
320
E 1 5 4 2
1
5420
E
E
Final Distances After Link Failure
A
B C
D
Info at node
ABCD
A B C
0 7 87 0 18 1 010 3 2
7
1
1
2
28
Distance to node
D
10320
E 1 8 9 11
189110
E
E
View From a Node
A
B
E
C
D
dest
ABCD
A B D
1 14 5
7 8 56 9 44 11 2
7
1
1
2
28
Next hop
E’s routing table
How Are These Loops Caused?
• Observation 1: – B’s metric increases
• Observation 2: – C picks B as next hop to A – But, the implicit path from C to A
includes itself!
Solution 1: Holddowns
• If metric increases, delay propagating information – in our example, B delays advertising route – C eventually thinks B’s route is gone,
picks its own route – B then selects C as next hop
• Adversely affects convergence
Other “Solutions”
• Split horizon – B does not advertise route to C – In general, prevents routes from being advertised
on the interface through which they were learned
• Poisoned reverse – B advertises route to C with infinite distance
Works for two-node loops – does not work for loops with more nodes
Example Where Split Horizon Fails
1
11
1
A B
C
D
1. When link breaks, C marks D as unreachable and reports that to A and B.
2. Suppose A learns it first. A now thinks best path to D is through B. A reports D unreachable to B and a route of cost=3 to C.
3. C thinks D is reachable through A at cost 4 and reports that to B.
4. B reports a cost 5 to A who reports new cost to C.
5. etc...
D: inf
D: 3D: 4
D: 5
D: 6
Avoiding the Bouncing Effect
Select loop-free paths • One way of doing this:
– each route advertisement carries entire path – if a router sees itself in path, it rejects the
route BGP does it this way Space proportional to diameter
Cheng, Riley et al
Distance Vector in Practice
• RIP and RIP2 – uses split-horizon/poison reverse
• BGP/IDRP – propagates entire path – path also used for effecting policies
Sources
– John Stewart III: “BGP4 - Inter-domain routing in the Internet”
– RFC1771: main BGP RFC – RFC1772-3-4: application, experiences, and
analysis of BGP – RFC1965: AS confederations for BGP – Christian Huitema: “Routing in the Internet”,
chapters 8 and 9 – Cisco tutorial on line
31
Autonomous Systems
• What is an AS? – a set of routers under a single technical administration
(A single organization may own multiple ASes) – a set of advertised prefixes – uses an interior gateway protocol (IGP) and common
metrics to route packets within the AS – uses an exterior gateway protocol (EGP) to route
packets to other AS’s • AS may use multiple IGPs and metrics, but appears
as single AS to other AS’s
32
Example
1 2
3
1.11.2
2.1 2.2
3.1 3.2
2.2.1
44.1 4.2
5
5.1 5.2
EGP
IGP
EGPEGP
IGP
IGP
IGPIGP
EGPEGP
33
History
• Mid-80s: EGP – reachability protocol (no shortest path) – did not accommodate cycles (tree topology) – evolved when all networks connected to
ARPANET • Limited size network topology • Result: BGP introduced as routing protocol
34
Path Vectors
• Each routing update carries the entire path as a sequence of ASes
• Loops are detected as follows: – when AS gets route check if AS already in path
• if yes, reject route • if no, add self and (possibly) advertise route further
• Advantage: – metrics are local - AS chooses path, protocol
ensures no loops35
Interconnecting BGP Peers
• BGP uses TCP to connect peers (port 179) • Advantages:
– BGP much simpler – no need for periodic refresh - routes are valid
until withdrawn, or the connection is lost – incremental updates
• Disadvantages – congestion control on a routing protocol?
36
Hop-by-hop Model
• BGP advertises to neighbors only those routes that it uses – consistent with the hop-by-hop Internet
paradigm – e.g., AS1 cannot tell AS2 to route to other ASes
in a manner different than what AS2 has chosen (need source routing for that)
37
AS Categories
– Stub: an AS that has only a single connection to one other AS - carries only local traffic
– Multi-homed: an AS that has connections to more than one AS, but does not carry transit traffic
– Transit: an AS that has connections to more than one AS, and carries both transit and local traffic (under certain policy restrictions)
38
Policy With BGP
• BGP provides capability for enforcing various policies
• Policies are not part of BGP: they are provided to BGP as configuration information
• BGP enforces policies by choosing paths from multiple alternatives and controlling advertisement to other AS’s
40
Examples of BGP Policies
• A multi-homed AS refuses to act as transit – limit path advertisement
• A multi-homed AS can become transit for some AS’s – only advertise paths to some AS’s
• An AS can favor or disfavor certain AS’s for traffic transit from itself – Pick appropriate routes by examining path vectors
41
BGP Is NOT Needed If:
• Single homed network (stub)
• AS does not provide downstream routing
• AS uses a default route
42
Path Selection Criteria
• Information based on path attributes • Attributes + external (policy) information • Examples:
– hop count – policy considerations – presence or absence of certain AS – path origin (EGP, IGP) – link dynamics (flapping, stable)
43
Path Attributes
• Categories (recall flags): – well-known mandatory (passed on) – well-known discretionary (passed on) – optional transitive (passed on) – optional non-transitive (if unrecognized, not passed
on) • Optional attributes allow for BGP extensions • Path attributes are used by routers to select routes • Many possible attributes; we are only going to
discuss one example44
AS_PATH Attribute
• Well-known, mandatory attribute • Important components:
– list of traversed AS’s • If forwarding to internal peer:
– do not modify AS_PATH attribute • If forwarding to external peer:
– prepend self into the path
45
BGP Problems: Delayed Convergence
• Question: – How long does it take for a route to fail-over?
• How to answer this question: – Experimental methodology – Explanation of observation using simple model
48
Key Idea
• Study and understand BGP convergence time –simulation –measurement
• Suggests bounds of O(n!) and O((n-3)*30s)
49
Why Is Convergence Important?
• Robustness – PSTN (telephone) fail-over times are in
milliseconds – Internet fail-over times are in 10s of seconds
50
Methodology
• Two years of traces • Introduce artificial faults across Internet
– but for only their AS, of course! – failures, repairs, fail-over
• Measure: – Tup: new route, Tdown: old route goes away, Tshort
& Tlong: a route shrinks or grows
51
Cum
ulat
ive
Perc
enta
ge o
f Eve
nts
0
25
50
75
100
Seconds Until Convergence0 20 40 60 80 100 120 140 160
TupTshortTlongTdown
Short
->Lon
g Fail
-Ove
r
New
Rou
te
Long
->Sh
ort F
ail-o
ver
Tup
and
Tsho
rt
Failu
re
• Less than half of Tdown events converge within two minutes • Long tailed distribution (up to 15 minutes)
Observed Convergence Latency
53
Impact on Traffic([Labovitz00a] figure 4a)
Why does loss go up?⇒Because there are route loops in the net causing packet drops.
54
How To Tell What’s Going On?
• Simulate BGP – model one router per AS – assume full routing mesh – ignore latency – synchronous processing via global queue ⇒simple model that captures key details
55
BGP Convergence Example
• What’s the problem here? • BGP’s loop detection prevents looped paths (e.g., in step 2 AS1 removes path
through 0 as it now includes 1 itself) • However, it does not prevent AS’es learning incorrect routes from neighbors
(e.g., in step 3 AS2 learns the non-existent 10R path from 1 and adds it to its routing table)
• Worst case (fully connected network graph): BGP tries all possible n! network paths before giving up and concluding that the route is not reachable
AS 0
AS 1AS 2
R
2 -> 0 21R2 -> 1 21R
2 -> 0 21R2 -> 1 21R
2 -> 0 21R2 -> 1 21R
2 -> 0 21R2 -> 1 21R
2 -> 0 21R2 -> 1 21R
0 -> 1 02R0 -> 2 02R
0 -> 1 02R0 -> 2 02R
2 -> 0 201R2 -> 1 201R
0 -> 2 W0 -> 1 W 1 -> 0 120R
1 -> 2 120R
2 -> 0 201R2 -> 1 201R
0 -> 1 02R0 -> 2 02R
2 -> 0 201R2 -> 1 201R 0 -> 2 W
0 -> 1 W 1 -> 0 120R1 -> 2 120R 0 -> 2 012R
0 -> 1 012R
0 -> 2 W0 -> 1 W 1 -> 0 120R
1 -> 2 120R 0 -> 2 012R0 -> 1 012R
Routing Tables
0(*R, 1R, 2R) 1(0R, *R, 2R) 2(0R, 1R, *R)
steady state
Messages Processing Messages Queued in System
1 -> 0 10R 1 -> 2 10R
0
R withdraws its route1 0 -> 2 01R 0 -> 1 01R
1 -> 0 10R 1 -> 2 10R
1 -> 0 12R1 -> 2 12R2
1 and 2 receive new announcement from 0
1 -> 0 10R 1 -> 2 10R0(-, -, *2R) 1(-, -, *2R) 2(*01R, 10R, -)
0 -> 1 01R 0 -> 2 01R
R -> 0 WR -> 1 WR -> 2 W
1 -> 0 12R1 -> 2 12R3
0 and 2 receive new announcement from 1
1 -> 2 12R1 -> 0 12R
4
1 -> 0 12R1 -> 2 12R5
0 and 1 receive new announcement from 2
0 and 2 receive new announcement from 1
6
N/A
N/A
N/A
N/A
N/A
N/A
N/A
0(-, *1R, 2R) 1(*0R, -, 2R) 2(*0R, 1R, -)
0(-, *1R, 2R) 1(-, -, *2R) 2(01R, *1R, -)
2 -> 0 20R2 -> 1 20R
2 -> 0 20R2 -> 1 20R
2 -> 0 20R2 -> 1 20R
2 -> 0 20R2 -> 1 20R
0 and 1 receive new announcement from 2
(steps omitted)
48 N/A 0(-, -, -) 1(-, -, -) 2(-, -, -)steady state
0(-, -, -) 1(-, -, *20R) 2(*01R, 10R, -)
0 -> 1 02R0 -> 2 02R
2 -> 0 201R
0(-, *12R, -) 1(-, -, *20R) 2(*01R, -, -)
0(-, *12R, 21R) 1(-, -, -) 2(*01R, -, -)
2 -> 1 201R
1 -> 0 W1 -> 2 W
Stage Time
5.1 Upper Bound on Convergence
182
Does this explain measurements?
• Tup/Tshort converge quickly because they shorten path length and therefore are quickly accepted
• Tdown/Tlong converge slowly because BGP tries hard to find all alternatives
57
What can be done about this?
• Route advertisement delay helps alleviate the problem (i.e. minimum waiting times between two successive advertisements of routes to a particular AS)
• Could do loop detection at sender side and not just receiver side (i.e., if a route goes through AS N, do not advertise that route to AS N)
58