
Post on 18-Mar-2018



BGP Tutorial

John S. Graham

What’s In Store?

•  Routing Architectures
•  BGP Basics
•  Path Selection (many examples)
•  BGP State Machine
•  Multiprotocol Extensions
    – Multicast Routing Covered
    – Layer-III VPN Omitted

Cisco Architecture

[Figure: BGP, RIP, OSPF, Static and EIGRP feed the IP Routing Table. Each entry (192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8) carries a Metric and a Next-Hop.]

Juniper Architecture

[Figure: BGP, RIP, OSPF, Static and IS-IS exchange routes with the routing table through Import and Export. The entry for 172.16.0.0/12 carries Metric 1, Metric 2, Next Hop, Area, AS-Path, Community and Level.]

BGP Schematic (IPv4 Unicast)

[Figure: blocks labelled Routing Policy, BGP Table, BGP Path Selection and Routing Table (inet.0), with example prefixes 172.16.0.0/12 and 192.168.0.0/16.]

The Autonomous System

•  Collection of routers under one administrative control

•  Single internal routing protocol

•  Identified using an AS Number

AS100

AS200

AS Numbers

•  An ASN is a 16-bit number
    –  1 through 64511 are assigned by RIRs
    –  64512 through 65534 are for private use and should never appear on the Internet
    –  Numbers ‘0’ and ‘65535’ are reserved
    –  AS 23456 used to represent 4-byte ASN to routers unable to handle the new standard
•  All major routing platforms now support 32-bit ASN: http://www.get4byteasn.info/
•  Interesting contrasts between European and US-based R&E networks
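The 16-bit ranges listed above can be expressed as a small classifier. This is a minimal sketch: the function name `classify_asn` and the returned labels are illustrative assumptions, and the ranges are simply those given on the slide.

```python
AS_TRANS = 23456  # stands in for a 4-byte ASN towards routers that only speak 2-byte ASNs


def classify_asn(asn: int) -> str:
    """Classify an ASN using the 16-bit ranges listed on the slide."""
    if asn < 0:
        raise ValueError("an ASN cannot be negative")
    if asn > 65535:
        return "4-byte"      # does not fit in the original 16-bit space
    if asn in (0, 65535):
        return "reserved"
    if asn == AS_TRANS:
        return "as_trans"
    if asn <= 64511:
        return "public"      # assigned by the RIRs
    return "private"         # 64512 through 65534


for asn in (100, 23456, 64512, 65535, 70000):
    print(asn, classify_asn(asn))
```

A filter that rejects private ASNs in a received AS_PATH could be built on the same test.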

Test 4-Byte ASN 39322

•  Operated by Force10 Networks
•  Advertises 64.127.137.0/24
    – Accessible via Internet2 CPS
    – Check the AS-Path on your network!
•  4-Byte ASN can be expressed as:
    – AS-Plain = 39322
    – AS-Dot = 153.154

Thanks to Brent R. Sweeny for advice

ISP Routing Components

•  Interior Gateway Protocol (IGP)
•  Internal BGP (iBGP)
    – Routes customer prefixes around internal infrastructure
    – Is NOT congruent with physical connectivity
•  External BGP (eBGP)
    – Prefix interchange with customers
    – Most routing policy located here

Routing Components Depicted

[Figure: Routers A to E with addresses A.0, A.1, A.4, B.0, B.1, B.2, C.0, C.2, C.3, D.0, D.3, D.4 and E.0. Legend: Circuit, IS-IS Adjacency, iBGP Peering, eBGP Peering.]

Attribute Classes

Well-Known                        Optional
Mandatory    Discretionary        Transitive    Non-Transitive
AS_PATH      LOCAL_PREF           COMMUNITY     MED
NEXT_HOP     ATOMIC_AGGREGATE     AGGREGATOR    CLUSTER_LIST
ORIGIN                                          ORIGINATOR_ID
                                                MP_REACH_NLRI
                                                MP_UNREACH_NLRI

The AS_PATH Attribute

172.16.0.0/12 200 100

172.16.0.0/12 300 200 100

172.16.0.0/12 100

172.16.0.0/12 400 300 200 100

172.16.0.0/12 500 400 300 200 100

AS100 detects its own AS in the path from AS500 and ignores the prefix
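The loop check on this slide can be sketched in a few lines of Python. The function names and the list representation of AS_PATH are illustrative assumptions, not taken from any router implementation.

```python
def accept_update(local_as: int, as_path: list[int]) -> bool:
    """Return False when the local AS already appears in the received AS_PATH."""
    return local_as not in as_path


def prepend_local_as(local_as: int, as_path: list[int]) -> list[int]:
    """Prepend the local AS before passing the path on to an eBGP peer."""
    return [local_as] + as_path


# Rebuild the path for 172.16.0.0/12 as it travels AS100 -> AS200 -> ... -> AS500.
path: list[int] = []
for asn in (100, 200, 300, 400, 500):
    path = prepend_local_as(asn, path)

print(path)                      # [500, 400, 300, 200, 100]
print(accept_update(100, path))  # False: AS100 sees itself and ignores the prefix
```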

Which is the ‘Better’ Path?

The NEXT_HOP Attribute

B.1

B.2

A.1

C.2 C.3 D.3

Router    NEXT_HOP (SELF = NO)    NEXT_HOP (SELF = YES)
A         ―                       ―
B         A.1                     A.1
C         A.1                     B.0
D         C.3                     C.3

The LOCAL_PREF Attribute

Graphic used with kind permission of Philip Smith, Cisco Systems

Multi-Exit Discriminator

Graphic used with kind permission of Philip Smith, Cisco Systems

MED: Metric Confusion

•  MED is non-transitive and optional
    – Some implementations send learned MED to iBGP peers by default and others do not
    – Some implementations send MEDs to eBGP peers by default while others do not
•  Default metric varies by vendor
    – No explicit metric implies 2^32 - 1 or 2^32 - 2
    – No explicit metric implies zero

eBGP vs iBGP

eBGP
1.  Uses AS_PATH for loop avoidance
2.  NEXT_HOP is modified
3.  Peers must be directly connected
4.  MED is reset
5.  LOCAL_PREF is never advertised

iBGP
1.  Chinese whispers prohibited
2.  NEXT_HOP is unchanged
3.  Peers not necessarily directly connected
4.  MED is propagated
5.  LOCAL_PREF always advertised

Internal Peering Topologies

Daisy Chain (Wrong)

Full Mesh (Allowed)

Route Reflector (Allowed)

How Are Prefixes Passed Around?

•  On any given router only the best path for a prefix is passed to other peers
•  Best path learned via eBGP
    – Advertised to all other eBGP peers
    – Advertised to iBGP peers
•  Best path learned via iBGP
    – Advertised onto eBGP peers
    – Not advertised to other iBGP peers
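These propagation rules reduce to a single exclusion: an iBGP-learned path is never passed to another iBGP peer. The sketch below assumes plain iBGP without route reflection; the peer names and the `peers_to_advertise` helper are hypothetical.

```python
def peers_to_advertise(learned_via: str, sender: str, peers: dict[str, str]) -> list[str]:
    """Return the peers that should receive the best path.

    learned_via is 'ebgp' or 'ibgp' (session type the best path arrived on);
    peers maps each peer name to its session type.
    """
    targets = []
    for name, kind in peers.items():
        if name == sender:
            continue  # never send the path back to the peer it came from
        if learned_via == "ibgp" and kind == "ibgp":
            continue  # iBGP-learned paths are not passed to other iBGP peers
        targets.append(name)
    return sorted(targets)


peers = {"customer": "ebgp", "upstream": "ebgp", "core1": "ibgp", "core2": "ibgp"}
print(peers_to_advertise("ebgp", "customer", peers))  # ['core1', 'core2', 'upstream']
print(peers_to_advertise("ibgp", "core1", peers))     # ['customer', 'upstream']
```

The second call shows why a daisy chain of iBGP sessions does not work: core2 never hears what core1 announced.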

The Golden Rule

• Never redistribute routes from the IGP into BGP

• Never redistribute routes from BGP into the IGP

Best Route Selection

•  Longest prefix always wins regardless of routing protocol
•  Source of routing information
    – Connected > Static > eBGP > {IGP} > iBGP
•  BGP ignores received prefixes if
    – There is no route to the NEXT_HOP
    – The AS_PATH contains the local AS number
    – Not synchronized (Cisco IOS only)

BGP Path Vector Algorithm

1.  {Highest WEIGHT}
2.  Highest LOCAL_PREF is Preferred
3.  Shortest AS_PATH
4.  Lowest ORIGIN Code
5.  Lowest MED
6.  Prefer prefix received from eBGP peer over iBGP peer
7.  Path with lowest metric to NEXT_HOP (aka Metric2)
8.  Lowest ROUTER_ID
9.  Shortest CLUSTER_LIST
10.  Lowest neighbor IP address
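The ten steps can be read as a lexicographic comparison, which the sketch below models as a sort key. It is a simplification under stated assumptions: the `Path` class and its field names are illustrative, ROUTER_ID and neighbor address are held as integers, and MED is compared across all paths, whereas real implementations usually compare MED only between paths from the same neighboring AS.

```python
from dataclasses import dataclass

ORIGIN_CODE = {"igp": 0, "egp": 1, "incomplete": 2}  # lower is preferred


@dataclass
class Path:
    weight: int = 0          # vendor-specific, hence the braces on the slide
    local_pref: int = 100
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = True
    igp_metric: int = 0      # metric to NEXT_HOP (Metric2)
    router_id: int = 0
    cluster_list: tuple = ()
    neighbor_ip: int = 0

    def sort_key(self):
        """Smaller key means more preferred; 'highest wins' steps are negated."""
        return (
            -self.weight,
            -self.local_pref,
            len(self.as_path),
            ORIGIN_CODE[self.origin],
            self.med,
            0 if self.ebgp else 1,
            self.igp_metric,
            self.router_id,
            len(self.cluster_list),
            self.neighbor_ip,
        )


def best_path(paths):
    """Pick the preferred path by walking the steps in order."""
    return min(paths, key=Path.sort_key)


long_but_preferred = Path(local_pref=200, as_path=(19782, 19782, 19782, 87))
short = Path(local_pref=100, as_path=(19782, 87))
print(best_path([long_but_preferred, short]) is long_but_preferred)  # True
```

The example shows LOCAL_PREF overriding a shorter AS_PATH, because it is evaluated earlier in the list.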

Interior Gateway Protocol (IGP)

•  Can be OSPF or IS-IS
    –  Prefer IS-IS
        •  Doesn’t require IP or ANY Layer-III (e.g. CLNS) protocol to work
        •  Routes IPv6 with no more effort than IPv4
•  Routes ISP infrastructure addresses:
    –  Router /32 loopback
    –  Point-to-point backbone /30 or /31 subnets
    –  (Customer assigned link networks)
    –  (IXP assigned address for public peerings)
    –  Other infrastructure addresses such as management networks
•  Simple and invariant configuration compared with an enterprise network
    –  Adjacencies follow physical backbone connectivity
    –  Non-backbone interfaces run IGP passively
    –  Metric
        •  Based on route miles
        •  Can be adjusted for traffic engineering purposes
    –  Complex sub-divisions (areas, levels) unnecessary
•  Principal job of IGP is determining least-cost path between any two routers

Static iBGP; Dynamic IGP

Within Internet2:

1.  ‘Next Hop’ for customer prefixes in RIB provided by iBGP and does not change if backbone circuit fails.

2.  Entry in FIB provided by IS-IS and depends on backbone connectivity.

Route Reflection

CHIC

HOUS ATLA LOSA NEWY SALT SEAT WASH

KANS

Route Reflection Rules

•  Prefix received from a client
    – Reflect to all client peers apart from the sender
    – Reflect to all non-client peers
•  Prefix received from a non-client peer
    – Reflect to all client peers

Loop Prevention for RR

•  iBGP speaker receives a prefix with the ORIGINATOR_ID attribute equal to its own ROUTER_ID

•  Route reflector receives a prefix with the CLUSTER_LIST attribute containing its own CLUSTER_ID
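The reflection rules and the two loop checks can be sketched together. In both loop-prevention cases the receiving router ignores the prefix. The function names and router names used here are illustrative assumptions.

```python
from typing import Optional


def reflect_targets(sender: str, clients: set[str], non_clients: set[str]) -> set[str]:
    """Return the iBGP peers a route reflector passes a prefix on to."""
    if sender in clients:
        # From a client: all other clients plus all non-client peers.
        return (clients - {sender}) | non_clients
    # From a non-client: clients only.
    return set(clients)


def accept_reflected(router_id: str, cluster_id: Optional[str],
                     originator_id: Optional[str], cluster_list: list[str]) -> bool:
    """Apply both loop checks; cluster_id is None on routers that are not reflectors."""
    if originator_id is not None and originator_id == router_id:
        return False  # our own prefix has come back to us
    if cluster_id is not None and cluster_id in cluster_list:
        return False  # the prefix has already passed through our cluster
    return True


clients = {"HOUS", "ATLA"}
non_clients = {"KANS"}
print(sorted(reflect_targets("HOUS", clients, non_clients)))  # ['ATLA', 'KANS']
print(sorted(reflect_targets("KANS", clients, non_clients)))  # ['ATLA', 'HOUS']
```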

Progress of Route Reflection

1. External Prefix advertised to both RR
2. Prefix reflected to all client peers
3. Prefix passed between RR but ignored due to CLUSTER_LIST attribute

iBGP Tracks the IGP Metric

CHIC

KANS LOSA SEAT SALT HOUS ATLA WASH

NEWY

KANS WASH ATLA HOUS SALT SEAT LOSA

BUFF NEWY

278 978 2363 2363 3019 3932 4068

3212 2932 2019 689 1507 1045 905

1000

1000

128.122.0.0/16 128.122.0.0/16

NREN and Internet2

Metric = 2932

Metric = 2019

Possible Layer-II Connectivity

•  Internet2 and NREN routers directly connected

•  Connection through a VLAN on a single peering switch

•  Connection via VLAN that traverses multiple Layer-II and optical devices

Internet2 Connection at NGIX-W

[Figure: topology with OC192, 10GE and GE links and VLAN 166.]

Prefer the 10G-Connected Path

•  Internet2 router in Seattle
    – Associate a high LOCAL_PREF with prefixes received on external peering with NREN
•  Internet2 router in Salt Lake City
    – Receives NREN prefixes over iBGP peering with Seattle with high LOCAL_PREF
    – Now prefers iBGP to SEAT over eBGP via NGIX-W

Boomerang Prefixes

CHIC

KANS

NEWY

GPN

GLBX

RRMA RRMA

161.46.0.0/16

Boomerang Prefixes 2/2

•  Loop is prevented by AS_PATH attribute
•  Suppressing boomerang prefixes:
    – Advertise the prefix to the echoing peer
    – Attach NO_EXPORT community to transiting peer
    – Request echoing peer to filter their outbound (sub-optimal)

Redundant Connections

•  Multiple Peerings with Internet2
    – Set the MED on prefixes sent to I2
    – Use communities to change the LOCAL_PREF on I2
    – Allow both peerings with I2 to float
•  Advertise prefixes both directly to I2 and through another RON

Use of MED: Internet2 & ESNet

128.175/16 [0]

128.175/16 [278]

128.175/16 [1001]

128.175/16 [1000]

278

1000

905

ESNet Send MEDs

131.225/16 [1]

131.225/16 [0]

131.225/16 [1000]

131.225/16 [905]

1000

905

278

Bidirectional MED Causing Asymmetric Routing

Fermi UPenn UPenn Fermi

What If I2 Doesn’t Send MEDs?

•  ESNet use their IGP Metric
    –  Traffic remains on ESNet backbone only until it reaches a router with an I2 peering.
•  ESNet use LOCAL_PREF on One Peering
    –  A single (congested?) egress from ESNet for all traffic to Internet2-connected destinations
•  Different I2 Prefixes Receive High LOCAL_PREF at Different Locations
    –  Same effect as MEDs, but implementation far more manual and complex

Introducing Prefixes Into BGP

1.  Use network statement
    1.  With auto-summary disabled
    2.  With auto-summary configured
2.  Configure aggregate routing
3.  Use route maps to redistribute
    1.  Prefixes learned from an IGP
    2.  Static routes

1.1 The network Statement (Cisco)

router bgp 87
 no auto-summary
 network 129.79.0.0
!
ip route 129.79.0.0 255.255.0.0 Null0 200

1.  There is no mask following the prefix in the ‘network’ statement as we are advertising a classful network

2.  The static route serves two important purposes:

1.  The prefix will not be advertised by BGP unless there is an exact match in the IP routing table

2.  Traffic sent to non-existent IP addresses in the range will be silently dropped. This avoids wasting b/w sending ICMP Unreachables and is a valuable defense against scanners and DDoS attacks.

1.2 Using auto-summary (Cisco)

router bgp 87
 auto-summary
 network 129.79.0.0

1.  The prefix will be advertised by BGP providing there is at least one contained prefix in the IP routing table

2.  This IGP-learned prefix can be any length; it does not have to match the classful network that BGP is being asked to advertise

3.  Use this command:

show ip route 129.79.0.0 255.255.0.0 longer-prefixes

to check whether there is an IGP-learned route. If there are none, then BGP will not advertise the 129.79/16 parent
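The two behaviours described for the ‘network’ statement differ only in the routing-table test: an exact match without auto-summary, any contained prefix with it. The sketch below models that test with Python's `ipaddress` module. It follows the slides' description, not IOS internals, and both function names are hypothetical.

```python
import ipaddress


def advertised_without_auto_summary(network: str, routing_table: list[str]) -> bool:
    """Advertise only when the routing table holds an exact match for the network."""
    wanted = ipaddress.ip_network(network)
    return any(ipaddress.ip_network(route) == wanted for route in routing_table)


def advertised_with_auto_summary(network: str, routing_table: list[str]) -> bool:
    """Advertise when at least one route of any length falls inside the network."""
    parent = ipaddress.ip_network(network)
    return any(ipaddress.ip_network(route).subnet_of(parent) for route in routing_table)


table = ["129.79.9.0/24", "10.0.0.0/8"]  # illustrative routing table contents
print(advertised_without_auto_summary("129.79.0.0/16", table))  # False: no exact /16
print(advertised_with_auto_summary("129.79.0.0/16", table))     # True: the /24 is contained
```

This is why the static route to Null0 is needed when auto-summary is disabled: it supplies the exact match.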

3.2 Redistribute Static (Juniper)

policy-options {
    policy-statement ORIGINATE {
        term Seed {
            from {
                protocol static;
                route-filter 129.79.0.0/16 exact;
            }
            then accept;
        }
    }
}

routing-options {
    static {
        route 129.79.0.0/16 discard;
    }
}

Aggregation Scenario

Universities

Regional Network

Internet2 or NLR

10.100/16 10.200/16 10.300/16

10/8

Prefix Leaking Problem

router bgp 400
 network 10.0.0.0
 neighbor 192.168.1.1 remote-as 100
 neighbor 192.168.1.5 remote-as 200
 neighbor 192.168.1.9 remote-as 300
 neighbor 192.168.1.13 remote-as 500
!
ip route 10.0.0.0 255.0.0.0 Null0 200

Router D

(Faking) Aggregation

•  Use the NO_EXPORT Community
    – Not the best choice as the problem and its solution reside in different domains
•  Filter outbound prefixes to NLR/Internet2
    – A straightforward robust solution
•  Deploy route aggregation
    – Downstream problems could cause routing flaps on peering with Internet2

Configuring Aggregation (Cisco)

router bgp 400
 neighbor 192.168.1.1 remote-as 100
 neighbor 192.168.1.5 remote-as 200
 neighbor 192.168.1.9 remote-as 300
 neighbor 192.168.1.13 remote-as 500
 aggregate-address 10.0.0.0 255.0.0.0 summary-only

Router D

Configuring Aggregation (JunOS)

routing-options {
    aggregate {
        route 10.0.0.0/8 discard passive community 65535:65281;
    }
}

policy-options policy-statement ORIGINATE {
    term AGGREGATE_to_BGP {
        from protocol aggregate;
        then accept;
    }
}

protocols bgp group ISP {
    export ORIGINATE;
    peer-as 400;
    neighbor 192.168.1.13 { … }
}

Global NOC Recommendation

•  Use a ‘network’ statement to originate ARIN allocations to Internet2
•  Configure a supporting static route
•  Disable the ‘auto-summary’ capability
•  Filter more specific contained prefixes using an outbound route-map applied to the peering with Internet2 or NLR

Steering Inbound Traffic (1/2)

129.79/16 [19782 87 I] 129.79/16 [19782 19782 19782 87 I]

[210 11537 19782 87 I]

[19401 19782 19782 19782 87 I]

Steering Inbound Traffic (2/2)

129.79/16 129.79.9/24

LOCAL_PREF = 200

LOCAL_PREF = 100

129.79/16

Global Routing Flap Analysis 1/3

jsg@rtr.ll-re1> show route aspath-regex " .* 47868 .* " table commodity.inet.0

commodity.inet.0: 284903 destinations, 486789 routes (284902 active, 0 holddown, 5 hidden)
+ = Active Route, - = Last Active, * = Both

94.125.216.0/21    *[BGP/170] 05:29:58, localpref 200, from 149.165.255.64
                      AS path: 4323 3257 25512 47868 I
                    > to 149.165.254.25 via ge-0/1/0.1, label-switched-path LL640Lo0.0->CTC640lo0.0
                      to 149.165.254.29 via xe-2/0/0.111, label-switched-path LL640Lo0.0->CTC640lo0.0
                    [BGP/170] 16:42:14, MED 208, localpref 200
                      AS path: 1239 3257 3257 25512 47868 I
                    > to 144.228.154.165 via so-1/0/0.0

47868 25512 3257 4323

Global Routing Flap Analysis 2/3

47868 29113 3257

1299

3356

255*[47868]

Global Routing Flap Analysis 3/3

Graphics used with kind permission of Renesys and Earl Zmijewski

Functions Served by Communities

1.  Assign prefixes to pre-defined groups (local significance only)
2.  Control how prefix is advertised by peer
3.  Control your peer’s LOCAL_PREF for the specific prefix
4.  Signal peer to prepend multiple AS numbers to AS_PATH
5.  Blackhole all traffic to specific prefix

Expressing A Community

1.  A community is just a 32-bit number
2.  By convention, the most-significant 16 bits represent an AS number
3.  To convert from ‘new’ to ‘old’ formats
    1.  Multiply the ‘high’ 16-bits by 2^16
    2.  Add the ‘low’ 16-bits to the result

JunOS & Cisco ‘New Format’ 11537:260

Cisco ‘Old Format’ 756089092
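The conversion between the two formats is a shift and an add, sketched below. The function names `to_old_format` and `to_new_format` are illustrative. The second value, 756089743, is the one used in the ISP-side blackhole community-list later in this tutorial.

```python
def to_old_format(community: str) -> int:
    """Convert 'high:low' notation to the single 32-bit decimal value."""
    high, low = (int(part) for part in community.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half of a community must fit in 16 bits")
    return high * 2**16 + low


def to_new_format(value: int) -> str:
    """Convert the 32-bit decimal value back to 'high:low' notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a community is a 32-bit number")
    return f"{value >> 16}:{value & 0xFFFF}"


print(to_old_format("11537:260"))  # 756089092
print(to_new_format(756089743))    # 11537:911
```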

BGP Communities on Internet2

•  Classify prefixes
    – Directly connected participants
    – Sponsored
    – SEGP
•  Adjust Internet2 LOCAL_PREF
•  Request Internet2 to black-hole a prefix
•  (Prevent ISP from advertising prefix to specified upstream peers)

Well-Known Communities

NO-EXPORT              65535:65281    Do not advertise to any eBGP peer
NO-ADVERTISE           65535:65282    Do not advertise to any peer
NO-EXPORT-SUBCONFED    65535:65283    Do not advertise beyond local AS (confederations only)
NO-PEER                65535:65284    Do not advertise to bi-lateral peer

Using Communities (1/5)

96.4.0.0/15

128.169.0.0/16

192.55.208.0/24

+14048:10 +14048:10

128.169.0.0/16

141.225.0.0/16

192.55.208.0/24

NO-EXPORT Community 172.16.0.0/12

172.16.1.0/24 +NO-EXPORT

172.16.0.0/12

172.16.2.0/24 +NO-EXPORT

172.16.0.0/16

Security Diversion

Null 0

•  Routers are optimized for packet forwarding, not packet filtering
•  Routing to Null saves valuable CPU cycles

Inbound Packet

Interface

ACL

Routing Table

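Customer-Triggered Blackhole

[Figure: two flowcharts. ISP: Inbound Prefix, then the tests "11537:911" and "Prefix > /24", each with Yes/No branches, ending in Discard or Forward. Customer: Static Route, then "11537:911" and "Prefix > /24", ending in Redistribute Into BGP.]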

Customer-Triggered Blackhole (ISP Perspective; Cisco IOS)

interface Null0
 no ip unreachables
!
ip policy-list BLACKHOLE permit
 match ip address prefix-list 24_TO_32
 match community 10
!
ip community-list 10 permit 756089743
!
!
ip prefix-list 24_TO_32 seq 5 permit 0.0.0.0/0 ge 24
!
!
route-map CUSTOMER_IN permit 10
 match policy-list BLACKHOLE
 set community no-export
 set interface Null0
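The match logic of the policy above comes down to two tests that must both pass: the prefix is /24 or longer, and it carries community 11537:911 (756089743 in the old format). A minimal Python sketch of that decision follows; the function name and the set-of-strings representation of communities are assumptions made for illustration.

```python
import ipaddress

BLACKHOLE_COMMUNITY = "11537:911"  # 756089743 in the old format
MIN_PREFIX_LENGTH = 24             # mirrors 'ge 24' in the prefix-list


def should_blackhole(prefix: str, communities: set[str]) -> bool:
    """Return True when a received prefix matches both blackhole conditions."""
    network = ipaddress.ip_network(prefix)
    return (BLACKHOLE_COMMUNITY in communities
            and network.prefixlen >= MIN_PREFIX_LENGTH)


print(should_blackhole("192.168.17.15/32", {"11537:911"}))  # True: discard
print(should_blackhole("192.168.0.0/16", {"11537:911"}))    # False: too short, forward
print(should_blackhole("192.168.17.15/32", set()))          # False: no community, forward
```

The length test stops a customer from blackholing a large block with a single tagged announcement.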

Customer-Triggered Blackhole (Customer Perspective; JunOS)

routing-options {
    static {
        route 192.168.17.15/32 {
            discard;
            community 11537:911;
        }
    }
}

policy-options {
    policy-statement ORIGINATE {
        term BLACKHOLE {
            from {
                protocol static;
                route-filter 0.0.0.0/0 prefix-length-range /24-/32;
                community BLACKHOLE;
            }
            then accept;
        }
        term … { … }
    }
}

Customer-Side Policy: IOS vs JunOS

•  Juniper
    – Blackhole prefix statically routed to Discard
    – Attach a community tag to the static route
•  Cisco IOS
    – Blackhole prefix statically routed to Null0
    – Add the new prefix to the ‘blackhole’ prefix list
    – Existing route-map
        •  Redistributes ‘blackhole’ prefix list into BGP
        •  Attaches the correct community

Recommended Routing Policy

•  Should be implemented
    – Reject any prefix with a private AS in the AS_PATH
    – Reject bogon prefixes (following slide)
•  Consider implementing
    – Assign higher LOCAL_PREF to Internet2 or NLR prefixes than to commodity.
    – Max prefixes limit on some peers

LOCAL_PREF Gotcha

149.165.240.64/26

149.165.128.0/17 149.165.128.0/17

LOCAL_PREF = 200

Indiana Gigapop

Bogon Prefixes

Prefix             Reason                         RFC
0.0.0.0/0          Default
10.0.0.0/8         Private                        1918
172.16.0.0/12      Private                        1918
192.168.0.0/16     Private                        1918
127.0.0.0/8        Loopback
169.254.0.0/16     Link Local                     3330
192.0.2.0/24       IANA Reserved
192.88.99.1/32     6to4 relay
198.18.0.0/15      Network device benchmarking    2544
224.0.0.0/4        Multicast group addresses      3171
240.0.0.0/4        Class E addresses
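A bogon filter built from this table can be sketched with Python's `ipaddress` module. This is an illustration under stated assumptions: `is_bogon` is a hypothetical name, a prefix is rejected when it equals or falls inside a listed block, and link local is entered as 169.254.0.0/16, which is the actual extent of that range.

```python
import ipaddress

BOGONS = [ipaddress.ip_network(prefix) for prefix in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # private (RFC 1918)
    "127.0.0.0/8",                                     # loopback
    "169.254.0.0/16",                                  # link local
    "192.0.2.0/24",                                    # IANA reserved
    "192.88.99.1/32",                                  # 6to4 relay
    "198.18.0.0/15",                                   # device benchmarking
    "224.0.0.0/4",                                     # multicast
    "240.0.0.0/4",                                     # class E
)]


def is_bogon(prefix: str) -> bool:
    """Return True for the default route or any prefix inside a listed block."""
    network = ipaddress.ip_network(prefix)
    if network.prefixlen == 0:
        return True  # 0.0.0.0/0 is treated as a special case
    return any(network.subnet_of(bogon) for bogon in BOGONS)


for prefix in ("10.100.0.0/16", "129.79.0.0/16", "0.0.0.0/0"):
    print(prefix, is_bogon(prefix))
```

The default route is handled separately because every prefix is a subnet of 0.0.0.0/0, so listing it in `BOGONS` would reject everything.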

BGP Messages

Type Description References

1 Open RFC 4271

2 Update RFC 4271

3 Notification RFC 4271

4 Keepalive RFC 4271

5 Route-Refresh RFC 2918

The BGP Update Message

Type

Length

Value

Length

Prefix

Length

Prefix Withdrawn Routes

Total Path Attributes Length

Path Attributes

NLRI

× M

× N

× P

BGP Header

Data

Unfeasible Routes Length

N >= M
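The UPDATE layout in the figure can be walked with a short parser. The sketch assumes the RFC 4271 layout for the data that follows the BGP header: a 2-octet withdrawn (unfeasible) routes length, the withdrawn prefixes, a 2-octet total path attributes length, the attributes, then NLRI to the end of the message. Path attributes are left undecoded, only IPv4 prefixes are handled, and the function names are illustrative.

```python
import struct


def parse_prefixes(data: bytes) -> list[str]:
    """Decode a run of (length in bits, prefix) pairs into dotted strings."""
    prefixes, i = [], 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8  # the prefix occupies only as many octets as needed
        octets = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)
        prefixes.append(".".join(str(octet) for octet in octets) + f"/{bits}")
        i += 1 + nbytes
    return prefixes


def parse_update_body(body: bytes) -> dict:
    """Split the data after the BGP header into its three sections."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    withdrawn = body[2:2 + withdrawn_len]
    offset = 2 + withdrawn_len
    (attrs_len,) = struct.unpack_from("!H", body, offset)
    attrs = body[offset + 2:offset + 2 + attrs_len]
    nlri = body[offset + 2 + attrs_len:]
    return {
        "withdrawn": parse_prefixes(withdrawn),
        "path_attributes": attrs,  # raw bytes; TLV decoding is out of scope here
        "nlri": parse_prefixes(nlri),
    }


# Hand-built example: withdraw 172.16.0.0/12, no attributes, announce 10.0.0.0/8.
body = b"\x00\x03" + bytes([12, 172, 16]) + b"\x00\x00" + bytes([8, 10])
print(parse_update_body(body))
```

The hand-built message is only for exercising the parser; a real announcement would carry the mandatory path attributes alongside its NLRI.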

BGP State Machine

Idle

Open-Sent

Established

Connect

Open-Confirm

Active

BGP Convergence Scenarios

Both BGP processes immediately transition from ‘Established’ to ‘Active’

The router connected via the unaffected circuit blackholes traffic for up to 90 seconds

Anatomy of Brief Outage

1.  Link between ESNet and MANLAN Goes Down
2.  ESNet router sends NOTIFICATION which is not received
3.  Peering on Internet2 remains Established even though ESNet side is Down
4.  Link between ESNet and MANLAN is restored before KEEPALIVE timer expires on Internet2
5.  ESNet router negotiates new TCP virtual circuit with Internet2 router
6.  The peering on Internet2 resets

Source-Specific Multicast

S G

G R

(S,G) Join

(S,G) Join

IGMPv3 Join

Multicast Routing

Unicast IPv4 Prefixes

Unicast and Multicast IPv4 Prefixes

Unicast traffic AS400 <> AS200

Multicast traffic AS400 <> AS200