BGP: Border Gateway Protocol (an introduction)
dr. C. P. J. Koymans
Informatics Institute, University of Amsterdam
March 11, 2008
General ideas behind BGP: Background; Providers, Customers and Peers; External and Internal BGP; BGP information bases
The BGP protocol: BGP attributes; BGP packets
Traffic Engineering: Outbound Traffic Engineering; Inbound Traffic Engineering
IBGP scaling
BGP version 4
I Border Gateway Protocol version 4 (BGP4)
  I is specified in RFC 4271
  I is an inter-AS routing protocol
  I “monopolises” the Internet
  I uses path vector routing
    I which is in between distance vector and link state
  I uses (often non-coordinated) policy based routing
    I which is a nuisance for convergence
Autonomous system (AS)
I An Autonomous System or AS is
  I a connected group of networks and routers,
  I representing some assigned set of IP prefixes,
  I having a single, consistent routing policy,
    I both internally and externally
Autonomous system illustration
[Figure: Autonomous Systems, showing AS2503, AS192 and AS29077]
Slide courtesy Iljitsch van Beijnum
Providers, Customers and Peers
Customers and Providers
Customer pays provider for access to the Internet
[Figure: a provider linked to a customer. Legend: IP traffic; provider-customer link]
Slide courtesy Timothy Griffin
The “Peering” Relationship
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange $$$
[Figure legend: peer-peer link; provider-customer link; traffic allowed; traffic NOT allowed]
Slide courtesy Timothy Griffin
Peering Provides Shortcuts
Peering also allows connectivity between the customers of “Tier 1” providers.
[Figure legend: peer-peer link; provider-customer link]
Slide courtesy Timothy Griffin
AS Graph != Internet Topology
The AS graph may look like this. Reality may be closer to this…
BGP was designed to throw away information!
Slide courtesy Timothy Griffin
Providers, Customers and Peers Treatment
I The order of preference for a route is
  I Customers have highest preference
  I Peers have the next highest preference
  I Providers have the lowest preference
I Transit relationships are enforced by export filtering
  I Do not advertise provider or peer routes to other providers or peers
  I Do advertise all routes to customers
  I Do advertise customer routes to providers and peers
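The export rules above can be sketched as a small decision function. This is only an illustration: the names `Rel` and `should_export` are made up here, and routes originated by the AS itself are not modelled.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship with a neighbor (hypothetical helper type)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Rel, send_to: Rel) -> bool:
    """Return True if a route learned from one neighbor may be sent to another."""
    # Customers receive all routes.
    if send_to is Rel.CUSTOMER:
        return True
    # Providers and peers only receive customer routes,
    # so provider and peer routes are never passed on to them.
    return learned_from is Rel.CUSTOMER
```

For example, `should_export(Rel.PEER, Rel.PROVIDER)` is `False`: passing it on would give free transit between a peer and a provider.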
Providers, Customers and Peers: Import and Export
Import Routes
[Figure labels: From peer (2x); From provider (2x); From customer (2x). Legend: provider route; customer route; peer route; ISP route]
Slide courtesy Timothy Griffin
Export Routes
[Figure labels: To peer (2x); To customer (2x); To provider; From provider; filters block. Legend: provider route; customer route; peer route; ISP route]
Slide courtesy Timothy Griffin
EBGP and IBGP (1)
I External BGP (EBGP)
  I is used for BGP neighbors between different AS’s
    I to exchange prefixes
    I and to implement policies
I Internal BGP (IBGP)
  I is used for BGP neighbors within only one AS
    I to distribute Internet prefixes across the backbone in order to create a consistent view among all entry/exit points
    I to originate local (customer) prefixes
EBGP and IBGP (2)
I Routes imported from one IBGP peer are not distributed to another IBGP peer
  I This prevents possible routing loops
  I Loop detection is based on duplicates in AS paths, which are only detected by EBGP between different AS’s (the AS path is left unchanged inside an AS)
  I Requires IBGP peers to be configured as a full mesh
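To see why the full mesh requirement becomes a scaling problem, count the sessions: every pair of IBGP routers needs one. A minimal sketch (the function name is made up for this illustration):

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of IBGP sessions needed to fully mesh the given number of routers."""
    # Each of the n routers peers with the other n - 1; each session is shared by two routers.
    return routers * (routers - 1) // 2
```

Ten routers need 45 sessions and one hundred routers need 4950, so the session count grows quadratically with the size of the AS.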
Routing Information Bases (RIBs)
I Adj-RIB-In (one per peer)
  I Routes after input filtering
I Loc-RIB (one globally)
  I Routes after best path selection
I Adj-RIB-Out (one per peer)
  I Routes after output filtering
BGP protocol
I Runs over TCP, port 179
I Exchanges NLRI
  I Network Layer Reachability Information
  I Prefixes that can or can no longer be reached through the router
  I Accompanied by BGP attributes
Some important BGP attributes
I In order of path selection importance
  I LOCAL_PREF (Local Preference)
  I AS_PATH
  I ORIGIN
  I MULTI_EXIT_DISC (MED; Multi-exit discriminator)
I And further...
  I NEXT_HOP
    I which must be reachable (directly or via IGP)
    I except in the case of multi-hop BGP
Interaction between BGP and IGP
BGP Next Hop Attribute
Every time a route announcement crosses an AS boundary, the Next Hop attribute is changed to the IP address of the border router that announced the route.
[Figure: AS 6431 (AT&T Research) announces 135.207.0.0/16 to AS 7018 (AT&T) with Next Hop = 12.125.133.90; AS 7018 announces it to AS 12654 (RIPE NCC RIS project) with Next Hop = 12.127.0.121]
Slide courtesy Timothy Griffin
Join EGP with IGP For Connectivity
[Figure: AS 1 and AS 2 are connected over 192.0.2.0/30; 135.207.0.0/16 is announced with Next Hop = 192.0.2.1]
EGP table:
destination       next hop
135.207.0.0/16    192.0.2.1
IGP table:
destination       next hop
192.0.2.0/30      10.10.10.10
Forwarding Table (EGP + IGP):
destination       next hop
135.207.0.0/16    10.10.10.10
192.0.2.0/30      10.10.10.10
Slide courtesy Timothy Griffin
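The join of the two tables can be sketched as follows, using the addresses from the slide. The function names and the dictionary representation are assumptions made for this illustration; a real router resolves next hops inside its RIB/FIB machinery.

```python
import ipaddress


def longest_match(table, address):
    """Look up an address in a {network: next_hop} table using longest-prefix match."""
    addr = ipaddress.ip_address(address)
    candidates = [net for net in table if addr in net]
    if not candidates:
        return None
    return table[max(candidates, key=lambda net: net.prefixlen)]


def join_egp_with_igp(egp, igp):
    """Resolve every BGP next hop through the IGP table to build a forwarding table."""
    forwarding = dict(igp)
    for prefix, bgp_next_hop in egp.items():
        igp_next_hop = longest_match(igp, bgp_next_hop)
        if igp_next_hop is not None:  # routes with an unreachable next hop are not installed
            forwarding[prefix] = igp_next_hop
    return forwarding


egp = {ipaddress.ip_network("135.207.0.0/16"): "192.0.2.1"}
igp = {ipaddress.ip_network("192.0.2.0/30"): "10.10.10.10"}
forwarding = join_egp_with_igp(egp, igp)
```

Here `forwarding` maps both 135.207.0.0/16 and 192.0.2.0/30 to 10.10.10.10, matching the forwarding table on the slide.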
Route selection
Route Selection Summary
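1. Highest Local Preference (enforce relationships)
2. Shortest ASPATH (traffic engineering)
3. Lowest MED (traffic engineering)
4. i-BGP < e-BGP, i.e. prefer routes learned via e-BGP (traffic engineering)
5. Lowest IGP cost to BGP egress (traffic engineering)
6. Lowest router ID (throw up hands and break ties)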
Slide courtesy Timothy Griffin
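The selection order can be expressed as a sort key, compared field by field. This is a simplified sketch: the `Route` type and its field names are invented for the example, the ORIGIN step is left out (as on the summary slide), and the MED is compared across all routes, whereas implementations usually compare it only between routes from the same neighboring AS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical container for the attributes used in route selection."""
    local_pref: int
    as_path: tuple
    med: int
    learned_via_ebgp: bool
    igp_cost: int
    router_id: int


def preference_key(route):
    """Smaller keys are better; the tuple is compared left to right."""
    return (
        -route.local_pref,                   # highest local preference
        len(route.as_path),                  # shortest AS path
        route.med,                           # lowest MED
        0 if route.learned_via_ebgp else 1,  # e-BGP before i-BGP
        route.igp_cost,                      # lowest IGP cost to the egress
        route.router_id,                     # lowest router ID as final tie-breaker
    )


def best_route(routes):
    """Pick the most preferred route from a non-empty collection."""
    return min(routes, key=preference_key)
```

Because local preference comes first in the key, a route with a long AS path but a higher local preference still beats a short path with a lower local preference.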
BGP Route Processing
Receive BGP Updates -> Apply Import Policies -> Best Route Selection (based on attribute values) -> Best Route Table -> Apply Export Policies -> Transmit BGP Updates
Best routes: install forwarding entries for best routes in the IP Forwarding Table.
Apply Policy = filter routes & tweak attributes (for both import and export policies)
Open ended programming. Constrained only by vendor configuration language
Slide courtesy Timothy Griffin
BGP attribute types
I Well-known mandatory
  I ORIGIN, AS_PATH, NEXT_HOP
I Well-known discretionary
  I LOCAL_PREF, ATOMIC_AGGREGATE
I Optional transitive
  I COMMUNITIES, AGGREGATOR
I Optional non-transitive
  I MULTI_EXIT_DISC
LOCAL_PREF (Local Preference)
I Advertised within a single AS (via IBGP)
I Used to implement local policies
I Can depend on any locally available information,
possibly learned outside BGP
I Default value is 100
I Highest value wins
AS_PATH
I Sequence of AS’s (or sets of AS’s)
I Used for loop detection
I Shortest path wins
I Prepend own AS (possibly multiple times) in EBGP updates
I Leave unchanged in IBGP updates
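A minimal sketch of how the AS_PATH is handled on export and checked on import. The function names are hypothetical, and AS sets are ignored: a path is simply a list of AS numbers.

```python
def export_as_path(as_path, own_as, to_ebgp_peer, prepend_count=1):
    """Return the AS_PATH to put in an outgoing update."""
    if not to_ebgp_peer:
        # IBGP: the path is left unchanged.
        return list(as_path)
    # EBGP: prepend the own AS, possibly several times (padding).
    return [own_as] * prepend_count + list(as_path)


def accept_as_path(as_path, own_as):
    """Loop detection: reject any path that already contains the own AS."""
    return own_as not in as_path
```

For instance, `export_as_path([6341], 7018, to_ebgp_peer=True)` gives `[7018, 6341]`, while the same call with `to_ebgp_peer=False` returns `[6341]` unchanged.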
Examples of AS_PATHs
ASPATH Attribute
[Figure: AS 6341 (AT&T Research) originates prefix 135.207.0.0/16. Other ASes shown: AS 7018 (AT&T), AS 1239 (Sprint), AS 1755 (Ebone), AS 3549 (Global Crossing), AS 1129 (Global Access), AS 12654 (RIPE NCC RIS project)]
Announcements of 135.207.0.0/16 shown in the figure:
AS Path = 6341
AS Path = 7018 6341
AS Path = 3549 7018 6341
AS Path = 1239 7018 6341
AS Path = 1755 1239 7018 6341
AS Path = 1129 1755 1239 7018 6341
Slide courtesy Timothy Griffin
Shorter Doesn’t Always Mean Shorter
[Figure: AS 1, AS 2, AS 3 and AS 4]
Mr. BGP says that path 4 1 is better than path 3 2 1. Duh!
In fairness: could you do this “right” and still scale?
Exporting internal state would dramatically increase global instability and amount of routing state
Slide courtesy Timothy Griffin
Interdomain Loop Prevention
BGP at AS YYY will never accept a route with ASPATH containing YYY.
[Figure: AS 1 offers 12.22.0.0/16 with ASPATH = 1 333 7018 877 to AS 7018. Don’t Accept!]
Slide courtesy Timothy Griffin
Traffic Often Follows ASPATH
[Figure: chain AS 4, AS 3, AS 2, AS 1; AS 1 holds 135.207.0.0/16; AS 4 sees 135.207.0.0/16 with ASPATH = 3 2 1 and sends an IP packet with Dest = 135.207.44.66]
Slide courtesy Timothy Griffin
… But It Might Not
[Figure: chain AS 4, AS 3, AS 2, AS 1 as before; AS 1 announces 135.207.0.0/16 with ASPATH = 1; AS 5 announces 135.207.44.0/25 with ASPATH = 5; AS 4 sees 135.207.0.0/16 with ASPATH = 3 2 1 and sends an IP packet with Dest = 135.207.44.66]
AS 2 filters all subnets with masks longer than /24
From AS 4, it may look like this packet will take path 3 2 1, but it actually takes path 3 2 5
Slide courtesy Timothy Griffin
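The detour follows from longest-prefix matching in the forwarding tables. The sketch below uses the prefixes from the slide; the table contents and the `forward` helper are assumptions made for illustration.

```python
import ipaddress


def forward(table, destination):
    """Return the next AS for a destination using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matching = [p for p in table if addr in ipaddress.ip_network(p)]
    if not matching:
        return None
    longest = max(matching, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return table[longest]


# AS 3 only knows the /16, because AS 2 does not pass the /25 on.
as3_table = {"135.207.0.0/16": "AS 2"}
# AS 2 knows both the /16 from AS 1 and the more specific /25 from AS 5.
as2_table = {"135.207.0.0/16": "AS 1", "135.207.44.0/25": "AS 5"}
```

A packet for 135.207.44.66 is sent by AS 3 to AS 2, where the /25 wins over the /16 and the packet goes to AS 5 instead of AS 1.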
ORIGIN
I The ORIGIN attribute tells where the route (NLRI) originated
  I Interior to the originating AS (IGP): ORIGIN = 0
  I Via the EGP protocol (EGP, historic): ORIGIN = 1
  I Via some other means (INCOMPLETE): ORIGIN = 2
I A lower ORIGIN wins
MULTI_EXIT_DISC (Multi-Exit Discriminator or MED)
I The MED (or metric, formerly INTER_AS_METRIC) is meant to be advertised between neighboring AS’s (via EBGP)
  I Some implementations carry the MED onward via IBGP (hot potato versus cold potato)
I The MED is non-transitive (is not transferred into a third AS)
I A lower MED wins
I The default MED is 0 (lowest possible value)
  I Some implementations choose the highest possible value
BGP packet header
Field layout (bit offsets 0-15, 16-23, 24-31):
Marker (16 octets)
Length (16 bits) | Type (8 bits)
Remember that BGP “packets” are in fact part of a TCP-stream
BGP header fields
Marker    All 1’s (compatibility)
Length    Total length, including header, no padding
Type      1: OPEN
          2: UPDATE
          3: NOTIFICATION
          4: KEEPALIVE
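A sketch of building and parsing this header with Python's `struct` module. The helper names are made up; the field sizes (16-octet marker, 2-octet length, 1-octet type) follow RFC 4271.

```python
import struct

# Marker (16 octets), Length (unsigned 16 bits), Type (unsigned 8 bits), network byte order.
HEADER = struct.Struct("!16sHB")
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_message(msg_type, body=b""):
    """Prefix a message body with a BGP header; Length counts the header too."""
    return HEADER.pack(b"\xff" * 16, HEADER.size + len(body), msg_type) + body


def parse_header(data):
    """Return (total length, message type name) of the message at the start of data."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != b"\xff" * 16:
        raise ValueError("marker is not all ones")
    return length, MESSAGE_TYPES[msg_type]


keepalive = build_message(4)
```

A KEEPALIVE has no body, so `keepalive` is exactly the 19-octet header. Since BGP runs over a TCP stream, a receiver uses the Length field to find where each message ends.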
BGP OPEN message
Field layout (bit offsets 0-7, 8-15, 16-31):
Version (8 bits)
My Autonomous System (16 bits)
Hold Time (16 bits)
BGP Identifier (32 bits)
Opt Parm Len (8 bits)
Optional Parameters (variable)
OPEN message fields
Version                 4
My Autonomous System    Sender’s AS
Hold Time               Liveness detection
BGP Identifier          Sender’s identifying IP address
Opt Parm Length         Length of parameter field
Optional Parameters     TLV-encoded options
One interesting parameter is the Capabilities Optional Parameter, which defines (among others) the Route Refresh Capability.
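A sketch of packing and unpacking the fixed part of an OPEN message body (without the common BGP header). The helper names are assumptions, the optional parameters are treated as opaque bytes, and only 2-octet AS numbers are handled.

```python
import ipaddress
import struct

# Version (8 bits), My AS (16), Hold Time (16), BGP Identifier (32), Opt Parm Len (8).
OPEN_FIXED = struct.Struct("!BHH4sB")


def build_open(my_as, hold_time, bgp_identifier, optional_parameters=b""):
    """Build the body of an OPEN message for BGP version 4."""
    return OPEN_FIXED.pack(
        4,
        my_as,
        hold_time,
        ipaddress.IPv4Address(bgp_identifier).packed,
        len(optional_parameters),
    ) + optional_parameters


def parse_open(body):
    """Decode the fixed fields of an OPEN body into a dictionary."""
    version, my_as, hold_time, identifier, opt_len = OPEN_FIXED.unpack_from(body)
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(identifier)),
        "optional_parameters": body[OPEN_FIXED.size:OPEN_FIXED.size + opt_len],
    }


open_body = build_open(65000, 180, "192.0.2.1")
```

Without optional parameters the body is 10 octets, so the complete OPEN message including the header is 29 octets.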
BGP KEEPALIVE message
This page intentionally left blank.
http://www.this-page-intentionally-left-blank.org/
KEEPALIVE message fields
:)
BGP NOTIFICATION message
Field layout (bit offsets 0-7, 8-15, 16-31):
Error code (8 bits) | Error subcode (8 bits)
Data (variable)
NOTIFICATION message fields
Error code       1: Message Header Error
                 2: OPEN Error
                 3: UPDATE Error
                 4: Hold Timer Expired
                 . . .
Error subcode    Depends on error code
Data             Depends on error code and subcode
BGP UPDATE message
Field layout (bit offsets 0-15, 16-31):
Unfeasible Routes Length (16 bits)
Withdrawn Routes (variable length)
Total Path Attribute Length (16 bits)
Path Attributes (variable length)
Network Layer Reachability Information (variable length)
UPDATE message fields
Unfeasible Routes Length                  Length of Withdrawn Routes
Withdrawn Routes                          List of prefixes [1]
Total Path Attribute Length               Length of Path Attributes
Path Attributes                           TLV-encoded attributes
Network Layer Reachability Information    List of NLRI prefixes
[1] A prefix is specified by its length and just enough bytes of the network IP address to cover this length
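The prefix encoding from the footnote can be sketched as follows for IPv4. The function names are invented for this example.

```python
import ipaddress


def encode_prefix(prefix):
    """Encode an IPv4 prefix as one length octet plus the minimal number of address octets."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # just enough octets to cover the prefix length
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def decode_prefix(data):
    """Decode one encoded IPv4 prefix back to text, padding the address with zero octets."""
    length = data[0]
    nbytes = (length + 7) // 8
    padded = data[1:1 + nbytes] + bytes(4 - nbytes)
    return f"{ipaddress.IPv4Address(padded)}/{length}"
```

So 135.207.0.0/16 takes three octets (length 16 plus two address octets), while a /25 or /30 needs all four address octets.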
Tweaking your policies
Tweak Tweak Tweak
• For inbound traffic
  – Filter outbound routes
  – Tweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection
• For outbound traffic
  – Filter inbound routes
  – Tweak attributes on inbound routes to influence best route selection
[Figure labels: outbound routes; inbound routes; inbound traffic; outbound traffic]
In general, an AS has more control over outbound traffic
Slide courtesy Timothy Griffin
Outbound Traffic Engineering
I This works by manipulating incoming routes
  I Changing local preference
  I Extending inbound AS paths
  I Manipulating the metric (MED), for instance by using inbound communities
I It is relatively simple (and based on your own policy)
Manipulating local preference
So Many Choices
Which route should Frank pick to 13.13.0.0/16?
[Figure: Frank’s Internet Barn, AS 1, AS 2, AS 3 and AS 4, with prefix 13.13.0.0/16. Legend: peer-peer link; provider-customer link]
Slide courtesy Timothy Griffin
LOCAL PREFERENCE
[Figure: AS 1, AS 2, AS 3 and AS 4 with prefix 13.13.0.0/16; the candidate routes are marked local pref = 80, local pref = 100 and local pref = 90]
Higher Local preference values are more preferred
Local preference used ONLY in iBGP
Slide courtesy Timothy Griffin
Implementing Backup Links with Local Preference (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
[Figure: AS 65000 has a primary link and a backup link to AS 1]
Primary link: Set Local Pref = 100 for all routes from AS 1
Backup link: Set Local Pref = 50 for all routes from AS 1
We’ll talk about inbound traffic soon …
Slide courtesy Timothy Griffin
Multihomed Backups (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
[Figure: AS 2 has a primary link to provider AS 1 and a backup link to provider AS 3]
Primary link: Set Local Pref = 100 for all routes from AS 1
Backup link: Set Local Pref = 50 for all routes from AS 3
Slide courtesy Timothy Griffin
Inbound Traffic Engineering
I This works by manipulating outgoing routes
  I Extending outbound AS_PATHs is a traditional hack
  I Manipulating the metric (MED) is the traditional way
  I Setting outbound communities is the more modern approach, where agreements with your neighbors are specified
I Inbound is more complex than outbound
  I Inbound depends on neighbor’s policy
I Last resort method: announcing more specific routes (often a bad idea)
Manipulating AS_PATHs
Shedding Inbound Traffic with ASPATH Padding. Yes, this is a Glorious Hack …
Padding will (usually) force inbound traffic from AS 1 to take primary link
[Figure: customer AS 2 (192.0.2.0/24) has a primary link and a backup link to provider AS 1]
Primary link: 192.0.2.0/24, ASPATH = 2
Backup link: 192.0.2.0/24, ASPATH = 2 2 2
Slide courtesy Timothy Griffin
… But Padding Does Not Always Work
[Figure: customer AS 2 (192.0.2.0/24) has a primary link to provider AS 1 and a backup link to provider AS 3]
Primary link: 192.0.2.0/24, ASPATH = 2
Backup link: 192.0.2.0/24, ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
AS 3 will send traffic on “backup” link because it prefers customer routes and local preference is considered before ASPATH length!
Padding in this way is often used as a form of load balancing
Slide courtesy Timothy Griffin
COMMUNITY Attribute to the Rescue!
[Figure: customer AS 2 (192.0.2.0/24) has a primary link to provider AS 1 and a backup link to provider AS 3]
Primary link: 192.0.2.0/24, ASPATH = 2
Backup link: 192.0.2.0/24, ASPATH = 2, COMMUNITY = 3:70
Customer import policy at AS 3:
If 3:90 in COMMUNITY then set local preference to 90
If 3:80 in COMMUNITY then set local preference to 80
If 3:70 in COMMUNITY then set local preference to 70
AS 3: normal customer local pref is 100, peer local pref is 90
Slide courtesy Timothy Griffin
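The customer import policy at AS 3 can be sketched as a lookup from community to local preference. The names are hypothetical, and letting the first listed community win when several are present is an assumption of this sketch, not something the slide specifies.

```python
DEFAULT_CUSTOMER_LOCAL_PREF = 100
PEER_LOCAL_PREF = 90

# Community -> local preference, in the order in which the rules are tried.
COMMUNITY_TO_LOCAL_PREF = {"3:90": 90, "3:80": 80, "3:70": 70}


def customer_import_local_pref(communities):
    """Local preference that AS 3 assigns to a route received from a customer."""
    for community, local_pref in COMMUNITY_TO_LOCAL_PREF.items():
        if community in communities:
            return local_pref
    return DEFAULT_CUSTOMER_LOCAL_PREF
```

With `COMMUNITY = 3:70` the backup route gets local preference 70, which is below the peer local preference of 90, so AS 3 stops preferring the backup link as soon as it also learns the prefix from a peer.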
Manipulating MEDs
Hot Potato Routing: Go for the Closest Egress Point
[Figure: a router with IGP distance 15 to egress 1 and IGP distance 56 to egress 2, both leading to 192.44.78.0/24]
This Router has two BGP routes to 192.44.78.0/24.
Hot potato: get traffic off of your network as soon as possible. Go for egress 1!
Slide courtesy Timothy Griffin
Getting Burned by the Hot Potato
[Figure: a high bandwidth provider backbone and a low bandwidth customer backbone with a Heavy Content Web Farm; sites SFF, NYC and San Diego; labels 15, 56 and 172865; a tiny http request is answered by a huge http reply]
Many customers want their provider to carry the bits!
Slide courtesy Timothy Griffin
Cold Potato Routing with MEDs (Multi-Exit Discriminator Attribute)
[Figure: the network of the Heavy Content Web Farm announces 192.44.78.0/24 with MED = 15 on one link and with MED = 56 on the other; labels 15, 56 and 172865]
This means that MEDs must be considered BEFORE IGP distance!
Prefer lower MED values
Note 1: some providers will not listen to MEDs
Note 2: MEDs need not be tied to IGP distance
Slide courtesy Timothy Griffin
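The difference between hot and cold potato routing comes down to which value is compared first. In the sketch below the egress records and their numbers are assumed example data (the IGP distances 15 and 56 are reused from the slides, the assignment of MEDs to egress points is invented).

```python
def hot_potato(egresses):
    """Choose the egress that is closest in the own IGP."""
    return min(egresses, key=lambda e: e["igp_distance"])


def cold_potato(egresses):
    """Choose the egress with the lowest MED; IGP distance only breaks ties."""
    return min(egresses, key=lambda e: (e["med"], e["igp_distance"]))


# Assumed example: the nearby egress is the one the neighbor likes least.
egresses = [
    {"name": "egress 1", "igp_distance": 15, "med": 56},
    {"name": "egress 2", "igp_distance": 56, "med": 15},
]
```

With this data, hot potato picks egress 1 and cold potato picks egress 2, which keeps the traffic on the provider's backbone for longer.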
COMMUNITIES
I An optional transitive attribute
I A community can be used to communicate
preferred treatment of a route
I Some communities have well-known semantics
  I NO_EXPORT: don’t export beyond current AS (or confederation)
  I NO_ADVERTISE: don’t export at all
  I NO_EXPORT_SUBCONFED: don’t export via EBGP
Use of communities
How Can Routes be Colored? BGP Communities!
A community value is 32 bits
By convention, first 16 bits is ASN indicating who is giving it an interpretation; the other 16 bits are the community number
Very powerful BECAUSE it has no (predefined) meaning
Community Attribute = a list of community values. (So one route can belong to multiple communities)
RFC 1997 (August 1996)
Used for signalling within and between ASes
Two reserved communities
no_advertise = 0xFFFFFF02: don’t pass to BGP neighbors
no_export = 0xFFFFFF01: don’t export out of AS
Slide courtesy Timothy Griffin
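The ASN:number notation maps onto the 32-bit value as sketched below. The function names are made up for this illustration; the two constants are the reserved values listed above.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def encode_community(text):
    """Convert 'ASN:number' notation into the 32-bit community value."""
    asn, number = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= number <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | number


def decode_community(value):
    """Convert a 32-bit community value into 'ASN:number' notation."""
    return f"{value >> 16}:{value & 0xFFFF}"
```

For example, community 3:70 is stored as 0x00030046, and the reserved no_export value decodes to 65535:65281.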
Communities Example (AS 1)
Import
• 1:100 – Customer routes
• 1:200 – Peer routes
• 1:300 – Provider routes
Export
• To Customers – 1:100, 1:200, 1:300
• To Peers – 1:100
• To Providers – 1:100
Slide courtesy Timothy Griffin
Route Reflectors
I A route reflector is a kind of “super” IBGP peer
I A route reflector has clients with which it peers via IBGP
and for which it reflects (transitively) routes
I A route reflector is part of a full mesh of
other route reflectors and non-clients
Route reflectors illustration
[Figure: Full Mesh]
[Figure: Route Reflection]
Slide courtesy Iljitsch van Beijnum
Confederations
I Use multiple private AS’s inside your main AS
I Talk to the outside world with your main AS,
hiding the private AS’s
I Talk to the inside world as if using EBGP and IBGP
for the different private AS’s
I This needs special AS_PATH segment types