Internet Routing (COS 598A)
Today: BGP Routing Table Size
Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm
Outline
• IP prefixes
  – Review of CIDR and hierarchical allocation
  – Resource constraints on IP routers
  – Impact of increasing number of prefixes
• Growth in BGP routing table size
  – Growth of global prefixes over time
  – Characterizing the causes of growth
• Limiting the number of prefixes
  – Techniques for limiting the size
  – Fundamental challenges of limiting size
04/21/23
Classless InterDomain Routing (CIDR)
IP Address: 12.4.0.0    IP Mask: 255.254.0.0
Address: 00001100 00000100 00000000 00000000
Mask:    11111111 11111110 00000000 00000000
         (first 15 bits: network prefix; remaining 17 bits: for hosts)
Use two 32-bit numbers to represent a network. Network number = IP address + mask (the address paired with its mask, not an arithmetic sum)
Usually written as 12.4.0.0/15
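The address/mask pair above can be checked with Python's standard ipaddress module. This is only a small sketch; the helper name to_bits is made up for illustration.

```python
import ipaddress

# The example network from the slide, in prefix notation.
network = ipaddress.ip_network("12.4.0.0/15")

def to_bits(value):
    """Render a 32-bit address or mask as four space-separated octets of bits."""
    raw = f"{int(value):032b}"
    return " ".join(raw[i:i + 8] for i in range(0, 32, 8))

address_bits = to_bits(network.network_address)
mask_bits = to_bits(network.netmask)
host_bits = 32 - network.prefixlen  # bits left over for hosts

print("Address:", address_bits)
print("Mask:   ", mask_bits)
print("Mask (dotted):", network.netmask)
print("Host bits:", host_bits, "->", network.num_addresses, "addresses")
```

A /15 leaves 17 host bits, so the block spans 12.4.0.0 through 12.5.255.255.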
Hierarchy in Allocating Address Blocks
12.0.0.0/8
  12.0.0.0/16
  12.1.0.0/16
  12.2.0.0/16
  12.3.0.0/16
    12.3.0.0/24
    12.3.1.0/24
    :
    12.3.254.0/24
  :
  12.253.0.0/16
    12.253.0.0/19
    12.253.32.0/19
    12.253.64.0/19
    12.253.96.0/19
    12.253.128.0/19
    12.253.160.0/19
    12.253.192.0/19
    :
  12.254.0.0/16
• Prefixes are key to Internet scalability
  – Address allocation by ARIN/RIPE/APNIC and by ISPs
  – Routing protocols and packet forwarding based on prefixes
  – Today, routing tables contain ~150,000-200,000 prefixes
Resource Constraints on a High-End Router
Figure: a processor and six line cards connected by a switching fabric
Processor: store routing table and process routing protocol messages
Line cards: store forwarding table and forward data packets
Routing Information Base (RIB)
• Routing table for the routing protocol
  – E.g., BGP routes learned from each neighbor
  – Typically managed in software in router CPU
• Factors affecting RIB size
  – Number of destination prefixes
  – Number of BGP routes per prefix
  – Size of each route (e.g., BGP attributes)
• Impact of a large RIB
  – Higher delay to index or scan the table
  – Ungraceful reaction to table overflow
Ungraceful Overload Behavior in BGP
• BGP is an incremental protocol
  – Announcement when new route available
  – Withdrawal when route no longer available
  – No messages when nothing is changing
• Cannot discard or delete state
  – … because you won’t receive the message again
  – When table is full, router must drop session(s)
• Router reaction in practice may be worse
  – E.g., drop all BGP sessions and reestablish
  – E.g., interface lock-up till router is rebooted
  – Reactions place heavy BGP load on neighbors
Forwarding Information Base
• Forwarding tables in IP routers
  – Maps each IP prefix to next-hop link(s)
  – Longest prefix match look-up for data packets
  – Hardware on line card in high-end routers
• Impact of a large FIB
  – Higher delay to construct/update the table
  – Higher delay for packet lookup
  – Incomplete table or router crash on overflow
Figure: forwarding table (FIB) lookup
  forwarding table (FIB): 4.0.0.0/8, 4.83.128.0/17, 12.0.0.0/8, 12.34.158.0/24, 126.255.103.0/24
  destination: 12.34.158.5
  outgoing link: Serial0/0.1
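The longest prefix match in the figure can be sketched in a few lines of Python. This is a linear scan for illustration only, not how line-card hardware does the lookup. The function name longest_prefix_match is an assumption, and because the slide gives a next hop only for the one destination, the sketch returns the matching prefix rather than a link.

```python
import ipaddress

# The five prefixes shown in the slide's forwarding table.
FIB_PREFIXES = [
    ipaddress.ip_network(p)
    for p in (
        "4.0.0.0/8",
        "4.83.128.0/17",
        "12.0.0.0/8",
        "12.34.158.0/24",
        "126.255.103.0/24",
    )
]

def longest_prefix_match(destination, prefixes=FIB_PREFIXES):
    """Return the most specific prefix containing destination, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [p for p in prefixes if addr in p]
    # Among all covering prefixes, the one with the longest mask wins.
    return max(matches, key=lambda p: p.prefixlen, default=None)

print(longest_prefix_match("12.34.158.5"))
```

The destination 12.34.158.5 falls inside both 12.0.0.0/8 and 12.34.158.0/24, and the /24 is chosen because it is more specific.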
Impact of Table Size: Message Overhead
• More BGP update messages
  – More prefixes means more update messages
  – … and more bandwidth and CPU consumption
  – … and longer delays for bringing up a session
• More BGP route flapping
  – More likely to have one or more flapping prefixes
  – … which consumes even more resources
  – … and makes the routing system less stable
Growth in BGP Routing Table Size
http://www.cisco.com/en/US/about/ac123/ac147/ac174/ac176/about_cisco_ipj_archive_article09186a00800c83cc.html
http://www.cs.princeton.edu/~jrex/teaching/spring2005/reading/bu02.pdf
Pre-CIDR (1988-1994): Steep Growth Rate
Growth faster than improvements in equipment capability
CIDR Deployment (1994-1996): Much Flatter
Efforts to aggregate (even decreases after IETF meetings!)
CIDR Growth (1996-1998): Roughly Linear
Good use of aggregation, and peer pressure in CIDR report
Boom Period (1998-2001): Steep Growth
Internet boom and increased multi-homing
Long-Term View (1989-2005): Post-Boom
Cause of Growth #1: Multi-Homing
• Connecting to multiple providers
  – All providers must advertise the prefix
  – Hole-punching: subnet contained in a supernet
• Detecting hole-punching
  – Stub AS connects to two or more ASes
  – Prefix is contained in one provider’s supernet
Figure: Stub (12.1.1.0/24) connected to ISP #1 and ISP #2
  ISP #1 advertises: 12.0.0.0/8, 12.1.1.0/24
  ISP #2 advertises: 3.0.0.0/8, 12.1.1.0/24
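The two detection conditions can be written as a small check. This is a sketch under assumptions: the function name is_hole_punch and the dictionary of provider supernets are made up for illustration, and real measurement studies infer these inputs from BGP tables rather than being handed them.

```python
import ipaddress

def is_hole_punch(stub_prefix, provider_supernets):
    """Check the slide's two conditions for hole-punching.

    provider_supernets maps a provider name to the address blocks it owns
    (a hypothetical input format).
    """
    prefix = ipaddress.ip_network(stub_prefix)
    # Condition 1: the stub connects to two or more providers.
    if len(provider_supernets) < 2:
        return False
    # Condition 2: the prefix is a proper subnet of some provider's block.
    for blocks in provider_supernets.values():
        for block in blocks:
            supernet = ipaddress.ip_network(block)
            if prefix != supernet and prefix.subnet_of(supernet):
                return True
    return False

providers = {"ISP #1": ["12.0.0.0/8"], "ISP #2": ["3.0.0.0/8"]}
print(is_hole_punch("12.1.1.0/24", providers))
```

With the figure's addresses, 12.1.1.0/24 sits inside the 12.0.0.0/8 block, so it punches a hole in that supernet.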
Cause of Growth #2: Failure to Aggregate
• Prefixes could be coalesced
  – Advertised exactly the same way
  – Adjacent prefixes or subnet/supernet relationship
• Detecting failure to aggregate
  – Prefixes with same attributes in set of BGP tables
  – Could be reduced to fewer prefixes by combining
Figure: three stubs (12.1.1.0/24, 12.1.2.0/24, 12.1.3.0/24) connected to ISP #1 and ISP #2
  Advertised prefixes: 12.0.0.0/8, 12.1.1.0/24, 12.1.2.0/24, 12.1.3.0/24
  12.1.2.0/24 and 12.1.3.0/24 could be combined into 12.1.2.0/23
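One way to look for aggregation candidates is to group prefixes by their attributes and collapse each group. This is a rough sketch, not the measurement method behind the slides. The attribute strings are hypothetical stand-ins for real BGP attributes, and the result only says which prefixes could be combined, not whether doing so is safe.

```python
import ipaddress
from collections import defaultdict

def aggregate_by_attributes(routes):
    """Group (prefix, attributes) pairs by attributes and collapse each group."""
    groups = defaultdict(list)
    for prefix, attributes in routes:
        groups[attributes].append(ipaddress.ip_network(prefix))
    # collapse_addresses merges adjacent and contained networks.
    return {
        attributes: list(ipaddress.collapse_addresses(networks))
        for attributes, networks in groups.items()
    }

# Hypothetical attribute labels; only prefixes advertised the same way merge.
routes = [
    ("12.1.1.0/24", "path-A"),
    ("12.1.2.0/24", "path-B"),
    ("12.1.3.0/24", "path-B"),
]
aggregated = aggregate_by_attributes(routes)
for attributes, networks in aggregated.items():
    print(attributes, [str(n) for n in networks])
```

12.1.2.0/24 and 12.1.3.0/24 share attributes and are adjacent on a /23 boundary, so they collapse into 12.1.2.0/23. 12.1.1.0/24 has different attributes and stays separate.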
Cause of Growth #3: Load Balancing
• Larger block sub-divided for more control
  – Advertise multiple subnets of a larger prefix
  – Treat differently to influence incoming traffic
• Detecting load balancing
  – Prefixes originated by the same AS
  – Could be collapsed (e.g., contiguous or contained)
  – … but, have different attributes, such as AS path
Figure: Stub connected to ISP #1 and ISP #2
  Stub advertises to ISP #1: 12.1.2.0/23, 12.1.2.0/24
  Stub advertises to ISP #2: 12.1.2.0/23, 12.1.3.0/24
Cause of Growth #4: Address Fragmentation
• Different parts of the address space
  – Distinct address blocks allocated to same AS
  – Must be advertised separately in BGP
• Detecting address fragmentation
  – Prefixes announced the same way by same AS
  – Cannot be collapsed into fewer prefixes
Figure: Stub connected to ISP #1, advertising 18.8.0.0/16 and 12.1.1.0/24
Significance of the Four Causes
• Overall contribution
  – Address fragmentation is the most significant
  – The other three causes are all important as well
• Growth over time
  – Increasing multi-homing
  – Increasing load balancing
• Architectural implications
  – Exploit commonality across non-contiguous address blocks?
  – Multi-homing without hole-punching?
  – Load balancing without de-aggregating?
Transient Growth in Table Size: Routing Leaks
Transient spike due to neighbor’s BGP mistake
Techniques for Limiting Table Size
Hierarchical Address Allocation
• Regional Internet Registries
  – Allocate large address blocks to ISPs
  – Publish guidelines for minimum block sizes
    • ARIN: in 63.0.0.0/8, no mask lengths more than /19
    • APNIC: in 211.0.0.0/8, no mask lengths more than /23
• Internet Service Providers
  – Allocate smaller blocks to customers
    • Reclaim address blocks when customers leave
  – Hierarchical address allocation inside the ISP
    • Advertise subnets only when necessary
    • Customer-owned addresses and multi-homing
Hierarchical Allocation: Only One Router Knows
• Three-level hierarchy
  – ISP as a whole: 12.0.0.0/8
  – Edge router in ISP: 12.1.0.0/16
  – Customer at edge router: 12.1.2.0/24, 12.1.5.0/24
Figure: two stubs (12.1.2.0/24 and 12.1.5.0/24) attached to the edge router for 12.1.0.0/16 inside the ISP (12.0.0.0/8)
  Only this router needs to know the small /24 blocks
Hierarchical Allocation: Only the ISP Knows
• Customer connecting in multiple places
  – All routers in the ISP need to know the subnet
  – Otherwise they can’t reach all egress points
  – But the rest of the Internet doesn’t need to know
Figure labels: Stub 12.1.5.0/24; ISP 12.0.0.0/8; 12.1.0.0/16
Hierarchical Allocation: Must Advertise
• Sometimes have to advertise the subnet
  – Customer doesn’t fall in ISP’s address block
  – Customer connects to multiple providers
Figure labels: Stubs 78.34.0.0/16 and 12.1.5.0/24; ISP 12.0.0.0/8; 12.1.0.0/16; Another ISP
Filtering Small Subnets on BGP Sessions
• Small address blocks
  – Larger mask than RIR guidelines
    • E.g., filter /20 and longer in 63.0.0.0/8
  – Or, all prefixes with mask longer than /24
• Trade-off on aggressive filtering
  – Don’t filter aggressively
    • Risk of exceeding memory limits on the router
  – Filter aggressively
    • Risk of disconnecting some parts of the Internet
    • Risk of thwarting stub ASes trying to load-balance
• Who should pay to store the small subnets???
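A mask-length filter of this kind can be sketched as a simple accept/reject function. The limits below reuse the ARIN and APNIC figures quoted on the slides and the /24 default mentioned above. The table layout and the names RIR_LIMITS, DEFAULT_MAX_LEN and accept are assumptions for illustration, not any router's configuration syntax.

```python
import ipaddress

# (covering block, longest mask length accepted inside it), from the slides.
RIR_LIMITS = [
    (ipaddress.ip_network("63.0.0.0/8"), 19),   # ARIN guideline
    (ipaddress.ip_network("211.0.0.0/8"), 23),  # APNIC guideline
]
DEFAULT_MAX_LEN = 24  # elsewhere, reject anything longer than /24

def accept(prefix):
    """Return True if the prefix passes the mask-length filter."""
    network = ipaddress.ip_network(prefix)
    for block, max_len in RIR_LIMITS:
        if network.subnet_of(block):
            return network.prefixlen <= max_len
    return network.prefixlen <= DEFAULT_MAX_LEN

for candidate in ("63.1.0.0/19", "63.1.0.0/20", "12.1.1.0/24", "12.1.1.0/25"):
    print(candidate, "accept" if accept(candidate) else "filter")
```

Tightening the limits saves memory but drops reachability for anyone who can only be reached through a filtered subnet, which is the trade-off listed above.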
Prefix Limits to Protect Against Route Leaks
• Vulnerability to other ASes
  – Sending many small subnets
  – Exporting address space they shouldn’t
• Filtering policies may not be enough
  – E.g., all /24s is still 2^24 prefixes, which is still a lot
• Max-prefix limit on BGP session
  – Per-session configurable limit on # of prefixes
  – Tear down the session if number exceeded
  – Not great, but better than exceeding the memory
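The max-prefix behavior can be modeled with a toy session object. This is a minimal sketch, not a real BGP implementation: the class name BgpSession, its methods, and the neighbor label are all hypothetical.

```python
class BgpSession:
    """Toy model of a BGP session with a per-session max-prefix limit."""

    def __init__(self, neighbor, max_prefixes):
        self.neighbor = neighbor
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.established = True

    def announce(self, prefix):
        """Record an announced prefix; tear down if the limit is exceeded."""
        if not self.established:
            raise RuntimeError("session is down")
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            self.teardown()

    def withdraw(self, prefix):
        """Remove a withdrawn prefix, if present."""
        self.prefixes.discard(prefix)

    def teardown(self):
        """Drop the session and forget every route learned on it."""
        self.prefixes.clear()
        self.established = False

session = BgpSession("neighbor-a", max_prefixes=2)
for prefix in ("12.1.1.0/24", "12.1.2.0/24", "12.1.3.0/24"):
    session.announce(prefix)
print(session.established, len(session.prefixes))

# Number of possible /24 prefixes, as in the filtering example.
print(2 ** 24)
```

The third announcement pushes the session over its limit, so the session is torn down and its routes are discarded rather than letting the table grow without bound.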
Fundamental Problems: Not Easily Automated
• Dependence on “side information”
  – Customer prefix falls in provider’s address space?
  – Customer connects to ISP in multiple places?
  – Customer connects to multiple providers?
• Auto-combining is hard in distributed system
  – Safe to combine 12.1.2.0/24 and 12.1.3.0/24???
  – Depends on whether other ASes need the details
Figure labels: 12.1.2.0/24; 12.1.3.0/24; “seems safe”; “not safe”
Optimization: Reducing Forwarding Table Size
• Local FIB minimization
  – Router locally minimizes size of forwarding table
  – E.g., purple router has FIB entry for 12.1.2.0/23
  – … while still keeping both subnets in BGP table
  – But, the size of the RIB may still be an issue
Figure labels: 12.1.2.0/24; 12.1.3.0/24
Architectural Idea: Reducing BGP Table Size
• Separating BGP propagation from the routers
  – Exchange BGP updates via separate servers
  – Servers tell routers only the BGP routes they need
  – … yet still propagate full details to neighbors
  – We’ll return to this idea in the coming weeks
Figure labels: 12.1.2.0/24; 12.1.3.0/24; 12.1.2.0/23, 12.1.2.0/24, 12.1.3.0/24; BGP; BGP
Conclusions
• Scalability limitations
  – Resource constraints on routers
  – … impose limits on number of prefixes
• Growth in the number of prefixes
  – Historical trends toward increasing table size
  – Multi-homing, failure to aggregate, load balancing, and address fragmentation
• Approaches to limiting growth
  – Hierarchical address allocation
  – Careful scoping of BGP route advertisements
  – Explicit minimization of FIB and RIB sizes
Next Time: Large Topologies
• Two papers
  – “Hierarchical routing for large networks: Performance evaluation and optimization”
  – “BGP route reflection: An alternative to full mesh IBGP”
• Review only of first paper
  – Summary
  – Why accept
  – Why reject
  – Avenues for future work
• Optional reading
  – Fun 1928 article “On Being the Right Size”