What’s so special about the number 512?
Geoff Huston Chief Scientist, APNIC Labs
1 September 2014
12 August 2014
Newborn Panda Triplets in China
Rosetta closes in on comet 67P/Churyumov-Gerasimenko
Violence continues in the Gaza Strip
World Elephant Day
The Internet apparently has a bad hair day
What happened?
Did we all sneeze at once and cause the routing system to fail?
12 August 2014 BGP update log for 12 August
12 August 2014
Verizon Route Leak (AS701)
Total IPv4 FIB size passes 512K
BGP FIB Size
12 August 2014
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/116132-problem-catalyst6500-00.html
http://www.brocade.com/downloads/documents/html_product_manuals/NI_05600_ADMIN/wwhelp/wwhimpl/common/html/wwhelp.htm#context=Admin_Guide&file=CAM_part.11.2.html
Cisco Cat 6500
Brocade NetIron XMR
512K is a default constant in some Cisco and Brocade products
Routing Behaviour
Was the AS701 Route Leak the problem? Or was the FIB growth passing 512K entries the problem? What does routing growth look like anyway?
1994: Introduction of CIDR
2001: The Great Internet Boom and Bust
2005: Broadband to the Masses
2009: The GFC hits the Internet
2011: Address Exhaustion
20 years of routing the Internet
IPv4 BGP Prefix Count 2011 - 2014
[Figure: IPv4 BGP prefix count, Jan 2011 to Jan 2015, on a scale from 350,000 to 550,000 entries]
IPv4 2013-2014 BGP Vital Statistics

                    Jan-13     Aug-14     Growth
Prefix Count        440,000    512,000    +11% p.a.
  Roots             216,000    249,000    + 9%
  More Specifics    224,000    264,000    +11%
Address Span        156 /8s    162 /8s    + 2%
AS Count             43,000     48,000    + 7%
  Transit             6,100      7,000    + 9%
  Stub               36,900     41,000    + 7%
IPv4 in 2014 – Growth is Slowing (slightly)
• Overall IPv4 Internet growth in terms of BGP is at a rate of some ~9%-10% p.a.
• Address span growing far more slowly than the table size (although the LACNIC runout in May caused a visible blip in the address rate)
• The rate of growth of the IPv4 Internet is slowing down (slightly)
  – Address shortages?
  – Masking by NAT deployments?
  – Saturation of critical market sectors?
IPv6 BGP Prefix Count
[Figure: IPv6 BGP prefix count, Jan 2011 to Jan 2014, on a scale from 0 to 20,000 entries, with World IPv6 Day marked]
IPv6 2013-2014 BGP Vital Statistics
                      Jan-13     Aug-14     p.a. rate
Prefix Count          11,500     19,036     +39%
  Roots                8,451     12,998     +32%
  More Specifics       3,049      6,038     +59%
Address Span (/32s)   65,127     73,153     + 7%
AS Count               6,560      8,684     +19%
  Transit              1,260      1,676     +20%
  Stub                 5,300      7,008     +19%
IPv6 in 2013
• Overall IPv6 Internet growth in terms of BGP is 20% - 40% p.a.
  – 2012 growth rate was ~90%.
(Looking at the AS count, if these relative growth rates persist then the IPv6 network would span the same network domain as IPv4 in ~16 years' time, i.e. 2030!)
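The ~16 year figure can be checked with a short compound-growth calculation. This is a minimal sketch that assumes the Aug-14 AS counts and the annual growth rates quoted in the two vital statistics tables stay constant; the function name `years_to_parity` is illustrative only.

```python
import math

# AS counts (Aug-14) and annual growth rates quoted in the vital statistics tables.
V4_AS, V4_GROWTH = 48_000, 0.07
V6_AS, V6_GROWTH = 8_684, 0.19

def years_to_parity(big, big_rate, small, small_rate):
    """Years until `small`, compounding at small_rate, catches up with `big`."""
    # Solve small * (1 + small_rate)**t == big * (1 + big_rate)**t for t.
    return math.log(big / small) / math.log((1 + small_rate) / (1 + big_rate))

years = years_to_parity(V4_AS, V4_GROWTH, V6_AS, V6_GROWTH)
print(f"parity in about {years:.1f} years, around {2014 + round(years)}")
```

The result is sensitive to the growth rates: a point or two either way moves the crossover by several years.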
IPv6 in 2013 – Growth is Slowing?
• Overall Internet growth in terms of BGP is at a rate of some ~20-40% p.a.
• AS growth sub-linear
• The rate of growth of the IPv6 Internet is also slowing down
  – Lack of critical momentum behind IPv6?
  – Saturation of critical market sectors by IPv4?
  – <some other factor>?
What to expect
BGP Size Projections
• Generate a projection of the IPv4 routing table using a quadratic (second-order polynomial) over the historic data
  – For IPv4 this is a time of extreme uncertainty
    • Registry IPv4 address run out
    • Uncertainty over the impacts of any after-market in IPv4 on the routing table
  which makes this projection even more speculative than normal!
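As a rough illustration of how such a projection is produced, the sketch below fits a straight line and an exponential (a straight line through log10 of the count) by ordinary least squares. It does not reproduce the fit behind these slides: it uses only three IPv4 prefix counts quoted in this deck, treats "Aug-14" as decimal year 2014.6 (an approximation), and all function names are hypothetical. A quadratic fit would need the full historic data set.

```python
import math

# (decimal year, IPv4 prefix count) samples quoted in this deck.
# Three points are far too few for a real projection; they only
# demonstrate the mechanics of the fit.
SAMPLES = [(2013.0, 441_172), (2014.0, 488_011), (2014.6, 512_000)]

def fit_line(points):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def linear_model(points):
    a, b = fit_line(points)
    return lambda year: a + b * year

def exponential_model(points):
    # Fit a straight line to log10(y), then undo the logarithm.
    a, b = fit_line([(x, math.log10(y)) for x, y in points])
    return lambda year: 10 ** (a + b * year)

linear = linear_model(SAMPLES)
exponential = exponential_model(SAMPLES)
for year in (2015, 2017, 2019):
    print(year, round(linear(year)), round(exponential(year)))
```

The two models stay close for a year or so and then diverge, which is why the choice of model matters more than the precision of the fit.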
IPv4 Table Size
[Figure: IPv4 table size, 2004 to 2014, on a scale from 0 to 600,000 entries, with fitted curves]
Linear: Y = -169638 + (t * 1517.584)
Quadratic: Y = -4973214 + (6484.294 * t) + (-1.837814 * 10^-12 * t^2)
Exponential: Y = 10**(3.540416 + (year * 1.547426e-09))
V4 - Daily Growth Rates
V4 - Relative Daily Growth Rates
IPv4 BGP Table Size predictions

            Linear Model       Exponential Model
Jan 2013    441,172 entries
    2014    488,011 entries
    2015    540,000 entries    559,000
    2016    590,000 entries    630,000
    2017    640,000 entries    710,000
    2018    690,000 entries    801,000
    2019    740,000 entries    902,000

These numbers are dubious due to uncertainties introduced by IPv4 address exhaustion pressures.
IPv6 Table Size
[Figure: IPv6 table size, 2004 to 2014, on a scale from 0 to 25,000 entries]
V6 - Daily Growth Rates
V6 - Relative Growth Rates
IPv6 BGP Table Size predictions

            Exponential Model    Linear Model
Jan 2013     11,600 entries
    2014     16,200 entries
    2015     24,600 entries      19,000
    2016     36,400 entries      23,000
    2017     54,000 entries      27,000
    2018     80,000 entries      30,000
    2019    119,000 entries      35,000
Up and to the Right
• Most Internet curves are “up and to the right”
• But what makes this curve painful?
  – The pain threshold is approximated by Moore’s Law
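One way to make the "pain threshold" concrete is to compare doubling times. The sketch below assumes Moore's Law is read as a doubling every two years (an assumption, not a figure from these slides) and uses the annual prefix-count growth rates from the vital statistics tables.

```python
import math

def doubling_time_years(annual_growth):
    """Years for a quantity to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

# Assumption: Moore's Law taken as a doubling every two years.
MOORE_DOUBLING_YEARS = 2.0

# Annual prefix-count growth rates quoted in the vital statistics tables.
for label, rate in (("IPv4 table", 0.11), ("IPv6 table", 0.39)):
    t = doubling_time_years(rate)
    relation = "slower than" if t > MOORE_DOUBLING_YEARS else "at least as fast as"
    print(f"{label}: doubles every {t:.1f} years, {relation} Moore's Law")
```

On these numbers the IPv4 table doubles far more slowly than the assumed Moore's Law rate, while the IPv6 table is growing at close to that rate, though from a much smaller base.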
IPv4 BGP Table size and Moore’s Law
[Figure: BGP table size predictions plotted against Moore’s Law]
IPv6 Projections and Moore’s Law
[Figure: BGP table size predictions plotted against Moore’s Law]
BGP Table Growth
• Nothing in these figures suggests that there is cause for urgent alarm, at present
• The overall eBGP growth rates for IPv4 are holding at a modest level, and the IPv6 table, although it is growing rapidly, is still relatively small in size in absolute terms
• As long as we are prepared to live within the technical constraints of the current routing paradigm it will continue to be viable for some time yet
Table Size vs Updates
BGP Updates
• What about the level of updates in BGP?
• Let’s look at the update load from a single eBGP feed in a DFZ context
Announcements and Withdrawals
Convergence Performance
IPv4 Average AS Path Length
Data from Route Views
Updates in IPv4 BGP
Nothing in these figures is cause for any great level of concern …
– The number of updates per instability event has been constant, due to the damping effect of the MRAI interval, and the relatively constant AS Path length over this interval
What about IPv6?
V6 Announcements and Withdrawals
V6 Convergence Performance
Data from Route Views
V6 Average AS Path Length
BGP Convergence
• The long term average convergence time for the IPv4 BGP network is some 70 seconds, or 2.3 updates given a 30 second MRAI timer
• The long term average convergence time for the IPv6 BGP network is some 80 seconds, or 2.6 updates
Problem? Not a Problem?
It’s evident that the global BGP routing environment suffers from a certain amount of neglect and inattention.
But whether this is a problem or not depends on the way in which routers handle the routing table.
So let’s take a quick look at routers…
Inside a router
Line Interface Card
Switch Fabric Card
Management Card
Thanks to Greg Hankins
Inside a line card
[Figure: line card block diagram with the labels Media, PHY, Network, Packet Manager, Packet Buffer, DRAM, FIB Lookup Bank, TCAM, *DRAM, CPU and Backplane]
Thanks to Greg Hankins
FIB Lookup Memory
The interface card’s network processor passes the packet’s destination address to the FIB module. The FIB module returns with an outbound interface index.
FIB Lookup
This can be achieved by:
– Loading the entire routing table into a Ternary Content Addressable Memory bank (TCAM)
or
– Using an ASIC implementation of a TRIE representation of the routing table with DRAM memory to hold the routing table
Either way, this needs fast memory
TCAM Memory

Address: 192.0.2.1    11000000 00000000 00000010 00000001

TCAM entries:
192.0.0.0/16    11000000 00000000 xxxxxxxx xxxxxxxx    3/0
192.0.2.0/24    11000000 00000000 00000010 xxxxxxxx    3/1

Longest Match: 192.0.2.0/24
Outbound Interface identifier: I/F 3/1
The entire FIB is loaded into TCAM. Every destination address is passed through the TCAM, and within one TCAM cycle the TCAM returns the interface index of the longest match. Each TCAM bank needs to be large enough to hold the entire FIB. The TCAM cycle time needs to be fast enough to support the max packet rate of the line card.
TCAM width depends on the chip set in use. One popular TCAM config is 72 bits wide. IPv4 addresses consume a single 72 bit slot, IPv6 consumes two 72 bit slots. If instead you use TCAM with a slot width of 32 bits then IPv6 entries consume 4 times the equivalent slot count of IPv4 entries.
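The lookup and slot arithmetic can be modelled in software. The following is a minimal sketch, not a description of any vendor's hardware: `TcamModel` and `slots_needed` are hypothetical names, the model handles IPv4 only, and the sorted list stands in for the priority order that real TCAM resolves in a single parallel compare.

```python
import ipaddress

class TcamModel:
    """Software stand-in for a TCAM bank: entries are (value, mask, interface)."""

    def __init__(self):
        self.entries = []

    def add(self, prefix, interface):
        net = ipaddress.ip_network(prefix)
        self.entries.append((int(net.network_address), int(net.netmask), interface))
        # Longest mask first, so the first hit is the longest match.
        # Hardware compares every entry at once; the sort only mimics
        # the priority order among simultaneous hits.
        self.entries.sort(key=lambda entry: entry[1], reverse=True)

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        for value, mask, interface in self.entries:
            if addr & mask == value:   # masked-out bits are the "x" (don't care) bits
                return interface
        return None

def slots_needed(v4_entries, v6_entries, slot_bits=72):
    """Slot count for the two slot widths described above."""
    v6_slots_per_entry = {72: 2, 32: 4}[slot_bits]
    return v4_entries + v6_entries * v6_slots_per_entry

tcam = TcamModel()
tcam.add("192.0.0.0/16", "3/0")
tcam.add("192.0.2.0/24", "3/1")
print(tcam.lookup("192.0.2.1"))   # matches both entries; the /24 wins
print(tcam.lookup("192.0.9.9"))   # matches only the /16
print(slots_needed(512_000, 25_000))
```

With 72-bit slots, a table of 512K IPv4 and 25K IPv6 entries needs 562,000 slots, which is already past a 512K boundary even though the IPv6 table is small.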
TRIE Lookup

Address: 192.0.2.1    11000000 00000000 00000010 00000001
Outbound Interface identifier: I/F 3/1

[Figure: the address bits are tested one at a time (1/0 branches) down a decision tree, held in DRAM and walked by an ASIC, until an entry such as x/0000 yields the interface]

The entire FIB is converted into a serial decision tree. The size of the decision tree depends on the distribution of prefix values in the FIB. The performance of the TRIE depends on the algorithm used in the ASIC and the number of serial decisions used to reach a decision.
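A one-bit-per-level binary trie is the simplest form of such a decision tree. The sketch below is illustrative only: production ASICs use compressed, multi-bit variants, and the class names here are hypothetical. It returns the number of child-pointer reads alongside the result, to show that lookup cost varies with the prefixes stored.

```python
import ipaddress

class TrieNode:
    __slots__ = ("children", "interface")

    def __init__(self):
        self.children = [None, None]   # next node for bit 0 and bit 1
        self.interface = None          # set if a prefix ends at this node

class BinaryTrie:
    """Unibit trie over IPv4 prefixes (one address bit tested per level)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, interface):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for depth in range(net.prefixlen):
            bit = (bits >> (31 - depth)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.interface = interface

    def lookup(self, address):
        """Return (interface of longest match, child-pointer reads performed)."""
        bits = int(ipaddress.ip_address(address))
        node, best, steps = self.root, None, 0
        while node is not None:
            if node.interface is not None:
                best = node.interface      # remember the longest match so far
            if steps == 32:
                break
            node = node.children[(bits >> (31 - steps)) & 1]
            steps += 1
        return best, steps

trie = BinaryTrie()
trie.insert("192.0.0.0/16", "3/0")
trie.insert("192.0.2.0/24", "3/1")
print(trie.lookup("192.0.2.1"))
print(trie.lookup("192.0.9.9"))
```

Each step is a dependent memory read, which is why the number of serial decisions, and therefore the memory latency, bounds the lookup rate.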
Memory Tradeoffs

                 TCAM       ASIC + RLDRAM 3
Access Speed     Lower      Higher
$ per bit        Higher     Lower
Power            Higher     Lower
Density          Higher     Lower
Physical Size    Larger     Smaller
Capacity         80Mbit     1Gbit

Thanks to Greg Hankins
Memory Tradeoffs
TCAMs are higher cost, but operate with a fixed search latency and a fixed add/delete time. TCAMs scale linearly with the size of the FIB.
ASICs implement a TRIE in memory. The cost is lower, but the search and add/delete times are variable. The performance of the lookup depends on the chosen algorithm. The memory efficiency of the TRIE depends on the prefix distribution and the particular algorithm used to manage the data structure.
Size
What memory size do we need for 10 years of FIB growth from today?

            2014    2019    2024
V4 FIB      512K    768K    1M
V6 FIB      25K     125K    512K

TCAM: V4: 2M entries (1Gt) plus V6: 1M entries (2Gt)
Trie: V4: 100Mbit memory (500Mt) plus V6: 200Mbit memory (1Gt)

“The Impact of Address Allocation and Routing on the Structure and Implementation of Routing Tables”, Narayan, Govindan & Varghese, SIGCOMM ‘03
Scaling the FIB
BGP table growth is slow enough that we can continue to use simple FIB lookup in linecards without straining the state of the art in memory capacity.
However, if it all turns horrible, there are alternatives to using a complete FIB in memory, which are at the moment variously robust and variously viable:
  – FIB compression
  – MPLS
  – Locator/ID Separation (LISP)
  – OpenFlow/Software Defined Networking (SDN)
Speed
[Figure: interface speed by year, 1980 to 2020, on a scale from 10Mb to 1Tb]

Speed          Year     Packet rate
10Mb           1982     15Kpps
100Mb          1995     150Kpps
1Gb            1999     1.5Mpps
10Gb           2002     15Mpps
40Gb/100Gb     2010     150Mpps
400Gb/1Tb      2017?    1.5Gpps
Speed, Speed, Speed
What memory speeds are necessary to sustain a maximal packet rate?

100GE    150Mpps    6.7ns per packet
400GE    600Mpps    1.6ns per packet
1TE      1.5Gpps    0.67ns per packet

[Figure: the per-packet time budgets for 100GE, 400GE and 1TE marked on a 0ns to 50ns scale]
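These per-packet budgets follow from the worst-case packet rate of each link. The sketch below assumes the worst case is back-to-back minimum-size Ethernet frames (64 bytes, plus 8 bytes of preamble and a 12 byte inter-frame gap); that framing assumption is not stated on the slides, and the function names are illustrative.

```python
# Assumption (not from the slides): worst case is back-to-back minimum-size
# Ethernet frames, 64 bytes plus 8 bytes preamble and 12 bytes inter-frame gap.
MIN_FRAME_BITS = (64 + 8 + 12) * 8   # 672 bits on the wire per packet

def max_packet_rate(link_bps):
    """Worst-case packets per second for a link of the given bit rate."""
    return link_bps / MIN_FRAME_BITS

def time_budget_ns(link_bps):
    """Time available to complete one FIB lookup per packet, in nanoseconds."""
    return 1e9 / max_packet_rate(link_bps)

for name, bps in (("100GE", 100e9), ("400GE", 400e9), ("1TE", 1e12)):
    mpps = max_packet_rate(bps) / 1e6
    print(f"{name}: {mpps:.0f} Mpps, {time_budget_ns(bps):.2f} ns per packet")
```

The results land close to the rounded figures quoted above. Any lookup scheme that needs several dependent memory reads per packet has to fit all of them inside this budget, or spread lookups across parallel memory banks.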
Speed, Speed, Speed
What memory speeds do we HAVE?
[Figure: access times for commodity DRAM, DDR3 DRAM and RLDRAM on a 0ns to 50ns scale, set against the per-packet budgets for 100GE, 400GE and 1TE]
Thanks to Greg Hankins
Scaling Speed
Scaling speed is going to be tougher over time.
Moore’s Law talks about the number of gates per circuit, but not circuit clocking speeds.
Speed and capacity could be the major design challenge for network equipment in the coming years.
If we want to exploit parallelism as an alternative to wireline speed for terabit networks, then is the use of best path routing protocols, coupled with destination-based hop-based forwarding, going to scale? Or are we going to need to look at path-pinned routing architectures to provide stable flow-level parallelism within the network?
http://www.startupinnovation.org/research/moores-law/
Thank You
Questions?