PLNOG14: Waltzing on that gentle trade‐off between internet routes and FIB space, an SDN story -...

29
Waltzing on that gentle tradeoff between internet routes and FIB space, an SDN story

Transcript of PLNOG14: Waltzing on that gentle trade‐off between internet routes and FIB space, an SDN story -...

Waltzing on that gentle trade‐off between internet routes and FIB space, an SDN story

Network Engineer @Spotify

● +10 years in the network industry● Python enthusiast● Automation junkie

David Barroso

Principal Software Developer @pmacct

● +10 years measuring and correlating traffic flows● Service Providers are his DNA

Paolo Lucente

Spotify is a commercial music streaming service providing digital rights management-restricted content from record labels [...] Paid "Premium" subscriptions remove advertisements and allow users to download music to listen to offline.

Source: http://en.wikipedia.org/wiki/Spotify

What is Spotify?

● Over 60 million monthly active users● 15 million paid subscribers● More than 30 million songs● Around 20.000 songs added each day● Available in 58 markets

Source: https://press.spotify.com/information/

Some facts about Spotify

● 4 major datacenters● Stockholm● London● San Jose● Ashburn

● Users are directed to the closest datacenter by default.

● In case of fault or maintenance users can be redirected to another datacenter.

Global Architecture of Spotify

RIB vs. FIB

● RIB (Routing Information Base)● A representation in memory of all available paths and their attributes. This

information is fed by routing protocols.

● FIB (Forwarding Information Base)● A copy of the RIB (usually in hardware) where some attributes are resolved

(like next-hop or outgoing interface)

RIB vs. FIB (cont’d)

● RIB (Routing Information Base)● Virtually unlimited (limited only by the memory of the device)

● FIB (Forwarding Information Base)● Limited by the underlying hardware.

● Between 64k-128k LPM prefixes in modern switches with commodity ASICs.● Between 500k-1000k LPM prefixes in expensive routers/switches with customized ASICs.

RIB vs. FIB (cont’d)

● Typically there is a one to one mapping between RIB and FIB entries.● The Internet is comprised of +512k prefixes.● This means we need to support +512k FIB entries if we want to reach

the entire Internet.

When you travel...

● Do you carry an Atlas?● Or do you carry a local map?

So…

● Why do I need all the prefixes?● What if I only install the prefixes I really need?

Do we really need to reach the entire Internet?

● Spotify streams music to users.● We typically serve users in the closest datacenter.● Why would my datacenter in San Jose need to know how to reach

users in Poland?

Example: Stockholm Datacenter

● Total unique IPv4 prefixes (including transit): 519.000● Prefixes coming from peers: 150.000● Average prefixes active per day: approx. 16.000● Average prefixes from peers active per day: approx. 2.000

Example: Stockholm Datacenter

● Green line shows the total amount of prefixes that have been used at least once

● Blue line the amount of new prefixes required at a given time

● Total prefixes: 519.000● Prefixes added within the first hour:

6412● New prefixes added within the

second hour: 1818● New prefixes added within the

second hour: 1242● New prefixes added within the

tenth hour: 327

Example: Stockholm Datacenter

● Red line shows total bandwidth● Blue line shows traffic with a

matching prefix in the FIB● Purple line shows % of traffic

matched.

● When we start we match 0% of the traffic.

● After the first iteration (6412 routes added) we match 99.263%

● After that we go to 99.9%.● New prefixes are most likely scans

Goal

● Pick only the routes I need from the RIB so I can fit them on the FIB of a switch with commodity ASICs.

Motivation

Switches:● Are cheaper.● Have higher 10G density

● 256x10G in 2 RU or 48x10G + 2x100G in 1 RU vs 288 x 10G ports in 10 RU

● Consume less power● 3.4W vs 33W per 10G port

SDN Internet Router key pieces:Selective Route DownloadFeature that allows you to pick a subset of the routes on the RIB and install them on the FIB. Available in Cumulus (bird), Arista*, Juniper, some Cisco routers and probably others.

* If you plan to try on Arista contact your SE.

SDN Internet Router key pieces:pmacct

http://www.pmacct.net/

pmacct: BGP integration

pmacct introduced a Quagga-based BGP daemon:● Implemented as a parallel thread within the collector● Doesn’t send UPDATEs and WITHDRAWs whatsoever● Behaves as a passive BGP neighbor● Maintains per-peer BGP RIBs● Supports 32-bit ASNs; IPv4, IPv6 and VPN families ● Supports ADD-PATH (draft-ietf-idr-add-paths)

Why BGP at the collector?● Telemetry reports on forwarding-plane, and a bit more.● Extended visibility into control-plane information

SDN Internet Router: overview

Internet Switch

Transit/Backbone

IXP

pmacct

BGP Controller

Spotify AP

● Transit will send the default route to the Internet Switch. The route is installed by default in the FIB.

● We receive from the IXP all the peers´ prefixes. Those are not installed, they are forwarded to pmacct and the BGP Controller.

● Pmacct will receive in addition sFlow data.

0.0.0.0/0

Peers’ prefixes

Peers’ prefixes

Peers’ prefixes& sFlow

SDN Internet Router: pmacct

Internet Switch

Transit IXP

pmacct

BGP Controller

Spotify AP

● pmacct aggregates sFlow data using the BGP information previously sent by the Internet Switch.

● pmacct reports the flows aggregated by BGP prefix to the BGP controller.

● The BGP controller will compute the best TopN prefixes.

● The BGP controller instructs the Internet switch to install those TopN* prefixes.

* N is a number close to the maximum number of entries that the FIB of the Internet Switch can support.

1. This is the flow information aggregated per prefix

3. Please, install these prefixes I computed.

Peers’ prefixes& sFlow

2. Compute best TopN prefixes.

SDN Internet Router: Spotify’s AP

Internet Switch

Transit IXP

pmacct

BGP Controller

Spotify AP

● $USER connects to the service● The application informs the access

point that $USER has connected and requests that the $PREFIX containing his/her $IP is installed on the FIB.● It might be installed already as another user

within the same range might have connected previously or because pmacct reported that prefix as being one of the TopN prefixes.

● The BGP controller instructs the Internet Switch to install the prefix if necessary.

1. $USER connects with $IP.

2. Please, make sure $PREFIX containing $IP is installed.

3. Install $PREFIX (if it wasn’t already).

Considerations

The BGP controller updates a prefix list containing the prefixes that the device must take from the RIB and install on the FIB.

● If a prefix is removed from the RIB it will be removed from the FIB by the device.

● If the BGP controller fails the prefix list remains in the device. Allowing the device to operate normally as per the last instructions.

“Warming up” the FIB is important if you don’t want to buy as much transit capacity as peering capacity you have. You can:

● If your application supports it, move users by region to the new location.● Precompute the FIB using the flow information from another location and the

BGP table from the new location before you start advertising your prefixes.

Present and Future

● Demo run in Stockholm connected to Netnod to gather information (no actual changes performed on the Internet Router there).

● Pilot to be run very soon in a major IXP in Europe.

Simulation in Stockholm: N = 10.000 prefixes

● Green line shows the total amount of prefixes that have been used at least once

● Blue line the amount of new prefixes required at a given time.

● Red line shows the amount of prefixes removed.

Simulation in Stockholm: N = 10.000 prefixes

● Red line shows total bandwidth● Blue line shows traffic with a

matching prefix in the FIB● Purple line shows % of traffic

matched.

● With only 10.000 prefixes we offload around 99% of the traffic at all times.

Simulation in Stockholm: N = 30.000 prefixes

● Green line shows the total amount of prefixes that have been used at least once

● Blue line the amount of new prefixes required at a given time.

● Red line shows the amount of prefixes removed.

Simulation in Stockholm: N = 30.000 prefixes

● Red line shows total bandwidth● Blue line shows traffic with a

matching prefix in the FIB● Purple line shows % of traffic

matched.

● With only 30.000 prefixes we offload around 99.9% of the traffic at all times.

Do you want to peer with us?

● Find us at peeringDB. We are peering under as8403.● Send an email to [email protected] even if we are not present in

your location.

Email: [email protected]: https://github.com/dbarrosop/sirDocumentation and configuration examples: http://sdn-internet-router-sir.readthedocs.org/en/latest/

Questions?

Want to join the band?Check out spotify.com/jobs or @Spotifyjobs for more information.