Data centre networking at London School of Economics and Political Science - Networkshop44
Protocol Hamburger
Data centre networking at LSE
Matt Bernstein, LSE
New DCI requirements
• Encrypt everything
• VLANs
• Availability: 100%?
• Bandwidth: lots?
• Latency: none?
• Scale
  – 1600 VMs?
  – “everything in Azure”?
[Diagram: encrypted datacentre interconnect. Campus/Halls zones and the Internet connect via the campus network. Local DC VLANs at each site, plus Shared DC VLANs, span an encrypted tunnel over Janet: 171 VLANs in London behind a “magic router” in London, 171 VLANs in Slough behind a “magic router” in Slough.]
Janet offerings for Slough
• high-capacity, low-latency IPv4/IPv6 network
• L2 (“JPWS”), but 802.1q tagged on the L3 link
  – unless you have Ciena light-path kit on campus
  – even JPWS relies on Janet routing protocols
• 9000-byte MTU
  – 9192 bytes on interface, but 9000 in protocols
  – GÉANT run 9000, Janet not minded to change
• no out-of-band access
  – tenants have all done their own thing
High-level diagram: LSE, SDC and London Region
[Diagram: LSE #SDC (sloudc-ban1, sloudc-ban2) connects via the J6 core to the London sites LSE #1 and LSE #2 (londpg-ban1, londtw-ban1, londpg-sbr1, londtw-sbr1, londic-rbr2, londsh-rbr2).]
“Just Say No”: problems with L2
• Broadcast domains / fault domains
• Routing across two locations harder
• Spanning Tree
• No hop count for simple loop detection
• MAC address limit of switching hardware
• Mixed MTU on the same segment

BUM traffic
• “Normal” switches flood BUM frames
• So do most “L2VPN” technologies
  – VPLS / EoMPLS (including JPWS)
  – VXLAN…
• Exacerbated by virtual servers: every VLAN on every port means every BUM frame is spammed to every hypervisor
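The amplification the last bullet describes is easy to quantify. A minimal sketch, with hypothetical numbers (the 171-VLAN count comes from the earlier slide; the hypervisor count and BUM rate are assumptions for illustration):

```python
# Why trunking every VLAN to every hypervisor port amplifies BUM traffic:
# a broadcast/unknown-unicast/multicast frame in one VLAN is flooded out
# of every trunk port carrying that VLAN, so every hypervisor receives
# it -- even if it hosts no VM in that VLAN.

def bum_deliveries(hypervisors: int, vlans: int, bum_rate_per_vlan: float) -> float:
    """Frames per second delivered to hypervisor NICs when all VLANs are
    trunked to all hypervisor ports (sender's own port excluded)."""
    return vlans * bum_rate_per_vlan * (hypervisors - 1)

# 171 VLANs, 50 hypervisors (assumed), a modest 10 BUM frames/s per VLAN:
print(bum_deliveries(hypervisors=50, vlans=171, bum_rate_per_vlan=10))
# → 83790
```

Pruning unused VLANs from each trunk cuts the multiplier directly, which is part of the motivation for saying no to stretched L2.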
Technology selection
• Cisco
  – vs Juniper (vs HP vs Arista vs ..)
• OTV
  – vs EVPN (vs EVI vs VXLAN vs ..)
• Nexus L2 fabric
  – vs Virtual Chassis Fabric (..)
• FEX blades
  – vs HP PassThru blades
• DACs, DACBOs
• FCoE?
Juniper selected
• Metafabric reference architecture
  – MX240 routers
  – SRX5400 firewalls
  – QFX5100 switches
• EVPN for L2 DCI, bypassing firewalls
• Enough grunt to last more than five years
• Can do so much more than just VLANs
[Diagram: the two MX routers, “bacon” (158.143.220.253, 2001:630:9:f220::2) and “onion” (158.143.220.254, 2001:630:9:f220::1), line cards fpc0–fpc7. Each uplinks to Janet on xe-1/0/4 (146.97.129.42/30 with 2001:630:0:9001::2a/126, and 146.97.129.46/30 with 2001:630:0:9001::2e/126) and they interconnect over ae0 (xe-1/0/{5,6}; 158.143.220.1/30 and 158.143.220.2/30, 2001:630:9:f220::1:1/126 and 2001:630:9:f220::1:2/126). ae1 (xe-1/0/{0,1}) carries Layer 2; irb.482 (xe-1/0/{2,3}) carries 158.143.220.5/30 and 158.143.220.9/30 towards the SRX cluster sdc-ban1/sdc-ban2 (node0/node1), whose reth0 (xe-2/2/6, xe-5/2/7; 158.143.220.6/30) and reth1 (xe-2/2/7, xe-5/2/6; 158.143.220.10/30) face the MXes, and reth2 (xe-{2,5}/2/{0,1,8,9}) carries the routed VLANs.]
Traffic flow through Janet
Ethernet VPN
• (MP-)BGP control plane for MAC addresses
  – FabricPath and VCF are both IS-IS for MAC addresses

mb@press> show route table bgp.evpn.0 evpn-mac-address 00:50:56:91:3e:ca

bgp.evpn.0: 882 destinations, 882 routes (882 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2:158.143.220.253:1::1133::00:50:56:91:3e:ca/304
        *[BGP/170] 1w4d 07:39:40, localpref 100, from 158.143.220.253
          AS path: I, validation-state: unverified
        > via gr-1/3/0.2, label-switched-path press-to-bacon
          to 158.143.221.0 via ae0.0, label-switched-path press-to-bacon
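The interesting fields are packed into that route key. A minimal sketch of pulling them apart; the layout (`<type>:<route-distinguisher>::<ethernet-tag>::<mac>/<prefix-len>`) is read off the output above, not a documented parsing API:

```python
# Parse a Junos EVPN type-2 (MAC advertisement) route key from bgp.evpn.0.

def parse_evpn_mac_route(key: str) -> dict:
    prefix, _, plen = key.rpartition("/")   # strip the /304 prefix length
    head, eth_tag, mac = prefix.split("::") # fields separated by "::"
    rtype, _, rd = head.partition(":")      # "2:<RD>" -> type and RD
    return {
        "type": int(rtype),           # 2 = MAC/IP advertisement
        "rd": rd,                     # route distinguisher, e.g. <IP>:<n>
        "ethernet_tag": int(eth_tag), # the VLAN ID in this deployment
        "mac": mac,
        "prefix_len": int(plen),
    }

r = parse_evpn_mac_route("2:158.143.220.253:1::1133::00:50:56:91:3e:ca/304")
print(r["rd"], r["ethernet_tag"], r["mac"])
# → 158.143.220.253:1 1133 00:50:56:91:3e:ca
```

So the route above advertises MAC 00:50:56:91:3e:ca in VLAN 1133, originated by the router whose RD is built from 158.143.220.253.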
Ethernet VPN
• MPLS forwarding plane
  – using RSVP for fast convergence
  – MPLS first to be standardised [RFC7432]; VXLAN becoming increasingly popular

mb@press> show evpn instance extensive | match "VLAN|1133"
  VLAN  VNI   Intfs / up  IRB intf  Mode      MAC sync  IM route label
  1133  None  1      1              Extended  Enabled   371904

mb@press> show route label 371904

mpls.0: 705 destinations, 877 routes (705 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

371904  *[EVPN/7] 4w5d 12:31:51, routing-instance DATACENTRE-EVPN1,
          route-type Ingress-IM, vlan-id 1133
          to table DATACENTRE-EVPN1.evpn-mac.0
Our stack
[Diagram: the protocol “hamburger”, top to bottom: L2 / EVPN (BGP) / RSVP / MPLS / OSPF / GRE / IPSec / BGP / IPv6 / L2]
158-byte hit per packet
• more than 10% overhead for 1500-byte frames
• less than 2% overhead for 9000-byte frames

set interface xe-1/0/4.0 family inet6 mtu 9000
set services ipsec-vpn rule X term 1 tunnel-mtu 8910
set interface ms-1/2/0 mtu 8910
set interface gr-1/3/0 mtu 8886
set interface ae1 mtu 8842
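The overhead percentages follow directly from the 158-byte figure; a quick check:

```python
# Relative cost of the fixed 158-byte encapsulation overhead added by
# the full stack, for standard and jumbo frame sizes.

OVERHEAD = 158  # bytes added per packet by the stack above

for mtu in (1500, 9000):
    pct = OVERHEAD / mtu * 100
    print(f"{mtu}-byte frames: {pct:.1f}% overhead")
# 1500-byte frames: 10.5% overhead
# 9000-byte frames: 1.8% overhead
```

This is why the 9000-byte MTU on the Janet side matters: jumbo frames shrink the hamburger tax from over a tenth of each packet to under two percent.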
It Works
~ 3Gb/s throughput (IMIX, 1500-byte MTU)
~ 9Gb/s throughput (single TCP stream, 9000-byte MTU)
Latency (RTTs from London MX to Slough MX):
  Raw: 2.7 ms (small packets) / 4.8 ms (8000-byte)
  Burgered: 3.3 ms (small packets) / 5.9 ms (8000-byte)

~ 1-2s to re-converge in the event of a single failure
The Bad News
• we found some new bugs in Junos
• routing protocols within Janet are a SPoF for our DCI
• layering is not as strict as I would like (too much in inet.0)
• we're not yet running any L3 on the DC networks in Slough
  – partly time constraints, partly a few glitches
• the existing firewall adds another 1ms to the RTT, if crossing subnets
  – round trip between two VMs on different VLANs in Slough is 7ms
  – web servers making lots of DB queries to render a web page are slow
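The slow-page effect is simple arithmetic. A sketch with the 7 ms RTT from the slide and a hypothetical query count:

```python
# A chatty application pays the inter-VLAN round trip once per
# sequential query, so per-packet latency multiplies up fast.

RTT_S = 0.007    # RTT between VMs on different VLANs in Slough (from the slides)
QUERIES = 100    # sequential DB queries per page render (assumed)

print(f"added latency per page: {RTT_S * QUERIES:.1f} s")
# added latency per page: 0.7 s
```

Batching queries, or putting chatty tiers on the same VLAN (or behind the same firewall zone), attacks the multiplier rather than the per-hop latency.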
What might we have done differently?
• EVPN is now available on the QFX5100 switches
  – with a VXLAN forwarding plane
• OTV is simpler to configure, less bleeding edge
  – but even Cisco seem not to be releasing new OTV hardware (ASR1k, N7k both old, and expensive)
• EVPN/VXLAN appearing on platforms like Juniper MX, Cisco ASR9k, Cisco N9k, Arista
  – all three vendors have VMs for testing
Questions