Data centre networking at London School of Economics and Political Science - Networkshop44


Page 1: Data centre networking at London School of Economics and Political Science - Networkshop44

Protocol Hamburger

Data centre networking at LSE

Matt Bernstein

Page 2: Data centre networking at London School of Economics and Political Science - Networkshop44

Protocol Hamburger

Matt Bernstein, LSE

Page 3: Data centre networking at London School of Economics and Political Science - Networkshop44

New DCI requirements

• Encrypt everything
• VLANs
• Availability: 100%?
• Bandwidth: lots?
• Latency: none?
• Scale
– 1600 VMs?
– “everything in Azure”?

Page 4: Data centre networking at London School of Economics and Political Science - Networkshop44

[Diagram labels: Campus/Halls zones; Campus; Internet; London; Slough; encrypted datacentre interconnect; Local DC VLANs (×2); Shared DC VLANs]

Page 5: Data centre networking at London School of Economics and Political Science - Networkshop44

[Diagram: 171 VLANs in London <-> magic router in London <-> encrypted tunnel over Janet <-> magic router in Slough <-> 171 VLANs in Slough]

Page 6: Data centre networking at London School of Economics and Political Science - Networkshop44

Janet offerings for Slough

• high-capacity, low-latency IPv4/IPv6 network
• L2 (“JPWS”), but 802.1q tagged on the L3 link
– unless you have Ciena light-path kit on campus
– even JPWS relies on Janet routing protocols
• 9000-byte MTU
– 9192 bytes on interface, but 9000 in protocols
– GÉANT run 9000, Janet not minded to change
• no Out-of-Band access
– tenants have all done their own thing

Page 7: Data centre networking at London School of Economics and Political Science - Networkshop44

High level diagram: LSE, SDC and London Region

[Diagram labels: LSE #SDC; sloudc-ban1, sloudc-ban2; londpg-ban1, londtw-ban1; londpg-sbr1, londtw-sbr1; londic-rbr2, londsh-rbr2; LSE #1; LSE #2; J6 Core (×2)]

Page 8: Data centre networking at London School of Economics and Political Science - Networkshop44

“Just Say No”: problems with L2

• Broadcast domains / fault domains
• Routing across two locations harder
• Spanning Tree
• No hop count for simple loop detection
• MAC address limit of switching hardware
• Mixed MTU on the same segment

Page 9: Data centre networking at London School of Economics and Political Science - Networkshop44

BUM traffic (broadcast, unknown unicast, multicast)

• “Normal” switches flood BUM frames
• So do most “L2VPN” technologies
– VPLS / EoMPLS (including JPWS)
– VXLAN…
• Exacerbated by virtual servers
– every VLAN on every port means every BUM frame is spammed to every hypervisor
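
The flooding point above can be illustrated with a small sketch. The host names and counts below are hypothetical, chosen only to show the shape of the problem: when every VLAN is trunked to every hypervisor port, one BUM frame reaches every hypervisor, even though only a few host a VM on that VLAN.

```python
def flood_copies(hypervisors, vlan_members):
    """Return (copies delivered, copies wanted) for one BUM frame.

    Assumes every VLAN is carried on every hypervisor port, so the frame
    is flooded to all hypervisors regardless of VLAN membership.
    """
    delivered = len(hypervisors)
    wanted = len(vlan_members & hypervisors)
    return delivered, wanted


# Hypothetical estate: 40 hypervisors, 2 of which host a VM on the VLAN.
hypervisors = {f"hv{i}" for i in range(1, 41)}
vlan_members = {"hv3", "hv17"}

delivered, wanted = flood_copies(hypervisors, vlan_members)
print(f"delivered to {delivered} hypervisors, wanted by {wanted}")
```

A control plane that knows where each MAC address lives is one way to shrink the gap between the two numbers.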

Page 10: Data centre networking at London School of Economics and Political Science - Networkshop44

Technology selection

• Cisco
– vs Juniper (vs HP vs Arista vs..)
• OTV
– vs EVPN (vs EVI vs VXLAN vs..)
• Nexus L2 fabric
– vs Virtual Chassis Fabric (..)
• FEX blades
– vs HP PassThru blades
• DACs, DACBOs
• FCoE?

Page 11: Data centre networking at London School of Economics and Political Science - Networkshop44

Juniper selected

• Metafabric reference architecture
– MX240 routers
– SRX5400 firewalls
– QFX5100 switches
• EVPN for L2 DCI, bypassing firewalls
• Enough grunt to last more than five years
• Can do so much more than just VLANs

Page 12: Data centre networking at London School of Economics and Political Science - Networkshop44

[Network diagram; labels:]

• bacon: 158.143.220.253, 2001:630:9:f220::2
• onion: 158.143.220.254, 2001:630:9:f220::1
• Janet
• sdc-ban1, sdc-ban2
• xe-1/0/4: 146.97.129.42/30, 2001:630:0:9001::2a/126
• xe-1/0/4: 146.97.129.46/30, 2001:630:0:9001::2e/126
• ae0 (xe-1/0/{5,6}): 158.143.220.2/30, 2001:630:9:f220::1:2/126
• ae0 (xe-1/0/{5,6}): 158.143.220.1/30, 2001:630:9:f220::1:1/126
• ae1 (xe-1/0/{0,1}): Layer 2 (×2)
• irb.482 (xe-1/0/{2,3}): 158.143.220.9/30
• irb.482 (xe-1/0/{2,3}): 158.143.220.5/30
• reth1 (xe-2/2/7, xe-5/2/6): 158.143.220.10/30
• reth0 (xe-2/2/6, xe-5/2/7): 158.143.220.6/30
• reth2 (xe-{2,5}/2/{0,1,8,9}): routed VLANs
• fpc0 to fpc7; node0, node1
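
The IPv4 addresses on this slide can be grouped into point-to-point links by their /30 subnet. The sketch below uses Python's standard `ipaddress` module. The "(a)"/"(b)" suffixes are hypothetical labels added to tell the two routers' interfaces apart, and the pairing itself is inferred from the shared subnets rather than stated on the slide.

```python
import ipaddress

# Addresses copied from the slide; "(a)"/"(b)" are hypothetical labels.
addresses = {
    "ae0 (a)": "158.143.220.1/30",
    "ae0 (b)": "158.143.220.2/30",
    "irb.482 (a)": "158.143.220.5/30",
    "reth0": "158.143.220.6/30",
    "irb.482 (b)": "158.143.220.9/30",
    "reth1": "158.143.220.10/30",
}


def group_by_subnet(addrs):
    """Map each containing subnet to the interfaces addressed within it."""
    links = {}
    for name, cidr in addrs.items():
        network = ipaddress.ip_interface(cidr).network
        links.setdefault(str(network), []).append(name)
    return links


links = group_by_subnet(addresses)
for subnet, ends in links.items():
    print(subnet, ends)
```

Read this way, the two ae0 interfaces face each other, and each irb.482 faces one of the reth interfaces.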

Page 13: Data centre networking at London School of Economics and Political Science - Networkshop44

Traffic flow through Janet

Page 14: Data centre networking at London School of Economics and Political Science - Networkshop44

Ethernet VPN

• (MP-)BGP control plane for MAC addresses
– FabricPath and VCF are both IS-IS for MAC addresses

mb@press> show route table bgp.evpn.0 evpn-mac-address 00:50:56:91:3e:ca

bgp.evpn.0: 882 destinations, 882 routes (882 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2:158.143.220.253:1::1133::00:50:56:91:3e:ca/304
                   *[BGP/170] 1w4d 07:39:40, localpref 100, from 158.143.220.253
                      AS path: I, validation-state: unverified
                    > via gr-1/3/0.2, label-switched-path press-to-bacon
                      to 158.143.221.0 via ae0.0, label-switched-path press-to-bacon
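
The route key in that output packs several fields into one string. The sketch below splits it apart, assuming the layout is route type, route distinguisher, Ethernet tag and MAC address, separated by `::`, with the prefix length after the `/`. That reading is an interpretation of the output above rather than something stated on the slide, and the function name is hypothetical.

```python
def parse_evpn_mac_route(prefix):
    """Split an EVPN MAC route key, as displayed above, into its fields.

    Handles only the MAC-only form shown on the slide; a key with a
    different number of '::'-separated parts raises ValueError.
    """
    key, _, length = prefix.rpartition("/")
    head, tag, mac = key.split("::")
    route_type, _, rd = head.partition(":")
    return {
        "route_type": int(route_type),
        "route_distinguisher": rd,
        "ethernet_tag": int(tag),
        "mac": mac,
        "prefix_length": int(length),
    }


route = parse_evpn_mac_route("2:158.143.220.253:1::1133::00:50:56:91:3e:ca/304")
for field, value in route.items():
    print(f"{field}: {value}")
```

Under that reading, the route is a type 2 (MAC/IP advertisement) route whose route distinguisher starts with the address the route was learned from, 158.143.220.253, carrying Ethernet tag 1133.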

Page 15: Data centre networking at London School of Economics and Political Science - Networkshop44

Ethernet VPN

• MPLS forwarding plane
– using RSVP for fast convergence
– MPLS first to be standardised [RFC7432]; VXLAN becoming increasingly popular

mb@press> show evpn instance extensive | match "VLAN|1133"
    VLAN  VNI   Intfs / up    IRB intf   Mode      MAC sync  IM route label
    1133  None      1   1                Extended  Enabled   371904

mb@press> show route label 371904

mpls.0: 705 destinations, 877 routes (705 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

371904             *[EVPN/7] 4w5d 12:31:51, routing-instance DATACENTRE-EVPN1, route-type Ingress-IM, vlan-id 1133
                      to table DATACENTRE-EVPN1.evpn-mac.0

Page 16: Data centre networking at London School of Economics and Political Science - Networkshop44

Our stack

[Diagram: the protocol stack]
L2
EVPN (BGP), RSVP
MPLS, OSPF
GRE, IPSec
BGP, IPv6, L2

Page 17: Data centre networking at London School of Economics and Political Science - Networkshop44

158-byte hit per packet

• more than 10% overhead for 1500-byte frames
• less than 2% overhead for 9000-byte frames

set interface xe-1/0/4.0 family inet6 mtu 9000
set services ipsec-vpn rule X term 1 tunnel-mtu 8910
set interface ms-1/2/0 mtu 8910
set interface gr-1/3/0 mtu 8886
set interface ae1 mtu 8842

Page 18: Data centre networking at London School of Economics and Political Science - Networkshop44

It Works

~ 3Gb/s throughput (IMIX, 1500-byte MTU)

~ 9Gb/s throughput (single TCP stream, 9000-byte MTU)

Latency (RTTs from London MX to Slough MX):
Raw: 2.7ms (small packets) / 4.8ms (8000 byte)
Burgered: 3.3ms (small packets) / 5.9ms (8000 byte)

~ 1-2s to re-converge in the event of single failure

Page 19: Data centre networking at London School of Economics and Political Science - Networkshop44

The Bad News

• we found some new bugs in Junos
• routing protocols within Janet are a SPoF for our DCI
• layering is not as strict as I would like (too much in inet.0)
• we're not yet running any L3 on the DC networks in Slough
– partly time constraints, partly a few glitches
• the existing firewall adds another 1ms to the RTT, if crossing subnets
– round-trip between two VMs on different VLANs in Slough is 7ms
– web servers making lots of DB queries to render a web page are slow
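
To see why a 7ms round trip hurts query-heavy pages, here is a rough sketch. It assumes each database query costs one round trip and that queries run one after another. The query counts are hypothetical, not measurements from LSE.

```python
def network_wait_ms(sequential_queries, rtt_ms=7.0):
    """Time spent waiting on the network for queries issued one at a time.

    Assumes one round trip per query and no pipelining or batching.
    """
    return sequential_queries * rtt_ms


# Hypothetical query counts for rendering a single page.
for n in (10, 50, 200):
    print(f"{n} queries: {network_wait_ms(n):.0f} ms waiting on the network")
```

At 50 sequential queries the page spends about a third of a second on round trips alone, before any query execution time.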

Page 20: Data centre networking at London School of Economics and Political Science - Networkshop44

What might we have done differently?

• EVPN is now available on the QFX5100 switches
– with a VXLAN forwarding plane
• OTV is simpler to configure, less bleeding edge
– but even Cisco seem not to be releasing new OTV hardware (ASR1k, N7k both old, and expensive)
• EVPN/VXLAN appearing on platforms like Juniper MX, Cisco ASR9k, Cisco N9k, Arista
– all three vendors have VMs for testing

Page 21: Data centre networking at London School of Economics and Political Science - Networkshop44

Questions

Page 22: Data centre networking at London School of Economics and Political Science - Networkshop44

jisc.ac.uk

Thank you

Matt Bernstein