High Performance Cloud-native Networking · 2019-12-21 · High Performance Cloud-native Networking...

35
High Performance Cloud-native Networking K8s Unleashing FD.io Giles Heron [email protected] [email protected] Maciek Konstantynowicz Principal Engineer, Cisco FD.io CSIT Project Lead Distinguished Engineer, Cisco Jerome Tollet [email protected] Distinguished Engineer, Cisco

Transcript of High Performance Cloud-native Networking · 2019-12-21 · High Performance Cloud-native Networking...

High Performance Cloud-native NetworkingK8s Unleashing FD.io

Giles Heron

[email protected]@cisco.com

Maciek KonstantynowiczPrincipal Engineer, Cisco FD.io CSIT Project Lead

Distinguished Engineer, Cisco

Jerome Tollet

[email protected] Engineer, Cisco

DISCLAIMERs• 'Mileage May Vary'

• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your opinion and investment of any resources. For more complete information about open source performance and benchmark results referred in this material, visit https://wiki.fd.io/view/CSIT and/or https://docs.fd.io/csit/rls1807/report/.

• Trademarks and Branding• This is an open-source material. Commercial names and brands may be claimed as the property of others.

Internet Mega Trends – ..

Portability and Efficiency

Scalability and Self-Healing

Open Source Platforms

Cloud Native Designs

Software Defined Networking

Cloud

NFV SDN

THE SOFTWARE DEFINED OPERATOR

12-NOV-2013© Deutsche Telekom AG, 2013 4

NFV SDN

IPDO YOU REMEMBER .. ?

5 Pillars of Next Generation Software Data Planes

Blazingly Fast• Process the massive explosion of East-West traffic• Process increasing North-South traffic

Truly Extensible• Foster pace of innovation in cloud-native networking• No compromise on performance (zero-tolerance)

Software First• Cloud means running everywhere• Cloud means hardware and physical infra agnostic

Predictable performance• Dataplane performance must be deterministic• Predictable for a number of VMs, Containers,

virtual topology and (E-W, N-S) traffic matrix

FD.io VPP meets these challengesHow can one use it in large scale Cloud-native networks ?

Measureable• Counters everywhere to count everything for detailed

cross-layer operation and efficiency monitoring• Enables feedback loop to drive optimizations

The Way Applications Are Developed and Deployed.. Has Changed..

Time

The Way Networks are Deployed and Used… has Changed…

Internet Data-CenterCorporate LAN/WAN”80:20 rule”

LAN WAN

Intranets & Internet

LAN WAN

Internet

WWW

SD-WAN & “BeyondCorp”

WiFi Internet

CloudProvider

VLAN 1 Core

Spine/Leaves & L3 Core/L2 AccessTiered Transit & Private Peering

Tier1 –A Tier1 – B Tier1 – C

Tier2 – D Tier2 – E Tier2 – F

Internet exchanges & public peering

Network 1 Network 2 Network 3

Telco/Cable Access &, OTT/CDN Content

Telco 1 Telco 2 MSO 1

Cloud Provider 1 Global Backbone

Core/Distr/Access, VLAN based

VLAN 1VLAN 2VLAN n

VLAN 2 VLAN 2’Tim

e

Cloud Provider 2 Global Backbone

L3 Fabric/SW Overlay & Virt Networking

Layer 3 Fabric

Aside: A Trip Down Memory Lane(Transporting Data vs. Processing Data)

• Year 2012• Internet service provider comment at IETF: processing bits is cheaper than

transporting bits, computing and networking - networking is becoming 1st order citizen on compute platforms.

• Year 2013• RIPE67 Terastream been fixing the cost of transporting bits - 96 of 100GE

coherent lambdas per fibre span - transporting is getting cheaper, so challenging the compute part again

• more bandwidth delivered to Data Centres• Most/all network services in Data Dentres

https://ripe67.ripe.net/presentations/131-ripe2-2.pdf

https://ripe67.ripe.net/archives/video/3/ NFV SDN

IP

MK MK

Electromechanical Solid-stateRelay

Vacuum Tube Transistor Integrated Circuit1010

106

104

102

1

10-2

10-4

10-6

Moore’sLaw 1965

108

Calcu

latio

ns p

er Se

cond

per

$1,

000

Source: Ray Kurzweil, "The Singularity Is Near: When Humans Transcend Biology", Page 67, The Duckworth Publishers 2009. Data points between 1600 and 1900, and after 2000 represent Maciek's perspective and approximations.

Remember 1965 ”Moore’s Law” – ..

Mechanical

MK

Popularpre 1900

MK

Electromechanical Solid-stateRelay

Vacuum Tube Transistor Integrated Circuit

18001600 1700

BooleanLogic

BinaryCode

1010

106

104

102

1

10-2

10-4

10-6

Moore’sLaw 1965

108

Calcu

latio

ns p

er Se

cond

per

$1,

000

Source: Ray Kurzweil, "The Singularity Is Near: When Humans Transcend Biology", Page 67, The Duckworth Publishers 2009. Data points between 1600 and 1900, and after 2000 represent Maciek's perspective and approximations.

Remember 1965 ”Moore’s Law” – ..

Mechanical

MK

Popularpre 1900

MK

Electromechanical Solid-stateRelay

Vacuum Tube Transistor Integrated Circuit

18001600 1700

BooleanLogic

BinaryCode

1010

106

104

102

1

10-2

10-4

10-6

Moore’sLaw 1965

108

Calcu

latio

ns p

er Se

cond

per

$1,

000

Source: Ray Kurzweil, "The Singularity Is Near: When Humans Transcend Biology", Page 67, The Duckworth Publishers 2009. Data points between 1600 and 1900, and after 2000 represent Maciek's perspective and approximations.

Remember 1965 ”Moore’s Law” – Is It Still Applicable?

Mechanical

MK

Popularpre 1900

MK

Electromechanical Solid-stateRelay

Vacuum Tube Transistor Integrated Circuit

18001600 1700

BooleanLogic

BinaryCode

1010

106

104

102

1

10-2

10-4

10-6

Moore’sLaw 1965

108

Calcu

latio

ns p

er Se

cond

per

$1,

000

Popularin 2017

Xeon®Skylake

ModernµProcessor

2010 2020Era of ModernµProcessors

Source: Ray Kurzweil, "The Singularity Is Near: When Humans Transcend Biology", Page 67, The Duckworth Publishers 2009. Data points between 1600 and 1900, and after 2000 represent Maciek's perspective and approximations.

Remember 1965 ”Moore’s Law” – Is It Still Applicable?

Mechanical

MK

Popularpre 1900

MK

Electromechanical Solid-stateRelay

Vacuum Tube Transistor Integrated Circuit

18001600 1700

BooleanLogic

BinaryCode

1010

106

104

102

1

10-2

10-4

10-6

Moore’sLaw 1965

108

Calcu

latio

ns p

er Se

cond

per

$1,

000

Popularin 2017

Xeon®Skylake

ModernµProcessor

2010 2020Era of ModernµProcessors

Remember 1965 ”Moore’s Law” – Yes, It Surely Is ..

Source: Ray Kurzweil, "The Singularity Is Near: When Humans Transcend Biology", Page 67, The Duckworth Publishers 2009. Data points between 1600 and 1900, and after 2000 represent Maciek's perspective and approximations.

“Ramble On ..”

Resources to Get Performance1) Processor and CPU cores

for performing packet processing operations_ __

2) Memory bandwidthfor moving data (packets, lookup) and instructions (packet processing)

3) I/O bandwidthfor moving packets to/from NIC interfaces _

4) Inter-socket bandwidthfor handling inter-socket operations _

Processing Packets: How to Use Compute ..

Socket 0

Skylake Server CPU

Socket 1

Skylake Server CPU

UPI

UPI

PCIe

DDR4 DDR4

DDR4

SATA

B IOS

1

4

3

2

3

1

2

DDR4

DMI

PCH

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

PCIe

x16

PCIe

x16

PCIe

x16

PCIe

x16PC

Iex16

PCIe

x16

UPI

280 Gbps@ 1,000Bytes ETH frames

100GE 100GE100GE100GE 100GE100GE

280 Gbps@ 1,000Bytes ETH frames

Tℎ#$%&ℎ'%( [*'+] = Tℎ#$%&ℎ'%( ''+ ∗ /0123(_5673[''+]

CyclesPerPacket[ClockCycles]= #FGHIJKLIMNGHOPLQRI

∗ #STLURHFGHIJKLIMNG

Tℎ#$%&ℎ'%( [''+] = V]OPLQRI_OJNLRHHMGW_XMYR[HRL=

]SOZ_[JR\[]^STLURH__RJ_OPLQRI

1

2

3

4

MK MK

Socket 0

Skylake Server CPU

Socket 1

Skylake Server CPU

UPI

UPI

PCIe

DDR4 DDR4

DDR4

SATA

B IOS

1

4

3

2

3

1

2

DDR4

DMI

PCH

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

PCIe

x16

PCIe

x16

PCIe

x16

PCIe

x16PC

Iex16

PCIe

x16

UPI

280 Gbps@ 1,000Bytes ETH frames

100GE 100GE100GE100GE 100GE100GE

280 Gbps@ 1,000Bytes ETH frames

Tℎ#$%&ℎ'%( [*'+] = Tℎ#$%&ℎ'%( ''+ ∗ /0123(_5673[''+]

CyclesPerPacket[ClockCycles]= #FGHIJKLIMNGHOPLQRI

∗ #STLURHFGHIJKLIMNG

Tℎ#$%&ℎ'%( [''+] = V]OPLQRI_OJNLRHHMGW_XMYR[HRL=

]SOZ_[JR\[]^STLURH__RJ_OPLQRIMoore’s Law in Action

Resources to Get Performance1) Processor and CPU cores

FrontEnd: faster instr. decoder (4- to 5-wide)BackEnd: faster L1 cache, bigger L2 cache, deeper OOO* executionUncore: move from ring to X-Y fabric mesh

2) Memory bandwidth~50% increase: channels (4 to 6), speed (DDR-2666)

3) I/O bandwidth>50% increase: PCIe lanes (40 to 48), re-designed IO blocks

4) Inter-socket bandwidth~60% increase: QPI to UPI (2x to 3x), interface speed (9.6 to 10.4 GigTrans/sec)

1

2

3

4

Processing Packets: What Improves in Compute ..

MK MK

Packet Processing Software Platform• High performance• Linux user space• Runs on compute CPUs: � And “knows” how to run them well !

Shipping at volume in server & embedded products

16

Packet Processing

Dataplane Management Agent

Network IO

Bare-Metal / VM / Container

FD.io VPP – Vector Packet ProcessingCompute-Optimised SW Networking Platform

FD.io VPP – The “Magic” of VectorsCompute Optimized SW Network Platform

1Packet processing is decomposedinto a directed graph of nodes …

Packet 0

Packet 1

Packet 2

Packet 3

Packet 4

Packet 5

Packet 6

Packet 7

Packet 8

Packet 9

Packet 10

… packets move through graph nodes in vector …2

Microprocessor

… graph nodes are optimized to fit inside the instruction cache …

… packets are pre-fetched into the data cache.

Instruction Cache3

Data Cache4

3

4

Makes use of modern Intel® Xeon® Processor micro-architectures. Instruction cache & data cache always hot è Minimized memory latency and usage.

vhost-user-input

af-packet-input dpdk-input

ip4-lookup-mulitcast ip4-lookup*

ethernet-input

mpls-inputlldp-input

arp-inputcdp-input...-no-

checksum

ip6-inputl2-input ip4-input

ip4-load-balance

mpls-policy-encap

ip4-rewrite-transit

ip4-midchain

interface-output

* Each graph node implements a “micro-NF”, a “micro-NetworkFunction” processing packets.

MK MK

Contiv NetmasterSFC

Controller

Kubernetes API Proxies

Service Policy Service Topology Lifecycle

KubeletCNI

CRIContiv Netmaster

Production-Grade Container Orchestration

Network Function and Network Topology Orchestration

Containerized Network Data Plane

ContainerNetwork Function CNF

Agent Agent Agent

FD.io VPPContainer Switch

Agent

Container Networking

Networking Plugin

CNF

LIGATOContiv

Cloud-native Network Micro-ServicesFor Native Cloud Network Services

Production-GradeContainer Orchestration

Cloud-native Network FunctionOrchestration

Containerized FastData Input/ Output

Performance-CentricContainer Networking

Enabling Production-Grade Native Cloud Network Services at Scale

MK GH

Kubelet

CNI

tapv2/veth

Contiv-VPP vswitch

Agent

PodPodPod

VPP …

K8s Master

Data Centre Fabric

App

Kernel Host stack

Legacy Apps K8s State Reflector

Contiv-VPPEtcd

Kubelet

CNI

tapv2/veth

Contiv-VPP vswitch

Agent

PodPodPod

VPP

App

Kernel Host stack

High PerformanceApps

PodPodPod

Envoy Sidecar App

VPP TCPStack

PodPodPod

High PerformanceApps

Envoy SidecarApp

VPP TCPStack

memif

Legacy Apps

PodPodPod

CNF

memif

Cloud-Native NFs

PodPodPod

CNF

Cloud-Native NFs

K8s policy & state distribution

Contiv-VPP Architecture

GH GH

TopologyPlacement

(K8s)Rendering

NF1 NF2 NF3

Logical RepresentationIngress Network

Ingress Classifier

Egress Network

Egress Classifier

IngressRouter

EgressRouter

Host

VPP Vswitch

CNF

VPP

10.1.0.127

CNF1

VPP

CNF2

VPP

…Server

Vswitch VPP

CNF

VPP

CNF

VPP

CNF3

VPP

Overlay Tunnel

Physical Representation

Overlay Tunnel Overlay Tunnel

Ingress Classifier Egress Classifier

Service Function Chaining with Ligato

GH GH

Kubelet

CNICRI

tapv2/veth

Contiv-VPP vswitch

Agent

PodPodPodPodPod

Pod

VPP

Data Centre Fabric

High PerformanceApps

Sidecar Proxy App

VPP TCPStack

App

Kernel Host stack

Legacy Apps

• Kubernetes does not provide a way to stitch micro-services together today• Ligato enables you to wire the data plane together into a service topology• Network functions can now become part of the service topology• Dedicated Telemetry Engine in VPP to enable closed-loop control• Offload functions to NIC but via vSwitch in host memory

Contiv-VPP Etcd

K8s Master

Contiv-VPPNetmaster

DefineTopology

LigatoController

DefineServices

DefineTopology

Ligato – Cloud-native NFs (CNFs)

Smarts in NICs

TelemetryEngine

Back Propagation LoopFor Reactive Placement/Rsrc Mgmt

PodPodPod

memif

Cloud-Native VNFs

Agent

VPP

GH GH

“Without data, you're just another person with an opinion.” ― W. Edwards Deming

§ Discover the limits and know them

§ Assess based on externally measured data and behavior (black-box)

§ Guide benchmarking by good understanding of the whole system (white-box)§ Provide a feedback loop to hardware and software engineering

Open Source Benchmarking – Guiding Principles

“One can’t violate the laws of physics, but one can ’stretch’ them..”

MK MK

• Multi-Platform/-Vendor• Intel & ARM (WiP)

• Packet Throughput & Latency• Non-Drop & Partial Drop Rates

• Data Plane Workloads• FD.io VPP• DPDK L3fwd, Testpmd

• Scaling• Single-, Multi-Core• MACs, IPs, Flows, ACLs etc.

• Performance Test Suites (#s)• L2: 58 • L3 (IPv4 / IPv6): 63• VM vhostuser: 26• Containers memif: 10• Crypto: 13• SRv6: 3• Total: 173

FD.io CSIT-CPL

https://docs.fd.io/csit/rls1801/report/index.html

DPD

KFD

.io V

PP

0.00 5.00 10.00 15.00 20.00 25.00 30.00

10ge2p1x520-eth-l2xcbase-testpmd

10ge2p1x520-ethip4-ip4base-l3fwd

10ge2p1x520-eth-l2xcbase

10ge2p1x520-eth-l2bdbasemaclrn

10ge2p1x520-ethip4-ip4base

10ge2p1x520-ethip4-ip4scale2m

Mpps Throughput

1T/1C x520 Packet Throughput Tests *

18.01 17.10 17.07

* Selection of testcases from the FD.io CSIT 18.01, 17.10 and 17.07 reports

Per release test and performance reports

Breath (# of test cases), Depth (of measurement) and Repeatability (every release, repeatable locally)

Benchmarking Data and Public References:

MK MK

Newer data available !!

https://docs.fd.io/csit/rls1807/report/index.html

FD.io CSIT-18.07: Packet Throughput Results

Baseline

Source: https://docs.fd.io/csit/rls1807/report/

200k of /32hFIB prefixes

20k of /32hFIB prefixes

2M of /32hFIB prefixes

IPv4 Routing (ip4)

19 Mpps

17 Mpps

18 Mpps

16 Mpps

18 Mpps

14 Mpps

16 Mpps

12 Mpps

L2 Switching with MAC Learning (l2bd)

Baseline

100k of /48L2FIB MACs

10k of /48L2FIB MACs

1M of /48L2FIB MACs

MK MK

1M of /48L2FIB MACs

Source: https://docs.fd.io/csit/rls1807/report/

FD.io CSIT-18.07: Throughput Speedup Results

• Predictable performance• Linear scaling with cores• Follows Amdahl’s Law

Baseline200k of /32hFIB prefixes

20k of /32hFIB prefixes

2M of /32hFIB prefixes

Baseline 100k of /48L2FIB MACs

10k of /48L2FIB MACs

VPP Multi-Core SpeedupProperties:

* Capped by 14.88 Mpps 10GE 64B link rate limit

* ** *

* ** *

* *

*

**

VPP: Multi-Core Speedup Properties Source: https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf

• Predictable performance• Linear scaling with cores• Follows Amdahl’s Law

VPP Multi-Core SpeedupProperties:

Packet Vectors are Good for You !

[1] https://www.netgate.com/products/tnsr/[2] https://www.alibabacloud.com/blog/network-service-optimization-with-the-vpp-platform_593985

MK MK

Alibaba: Network Service Optimization with the VPP Platform

Netgate: TNSR, hw appliances

Netgate shipping product(s) [1] Alibaba [2]

Baremetal Data Plane Performance Limit FD.io benefits from increased Processor I/O

YESTERDAY

Intel® Xeon® E5-2699v422 Cores, 2.2 GHz, 55MB Cache

Network I/O: 160 GbpsCore ALU: 4-wide parallel µopsMemory: 4-channels 2400 MHzMax power: 145W (TDP)

1

2

3

4

Socket 0

BroadwellServer CPU

Socket 1

BroadwellServer CPU

2

DD

R4

QPI

QPI

4

2

DD

R4

DD

R4

PC

Ie PC

Ie

PC

Ie

x8 50GE

x16 100GE

x16 100GE

3

1

4

PC

Ie PC

Ie

x8 50GE

x16 100GE

Ethernet

1

3DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

SATA

B IOS

PCH

Intel® Xeon® Platinum 816824 Cores, 2.7 GHz, 33MB Cache

TODAY

Network I/O: 280 GbpsCore ALU: 5-wide parallel µopsMemory: 6-channels 2666 MHzMax power: 205W (TDP)

1

2

3

4

Socket 0

Skylake Server CPU

Socket 1

Skylake Server CPU

UPI

UPI

DDR4 DDR4

DDR4

PC

Ie

PC

Ie

PC

Ie PC

Ie

PC

Ie

PC

Ie

x8 50GE

x16 100GE

x8 50GE

x16 100GE

x16 100GE

SATA

B IOS

2

4

2

1

3

1

4

3

x8 50GE

DDR4

PC

Ie

x8 40GE

LewisburgPCH

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

DD

R4

0 200 400 600 800 1000 1200

160

280

320

560

640

Server[1 Socket]

Server[2 Sockets]

Server2x [2 Sockets]

+75%

+75%

PCle Packet Forwarding Rate [Gbps]

Intel® Xeon® v3, v4 Processors Intel® Xeon® Platinum 8180 Processors

1,120*Gbps+75%

* On compute platforms with all PCIe lanes from the Processors routed to PCIe slots.

Breaking the Barrier of Software Defined Network Services1 Terabit Services on a Single Intel® Xeon® Server !

FD.io Takes Full Advantage of Faster Intel® Xeon® Scalable Processors

No Code Change Required

https://goo.gl/UtbaHy

MK MK

Internet Mega Trends – ..

Portability and Efficiency

Scalability and Self-Healing

Open Source Platforms

Cloud Native Designs

Software Defined Networking

Cloud

NFV SDN

Portability and Efficiency

Scalability and Self-Healing

Open Source Platforms

Cloud Native Designs

Software Defined Networking

Cloud

NFV SDN

Internet Mega Trends – Being Addressed ..

32

Scalability and Self-healing

Portability and Efficiency

Software Defined Networking

Open Source Platforms

Cloud Native Designs

SOFTWARE DEFINED NETWORKING

CLOUD NETWORK SERVICES

LINUX FOUNDATION

PORTABILITY AND EFFICIENCY

SCALABILITY and SELF-HEALING

MK MK

Cloud

NFV SDNInternet Mega Trends – Being Addressed ..

High Performance Cloud-Native NetworkingK8s Unleashing FD.io

THANK YOU !

MK MK GH

ReferencesFD.io VPP, CSIT and related projects

• VPP: https://wiki.fd.io/view/VPP

• CSIT-CPL: https://wiki.fd.io/view/CSIT

• pma_tools - https://wiki.fd.io/view/Pma_tools

Benchmarking Methodology

• Kubecon Dec-2017, Benchmarking and Analysis.., https://wiki.fd.io/view/File:Benchmarking-sw-data-planes-Dec5_2017.pdf

• “Benchmarking and Analysis of Software Network Data Planes” by M. Konstantynowicz, P. Lu, S.M. Shah, https://fd.io/resources/performance_analysis_sw_data_planes.pdf

Opportunities to ContributeWe invite you to Participate in FD.io• Get the Code, Build the Code, Run the

Code• Try the vpp user demo• Install vpp from binary packages

(yum/apt)• Read/Watch the Tutorials• Join the Mailing Lists• Join the IRC Channels• Explore the wiki• Join FD.io as a member

35

Thank you!