High Performance Cloud-native Networking · 2019-12-21 · High Performance Cloud-native Networking...
High Performance Cloud-native Networking: K8s Unleashing FD.io
Giles Heron, Principal Engineer, Cisco
Maciek Konstantynowicz, Distinguished Engineer, Cisco; FD.io CSIT Project Lead
Jerome Tollet, Distinguished Engineer, Cisco
DISCLAIMERS
• 'Mileage May Vary'
  • Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your opinion and investment of any resources. For more complete information about open source performance and benchmark results referred to in this material, visit https://wiki.fd.io/view/CSIT and/or https://docs.fd.io/csit/rls1807/report/.
• Trademarks and Branding
  • This is open-source material. Commercial names and brands may be claimed as the property of others.
Internet Mega Trends – ..
Portability and Efficiency
Scalability and Self-Healing
Open Source Platforms
Cloud Native Designs
Software Defined Networking
[Figure labels: Cloud, NFV, SDN]
DO YOU REMEMBER .. ?
[Figure: Deutsche Telekom slide "THE SOFTWARE DEFINED OPERATOR" (12-NOV-2013, © Deutsche Telekom AG, 2013), with labels NFV, SDN, IP]
5 Pillars of Next Generation Software Data Planes
• Blazingly Fast
  • Process the massive explosion of East-West traffic
  • Process increasing North-South traffic
• Truly Extensible
  • Foster pace of innovation in cloud-native networking
  • No compromise on performance (zero-tolerance)
• Software First
  • Cloud means running everywhere
  • Cloud means hardware and physical infra agnostic
• Predictable Performance
  • Dataplane performance must be deterministic
  • Predictable for a number of VMs, Containers, virtual topology and (E-W, N-S) traffic matrix
• Measurable
  • Counters everywhere to count everything for detailed cross-layer operation and efficiency monitoring
  • Enables feedback loop to drive optimizations
FD.io VPP meets these challenges. How can one use it in large scale Cloud-native networks?
The Way Networks are Deployed and Used… has Changed…
[Figure: evolution over time of three network domains.
Corporate LAN/WAN: "80:20 rule" (LAN, WAN); Intranets & Internet (LAN, WAN, Internet, WWW); SD-WAN & "BeyondCorp" (WiFi, Internet, Cloud Provider).
Internet: Tiered Transit & Private Peering (Tier1 A, B, C; Tier2 D, E, F); Internet exchanges & public peering (Network 1, 2, 3); Telco/Cable Access & OTT/CDN Content (Telco 1, Telco 2, MSO 1; Cloud Provider 1 and Cloud Provider 2 Global Backbones).
Data-Center: Core/Distr/Access, VLAN based (VLAN 1, VLAN 2, VLAN n); Spine/Leaves & L3 Core/L2 Access (VLAN 1 Core; VLAN 2, VLAN 2'); L3 Fabric/SW Overlay & Virt Networking (Layer 3 Fabric).]
Aside: A Trip Down Memory Lane (Transporting Data vs. Processing Data)
• Year 2012
  • Internet service provider comment at IETF: processing bits is cheaper than transporting bits. Computing and networking: networking is becoming a first-order citizen on compute platforms.
• Year 2013
  • RIPE67 Terastream has been fixing the cost of transporting bits: 96 coherent 100GE lambdas per fibre span. Transporting is getting cheaper, which challenges the compute part again.
  • More bandwidth delivered to Data Centres
  • Most/all network services in Data Centres
https://ripe67.ripe.net/presentations/131-ripe2-2.pdf
https://ripe67.ripe.net/archives/video/3/
Remember 1965 "Moore's Law" – Is It Still Applicable? Yes, It Surely Is ..
[Figure: Calculations per Second per $1,000 (log scale, 10^-6 to 10^10) plotted over time (1600 to 2020). Era labels: Mechanical (popular pre 1900), Electromechanical, Solid-state, Relay, Vacuum Tube, Transistor, Integrated Circuit, and the Era of Modern µProcessors (2010 to 2020; Xeon® Skylake, popular in 2017). Annotations: Binary Code, Boolean Logic, Moore's Law 1965.]
Source: Ray Kurzweil, "The Singularity Is Near: When Humans Transcend Biology", Page 67, The Duckworth Publishers 2009. Data points between 1600 and 1900, and after 2000 represent Maciek's perspective and approximations.
“Ramble On ..”
Resources to Get Performance
1) Processor and CPU cores: for performing packet processing operations
2) Memory bandwidth: for moving data (packets, lookup) and instructions (packet processing)
3) I/O bandwidth: for moving packets to/from NIC interfaces
4) Inter-socket bandwidth: for handling inter-socket operations
Processing Packets: How to Use Compute ..
[Figure: two-socket server. Socket 0 and Socket 1 Skylake Server CPUs linked by UPI; DDR4 memory channels on each socket; PCIe x16 slots feeding 100GE NICs, annotated "280 Gbps @ 1,000 Bytes ETH frames" once per socket; PCH attached via DMI (SATA, BIOS). Markers 1 to 4 identify the four resources: CPU cores, memory bandwidth, I/O bandwidth, inter-socket bandwidth.]
\[ \text{CyclesPerPacket}\,[\text{ClockCycles}] = \frac{\#\text{Instructions}}{\text{Packet}} \times \frac{\#\text{Cycles}}{\text{Instruction}} \]
\[ \text{Throughput}\,[\text{pps}] = \frac{1}{\text{Packet\_Processing\_Time}\,[\text{sec}]} = \frac{\text{CPU\_freq}\,[\text{Hz}]}{\text{Cycles\_per\_Packet}} \]
\[ \text{Throughput}\,[\text{bps}] = \text{Throughput}\,[\text{pps}] \times \text{Packet\_Size}\,[\text{bits}] \]
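The relations on this slide chain together: instructions per packet and cycles per instruction give cycles per packet, the core clock then gives packets per second, and the frame size gives bits per second. A minimal Python sketch follows. The function names are ours, and the 2.5 GHz clock, 200 instructions per packet and 0.5 cycles per instruction used in the usage note are hypothetical inputs, not measurements from the talk.

```python
def cycles_per_packet(instructions_per_packet, cycles_per_instruction):
    """CyclesPerPacket = (#Instructions / Packet) * (#Cycles / Instruction)."""
    return instructions_per_packet * cycles_per_instruction


def throughput_pps(cpu_freq_hz, cpp):
    """Throughput [pps] = CPU_freq [Hz] / Cycles_per_Packet, for one core."""
    return cpu_freq_hz / cpp


def throughput_bps(pps, packet_size_bytes):
    """Throughput [bps] = Throughput [pps] * Packet_Size [bits]."""
    return pps * packet_size_bytes * 8


def cycle_budget(cpu_freq_hz, target_bps, packet_size_bytes):
    """Cycles available per packet if a single core had to sustain target_bps.

    Counts frame bytes only (no preamble or inter-frame gap).
    """
    target_pps = target_bps / (packet_size_bytes * 8)
    return cpu_freq_hz / target_pps
```

With the hypothetical inputs, `cycles_per_packet(200, 0.5)` gives 100 cycles, `throughput_pps(2.5e9, 100)` gives 25 Mpps, and at 64-byte packets that is 12.8 Gbps. Read the other way round, the 280 Gbps at 1,000-byte frames shown in the figure corresponds to 35 Mpps, which is why the load has to be spread over many cores.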
Moore's Law in Action
Processing Packets: What Improves in Compute ..
Resources to Get Performance
1) Processor and CPU cores
   • FrontEnd: faster instruction decoder (4- to 5-wide)
   • BackEnd: faster L1 cache, bigger L2 cache, deeper out-of-order (OOO) execution
   • Uncore: move from ring to X-Y fabric mesh
2) Memory bandwidth: ~50% increase: channels (4 to 6), speed (DDR-2666)
3) I/O bandwidth: >50% increase: PCIe lanes (40 to 48), re-designed IO blocks
4) Inter-socket bandwidth: ~60% increase: QPI to UPI (2x to 3x), interface speed (9.6 to 10.4 GigTrans/sec)
[Figure: the same two-socket Skylake server diagram (UPI, DDR4, PCIe x16, 100GE NICs, 280 Gbps @ 1,000 Bytes ETH frames per socket), with markers 1 to 4 for the four resources.]
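Some of the percentages quoted for these resources can be checked from the link, channel and lane counts given on the slide. The sketch below is a rough check under our own simplifying assumptions: inter-socket bandwidth is taken as links times transfer rate, and memory and I/O are compared by channel and lane count only.

```python
def relative_increase(old, new):
    """Fractional increase going from old to new (0.5 means +50%)."""
    return (new - old) / old


# Inter-socket: 2 QPI links at 9.6 GT/s versus 3 UPI links at 10.4 GT/s.
inter_socket = relative_increase(2 * 9.6, 3 * 10.4)

# Memory: channel count only (4 to 6); the DDR speed change is not included.
memory_channels = relative_increase(4, 6)

# I/O: PCIe lane count only (40 to 48).
pcie_lanes = relative_increase(40, 48)
```

The inter-socket figure comes out at 62.5%, in line with the quoted ~60%, and the channel count alone gives the quoted ~50% for memory. Lane count alone gives 20%, so the quoted >50% for I/O cannot come from the lanes by themselves; the slide also lists re-designed IO blocks.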
FD.io VPP – Vector Packet Processing
Compute-Optimised SW Networking Platform
Packet Processing Software Platform
• High performance
• Linux user space
• Runs on compute CPUs, and "knows" how to run them well!
Shipping at volume in server & embedded products
[Figure: VPP stack running on Bare-Metal / VM / Container: Network IO, Packet Processing, Dataplane Management Agent.]
FD.io VPP – The "Magic" of Vectors
Compute Optimized SW Network Platform
1. Packet processing is decomposed into a directed graph of nodes …
2. … packets move through graph nodes in vector …
3. … graph nodes are optimized to fit inside the instruction cache …
4. … packets are pre-fetched into the data cache.
Makes use of modern Intel® Xeon® Processor micro-architectures. Instruction cache & data cache always hot → Minimized memory latency and usage.
[Figure: a vector of packets (Packet 0 to Packet 10) moving through the graph on a microprocessor, with the Instruction Cache (3) and Data Cache (4) highlighted. Graph nodes shown: dpdk-input, af-packet-input, vhost-user-input, ethernet-input, ip4-input, ...-no-checksum, ip6-input, l2-input, arp-input, cdp-input, lldp-input, mpls-input, ip4-lookup*, ip4-lookup-multicast, ip4-load-balance, ip4-midchain, ip4-rewrite-transit, mpls-policy-encap, interface-output.]
* Each graph node implements a "micro-NF", a "micro-NetworkFunction" processing packets.
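The idea of moving a whole vector of packets through one graph node before touching the next node can be illustrated with a toy model. This is a sketch only: it is not VPP code or the VPP API, the node functions and the `FIB` table are hypothetical, and it only borrows a few node names from the figure. Each node function returns the name of the next node for a packet, and `run_graph` runs a node over its entire batch before dispatching the resulting sub-batches.

```python
from collections import defaultdict, deque

FIB = {"10.0.0.1": "eth1"}  # toy exact-match forwarding table (hypothetical)


def ethernet_input(pkt):
    # Dispatch on EtherType; this toy graph only handles IPv4 (0x0800).
    return "ip4-input" if pkt["ethertype"] == 0x0800 else "drop"


def ip4_input(pkt):
    # Decrement TTL and drop packets whose TTL has expired.
    pkt["ttl"] -= 1
    return "ip4-lookup" if pkt["ttl"] > 0 else "drop"


def ip4_lookup(pkt):
    # Pick an output interface; unknown destinations are dropped.
    pkt["out_if"] = FIB.get(pkt["dst"])
    return "interface-output" if pkt["out_if"] else "drop"


GRAPH = {
    "ethernet-input": ethernet_input,
    "ip4-input": ip4_input,
    "ip4-lookup": ip4_lookup,
}


def run_graph(graph, start, vector):
    """Push a vector of packets through the graph, one node at a time.

    Returns (finished, visits): packets grouped by the terminal node they
    reached, and the (node, batch size) pairs in processing order.
    """
    queue = deque([(start, list(vector))])
    finished = defaultdict(list)
    visits = []
    while queue:
        name, batch = queue.popleft()
        if name not in graph:  # terminal node such as interface-output or drop
            finished[name].extend(batch)
            continue
        visits.append((name, len(batch)))
        next_batches = defaultdict(list)
        for pkt in batch:  # the same node code runs over the whole batch
            next_batches[graph[name](pkt)].append(pkt)
        queue.extend(next_batches.items())
    return dict(finished), visits
```

Running one node over the whole batch is what keeps that node's instructions hot in the instruction cache, which is the point of steps 2 and 3 on the slide. The toy model does not attempt to show prefetching into the data cache.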
Cloud-native Network Micro-Services
For Native Cloud Network Services
• Production-Grade Container Orchestration
• Cloud-native Network Function Orchestration (Network Function and Network Topology Orchestration)
• Containerized Fast Data Input/Output (Containerized Network Data Plane)
• Performance-Centric Container Networking
Enabling Production-Grade Native Cloud Network Services at Scale
[Figure: component labels: Kubernetes API Proxies, Kubelet, CNI, CRI; Contiv Netmaster, Networking Plugin, Container Networking; LIGATO SFC Controller with Service Policy, Service Topology, Lifecycle; FD.io VPP Container Switch with Agent; Container Network Function (CNF) instances, each with an Agent.]
Contiv-VPP Architecture
[Figure: Kubernetes cluster on a Data Centre Fabric. K8s Master with Contiv-VPP Etcd and K8s State Reflector, providing K8s policy & state distribution. Each node runs Kubelet with CNI and a Contiv-VPP vswitch (VPP plus Agent). Pod types shown: Legacy Apps (App on the Kernel Host stack), High Performance Apps (App with Envoy Sidecar on the VPP TCP Stack), and Cloud-Native NFs (CNF). Interface types shown: tapv2/veth and memif.]
Service Function Chaining with Ligato
[Figure: Logical Representation: Ingress Network, Ingress Classifier, NF1, NF2, NF3, Egress Classifier, Egress Network. Steps from logical to physical: Topology, Placement (K8s), Rendering. Physical Representation: Ingress Router, a Host (VPP Vswitch, 10.1.0.127) and a Server (Vswitch VPP) running VPP-based CNFs (CNF1, CNF2, CNF3, …), Egress Router, with Ingress and Egress Classifiers and Overlay Tunnels between them.]
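The step from the logical chain to the physical representation can be pictured as a small rendering function: given the ordered chain and a placement of each network function on a host, decide how each adjacent pair is connected. The sketch below is purely illustrative and is not Ligato's API. The function name, the host names, and the rule "same host uses memif, different hosts use an overlay tunnel" are our assumptions based on the labels in the figure.

```python
def render_chain(chain, placement):
    """Return (src, dst, link_type) for each adjacent pair in the chain.

    chain:     ordered list of network function names
    placement: dict mapping each network function to the host it runs on
    Assumed rule: same host -> "memif", different hosts -> "overlay-tunnel".
    """
    links = []
    for src, dst in zip(chain, chain[1:]):
        same_host = placement[src] == placement[dst]
        links.append((src, dst, "memif" if same_host else "overlay-tunnel"))
    return links


# Hypothetical placement loosely mirroring the figure:
# CNF1 and CNF2 share one host, CNF3 runs on another.
example_chain = ["CNF1", "CNF2", "CNF3"]
example_placement = {"CNF1": "host", "CNF2": "host", "CNF3": "server"}
example_links = render_chain(example_chain, example_placement)
```

Keeping the chain (what is connected) separate from the placement (where it runs) means the same logical topology can be re-rendered when the scheduler moves a function to another host.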
Ligato – Cloud-native NFs (CNFs)
• Kubernetes does not provide a way to stitch micro-services together today
• Ligato enables you to wire the data plane together into a service topology
• Network functions can now become part of the service topology
• Dedicated Telemetry Engine in VPP to enable closed-loop control
• Offload functions to NIC but via vSwitch in host memory
[Figure: K8s Master with Contiv-VPP Etcd and Contiv-VPP Netmaster (Define Topology); Ligato Controller (Define Services, Define Topology); node with Kubelet (CNI, CRI), Contiv-VPP vswitch (VPP, Agent, tapv2/veth, memif) on the Data Centre Fabric; pods for Legacy Apps (App, Kernel Host stack), High Performance Apps (App, Sidecar Proxy, VPP TCP Stack) and Cloud-Native VNFs (VPP, Agent); Smarts in NICs; Telemetry Engine with a Back Propagation Loop For Reactive Placement/Rsrc Mgmt.]
Open Source Benchmarking – Guiding Principles
• Discover the limits and know them
• Assess based on externally measured data and behavior (black-box)
• Guide benchmarking by good understanding of the whole system (white-box)
• Provide a feedback loop to hardware and software engineering
"One can't violate the laws of physics, but one can 'stretch' them.."
Benchmarking Data and Public References: FD.io CSIT-CPL
https://docs.fd.io/csit/rls1801/report/index.html
• Multi-Platform/-Vendor
  • Intel & ARM (WiP)
• Packet Throughput & Latency
  • Non-Drop & Partial Drop Rates
• Data Plane Workloads
  • FD.io VPP
  • DPDK L3fwd, Testpmd
• Scaling
  • Single-, Multi-Core
  • MACs, IPs, Flows, ACLs etc.
• Performance Test Suites (#s)
  • L2: 58
  • L3 (IPv4 / IPv6): 63
  • VM vhostuser: 26
  • Containers memif: 10
  • Crypto: 13
  • SRv6: 3
  • Total: 173
Per release test and performance reports
Breadth (# of test cases), Depth (of measurement) and Repeatability (every release, repeatable locally)
[Figure: 1T/1C x520 Packet Throughput Tests*, Mpps Throughput (axis 0.00 to 30.00), bars for releases 18.01, 17.10 and 17.07. DPDK: 10ge2p1x520-eth-l2xcbase-testpmd, 10ge2p1x520-ethip4-ip4base-l3fwd. FD.io VPP: 10ge2p1x520-eth-l2xcbase, 10ge2p1x520-eth-l2bdbasemaclrn, 10ge2p1x520-ethip4-ip4base, 10ge2p1x520-ethip4-ip4scale2m.]
* Selection of testcases from the FD.io CSIT 18.01, 17.10 and 17.07 reports
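Non-drop rate (NDR) and partial drop rate (PDR) are both "highest offered load at which loss stays within a tolerance" measurements, with the tolerance at zero for NDR. One generic way to find such a rate is a binary search over offered load. The sketch below shows that technique only. It is not presented as the CSIT search algorithm, and `trial`, `make_toy_dut` and all numeric values are hypothetical stand-ins for a real traffic-generator run.

```python
def find_max_rate(trial, line_rate_pps, loss_tolerance=0.0, precision_pps=10_000.0):
    """Binary-search the highest offered rate whose loss ratio is within tolerance.

    trial(rate_pps) must return the measured loss ratio (0.0 to 1.0).
    loss_tolerance=0.0 gives a non-drop rate; a small positive value gives
    a partial drop rate.
    """
    if trial(line_rate_pps) <= loss_tolerance:
        return line_rate_pps  # the device keeps up with line rate
    lo, hi = 0.0, line_rate_pps  # lo always passes, hi always fails
    while hi - lo > precision_pps:
        mid = (lo + hi) / 2
        if trial(mid) <= loss_tolerance:
            lo = mid
        else:
            hi = mid
    return lo


def make_toy_dut(capacity_pps):
    """Hypothetical device model: forwards up to capacity_pps, drops the rest."""
    def trial(offered_pps):
        if offered_pps <= capacity_pps:
            return 0.0
        return (offered_pps - capacity_pps) / offered_pps
    return trial
```

For example, `find_max_rate(make_toy_dut(9.3e6), 14.88e6)` converges to within 10 kpps below the toy device's 9.3 Mpps capacity, and passing `loss_tolerance=0.005` returns a slightly higher rate. Real trials are noisy, so a practical search also needs a trial duration and a repeat policy, which this sketch leaves out.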
Newer data available !!
https://docs.fd.io/csit/rls1807/report/index.html
FD.io CSIT-18.07: Packet Throughput Results
[Figure: two packet throughput charts. IPv4 Routing (ip4): Baseline, 20k, 200k and 2M of /32 hFIB prefixes. L2 Switching with MAC Learning (l2bd): Baseline, 10k, 100k and 1M of /48 L2FIB MACs. Value labels on the charts, in extraction order: 19, 17, 18, 16, 18, 14, 16, 12 Mpps.]
Source: https://docs.fd.io/csit/rls1807/report/
FD.io CSIT-18.07: Throughput Speedup Results
[Figure: multi-core throughput speedup charts for IPv4 Routing (Baseline, 20k, 200k and 2M of /32 hFIB prefixes) and L2 Switching with MAC Learning (Baseline, 10k, 100k and 1M of /48 L2FIB MACs); data points marked * are capped by the link rate.]
VPP Multi-Core Speedup Properties:
• Predictable performance
• Linear scaling with cores
• Follows Amdahl's Law
* Capped by 14.88 Mpps 10GE 64B link rate limit
Source: https://docs.fd.io/csit/rls1807/report/
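The 14.88 Mpps cap follows from Ethernet framing: each 64-byte frame occupies an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). A short sketch of the arithmetic, with a function name of our own choosing:

```python
ETHERNET_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)


def line_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second on an Ethernet link for a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire
```

`line_rate_pps(10e9, 64)` is about 14.88 million frames per second, so a test that reaches this value is measuring the link rather than the software.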
VPP: Multi-Core Speedup Properties
Source: https://fd.io/wp-content/uploads/sites/34/2018/01/performance_analysis_sw_data_planes_dec21_2017.pdf
• Predictable performance
• Linear scaling with cores
• Follows Amdahl's Law
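Amdahl's Law relates speedup on n cores to the fraction p of the work that can run in parallel: speedup = 1 / ((1 - p) + p / n). A minimal sketch, where the parallel fractions used in the note are illustrative values and not measured VPP figures:

```python
def amdahl_speedup(cores, parallel_fraction):
    """Speedup on `cores` cores when `parallel_fraction` of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

With `parallel_fraction=1.0` the speedup equals the core count, which is the linear scaling the slide reports. With 0.9, four cores give only about 3.08x, which shows how sensitive scaling is to any shared or serialized work.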
Packet Vectors are Good for You !
• Netgate shipping product(s) [1]: TNSR, hw appliances
• Alibaba [2]: Network Service Optimization with the VPP Platform
[1] https://www.netgate.com/products/tnsr/
[2] https://www.alibabacloud.com/blog/network-service-optimization-with-the-vpp-platform_593985
Baremetal Data Plane Performance Limit
FD.io benefits from increased Processor I/O
YESTERDAY
Intel® Xeon® E5-2699v4: 22 Cores, 2.2 GHz, 55MB Cache
• Network I/O: 160 Gbps
• Core ALU: 4-wide parallel µops
• Memory: 4-channels 2400 MHz
• Max power: 145W (TDP)
[Figure: two-socket Broadwell server. Socket 0 and Socket 1 CPUs linked by QPI; DDR4 memory channels; PCIe slots (x8 50GE, x16 100GE); Ethernet; PCH with SATA and BIOS. Markers 1 to 4 identify the four resources.]
TODAY
Intel® Xeon® Platinum 8168: 24 Cores, 2.7 GHz, 33MB Cache
• Network I/O: 280 Gbps
• Core ALU: 5-wide parallel µops
• Memory: 6-channels 2666 MHz
• Max power: 205W (TDP)
[Figure: two-socket Skylake server. Socket 0 and Socket 1 CPUs linked by UPI; DDR4 memory channels; PCIe slots (x8 50GE, x16 100GE, x8 40GE); Lewisburg PCH with SATA and BIOS. Markers 1 to 4 identify the four resources.]
PCIe Packet Forwarding Rate [Gbps] (chart axis 0 to 1200), Intel® Xeon® v3, v4 Processors versus Intel® Xeon® Platinum 8180 Processors:
• Server [1 Socket]: 160 versus 280 (+75%)
• Server [2 Sockets]: 320 versus 560 (+75%)
• Server 2x [2 Sockets]: 640 versus 1,120* (+75%)
* On compute platforms with all PCIe lanes from the Processors routed to PCIe slots.
Breaking the Barrier of Software Defined Network Services
1 Terabit Services on a Single Intel® Xeon® Server !
FD.io Takes Full Advantage of Faster Intel® Xeon® Scalable Processors
No Code Change Required
https://goo.gl/UtbaHy
Internet Mega Trends – Being Addressed ..
Portability and Efficiency
Scalability and Self-Healing
Open Source Platforms
Cloud Native Designs
Software Defined Networking
[Figure labels: Cloud, NFV, SDN; SOFTWARE DEFINED NETWORKING, CLOUD NETWORK SERVICES, LINUX FOUNDATION, PORTABILITY AND EFFICIENCY, SCALABILITY and SELF-HEALING]
References
FD.io VPP, CSIT and related projects
• VPP: https://wiki.fd.io/view/VPP
• CSIT-CPL: https://wiki.fd.io/view/CSIT
• pma_tools: https://wiki.fd.io/view/Pma_tools
Benchmarking Methodology
• Kubecon Dec-2017, Benchmarking and Analysis.., https://wiki.fd.io/view/File:Benchmarking-sw-data-planes-Dec5_2017.pdf
• "Benchmarking and Analysis of Software Network Data Planes" by M. Konstantynowicz, P. Lu, S.M. Shah, https://fd.io/resources/performance_analysis_sw_data_planes.pdf
Opportunities to Contribute
We invite you to Participate in FD.io
• Get the Code, Build the Code, Run the Code
• Try the vpp user demo
• Install vpp from binary packages (yum/apt)
• Read/Watch the Tutorials
• Join the Mailing Lists
• Join the IRC Channels
• Explore the wiki
• Join FD.io as a member
Thank you!