NUMA-aware DPDK-OVS for High Performance NFV Platform
Kazuki Hyoudou, Fujitsu Laboratories Ltd.
Copyright 2016 FUJITSU LABORATORIES LIMITED
What is NFV?
Network Appliances
  Proprietary hardware
  Vendor-specific API/CLI
  High CAPEX & OPEX
  Long lead time to update
Network Functions Virtualization (NFV)
  Network functions implemented in software
  Standard hardware
Refer to http://portal.etsi.org/NFV/NFV_White_Paper2.pdf
Important aspects for NFV Platform
Use of commodity (non-proprietary) hardware
Standard I/F for VNFs, such as virtio
I/O throughput
The number of I/O ports (Eth I/F)
The number of VNFs in a BOX
etc.
To increase VNFs and I/O per BOX …
[Figure: a single-socket IA server (BOX): one CPU with 8 cores, its memory controller and memory (shared by the VMs and DPDK-OVS), an I/O controller, and 3 NICs; 4 VMs run on 4 of the cores.]
• Assumptions
  • a CPU has 8 cores and 3 I/O slots
  • 2 cores are used for the physical ports and 2 more cores for vhost-user
  • a VM occupies one core
• A BOX supports
  • 4 VMs (VNFs)
  • 3 NICs
To increase VNFs and I/O per BOX …
One of the easiest ways is to use a multi-socket server, but …
[Figure: a dual-socket IA server (BOX): two CPUs with 8 cores each, each with its own memory controller, memory (shared by its VMs and DPDK-OVS), I/O controller, and 3 NICs; 4 VMs run on each CPU.]
• Assumptions
  • a CPU has 8 cores and 3 I/O slots
  • 2 cores are used for the physical ports and 2 more cores for vhost-user
  • a VM occupies one core
• A BOX now supports
  • 8 VMs (up from 4)
  • 6 NICs (up from 3)
Performance Issue of OVS on NUMA system
If a VM is running on a NUMA node different from the node to which the NIC is connected, its throughput decreases significantly.
[Figure: two CPUs connected by QPI, each with its own memory controller, memory, and I/O controller; the NIC is attached to one of them. The NIC DMAs received packets into memory, and the OVS-DPDK pmd threads (one for the PHY port, one for vhost-user) copy the data into the VM's virtio VRING; when the VM runs on the other node, this copy is a remote memory access by the pmd thread (CPU) across QPI.]
Pre-test to Study the Issue
Measure the total throughput of 3 VNFs (L3 forwarder w/o DPDK) in the following two cases:
[Figure: two test configurations on a dual-socket system connected by QPI. OVS-DPDK runs pmd (PHY) threads on Cores #4-#5 and pmd (VIF) threads on Cores #6-#7, and the NIC is attached to CPU1. In Case 1 the three VMs run on CPU1 (Cores #1-#3); in Case 2 they run on CPU2 (Cores #9-#11).]
Case 1: all components are in the same NUMA node (no flows go through QPI)
Case 2: the VMs are running on the other node (flows go through QPI by CPU)
Result of Pre-test
Throughput drops by 30 to 40%!
(The maximum decrease is 43.5% in this test.)
Max. throughput (no-drop condition), normalized to Case 1 (%):

  pkt size (B)   Case1   Case2
  64             100.0    58.3
  72             100.0    57.1
  128            100.0    56.5
  256            100.0    60.6
  512            100.0    64.4
  768            100.0    66.7
  1024           100.0    76.0
  1280           100.0    84.7
  1500           100.0    88.0

[Chart: max. throughput (no-drop condition) in Gbit/s vs. packet size (64-1500 bytes) for Case 1 and Case 2; the largest drop is 43.5%, at 128-byte packets.]
• CPU: Xeon® E5-2667 v3 @3.20GHz, 8 Cores x2
• Memory: 128 GB (DDR4-2133 16G x4 x2)
• NIC: Intel® X710-DA2 (PCIe Gen3 x8)
• OS: CentOS 7.1
• OVS: v.2.4.1 w/ DPDK v.2.0.0
Analysis of the Issue
Consider the cause of the issue
The QPI link has enough bandwidth (307 Gbps bi-directional)
Remote memory access latency is high for data copies and lock operations
  • L3 cache: 10 ns, local memory: 70 ns, remote memory: 120 ns
Measure performance counters related to L3 cache miss events:
  Event Name                                    Case1      Case2
  MEM_LOAD_UOPS_RETIRED.L3_MISS               552,945  6,708,912
  MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM    554,079    644,671
  MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM        11    171,975
  MEM_LOAD_UOPS_LLC_MISS_RETIRED.REMOTE_HITM       69  5,585,607
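As a rough illustration only (not the tool used in this work), counters like these can also be read programmatically with the Linux perf_event_open(2) syscall. The sketch below counts generic last-level-cache read misses; the microarchitecture-specific events in the table (e.g. MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM) would instead require PERF_TYPE_RAW event codes taken from the Intel SDM.

  #include <linux/perf_event.h>
  #include <sys/ioctl.h>
  #include <sys/syscall.h>
  #include <string.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Sketch: count LL-cache read misses of this process while a workload runs. */
  int main(void)
  {
      struct perf_event_attr attr;
      memset(&attr, 0, sizeof(attr));
      attr.size = sizeof(attr);
      attr.type = PERF_TYPE_HW_CACHE;
      attr.config = PERF_COUNT_HW_CACHE_LL |
                    (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                    (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
      attr.disabled = 1;

      int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
      if (fd < 0) { perror("perf_event_open"); return 1; }

      ioctl(fd, PERF_EVENT_IOC_RESET, 0);
      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
      /* ... run the packet-forwarding workload here ... */
      ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

      long long misses = 0;
      read(fd, &misses, sizeof(misses));
      printf("LL-cache read misses: %lld\n", misses);
      close(fd);
      return 0;
  }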
The REMOTE_DRAM and REMOTE_HITM counts correspond to remote memory accesses after an L3 miss.
If remote memory access by the CPU is removed, can throughput be expected to increase?
Additional Test for Assumption
Remove remote memory access by the CPU by using DMA instead
[Figure: Case 3 setup. The NIC and its Eth ports are attached to CPU1's I/O controller, while all OVS-DPDK pmd threads (pmd (PHY) on Cores #12-#13, pmd (VIF) on Cores #14-#15) and the three VMs (Cores #9-#11) run on CPU2; received packets cross QPI by DMA.]
Case 3: only the NIC is attached to a NUMA node different from the one on which all software runs (flows go through QPI by DMA)
• OVS 2.4.1 is modified for this test as follows
  • the user can specify a NUMA node per vport
  • the mbuf pool is allocated on the specified node (see the sketch below)
  • the DMA buffer is mapped to the specified node if the vport is a DPDK port (physical port)
  • the receiving process of the vport is handled by a thread running on a core of the specified node
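For illustration only (not the actual patch), allocating a vport's mbuf pool on an explicitly chosen NUMA node can use the standard DPDK pool API; the function name, pool name, and sizes below are assumptions:

  #include <rte_mbuf.h>
  #include <rte_mempool.h>

  /* Sketch: create a vport's mbuf pool on a user-specified NUMA node so that
   * NIC DMA for that vport lands directly in the chosen node's memory.
   * The sizes are example values, not the ones used in this work. */
  static struct rte_mempool *
  create_pool_on_node(const char *name, int numa_id)
  {
      return rte_pktmbuf_pool_create(name,
                                     8192,   /* number of mbufs */
                                     256,    /* per-lcore cache size */
                                     0,      /* private area size */
                                     RTE_MBUF_DEFAULT_BUF_SIZE,
                                     numa_id /* socket (NUMA node) id */);
  }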
Result of Additional Test
Case 3 (flows go through QPI by DMA) achieves throughput as high as Case 1 (no flows go through QPI)!
[Chart: max. throughput (no-drop condition) in Gbit/s vs. packet size (64-1500 bytes) for Case 1, Case 2, and Case 3.]

Max. throughput (no-drop condition), normalized to Case 1 (%):

  pkt size (B)   Case1   Case2   Case3
  64             100.0    58.3   108.3
  72             100.0    57.1   107.1
  128            100.0    56.5   100.0
  256            100.0    60.6   103.0
  512            100.0    64.4   108.5
  768            100.0    66.7   102.5
  1024           100.0    76.0   104.2
  1280           100.0    84.7   102.0
  1500           100.0    88.0   100.0
Summary on Studying the Issue
"Flows go through QPI by CPU" (Case2) mainly influences
throughput degradation on NUMA system.
"Flows go through QPI by DMA" (Case3) achieves comparative
performance with "no flows go through QPI" (Case1).
11
Considering the case VMs are running on all NUMA-nodes,
DMA should write flows to the memory of the NUMA node
on which their destination VM is running
Trial Implementation for
NUMA-aware DPDK-OVS
Concept and Basic Design
Use the multiple-RXQ and MAC-filter features of a standard NIC
  Each NUMA node has an RXQ (associated with a HwQ) in its own memory
  Flows are forwarded to each node by the MAC filter according to the destination MAC address (a per-node polling sketch follows)
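A minimal sketch of the receive side of this design, assuming one pmd thread is pinned per NUMA node and that the Rx queue index equals that node's id (the hand-off to the VM's vhost-user port is elided; everything except the DPDK APIs is an assumption):

  #include <rte_ethdev.h>
  #include <rte_lcore.h>
  #include <rte_mbuf.h>

  #define BURST_SIZE 32

  /* Sketch: each pmd thread polls only the Rx queue whose mbuf pool lives on
   * its own NUMA node, so received data never has to cross QPI by CPU copy.
   * Assumes queue index == NUMA node id of the polling lcore. */
  static int
  numa_local_rx_loop(void *arg)
  {
      uint16_t port_id = *(uint16_t *)arg;
      uint16_t queue_id = (uint16_t)rte_lcore_to_socket_id(rte_lcore_id());
      struct rte_mbuf *pkts[BURST_SIZE];

      for (;;) {
          uint16_t n = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
          for (uint16_t i = 0; i < n; i++) {
              /* In OVS this would be handed to the vhost-user port of a VM
               * running on the same node; here we simply drop the packet. */
              rte_pktmbuf_free(pkts[i]);
          }
      }
      return 0;
  }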
[Figure: one Eth port with a MAC filter and two hardware queues (HwQ0, HwQ1). HwQ0 backs Port#1 RXQ#0 in CPU1's memory and HwQ1 backs Port#1 RXQ#1 in CPU2's memory; on each CPU a pmd (PHY) thread polls the local RXQ and a pmd (VIF) thread serves the local VM (VM1 or VM2) over vhost. Packets for VM1 go to RXQ0, packets for VM2 go to RXQ1, and so on.]
Trial Implementation w/ Intel® X710
Use VMDq and MAC-VLAN filter
[Figure: X710 PF with a MAC-VLAN filter steering frames to either the main VSI (#0) or the VMDq#1 VSI (#1); in OVS-DPDK, Port#1 RXQ#0 (main VSI) is mapped to NUMA#0 and Port#1 RXQ#1 (VMDq VSI) is mapped to NUMA#1.]
• The destination VSI can be selected per destination MAC address with the MAC-VLAN filter
• Packets with an unregistered MAC address go to the main VSI of the PF
  • only the destination MACs that should go to NUMA#1 need to be registered
  • broadcast frames are handled without extra configuration
• Some modifications to DPDK are needed
  • stock DPDK cannot configure the queues of a specific VMDq VSI, so we changed the i40e PMD to allow specifying the number of queues per VSI, including the main VSI (a sketch of the generic VMDq configuration follows)
VSI: Virtual Station Interface (a set of queues)
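For reference, the sketch below shows how VMDq-only Rx mode is requested through the stock DPDK ethdev API (DPDK 2.x macro names; newer releases prefix them with RTE_). This is the generic mechanism only, not the i40e PMD modification described above, and the pool count is an assumption:

  #include <string.h>
  #include <rte_ethdev.h>

  /* Sketch: request VMDq-only Rx mode so the NIC steers frames into per-pool
   * (per-VSI) queue groups. The port is configured afterwards with
   * rte_eth_dev_configure(); the number of Rx queues passed there must match
   * the pool/queue layout exposed by the PMD. */
  static void
  build_vmdq_rx_conf(struct rte_eth_conf *conf)
  {
      memset(conf, 0, sizeof(*conf));
      conf->rxmode.mq_mode = ETH_MQ_RX_VMDQ_ONLY;
      conf->rx_adv_conf.vmdq_rx_conf.nb_queue_pools = ETH_8_POOLS;  /* example */
      conf->rx_adv_conf.vmdq_rx_conf.enable_default_pool = 1;
      conf->rx_adv_conf.vmdq_rx_conf.default_pool = 0;
  }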
Memory Allocation for NUMA
Allocation of mbuf pool (dpdk_mp)
Current OVS allocates a single mbuf pool per dpdk port, on the NUMA node given by the device's socket_id:

  dpdk_mp = dpdk_mp_get(dev->socket_id, mtu);

We change it to allocate and hold one mbuf pool per NUMA node for each dpdk port:

  dpdk_mp = dpdk_rte_mzalloc(sizeof(struct dpdk_mp) * n_numas);
  for (numa_id = 0; numa_id < n_numas; numa_id++) {
      dpdk_mp[numa_id] = dpdk_mp_get(numa_id, mtu);
  }

Setup Rx queues
Current OVS sets up all Rx queues of a port on the same NUMA node, specified by dev->socket_id:

  n_rxqs = MIN(max_rx_queue, n_dpdk_rxqs);
  …
  for (i = 0; i < n_rxqs; i++) {
      rte_eth_rx_queue_setup(port_id, i, RX_Q_SIZE,
                             dev->socket_id, NULL,
                             dev->dpdk_mp->mp);
  }

We change it to set up each Rx queue on the NUMA node (numa_id) associated with its queue index:

  n_rxq_per_vsi = MIN(max_rx_queue_per_pool, n_dpdk_rxqs);
  …
  n_rxqs = n_rxq_per_vsi * n_numas;
  …
  for (qid = 0; qid < n_rxqs; qid++) {
      numa_id = dpdk_get_numa_id_by_qindex(dev, qid);
      rte_eth_rx_queue_setup(port_id, qid, RX_Q_SIZE,
                             numa_id, NULL,
                             dev->dpdk_mp[numa_id]->mp);
  }
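The helper dpdk_get_numa_id_by_qindex() is not shown on the slide; a minimal sketch, assuming the queues are laid out as n_rxq_per_vsi consecutive queues per node (node 0 first) and that such a field exists on the device, could be:

  /* Hypothetical helper: map a global Rx queue index to the NUMA node that
   * owns it, assuming n_rxq_per_vsi consecutive queues per node.
   * Names are assumptions, not the actual patch. */
  static int
  dpdk_get_numa_id_by_qindex(struct netdev_dpdk *dev, int qid)
  {
      return qid / dev->n_rxq_per_vsi;
  }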
MAC Address Registration
Develop a new utility: ovs-numactl
  issues commands to add / delete MAC-NUMA entries on a bridge
Add NUMA control features to the bridge
  manage the MAC-NUMA table by receiving commands from ovs-numactl
  push the MAC-NUMA entries in the table to all dpdk ports
Add an API to netdev-dpdk
  set MAC-NUMA info on the specified dpdk port
  use the rte_eth_dev_mac_addr_add() API to register the MAC address in the MAC-VLAN filter of the PF (see the sketch below)
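A hedged sketch of that netdev-dpdk call (the wrapper name is hypothetical; rte_eth_dev_mac_addr_add() is the standard DPDK API, and treating the VMDq pool index as the NUMA node id is an assumption of this sketch):

  #include <rte_ethdev.h>
  #include <rte_ether.h>

  /* Sketch: register a destination MAC address in the PF's MAC-VLAN filter so
   * that frames for it are steered into the queue pool (VSI) associated with
   * the given NUMA node. Older DPDK (2.x) uses "struct ether_addr" instead. */
  static int
  netdev_dpdk_set_mac_numa(uint16_t port_id, struct rte_ether_addr *mac,
                           uint32_t numa_id)
  {
      return rte_eth_dev_mac_addr_add(port_id, mac, numa_id /* VMDq pool */);
  }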
[Figure: control path. ovs-numactl issues commands to vswitchd; the bridge holds the MAC-NUMA table and pushes its entries through the vports and netdev-dpdk to the DPDK ethdev/PMD, which programs the PF on the NIC.]
Evaluation
Measure the total throughput of 6 VNFs (L3 forwarder w/o DPDK)
Traffic is injected from one port of each NIC (maximum 20 Gbps)
[Figure: two evaluation setups on a dual-socket system with one dual-port NIC per CPU (NIC1, NIC2). On each CPU, two pmd (PHY) and two pmd (VIF) threads run (Cores #4-#7 and #12-#15) together with three VMs (Cores #1-#3 and #9-#11). Left: standard OVS; right: our approach.]
Standard OVS: a dpdk port is mapped only to the NUMA node to which its NIC is connected
Our approach: each dpdk port has multiple queues, one mapped to each NUMA node
Result
A maximum-rate search under the no-drop condition was run 10 times for each packet size
Our approach achieved over 2.5 times the throughput of standard OVS in this test
[Charts: "Maximum Throughputs in 10 results" and "Median Throughputs of 10 results"; throughput (Gbit/s) vs. packet size (64-1500 bytes) for Std. OVS and NUMA-aware OVS, with annotated gains of x2.8 and x2.5.]
Conclusions
We introduced the concept and a trial implementation of NUMA-aware DPDK-OVS
  Throughput is improved by reducing remote memory access by the CPU
  Multiple RXQs and a MAC filter are used to forward received packets directly into the memory of the NUMA node on which the destination VM is running
  • Each NUMA node has an RXQ (associated with a HwQ) in its own memory, and
  • the NIC can DMA received packets directly there according to the destination MAC address
  • Implemented on trial with the Intel® X710 (using VMDq and the MAC-VLAN filter)
We observed a significant performance improvement with our approach
Conclusions
Limitations
  The NICs that can be used with our approach are limited
  Only a small number of MAC addresses can be registered (depends on the NIC specification)
  Only the destination MAC address is used to select the destination NUMA node
  It is difficult to support a BOX that uses tunneling protocols with current standard NICs
  etc.
Further investigation is needed to extend the applicable cases
Thank You! hyoudou.kazuki@jp.fujitsu.com
BACKUP
Evaluation (Bi-direction, input from 4 ports)
Measure the total bi-directional throughput of 4 VNFs (DPDK l2fwd)
Each CPU has a dual-port 10GbE NIC (X710-DA2); four 10GbE ports are used in this test
[Figure: two evaluation setups for the bi-directional test on a dual-socket system with one dual-port NIC per CPU. On each CPU, two pmd (PHY) and two pmd (VIF) threads run (Cores #4-#7 and #12-#15) together with two VMs (VM#1/VM#2 on Cores #1-#2, VM#3/VM#4 on Cores #9-#10). Left: standard OVS; right: our approach.]
Standard OVS: a dpdk port is mapped only to the NUMA node to which its NIC is connected
Our approach: each dpdk port has multiple queues, one mapped to each NUMA node
Result (Bi-direction)
Our approach achieved up to 2 times the throughput of standard OVS.
[Charts: "Maximum Throughputs in 10 results" and "Median Throughputs of 10 results"; throughput (Gbit/s) vs. packet size (64-1500 bytes) for Std. OVS and NUMA-aware OVS.]
vhost-user NUMA Awareness in OVS 2.6
Memory for the vhost structures and the mbuf pool can be allocated on the NUMA node on which the connected VM is running
Refer to
https://software.intel.com/en-us/articles/vhost-user-numa-awareness-in-open-vswitch-with-dpdk?language=ru
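As a sketch of the underlying mechanism (the referenced article describes detecting the node that backs the guest's virtio memory), the NUMA node of a given address can be queried with the get_mempolicy(2) syscall; link with -lnuma:

  #include <numaif.h>
  #include <stdio.h>

  /* Sketch: find the NUMA node backing an address (e.g. a guest's virtqueue),
   * so vhost structures and the mbuf pool can be (re)allocated on that node. */
  static int
  numa_node_of_addr(void *addr)
  {
      int node = -1;
      if (get_mempolicy(&node, NULL, 0, addr, MPOL_F_NODE | MPOL_F_ADDR) < 0) {
          perror("get_mempolicy");
          return -1;
      }
      return node;
  }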
Example of Connection Model for Our Products
All VNFs need to transmit to / receive from all physical ports
[Figure: two BOXes (BOX#1, BOX#2) hosting VNFs, each with multiple 10GbE ports; the boxes connect to the intranet and the Internet, and links between them are aggregated (LAG) as a redundancy path.]