Is OpenStack Neutron production ready for large scale deployments?

Oleg Bondarev, Senior Software Engineer, Mirantis
Elena Ezhova, Software Engineer, Mirantis

Copyright © 2016 Mirantis, Inc. All rights reserved
www.mirantis.com

Why are we here?

“We've learned from experience that the truth will come out.”

Richard Feynman

Key highlights (Spoilers!)

Mitaka-based OpenStack deployed by Fuel

2 hardware labs were used for testing

378 nodes was the size of the largest lab

Line-rate throughput was achieved

Over 24500 VMs were launched on a 200-node lab

...and yes, Neutron works at scale!

Agenda

● Labs overview & tools
● Testing methodology
● Results and analysis
● Issues
● Outcomes

Deployment description

● Mirantis OpenStack with Mitaka-based Neutron
● ML2 OVS
● VxLAN/L2 POP
● DVR
● rootwrap-daemon ON
● ovsdb native interface OFF
● ofctl native interface OFF

Environment description: 200-node lab

3 controllers, 196 computes, 1 node for Grafana/Prometheus

Controllers

CPU: 2x Intel Xeon E5-2650v3, Socket 2011, 2.3 GHz, 25MB Cache, 10 core, 105 W

RAM: 8x 16GB Samsung M393A2G40DB0-CPB DDR-IV PC4-2133P ECC Reg. CL13

Network: 2x Intel Corporation I350 Gigabit Network Connection (public network); 2x Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Computes

CPU: 1x Intel Xeon Ivy Bridge 6C E5-2620 V2 2.1G 15M 7.2GT/s QPI 80w SOCKET 2011R 1600

RAM: 4x Samsung DDRIII 8GB DDR3-1866 1Rx4 ECC REG RoHS M393B1G70QH0-CMA

Network: 1x AOC-STGN-i2S - 2-port 10 Gigabit Ethernet SFP+

Environment description: 378-node lab

3 controllers, 375 computes

Model Dell PowerEdge R63

CPU 2x Intel, E5-2680 v3, 2.5 GHz, 12 core

RAM 256 GB RAM, Samsung, M393A2G40DB0-CPB

Network 2x Intel X710 Dual Port, 10-Gigabit

Storage

3.6 TB, SSD, raid1 - Dell, PERC H730P Mini, 2 disks Intel S3610

Model Lenovo RD550-1U

CPU 2x E5-2680v3, 12-core CPUs

RAM 256GB RAM

Network 2x Intel X710 Dual Port, 10-Gigabit

Storage

2x Intel S3610 800GB SSD

2x DP and 3Yr Standard Support 23 176 RD650-2

Tools

● Control plane testing: Rally
● Data plane testing: Shaker
● Density testing: Heat, custom (ancillary) scripts
● System resource monitoring: Grafana/Prometheus
● Additionally: eyes, hands, 6th sense

Integrity test

Control group of resources that must stay persistent no matter what other operations are performed on the cluster.

● 2 server groups of 10 instances
● 2 subnets connected by a router
● Connectivity checks by floating IPs and fixed IPs
● Checks are run between other tests to ensure dataplane operability

Integrity test

● From fixed IP to fixed IP in the same subnet

● From fixed IP to fixed IP in different subnets

Integrity test

● From floating IP to floating IP

● From fixed IP to floating IP
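
The four kinds of connectivity check listed on the integrity test slides can be enumerated mechanically. The following is a minimal sketch, not the scripts used in the talk: the `Instance` record, its field names, and the `connectivity_checks` helper are hypothetical, and the sketch only builds the list of (kind, source, target) checks without running any pings.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one VM in the control group."""
    name: str
    subnet: str
    fixed_ip: str
    floating_ip: str


def connectivity_checks(instances):
    """Yield (kind, source_ip, target_ip) for every ordered pair of instances."""
    for src, dst in permutations(instances, 2):
        # Fixed-to-fixed traffic is classified by whether both VMs share a subnet.
        if src.subnet == dst.subnet:
            yield ("fixed->fixed, same subnet", src.fixed_ip, dst.fixed_ip)
        else:
            yield ("fixed->fixed, different subnets", src.fixed_ip, dst.fixed_ip)
        yield ("floating->floating", src.floating_ip, dst.floating_ip)
        yield ("fixed->floating", src.fixed_ip, dst.floating_ip)
```

Each yielded tuple would then be handed to whatever actually performs the check (for example a ping run from inside the source VM), which is outside the scope of this sketch.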

Rally control plane tests

● Basic Neutron test suite
● Tests with increased number of iterations and concurrency
● Neutron scale test with many servers/networks

Rally basic Neutron test suite

create_and_update_, create_and_list_, create_and_delete_ for:

● floating_ips
● networks
● subnets
● security_groups
● routers
● ports

Verifies that the cloud is healthy and that Neutron services are up and running.

Rally high load tests, increased iterations/concurrency

Concurrency: 50-100
Iterations: 2000-5000

API tests
● create-and-list-networks
● create-and-list-ports
● create-and-list-routers
● create-and-list-security-groups
● create-and-list-subnets

Boot VMs tests
● boot-and-list-server
● boot-and-delete-server-with-secgroups
● boot-runcommand-delete
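
For readers who have not used Rally, a high-load run of this kind is described by a task file that names a scenario and a runner with an iteration count and a concurrency level. The sketch below builds such a task for `NeutronNetworks.create_and_list_networks` with 2000 iterations at concurrency 50. The `rally_task` helper is hypothetical, and the context values (tenant and user counts, unlimited network quota) are illustrative assumptions; the slides do not show the task files that were actually used.

```python
import json


def rally_task(scenario, times, concurrency, args=None):
    """Build a Rally task dict that runs `scenario` with a constant runner."""
    return {
        scenario: [
            {
                "args": args or {},
                "runner": {
                    "type": "constant",        # fixed number of parallel workers
                    "times": times,            # total iterations
                    "concurrency": concurrency,
                },
                # Assumed context, not taken from the slides.
                "context": {
                    "users": {"tenants": 1, "users_per_tenant": 1},
                    "quotas": {"neutron": {"network": -1}},
                },
            }
        ]
    }


task = rally_task("NeutronNetworks.create_and_list_networks",
                  times=2000, concurrency=50)
print(json.dumps(task, indent=2))
```

The printed JSON can be saved to a file and passed to `rally task start`.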

Rally high load tests, increased iterations/concurrency

All test runs were successful, no errors.

Results on Lab 378 slightly better than on Lab 200.

Scenario                  Iterations/Concurrency   Time, Lab 200             Time, Lab 378
create-and-list-routers   2000/50                  avg 15.59, max 29.00      avg 12.942, max 19.398
create-and-list-subnets   2000/50                  avg 25.973, max 64.553    avg 17.415, max 50.41
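
To put a number on the difference between the two labs, the relative reduction in duration can be computed directly from the figures in the table. This is a small sketch over the four published values only; the `reduction_pct` helper and the dictionary layout are introduced here for illustration.

```python
# (avg, max) durations copied from the table for each lab.
results = {
    "create-and-list-routers": {"lab200": (15.59, 29.00), "lab378": (12.942, 19.398)},
    "create-and-list-subnets": {"lab200": (25.973, 64.553), "lab378": (17.415, 50.41)},
}


def reduction_pct(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


for scenario, labs in results.items():
    avg_200, max_200 = labs["lab200"]
    avg_378, max_378 = labs["lab378"]
    print(f"{scenario}: avg {reduction_pct(avg_200, avg_378):.1f}% lower, "
          f"max {reduction_pct(max_200, max_378):.1f}% lower on Lab 378")
```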

Rally high load tests, increased iterations/concurrency

First run on Lab 200:
● 7.75% failures, concurrency 100
● 1.75% failures, concurrency 15

Fixes applied on Lab 378:
● 0% failures, concurrency 100
● 0% failures, concurrency 50

Rally high load tests, increased iterations/concurrency

Trends

create_and_list_networks
● create - slow linear growth
● list - linear growth

create_and_list_networks trends

create network

list networks

Rally high load tests, increased iterations/concurrency

Trends

create_and_list_networks
● create - stable
● list - linear growth

create_and_list_routers
● create - stable
● list - linear growth (6.5 times in 2000 iterations)

create_and_list_routers trends

create router

list routers

Rally high load tests, increased iterations/concurrency

Trends

create_and_list_networks
● create - stable
● list - linear growth

create_and_list_routers
● create - stable
● list - linear growth (6.5 times in 2000 iterations)

create_and_list_subnets
● create - slow linear growth
● list - linear growth (20 times in 2000 iterations)

create_and_list_subnets trends

create subnet

list subnets

Rally high load tests, increased iterations/concurrency

Trends

create_and_list_networks
● create - stable
● list - linear growth

create_and_list_routers
● create - stable
● list - linear growth (6.5 times in 2000 iterations)

create_and_list_subnets
● create - slow linear growth
● list - linear growth (20 times in 2000 iterations)

create_and_list_ports
● gradual growth

create_and_list_ports trends

average load

Rally high load tests, increased iterations/concurrency

Trends

create_and_list_networks
● create - stable
● list - linear growth

create_and_list_routers
● create - stable
● list - linear growth (6.5 times in 2000 iterations)

create_and_list_subnets
● create - slow linear growth
● list - linear growth (20 times in 2000 iterations)

create_and_list_ports
● gradual growth

create_and_list_secgroups
● create 10 sec groups - stable, with peaks
● list - rapid growth (17.2 times)
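
The slides describe each trend qualitatively ("stable", "linear growth") together with a growth factor over the run. One way to derive such a factor from per-iteration durations is a least-squares line fit, sketched below. This is an assumption about how a factor like "6.5 times in 2000 iterations" could be computed, not a description of how the authors obtained theirs, and the sample series is synthetic rather than measured.

```python
def fit_line(durations):
    """Return (slope, intercept) of the least-squares line through (i, durations[i])."""
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(durations))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def growth_factor(durations):
    """Fitted duration at the last iteration divided by the fitted duration at the first."""
    slope, intercept = fit_line(durations)
    return (intercept + slope * (len(durations) - 1)) / intercept


# Synthetic series (NOT measured data): a call that grows linearly
# from 0.2 to 1.3 time units over 2000 iterations.
synthetic = [0.2 + (1.3 - 0.2) * i / 1999 for i in range(2000)]
print(round(growth_factor(synthetic), 2))
```

A slope close to zero corresponds to what the slides call "stable"; a clearly positive slope with small residuals corresponds to "linear growth".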

create_and_list_secgroups trends

create 10 security groups

list security groups

Rally scale with many networks

● 100 networks per iteration
● 1 VM per network
● Iterations 20, concurrency 3

Rally scale with many VMs

● 1 network per iteration
● 100 VMs per network
● Iterations 20, concurrency 3

Shaker: Architecture

Shaker is a distributed data-plane testing tool for OpenStack.

Shaker: L2 scenario

Tests the bandwidth between pairs of instances on different nodes in the same virtual network.

Shaker: L3 East-West scenario

Tests the bandwidth between pairs of instances on different nodes deployed in different virtual networks plugged into the same router.

Shaker: L3 North-South scenario

Tests the bandwidth between pairs of instances on different nodes deployed in different virtual networks.

Shaker: Lab 200, MTU 1500

Standard configuration

Bi-directional L3 East-West scenario:
● 561 Mbits/sec upload, 528 Mbits/sec download

Intel 82599ES 10-Gigabit

Shaker: Lab 200, MTU 9000

Enabled jumbo frames

Bi-directional L3 East-West scenario:
● 3615 Mbits/sec upload, 3844 Mbits/sec download

Roughly x7 increase in throughput compared with MTU 1500

Intel 82599ES 10-Gigabit
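
The "x7" figure is a rounded summary of the two Lab 200 measurements. The short calculation below, using only the numbers from the MTU 1500 and MTU 9000 slides, shows the per-direction and combined ratios; the variable names are introduced here for illustration.

```python
# Bi-directional L3 East-West results on Lab 200, in Mbits/sec.
mtu_1500 = {"upload": 561, "download": 528}
mtu_9000 = {"upload": 3615, "download": 3844}

# Gain per direction when moving from MTU 1500 to MTU 9000.
gains = {direction: mtu_9000[direction] / mtu_1500[direction]
         for direction in mtu_1500}

# Gain of the combined (upload + download) throughput.
combined_gain = sum(mtu_9000.values()) / sum(mtu_1500.values())

for direction, gain in gains.items():
    print(f"{direction}: x{gain:.2f}")
print(f"combined: x{combined_gain:.2f}")
```

The upload gain comes out somewhat below 7 and the download gain somewhat above it, so "x7" is a fair approximation of the combined improvement.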

Shaker: Lab 378, L3 East-West Bi-directional test

HW offloads-capable NIC

Hardware offloads boost with small MTU (1500):
● x3.5 throughput increase in bi-directional test

Increasing MTU from 1500 to 9000 also gives a significant boost:
● 75% throughput increase in bi-directional test (offloads on)

Intel X710 Dual Port 10-Gigabit

Shaker: Lab 378, L3 East-West Download test

HW offloads-capable NIC

Hardware offloads boost with small MTU (1500):
● x2.5 throughput increase in download test

Increasing MTU from 1500 to 9000 also gives a significant boost:
● 41% throughput increase in download test (offloads on)

Intel X710 Dual Port 10-Gigabit

Shaker: Lab 378, L3 East-West Download test

Near line-rate results in L2 and L3 east-west Shaker tests even with concurrency >50:
● 9800 Mbits/sec in download/upload tests
● 6100 Mbits/sec each direction in bi-directional tests

Intel X710 Dual Port 10-Gigabit

Shaker: Lab 378, Full L2 Download test

Intel X710 Dual Port 10-Gigabit

Shaker: Lab 378, L3 East-West Download test

Intel X710 Dual Port 10-Gigabit

Shaker: Lab 378, Full L3 North-South Download test

Intel X710 Dual Port 10-Gigabit

Shaker: Lab 378, L3 East-West Bi-directional test

Intel X710 Dual Port 10-Gigabit

Dataplane testing outcomes

Neutron DVR+VxLAN+L2pop installations are capable of almost line-rate performance.

Main bottlenecks: hardware configuration and MTU settings.

Solution:
1. Use HW offloads-capable NICs
2. Enable jumbo frames

North-South scenario needs improvement

Density test

Aim:
● Boot the maximum number of VMs the cloud can manage.
● Make sure VMs are properly wired and have access to the external network.
● Verify that data-plane is not affected by high load on the cloud.

Environment description: 200-node lab

3 controllers, 196 computes, 1 node for Grafana/Prometheus

Controllers

CPU: 20 core

RAM: 128 GB

Network: 2x Intel Corporation I350 Gigabit Network Connection (public network); 2x Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Computes

CPU: 6 core

RAM: 32 GB

Network: 1x AOC-STGN-i2S - 2-port 10 Gigabit Ethernet SFP+

Density test process

Heat used for creating 1 network with a subnet, 1 DVR router, and 1 cirros VM per compute node.

1 Heat stack == 196 VMs

Upon spawn, VMs get their IPs from metadata and send them to the external HTTP server.

Iteration 1
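
The reporting step can be pictured as a very small HTTP service that records every address it is sent, so that the set of reported IPs can be compared with the set of VMs that were booted. The sketch below is an illustrative stand-in, not the server used in the talk: the handler, the `start_collector` and `report_ip` helpers, and the choice of a plain-text POST body are all assumptions. In the real test the client side would run inside each cirros VM at boot; here it is simulated from Python.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

reported_ips = set()        # every IP address received so far
_lock = threading.Lock()    # handlers run in separate threads


class ReportHandler(BaseHTTPRequestHandler):
    """Accept POST requests whose body is the reporting VM's IP address."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        ip = self.rfile.read(length).decode("ascii").strip()
        with _lock:
            reported_ips.add(ip)
        self.send_response(204)  # success, no response body
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet when thousands of VMs report


def start_collector(host="127.0.0.1", port=0):
    """Start the collector in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), ReportHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def report_ip(url, ip):
    """Simulate what a VM would do at boot: POST its own IP to the collector."""
    # An empty ProxyHandler bypasses any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, data=ip.encode("ascii"), timeout=5) as resp:
        return resp.status
```

After a batch of stacks is created, any VM whose address is missing from `reported_ips` either failed to boot or has no path to the external network.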

Density test process

Heat stacks were created in batches of 1 to 5 (5 most of the time).

1 iteration == 196*5 VMs

The integrity test was run periodically.

Lab status was monitored constantly using the Grafana dashboard.

Iteration k

Density test results

● 125 Heat stacks were created
● Total 24500 VMs on a cluster
● Number of bugs filed and fixed: 8
● Days spent: 3
● People involved: 12
● Data-plane connectivity lost: 0 times
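
The totals follow directly from the test design (one VM per compute node per stack). The arithmetic below reproduces them and adds a rough per-core estimate. The estimate rests on two assumptions that the slides do not state: each VM has 1 vCPU, and each compute node exposes 6 CPUs to the scheduler (hyper-threading would double that).

```python
stacks = 125
vms_per_stack = 196        # one VM per compute node in every Heat stack
computes = 196
stacks_per_batch = 5       # the usual batch size

total_vms = stacks * vms_per_stack                    # 24500
vms_per_compute = total_vms // computes               # 125
vms_per_iteration = vms_per_stack * stacks_per_batch  # 980

# ASSUMPTIONS (not from the slides): 1 vCPU per VM, 6 schedulable CPUs per compute.
assumed_vcpus_per_vm = 1
assumed_cpus_per_compute = 6
vcpus_per_cpu = vms_per_compute * assumed_vcpus_per_vm / assumed_cpus_per_compute

print(total_vms, vms_per_compute, vms_per_iteration, round(vcpus_per_cpu, 1))
```

Under those assumptions the overcommit is above Nova's default `cpu_allocation_ratio` of 16.0, which would be consistent with `cpu_allocation_ratio` appearing among the limits that were hit.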

Grafana dashboard during density test

Density test load analysis

Issues faced

● Ceph failure!
● Bugs
  ● LP #1614452 Port create time grows at scale due to dvr arp update
  ● LP #1610303 l2pop mech fails to update_port_postcommit on a loaded cluster
  ● LP #1528895 Timeouts in update_device_list (too slow with large # of VIFs)
  ● LP #1606827 Agents might be reported as down for 10 minutes after all controllers restart
  ● LP #1606844 L3 agent constantly resyncing deleted router
  ● LP #1549311 Unexpected SNAT behavior between instances with DVR+floating ip
  ● LP #1609741 oslo.messaging does not redeclare exchange if it is missing
  ● LP #1606825 nova-compute hangs while executing a blocking call to librbd
● Limits
  ● ARP table size on nodes
  ● cpu_allocation_ratio

Outcomes

● No major issues in Neutron
● No threatening trends in control-plane tests
● Data-plane tests showed stable performance on all hardware
● Data-plane does not suffer from control-plane failures
● 24K+ VMs on 200 nodes without serious performance degradation
● Neutron is ready for large-scale production deployments on 350+ nodes

Thank you for your time