
OpenStack Performance

Ayal Baron and Doug Williams ([email protected], [email protected])

November 6, 2013

2

Topics

Conceptualizing OpenStack Performance

Foundation
● Keystone Performance

OpenStack Nova
● KVM Performance
● Resource Over-commit
● Nova Instance Storage – Boot Times and Snapshots
● Nova Scheduler

OpenStack Cinder
● Direct storage integration with QEMU
● Glusterfs Performance Enhancements in Havana

Swift Performance
● Swift and Blocking IO

What's Next

3

Background

Talk reflects work in progress. Includes results from:

● RHEL-OSP 4 (Havana)
● RHOS 3 (Grizzly)
● RHOS 2 (Folsom)

Items not included in this presentation:
● Neutron
● Heat and most provisioning use-cases

4

Conceptualizing OpenStack Performance

5

High Level Architecture

Modular architecture

Designed to easily scale out

Based on (growing) set of core services

6

Control Plane vs Data Plane

Data Plane
● Workloads in steady-state operation
● Performance dictated by components managed by OpenStack

Control Plane
● Create/Delete/Start/Stop/Attach/Detach
● Performance dictated by OpenStack

[Diagram: core OpenStack services (Nova, Cinder, Neutron, Glance, Swift, Heat). On the control plane side each service exposes lifecycle operations (provisioning); on the data plane side each manages resources in steady state: VMs, instance storage, volumes, networks, images, objects and containers.]

7

Foundational Elements
● Each service has associated databases
● Extensive use of messaging for integration
● Keystone as common identity service

8

Control Plane and Data Plane Technologies

[Diagram: the same control plane / data plane service layout, annotated with the technologies behind each layer.]

● OpenStack services: OpenStack Havana, Python 2.7
● Foundational services: Keystone (LDAP), Ceilometer, Database (MariaDB, MySQL, Galera), Messaging (QPID, RabbitMQ)
● Hypervisor and host OS: KVM, Xen, ESXi on Linux (RHEL)
● Instance storage: XFS, RoC ...
● Volumes: Glusterfs, Ceph, NetApp, EMC, SolidFire ...
● Networks: LinuxBridge, OVS, GRE, VXLAN, Nicira, BigSwitch ...; Cisco Nexus, other switches

9

Control Plane and Data Plane Technologies
Technologies Used in Red Hat's Offering (RHEL-OSP 4)

[Diagram: the same layout, showing the technologies used in RHEL-OSP 4.]

● OpenStack services: OpenStack Havana, Python 2.7
● Foundational services: Keystone (Red Hat Identity Management), Ceilometer, Database (MariaDB, Galera), Messaging (QPID)
● Hypervisor and host OS: RHEL 6.5 with KVM
● Instance storage: XFS, RoC ...
● Volumes: Glusterfs, Ceph, NetApp, EMC, SolidFire ...
● Networks: LinuxBridge, OVS, GRE, VXLAN, Nicira, BigSwitch; Cisco Nexus, other switches

Note: on the original slide, italicized entries are examples of partner technologies.

10

Factors influencing control plane performance demands

[Chart: control plane demand grows with the size of the cell (in VMs) and with the dynamic nature of the workload / provisioning frequency (provisioning operations per hour). Example workloads plotted: static production workloads, continuous integration, dev / test, autoscaling production workloads.]

11

Foundational Elements

12

Keystone Performance

Keystone findings (UUID tokens in Folsom)

Chatty consumers: multiple calls to Keystone for new tokens

Database grows with no cleanup
● As tokens expire they should eventually get removed
● Should help with indexing
● For every 250K rows, response times go up 0.1 secs
● Can be addressed via a cron job: keystone-manage token_flush (sketch below)
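A minimal sketch of such a cron job, assuming the stock keystone-manage tool and a keystone service user; the schedule and log path are illustrative:

    # /etc/cron.d/keystone-token-flush: purge expired Keystone tokens hourly
    0 * * * * keystone /usr/bin/keystone-manage token_flush >> /var/log/keystone/token-flush.log 2>&1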

Tokens consumed per operation:
● Horizon login: 3 tokens
● Horizon Image page: 2 tokens
● CLI (nova image-list): 2 tokens

13

Keystone
Inefficiencies in CLI due to Python libraries

Inefficiencies in CLI vs direct curl calls

nova image-show
● Executes in 2.639s

curl -H "..."
● Executes in 0.555s

Tracing of the CLI shows that Python is reading the data one byte at a time
● Known httplib issue in the Python standard library
● Next steps for testing are to move to Havana and PKI tokens
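A hedged sketch of how such a comparison can be reproduced; the endpoint URL, tenant ID, image ID and token below are placeholders, not values from the test environment:

    # Time the python-novaclient CLI vs. a raw API call (all identifiers are placeholders)
    time nova image-show <image-id>
    time curl -s -H "X-Auth-Token: $OS_TOKEN" \
        http://controller:8774/v2/<tenant-id>/images/<image-id>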

14

Nova

15

Understanding Nova Compute Performance
KVM SPECvirt2010: RHEL 6 KVM Posts Industry Leading Results

http://www.spec.org/virt_sc2010/results/

[Diagram: SPECvirt_sc2010 setup. Client hardware drives a system under test (SUT); the virtualization layer and hardware handle disk I/O (blue) and network I/O (green).]

Steady-State Performance of Nova Compute Nodes Strongly Determined by Hypervisor

For OpenStack this is typically KVM

The good news is that RHEL / KVM has industry-leading performance numbers.

16

Understanding Nova Compute Performance
SPECvirt2010: Red Hat Owns Industry Leading Results

[Chart: Best SPECvirt_sc2010 scores by CPU cores (as of May 30, 2013). X-axis: system, grouped as 2-socket 12, 2-socket 16, 2-socket 20, 4-socket 40, and 8-socket 64/80 cores; y-axis: SPECvirt_sc2010 score (0 to 10,000). Scores plotted: 1,221; 1,367; 1,570; 2,442; 1,878; 2,144; 2,742; 3,824; 4,682; 5,467; 8,956. Systems compared:
● VMware ESX 4.1 HP DL380 G7 (12 cores, 78 VMs)
● RHEL 6 (KVM) IBM HS22V (12 cores, 84 VMs)
● VMware ESXi 5.0 HP DL385 G7 (16 cores, 102 VMs)
● RHEV 3.1 HP DL380p Gen8 (16 cores, 150 VMs)
● VMware ESXi 4.1 HP BL620c G7 (20 cores, 120 VMs)
● RHEL 6 (KVM) IBM HX5 w/ MAX5 (20 cores, 132 VMs)
● VMware ESXi 4.1 HP DL380 G7 (12 cores, 168 VMs)
● VMware ESXi 4.1 IBM x3850 X5 (40 cores, 234 VMs)
● RHEL 6 (KVM) HP DL580 G7 (40 cores, 288 VMs)
● RHEL 6 (KVM) IBM x3850 X5 (64 cores, 336 VMs)
● RHEL 6 (KVM) HP DL980 G7 (80 cores, 552 VMs)
Annotations on the chart mark an "OpenStack sweet spot" (the smaller 2-socket systems) and an "oVirt / RHEV sweet spot" (the larger systems).]

Comparison based on best performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the benchmark name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.

17

Understanding Nova Compute Performance
Guest Performance

Expect similar out-of-the-box performance to standalone RHEL / KVM
● Added the tuned virtual-host profile to the Nova compute node configuration (example below)
● RHOS generates its own XML file to describe the guest – about the same as virt-manager
● Of course, smart storage layout is critical

Tuning for performance
● Common to OpenStack and standalone KVM: big pages, NUMA, tuned profiles
● Optimizations for standalone KVM not currently integrated into OpenStack: SR-IOV, process-level node bindings
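For reference, applying the virtual-host profile on a RHEL 6 compute node looks roughly like this (package and service names per RHEL 6; adjust for other distributions):

    # Install and activate the tuned virtual-host profile on a Nova compute node
    yum install -y tuned
    service tuned start && chkconfig tuned on
    tuned-adm profile virtual-host
    tuned-adm active    # verify the active profile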

18

Nova Out-of-the-Box Performance
Comparison with Standalone KVM Results

[Chart: RHOS vs libvirt/KVM running a Java workload; aggregate throughput (bops) at 1, 4, 8, and 12 VMs; series: RHOS, libvirt/KVM untuned.]

19

Nova Compute
Resource Over-Commit

The Nova default configuration has some aggressive over-commit ratios.

CPU has an over-commit ratio of 16
● The scheduler settings may need adjusting based on the instance workload (example below)

Memory over-commit is a much lower 1.5
● Again, depends on the workload
● Anything memory-sensitive falls off a cliff if you need to swap
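As an illustration, the ratios live in /etc/nova/nova.conf and can be adjusted with openstack-config (from the openstack-utils package); the values below are examples, not recommendations:

    # Defaults are cpu_allocation_ratio=16.0 and ram_allocation_ratio=1.5;
    # example of dialing them back for CPU- or memory-sensitive workloads
    openstack-config --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 4.0
    openstack-config --set /etc/nova/nova.conf DEFAULT ram_allocation_ratio 1.0
    service openstack-nova-scheduler restart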

20

Nova Compute
Understanding CPU Over-Commit

[Chart: RHOS Java workload VM scaling, showing the impact of over-committing CPU on aggregate throughput (bops) from 1 to 48 VMs; throughput saturates once the host is fully loaded.]

However, beware of the cliff caused by excessive paging.

21

Nova Compute
Ephemeral Storage

Look at ephemeral storage configuration to help determine guidelines for balancing ephemeral storage performance vs. cost / configuration
● Trade-off between footprint (number of drives) and performance
● Initial cost / configuration, rack space (1U vs 2U), power / cooling
● How does network-based storage perform?
● Need to ensure proper network bandwidth

Configuration for tests (sketch below): each instance uses a different image. Hardware configs:
● Single system disk
● Seven-disk internal array
● Fibre Channel SSD drives
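A minimal sketch of the kind of concurrent boot test behind the following charts; the presentation does not specify the exact harness, and the flavor and image names here are placeholders:

    # Launch 16 instances in parallel, each from its own image
    for i in $(seq 1 16); do
        nova boot --flavor m1.small --image test-image-$i instance-$i &
    done
    wait
    # Boot time is taken from launch to ACTIVE; poll until all instances leave BUILD
    watch -n 5 "nova list | grep -c ACTIVE"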

22

Nova Boot Times

[Chart: Nova boot times (multiple images), Folsom, virt-host profile; average boot time in seconds at 2 and 16 VMs for three storage configurations: system disk, local RAID, and Cinder (FC SSD).]

23

Impact of Storage Config on Nova Boot Times
Local RAID and Cinder

[Chart: Nova boot times (multiple images), no tuned profile; average boot time in seconds at 2 to 20 VMs for system disk, local RAID, and Cinder (FC SSD array).]

24

Nova Compute
Ephemeral Storage Snapshot Performance

[Chart: RHOS concurrent snapshot timings (qemu-img convert only); average snapshot time in seconds for 1, 2, 4, and 6 concurrent snapshots on system disk vs. local RAID.]

Backing image and qcow2 delta => new image
● Via the qemu-img convert mechanism (sketch below)
● Written to a temporary snapshot directory
● This destination is a tunable
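Roughly what the snapshot path does under the covers; the paths below are illustrative, and which nova.conf option controls the working directory (e.g. the libvirt driver's snapshots_directory) is an assumption, not stated on the slide:

    # Simplified version of the snapshot step: flatten the backing image plus the
    # instance's qcow2 delta into a new standalone image in a temporary directory
    qemu-img convert -f qcow2 -O qcow2 \
        /var/lib/nova/instances/<uuid>/disk \
        /var/lib/nova/instances/snapshots/tmpXXXX/snapshot.qcow2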

25

Impact of Storage Configuration on Snapshots

[Chart: RHOS snapshot timings (qemu-img convert only), Grizzly, seven-disk LUN; average snapshot time of roughly 11 secs for 1 concurrent snapshot and 15 secs for 2.]

26

Nova Compute Single-Node Scalability

[Chart: impact of adding guests on total YCSB throughput for m1.medium, m1.large, and m1.xlarge flavors (medium = 21 guests, large = 10, xlarge = 5) vs. the single-guest baseline throughput. Annotations: 3.3x baseline, 50% @ 5 VMs; 12.8x baseline, 61% @ 21 VMs; 5x baseline, 50% @ 10 VMs.]

27

Per-Guest Performance Degradation Due to Over-Subscription

[Chart: impact of adding guests on average per-guest YCSB throughput (ops/sec) for m1.medium, m1.large, and m1.xlarge (medium = 21 guests, large = 10, xlarge = 5) vs. the single-guest baseline throughput.]

28

Per-Guest Performance Degradation Due to Over-Subscription

[Chart: impact of adding guests on average YCSB application latency (usec) for m1.medium, m1.large, and m1.xlarge (medium = 21 guests, large = 10, xlarge = 5) vs. the single-guest baseline latency.]

29

Nova Scheduler – Heterogeneous Memory Configs
Out-of-Box Behavior

The default Nova scheduler places instances based on free memory
● Little concern for other system resources (CPU, IO, etc.)
● You can change / tune this (see the example after the scheduler charts below)
● Be aware if you have machines with different configurations

[Chart: Nova scheduler default behavior; instances launched per host across four hosts (24 CPU / 48 GB, 24 CPU / 48 GB, 24 CPU / 96 GB, 24 CPU / 96 GB): 1, 1, 49, and 49 instances respectively.]

30

Nova Scheduler – Heterogeneous Memory Config
Example of Results After Scheduler Tuning

[Chart: Nova scheduler instance placement at 20-VM launch increments (1-20, 21-40, 41-60, 61-80, 81-100) with scheduler_host_subset_size=4; instances are spread across all four hosts (two 24 CPU / 48 GB, two 24 CPU / 96 GB) rather than concentrating on the two larger-memory hosts.]
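A hedged sketch of the corresponding nova.conf changes on the scheduler node; scheduler_host_subset_size=4 is taken from the chart above, while the RamWeigher multiplier is only an illustrative example of further tuning:

    # Spread placement across the 4 best candidate hosts instead of always
    # picking the single host with the most free memory
    openstack-config --set /etc/nova/nova.conf DEFAULT scheduler_host_subset_size 4
    # Illustrative: reduce the weight given to free RAM when ranking hosts
    openstack-config --set /etc/nova/nova.conf DEFAULT ram_weight_multiplier 0.5
    service openstack-nova-scheduler restart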

31

Cinder

32

Cinder QEMU Storage Integration

RHEL 6.5 and RHEL-OSP 4 (Havana) now include tight QEMU integration with Glusterfs and Ceph clients

Benefits:
● Direct QEMU integration avoids unnecessary context switching
● A one-client-per-guest configuration may help alleviate client-side performance bottlenecks

Integration model (example below):
● QEMU includes hooks to dynamically link with the storage-vendor-supplied client
● Delivered in a manner that preserves separation of concerns for software distribution, updates and support
● OpenStack and Linux, including QEMU, provided by Red Hat
● Storage client libraries provided and supported by the respective storage vendors: libgfapi for Glusterfs (RHS) supported by Red Hat, librados for Ceph supported by Inktank
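As an illustration of what this looks like in a Havana-era deployment, a GlusterFS Cinder backend plus the nova.conf switch that lets QEMU attach volumes through libgfapi; host and share names are placeholders, and this is a sketch rather than the deck's exact configuration:

    # Cinder: use the GlusterFS volume driver and point it at a shares file
    openstack-config --set /etc/cinder/cinder.conf DEFAULT volume_driver \
        cinder.volume.drivers.glusterfs.GlusterfsDriver
    openstack-config --set /etc/cinder/cinder.conf DEFAULT glusterfs_shares_config \
        /etc/cinder/shares.conf
    echo "gluster-server:/cinder-volumes" > /etc/cinder/shares.conf

    # Nova compute: allow QEMU to talk to Gluster directly via libgfapi
    openstack-config --set /etc/nova/nova.conf DEFAULT qemu_allowed_storage_drivers gluster
    for svc in openstack-cinder-volume openstack-nova-compute; do service $svc restart; done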

33

Glusterfs Support for Cinder in Havana With RHEL 6.5

34

Swift

35

Swift Performance – Glusterfs as Pluggable Backend
Tuning Worker Count, Max Clients, Chunk Size

[Chart: 150 x 30 MB objects transferred sequentially over 10 GbE, concurrent with four other clients transferring 3K, 30K, 300K, and 3M objects. X-axis: object # (sequential); y-axis: seconds per object (0 to 3); series: 30 MB default vs. 30 MB tuned. Annotations mark where the 3K, 30K, and 300K clients finished.]

Issue: blocking IO for filesystem operations with the Python eventlet package used by OpenStack

Recommended tuning (example below):
● Workers: 10 (was 2)
● Max clients: 1 (was 1024)
● Chunk size: 2M (was 64K)

Reducing max clients also helps vanilla Swift
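A sketch of how the recommended values might be applied to the object-server config in this gluster-swift setup; the option names follow standard Swift object-server.conf settings, and which chunk-size option (disk vs. network) the slide refers to is an assumption:

    # workers and max_clients live in [DEFAULT]; chunk sizes in [app:object-server]
    openstack-config --set /etc/swift/object-server.conf DEFAULT workers 10
    openstack-config --set /etc/swift/object-server.conf DEFAULT max_clients 1
    openstack-config --set /etc/swift/object-server.conf app:object-server disk_chunk_size 2097152
    openstack-config --set /etc/swift/object-server.conf app:object-server network_chunk_size 2097152
    swift-init object-server restart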

36

Wrap Up

37

Wrap Up

OpenStack is a rapidly evolving platform

Out-of-the-box performance is already pretty good
● Need to focus on infrastructure out-of-the-box configuration and performance

Still just scratching the surface on testing:
● Control plane performance via emulated VMs
● Neutron performance (GRE, VXLAN)
● Ceilometer
● Performance impact of Active-Active foundational components (DB, messaging)

38

Questions?