Performance Tuning a Cloud Application: A Real World Case Study
![Page 1: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/1.jpg)
Cloud Platform Engineering
Performance Tuning a Cloud Application
Shane Gibson
Sr. Principal Infrastructure Architect
![Page 2: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/2.jpg)
Agenda
• About Symantec and Me
• Key Value as a Service
• The Pesky Problem
• Resolving “The Pesky Problem”
• Performance Tuning Recommendations
• Summary
• Q&A
![Page 3: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/3.jpg)
About Symantec and Me
![Page 4: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/4.jpg)
The Symantec Team
• Cloud Platform Engineering
– We are building a consolidated cloud platform that provides infrastructure and platform services for next-generation Symantec products and services
– Starting small, but scaling to tens of thousands of nodes across multiple data centers
– Cool technologies in use: OpenStack, Hadoop, Storm, Cassandra, MagnetoDB
– Strong commitment to give back to Open Source communities
• Shane Gibson
– Served 4 years in the USMC as a computer geek (mainframes and Unix)
– Unix/Linux SysAdmin, System Architect, Network Architect, Security Architect
– Now Cloud Infrastructure Architect for the CPE group at Symantec
![Page 5: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/5.jpg)
Key Value as a Service (the “cloud” application)
![Page 6: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/6.jpg)
Key Value as a Service: General Architecture
• MagnetoDB is a key-value store with OpenStack REST and AWS DynamoDB API compatibility
• Uses a “pluggable” backend storage capability
• Composite service made up of:
– MagnetoDB front-end API and Streaming service
– Cassandra for back-end, key-value based storage
– OpenStack Keystone
– AMQP messaging bus (e.g. RabbitMQ, Qpid, ZeroMQ)
– Load balancing capabilities (hardware or LBaaS)
![Page 7: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/7.jpg)
Key Value as a Service: MagnetoDB
– API Services Layer
• Data API
• Streaming API
• Monitoring API
• AWS DynamoDB API
– Keystone and Notifications integrations
– MagnetoDB Database Driver
• Cassandra
![Page 8: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/8.jpg)
![Page 9: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/9.jpg)
Key Value as a Service: Cassandra
– Database storage engine
– Massively linearly scalable
– Highly available with no SPoF
– Other features:
• tunable consistency
• key-value data model
• ring topology
• predictable high performance and fault tolerance
• rack and datacenter awareness
![Page 10: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/10.jpg)
![Page 11: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/11.jpg)
Key Value as a Service: Other Stuff
– Need a load balancing layer of some sort
• LBaaS or hardware
– Keystone service
– AMQP service
• RabbitMQ
![Page 12: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/12.jpg)
![Page 13: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/13.jpg)
![Page 14: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/14.jpg)
![Page 15: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/15.jpg)
Key Value as a Service: Putting it all Together
![Page 16: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/16.jpg)
The Pesky Problem
![Page 17: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/17.jpg)
The Pesky Problem: Deployed on Bare Metal
• Initial deployment of the KVaaS service on bare metal nodes
• MagnetoDB API service and Cassandra ran together on the same nodes
– MagnetoDB CPU profile vs. Cassandra disk I/O profile
• Cassandra directly managing the disks via JBOD (good!)
• MagnetoDB likes lots of CPU, with direct access to 32 (HT) CPUs
– Please don’t start me on a HyperThread CPU count rant
• The KVaaS team’s performance expectations were set from this experience!
![Page 18: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/18.jpg)
The Pesky Problem: Moved to OpenStack Nova
• KVaaS service migrated to a “stock” OpenStack Nova cluster
• Nova Compute nodes set with RAID 10 ephemeral disks
• OpenContrail used for SDN configuration
• Performance for each VM guest roughly 66% of bare metal
• KVaaS team was unhappy
![Page 19: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/19.jpg)
The Pesky Problem: Moved to OpenStack Nova, cont.
Performance comparison of “list_tables”:
• bare metal: 250 RPS / HT core*
• virtualized: 165 RPS / HT core*
* results averaged by core since the test beds were different
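The per-core figures are what make the two test beds comparable, and they reproduce the "roughly 66%" quoted on the previous slide. A minimal sketch of that normalization follows. The total RPS values and the 8-core guest size are hypothetical, chosen only so the per-core results match the slide; the deck gives only the per-core numbers and the 32 HT CPUs of the bare metal nodes.

```python
def rps_per_core(total_rps: float, ht_cores: int) -> float:
    """Normalize total requests/second by the number of HT cores."""
    if ht_cores <= 0:
        raise ValueError("ht_cores must be positive")
    return total_rps / ht_cores


# Hypothetical totals, picked only to reproduce the slide's per-core values.
bare_metal = rps_per_core(total_rps=8000, ht_cores=32)
virtualized = rps_per_core(total_rps=1320, ht_cores=8)

ratio = virtualized / bare_metal
print(f"bare metal:  {bare_metal:.0f} RPS / HT core")
print(f"virtualized: {virtualized:.0f} RPS / HT core")
print(f"virtualized is {ratio:.0%} of bare metal per HT core")
```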
![Page 20: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/20.jpg)
The Pesky Problem: The Goal
• Deploy our KVaaS service … as a flexible and scalable solution
• Ability to use OpenStack APIs to manage the service
• Cloud-provider-run KVaaS service or tenant-managed service
• Initial deployment planned for the OpenStack Nova platform
– Not a containerization service …
– Though … considering it …
• Easier auto-scaling, better service packing, flexibility, etc.
• Explore mixed MagnetoDB/Cassandra vs. separated services
![Page 21: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/21.jpg)
Resolving “The Pesky Problem”
![Page 22: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/22.jpg)
Resolving the “Pesky Problem”: Approach
• Baseline the test environment
– Bare metal deployment and test
– Mimics the original deployment characteristics
• Deploy OpenStack Nova
– Install KVaaS services
• Performance tune each component
– Linux OS and hardware configuration
– KVM hypervisor/Nova Compute performance tuning
– MagnetoDB/Cassandra performance tuning
![Page 23: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/23.jpg)
Resolving the “Pesky Problem”: Testing Tools
• Linux OS and Hardware
– perf, openssl speed, iostat, iozone, iperf, dd (yes, really!), dtrace
• KVM Hypervisor/Nova Compute
– kvm_stat, kvmtrace, perf stat -e 'kvm:*', specvirt
• MagnetoDB/Cassandra
– magnetodb-test-bench, jstat, cstar_perf, cassandra-stress
• General Test Suite
– Phoronix Test Suite
![Page 24: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/24.jpg)
Resolving the “Pesky Problem”: Test Architecture
![Page 25: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/25.jpg)
Resolving the “Pesky Problem”: Test Bench
![Page 26: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/26.jpg)
Performance Tuning Recommendations
![Page 27: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/27.jpg)
Performance Tuning Results: Linux OS and Hardware
Recommendations:
Host:
• vhost_net, transparent_hugepages, high_res_timer, hpet, compaction, ksm, cgroups
• task scheduling tweaks (CFS)
• Filesystem mount options (noatime, nodiratime, relatime)
• Tune wmem and rmem buffers !!!
• Elevator I/O scheduler = deadline
Guest:
• vhost_net or virtio_net, virtio_blk, virtio_balloon, virtio_pci
• Paravirtualization!
• Disable system perf. gathering – get info from host hypervisor tools
• Elevator scheduler set to “noop”
• Give guests as much memory as you can (FS cache!)
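Before changing the elevator on a host or guest, it helps to confirm which scheduler is currently active. On Linux the file /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one in square brackets. The sketch below parses that format; the device name passed to `read_scheduler` is an assumption for illustration, and the scheduler names offered vary by kernel (newer multi-queue kernels typically list names such as "none" and "mq-deadline" instead of "noop" and "deadline").

```python
from pathlib import Path


def active_scheduler(scheduler_line: str) -> str:
    """Return the scheduler shown in [brackets], i.e. the active one."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler found in {scheduler_line!r}")


def read_scheduler(device: str) -> str:
    """Read the active scheduler for a block device such as 'sda' (Linux only)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Parsing a sample line in the sysfs format; no hardware access is needed here.
print(active_scheduler("noop [deadline] cfq"))
```

On a real host, `read_scheduler("sda")` would report the current setting for that disk, assuming a device of that name exists.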
![Page 28: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/28.jpg)
Performance Tuning Results: Linux OS and Hardware, cont.
Result callouts shown against the recommendations above:
• 7-10x
• 30%
• 10% less latency, 8x throughput
• 2x throughput
![Page 29: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/29.jpg)
Performance Tuning Results: KVM / Nova Compute
Recommendations:
Host:
• tweak Transparent Huge Pages
• bubble up raw devices if possible (warning: migration/portability)
• multi-queue virtio-net
• SR-IOV if you can dedicate a NIC (warning: see bubble up warning!)
Guest:
• qcow2 or raw for guest file backing
• disk partition alignment is still very important
• preallocate metadata (qcow2)
• fallocate entire guest image if you can (qcow2, lose oversubscribe ability)
• set VM swappiness to zero
• Async. I/O set to “native”
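The two qcow2 preallocation recommendations map onto the `preallocation` creation option of `qemu-img`, which accepts `metadata` and `falloc` among its values. The sketch below only builds the command line and does not run it; the image path and size are placeholders, and how a given Nova deployment creates its guest images is not covered by the slides.

```python
import shlex


def qemu_img_create_cmd(image_path: str, size: str, full: bool = False) -> list[str]:
    """Build a qemu-img command that creates a preallocated qcow2 image.

    full=False preallocates only the qcow2 metadata.
    full=True uses fallocate for the whole image, which gives up
    the ability to oversubscribe disk space.
    """
    mode = "falloc" if full else "metadata"
    return [
        "qemu-img", "create",
        "-f", "qcow2",
        "-o", f"preallocation={mode}",
        image_path, size,
    ]


# Placeholder path and size, for illustration only.
cmd = qemu_img_create_cmd("/var/tmp/guest.qcow2", "20G")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a host where `qemu-img` is installed.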
![Page 30: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/30.jpg)
Performance Tuning Results: KVM / Nova Compute, cont.
Result callouts shown against the recommendations above:
• 2 to 15% gain
• ~10% gain
• 40+% gain w/ Host + Guest
• 8% gain in TPM
![Page 31: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/31.jpg)
Performance Tuning Results: MagnetoDB/Cassandra
Recommendations:
• disk: vm.dirty_ratio & vm.dirty_background_ratio – increasing cache may help write workloads that have ordered writes, or writes in bursty chunks
• “CommitLogDirectory” and “DataFileDirectories” on separate devices for write performance improvement
• GC tuning of Java heap/new gen – significant latency decreases
• Tune Bloom filters, data caches, and compaction
• Use compression for similar “column families”
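One quick way to sanity-check the commit log and data directory placement is to compare the device IDs the OS reports for each path. This is a rough sketch under a notable assumption: `st_dev` identifies the mounted filesystem, not the physical disk, so two partitions or LVM volumes on the same spindle would still look "separate". The directories themselves are set in Cassandra's configuration (`commitlog_directory` and `data_file_directories` in cassandra.yaml); the demo below uses the current directory as a stand-in because the real paths are deployment-specific.

```python
import os


def same_device(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def shared_with_commitlog(commitlog_dir: str, data_dirs: list[str]) -> list[str]:
    """Return the data directories that share a device with the commit log."""
    return [d for d in data_dirs if same_device(commitlog_dir, d)]


# Stand-in paths: the current directory plays both roles, so it is reported
# as sharing a device with itself. Substitute the real Cassandra directories.
here = os.getcwd()
conflicts = shared_with_commitlog(here, [here])
print("data dirs sharing a device with the commit log:", conflicts)
```

An empty list is the desired outcome for the layout the slide recommends.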
![Page 32: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/32.jpg)
Performance Tuning Results: MagnetoDB/Cassandra, cont.
Result callouts shown against the recommendations above:
• 10x pages
• 25-35% read perf.
• 5-10% write gains
![Page 33: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/33.jpg)
Summary
![Page 34: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/34.jpg)
Summary: Notes
• “clouds” are best composed of small services that can be independently combined, tuned, and scaled
• human expectations in the transition from bare metal to cloud need to be reset
• an iterative step-by-step approach is best – Test … Tune … Test … Tune … !
• lots of complex pieces in a cloud application
![Page 35: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/35.jpg)
Summary: Notes (continued)
• Compose your services as individual building blocks
• Tune each component/service independently
• Then tune the whole system
• Automation is critical to iterative test/tune strategies!!
• Performance tuning is absolutely worth the investment
• Knowing your workloads is still (maybe even more?) critical
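The "Test … Tune … Test … Tune" loop is straightforward to automate once a benchmark can be called as a function. The sketch below is one possible shape for such a loop, not the tooling used in the talk: it applies one candidate change at a time and keeps it only if the measured score improves. The setting names and the `fake_benchmark` scoring are invented for illustration; a real run would call something like cassandra-stress or magnetodb-test-bench and parse its output.

```python
from typing import Callable


def tune_iteratively(
    benchmark: Callable[[dict], float],
    candidates: list[tuple[str, str]],
    baseline: dict,
) -> tuple[dict, float]:
    """Try one setting change at a time; keep it only if the score improves."""
    best_settings = dict(baseline)
    best_score = benchmark(best_settings)  # baseline measurement first
    for name, value in candidates:
        trial = {**best_settings, name: value}
        score = benchmark(trial)
        if score > best_score:
            best_settings, best_score = trial, score
    return best_settings, best_score


def fake_benchmark(settings: dict) -> float:
    """Invented stand-in for a real load test; returns a made-up RPS figure."""
    score = 165.0
    if settings.get("io_scheduler") == "deadline":
        score += 20.0
    if settings.get("thp") == "always":
        score -= 5.0
    return score


settings, score = tune_iteratively(
    fake_benchmark,
    candidates=[("io_scheduler", "deadline"), ("thp", "always")],
    baseline={"io_scheduler": "cfq", "thp": "madvise"},
)
print(settings, score)
```

Changing one setting per iteration keeps each measurement attributable to a single change, which matches the advice to tune components independently before tuning the whole system.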
![Page 36: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/36.jpg)
Questions and (hopefully?) Answers
Let’s talk…
![Page 37: Performance Tuning a Cloud Application: A Real World Case Study](https://reader036.fdocuments.us/reader036/viewer/2022062303/557d609ad8b42aba3d8b5007/html5/thumbnails/37.jpg)
Thank you!
Copyright © 2014 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.
This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.
Shane Gibson