VM Performance: I/O
Arne Wiebalck
LHCb Computing Workshop, CERN
May 22, 2015
In this talk: I/O only
• Most issues were I/O related
  - Symptom: high IOwait
• You can optimize this
  - Understand the service offering
  - Tune your VM
• CPU performance is being looked at as well
  - Host-side; too early to report
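The "high IOwait" symptom can be spotted with tools like `iostat -x` or straight from `/proc/stat`. A minimal sketch of the arithmetic, with made-up sample counters standing in for two real readings taken a second apart:

```shell
# Two fabricated /proc/stat "cpu" lines; on a real VM you would read
# the file twice. Fields after "cpu": user nice system idle iowait ...
sample1="cpu 1000 0 500 8000 500 0 0 0"
sample2="cpu 1100 0 550 8500 850 0 0 0"
iowait=$(awk -v a="$sample1" -v b="$sample2" 'BEGIN {
  split(a, x); split(b, y);
  total = 0;
  for (i = 2; i <= 9; i++) total += y[i] - x[i];
  dio = y[6] - x[6];                 # field 6 is the iowait counter
  printf "%.0f", 100 * dio / total;  # iowait share of the interval
}')
echo "iowait: ${iowait}%"            # → iowait: 35%
```

A sustained iowait share of tens of percent is the signature of the issues discussed in the following slides.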
Hypervisor limits
[Diagram: a virtual machine (4 cores, 8 GB RAM) on a hypervisor; disk IOPS are shared pro rata, ~25 IOPS per VM]
• Two-disk RAID-1
  - Effectively one disk
• Disk is a shared resource
  - IOPS / #VMs
• Your VM is co-hosted with 7 other VMs
User expectation: ephemeral disk ≈ local disk
Ask not only what the VM can do for you …
Tip 1: Minimize disk usage
• Use tmpfs
• Reduce logging
• Configure AFS memory caches
  - Supported in Puppet
• Use volumes
• …
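As an illustration of the tmpfs bullet: a RAM-backed mount keeps scratch files off the shared ephemeral disk entirely. The path and size below are example values, not a CERN recommendation:

```shell
# Mount a 512 MB RAM-backed filesystem at an example scratch path
# (requires root). Files written here never generate disk I/O.
mkdir -p /scratch
mount -t tmpfs -o size=512m,mode=1777 tmpfs /scratch

# To make it persistent, an /etc/fstab line of this shape works:
# tmpfs  /scratch  tmpfs  size=512m,mode=1777  0  0
```

Keep in mind that tmpfs contents compete with applications for RAM and vanish on reboot.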
Tip 2: Check I/O scheduling
• The “lxplus problem”
• VMs use the ‘deadline’ elevator
  - Set by the ‘virtual-guest’ tuned profile, Red Hat’s default for VMs
  - Not always ideal (interactive machines):
    ‘deadline’ prefers reads and can delay writes (default: 5 secs)
  - Designed to allow reads under heavy load (webserver)
  - lxplus: sssd makes DB updates during login
• I/O scheduler on the VM changed to CFQ
  - Completely Fair Queuing
• Benchmark: login loop
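The elevator can be inspected and switched at runtime through sysfs. A sketch (requires root; `vda` is an example device name, on your VM it may be `sda`, `xvda`, etc.):

```shell
# Show the active scheduler (the one in brackets):
cat /sys/block/vda/queue/scheduler    # e.g. noop [deadline] cfq

# Switch to CFQ at runtime; takes effect immediately:
echo cfq > /sys/block/vda/queue/scheduler
cat /sys/block/vda/queue/scheduler    # e.g. noop deadline [cfq]
```

To make the change survive reboots, set it via the kernel command line (`elevator=cfq`) or a tuned profile rather than a one-off echo.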
‘deadline’ vs. ‘CFQ’
[Chart annotations: I/O benchmark starts; switch from ‘deadline’ to ‘CFQ’; I/O benchmark continues]
Tip 3: Use volumes
• Volumes are networked virtual disks
  - Show up as block devices
  - Attached to one VM at a time
  - Arbitrary size (within your quota)
• Provided by Ceph (and NetApp)
• QoS for IOPS and bandwidth
  - Allows the service to offer different volume types
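A sketch of the typical workflow with the OpenStack CLI (the exact commands depend on your client version; volume, server, and device names below are examples):

```shell
# Create a 100 GB volume and attach it to a running VM:
openstack volume create --size 100 myvolume
openstack server add volume myserver myvolume

# Inside the VM, the volume appears as a fresh block device
# (the name depends on the hypervisor, often /dev/vdb):
mkfs -t ext4 /dev/vdb
mkdir -p /mnt/data
mount /dev/vdb /mnt/data
```

Unlike the ephemeral disk, the volume's data survives VM deletion and can be re-attached to another VM.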
Volume types

Name      Bandwidth  IOPS  Comment
standard  80 MB/s    100   std disk performance
io1       120 MB/s   500   quota on request
cp1       80 MB/s    100   critical power
cp2       80 MB/s    100   critical power, Windows only (in preparation)
Ceph volumes at work
[Chart 1: ATLAS TDAQ monitoring application. Y-axis: CPU % spent in IOwait. Blue: CVI VM (h/w RAID-10 with cache); yellow: OpenStack VM. IOPS QoS changed from 100 to 500 IOPS.]
[Chart 2: EGI Message Broker monitoring. Y-axis: scaled CPU load (5 min load / #cores). IOPS QoS changed from 100 to 500 IOPS.]
Tip 4: Use SSD block-level caching
• SSDs as disks in hypervisors would solve all IOPS and latency issues
• But still (too expensive and) too small
• Compromise: SSD block-level caching
  - flashcache (from Facebook; used at CERN for AFS before)
  - dm-cache (in-kernel since 3.9, recommended by Red Hat, in CentOS 7)
  - bcache (in-kernel since 3.10)
bcache
[Diagram: hypervisor with RAM and CPUs; the disk (~100 IOPS) is fronted by an SSD cache (~20k IOPS)]
• Change cache mode at runtime
  - Think SSD replacements
• Strong error handling
  - Flush and bypass on error
• Easy setup
  - Transparent for the VM
  - Needs a special kernel on the hypervisor
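A sketch of what the setup looks like on the hypervisor (requires bcache-tools and a bcache-enabled kernel; device names are examples, not the production layout):

```shell
# Register a backing disk and a caching SSD:
make-bcache -B /dev/sdb        # /dev/sdb: backing (spinning) disk
make-bcache -C /dev/sdc        # /dev/sdc: caching SSD

# Attach the cache set (UUID printed by make-bcache, also visible
# under /sys/fs/bcache/), then pick the cache mode at runtime:
echo "$CACHE_SET_UUID" > /sys/block/bcache0/bcache/attach
echo writeback         > /sys/block/bcache0/bcache/cache_mode

# Verify; the active mode is shown in brackets:
cat /sys/block/bcache0/bcache/cache_mode
```

The runtime-switchable cache mode is what allows flushing to `none` before an SSD replacement, as mentioned above.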
bcache in action (2)
[Chart annotations: switch cache mode from ‘none’ to ‘writeback’; benchmark ends, #threads decreased]
On a hypervisor with 4 VMs: ~25 IOPS/VM → ~1000 IOPS/VM
Benchmarking a caching system is non-trivial:
- SSD performance can vary over time
- SSD performance can vary between runs
- Data distribution is important (cf. Zipf)
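For context, IOPS figures like the ones above typically come from a random-I/O generator such as fio. A hypothetical invocation, not the benchmark used in the talk (all parameters illustrative):

```shell
# Random 4k reads against an example bcache device, 4 workers,
# queue depth 32, direct I/O to bypass the guest page cache:
fio --name=randread --filename=/dev/bcache0 --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```

Because of the caveats listed above (SSD wear, run-to-run variance, data distribution), repeated runs with a realistic access pattern matter more than any single number.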
bcache and Latency
[Chart: blue: CVI VM (h/w RAID-10 with cache); yellow: OpenStack VM; red: OpenStack VM on a bcache hypervisor]
SSD block-level caching is sufficient for IOPS and latency demands: use a VM on a bcache hypervisor.
Caveat: SSD failures are fatal!
Clients: lxplus, ATLAS build service, CMS Frontier, root, … (16 tenants)
“Tip” 5: KVM caching
[Diagram: a virtual machine (4 cores, 8 GB RAM) on a hypervisor with its disk]
• I/O from the VM goes directly to the disk
  - Required for live migration
  - Not optimal for performance
• I/O can be cached on the hypervisor
  - Operationally difficult
  - No live migration
  - Done for batch
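In a libvirt/KVM setup, the host-side cache behaviour is a per-disk setting in the domain XML. A sketch (`myvm` is a placeholder domain name):

```shell
# The relevant fragment of the domain XML looks like this:
#
#   <disk type='file' device='disk'>
#     <driver name='qemu' type='qcow2' cache='none'/>
#     ...
#   </disk>
#
# cache='none' bypasses the hypervisor page cache (safe for live
# migration); cache='writeback' enables host-side caching at the
# cost of live migration and crash safety.
virsh edit myvm                    # change the cache attribute
virsh dumpxml myvm | grep cache    # verify the active setting
```

The guest must be power-cycled for the new cache mode to take effect.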
KVM caching in action
[Chart 1: ATLAS SAM VM, cache mode ‘none’ to ‘writeback’]
[Chart 2: batch nodes with and without KVM caching]
Take home messages
• The Cloud service offers various options to improve the I/O performance of your VMs
• You need to analyze your use case and pick the right one
  - Reduce I/O
  - Check the I/O scheduler
  - Use volumes
  - Use SSD hypervisors
  - (Use KVM caching)
• Get in touch with the Cloud team in case you need assistance!