Differentiated I/O services in virtualized environments
1
Differentiated I/O services in virtualized environments
Tyler Harter, Salini SK & Anand Krishnamurthy
2
Overview
• Provide differentiated I/O services for applications in guest operating systems in virtual machines
• Applications in virtual machines tag their I/O requests
• The hypervisor's I/O scheduler uses these tags to provide differentiated quality of I/O service
3
Motivation
• Variegated applications with different I/O requirements are hosted in clouds
• I/O scheduling that is agnostic of the semantics of the request is not optimal
4
Motivation
[Figure: a hypervisor hosting VM 1, VM 2, and VM 3]
5
Motivation
[Figure: the hypervisor with VM 2 and VM 3]
6
Motivation
• We want high- and low-priority processes to correctly get differentiated service, both within a VM and between VMs
Can my webserver/DHT log pusher's I/O be served differently from my webserver/DHT's I/O?
7
Existing work & Problems
• VMware's ESX server offers Storage I/O Control (SIOC)
• Provides I/O prioritization of virtual machines that access a shared storage pool
But it supports prioritization only at host granularity!
8
Existing work & Problems
• Xen credit scheduler also works at domain level
• Linux’s CFQ I/O scheduler supports I/O prioritization
– Possible to use priorities at both guest and hypervisor’s I/O scheduler
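The CFQ priorities mentioned above can be set per process through the Linux ioprio_set(2) syscall, which is what ionice(1) uses under the hood. A minimal sketch follows; the bit layout mirrors the kernel's IOPRIO_PRIO_VALUE macro, and the syscall number 251 is an assumption that the target is x86-64 Linux.

```python
import ctypes
import ctypes.util
import os

# Constants from the Linux kernel's include/linux/ioprio.h
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 1, 2, 3
IOPRIO_WHO_PROCESS = 1

def ioprio_value(ioclass, level):
    """Pack an I/O scheduling class and a 0-7 priority level into one value."""
    return (ioclass << IOPRIO_CLASS_SHIFT) | level

def set_self_ioprio(ioclass, level):
    """Tag the calling process's I/O priority (Linux x86-64 only)."""
    SYS_ioprio_set = 251  # x86-64 syscall number; differs on other arches
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    return libc.syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, os.getpid(),
                        ioprio_value(ioclass, level))
```

Calling `set_self_ioprio(IOPRIO_CLASS_BE, 0)` is equivalent to running the process under `ionice -c2 -n0`; CFQ then favors that process's requests over best-effort level 7 ones.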
9
Original Architecture
[Figure: guest VMs run high- and low-priority applications whose syscalls pass through the guest I/O scheduler (e.g., CFQ); QEMU exposes a virtual SCSI disk; on the host, requests again pass through an I/O scheduler (e.g., CFQ) with high- and low-priority streams]
11
Problem 1: low and high may get same service
12
Problem 2: does not utilize host caches
13
Existing work & Problems
• Current state of the art doesn't provide differentiated services at guest-application-level granularity
14
Solution
Tag I/O and prioritize in the hypervisor
15
Outline
• KVM/Qemu, a brief intro…
• KVM/Qemu I/O stack
• Multi-level I/O tagging
• I/O scheduling algorithms
• Evaluation
• Summary
16
KVM/Qemu, a brief intro…
Hardware
Linux Standard Kernel with KVM - Hypervisor
• KVM module is part of the Linux kernel since version 2.6
• Linux has all the mechanisms a VMM needs to operate several VMs
• Has 3 modes: kernel, user, guest
  – kernel mode: switch into guest mode and handle exits due to I/O operations
  – user mode: I/O when the guest needs to access devices
  – guest mode: execute guest code, which is the guest OS except I/O
• Relies on a virtualization-capable CPU with either Intel VT or AMD SVM extensions
18
KVM/Qemu, a brief intro..
Hardware
Linux Standard Kernel with KVM - Hypervisor
Each virtual machine is a user-space process
19
KVM/Qemu, a brief intro..
Hardware
Linux Standard Kernel with KVM - Hypervisor
libvirt
Other user-space processes
KVM/Qemu I/O stack
Application in guest OS
System calls layer: read, write, stat, …
VFS
File system, buffer cache
Block layer
SCSI / ATA
• The application issues an I/O-related system call (e.g., read(), write(), stat()) within a user-space context of the virtual machine.
• This system call leads to an I/O request being submitted from within the kernel space of the VM.
• The I/O request reaches a device driver: either an ATA-compliant (IDE) or a SCSI driver.
KVM/Qemu I/O stack
• The device driver issues privileged instructions to read/write the memory regions exported over PCI by the corresponding device.
KVM/Qemu I/O stack
Hardware
Linux Standard Kernel with KVM - Hypervisor
Qemu emulator
• These instructions trigger VM exits, which are handled by the core KVM module within the host's kernel-space context.
• The privileged I/O-related instructions are passed by the hypervisor to the QEMU machine emulator.
• A VM exit takes place for each of the privileged instructions resulting from the original I/O request in the VM.
KVM/Qemu I/O stack
Hardware
Linux Standard Kernel with KVM - Hypervisor
Qemu emulator
• These instructions are then emulated by device-controller emulation modules within QEMU (either as ATA or as SCSI commands).
• QEMU generates block-access I/O requests in a special block-device emulation module.
• Thus the original I/O request generates I/O requests to the kernel space of the host.
• Upon completion of the system calls, QEMU "injects" an interrupt into the VM that originally issued the I/O request.
Multi-level I/O tagging modifications
Modification 1: pass priorities via syscalls
Modification 2: NOOP+ at guest I/O scheduler
Modification 3: extend SCSI protocol with prio
Modification 4: share-based prio sched in host
Modification 5: use new calls in benchmarks
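Modification 3 extends the SCSI protocol so the priority tag survives the trip from guest to host. As a purely illustrative sketch, one way to carry a small tag is to reuse the group-number field (byte 6, bits 0-4) of a READ(10) CDB; the slides do not say which field the authors actually used, so this encoding is an assumption.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) opcode

def build_read10_cdb(lba, blocks, prio_tag):
    """Build a 10-byte READ(10) CDB carrying a 5-bit priority tag
    in the group-number field (byte 6) -- illustrative only."""
    if not 0 <= prio_tag <= 31:
        raise ValueError("tag must fit in 5 bits")
    cdb = bytearray(10)
    cdb[0] = READ_10
    cdb[2:6] = struct.pack(">I", lba)      # logical block address, big-endian
    cdb[6] = prio_tag & 0x1F               # group-number field reused as tag
    cdb[7:9] = struct.pack(">H", blocks)   # transfer length in blocks
    return bytes(cdb)                      # byte 9 (control) left as 0
```

A host-side scheduler that understands this convention can then read the tag back out of byte 6 of each incoming CDB and queue the request accordingly.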
31
Scheduler algorithm
S_i: stride of application i = Global_shares / W_i, where W_i is the shares assigned to application i
V_i: virtual I/O counter for application i

Dispatch_request() {
    Select the application i with the lowest virtual I/O counter V_i
    Increase V_i by S_i
    if (V_i reaches threshold)
        Reinitialize all V_i to 0
    Dispatch the request in queue i
}
32
Scheduler algorithm contd.
• Problem: a sleeping process can monopolize the resource once it wakes up after a long time
• Solution: if a sleeping process i wakes up, then set
  V_i = max( min(all non-zero V_j), V_i )
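The stride-based share scheduler above, including the wake-up rule, can be sketched as follows. The scaling constant BIG, the tie-breaking by insertion order, and the threshold value are assumptions not stated on the slides.

```python
BIG = 600            # common numerator used to derive per-app strides
THRESHOLD = 10**9    # counter value at which all counters are reset

class StrideIOScheduler:
    def __init__(self):
        self.stride = {}   # app id -> stride S_i = BIG / shares
        self.counter = {}  # app id -> virtual I/O counter V_i

    def add_app(self, app_id, shares):
        self.stride[app_id] = BIG / shares
        # Wake-up rule from the slides: start at the smallest non-zero
        # counter so a newcomer/sleeper cannot monopolize the disk.
        nonzero = [v for v in self.counter.values() if v > 0]
        old = self.counter.get(app_id, 0)
        self.counter[app_id] = max(min(nonzero), old) if nonzero else old

    def dispatch(self):
        # Pick the app with the lowest virtual counter; ties go to the
        # earliest-registered app (dict insertion order).
        app = min(self.counter, key=self.counter.get)
        self.counter[app] += self.stride[app]
        if self.counter[app] >= THRESHOLD:
            for a in self.counter:
                self.counter[a] = 0
        return app
```

With shares 2:1, thirty dispatches split 20:10 between the two apps, i.e., throughput tracks the share ratio, which is the behavior the evaluation slides measure.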
33
Evaluation
• Tested on HDD and SSD
• Configuration:
  Guest RAM size: 1 GB
  Host RAM size: 8 GB
  Hard disk RPM: 7200
  SSD: 35,000 IOPS read, 85,000 IOPS write
  Guest OS: Ubuntu Server 12.10, Linux kernel 3.2
  Host OS: Kubuntu 12.04, Linux kernel 3.2
  Filesystem (host/guest): ext4
  Virtual disk image format: qcow2
34
Results
• Metrics:
  – Throughput
  – Latency
• Benchmarks:
  – Filebench
  – Sysbench
  – Voldemort (distributed key-value store)
35
Shares vs Throughput for different workloads: HDD
36
Shares vs Latency for different workloads: HDD
• Priorities are better respected if most of the read requests hit the disk
37
Effective Throughput for various dispatch numbers: HDD
• Priorities are respected only when the dispatch number of the disk is lower than the number of read requests generated by the system at a time
• Downside: the dispatch number of the disk is directly proportional to the effective throughput
38
Shares vs Throughput for different workloads: SSD
39
Shares vs Latency for different workloads: SSD
• Priorities in SSDs are respected only under heavy load, since SSDs are faster
40
Comparison between different schedulers
• Only Noop+LKMS respects priority! (It has to, since we built it)
41
Results
[Table: results on hard disk and flash for the Webserver, Mailserver, Random Reads, Sequential Reads, and Voldemort DHT Reads workloads]
42
Summary
• It works!
• Preferential service is possible only when the dispatch number of the disk is lower than the number of read requests generated by the system at a time
• But a lower dispatch number reduces the effective throughput of the storage
• In SSDs, preferential service is only possible under heavy load
• Scheduling at the lowermost layer yields better differentiated services
43
Future work
• Get it working for writes
• Evaluate VMware ESX SIOC and compare with our results
44