Differentiated I/O services in virtualized environments
1
Differentiated I/O services in virtualized environments
Tyler Harter, Salini SK & Anand Krishnamurthy
2
Overview
• Provide differentiated I/O services for applications in guest operating systems in virtual machines
• Applications in virtual machines tag their I/O requests
• The hypervisor's I/O scheduler uses these tags to provide differentiated quality of I/O service
3
Motivation
• Variegated applications with different I/O requirements are hosted in clouds
• I/O scheduling that is agnostic of the semantics of the request is not optimal
4
Motivation
[Figure: a hypervisor hosting VM 1, VM 2, and VM 3]
5
Motivation
[Figure: the hypervisor with VM 2 and VM 3]
6
Motivation
• We want high- and low-priority processes to correctly get differentiated service, both within a VM and between VMs
Can my webserver/DHT log pusher's I/O be served differently from my webserver/DHT's I/O?
7
Existing work & Problems
• VMware's ESX server offers Storage I/O Control (SIOC)
• Provides I/O prioritization of virtual machines that access a shared storage pool
But it supports prioritization only at host granularity!
8
Existing work & Problems
• Xen credit scheduler also works at domain level
• Linux’s CFQ I/O scheduler supports I/O prioritization
– Possible to use priorities at both guest and hypervisor’s I/O scheduler
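The CFQ priorities mentioned above can be set per process through the Linux ioprio_set(2) syscall, which is what ionice(1) uses under the hood. A minimal sketch follows; the bit layout mirrors the kernel's IOPRIO_PRIO_VALUE macro, and the syscall number 251 is an assumption that the target is x86-64 Linux.

```python
import ctypes
import ctypes.util
import os

# Constants from the Linux kernel's include/linux/ioprio.h
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 1, 2, 3
IOPRIO_WHO_PROCESS = 1

def ioprio_value(ioclass, level):
    """Pack an I/O scheduling class and a 0-7 priority level into one value."""
    return (ioclass << IOPRIO_CLASS_SHIFT) | level

def set_self_ioprio(ioclass, level):
    """Tag the calling process's I/O priority (Linux x86-64 only)."""
    SYS_ioprio_set = 251  # x86-64 syscall number; differs on other arches
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    return libc.syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, os.getpid(),
                        ioprio_value(ioclass, level))
```

Calling `set_self_ioprio(IOPRIO_CLASS_BE, 0)` is equivalent to running the process under `ionice -c2 -n0`; CFQ then favors that process's requests over best-effort level 7 ones.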
9
Original Architecture
[Figure: guest VMs run high- and low-priority applications whose syscalls pass through the guest I/O scheduler (e.g., CFQ); QEMU exposes a virtual SCSI disk; on the host, requests again pass through an I/O scheduler (e.g., CFQ) with high- and low-priority streams]
11
Problem 1: low and high may get same service
12
Problem 2: does not utilize host caches
13
Existing work & Problems
• Current state of the art doesn't provide differentiated services at guest-application-level granularity
14
Solution
Tag I/O and prioritize in the hypervisor
15
Outline
• KVM/Qemu, a brief intro…
• KVM/Qemu I/O stack
• Multi-level I/O tagging
• I/O scheduling algorithms
• Evaluation
• Summary
16
KVM/Qemu, a brief intro…
Hardware
Linux Standard Kernel with KVM - Hypervisor
• KVM module is part of the Linux kernel since version 2.6
• Linux has all the mechanisms a VMM needs to operate several VMs
• Has 3 modes: kernel, user, guest
  – kernel mode: switch into guest mode and handle exits due to I/O operations
  – user mode: I/O when the guest needs to access devices
  – guest mode: execute guest code, which is the guest OS except I/O
• Relies on a virtualization-capable CPU with either Intel VT or AMD SVM extensions
18
KVM/Qemu, a brief intro..
Hardware
Linux Standard Kernel with KVM - Hypervisor
Each virtual machine is a user-space process
19
KVM/Qemu, a brief intro..
Hardware
Linux Standard Kernel with KVM - Hypervisor
libvirt
Other user-space processes
KVM/Qemu I/O stack
Application in guest OS
System calls layer: read, write, stat, …
VFS
File system, buffer cache
Block layer
SCSI / ATA
• The application issues an I/O-related system call (e.g., read(), write(), stat()) within a user-space context of the virtual machine.
• This system call leads to an I/O request being submitted from within the kernel space of the VM.
• The I/O request reaches a device driver: either an ATA-compliant (IDE) or a SCSI driver.
KVM/Qemu I/O stack
• The device driver issues privileged instructions to read/write the memory regions exported over PCI by the corresponding device.
KVM/Qemu I/O stack
Hardware
Linux Standard Kernel with KVM - Hypervisor
Qemu emulator
• These instructions trigger VM exits, which are handled by the core KVM module within the host's kernel-space context.
• The privileged I/O-related instructions are passed by the hypervisor to the QEMU machine emulator.
• A VM exit takes place for each of the privileged instructions resulting from the original I/O request in the VM.
KVM/Qemu I/O stack
Hardware
Linux Standard Kernel with KVM - Hypervisor
Qemu emulator
• These instructions are then emulated by device-controller emulation modules within QEMU (either as ATA or as SCSI commands).
• QEMU generates block-access I/O requests in a special block-device emulation module.
• Thus the original I/O request generates I/O requests to the kernel space of the host.
• Upon completion of the system calls, QEMU "injects" an interrupt into the VM that originally issued the I/O request.
Multi-level I/O tagging modifications
Modification 1: pass priorities via syscalls
Modification 2: NOOP+ at guest I/O scheduler
Modification 3: extend SCSI protocol with prio
Modification 4: share-based prio sched in host
Modification 5: use new calls in benchmarks
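Modification 3 extends the SCSI protocol so the priority tag survives the trip from guest to host. As a purely illustrative sketch, one way to carry a small tag is to reuse the group-number field (byte 6, bits 0-4) of a READ(10) CDB; the slides do not say which field the authors actually used, so this encoding is an assumption.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) opcode

def build_read10_cdb(lba, blocks, prio_tag):
    """Build a 10-byte READ(10) CDB carrying a 5-bit priority tag
    in the group-number field (byte 6) -- illustrative only."""
    if not 0 <= prio_tag <= 31:
        raise ValueError("tag must fit in 5 bits")
    cdb = bytearray(10)
    cdb[0] = READ_10
    cdb[2:6] = struct.pack(">I", lba)      # logical block address, big-endian
    cdb[6] = prio_tag & 0x1F               # group-number field reused as tag
    cdb[7:9] = struct.pack(">H", blocks)   # transfer length in blocks
    return bytes(cdb)                      # byte 9 (control) left as 0
```

A host-side scheduler that understands this convention can then read the tag back out of byte 6 of each incoming CDB and queue the request accordingly.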
31
Scheduler algorithm
S_i: stride of application i = Global_shares / W_i, where W_i is the shares assigned to application i
V_i: virtual I/O counter for application i

Dispatch_request() {
    Select the application i with the lowest virtual I/O counter V_i
    Increase V_i by S_i
    if (V_i reaches threshold)
        Reinitialize all V_i to 0
    Dispatch the request in queue i
}
32
Scheduler algorithm contd.
• Problem: a sleeping process can monopolize the resource once it wakes up after a long time
• Solution: if a sleeping process i wakes up, then set
  V_i = max( min(all non-zero V_j), V_i )
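The stride-based share scheduler above, including the wake-up rule, can be sketched as follows. The scaling constant BIG, the tie-breaking by insertion order, and the threshold value are assumptions not stated on the slides.

```python
BIG = 600            # common numerator used to derive per-app strides
THRESHOLD = 10**9    # counter value at which all counters are reset

class StrideIOScheduler:
    def __init__(self):
        self.stride = {}   # app id -> stride S_i = BIG / shares
        self.counter = {}  # app id -> virtual I/O counter V_i

    def add_app(self, app_id, shares):
        self.stride[app_id] = BIG / shares
        # Wake-up rule from the slides: start at the smallest non-zero
        # counter so a newcomer/sleeper cannot monopolize the disk.
        nonzero = [v for v in self.counter.values() if v > 0]
        old = self.counter.get(app_id, 0)
        self.counter[app_id] = max(min(nonzero), old) if nonzero else old

    def dispatch(self):
        # Pick the app with the lowest virtual counter; ties go to the
        # earliest-registered app (dict insertion order).
        app = min(self.counter, key=self.counter.get)
        self.counter[app] += self.stride[app]
        if self.counter[app] >= THRESHOLD:
            for a in self.counter:
                self.counter[a] = 0
        return app
```

With shares 2:1, thirty dispatches split 20:10 between the two apps, i.e., throughput tracks the share ratio, which is the behavior the evaluation slides measure.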
33
Evaluation
• Tested on HDD and SSD
• Configuration:
  Guest RAM size: 1 GB
  Host RAM size: 8 GB
  Hard disk RPM: 7200
  SSD: 35,000 IOPS read, 85,000 IOPS write
  Guest OS: Ubuntu Server 12.10, Linux kernel 3.2
  Host OS: Kubuntu 12.04, Linux kernel 3.2
  Filesystem (host/guest): ext4
  Virtual disk image format: qcow2
34
Results
• Metrics:
  – Throughput
  – Latency
• Benchmarks:
  – Filebench
  – Sysbench
  – Voldemort (distributed key-value store)
35
Shares vs Throughput for different workloads: HDD
36
Shares vs Latency for different workloads: HDD
• Priorities are better respected if most of the read requests hit the disk
37
Effective Throughput for various dispatch numbers: HDD
• Priorities are respected only when the dispatch number of the disk is lower than the number of read requests generated by the system at a time
• Downside: the dispatch number of the disk is directly proportional to the effective throughput
38
Shares vs Throughput for different workloads: SSD
39
Shares vs Latency for different workloads: SSD
• Priorities in SSDs are respected only under heavy load, since SSDs are faster
40
Comparison between different schedulers
• Only Noop+LKMS respects priority! (It has to, since we built it)
41
Results
[Table: results on hard disk and flash for the Webserver, Mailserver, Random Reads, Sequential Reads, and Voldemort DHT Reads workloads]
42
Summary
• It works!
• Preferential service is possible only when the dispatch number of the disk is lower than the number of read requests generated by the system at a time
• But a lower dispatch number reduces the effective throughput of the storage
• In SSDs, preferential service is only possible under heavy load
• Scheduling at the lowermost layer yields better differentiated services
43
Future work
• Get it working for writes
• Evaluate VMware ESX SIOC and compare with our results
44