Post on 12-Feb-2016
Storage Management in Virtualized Cloud Environments
Sankaran Sivathanu, Ling Liu, Mei Yiduo and Xing Pu
Student Workshop on Frontiers of Cloud Computing, IBM 2010
2
Talk Outline
• Introduction
• Measurement results & Observations
– Data Placement & Provisioning
– Workload Interference
– Impacts of Virtualization
• Summary
3
Cloud & Virtualization
• Cloud Environment – Goals
– Flexibility in resource configuration
– Maximum resource utilization
– Pay-per-use Model
• Virtualization – Benefits
– Resource consolidation
– Re-structuring flexibility
– Separate protection domains
• Virtualization serves as one of the basic foundations of Cloud infrastructures
4
Fundamental Issues
• Cloud Service Providers (CSPs) vs. Customers
– Customers purchase computing resources
– CSPs provide virtual resources (VMs)
– Customers perceive their resources as physical machines!
• Multiple VMs reside in a single physical host
– Resource Interference
– End-user performance depends on other users
• End-users are unaware of where their data physically exists
5
Goals of our Measurement
• For cloud service providers
– How to place data such that end-user performance is maximized?
– How to co-locate workloads for least interference?
• For End-Users
– How to purchase resources in tune with requirements?
– How to tune applications for maximum performance?
• General insights on storage I/O in virtualized environments
6
Benchmarks Used
• Postmark
– Mail Server Workload
– Create/Delete, Read/Append files
– Parameters
• File Size
• # of files
• Read/Write ratio
• Synthetic Workload
– Sequential vs. random accesses
– Zipf Distribution
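A synthetic workload of the kind listed above could be generated roughly as follows. This is a minimal sketch, not the generator used in the study: the function names, the Zipf exponent `s`, and the choice of making block 0 the most popular block are all assumptions made for illustration.

```python
import random
from itertools import accumulate

def zipf_weights(n_blocks, s=1.0):
    """Unnormalized Zipf weights: the block of rank k gets weight 1 / k**s."""
    return [1.0 / (k ** s) for k in range(1, n_blocks + 1)]

def random_zipf_trace(n_blocks, n_requests, s=1.0, seed=0):
    """Random accesses: block numbers drawn with Zipf-distributed popularity.

    Assumption: block 0 has rank 1 (most popular), block 1 has rank 2, etc.
    """
    rng = random.Random(seed)
    cum = list(accumulate(zipf_weights(n_blocks, s)))
    return rng.choices(range(n_blocks), cum_weights=cum, k=n_requests)

def sequential_trace(n_blocks, n_requests, start=0):
    """Sequential accesses: consecutive block numbers, wrapping at the end."""
    return [(start + i) % n_blocks for i in range(n_requests)]
```

Mixing the two trace types in different proportions gives the sequential-versus-random cases; a real benchmark would issue these block numbers as reads against the device under test.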
7
Data Provisioning & Placement
8
Disk Provisioning
Consider a 100GB Disk; workload data footprint ~150MB
Case - I: 4GB Partition, Throughput : 2.1 MB/s
Case - II: 40GB Partition, Throughput : 1.4 MB/s
Performance Difference : 33%
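The 33% figure is consistent with the relative drop from the higher reported throughput (2.1 MB/s) to the lower one (1.4 MB/s). A small worked check, where the helper name `relative_drop` is an assumption introduced for this example:

```python
def relative_drop(faster, slower):
    """Fractional drop from the faster throughput to the slower one."""
    return (faster - slower) / faster

# Throughputs in MB/s, as reported on the slide.
drop = relative_drop(2.1, 1.4)
print(f"{drop:.0%}")  # prints 33%
```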
9
Where to place VM disk ?
• Postmark benchmark
– Read operation
• Cases :
– Read from physical partitions in different zones
• Based on LBNs
• LBNs start from inner zone and proceed towards outer zones.
– Read from disk file (.vmdk)
10
Where to place multiple VM disks ?
• Postmark benchmark
– 2 instances (1 for each VM)
• Random reads
• Compare physical partitions placed in different zones
– O -> Outer
– I -> Inner
– M -> Mid
11
Observations
• Customers should purchase storage based on workload requirement, not price
• Thin provisioning may be practiced
• Throughput intensive VMs can be placed in outer disk zones
• Multiple VMs that may be accessed simultaneously should be co-located on disk
– CSPs can monitor access patterns and move virtual disks accordingly
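One way a CSP might monitor access patterns is to count how often two virtual disks are active in the same time window. The sketch below is hypothetical: the log format `(timestamp, vdisk_name)`, the function names, and the `window` and `min_count` thresholds are assumptions, not something described in the talk.

```python
from collections import Counter
from itertools import combinations

def co_access_counts(access_log, window=1.0):
    """Count how often two virtual disks are accessed in the same time window.

    access_log: iterable of (timestamp_seconds, vdisk_name) tuples.
    Returns a Counter keyed by sorted (vdisk_a, vdisk_b) pairs.
    """
    buckets = {}
    for ts, vdisk in access_log:
        buckets.setdefault(int(ts // window), set()).add(vdisk)
    pairs = Counter()
    for disks in buckets.values():
        for pair in combinations(sorted(disks), 2):
            pairs[pair] += 1
    return pairs

def placement_candidates(access_log, window=1.0, min_count=2):
    """Pairs seen together in at least min_count windows, most frequent first."""
    counts = co_access_counts(access_log, window)
    return [pair for pair, n in counts.most_common() if n >= min_count]
```

Pairs returned by `placement_candidates` are the virtual disks whose relative placement matters most; how to act on them is the provider's policy decision.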
12
Workload Interference
13
CPU-Disk Interference
[Figure: two configurations of CPU and disk workloads in VM - 1 and VM - 2 on one Physical Host]
Throughput : 23.4 MB/s vs. Throughput : 27.6 MB/s
Performance Difference : 15.3%
14
CPU-Disk Interference
CPU allocation ratios have no effect on disk throughput across VMs
A disk-intensive job performs better when run alongside a CPU-intensive job
15
CPU-Disk Interference
Reason? Dynamic Frequency Scaling
16
CPU-Disk Interference
CPU DFS is enabled in Linux by default
Three ‘governors’ to control the DFS policy
On-demand (default)
Performance
Power-save
When 1 core is idle, the entire CPU is down-scaled because overall CPU utilization falls
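To check which governor a host is using, the Linux cpufreq sysfs interface can be read directly. The sketch below assumes the usual `cpuN/cpufreq/scaling_governor` layout under `/sys/devices/system/cpu`; whether those files exist depends on the kernel and frequency-scaling driver, and the function name `read_governors` is introduced here only for illustration.

```python
from pathlib import Path

def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq in sysfs."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors

if __name__ == "__main__":
    # Prints nothing on hosts that do not expose cpufreq.
    for cpu, gov in read_governors().items():
        print(cpu, gov)
```

Repeating a disk benchmark under different governors is one way to test whether frequency scaling explains a throughput gap on a given host.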
17
Disk-Disk Interference
• 1 instance of Postmark in each VM
• 65.3% more time taken when compared to running Postmark in a single VM
• Overhead mainly attributed to disk seeks : No more sequential accesses
[Figure: VM-1 and VM-2 on one Physical Host, each with its own CPU and virtual disk (V.Disk-1, V.Disk-2), sharing one Physical Disk]
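A toy model shows why two sequential streams stop looking sequential once they share one physical disk. It assumes that head movement is proportional to the absolute jump in block number and that the disk serves the two VMs strictly in turn; the block offsets are made up for the example, and real I/O schedulers batch requests, so this is a worst case rather than a measurement.

```python
def head_travel(lbns):
    """Total head movement, modeled as the sum of absolute LBN jumps."""
    return sum(abs(b - a) for a, b in zip(lbns, lbns[1:]))

def interleave(stream_a, stream_b):
    """Round-robin merge of two request streams, as one shared disk would see them."""
    merged = []
    for a, b in zip(stream_a, stream_b):
        merged.extend((a, b))
    return merged

# Hypothetical layout: each VM reads 1000 consecutive blocks of its own virtual disk.
vm1 = list(range(0, 1000))
vm2 = list(range(500_000, 501_000))

alone = head_travel(vm1) + head_travel(vm2)   # each stream run by itself
shared = head_travel(interleave(vm1, vm2))    # both streams on one disk
print(alone, shared)
```

In this model the interleaved stream travels several orders of magnitude further than the two streams run back to back, which matches the slide's point that the accesses are no longer sequential.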
18
Disk-Disk Interference
• VMs using separate physical disks
• 17.52% more time taken when compared to running Postmark in a single VM
• Overhead attributed to contention in Dom-0’s queue structures
[Figure: VM-1 and VM-2 on one Physical Host, each with its own CPU and virtual disk (V.Disk-1, V.Disk-2), on separate physical disks (Disk - 1, Disk - 2)]
19
Disk-Disk Interference
• Postmark Benchmark (Reads)
• Cases :
– Running in a single VM
– 1 instance in each of two VMs
• 2 VMs reading from virtual disks in same physical disk
• 2 VMs reading from virtual disks in different physical disks
20
Disk-Disk Interference
• I/O scheduling policy in Dom-0 has little effect
• ‘Ideal’ case is time taken when running Postmark in single VM
• Other cases are running 1 instance of Postmark in each of 2 VMs (separate physical disks)
21
Disk-Disk Interference
• Interference with respect to workload type
• Synthetic read workload
• VMs use separate physical disks
• Cases :
– Mix of sequential versus random reads
• Sequential requests from both VMs flood the Dom-0 queue, causing contention
22
Observations
• CPU-intensive and disk-intensive workloads can be co-located for optimal performance and power
• Virtual disks that may be accessed simultaneously must be placed on separate physical disks
• I/O scheduling in Dom-0 has little effect on disk workload interference
• Two sequential workloads, when co-located, suffer in performance due to queue contention
• With separate disks, workload contention is generally minimal, except in the case of two sequential workloads
23
Impacts of Virtualization
24
Sequentiality
• Postmark benchmark (reads)
• Not much overhead seen for random disk accesses
• VM overhead is masked by the larger disk overhead
• Overhead is felt more for sequential disk accesses
25
Block Size
• Postmark sequential reads
• Fixed overhead with every request
• As block size increases, the # of requests is reduced, hence overhead is reduced
• Efficient to read in larger blocks
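The amortization argument above can be written as a simple cost model: every request pays a fixed overhead, and the data itself is transferred at the disk's bandwidth. The overhead and bandwidth values below are hypothetical, chosen only to show the trend.

```python
import math

def read_time(total_bytes, block_size, per_request_overhead, bandwidth):
    """Time to read total_bytes sequentially using block_size requests.

    Model: each request pays a fixed overhead (seconds); the payload is
    transferred at `bandwidth` bytes per second.
    """
    n_requests = math.ceil(total_bytes / block_size)
    return n_requests * per_request_overhead + total_bytes / bandwidth

# Hypothetical numbers: 64 MiB read, 50 us overhead per request, 50 MiB/s disk.
MiB = 1024 * 1024
for block_size in (4096, 65536, MiB):
    t = read_time(64 * MiB, block_size, 50e-6, 50 * MiB)
    print(f"{block_size:>8} B blocks: {t:.3f} s")
```

The transfer term is the same in every case; only the per-request term shrinks as the block size grows, which is why larger blocks help most when the fixed overhead is a large share of the total.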
26
Block Size w.r.t. Locality
27
Observations
• VM overhead is not felt in random workloads – amortized by disk seeks
• Extra layers of indirection are the reason for VM overhead – when block size is large, overhead is amortized
• Block size may be increased only if there is sufficient locality in access
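The locality caveat can be illustrated with a rough model in which a large block only pays off if the extra bytes it brings in are actually needed. Everything here is an assumption made for the example: the function name, the two extreme cases (fully contiguous versus fully scattered items), and the cost of a request as fixed overhead plus transfer time.

```python
import math

def fetch_time(n_items, item_size, block_size, overhead, bandwidth, contiguous):
    """Time to fetch n_items of item_size bytes using block_size reads.

    contiguous=True : needed items sit next to each other, so one block serves many.
    contiguous=False: needed items are scattered, so each block serves one item.
    """
    items_per_block = max(1, block_size // item_size) if contiguous else 1
    n_requests = math.ceil(n_items / items_per_block)
    return n_requests * (overhead + block_size / bandwidth)

# Hypothetical numbers: 10,000 items of 4 KiB, 50 us overhead, 50 MiB/s disk.
BW = 50 * 1024 * 1024
for contiguous in (True, False):
    for block_size in (4096, 65536):
        t = fetch_time(10_000, 4096, block_size, 50e-6, BW, contiguous)
        print(f"contiguous={contiguous} block={block_size}: {t:.2f} s")
```

With contiguous items the larger block reduces total time; with scattered items it increases it, because most of each block is transferred and then discarded.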
28
Summary
• Storage purchased must depend on requirement, not price!
• It is better to place sequentially accessed streams in the outer disk zone
• Co-locate virtual disks that may be accessed simultaneously
• Co-locate a CPU-intensive task with a disk-intensive task for better power and performance
• Avoid co-locating two sequential workloads on a single physical machine – even when they go to separate physical disks!
• Read in large blocks only when there is locality in workload
29
Questions
Contact : sankaran@gatech.edu