Storage Management in Virtualized Cloud Environments

Sankaran Sivathanu, Ling Liu, Mei Yiduo and Xing Pu
Student Workshop on Frontiers of Cloud Computing, IBM 2010


Talk Outline

• Introduction
• Measurement Results & Observations
  – Data Placement & Provisioning
  – Workload Interference
  – Impacts of Virtualization

• Summary


Cloud & Virtualization

• Cloud environment goals
  – Flexibility in resource configuration
  – Maximum resource utilization
  – Pay-per-use model

• Virtualization benefits
  – Resource consolidation
  – Restructuring flexibility
  – Separate protection domains

• Virtualization serves as one of the basic foundations of cloud infrastructures


Fundamental Issues

• Cloud Service Providers (CSPs) vs. customers
  – Customers purchase computing resources
  – CSPs provide virtual resources (VMs)
  – Customers perceive their resources as physical machines!
• Multiple VMs reside on a single physical host
  – Resource interference: end-user performance depends on other users
• End users are unaware of where their data physically resides


Goals of our Measurement

• For cloud service providers
  – How should data be placed so that end-user performance is maximized?
  – How should workloads be co-located to minimize interference?

• For end users
  – How should resources be purchased to match their requirements?
  – How should applications be tuned for maximum performance?

• General insights on storage I/O in virtualized environments


Benchmarks Used

• Postmark
  – Mail server workload
  – Create/delete and read/append files
  – Parameters
    • File size
    • Number of files
    • Read/write ratio

• Synthetic workload
  – Sequential vs. random accesses
  – Zipf distribution
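The two access patterns of the synthetic workload can be sketched as trace generators. This is a minimal illustration, not the authors' generator: the function names, the Zipf exponent `s`, and the block counts are hypothetical parameters. Block k (0-based) is drawn with weight 1/(k+1)^s, so a few blocks receive most of the accesses.

```python
import random
from collections import Counter

def zipf_block_trace(num_blocks, num_requests, s=1.0, seed=0):
    """Block numbers in [0, num_blocks) drawn from a Zipf-like distribution:
    block k has weight 1 / (k + 1) ** s (hypothetical parameters)."""
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) ** s for k in range(num_blocks)]
    return rng.choices(range(num_blocks), weights=weights, k=num_requests)

def sequential_block_trace(start, num_requests):
    """Sequential accesses: consecutive block numbers starting at `start`."""
    return list(range(start, start + num_requests))

if __name__ == "__main__":
    trace = zipf_block_trace(num_blocks=1000, num_requests=10000)
    # The most popular blocks dominate the trace.
    print(Counter(trace).most_common(3))
    print(sequential_block_trace(100, 5))
```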


Data Provisioning & Placement


Disk Provisioning

• Consider a 100GB disk and a workload with a data footprint of ~150MB
  – Case I: 4GB partition, throughput 2.1 MB/s
  – Case II: 40GB partition, throughput 1.4 MB/s
• Performance difference: 33%
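One way to reason about why the same ~150MB workload runs slower in the larger partition is a simple seek-span model. This sketch assumes (hypothetically) that the file system scatters the workload's files uniformly across the whole partition; the expected distance between two uniformly random positions in a span S is then S/3, so the 40GB partition has ten times the expected seek span of the 4GB one. Seek time is not linear in distance, so this illustrates the trend rather than predicting the measured 33%.

```python
import random

def expected_seek_span_gb(partition_gb):
    """Mean distance between two independent positions drawn uniformly
    from a partition of size S: E|X - Y| = S / 3."""
    return partition_gb / 3.0

def simulated_seek_span_gb(partition_gb, num_requests=100_000, seed=0):
    """Monte Carlo check of the same quantity: each request lands uniformly
    at random anywhere in the partition (the scattering assumption)."""
    rng = random.Random(seed)
    positions = [rng.uniform(0, partition_gb) for _ in range(num_requests)]
    hops = [abs(b - a) for a, b in zip(positions, positions[1:])]
    return sum(hops) / len(hops)

if __name__ == "__main__":
    for size_gb in (4, 40):
        print(f"{size_gb:>2} GB partition: mean seek span "
              f"{expected_seek_span_gb(size_gb):.2f} GB "
              f"(simulated {simulated_seek_span_gb(size_gb):.2f} GB)")
    # Relative difference between the two measured throughputs on the slide.
    print(f"Measured difference: {(2.1 - 1.4) / 2.1:.0%}")
```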


Where to place a VM disk?

• Postmark benchmark
  – Read operations

• Cases:
  – Read from physical partitions in different zones
    • Partition location is based on LBNs
    • LBNs start from the inner zone and proceed towards the outer zones
  – Read from a disk file (.vmdk)
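Placing a partition "in a zone" amounts to mapping its LBN range onto the disk's zones. The sketch below uses a hypothetical three-zone map for a 100GB disk (the zone boundaries and labels are assumptions, with labels borrowed from the next slide) and follows the slide's ordering of low LBNs in the inner zone; on many drives LBN 0 is on the outer edge instead, so a real table should come from the actual drive.

```python
from bisect import bisect_right

# Hypothetical zone map for a 100 GB disk: the first LBN (512-byte sector) of
# each zone. Following the slide, low LBNs sit in the inner zone.
SECTORS_PER_GB = 2 * 1024 * 1024
ZONE_STARTS = [0, 33 * SECTORS_PER_GB, 66 * SECTORS_PER_GB]
ZONE_NAMES = ["I (inner)", "M (mid)", "O (outer)"]
DISK_SECTORS = 100 * SECTORS_PER_GB

def zone_index(lbn):
    if not 0 <= lbn < DISK_SECTORS:
        raise ValueError(f"LBN {lbn} is outside the disk")
    # bisect_right finds the last zone whose start is <= lbn.
    return bisect_right(ZONE_STARTS, lbn) - 1

def zone_of(lbn):
    return ZONE_NAMES[zone_index(lbn)]

def zones_spanned(first_lbn, size_gb):
    """Zones covered by a partition of size_gb starting at first_lbn."""
    last_lbn = first_lbn + size_gb * SECTORS_PER_GB - 1
    return ZONE_NAMES[zone_index(first_lbn):zone_index(last_lbn) + 1]

if __name__ == "__main__":
    print(zone_of(0), zone_of(70 * SECTORS_PER_GB))
    print(zones_spanned(30 * SECTORS_PER_GB, 4))
```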


Where to place multiple VM disks?

• Postmark benchmark
  – 2 instances (1 for each VM)

• Random reads
• Compare physical partitions placed in different zones
  – O: Outer
  – I: Inner
  – M: Mid


Observations

• Customers should purchase storage based on workload requirements, not price
• Thin provisioning may be practiced
• Throughput-intensive VMs can be placed in outer disk zones
• Virtual disks of VMs that may be accessed simultaneously should be co-located (placed close together) on the disk
  – CSPs can monitor access patterns and move virtual disks accordingly
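As a rough illustration of how a CSP might monitor access patterns to find virtual disks that are busy at the same time, the sketch below correlates per-interval I/O counts. Using Pearson correlation, the 0.8 threshold, and the sample numbers are all assumptions for illustration; deciding where to move the disks afterwards is not shown.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant (e.g. idle) disk is treated as uncorrelated
    return cov / sqrt(vx * vy)

def co_accessed_pairs(io_counts, threshold=0.8):
    """io_counts maps a virtual-disk name to its I/O count per monitoring
    interval. Returns pairs whose activity rises and falls together."""
    pairs = []
    for a, b in combinations(sorted(io_counts), 2):
        r = pearson(io_counts[a], io_counts[b])
        if r >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical samples: vm1 and vm2 are busy together, vm3 alternates with them.
samples = {
    "vm1.vmdk": [120, 5, 130, 4, 118, 6],
    "vm2.vmdk": [110, 8, 125, 3, 121, 2],
    "vm3.vmdk": [10, 95, 12, 100, 9, 97],
}

if __name__ == "__main__":
    print(co_accessed_pairs(samples))
```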


Workload Interference


CPU-Disk Interference

Figure: VM-1 and VM-2 on one physical host, each with its own CPU and disk
• Disk-intensive job throughput: 23.4 MB/s
• Disk-intensive job throughput with a CPU-intensive job co-located: 27.6 MB/s
• Performance difference: 15.3%


CPU-Disk Interference

CPU allocation ratios have no effect on disk throughput across VMs

A disk-intensive job performs better when co-located with a CPU-intensive job


CPU-Disk Interference

Reason? Dynamic Frequency Scaling


CPU-Disk Interference

• CPU dynamic frequency scaling (DFS) is enabled in Linux by default
• Three 'governors' control the DFS policy
  – On-demand (default)
  – Performance
  – Power-save
• When one core is idle, the entire CPU is scaled down because overall CPU utilization falls
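On a native Linux kernel, the active governor of each CPU is exposed through the cpufreq sysfs interface at `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`. The helper below is a minimal sketch for checking which policy is in effect before running interference experiments; the function name is hypothetical, and under Xen frequency scaling may be controlled by the hypervisor (e.g. via `xenpm`) rather than the Dom-0 kernel, in which case these files may be absent.

```python
import glob
import os

def read_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq.
    Writing a governor name such as 'performance' to the same file (as root)
    changes the policy, provided that governor is available."""
    governors = {}
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cpufreq",
                           "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # path = <root>/cpuN/cpufreq/scaling_governor -> take "cpuN"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors

if __name__ == "__main__":
    print(read_governors())
```

Pinning the governor to `performance` is one way to remove the frequency-scaling effect when measuring disk throughput in isolation.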


Disk-Disk Interference

• One instance of Postmark in each VM
• 65.3% more time taken compared to running Postmark in a single VM
• Overhead mainly attributed to disk seeks: accesses are no longer sequential

Figure: VM-1 and VM-2 on one physical host; their virtual disks (V.Disk-1, V.Disk-2) reside on the same physical disk
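Why two sequential streams on one physical disk stop being sequential can be shown by counting LBN jumps in the merged request stream. This sketch assumes a hypothetical layout (each virtual disk is one contiguous LBN range) and strict round-robin interleaving, which is a worst case; real schedulers may merge or batch requests and reduce the number of seeks.

```python
def count_seeks(trace):
    """Count requests whose LBN does not immediately follow the previous one."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if cur != prev + 1)

def interleave(*streams):
    """Round-robin merge, as if each VM issued one request in turn."""
    merged = []
    for batch in zip(*streams):
        merged.extend(batch)
    return merged

# Hypothetical layout: each virtual disk is a contiguous LBN range.
vdisk1 = list(range(0, 1000))            # VM-1 reads its disk sequentially
vdisk2 = list(range(500_000, 501_000))   # VM-2 reads its disk sequentially

if __name__ == "__main__":
    print("VM-1 alone:", count_seeks(vdisk1))
    print("Both VMs, interleaved:", count_seeks(interleave(vdisk1, vdisk2)))
```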


Disk-Disk Interference

• VMs using separate physical disks
• 17.52% more time taken compared to running Postmark in a single VM
• Overhead attributed to contention in Dom-0's queue structures

Figure: VM-1 and VM-2 on one physical host; V.Disk-1 resides on Disk-1 and V.Disk-2 on Disk-2


Disk-Disk Interference

• Postmark benchmark (reads)

• Cases:
  – Running in a single VM
  – One instance in each of two VMs
    • 2 VMs reading from virtual disks on the same physical disk
    • 2 VMs reading from virtual disks on different physical disks


Disk-Disk Interference

• The I/O scheduling policy in Dom-0 has little effect

• The 'ideal' case is the time taken when running Postmark in a single VM

• The other cases run one instance of Postmark in each of 2 VMs (separate physical disks)
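On Linux, the I/O scheduler is chosen per block device through `/sys/block/<dev>/queue/scheduler`, which lists the available schedulers with the active one in square brackets (e.g. `noop anticipatory deadline [cfq]` on kernels of that era, or `[mq-deadline] kyber bfq none` on newer ones). The parser below is a small sketch for recording which policy was active in Dom-0 during a run; the function names are hypothetical.

```python
import re

def parse_scheduler(line):
    """Parse the contents of /sys/block/<dev>/queue/scheduler.
    Returns (active_scheduler, list_of_available_schedulers)."""
    available = [name.strip("[]") for name in line.split()]
    match = re.search(r"\[([^\]]+)\]", line)
    active = match.group(1) if match else None
    return active, available

def read_scheduler(device, sysfs_block_root="/sys/block"):
    """Read the scheduler setting for a device such as 'sda'."""
    with open(f"{sysfs_block_root}/{device}/queue/scheduler") as f:
        return parse_scheduler(f.read())
```

Switching policies between runs is done by writing one of the listed names to the same file as root, for example `echo deadline > /sys/block/sda/queue/scheduler` (the device name here is only an example).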


Disk-Disk Interference

• Interference with respect to workload type

• Synthetic read workload
• VMs use separate physical disks
• Cases:
  – Mix of sequential versus random reads

• Sequential requests from both VMs flood the Dom-0 queue, causing contention


Observations

• CPU-intensive and disk-intensive workloads can be co-located for optimal performance and power

• Virtual disks that may be accessed simultaneously must be placed on separate physical disks

• I/O scheduling in Dom-0 has little effect on disk workload interference

• Two sequential workloads, when co-located, suffer in performance due to queue contention

• With separate disks, workload contention is generally minimal, except when two sequential workloads are co-located


Impacts of Virtualization


Sequentiality

• Postmark benchmark (reads)

• Not much overhead is seen for random disk accesses

• The VM overhead is masked by the much larger disk seek overhead

• The overhead is felt more for sequential disk accesses
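The masking effect can be illustrated with a back-of-the-envelope model: a fixed virtualization cost per request is added to the native disk time, and its relative impact is that cost divided by the native time. All numbers below (bandwidth, positioning delay, per-request VM overhead, request size) are assumptions chosen for illustration, not measurements from the talk.

```python
def relative_vm_overhead(vm_overhead_ms, disk_time_ms):
    """Fraction by which a fixed per-request virtualization cost inflates
    the native per-request time."""
    return vm_overhead_ms / disk_time_ms

REQUEST_KB = 64
BANDWIDTH_MB_S = 60.0   # assumed sequential media rate
POSITIONING_MS = 8.0    # assumed average seek + rotational delay
VM_OVERHEAD_MS = 0.1    # assumed extra cost of the Dom-0 I/O path

transfer_ms = REQUEST_KB / 1024 / BANDWIDTH_MB_S * 1000
random_ms = POSITIONING_MS + transfer_ms   # every random request pays a seek
sequential_ms = transfer_ms                # assumed: no repositioning needed

if __name__ == "__main__":
    print(f"random:     +{relative_vm_overhead(VM_OVERHEAD_MS, random_ms):.1%}")
    print(f"sequential: +{relative_vm_overhead(VM_OVERHEAD_MS, sequential_ms):.1%}")
```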


Block Size

• Postmark sequential reads

• A fixed overhead is incurred with every request

• As block size increases, the number of requests falls, and so does the total overhead

• It is more efficient to read in larger blocks
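The amortization argument can be written as T(B) = (N/B)(v + B/R): reading N KB in blocks of B KB issues N/B requests, each paying a fixed overhead v plus its transfer time B/R. The sketch below evaluates this with assumed values for v and R; the overhead's share of each request, v / (v + B/R), shrinks as B grows.

```python
def read_time_ms(total_kb, block_kb, per_request_overhead_ms, bandwidth_mb_s):
    """Time to read total_kb sequentially in block_kb requests when every
    request pays a fixed overhead on top of its transfer time."""
    requests = -(-total_kb // block_kb)   # ceiling division
    transfer_ms = block_kb / 1024 / bandwidth_mb_s * 1000
    return requests * (per_request_overhead_ms + transfer_ms)

def overhead_share(block_kb, per_request_overhead_ms, bandwidth_mb_s):
    """Fraction of each request's time spent on the fixed overhead."""
    transfer_ms = block_kb / 1024 / bandwidth_mb_s * 1000
    return per_request_overhead_ms / (per_request_overhead_ms + transfer_ms)

if __name__ == "__main__":
    # Assumed: 100 MB file, 0.1 ms overhead per request, 60 MB/s media rate.
    for block_kb in (4, 16, 64, 256):
        t = read_time_ms(100 * 1024, block_kb, 0.1, 60.0)
        share = overhead_share(block_kb, 0.1, 60.0)
        print(f"{block_kb:>4} KB blocks: {t:8.1f} ms, overhead share {share:.1%}")
```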


Block Size with Respect to Locality


Observations

• VM overhead is not felt in random workloads: it is amortized by disk seeks

• Extra layers of indirection are the reason for VM overhead; when the block size is large, this overhead is amortized

• Block size may be increased only if there is sufficient locality in access
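The locality caveat can be made concrete by measuring how much of the fetched data a workload actually uses. In this sketch (a hypothetical model, not the authors' method), the application reads 4 KB pages, every miss fetches the whole aligned block containing the page, and fetched blocks stay cached (an unbounded cache is assumed). A sequential trace uses everything it fetches; a sparse random trace wastes most of each large block.

```python
PAGE_KB = 4

def fetch_efficiency(page_trace, block_kb):
    """Fraction of fetched data that the workload actually touches.

    page_trace lists the 4 KB page numbers read by the application. Each
    miss fetches the aligned block_kb block containing the page; blocks are
    assumed to stay cached, so each block is fetched at most once."""
    if block_kb % PAGE_KB:
        raise ValueError("block size must be a multiple of the page size")
    pages_per_block = block_kb // PAGE_KB
    fetched_blocks = {page // pages_per_block for page in page_trace}
    used_kb = len(set(page_trace)) * PAGE_KB
    fetched_kb = len(fetched_blocks) * block_kb
    return used_kb / fetched_kb

sequential = list(range(1024))                  # high locality
sparse_random = [i * 1000 for i in range(100)]  # no locality

if __name__ == "__main__":
    for block_kb in (4, 64, 256):
        print(block_kb, fetch_efficiency(sequential, block_kb),
              fetch_efficiency(sparse_random, block_kb))
```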


Summary

• Storage purchased must depend on requirements, not price!
• It is better to place sequentially accessed streams in the outer disk zone
• Co-locate virtual disks that may be accessed simultaneously
• Co-locate a CPU-intensive task with a disk-intensive task for better power and performance
• Avoid co-locating two sequential workloads on a single physical machine, even when they go to separate physical disks!

• Read in large blocks only when there is locality in the workload


Questions

Contact : [email protected]