Disk IO Benchmarking in shared multi-tenant environments

Rodrigo Campos [email protected] - @xinu

Presented at CMG Brasil 2013

Transcript of Disk IO Benchmarking in shared multi-tenant environments

Page 1: Disk IO Benchmarking in shared multi-tenant environments

Disk IO Benchmarking in shared multi-tenant environments

Rodrigo Campos [email protected] - @xinu

Page 2: Disk IO Benchmarking in shared multi-tenant environments

Agenda

• Considerations about IO performance benchmarks

• Some available tools

• Problems

• Proposed solution & results

• Conclusions

Page 3: Disk IO Benchmarking in shared multi-tenant environments

Considerations

How most people think it is...

[Diagram: a process writing straight to a disk]

Page 4: Disk IO Benchmarking in shared multi-tenant environments

Considerations: Private / single-tenant system

[Diagram: Process → Kernel IO Interface → Disk Controllers → Disks]

Page 5: Disk IO Benchmarking in shared multi-tenant environments

Considerations: Private / single-tenant system

[Diagram: Process → Kernel IO Interface → Disk Controllers → Disks, with a cache at every layer]

Page 6: Disk IO Benchmarking in shared multi-tenant environments

Considerations: Shared multi-tenant system (simplified view)

Caches... Caches Everywhere

[Diagram: several guest stacks (Process → Kernel → IO Interface → Disk Controller) running on a shared virtualization layer, host kernel and network interface, all funneling into shared disk controllers, disks and an SSD]

Page 7: Disk IO Benchmarking in shared multi-tenant environments

• Linux Buffers & Caches

• Buffers: Filesystem Metadata + Active in-flight pages

• Caches: File contents
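
As an aside (not in the original slides), a quick way to watch these counters is to read them straight from /proc/meminfo. The minimal C sketch below prints just the two lines discussed here, so their growth can be observed before and after a benchmark run:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[128];
    FILE *fp = fopen("/proc/meminfo", "r");

    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }

    /* Print only the Buffers: and Cached: counters */
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (strncmp(line, "Buffers:", 8) == 0 ||
            strncmp(line, "Cached:", 7) == 0)
            fputs(line, stdout);
    }

    fclose(fp);
    return 0;
}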

Page 8: Disk IO Benchmarking in shared multi-tenant environments

• Kernel Tunables (pdflush)

• /proc/sys/vm/...
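
For illustration (again, not part of the presented tool), the sketch below reads two of the writeback tunables under /proc/sys/vm/ that govern when pdflush-style writeback kicks in; vm.dirty_ratio and vm.dirty_background_ratio are standard Linux tunables, but the helper itself is hypothetical:

#include <stdio.h>

/* Read a single integer tunable from /proc/sys/vm/<name> */
static long read_vm_tunable(const char *name)
{
    char path[256];
    long value = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
    fp = fopen(path, "r");
    if (fp != NULL)
    {
        if (fscanf(fp, "%ld", &value) != 1)
            value = -1;
        fclose(fp);
    }
    return value;
}

int main(void)
{
    printf("vm.dirty_ratio            = %ld\n", read_vm_tunable("dirty_ratio"));
    printf("vm.dirty_background_ratio = %ld\n", read_vm_tunable("dirty_background_ratio"));
    return 0;
}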

Page 9: Disk IO Benchmarking in shared multi-tenant environments

Some available tools

• Too many to name but some are popular:

• iozone, fio, dd, hdparm, bonnie++

• http://bitly.com/bundles/o_4p62vc3lid/4

Page 10: Disk IO Benchmarking in shared multi-tenant environments

Problems

Most published benchmarks measured the environment only once, at a single point in time!

Page 11: Disk IO Benchmarking in shared multi-tenant environments

Problems

Some tools have become so complex that it is now almost impossible to reproduce results consistently

Page 12: Disk IO Benchmarking in shared multi-tenant environments

Proposed solution

• Create a simple yet effective tool to measure performance

• Define a reproducible methodology for long-term testing

Page 13: Disk IO Benchmarking in shared multi-tenant environments

Language

• Need for access to low-level system calls

• Low abstraction level

• Choice: C

Page 14: Disk IO Benchmarking in shared multi-tenant environments

Requirements

• Keep it simple!

• One process

• One thread

• One workload file

Page 15: Disk IO Benchmarking in shared multi-tenant environments

What does it do?

• Serial Write

• Serial Read

• Random Rewrite

• Random Read

• Mixed Random Read & Write
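
To make the workload concrete, here is a minimal sketch of what a random-read pass over the workload file might look like. This is an illustration only, not iomelt's actual implementation; the file name, size and block size are assumptions:

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define FILE_SIZE  (10 * 1024 * 1024)  /* 10 MB workload file (assumed) */
#define BLOCK_SIZE 4096                /* 4 KB blocks (assumed) */

int main(void)
{
    char buf[BLOCK_SIZE];
    long blocks = FILE_SIZE / BLOCK_SIZE;
    int fd = open("workload.dat", O_RDONLY);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* Read one block from a random block-aligned offset, once per block */
    srandom(getpid());
    for (long i = 0; i < blocks; i++)
    {
        off_t offset = (random() % blocks) * (off_t)BLOCK_SIZE;
        if (pread(fd, buf, BLOCK_SIZE, offset) < 0)
        {
            perror("pread");
            break;
        }
    }

    close(fd);
    return 0;
}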

Page 16: Disk IO Benchmarking in shared multi-tenant environments

Mitigating buffers

• It is impossible to avoid buffering at all levels in a non-proprietary system

• But we can use posix_fadvise & Direct IO to mitigate local kernel buffers

Page 17: Disk IO Benchmarking in shared multi-tenant environments

posix_fadvise

int posix_fadvise(int fd, off_t offset, off_t len, int advice);

“Programs can use posix_fadvise() to announce an intention to access file data in a specific pattern in the future, thus allowing the kernel to perform appropriate optimizations.”

Page 18: Disk IO Benchmarking in shared multi-tenant environments

posix_fadvise

POSIX_FADV_DONTNEED attempts to free cached pages associated with the specified region.

Page 19: Disk IO Benchmarking in shared multi-tenant environments

posix_fadvise

/* *TRY* to minimize buffer cache effect */
/* There's no guarantee that the file will be removed from buffer cache though */
/* Keep in mind that buffering will happen at some level */
if (fadvise == true)
{
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    ...
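
For context, here is a self-contained sketch of the same idea. It is a simplified standalone example, not iomelt's source; the argument handling is an assumption:

#define _XOPEN_SOURCE 600  /* for posix_fadvise */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd, rc;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* offset = 0 and len = 0 cover the whole file; note that dirty
     * pages are not dropped, so a writer should fdatasync() first */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: error %d\n", rc);

    /* a timed read pass would follow here */

    close(fd);
    return 0;
}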

Page 20: Disk IO Benchmarking in shared multi-tenant environments

posix_fadvise

Test     DONTNEED (s)   NORMAL (s)   Difference
Write    5.82           6.05         0.96
Read     0.163          0.017        9.59
Rewrite  3.037          2.993        1.01
Reread   1.244          0.019        65.47
Random   2.403          1.559        1.54

100 MB file - 4 KB block size - XFS - average of 20 runs

Page 21: Disk IO Benchmarking in shared multi-tenant environments

posix_fadvise

Test     DONTNEED (s)   NORMAL (s)   Difference
Write    5.82           6.05         0.96
Read     0.163          0.017        9.59
Rewrite  3.037          2.993        1.01
Reread   1.244          0.019        65.47
Random   2.403          1.559        1.54

100 MB file - 4 KB block size - XFS - average of 20 runs

Note the NORMAL times for Read (0.017 s) and Reread (0.019 s): reading a 100 MB file that fast implies transfer rates of roughly 6.0 GB/s and 5.2 GB/s, so these reads are being served from cache.

Page 22: Disk IO Benchmarking in shared multi-tenant environments

Transfer Rates

• SSD transfer rates typically range from 100MB/s to 600MB/s

• Something is wrong...

Page 23: Disk IO Benchmarking in shared multi-tenant environments

Synchronous IO

int open(const char *pathname, int flags, mode_t mode);

O_SYNC

The file is opened for synchronous I/O. Any write(2)s on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware. But see NOTES below.

Page 24: Disk IO Benchmarking in shared multi-tenant environments

Direct IO

int open(const char *pathname, int flags, mode_t mode);

O_DIRECT

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers.

Page 25: Disk IO Benchmarking in shared multi-tenant environments

Direct IO

flags = O_RDWR | O_CREAT | O_TRUNC | O_SYNC;

if (directIO == true)
{
    myWarn(3, __FUNCTION__, "Will try to enable Direct IO");
    flags = flags | O_DIRECT;
}
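
One caveat the slide does not show: on Linux, O_DIRECT generally requires the user buffer, file offset and transfer size to be aligned to the device's logical block size. A minimal sketch, assuming a 4 KB alignment requirement and a hypothetical direct.dat file:

#define _GNU_SOURCE  /* O_DIRECT is Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096  /* assumed to satisfy the device's alignment */

int main(void)
{
    void *buf;
    int fd;

    /* posix_memalign() provides a suitably aligned user buffer */
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0)
    {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    memset(buf, 0xAA, BLOCK_SIZE);

    fd = open("direct.dat", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0)
    {
        perror("open");  /* EINVAL often means the fs rejects O_DIRECT */
        free(buf);
        return 1;
    }

    if (write(fd, buf, BLOCK_SIZE) < 0)
        perror("write");

    close(fd);
    free(buf);
    return 0;
}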

Page 26: Disk IO Benchmarking in shared multi-tenant environments

Notes below

Most Linux file systems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to userspace, but only the O_DSYNC semantics, which require only actual file data and meta-data necessary to retrieve it to be on disk by the time the system call returns.
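
To make the distinction concrete, here is a minimal sketch (an illustration, not from the slides) that opens a hypothetical file with O_DSYNC, the data-integrity variant most Linux filesystems actually implement:

#define _POSIX_C_SOURCE 200112L  /* for O_DSYNC */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello\n";
    /* O_DSYNC: write() returns once the data, plus only the metadata
     * needed to read it back, reaches stable storage */
    int fd = open("dsync.dat", O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    if (write(fd, msg, sizeof(msg) - 1) < 0)
        perror("write");

    close(fd);
    return 0;
}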

Page 27: Disk IO Benchmarking in shared multi-tenant environments

Results

Test     -Direct IO (s)   +Direct IO (s)   Difference
Write    5.82             6.640            0.88
Read     0.163            2.197            0.07
Rewrite  3.037            2.905            1.05
Reread   1.244            2.845            0.44
Random   2.403            2.941            0.82

100 MB file - 4 KB block size - XFS - average of 20 runs

Page 28: Disk IO Benchmarking in shared multi-tenant environments

Results

Test     +Direct IO (s)   MB/s
Write    6.640            15.79
Read     2.197            47.72
Rewrite  2.905            36.09
Reread   2.845            36.85
Random   2.941            35.64

100 MB file - 4 KB block size - XFS - average of 20 runs

With Direct IO the measured rates become plausible for 4 KB synchronous IO (e.g. 104,857,600 bytes / 6.640 s ≈ 15.79 MB/s for the write test).

Page 29: Disk IO Benchmarking in shared multi-tenant environments

iomelt

IOMELT Version 0.71
Usage:
 -b BYTES  Block size used for IO functions (must be a power of two)
 -d        Dump data in a format that can be digested by pattern processing commands
 -D        Print time in seconds since epoch
 -h        Prints usage parameters
 -H        Omit header row when dumping data
 -n        Do NOT convert bytes to human readable format
 -o        Do NOT display results (does not override -d)
 -O        Reopen workload file before every test
 -p PATH   Directory where the test file should be created
 -r        Randomize workload file name
 -R        Try to enable Direct IO
 -s BYTES  Workload file size (default: 10Mb)
 -v        Controls the level of verbosity
 -V        Displays version number

-b and -s values can be specified in bytes (default), kilobytes ('K' suffix), megabytes ('M' suffix), or gigabytes ('G' suffix). Unless specified, the block size is the optimal block transfer size for the file system as returned by statvfs.

Page 30: Disk IO Benchmarking in shared multi-tenant environments

iomelt

• Available at http://iomelt.com

• Fork it on GitHub:

• https://github.com/camposr/iomelt

• Artistic License 2.0

• http://opensource.org/licenses/artistic-license-2.0

Page 31: Disk IO Benchmarking in shared multi-tenant environments

Methodology

How to measure the performance of several instance types in different regions over long periods of time?

Page 32: Disk IO Benchmarking in shared multi-tenant environments

Methodology

1. Create a single AMI

1.1. Update kernel, compiler and libraries

2. Replicate it in several regions and on different instance types:

2.1. m1.small

2.2. m1.medium

2.3. m1.large

Page 33: Disk IO Benchmarking in shared multi-tenant environments

Methodology

Source: http://amzn.to/12zSyZV

Page 34: Disk IO Benchmarking in shared multi-tenant environments

Methodology

Schedule a cron job to run every five minutes:

*/5 * * 8 * /root/iomelt/iomelt -dor >> /root/iomelt.out 2>&1

Page 35-40: Disk IO Benchmarking in shared multi-tenant environments

Results

[Result charts omitted from the transcript; see the link below for the complete data]

Page 41: Disk IO Benchmarking in shared multi-tenant environments

Results

For a complete list of results:

http://bit.ly/19L9xm2

Page 42: Disk IO Benchmarking in shared multi-tenant environments

Conclusions

• Shared multi-tenant environments create new challenges for performance analysis

• Traditional benchmark methodologies are not suitable for these environments

• The excessive versatility of most available tools makes it hard to get reproducible measurements

Page 43: Disk IO Benchmarking in shared multi-tenant environments

Conclusions

• Performance (in)consistency must be considered when designing systems that will run in the cloud

• “What you don’t know might hurt you”