Getput suite

A Swift Benchmarking Tool, Mark Seger, Hewlett Packard Cloud Services, 4/19/2013

description

A lot of effort has gone into benchmarking cloud storage performance, both for Swift and for other cloud stacks, and one result is a lot of confusion in the numbers, in large part because there is no standard. This is further complicated because some implementations are written in Java, some in Python and some in raw curl. Furthermore, the underlying libraries themselves can cause variances, as they do not all use the same buffer sizes or the same SSL compression setting, and probably differ in other parameters as well. I would like to talk about our benchmarking methodologies at HP, describe a tool suite I've developed that implements them, and share some results of benchmarking our own OpenStack implementation. One thing I've discovered over the previous months of testing is that both latency and CPU overhead can have a major impact on performance, so the tools capture those as well, something most tools don't report. The tools are written in Python and use the OpenStack python-swiftclient library.

Speakers: Mark Seger

Transcript of Getput suite

Page 1: Getput suite

A Swift Benchmarking Tool

Mark Seger

Hewlett Packard

Cloud Services

4/19/2013

Getput Swift Performance Tools

Page 2: Getput suite

Problem Statement

• Performance Measurements

– Consistent/standard mechanisms for controlled experiments

– Ability to easily modify test parameters

– Minimal installation, configuration and use

– Easy to compare results of multiple runs

– Easy to clean up when done

• Benchmarking – run performance tests at scale

– Repeat tests while increasing demand for resources

– Parallel tests must be coordinated: start/finish together

Page 3: Getput suite

Getput Suite

• Multiple tools organized in a hierarchy

– getput: actual workhorse, runs tests on single client

– gpmaster: coordinates running getput on multiple clients

– gpsuite: defines suites of tests to minimize the number of switches needed

– yourscript: can call gpsuite multiple times when desired

Page 4: Getput suite

getput.py

• Uses swiftclient library

• Lots of switches, lots of different behaviors

– Standalone

  • Basic: creds, cname, oname, size, num/runtime, tests, rep count

  • More: processes, container type (shared/byproc/bynode), latency details, operation logging, and still more

– Multi-node (controlled by gpmaster)

  • start time, rank

Page 5: Getput suite

gpmaster.py

• Coordinates running of getput on multiple clients

– Assures all start together and finish approximately together

– Summarizes results as a single line

– Unlike getput, runs only 1 test at a time; running sets of tests is a job for gpsuite

• More required switches than getput

– Credentials file

– Rank

– Start time

– Hosts file or single client name, may need ssh key too

– And a few more…

• But rarely run by itself!
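
The "start together" requirement can be met by handing every client the same absolute start time and letting each one sleep until then. A minimal sketch, assuming client clocks are synchronized (for example via NTP); the function names and the lead time are illustrative and are not taken from gpmaster's code:

```python
import threading
import time


def wait_until(start_time):
    """Sleep until the shared absolute start time (epoch seconds)."""
    delay = start_time - time.time()
    if delay > 0:
        time.sleep(delay)


def run_client(rank, start_time, started):
    """Stand-in for one getput client: wait, then record when it began."""
    wait_until(start_time)
    started[rank] = time.time()


def launch(num_clients, lead_seconds=2.0):
    """Give every client the same start time a little in the future."""
    start_time = time.time() + lead_seconds
    started = {}
    threads = [
        threading.Thread(target=run_client, args=(rank, start_time, started))
        for rank in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return start_time, started
```

Here threads stand in for remote clients; across real hosts the same start time would be passed to each client and the lead time would need to cover the slowest launch.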

Page 6: Getput suite

gpsuite.py

• Removes complexity of running gpmaster

• Think of macros: gpsuite --suite full

– Sets of object sizes, e.g. 1k, 10k, 100k, etc

– Numbers of threads, e.g. 1, 2, 4, 8, etc

  • Distributes threads across multiple clients

• Some runs can take hours with a single command

• Cleans up after each run
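
One way to picture a suite is as the cross product of object sizes and thread counts, with each thread count split across the available clients. A hypothetical sketch; the names `build_suite` and `spread_threads`, the client names, and the even-split policy are assumptions, not gpsuite's real logic:

```python
from itertools import product


def spread_threads(total_threads, clients):
    """Split a thread count as evenly as possible across client names."""
    base, extra = divmod(total_threads, len(clients))
    plan = {}
    for i, client in enumerate(clients):
        count = base + (1 if i < extra else 0)
        if count:
            plan[client] = count
    return plan


def build_suite(sizes, thread_counts, clients):
    """Return one (size, per-client thread plan) entry per test run."""
    return [
        (size, spread_threads(threads, clients))
        for size, threads in product(sizes, thread_counts)
    ]


suite = build_suite(["1k", "10k", "100k"], [1, 2, 4, 8], ["c1", "c2", "c3"])
for size, plan in suite:
    print(size, plan)
```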

Page 10: Getput suite

Getput Output

Earliest versions

Inst  Start     End       Seconds  Tests  Num  MB/S  IOPS   Errs
0     13:59:20  13:59:29  8.57     put    100  0.11  11.67  0
0     13:59:29  13:59:33  4.03     get    100  0.24  24.83  0
0     13:59:33  13:59:34  1.80     del    100  0.54  55.68  0

Added latency range in later versions

Inst  Start     End       Seconds  Tests  Num  MB/S  IOPS   Latency  LatRange    Errs
0     13:59:20  13:59:29  8.57     put    100  0.11  11.67  0.085    0.02-00.22  0
0     13:59:29  13:59:33  4.03     get    100  0.24  24.83  0.040    0.04-00.05  0
0     13:59:33  13:59:34  1.80     del    100  0.54  55.68  0.018    0.01-00.05  0

Added CPU and started playing with compression in more recent versions

Inst  Start     End       Seconds  Tests  Num  MB/S  IOPS   Latency  LatRange    Errs  Procs  OSize  %CPU  Comp
0     13:59:20  13:59:29  8.57     put    100  0.11  11.67  0.085    0.02-00.22  0     1      10k    0.30  no
0     13:59:29  13:59:33  4.03     get    100  0.24  24.83  0.040    0.04-00.05  0     1      10k    0.39  no
0     13:59:33  13:59:34  1.80     del    100  0.54  55.68  0.018    0.01-00.05  0     1      10k    0.58  no
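
The MB/S and IOPS columns follow from Num, OSize and Seconds. A small sketch, assuming 1k means 1024 bytes and 1 MB means 1024*1024 bytes (the slides do not state the units); small differences from the get and del rows are expected because Seconds is shown rounded to two places:

```python
def rates(num_ops, object_bytes, seconds):
    """Return (MB/s, IOPS) for a run of num_ops operations."""
    mb_per_sec = num_ops * object_bytes / seconds / (1024 * 1024)
    iops = num_ops / seconds
    return mb_per_sec, iops


# The put row: 100 objects of 10k in 8.57 seconds.
mb, iops = rates(100, 10 * 1024, 8.57)
print(f"{mb:.2f} MB/s, {iops:.2f} IOPS")   # prints "0.11 MB/s, 11.67 IOPS"
```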

Eventually added latency distribution histogram

Latency  LatRange    Errs  Procs  OSize  0.0   0.1  0.2  0.3  0.4  0.5
0.106    0.02-00.36  0     10     10k    527   396  67   10   0    0
0.041    0.01-00.07  0     10     10k    1000  0    0    0    0    0
0.031    0.01-00.16  0     10     10k    964   36   0    0    0    0
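
The histogram columns can be read as counts of operations per latency bucket. A sketch, assuming 0.1-second buckets labeled by their lower edge, with the last bucket catching everything slower; getput's actual bucketing may differ, and the sample data is made up:

```python
def latency_histogram(latencies, bucket_width=0.1, num_buckets=6):
    """Count latencies per bucket; the last bucket is open-ended."""
    counts = [0] * num_buckets
    for lat in latencies:
        # Round first so 0.3 / 0.1 lands in bucket 3 despite float error.
        index = int(round(lat / bucket_width, 9))
        counts[min(index, num_buckets - 1)] += 1
    return counts


# Made-up sample latencies in seconds, for illustration only.
sample = [0.02, 0.04, 0.09, 0.11, 0.15, 0.22, 0.36, 4.2]
print(latency_histogram(sample))
```

Each row of counts sums to the number of operations in the run, which is a quick sanity check on any histogram line.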

Page 11: Getput suite

Observations

• Swift multi-scaling excellent

– With multiple clients performance grows close to linearly

– With single client and multiple threads

  • Smaller objects scale very well with even lots of threads

  • Larger objects hit either a CPU or a network wall!

• Both compression and encryption cost CPU

– Limits large object bandwidth, less so with smaller ones

– Early testing: disabling compression gave up to a 2X boost for large objects

  • Similar behavior when using http instead of https

– Only just started looking at changing ciphers

Recommendation: make compression, ssl and cipher choice optional in swiftclient

Page 12: Getput suite

Look at the network during tests

segerm@az1-nv-compute-0001:~$ ./getput.py -cc -oo -n1 -s100m -tp --comp
Inst  Start     End       Seconds  Tests  Num  MB/S   IOPS  Latency  LatRange
0     15:52:15  15:52:20  5.85     put    1    17.10  0.17  5.800    5.80-05.80

segerm@az1-nv-compute-0001:~$ collectl
waiting for 1 second sample...
#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   0   0  1342   1078      0      0     20      2      0      4     70      56
   0   0   261    304      0      0     20      2      0      3      0       2
   1   0   580    578      0      0      0      0      0      5      0       3
   3   0  4697    780      0      0      0      0    135   2010  15956   11517
   4   0  5859   1324      0      0      0      0    138   2345  19037   13708
   4   0  5168    609      0      0     48      6    138   2354  19036   13706
   4   0  5597    993      0      0      4      1    138   2351  19053   13717
   4   0  5129    538      0      0      0      0    139   2366  19053   13716
   3   0  4579   1070      0      0      0      0    107   1817  14554   10495
   0   0   154    201      0      0     20      2      0      1      0       1

This is always true for incompressible objects: upload speed ~= network bandwidth
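
A rough cross-check of that claim, using the KBOut column from the collectl samples: with 1-second samples, summing KBOut gives the total sent during the run, which can be compared with the 100m object. This assumes collectl's KB means 1024 bytes and that 100m means 100 MiB:

```python
# KBOut values from the collectl samples, one per second.
kb_out = [70, 0, 0, 15956, 19037, 19036, 19053, 19053, 14554, 0]

object_mb = 100                      # -s100m, assumed to be 100 MiB
sent_mb = sum(kb_out) / 1024         # total sent, in MiB
peak_mb_per_sec = max(kb_out) / 1024

print(f"sent {sent_mb:.1f} MB for a {object_mb} MB object")
print(f"overhead {100 * (sent_mb / object_mb - 1):.1f}%")
print(f"peak {peak_mb_per_sec:.1f} MB/s on the wire")
```

Under these assumptions the total on the wire is only a few percent more than the object itself, which is consistent with compression gaining nothing on this data.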

Page 13: Getput suite

Compression can be your friend too

…but only for compressible objects

segerm@az1-nv-compute-0001:~$ ./getput.py -cc -oo -n5 -s100m -tp --otype s --comp
Inst  Start     End       Seconds  Tests  Num  MB/S   IOPS  Latency  LatRange
0     16:00:19  16:00:29  10.33    put    5    48.42  0.48  2.060    2.03-02.09

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   0   0   223    292      0      0     56      9      0      1      0       1
   1   0   618    565      0      0      0      0     14     20      2      16
   3   0  1380    694      0      0      0      0     14    167    605     317
   4   0  1846   1194      0      0      0      0     11    165    508     304
   3   1  9799   1008      0      0     12      2    173   2949    848    2949
   4   1 11071    996      0      0      0      0    198   3377    607    3376

Another reason to make compression optional!

Look what the proxy is doing

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   1   0  1512    523      0      0     16      3      5     36      8      34
   8   2  6377    892      0      0      0      0    658   6588 171130  117279
   7   2  5488   1835      0      0      8      1    519   4933 150290  103175
   6   2  8772   6113      0      0      0      0    744   8679 162089  114059

3 Obj Servers
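
One way to read the proxy's numbers, assuming the "3 Obj Servers" note means each object is written to three object servers: the proxy's outbound rate should be roughly three times the client's upload rate. A quick check under that assumption, with 1024-based units also assumed:

```python
client_mb_per_sec = 48.42        # MB/S reported by getput for this run
replicas = 3                     # assumption, from the "3 Obj Servers" note

expected_kb_out = client_mb_per_sec * 1024 * replicas
observed_kb_out = [171130, 150290, 162089]   # proxy KBOut samples

print(f"expected about {expected_kb_out:.0f} KB/s")
print(f"observed {min(observed_kb_out)} to {max(observed_kb_out)} KB/s")
```

The getput figure is an average over the whole run, so the busiest one-second samples on the proxy can reasonably sit somewhat above the estimate.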

Page 14: Getput suite

Let’s talk about latency

• Latency metrics originally based on averages

– Like coarse monitoring, great for trends but poor for exceptions

– Soon realized more detail was needed

• Consider the following. What does it really mean?

– Is the only problem that one entry of 0.083?

Page 15: Getput suite

On closer inspection

• The first 4 entries don’t look too bad

• Even the bottom one isn’t that horrible

Page 16: Getput suite

Ranges shed more light

• Even though the first 4 lines have close latencies, look at their max values

• Now we know why line 5 is so bad

• Even line 6 has a very high max
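
To see why ranges add information, consider two runs with the same average latency but very different worst cases. The numbers below are invented for illustration and are not from the slides:

```python
from statistics import mean


def summarize(latencies):
    """Return (average, min, max) for a list of latencies in seconds."""
    return mean(latencies), min(latencies), max(latencies)


# Invented data: both runs average about 0.1 seconds per operation.
steady = [0.1] * 100
spiky = [0.05] * 99 + [5.05]

for name, run in (("steady", steady), ("spiky", spiky)):
    avg, low, high = summarize(run)
    print(f"{name}: avg {avg:.3f}  range {low:.2f}-{high:.2f}")
```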

Page 17: Getput suite

But even that’s not enough

• Min/Max doesn’t tell us how many outliers

• Line 2/4 have almost 50 in the .5 bucket

• Line 5 has 6 PUTs >4 seconds

• Line 6 all over the place

Page 19: Getput suite

Example 1: Latency of 0.04 too high!

• When looking at 1k, 10k and 100k GETs, noticed IOPS for 10k were lower!

– Great reason to look at more than MB/sec

• After much digging discovered this only applied to object sizes 7888 -> 22469 bytes

– This could only have been found by running sets of tests and looking very closely at the numbers

• What’s going on here?

– We run pound on proxies to support multiple connection ports

– Proxy does fast get and passes data to pound over loopback address

– Max segsize for loopback >> network MSS

– Eventlet uses 8192 byte buffers

– Nagle algorithm: bytes > 8192 and ~<8192+MSS have delayed ACK

• Eventlet needs bigger buffers? Turn off Nagle?
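
For reference, turning off Nagle on a TCP socket is a single socket option. This is a generic Python sketch of the "turn off Nagle" idea, not a patch to eventlet, pound or Swift:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY disables Nagle's algorithm, so small writes are sent
# immediately instead of waiting for outstanding data to be ACKed.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("Nagle disabled:", bool(nodelay))
sock.close()
```

The trade-off is more small packets on the wire, so whether this or a larger buffer is the better fix would need to be measured.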

Page 20: Getput suite

Example 2: Latency 0.5

• Observed a number of these in small object PUTs

• Caused by a proxy timeout connecting to obj server

• Might be worth looking into ways to reduce these and/or to avoid re-contacting a non-responsive server

Page 21: Getput suite

Example 3: Latency 6 Secs

• These occur less frequently, but do happen

• Traced back to disk error on object server

• BUT the other 2 object servers responded in < 1sec

• Think about how many IOPS are being lost!

Might it be worth it to return after 2 successes?

Maybe at least ignore writes to that disk?
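
The "return after 2 successes" question can be sketched as a quorum wait: issue all three writes in parallel and return as soon as two have succeeded. This illustrates the idea only; it is not how the Swift proxy is implemented, and the writer callables are stand-ins for writes to object servers:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def quorum_write(writers, needed=2):
    """Run all writers in parallel; return True once `needed` succeed."""
    pool = ThreadPoolExecutor(max_workers=len(writers))
    futures = [pool.submit(writer) for writer in writers]
    successes = 0
    try:
        for future in as_completed(futures):
            if future.exception() is None:
                successes += 1
            if successes >= needed:
                return True
        return False
    finally:
        # Do not block on stragglers; they finish in the background.
        pool.shutdown(wait=False)


def fake_server(delay):
    """Stand-in for a write to one object server taking `delay` seconds."""
    def write():
        time.sleep(delay)
    return write


start = time.time()
ok = quorum_write([fake_server(0.01), fake_server(0.02), fake_server(1.0)])
elapsed = time.time() - start
print(ok, f"{elapsed:.2f}s")
```

With one slow server the call returns at the pace of the second-fastest write rather than the slowest one.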

Page 22: Getput suite

So what’s next for latency?

• Investigate why some ops have even longer latencies

• Added another switch to getput! --logops

– Extended put_object() to return transaction ID

– Writes detailed log records for every operation

– Makes it possible for longer latency transactions to be traced

segerm@az1-nv-compute-0000:~$ more /tmp/getput-p-0-1363878303.log
15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4 o-0-1-1
15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4 o-0-1-2
15:05:03.627 1363878303.627218 1363878303.574032 0.053174 eb4194b73e46f52f774a63fa552755d4 o-0-1-3
15:05:03.686 1363878303.686175 1363878303.627244 0.058918 eb4194b73e46f52f774a63fa552755d4 o-0-1-4
15:05:03.747 1363878303.746874 1363878303.686201 0.060661 eb4194b73e46f52f774a63fa552755d4 o-0-1-5
15:05:03.804 1363878303.804106 1363878303.746900 0.057194 eb4194b73e46f52f774a63fa552755d4 o-0-1-6
15:05:03.866 1363878303.866148 1363878303.804133 0.061979 eb4194b73e46f52f774a63fa552755d4 o-0-1-7
15:05:03.932 1363878303.931911 1363878303.866175 0.065724 eb4194b73e46f52f774a63fa552755d4 o-0-1-8

Recommendation: GET, PUT and DEL calls should return transaction IDs
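
Once every operation is logged, picking out the slow ones is a short script. A sketch, assuming the six whitespace-separated fields are wall-clock time, end timestamp, start timestamp, latency in seconds, a 32-character hex identifier, and object name; the slides do not label the fields, so these names are guesses:

```python
from collections import namedtuple

Record = namedtuple("Record", "clock end start latency hexid name")


def parse_line(line):
    """Split one whitespace-separated log record into typed fields."""
    clock, end, start, latency, hexid, name = line.split()
    return Record(clock, float(end), float(start), float(latency), hexid, name)


def slow_ops(lines, threshold):
    """Return the records whose latency exceeds threshold seconds."""
    records = (parse_line(line) for line in lines if line.strip())
    return [rec for rec in records if rec.latency > threshold]


log = """\
15:05:03.522 1363878303.521659 1363878303.459080 0.062547 eb4194b73e46f52f774a63fa552755d4 o-0-1-1
15:05:03.574 1363878303.574005 1363878303.521702 0.052291 eb4194b73e46f52f774a63fa552755d4 o-0-1-2
15:05:03.932 1363878303.931911 1363878303.866175 0.065724 eb4194b73e46f52f774a63fa552755d4 o-0-1-8
"""

for rec in slow_ops(log.splitlines(), threshold=0.06):
    print(rec.name, rec.latency)
```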

Page 23: Getput suite

swcmd: a nifty helper utility

• One challenge of benchmarking can be LOTs of containers and objects needing cleanup

– Can have dozens to 100s of containers

– Can have Ks to 100Ks of objects

– Swift client too slow for deletes!

• Swift client utility could use some more functionality

– How about displaying numbers of objects in containers?

– Container sizes and even dates?

– Same things when listing containers

– What about parallel or even wild card listing/deletes?

• Only parallelizes for >1K objects in a container

• Uses multiprocessing, can hit 300-400 deletes/sec
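
The parallel-delete idea can be sketched as: delete serially for small containers, and fan out over a worker pool above a threshold. The slide says swcmd uses multiprocessing; this sketch uses a thread pool instead to stay short and self-contained, and `delete_one` is a stand-in for a real delete call:

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 1000   # from the slide: parallelize only for >1K objects


def delete_objects(names, delete_one, workers=8):
    """Delete every named object, in parallel only for large containers."""
    if len(names) <= PARALLEL_THRESHOLD:
        for name in names:
            delete_one(name)
        return "serial"
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(delete_one, names))
    return "parallel"


deleted = set()
mode = delete_objects([f"o-{i}" for i in range(2500)], deleted.add)
print(mode, len(deleted))
```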

Page 24: Getput suite

Examples

swcmd ls
63482   61M  2013-03-21 16:19:12  qc-1363882747
   49    4G  2013-03-09 13:13:36  vlat-1362834811
    0     0  2013-03-20 22:05:06  vlat-1363817101
    1    10  2013-03-15 13:58:37  xxx-0-0
    1  200M  2013-03-11 12:28:16  xyxxy
    2  200M  2013-03-11 12:29:01  xyzzy
 2901  702M  2013-02-12 16:34:19  zzz

swcmd -p ls xyz       # list containers starting with xyz
swcmd -f rc zzz       # force removal of zzz even though not empty
swcmd -p pf x         # force removal of ALL containers starting with x
swcmd rm xyzzy/xyzzy  # remove specific object

Recommendation: add these types of features to the swift utility

Page 25: Getput suite

Questions?
