Benchmarking Oracle I/O Performance with ORION

Alex Gorbachev, Calgary, AB, September 2013

© 2011-2012 Pythian

Alex Gorbachev

• Chief Technology Officer at Pythian

• Blogger

• Cloudera Champion of Big Data

• OakTable Network member

• Oracle ACE Director

• Founder of BattleAgainstAnyGuess.com

• Founder of Sydney Oracle Meetup

• IOUG Director of Communities

• EVP, Ottawa Oracle User Group


Who is Pythian?


15 years of data infrastructure management consulting

170+ top brands

6000+ databases under management

Over 200 DBAs in 26 countries

Top 5% of the DBA workforce, 9 Oracle ACEs, 2 Microsoft MVPs

Oracle, Microsoft, MySQL, Netezza, Hadoop, MongoDB, Oracle Apps, Enterprise Infrastructure


Apply at [email protected]


ORION - ORacle I/O Numbers

Generate I/O workload similar to database patterns and measure I/O performance


Orion is designed to

stress test the I/O subsystem


Orion is not perfect for simulation but

good enough


Use Orion before moving/deploying databases to the new

platform


Orion is used in

two scenarios


You have no idea what you need and want to ensure you get the best you can

or

You know what you need and want to ensure you have it

The first one is based on capacity planning. The second you can call infrastructure tuning.


Infrastructure tuning - what’s the goal?

• When you don’t know how much you need, at least ensure you take all you can

• Assess your possible bottlenecks
• 1 Gbit Ethernet => 100+ MBPS or 10,000+ IOPS (8K)

• 15K RPM disk

• will easily serve 100-150 IOPS with average resp. time <10ms

• can get to 200-250 IOPS but response time increases to 20 ms

• SSD - see vendor specs

• reads: random vs sequential... small vs large... no matter

• writes: pattern matters
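Those ceilings are easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming roughly 80% of the raw 1 Gbit link is usable payload after protocol overhead:

```shell
# Rough bottleneck math for the 1 Gbit Ethernet bullet above
link_mbit=1000
usable_mbps=$(( link_mbit / 8 * 80 / 100 ))     # ~100 MB/s effective
io_kb=8
iops_ceiling=$(( usable_mbps * 1024 / io_kb ))  # ~12800 8K IOPS ceiling
echo "1 GbE: ~${usable_mbps} MBPS or ~${iops_ceiling} IOPS at ${io_kb}K"
```

Real-world numbers land below these theoretical ceilings, which is why the slide quotes 100+ MBPS and 10,000+ IOPS.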


Orion

• Uses a code-base similar to the Oracle database kernel
• Standalone binary, or part of Oracle home since 11.2.0.1
• Standalone downloadable Orion version is only 11.1

• Tests only the I/O subsystem
• Minimal CPU consumption

• Async I/O is used to submit concurrent I/O requests
• Each run includes multiple data points / tests
• Scaling concurrency of small and large I/Os


Controlling Orion

• Workload patterns
• Small random I/O size and scale

• Large I/O size, scale and pattern (random vs sequential)

• Write percentage
• Cache warming
• Duration of each test (data point)
• Data layout (concatenation vs striping)


Data Points Matrix

Large/Small   0  1  2  3  4  5  6  7  8  9 ...
     0        x  x  x  x  x  x  x  x  x  x ...
     1        x  x  x  x  x  x  x  x  x  x ...
     2        x  x  x  x  x  x  x  x  x  x ...
     ...
    11        x  x  x  x  x  x  x  x  x  x ...
     ...


Each Orion run performs several tests and collects metrics for each test. The set of metrics for one test is a data point. Based on the run configuration, Orion collects several data points scaling concurrency of small random IOs and concurrency of large IOs.

Each data point is defined by the number of concurrent small I/O requests and the number of concurrent large IO streams.

Orion iterates through concurrency of large I/Os from minimum to maximum (which can be a single value depending on the run configuration) and then, for each large IO concurrency level, it iterates through concurrency levels of small IOs from minimum to maximum (which can also be a single value depending on the run configuration). We will see how these ranges are selected later.

If you look at the matrix then you can imagine this process as running the tests row by row from top to bottom and for each row, the sequence of tests is from left to right. Just like in English writing.

As Orion performs the tests, it writes the results in the trace file and at the end of the test it produces several matrix files with collected metrics.


Data Points Matrix

(same data points matrix as above)


-run advanced -matrix detailed
# of tests = (Xlarge + 1) * (Xsmall + 1)

There are several types of runs. Let’s first look into “advanced” mode; the rest of the run types are simpler versions which preset some of the parameters for you. You can think of them as wizard modes.

The matrix type defines which data points Orion collects. A detailed matrix is the most time consuming to run - Orion will test every combination of large and small I/O workload, iterating from 0 to the maximum concurrency level for both large and small IOs.
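To get a feel for the cost of a detailed matrix, here is a quick sketch using the default concurrency maximums covered later in this deck (num_disks * 5 for small IOs, num_disks * 2 for large IOs):

```shell
num_disks=5
x_small=$(( num_disks * 5 ))                 # max small IO concurrency: 25
x_large=$(( num_disks * 2 ))                 # max large IO concurrency: 10
tests=$(( (x_large + 1) * (x_small + 1) ))   # (10+1) * (25+1) = 286 data points
echo "$tests tests for a detailed matrix over $num_disks disks"
```

At the default duration per data point, that adds up quickly, which is why the simpler matrix types below exist.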


Data Points Matrix

(same data points matrix as above)


-run advanced -matrix row -num_large 2
# of tests = Xsmall + 1

Matrix row fixes number of concurrent large I/O streams to a configurable number (can be zero) and iterates through concurrency of small IOs.


Data Points Matrix

(same data points matrix as above)


-run advanced -matrix col -num_small 3
# of tests = Xlarge + 1

Matrix col fixes number of concurrent small IOs to a configurable number (can be zero) and iterates through concurrency of large IO streams.


Data Points Matrix

(same data points matrix as above)


-run advanced -matrix basic
# of tests = Xlarge + Xsmall + 1

Matrix basic performs tests of non-mixed small and large workloads. First, Orion iterates through different concurrency levels of small IOs without any large IO streams. Then, Orion iterates through concurrency of large IO streams without any small IOs.


Data Points Matrix

(same data points matrix as above)


-run advanced -matrix max
# of tests = Xlarge + Xsmall + 1

Matrix max is similar to basic, but instead of performing no large IO activity while iterating through small IOs, Orion performs the maximum number of large IO streams. The same goes for iterating through large IO stream concurrency: Orion will run the maximum number of concurrent small I/Os.


Data Points Matrix

(same data points matrix as above)


-run advanced -matrix point -num_large 2 -num_small 3
# of tests = 1

Matrix point is the fastest run, as it performs exactly one defined test.


Data Points Matrix

(same data points matrix as above)


-run simple

Non-advanced runs automatically define the matrix type as well as most of the other parameters.


Data Points Matrix

(same data points matrix as above)


-run normal

Non-advanced runs automatically define the matrix type as well as most of the other parameters.


Data Points Matrix

(same data points matrix as above)


-run oltp

Non-advanced runs automatically define the matrix type as well as most of the other parameters.


Data Points Matrix

(same data points matrix as above)


-run dss

Non-advanced runs automatically define the matrix type as well as most of the other parameters.


Orion I/O Performance Metrics

• Small IOs
• iops - average number of IOs per second

• {test name}_{date}_{time}_iops.csv

• lat - average IO response time

• {test name}_{date}_{time}_lat.csv

• Large IOs
• mbps - throughput in MB per second

• {test name}_{date}_{time}_mbps.csv


Sample for -matrix detailed


Large/Small,   1,    2,    3,    4,   5
0,            58,  114,  117,  127,  84
1,            11,   29,   49,   63,  81
2,            12,   23,   30,   24,  31

iops

Large/Small,        1,        2,        3,         4,         5
0,           17184.84, 17487.14, 25594.11,  31505.73,  59205.26
1,           88272.75, 66781.92, 60642.59,  62514.76,  61699.40
2,           80854.55, 83085.06, 99019.72, 155528.65, 156500.44

lat (us)

Large/Small,     0,     1,     2,     3,     4,     5
1,           18.35, 12.14, 15.99, 16.99, 16.48, 16.37
2,           29.74, 27.07, 25.19, 21.18, 13.04, 13.33

mbps

Orion 11.1.0.7 and earlier reports response time in ms.
11.2.0.1+ reports latency in us (microseconds).

Note how the matrices are slightly different:
- the iops and lat matrices exclude the column with zero small IOs
- the mbps matrix excludes the row with zero large IOs


Sample for -matrix basic


Large/Small,  1,    2,    3,    4,    5
0,           80,  153,  165,  163,  197
1
2

iops

Large/Small,        1,        2,        3,        4,        5
0,           12370.09, 13060.23, 18112.16, 24448.27, 25250.33
1
2

lat (us)

Large/Small,     0, 1, 2, 3, 4, 5
1,           31.84
2,           29.87

mbps


Trace file content:

ran (small): VLun = 0 Size = 10737418240
ran (small): Index = 0 Avg Lat = 22996.61 us Count = 431
ran (small): Index = 1 Avg Lat = 23825.39 us Count = 417
ran (small): nio=848 nior=652 niow=196 req w%=25 act w%=23
ran (small): my 2 oth 1 iops 65 lat 26081 us, bw = 0.51 MBps dur 9.96 s size 8 K, min lat 932 us, max lat 227524 us READ
ran (small): my 2 oth 1 iops 19 lat 14499 us, bw = 0.15 MBps dur 9.96 s size 8 K, min lat 1422 us, max lat 120529 us WRITE
ran (small): my 2 oth 1 iops 85 lat 23404 us, bw = 0.66 MBps dur 9.96 s size 8 K, min lat 932 us, max lat 227524 us TOTAL

seq (large): VLun = 0 Size = 10737418240
seq (large): Index = 0 Avg Lat = 22038.99 us Count = 450
seq (large): Stream = 0 VLun = 0 Start = 2675965952 End = 3152019456
seq (large): Stream = 0 Avg Lat = 22038.99 us CIO = 1 NIO Count = 450
seq (large): nio=450 nior=450 niow=0 req w%=25 act w%=0
seq (large): my 1 oth 2 iops 45 lat 22039 us, bw = 45.22 MBps dur 9.95 s size 1024 K, min lat 9976 us, max lat 223534 us READ
seq (large): my 1 oth 2 iops 0 lat 0 us, bw = 0.00 MBps dur 9.95 s size 1024 K, min lat 18446744073709551614 us, max lat 0 us WRITE
seq (large): my 1 oth 2 iops 45 lat 22039 us, bw = 45.22 MBps dur 9.95 s size 1024 K, min lat 9976 us, max lat 223534 us TOTAL


Orion keeps separate read and write statistics. The actual write percentage is important for sequential large I/O because it assigns whole streams to write or read. IOPS, LAT and MBPS are actually calculated for all types of IO, but the matrix files don’t report them all.

Can parse trace file to extract all statistics available.

Note: the write stats for large sequential IO are bogus since no writes were done.
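As a sketch of such parsing (the summary-line layout is inferred from the excerpt above, so treat the field positions as assumptions), a small awk function can pull iops, latency and bandwidth out of each READ/WRITE/TOTAL line:

```shell
# Extract iops, latency and bandwidth from Orion trace summary lines such as:
#   ran (small): my 2 oth 1 iops 65 lat 26081 us, bw = 0.51 MBps ... READ
parse_orion_trace() {
  awk '/ iops .* lat .* bw /{
    iops = lat = bw = ""
    for (i = 1; i <= NF; i++) {
      if ($i == "iops" && iops == "") iops = $(i + 1)
      if ($i == "lat"  && lat  == "") lat  = $(i + 1)   # first "lat", not min/max lat
      if ($i == "bw"   && bw   == "") bw   = $(i + 2)   # value follows "bw ="
    }
    print $1, $2, "iops=" iops, "lat_us=" lat, "mbps=" bw, $NF
  }' "$1"
}
```

Point it at the run's trace file to recover the per-data-point read/write breakdown that the csv matrices leave out.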


Concurrent I/O requests = number of outstanding I/Os


Separate process for large and small I/Os

For each task, Orion forks 2 separate processes performing large and small IOs. If only large or only small IOs are performed then only one process is forked.


Setting Scale of Concurrent I/Os

• Range of concurrency is {0..max}
• unless specified with -num_small or -num_large, or fixed by run type

• max for small IOs
• num_disks * 5 for advanced, simple and normal runs

• num_disks * 20 for OLTP run

• max for large IOs
• num_disks * 2 for advanced, simple and normal runs

• num_disks * 15 for DSS run


OLTP and DSS runs are impractical*

• Range of concurrency is {0..max}
• unless specified with -num_small or -num_large, or fixed by run type

• max for small IOs
• num_disks * 5 for advanced, simple and normal runs

• num_disks * 20 for oltp run: 20 steps with interval num_disks {num_disks..num_disks*20} - too much concurrency

• max for large IOs
• num_disks * 2 for advanced, simple and normal runs

• num_disks * 15 for dss run: 15 steps with interval num_disks {num_disks..num_disks*15}

* 11.2.0.3 behavior


Orion command-line syntax: required arguments -testname & -run

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -num_streamIO {num} \
      -size_large {Kb} \
      -type rand|seq \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Defines the input file with the list of disks, {testname}.lun, in the current directory:

# cat mytest.lun
/dev/sdc
/dev/sdd
/dev/sde

This is the full command-line syntax.The two parameters that are always required are -testname and -run.

-testname identifies the only input file that Orion needs, with the list of disks - one disk path per line. The file name must be the testname with a .lun extension, and the file must be in the current directory. Orion will also prefix the output results with the testname.

-run defines types of Orion run and the rest of parameters depend on it.


Orion command-line syntax: -run normal

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration 60 \
      -num_disks {disks} \
      -num_large {num} \
      -type rand \
      -num_streamIO {num} \
      -size_large 1024 \
      -num_small {num} \
      -size_small 8 \
      -simulate concat \
      -stripe 1 \
      -write 0 \
      -cache_size {MB} \
      -verbose


Legend: this is preset / this can’t be set / this can be set

For -run normal, Orion sets most of the parameters to predefined values and you can only specify -num_disks, -cache_size and -verbose.


Orion command-line syntax: -run simple

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration 60 \
      -num_disks {disks} \
      -num_large {num} \
      -type rand \
      -num_streamIO {num} \
      -size_large 1024 \
      -num_small {num} \
      -size_small 8 \
      -simulate concat \
      -stripe 1 \
      -write 0 \
      -cache_size {MB} \
      -verbose


Legend: this is preset / this can’t be set / this can be set

-run simple has identical settings, but -matrix is basic.


Orion command-line syntax: -run oltp

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large 0 \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Legend: this is preset / this can’t be set / this can be set

-run oltp (make sure it’s lower case) lets you specify most of the other parameters but you really only need to care about parameters affecting small IOs. Defaults are used if you don’t define a specific value.


Orion command-line syntax: -run dss

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small 0 \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Legend: this is preset / this can’t be set / this can be set

-run dss (make sure it’s lower case) lets you specify most of the other parameters (except switching to sequential large IO streams), but parameters controlling small IOs don’t matter. Defaults are used if you don’t define a specific value.


Orion command-line syntax: -run advanced -matrix detailed | basic | max

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Legend: this can’t be set / this can be set

-run advanced is the most flexible mode, and depending on the matrix type selected, most of the parameters can be specified. When selecting -matrix detailed, basic or max, Orion selects concurrency ranges for large and small IOs based on -num_disks, so -num_large and -num_small cannot be set explicitly.


Orion command-line syntax: -run advanced -matrix col

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Legend: this must be set / this can’t be set / this can be set

When selecting -matrix col (for column), you must specify -num_small to define the column of data points to collect while -num_large is not relevant.


Orion command-line syntax: -run advanced -matrix row

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Legend: this must be set / this can’t be set / this can be set

-matrix row is the reverse of col - you must specify -num_large to define the row of data points to collect, while -num_small is not relevant.


Orion command-line syntax: -run advanced -matrix point

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Legend: this must be set / this can’t be set / this can be set

To specify -matrix point, you need to explicitly set both -num_small and -num_large to identify the data point to collect.


Orion command-line syntax: -simulate raid0

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Great way to simulate ASM striping

Parameter -simulate controls how Orion treats multiple disks, and it has two options:
1. “concat” - all disks are concatenated sequentially into one single virtual disk against which Orion submits IO requests.
2. “raid0” - Orion organizes a single virtual disk by striping across all disks defined in the testname.lun file, using a stripe size that can be set by the -stripe parameter (default 1 MB). This is the best way to simulate ASM striping.
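As a toy model of the raid0 option (assumptions: plain round-robin striping, the default 1 MB stripe, and the 3 disks from the mytest.lun example), this is roughly how a logical offset on the virtual disk maps to a physical disk:

```shell
stripe_kb=1024   # default -stripe is 1 MB
disks=3          # number of lines in mytest.lun
offset_kb=5300   # some logical offset into the striped virtual disk

stripe_no=$(( offset_kb / stripe_kb ))   # which stripe: 5
disk=$(( stripe_no % disks ))            # round-robin: lands on disk 2 (0-based)
disk_off_kb=$(( stripe_no / disks * stripe_kb + offset_kb % stripe_kb ))
echo "logical ${offset_kb}K -> disk ${disk} at ${disk_off_kb}K"
```

The point of the model: with raid0, even a sequential logical scan spreads its stripes across every disk, which is what makes it a reasonable stand-in for ASM striping.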


Orion command-line syntax: -type seq

orion -testname {testname} \
      -run advanced | normal | simple | oltp | dss \
      -matrix detailed | col | row | basic | max | point \
      -duration {seconds} \
      -num_disks {disks} \
      -num_large {num} \
      -type rand|seq \
      -num_streamIO {num, default 4} \
      -size_large {Kb} \
      -num_small {num} \
      -size_small {Kb} \
      -simulate concat|raid0 \
      -stripe {Mb} \
      -write {%} \
      -cache_size {MB} \
      -verbose


Parameter -type controls the large IO pattern:
1. “rand” - Orion performs large IOs randomly, selecting the offset for each IO request from the whole virtual disk.
2. “seq” - Orion establishes multiple sequential IO streams starting from predefined offsets of the virtual disk (produced by concatenation or striping). The starting offsets are selected at the beginning of each test by splitting the virtual disk into equal chunks, one per concurrent stream.


Orion Sequential I/O


-num_streamIO 1 - schedule one IO request and wait:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

-num_streamIO 4 - schedule four IO requests and wait:
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5

Each stream can also have multiple simulated IO threads (by default there are 4). Thus, when you are testing sequential large IO, your real number of concurrent IO requests might actually be much higher than you think because of the default -num_streamIO value of 4.
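In other words, the outstanding IO count at a sequential data point is streams times threads per stream; a one-line sanity check:

```shell
num_large=2      # concurrent sequential streams at the data point
num_streamIO=4   # Orion default threads per stream
outstanding=$(( num_large * num_streamIO ))
echo "$outstanding concurrent large IO requests, not $num_large"
```

Set -num_streamIO 1 explicitly if you want the concurrency on the x-axis to mean what it says.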


What I/O in Oracle behaves like -num_streamIO 4?

• Some examples:
• serial direct parallel read

• ARCH reads of redo logs

• some operations with temporary segments

• How do you verify/know?
• Enable 10046 trace and OS trace (strace/truss/tusc)


* needs verification *


Orion Flexibility (Inflexibility?)


• Single Orion run is enough to assess scalability at defined settings

• Need several separate Orion runs to vary:
• write %

• large IO pattern

• IO size

• striping

• Need multiple concurrent runs to:
• simulate more complex IO patterns

• simulate RAC

Orion has lots of flexibility in its settings. However, for a single run there is very limited control over the data points collected. Varying any setting other than concurrency requires separate Orion runs. When simulating more complex scenarios, you would also need to combine multiple runs and make sure they are running in sync. To simplify synchronization, you would use -matrix point. Otherwise, syncing different data points is a nightmare, especially since Orion can’t be used to collect the same data point multiple times over and over in the same run while another run (or runs) iterates through other data points.


Scenarios: OLTP traffic

• -run advanced -matrix row -num_large 0
• Shadow processes’ “db file sequential read”

• DBWR’s “db file parallel write”

• Optionally several runs with different settings like -write %

• Analyze IOPS & response time


Instead of using “-run oltp” use advanced run settings. This run will simulate random reads that foreground processes are doing as well as background random writes performed by DBWR.

One almost universally useful variation to drill into is write percentage - this lets you assess how well the I/O subsystem handles random writes as opposed to random reads. These tests usually show that no matter what storage vendors claim about their super smart storage arrays and caching algorithms, sustained random writes ruin any parity-based RAID.
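A hedged sketch of such a write-percentage sweep (the orion path, test names and percentages are assumptions; the loop prints each command for review first rather than executing it):

```shell
DUR=60
for w in 0 10 20 40 60; do
  cmd="./orion -testname oltp_w${w} -run advanced -duration $DUR -matrix row -num_large 0 -write $w"
  echo "$cmd"   # review the commands, then execute one with: eval "$cmd"
done
```

Each percentage needs its own run because write percentage, like all settings other than concurrency, cannot vary within a single run.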


Scenarios: OLTP traffic visualization (Oracle Database Appliance example)

[Chart: ODA Small IOPS scalability / HDDs - throughput (IOPS) and IO response time (ms) vs concurrent small IOs from 1 to 100]

This is an example of the first Orion run of Oracle Database Appliance to assess OLTP traffic scalability for read only workload.


Scenarios: OLTP traffic variation analysis (varying write percentage in ODA)

[Chart: Small IOPS by write percentage - Oracle Database Appliance / OLTP / whole HDDs: IOPS and IO response time (ms) vs concurrent IO requests (20-400), for 0/10/20/40/60% writes]

Now let’s introduce a variable write percentage and assess the impact. Because ODA doesn’t use any RAID technology, we see almost no degradation. However, since ASM will be doing host-based triple mirroring (for this purpose comparable to RAID1), these IOPS metrics are from the disks’ perspective and not from the database perspective. We need to adjust IOPS and write percentage to see the numbers from the database perspective after ASM mirroring.
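The adjustment is simple arithmetic: with triple mirroring, each database write costs three disk writes while reads are unchanged, so a disk-level write percentage w maps to a database-level percentage of (w/3) / ((100-w) + w/3). A sketch reproducing the adjusted values:

```shell
# disk-level write % -> database-level write % under ASM triple mirroring
for w in 10 20 40 60; do
  awk -v w="$w" 'BEGIN {
    db_w = w / 3                                # 1 of every 3 disk writes is "real"
    printf "disk %d%% -> db %.0f%%\n", w, 100 * db_w / ((100 - w) + db_w)
  }'
done
```

This reproduces the 10->4, 20->8, 40->18 and 60->33 mapping used on the adjusted chart.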


Scenarios: OLTP traffic variation analysis (write percentage adjusted for ASM mirroring)

[Chart: same as the previous slide, with write percentages adjusted for ASM mirroring: IOPS and latency at 0/4/8/18/33% database-level writes]

This is the adjusted IOPS and percentage values.


Impact of writes on RAID5 is huge:
40% writes => 4 times lower IOPS


Here is an explicit example of RAID5 shortcomings.


Same disks reconfigured as RAID1+0:
40% writes => less than a 50% hit


Writes have much less impact with RAID10, and the impact only becomes noticeable closer to the saturation point anyway.


Scenarios: Data Warehouse queries

• -run advanced -matrix col -num_small 0
• Keep read only (-write 0)
• Concurrent users environment
• -type rand

• Single dedicated user performance
• -type seq

• -num_streamIO 1
• Most reads in the DB are synchronous

• Analyze MBPS


To simulate a data warehousing workload from concurrent users, use a read-only workload with random large reads. Even though individual queries might be scanning tables in a mostly sequential manner, the high concurrency level makes them look random. Environments with low concurrency levels will probably look more like multiple sequential scan streams. For data warehouse performance you are normally interested in the scan throughput measured in MB per second.
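Putting the bullets above together, a hedged sketch of the concurrent-users variant (the orion path, testname and duration are assumptions; the command is printed for review rather than executed):

```shell
cmd="./orion -testname dwscan -run advanced -matrix col -num_small 0 \
-write 0 -type rand -size_large 1024 -duration 60"
echo "$cmd"
```

For the single-dedicated-user variant, swap in -type seq -num_streamIO 1 as the bullets suggest.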


Scenarios: Data Warehouse IO visualization

[Chart: Large IOs throughput - MBPS vs concurrent threads from 1 to 32]

A simple way to visualize. You could also add throughput per reading stream to see the performance that each user doing serial scans would get, for example.


Scenarios: RMAN backup

• -run advanced -matrix col -num_small 0 -type seq -num_streamIO 1

• Backup source only => -write 0
• Backup destination only => -write 100
• Database and backup destination combined => -write 50
• Watch for the actual write percentage:

• 1 thread => 0% actual writes

• 2 threads => 50% actual writes

• 3 threads => 33% actual writes

• 4 threads => 50% actual writes, etc.

• Analyze MBPS


No backup compression overhead is accounted for.

Orion will actually be more aggressive in sending IO requests because it will keep either writing non-stop or reading non-stop, while an RMAN process needs to alternate: read and write, read and write, read...
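The odd-looking actual write percentages in the bullets come from how whole streams are assigned to reads or writes. A floor-rounding model (an assumption, but consistent with the 1 -> 0% and 3 -> 33% numbers above) reproduces them:

```shell
# writers = floor(n * 50 / 100) for -write 50; the remaining streams read
for n in 1 2 3 4; do
  writers=$(( n * 50 / 100 ))
  pct=$(( 100 * writers / n ))
  echo "$n streams: $writers writer(s), ${pct}% actual writes"
done
```

Check the "act w%" field in the trace file to see what a given run really did.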


Scenarios: LGWR writes

• -run advanced -matrix point -num_small 0 -type seq -num_streamIO 1 -write 100 -num_large 1 -size_large 5
• -size_large should be set to the average LGWR write size, which is often about 5-20K for OLTP systems

• -num_large n
• multiple instances

• multiple LGWR threads in RAC

• redo logs multiplexing

• Analyze IOPS and response time
• Gather from the Orion run’s trace file



Scenarios: LGWR writes visualization

[Chart: ODA SSD sequential 32K IO streams (triple mirroring): writes per second and average response time (ms) vs concurrent threads from 2 to 16]

Because you can’t throttle down each thread, each thread will go as fast as it can, so you are always pushing some kind of limit: you will be throttled either by the maximum the I/O subsystem can deliver or by CPU - but Orion consumes very little CPU, so you can ignore it.


Combining different workloads

• Start multiple parallel Orion runs
• OLTP: -matrix point -num_large 0 -num_small X
• LGWR: -matrix point -num_large 1 -num_small 0 -write 100
• ARCH: -matrix point -num_small 0 -write {0 | 50}
• RMAN: -matrix point -num_small 0 -write {0 | 50}
• Add batch data load with large parallel writes
• Add batch reporting (DW-like) with large reads


You cannot throttle I/O other than by controlling the number of outstanding I/Os.

You cannot schedule a run with repeated data points - you must schedule multiple consecutive runs.

Combining multiple runs is only reliable if using -matrix point.


EC2 large 5 EBS disks: first run to test scalability

[Chart: Initial OLTP test with 5 disks and 20% writes — IOPS and average response time (ms) vs. number of concurrent IOs, 1 to 25. IOPS grow from 156 at 1 outstanding IO to a plateau around 1,350 beyond roughly 17 concurrent IOs, while average response time climbs from about 6 ms to about 18 ms.]

My initial run gives me a general idea of how the subsystem scales under an OLTP-style load with 20% writes. If I’m curious, I might go further, perform a few runs with different write percentages, and visualize the difference.
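If you want to plot runs like this without copy-pasting numbers, the data can be pulled from Orion's CSV output. A sketch under an assumed layout (header row = small-IO load levels, first column = large-IO load level — verify against your own *_iops.csv files), run here on a synthetic sample:

```python
import csv
import io

# Synthetic example shaped like an Orion *_iops.csv matrix; the exact
# layout of your files may differ, so adjust accordingly.
sample = """Large/Small,1,2,3
0,156,326,178
"""

def parse_matrix(text):
    rows = list(csv.reader(io.StringIO(text)))
    small_levels = [int(x) for x in rows[0][1:]]
    series = {}
    for row in rows[1:]:
        if not row or not row[0].strip():
            continue  # skip blank trailing rows
        series[int(row[0])] = dict(zip(small_levels, map(float, row[1:])))
    return series

matrix = parse_matrix(sample)
print(matrix[0][2])  # IOPS at 2 outstanding small IOs, no large IOs
```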


Let’s mix in additional I/O workloads

DUR=60

# OLTP test of scalability - original first run
# /root/orion11203/orion -testname baseoltp -run advanced -duration $DUR \
#   -matrix row -num_large 0 -write 20

# OLTP point
/root/orion11203/orion -testname oltp -run advanced -duration $DUR -matrix point \
  -num_large 0 -num_small 10 -write 20 &

# Adding LGWR
/root/orion11203/orion -testname lgwr -run advanced -duration $DUR -matrix point \
  -num_large 1 -num_small 0 -type seq -num_streamIO 1 -size_large 5 -write 100 &

# Adding ARCH
/root/orion11203/orion -testname arch -run advanced -duration $DUR -matrix point \
  -num_large 2 -num_small 0 -type seq -num_streamIO 1 -size_large 1024 -write 50 &

# Backup in 1 channel
# /root/orion11203/orion -testname backup -run advanced -duration $DUR -matrix point \
#   -num_large 1 -num_small 0 -type seq -num_streamIO 1 -size_large 1024 -write 0 &

# Backup in 4 channels
# /root/orion11203/orion -testname backup -run advanced -duration $DUR -matrix point \
#   -num_large 4 -num_small 0 -type seq -num_streamIO 1 -size_large 1024 -write 0 &

wait


I used the first, commented-out command to assess initial scalability and build the run visualized on the previous slide.

I then convert it to the “OLTP point” run and execute it. Next, I add the “Adding LGWR” run in parallel. After that I add ARCH, collect another data point, and so on.

Note that they all start at the same time and run in parallel in the background; the script waits for all background jobs to complete at the end using the “wait” command.
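To compare the parallel runs afterwards, I'd pull the headline numbers out of each run's summary output. A sketch on a synthetic excerpt — the real file names (one per -testname) and line wording vary by Orion version, so the patterns below are assumptions to adapt:

```python
import re

# Synthetic excerpt shaped like an Orion summary file; the exact wording
# differs between versions, so treat these patterns as a starting point.
summary = """
Maximum Small IOPS=1310 @ Small=10 and Large=0
Small Read Latency: avg=7.2 ms
"""

def grab(pattern, text):
    """Return the first captured group as a float, or None if absent."""
    m = re.search(pattern, text)
    return float(m.group(1)) if m else None

max_iops = grab(r"Maximum Small IOPS=([\d.]+)", summary)
avg_latency = grab(r"avg=([\d.]+)", summary)
print(max_iops, avg_latency)
```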

EC2... visualizing combined workload impact

Scenario            OLTP IOPS   Response time, ms   LGWR writes/s   LGWR write, ms
OLTP only                1306          7.7                  -              -
OLTP+LGWR                1239          8.1                139            7.1
OLTP+LGWR+ARCH            576         17.4                 17           56.0
OLTP+LGWR+RMAN1           778         12.8                 38           26.1
OLTP+LGWR+RMAN4           571         17.5                 49           20.3

[Charts: OLTP IOPS with response time, and LGWR writes per second with write latency, for each combined-workload scenario]

I can then record how my OLTP traffic is affected in each scenario, including the effect on LGWR performance.
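Turning the absolute numbers into relative impact makes the comparison easier to read; a small sketch using the OLTP IOPS figures from the combined-workload data above:

```python
# OLTP IOPS per scenario, taken from the combined-workload results above;
# compute each scenario's drop relative to the OLTP-only baseline.
oltp_iops = {
    "OLTP only": 1306,
    "OLTP+LGWR": 1239,
    "OLTP+LGWR+ARCH": 576,
    "OLTP+LGWR+RMAN1": 778,
    "OLTP+LGWR+RMAN4": 571,
}
baseline = oltp_iops["OLTP only"]
impact = {s: round(100.0 * (baseline - v) / baseline, 1)
          for s, v in oltp_iops.items()}

for scenario, drop in impact.items():
    print(f"{scenario:17s} {drop:5.1f}% below baseline")
```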


The best Orion 11.2 new feature

[Chart: I/O latency histograms — I/O count per latency bucket (0 - 128, 128 - 256, ..., up to 1048576 - 2097152) for LGWR writes without ARCH vs. with a concurrent ARCH workload; with ARCH the distribution spreads into much higher-latency buckets]

Histograms!
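The histogram buckets look like power-of-two latency ranges; a one-liner regenerates the labels (units assumed to be microseconds — an assumption, not stated on the slide):

```python
# Reconstruct the 15 histogram bucket ranges shown on the slide:
# one catch-all 0-128 bucket, then doubling ranges up to 2097152 (2**21).
buckets = [(0, 128)] + [(2 ** k, 2 ** (k + 1)) for k in range(7, 21)]
labels = [f"{lo} - {hi}" for lo, hi in buckets]

print(len(labels), labels[0], labels[-1])
```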


Got RAC? Schedule parallel runs on each node

HP blades, HP Virtual Connect Flex10, big NetApp box, 100 disks


Example of Failed Expectations
NetApp NAS, 1 Gbit Ethernet, 42 disks

[Charts: Read only and Read write workloads — IOPS and latency (ms) vs. number of outstanding IOs, 1 to 100]


Tune-Up Results
Switched from Intel to Broadcom NICs and disabled snapshots

[Charts: after tuning — IOPS and latency (ms) vs. number of outstanding IOs, 1 to 100, for the same read-only and read-write workloads]


Possible “What-If” scenarios

• Impact of a failed disk in a RAID group
• Different block size
• Different ASM allocation unit size (-stripe)
• Assess foreign workload impact (shared SAN with other servers)

• Test impact of configuration / infrastructure changes
• Impact of backup or a batch job
• Impact of decreased MTTR target (higher -write %)
• Platform stability test (repeating the same data point for many days)

• Impact of CPU starvation


Concurrent IOs on axis X is not always the best...

[Chart: ODA: Small IOPS scalability and data placement / HDDs — IOPS and IO response time (ms) vs. number of outstanding IOs, 1 to 100, for three placements: whole disk, outside 40%, inside 60%]


Smarter presentation
50% IOPS at the same response time

[Chart: ODA: Improving IO throughput by data placement — IOPS vs. IO response time, for whole disk, outside 40%, and inside 60% placements]


Storage types

• Anything as long as ASYNC IO is supported

• Local storage (LUNs or filesystem)
• NAS via NFS
• iSCSI / FC devices (any block or raw device)
• Cluster filesystems should work just fine


Beware of thin provisioning and other NAS magic

• Smart storage technologies play bad jokes
• If in doubt - “initialize” disks with non-zeroes
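One way to “initialize” with non-zeroes is simply to stream random data over the test area first, so thin provisioning, dedup, or compression cannot fake the results. A safe, illustrative sketch that writes to a scratch file (point it at a real test LUN only if you are certain it holds no data):

```python
import os
import tempfile

def fill_with_random(path, size_bytes, chunk=1024 * 1024):
    """Overwrite `path` with `size_bytes` of random (non-zero-pattern) data."""
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk, size_bytes - written)
            f.write(os.urandom(n))  # incompressible, non-dedupable data
            written += n
    return written

# Demo target is a temp file for safety; substitute your test LUN with care.
target = os.path.join(tempfile.gettempdir(), "orion_fill_test.dat")
print(fill_with_random(target, 2 * 1024 * 1024))  # fill 2 MB
```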


Orion 11.2

Dependencies:

                    11.2.0.1   11.2.0.2   11.2.0.3
libcell11.so            x          x          x
libclntsh.so.11.1                  x          x
libskgxp11.so                      x          x
libnnz11.so                        x          x

Included in:
• Database
• Grid home
• Client (tested the Administrative install option)


Orion with SLOB (Silly Little Oracle Benchmark)

• Orion gives more control
• Orion is easier to set up
• Orion uses very little CPU - it doesn’t do anything with the data
• Easier to saturate the IO subsystem without CPU starvation

• Less realistic results if you want to account for the database CPU used for LIOs and for processing the data

• Less realistic for multiprocess orchestration

• SLOB is more realistic but more difficult to control


Visualization is the Key


Thank you and Q&A

To contact us…
1-866-PYTHIAN
[email protected] or [email protected]

To follow us…
http://www.pythian.com/news/
http://www.facebook.com/pages/The-Pythian-Group/
http://twitter.com/pythian
http://www.linkedin.com/company/pythian
[email protected]