Understanding I/O Performance with

PATROL-Perform and PATROL-Predict

Debbie Sheetz

Sr. Staff Consultant

BMC Software

I/O Performance Analysis Overview

• I/O metric definitions

• Baseline I/O performance analysis

• What-if I/O performance analysis

How Important is I/O to Performance?

• Predict/Visualizer presents a unified view of the system so that the relative contributions of CPU and disk I/O can be assessed

• Don’t solve a problem that you don’t have

CPU is the dominant factor here

Source of I/O Metrics

• Key to understanding I/O is to know your metrics

• Disks are reported/collected as they are defined/known to UNIX or NT

• This may or may not correspond 1-to-1 to physical units

• Disk configuration is collected from standard interface for the particular OS

• Disk statistics are collected from standard interface for the particular OS (same metrics used by iostat, etc.)

• Analyze/Predict interprets and reports based on these metrics

I/O Configuration Collection Issues

• Sometimes the disk configuration is reported as “Unknown”

• Three possible causes

1. Disk configuration is not available from the OS

2. Standard interface to OS fails to return the disk configuration

3. Collected disk configuration is not matched by an entry in the hardware (.hrw) and .odm

• RAID is not collected directly

• This DOES NOT AFFECT the baseline metrics or baseline model calibration

• For certain ‘what-if’ disk modeling scenarios, the disk must be identified

Key I/O Metrics

• A few metrics tell most of the story about disk I/O

• Disk throughput

• Data transferred (e.g. bytes, words, etc.)

– Disk reads/writes

• Disk accesses

• Disk utilization (active time)

I/O Metrics: Throughput

• Data transferred (e.g. bytes, words, etc.)

• PATROL-Perform and Predict report I/Os in 4 KB units

– Consistency for reporting (Analyze, Visualizer, Predict)

– Ease of modeling I/O cross-node and cross-platform

• Units measured vary by platform

– HP, OSF: words (Disk Statistics, Words Xfered)

– Solaris, AIX: blocks (Disk Statistics, Blocks Read/Written)

– NT: bytes (NT Physical Disk, Disk Read/Write)
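
As a rough illustration of the 4 KB normalization described on this slide, the sketch below converts a native throughput count into 4 KB I/O units. The function name and the 512-byte block size are assumptions made for the example; real word and block sizes are platform-specific and are not given on the slide, and this is not necessarily how Analyze performs the conversion.

```python
IO_UNIT_BYTES = 4 * 1024  # the 4 KB reporting unit named on the slide


def to_4kb_ios(native_count, bytes_per_native_unit):
    """Convert a native transfer count (bytes, blocks, words) to 4 KB I/Os."""
    return native_count * bytes_per_native_unit / IO_UNIT_BYTES


# NT reports bytes directly, so each native unit is 1 byte.
nt_ios = to_4kb_ios(81920, 1)

# Hypothetical 512-byte block; check the real block size for your OS.
blk_ios = to_4kb_ios(160, 512)

print(nt_ios, blk_ios)
```

Expressing every platform's counter in the same unit is what makes cross-node and cross-platform comparison possible.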

I/O Metrics: Throughput

• Disk accesses (i.e. transfers)

• Number of times an I/O request was made of the disk

– Size of data transfer can vary

– Doesn’t matter where the I/O is actually serviced:

» Physical disk (seek, latency, and data transfer)

» Cache on the disk

» Cache on the disk controller

– Doesn’t matter whether RAID or non-RAID

• Similar metrics collected for UNIX/NT

– UNIX: Disk Statistics, Transfers

– NT: NT Physical Disk, Disk Transfers/Sec

I/O Metrics: Throughput

• Disk reads/writes

• Number of times a read vs. write I/O request was made of the disk

– Size of data transfer can vary

• Different metrics collected for UNIX/NT

– Solaris, AIX: Disk Statistics, Blocks Read/Written

– HP, OSF: Not Available

– NT: NT Physical Disk, Disk Read/Write Bytes/Sec

• Reported in Analyze/Predict as I/O rates in 4 KB units

I/O Metrics: Utilization (Active Time)

• Disk utilization (active time)

• Amount of time disk was observed to be actively servicing an I/O request

– Doesn’t matter where the I/O is actually serviced:

» Physical disk (seek, latency, and data transfer)

» Cache on the disk

» Cache on the disk controller

– Doesn’t matter whether RAID or non-RAID

• Should reflect the relative efficiency of I/O processing when compared with disk throughput measures

– Use disk service time for this (service time = utilization / IOs)
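
The service time formula on this slide can be sketched in a few lines. The function name and the sample figures (40% busy at 80 I/Os per second) are invented for illustration; only the relationship service time = utilization / IOs comes from the slide.

```python
def service_time_ms(utilization, ios_per_sec):
    """Average service time per I/O in ms: utilization / I/O rate.

    utilization is a fraction of elapsed time (0.40 means 40% busy).
    """
    if ios_per_sec <= 0:
        raise ValueError("I/O rate must be positive")
    return utilization / ios_per_sec * 1000.0


# Hypothetical disk: 40% busy while servicing 80 I/Os per second.
example = service_time_ms(0.40, 80.0)
print(f"{example:.1f} ms per I/O")
```

A disk that is busy 0.40 seconds out of every second while completing 80 I/Os spends about 5 ms on each one, which is why utilization only becomes meaningful once it is set against throughput.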

I/O Metrics: Utilization (Active Time)

• Disk active time

• Different metrics collected for UNIX/NT

– UNIX: Disk Statistics, Active Time

– NT: NT Physical Disk, % Disk Time

– Windows 2000: NT Physical Disk, % Idle Time

• Windows/NT metrics are reinterpreted by Analyze

– Perfmon caps calculated utilization at 100%

– Observations of collected Windows/NT disk data show utilizations well over 100%

– Analyze scales all collected NT times down

– Perfmon and Analyze/Predict will not match

I/O Metrics Collection Issues

• If “iostat” can’t see it, the collector can’t collect it

• The OS is supplying the metrics

• If the metrics are missing or incorrect, both “iostat” and PATROL-Perform/Predict, etc. will report the same

• Problem needs to be addressed by the OS vendor

• Refer any questions about valid I/O metrics to BMC Technical Support

• Always need to know the exact platform (e.g. HP 11.00, 64-bit)

• Run iostat and the collector in parallel

• Use current collector for the platform

Baseline I/O Performance Analysis Overview

• Observe key disk I/O metrics from baseline measurements

• Identify I/O patterns

• For the system

• For a disk or group of disks

– Distribution amongst disks

• For a workload/transaction

• Determine how important I/O is to overall performance

Baseline I/O Performance Analysis Overview

• Observe key disk I/O metrics from baseline measurements

• Identify I/O performance characteristics

• Relative speed of I/O processing

• Read/write ratios

• Blocksize used

• Disk utilization objectives

– Distribution amongst disks

Baseline Case Study

CPU pattern doesn’t precisely match I/O pattern

Baseline Case Study

• I/O is dominated by one Oracle instance, but there are other contributors

• Study patterns within days and across days, weeks, etc.

Baseline Case Study

• I/O is the major component of response time during prime time

Baseline Case Study

Distribution of I/O amongst disks is fairly even

I/O Analysis Technique: CUTDISK

• How to filter I/O data so only the important disks are studied?

• Use “CUT DISK” feature

• In Analyze

• In Manager

• If already specified in .an file input to Manager, don’t need Manager specification, too

• Analyze/Predict reports shorter, Visualizer files smaller, Visualizer database smaller, Visualizer graphics easier to present

I/O Analysis Technique: CUTDISK

• Concept is to aggregate I/O from less utilized disks, preserve important disks individually

• I/Os are NOT removed from the model

• Choose appropriate threshold

• I/O rate or Disk utilization may be used

• Threshold value can be set for a specific purpose

– Setting of 0 removes only disks which are not used at all

– Setting of 5% utilization removes most disks

– Paging disks are never removed
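
The CUTDISK idea on this slide can be sketched as a small filter. This is only a conceptual model, not the Analyze or Manager implementation: the function, the dictionary field names, the "OTHER" aggregate name, and the sample values are all invented for illustration.

```python
def cut_disks(disks, threshold, key="utilization"):
    """Split disks into those kept individually and one aggregate entry."""
    def is_kept(d):
        # Paging disks are never removed, whatever their activity.
        return d["paging"] or d[key] > threshold

    kept = [d for d in disks if is_kept(d)]
    cut = [d for d in disks if not is_kept(d)]
    # The cut disks' I/Os are summed, not dropped, so the total is preserved.
    aggregate = {"name": "OTHER", "disks": len(cut),
                 "io_rate": sum(d["io_rate"] for d in cut)}
    return kept, aggregate


# Invented sample values, for illustration only.
disks = [
    {"name": "diskA", "io_rate": 40.0, "utilization": 0.30, "paging": False},
    {"name": "diskB", "io_rate": 2.0, "utilization": 0.02, "paging": False},
    {"name": "diskC", "io_rate": 0.0, "utilization": 0.00, "paging": False},
    {"name": "swap0", "io_rate": 0.5, "utilization": 0.01, "paging": True},
]
kept, other = cut_disks(disks, threshold=0.05)
print([d["name"] for d in kept], other)
```

With a threshold of 0 only the completely idle disk falls into the aggregate, and at 5% utilization most lightly used disks do. In both cases the total I/O rate across kept and aggregated disks is unchanged.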

I/O Analysis Technique: CUTDISK

• Specify under Options, Cut Disk Options in Analyze

I/O Analysis Technique: CUTDISK

• Specify under Options, Advanced Features in Manager

Baseline Case Study

• Observe Disk Utilization patterns

Utilizations mostly even, most under 40%

Baseline Case Study

• Observe Disk processing efficiency

Looks good! Most service times under 5 ms per 4 KB transfer. A few outliers could use a closer look …

Baseline Case Study

• Look at ssd4

High service time isn’t so high after all: 12.69 transfers divided by 9.85 I/Os is 1.3. That means the 12.11 ms service time covers 1.3 actual data transfers, or 9.3 ms per physical transfer.

Baseline Case Study

• Look at ssd3

High service time isn’t really high here either: 10.66 transfers divided by 1.37 I/Os is 7.8. That means the 53.84 ms service time covers 7.8 actual data transfers, or 6.9 ms per physical transfer. Another way to think about this is that the average blocksize is 4 KB / 7.8, or about 0.5 KB.
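
The arithmetic used for ssd4 and ssd3 can be reproduced with a short sketch. The function name is an assumption; the input figures are the ones quoted on the slides. Because the slides round the ratio before dividing, unrounded results can differ in the last digit.

```python
IO_UNIT_KB = 4.0  # Analyze/Predict report I/Os in 4 KB units


def per_transfer(transfers, ios, service_ms_per_io):
    """Return (transfers per 4 KB I/O, ms per transfer, average blocksize in KB)."""
    ratio = transfers / ios
    return ratio, service_ms_per_io / ratio, IO_UNIT_KB / ratio


# (disk, transfers, 4 KB I/Os, service time in ms per 4 KB I/O) from the slides
cases = [("ssd4", 12.69, 9.85, 12.11), ("ssd3", 10.66, 1.37, 53.84)]
for name, transfers, ios, svc in cases:
    ratio, ms, kb = per_transfer(transfers, ios, svc)
    print(f"{name}: {ratio:.1f} transfers per I/O, "
          f"{ms:.1f} ms per transfer, {kb:.1f} KB average blocksize")
```

The point of the calculation is that a high service time per 4 KB I/O may simply mean that many small physical transfers were needed to move 4 KB.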

Baseline Case Study

• In fact, good (larger) blocksizes explain the good disk performance

These graphics show roughly a 2:1 ratio between I/Os and transfers, or an 8 KB blocksize

Baseline Case Study Conclusion

• Even though I/O is a major contributor to response time, there are no obvious tuning opportunities

• Continue to study the key I/O metrics over time

• Identify trends in I/O performance

What-if I/O Performance Analysis Overview

• Via the Predict model, you can change:

• I/O patterns

• For the system

– Change in workload volume

– Change in the types of workloads

• For a disk or group of disks

– Distribution amongst disks

• Change in amount of transaction I/O required

What-if I/O Performance Analysis Overview

• Via the Predict model, you can change:

• I/O performance characteristics

• Relative speed of I/O processing

– Disk configuration change

– Blocksize used

What-if I/O Performance Analysis Overview

• Predict shows how this affects performance

• Performance objectives

• Workload/transaction response objectives

• Disk utilization objectives

• Reports I/O patterns

• System

• Distribution amongst disks

• Reports individual disk performance

• Can view results in Predict and/or Visualizer

What-if Case Study

• Management wants to know how performance will change if a new RAID disk technology is implemented

• Study strategy

1. Perform Visualizer analysis of baseline I/O performance characteristics, build baseline model

2. Perform Visualizer analysis of benchmark of I/O using new disk technology (IBM “Shark”)

3. Use Predict to do ‘what-if’

What-if Case Study: Benchmark Data Analysis

• Benchmark demonstrates substantial I/O rate

• Since current system has high I/O rates, a subset of the benchmark will be studied

What-if Case Study: Benchmark Data Analysis

• Selected subset of the benchmark

What-if Case Study: Benchmark Data Analysis

• Key I/O characteristics: I/Os vs. transfers

Ratio of I/Os to transfers is about 5.7, or 23 KB per native I/O access

What-if Case Study: Benchmark Data Analysis

• Key I/O characteristic: reads vs. writes

Ratio of reads to writes is about 1.5:1

What-if Case Study: Benchmark Data Analysis

• Key I/O characteristic: service time for 4 KB I/O

Predominant service time is about .5 ms

What-if Case Study: Benchmark Data Analysis

• Key I/O characteristic: service time for 4 KB I/O

View by controller, disks over 5% utilization

Note less efficiency at lower I/O load

What-if Case Study: Change Model

• Only one change is needed in the Predict model

• Set the disk service time/IO according to the benchmark

• DO NOT use the hardware table method because more specific info is available

• Hardware table method applies ratio of new disk type to current disk type

• Both disk types must be in the hardware table

• Baseline disk type must be specified

What-if Case Study: Change Model

• Model must be baselined

• Two methods for changing service time

• Edit the disk service time/IO in the GUI

• Use a command file if there are many disks

Command file format:

MODIFY DISK hdisk10
EDISKTIME .5
MODIFY DISK hdisk11
EDISKTIME .5
Etc.
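
When there are many disks, the command file can be generated rather than typed. The sketch below emits only the two statements shown on the slide (MODIFY DISK and EDISKTIME); the function name and the disk list are hypothetical, and whether Predict expects any additional statements in the file is not shown here and should be checked against the product documentation.

```python
def build_command_file(disk_names, service_time):
    """Build MODIFY DISK / EDISKTIME pairs, one pair per disk.

    service_time is passed as text so it is written exactly as given
    (the slide writes the value as ".5").
    """
    lines = []
    for name in disk_names:
        lines.append(f"MODIFY DISK {name}")
        lines.append(f"EDISKTIME {service_time}")
    return "\n".join(lines) + "\n"


# Hypothetical disk list: hdisk10 through hdisk13.
text = build_command_file([f"hdisk{n}" for n in range(10, 14)], ".5")
print(text, end="")
```

The returned text can be written to a file and supplied to Predict in place of editing each disk in the GUI.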

What-if Case Study: Modeling Results

• Model is evaluated and net change is observed

<< Baseline

What-if >>

What-if Case Study: Modeling Results

• Relative reduction in response time reported with relative response time

Reduction of 26% for the workload of interest

What-if Case Study: Modeling Results

• Why not a larger reduction?

New service time/utilization is about 75% of baseline (.5 ms / .65 ms) for the disks doing the most I/O

<< Baseline

What-if >>
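
The service time ratio quoted on this slide can be checked directly. Since service time = utilization / IOs, utilization at an unchanged I/O rate scales by the same ratio. The two service times are taken from the slide; the 30% baseline utilization is a made-up figure for illustration.

```python
baseline_ms = 0.65  # baseline service time per 4 KB I/O, from the slide
whatif_ms = 0.50    # benchmark service time per 4 KB I/O, from the slide

ratio = whatif_ms / baseline_ms
print(f"new service time is {ratio:.0%} of baseline")

# utilization = I/O rate * service time, so at the same I/O rate the
# utilization shrinks by the same ratio. The 30% figure is hypothetical.
baseline_util = 0.30
whatif_util = baseline_util * ratio
print(f"utilization {baseline_util:.0%} -> {whatif_util:.0%}")
```

A disk that already services a 4 KB I/O in 0.65 ms leaves little room for improvement, which is why the modeled reduction in response time is modest.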

What-if Case Study: Modeling Results

• What else will improve performance more?

More even I/O distribution in benchmark

What-if Case Study: Modeling Results

• What else will improve performance more?

Possible use of more optimistic service time, e.g. .45 ms observed with CUTDISK set at 100 IO/sec

Should confirm with more benchmark data and/or vendor

What-if Case Study Conclusion

• Change to new technology will

• Reduce I/O service time

• Reduce I/O wait time

• From reduced utilization (due to service time decrease)

• From better I/O distribution (due to more even utilizations)

• Reduction not as large as expected because current I/O performance is already good (.65 ms vs. .5 ms)

• Allows for additional workload growth compared with current technology