Post on 27-Nov-2014
Insight 2008 – NetApp Confidential Limited Use
Performance Analysis – Understanding Perfstat Data
Spencer G. Watson
© 2008 NetApp. All rights reserved. NetApp Confidential - Limited Use - Insight 2008 2
Agenda
Introductions
Tools and Data Collection
Level 1 – “Quick Look Analysis”
Process & Workflow
Level 2 – Perfstat “Deeper Analysis”
Responsibilities & Round-Up
Data Review & Translation
Q & A
Objectives
After completing this session you will be able to:
– Collect and Analyse Performance Data
– Monitor Performance
– Perform Bottleneck Analysis
– Make Recommendations
– Know When to Sell Chargeable PS Performance Analysis
Performance Analysis – The Basics
Why Monitor Performance?
– Pre-sales sizing for new environments
– Additional workload sizing
– Replacement of older systems
– Analyse system headroom
– Increases customer satisfaction
Don’t Panic!
How Do You See Performance?
How do you see performance issues?
1st Rule for Performance Cases – No Fear
[Slide graphic: a symbol code, one symbol for each of the digits 1 to 9]
How long would it take to memorize this code?
15 minutes?
10 minutes?
5 minutes?
5 seconds??
1st Rule for Performance Cases – No Fear
1 2 3
4 5 6
7 8 9
Process & Workflow
Performance Analysis Process
[Diagram labels: CUSTOMER.
Roles: TSC, TSE, FSE Escalations, Sales / SE.
Analysis levels: Level 1 “Quick Look Analysis”, Perfstat “Level 2 Analysis”, Formal PS Analysis (£).
Triggers: Customer Issue, Customer Audit, Customer Refresh.]
Performance Analysis is a Correlation of Data
Single / Multiprocessor CPU Utilization
FCAL Loop throughput
Efficiency of disk writes
Hit rate of read cache
Network speed and throughput
Type of workload
Client OS configuration
Questions to ask of each component:
– Controller: MP? Single CPU? %utilization? Clustering?
– Disks (4 × 300GB shown): 10K RPM? 15K RPM? ATA? Chain lengths? Even %utilization?
– FCAL loop: Mb/s throughput? Multiple loops? Multipath I/O?
– Ethernet: Wire speed? Congested network? GB Ethernet? VIF?
Tools and Data Collection
Monitoring Storage Controller Performance
There are several tools used to collect controller performance metrics:
– From console commands (“Level-1”)
– From client using perfstat (“Level-2”)
In this session we will cover “sysstat”, “statit”, “stats”, “perfstat” and a few others…
Level-1 SE “Quick Look Analysis”
“sysstat 1”
Reports real-time aggregated system performance statistics
– Results depend on workload and how much RAM is installed
– NFS ops have an effect on CPU UT%
– Compare network traffic with the maximum link speed
– Disk writes compared to reads indicate the type of activity
– In the sample shown: CPU fairly busy; mostly disk reads except during CP
“sysstat -x 1”
Extended performance statistics per second
– CP types are one factor of write performance
– CP types of NVLog Full and Flush to Disk affect disk utilisation
Consistency Points
CP Types: two fields dictate what kind of CP is happening

1st Field – (Type)
-       No CP started during sampling interval
number  Number of CPs started during sampling interval
B       Back to back CPs (CP generated CP)
b       Deferred back to back CP
F       CP caused by full NVLog
H       CP caused by high water mark
L       CP caused by low water mark
S       CP caused by snapshot operation
T       CP caused by timer
U       CP caused by flush
Z       CP caused by internal sync
:       Continuation of CP from previous interval
#       Continuation of CP from previous interval, and the NVLog for the next CP is now full, so the next CP will be type B

2nd Field – (Phase)
0       Initializing
n       Processing normal files
s       Processing special files
f       Flushing modified data to disk
v       Writing superblock information to disk
q       Processing quota files
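The two tables above can be turned into a small lookup for reading the "CP ty" column. This is only a sketch: the function name `decode_cp_ty` is hypothetical, and it assumes the field is either a plain count or a type character optionally followed by one phase character.

```python
# Lookup tables transcribed from the CP type and phase lists above.
CP_TYPES = {
    "-": "no CP started during sampling interval",
    "B": "back to back CPs (CP generated CP)",
    "b": "deferred back to back CP",
    "F": "CP caused by full NVLog",
    "H": "CP caused by high water mark",
    "L": "CP caused by low water mark",
    "S": "CP caused by snapshot operation",
    "T": "CP caused by timer",
    "U": "CP caused by flush",
    "Z": "CP caused by internal sync",
    ":": "continuation of CP from previous interval",
    "#": "continuation of CP, NVLog for the next CP is full (next CP will be type B)",
}
CP_PHASES = {
    "0": "initializing",
    "n": "processing normal files",
    "s": "processing special files",
    "f": "flushing modified data to disk",
    "v": "writing superblock information to disk",
    "q": "processing quota files",
}


def decode_cp_ty(field):
    """Return (type description, phase description or None) for one CP ty value."""
    field = field.strip()
    if not field:
        raise ValueError("empty CP ty field")
    if field.isdigit():
        # Assumption: a purely numeric field is the count of CPs started.
        return f"{field} CPs started during sampling interval", None
    cp_type = CP_TYPES.get(field[0], "unknown type")
    phase = CP_PHASES.get(field[1]) if len(field) > 1 else None
    return cp_type, phase


for value in ("Fn", ":s", "#s", "bn", ":", "-"):
    print(value, "->", decode_cp_ty(value))
```

Running it over a captured sysstat column makes runs of `#` and `B`/`b` entries easy to spot.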
“sysstat -m” CPU
Running the sysstat command at the admin level (or higher) with the new multi-processor option (-m) will display:
• The percentage of time one or more processors were utilized (ANY). This is the same as the standard sysstat command's CPU column
• The average utilization of all processors in the system (AVG)
• The utilization of each individual processor (CPUx)
All statistics listed above range from 0% to 100%. Note that if the -m option is invoked on a uni-processor system, the sysstat command will display its standard help menu.
“sysstat -M” CPU (undocumented)
Running the sysstat command at diag level (or higher) with the new multi-processor option (-M) will display:
• The average utilization of all processors in the system (AVG)
• The utilization of each processor (CPUx)
• Parallelism level accounting (ANYx+)
• CSMP Domain utilization for each CSMP Domain (Network, Storage, Kahuna, Exempt)
• Interrupt utilization (Intr)
• Operations per second (Ops/s)
• Amount of time processing a consistency point (CP)
These statistics are useful for spotting CPU bottlenecks caused by overload or imbalance.
But just because something is at 100%, it may not be a problem!
So what is a good or bad value?
Look for limits being reached, like:
– 1 GigE Network ~120MB/sec
– FCAL 2Gb ~180MB/s, 4Gb ~360MB/s
– CPU driven by real client ops
– Disks at 100% for extended periods
The main item of concern should be the end user latency.
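As a rough illustration of checking a reading against these ceilings, the sketch below converts a kB/s figure to MB/s (dividing by 1024) and reports how close it is to the approximate limits quoted above. The dictionary keys and the function name `headroom` are hypothetical, and the sample value is arbitrary.

```python
# Approximate ceilings quoted on the slide, in MB/s.
LIMITS_MB_S = {
    "1 GigE": 120,
    "FCAL 2Gb": 180,
    "FCAL 4Gb": 360,
}


def headroom(link, kb_per_sec):
    """Return (MB/s, fraction of the quoted ceiling) for a kB/s reading."""
    mb_per_sec = kb_per_sec / 1024
    return mb_per_sec, mb_per_sec / LIMITS_MB_S[link]


mb, frac = headroom("1 GigE", 71663)  # arbitrary sample reading in kB/s
print(f"{mb:.1f} MB/s is {frac:.0%} of the ~120 MB/s ceiling")
```

A link sitting near 100% of its ceiling for extended periods is a candidate bottleneck, but end user latency remains the deciding measure.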
Case Study #1 – “sysstat” Example
SE “Quick-Look Analysis”: translating a real customer sysstat output
CPU   NFS  CIFS Total      Net kB/s     Disk kB/s Cache Cache    CP  CP  Disk iSCSI
                          in    out   read  write   age   hit  time  ty  util
95%  2348  1605  4909  46299  26415  46773     16   41s   97%    0%   -   77%   956
84%  2336  1692  4831  26636  30408  50628      7   37s   97%    0%   -   75%   803
83%  2055  1405  3962   8166  28938  43933  13305   37s   98%   40%  Fn   71%   502
89%  2241  1810  4605  35697  30198  51612  36155   37s   94%  100%  :s   76%   554
95%  2124  1518  4104  45802  21917  34179  76323   36s   99%  100%  :s   71%   462
76%  3444  2666  6821  11297  27633  45383  24220   37s   97%  100%  :s   67%   711
73%  2599  2885  6134   7724  20151  27640     24   37s   97%  100%  :s   54%   650
71%  2323  2779  5678  28031  21946  23697      0   37s   98%  100%  #s   59%   576
54%  2079  2161  4995   2032  46478  21822      8   37s   99%  100%  #s   52%   755
75%  2018  2092  4975   2520  51821  19317     24   37s   99%  100%  #s   49%   865
70%  2031  2041  4934   2342  53533  26432      0   38s   99%  100%  #s   57%   862
50%  2187  1666  4516   1781  38545  22851      0   39s   99%  100%  #s   37%   663
59%  1769  2030  4612   1967  46198  28087     24   40s   99%  100%  #s   69%   813
67%     0  1982  2942   1346  40913  28107      8   41s   99%  100%  #s   46%   960
75%   271  1269  2049   3034  19880  32610  27730   42s   95%  100%  bn   67%   509
97%  2509  2240  5985  71663  43530  39063  29767   43s   96%  100%  :f   50%  1236
96%  2117  1527  4762  19900  41703  43079  20968   43s   96%   13%   :   78%  1118
Case Study #1 – “sysstat” Answer
High CPU rates. Full NVLog, continuation of CP, and back-to-back (B-2-B) CPs.
Writes to disk halted during excessive (100%) CP processing.
Case Study #2 – “sysstat” Example
CPU Total  Net kB/s      Disk kB/s Cache Cache    CP  CP  Disk   FCP       FCP kB/s
            in  out   read   write   age   hit  time  ty  util            in    out
60%  2007    3    3  26466  128052    27   98%   86%  Ff   15%  2007   57566  25914
28%  2108    3    9  15840   82408    27   94%  100%  :f   16%  2108   65316  26610
18%  1745    1    3  12303   18299    27   95%   36%   :   13%  1745   51064  26730
20%  1737    1    3  11838       0    27   91%    0%   -   11%  1737   56387  18548
62%  2786    1    3  28351   69429    27   97%   77%  Ff   15%  2786   94237  60886
24%  2137    1    4  28819  168587    27   99%  100%  :f   17%  2137   72937  47971
22%  1776    1    3  11305      96    27   96%    7%   :    8%  1776   77683  30335
68%  3154    1    3  31089  139776    27   98%   66%  Ff   18%  3154   97516  92202
33%  2884    1    3  16160   69349    27   97%   42%   :   15%  2884  107474  49175
53%  2175    1    3  21083   49794    27   97%   33%  Fn   16%  2175   83760  42363
38%  2444    1    3  24244  145267    27   99%  100%  :f   14%  2444  101737  35545
27%  2015    2    4  15249   35319    27   95%   18%   :   12%  2015   60705  46845
37%  1736    1    3  16255       7    27   94%   19%  Fn   13%  1736   56553  28115
52%  2253    1    9  23673  224311    27   98%   65%   :   20%  2253   66491  47080
15%  1294    1    3   6934       0    27   96%    0%   -    6%  1294   48047  22544
Case Study #2 – “sysstat” Answer
CPU average is low. Similar write traffic per CP. Write interval varies widely.
Likely an under-utilised controller.
CPU Total  Net kB/s      Disk kB/s Cache Cache    CP  CP  Disk   FCP       FCP kB/s  Data wrote  Time taken
            in  out   read   write   age   hit  time  ty  util            in    out      per CP    to write
60%  2007    3    3  26466  128052    27   98%   86%  Ff   15%  2007   57566  25914        228M       2.22s
28%  2108    3    9  15840   82408    27   94%  100%  :f   16%  2108   65316  26610
18%  1745    1    3  12303   18299    27   95%   36%   :   13%  1745   51064  26730
20%  1737    1    3  11838       0    27   91%    0%   -   11%  1737   56387  18548
62%  2786    1    3  28351   69429    27   97%   77%  Ff   15%  2786   94237  60886        237M       1.84s
24%  2137    1    4  28819  168587    27   99%  100%  :f   17%  2137   72937  47971
22%  1776    1    3  11305      96    27   96%    7%   :    8%  1776   77683  30335
68%  3154    1    3  31089  139776    27   98%   66%  Ff   18%  3154   97516  92202        208M       1.08s
33%  2884    1    3  16160   69349    27   97%   42%   :   15%  2884  107474  49175
53%  2175    1    3  21083   49794    27   97%   33%  Fn   16%  2175   83760  42363        229M       1.51s
38%  2444    1    3  24244  145267    27   99%  100%  :f   14%  2444  101737  35545
27%  2015    2    4  15249   35319    27   95%   18%   :   12%  2015   60705  46845
37%  1736    1    3  16255       7    27   94%   19%  Fn   13%  1736   56553  28115        224M       0.84s
52%  2253    1    9  23673  224311    27   98%   65%   :   20%  2253   66491  47080
15%  1294    1    3   6934       0    27   96%    0%   -    6%  1294   48047  22544
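The per-CP annotations appear to come from adding up the disk write kB/s and the CP time across each CP and its continuation rows. That derivation is an assumption, not something the slide states; the sketch below applies it to the disk write, CP time and CP ty columns of the sample, assuming 1-second samples. For the first CP it gives 228,759 kB over 2.22 s, close to the annotated 228M and 2.22s.

```python
# (disk write kB/s, CP time %, CP ty) for each sample, in capture order.
SAMPLES = [
    (128052, 86, "Ff"), (82408, 100, ":f"), (18299, 36, ":"), (0, 0, "-"),
    (69429, 77, "Ff"), (168587, 100, ":f"), (96, 7, ":"),
    (139776, 66, "Ff"), (69349, 42, ":"),
    (49794, 33, "Fn"), (145267, 100, ":f"), (35319, 18, ":"),
    (7, 19, "Fn"), (224311, 65, ":"), (0, 0, "-"),
]


def summarize_cps(samples, interval_s=1.0):
    """Group samples into CPs and return [(kB written, seconds in CP), ...]."""
    cps = []
    for write_kb_s, cp_time_pct, cp_ty in samples:
        if cp_ty == "-":
            continue  # no CP started in this interval
        kb = write_kb_s * interval_s
        secs = cp_time_pct / 100 * interval_s
        if cp_ty[0] in ":#" and cps:
            # Continuation row: add to the CP that is already open.
            prev_kb, prev_secs = cps[-1]
            cps[-1] = (prev_kb + kb, prev_secs + secs)
        else:
            cps.append((kb, secs))
    return cps


for kb, secs in summarize_cps(SAMPLES):
    print(f"{kb:>9.0f} kB written in {secs:.2f} s")
```

The amount written per CP stays in a narrow band while the time taken varies by more than a factor of two, which is the pattern the answer slide points at.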
Level-2 “Deeper Analysis”
Perfstat Tool
Perfstat is a script that captures statit, sysstat, and other controller and client side performance statistics and configuration
Engineers run perfstat instead of individual performance commands
perfstat for UNIX
perfstat for Windows
Right-click on perfstat.sh and download to a valid host machine
Run perfstat from the host machine to produce the perfstat output file
Perfstat – Download
http://now.netapp.com/NOW/download/tools/perfstat/
Typical Usage of Perfstat
With the workload to be monitored running in the background:
perfstat -f <storage cntlr> -t 15 -i 88 -F -p > <cntlr>1.txt
(Collect at 15-minute intervals for 22 hours of the day (outside the backup window), controller performance data only, output to a labelled text file)
Multiple filer use:
perfstat -f filername1,filername2 -t 15 > perfstat.out
Multiple host use:
perfstat -h host1,host2 -f filername -t 15 > perfstat.out
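The first example pairs -t 15 with -i 88 to cover 22 hours. A small sketch of that arithmetic, useful when sizing a different collection window; the helper names are hypothetical and the interpretation of the two flags is taken from the slide's own description.

```python
def perfstat_window_hours(minutes_per_iteration, iterations):
    """Total collection time in hours for -t <minutes> and -i <iterations>."""
    return minutes_per_iteration * iterations / 60


def iterations_for(hours, minutes_per_iteration):
    """Whole number of iterations that fit in the requested window."""
    return int(hours * 60 // minutes_per_iteration)


print(perfstat_window_hours(15, 88))  # 22.0
print(iterations_for(22, 15))         # 88
```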
Accessing PerfViewer (Partners)
Perfstat Tools – Offline Perfstat Grapher (Partners)
Reads in perfstat output and graphs the data
Perfstat Tools – Perfstat Grapher (Direct Employees)
Reads in perfstat output and graphs the data
Perfstat Tools – Perfstat Viewer
Reads in perfstat output and gives breakdown of output
Data Review and Translation
Sample Perfstat Output - pre v7.2
Perfstat Output – Pre DOT v7.2.x
statit Command
An advanced command for analysis of system resources
Gathers a set of performance statistics over the interval between the time statit is begun (statit -b) and ended (statit -e)
– CPU
– Network interfaces
– Disks
– System software
Some of the information provided is also available from other commands
(Keep “baseline” of performance for future reference)
statit – “CPU Statistics”
> 190% (of a possible 200%) = CPU bottleneck
The first section reports the system name and system ID number, the amount of RAM, the software version, and the date and time when statit -b was executed.
The second section of the report breaks down the usage of the CPU(s) to microsecond precision.
[1] is elapsed time for the measurement, in seconds.
[2] ("system time") is time that the CPU was in use (not idle) and the percentage of elapsed time in that state. On a multiprocessor system, the system time is reported as a sum across all CPUs (0-200%).
[3] ("rupt time") is time that the CPU was executing interrupt-level code, the percentage of elapsed time in that state, the number of interrupts, and the average CPU-time per interrupt. On a multiprocessor system, the rupt time is reported as a sum across all CPUs (0-200%).
[4] ("non-rupt system time") is time that the CPU was executing base-level code and the percentage of elapsed time in that state. On a multiprocessor system, the non-rupt system time is reported as a sum across all CPUs (0-200%).
[5] ("idle time") is time that the CPU was idle and the percentage of elapsed time in that state. On a multiprocessor system, idle time is reported as a sum across all CPUs (0-200%).
[6] ("time in CP") is time that WAFL had a consistency point (CP) in progress (which may include time when the CPU was idle) and the percentage of elapsed time in that state (0-100%).
[7] ("rupt time in CP") is time that the CPU was executing interrupt-level code during a CP, the percentage of CP time spent in interrupt-level code, the number of CP interrupts, and the average CPU-time per CP interrupt. On a multiprocessor system, rupt time in CP is reported as a sum across all CPUs (0-200%).
Can you tell what may be wrong with this picture?
statit – Multi Processors
Coarse Symmetric Multi Processing (CSMP)
– Multiprocessing architecture developed by NetApp
– Parallelism of processing across multiple CPUs through “Domains”
Data ONTAP Domains
– Data ONTAP is divided up into multiple “domains”
– A domain is a group of processes organized together for ease of synchronisation
– A single processor can only work in one domain at any time
– Exception is the ‘WAFL Exempt’ (and IDLE) domain
– Ongoing development for multi processing improvements
Data ONTAP Domains
IDLE          The CPU is not doing any processing
Network       Net drivers, TCP/UDP/IP, NFS
RAID          SCSI & FC operations
TARGET        Several miscellaneous calls (FCP & iSCSI)
STORAGE       SCSI & FC device drivers
EXEMPT        Locking & libraries
KAHUNA        WAFL, SnapMirror & the bulk of Data ONTAP code (note that CIFS was given its own domain in DOT v7.2)
NETCACHE / 2  Code shared with Data ONTAP from NetCache software (redundant domains)
CIFS          Handles CIFS protocol requests. New with v7.2.x
WAFL_EXEMPT   New with v7.2.x
DOT v7.3 Multi-CPU Performance Improvements
Ongoing optimizations and parallelism for four-way FAS platforms
– Better CPU utilization for FAS platforms with four or more processors / cores: FAS3070, FAS6070, FAS6080
– More efficient distribution of parallel network layer processing on multiple CPUs
– Move write allocation processing from kernel to separate domain
Performance improvements are workload dependent
– Most protocols should see benefit to some degree: NFSv3 & 4, CIFS, iSCSI
Transparent – no configuration or commands
How? Many more domains!!
statit – Multi Processor Statistics
1) Kernel Events / Switches
2) MP Domain
3) Microseconds per Second
4) Active CPU Time
5) Active CPU Percentage
Microseconds per second divided by 10^6, × 100 = MP domain utilization %
Percent / usec CPUs have been running in Domain (>80% = concern)
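The conversion from a domain's microseconds-per-second figure to a utilization percentage is a one-liner. A minimal sketch, with hypothetical function names and the 80% concern threshold taken from the slide:

```python
USEC_PER_SEC = 10**6


def domain_utilization_pct(usec_per_sec):
    """Convert a per-domain 'microseconds per second' figure to a percentage."""
    return usec_per_sec * 100 / USEC_PER_SEC


def is_concern(usec_per_sec, threshold_pct=80.0):
    """True when the domain is above the concern threshold."""
    return domain_utilization_pct(usec_per_sec) > threshold_pct


print(domain_utilization_pct(850_000))  # 85.0
print(is_concern(850_000))              # True
```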
statit – Miscellaneous Statistics
Look at network KB received and transmitted to get an idea of the amount of traffic and the distribution of read and write workload
Receiving less than writing: may be due to parity calculation, client ops or fragmentation
Overall cache effectiveness and network traffic against disk activity
statit – WAFL Statistics
Type of CPs is one indicator of amount of write requests, disk I/O, and write performance
WAFL Statistics (per second)
63.20 name cache hits ( 28%)
165.49 name cache misses ( 72%)
1443.76 inode cache hits ( 96%)
55.13 inode cache misses ( 4%)
3213.54 buf cache hits ( 100%)
2.96 buf cache misses ( 0%)
0.23 blocks read
0.00 blocks read-ahead
0.00 chains read-ahead
0.00 dummy reads
0.00 blocks speculative read-ahead
415.10 blocks written
26.13 stripes written
0.00 blocks over-written
0.05 wafl_timer generated CP
0.00 snapshot generated CP
0.00 wafl_avail_bufs generated CP
0.00 dirty_blk_cnt generated CP
0.04 full NV-log generated CP
0.00 back-to-back CP
0.00 flush generated CP
0.00 sync generated CP
0.00 wafl_avail_vbufs generated CP
0.00 deferred back-to-back CP
850.86 non-restart messages
0.03 IOWAIT suspends
144280 buffers
Name cache hits = Name to file handle cache
Inode cache hits = Inode cache given a file handle
Buf cache hits = Cached block information
Performance of file and buffer cache, and I/O character by CP type
statit – Network Interfaces
Amount of network traffic in KB/s. Divide by 1024 to get MB/s.
statit – RAID Stripes on Statit
Looking at RAID group sizes of 4 drives (not including parity)
Only 1 disk was written to the RAID group = 1.24 stripes
Only 2 disks were written to in the RAID group = 0.59 stripes
Only 3 disks were written to in the RAID group = 1.87 stripes
All 4 disks were written to in the RAID group = 76.15 stripes
statit – RAID Stripes on Statit
Poor Write Allocation / Effect of RG Increase?
We know the total number of stripes written is 79.85 stripes
We know the total number of full stripes written is 76.15 stripes
So ~95% of all stripes written to the 4-data-disk RAID group are full stripes.
Also…
664.09 blocks written vs. 200.42 stripes written = 3.31 to 1 ratio
(A ratio of 2.5 to 1 or lower indicates potentially poor write allocation (fragmentation))
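The two checks above are simple ratios. A short sketch using the figures from these slides; the variable names are hypothetical and the 2.5 threshold is the rule of thumb quoted above.

```python
# Stripes written per second, keyed by how many of the 4 data disks were written.
stripes_by_width = {1: 1.24, 2: 0.59, 3: 1.87, 4: 76.15}

total_stripes = sum(stripes_by_width.values())
full_stripe_pct = stripes_by_width[4] / total_stripes * 100

blocks_written = 664.09
stripes_written = 200.42
blocks_per_stripe = blocks_written / stripes_written
poor_write_allocation = blocks_per_stripe <= 2.5  # rule of thumb from the slide

print(f"total stripes:     {total_stripes:.2f}")
print(f"full stripes:      {full_stripe_pct:.1f}%")
print(f"blocks per stripe: {blocks_per_stripe:.2f}")
print(f"poor write allocation suspected: {poor_write_allocation}")
```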
statit – Disk Statistics
Disk transfers indicate the number of reads, writes, and other disk I/O; max about 130 - 180
A handy guide to Disk Statistics
disk ut% xfers ureads-chain-usecs writes-chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/vol0/plex0/rg0:
0a.1 4 3.57 0.54 1.00 26667 2.86 4.56 1644 0.18 14.00 1143 0.00 .... . 0.00 .... .
0a.0 3 2.86 0.54 1.00 18333 2.14 5.08 1967 0.18 2.00 5000 0.00 .... . 0.00 .... .
0a.5 2 1.61 0.18 1.00 15000 1.43 7.00 1429 0.00 .... . 0.00 .... . 0.00 .... .
0a.6 2 2.50 0.18 1.00 10000 2.14 5.67 1368 0.18 12.00 1167 0.00 .... . 0.00 .... .
0a.8 3 3.57 1.07 2.67 2313 2.32 5.38 1457 0.18 14.00 1071 0.00 .... . 0.00 .... .
0a.9 2 1.61 0.54 1.00 11667 0.89 11.00 1036 0.18 1.00 9000 0.00 .... . 0.00 .... .
0a.10 2 1.61 0.18 1.00 17000 1.25 7.86 1345 0.18 1.00 9000 0.00 .... . 0.00 .... .
/vol0/plex0/rg1:
0a.19 2 1.79 0.18 1.00 4000 1.61 6.22 1464 0.00 .... . 0.00 .... . 0.00 .... .
0a.20 4 3.75 2.32 2.08 4333 1.43 7.00 1554 0.00 .... . 0.00 .... . 0.00 .... .
0a.21 3 4.64 3.21 2.72 1265 1.25 7.86 1291 0.18 1.00 8000 0.00 .... . 0.00 .... .
0a.22 3 3.21 1.61 1.11 7800 1.61 6.22 1571 0.00 .... . 0.00 .... . 0.00 .... .
Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data-transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round-trip time per 4K block.
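To make the column layout concrete, the sketch below splits one of the disk rows above into its five operation groups and checks the xfers identity. The parser name `parse_disk_line` is hypothetical, and it assumes the 18-field layout shown in the header, with runs of dots meaning "no data".

```python
OPS = ("ureads", "writes", "cpreads", "greads", "gwrites")


def parse_disk_line(line):
    """Parse one statit disk row into a dict of per-operation figures."""
    fields = line.split()

    def num(text):
        # '....' and '.' mark an empty chain or usecs value.
        return None if set(text) == {"."} else float(text)

    row = {"disk": fields[0], "ut_pct": num(fields[1]), "xfers": num(fields[2])}
    for i, name in enumerate(OPS):
        base = 3 + 3 * i
        row[name] = {
            "per_sec": num(fields[base]),
            "chain": num(fields[base + 1]),
            "usecs": num(fields[base + 2]),
        }
    return row


line = "0a.1 4 3.57 0.54 1.00 26667 2.86 4.56 1644 0.18 14.00 1143 0.00 .... . 0.00 .... ."
row = parse_disk_line(line)
total = sum(row[name]["per_sec"] for name in OPS)
print(f"xfers reported {row['xfers']:.2f}, sum of operations {total:.2f}")
```

The small difference between the reported xfers and the sum comes from rounding in the printed columns.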
statit – Disk Statistics
Method for Identifying likely Fragmentation
disk    writes--chain-usecs    cpreads-chain-usecs
/vol0/plex0/rg0:
0b.16     1.67   1.75  1400     1.88   2.33  6000
0b.17     1.51   1.72  1439     1.68   2.33  5143
0b.18     1.53   1.67  1419     1.73   2.33  5443
/vol1/plex0/rg0:
0b.19   157.58  15.90   477     1.30  16.00   460
0b.20   157.42  15.90   506     2.42  11.74   613
0b.21   156.67  15.96   481     2.23  10.87   645
1. Add up the write operations for a RAID group
2. Add up the cpreads operations for a RAID group
3. Divide total write operations by total cpreads operations
> 1.20 = RAID group good
1.0 – 1.20 = Concern
< 1.0 = Probably fragmented
statit – Disk Statistics
disk    writes--chain-usecs    cpreads-chain-usecs
/vol0/plex0/rg0:
0b.16     1.67   1.75  1400     1.88   2.33  6000
0b.17     1.51   1.72  1439     1.68   2.33  5143
0b.18     1.53   1.67  1419     1.73   2.33  5443
/vol1/plex0/rg0:
0b.19   137.31  15.90   477   163.30  16.00   460
0b.20   137.22  15.90   506   167.42  11.74   613
0b.21   146.56  15.96   481   164.23  10.87   645
/vol2/plex0/rg0:
0b.22   157.58  14.90   437     1.30  16.00   460
0b.23   157.42  14.90   596     2.42  11.74   613
0b.24   156.67  14.96   421     2.23  10.87   645
Vol0
Total Writes = 4.71
Total CPreads = 5.29
Total Writes / Total CPreads = 0.89
Maybe not fragmented
Vol1
Total Writes = 421.09
Total CPreads = 494.95
Total Writes / Total CPreads = 0.85
Likely fragmented
Vol2
Total Writes = 471.67
Total CPreads = 5.95
Total Writes / Total CPreads = 79.27
RAID group good
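The three-step method and its thresholds can be written as a small function. This is a sketch with hypothetical names, applied to the per-disk writes and cpreads figures for the Vol1 and Vol2 RAID groups shown above.

```python
def classify_raid_group(writes, cpreads):
    """Apply the writes / cpreads rule of thumb to one RAID group."""
    total_writes = sum(writes)
    total_cpreads = sum(cpreads)
    if total_cpreads == 0:
        return float("inf"), "RAID group good"  # no CP reads at all
    ratio = total_writes / total_cpreads
    if ratio > 1.20:
        verdict = "RAID group good"
    elif ratio >= 1.0:
        verdict = "concern"
    else:
        verdict = "probably fragmented"
    return ratio, verdict


groups = {
    "/vol1/plex0/rg0": ([137.31, 137.22, 146.56], [163.30, 167.42, 164.23]),
    "/vol2/plex0/rg0": ([157.58, 157.42, 156.67], [1.30, 2.42, 2.23]),
}
for name, (writes, cpreads) in groups.items():
    ratio, verdict = classify_raid_group(writes, cpreads)
    print(f"{name}: {ratio:.2f} -> {verdict}")
```

A ratio from a nearly idle RAID group, such as the Vol0 example, rests on very few operations, so it is weak evidence either way.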
Let's test this theory…
Throughput vs. Response Time Curves
[Chart: Average Request Response Time vs. IOPS. X axis: IOPS per data drive, 0 to 350. Y axis: milliseconds, 0 to 30. Series: 7.2K rpm SATA, 10K rpm FC, 15K rpm SAS, 15K rpm FC.]
“The Story”
What have we surmised…
– This is a pre v7.2.x DOT system (only 9 NetApp “Domains”)
– Two processor system with lack of resources
– Spending almost all its time (97%) with CP writes
– Kahuna domain more than 2x the others (not good) (Bottleneck in CPUs or lack of parallelism… even though flat out?)
– Kahuna and RAID switching indicates most of domain activity
– Only NFS operations. No other protocol active in period
– General workload: Net-in with similar disk write activity
– Good caching, with Net-out 3x disk reads (Network-in bottleneck at 124MB/s due to connectivity limitations?) (Writing more to disk than Net-in. Fragmentation or client operation?)
– WAFL file & buffer cache effective. CP write activity high for WAFL
– Four NICs all active. Bottleneck not apparent now. No network errors.
– For most active RGs there doesn’t appear to be fragmentation (95% full stripes)
– Low disk utilisation/IOPS but mostly write traffic
RECOMMENDATION: “Likely under-CPU-resourced controller. Under-performing for required workloads (in this sample). Controller upgrade necessary before DOT upgrade, as latent demand/CSMP may create more issues”
Case Study #3 – PerfViewer Output
SE “Deeper Analysis”
Generating PerfViewer Output & Translating
Perfstat Tools – Throughput by Protocol
Reads in perfstat output & graphs the data
Perfstat Tools – CPU & Disk Utilisation
Reads in perfstat output & graphs the data
Perfstat Tools – Aggregate Throughput
Reads in perfstat output & graphs the data
Perfstat Tools – Domain Utilisation
Reads in perfstat output & graphs the data
Perfstat Tools – “sysstat -M” Output
Reads in perfstat output & graphs the data
Sample Perfstat v7.3 Output
Perfstat Output – DOT v7.3
Performance Analysis Wrap Up
Performance Analysis Process
[Diagram labels: CUSTOMER.
Roles: TSC, TSE, FSE Escalations, Sales / SE.
Analysis levels: Level 1 “Quick Look Analysis”, Perfstat “Level 2 Analysis”, Formal PS Analysis (£).
Triggers: Customer Issue, Customer Audit, Customer Refresh.]
Performance Strategy
Moving Forward…
– Premium AutoSupport Tool (“Willow” ASUP visualization & graphs)
– Operations Manager with Performance Advisor (a “must use”)
– Internal Tool using Perfstats and ASUPs to provide stats and graphs
– Updates to the stats command are the future!
Performance Strategy
The process again…
– Practice & prepare yourself to understand command line / perfstat output
– If an issue… a call must be raised with GSC!!
– If it’s a sizing, scoping, general performance pre-sale question then use command line output
– If large project then use PS Service
– If formal pre-sales analysis then use perfstat
– Perfstats take a while to collect & translate
– Give yourself enough time to complete
Performance Analysis Education
Thank You
© 2008 NetApp. All rights reserved. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, FlexClone, FlexVol, RAID-DP, and Snapshot are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Linux is a registered trademark of Linus Torvalds. Solaris is a trademark of Sun Microsystems, Inc. Oracle is a registered trademark of Oracle Corporation. SAP is a registered trademark of SAP AG. VMware is a registered trademark of VMware, Inc. UNIX is a registered trademark of The Open Group. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.