INTERPRETING AND REPORTING PERFORMANCE TEST RESULTS
ERIC PROEGLER
ABOUT ME
• 20 years in software, 14 in performance, context-driven for 12
• Performance Engineer/Teacher/Consultant
• Product Manager
• Board of Directors
• Lead Organizer
• Mentor
DAN DOWNING'S 5 STEPS OF LOAD TESTING
1. Discover: ID processes & SLAs; define use case workflows; model production workload
2. Develop: develop test scripts; configure environment monitors; run shakedown test
3. Analyze: run tests; monitor system resources; analyze results
4. Fix: diagnose; fix; re-test
5. Report: interpret results; make recommendations; present to stakeholders
ABOUT THIS SESSION
• Participatory!
• Graphs from actual projects – have any? Learn from each other
• Not about tools
• First half: making observations and forming hypotheses
• Break (~10:00)
• Second half: interpreting and reporting actionable results
WHAT CAN WE OBSERVE ABOUT THIS APP?
WHAT'S THIS SUGGESTING?
AND THIS?
PERFORMANCE KPIS*
• Scalability
• Throughput
• System Capacity
• Workload Achievement
*Key Performance Indicators
HOW COULD WE ANNOTATE THIS GRAPH?
Note the scale of each metric
Mixed units (sec., count, %)
WHAT DOES THIS SAY ABOUT CAPACITY?
WHAT OBSERVATION CAN WE MAKE HERE?
AND HERE?
HMMM…YIKES!
WHAT CAN WE SAY HERE?
DESCRIBE WHAT HAPPENED HERE.
TELL A PLAUSIBLE STORY ABOUT THIS
WHAT'S THE LESSON FROM THIS GRAPH?
Hurricane Center "average" of US hurricane forecast models
Averages Lie!
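A quick worked illustration of why averages lie: with a skewed latency distribution, the mean can look tolerable while a tenth of users suffer badly. A minimal sketch with synthetic numbers (not project data):

```python
import statistics

# Synthetic page times (seconds): mostly fast, with a long slow tail
times = [0.4] * 80 + [0.6] * 10 + [8.0] * 10

mean = statistics.mean(times)                    # 1.18 s -- looks tolerable
p90 = sorted(times)[int(0.90 * len(times)) - 1]  # 90th percentile
p95 = sorted(times)[int(0.95 * len(times)) - 1]  # 95th percentile

print(f"mean={mean:.2f}s  p90={p90:.2f}s  p95={p95:.2f}s")
# mean=1.18s  p90=0.60s  p95=8.00s
# The tail drags the mean above the 90th percentile, and only the
# 95th percentile exposes the 8-second experience of 10% of users.
```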
MEASURE WHAT WHERE?
[Test environment diagram: Mentora/NoVA remote load injectors and a local load injector, driven by a NeoLoad Controller (1000 vuser license) with a resource monitor. Traffic flows through an F5 load balancer (ports 80/443) to Linux/WebLogic web servers, Linux/Tuxedo app servers (ports 7100/7200), and a Solaris/Oracle DB server (JDBC port 1521). The resource monitor reaches each tier over ssh port 22; an https port 16000 link connects the controller to the remote injectors. Numbered callouts 1-6 mark measurement points.]
MEASURE WHAT WHERE?
[Same environment diagram, annotated with what to measure at each point:]
• F5 load balancer: proper load balancing (really verified at the web/app servers)
• Web servers: HW resources; web server connections, queuing, errors
• App servers: HW resources; JVM heap memory; DB connection pools
• DB server: HW resources; lock waits/deadlocks; SQL recompiles; full table scans; slowest queries; SAN IOPS
• Network: bandwidth throughput; load injector capacity
• Load injectors: load; response time; HW resources
(a monitoring sketch follows)
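Resource monitoring on the tiers above can be as simple as standard OS tools run over ssh. A minimal sketch; the hostnames are hypothetical, the 5-second interval follows the sampling guidance later in this deck, and vmstat availability depends on the OS:

```python
import subprocess

# Hypothetical hosts, one per tier; substitute your own
HOSTS = ["web1.example.com", "app1.example.com", "db1.example.com"]

def start_vmstat(host, interval=5):
    """Start 'vmstat <interval>' on a remote host over ssh, logging locally."""
    log = open(f"{host}.vmstat.log", "w")
    return subprocess.Popen(["ssh", host, "vmstat", str(interval)],
                            stdout=log, stderr=subprocess.STDOUT)

collectors = [start_vmstat(h) for h in HOSTS]
# ... run the load test ...
for proc in collectors:
    proc.terminate()  # stop sampling once the test ends
```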
MEASURE WITH WHAT?
ANYTHING CONCERNING HERE?
Before: slowest transactions show spikes of 5-8 seconds every 10 minutes
After: spikes substantially reduced after VM memory increased to 12 GB
WHAT ARE WE LOOKING AT HERE?
When does this become a problem?
When heap space utilization keeps growing, despite garbage collection, and reaches its max allocation
(a trend-check sketch follows)
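One way to spot that pattern programmatically: track the heap floor after each garbage collection and fit a trend. If the post-GC minimum keeps climbing, memory is not being reclaimed. A sketch with made-up readings; the 512 MB max is illustrative:

```python
# Hypothetical post-GC heap floors (MB), one reading per collection cycle
post_gc_heap = [210, 225, 250, 248, 270, 290, 310, 335]
max_heap_mb = 512  # illustrative max allocation

n = len(post_gc_heap)
mean_x = (n - 1) / 2
mean_y = sum(post_gc_heap) / n
# Least-squares slope: MB retained per GC cycle
slope = (sum((i - mean_x) * (y - mean_y) for i, y in enumerate(post_gc_heap))
         / sum((i - mean_x) ** 2 for i in range(n)))

print(f"heap floor rising ~{slope:.1f} MB per GC cycle")
if slope > 0:
    cycles = (max_heap_mb - post_gc_heap[-1]) / slope
    print(f"at this rate, max allocation is reached in ~{cycles:.0f} more cycles")
```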
ANY HYPOTHESES ABOUT THIS?
Before: abnormally high TCP retransmits between web and app server
After: network issues resolved
TELL A DATA-SUPPORTED STORY ABOUT THIS; ANNOTATE THE GRAPH
HOW MUCH LOAD?
FOR ACTIONABLE PERFORMANCE RESULTS…
…think "CAVIAR": Collecting, Aggregating, Visualizing, Interpreting, Assessing, Reporting
COLLECTING
• Objective: gather all results from the test that
  • help gain confidence in results validity
  • portray system scalability, throughput & capacity
  • provide bottleneck / resource limit diagnostics
  • help formulate hypotheses
• Load
  • Examples: users/sessions simulated, files sent, "transactions" completed
  • Sources: test "scenario" / "test harness" counts, web/app logs, db queries
  • Granularity: at appropriate levels (virtual users, throughput) for your context
  • Value: correlated with response times, yields "system scalability"
• Errors
  • Examples: http, application, db, network; "test harness"
  • Sources: web/app logs, db utility, network trace
  • Granularity: raw data at the most granular level
  • Value: confidence in results validity
• Response Times
  • Examples: page / action / "transaction" / end-to-end times
  • Sources: test tools / "scripts", web logs
  • Granularity: at various levels of app granularity, linked to objectives
  • Value: the fundamental unit of measure for performance
• Resources
  • Examples: network, "server", "middleware", database, storage, queues
  • Sources: OS tools (vmstat, nmon, sar, perfmon), vendor monitoring tools
  • Granularity: 5-15 second sampling rates, with logging, to capture transient spikes
  • Value: correlated with response times, yields "system capacity"
• Anecdotes
  • Examples: manual testing, transient resources, screenshots
  • Sources: people manually testing or monitoring during the test
  • Granularity: manual testing by different people & locations
  • Value: confidence / corroboration / triangulation of results
(a log-parsing sketch follows)
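As a concrete collection example, load and error counts can often be pulled straight from web access logs. A minimal sketch, assuming common-log-format lines and a hypothetical access.log path:

```python
import re
from collections import Counter

# Timestamp to the minute, plus the HTTP status code, per request line
line_re = re.compile(r'\[(\d+/\w+/\d+:\d+:\d+):\d+ [^\]]*\] "[^"]*" (\d{3})')

requests, errors = Counter(), Counter()
with open("access.log") as f:           # hypothetical log path
    for line in f:
        m = line_re.search(line)
        if not m:
            continue
        minute, status = m.groups()
        requests[minute] += 1
        if status.startswith(("4", "5")):   # count 4xx and 5xx as errors
            errors[minute] += 1

for minute in sorted(requests):
    rate = 100.0 * errors[minute] / requests[minute]
    print(f"{minute}  req={requests[minute]:5d}  err={errors[minute]:4d}  ({rate:.1f}%)")
```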
AGGREGATING
• Objective: summarize measurements using
  • various sized time-buckets to provide tree & forest views
  • consistent time-buckets across types to enable accurate correlation
  • meaningful statistics: scatter, min-max range, variance, percentiles
  • multiple metrics to "triangulate", confirm (or invalidate) hypotheses
• Load
  • Examples: users/sessions; requests; no. of files/msgs sent/rcvd
  • Statistics: avg
  • Value: basis for ID'ing load sensitivity of all other metrics
• Errors
  • Examples: error rate, error counts; by url/type; http 4xx & 5xx
  • Statistics: avg-max
  • Value: ID if errors correlate with load or resource metrics
• Response Times
  • Examples: workflow end-to-end time; page/action time
  • Statistics: min-avg-max-std deviation, 90th percentile
  • Value: quantify system scalability
• Network Thruput
  • Examples: megabits/sec
  • Statistics: avg-max
  • Value: ID if thruput plateaus while load is still ramping, or exceeds network capacity
• App Thruput
  • Examples: page view rate; completed transactions by type
  • Statistics: avg-max
  • Value: ID if page view rate can be sustained, or if "an hour's work can be done in an hour"
• Resources
  • Examples: % cpu; cpu & disk queue depth; memory usage; IOPS; queued requests; db contention
  • Statistics: avg-max
  • Value: ID limiting resources; provide diagnostics; quantify system capacity
• Tools (all types): testing tool graphs, monitoring tools, Excel pivot tables
(a time-bucketing sketch follows)
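Consistent time-buckets and the statistics above take only a few lines with pandas. A minimal sketch; the CSV file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical raw samples: one row per response, with a timestamp column
df = pd.read_csv("responses.csv", parse_dates=["timestamp"])

# Aggregate into consistent 30-second buckets so every metric type
# can be correlated against the same time axis
stats = (df.set_index("timestamp")["response_sec"]
           .resample("30s")
           .agg(["min", "mean", "max", "std", lambda s: s.quantile(0.9)]))
stats.columns = ["min", "avg", "max", "std_dev", "p90"]
print(stats.head())
```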
VISUALIZING
• Objective: gain "forest view" of metrics relative to load
  • Turn barrels of numbers into a few pictures
  • Vary graph scale & summarization granularity to expose hidden facts
  • ID load point where degradation begins
  • ID system tier(s) where bottlenecks appear, limiting resources
VISUALIZING
• My key graphs, in order of importance
  • Errors over load ("results valid?")
  • Bandwidth throughput over load ("system bottleneck?")
  • Response time over load ("how does system scale?")
    • Business process end-to-end
    • Page level (min-avg-max-SD-90th percentile)
  • System resources ("how's the infrastructure capacity?")
    • Server cpu over load
    • JVM heap memory/GC
    • DB lock contention, I/O latency
(a plotting sketch follows)
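A sketch of the "response time over load" graph in matplotlib, with load on its own y-axis as in the scalability slide later in this deck; the data arrays are placeholders:

```python
import matplotlib.pyplot as plt

# Placeholder per-bucket series; in practice these come from aggregation
minutes  = list(range(10))
vusers   = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
resp_sec = [0.5, 0.5, 0.6, 0.5, 0.9, 1.4, 2.0, 2.7, 3.1, 3.8]

fig, ax1 = plt.subplots()
ax1.plot(minutes, resp_sec, color="tab:red")
ax1.set_xlabel("elapsed minutes")
ax1.set_ylabel("response time (s)", color="tab:red")

ax2 = ax1.twinx()               # give load its own y-axis
ax2.plot(minutes, vusers, color="tab:blue")
ax2.set_ylabel("virtual users", color="tab:blue")

plt.title("Response time over load")
plt.savefig("scalability.png")
```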
INTERPRETING
• Objective: draw conclusions from observations and hypotheses
  • Make objective, quantitative observations from graphs/data
  • Correlate / triangulate graphs / data
  • Develop hypotheses from correlated observations
  • Test hypotheses and achieve consensus among tech teams
  • Turn validated hypotheses into conclusions
INTERPRETING
• Observations: "I observe that…"; no evaluation at this point!
• Correlations: "Comparing graph A to graph B…" – relate observations to each other
• Hypotheses: "It appears as though…" – test these with the extended team; corroborate with other information (anecdotal observations, manual tests)
• Conclusions: "From observations a, b, c, corroborated by d, I conclude that…"
(a correlation sketch follows)
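Correlation can be quantified as well as eyeballed. A sketch of a correlation matrix across bucketed metrics; the DataFrame values are hypothetical, and a strong correlation is a lead for a hypothesis, not a conclusion:

```python
import pandas as pd

# Hypothetical bucketed metrics, all on the same time buckets
metrics = pd.DataFrame({
    "vusers":      [5, 10, 15, 20, 25, 30],
    "resp_sec":    [0.5, 0.5, 0.6, 1.1, 2.0, 3.2],
    "cpu_pct":     [12, 22, 35, 61, 88, 97],
    "err_per_min": [0, 0, 0, 1, 4, 11],
})

# Pearson correlations: values near 1 mean the series rise together
print(metrics.corr().round(2))
```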
SCALABILITY: RESPONSE TIME OVER LOAD
• Two styles for system scalability; the top graph shows load explicitly on its own y-axis
• Note consistent 0.5 sec/page up to ~20 users; above that, degrades steeply to 5x at max load
• Is 2.5 sec/page acceptable? Need to drill down to page level to ID key contributors; look at 90th or 95th percentiles (averages are misleading)
THROUGHPUT PLATEAU WITH LOAD RISING = BOTTLENECK SOMEWHERE!
• Note throughput tracking load through ~45 users, then leveling off
• Culprit was an Intrusion Detection appliance limiting bandwidth to 60 Mbps
• In a healthy system, throughput should closely track load
BANDWIDTH TRACKING WITH LOAD = HEALTHY
• All 3 web servers show network interface throughput tracking with load throughout the test
• A healthy bandwidth graph looks like Mt. Fuji
ERRORS OVER LOAD – MUST EXPLAIN!
• Note relatively few errors, largely http 404s on missing resources
• Sporadic bursts of http 500 errors near the end of the test while the customer was "tuning" web servers
• An error rate of <1% can be attributed to "noise" and dismissed; >1% should be analyzed and fully explained
(a threshold-check sketch follows)
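That 1% rule of thumb is easy to check automatically against bucketed counts. A minimal sketch; the per-minute numbers are hypothetical:

```python
# Hypothetical per-minute (minute, requests, errors) counts
buckets = [
    ("10:01", 1200, 3),
    ("10:02", 1450, 9),
    ("10:03", 1500, 41),
]

NOISE_THRESHOLD = 0.01  # the 1% rule of thumb above

for minute, reqs, errs in buckets:
    rate = errs / reqs
    verdict = "analyze and explain!" if rate > NOISE_THRESHOLD else "noise"
    print(f"{minute}: {rate:.2%} errors -> {verdict}")
```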
END USER EXPERIENCE SLA VIOLATIONS
Outlier, not on VPN
SLA VIOLATIONS DRILL DOWN
Felipe B. (Brazil, Feb 28th, 7:19AM-1:00PM CST, 10.74.12.55): > 20 second response on page “Media Object Viewer”.
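Drill-downs like this can be generated by filtering raw response data against the SLA. A sketch; the file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical raw data: one row per page view
df = pd.read_csv("page_times.csv", parse_dates=["timestamp"])

SLA_SEC = 20
violations = df[df["response_sec"] > SLA_SEC]

# Who was hit, on which page, from where, and how badly
report = (violations
          .groupby(["page", "user", "client_ip"])["response_sec"]
          .agg(["count", "max"])
          .sort_values("max", ascending=False))
print(report)
```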
NETWORK THROUGHPUT – RAW GRAPH
NETWORK THROUGHPUT - INTERPRETED
CAPACITY: SYSTEM RESOURCES - RAW
CAPACITY: SYSTEM RESOURCES - INTERPRETED
Monitor resources liberally, provide (and annotate!) graphs selectively: which resources tell the main story?
ASSESSING
• Objective: turn conclusions into recommendations
  • Tie conclusions back to test objectives – were objectives met?
  • Determine remediation options at the appropriate level – business, middleware, application, infrastructure, network
  • Perform agreed-to remediation
  • Re-test
• Recommendations:
  • Should be specific and actionable at a business or technical level
  • Should be reviewed (and if possible, supported) by the teams that need to perform the actions (nobody likes surprises!)
  • Should quantify the benefit, if possible the cost, and the risk of not doing it
  • Final outcome is management's judgment, not yours
REPORTING
• Objective: convey recommendations in stakeholders' terms
• Identify the audience(s) for the report; write/talk in their language
• Executive Summary – 3 pages max
  • Summarize objectives, approach, target load, acceptance criteria
  • Cite factual Observations
  • Draw Conclusions based on Observations
  • Make actionable Recommendations
• Supporting Detail
  • Test parameters: date/time executed, business processes, load ramp, think-times, system tested (hw config, sw versions/builds)
  • Sections for Errors, Throughput, Scalability, Capacity
  • In each section: annotated graphs, observations, conclusions
• Associated Docs (if appropriate)
  • Full set of graphs, workflow detail, scripts, test assets
REPORTING
• Step 1: *DO NOT* press "Print" on the tool's default report
  • Who is your audience?
  • Why do they want to see 50 graphs and 20 tables? What will they be able to see?
  • Data + Analysis = INFORMATION
REPORTING
• Step 2: Understand what is important
  • What did you learn? Study your results; look for correlations.
  • What are the 3 things you need to convey?
  • What information is needed to support these 3 things?
  • Discuss findings with technical team members: "What does this look like to you?"
REPORTING
• Step 3: So, what is important?
  • Prepare a three-paragraph summary for email
  • Prepare a 30-second elevator summary for when someone asks you about the testing
  • More people will consume these than any test report
  • Get feedback
REPORTING
• Step 4: Preparing your final report: audience
  • Your primary audience is usually executive sponsors and the business; write the summary at the front of the report for them
  • Language, acronyms, and jargon
  • Level of detail
  • Correlation to business objectives
REPORTING
• Step 5: Audience (cont.)
  • Rich technical detail within:
    • Observations, including selected graphs
    • Feedback from the technical team
    • Conclusions
    • Recommendations
REPORTING
• Step 6: Present!
  • Remember, no one is going to read the report
  • Gather your audience: executive, business, and technical
  • Present your results
  • Help shape the narrative; explain the risks; earn your keep
  • Call to action! Recommend solutions
…REMEMBER: CAVIAR!
Collecting, Aggregating, Visualizing, Interpreting, Assessing, Reporting
A FEW RESOURCES
• WOPR (Workshop On Performance and Reliability)
  • http://www.performance-workshop.org
  • Experience reports on performance testing
  • Spring & Fall facilitated, theme-based peer conferences
• SOASTA Community
  • http://cloudlink.soasta.com
  • Papers, articles, presentations on performance testing
• PerfBytes Podcast
• Mark Tomlinson's blog
  • http://careytomlinson.org/mark/blog/
• Richard Leeke's blog (Equinox.nz)
  • http://www.equinox.co.nz/blog/Lists/Posts/Author.aspx?Author=Richard Leeke
  • Data visualization
• Scott Barber's resource page
  • http://www.perftestplus.com/resources.htm
• STP Resources
  • http://www.softwaretestpro.com/Resources
  • Articles, blogs, papers on a wide range of testing topics
THANKS FOR ATTENDING
Please fill out an evaluation form