NOTICE: Proprietary and Confidential
This material is proprietary to Centric Consulting, LLC. It contains trade secrets and information which is solely the property of Centric Consulting, LLC. This material is
solely for the Client’s internal use. This material shall not be used, reproduced, copied, disclosed, transmitted, in whole or in part, without the express consent of Centric
Consulting, LLC.
© 2013 Centric Consulting, LLC. All rights reserved
Bad Metric. Bad!
Teaching an old dog, nothing new
What are some typical metrics that you measure?
Other Examples of Software Testing Metrics
Test Cases
• Test Case Counts by Execution Status
• Test Case Percentages by Execution Status
• Test Case Execution Status Trend
• Test Case Status Planned vs Executed
• Test Case Coverage
• Test Case Status vs Coverage
• Test Case First Run Failure Counts
• Test Case Re-Run Counts
Automation Extras
• Automation Index (Percent Automatable)
• Automation Progress
• Automation Test Coverage
More Examples of Software Testing Metrics
Defects
• Defect Counts by Status
• Defect Counts by Priority
• Defect Status Trend
• Defect Density
• Defect Removal Efficiency
• Defect Leakage
• Average Defect Response Time
Other
• Requirements Volatility Index
• Testing Process Efficiency
Common Themes
• Counts
• Metrics (Counts/Counts)
• Trends
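These three themes can be sketched in a few lines of Python. The daily result data below is hypothetical, purely to show how each theme is derived from the same raw numbers:

```python
# Hypothetical daily execution results: (day, passed, failed) -- illustrative only.
daily_results = [
    (1, 10, 5),
    (2, 18, 6),
    (3, 30, 6),
]

# Count: raw tallies by status.
total_passed = sum(p for _, p, _ in daily_results)
total_failed = sum(f for _, _, f in daily_results)

# Metric: counts over counts (here, a pass rate).
pass_rate = total_passed / (total_passed + total_failed)

# Trend: change in the daily pass rate across the time interval.
daily_rates = [p / (p + f) for _, p, f in daily_results]
trend = daily_rates[-1] - daily_rates[0]

print(total_passed, round(pass_rate, 2), round(trend, 2))
```

Each later example in the deck is some combination of these three operations.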
Other Examples of Software Testing Metrics
Test Cases
• Test Case Counts by Execution Status – Count
• Test Case Percentages by Execution Status – Metric
• Test Case Execution Status Trend – Trend
• Test Case Executed vs Planned – Metric and Trend
• Test Case Coverage – Metric
• Test Case Status vs Coverage – Metric
• Test Case First Run Failure Counts – Count
• Test Case Re-Run Counts – Count
Automation Extras
• Automation Index (Percent Automatable) – Metric
• Automation Progress – Count
• Automation Test Coverage – Metric
More Examples of Software Testing Metrics
Defects
• Defect Counts by Status – Count
• Defect Counts by Priority – Count
• Defect Status Trend – Trend
• Defect Density – Metric
• Defect Removal Efficiency – Metric
• Defect Leakage – Metric
• Average Defect Response Time – Trend
Other
• Requirements Volatility Index – Metric
• Testing Process Efficiency – Metric
The Problems We Typically Face
They Fail to Communicate
• Present data instead of information
• Offer no interpretation, leaving users to draw their own conclusions
They Are Often Inaccurate
• The act of measuring lacks consistency
• The measures themselves have inherent variability
• No one reports margins of error
They Do Not Measure a Control
• Decisions can't be made from the number alone
• The measurement isn't a lever to introduce change
They Are Not Tied to Organizational Objectives
• No threshold is set for the desired goal
• No action or consequence if it is not achieved
Counting
Exercise #1
1. Need 3 volunteers
2. Assume 1 scoop equals 1 day's worth of testing effort
3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs
4. Take a scoop
5. How many tests did you execute?
6. Based on how many tests you ran, how many more scoops do you need to execute the rest (there are 180 total)?
Exercise #1 Questions
• Was the same scoop used? Were the results the same?
• Was there variability in the number of tests run in each scoop? Is that typical in testing?
• Was there variability in the estimate of the number of tests left? Is this similar to guessing how much effort is left in a test cycle?
• Are these numbers reliable? Are they repeatable?
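The variability these questions probe can be put in code. The scoop sizes below are hypothetical observations, not real exercise results; the point is that each volunteer extrapolates a different answer from the same jar:

```python
TOTAL_TESTS = 180  # the jar holds 180 "tests" (Kisses and Tootsie Rolls)

# Hypothetical scoop results from three volunteers (pieces per scoop vary).
scoops = [18, 26, 21]

# Each volunteer extrapolates from their own scoop:
# "how many more scoops to execute the remaining tests?"
estimates = [(TOTAL_TESTS - s) / s for s in scoops]

for s, e in zip(scoops, estimates):
    print(f"scoop of {s} tests -> ~{e:.1f} more scoops needed")
# Three people, same jar, three different forecasts.
```

The spread between the estimates is the unreported margin of error.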
Exercise #2
1. Need 3 volunteers
2. Assume 1 scoop equals 1 day's worth of testing effort
3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs (red are severe)
4. Take a scoop
5. How many tests did you execute?
6. How many defects did you find?
7. Based on how many tests you ran, how many more scoops do you need to execute the rest?
8. Based on how much effort you put in, how many more scoops do you need to find the rest of the defects?
Exercise #2 Questions
• Was the same scoop used? Were the results the same?
• With an estimate of the number of tests remaining, is it reasonable to estimate the number of defects that will be found? Do people ask you to guess this type of information?
• If you know how many tests (Kisses and Tootsie Rolls) are left and how many man-hours you will use (scoop size), can you estimate how many scoops are needed to execute all tests (and find all the Starbursts)? Is it accurate? Is it close enough?
• Are these numbers reliable? Are they repeatable?
• Does encountering defects (Starbursts) reveal anything about the overall quality (how many Starbursts exist, or what it'll take to find them)?
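The naive forecast these questions describe can be written out directly. The counts below are hypothetical; the sketch shows the assumption buried in the arithmetic:

```python
# Hypothetical scoop: 20 tests executed, 4 defects found, 180 tests total.
tests_run, defects_found = 20, 4
tests_total = 180

# The naive forecast: scale the observed defect rate to the tests remaining.
defect_rate = defects_found / tests_run
forecast_remaining = defect_rate * (tests_total - tests_run)

print(f"forecast: ~{forecast_remaining:.0f} defects still in the jar")
# The jar's actual defect count is unknown -- the forecast assumes the
# sample rate holds across the whole population, which a single scoop
# cannot justify.
```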
Challenges with Counting
• Label does not equal content
• Inherent variability
• Not evenly spaced
• Lacks reference for context
• Lack of consistency
Metrics (Measure over Measure)
Sampling
• Target Population
• Matched Samples
• Independent Samples
• Random Sampling
• Simple Random Sampling
• Stratified Sampling
• Cluster Sampling
• Quota Sampling
• Spatial Sampling
Sampling Variability
• Standard Error
• Bias
• Precision
For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter.
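The link between a sample statistic and a population parameter can be illustrated with the standard error of the sample mean. This sketch uses Python's standard library only; the per-module defect counts are hypothetical:

```python
import math
import statistics

# Hypothetical sample: defects found per module in 8 sampled modules.
sample = [3, 7, 2, 9, 4, 6, 5, 8]

mean = statistics.mean(sample)
# Standard error: sample standard deviation over sqrt(n) -- roughly how far
# the sample mean is likely to sit from the true population mean.
stderr = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"sample mean = {mean:.2f} +/- {stderr:.2f} (standard error)")
```

Reporting the standard error alongside the mean is exactly the "margin of error" the earlier problem slide says no one reports.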
Sampling in Testing
Does testing use sampling?
Consider that in most corporate environments:
• We never test the entire application
• It is not realistically possible to find every defect
• So, does testing use sampling?
Ponder this as we discuss the next section…
Is Testing a Methodical Defect Searching Activity?
Sampling
Remember, we can't test everything – not enough time/people/budget.
So, which sample approach better approximates an actual measure (e.g. dots per sq. inch)?
Sample estimates: 5.25 dots/sq. in. and 6.5 dots/sq. in.
Sampling
Which sample approach better approximates an actual measure (e.g. dots per sq. inch)?
• What is more accurate, random or methodical searching?
Sample estimates: 5.25 dots/sq. in. and 6.5 dots/sq. in.; 4.95 dots/sq. in. and 6.3 dots/sq. in.
There are actually 6.6 dots/sq. in.
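The random-versus-methodical comparison can be simulated. This sketch assumes a hypothetical 20x20 grid of square inches with dot counts averaging near the slide's 6.6 dots/sq. in.; both strategies sample 20 of the 400 cells:

```python
import random

random.seed(7)  # deterministic for illustration

# Hypothetical dot field: 20x20 cells, 4-9 dots each (mean ~6.5/sq. in.).
cells = [[random.randint(4, 9) for _ in range(20)] for _ in range(20)]
flat = [c for row in cells for c in row]
true_density = sum(flat) / 400

# Random sampling: 20 cells chosen anywhere in the field.
random_est = sum(random.sample(flat, 20)) / 20

# Methodical sampling: every 20th cell in reading order.
methodical_sample = flat[::20]
methodical_est = sum(methodical_sample) / len(methodical_sample)

print(f"true {true_density:.2f}, random {random_est:.2f}, "
      f"methodical {methodical_est:.2f}")
```

Re-running with different seeds shows both estimates wandering around the true density, which is the slide's point: neither scheme is magically accurate from one small sample.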
Exercise #3
1. Need 3 volunteers
2. Assume 1 scoop equals 1 day's worth of testing effort
3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs (red are severe)
4. Each volunteer grabs 1 scoop of candy
5. How many (total) tests did you execute?
6. How many (total) defects did you find?
7. Log results
8. Repeat 2 more times
Exercise #3 Questions
• Does this graph represent anything useful?
• Does a trend line help or mean anything?
• Is it possible or reasonable to estimate the # of defects you'll see based on the number of tests, from even 9 samples?
• Compare scoop 1 to scoop 9 – does any scoop seem to be a reasonable estimate?
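The trend-line question can be made concrete with a least-squares fit. The nine scoops below are invented for illustration; the fitted slope is what the graph's trend line reports:

```python
# Hypothetical results from nine scoops: tests executed, defects found.
tests = [18, 26, 21, 24, 17, 29, 20, 23, 25]
defects = [2, 5, 1, 4, 3, 2, 6, 3, 4]

# Least-squares slope of defects vs. tests -- the "trend line" on the graph.
n = len(tests)
mx, my = sum(tests) / n, sum(defects) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(tests, defects))
         / sum((x - mx) ** 2 for x in tests))

print(f"fitted slope: {slope:.3f} defects per extra test")
# With this much scoop-to-scoop noise, the slope says very little about
# how many defects actually remain in the jar.
```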
Challenges with Metrics (Measure over Measure)
• Implied Derivations and Forecasting
• Counts over Counts
• Denominator Rules
• Implies Velocity
• Measure over Measure
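The denominator problem can be shown in a few lines: the same pass count tells three different stories depending on which denominator rule is applied. The status counts are hypothetical:

```python
# Hypothetical status counts at the end of a cycle.
passed, failed, blocked, not_run = 60, 10, 10, 20

# Same "passed" count, three denominator choices, three different stories:
rate_vs_executed = passed / (passed + failed)                    # pass/executed
rate_vs_attempted = passed / (passed + failed + blocked)         # pass/attempted
rate_vs_planned = passed / (passed + failed + blocked + not_run) # pass/planned

print(f"{rate_vs_executed:.0%} vs {rate_vs_attempted:.0%} vs {rate_vs_planned:.0%}")
```

Unless the denominator rule is stated alongside the number, the metric invites whichever interpretation the reader prefers.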
Trends
Trend
A trend is a change in a measure (or metric) over a time interval. It has three components:
• Direction/Movement
• Speed/Size
• Cause (Implied)
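Two of the three components can be computed from a series; the third cannot. A minimal sketch with hypothetical weekly counts:

```python
# Hypothetical weekly open-defect counts.
weekly_open_defects = [40, 36, 33, 28, 25]

# Direction: sign of the overall change. Speed: average change per interval.
change = weekly_open_defects[-1] - weekly_open_defects[0]
direction = "down" if change < 0 else "up" if change > 0 else "flat"
speed = change / (len(weekly_open_defects) - 1)

print(direction, speed)
# Cause is only implied: the data alone cannot say whether open defects
# fell because quality improved or because testing slowed down.
```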
Exercise #4
1. Need 3 volunteers
2. Assume 1 scoop equals 1 day's worth of testing effort
3. Hershey Kisses and Tootsie Rolls are tests, Starbursts are bugs (red are severe)
4. Each volunteer grabs 1 scoop of candy
5. How many of EACH type of test did you execute?
6. How many of EACH type of defect did you find?
7. Log results
8. Repeat 2 more times
Exercise #4 Questions
• Does the graph line represent any information of value?
• Is there assurance (control) that simply taking a scoop (e.g. executing tests in a given day) will result in defects being found?
• Is the shape of the cumulative defect line representative of anything?
• If we only look at scoops 1-3 or 7-9, does it tell us anything or mislead us?
• What if we took 2 scoops per day (added a tester – but still counted it as 1 day)? Would that affect how things look?
• Do defects (Starbursts) per scoop, or defects per test, mean anything?
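The two-scoops-per-day question can be sketched directly: doubling the scoops per "day" compresses the x-axis and steepens the cumulative curve without changing the candy at all. The per-scoop defect counts below are hypothetical:

```python
from itertools import accumulate

# Hypothetical defects found in each of nine scoops.
defects_per_scoop = [2, 5, 1, 4, 0, 2, 6, 0, 3]

# One scoop per day: cumulative defects by day.
one_per_day = list(accumulate(defects_per_scoop))

# Two scoops per day (an added tester, still logged as one "day"):
# same candy, fewer intervals -- the curve looks steeper.
pairs = [defects_per_scoop[i] + defects_per_scoop[i + 1]
         for i in range(0, 8, 2)]
pairs.append(defects_per_scoop[8])  # nine scoops don't pair evenly
two_per_day = list(accumulate(pairs))

print(one_per_day)
print(two_per_day)
```

Both curves end at the same total; only the apparent velocity changes, which is why staffing changes can masquerade as quality trends.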
Challenges with Trends
• Affected by the challenges of counting
• Affected by the challenges of metrics
• Time-Based Series
• Intervals and Activity Pauses
Purpose of Metrics
• Measure of Performance
• Conformance to Best Practice
• Deviation from Goal
Issues Affecting Purpose
• Misaligned with strategy
• Using metrics as outputs only
• Too many metrics
• Ease of measure does not equal importance
• Lack of context
• Limited dimensions
• Lack of behavioral aspects
Changing the World
How to Leverage Metrics
• Explicitly link metrics to goals
• Use trends over absolute numbers
• Use shorter tracking periods
• Change metrics when they stop driving change
• Account for error and confidence
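Accounting for error and confidence can be as simple as pairing a proportion with a confidence interval. This sketch uses the normal-approximation interval for a proportion; the counts are hypothetical:

```python
import math

# Hypothetical: 170 of 200 sampled test cases passed.
passed, n = 170, 200
p = passed / n

# 95% normal-approximation interval for a proportion (z = 1.96).
margin = 1.96 * math.sqrt(p * (1 - p) / n)

print(f"pass rate: {p:.1%} +/- {margin:.1%}")
# "85% +/- 4.9%" invites better decisions than a bare "85%".
```

For small samples or rates near 0% or 100%, an exact or Wilson interval would be the safer choice; the point here is simply that the error term gets reported at all.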
Q&A
Joseph Ours
Email: Joseph.ours@centricconsulting.com
Company Website: https://centricconsulting.com/technology-solutions/software-quality-assurance-and-testing/
Twitter: @justjoehere
LinkedIn: www.linkedin.com/josephours
Personal Blog: http://josephours.blogspot.com