Page 1:

Controlling Peer Reviews During Software Development

Richard L. W. Welch, PhD, Associate Technical Fellow

Steve D. Tennant, SEPG Lead

Northrop Grumman Corporation

A 5-Year Longitudinal Case Study

19 November 2008

Page 2:

Topics

• Background
• Getting started (2004)
  – Overcoming technical issues
• Handling human issues and institutionalizing the process (2005)
• Growing the benefits (2006)
  – More processes, projects, disciplines
• Increasing our effectiveness (2007-2008)
  – Exploring new techniques
• Future pathways

Page 3:

Why Peer Reviews?

• Ubiquity
  – Many work products reviewed throughout the software development life cycle
    • System & software design artifacts
    • Source code
    • Test plans, procedures & reports
• Frequency
  – High data rates
• Influence
  – Approximately 10% of the software development effort is spent on peer reviews and inspections
  – Code walkthroughs represent the biggest opportunity & most advantageous starting point
• Serendipity
  – All engineering disciplines peer review their design products
  – Techniques & lessons learned have demonstrated extensibility

Page 4:

[Figure: ASU Log Cost Model control chart using the lognormal probability density function – ln(Hours per New/Mod LOC) vs. Review Closed Date, March-August 2004]

Why Statistical Process Control? Successful Quantitative Project Management

• A stable process operates within the control limits 99.7% of the time (average performance, Upper Control Limit, Lower Control Limit)
• Analysis of special cause variation focuses on recognizing & preventing deviations from this pattern
• Analysis of common cause variation focuses on improving the average and tightening the control limits
• SPC offers opportunities for systematic process improvement that NGC & industry benchmarks indicate will yield an ROI averaging between 4:1 & 6:1
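As a minimal illustration of the 3-sigma idea on this slide (not the authors' Minitab or database tooling), the following Python sketch computes a center line and control limits for a series of per-review costs and flags candidate special-cause points; the sample values and names are hypothetical.

```python
import numpy as np

def control_limits(values):
    """Classic mean +/- 3-sigma limits for an individuals-style chart.

    A stable (in-control) process is expected to stay inside these limits
    about 99.7% of the time; points outside are candidate special causes.
    """
    x = np.asarray(values, dtype=float)
    center = x.mean()
    sigma = x.std(ddof=1)          # sample standard deviation
    return center, center - 3 * sigma, center + 3 * sigma

# Hypothetical hours-per-LOC costs for a handful of code reviews.
review_costs = [0.004, 0.006, 0.005, 0.020, 0.005, 0.007, 0.004]
center, lcl, ucl = control_limits(review_costs)
flagged = [c for c in review_costs if not lcl <= c <= ucl]
print(f"center={center:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}")
print("candidate special-cause points:", flagged)
```

On raw, right-skewed cost data the lower limit from this naive calculation can easily fall below zero, which foreshadows the control-chart difficulties listed on slide 8.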

Page 5:

Case Study Essentials

• Data represent software-related peer reviews conducted at Northrop Grumman’s Integrated Systems Eastern Region – Melbourne, Florida facility between March 2004 and October 2008

• One peer review process (now standard for Integrated Systems)
  – Covers the entire system life cycle from system requirements analysis & architecture through maintenance
  – Requires the peer review of all major systems, software & test artifacts
  – Uses an automated database tool that integrates data quality & process control features
• All peer review records captured in the database
  – > 5,700 source code peer reviews
  – > 1,100 other software-related peer reviews
• CMMI Level 5 appraisals
  – CMMI-SE/SW (V1.1) in 2005
  – CMMI-SE/SW/IPPD/SS (V1.1) in 2006
  – CMMI-DEV+IPPD (V1.2) scheduled for 2009

Page 6:

Getting Started - 2004

Overcoming Technical Difficulties & Learning To Love Logarithms


Page 7:

SW-CMM Level 4 Prior State (1998-2003)

• Software development baseline characterized by life cycle phase
  – SW Requirements – Design – Code & Verification – SW Integration – Software Test
  – 10+ year process improvement record resulted in costs reduced by over 67%
• But we had no CMMI “Gestalt”
  – No insight into the statistical behavior of lower-level elements
  – No “above the shop floor” experience with statistical sub-process control
  – No insight into downstream behavior
• We wanted to control product quality, but were thwarted by issues with our process quality
  – Inconsistent data
  – Superficial results
• Root cause analysis traced this to indifferent attention paid to managing peer reviews
  – We realized we had to control the efficiency of our peer reviews in terms of the effort spent (peer review cost), based on classic industry guidelines that efficient reviewers operate in a “sweet spot” of about 200 lines of code per review hour

Page 8:

Our Problem

Data Characteristics
• Anderson-Darling test p < 0.005
• Data non-normality & asymmetry violated probability model assumptions

[Figures: Summary for Cost/LOC with mean, median & 95% confidence intervals; Probability Plot of Cost/LOC (Normal); Raw Data for Cost/LOC control chart – Hours per New/Mod LOC vs. Review Closed Date, ASU Eng Chks & Elec Mtngs]

ASU Cost Model – Control Chart Difficulties
• 11% false alarm rate (Chebyshev’s inequality)
• Penalized due diligence in reviewing code
• No meaningful lower control limit
• Did not flag superficial reviews
• Arithmetic mean distorted the central tendency
• Apparent cost did not meet budget

Could We Control Our Peer Reviews?
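The Anderson-Darling check quoted above can be reproduced in spirit with SciPy. Minitab reports a p-value, while scipy.stats.anderson returns the statistic plus critical values, so this sketch (on hypothetical right-skewed data standing in for the raw Cost/LOC series) compares against the 5% critical value and also reports skewness.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical right-skewed hours-per-LOC data standing in for raw Cost/LOC.
cost_per_loc = rng.lognormal(mean=-5.3, sigma=0.6, size=200)

result = stats.anderson(cost_per_loc, dist="norm")
crit_5 = result.critical_values[list(result.significance_level).index(5.0)]
print(f"A-D statistic = {result.statistic:.3f}, 5% critical value = {crit_5:.3f}")
print("normality rejected at 5%" if result.statistic > crit_5 else "normality not rejected")
print(f"skewness = {stats.skew(cost_per_loc):.2f} (positive => long right tail)")
```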

Page 9:

Stabilizing the Data

• Senior author’s presentation at the 2005 CMMI Technology Conference demonstrated how a log-cost model can successfully control software code inspections

Logarithms Can Be Your Friends

November 16, 2005

Richard L. W. Welch, PhD, Chief Statistician, Northrop Grumman Corporation

Controlling Peer Review Costs

– Peer review unit costs (hours per line of code) behave like commodity prices in the short term
– Short-term commodity price fluctuations follow a lognormal distribution
– As a consequence, commodity prices follow a lognormal distribution
– Therefore, taking the natural logarithm of a sequence of peer review costs transforms the sequence to a normally distributed series

Notes:
• Details on the log-cost model, “one of the most ubiquitous models in finance,” can be found at riskglossary.com (http://www.riskglossary.com/articles/lognormal_distribution.htm)
• Prior CMMI Technology Conference & User Group papers are published on-line at: http://www.dtic.mil/ndia/
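A minimal sketch of the log-cost idea described above, assuming hypothetical lognormally distributed hours-per-LOC data: take natural logs, compute 3-sigma limits on the log scale, then take antilogs so the center line is the geometric mean the following slide refers to. This is an illustration, not the authors' implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical hours-per-LOC data with lognormal short-term behaviour.
cost_per_loc = rng.lognormal(mean=-5.3, sigma=0.6, size=200)

log_cost = np.log(cost_per_loc)            # the "log-cost" series
center = log_cost.mean()
sigma = log_cost.std(ddof=1)
lcl, ucl = center - 3 * sigma, center + 3 * sigma

# On the log scale the data are approximately normal, so 3-sigma limits
# behave as designed (cf. the Anderson-Darling result on the next slide).
ad = stats.anderson(log_cost, dist="norm")
print(f"A-D statistic on ln(cost) = {ad.statistic:.3f}")

# Back-transform: exp(mean of logs) is the geometric mean, the
# budget-preserving "typical" cost; the limits also need the antilog.
print(f"geometric mean = {np.exp(center):.5f} hours/LOC")
print(f"control limits = [{np.exp(lcl):.5f}, {np.exp(ucl):.5f}] hours/LOC")
```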

Page 10:

Our Data on Logs

A Textbook Demonstration of an In-control, Stable Process

Anderson-Darling test p = 0.759

[Figures: Probability Plot of ln(COST/LOC) (Normal) and Summary for ln(COST/LOC) with mean, median & 95% confidence intervals – ASU Eng Checks & Elec Mtngs, ASU Peer Reviews]

• Impacts
  – False alarms minimized
  – Meaningful lower control limit
  – Geometric mean preserves the budget
    • OK, you still have to find the antilog
• Demonstrated utility & applicability
  – > 6,800 peer reviews over 5 years provide large-sample validation

[Figure: ASU Log Cost Model - Initial Release control chart, ASU Eng Checks & Elec Mtngs, using the lognormal probability density function – ln(Hours per New/Mod LOC) vs. Review Closed Date; data through Review 7351 on 8/25]

Page 11:

The High Maturity Data Dilemma – Why Management Can’t Have It Tomorrow

[Figure: timeline showing unstable performance, stable performance, and improved performance across intervals Δt0, Δt1 & Δt2]

Δt0:
• Process selection
• Analysis of suitability for SPC

Δt1 & Δt2:
• Identify improvement proposals
• Evaluate & prioritize proposals
• Select improvement
• Pilot improvement
• Deploy improvement

We can minimize Δt0, Δt1 & Δt2 by careful management, but the length of the data runs will depend on the periodicity of the process itself

Page 12:

Handling Human Issues - 2005

And Institutionalizing the Process


Page 13:

2005 Challenges

• First demonstration of CMMI Level 4 and 5 capabilities focused on code inspections and a parallel effort to control peer reviews of software test plans, procedures and reports
  – High data rates inherent in these back-end processes helped us to understand and overcome statistical difficulties
  – We gained practical lessons learned on the obstacles that had to be overcome
• Desire to introduce successful SPC techniques for quantitative project management in the front-end system and software design phases
• When coding starts
  – Product development is one-half over
  – Opportunities to recognize and correct special & common cause variation in the design process are gone

First-year Decisions Determine up to 70% of Total Life Cycle Cost on DoD Programs. Early, Effective Statistical Control Offers Great Practical Benefit

Page 14:

Practical Difficulties at Level 4

• Getting started
  – Selecting good candidates for statistical management
• Statistical innumeracy
  – Discipline needs to own the right skill set
• Little historical data & inherently low data rates
  – Personnel need familiarity with robust statistical procedures
• Cautionary note: you must also take care of the basics (CMMI Level 3)
  – Budget and charter
• Project impacts
  – Metrics infrastructure across engineering
    • Metric definitions
    • Data collection mechanisms
    • Consistency of processes across projects

AGS&BMS-PR-06-122
Statistical Control of System and Software Design Activities
November 15, 2006
Richard L. W. Welch, PhD, Chief Statistician
April King, Systems Engineer
Northrop Grumman Corporation
– “Outstanding Presentation for High Maturity”
– “Conference Winner”

Note: Prior CMMI Technology Conference & User Group papers are published on-line at: http://www.dtic.mil/ndia/

Page 15:

Getting Started – Process Selection for Statistical Management

• Statistical control is imposed on sub-processes at an elemental level in the process architecture

• Processes are selected based on their
  – Business significance – “sufficient conditions”
  – Statistical suitability – “necessary conditions”
• Business checklist
  – Is the candidate sub-process a component of a project’s defined key process?
    • Is it significant to success of a business plan goal?
    • Is it a significant contributor to an important estimating metric in the discipline?
  – Is there an identified business need for predictable performance?
    • Cost, schedule or quality
  – How does it impact the business?
    • Need to map sub-process ↔ process ↔ business goal
• Statistical checklist (table)

Page 16:

Overcoming Statistical Innumeracy – Success Factors

• Minitab
• “Dark green belt” training
  – Curriculum tailored to focus on applied statistical techniques and Minitab familiarity
  – Deming principle applied in the classroom: In God we trust, all others bring data
  – Lean and process management training covered in other courses
• Green belt community of practice
• Chief statistician

Key Success Factors: Management Recognition & Support for the Investment

Page 17:

Post 2005 Follow-up

• Sector standards for certifying Green Belts, Black Belts & Master Black Belts (2006)
  – Training
  – Project portfolio
• Green Belt certification (2006)
• Black Belt cadre (2007-2008)
• Future Master Black Belt cadre (2009+)

Success Creates a Continuing Need to Grow the Infrastructure

Page 18:

Pitfalls

[Figures: SIL Utilization Planning Performance control chart, TSSR - SW Development I Lab – Deviation, Planned v. Actual (Shifts) by Date, with X-bar, UCL & LCL (Positive = Worked more Shifts than Planned, Negative = Worked less Shifts than Planned); Build Times by Build Date control chart, last 50 builds displayed; Probability plots for Physical Model Data – Review Effort (Per Change), 95% CIs for candidate distributions]

Goodness-of-fit tests (Anderson-Darling) for Review Effort (Per Change):
• Normal: AD = 0.832, P-Value = 0.025
• Lognormal: AD = 0.398, P-Value = 0.329
• 3-Parameter Lognormal: AD = 0.337, P-Value = *
• Exponential: AD = 1.657, P-Value = 0.018
• Loglogistic: AD = 0.313, P-Value > 0.250
• 3-Parameter Loglogistic: AD = 0.299, P-Value = *
• Largest Extreme Value: AD = 0.380, P-Value > 0.250
• Gamma: AD = 0.372, P-Value > 0.250
• 3-Parameter Gamma: AD = 0.373, P-Value = *
• Logistic: AD = 0.641, P-Value = 0.055
• 2-Parameter Exponential: AD = 1.071, P-Value = 0.045
• Weibull: AD = 0.456, P-Value = 0.246
• 3-Parameter Weibull: AD = 0.420, P-Value = 0.351
• Smallest Extreme Value: AD = 1.408, P-Value < 0.010

Dealing With What Was Taught – but Not Learned – in Green Belt Class
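A distribution-screening exercise like the one summarized above can be approximated as follows. Minitab ranks candidates with the Anderson-Darling statistic; this sketch, on hypothetical review-effort data, instead fits each candidate by maximum likelihood and compares Kolmogorov-Smirnov statistics as a stand-in.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical "review effort per change" data standing in for the
# Physical Model data on this slide.
effort = rng.lognormal(mean=0.5, sigma=0.7, size=120)

candidates = {
    "normal":      stats.norm,
    "lognormal":   stats.lognorm,
    "weibull":     stats.weibull_min,
    "gamma":       stats.gamma,
    "exponential": stats.expon,
}

# Fit each candidate by maximum likelihood and compare Kolmogorov-Smirnov
# statistics (smaller = better). The p-values are optimistic because the
# parameters were estimated from the same data.
for name, dist in candidates.items():
    params = dist.fit(effort)
    ks = stats.kstest(effort, dist.cdf, args=params)
    print(f"{name:12s} KS = {ks.statistic:.3f}  p = {ks.pvalue:.3f}")
```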

Page 19:

Growing the Benefits - 2006

More Processes, Projects, Disciplines


Page 20:

Growth

• After our initial success, we aggressively expanded the use of SPC techniques in all engineering projects
  – Led by senior management
  – Clear expectations of significant benefit to the business
  – Particular focus on our hardware and logistics disciplines
• By year-end 2006, we had gone from the original 4 sub-processes under control in 2 Engineering homerooms to 30 sub-processes under control in 6 Engineering homerooms
  – Expect ~45 sub-processes that are significant to our business under active control by year-end 2008

AGS&BMS-PR-06-107
Statistically Managing a Critical Logistics Schedule Using CMMI
Robert Tuthill, Northrop Grumman Integrated Systems
November 2007
– “Outstanding Presentation for High Maturity”
– “Conference Winner”

Note: Prior CMMI Technology Conference & User Group papers are published on-line at: http://www.dtic.mil/ndia/

Page 21:

A Humorous Sidebar – Identifying Special Causes

• As part of this effort, our Test & Evaluation personnel analyzed some 2004-2005 baseline data, and asked what has become one of our favorite statistical questions:

Can isolated points be considered as special cause points, and be deleted from a data set as outliers, even though they don’t fall outside of the 3-sigma control limits?

[Figure: SIL Utilization Planning Performance control chart, OCTL Integration Lab – Deviation, Planned v. Actual (Shifts) by Week Ending (Date), July 2004 - January 2005 (Positive = Worked more Shifts than Planned, Negative = Worked less Shifts than Planned); isolated in-limit points annotated "Hurricanes" (Frances, Jeanne)]
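The slide poses the question without settling it here. One common practice, shown only as an assumption in the sketch below, is to keep such points on the chart but exclude them from the limit calculation when an assignable cause (here, the hurricane weeks) is independently documented. The data and labels are hypothetical.

```python
import numpy as np

# Hypothetical weekly deviations (planned vs. actual shifts); two weeks carry
# a documented assignable cause (hurricane shutdowns), echoing the OCTL chart.
deviation = np.array([0.5, -1.0, 0.0, 1.5, -0.5, -4.0, -4.5, 0.5, 1.0, -0.5])
assignable = {5: "hurricane shutdown", 6: "hurricane shutdown"}

# Keep the points visible, but exclude only documented assignable causes from
# the limit calculation; do not delete in-limit points merely as "outliers".
keep = np.array([i not in assignable for i in range(len(deviation))])
baseline = deviation[keep]
center, sigma = baseline.mean(), baseline.std(ddof=1)
print(f"center={center:+.2f}  UCL={center + 3*sigma:+.2f}  LCL={center - 3*sigma:+.2f}")
for week, cause in assignable.items():
    print(f"week {week}: {deviation[week]:+.1f} shifts, excluded ({cause})")
```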

Page 22:

Increasing Our Effectiveness – 2007-2008

Exploring New Techniques


Page 23:

What Is the Benefit?

• Organizational Impact
  – 100% of delivered code is peer reviewed
  – Average of 105 reviews completed per month
  – This activity affects all major development & test activities after software design
    • SW Implementation
    • Software Test
    • System Test
    • Ground & Flight Test Support
  – Code review effort constitutes a significant portion of the earned value credit in these phases
• Benefits
  – Increased early defect detection
  – Fewer delivered defects
  – Increased code maintainability, reduced cost on future sustaining programs

[Figure: SW life cycle phase diagram – System Requirements Analysis & Allocation; SW Requirements; SW Design; SW Implementation; SW Integration; SW Test; System Integration & Test; System Integration, Verification & Validation. SW Life Cycle Phase Contribution Factors are Dependent on Project Type & Complexity]

Peer Reviews Have a Significant Impact on Downstream Product Quality and Development Costs

Page 24:

Analytical Approach

• Use accumulated data to explore factors related to reviewer performance and experience

• Use a multivariate clustering procedure (agglomerative hierarchical method) to identify groups of reviewers with similar performance characteristics (initially not known)

• Decide how many groups are logical for the data and classify accordingly

[Figure: Dendrogram with Complete Linkage and Pearson Distance – Similarity (0-100%) by Reviewer]

• Three reviewer performance categories support the needed level of insight
  – Group 1 reviewers have lots of review experience, review at the best rates, & identify the most defects
  – Group 2 reviewers are newer & less experienced (reflected by the number of reviews they have completed), with a wide range of rates and discovered defects
  – Group 3 reviewers have lots of experience, review at fast rates, but identify significantly fewer defects
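The clustering step named above (agglomerative hierarchical clustering with complete linkage and Pearson distance, cut into three groups) can be sketched with SciPy as below; the reviewer feature matrix is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Hypothetical per-reviewer features: reviews completed, SLOC/hour,
# defects found per KSLOC, total SLOC reviewed.
features = rng.normal(size=(30, 4))

# SciPy's "correlation" metric is 1 - Pearson correlation, a common reading
# of "Pearson distance"; complete linkage matches the dendrogram title.
distances = pdist(features, metric="correlation")
tree = linkage(distances, method="complete")

# Cut the tree into three clusters, mirroring the three reviewer categories.
groups = fcluster(tree, t=3, criterion="maxclust")
for g in sorted(set(groups)):
    print(f"Group {g}: {np.count_nonzero(groups == g)} reviewers")
```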

Page 25:

A Serious Sidebar – Measuring Individual Performance

• It violates the peer review process to use numbers of defects to measure the Author’s performance (“killing the goose that lays the golden egg”)
  – Need reviewers free to report any issues they find, even if they are not totally sure that the item is truly a defect
  – Even the very best and most conscientious engineers create defects – the primary objective of the review is to find and remove any defects
• Peer review database design enables study of individual reviewer performance – with the express goal of increasing skills through vital, focused training
  – Good reviewers provide an essential contribution both for the author and the company – reviewer diligence should be encouraged, recognized and rewarded
  – Cumulative data on reviewer performance provides valuable insight – similar to measurements applied in sports

Reviewer Knowledge and Skill Are Key – Knowing What to Look for & How to Find It…

Page 26:

Causal Analysis & Resolution

• After-the-fact analysis by Group 1 reviewers indicated Group 2 and Group 3 reviewers consistently miss defects
• A retrospective study focused on the common types of defects being discovered
• An improvement team identified ways to increase the skills of Group 2/3 reviewers
  – Pair Group 2/3 reviewers with Group 1 mentors
  – Review and update coding standards to clarify descriptions or address missing elements
  – Develop and deliver a technical-level review training course to provide refreshed or deeper insight into ‘problematic’ programming issues
    • ‘Problematic’ programming issues were identified based on team member experience and results from the retrospective study
  – Enhance checklists

[Figure: “All Defects from August 2, 2006 - March 20, 2007” bar chart – Number of Defects (0-350) by Defect Type: comments, logic, requirement, unneeded code, minor, header, problem typo, style guide, initialization, incorrect history, legacy, missing code, missing cases, coding standards/conventions, code merge, error checking, error handling, hardcoded, naming convention, duplicate code, bounds checking, code efficiency, code libraries]

Common Issues Emerge When We Examine Defect Types & Frequencies
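A retrospective defect-type count like the chart above reduces to a simple Pareto ranking; the sketch below uses a handful of hypothetical defect tags drawn from the categories named on the slide.

```python
from collections import Counter

# Hypothetical defect-type tags from peer review records, using a few of the
# categories named on the chart above.
defect_tags = ["comments", "logic", "comments", "style guide", "logic",
               "comments", "requirement", "error handling", "logic", "comments"]

counts = Counter(defect_tags)
total = sum(counts.values())
running = 0
print("defect type        count  cum %")
for tag, n in counts.most_common():
    running += n
    print(f"{tag:<18s} {n:>5d}  {100 * running / total:5.1f}")
```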

Page 27:

Peer Review Effectiveness Metrics

Metrics:
1. Percentage of Group 1 Reviewers – Percent of Total Reviewers
2. Average Detected Defect Density – Defects per Thousand SLOC
3. Average Review Rate – SLOC per Review Hour
4. Log Cost Model – Log(Total Hours per SLOC)

[Figures: SW life cycle diagram (System Requirements Analysis & Allocation through System Integration, Verification & Validation) with the Software Implementation phase decomposed into Code & Update, Build Drivers & Stubs, Create Test Cases, and Peer Review & Test Code, annotated with metrics 1-4; Distribution of Reviewers by Group pie chart (Group 1: 13, 12.3%; Group 2: 43, 40.6%; Group 3: 50, 47.2%); Defect Density vs Reviewer Group interval plot, 95% CI for the Mean (Defects per 1000 SLOC); Relationship Between Detected Defect Density and Review Rate – Effective Range of Review Pace is Between 125 and 350 SLOC per Hour (data reflects cumulative average of reviewer experience since March 2004); Source Code Peer Review Cost Data control chart – ln(Hours per New/Modified LOC) vs. Review Closed Date, September-October 2007]

Note: SW Test & System Test are decomposed similarly
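The four effectiveness metrics listed above can be computed directly from review records; the sketch below uses hypothetical records (reviewer, group, defects found, SLOC reviewed, hours) and only illustrates the definitions, not the authors' database queries.

```python
import math

# Hypothetical review records: (reviewer, group, defects_found, sloc_reviewed, hours).
reviews = [
    ("rev_a", 1, 12,  900, 4.0),
    ("rev_b", 2,  3,  400, 1.5),
    ("rev_c", 3,  1, 1200, 2.0),
    ("rev_a", 1,  8,  700, 3.5),
    ("rev_d", 2,  4,  500, 2.5),
]

# Metric 1: percentage of Group 1 reviewers (percent of total reviewers).
group_of = {name: group for name, group, *_ in reviews}
pct_group1 = 100 * sum(1 for g in group_of.values() if g == 1) / len(group_of)

# Metrics 2-4 aggregate defects, SLOC and hours over the review records.
defects = sum(r[2] for r in reviews)
sloc = sum(r[3] for r in reviews)
hours = sum(r[4] for r in reviews)

print(f"1. Group 1 reviewers:       {pct_group1:.1f}% of reviewers")
print(f"2. Detected defect density: {1000 * defects / sloc:.1f} defects per KSLOC")
print(f"3. Average review rate:     {sloc / hours:.0f} SLOC per review hour")
print(f"4. Log cost model:          {math.log(hours / sloc):.2f} ln(hours per SLOC)")
```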

Page 28:

Forecasting the Outcome

We used four ways to measure changes from the initial March 2007 performance baseline. Demonstrating skill development in review effectiveness does not lend itself to routine control chart monitoring.

• Detected defect density should increase or remain the same [Figure: Defect Density vs Reviewer Group interval plot, 95% CI for the Mean]
• Process cost performance should remain stable [Figure: Source Code Peer Review Cost Data control chart – ln(Hours per New/Modified LOC) vs. Review Closed Date]
• Population should cluster around the ideal review rate of 200 LOC/Hr (industry standard) [Figure: Relationship Between Detected Defect Density and Review Rate – effective range of review pace is between 125 and 350 SLOC per hour]
• Population of Group 1 (Blue) should increase [Figure: Distribution of Reviewers by Group pie chart – Group 1: 13, 12.3%; Group 2: 43, 40.6%; Group 3: 50, 47.2%]

Page 29:

Verifying the Outcome

[Figures: Distribution of Reviewers by Group pie charts – March 2007 (Group 1: 13, 12.3%; Group 2: 43, 40.6%; Group 3: 50, 47.2%) vs. June 2008 (Group 1: 17, 18.9%; Group 2: 31, 34.4%; Group 3: 42, 46.7%); Defect Density vs Reviewer Group interval plots, 95% CI for the Mean – March 2007 baseline vs. updated baseline 6/13/08]

• Overall Discovered Defect Density Increased by 56%
• Group 1 Reviewers Increased by 53%

Page 30:

Checking the Control Variables

[Figures: Source Code Peer Review Cost Data control chart (March 2007) and Engineering Check & Electronic Meeting Peer Review Cost Data control chart (June 2008) – ln(Hours per New/Modified LOC) vs. Review Closed Date; Relationship Between Detected Defect Density and Review Rate scatter plots, March 2007 vs. June 2008 – effective range of review pace is between 125 and 350 SLOC per hour (data reflects cumulative review experience since March 2004, as of 6/13/08)]

• June 2008: Predictable process
• June 2008: Stable review rate – number of reviewers performing in the ideal range increased

Page 31:

Future Pathways

• Maintain strategic focus to sharpen skills
  – Continue to support inexperienced developers with mentoring
  – Maintain periodic skill enhancement training
• Continue the quest to remove impediments (In Progress)
  – Better integrated toolsets
  – Improved coding standards

Bottom Line Motivator: The 2007-2008 Initiative Has Resulted in a 12% Reduction in the Number of Software Bugs per Release

Page 32:

Questions

Richard L. W. Welch, PhD
Northrop Grumman Corporation
(321) [email protected]

Steve D. Tennant
Northrop Grumman Corporation
(321) [email protected]
