
19/04/23 Dr Andy Brooks 1

Lectures 17 & 18

Does Code Decay?

“As part of our experience with the production of software for a large telecommunications system, we have observed a nearly unanimous feeling among developers of the software that the code degrades through time and maintenance becomes increasingly difficult and expensive.”

Eick et al., 1998

MSc Software Maintenance


Case Study

Reference: Does Code Decay? Assessing the Evidence from Change Management Data, Stephen G Eick, Todd L Graves, Alan F Karr, J S Marron, and Audris Mockus, NISS-TR-81 (1998), National Institute of Statistical Sciences, 19 T. W. Alexander Drive, PO Box 14006, Research Triangle Park, NC 27709-4006, USA, http://www.niss.org/technicalreports/tr81.pdf

“Whether this code decay is real, how it can be characterized, and the extent to which it matters are the questions we address in this paper.”

Eick et al., 1998


Previous Work

“Early investigations of aging in large software systems by Belady and Lehman [2], [3], [4] reported the near impossibility of adding new code to an aged system without introducing faults.”

Eick et al., 1998


Access To Large Data Set

• Entire change management history of a 15-year-old, real-time software system for telephone switches:
  – 100,000,000 lines of code
    • C, C++, proprietary state description language
  – 100,000,000 lines of header and make files
  – Some 50 major subsystems and 5,000 modules
    • Here, a module is a directory containing several files.
  – Each release is some 20,000,000 lines of code

• 10,000 developers have been involved.


Categories Of Change

• Adaptive
  – new functionality (e.g. caller ID)
  – adaptations to new hardware or other changes in the environment

• Corrective
  – fixing faults

• Perfective
  – improving the maintainability of the software
    • reengineering (refactoring)


Change Process

• A new feature (e.g. call waiting) involves hundreds of Initial Modification Requests (IMRs).

• Each IMR results in a number of Modification Requests (MRs).

• Developers open MRs, perform the changes and make limited checks that the changes are satisfactory.
  – Inspections, integration tests and system tests follow.

• An editing change to a single file is captured as a delta.
  – Lines added and lines deleted are tracked separately.
  – An edited line is recorded as a deletion of the old line followed by an addition of the new one.
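The sketch below shows one way to count the lines a delta adds and deletes. It compares two versions of a file using Python's standard difflib module and works on whole lines. The function name delta_line_counts is hypothetical. The version management system in the study recorded these counts itself, not through code like this. A changed line shows up as one deletion plus one addition, which matches the rule above.

```python
import difflib


def delta_line_counts(old_lines, new_lines):
    """Return (added, deleted) line counts for one delta (one edit to one file).

    A modified line counts as one deletion followed by one addition.
    """
    added = deleted = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1  # old lines removed
        if tag in ("replace", "insert"):
            added += j2 - j1    # new lines introduced
    return added, deleted


old = ["int x = 1;", "int y = 2;", "return x;"]
new = ["int x = 1;", "int y = 3;", "int z = 4;", "return x;"]
print(delta_line_counts(old, new))  # (2, 1)
```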


Data Tracked By Version Management System

• 89 fields, including priority, date opened and date closed

[Figure: an MR record showing the problem and the solution (change & reasons)]


Answering Questions About Change Data

• D - directly from the version management database
• A - by aggregation over constituent parts
• D* - problematic aspects

Example questions: What files were changed? How many modules, files, and lines were affected? ...


What Is Code Decay?

• “Code is decayed if it is more difficult to change than it used to be.”
  – But an increase in the difficulty of making changes may instead result from an increase in the inherent difficulty of the requested changes.

• Decayed code does not mean that the software fails to meet current requirements.
  – Decayed code means it is difficult to add new functionality or make other changes.


What Is Code Decay?

• Decayed code may have increased value.
  – The changes that have caused the decay mean more functionality for the customer.

• A code unit can decay as a result of changes elsewhere in the software.

• A code unit can be inherently complex, so attributing the difficulty of making a change to decay can be misleading.


Individual Ability

• Making changes is less difficult for a more able software maintainer.

• Making changes is more difficult for a junior software maintainer.

• “A definitive adjustment for developer ability has not been devised and usually we must relegate developer variability to ‘noise’ terms in our models.”


Causes Of Decay

1. Inappropriate architecture
   • changes have wide scope

2. Violation of original design principles
   • e.g. fixed phone -> mobile/fixed phone

3. Imprecise requirements
   • ‘crisp code’ not produced

4. Time Pressure
   • short-cuts, sloppy code, kludges
   • limited code understanding


Causes Of Decay

5. Inadequate programming tools

6. Organizational Environment
   • excessive staff turnover
   • developers fail to communicate properly

7. Programmer variability
   • weak programmers may not understand complex code written by more able colleagues

8. Inadequate change process
   • missing version control
   • inability to handle changes made in parallel


Medical Metaphor

• The software is a patient with a disease called code decay.

• What are the causes of the disease?
  – changes made to the code

• What are the disease symptoms?

• What are the prognoses if you have the disease?

• What are the relevant risk factors for the disease?


Symptoms Of Code Decay

1. Excessively complex code
   • useful metrics:
     – standard software complexity metrics?
     – number of loops and conditionals enclosing a line?

2. A history of frequent changes
   • also known as ‘code churn’

3. A history of faults
   • fault fixes themselves may not be examples of good programming
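One of the metrics suggested above is the number of loops and conditionals enclosing a line. The hypothetical sketch below approximates it. The study dealt with C, C++ and a proprietary language, but for illustration this version runs Python's ast module over Python source. It counts only for, while and if statements.

```python
import ast

# Statement types treated as "loops and conditionals" (an assumption).
CONTROL = (ast.For, ast.AsyncFor, ast.While, ast.If)


def enclosing_control_depth(source):
    """Map each line that starts a statement to the number of loops and
    conditionals enclosing it. An `elif` is parsed as an If nested in the
    previous If's else-branch, so it is counted one level deeper."""
    depths = {}

    def visit(node, depth):
        if isinstance(node, ast.stmt):
            # Preorder visit: the first (shallowest) statement on a line wins.
            depths.setdefault(node.lineno, depth)
        # Everything inside a loop or conditional is one level deeper.
        child_depth = depth + 1 if isinstance(node, CONTROL) else depth
        for child in ast.iter_child_nodes(node):
            visit(child, child_depth)

    visit(ast.parse(source), 0)
    return depths
```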


Symptoms Of Code Decay

4. Widely dispersed changes
   • Changes to well-engineered code tend to be local (within a class).

5. Kludges
   • Changes made knowing they could have been done more elegantly or more efficiently.

6. Numerous Interfaces (entry points)
   • Possible side effects from changes elsewhere.


Risk Factors For Code Decay

Risk factors increase the chance of decay or worsen its effect.

1. Size of module m
   • NCSL(m), the number of noncommentary source lines

2. Age of Code
   • but very stable code might never be changed
   • variability of age within a code unit may be the key characteristic

3. Inherent Complexity
   • real-time software is more likely to decay

4. Organizational Churn
   • company knowledge base degraded
   • inexperienced developers make changes

Risk Factors For Code Decay

5. Ported or Reused Code
   • The Ariane 5 crash was caused by code reused from Ariane 4.
   • http://edition.cnn.com/WORLD/9606/04/rocket.explode/

6. Requirements Load
   • a very large number of requirements is difficult to understand and implement

7. Inexperienced Developers
   • lack of knowledge
   • lack of understanding of the system architecture (3-tier?)


Code Decay Indices (CDIs) notation

• c for changes (MRs)
• l for lines of code
• f for files
• m for modules
• c → m means ‘c touches m’
  – Part of m is changed by c.
• 1{A}
  – equals 1 if event A occurs
  – equals 0 otherwise


Code Decay Indices (CDIs) notation

• DELTAS(c) – number of deltas associated with c
• ADD(c) – number of lines added by c
• DEL(c) – number of lines deleted by c
• DATE(c) – date on which c is completed
• INT(c) – the calendar time required to implement c
• DEV(c) – number of developers implementing c
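The change-level indices can be built by aggregation (the "A" category) from per-delta records. The sketch below uses a hypothetical record layout, the Delta class and its fields. It makes two assumptions the source does not confirm: DATE(c) is taken to be the MR's date closed, and INT(c) is taken to be the calendar days between date opened and date closed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Delta:
    """One edit to one file (hypothetical record layout)."""
    mr: str          # the change c (MR) this delta belongs to
    file: str
    developer: str
    added: int       # lines added by this delta
    deleted: int     # lines deleted by this delta


def change_indices(deltas, mr_dates: dict[str, tuple[date, date]]):
    """Aggregate per-delta records into DELTAS(c), ADD(c), DEL(c), DATE(c),
    INT(c) and DEV(c). `mr_dates` maps each MR to (date opened, date closed)."""
    grouped = {}
    for d in deltas:
        grouped.setdefault(d.mr, []).append(d)
    indices = {}
    for mr, ds in grouped.items():
        opened, closed = mr_dates[mr]
        indices[mr] = {
            "DELTAS": len(ds),
            "ADD": sum(d.added for d in ds),
            "DEL": sum(d.deleted for d in ds),
            "DATE": closed,                         # assumed: completion = date closed
            "INT": (closed - opened).days,          # assumed: calendar days open
            "DEV": len({d.developer for d in ds}),  # distinct developers
        }
    return indices
```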


Historical Count Of Changes

• The number of changes to a module m in the time interval I:

$$CHNG(m, I) = \sum_{c \to m} 1\{DATE(c) \in I\}$$

• With |I| indicating length of time interval I, the frequency of changes is:

$$FREQ(m, I) = \frac{1}{|I|}\, CHNG(m, I)$$
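Here is a minimal sketch of CHNG(m, I) and FREQ(m, I). It assumes three things: the change dates for module m are already known, I is a half-open date interval [start, end), and |I| is measured in days. The function names are hypothetical.

```python
from datetime import date


def chng(change_dates, start, end):
    """CHNG(m, I): changes touching m whose DATE(c) falls in I = [start, end).

    `change_dates` holds DATE(c) for every change c with c -> m.
    """
    # Sum of the indicator 1{DATE(c) in I} over the changes touching m.
    return sum(1 for d in change_dates if start <= d < end)


def freq(change_dates, start, end):
    """FREQ(m, I) = CHNG(m, I) / |I|, with |I| in days (an assumption)."""
    return chng(change_dates, start, end) / (end - start).days


dates_m = [date(1995, 1, 5), date(1995, 6, 1), date(1996, 2, 1)]
print(chng(dates_m, date(1995, 1, 1), date(1996, 1, 1)))  # 2
```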


Span Of Changes (Scope Of Changes)

• The span is the number of files touched by a change:

$$FILES(c) = \sum_{f} 1\{c \to f\}$$

• Changes touching more files are more difficult because:
  i. The maintainer might have to spend time understanding unfamiliar files.
  ii. Code interfaces might have to be modified.
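FILES(c) counts distinct files, so several deltas to the same file count only once. The one-function sketch below assumes the file name of each delta in c is available. The name files_touched is hypothetical.

```python
def files_touched(delta_files):
    """FILES(c): the number of distinct files f with c -> f.

    `delta_files` lists the file edited by each delta of change c. A file
    edited by several deltas appears several times but is counted once.
    """
    return len(set(delta_files))


print(files_touched(["call.c", "call.c", "route.h"]))  # 2
```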


Size

• The size of a module m, NCSL(m), is the sum of NCSL(f) over all files f in m.

• “most standard software complexity metrics are almost perfectly correlated with NCSL in our data sets”
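Computing NCSL means discarding comments and blank lines. The sketch below is a simplified, hypothetical counter for C/C++-style source. It does not handle comment markers inside string literals or inside // comments, so it cannot replace a proper lexer.

```python
import re


def ncsl(source):
    """Count noncommentary, non-blank source lines in C/C++-style text."""
    # Remove /* ... */ comments but keep their newlines so line structure survives.
    text = re.sub(r"/\*.*?\*/", lambda m: "\n" * m.group(0).count("\n"),
                  source, flags=re.DOTALL)
    count = 0
    for line in text.splitlines():
        code = line.split("//", 1)[0]  # drop a trailing // comment
        if code.strip():
            count += 1
    return count


def ncsl_module(files):
    """NCSL(m): sum of NCSL(f) over the files f in module m (name -> source)."""
    return sum(ncsl(src) for src in files.values())
```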

19/04/23 Dr Andy Brooks 24

Age

• AGE(m)
  – the average age of constituent lines

• Variability in line ages is also of interest.

• The tool SeeSoft produces a visualization of the variability in line ages:
  – files are represented by boxes
  – the lengths of lines in the boxes are proportional to the number of characters
  – files that change little have mostly a single colour
  – files that have been changed a lot are multi-coloured
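Suppose we know the date on which each surviving line of a module was last written, for example from per-line history. Then AGE(m) and the spread of line ages could be computed as below. Two choices here are assumptions of the sketch: ages are measured in days, and variability is measured by the population standard deviation. The function name is hypothetical.

```python
import statistics
from datetime import date


def age_stats(line_dates, as_of):
    """Return (AGE(m), spread) for module m as of `as_of`.

    `line_dates` holds the date each surviving line was last written. An
    edited line is re-added, so its age restarts at the edit.
    """
    ages = [(as_of - d).days for d in line_dates]
    # Mean age in days, plus population standard deviation as the spread.
    return statistics.mean(ages), statistics.pstdev(ages)
```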


SeeSoft View Of One Module


Subsystem Under Analysis

• 100 modules

• 2,500 files

• 6,000 IMRs

• 27,000 MRs

• 130,000 deltas

• 500 different login names made code changes to the subsystem

X 100


Temporal Behavior Of The Span Of Changes

• The probability that a change will touch more than one file more than doubles, from less than 2% in 1989 to more than 5% in 1996.

• Ripples in the high-resolution smooth are not statistically significant.

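[Figure: smoothed probability that a change touches more than one file, plotted against date from 1989 to 1996 using smooths of different window widths; the period of initial development is marked]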
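The proportion described above can be approximated year by year from change records. This hypothetical sketch computes the raw yearly share of changes with FILES(c) > 1. It does not reproduce the smoothing with different window widths used in the figure.

```python
from collections import defaultdict


def multi_file_fraction_by_year(changes):
    """For each year, the fraction of changes with FILES(c) > 1.

    `changes` is an iterable of (year, files) pairs, where `files` lists the
    file edited by each delta of one change.
    """
    total = defaultdict(int)
    multi = defaultdict(int)
    for year, files in changes:
        total[year] += 1
        if len(set(files)) > 1:  # FILES(c) > 1
            multi[year] += 1
    return {year: multi[year] / total[year] for year in sorted(total)}
```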


Breakdown In Modularity?

• Alone, the increase in span of changes does not imply a breakdown in the modularity of the subsystem.

• The increase could simply reflect the growth of the subsystem and changes with a wide span need not cross module boundaries.
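To check whether wide-span changes actually cross module boundaries, we can map each file to its module and count the changes that touch more than one module. In this system a module is a directory, so the sketch takes the directory part of each file path as its module. The function names and example paths are hypothetical.

```python
import posixpath


def module_of(path):
    """Module = the directory containing the file, as defined for this system."""
    return posixpath.dirname(path)


def cross_module_fraction(changes):
    """Fraction of changes whose files lie in more than one module.

    `changes` is a list of file-path lists, one list per change.
    """
    if not changes:
        return 0.0
    crossing = sum(1 for files in changes
                   if len({module_of(f) for f in files}) > 1)
    return crossing / len(changes)
```

In the test data below, the first change spans two files but stays within one module. This is exactly the case the bullet above warns about.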



Network Visualization Tool NicheWorks

• Each tadpole shape corresponds to a module.
  – The tadpole tail indicates the module's position at the end of the previous year.

• Pairs of modules are placed nearby if they have been changed together as part of the same MRs a large number of times.
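NicheWorks places modules close together when the same MRs often change them. The layout algorithm is not reproduced here. This hypothetical sketch only computes the pairwise co-change counts that such a layout could use as edge weights.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(changes):
    """Count, for each unordered pair of modules, how many changes (MRs)
    touched both. `changes` is a list of module-name collections."""
    pairs = Counter()
    for modules in changes:
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(set(modules)), 2):
            pairs[(a, b)] += 1
    return pairs
```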


NicheWorks View Of The Subsystem Modules

[Figure: NicheWorks views of the subsystem modules in 1988, 1989 and 1996]

The architecture that separated the functionality of two clusters of modules is breaking down.


Alternative Interpretation

The inherent difficulty of the desired changes could have been increasing.

The modification request data were not examined independently from this perspective.

Example changes: provide an extra area-code digit; implement caller ID.


Prediction Of Faults (Quality Prognosis)

• The best model derived from the data predicts numbers of faults using numbers of changes to the module in the past.

• Large recent changes add most to the fault potential.

• Parameter 0.75 was determined by statistical analysis.

• The number of times a module has been changed is a better predictor than size.

• The number of developers working on a module had no effect on fault potential.

Weighted time damped model:

$$FP(m) = \sum_{c \to m} e^{-0.75\,(t - DATE(c))} \log\big(ADD(c,m) + DEL(c,m)\big)$$

where t is the date at which faults are predicted, and ADD(c, m) and DEL(c, m) are the numbers of lines added and deleted by c in module m.
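Suppose the fault model has the weighted time damped form FP(m) = Σ_{c→m} e^{−0.75(t − DATE(c))} log(ADD(c,m) + DEL(c,m)). It could then be evaluated as below. The sketch makes two assumptions. First, dates are measured in years (for example 1995.5). Second, every change adds or deletes at least one line of m, so that the logarithm is defined. The name fault_potential is hypothetical.

```python
import math


def fault_potential(changes, now, decay=0.75):
    """FP(m) = sum over c -> m of exp(-decay * (now - DATE(c))) * log(ADD + DEL).

    `changes` holds (DATE(c) in years, ADD(c, m), DEL(c, m)) for each change c
    touching module m. Each change must add or delete at least one line.
    """
    total = 0.0
    for when, added, deleted in changes:
        weight = math.exp(-decay * (now - when))      # recent changes weigh more
        total += weight * math.log(added + deleted)   # larger changes weigh more
    return total
```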


Prediction Of Effort (Effort Prognosis)

• “Can the effort required to implement changes be predicted from symptoms and risk factors for decay?”

• Effort data, available only at the feature level, displayed extreme variability, so the results are only suggestive:
  – A dependency on FILES(c) was found, supporting the idea that the span of changes is a symptom of decay.

• Some changes involved a small number of deltas but required close to maximum effort.


Summary (Eick et al.)

Four analyses demonstrate:

1. “The increase over time in the number of files touched per change to the code.

2. The decline in modularity of a subsystem of the code, as measured by changes touching multiple modules.

3. Contributions of several factors (notably, frequency and recency of change) to fault rates in modules of code, and

4. That span and size of changes are important predictors (at the feature level) of the effort to implement a change.”


Summary (Eick et al.)

• The system studied showed no evidence of dramatic, widespread decay:
  – In seven years, the probability of a change touching more than one file increased only from 2% to 5%.
  – However, the architecture that separated the functionality of two clusters of modules is breaking down.

• Can code decay prove fatal?
  – “there are anecdotal reports of systems that have reached a state from which further change is not possible”


Modification Request Difficulty

• The nature of the modification requests over time was not analysed, so alternative interpretations of the data set cannot be ruled out.

• How can you measure the inherent difficulty of a modification request?
  – By the span of changes?
  – By the complexity of the textual description & justification?

• The temporal behaviour of the span of changes could be due to the inherent difficulty of modification requests increasing with time.