Computing ripple effect for software maintenance

JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE
J. Softw. Maint. Evol.: Res. Pract. 2001; 13:263–279 (DOI: 10.1002/smr.233)

Research


Sue Black∗,†

Centre for Systems and Software Engineering, South Bank University, London SE1 0AA, U.K.

SUMMARY

Recent software maintenance models have included impact analysis and accounting for ripple effect as one of their stages. This paper describes and explains the reformulation of Yau and Collofello’s ripple-effect algorithm and its validity within the software-maintenance process. Completely automatic computation of ripple effect has until now proved troublesome; we show how our approximation algorithm helps to overcome this. Our Ripple Effect and Stability Tool (REST), which uses our approximated algorithm to compute ripple effect for C programs, is described. Eleven C programs are used in an initial investigation into whether our approximated algorithm can replace Yau and Collofello’s original algorithm for the purpose of automatic computation of ripple effect. The Pearson correlation coefficient for the two versions of the algorithm across the eleven programs shows a high correlation. Copyright 2001 John Wiley & Sons, Ltd.

KEY WORDS: ripple effect; matrix arithmetic; impact analysis

1. INTRODUCTION

The ripple-effect measure has been identified as valid and necessary within several software maintenance models, particularly the SADT model [1] and the Methodology for Software Maintenance [2]. Typically 70% [3] of software development budgets are spent on maintenance. Any measures or tools which can assist maintainers in their role by speeding up the rate at which changes can be made, or by enabling maintainers to make better-informed decisions on code changes, can thus make an important contribution.

∗Correspondence to: Sue Black, Centre for Systems and Software Engineering, School of Computing, Information Systems and Mathematics, South Bank University, 103 Borough Road, London SE1 0AA, U.K.
†E-mail: [email protected]

Contract/grant sponsor: British Telecommunications Laboratories, Martlesham Heath, U.K.

Received 31 August 2000; Revised 4 June 2001

Software maintenance was originally classified by Swanson in 1976 into three types [4]:

• corrective maintenance: to address processing, performance or implementation failure;
• adaptive maintenance: to address change in the data or processing environments; and
• perfective maintenance: to address processing efficiency, performance enhancement and maintainability.

The classification was redefined by the IEEE glossary [5] in 1990 to include

• preventive maintenance: to address activities aimed at increasing the system’s maintainability.

The IEEE redefinition causes some confusion because maintainability is included under both preventive and perfective maintenance. A discussion on the exact definition of preventive maintenance is given in [6]. In [7] a software maintenance ontology is proposed: a specification of the conceptualization of software maintenance. It changes the focus from maintenance requirements to maintenance activities. Maintenance types are divided into corrections that correct a defect, and enhancements that implement a change to the system which changes the behaviour or implementation of the system. Enhancements can be further subdivided into those that change existing requirements (perfective), those that add new system requirements (adaptive), and those that change the implementation but not the requirements (preventive). Chapin et al. [8] give a detailed analysis and clarification of all the definitions used in software maintenance. They propose an evidence-based classification which divides software maintenance into four clusters: support interface, documentation, software properties and business rules. These clusters are then subdivided into the types of maintenance activity associated with each cluster; for example, groomative, preventive, performance and adaptive are types of maintenance in the software properties cluster.

As all types of maintenance involve making changes to source code, ripple effect can be used to help maintainers with all the above types of maintenance by highlighting modules which may cause problems during the maintenance process. Ripple effect can show the maintainer how great the effect of a change will be on the rest of the program or system. It can highlight modules with high ripple effect as possible problem modules, which may be especially useful in preventive maintenance. It can show the impact in terms of increased ripple effect during perfective and adaptive maintenance, where the functionality of a program is being modified or its environment has changed. During corrective maintenance it may be helpful to look at the ripple effect of the changed program and its modules before and after a change in order to ascertain whether the change has increased, or perhaps decreased, the stability of the program. Maintenance is difficult [9] because it is not clear where modifications have to be made, or what the impact will be on the rest of the source code once those changes are made; the ripple effect can certainly be used to help maintainers with the latter. Ripple effect, along with many other metrics, is not the answer to all maintainers’ problems, but used as part of a suite of metrics it can give maintainers useful information to make their task easier.

There is a strong link between software maintenance and ripple effect. Computation of ripple effect and logical stability of a module are based on a subset of maintenance activity: a change to a single variable definition within a module [2]. Regardless of the complexity of the maintenance activity being performed, maintenance fundamentally consists of modifications to variables within modules of code. Logical stability is computed based on the impact of these modifications. It can be used to predict the impact of primitive modifications on a program, and so be used to compute the logical stability of modules with respect to those primitive modifications. The effect of modification may not be local to the module but may affect other parts of the program; that is, there is a ripple effect from the location of the modification to other parts of the program that are affected by the modification. If the stability of a program is poor, then the impact of any modification is large, hence the maintenance cost will be high, and reliability may also suffer.

Figure 1. A methodology for software maintenance [2]. (Determining maintenance objectives; Phase 1: understanding program; Phase 2: generating maintenance proposals; Phase 3: accounting for ripple effect; Phase 4: revalidation.)

Several software maintenance models have been proposed in the past. Boehm’s model [10] consists of three major phases:

• understanding the software;
• modifying the software;
• revalidating the software.

These are fundamental activities of the software maintenance process. With Yau’s model, A Methodology for Software Maintenance [2], impact analysis is introduced into the lifecycle. The model consists of four phases, and includes analysis and monitoring of the impact of change at phase three, accounting for ripple effect (see Figure 1). The aims of the model are to assist in achieving cost-effective software maintenance and the development of easily maintainable software. Phase one is analysing the program in order to understand it. The complexity of the program, the documentation and the self-descriptiveness of the program contribute to the ease of understanding of the program.


Figure 2. SADT diagram of software maintenance activities. (Activities: manage software maintenance; analyse software change impact; understand software under change; implement maintenance change; account for ripple effect; (re)test affected software. Inputs: change request and existing system. Output: new system.)

Phase two is generating a particular maintenance proposal to accomplish the implementation of the maintenance objective. The third phase is accounting for all of the ripple effect as a consequence of program modifications. The effect may not be local to the modification but may also affect other portions of the program. The main attribute affecting the ripple effect as a consequence of a program modification is the stability of a program, that is, the resistance to the amplification of changes in a program. In the fourth phase the modified program is tested to ensure that it has at least the same reliability as before.

The Pfleeger and Bohner model, SADT Diagram of Software Maintenance Activities [1] (see Figure 2), has six phases, the main difference from Yau’s model being that it includes ‘analyse software change impact’ at phase two, i.e. much earlier in the lifecycle. The feedback paths in the SADT model indicate attributes that must be measured; the results are then assessed by management before the next activity is undertaken. The metrics act as a controlling mechanism in the progression from existing system and change requests to new system. ‘Manage software maintenance’ controls the sequence of activities by receiving feedback and determining the next appropriate action. ‘Analyse software change impact’ evaluates the effects of a proposed change; it determines whether the change can be made without perturbing the rest of the software. ‘Understand software under change’ involves analysis of the source code and related material, e.g. documentation, to understand the system and the proposed change. ‘Implement maintenance change’ generates the proposed change. ‘Account for ripple effect’ analyses the propagation of changes to other modules as a result of the change just implemented. In the last phase, ‘(re)test affected software’, the modifications are tested to meet new requirements and the overall system is subject to regression testing.



The aim of this paper is to describe and explain our reformulation of Yau and Collofello’s ripple-effect algorithm and its validity within the software maintenance process. To increase the speed of computation our approximated algorithm does not take account of control flow within modules. Several attempts have been made to produce fast, accurate ripple-effect measures automatically, but completely automatic computation has proved troublesome; we show how our approximation algorithm helps to overcome this. Our Ripple Effect and Stability Tool (REST), which uses our approximated algorithm to compute ripple effect for C programs, is described. Eleven C programs are used in an initial investigation into whether our approximated algorithm can replace Yau and Collofello’s original algorithm for the purpose of automatic computation of ripple effect. In our effort to automatically produce a ripple-effect measure which is as close as possible to Yau and Collofello’s, we have used four versions of the McCabe complexity factor used in the original algorithm: Control, Original, Loops and Branches. By introducing several variations of McCabe’s complexity measure we aim to show that we can counteract the effect that our approximation will have on the final ripple-effect measure. Yau and Collofello’s ripple effect and our reformulated ripple effect are computed for eleven programs. The Pearson correlation coefficient for the Yau and Collofello algorithm and our algorithm using the Original version of the McCabe complexity factor shows a high correlation.

In this section an introduction to ripple effect and logical stability research has been given along with an overview of the part that ripple effect plays in the software maintenance lifecycle. The rest of this paper is organized as follows: the work of Yau and Collofello which underpins this research is given in Section 2. Section 3 looks at ripple effect in detail; intramodule and intermodule change propagation are introduced and explained, along with an explanation of the recasting of Yau and Collofello’s ripple effect as a matrix problem. Section 4 discusses previous attempts to produce ripple effect automatically, and introduces REST. In Section 4 we also explain our approximated algorithm for computing ripple effect. Our recent research results are described in Section 5. We show the correlation of results gained using several versions of our reformulation with Yau and Collofello’s measure. McCabe’s cyclomatic complexity measure is also discussed as it forms part of both the original and the reformulated algorithms. Section 6 concludes the paper by summarizing the results, their implications and the limitations of our research.

2. BACKGROUND

The term ripple effect was first used in a paper by Haney [11] to describe the way that a change in one module would necessitate a change in any other module. He used a technique called ‘module connection analysis’ which applied matrix algebra to estimate the total number of changes needed to stabilize a system. Myers [12] used matrices to quantify module independence. A complete dependence matrix was formulated describing dependencies between modules within a system and then used to predict the stability of the system. Soong [13] used the joint probability of connection of all elements within a system to produce a program stability measure. All of the aforementioned methods use matrices to measure the probability of a change to a variable or module affecting another variable or module. Yau and Collofello’s ripple effect uses ideas from Haney, Myers and Soong’s work, but their ripple-effect measure is not a measure of probability.

When Yau et al. first proposed their ripple-effect analysis technique in 1978 [14] they saw it as a complexity measure which could be used during software maintenance to evaluate and compare various program modifications to source code. Computation of ripple effect involved using error-flow analysis where all program variable definitions involved in an initial modification represented primary error sources from which inconsistency could propagate to other program areas. Identification of affected areas could then be made by internally tracking each primary error source and its respective secondary error sources within the module to a point of exit. At each point of exit a determination was made as to which error sources propagated across module boundaries. Those that did became primary error sources within the relevant modules. Propagation continued until no new error sources were created.
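The propagate-until-stable idea in the paragraph above can be sketched as a worklist loop. This is only an illustrative simplification, not Yau et al.'s published algorithm: the function name `propagate_error_sources`, the dictionary encoding of intramodule and intermodule effects, and the two-module example are all assumptions made for the sketch.

```python
from collections import deque

def propagate_error_sources(intra, inter, primary):
    """Return every (module, variable) pair reached from the primary error sources.

    intra maps a (module, variable) pair to the pairs it affects inside its own module;
    inter maps a pair to the pairs it affects across a module boundary.
    Both encodings are hypothetical, chosen only for this sketch.
    """
    affected = set(primary)
    worklist = deque(primary)
    while worklist:  # stops when no new error sources are created
        source = worklist.popleft()
        for target in intra.get(source, []) + inter.get(source, []):
            if target not in affected:
                affected.add(target)      # a new (secondary or primary) error source
                worklist.append(target)
    return affected

# Hypothetical example: a change to d in m1 reaches a in m1, which crosses into m2.
intra = {("m1", "d"): [("m1", "a")]}
inter = {("m1", "a"): [("m2", "a")]}
reached = propagate_error_sources(intra, inter, [("m1", "d")])
print(sorted(reached))
```

Each pair is queued at most once, so the loop terminates even when the propagation relations contain cycles.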

An algorithm for computing design stability was presented in [15] which facilitated computation of stability based solely on design information. It was proposed that a design stability measure would be more useful than previous stability measures because it could be used at a much earlier stage in the software lifecycle, before any code was produced, thus potentially saving time and money. Some of the detailed algorithms used in the computation of logical ripple-effect analysis throughout the software lifecycle are given in [16]. The software maintenance process is further developed in [17] which gives a methodology for software maintenance using graph-rewriting rules. In [18] a method is proposed that identifies the side-effects that can be introduced to modules from system modification. The authors classify the relationships existing between components into both potential and actual relationships, the latter being a subset of the former.

In [2] a software maintenance process is identified which includes accounting for the ripple effect as one of its phases (see Figure 1). A logical stability measure is proposed which uses a measure of the ripple effect in a module to predict the stability of a module or program: the resistance to the potential ripple effect that a program would have if modified. If additional information is available concerning the type of maintenance to be performed, and therefore the modules most likely to be modified, the probabilities of each module being used can be taken into account and the measures adjusted accordingly. Taken on its own the logical stability measure could be misleading: a large program with only one module will have no ripple effect between modules but maintainability will probably be poor. Used in conjunction with other measures the logical stability measure can be used to compare alternative versions of a program or to locate modules with poor stability, perhaps with a view to reengineering. In general, the smaller the ripple effect, or the greater the logical stability measure, the more stable the program.

Computing the ripple effect for a small program manually is possible, if tedious, but for large programs it is completely infeasible. It is therefore desirable to automate this process to some extent. Even when automated, computation of ripple effect can be time consuming. Yau and Chang [19] give an example of a two-thousand-line Pascal program’s stability measure taking thirteen hours of CPU time to compute. They also present an algorithm which can compute ripple effect faster than this but which treats modules as black boxes, thus not taking into account information from inside modules of code.

We have taken Yau and Collofello’s algorithm and reformulated it using matrix arithmetic. Our aim is to exploit the clarity imparted by this mathematical formulation. In particular, it highlights opportunities for approximation which significantly simplify the computation.

3. RIPPLE EFFECT AND LOGICAL STABILITY

This section is an explanation of the recasting of Yau and Collofello’s ripple-effect metric as a matrix problem. Their original formulation is at times difficult to understand; its recasting as a matrix problem has clarified the actual operations involved, thus making automation more straightforward.



    Module m1:
        1. a = d;
        2. d = a;
        3. return d;

    Module m3:
        x = m1();

Figure 3. Intramodule and intermodule change propagation. (Intramodule propagation occurs between the variable occurrences inside m1; intermodule propagation occurs from the value returned by m1 to x in m3.)

The computation of ripple effect is based on the effect that a change to the value of a single variable will have on the rest of a program. Matrices $V_m$ and $Z_m$ are concerned with representing intramodule change propagation, that is, propagation from one variable to another within a module. Given the piece of code in Figure 3, a change to the value of d in line 1 will affect the value of a in line 1, which will subsequently propagate to a in line 2. The value of a will then propagate to d in line 2 and then to d in line 3.

Matrix $V_m$ represents the starting points for intramodule change propagation through a module. These starting points are called ‘definitions’ by Yau and Collofello and can be any of the following:

(1) the variable is defined in an assignment statement;
(2) the variable is assigned a value which is read as input;
(3) the variable is an input parameter to module m;
(4) the variable is an output parameter from a called module; or
(5) the variable is a global variable.

Each variable definition is uniquely defined in $V_m$; so, if the same variable is defined twice within a module, then $V_m$ contains a unique entry for each definition.

In matrix $V_m$ variable occurrences which satisfy any of the above conditions are denoted by ‘1’ and those which do not by ‘0’. Matrix $V_m$ for the code in our example (where a is global) is therefore

$$
V_m =
\begin{array}{ccccc}
a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
1 & 0 & 1 & 1 & 0
\end{array}
\tag{1}
$$

The notation $x_i^d$ (respectively, $x_i^u$) denotes a definition (respectively, use) of variable x at line i. For example, $a_1^d$ means variable a is defined in line 1 and $d_2^u$ means variable d is used in line 2. Note that $a_2^u$ is considered a definition because it is global. A 0–1 matrix $Z_m$ can be produced to show which variables’ values will propagate to other variables within module m. The rows and columns of $Z_m$ represent each individual occurrence of a variable. Thus for the code in Figure 3 we get the matrix

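$$
Z_m =
\begin{array}{c|ccccc}
 & a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
a_1^d & 1 & 0 & 1 & 1 & 1 \\
d_1^u & 1 & 1 & 1 & 1 & 1 \\
d_2^d & 0 & 0 & 1 & 0 & 1 \\
a_2^u & 0 & 0 & 1 & 1 & 1 \\
d_3^u & 0 & 0 & 0 & 0 & 1
\end{array}
\tag{2}
$$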

We observe that $Z_m$ is reflexive and transitive; every variable occurrence is assumed to propagate to itself, and if $v_1$ propagates to $v_2$ and $v_2$ propagates to $v_3$ then $v_1$ also propagates to $v_3$. In graph theory terms we conclude that $Z_m$ represents the reachability matrix of some graph, an idea which we pursue in some detail in Section 4.
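The two properties can be checked mechanically on the example matrix. The following is a small sketch that encodes the matrix of Equation (2) as a list of rows in the order $a_1^d, d_1^u, d_2^d, a_2^u, d_3^u$; the helper names `is_reflexive` and `is_transitive` are introduced here for illustration only.

```python
# Intramodule propagation matrix Z_m from Equation (2); rows and columns are
# ordered a1d, d1u, d2d, a2u, d3u.
Z = [
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]

def is_reflexive(m):
    """Every occurrence propagates to itself: the diagonal is all ones."""
    return all(m[i][i] == 1 for i in range(len(m)))

def is_transitive(m):
    """If i propagates to j and j propagates to k, then i propagates to k."""
    n = len(m)
    return all(
        m[i][k] == 1
        for i in range(n)
        for j in range(n)
        for k in range(n)
        if m[i][j] == 1 and m[j][k] == 1
    )

print(is_reflexive(Z), is_transitive(Z))
```

Both checks succeed for this matrix, which is what allows it to be read as a reachability matrix.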

Propagation from one module to another is called intermodule change propagation. A change to a variable can propagate to other modules if the variable occurrence is

(1) a global variable,
(2) an input parameter to a called module, or
(3) an output parameter of module m.

Looking at the code in Figure 3 we can see that d clearly propagates to any module calling m1. If a is global then its occurrence could cause propagation to any other modules. Suppose that the above code constituting module m1 is called by a module m3, that a is global and that modules m2 and m3 use a. We can represent the propagation of these variables using an $n \times 3$ matrix $X_m$,

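$$
X_m =
\begin{array}{c|ccc}
 & m_1 & m_2 & m_3 \\ \hline
a_1^d & 0 & 1 & 1 \\
d_1^u & 0 & 0 & 0 \\
d_2^d & 0 & 0 & 0 \\
a_2^u & 0 & 0 & 0 \\
d_3^u & 0 & 0 & 1
\end{array}
\tag{3}
$$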

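We now observe that the intermodule change propagation of all variable occurrences in m is the boolean product of $Z_m$ and $X_m$:

$$
ZX_m = Z_m X_m =
\begin{pmatrix}
1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{pmatrix}
\tag{4}
$$

The product of $V_m$ and $ZX_m$, taken with ordinary integer arithmetic so that propagations are counted, shows how many variable definitions may propagate to each module from module m:

$$
VZX_m = V_m \, ZX_m =
\begin{pmatrix} 1 & 0 & 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 3 \end{pmatrix}
\tag{5}
$$

In this instance we can see from matrix $VZX_m$ that there are 0 propagations to module m1, 1 to module m2 and 3 to m3.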
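The two products can be reproduced in a few lines. This sketch encodes the matrices of Equations (1) to (3) as nested lists; the helper names `bool_product` and `count_product` are assumptions made for illustration and are not part of REST.

```python
V = [1, 0, 1, 1, 0]                      # Equation (1)
Z = [                                    # Equation (2)
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
X = [                                    # Equation (3); columns m1, m2, m3
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]

def bool_product(a, b):
    """Boolean matrix product: entry (i, j) is 1 if a[i][k] and b[k][j] are both 1 for some k."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b)))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def count_product(v, m):
    """Row vector times matrix with ordinary integer arithmetic, so entries are counts."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

ZX = bool_product(Z, X)
VZX = count_product(V, ZX)
print(VZX)  # [0, 1, 3]
```

Only the rows of `ZX` selected by a 1 in `V` contribute to `VZX`, so occurrences that are not definitions have no influence on the counts.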

A complexity measure is factored into the computation by Yau and Collofello so that the complexity of modification of a variable definition is taken into account. Matrix C, a column matrix with one entry per module, represents McCabe’s cyclomatic complexity [20] for the modules in our code (the values for m2 and m3 have been chosen at random):

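$$
C =
\begin{array}{c|c}
m_1 & 1 \\
m_2 & 1 \\
m_3 & 1
\end{array}
\tag{6}
$$

The product of $VZX_m$ and C is

$$
\begin{pmatrix} 0 & 1 & 3 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
= 4
\tag{7}
$$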

This number represents the complexity-weighted total variable definition propagation for module m1. Dividing by the number of variable definitions in module m1, $|V_{m_1}|$, we get the mean complexity-weighted variable definition propagation per variable definition in module m1. In our example $|V_{m_1}| = 3$, and the logical ripple effect for module m1 is defined to be $4/3 = 1.33$. The logical stability measure for module m1 is defined to be its reciprocal, i.e. $3/4 = 0.75$. The formula for computing the Logical Ripple Effect for a Program (LREP) is

$$
\mathrm{LREP} = \frac{1}{n} \sum_{m=1}^{n} \frac{V_m \cdot Z_m \cdot X_m \cdot C}{|V_m|}
\tag{8}
$$

where m is the module and n equals the number of modules.
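As a worked check of Equation (8), the sketch below computes the ripple effect and logical stability of module m1 from the example matrices. The function names `module_ripple_effect` and `program_ripple_effect` are hypothetical, and the program-level function simply averages whatever per-module matrices it is given.

```python
def bool_product(a, b):
    """Boolean matrix product: entry (i, j) is 1 if a[i][k] and b[k][j] are both 1 for some k."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b)))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def module_ripple_effect(v, z, x, c):
    """Complexity-weighted propagation per variable definition for one module."""
    zx = bool_product(z, x)
    # Count, for each target module, the definitions in v that propagate to it.
    vzx = [sum(v[i] * zx[i][j] for i in range(len(v))) for j in range(len(c))]
    weighted = sum(count * complexity for count, complexity in zip(vzx, c))
    return weighted / sum(v)             # sum(v) is |V_m|, the number of definitions

def program_ripple_effect(modules, c):
    """LREP: the mean of the per-module values; modules is a list of (v, z, x) triples."""
    return sum(module_ripple_effect(v, z, x, c) for v, z, x in modules) / len(modules)

V = [1, 0, 1, 1, 0]
Z = [
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
X = [[0, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1]]
C = [1, 1, 1]

ripple_m1 = module_ripple_effect(V, Z, X, C)   # 4/3, about 1.33
stability_m1 = 1 / ripple_m1                   # about 0.75
print(round(ripple_m1, 2), round(stability_m1, 2))
```

A full LREP value would need the corresponding matrices for m2 and m3, which the example does not give, so only the m1 term is evaluated here.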

4. COMPUTING RIPPLE EFFECT AUTOMATICALLY

Yau and Chang [19] found that techniques for performing ripple-effect analysis were taking too much computation time to be practicable for large programs. They therefore presented a new algorithm which they put forward as being much faster at computing logical stability than previous versions. They compared the processor time taken to compute logical stability for six programs by their algorithm and by another version due to Hsieh [21]. Yau and Chang’s algorithm does not include information from the intramodule phase as they felt that disregarding this information simplified the problem and also simulated the environment of the program design phase. The logical stability and the speed of its calculation for Pascal programs between 684 and 1744 lines long were compared. Program 1 was 684 lines long with 21 procedures; it had almost no parameters and all its interprocedural information changes were made by global variables. Program 6 was 1115 lines long, including 31 procedures. Both algorithms found program 1 to be very unstable, and the correlation between the two algorithms’ results for this program was high (0.94). The correlation between the logical stability measures for program 6 was low (0.16). It could be concluded from this that the new algorithm’s results only correlated well with the old algorithm’s results when the program in question was very unstable and had a lot of global variables. This could be a problem with the rejection of intramodular information when calculating logical stability. Yau and Chang’s algorithm only looks at the information flow between global variables as there are no local variables at the design stage. They had therefore improved the situation regarding problems with computation time but only for a limited version of the logical stability measure. Their algorithm was much faster than Hsieh’s but the logical stability results only correlated with the much slower, more accurate version if the program was not very stable, i.e. it had a lot of global variables and therefore high ripple effect. Their approach of getting feedback at design level meant that steps could be taken to make programs more stable or highlight specific problems from an early stage, but there is a trade-off in that the information gained was not as accurate as information derived from code-level measurement.

Joiner et al. used ripple-effect analysis along with dependence analysis and program slicing to produce a Data-centred Program Understanding Tool Environment (DPUTE) [22]. DPUTE can be used during software maintenance to enhance program understanding and to facilitate restructuring and reengineering of programs. Problems were encountered during the automation of the intramodule change propagation stage of ripple-effect analysis. Program slicing [23] was used to compute intramodule change propagation. They found that otherwise ripple-effect analysis could only be semi-automatic [24]. There was also the problem that automatic analysis of ripple effect led to the identification of many spurious side effects. The work of Canfora et al. [18] investigates this problem.

SEMIT [25] is another ripple-effect analysis tool which is based on both semantic and syntactic information. It creates a syntax and semantics database for software which directly links the program’s semantic information with its syntax. The aim of SEMIT is to provide maintainers with up-to-date semantic information directly linked to the source code under observation and then express the meaning of that code, thus improving program understanding. Several algorithms for calculating the ripple effect are presented in [16]; they are presented as not suitable for practical use, and thus are for theoretical use only. These algorithms provide ripple-effect calculation for sub-sections of ripple-effect analysis computation, e.g. intermodule propagation.

REST has been developed at the Centre for Systems and Software Engineering to automate production of ripple-effect measures for C code. REST comprises four separate software modules as detailed in Figure 4. The parser, written in PCCTS [26] with embedded C, uses preprocessed C code to produce output files containing data on each variable within each module. This is used by LISTFUNS to produce function-list information and the matrix C. The output from LISTFUNS is used by FUNMAT alongside the parser output to produce the matrices $V_m$, $A_m$, $D'_m$ and $X_m$. These matrices are then manipulated by RIPPLE to compute ripple effect and logical stability measures for each module within the program, and for the program as a whole. There follows a description of how matrices $A_m$ and $D'_m$ are produced and how they are used to produce matrix $Z_m$.

Figure 4. Components of the REST software.

Matrix $Z_m$, representing intramodule change propagation, can be constructed automatically as the reachability matrix of a graph whose adjacency matrix is $B_m$. $B_m$, representing direct intramodule change, is obtained as the sum of two matrices: a definition/use association matrix $D_m$ and an assignment matrix $A_m$.

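In the sample code (Figure 3) a takes its value from d in line 1, thus (d, a) is an assignment pair. The assignment matrix $A_m$ is an $n \times n$ matrix,

$$
A_m =
\begin{array}{c|ccccc}
 & a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
a_1^d & 0 & 0 & 0 & 0 & 0 \\
d_1^u & 1 & 0 & 0 & 0 & 0 \\
d_2^d & 0 & 0 & 0 & 0 & 0 \\
a_2^u & 0 & 0 & 1 & 0 & 0 \\
d_3^u & 0 & 0 & 0 & 0 & 0
\end{array}
\tag{9}
$$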

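which shows that $a_1$ is assigned the value of $d_1$, and $d_2$ is assigned the value of $a_2$. The definition/use association matrix $D_m$ is an $n \times n$ matrix,

$$
D_m =
\begin{array}{c|ccccc}
 & a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
a_1^d & 0 & 0 & 0 & 1 & 0 \\
d_1^u & 0 & 0 & 0 & 0 & 0 \\
d_2^d & 0 & 0 & 0 & 0 & 1 \\
a_2^u & 0 & 0 & 0 & 0 & 0 \\
d_3^u & 0 & 0 & 0 & 0 & 0
\end{array}
\tag{10}
$$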

which shows that $a_1$ is associated with $a_2$, and $d_2$ is associated with $d_3$. Replacing this matrix with a matrix of all possible definition/use pairs greatly simplifies the computation, since to produce a completely accurate version of matrix $D_m$ control-flow information has to be taken into account. Using the code in Figure 3 again, if we include all possible definition/use pairs, matrix $D'_m$ will be

$$
D'_m =
\begin{array}{c|ccccc}
 & a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
a_1^d & 0 & 0 & 0 & 1 & 0 \\
d_1^u & 0 & 0 & 0 & 0 & 0 \\
d_2^d & 0 & 1 & 0 & 0 & 1 \\
a_2^u & 0 & 0 & 0 & 0 & 0 \\
d_3^u & 0 & 0 & 0 & 0 & 0
\end{array}
\tag{11}
$$

There is an invalid entry in this version of matrix $D_m$ due to our approximation (the pairing of the definition $d_2^d$ with the earlier use $d_1^u$), which causes the approximated ripple effect to be greater than or equal to Yau and Collofello’s ripple effect; this is discussed in more detail in Section 5. We will use the simplified version, matrix $D'_m$, for the computation of ripple effect. The sum of $A_m$ and $D'_m$ gives us the matrix $B_m$, representing direct intramodule change propagation, i.e. all definition/use and assignment pairings:

$$
B_m =
\begin{array}{c|ccccc}
 & a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
a_1^d & 0 & 0 & 0 & 1 & 0 \\
d_1^u & 1 & 0 & 0 & 0 & 0 \\
d_2^d & 0 & 1 & 0 & 0 & 1 \\
a_2^u & 0 & 0 & 1 & 0 & 0 \\
d_3^u & 0 & 0 & 0 & 0 & 0
\end{array}
\tag{12}
$$

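We can now find the reachability matrix for $B_m$, namely $Z_m$, using

$$
Z_m = I + B_m + B_m^2 + \cdots + B_m^n
\tag{13}
$$

where the sums and products are boolean and n equals the number of variable occurrences, in this case five. This gives us

$$
Z_m =
\begin{array}{c|ccccc}
 & a_1^d & d_1^u & d_2^d & a_2^u & d_3^u \\ \hline
a_1^d & 1 & 1 & 1 & 1 & 1 \\
d_1^u & 1 & 1 & 1 & 1 & 1 \\
d_2^d & 1 & 1 & 1 & 1 & 1 \\
a_2^u & 1 & 1 & 1 & 1 & 1 \\
d_3^u & 0 & 0 & 0 & 0 & 1
\end{array}
\tag{14}
$$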

$Z_m$ represents all possible intramodule change propagation within module m1, e.g. the 1 on row 1 column 5 represents propagation from $a_1^d$ through $a_2^u$ and $d_2^d$ to $d_3^u$.
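The construction of $Z_m$ from the assignment and definition/use matrices can be sketched directly from Equation (13). The code below is an illustration rather than the RIPPLE implementation: the helper names are assumptions, and the matrices are those of Equations (9), (10) and (11). It builds the reachability matrix twice, once from the exact matrix $D_m$ and once from the approximated matrix $D'_m$, so the effect of the approximation can be seen.

```python
def bool_sum(a, b):
    """Entrywise boolean OR of two equally sized 0-1 matrices."""
    return [[int(x or y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def bool_product(a, b):
    """Boolean matrix product of two 0-1 matrices."""
    return [
        [int(any(a[i][k] and b[k][j] for k in range(len(b)))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def reachability(b):
    """Z = I + B + B^2 + ... + B^n with boolean arithmetic, n being the size of B."""
    n = len(b)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    z, power = identity, identity
    for _ in range(n):
        power = bool_product(power, b)   # next power of B
        z = bool_sum(z, power)           # accumulate paths of this length
    return z

# Rows and columns are ordered a1d, d1u, d2d, a2u, d3u.
A = [[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
D_exact = [[0, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
D_approx = [[0, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

Z_exact = reachability(bool_sum(A, D_exact))
Z_approx = reachability(bool_sum(A, D_approx))

for row in Z_approx:
    print(row)
```

For this example the exact construction yields the matrix of Equation (2) and the approximated one yields Equation (14). Every 1 in `Z_exact` is also a 1 in `Z_approx`, which is why the approximated ripple effect cannot be smaller than the exact one.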

5. ANALYSIS AND RESULTS

Using McCabe’s measure via matrix C introduces complexity into the computation of ripple effect. McCabe introduced his cyclomatic complexity measure as a graph-theoretic technique which might identify software modules that would be difficult to test or maintain. He could see no obvious relationship between length of module and module complexity, so suggested that the number of basic paths through a module may be a better indicator. He took the idea of the cyclomatic number of a graph, calculation of which was seen as tedious for programmers, and simplified it to ‘number of predicates plus one’. This was further simplified for convenience to number of conditions plus one and named cyclomatic complexity; so to calculate cyclomatic complexity of a module or program we need simply to count the number of conditions, i.e. loops and branches, and add one.
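The ‘conditions plus one’ rule can be illustrated with a deliberately crude counter. This is a sketch under stated assumptions, not how REST or LISTFUNS computes matrix C: it treats each occurrence of the C keywords `if`, `for`, `while` and `case` as one condition, and it ignores comments, string literals, the conditional operator and compound conditions.

```python
import re

# Assumed set of condition-introducing C keywords for this sketch.
DECISION_KEYWORDS = re.compile(r"\b(if|for|while|case)\b")

def cyclomatic_complexity(c_source):
    """Approximate cyclomatic complexity as (number of loops and branches) + 1."""
    return len(DECISION_KEYWORDS.findall(c_source)) + 1

# Module m1 of Figure 3 has no loops or branches, so its complexity is 1.
m1_source = "a = d; d = a; return d;"
print(cyclomatic_complexity(m1_source))                          # 1

# A hypothetical fragment with one branch and one loop.
print(cyclomatic_complexity("if (x > 0) { while (x) x--; }"))    # 3
```

The value 1 for m1 agrees with the entry for m1 in matrix C of the worked example.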

McCabe’s measure has been widely criticized [27–31] since its proposal in 1976. This may be partlydue to the fact that it is not appropriate for all third generation languages; McCabe was originallythinking in terms of a complexity measure for Fortran source code. Fenton and Pfleeger [32] point outthat cyclomatic complexity measurement is objective and useful when counting linearly independentpaths, but do not think that it gives an accurate picture of program complexity. They recommend thatit be used as an indicator of testability and maintainability, and perhaps as part of a quality assuranceexercise.

McCabe suggests that a cyclomatic complexity measure of greater than 10 in any one modulemeans problems, whereas Grady [33], who discovered a relationship between the number of updatesto software required and the cyclomatic complexity measure, thought that this limit should be set at15. During the quality assurance project on the software used for the channel tunnel, modules wererejected if their cyclomatic complexity measure was greater than 20, or the number of statements wasgreater than 50 [34]. There are many negative criticisms of McCabe’s measure which leaves one withthe impression that the cyclomatic complexity measure is not a good measure of general complexity.It must be taken into consideration, however, that it was one of the first software measures put forwardand, as such, we cannot expect it to have taken into account all of the advances that have taken placein measurement in the last 20 or so years. It does give us some information about the complexity ofsoftware, and can thus be used as long its limitations are appreciated.

Shepperd [35] criticized cyclomatic complexity for being based on poor theoretical foundations and for being outperformed as a measure of general complexity by lines of code. Our concern is somewhat different because our approximation will be least accurate when there are many branches but few loops. Thus we have a different reason for preferring a variant of McCabe; some alternatives are compared in the next section.

5.1. McCabe variations and results

The ripple effect of several programs was computed using four different versions of matrix C. The first four programs are versions 1–4 of a mutation-testing software tool; the size ranges from 20 modules/549 lines of code for the first version to 44 modules/923 lines of code for the final version. The fifth program, genscrip, is part of the Linux operating system [36]; the sixth program, quadratic, is given as an example program in Yau and Collofello’s 1980 paper [2]. The seventh and eighth programs are a part of the REST software and a small test program for the REST software respectively. The ninth and tenth programs, conv and gensym, are part of a half-million lines of code (LOC) telephone switching system, TXE4 [37], and the eleventh, bits, is from the PCCTS parser software [26]. The programs vary widely in terms of length and also in purpose and style; this is intentional. Calculating the ripple effect by hand takes a long time, so producing Yau and Collofello’s ripple-effect measures for these programs to compare with the automated versions is a long process. We therefore decided to use as varied a selection of representative programs as possible.



Figure 5. Correlation for sample programs.

Table I. Correlation results: Y&C ripple effect vs. REST ripple effect.

No.   Program       Y&C     Control  Original  Loops   Branches  LOC
1     allas1        18.27   2.15     17.34     6.01    13.48     425
2     allas2        18.27   2.55     19.07     7.40    14.22     477
3     allas3        20.29   3.77     21.77     10.77   14.77     725
4     allas4        19.27   3.67     21.05     10.52   14.20     659
5     genscrip      4.00    2.29     4.00      2.29    2.29      41
6     quadratic     2.47    1.52     2.47      1.52    2.47      41
7     funmat5       4.67    1.28     4.71      3.53    2.56      245
8     arippletest   3.00    3.20     3.20      3.20    3.20      34
9     conv          8.43    0.89     8.43      3.86    5.46      54
10    gensym        16.91   1.89     17.61     5.22    14.28     237
11    bits          5.15    0.96     5.77      1.97    4.76      371

Corr.               1.000   0.590    0.996     0.907   0.989     0.817

By introducing several variations of McCabe’s complexity measure we are trying to show that we can counteract the effect that our approximation of matrix Dm will have on the final ripple-effect measure. Points to consider are

(1) code contained in a loop is accurately represented in matrix D′m,

(2) sequential code is less accurately represented than code within a loop, and

(3) code containing branching is least accurately represented.



In our effort to produce a ripple-effect measure as close as possible to Yau and Collofello’s original measure we have used four versions of the complexity factor.

(1) Control McCabe: matrix C contains the value 1 for each module. The control McCabe version of matrix C when multiplied with the other matrices will produce a measure which has no complexity element factored in. This means it can give us a ‘raw’ figure for the ripple effect of any particular module or program.

(2) Original McCabe: all conditions plus one [20]. The original McCabe is just that, the original cyclomatic complexity measure. Original McCabe counts all conditions: all loops plus all branches.

(3) Loops McCabe: number of loops plus one; branches are not counted. To counteract the effect that D′m will have on the ripple-effect measure we could consider reducing the control version of matrix C according to how much sequence and selection is present in the target code. This would introduce further complexity into the computation, and not really be in keeping with the ethos of cyclomatic complexity. Therefore instead of reducing the complexity factor for sequential and selective code we decided to count only loops and not branches.

(4) Branches McCabe: number of branches plus one; loops are not counted. Similar to the above version but counting branches only and not loops.
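The four versions listed above differ only in which conditions are counted, which a minimal Python sketch can make concrete. It assumes the loop and branch counts for each module are already known; the function name, the variant labels and the example counts are hypothetical, and the sketch shows only the per-module complexity values, not how REST builds or multiplies matrix C.

```python
VARIANTS = ("control", "original", "loops", "branches")


def complexity_factor(loops: int, branches: int, variant: str) -> int:
    """Complexity value for one module under the named variant."""
    if variant == "control":
        return 1                      # no complexity element factored in
    if variant == "original":
        return loops + branches + 1   # all conditions plus one
    if variant == "loops":
        return loops + 1              # branches are not counted
    if variant == "branches":
        return branches + 1           # loops are not counted
    raise ValueError(f"unknown variant: {variant}")


# Hypothetical (loops, branches) counts for three modules.
modules = [(2, 3), (0, 1), (1, 0)]

for variant in VARIANTS:
    entries = [complexity_factor(l, b, variant) for l, b in modules]
    print(variant, entries)
# control [1, 1, 1]
# original [6, 2, 2]
# loops [3, 1, 2]
# branches [4, 2, 1]
```

For a module with no branches the Loops and Original values coincide, and for a module with no loops the Branches and Original values coincide, which is consistent with rows such as quadratic in Table I where two variants give the same ripple effect.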

The results of the computation for the 11 programs are shown graphically in Figure 5 and in Table I. It can be seen that the Pearson correlation coefficient with Yau and Collofello’s ripple effect is lowest for the Control McCabe version of matrix C, where matrix C contains only 1s. This was to be expected as this is a ripple-effect measure with no control-flow complexity. The correlation coefficients for the Loops and Branches McCabe versions are high at 0.907 and 0.989 respectively, but the highest is the Original McCabe with a correlation coefficient of 0.996.

Also shown is the LOC measure, in this instance defined as LOC not including comments or white space or declarations, i.e. imperative statements only. The correlation coefficient for Yau and Collofello’s measure with LOC is 0.817; this is included to show that the ripple-effect measure could not be replaced by LOC.
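As a sketch of how a figure in the Corr. row is obtained, the Pearson coefficient can be computed directly from two columns of Table I. The function and variable names below are hypothetical, and the two lists are the Y&C and Original columns as printed in the table. With those values the result should land close to the 0.996 reported for the Original McCabe version, although rounding in the printed table may shift the last digit.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Columns of Table I, programs 1-11 in order.
yau_collofello = [18.27, 18.27, 20.29, 19.27, 4.00, 2.47, 4.67, 3.00, 8.43, 16.91, 5.15]
original_mccabe = [17.34, 19.07, 21.77, 21.05, 4.00, 2.47, 4.71, 3.20, 8.43, 17.61, 5.77]

r = pearson(yau_collofello, original_mccabe)
print(f"Y&C vs. Original McCabe: r = {r:.3f}")
```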

6. CONCLUSIONS AND FURTHER WORK

The ripple-effect measure has been identified as valid and necessary within several software maintenance models, particularly the SADT Model [1] and Methodology for Software Maintenance [2]. This research builds on the work of Yau and Collofello and is concerned with the automatic computation of ripple-effect measures within a practicable timescale. Previous research in this area has included automatic computation of ripple effect at design level [19] and a semi-automatic tool to compute ripple effect [22].

Yau and Collofello’s original ripple-effect algorithm has been reformulated using matrix arithmetic. A tool, REST, has been produced which uses our approximated algorithm to compute ripple effect for C programs. Using the tool, differing versions of the complexity matrix C are used to counteract the effect of not using control flow information. For the subset of programs for which ripple effect has been computed, the measure produced using the Original McCabe version of matrix C is most highly correlated with Yau and Collofello’s original measure. As such we can conclude that, used as part of our approximation of Yau and Collofello’s ripple effect, it provides the most accurate results for the programs included in our study. Further investigation now needs to be carried out with more programs so that the results can be applied generally. Programs from the Linux operating system software and other sources are currently being used to further validate the measure.

Most types of maintenance involve making changes to source code. Maintenance is difficult because it is not always clear where modifications will have to be made to code, or what impact any type of change to code may have across a whole system. Ripple effect can show the maintainer what the effect of a change will be on the rest of the program or system. It can highlight modules with high ripple effect as possible problem modules and show the impact, in terms of increased ripple effect, where the functionality of a program is being modified or its environment has changed. The ripple-effect measure has been acknowledged as helpful to maintainers, and as such has been included as part of several software maintenance process models which have been described in this paper.

Future work will focus on making the tool more robust. At present there are limitations to the size of functions being processed; currently REST can process program modules each containing up to 160 variable occurrences. For REST to be used as a measurement tool in an industrial context, this limitation must be addressed.

Future work will also include the computation of ripple effect for object-oriented code. This and previous papers have shown that measuring ripple effect for procedural programs is useful. We aim to investigate the validity and applicability of computing ripple effect for the object-oriented paradigm.

ACKNOWLEDGEMENTS

Many thanks to Robin Whitty, David Wigg, Lasitha Leelasena and other members of the CSSE for their help with my research. Thanks to Dave Homan and colleagues at Nortel Networks, New Southgate, U.K. for access to the TXE4 telephone switching system. Thanks also to the journal editors and referees for their guidance in the preparation of this paper.

REFERENCES

1. Pfleeger SL, Bohner SA. A framework for software maintenance metrics. Proceedings Conference on Software Maintenance - 1990. IEEE Computer Society Press: Los Alamitos CA, 1990; 320–327.

2. Yau SS, Collofello JS. Some stability measures for software maintenance. IEEE Transactions on Software Engineering 1980; SE-6(6):545–552.

3. Bennet KH. An introduction to software maintenance. Information and Software Technology 1990; 12(4):257–264.

4. Swanson EB. The dimensions of maintenance. Proceedings 2nd International Conference on Software Engineering. IEEE Computer Society: Long Beach CA, 1976; 492–497.

5. IEEE. Standard Glossary of Engineering Terminology. Institute of Electrical and Electronic Engineers: New York NY, 1990.

6. Kajko-Mattsson M, Chapin N, Vehvilainen R. Panel 2: Preventive maintenance! Do we know what it is? Proceedings International Conference on Software Maintenance. IEEE Computer Society Press: Los Alamitos CA, 2000; 11–19.

7. Kitchenham BA, Travassos GH, Mayrhauser Av, Niessink F, Schneidewind NF, Singer J, Takada S, Vehvilainen R, Yang H. Towards an ontology of software maintenance. Journal of Software Maintenance 1999; 11(6):365–389.

8. Chapin N, Hale JE, Khan KM, Ramil JF, Tan W-G. Types of software evolution and software maintenance. Journal of Software Maintenance and Evolution: Research and Practice 2001; 13(1):3–30.

9. van Zuylen HJ (ed.). The REDO Compendium. Wiley: Chichester, U.K., 1993.

10. Boehm B. Software engineering. IEEE Transactions on Computers 1987; 25(12):1226–1242.

11. Haney FM. Module connection analysis - a tool for scheduling of software debugging activities. Proceedings of the AFIPS Fall Joint Computer Conference. AFIPS Press: Reston VA, 1972; 173–179.

12. Myers GJ. A model of program stability. Composite/Structured Design. Van Nostrand Reinhold: New York NY, 1975; 137–155.

13. Soong NL. A program stability measure. Proceedings of the 1977 Annual ACM Conference. ACM Press: New York NY, 1977; 163–173.

14. Yau SS, Collofello JS, McGregor TM. Ripple effect analysis of software maintenance. Proceedings Computer Software and Applications Conference (COMPSAC ’78). IEEE Computer Society Press: Piscataway NJ, 1978; 60–65.

15. Yau SS, Collofello JS. Design stability measures for software maintenance. IEEE Transactions on Software Engineering 1985; SE-11(9):849–856.

16. Yau SS, Liu S. Some approaches to logical ripple effect analysis. Technical Report SERC-TR-24-F, University of Florida, 1988.

17. Yau SS, Nicholl RA, Tsai JJP, Liu S. An integrated lifecycle model for software maintenance. IEEE Transactions on Software Engineering 1988; SE-14(8):1128–1144.

18. Canfora G, Di Lucca GA, Tortorella M. Controlling side-effects in maintenance. Proceedings of the 3rd International Conference on Achieving Quality in Software. Chapman and Hall: London, U.K., 1996; 89–102.

19. Yau SS, Chang SC. Estimating logical stability in software maintenance. Proceedings Computer Software and Applications Conference (COMPSAC ’84). IEEE Computer Society Press: Piscataway NJ, 1984; 109–119.

20. McCabe TJ. A complexity measure. IEEE Transactions on Software Engineering 1976; 2(4):308–320.

21. Hsieh CC. An approach to logical ripple effect analysis for software maintenance. PhD Thesis, Department EECS, Northwestern University, Evanston IL, 1982.

22. Joiner JK, Tsai WT. Ripple effect analysis, program slicing and dependence analysis. Technical Report TR 93-84, University of Minnesota, 1993.

23. Weiser M. Program slicing. IEEE Transactions on Software Engineering 1984; SE-10(4):1352–1357.

24. Joiner JK, Tsai WT, Chen XP, Subramanian S, Sun J, Gandamaneni H. Data-centered program understanding. Proceedings International Conference on Software Maintenance. IEEE Computer Society Press: Los Alamitos CA, 1994; 272–281.

25. Collofello JS, Wennergrund DA. Ripple effect based on semantic information. Proceedings of the AFIPS Joint Computer Conference, vol. 56. AFIPS Press: Reston VA, 1987; 675–682.

26. Parr TJ. Language translation using PCCTS and C++. http://www.antlr.org/buybook.html [8 July 2001].

27. Leach RJ. Software metrics and software maintenance. Journal of Software Maintenance: Research and Practice 1990; 2(2):133–142.

28. Prather RE. An axiomatic theory of software complexity measure. The Computer Journal 1984; 27(4):340–347.

29. Myers GJ. An extension to the cyclomatic measure of program complexity. ACM SIGPLAN Notices 1977; 18(12):57–59.

30. Hansen WJ. Measurement of program complexity by the pair (cyclomatic number, operator count). ACM SIGPLAN Notices 1978; 13(3):29–33.

31. Evangelist WM. Relationships among computational, software and intuitive complexity. ACM SIGPLAN Notices 1983; 18(12):57–59.

32. Fenton N, Pfleeger SL. Software Metrics: A Rigorous and Practical Approach (2nd edn). Chapman and Hall: London, 1996.

33. Grady RB. Successfully applying software metrics. IEEE Computer 1994; 27(9):18–25.

34. Bennett PA. Software development for the channel tunnel: A summary. High Integrity Systems 1994; 1(2):213–220.

35. Shepperd M. A critique of cyclomatic complexity as a software metric. Software Engineering Journal 1988; 3(2):30–36.

36. Peterson R. Linux: The Complete Reference (3rd edn). Osborne McGraw-Hill: Maidenhead, U.K., 1999.

37. Homan D. Private correspondence to Sue Black, 1999. Copy available on request from Sue Black.

AUTHOR’S BIOGRAPHY

Sue Black is a Senior Lecturer in the School of Computing, Information Systems and Mathematics and Director of the Centre for Systems and Software Engineering at South Bank University, London. Sue’s research interests include software measurement and software maintenance. She is a member of the British Computer Society. Sue received a BSc in Computing Studies in 1993 from South Bank University.

