Deriving an approximation algorithm for automatic computation of ripple effect measures


Available online at www.sciencedirect.com

www.elsevier.com/locate/infsof

Information and Software Technology 50 (2008) 723–736


Sue Black *

Department of Information and Software Systems, Harrow School of Computer Science, University of Westminster,

Watford Road, Northwick Park, Harrow HA1 3TP, UK

Received 25 July 2006; received in revised form 2 July 2007; accepted 17 July 2007
Available online 31 August 2007

Abstract

The ripple effect measures impact, or how likely it is that a change to a particular module may cause problems in the rest of a program. It can also be used as an indicator of the complexity of a particular module or program. Central to this paper is a reformulation in terms of matrix arithmetic of the original ripple effect algorithm produced by Yau and Collofello in 1978. The main aim of the reformulation is to clarify the component parts of the algorithm, making the calculation more explicit. The reformulated algorithm has been used to implement REST (Ripple Effect and Stability Tool), which produces ripple effect measures for C programs. This paper describes the reformulation of Yau and Collofello’s ripple effect algorithm, focusing on the computation of matrix $Z_m$, which holds intramodule change propagation information. The reformulation of the ripple effect algorithm is validated using fifteen programs which have been grouped by type. Due to the approximation, spurious 1s are contained within matrix $Z_m$. It is discussed whether this has an impact on the accuracy of the reformulated algorithm. The conclusion of this research is that the approximated algorithm is valid and as such can replace Yau and Collofello’s original algorithm.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Software measurement; Ripple effect; Matrix algebra

1. Introduction

Measurement of ripple effect forms part of an area of fundamental importance to software engineering, that of impact analysis, a type of software measurement. Software measurement as a software engineering discipline has been around now for some thirty years [49]. Its purpose is to provide data that can be used either for assessment of the system in terms of complexity, good structure etc. or prediction of, for example, the total cost of a system during the software lifecycle. Typically, it is used for assessment either during the initial development of software, or during maintenance of software at a later date. It can help to show how effective existing practices are and highlight where improvements are needed [30]. A full description of software measurement and its use is given in [20].

0950-5849/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.infsof.2007.07.008
* Tel.: +44 20 7911 5000x4207. E-mail address: [email protected]

Most software undergoes some change during its lifetime; upgrades to software are common, as are changes made to amend or adjust the functionality of a piece of software. For example, the software used within mobile phones is upgraded over time to make sure that customers’ expectations are met and that particular models of mobile phones can maintain or gain competitive advantage. Software change impact analysis [12] estimates what will be affected in software if a change is made. This information can then be used for planning, making and tracing the effects of changes before the changes are implemented. Examples of impact analysis include [12]:

• Using cross referenced listings to see what other parts of a program contain references to a given variable or procedure.

• Using program slicing [43] to determine the program subset that can affect the value of a given variable.

• Using traceability relationships to identify software artefacts associated with a change.

Fig. 1. A methodology for software maintenance [45].

Typically seventy percent of software development budgets are spent on software maintenance [4]. Thus, measures or tools that can speed up the rate at which changes can be made, or facilitate better informed decisions on code changes, can make an important contribution. All types of maintenance involve making changes to source code or its documentation; change impact analysis can show what the effect of that change will be on the rest of the program or system. Software maintenance is difficult because it is not always clear where modifications will have to be made to code, or what impact any type of change to code may have across a whole system. Change impact analysis via the ripple effect measure has been acknowledged as helpful during software maintenance [9] and as such has been included as part of several software maintenance process models. The usefulness of metrics and models in software maintenance and evolution is described in [15].

In this introduction, ripple effect and change impact analysis and their use during software maintenance have been mentioned; a fuller description is given in Section 2, which describes the background to this work, particularly the history of the ripple effect measure to date. Section 3 discusses measuring complexity and shows where ripple effect fits in. A description of the reformulated ripple effect algorithm is given in Section 4, with details of intramodule and intermodule change propagation and the reformulation’s component matrices. Differing versions of matrix C, which is used to factor complexity into the measurement of ripple effect, are explained in Section 5, and a validation of the approximated algorithm is provided in Section 6. Section 7 is a description of the programs used in this study. Section 8 summarizes the results of this research, and conclusions and further work are put forward in Section 9.

2. Background and related work

The ripple effect measures impact, or how likely it is that a change to a particular module may cause problems in the rest of a program. It can also be used as an indicator of the complexity of a particular module or program. Ripple effect was one of the earliest metrics concerned with the structure of a system and how its modules interact [39]. The first mention of the term ripple effect in software engineering is by Haney in 1972 [23]. He uses a technique called ‘module connection analysis’ to estimate the total number of changes needed to stabilise a system. Myers [34] uses the joint probability of connection between all elements within a system to produce a program stability measure. A matrix is set up to store the weighting of each possible connection within a system, then another matrix is derived estimating the joint probability density for any two states in the first matrix. The limit probability vector is found using these matrices and used to calculate the stability of the system. Soong [40] also used joint probability of connection to produce a program stability measure. Haney, Myers and Soong’s methods are all measures of probability: the probability of a change to a variable or module affecting another variable or module. Yau and Collofello’s ripple effect uses ideas from this research, but their ripple effect is not a measure of probability.

2.1. Yau and Collofello’s ripple effect

When Yau and Collofello first proposed their ripple effect analysis technique in 1978 [47] they saw it as a complexity measure that could be used during software maintenance (see Fig. 1) to evaluate and compare various modifications to source code. This work was carried further in 1980 to produce a logical stability measure that is defined as [45, p. 547]:

“a measure of the resistance to the expected impact of a modification to the module on other modules within the program.”

In [45] a software maintenance process is identified (see Fig. 1) where accounting for ripple effect is Phase 3. Other software maintenance models that include ripple effect as part of their lifecycle are detailed in [9]. In the 1980s the general emphasis for software measurement extended from source code measurement to measurement of design. The thinking behind this was that as design measurement gives feedback earlier in the software lifecycle, problems could be identified and eliminated or controlled before the source code was actually written, thus saving time and money.


Yau and Collofello published a paper applying the same ideas that they had used in producing their code level stability measure [45] to produce a design level stability measure [46]. The design measure analyses the module invocation hierarchy and use of global data referenced or defined in modules to produce the design stability of a program. The main difference between code level stability and design level stability is that the design stability algorithm does not consider intramodule change propagation. It produces a measure of ripple effect between modules without taking into account what happens inside them. This presupposes that information about parameters passed between modules, global variables etc. is already known. Yau and Collofello recommended that their measure be used to compare alternative programs at the design phase and to identify which portions of the program may cause problems due to ripple effect during the maintenance phase. It is shown in [8] that the approximated computation of code level ripple effect described in this paper is based on making a general assumption about intermodule flow. It might therefore be seen as sitting mid-way between Yau and Collofello’s original algorithm and their proposal for a design level measure.

2.2. Object-oriented ripple effect

More recently, research has focused on using ripple effect to measure object-oriented source code. Elish and Rine [19] present an algorithm for computing ripple effect for object-oriented programs at the design level, i.e. at a more coarse grained level than the ripple effect presented in this paper. They also investigate to what extent several object-oriented metrics are correlated with ripple effect measures for three software systems, with the purpose of investigating whether Chidamber and Kemerer’s metrics provide a good indication of the stability of classes. Coupling Between Object classes (CBO) and Response For a Class (RFC) were found to be good indicators of logical stability (the reciprocal of ripple effect) of a system. Chaumun et al. study the impact of changes across an object-oriented system [16] written in C++ by making one change at a time and studying the resulting impact. Results for the study suggest that it may be possible to use well chosen, conventional design metrics as indicators of the changeability of systems. Li and Offutt [32] carry out impact analysis for object-oriented programs with the aim of highlighting modules that need to be retested when regression testing. They analyse a number of possible maintenance changes to object-oriented software and how the changes affect the classes in the system, and describe algorithms that determine which classes will be affected by the changes. A more recent paper [31] details the use of a tool that implements these algorithms. The applicability of ripple effect to measuring object-oriented code, particularly C++, is described in [36,5]. Extensions are proposed to the computation of the ripple effect to accommodate aspects of the object-oriented paradigm. Kabaili et al. [29] define a change impact model for object-oriented systems and report on an experiment on two industrial systems to test the relationship between well-known coupling metrics and changeability. The results obtained recommend using coupling as a predictor of changeability.

2.3. Automation of ripple effect

Computing the ripple effect for a small program manually may take several hours; computing ripple effect for a large program manually may take weeks. Accuracy is also critical, and manual computation of ripple effect measures could be erroneous. Even when automated, computation of ripple effect can be time consuming. Yau and Chang [44] give an example of a two thousand line Pascal program’s ripple effect taking thirteen hours of CPU time to compute. As that particular research was carried out in 1984 the computation time should be put into context: PC processors have dramatically improved their speed and capability since then.

The tool REST (Ripple Effect and Stability Tool) [9] has been developed; it computes ripple effect measures automatically but uses an approximation of intramodule change propagation. Previous attempts at computing ripple effect have suffered from slow computation times, therefore when implementing REST the decision was made not to take control flow into account within source code modules [10]. Automation of ripple effect can take two forms:

• the computation of a ripple effect measure for a given program, or

• the tracing of ripple effect on variables through a program or system.

Tracing of ripple effect through a program starts with one variable occurrence in a particular module and traces the impact of that variable upon other variables until the impact terminates. Tools have been developed for both of these categories of ripple effect; REST falls under the first category, i.e. it computes the ripple effect measure. Other tools that produce a ripple effect measure include a prototype tool for ripple effect analysis of Pascal programs [27], which consists of three subsystems: an intramodule error flow analyser, an intermodule error flow analyser and a logical ripple effect identification subsystem. The developers could not identify primary error sources automatically, thus some user input was required. Another tool that produces a ripple effect measure was produced by Chang [14]. It does not consider intramodule information for computing ripple effect and is thus presented as a design level ripple effect tool. The approach of getting feedback at design level meant that steps could be taken to make programs more stable, or highlight specific problems from an early stage. But there is a tradeoff in that the information gained is not as accurate as information derived from code level measurement.

Tools that trace ripple effect through a system include DPUTE (Data-centered Program Understanding Tool Environment), developed by Joiner and Tsai [28], which uses ripple effect analysis along with dependence analysis and program slicing. DPUTE can be used during software maintenance of COBOL systems to enhance program understanding and to facilitate restructuring and reengineering of programs. Program slicing [43] is used to compute intramodule change propagation. SEMIT [17] is a ripple effect analysis tool that is based on both semantic and syntactic information. It creates a syntax and semantics database for software which directly links the program’s semantic information with its syntax. All possible ripple effect paths are identified by SEMIT; interaction with an expert maintainer is then needed to define which are the more probable paths. ChAT is a tool that traces ripple effect for object-oriented programs [31] and is implemented in C++ and Java. It comprises three components: parser, analyser and viewer. Users specify changes that they want to make to a program, then ChAT calculates the impact of the change and displays the affected classes.

REST comprises four separate software modules: Parser, Listfuns, Funmat and Ripple. The three modules involved in the actual calculation of ripple effect (Listfuns, Funmat and Ripple) took approximately 1 person year to build, and in total comprise 3000 lines of code. A detailed description of the REST tool and its implementation is given in [10]. The Parser was developed separately, firstly as part of the X-RAY tool [7], and then adapted for use with the other three modules in REST. X-RAY is a tool that analyses program structure. The initial aim of REST was to produce ripple effect measures as an addition to British Telecommunication’s (BT) comprehensive suite of measurement tools, the Code Measurement Toolkit (CMT). The CMT [21] is an integrated environment for the code analysis and maintainability assessment of C and COBOL code. It was developed after BT carried out an analysis of their software, the result of which indicated that it should be possible to predict with seventy to eighty percent accuracy which source code files in a system are likely to require changing. The CMT also uses X-RAY and QUALMS [2], a tool that produces control flow graphs and related software measures.

3. Measuring complexity

Complexity as a measure can be split into the following types [20]:

• Computational complexity: the complexity of the underlying problem.

• Algorithmic complexity: the complexity of the algorithm which has been implemented.

• Structural complexity: the complexity of the software used to implement the algorithm.

• Cognitive complexity: the effort required to understand the software.

Ripple effect and logical stability are measures of structural complexity. The next section looks at complexity measures in more detail, focusing on structural complexity.

3.1. Complexity measures

Early measurement of software complexity focused entirely on source code, with the simplest complexity measure being LOC. In 1983 Basili and Hutchens [3] suggested that LOC be used as a baseline or benchmark to which all other complexity metrics be compared, i.e. an effective metric should perform better than LOC, so LOC should be used as a ‘null hypothesis’ for empirical evaluation. Much empirical work has shown it to correlate with other metrics [39], most notably McCabe’s cyclomatic complexity, which is discussed in more detail in Section 6. The earliest code metric based on a coherent model of software complexity was Halstead’s software science [22]. Early empirical evaluations produced high correlations between predicted and actual results but later work showed a lower correlation. Bowen [13] found only modest correlation, with software science being outperformed by LOC. According to Shepperd [39] the most important legacy of software science is that it attempts to provide a coherent and explicit model of program complexity as a framework within which to measure and make interpretations. Software science is also important because, since it deals with tokens, it is fairly language independent.

After a concentration on code level measurement for some years, focus widened to include measurement during the earlier stages of the software development lifecycle. Design level metrics can in theory be obtained much earlier in the development of a project, thus providing information that can be used for more informed resource management.

Structural complexity of software can be broken down into the following types:

• Data flow structure: the way that data flows through a program and its behaviour as it interacts with the program.

• Control flow structure: the order in which program instructions are executed, taking into account whether there are any loops or branches.

• Data structure: the organisation of the data itself, independent of the program. Data structure will not be discussed further in this paper (see [20], p. 317).

The ripple effect measure is fundamentally concerned with the data flow structure of a program. Control flow issues are also relevant to its computation; this is a key issue in our approximation of ripple effect measurement.

3.2. Measuring data flow structure

As mentioned previously, data flow structure concerns the way in which data flows around a program. An important factor to consider when looking at the data flow structure is the scope of variables. In some programming languages, a variable within a program can be local to a procedure/function or global, i.e. be available for use throughout the program. Some programming languages also allow for recursive definitions of local and global, for example Pascal, which allows nesting of procedures [6]. When considering data flow within a program, scope of variables is important as it can dramatically affect the effect that a variable has on other variables across a program.

As well as considering the scope of a variable, data flow needs to be considered from two different perspectives, namely intramodule and intermodule data flow. Intramodule data flow concerns data flow within a module. A module is taken here to mean a function or procedure but can be used at different levels of abstraction, e.g. a program within a system or a subsystem within a larger system. A module is defined by Yourdon [48] as:

“a contiguous sequence of program statements bounded by boundary elements, having an aggregate identifier”.

There are several well-known measures of data flow structure. These include coupling, cohesion and information flow. Coupling and cohesion were first proposed in 1974 [41] as a measure of design quality. Coupling concerns the degree of interdependence between modules [48]; it is dependent on the type of connections and how complicated they are. Thus, coupling is a measure of intermodule data flow. Cohesion is a measure of the relationship of the elements within a module: ideally a module should perform a single function [42]. Yourdon and Constantine proposed seven classes of cohesion which range from:

• Functional – the module performs a single well defined function.

through to

• Coincidental – the module performs more than one function and they are unrelated.

As modules may exhibit more than one type of cohesion they are categorised by the lowest type that they exhibit. When measuring the coupling and cohesion of a system it is common to aim for high cohesion and low coupling.

Information flow [25] is a measure of the total level of information flow between individual modules and the rest of a system. The two fundamental concepts within the information flow measure are fan-in and fan-out.

Definition: The fan-in (fan-out) of a procedure is the number of local flows terminating at (emanating from) that procedure.

Information flow complexity was defined by Henry and Kafura [25] as:

$$\text{Information flow complexity}(M) = \text{length}(M) \times \left(\text{fan-in}(M) \times \text{fan-out}(M)\right)^2$$

This has been refined by Shepperd [38] to produce the following measure (the name is from [20, p. 203]):

$$\text{Shepperd complexity}(M) = \left(\text{fan-in}(M) \times \text{fan-out}(M)\right)^2$$

which Shepperd claims is an improvement over the original Henry and Kafura measure because it eliminates the control flow element present in the original. Shepperd disregards module length and indirect information flow in his measure, along with the distinction between local and global information flow. This is to eliminate the blurring of information and control flow. Shepperd complexity is found to correlate highly with development time, whereas Henry and Kafura’s information flow has no such relationship.
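The two measures can be compared on a single module with a short Python sketch. The function names and the example module (20 lines, fan-in 3, fan-out 2) are illustrative assumptions and do not come from the cited papers.

```python
def henry_kafura_complexity(length: int, fan_in: int, fan_out: int) -> int:
    """Information flow complexity: length(M) x (fan-in(M) x fan-out(M))^2."""
    return length * (fan_in * fan_out) ** 2


def shepperd_complexity(fan_in: int, fan_out: int) -> int:
    """Shepperd complexity: (fan-in(M) x fan-out(M))^2, module length dropped."""
    return (fan_in * fan_out) ** 2


# Hypothetical module: 20 lines, 3 flows terminating at it, 2 emanating from it.
hk = henry_kafura_complexity(length=20, fan_in=3, fan_out=2)
sh = shepperd_complexity(fan_in=3, fan_out=2)
print(hk, sh)  # 720 36
```

The only difference between the two values is the length factor, which is the term Shepperd removes.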

Coupling and information flow relate to intermodule data flow; cohesion relates to intramodule data flow. The ripple effect measure concerns itself with both intramodule and intermodule data flow. This will be described and explained in greater detail in Section 4. The measurement of ripple effect also traditionally involves a control flow complexity component, namely McCabe’s cyclomatic complexity.

3.3. Measuring control flow structure

Control flow structure is concerned with the order in which program instructions are executed, taking into account whether there are any loops or branches. The control flow of a module can be modelled using directed graphs called flowgraphs. There are many measures that are associated with control flow structure, the best known of which is McCabe’s cyclomatic complexity.

Measuring control flow complexity is important here because ripple effect has a control flow complexity component. The original Yau and Collofello algorithm uses McCabe’s cyclomatic complexity to add a complexity weighting to their ripple effect measure. In our reformulated algorithm control flow inside modules (intramodule change propagation) is approximated and is liable to be exaggerated; some variations of the complexity component have therefore been suggested to counteract any effect this may have had. The variations are discussed in more detail in Section 5.

4. Description of the approximated algorithm

This section describes the reformulation of the ripple effect algorithm.

4.1. Intramodule change propagation

Fig. 2. Module m1.

The computation of ripple effect is based on the effect that a change to a single variable will have on the rest of a program. Given the three lines of code contained in module m1 shown in Fig. 2, a change to the value of b in (1) will affect the value of a in (1), which will propagate to a in (2). In (2) a will affect d, which will then propagate to d in (3). Propagation of change from one line of code to another within a module is called intramodule change propagation. Starting points for intramodule change propagation such as a in line (1) can be thought of as ‘definitions’: the variable is being defined or given a value. Propagation then emanates from the defined variable through the module, across the interface variables (those variables through which propagation can affect other modules) at the module boundary and into other modules; this is known as intermodule change propagation. Yau and Collofello treat ‘definitions’ as any of the following occurrences:

1. The variable is defined in an assignment statement.
2. The variable is assigned a value which is read as input.
3. The variable is an input parameter to module m.
4. The variable is an output parameter from a called module.
5. The variable is a global variable.

Intuitively, only globals on the right-hand side of assignments should count. Any variable occurrence on the left-hand side is receiving a value from the variable on the right-hand side of the assignment, thus whether it is global or not is irrelevant. A 0–1 vector $V_{m1}$ can be used to represent the variable definitions in module m1. Variable occurrences that satisfy any of the above conditions are denoted by ‘1’ and those that do not by ‘0’. The notation $x^d_i$ ($x^u_i$) will be used to denote a definition (use) of variable $x$ at line $i$. For example, $a^d_1$ means variable a is defined in line 1 and $a^u_2$ means variable a is used in line 2. Vector $V_{m1}$ for the code in the example in Fig. 2 (where a is assumed global) is therefore:

$$V_{m1} = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 \end{pmatrix}$$

where the entries correspond, in order, to the occurrences $a^d_1$, $b^u_1$, $d^d_2$, $a^u_2$, $d^u_3$.
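The construction of the definition vector can be sketched in Python. This is a minimal illustration and not the REST implementation: the `Occurrence` record is hypothetical, conditions 1 to 4 are collapsed into a single `is_def` flag, and condition 5 is the `is_global` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    name: str        # variable name
    line: int        # line number within the module
    is_def: bool     # assumption: True if any of conditions 1-4 holds
    is_global: bool  # condition 5: the variable is global


def definition_vector(occurrences):
    """Return the 0-1 vector marking occurrences that count as 'definitions'."""
    return [1 if (o.is_def or o.is_global) else 0 for o in occurrences]


# Occurrences of module m1 in order: a^d_1, b^u_1, d^d_2, a^u_2, d^u_3,
# with a assumed global as in the text.
m1 = [
    Occurrence("a", 1, is_def=True, is_global=True),
    Occurrence("b", 1, is_def=False, is_global=False),
    Occurrence("d", 2, is_def=True, is_global=False),
    Occurrence("a", 2, is_def=False, is_global=True),
    Occurrence("d", 3, is_def=False, is_global=False),
]
V_m1 = definition_vector(m1)
print(V_m1)  # [1, 0, 1, 1, 0]
```

The use of a in line 2 is marked 1 only because a is global, which is the case the opening remark about right-hand-side globals refers to.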

A 0–1 matrix $Z_{m1}$ can be produced to show which variables’ values will propagate to other variables within module m. The rows and columns of $Z_{m1}$ represent each individual occurrence of a variable. Propagation is shown from row $i$ to column $j$. For example, the propagation from a in line 2 to d in line 2 is shown at row 4 column 3 and not at row 3 column 4. For the above code the following matrix is derived:

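$$
Z_{m1} =
\begin{array}{c|ccccc}
 & a^d_1 & b^u_1 & d^d_2 & a^u_2 & d^u_3 \\
\hline
a^d_1 & 1 & 0 & 1 & 1 & 1 \\
b^u_1 & 1 & 1 & 1 & 1 & 1 \\
d^d_2 & 0 & 0 & 1 & 0 & 1 \\
a^u_2 & 0 & 0 & 1 & 1 & 1 \\
d^u_3 & 0 & 0 & 0 & 0 & 1
\end{array}
$$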

It is observed that $Z_{m1}$ is reflexive and transitive; that is, every variable occurrence is assumed to propagate to itself, and if $v_1$ propagates to $v_2$ and $v_2$ propagates to $v_3$ then $v_1$ also propagates to $v_3$. $Z_{m1}$ therefore represents the transitive closure of variables within module m. In graph theory terms it is concluded that $Z_{m1}$ represents the reachability matrix of some directed graph. This fact will be crucial to the simplification which is described later in this section.
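The reachability reading can be checked with a small Python sketch that computes a reflexive transitive closure using Warshall's algorithm. The direct propagation edges are an assumption read off the description of Fig. 2 ($b^u_1 \to a^d_1 \to a^u_2 \to d^d_2 \to d^u_3$). The sketch computes the exact closure of those edges and does not reproduce the approximation used in REST.

```python
def reflexive_transitive_closure(n, edges):
    """Reachability matrix of a directed graph on n nodes (Warshall's algorithm)."""
    # Reflexive: every occurrence propagates to itself.
    reach = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        reach[i][j] = 1
    # Transitive: if i reaches k and k reaches j, then i reaches j.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


# Index order: 0 = a^d_1, 1 = b^u_1, 2 = d^d_2, 3 = a^u_2, 4 = d^u_3.
# Assumed direct propagation within m1, as (source, target) pairs.
direct = [(1, 0), (0, 3), (3, 2), (2, 4)]
Z_m1 = reflexive_transitive_closure(5, direct)
for row in Z_m1:
    print(row)
```

Under these assumed edges the computed rows coincide with the matrix $Z_{m1}$ given in the text.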

4.2. Intermodule change propagation

Propagation from one module to another is called intermodule change propagation. A change to a variable can propagate to other modules if:

1. The variable is a global variable.
2. The variable is an input parameter to a called module.
3. The variable is an output parameter of module m.

If the code in Fig. 2 is part of module m1 then d clearly propagates to any module calling m1. If a is global then its occurrence on the left-hand side of the assignment in line (1) will cause propagation to any modules using a. Suppose that the code constituting module m1 is called by a module m2, that a is global and module m2 uses a, and that a further module m3 uses a and d. The propagation of these variables is represented using a further 0–1 matrix $X_{m1}$, whose (i, j)th entry is 1 iff variable occurrence i propagates to module j:

$$
X_{m1} =
\begin{array}{c|ccc}
 & m_1 & m_2 & m_3 \\ \hline
a^d_1 & 0 & 1 & 1 \\
b^u_1 & 0 & 0 & 0 \\
d^d_2 & 0 & 0 & 0 \\
a^u_2 & 0 & 0 & 0 \\
d^u_3 & 0 & 0 & 1
\end{array}
$$

Note that there is no propagation from any variable occurrence to m1, i.e. column 1 is all zeros, because intermodule change propagation involves flow of program change across a module boundary.

The approximated matrix formulation described here continues by specifying various matrix products. The choice is made between using the Boolean product $(\vee \cdot \wedge)$ and the standard product $(+ \cdot \times)$. This is done to capture, as closely as possible, what Yau and Collofello appear to be saying in their original algorithm. However, the products will be written out in the usual notation ($A \times B = AB$) both to avoid excessive notation and to emphasise the fact that other choices are possible for each product (it would be possible to follow [18] and define a whole ‘minimax algebra’ of different versions of Yau and Collofello’s algorithm).

It is now observed that the intermodule change propagation of all variable occurrences in m can be found by finding the Boolean product of $Z_{m1}$ and $X_{m1}$ giving:


Fig. 3. Assignment and definition/use.

Fig. 4. Assignment and definition/use information held in Matrix B.

S. Black / Information and Software Technology 50 (2008) 723–736 729

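$$
Z_{m1} X_{m1} =
\begin{pmatrix}
1 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{pmatrix}
$$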

The Boolean product of $Z_{m1}$ and $X_{m1}$ is used to maintain consistency with intramodule change propagation computation which is also Boolean. The standard matrix product of $V_{m1}$ and $Z_{m1}X_{m1}$ shows propagation to each module from variable occurrences in module m:

$$
V_{m1} Z_{m1} X_{m1} =
\begin{pmatrix} 1 & 0 & 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 3 \end{pmatrix}
$$

In this instance it can be seen from matrix $V_{m1}Z_{m1}X_{m1}$ that there are 0 propagations to module m1, 1 to module m2 and 3 to m3.
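The two products used in this subsection can be reproduced with a short sketch. It assumes plain Python lists for the matrices; `bool_product` and `row_times_matrix` are illustrative helper names rather than functions taken from REST.

```python
def bool_product(a, b):
    """Boolean matrix product: OR over k of (a[i][k] AND b[k][j])."""
    return [[int(any(a[i][k] and b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def row_times_matrix(v, m):
    """Standard (integer) product of a row vector with a matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


# Values from the worked example; rows are a_d1, b_u1, d_d2, a_u2, d_u3.
V_M1 = [1, 0, 1, 1, 0]
Z_M1 = [
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
X_M1 = [  # columns are m1, m2, m3
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]

zx = bool_product(Z_M1, X_M1)      # which occurrences reach which module
vzx = row_times_matrix(V_M1, zx)   # propagations per module from definitions
print(zx)
print(vzx)  # [0, 1, 3]
```

The Boolean product only records whether an occurrence reaches a module, whereas the standard product with $V_{m1}$ counts how many variable definitions do so.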

4.3. Complexity and logical stability

A complexity measure is factored into the computation by Yau and Collofello so that the complexity of modification of a variable definition is taken into account. Matrix C represents McCabe’s cyclomatic complexity [33] for the module m1 in the code, shown in Fig. 2. The values shown below for m2 and m3 have been chosen at random:

$$
C =
\begin{array}{c|c}
m_1 & 1 \\
m_2 & 1 \\
m_3 & 1
\end{array}
$$

The product of $V_{m1}Z_{m1}X_{m1}$ and C is:

$$
V_{m1} Z_{m1} X_{m1} C =
\begin{pmatrix} 0 & 1 & 3 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
= 4
$$

This number represents the complexity-weighted total variable definition propagation for module m. It is now multiplied by the reciprocal of the number of variable definitions in module m, $1/|V_{m1}|$, to give the mean complexity-weighted variable definition propagation per variable definition in module m. In the example $|V_{m1}| = 3$ (the number of 1s in $V_{m1}$), so the ripple effect for module m is defined to be $4/3 = 1.33$. The logical stability measure for module m is defined to be its reciprocal: $3/4 = 0.75$.
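The final arithmetic step can be sketched as follows, using exact fractions so that the reciprocal relationship is visible. The variable names are illustrative; the numbers are those of the worked example.

```python
from fractions import Fraction

# Values from the worked example.
VZX = [0, 1, 3]        # propagations to m1, m2, m3
C = [1, 1, 1]          # complexity factor for m1, m2, m3
V_M1 = [1, 0, 1, 1, 0]  # 1 marks a variable definition

# Complexity-weighted total propagation: the product of VZX and C.
weighted_total = sum(p * c for p, c in zip(VZX, C))

# |V_m1| is the number of 1s in V_m1.
n_definitions = sum(V_M1)

ripple_effect = Fraction(weighted_total, n_definitions)
logical_stability = 1 / ripple_effect

print(weighted_total, n_definitions)            # 4 3
print(ripple_effect, logical_stability)         # 4/3 3/4
print(round(float(ripple_effect), 2), float(logical_stability))  # 1.33 0.75
```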

4.4. Decomposition of matrix Zm

Matrix $Z_m$ represents intramodule change propagation which is clearly a transitive relation. Thus, as observed earlier, $Z_m$ is the matrix of a transitive relation and represents the reachability matrix of some basic relation $B_m$. To determine $B_m$ is not difficult: change propagates either from the right-hand side of an assignment to the left-hand side, or it propagates from the definition of a variable to a subsequent use of that same variable. These two modes of propagation are referred to as ‘assignment’ and ‘definition/use’, respectively [24]. If the two are treated as different relations, represented by matrices $A_m$ and $D_m$, respectively, then it follows that $B_m = A_m + D_m$.

In Fig. 4, where information flow from one variable occurrence to another is shown using arrows, variable occurrence x takes its value from y in line 1, thus x,y is an assignment pair. Information about such pairings is held in matrix $A_m$. The definition of x in line 1 is used by x in line 2. This is a definition/use association. Information about definition/use associations is held in matrix $D_m$.

The combination of information from assignment and definition/use gives us information about the flow of values from one variable to another within a module. From this information it can now be computed which variables would be affected if any particular variable occurrence were changed. The assignment matrix $A_m$ that holds information about all assignment pairings for the example code is as follows:

$$
A_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 0 & 0 & 0 & 0 \\
y^u_1 & 1 & 0 & 0 & 0 \\
y^d_2 & 0 & 0 & 0 & 0 \\
x^u_2 & 0 & 0 & 1 & 0
\end{array}
$$

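The definition/use association matrix $D_m$ is as follows:

$$
D_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 0 & 0 & 0 & 1 \\
y^u_1 & 0 & 0 & 0 & 0 \\
y^d_2 & 0 & 0 & 0 & 0 \\
x^u_2 & 0 & 0 & 0 & 0
\end{array}
$$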


Matrix $A_m$ and matrix $D_m$ have all variable occurrences as rows and columns, even though in $A_m$ only defined variables are needed as columns and in $D_m$ only defined variables are needed as rows. The sum of these matrices gives matrix $B_m$ representing direct intramodule change propagation. The information now held in matrix $B_m$ is also shown in Fig. 4.

$$
B_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 0 & 0 & 0 & 1 \\
y^u_1 & 1 & 0 & 0 & 0 \\
y^d_2 & 0 & 0 & 0 & 0 \\
x^u_2 & 0 & 0 & 1 & 0
\end{array}
$$

The reachability matrix (equivalent to the transitive closure) for $B_m$, namely $Z_m$, can be found using:

$$
Z_m = I \vee B_m \vee B_m^2 \vee \dots \vee B_m^n
$$

where n is the number of variable occurrences, in this case four. The reachability matrix shows all possible links between any variable occurrence and any other variable occurrence within the module. From the information now contained within matrix $Z_m$ any change to a variable occurrence can be tracked throughout the module and the ramifications of its change calculated.

$$
Z_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 1 & 0 & 1 & 1 \\
y^u_1 & 1 & 1 & 1 & 1 \\
y^d_2 & 0 & 0 & 1 & 0 \\
x^u_2 & 0 & 0 & 1 & 1
\end{array}
$$
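The derivation of $Z_m$ from $A_m$ and $D_m$ can be sketched in a few lines. Note that the sketch below uses Warshall's algorithm in place of the explicit series of Boolean powers given in the text; both yield the same reachability matrix. The function name `reachability` is illustrative and is not taken from REST.

```python
def reachability(b):
    """Reflexive-transitive closure of a 0/1 matrix via Warshall's algorithm."""
    n = len(b)
    # Start from I OR B: every occurrence reaches itself.
    z = [[1 if i == j else b[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if z[i][k]:
                for j in range(n):
                    if z[k][j]:
                        z[i][j] = 1  # i reaches j through k
    return z


# Example matrices from the text; order is x_d1, y_u1, y_d2, x_u2.
A_M = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
D_M = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# B_m is the element-wise Boolean sum of A_m and D_m.
B_M = [[a | d for a, d in zip(ra, rd)] for ra, rd in zip(A_M, D_M)]
Z_M = reachability(B_M)
for row in Z_M:
    print(row)
```

For the four-occurrence example the result has rows 1 0 1 1, 1 1 1 1, 0 0 1 0 and 0 0 1 1, matching the matrix $Z_m$ given in the text.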

Sequence:

    (1) x := y;
    (2) y := x + 1;

Selection:

    if TRUE
    (1) x := y;
    else
    (2) y := x + 1;
    end if

Iteration:

    loop
    (1) x := y;
    (2) y := x + 1;
    end loop

Fig. 5. Code fragment displaying sequence, selection and iteration.

Fig. 6. False definition/use pairing in matrix D 0.

4.5. Approximated matrix D

To produce matrix $D_m$ (the definition/use matrix) control flow information has to be taken into account. The three basic control flow constructs are:

• sequence,
• selection or branching,
• iteration or looping.

The code fragment used so far in this paper has contained only sequential code; code containing loops and branches also needs to be inspected to see how it affects the accuracy of $D_m$. Replacing $D_m$ with a matrix of all possible definition/use pairs greatly simplifies the computation; this matrix is named $D'_m$. By defining $D'_m$ to ignore control flow information the code is essentially being treated as if it were contained within a loop, except in the case of one definition being killed by another (see Fig. 7 for further details). This means that the more code within a module that is contained within loops, the closer the match between $D_m$ and $D'_m$. If all possible definition/use pairs are included, matrix $D'_m$ for the example code in Fig. 3 will become:

$$
D'_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 0 & 0 & 0 & 1 \\
y^u_1 & 0 & 0 & 0 & 0 \\
y^d_2 & 0 & \mathbf{1} & 0 & 0 \\
x^u_2 & 0 & 0 & 0 & 0
\end{array}
$$

Note that there is one false entry in matrix $D'_m$: $y^d_2$ is paired with $y^u_1$, see Fig. 6. Note also that matrix $D'_m$ will be identical for all the code fragment examples in Fig. 5, as control flow is not taken into account.

Both $D_m$ (below) and $D'_m$ for the code contained in a loop are identical, i.e. $D_m = D'_m$. This is because all definition/use pairings are true in both cases.

$$
D_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 0 & 0 & 0 & 1 \\
y^u_1 & 0 & 0 & 0 & 0 \\
y^d_2 & 0 & 1 & 0 & 0 \\
x^u_2 & 0 & 0 & 0 & 0
\end{array}
$$
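The construction of $D'_m$ amounts to pairing every definition of a variable with every use of the same variable, with no reference to control flow. A minimal sketch is given below; the tuple representation of a variable occurrence (name, kind, line) and the function name `approx_def_use` are assumptions made for illustration and do not describe the internals of REST.

```python
def approx_def_use(occurrences):
    """Build D' by pairing each definition with every use of the same variable.

    occurrences: list of (name, kind, line) with kind 'd' (definition) or
    'u' (use). Control flow and line order are deliberately ignored.
    """
    n = len(occurrences)
    d_approx = [[0] * n for _ in range(n)]
    for i, (name_i, kind_i, _) in enumerate(occurrences):
        for j, (name_j, kind_j, _) in enumerate(occurrences):
            if kind_i == "d" and kind_j == "u" and name_i == name_j:
                d_approx[i][j] = 1
    return d_approx


# Occurrences for "(1) x := y; (2) y := x + 1;" in the order x_d1, y_u1, y_d2, x_u2.
EXAMPLE = [("x", "d", 1), ("y", "u", 1), ("y", "d", 2), ("x", "u", 2)]

for row in approx_def_use(EXAMPLE):
    print(row)
```

For the example this produces a 1 at row 1, column 4 (the true pairing of $x^d_1$ with $x^u_2$) and a 1 at row 3, column 2 (the pairing of $y^d_2$ with $y^u_1$, which is only true when the code sits inside a loop).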

Matrix $D_m$ for code that contains selection or branching is affected as follows. Consider the same source code example used above but now containing selection (see Fig. 5). Note that the if statement does not contain a variable, and the if and end if statements are not given line numbers, to enable easy comparison with previous matrices. Lines 1 and 2 are mutually exclusive: they cannot both be executed on the same pass through the selection. $D_m$ for this version of the code is:

$$
D_m =
\begin{array}{c|cccc}
 & x^d_1 & y^u_1 & y^d_2 & x^u_2 \\ \hline
x^d_1 & 0 & 0 & 0 & 0 \\
y^u_1 & 0 & 0 & 0 & 0 \\
y^d_2 & 0 & 0 & 0 & 0 \\
x^u_2 & 0 & 0 & 0 & 0
\end{array}
$$

Sequential code also needs to be inspected where the definition of a variable occurrence is followed by another definition of that variable, see Fig. 7. The value of x in line 1 will be killed when control flow reaches line 2 and x is assigned the value 2. On examination of the programs used in this study, killed definitions do not occur. As such they are not considered further in this paper.

As matrix $D'_m$ includes all definition/use pairs regardless of control flow there is an extra ‘1’, shown in bold below, at row 1, column 5.

$$
D'_m =
\begin{array}{c|ccccc}
 & x^d_1 & y^u_1 & x^d_2 & z^d_3 & x^u_3 \\ \hline
x^d_1 & 0 & 0 & 0 & 0 & \mathbf{1} \\
y^u_1 & 0 & 0 & 0 & 0 & 0 \\
x^d_2 & 0 & 0 & 0 & 0 & 1 \\
z^d_3 & 0 & 0 & 0 & 0 & 0 \\
x^u_3 & 0 & 0 & 0 & 0 & 0
\end{array}
$$

It has been demonstrated in this section that the use of matrix $D'_m$ in computing ripple effect can cause false relations when dealing with sequential and selective code. In general, if $F_I$ (iterative), $F_S$ (sequential) and $F_B$ (branching) are identical code fragments except that $F_I$ contains some loops and $F_B$ some branching, and $U(F)$ is the number of 1s in $D_m$, then:

$$
U(F_I) \geq U(F_S) \geq U(F_B)
$$

Intuitively, the approximation will therefore be least accurate when there are many branches and no loops.
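The inequality can be illustrated with the three versions of $D_m$ derived above for the code fragments in Fig. 5. The sketch below simply counts the 1s in each matrix; `count_ones` is an illustrative helper name.

```python
def count_ones(matrix):
    """U(F): the number of 1s in a definition/use matrix."""
    return sum(sum(row) for row in matrix)


# D_m for the same two statements under each control flow construct,
# in the order x_d1, y_u1, y_d2, x_u2 (values taken from the text).
D_ITERATION = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
D_SEQUENCE = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
D_SELECTION = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

u_i = count_ones(D_ITERATION)
u_s = count_ones(D_SEQUENCE)
u_b = count_ones(D_SELECTION)
print(u_i, u_s, u_b)  # 2 1 0
```

Since $D'_m$ coincides with the iterative case, the gap between $D'_m$ and the true $D_m$ is 0 entries for the loop, 1 for the sequence and 2 for the selection in this small example.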

Using McCabe’s measure via matrix C introduces complexity into the computation of ripple effect. Yau and Collofello include McCabe’s cyclomatic complexity as a part of their algorithm. In the reformulation used thus far the cyclomatic complexity measure for each module is the row value in matrix C:

$$
C =
\begin{array}{c|c}
m_1 & 1 \\
m_2 & 1 \\
m_3 & 1
\end{array}
$$

Fig. 7. Code fragment highlighting one definition being killed by another.

Shepperd [37] criticised cyclomatic complexity for being based on poor theoretical foundation and being outperformed as a measure of general complexity by lines of code. The concern here is somewhat different because it is expected that the approximation will be least accurate when there are many branches and few loops (see previous section). Thus, there is a different reason for preferring a variant of McCabe. As mentioned previously McCabe’s cyclomatic complexity measure is a count of all conditions within the module plus one. Four versions of matrix C are presented in response to the question of accuracy of matrix $D'_m$.

5. Versions of the McCabe complexity factor

In Yau and Collofello’s [45] ripple effect algorithm McCabe’s cyclomatic complexity is used: ‘‘. . .[to] provide more realistic measures of the amount of effort required to analyze the program to ensure that inconsistencies are not introduced’’ [45, p. 548]. Several variations of McCabe’s complexity measure are presented here to show that the effect that the approximation of matrix $D_m$ will have on the final ripple effect measure can be counteracted. Points to consider are: code contained in a loop is accurately represented in matrix $D'_m$, sequential code is less accurately represented than code within a loop, and code containing branching is least accurately represented. It is interesting to note that the highest correlation with Yau and Collofello’s original ripple effect in the sample programs was found using the Original version of McCabe. The four versions of the complexity factor are as follows:

(1) Control McCabe – matrix C contains the value 1 for each module. The control McCabe version of matrix C when multiplied with the other matrices will produce a measure that has no complexity element factored in. This means it can give a baseline figure for the ripple effect of any particular module or program.

(2) Original McCabe – all conditions plus one.

(3) Loops McCabe – number of loops plus one, branches are not counted.

(4) Branches McCabe – number of branches plus one, loops are not counted.
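The four variants can be sketched as a single function of two counts per module. The sketch assumes that every condition in a module is either a loop condition or a branch condition, so that "all conditions" equals loops plus branches; this split, the function name `complexity_factors` and the example counts are assumptions made for illustration and are not taken from the paper or from REST.

```python
def complexity_factors(loops, branches):
    """Return the four complexity-factor variants for one module.

    Assumes each condition is either a loop condition or a branch condition.
    """
    return {
        "Control": 1,                      # no complexity element factored in
        "Original": loops + branches + 1,  # all conditions plus one
        "Loops": loops + 1,                # branches are not counted
        "Branches": branches + 1,          # loops are not counted
    }


# Hypothetical module with 2 loops and 3 branches.
factors = complexity_factors(loops=2, branches=3)
print(factors)
```

Each value would occupy the module's row in the corresponding version of matrix C.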

6. Validation of the approximation algorithm

This section provides validation of the approximated version of the ripple effect measure by computing Yau and Collofello’s ripple effect for several programs and then comparing this with the approximated ripple effect for the programs. Four versions of the C complexity matrix: Original, Control, Loops and Branches are used to compute differing versions of approximated ripple effect. The Pearson correlation coefficient is used to show the relationship between the Yau and Collofello ripple effect and the four versions of the approximated ripple effect. Fifteen programs in total were used in this experiment, their details are given below.

| Name | Type | From | LOC | Mods | RE |
|---|---|---|---|---|---|
| funmat5.c | File processing | REST tool | 245 | 19 | Yes |
| conv.c | String processing | TXEE^a | 53 | 3 | Yes |
| quadratic.c | Simple calculation | Yau and Collofello^b | 23 | 3 | Yes |
| genscrip.c | String processing | Linux operating system | 41 | 3 | Yes |
| tamilize.c | String processing | Tamil to TeX converter^c | 363 | 14 | Yes |
| gensym.c | String processing | TXEE^a | 237 | 3 | No |
| error.c | String processing | TXEE^a | 173 | 12 | No |
| bits.c | Creation/output of bit sets | PCCTS^d | 371 | 20 | No |
| arippletest.c | Testing program | REST tool | 34 | 5 | No |
| fcomp.c | Flowgraph compiler | QUALMS^e | 282 | 10 | No |
| metrics.c | Flowgraph compiler | QUALMS^e | 722 | 25 | No |
| allas1.c | Mutation testing tool | CSSE^f | 425 | 20 | No |
| allas2.c | Mutation testing tool | CSSE^f | 477 | 27 | No |
| allas3.c | Mutation testing tool | CSSE^f | 725 | 44 | No |
| allas4.c | Mutation testing tool | CSSE^f | 659 | 42 | No |

a A telephone switching system produced by Nortel for British Telecom [26].

b C version of the original Fortran program used to demonstrate ripple effect [45].

c University of Washington Humanities and Arts Center, USA, 1990.

d (Purdue Compiler Construction Tool Set) [35] from Purdue University, Indiana, USA.

e [1].

f Centre for Systems and Software Engineering, London South Bank University, UK.

Four of the programs, allas1.c–allas4.c, were used for a separate study [11] to look at the product-process link. The programs discussed in this paper are divided into two groups: those programs where the Yau and Collofello ripple effect is identical to the approximated ripple effect, and those where it is not. Killed definitions (see Fig. 7) are not discussed as they do not arise in any of the programs used in this study. This section concludes with a discussion of spurious 1s and their effect on the approximated version of ripple effect.

6.1. Programs with identical ripple effect

Five of the fifteen programs under study produced exactly the same ripple effect for the approximated version as for the Yau and Collofello version. This section examines the programs and explains why they produce the same results. The five programs under investigation in this section are: funmat5.c, conv.c, quadratic.c, genscrip.c and tamilize.c. Funmat5.c is a file processing program in which most of the code within any given module is contained within a loop. Conv.c, tamilize.c and genscrip.c are string/character processing programs with many of their modules containing large loops also. The other modules contain several spurious 1s but because these 1s do not affect the variable occurrences that propagate via intermodule propagation there is no change in the ripple effect. Program quadratic.c contains no iteration, only sequence and selection. There are no spurious definition/use pairings and thus no spurious 1s. Spurious 1s are extra 1s that appear in matrix $Z_m$ due to the approximated algorithm not being as accurate as the original algorithm.

6.1.1. Reasons for identical ripple effect

Overall the reasons for these programs not differing in their ripple effect for the approximated version and the Yau and Collofello version seem to be the following factors:

(1) All code within the module/program is within a loop, e.g.

    loop
    1 a=b;
    2 b=c;
    end loop

b is used in line 1 and then defined in line 2; as the code is within a loop, b in line 2 can affect b in line 1. So matrix $D'$ for this code would be correct.

(2) Code is sequential and there are no variables used before being defined, e.g.

    1 a=c;
    2 b=a;

a is defined in line 1 before being used in line 2, thus there is no false definition/use pairing, so matrix $D'$ for this code would also be correct.

(3) The spurious 1s represent variable occurrences that do not propagate to any of the interface variables, those variables through which propagation can affect other modules within the program.

Table 1 shows ripple effect measures for all the programs described in this section. Pearson’s correlation coefficients between Yau and Collofello’s ripple effect and the approximated ripple effect using the variants of the complexity factor (Section 4.3) are shown. Correlations with Number of modules and Lines Of Code are shown as a benchmark. If the correlation of either of these with Yau and Collofello’s ripple effect were high, it would be easier to produce and use them instead. It can be seen from the table that correlation between Yau and Collofello’s ripple effect and Original is 1. This is to be expected as the reason these programs have been grouped together is because the ripple effect is the same for the Original version. Correlation with Branches is high and Loops is low because these programs contain lots of selective code and not much iteration. Correlation with Control is negative which suggests that in computing ripple effect a complexity factor of some sort does need to be included.

Table 1. Description and correlation for programs with identical ripple effect

| Program | Y & C | Control | Original | Loops | Branches | No. Mods | LOC |
|---|---|---|---|---|---|---|---|
| funmat5 | 4.7 | 1.3 | 4.7 | 3.5 | 2.6 | 19 | 245 |
| conv | 8.4 | 0.9 | 8.4 | 3.9 | 5.5 | 3 | 53 |
| quadratic | 2.5 | 1.5 | 2.5 | 1.5 | 2.5 | 3 | 23 |
| genscrip | 4.0 | 2.3 | 4.0 | 2.3 | 2.3 | 3 | 41 |
| tamilize | 9.2 | 0.9 | 9.2 | 1.7 | 8.3 | 14 | 363 |
| Correlation | 1.00 | −0.74 | 1.00 | 0.29 | 0.92 | 0.24 | 0.56 |
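The correlation row of Table 1 can be approximately reproduced from the rounded values printed in the table. The sketch below implements the standard sample Pearson coefficient directly; because the table values are rounded to one decimal place, the computed coefficients may differ slightly from the published ones, which were presumably calculated from unrounded data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Columns of Table 1 (funmat5, conv, quadratic, genscrip, tamilize).
Y_AND_C = [4.7, 8.4, 2.5, 4.0, 9.2]
CONTROL = [1.3, 0.9, 1.5, 2.3, 0.9]
ORIGINAL = [4.7, 8.4, 2.5, 4.0, 9.2]
LOOPS = [3.5, 3.9, 1.5, 2.3, 1.7]
BRANCHES = [2.6, 5.5, 2.5, 2.3, 8.3]

for name, column in [("Control", CONTROL), ("Original", ORIGINAL),
                     ("Loops", LOOPS), ("Branches", BRANCHES)]:
    print(name, round(pearson(Y_AND_C, column), 2))
```

The Original column is identical to the Y & C column for these five programs, so its coefficient is 1; the Control coefficient is negative and the Branches coefficient is close to the published 0.92.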

6.2. Programs with differing ripple effect

For six of the eleven programs in this study Yau and Collofello’s ripple effect is different to the approximated ripple effect. A summary is given of the reasons for the difference. In particular, modules where the Yau and Collofello ripple effect is different from the approximated ripple effect are inspected to see if there are particular types of programs where the approximated ripple effect is more or less accurate (see Table 2).

6.2.1. Reasons for differing ripple effect

From the investigation of the six programs it is apparent that the main cause of difference between Yau and Collofello’s ripple effect and the approximated ripple effect is that all definitions of variable occurrences within a module are linked to all uses of variable occurrences, i.e. there is backward propagation. As most modules within programs contain sequential code which is not contained within a selection or iteration and variable occurrences are often used before being defined there are likely to be many modules with spurious 1s. In the programs considered so far though, this does not seem to cause too much of a difference between the Yau and Collofello ripple effect and the approximated ripple effect (using the Original variant of McCabe). Another factor causing this difference is the number of spurious 1s in matrix $D'$ when variable occurrences are defined and then used in mutually exclusive parts of the modules, e.g. when a variable occurrence is defined in the if part of a selection and then used in the else part of a selection. This can also occur within the different cases of a switch statement.

Table 2. Correlation for programs with differing ripple effect

| Program | Y & C | Control | Original | Loops | Branches | No. Mods | LOC |
|---|---|---|---|---|---|---|---|
| gensym | 16.9 | 1.9 | 17.6 | 5.2 | 14.3 | 12 | 237 |
| error | 2.6 | 0.7 | 3.4 | 1.3 | 2.8 | 12 | 173 |
| bits | 5.7 | 1.0 | 5.8 | 2.0 | 4.8 | 20 | 371 |
| arippletest | 3.0 | 3.2 | 3.2 | 3.2 | 3.2 | 5 | 34 |
| fcomp | 12.3 | 1.7 | 12.4 | 5.2 | 8.9 | 10 | 282 |
| metrics | 12.4 | 0.9 | 12.9 | 1.9 | 12.0 | 25 | 722 |
| Correlation | 1.00 | −0.05 | 1.00 | 0.67 | 0.98 | 0.26 | 0.44 |

7. Discussion of all eleven programs

7.1. Spurious 1s

The number of spurious 1s contained within each program module was investigated in order to see whether it affects the accuracy of the approximated ripple effect algorithm. Fig. 8 shows all programs in decreasing order of spurious 1s as a percentage of the correct number of 1s for the module. The Pearson correlation coefficient between % spurious 1s and % difference is low at 0.34. The mean percentage of spurious 1s in these eleven programs is 24.27. The program with the highest number of spurious 1s is metrics.c at 73%. Correlation between the Yau and Collofello ripple effect and the approximated ripple effect for metrics.c is 0.996. This seems surprisingly high given the number of spurious 1s within matrix $D'$. On investigation it is found that 618 of these spurious 1s are contained within one module, metric_main, module 1. They are all due to backward referencing: many variables in this module are used repeatedly, therefore there are a lot of definition/use pairings which are not correct. Most of the ripple effect in module 1 is due to intermodule change propagation caused by the global variable s_structured which is used several times in module 13, is_s_structured. This does not affect the ripple effect for this module because the spurious 1s removed are spurious definition/use pairings for variables that do not propagate outside of the module. None of these affect the ripple effect measure for this module; the Yau and Collofello measure is the same as the approximated measure. Table 3 shows the correlation for all programs.

Fig. 8. Programs in decreasing order of % of spurious 1s. (Horizontal axis: program name; vertical axis: % of spurious 1s; series: % of spurious 1s, and % difference between Y&C and approximated ripple effect.)

Table 3. Correlation for all eleven programs

| Program | Y & C | All 1s | Original | Loops | Branches | Modules | LOC |
|---|---|---|---|---|---|---|---|
| genscrip | 4 | 2.29 | 4 | 2.29 | 2.29 | 3 | 41 |
| funmat5 | 4.67 | 1.28 | 4.71 | 3.53 | 2.56 | 19 | 245 |
| arippletest | 3 | 3.2 | 3.2 | 3.2 | 3.2 | 5 | 34 |
| gensym | 16.91 | 1.89 | 17.61 | 5.22 | 14.28 | 12 | 237 |
| bits | 5.68 | 0.96 | 5.77 | 1.97 | 4.76 | 20 | 371 |
| conv | 8.43 | 0.89 | 8.43 | 3.86 | 5.46 | 3 | 54 |
| quadratic | 2.47 | 1.52 | 2.47 | 1.52 | 2.48 | 3 | 23 |
| error | 2.6 | 0.72 | 3.35 | 1.27 | 2.8 | 12 | 173 |
| tamilize | 9.21 | 0.87 | 9.21 | 1.74 | 8.34 | 14 | 363 |
| fcomp | 12.34 | 1.73 | 12.38 | 5.2 | 8.91 | 10 | 282 |
| metrics | 12.42 | 0.94 | 12.86 | 1.86 | 11.95 | 25 | 722 |
| Correlation | 1.000 | −0.129 | 0.998 | 0.616 | 0.968 | 0.334 | 0.534 |

Table 4. Correlation and description for all programs

| Program | Y & C | Control | Original | Loops | Branches | Modules | LOC |
|---|---|---|---|---|---|---|---|
| funmat5 | 4.7 | 1.3 | 4.7 | 3.5 | 2.6 | 19 | 245 |
| conv | 8.4 | 0.9 | 8.4 | 3.9 | 5.5 | 3 | 54 |
| quadratic | 2.5 | 1.5 | 2.5 | 1.5 | 2.5 | 3 | 23 |
| genscrip | 4.0 | 2.3 | 4.0 | 2.3 | 2.3 | 3 | 41 |
| tamilize | 9.2 | 0.9 | 9.2 | 1.7 | 8.3 | 14 | 363 |
| gensym | 16.9 | 1.9 | 17.6 | 5.2 | 14.3 | 12 | 237 |
| error | 2.6 | 0.7 | 3.4 | 1.3 | 2.8 | 12 | 173 |
| bits | 5.7 | 1.0 | 5.8 | 2.0 | 4.8 | 20 | 371 |
| arippletest | 3.0 | 3.2 | 3.2 | 3.2 | 3.2 | 5 | 34 |
| fcomp | 12.3 | 1.7 | 12.4 | 5.2 | 8.9 | 10 | 282 |
| metrics | 12.4 | 0.9 | 12.9 | 1.9 | 12.0 | 25 | 722 |
| allas1 | 15.2 | 2.2 | 17.3 | 6.0 | 13.5 | 20 | 425 |
| allas2 | 18.3 | 2.6 | 19.1 | 7.4 | 14.2 | 27 | 477 |
| allas3 | 20.3 | 3.8 | 21.8 | 10.8 | 14.8 | 44 | 725 |
| allas4 | 19.3 | 3.7 | 21.1 | 10.5 | 14.2 | 42 | 659 |
| Correlation | 1.00 | 0.53 | 1.00 | 0.85 | 0.98 | 0.75 | 0.77 |

Gensym.c has the second highest number of spurious 1s: 312 of 350 spurious 1s are contained within its largest module main, which is module 12. Global file pointers infile and outfile cause most of the ripple for this module. The approximated ripple effect (17.7) is slightly different to the Yau and Collofello ripple effect (16.6) for this module due to some spurious definition/use pairing between these global variable occurrences. Arippletest has the lowest correlation coefficient, 0.89, and 14% spurious 1s, which puts it below the mean percentage of spurious 1s (24.2%). There is low correlation between the number of spurious 1s contained within a program and the accuracy of the approximated ripple effect for that program. Therefore, it can be concluded that there is no obvious link between the two. Of course if there are no spurious 1s then the approximated ripple effect will be identical to the Yau and Collofello ripple effect.
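The percentage of spurious 1s described above can be sketched as a comparison of a control-flow-aware matrix with its approximated counterpart. The formula below follows the verbal description (spurious 1s as a percentage of the correct number of 1s); the function name `spurious_percentage` and the choice to raise an error when the correct matrix has no 1s are assumptions made for illustration.

```python
def spurious_percentage(correct, approximated):
    """Spurious 1s as a percentage of the number of 1s in the correct matrix.

    A spurious 1 is an entry that is 1 in the approximated matrix but 0 in
    the correct one.
    """
    n_correct = sum(sum(row) for row in correct)
    if n_correct == 0:
        raise ValueError("correct matrix contains no 1s; percentage undefined")
    n_spurious = sum(
        1
        for row_c, row_a in zip(correct, approximated)
        for c, a in zip(row_c, row_a)
        if a == 1 and c == 0
    )
    return 100.0 * n_spurious / n_correct


# Sequential example from Section 4.5: D_m has one true pairing,
# D'_m adds the false pairing of y_d2 with y_u1.
D_SEQUENCE = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
D_APPROX = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

print(spurious_percentage(D_SEQUENCE, D_APPROX))  # 100.0
```

As the discussion of metrics.c shows, a high percentage on this measure does not by itself imply a large difference in ripple effect, since spurious entries matter only when they reach variables that propagate across module boundaries.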

7.2. McCabe variants and their correlation

The Pearson correlation coefficient is used to see which of the four complexity factors (Control, Original, Loops or Branches) used as part of the approximated ripple effect measure correlate most highly with the Yau and Collofello ripple effect. Correlation is found to be highest using the Original variant of the McCabe measure (Section 5). The correlation is surprisingly high (0.9974), see Table 4 which gives correlation details for all programs used in this study. Correlation for the Branches variant is also high, but lower for Loops. The correlation for Control is the lowest which shows that a complexity factor does need to be included in the calculation of ripple effect. It seems that the approximated version using Original McCabe, whilst not yielding exactly the same ripple effect as Yau and Collofello, does produce a valid alternative. The spurious 1s produced by not looking at control flow seem not to have much effect on the final result, thus a counterbalance to the approximation is not necessary.

8. Summary

One of the aims of this paper was to put forward and validate approximated ripple effect as a replacement for Yau and Collofello’s original ripple effect measure. It has been shown that the approximated measure is highly correlated with the original ripple effect and as such can replace it. Eleven programs have been used to correlate the two measures: five of these programs had identical Yau and Collofello and approximated ripple effect, six had differing ripple effect and were examined to see what caused the difference in the two measures. The number of spurious 1s in each program was studied to see if there is a relationship between spurious 1s and the accuracy of the approximation. As programs with high numbers of spurious 1s had very high correlation between Yau and Collofello’s and the approximated ripple effect there seems to be no obvious link. Pearson correlation coefficients for all versions of the complexity factor were calculated; correlation was highest at 0.9974 for the Original variant of McCabe’s cyclomatic complexity. It can be concluded that the approximated version using Original McCabe, whilst not yielding exactly the same ripple effect as Yau and Collofello, does produce a valid alternative.

9. Conclusions and further work

In this paper, impact analysis and ripple effect have been described and their importance shown in the areas of software measurement and software maintenance. Background work in ripple effect measurement has been detailed to aid comprehension of the research described here. Two fundamental ideas in the computation of ripple effect measurement, intramodule and intermodule change propagation, are explained and details are given of how they are calculated with example code fragments to clarify their meaning. Central to this paper is a reexpression in terms of matrix arithmetic of what Yau and Collofello meant in their original ripple effect algorithm. The intramodule change propagation matrix $Z_m$ has been introduced and a description given of how it is produced from matrices $A_m$ and $D_m$. The approximated version of matrix $D_m$, namely matrix $D'_m$, has been described with examples illustrating the effect that control flow has on its accuracy. Matrix $D'_m$ is more accurate when code contains loops, less accurate if it contains only sequential or selective code; reasons for this are given. Matrix C introduces a complexity factor to the algorithm.

Four versions of the complexity matrix C were introduced in this paper as an attempt to compensate for the ramifications of not including control flow when computing matrix D automatically. The approximated algorithm using the original version of McCabe’s cyclomatic complexity was found to produce the closest match to Yau and Collofello’s ripple effect algorithm.

The REST (Ripple Effect and Stability Tool) has been produced to support the approach. Further work is currently underway to make REST more useful in practice. The research described in this paper needs to be validated for REST to be a useful tool that is used in an industrial setting. As REST is a prototype tool it also needs to be made more robust, allowing it to compute ripple effect for all C source code; it can currently handle only a subset of programs.

Research is underway to produce a ripple effect algorithm to facilitate automatic computation of ripple effect for object-oriented software. Recent research has produced an algorithm for computing design level ripple effect for object-oriented software but code level measurement of object-oriented ripple effect has not yet been automated. Research is also underway into the applicability of ripple effect measurement for the aspect oriented paradigm.

References

[1] L. Bache, R. Leelasena, Qualms – A Tool for Control Flow Analysisand Measurement, South Bank Polytechnic, London, UK, 1991,May, 9 pp..

[2] R. Bache, L. Leelasena, QUALMS – User guide, CSSE/QUG/DOC/OwUG/1.0a, CSSE, South Bank University, London SE1 0AA, UK,1990.

[3] V.R. Basili, D.H. Hutchens, An empirical study of a syntacticcomplexity family, IEEE Transactions on Software Engineering 9 (6)(1983) 664–672.

[4] K.H. Bennett, An introduction to software maintenance, Informationand Software Technology 12 (4) (1990) 257–264.

[5] H.Z. Bilal, S.E. Black, Computing ripple effect for object orientedsoftware, Quantitative Approaches in Object-Oriented SoftwareEngineering (QAOOSE) workshop (July 3rd 2006).

[6] S. Black, F.H. Clark, Measuring the ripple effect of Pascal programs,in: R. Dumke, A. Abran (Eds.), New Approaches in SoftwareMeasurement, Springer-Verlag, New York, LLC, 2000, pp. 161–171.

[7] S. Black, J.D. Wigg, X-RAY: A multilanguage, industrial strengthtool, in: 9th International Workshop on Software Measurement,1999.

[8] S.E. Black, Computation of ripple effects for software, Ph.D. thesis,School of Computing, Information Systems and Mathematics, SouthBank University, London, UK, September 2001, 124 pp.

[9] S.E. Black, Computing ripple effect for software maintenance,Software Maintenance: Research and Practice 13 (4) (2001).

[10] S.E. Black, Automating ripple effect measurement, in: 5th WorldConference on Systemics, Cybernetics and Informatics (22–25th July,2001).

[11] S.E. Black, Is ripple effect intuitive? A pilot study, Innovations inSystems and Software Engineering: A NASA Journal (2006).

[12] S.A. Bohner, R.S. Arnold, Software Change Impact Analysis, IEEEComputer Society Press, Los Alamitos, CA, 1996.

[13] J.P. Bowen, Are current approaches sufficient for measuring softwarequality? Proceedings Software Quality Assurance Workshop (1978)148–155.

[14] S.C. Chang, A unified and efficient approach for logical ripple effect analysis, Ph.D. thesis, Department of EECS, Northwestern University, Evanston, Illinois, June 1984, 94 pp.

[15] N. Chapin, Usefulness of metrics and models in software maintenance and evolution, IEEE Conference on software maintenance, San Jose, CA, WESS position paper, 2000.

[16] M.A. Chaumun, H. Kabaili, R.K. Keller, F. Lustman, A change impact model for changeability assessment in object-oriented software systems, in: Software Maintenance and Reengineering, 1999. Proceedings of the Third European Conference on, 1999, pp. 130–138.

[17] J.S. Collofello, D.A. Wennergrund, Ripple effect based on semantic information, Proceedings AFIPS Joint Computer Conference 56 (1987) 675–682.

[18] R. Cunningham-Green, Minimax Algebra, vol. 166, Springer-Verlag, 1979.

[19] M.O. Elish, D. Rine, Investigation of metrics for object-oriented design logical stability, in: Software Maintenance and Reengineering, 2003. Proceedings of the Seventh European Conference on, 2003, pp. 193–200.

[20] N. Fenton, S.L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, Chapman and Hall, London, 1996.

[21] R. Hall, S. Lineham, Using metrics to improve software maintenance, BT Technology Journal 15 (3) (1997) 123–129.

[22] M.H. Halstead, Elements of Software Science, Elsevier North-Holland, New York, 1977.

736 S. Black / Information and Software Technology 50 (2008) 723–736

[23] F.M. Haney, Module connection analysis – a tool for scheduling of software debugging activities, in: Proceedings Fall joint Computer Conference, 1972, pp. 173–179.

[24] Mary Jean Harrold, Mary Lou Soffa, Efficient computation of interprocedural definition–use chains, ACM Transactions on Programming Languages and Systems 16 (2) (1994) 175–204.

[25] S. Henry, D. Kafura, Software structure metrics based on information flow, IEEE Transactions on Software Engineering SE-7 (5) (1981) 510–518.

[26] D. Homan, 10 years of software maintenance or crawling through the mire!, in: Workshop on Empirical Studies of Software Maintenance, September 1999.

[27] C.C. Hsieh, An approach to logical ripple effect analysis for software maintenance, Ph.D. thesis, Department of EECS, Northwestern University, Evanston, Illinois, June 1982, 206 pp.

[28] J.K. Joiner, W.T. Tsai, Ripple effect analysis, program slicing and dependence analysis, TR 93-84, University of Minnesota technical report, 1993.

[29] H. Kabaili, R.K. Keller, F. Lustman, Assessing object-oriented software changeability with design metrics, in: IASTED International Conference on Software Engineering, 2005.

[30] D. Kelly, T. Shepard, A little knowledge about software, IEEE Software 21 (2) (2004) 46–48.

[31] M. Lee, A.J. Offutt, R.T. Alexander, Algorithmic analysis of the impacts of changes to object-oriented software, in: Technology of Object-Oriented Languages and Systems, 2000. TOOLS 34. Proceedings of the 34th International Conference on, 2000, pp. 61–70.

[32] L. Li, A.J. Offutt, Algorithmic analysis of the impacts of changes to object-oriented software, in: Software Maintenance 1996, Proceedings of the International Conference on, 1996, pp. 171–184.

[33] T.J. McCabe, A complexity measure, IEEE Transactions on Software Engineering 2 (4) (1976) 308–320.

[34] G.J. Myers, A model of program stability, Van Nostrand Reinhold Company, NY, 1980 (Chapter 10, pp. 137–155).

[35] T.J. Parr, Language translation using PCCTS and C++, Automata Publishing Company, San Jose, CA, USA, 1996.

[36] P.E. Rosner, S.E. Black, Measuring ripple effect for the object oriented paradigm, in: IASTED International Conference on Software Engineering, 2005.

[37] M. Shepperd, A critique of cyclomatic complexity as a software metric, Software Engineering Journal 3 (2) (1988) 30–36.

[38] M. Shepperd, Design metrics: an empirical analysis, Software Engineering Journal 5 (1) (1990) 3–10.

[39] M. Shepperd, Software Engineering Metrics Volume I: Measures and Validations, McGraw-Hill International, London, UK, 1993.

[40] N.L. Soong, A program stability measure, in: Proceedings 1977 Annual ACM conference, 1977, pp. 163–173.

[41] W. Stevens, G. Myers, L. Constantine, Structured design, IBM Systems Journal 13 (2) (1974) 89–129.

[42] D.A. Troy, S.H. Zweben, Measuring the quality of structured design, Journal of Systems and Software 2 (1981) 113–120.

[43] M. Weiser, Program slicing, IEEE Transactions on Software Engineering SE-10 (4) (1984) 1352–1357.

[44] S.S. Yau, S.C. Chang, Estimating logical stability in software maintenance, in: Proceedings COMPSAC ’84, 1984, pp. 109–119.

[45] S.S. Yau, J.S. Collofello, Some stability measures for software maintenance, IEEE Transactions on Software Engineering SE-6 (6) (1980) 545–552.

[46] S.S. Yau, J.S. Collofello, Design stability measures for software maintenance, IEEE Transactions on Software Engineering SE-11 (9) (1985) 849–856.

[47] S.S. Yau, J.S. Collofello, T.M. McGregor, Ripple effect analysis of software maintenance, in: Proceedings COMPSAC ’78, 1978, pp. 60–65.

[48] E. Yourdon, Structured Design, Prentice-Hall, Englewood Cliffs, NJ, 1979.

[49] H. Zuse, Software measurement: research and practice, in: Dumke/Abran (Eds.), Deutscher Universitäts Verlag, Wiesbaden, Germany, 1998, pp. 3–37.