AD-A096 452   MARYLAND UNIV COLLEGE PARK DEPT OF COMPUTER SCIENCE   F/G 9/2
AN EXPERIMENTAL INVESTIGATION OF COMPUTER PROGRAM DEVELOPMENT A--ETC(U)
DEC 79   R W REITER   AFOSR-77-3181
UNCLASSIFIED   TR-853   AFOSR-TR-81-0214

COMPUTER SCIENCE
TECHNICAL REPORT SERIES

UNIVERSITY OF MARYLAND
COLLEGE PARK, MARYLAND 20742

Approved for public release; distribution unlimited.
Technical Report TR-853 December 1979
An Experimental Investigation of
Computer Program Development Approaches
and Computer Programming Metrics*
by
Robert William Reiter, Jr.
AIR FORCE OFFICE OF SCIENTIFIC RESEARCH (AFSC)
This technical report has been reviewed and is approved for public release IAW AFR 190-12 (7b).
Distribution is unlimited.
A. D.
Technical Information Officer
Dissertation submitted to the Faculty of the Graduate School
of the University of Maryland in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
1979
*Research supported in part by the Air Force Office of Scientific Research Grant AFOSR-77-3181. Computer time supported in part through the facilities of the Computer Science Center of the University of Maryland.
REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: AFOSR-TR-81-0214
4. TITLE (and Subtitle): An Experimental Investigation of Computer Program Development Approaches and Computer Programming Metrics
7. AUTHOR(s): Robert William Reiter, Jr.
8. CONTRACT OR GRANT NUMBER(s): AFOSR-77-3181
9. PERFORMING ORGANIZATION NAME AND ADDRESS: University of Maryland, Department of Mathematics, College Park, Md. 20742
11. CONTROLLING OFFICE NAME AND ADDRESS: Air Force Office of Scientific Research/NM, Bolling AFB, Washington, DC 20332
12. REPORT DATE: December 1979
13. NUMBER OF PAGES: 150
15. SECURITY CLASSIFICATION: UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited.
20. ABSTRACT: There is a need in the emerging field of software engineering for empirical study of software development approaches and software metrics. An experiment has been conducted to compare three programming environments: individual programming under an ad hoc approach, team programming under an ad hoc approach, and team programming under a disciplined methodology. This disciplined methodology integrates the use of top-down design, process design language, structured programming, code reading, and chief programmer team organization. Data was obtained for a large number of automatable software metrics characterizing the software development process and the developed software product. The results reveal several statistically significant differences among the programming environments on the basis of the metrics. These results are interpreted as demonstrating the advantages of disciplined team programming in reducing software development costs relative to ad hoc approaches and improving software product quality relative to undisciplined team programming.

DD FORM 1473   UNCLASSIFIED
ABSTRACT
Title of Dissertation: An Experimental Investigation of
Computer Program Development Approaches
and Computer Programming Metrics
Robert William Reiter, Jr., Doctor of Philosophy, 1979
Dissertation directed by: Dr. Victor R. Basili
Associate Professor
Department of Computer Science
There is a need in the emerging field of software engineering
for empirical study of software development approaches and software
metrics. An experiment has been conducted to compare three
programming environments: individual programming under an ad hoc
approach, team programming under an ad hoc approach, and team
programming under a disciplined methodology. This disciplined
methodology integrates the use of top-down design, process design
language, structured programming, code reading, and chief program-
mer team organization. Data was obtained for a large number of
automatable software metrics characterizing the software development
process and the developed software product. The results reveal
several statistically significant differences among the programming
environments on the basis of the metrics. These results are
interpreted as demonstrating the advantages of disciplined team
programming in reducing software development costs relative to ad
hoc approaches and improving software product quality relative to
undisciplined team programming.
ACKNOWLEDGMENTS
This work was supported in part by the Air Force Office
of Scientific Research through grant AFOSR-77-3181A to the
University of Maryland. Computer time was provided in part
through the facilities of the Computer Science Center of the
University of Maryland.
This work could not have been accomplished without the
cooperation and assistance of others. To students who
participated in the experiment, colleagues who offered
helpful suggestions, and faculty who reviewed the work
critically, I am most grateful. Drs. Richard G. Hamlet and
Ben A. Shneiderman critiqued this manuscript thoroughly on
the basis of "programming sense," experimental procedure,
terminology, and writing style. Drs. Marvin V. Zelkowitz
and John D. Gannon imparted a healthy sense of reality and
provided an appropriate measure of stimulation/inspiration
throughout their lengthy service as members of my study
committee.
I am indebted beyond measure, however, to two people
whose professional contribution and personal sacrifice have
continually enriched my work as well as my life. I thank my
advisor, Dr. Victor R. Basili, for his expert guidance and
patient encouragement. I thank my wife, Lowrie Ebbert
Reiter, for her unselfish support and unfailing love.
TABLE OF CONTENTS

Chapter

I. INTRODUCTION AND OVERVIEW

II. BACKGROUND AND RELATED RESEARCH
    Software Development Approaches
    Software Metrics
    Empirical/Experimental Study

III. INVESTIGATION SPECIFICS
    Surroundings
    Experimental Design
    Programming Methodologies
    Data Collection and Reduction
    Programming Aspects and Metrics

IV. GLOSSARY OF PROGRAMMING ASPECTS

V. DISCUSSION OF ELABORATIVE METRICS
    Program Changes
    Cyclomatic Complexity
    Data Bindings
    Software Science

VI. INVESTIGATIVE TECHNIQUE
    Step 1: Questions of Interest
    Step 2: Research Hypotheses
    Step 3: Statistical Model
    Step 4: Statistical Hypotheses
    Step 5: Research Frameworks
    Step 6: Experimental Design
    Step 7: Collected Data
    Step 8: Statistical Test Procedures
    Step 9: Statistical Results
    Step 10: Statistical Conclusions
    Step 11: Research Interpretations

VII. OBJECTIVE RESULTS
    Presentation
    Impact Evaluation
    Relaxed Differentiation View
    A Directionless View
    Individual Highlights

VIII. INTERPRETIVE RESULTS
    According to Basic Suppositions
    According to Programming-Aspect Classification

IX. SUMMARY AND CONCLUSIONS

Appendix

A. Statistical Description of Raw Scores

References
LIST OF TABLES
Table

1. Programming Aspects
2. Statistical Conclusions
3. Statistical Impact Evaluation
4.1 Non-Null Conclusions for Location Comparisons, arranged by outcome
4.2 Non-Null Conclusions for Dispersion Comparisons, arranged by outcome
5.1 Relaxed Differentiation for Location Comparisons
5.2 Relaxed Differentiation for Dispersion Comparisons
6.1 Conclusions for Class I, Effort (Job Steps)
6.2 Conclusions for Class II, Errors (Program Changes)
6.3 Conclusions for Class III, Gross Size
6.4 Conclusions for Class IV, Control-Construct Structure
6.5 Conclusions for Class V, Data Variable Organization
6.6 Conclusions for Class VI, Packaging Structure
6.7 Conclusions for Class VII, Invocation Organization
6.8 Conclusions for Class VIII, Communication via Parameters
6.9 Conclusions for Class IX, Communication via Global Variables
LIST OF FIGURES
Figure

1. Frequency Distribution of Cyclomatic Complexity
2. Investigative Methodology Schematic
3.1 Lattice of Possible Directional Outcomes for Three-way Comparison
3.2 Lattice of Possible Nondirectional Outcomes for Three-way Comparison
4. Association Chart for Results and Conclusions
CHAPTER I
In the evolution of a systematic body of knowledge, there are generally three phases of validation. The first phase is the logical development of the theory based on a set of sound principles. This is followed by the application of the theory and the gathering of evidence that the theory is applicable in practice. This usually involves some qualitative assessment in the form of case studies. The final phase is the empirical and experimental analysis of the applied theory in order to further understand its effects and better demonstrate its advantages in a controlled manner. This usually requires quantitative measurement of the relevant phenomena.

Much has been written about methodologies for developing computer software [Wirth 71; Dahl, Dijkstra & Hoare 72; Jackson 75; Myers 75; Linger, Mills & Witt 79]. Most of these methodologies are based on sound logical principles. Case studies have been conducted to demonstrate their effectiveness [Baker 75; Basili & Turner 75]. Their adoption within production ("real-world") environments has generally been successful. Having practiced adaptations of these methodologies, software designers and programmers have asserted that they got the job done faster, made fewer errors, or produced a better product. Unfortunately, solid quantitative evidence that comparatively assesses any particular methodology is scarce [Shneiderman et al. 77; Myers 78]. This is due partially to the cost and impracticality of a valid experimental setup within a production environment.

Thus the question remains: are measurable benefits derived from programming methodologies, with respect to
either the software development process or the developed software product? Even if the benefits are real, it is not clear that they can be quantified and effectively monitored. Software development is still too artistic, in the aesthetic or spontaneous sense. In order to understand it more fully, manage it more effectively, and adapt it to particular applications or situations, software development must become more scientific, in the engineering and calculated sense. More empirical study, data collection, and experimental analysis are required to achieve this goal.
This dissertation strives to contribute to software engineering research in this vital third phase of validation. The dissertation reports on an original research project dealing with three "dimensions" of software engineering:

Software development approaches, i.e., programming methodologies and environments for developing software;

Software metrics, i.e., quantifiable aspects of programming and measurements of software characteristics;

Empirical/experimental study, i.e., the collection and statistical analysis of empirical data about software phenomena, including controlled psychological experimentation.
The immediate goals of the project were

(a) to investigate the effect of certain programming methodologies and environments upon software development phenomena,

(b) to investigate the behavior of certain quantifiable programming aspects and software measurements under different approaches to software development, and

(c) to devise and apply an investigative methodology, founded on established principles of experimental
research, but tailored for application to software engineering.

The project employed the investigative methodology to conduct and analyze a controlled experiment with software development approaches as independent variables and software metrics as dependent variables. In this way, both the effect of the software development approaches and the behavior of the software metrics were investigated.
In regard to software development approaches, the project focused on three distinct approaches, or programming environments: single programmers using an ad hoc approach, programming teams using an ad hoc approach, and programming teams using a disciplined methodology. These approaches may be characterized according to two human-factors issues: the size of the programming "team" deployed and the degree of methodological discipline employed.

In terms of team size, individual programmers working alone were compared to teams of three programmers working together. In terms of methodological discipline, an ad hoc approach allowing programmers to develop software without externally imposed methodological constraints was compared to a disciplined methodology obliging programmers to follow certain modern programming practices and procedures. This disciplined methodology consisted of an integrated set of software development techniques and team organizations including top-down design, process design language, structured programming, code reading, and chief programmer teams.
It should be noted that the terms 'methodology' and 'methodological' (in reference to software development) are used to connote an integrated set of development techniques as well as team organizations, rather than a particular technique or organization in isolation. Part of the philosophy behind the project is the belief that, while particular techniques or organizations may generate marginal benefits individually, only a comprehensive ensemble can ensure significant gains in software development productivity and reliability.
In regard to software metrics, the project focused on the direct quantification of software development phenomena via a host of nearly two hundred programming aspects and measurements. Attention was consciously restricted to metrics exhibiting certain desirable characteristics; all of the software metrics examined in the study are quantitative (on at least an interval scale [Stevens 46]), objective (free from inaccuracy due to human subjectivity), unobtrusive (to those developing the software), and automatable (not dependent on human agency for computation).

This large set of programming aspects may be categorized on the basis of other criteria. Some of the aspects pertain to the software development process; others, to the developed software product. For example, the number of times that source code modules are compiled during the development period is a process measure, while the number of IF statements in the delivered program source code is a product measure. Some of the aspects are rudimentary, in that they pertain to very simple surface features or lack theoretical models to motivate intuitive appeal; others are elaborative, in that they aim at more complicated underlying features or possess provocative theoretical models. For example, the measurements mentioned above are both rudimentary, while the program changes metric [Dunsmore & Gannon 77] and the cyclomatic complexity metric [McCabe 76] are elaborative.
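Both example measures are mechanically computable, which is what makes such aspects automatable. As a rough illustration (in Python rather than the study's subject language, and with a hypothetical input program), a product measure like the IF-statement count can be extracted directly from the source text:

```python
import ast

def count_if_statements(source: str) -> int:
    """Product measure: number of IF statements in delivered source code.
    Illustrative sketch only -- the study measured programs written in the
    experiment's own language, not Python."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.If) for node in ast.walk(tree))

# Hypothetical delivered program fragment:
program = """
if x > 0:
    y = 1
else:
    if x < 0:
        y = -1
"""
print(count_if_statements(program))  # -> 2
```

The corresponding process measure (number of compilations during development) would instead be tallied from the development history rather than from the final program text.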
In regard to empirical/experimental study, the project combined both empirical data collection and controlled psychological experimentation in a laboratory-like setting.

The project involved extensive observation of forty-five programmers developing working software systems, averaging twelve hundred lines of code each, from scratch during a five-week period. These programmers were divided into three disjoint groups of "teams," each following one of the three software development approaches mentioned above. Multiple replications of a specific software development task were performed independently and concurrently within each group under conditions as otherwise identical as possible.
In addition to some subjective qualitative observation via questionnaires, interviews, etc., objective quantitative observation was achieved by automatically and unobtrusively monitoring the computer activities of the programming "teams." For each replication, successive versions of the software being developed by that "team" were captured in an historical data bank that recorded details of the development process and product. Raw scores for the software metrics mentioned above were extracted from the data bank and summarized via simple descriptive statistics. Specifically, the mean values and standard deviations observed within each group on the various quantifiable programming aspects constitute the immediate results of the project as an empirical data collection effort.
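Those per-group summaries amount to ordinary descriptive statistics. A minimal sketch, with entirely hypothetical raw scores (the actual scores appear in the dissertation's appendix):

```python
from statistics import mean, stdev

# Hypothetical raw scores for one metric (say, number of computer job steps)
# observed for the replications within each of the three groups:
groups = {
    "individual, ad hoc":  [87, 102, 95, 110, 78, 99],
    "team, ad hoc":        [120, 135, 128, 140, 117],
    "team, disciplined":   [70, 82, 75, 88, 79],
}

for name, scores in groups.items():
    # Location (mean) and dispersion (standard deviation) per group.
    print(f"{name:20s}  mean = {mean(scores):7.2f}   sd = {stdev(scores):6.2f}")
```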
The project followed a preplanned experimental design in which extraneous factors were held constant wherever possible, to insure that differences in the software metrics would be attributable to the different software development approaches. The metrics' raw scores were analyzed using
nonparametric inferential statistics to obtain an objective conclusion for each measured aspect. As precise statements of the statistically significant differences observed among the three programming environments on the basis of the measured aspects, these objective conclusions constitute the immediate results of the project as a controlled experiment. By testing for differences in either the location (expected value) or the dispersion (variability) of the software metrics, the experiment addressed both the expectancy and predictability of software development phenomena.
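The specific test procedures are laid out in Chapter VI; as a flavor of what a nonparametric location comparison involves, the Mann-Whitney U statistic for two groups can be sketched as follows (hypothetical scores; not the dissertation's actual procedure or data):

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for a two-sample location comparison:
    over all pairs, count how often an x-score falls below a y-score
    (ties count one half).  An extreme U, judged against the statistic's
    null distribution, indicates a location difference between groups."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical metric scores for two groups; every x below every y
# gives the maximum U of len(xs) * len(ys):
print(mann_whitney_u([70, 82, 75], [120, 135, 128]))  # -> 9.0
```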
The experiment combined elements of both confirmatory and exploratory data analysis. Some so-called confirmatory programming aspects had been earmarked as promising indicators of important software characteristics in advance of conducting the experiment. Hypotheses had been formulated, on the basis of the programming environments' expected effects, regarding the expected objective conclusions for these confirmatory aspects. The project included other so-called exploratory programming aspects in order to investigate the software development process and product more thoroughly.
The project was concerned with investigating an entire software development project of nontrivial size in a quasi-realistic setting. The experiment was conducted within an academic environment in a laboratory or proving-ground fashion so that an adequate experimental design could be achieved while simulating a production environment. In this way, the project reached a reasonable compromise between "toy" experiments, which facilitate elaborate experimental designs but often suffer from artificiality, and "production" experiments, which offer industrial realism but incur prohibitively high costs.
The project's basic premise was that distinctions among these programming environments exist both in the process and in the product. With respect to the developed software product, the disciplined team should approximate the individual programmer, or at least lie somewhere between the individual programmer and the ad hoc team, with regard to product characteristics (such as number of decisions coded and global data accessibility). This is because the disciplined methodology should help the team act as a mentally cohesive unit during the design, coding, and testing phases. With respect to the software development process, the disciplined team should have advantages over both individuals and ad hoc teams, displaying superior performance on cost-related factors such as computer usage and number of errors made. This is because of the discipline itself and because of the ability to use team members as resources for validation.
The study's findings revealed several programming characteristics for which statistically significant differences do exist among the groups. The disciplined teams used fewer computer runs and apparently made fewer errors during software development than either the individual programmers or the ad hoc teams. The individual programmers and the disciplined teams both produced software with essentially the same number of decision statements, but software produced by the ad hoc teams contained greater numbers of decision statements. For no characteristic was it concluded that the disciplined methodology impaired the effectiveness of a programming team or diminished the quality of the software product.
The remainder of this dissertation is a comprehensive report on the software engineering research project introduced above. Chapter II reviews appropriate background
and related research from the published literature. Chapter III recounts specific details of the experiment itself. Chapter IV briefly describes all of the programming aspects and measurements, while Chapter V discusses the elaborative ones in depth. Chapter VI depicts the investigative methodology used to plan, execute, and analyze the experiment. Chapters VII and VIII present the experiment's results, segregated into objective findings and interpretative discussion, respectively. Chapter IX summarizes the completed project, draws general conclusions regarding its contribution to software engineering, and mentions possible directions for continued research in this area.
CHAPTER II
This chapter reviews the general background for this research project and surveys related work published in the open literature. For each of the three "dimensions" of software engineering outlined in Chapter I, specific instances of research in that area will be mentioned and loosely characterized, in order to show appropriate similarities and contrasts with this work. As a catalog of related research, the chapter is intended to be merely representative, not exhaustive.
Software Development Approaches

There has been considerable concern regarding programming methodologies over the past decade since the advent of structured programming and the dawning of software cost consciousness. Software "practitioners" (i.e., programmers, designers, systems analysts, and managers) have sought better ways to channel their energies toward producing cost-effective, reliable software. Although a broad spectrum of concerns--spanning all phases of the software life-cycle and covering the full range of system size and performance constraint--could be considered here, attention has been restricted to methodology for programming-in-the-small*: designing, implementing, and testing computer programs to solve problems small enough to be well-understood by a suitably trained individual. In other words, the focus is on approaches for the kind of software development that typical programmers/analysts in typical software shops are accustomed to doing.

* As used here (and below), the meanings of the terms 'programming-in-the-small' and 'programming-in-the-large' are clear from the context, but they differ slightly from the meanings popularized by Dr. H.D. Mills.
A number of good ideas on how to develop software, covering techniques for how to proceed as well as organizations for managing people and communicating information, have been (or are being) devised, demonstrated, perfected, and accepted into everyday practice. Popular examples include the following:

    structured programming [Dahl, Dijkstra & Hoare 72; Mills 72; Basili & Baker 77; Linger, Mills & Witt 79],
    stepwise refinement [Wirth 71],
    chief programmer teams [Baker 72; Baker 75; Brooks 75],
    process design language (PDL) [Linger, Mills & Witt 79],
    top-down design,
    functional expansion,
    design/code reading and walk-throughs [Fagan 76],
    data abstraction/encapsulation and information hiding,
    iterative enhancement [Basili & Turner 75; Turner 76],
    the Michael Jackson method [Jackson 75; Hughes 79], and
    composite design [Myers 75].
These approaches and their highly touted benefits have been the subject of much written promotion and verbal discussion. Indeed, several can boast of mathematical foundations or formal explication to support their underlying principles or mechanisms; for others, there are extensive tutorials on how to apply them in practical situations; and some have been embodied in programming languages or packaged into automated tools. All of this attention, plus the favorable experiences of software practitioners, seems to indicate that these software development approaches do succeed in improving the efficiency of the development process or the quality of the developed product to some degree.
But there is little empirical evidence to confirm the advantages of these approaches or measure their benefits. In several instances, case studies have been performed, often in a pioneering spirit, to demonstrate particular approaches; these case studies have usually involved qualitative assessment, with only limited or uncontrolled forms of quantitative assessment. Comparative assessment of software development approaches is even rarer: only a few controlled experiments [Shneiderman et al. 77; Myers 78] have been conducted, and they have generally focused on the use of particular techniques in isolation. The difficulty of investigating the effects of software development approaches stems precisely from the fact that they pertain to the least understood and most expensive elements in software engineering: human beings.
Software Metrics
There has been considerable interest in software metrics over the past half decade in response to a growing realization of how "invisible," imponderable, and uncontrollable software can be. Software "scientists" have been seeking ways to measure software phenomena. Broadly interpreted, their efforts may be characterized as attempting to quantify process efficiency and product quality.* The software measurement domain extends from the concrete details of a program, including its fine structure and the resource expenditure required to produce it, to its abstract characteristics: reliability, cost-effectiveness,

* This concept of product quality is meant to include instantaneous, as well as evolutionary, considerations. The former considerations pertain to both static (at compile time) and dynamic (at execution time) features of a program, as it exists at a given point along its life-cycle. The latter considerations pertain to issues of software maintenance and software management throughout the life-cycle. The software measures in this dissertation address product quality only in its instantaneous, static sense.
complexity, modularity, comprehensibility, modifiability, etc.

Because measurement is essential to most forms of engineering, software metrics rightfully deserve a central place within the emerging discipline of software engineering. As in other technologies, the underlying assumption is that appropriate measurement is the key to effective control. It has been demonstrated [Gilb 77] that the general concept of software measurement can be applied to a variety of programming issues: many interesting suggestions were made regarding how and why to measure software. But the metrics discussed by Gilb are vaguely defined and superficial. The problem is that meaningful measurement of software is extremely difficult, because of software's intricate structure of concrete detail and because of the tenuous relationship between its concrete details and abstract characteristics. An additional problem is the lack of well-understood and commonly accepted terminology to describe the software phenomena to be measured.
However, a number of well-defined and fairly credible software metrics have been proposed and evaluated, usually in conjunction with a motivating model or some intuitional underpinnings. The program changes metric [Dunsmore & Gannon 77; Dunsmore 76] extracts an error count algorithmically from the textual revisions made to source code during program development. The cyclomatic complexity metric [McCabe 76] counts the number of "basic" control-flow paths in a program. The data bindings metric [Stevens, Myers & Constantine 74; Basili & Turner 75; Turner 76] counts communication paths between code segments via data variables. The various metrics from software science theory [Halstead 77]--program length, program volume, language level, effort, etc.--provide a unified system of
measurements for the size of a program, the amount of information it contains, the level of abstraction it expresses, the amount of mental effort required to produce or comprehend it, etc. The error-day metric [Mills 76] is an index of how early errors are detected and corrected during software development. The span metric [Elshoff 76a] is an index of the extent to which a program's data variables remain "live" (i.e., continue to affect control flow and data value determination).
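Several of these metrics reduce to simple mechanical counts. For structured code, McCabe's cyclomatic complexity equals the number of decision points plus one, which the following sketch approximates for Python routines (the study itself applied such metrics to programs in the experiment's language, not Python; the compound-condition adjustment is one common variant of the count):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity V(G) for one routine as
    (number of decision points) + 1.  McCabe [76] defines V(G) = e - n + 2
    on the control-flow graph; for structured code the two agree."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # One variant: compound conditions like 'a and b and c'
            # contribute len(values) - 1 additional branch points.
            decisions += len(node.values) - 1
    return decisions + 1

src = """
while n > 1:
    if n % 2 == 0:
        n = n // 2
    else:
        n = 3 * n + 1
"""
print(cyclomatic_complexity(src))  # one while + one if + 1 -> 3
```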
Each metric mentioned above has been examined empirically to one degree or another; but few software metrics have been investigated in controlled experiments, and there is little research comparing metrics or examining their interrelationships empirically. Further elaboration and discussion of individual software metrics is deferred to Chapters IV and V, since many were examined in this research project.
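Of the metrics surveyed above, the software science family is the most formula-driven: its basic measures follow from four counts of a program's operators and operands. A sketch of the defining formulas [Halstead 77], applied here to hypothetical counts rather than to any program from the study:

```python
from math import log2

def software_science(n1, n2, N1, N2):
    """Halstead's basic software science measures.
    n1, n2: distinct operators and operands; N1, N2: total occurrences."""
    n = n1 + n2                 # vocabulary
    N = N1 + N2                 # observed program length
    V = N * log2(n)             # program volume, in bits
    L = (2.0 / n1) * (n2 / N2)  # estimated program level
    E = V / L                   # effort to produce or comprehend
    return {"vocabulary": n, "length": N, "volume": V, "level": L, "effort": E}

# Hypothetical counts for a small routine:
m = software_science(n1=10, n2=20, N1=50, N2=60)
print(round(m["volume"], 1))  # -> 539.8
```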
Empirical/Experimental Study

First-hand observation of software phenomena in the "wild," so to speak, has long been regarded as a unique source of information and the ultimate form of validation. Ever since Knuth rummaged through wastebaskets at computer centers for discarded listings of Fortran programs [Knuth 71], software "technicians" have been interested in watching software be developed, to see how the latest intuitive opinions or theoretical models fare against reality. Ideally, it is useful to distinguish between data collection efforts (with descriptive statistical analyses) and controlled experimentation efforts (with inferential statistical analyses); but, in practice, elements of both are sometimes combined within the same empirical study.
Generally speaking, the purpose of data collection
efforts has been to examine the behavior of software metrics
and models under realistic conditions. A number of data
collection efforts have been aimed at programming-in-the-
large,* focusing on models of gross behavior (i.e., cost,
productivity, resource estimation) during large- to medium-
scale software development. At IBM [Walston & Felix 77]
data was collected via project reporting forms in order to
measure productivity on production software developments.
At NASA/Goddard [Basili et al. 77] data is being collected
via information forms in order to evaluate cost or resource
estimation models and to study software error phenomena.
Other data collection efforts, focusing on small- to
medium-scale software development, have been aimed at
quantitatively characterizing software's fine structure. In
studies at GM [Elshoff 76b; Elshoff 76a], a large set of
commercial PL/1 programs was collected and measured
according to a host of quantifiable programming aspects and
software metrics, including the span metric and the software
science metrics.
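As a point of reference, the basic software science quantities are simple functions of four counts taken from a program's text. The sketch below (Python, for exposition; the formulas follow Halstead's standard definitions, while the conventions for deciding what counts as an operator or operand in a particular language are a separate matter not shown here) computes the principal quantities:

```python
import math

def software_science(n1, n2, N1, N2):
    """Basic software science quantities from the four standard counts:
    n1/n2 = number of distinct operators/operands,
    N1/N2 = total occurrences of operators/operands."""
    n = n1 + n2                                      # vocabulary
    N = N1 + N2                                      # observed length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)  # estimated length
    V = N * math.log2(n)                             # volume
    D = (n1 / 2) * (N2 / n2)                         # difficulty
    E = D * V                                        # effort
    return {"vocabulary": n, "length": N, "estimated_length": N_hat,
            "volume": V, "difficulty": D, "effort": E}
```

For example, a program fragment with 10 distinct operators, 8 distinct operands, and 40 and 30 total occurrences respectively has vocabulary 18 and length 70, from which the derived quantities follow.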
Generally speaking, the purpose of controlled
experimentation efforts has been to evaluate the effects of
programming language features, human factors issues, and
programming methodologies upon software phenomena and
abstract characteristics. Usually, the language features
experiments are done from a computer scientist's viewpoint,
while the human factors experiments are done from a
psychologist's viewpoint. However, because of areas of
natural overlap between these two concerns, some
experiments fall into both categories. Together they
comprise the bulk of controlled experimentation in software
* See earlier footnote.
engineering.
There are several well-known examples of controlled
experimentation on programming language features. Weissman
[Weissman 74a; Weissman 74b] conducted experiments on how
programming features affect the psychological complexity of
software; the features included commenting, indentation,
mnemonic variable names, and control structures. Gannon
[Gannon 75; Gannon & Horning 75] conducted an experiment on
how programming language features affect software
reliability and the presence/persistence of errors; the
features included statement vs. expression orientation, data
variable scope conventions, and expression evaluation order.
Later, Gannon [Gannon 77] ran experiments to examine how
data typing conventions affect software reliability. Using
the same empirical data, Dunsmore [Dunsmore & Gannon 77;
Dunsmore 79] examined how programming "complexity" is
affected by programmer-controllable variations in
programming features. "Complexity" was measured
algorithmically by the program changes metric; the features
included statement nesting depth, frequency of data
references, and data communication mechanism preference.
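As a rough illustration of the flavor of such a metric, the following sketch counts the contiguous textual revisions between two successive source versions; this is an assumed approximation for exposition only, not the exact counting rules used in the program changes metric:

```python
import difflib

def program_changes(old_version, new_version):
    """Count contiguous revised regions (replacements, insertions,
    deletions) between two versions, each given as a list of source
    lines. A run of adjacent changed lines counts as one change."""
    matcher = difflib.SequenceMatcher(a=old_version, b=new_version)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")

# One line rewritten and one line appended: two distinct changes.
print(program_changes(["a", "b", "c"], ["a", "x", "c", "d"]))  # 2
```

Summed over every pair of successive versions retained during development, such a count gives one automatable indicator of how much textual rework a program underwent.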
There are several well-known examples of controlled
experimentation on human factors issues. Several
experiments [Sime, Green & Guest 73; Green 77] have been
conducted on the comprehensibility of different mechanisms
for implementing conditional branching. Several experiments
[Sheppard et al. 79] have been conducted on the effect of
modern coding practices, such as structured coding, mnemonic
variable names, and style of commenting, upon the ease of
performing comprehension, modification, and debugging tasks.
Finally, there are a few well-known examples of
controlled experimentation on programming methodologies.
Several experiments were conducted [Shneiderman et al. 77]
to evaluate the utility of detailed flowcharting (as a
design tool and documentation aid) in program composition,
comprehension, debugging, and modification tasks; novice
programming students were employed as subjects, with short
(i.e., less than 150 lines) Fortran programs as test
materials. Some experiments were also conducted [Myers 76]
to evaluate the utility of code reading and walkthroughs in
debugging tasks; experienced professional programmers were
employed as subjects, with a short PL/1 program as test
material. To date, however, controlled experimentation on
programming methodologies has been limited in scope.
Experimental studies have not involved programming
activities spanning multiple phases of the software life-
cycle and requiring the natural integration of multiple
programming tasks. Nor have experimental studies used
nontrivial test materials requiring sustained effort lasting
several weeks and involving several hundred lines of code.
CHAPTER III
This chapter outlines the surroundings in which the
experiment was conducted, the experimental design that was
employed, the programming methodologies that were compared,
the data collection and reduction that was performed, and
the programming aspects that were measured.
Surroundings
Several circumstances surrounding the experiment
contribute significantly to the context in which its results
must be appraised. These include the setting in which the
experiment was conducted, the people who participated as
subjects, the software development project that served as
the experimental task, the computer programming language in
which the software was written, and the computer system and
access mode that were used during development.
The experiment was conducted during the Spring 1976
semester, January through May, within regular academic
courses given by the Department of Computer Science on the
College Park campus of the University of Maryland. Two
comparable advanced elective courses were utilized, each
with the same academic prerequisites. The experimental task
and treatments were built into the course material and
assignments. Everyone in the two classes participated in
the experiment; they cooperated willingly and were aware of
being monitored, but had no knowledge of what was being
observed or why.
The participants were advanced undergraduate and
graduate students in the Department of Computer Science. On
the whole, they were reasonably competent computer
programmers, all having completed at least four semesters of
programming course work and some having as much as three
years' professional programming experience in government or
industry. Generally speaking, they were familiar with the
implementation language and the host computer system, but
inexperienced in team programming and the disciplined
methodology.
The programming application was a simple compiler,
involving string processing and translation (via scanning,
parsing, code generation, and symbol table management) from
an Algol-like language to zero-address code for a
hypothetical stack machine. The total task was to design,
implement, test, and debug the complete computer software
system from given specifications. The scope of the project
excluded both extensive error handling and user
documentation. The project was of modest but nonnegligible
difficulty, requiring between one and two man-months of
effort. The size of the resulting systems averaged over
1200 lines of high-level language source code. All facets
of the project itself were fixed and uniform across all
development "teams." Given the same specifications,
computer resource allocation, calendar time allotment, host
machine, implementation language, debugging tools, etc.,
each "team" worked independently to build its own system.
The delivered systems each ran (i.e., they worked) and
passed an independent acceptance test.
The implementation language was the high-level,
structured-programming language SIMPL-T [Basili & Turner
76]. This language was designed and developed at the
University of Maryland where it is taught and used
extensively in regular Department of Computer Science
courses. SIMPL-T contains the following control constructs:
sequence, ifthen, ifthenelse, whiledo, case, exit from loop,
and return from routine (but no goto). SIMPL-T allows
essentially three levels of data declaration scope (i.e.,
local to an individual routine, global across the several
routines of an individual module, or entry-global across the
routines of several modules), but routines may not be
nested. Adhering to a philosophy of "strong typing," the
language supports integer, character, and string data types
and single-dimension array data structures. It provides the
programmer with automatic recursion and PL/1-like string-
processing capabilities. (Additional details regarding the
SIMPL-T programming language are interspersed among the
explanatory notes in Chapter IV.)
The host computer system was the campus-wide computing
facility, a Univac 1100 machine with the usual Exec 8
operating system. This system supports, in its fashion,
both batch access (via punch cards) and interactive time-
sharing* access (via TTY or CRT terminals). The
participants were well acquainted with the system and
accustomed to either access mode. During the experiment,
the participants were allowed to choose whichever access
mode they preferred and could switch freely between modes.
Almost everyone consistently preferred the interactive
access mode; only one person--in the AI group (see below),
by the way--used the batch access mode extensively.
Experimental Design
The major elements of an experimental design are its
units, treatment factors, treatment factor levels, observed
variables, local control, and management of extraneous
factors. (Cf. [Ostle and Mensing 75] for a general
treatment of these elements.)
* Called "demand" in Univac terminology.
An experimental unit is that object to which a single
treatment is applied in one replication of the event known
as the "basic experiment." In this study, the "basic
experiment" was the accomplishment of a specific software
development project (see above), and the experimental unit
was the software development team (i.e., a small group of
people working together to develop the software). A total
of 19 replications of this "basic experiment," each
performed concurrently and independently by a separate
experimental unit, were involved in this experiment.
Most experiments are concerned with one or more
independent variables and the behavior of one or more
dependent variables as the independent variables are
permitted to vary. These independent variables are known as
experimental treatment factors. This experiment focused on
the approach used to develop software as the single
experimental treatment factor.
Experiments usually involve some deliberate variation
in the experimental treatment factor(s). Different values
or classifications of the factor(s) are known as the
experimental treatment factor levels. In this experiment,
three levels were selected for the software development
approach factor. Conceived as variations in two human-
factors-in-programming issues, size of development "team"
and degree of methodological discipline, the experimental
treatment factor levels are denoted by the following
mnemonics:
AI -- individual programmers working alone, following
an ad hoc approach (see below);
AT -- teams of three programmers working together,
following an ad hoc approach (see below); and
DT -- teams of three programmers working together,
following a disciplined methodology (see below).
During an experiment, observations of the dependent
variable(s) are made for each experimental unit. An
experiment's immediate objective is to ascertain the
relationship between the experimental treatment factor
levels and the experimental observed variables. In this
experiment, the observed variables were quantifiable
programming aspects, or metrics (see below), of the software
development process or the developed software product. A
large set of such aspects was considered in the study.
Technically speaking, this amounted to conducting a series
of simultaneous univariate experiments, one for each
programming aspect, all sharing a common experimental design
and all based on the same empirical data sample.
Experimental local control addresses the configuration
by which (a) experimental units are obtained, (b) units are
placed into groups, and (c) groups are subjected to
different experimental treatments (i.e., specific
combinations of experimental treatment factor levels).
Local control is employed in the design of an experiment in
order to increase its statistical efficiency or to improve
the sensitivity/power of statistical test procedures.
Experimental local control usually incorporates some form of
randomization--a basic principle of experimental design--
since it is necessary for the validity of statistical test
procedures.
For this experiment, subjects were obtained on the
basis of course enrollment: since the experiment was
embedded within two academic courses, every student enrolled
in those courses automatically participated in the
experiment. Software development "teams" were formed among
these subjects. In the one course, the students were
allowed to choose between segregating themselves as
individual programmers or combining with two other
classmates as three-person programming teams. In the other
course, the students were assigned (by the researcher) into
three-person teams. The two academic courses themselves
provided the variation in methodological discipline. The
atmosphere of the first course was conducive to an ad hoc
approach to programming, while the disciplined methodology
was stressed in the second course. In this manner, three
experimental treatments (corresponding to the three
experimental treatment factor levels AI, AT, and DT) were
created, and three groups of 6, 6, and 7 units
(respectively) were exposed to them.
There are usually several extraneous factors, other
than the ones identified as experimental treatment factors,
that could influence the behavior being observed in an
experiment. Many experiments (including this one) follow a
reductionist paradigm, which seeks to control for all
variables except a select few, so that the effect of the
independent variables upon the dependent variables can be
isolated and measured. In this experiment, a variety of
programming factors which do affect software development
were given conscious consideration as extraneous variables:
- programming application and/or project
- project specifications
- implementation language
- calendar schedule
- available computer resources
- available automated tools
Wherever possible, these variables were held constant by
explicitly treating all experimental units in the same
manner.
Unfortunately, the ideal reductionist paradigm can only
be approximated, because of factors which are suspected of
strong influence on the behavior of interest, but which
cannot be explicitly controlled within the experimental
design. In this experiment, there were two such factors:
the personal ability/experience of the participants and the
amount of actual time/effort they (as students with other
classes and responsibilities) chose to devote to the
project. However, information from a pretest questionnaire
was used to balance the personal ability/experience of the
participants in the disciplined teams (only), by first
partitioning the group DT students into three equal-sized
categories (high, medium, low) based on their grades in
previous computer courses and their extracurricular
programming experience, and then randomly selecting one
student from each category to form each team.
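The stratified assignment just described can be sketched as follows (Python, for exposition only; the function and variable names are illustrative, and the sketch assumes the number of students divides evenly into teams of three):

```python
import random

def form_teams(students, scores, team_size=3):
    """Rank students by an ability score, cut the ranking into
    team_size equal-sized strata (high/medium/low for three-person
    teams), then build each team by drawing one randomly chosen
    member from each stratum."""
    ranked = sorted(students, key=lambda s: scores[s], reverse=True)
    teams_needed = len(students) // team_size
    strata = [ranked[i * teams_needed:(i + 1) * teams_needed]
              for i in range(team_size)]
    for stratum in strata:
        random.shuffle(stratum)
    return [[stratum.pop() for stratum in strata]
            for _ in range(teams_needed)]
```

Each resulting team then contains exactly one member from each ability stratum, which is the balancing property the procedure was designed to provide.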
For the statistical model employed to analyze this
experiment, it was necessary to assume homogeneity among the
participants with respect to personal factors such as
ability and/or experience, motivation, time and/or effort
devoted to the project, etc. As a reasonable measure of
individual programmer skill levels under the circumstances
of this study, the participants' grades from a particularly
pertinent prerequisite course provided a post-experimental
confirmation of at least one facet of this assumed
homogeneity: the distribution of these grades among the
three experimental groups would have displayed the same
degree of homogeneity as was actually observed in over 9 out
of 10 purely random assignments of the participants to the
groups. If anything, in the researcher's opinion, the
participants in group AI seemed to have a slight edge over
those in groups AT and DT with respect to native programming
ability, while groups AI and AT seemed slightly favored over
group DT with respect to formal training in the application
area.
The disciplined methodology imposed on teams in group
DT consisted of an integrated set of state-of-the-art
techniques, including top-down design, process design
language (PDL), functional expansion, design and code
reading, walk-throughs, and chief programmer team
organization. These techniques and organizations were
taught as an integral part of the course that the subjects
were taking, using [Linger, Mills & Witt 79], [Basili &
Baker 75], and [Brooks 75] as textbooks. Since the subjects
were novices in the methodology, they executed it to varying
degrees of thoroughness and were not always as successful as
seasoned users of the methodology would be.
The disciplined methodology prescribed the use of a PDL
for expressing the design of the problem solution. The
design was elaborated in a top-down manner, each level
representing a solution to the problem at a particular level
of abstraction and specifying the functions to be expanded
at the next level. The PDL consisted of a fixed set of
structured control and data structures, plus an open-ended
designer-defined set of operators and operands corresponding
to the level of the solution and the particular application.
Design and code reading involved the critical review of each
team member's PDL or code by at least one other member of
the team. Walk-throughs represented a more formalized
presentation of an individual's work to the other members of
the team, in which the PDL or code was explained step by
step. Under the chief programmer team organization, the
chief programmer defined the top-level solution to the
problem in PDL, designed and implemented key portions of
code himself, and assigned subtasks to the other two team
members. Each of these programmers, in turn, code-read for
the chief programmer, designed or coded their assigned
subpieces, and performed librarian activities (i.e.,
entering or revising code stored on-line, making test runs,
etc.).
Two variants of chief programmer team organization,
denoted CP and M, were employed. In both cases, one member
of the team (the chief programmer or the manager) was
responsible for designing and refining the top-level
solution to the problem in PDL, identifying system
components to be implemented, and defining their interfaces.
The two other team members (the programmers) were each
responsible for designing or coding various system
components, as assigned by the chief programmer or manager.
In the CP case, the chief programmer maximized his coding
duties by implementing the key code himself, and the
programmers performed librarian activities (i.e., entering
or revising code stored on-line, making test runs, etc.).
In the M case, the manager minimized his coding duties by
acting as librarian and yielding greater responsibility for
implementation to the programmers. Although there were
(supposedly) four CP teams and three M teams in group DT,
this distinction between the CP and M variants of chief
programmer team organization is not utilized in the present
study, since it is believed that the impact of their common
features transcends any impact due to their differences.
Moreover, in actual practice, it was observed that the CP
and M variants are only identifiable extrema along a
continuum and that the group DT teams all gravitated toward
a comfortable compromise in this respect.
Each individual or team in groups AI or AT was allowed
to develop the software entirely in a manner of his or their
own choosing, which is herein referred to as an ad hoc
approach. No methodology was taught in the course these
subjects were taking. Informal observation by the
researcher confirmed that approaches used by the individuals
and ad hoc teams were indeed lacking in discipline and did
not utilize the key elements of the disciplined methodology
(e.g., an individual working alone cannot practice code
reading, and it was evident that the ad hoc teams did not
employ a PDL or a formal top-down design).
Due to the partially exploratory nature of the
experiment in terms of differences to be discovered in the
product and process, as much information was collected as
could be done in an efficient and unobtrusive manner. A
variety of information sources was used. Individual
questionnaires revealed the personal background and
programming experience of each participant. Private team
interviews and in-class team reports provided information
regarding individual performance on the project. "Run logs"
and computer account billing reports gave a record of the
computer activity during the project. Special module
compilation and program execution processors (invoked on-
line via very slight changes to the regular command
language) created an historical data bank of source code and
test data accumulated throughout the project development.
The data bank provided the principal source of
information analyzed in the current investigation; other
information sources have been utilized only in an auxiliary
manner (if at all). Thus, data collection for the
experiments themselves was automated on-line, with
essentially no interference to the programmer's normal
pattern of actions during computer (terminal) sessions. The
final products were isolated from the data bank and measured
for various syntactic and organizational aspects of the
finished product source code. Effort and cost data were
also extracted from the data bank. The inputs to the
analysis, in the form of scores for the various programming
aspects, reflect the quantitatively measured character of
the product and the process. Much of the data reduction was
done automatically within a specially instrumented compiler.
Some was done manually (e.g., examining characteristics
across modules). Due to the underlying collection and
reduction mechanism, which was uniformly applied to all
experimental units, the data used in the analysis has the
characteristics of objectivity, uniformity, and
quantitativeness and is measured on an interval scale of
measurement [Stevens 46]. The raw scores for the measured
programming aspects are summarized in Appendix 1.
The dependent variables studied in this experiment are
called programming aspects. They represent specific
isolatable and observable features of programming phenomena.
Furthermore, they are measured in an objective and
automatable manner (i.e., they could be extracted or
computed directly on-line from information readily
obtainable from operating systems and compilers). For each
programming aspect there exists an associated metric, a
specific algorithm which ultimately defines that aspect and
by which it is measured.
The programming aspects may be categorized as either
process- or product-related, on the basis of what they
measure. Process aspects represent characteristics of the
development process, in particular, the cost and required
effort as reflected in the number of computer job steps (or
runs) and the amount of textual revision of source code
during development. Product aspects represent
characteristics of the final product that was developed, in
particular, the syntactic content and organization of the
symbolic source code. Examples of product aspects are
number of lines, frequency of particular statement types,
average size of data variables' scope, etc.
The programming aspects may also be categorized as
either rudimentary or elaborative, on the basis of their
conceptual nature. The rudimentary aspects are conceptually
quite simple, reflecting ordinary surface features of the
process or product. For example, the numbers of data
variables and routines in a program are rudimentary aspects;
they pertain to the sheer size of the software and are
somewhat uninteresting in themselves. The elaborative
aspects are conceptually more subtle, reflecting deeper
characteristics of the process or product. For example, the
number of times pairs of routines communicate via data
variables (see the data bindings metric below) is an
elaborative aspect; it pertains to the software's modularity
and is intuitively appealing.
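The idea behind such a count can be sketched as follows (Python, for exposition only; the sketch assumes a binding is a distinct (segment, global variable, segment) triple in which the first segment modifies the variable and the second references it, in the spirit of the data bindings aspect elaborated in Chapter V, and the input mappings are illustrative):

```python
def data_bindings(modifies, references):
    """modifies/references: mapping of segment name -> set of the
    global variable names that segment modifies/references.
    A binding is a triple (p, x, q), p != q, where segment p
    modifies global x and segment q references x."""
    bindings = set()
    for p, written in modifies.items():
        for q, read in references.items():
            if p != q:
                for x in written & read:
                    bindings.add((p, x, q))
    return bindings

# "scan" communicates with "parse" and "gen" through the global "tok",
# and "parse" communicates with "gen" through "sym": three bindings.
print(len(data_bindings({"scan": {"tok"}, "parse": {"sym"}},
                        {"parse": {"tok"}, "gen": {"sym", "tok"}})))  # 3
```

A high binding count relative to the number of segments suggests routines that are heavily coupled through global data, which is why the count speaks to modularity.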
Finally, the programming aspects may be categorized as
either confirmatory or exploratory, on the basis of the
motivation for their inclusion in the study. The
confirmatory aspects had been consciously planned in advance
of collecting and extracting the data, because intuition
suggested that they would serve well as quantitative
indicators of important qualitative characteristics of
software development phenomena. It was predicted a priori
that these confirmatory aspects would verify the study's
basic premises regarding the programming environments being
investigated in the experiment. The exploratory aspects
were considered mainly because they could be collected and
extracted cheaply (even as a natural by-product sometimes)
along with the confirmatory aspects. There was little
serious expectation that these exploratory aspects would be
useful indicators of differences among the groups; but they
were included in the study with the intent of observing as
many aspects as possible on the off chance of discovering
any unexpected tendency or difference. Thus, this study
combines elements of both confirmatory and exploratory data
analysis within one common experimental setting [Tukey 69].
The confirmatory programming aspects are identified in the
accompanying tables by being flagged with asterisks; the
exploratory programming aspects are unflagged.
It should be noted that a large percentage of the
product aspects fall into the rudimentary-exploratory
category. On the whole, these product aspects represent a
fairly extensive taxonomy of the surface features of
software. The idea that important software qualities (e.g.,
"complexity") could be measured by counting such surface
features has generally been dismissed by some researchers as
too simplistic (e.g., [Mills 73, p. 232]). A resolve to
study these surface features empirically, to see if
something might turn up, before rejecting the underlying
idea, was partially responsible for their inclusion in the
study.
The particular programming aspects examined in this
investigation are presented in Chapters IV and V. A
complete list of aspects, together with explanatory notes,
is given in Chapter IV, with definitions for the nontrivial
or unfamiliar metrics. Chapter V contains an in-depth
discussion of the elaborative aspects.
CHAPTER IV
This chapter presents all of the programming aspects
examined in the study. The goal of this chapter, in
conjunction with the next, is to describe each programming
aspect and, where appropriate, to motivate its intuitive
appeal as a software metric. Because the brief explanatory
notes within this chapter do not adequately describe a
certain subset of the aspects (namely, the elaborative
aspects), they are further discussed within the next
chapter.
Table 1 lists the programming aspects examined in this
investigation. They appear grouped according to
definitionally related categories, with indented qualifying
phrases to specify variants of general aspects. When
referring to an individual aspect, a concatenation of the
major phrase with any qualifying phrases (separated by \
symbols) is used.* For example, the aspect label
COMPUTER JOB STEPS\MODULE COMPILATION\UNIQUE
refers to a metric involving computer job steps that are
module compilations in which the source code is unique from
all other compiled versions.
In order to avoid any misunderstanding, a redundancy
issue must be stated and properly appreciated. Several
instances of duplicate programming aspects exist; that is,
some logically unique aspects reappear with another label,
in order to provide alternative views of a given metric or
to round out a group of related aspects. For example, the
FUNCTION CALLS aspect and the STATEMENT TYPE COUNTS\
* Aspect labels are always written completely in uppercase
letters, while references to general concepts appear in
lowercase letters, with initial or defining occurrences
underlined.
TABLE 1

Table 1. Programming Aspects

Parenthesized numbers refer to the explanatory notes
in Chapter IV. Asterisks mark the confirmatory aspects; the
exploratory aspects are unmarked.

rudimentary process aspects

( 1) * COMPUTER JOB STEPS
( 2) *   MODULE COMPILATION
( 3) *     UNIQUE
( 3)       IDENTICAL
( 4)     PROGRAM EXECUTION
( 5)     MISCELLANEOUS
( 6) *   ESSENTIAL
( 7)   AVERAGE UNIQUE COMPILATIONS PER MODULE
( 8)   MAX. UNIQUE COMPILATIONS F.A.O. MODULE

MAX. is an abbreviation for MAXIMUM
F.A.O. is an abbreviation for FOR ANY ONE

elaborative process aspects

( 9) * PROGRAM CHANGES

rudimentary product aspects

(10) * MODULES
(11) * SEGMENTS
(12)   SEGMENT TYPE COUNTS
(11)     FUNCTION
(11)     PROCEDURE
(12)   SEGMENT TYPE PERCENTAGES
(11)     FUNCTION
(11)     PROCEDURE
(13)   AVERAGE SEGMENTS PER MODULE
(14) * LINES
(15) * STATEMENTS
(16)   STATEMENT TYPE COUNTS
(17) *   IF
(18) *   CASE
(19) *   WHILE
(20) *   EXIT
(22,99)    (PROC)CALL
(23,99)      NONINTRINSIC
(23,99)      INTRINSIC
(24) *   RETURN
(16)   STATEMENT TYPE PERCENTAGES
(17) *   IF
(18) *   CASE
(19) *   WHILE
(20) *   EXIT
(22,99)    (PROC)CALL
(23,99)      NONINTRINSIC
(23,99)      INTRINSIC
(24) *   RETURN
(25) * AVERAGE STATEMENTS PER SEGMENT
(26) * AVERAGE STATEMENT NESTING LEVEL
(27) * DECISIONS
(22,99) FUNCTION CALLS
(23,99)   NONINTRINSIC
(23,99)   INTRINSIC
(28) TOKENS
(29) AVERAGE TOKENS PER STATEMENT
( )  INVOCATIONS
(11,99)   FUNCTION
(23,99)     NONINTRINSIC
(23,99)     INTRINSIC
(11,99)   PROCEDURE
(23,99)     NONINTRINSIC
(23,99)     INTRINSIC
(23)      NONINTRINSIC
(23)      INTRINSIC
(30)  AVG. INVOCATIONS PER (CALLING) SEGMENT
(11)    FUNCTION
(23)      NONINTRINSIC
(23)      INTRINSIC
(11)    PROCEDURE
(23)      NONINTRINSIC
(23)      INTRINSIC
(23)    NONINTRINSIC
(23)    INTRINSIC
(31,99) AVG. INVOCATIONS PER (CALLED) SEGMENT
(11)    PROCEDURE
(32) DATA VARIABLES
(37) DATA VARIABLE SCOPE COUNTS
(33)   GLOBAL
(34)     ENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(34)     NONENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(35)     MODIFIED
(35)     UNMODIFIED
(33)   NONGLOBAL
(33)     PARAMETER
(36)       VALUE
(36)       REFERENCE
(33)     LOCAL
(37) DATA VARIABLE SCOPE PERCENTAGES
(33) * GLOBAL
(34)     ENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(34)     NONENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(35)     MODIFIED
(35)     UNMODIFIED
(33)   NONGLOBAL
(33)     PARAMETER
(36)       VALUE
(36)       REFERENCE
(33)     LOCAL
(38)   AVERAGE GLOBAL VARIABLES PER MODULE
(34)     ENTRY
(34)     NONENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(38)   AVERAGE NONGLOBAL VARIABLES PER SEGMENT
(33)     PARAMETER
(33)     LOCAL
(39)   PARAMETER PASSAGE TYPE PERCENTAGES
(36)     VALUE
(36)     REFERENCE
(40)   (SEGMENT,GLOBAL) ACTUAL USAGE PAIRS
(34)     ENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(34)     NONENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(35)     MODIFIED
(35)     UNMODIFIED
(40)   (SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS
(34)     ENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(34)     NONENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(35)     MODIFIED
(35)     UNMODIFIED
(40)   (SEGMENT,GLOBAL) USAGE PAIR REL.PERCENT.
(34)     ENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(34)     NONENTRY
(35)       MODIFIED
(35)       UNMODIFIED
(35)     MODIFIED
(35)     UNMODIFIED

AVG. is an abbreviation for AVERAGE
REL.PERCENT. is an abbreviation for RELATIVE PERCENTAGE

elaborative product aspects

(44) CYCLOMATIC COMPLEXITY
(45)   SIMPPRED-NCASE VARIATION
(46)     TOTAL
(47)     #SEGS:CC>=10
(48)     0.5 QUANTILE POINT VALUE
(48)     0.5 QUANTILE TAIL AVERAGE
(48)     0.7 QUANTILE POINT VALUE
(48)     0.7 QUANTILE TAIL AVERAGE
(48)     0.8 QUANTILE POINT VALUE
(48)     0.8 QUANTILE TAIL AVERAGE
(48)     0.9 QUANTILE POINT VALUE
(48)     0.9 QUANTILE TAIL AVERAGE
(45)   SIMPPRED-LOGCASE VARIATION
(46)     TOTAL
(47)     #SEGS:CC>=10
(48)     0.5 QUANTILE POINT VALUE
(48)     0.5 QUANTILE TAIL AVERAGE
(48)     0.7 QUANTILE POINT VALUE
(48)     0.7 QUANTILE TAIL AVERAGE
(48)     0.8 QUANTILE POINT VALUE
(48)     0.8 QUANTILE TAIL AVERAGE
(48)     0.9 QUANTILE POINT VALUE
(48)     0.9 QUANTILE TAIL AVERAGE
(45)   COMPPRED-NCASE VARIATION
(46)     TOTAL
(47)     #SEGS:CC>=10
(48)     0.5 QUANTILE POINT VALUE
(48)     0.5 QUANTILE TAIL AVERAGE
(48)     0.7 QUANTILE POINT VALUE
(48)     0.7 QUANTILE TAIL AVERAGE
(48)     0.8 QUANTILE POINT VALUE
(48)     0.8 QUANTILE TAIL AVERAGE
(48)     0.9 QUANTILE POINT VALUE
(48)     0.9 QUANTILE TAIL AVERAGE
(45)   COMPPRED-LOGCASE VARIATION
(46)     TOTAL
(47)     #SEGS:CC>=10
(48)     0.5 QUANTILE POINT VALUE
(48)     0.5 QUANTILE TAIL AVERAGE
(48)     0.7 QUANTILE POINT VALUE
(48)     0.7 QUANTILE TAIL AVERAGE
(48)     0.8 QUANTILE POINT VALUE
(48)     0.8 QUANTILE TAIL AVERAGE
(48)     0.9 QUANTILE POINT VALUE
(48)     0.9 QUANTILE TAIL AVERAGE
(41) (SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS
(42)   ACTUAL
(43)     SUBFUNCTIONAL
(43)     INDEPENDENT
(42)   POSSIBLE
(42)   RELATIVE PERCENTAGE
(49) SOFTWARE SCIENCE QUANTITIES
(50)   VOCABULARY
(50)   LENGTH
(50)   ESTIMATED LENGTH
(50)   %DIFFERENCE(N)
(50)   VOLUME
(50,99)   INTELLIGENCE CONTENT
(50)   ESTIMATED BUGS
(51)   1ST CALCULATION METHOD
(50)     PROGRAM LEVEL
(50)     DIFFICULTY
(50,99)     POTENTIAL VOLUME
(50)     LANGUAGE LEVEL
(50)     EFFORT
(50)     ESTIMATED TIME
(51)   2ND CALCULATION METHOD
(50)     PROGRAM LEVEL
(50)     DIFFICULTY
(50)     POTENTIAL VOLUME
(50)     %DIFFERENCE(V)
(50)     LANGUAGE LEVEL
(50)     EFFORT
(50)     ESTIMATED TIME
CHAPTER IV
(PROC)CALL aspect are each labeled and categorized from the viewpoint of implementation language construct frequencies. But the same metrics can also be considered from the viewpoint of segment invocation frequencies, warranting the inclusion of the two duplicate aspects INVOCATIONS\FUNCTIONS and INVOCATIONS\PROCEDURES as variants of the general INVOCATIONS aspect. Among the 197 programming aspects listed in Table 1, there are 8 pairs of duplicate aspects (identified in note 99 below), leaving 189 nonredundant aspects examined in the study. By definition, the data scores obtained for any pair of duplicate aspects will be identical, and thus the same statistical conclusions will be reached for both aspects. This redundancy must be kept in mind when evaluating the results of the experiments.
Brief explanatory notes about the programming aspects are given below, in the form of numbered paragraphs keyed to the list in Table 1, with definitions for the nontrivial or unfamiliar metrics. These notes usually supply loose explanations for the general concepts behind these programming aspects, before mentioning any restrictions or variations in how they were applied and measured in this study. Technical meanings for system- or language-dependent terms (e.g., module, segment, intrinsic, entry) also appear here. Since computer programming terminology is not standardized, the reader is cautioned against drawing inferences not based on this dissertation's definitions.
Explanatory Notes for the Programming Aspects
(1) A computer job step is a conceptually indivisible programmer-oriented activity that is performed on a computer at the operating system command level, is inherent to the software development effort, and involves a nontrivial expenditure of computer or human resources. Ideally
speaking, examples of job steps would include editing symbolic texts, compiling source modules, link-editing (or collecting) object modules, and executing entire programs; however, operations such as querying the operating system for status information or requesting access to on-line files would not qualify as job steps. In this study, consideration for the COMPUTER JOB STEPS aspect was limited exclusively to the activities of compiling source modules or executing entire programs, but not all of the activities so counted dealt with the final product (or logical predecessors thereof).
(2) A module compilation is an invocation of the implementation language processor on the source code of an individual module. In this study, only compilations of modules comprising the final software product (or logical predecessors thereof) are counted in the COMPUTER JOB STEPS\MODULE COMPILATION aspect.
(3) All module compilations are (sub)categorized as either identical or unique, depending on whether or not the source code compiled is textually identical to that of a previous compilation. During the development process, each unique compilation was necessary in some sense, while an identical compilation could conceivably have been avoided by saving the (relocatable) object module from a previous compilation for later reuse (except in the situation of undoing source code revisions after they have been tested and found to be erroneous or superfluous).
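The identical-versus-unique distinction can be checked mechanically by comparing each compiled source text against every earlier one. The following is a minimal modern sketch, not part of the study's tooling; the function name `classify_compilations` and the use of hashing are illustrative assumptions:

```python
import hashlib

def classify_compilations(sources):
    """Label a chronological list of compiled source texts as 'unique'
    or 'identical' (textually identical to an earlier compilation)."""
    seen = set()
    labels = []
    for text in sources:
        digest = hashlib.sha256(text.encode()).hexdigest()
        labels.append("identical" if digest in seen else "unique")
        seen.add(digest)
    return labels

# A third compilation of unchanged source is classified as identical:
# classify_compilations(["X := 1", "X := 2", "X := 1"])
# -> ["unique", "unique", "identical"]
```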
(4) A program execution is an invocation of a complete programmer-developed program (after the necessary compilation(s) and link-editing) upon some test data. In this study, only executions of programs composed of modules comprising the final product (or logical predecessors thereof) are counted in the COMPUTER JOB STEPS\PROGRAM EXECUTION aspect.
(5) A miscellaneous job step is an auxiliary compilation or execution of something other than the final software product. In this study, the COMPUTER JOB STEPS\MISCELLANEOUS aspect counts exactly those activities included in the COMPUTER JOB STEPS aspect but not included in the COMPUTER JOB STEPS\MODULE COMPILATION or COMPUTER JOB STEPS\PROGRAM EXECUTION aspects.
(6) An essential job step is a computer job step that involves the final software product (or logical predecessors thereof) and could not have been avoided (by off-line computation or by on-line storage of previous compilations or results). In this study, the COMPUTER JOB STEPS\ESSENTIAL aspect is the sum of the COMPUTER JOB STEPS\MODULE COMPILATION\UNIQUE aspect plus the COMPUTER JOB STEPS\PROGRAM EXECUTION aspect.
(7) The AVERAGE UNIQUE COMPILATIONS PER MODULE aspect is a way of normalizing the COMPUTER JOB STEPS\MODULE COMPILATION\UNIQUE aspect.
(8) The MAXIMUM UNIQUE COMPILATIONS FOR ANY ONE MODULE aspect is another way of normalizing (by isolating the worst case) the COMPUTER JOB STEPS\MODULE COMPILATION\UNIQUE aspect.
(9) The program changes metric [Dunsmore & Gannon 77] is a measure of the total amount of textual revision made to program source code during the (postdesign) software development period. The rules for counting program changes are designed to identify individual conceptual changes algorithmically. Each occurrence of the following revisions is counted as a single program change: modification of a single statement, insertion of contiguous statements, or modification of a single statement followed immediately by insertion of contiguous statements. However, the following revisions are not counted as program changes: deletion of contiguous statements, insertion of standard output statements or special compiler-provided debugging directives, and instances of lexical reformatting without syntactic/semantic alteration.
See Chapter V for further discussion of the program changes metric.
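The counting rules above can be sketched as a small procedure over a chronological stream of revision events. The event encoding (`mod` for a single-statement modification, `ins` for an insertion of contiguous statements, `del` for a deletion) and the function name are hypothetical simplifications, not the study's actual tooling:

```python
def count_program_changes(revisions):
    """Count program changes from a list of revision events, each
    'mod', 'ins', or 'del', given in textual order."""
    changes = 0
    i = 0
    while i < len(revisions):
        if revisions[i] == "mod":
            changes += 1
            # a modification followed immediately by an insertion
            # counts as one program change, not two
            if i + 1 < len(revisions) and revisions[i + 1] == "ins":
                i += 1
        elif revisions[i] == "ins":
            changes += 1
        # deletions ('del') are not counted as program changes
        i += 1
    return changes
```

For example, a modification immediately followed by an insertion, then a deletion, an insertion, and another modification would count as three program changes.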
(10) A module is a separately compiled portion of the complete software system. In the implementation language SIMPL-T, a typical module is a collection of the declarations of several global variables and the definitions of several segments. In this study, only those modules which comprise the final product are counted in the MODULES aspect.
(11) A segment is a collection of source code statements, together with declarations for the formal parameters and local variables manipulated by those statements, that may be invoked as an operational unit. In the implementation language SIMPL-T, a segment is either a value-returning function (invoked via reference in an expression) or else a non-value-returning procedure (invoked via the CALL statement). The segment, function, and procedure of SIMPL-T correspond to the (sub)program, function, and subroutine of Fortran, respectively.
(12) The group of aspects named SEGMENT TYPE COUNTS gives the absolute number of programmer-defined segments of each type. The group of aspects named SEGMENT TYPE PERCENTAGES gives the relative percentage of each type of segment, compared with the total number of programmer-defined segments. The latter group of aspects is a way of normalizing the former group of aspects.
(13) Since in the implementation language SIMPL-T segment definitions occur within the context of a module, a natural way to normalize (or average) the raw counts of segments is provided. The AVERAGE SEGMENTS PER MODULE aspect represents the average size, in segments, of modules in the program.
(14) The LINES aspect counts every textual line of delivered source code in the final product, including comments, compiler directives, variable declarations, executable statements, etc.
(15) The STATEMENTS aspect counts only the executable constructs in the source code of the final product. These are high-level, structured-programming statements, including simple statements--such as assignment and procedure call--as well as compound statements--such as if-then-else and while-do--which have other statements nested within them. The implementation language SIMPL-T allows exactly seven different statement types (referred to by their distinguishing keyword or symbol) covering assignment (:=), alternation-selection (IF, CASE), iteration (WHILE, EXIT), and procedure invocation (CALL, RETURN). Input-output operations are accomplished via calls to intrinsic procedures.
(16) The group of aspects named STATEMENT TYPE COUNTS gives the absolute number of executable statements of each type. The group of aspects named STATEMENT TYPE PERCENTAGES gives the relative percentage of each type of statement, compared with the total number of executable statements. The latter group of aspects is a way of normalizing the former group of aspects.
(17) As mentioned above, the := symbol denotes the assignment statement. It assigns the value of the expression on the right-hand side to the variable on the left-hand side.
(18) Both if-then and if-then-else constructs are counted as IF statements. Each IF statement allows the execution of either the then-part or else-part statements, depending upon its Boolean expression.
(19) The CASE statement provides for selection from several alternatives, depending upon the value of an expression. In the implementation language SIMPL-T, exactly one of the alternatives (or an optional else-part) is selected per execution of a CASE, a list of constants is explicitly given for each alternative, and selection is based upon the equality of the expression value with one of the constants. (These constants are referred to as "case labels"; these alternatives, as "case branches.") A case construct with n alternatives is logically and semantically equivalent to a series of n nested if-then-else constructs.
(20) The WHILE statement is the only iteration or looping construct provided by the implementation language SIMPL-T. It allows the statements in the loop body to be executed repeatedly (zero or more times) depending upon a Boolean expression which is reevaluated at every iteration; the loop may also be terminated via an EXIT statement. Each WHILE statement may be optionally labeled with a designator (referenced by EXIT statements) which uniquely identifies it from other nested WHILE statements.
(21) The EXIT statement allows the abnormal termination of iteration loops by unconditional transfer of control to the statement immediately following the WHILE statement. Thus it is a very restricted form of GOTO. This exiting may take place from any depth of nested loops, since the EXIT statement may optionally name a designator which identifies the loop to be exited; without such a designator, only the immediately enclosing loop is exited.
(22) Since there are two types of segments in the implementation language SIMPL-T, there are two types of "calls" or segment invocations. Procedures are invoked via the CALL statement, and functions are invoked via reference in an expression. The counts for these separate constructs are reported separately as the (PROC)CALL and FUNCTION CALL aspects, and jointly as the INVOCATIONS aspect.
(23) Intrinsic means provided and defined by the implementation language; nonintrinsic means provided and defined by the programmer. These terms are used to distinguish built-in procedures or functions (which are supported by the compiler and utilized as primitives) from segments (which are written by the programmer). Nearly all of the intrinsic procedures in the implementation language SIMPL-T perform input-output operations and external data file manipulations. All of the intrinsic functions in SIMPL-T perform data type coercions and character string operations.
(24) The RETURN statement allows the abnormal termination of the current segment by unconditional resumption of the previously executing segment. Thus it is another very restricted form of GOTO. Within a function, a RETURN statement must specify an expression, the value of which becomes the value returned for the function invocation. Within a procedure, a RETURN statement must not specify such an expression. Additionally, a simple RETURN statement is optional at the textual end of procedures; it will be implicitly assumed if not explicitly coded. In this study, the total number of explicitly coded and implicitly assumed RETURN statements, from both functions and procedures combined, is counted.
(25) The AVERAGE STATEMENTS PER SEGMENT aspect provides a way of normalizing the number of statements relative to their natural enclosure in a program, the segment. The measure also represents the average length, in executable statements, of segments in the program.
(26) In the implementation language SIMPL-T, both simple (e.g., assignment) and compound (e.g., if-then-else) statements may be nested inside other compound statements. A particular nesting level is associated with each statement, starting at 1 for a statement at the outermost level of each segment and increasing by 1 for successively nested statements. Nesting level can be displayed visually via proper and consistent indentation of the source code listing.
(27) The DECISIONS aspect is the sum of the numbers of IF, CASE, and WHILE statements within the program's source code. Each of these statements represents a unique (possibly repeated) run-time decision coded by the programmer. Because the implementation language SIMPL-T has only structured control structures, this aspect is closely related to the cyclomatic complexity metrics discussed below.
(28) Tokens are the basic syntactic entities--such as keywords, operators, parentheses, identifiers, etc.--that occur in a program statement. The average number of tokens per statement may be viewed as an indication of how much "information" a typical statement contains, how "powerful" a typical statement is, or how concisely the statements are coded.
(29) An invocation is the syntactic occurrence of a construct by which either a programmer-defined segment or a built-in routine is invoked from within another segment; both procedure calls and function references are counted as INVOCATIONS. They are (sub)categorized by the type (i.e., function or procedure, nonintrinsic or intrinsic) of segment or routine being invoked.
(30) The group of aspects named AVERAGE INVOCATIONS PER (CALLING) SEGMENT represents one way to normalize the absolute number of invocations. These aspects reflect the average number of calls to programmer-defined segments and built-in routines from a programmer-defined segment. They are (sub)categorized by the type of segment or routine being invoked.
(31) The group of aspects named AVERAGE INVOCATIONS PER (CALLED) SEGMENT represents another way to normalize the absolute number of invocations. These aspects reflect the average number of calls to a programmer-defined segment from other segments. They are (sub)categorized by the type (i.e., function or procedure) of segment being invoked.
(32) A data variable is an individually named scalar or structure. The implementation language SIMPL-T provides:
(a) three data types for scalars--integer, character, and (varying-length) string;
(b) one kind of data structure (besides scalar)--single-dimensional array, with zero-origin subscript range; and
(c) several levels of scope (as explained in note 33 below) for data variables.
In addition, all data variables in a SIMPL-T program must be explicitly declared, with attributes fully specified. The DATA VARIABLES aspect counts each data variable declared in the final software product once, regardless of its type, structure, or scope. Note that each array is counted as a single data variable.
(33) In the implementation language SIMPL-T, data variables can have any one of essentially four levels of scope--entry global, nonentry global, parameter, and local--depending on where and how they are declared in the program. Note that the notion of scope deals only with static accessibility by name; the effective accessibility of any variable can always be extended by passing it as a parameter between segments. The scope levels are explained here (and presented in the aspect (sub)categorizations) via a hierarchy of distinctions.
The primary distinction is between global and nonglobal. Global variables are accessible by name to each of the segments in the module in which they are declared. Nonglobal variables are accessible by name only to the single segment in which they are declared.
Global variables are secondarily distinguished into entry and nonentry categories. Entry globals may be accessible by name to each of the segments in several modules (as explained in note 34 below). Nonentry globals are accessible by name only within the module in which they are declared.
Nonglobal variables are secondarily distinguished into formal parameters and locals. Formal parameters are accessible by name only within the enclosing (called) segment, but their values are related to the calling segment (as explained in note 36 below). Locals are accessible by name only within the enclosing segment, and their values are completely isolated from any other segment.
(34) Entry means that the data variable (explicitly declared as ENTRY in one module) is accessible from within other separately compiled modules (in which it must be explicitly declared as EXTernal). Nonentry means that the data variable is accessible only within the module in which it is declared. In this study, these terms are used only in reference to global variables, although the implementation language SIMPL-T handles the accessibility of segments across modules in the same way.
Although the implementation language SIMPL-T does allow the EXTernal attribute to be declared locally so that just the enclosing segment has access to an identifier declared as ENTRY in another module, this feature is seldom used; it never occurred in any of the final software products examined in this study.
(35) Modified means referred to, at least once in the program source code, in such a manner that the value of the data variable might be (re)set when (and if) the appropriate statements were to be executed. Data variables can be (re)set only by (a) being the "target" of an assignment statement, (b) being passed by reference to a programmer-defined segment or built-in routine, or (c) being named in an "input statement." (This third case is really covered by the second case, since all the "input statements" in SIMPL-T are actually calls to certain intrinsic procedures with passed-by-reference parameters.) Unmodified means referred to, throughout the program source code, in such a manner that the value of the data variable could never be (re)set during execution. These terms refer only to global data variables.
Any global variable is allowed to have an initial value (constants only) specified in its declaration. Globals which are initialized but unmodified are especially useful in SIMPL-T programs, serving as "named constants."
(36) The implementation language SIMPL-T allows two types of parameter passage. Pass-by-value means that the value of the actual argument is copied (upon invocation) into the corresponding formal parameter (which thereafter behaves like a local variable for all intents and purposes); the effect is that the called routine cannot modify the value of the calling segment's actual argument. Pass-by-reference means that the address of the actual argument (which must be a variable rather than an expression) is passed (upon invocation) to the called routine; the effect is that any changes made by the called routine to the corresponding formal parameter will be reflected in the value of the calling segment's actual argument (upon return). In SIMPL-T, formal parameters that are scalars are normally (by default) passed by value, but they may be explicitly declared to be passed by reference; formal parameters that are arrays are always passed by reference.
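The observable difference between the two passage mechanisms can be illustrated in a modern language. Python passes object references, so SIMPL-T's semantics can only be simulated; in this sketch (all names hypothetical) pass-by-value is emulated with an explicit copy:

```python
import copy

def call_by_value(routine, arg):
    # the routine receives a copy, so the caller's argument cannot change
    return routine(copy.deepcopy(arg))

def call_by_reference(routine, arg):
    # the routine receives the caller's object; mutations persist on return
    return routine(arg)

def zero_first(a):
    # stands in for a SIMPL-T routine that modifies a formal parameter
    a[0] = 0
    return a

data = [7, 8]
call_by_value(zero_first, data)      # data is still [7, 8]
call_by_reference(zero_first, data)  # data is now [0, 8]
```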
(37) The group of aspects named DATA VARIABLE SCOPE COUNTS gives the absolute number of declared data variables according to each level of scope. The group of aspects named DATA VARIABLE SCOPE PERCENTAGES gives the relative percentage of variables at each scope level, compared with the total number of declared variables. The latter group of aspects is a way of normalizing the former group of aspects.
(38) A natural way to normalize (or average) the raw counts of data variables is provided, since data variable declarations in the implementation language SIMPL-T may only appear in certain contexts within the program: globals in the context of a module and nonglobals in the context of a segment. The group of aspects named AVERAGE GLOBAL VARIABLES PER MODULE represents the average number of globals declared for a module. The group of aspects named AVERAGE NONGLOBAL VARIABLES PER SEGMENT represents the average number of nonglobals declared for a segment.
(39) Since there are two types of parameter passing mechanisms in the implementation language SIMPL-T (as explained in note 36 above), it is desirable to normalize their raw frequencies into relative percentages, indicating the programmer's degree of "preference" for one type or the other. The group of aspects named PARAMETER PASSAGE TYPE PERCENTAGES gives the percentages of each type of parameter relative to the total number of parameters declared in the program.
(40) A segment-global usage pair (p,r) is an instance of global variable r being used by a segment p (i.e., the global is either modified (set) or accessed (fetched) at least once within the statements of the segment). Each usage pair represents a unique "use connection" between a global and a segment. In this study, segment-global usage pairs were (sub)categorized by the type (i.e., entry or nonentry, modified or unmodified) of global data variable involved and were counted in three different ways.
First, the (SEGMENT,GLOBAL) ACTUAL USAGE PAIRS aspects count the absolute numbers of realized usage pairs (p,r): the global variable r is actually used by segment p. They represent the frequencies of use connections realized within the program. Second, the (SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS aspects count the absolute numbers of potential usage pairs (p,r), given the program's global variables and their declared scope: the scope of global variable r merely contains segment p, so that p could potentially modify or access r. These counts of possible usage pairs are computed as the sum of the number of segments in each global's scope. They represent a sort of "worst case" frequencies of use connections. Third, the (SEGMENT,GLOBAL) USAGE PAIR RELATIVE PERCENTAGE aspects are a way of normalizing the number of usage pairs, since these measures are ratios (expressed as percentages) of actual usage pairs to possible usage pairs. They represent the frequencies of realized use connections relative to potential use connections. These usage pair relative percentage metrics are empirical estimates of the likelihood that an arbitrary segment uses (i.e., sets or fetches the value of) an arbitrary global variable.
In some sense, all three types of aspects dealing with segment-global usage pairs (actual, possible, and relative percentage) reflect quantifiable characteristics of "data modularization" within a program, i.e., the static organization of data definitions and references within segments and modules. In particular, the possible usage pairs aspects reflect the general degree of encapsulation enforced by the implementation language for global variables. Moreover, the usage pair relative percentage aspects reflect the general degree of "globality" for global variables, i.e., the extent to which globals are actually used by those segments that could possibly do so.
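The three counting ways can be sketched directly, assuming each global is represented by the set of segments in its declared scope and the set of segments that actually use it. The representation and the function name `usage_pair_metrics` are illustrative assumptions:

```python
def usage_pair_metrics(scope, uses):
    """scope: global name -> set of segments within its declared scope.
    uses:  global name -> set of segments that actually set or fetch it.
    Returns (actual, possible, relative percentage)."""
    # possible pairs: sum of the number of segments in each global's scope
    possible = sum(len(segs) for segs in scope.values())
    # actual pairs: realized use connections
    actual = sum(len(segs) for segs in uses.values())
    rel_percent = 100.0 * actual / possible if possible else 0.0
    return actual, possible, rel_percent
```

For example, with G1 visible to three segments but used by one, and G2 visible to and used by two, the metrics are 3 actual pairs out of 5 possible, or 60 percent.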
(41) A segment-global-segment data binding [Stevens, Myers & Constantine 74, pp. 118-119] (p,r,q) is defined as an occurrence of the following arrangement in a program: a segment p modifies (sets) a global variable r that is also accessed (fetched) by a segment q, with p different from q. The (SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS aspects count these unique communication paths between pairs of segments via global variables. These aspects thus reflect the degree of one form of connectivity within a program.
See Chapter V for further discussion of the data bindings metrics.
(42) In this study, segment-global-segment data bindings were counted in three different ways: ACTUAL, POSSIBLE, and RELATIVE PERCENTAGE. First, the DATA BINDINGS\ACTUAL aspect counts the total number of data bindings actually coded in the program, reflecting the degree of realized connectivity. Second, the DATA BINDINGS\POSSIBLE aspect counts the total number of data bindings that could possibly be allowed, given the program's organizational structure. It reflects the degree of potential connectivity. Third, the DATA BINDINGS\RELATIVE PERCENTAGE aspect is the ratio (expressed as a percentage) of actual data bindings to possible data bindings, reflecting the normalized degree of realized connectivity relative to potential connectivity.
See Chapter V for further discussion of the data bindings metrics.
(43) Actual data bindings are (sub)categorized depending on the invocation relationship between the two segments. A data binding (p,r,q) is subfunctional if either of the two segments p or q can invoke the other, whether directly or indirectly, as a "subroutine." A data binding (p,r,q) is independent if neither of the two segments p or q can invoke the other, whether directly or indirectly.
See Chapter V for further discussion of the data bindings metrics.
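Actual data bindings can be enumerated directly from the sets of modifying and accessing segments for each global. This is an illustrative sketch under an assumed representation; the names are hypothetical:

```python
def actual_data_bindings(setters, fetchers):
    """setters:  global name -> set of segments that modify (set) it.
    fetchers: global name -> set of segments that access (fetch) it.
    Returns the set of data binding triples (p, r, q) with p != q."""
    bindings = set()
    for r, mods in setters.items():
        for p in mods:
            for q in fetchers.get(r, ()):
                if p != q:          # a segment bound to itself is excluded
                    bindings.add((p, r, q))
    return bindings
```

For example, if segment P sets global G while both P and Q fetch it, the only binding is (P, G, Q).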
(44) Cyclomatic complexity [McCabe 76] is a graph-theoretic measure of control-flow complexity. For an implementation language with only structured control structures (such as SIMPL-T), this measure is dependent only on the number of predicates (i.e., Boolean expressions governing flow of control) in the source code. The cyclomatic complexity v(P) of a program P with pi predicates strewn among s segments is computed as
    v(P) = pi + s ;
the cyclomatic complexity v(S) of a segment S with pi predicates is computed as
    v(S) = pi + 1 .
See Chapter V for further discussion of the cyclomatic complexity metrics.
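The two formulas above can be transcribed directly; note that summing the per-segment values reproduces the program-level formula, since the s segments each contribute the added 1. A minimal sketch with hypothetical function names:

```python
def segment_cc(predicates):
    # v(S) = pi + 1 for a segment with pi predicates
    return predicates + 1

def program_cc(predicate_counts):
    # v(P) = pi_total + s, i.e. the sum of the per-segment values
    return sum(segment_cc(p) for p in predicate_counts)

# Three segments with 3, 0, and 2 predicates: v(P) = 5 + 3 = 8
# program_cc([3, 0, 2]) -> 8
```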
(45) Four definitional variations of the basic cyclomatic complexity measure were examined in this study in order to explore alternatives for identifying predicates and for handling case statement constructs. Under the SIMPPRED alternative, simple Boolean subexpressions joined by AND or OR connectives are each counted as predicates. Under the COMPPRED alternative, only each complete Boolean expression is counted as a predicate. Under the NCASE alternative, each case statement construct is counted as contributing n predicates, where n is the number of "cases" involved. Under the LOGCASE alternative, each case statement construct is counted as contributing floor(log2(n))* predicates, thus giving a discount for case statement constructs relative to series of nested if-then-else constructs.
See Chapter V for further discussion of the cyclomatic complexity metrics.
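The NCASE and LOGCASE alternatives differ only in the predicate count contributed by a single case construct. A sketch with a hypothetical function name, computing floor(log2(n)) exactly on positive integers:

```python
def case_predicates(n_cases, variant):
    """Predicates contributed by one case construct with n_cases cases."""
    if variant == "NCASE":
        return n_cases
    if variant == "LOGCASE":
        # floor(log2(n)), exact for positive integers (no float rounding)
        return n_cases.bit_length() - 1
    raise ValueError(variant)
```

So an 8-way case construct contributes 8 predicates under NCASE but only 3 under LOGCASE.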
(46) For each of the definitional variations, the CYCLOMATIC COMPLEXITY\...\TOTAL aspect measures the cyclomatic complexity of the entire program. It is simply the sum of cyclomatic complexity values for the individual segments comprising the program.
See Chapter V for further discussion of the cyclomatic complexity metrics.
* The notation floor(x) signifies the greatest integer less than or equal to x.
(47) For each of the definitional variations, the CYCLOMATIC COMPLEXITY\...\#SEGS:CC>=10 aspect counts the number of segments in the program whose cyclomatic complexity values equal or exceed the threshold value 10.
See Chapter V for further discussion of the cyclomatic complexity metrics.
(48) For each of the definitional variations, a common descriptive statistic of the empirical distribution of cyclomatic complexity values from the individual segments comprising an entire program was used as a vehicle for measuring the general level of cyclomatic complexity within the relatively nontrivial segments of the program. This descriptive statistic, known as a quantile [Conover 71, pp. 11-32, 72-73], can be loosely described (in the discrete case) as the value (of the random variable in question) corresponding to a particular fixed probability level on the cumulative relative frequency curve (representing the distribution of that random variable). The CYCLOMATIC COMPLEXITY\...\f QUANTILE POINT VALUE aspects are defined to measure the largest integer x such that the fraction of cyclomatic complexity values which are less than x is less than or equal to the fixed fraction f. The CYCLOMATIC COMPLEXITY\...\f QUANTILE TAIL AVERAGE aspects are defined to measure the average of cyclomatic complexity values greater than or equal to the f quantile point value. Several particular quantiles were examined in this study: the 0.5 quantile is closely related to the distribution's median, and the 0.7, 0.8, and 0.9 quantiles provide a series of increasingly smaller tails of the distribution.
See Chapter V for further discussion of the cyclomatic complexity metrics.
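The two definitions above can be transcribed almost word for word. This is an illustrative sketch (efficiency is ignored, and the function names are hypothetical), assuming positive integer complexity values as in this study:

```python
def quantile_point_value(values, f):
    """Largest integer x such that the fraction of values less than x
    is less than or equal to the fixed fraction f."""
    n = len(values)
    x = max(values)
    # decrease x until the fraction of smaller values drops to <= f
    while sum(v < x for v in values) / n > f:
        x -= 1
    return x

def quantile_tail_average(values, f):
    """Average of the values greater than or equal to the f quantile
    point value."""
    x = quantile_point_value(values, f)
    tail = [v for v in values if v >= x]
    return sum(tail) / len(tail)

# For segment complexities [1, 2, 3, 10]:
# quantile_point_value(..., 0.5) -> 3, quantile_tail_average(..., 0.5) -> 6.5
```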
(49) According to software science theory [Halstead 77], several interesting quantities can be computed from the source code of a program and used to measure characteristics of both the abstract algorithm and its expression as implemented. All of these software science quantities are computed in terms of the number of conceptually unique "operators" and "operands" and the total occurrences of such "operators" and "operands" within a program. In this study, these "operators" and "operands" were identified syntactically according to a set of rules established for the implementation language SIMPL-T.
See Chapter V for further discussion of the software science metrics.
(50) Given the basic parameters of software science:

    total "operator" count              N1
    total "operand" count               N2
    unique "operator" count             n1
    unique "operand" count              n2
    unique potential "operand" count    n2*

the following formulas define the software science quantities examined in this study:

    VOCABULARY            n  = n1 + n2
    LENGTH                N  = N1 + N2
    ESTIMATED LENGTH      N^ = n1*log2(n1) + n2*log2(n2)
    %DIFFERENCE(N^)          = (N^ - N) / N
    VOLUME                V  = N * log2(n)
    POTENTIAL VOLUME      V* = (2 + n2*) * log2(2 + n2*)
    PROGRAM LEVEL         L  = V* / V
    DIFFICULTY            D  = 1 / L
    INTELLIGENCE CONTENT  I  = (2 / n1) * (n2 / N2) * V
    %DIFFERENCE(V*,I)        = (I - V*) / V*
    LANGUAGE LEVEL        lambda = L * V*
    EFFORT                E  = V / L
    ESTIMATED TIME        T^ = E / S
    ESTIMATED BUGS        B^ = L * E / E0

where S and E0 are psychologically determined constants.
See Chapter V for further discussion of the software
science metrics.
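The formulas above can be collected into one sketch. The constants S = 18 and E0 = 3000 are the values commonly associated with Halstead's theory and are assumptions here, as is the function name:

```python
import math

def software_science(N1, N2, n1, n2, n2_star, S=18.0, E0=3000.0):
    """Software science quantities from the basic counts. S and E0 are
    the psychologically determined constants (assumed values)."""
    n = n1 + n2                                       # vocabulary
    N = N1 + N2                                       # length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)   # estimated length
    V = N * math.log2(n)                              # volume
    V_star = (2 + n2_star) * math.log2(2 + n2_star)   # potential volume
    L = V_star / V                                    # program level
    D = 1 / L                                         # difficulty
    I = (2 / n1) * (n2 / N2) * V                      # intelligence content
    E = V / L                                         # effort
    T = E / S                                         # estimated time
    B = L * E / E0                                    # estimated bugs (= V / E0)
    return {"vocabulary": n, "length": N, "est_length": N_hat,
            "volume": V, "pot_volume": V_star, "level": L,
            "difficulty": D, "intelligence": I, "effort": E,
            "time": T, "bugs": B}
```

Note that L * E reduces to V, so the estimated bugs formula is equivalent to V / E0.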
(51) Two different calculation methods were employed in the study to compute the subset of software science quantities whose exact values cannot be obtained directly (via the defining formulas) from a program's source code. These calculation methods each rely upon a different estimation technique to obtain approximate values for these quantities. The 1ST CALCULATION METHOD relies upon the commonly accepted theoretical estimate of the program level quantity; the 2ND CALCULATION METHOD relies upon an internally applied empirical estimate of the language level quantity.
See Chapter V for further discussion of the software science metrics.
(99) Several instances of duplicate programming
aspects exist in the Table 1 listing. That is, some
logically unique aspects reappear with another label, for
reasons explained above. Listed below are the pairs of
duplicate programming aspects that were considered in this
study:

    FUNCTION CALLS\NONINTRINSIC  <=>  INVOCATIONS\FUNCTION\NONINTRINSIC
    FUNCTION CALLS\INTRINSIC     <=>  INVOCATIONS\FUNCTION\INTRINSIC

    STATEMENT TYPE COUNTS\(PROC)CALL\NONINTRINSIC  <=>  INVOCATIONS\PROCEDURE\NONINTRINSIC
    STATEMENT TYPE COUNTS\(PROC)CALL\INTRINSIC     <=>  INVOCATIONS\PROCEDURE\INTRINSIC

    AVERAGE INVOCATIONS PER (CALLING) SEGMENT\NONINTRINSIC  <=>  AVERAGE INVOCATIONS PER (CALLED) SEGMENT\NONINTRINSIC

    SOFTWARE SCIENCE QUANTITIES\INTELLIGENCE CONTENT  <=>  SOFTWARE SCIENCE QUANTITIES\1ST CALCULATION METHOD\POTENTIAL VOLUME

By definition, the data scores obtained for any pair of
duplicate aspects will be identical, and thus the same
statistical conclusions will be reached for both aspects.
CHAPTER V
This chapter provides an in-depth discussion of the
elaborative programming aspects examined in the study. The
material is presented in a tutorial fashion in order to
motivate their appeal as software metrics and to explain how
they might be interpreted. The reader who is acquainted
with one or more of these measures might consider skimming
the corresponding sections.
Program Changes
The program changes metric pertains to textual
revisions made to program source code during development,
from the time a program is first presented to the computer
system to the completion of the project. The metric's
definition is framed so that one program change approximates
one conceptual change to the program. The following rules
for identifying program changes are reproduced from
[Dunsmore 78, pp. 19-20]:
"The following text changes to a program represent one
program change:

1. One or more changes to a single statement.
   (Even multiple character changes to a statement
   represent mental activity with only a single abstract
   instruction.)

2. One or more statements inserted between existing
   statements.
   (The contiguous group of statements inserted probably
   corresponds to the concrete statements that represent
   a single abstract instruction.)

3. A change to a single statement followed by the
   insertion of new statements.
   (This instance probably represents a discovery that an
   existing statement is insufficient and that it must be
   altered and supplemented by additional statements to
   implement the abstract instruction involved.)

"However, the following text changes to a program are
not counted as program changes:

1. The deletion of one or more statements.
   (Deleted statements must usually be replaced by other
   statements elsewhere. The inserted statements are
   counted; counting deletions as well would give double
   weight to such a change. Occasionally statements are
   deleted but not replaced; these were probably being
   used for debugging purposes and their deletion
   requires little mental activity.)

2. The insertion of standard output statements or
   insertion of special compiler-provided debugging
   statements.
   (These are occasionally inserted in a wholesale
   fashion during debugging. When the problem is found,
   these are then all removed, and the necessary program
   change takes place.)

3. The insertion of blank lines, insertion of comments,
   revision of comments, and reformatting without
   alteration of existing statements.
   (These are all judged to be cosmetic in nature.)"
Program changes are counted algorithmically by comparing the
source code from each pair of consecutive compilations of a
module (or logical predecessor thereof) and applying the
identification rules. Thus the total number of program
changes is a measure of the amount of textual revision to
source code during (postdesign) system development.
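The comparison of consecutive compilations can be sketched algorithmically. The following Python sketch (Python rather than SIMPL-T, and not the actual analysis program used in the study) approximates the identification rules with a line-oriented diff: each contiguous replaced or inserted group of non-cosmetic lines counts as one program change, and deletions are ignored. The comment convention tested in `is_cosmetic` is an assumption for illustration.

```python
import difflib

def is_cosmetic(line):
    """Blank lines and comment-only lines are ignored, per the
    cosmetic-change exclusion rule (assumed comment convention)."""
    stripped = line.strip()
    return stripped == "" or stripped.startswith("/*")

def count_program_changes(old_source, new_source):
    """Approximate the program-changes metric between two consecutive
    compilations: one change per contiguous replaced or inserted group
    of non-cosmetic statements; deletions are not counted."""
    old = [l for l in old_source.splitlines() if not is_cosmetic(l)]
    new = [l for l in new_source.splitlines() if not is_cosmetic(l)]
    changes = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag in ("replace", "insert"):
            changes += 1   # one conceptual change per contiguous group
        # tag == "delete": ignored, matching the deletion exclusion rule
    return changes
```

A total over a module is then the sum of this count across every consecutive pair of compilations.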
The program changes metric may be interpreted as a
programming complexity measure, because textual revisions
are usually necessitated by errors encountered while
building, testing, and debugging software. Independent
research [Dunsmore & Gannon 77] has demonstrated a high
(rank order) correlation between total program changes (as
counted automatically according to a specific algorithm) and
total error occurrences (as tabulated manually from
exhaustive scrutiny of source code and test results) during
software implementation in the SIMPL-T programming language.
Thus empirical evidence justifies consideration of the
program changes metric as a direct measure of the relative
number of programming errors encountered outside of design
work. It is reasonable to assume that each textual revision
merits some expenditure of the programmer's effort (e.g.,
planning the revision, editing source code on-line). In
that sense, this metric may also be considered an indirect
measure of the level of human effort devoted to
implementation.
Cyclomatic Complexity

Control-flow complexity may be measured in terms of
cyclomatic complexity [McCabe 76], a graph-theoretic metric
that is independent of physical size (i.e., insertion or
deletion of functional statements leaves the measure
unchanged) and dependent only on the decision structure of a
program. The cyclomatic number v(G) of a graph G having
n nodes, e edges, and p connected components is
defined as

    v(G) = e - n + p .

In a strongly connected graph, the cyclomatic number is
equal to the minimum number of basis paths from which all
other paths may be constructed as linear combinations in an
edge-algebraic fashion (see [McCabe 76] for details). By
modeling the control flow of a program as a graph in the
traditional manner, the cyclomatic complexity measure is
defined to be the cyclomatic number of the graph
corresponding to the program's flow of control.
For a structured language like SIMPL-T, it is not
necessary to construct a control-flow graph in order to
measure a program's cyclomatic complexity. The measure can
be computed directly from the source code simply by counting
the number of predicates (i.e., Boolean expressions
governing control flow), since the predicates of the program
correspond exactly to the binary-branching decision points
of the control-flow graph. It is easily shown, using a
lemma proven in [Mills 72], that for a segment S with π
predicates the segment's cyclomatic complexity is

    v(S) = π + 1

and for a program P with Π predicates strewn among s
segments the program's cyclomatic complexity is

    v(P) = Π + s .
This measure originated as an absolute count of the
maximum number of linearly independent execution paths
through a segment, in the graph-theoretic edge-algebraic
sense alluded to above. Since each of these paths merits
individual testing, the measure was proposed to serve as a
quantitative indicator of the difficulty of testing a given
segment to a certain degree of thoroughness. Testability is
clearly an issue closely related to software complexity in
general, and a program's cyclomatic complexity may be viewed
as one quantitative measure of its control-structure
complexity.
Definitional Variations
Several variations of the basic cyclomatic complexity
measure were considered, because there are at least two
definitional issues for which intuitively motivated
alternatives lead to meaningful variations.
One of these issues is the weighting given to instances
of case statement constructs. The original definition of
cyclomatic complexity views a case statement as the
semantically equivalent series of nested ifthenelse
statements: each case statement contributes n units of
cyclomatic complexity, where n is the number of individual
"cases" involved. It can be argued, however, that a case
statement deserves a smaller contribution to cyclomatic
complexity since its inherent uniformity and readability
have a moderating effect on programmer-perceived complexity
(relative to an explicit series of nested ifthenelse
statements). One reasonable alternative views each case
statement as contributing ⌊log2(n)⌋* units of cyclomatic
complexity, where n is the number of individual "cases"
involved. This logarithmic weighting is appropriate since a
case statement's moderating effect seems to increase with
the number of "cases" involved.
The other issue is the manner of counting predicates.
The original definition counts simple (sub)predicates
individually, so that the compound predicate

    (I < J) and ((A(I) = A(J)) or (not SORTED))

would contribute three units of cyclomatic complexity, for
example. An alternative definition considers each complete
predicate as an indivisible part of a program, contributing
one unit of cyclomatic complexity. The motivation is that
the complete predicate represents a single abstract
condition governing the flow of control. Note that this
issue is the basis for a proposed extension [Myers 77] to
the original cyclomatic complexity measure. This issue also
affects the way individual "cases" of a case statement
construct are identified and counted. The original
definition counts each case label separately, since multiple
case labels on the same case branch are semantically
equivalent to simple predicates joined by ors to form the
Boolean expression governing the case branch. The
alternative definition counts only the case branches
themselves, regardless of case label multiplicity. In
parallel with the motivation given above, multiple case
labels on a case branch represent a single abstract
condition governing that branch (e.g., the set of case
labels 0, 1, ..., 9 may be abstracted to digit).
This study examined the four variations of cyclomatic
complexity defined as follows for the SIMPL-T programming
language:

(* The notation ⌊x⌋ signifies the greatest integer less
than or equal to x.)
SIMPPRED-NCASE -- Simple predicates contribute 1 unit;
case statements contribute 1 unit for each case
label.

SIMPPRED-LOGCASE -- Simple predicates contribute 1
unit; case statements contribute ⌊log2(n)⌋
units, where n is the number of case labels.

COMPPRED-NCASE -- Compound predicates contribute 1
unit; case statements contribute 1 unit for each
case branch; multiple case labels on the same case
branch are disregarded.

COMPPRED-LOGCASE -- Compound predicates contribute 1
unit; case statements contribute ⌊log2(n)⌋
units, where n is the number of case branches;
multiple case labels on the same case branch are
disregarded.
Note that the SIMPPRED-NCASE variation of cyclomatic
complexity is McCabe's original measure.
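The four variations differ only in how the predicate and case-statement contributions are tallied before the per-segment unit is added. The following Python sketch (illustrative names; not from the study's tooling) computes any of the four for a single segment, given the segment's compound predicates as a list of simple-predicate counts and its case statements as lists of per-branch label counts.

```python
import math

def cyclomatic_variant(compound_predicates, case_statements,
                       compound=False, log_case=False):
    """Compute one of the four cyclomatic-complexity variations.

    compound_predicates : one entry per compound predicate, giving its
                          number of simple (sub)predicates
    case_statements     : one inner list per case statement, giving the
                          number of case labels on each case branch
    compound=False, log_case=False reproduces SIMPPRED-NCASE,
    i.e., McCabe's original measure.
    """
    if compound:
        pred = len(compound_predicates)     # 1 unit per compound predicate
    else:
        pred = sum(compound_predicates)     # 1 unit per simple predicate
    case = 0
    for branches in case_statements:
        # branches vs. labels, per the COMPPRED / SIMPPRED distinction
        n = len(branches) if compound else sum(branches)
        case += int(math.log2(n)) if log_case else n
    return pred + case + 1                  # v(S) = contributions + 1
```

For example, a segment with one compound predicate of three simple predicates and one case statement of three branches carrying four labels scores 8, 6, 5, and 3 under SIMPPRED-NCASE, SIMPPRED-LOGCASE, COMPPRED-NCASE, and COMPPRED-LOGCASE respectively.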
There are several ways to apply the cyclomatic
complexity measure (or variations thereof) to an entire
program in order to obtain a metric for its overall control-
flow complexity. First of all, the metric is defined
directly for a program composed of individual segments: a
program's total cyclomatic complexity is simply the sum of
its segments' cyclomatic complexities. However, this total
cyclomatic complexity measure is not particularly useful as
a basis for comparing entire programs because it is, in a
certain sense, insensitive to the program's modularization.
As a metric, the total cyclomatic complexity of a program is
(by definition) a linear function in two variables, the
number of predicates and the number of segments. A subtle
trade-off relationship exists between these two variables,
such that substantial fluctuation in the metric's value can
arise from simpleminded changes to a program's
modularization alone.
A better comparison of entire programs is afforded by
focusing attention upon the cyclomatic complexity values of
individual segments and upon instances of segments with high
values of the metric. McCabe originally proposed the number
10 as a reasonable threshold value for a segment's
cyclomatic complexity. Segments exceeding this threshold
need to be recoded or decomposed into smaller segments in
order to attain an acceptable level of testability and
control-flow complexity. Hence, a second way to apply the
cyclomatic complexity measure to an entire program is to
count the number of segments whose cyclomatic complexity
value exceeds this threshold. In this case, the basis for
comparing entire programs is the frequency of segments with
unacceptably high cyclomatic complexity.
Finally, it would be desirable to compare the full
spectrum of cyclomatic complexity values for the individual
segments of one program against that of another program,
since considerable diversity often exists. Programs
typically contain several small segments with very low
cyclomatic complexity values (e.g., a function to compute
the average of a vector) and a few large segments with high
cyclomatic complexity values. Being easily understood and
tested, the small segments are relatively inconsequential,
while the large segments contain the substance of the
program and contribute most of the consequential control-
flow complexity. Ideally, one wishes to disregard the
"smaller" cyclomatic complexity values and summarize the
magnitude and frequency of the "larger" cyclomatic
complexity values via a single quantitative indicator, but
do so in a flexible, normalized fashion, where "smaller" and
"larger" are determined relative to one another within each
program.
This ideal can be approximated by means of the
quantiles of the empirical distribution of cyclomatic
complexity values across the segments comprising the program
(see Figure 1). Quantiles are a standard tool from
descriptive statistics [Conover 71, pp. 31-32, pp. 72-73],
commonly used to summarize the "shape" and "position" of a
distribution function, especially its upper tail region.
Both the quantile point value (i.e., the largest integer x
such that the fraction of cyclomatic complexity values which
are less than x is less than or equal to some fixed
fraction) and the quantile tail average (i.e., the average
of cyclomatic complexity values greater than or equal to the
quantile point value) are normalized ways to quantify just
how high the cyclomatic complexity is for the relatively
nontrivial segments of a program. Several different
quantiles were examined: the 0.5 quantile is closely related
to the median of the distribution, and the 0.7, 0.8, and 0.9
quantiles provide a series of increasingly smaller tails of
the distribution. Thus, the basis for this third comparison
of entire programs is a series of quantitative descriptors
of the empirical distribution of cyclomatic complexity
values within a program.
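The two descriptors just defined can be computed directly from the list of per-segment cyclomatic complexity values. The following Python sketch implements the definitions as stated above; the function names are illustrative.

```python
def quantile_point(values, q):
    """Largest integer x such that the fraction of values strictly
    below x is less than or equal to the fixed fraction q."""
    n = len(values)
    x = min(values)
    # Advance x while the next integer still satisfies the condition.
    while x < max(values) and sum(v < x + 1 for v in values) / n <= q:
        x += 1
    return x

def quantile_tail_average(values, q):
    """Average of the values greater than or equal to the quantile
    point value (the mean of the upper tail)."""
    x = quantile_point(values, q)
    tail = [v for v in values if v >= x]
    return sum(tail) / len(tail)
```

For segment complexities 1 through 10, the 0.8 quantile point value is 9 and the corresponding tail average is 9.5; the 0.5 quantile point value is 6 with a tail average of 8.0.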
Data Bindings

The data bindings metrics [Stevens, Myers & Constantine
74; Basili & Turner 75; Turner 76] originated as a way to
quantify a certain kind of connectivity (i.e., directed
communication between segments via global variables) within
a program. Their motivation is based on the intuitive
FIGURE 1

Figure 1. Frequency Distribution of Cyclomatic Complexity

Both the absolute and the relative-cumulative frequency
distributions of cyclomatic complexity values from 47
segments comprising an entire program are plotted. The tail
region associated with the 0.8 quantile is shaded on each
plot. [Plots not reproduced: absolute frequency and relative
cumulative frequency versus cyclomatic complexity per
segment, with the 0.5, 0.7, 0.8, and 0.9 quantiles marked.]
principle that the logical complexity of a composite system
is a function of the multiplicity of connections among its
component parts (cf. [Simon 69]).

A segment-global-segment data binding (p,r,q) is an
occurrence of the following arrangement in a program: a
segment p modifies a global variable r that is also
accessed by a segment q, with segment p different from
segment q. The existence of a data binding (p,r,q)
suggests that the behavior of segment q is probably
dependent on the performance of segment p through the data
variable r, whose value is set by p and fetched by q.
The binding (p,r,q) is different from the binding
(q,r,p), which may also exist; occurrences such as
(p,r,p) are not counted as data bindings. Thus each data
binding represents a unique communication path between a
pair of segments via a global variable. As a metric, the
total number of segment-global-segment data bindings
reflects the degree of that kind of connectivity within a
program.
Data bindings may be counted in three different ways:
actual, possible, and relative percentage. (Bear in mind
that, since these measures are determined statically from
the source code, the terms 'actual' and 'possible' refer to
a program's syntactic form only.)

First, the actual count is the absolute number of data
bindings (p,r,q) actually coded in the program: segment
p contains a statement modifying global variable r, and
segment q contains a statement accessing r. This count
of actual data bindings represents the degree of realized
connectivity in the program. Second, the possible count is
the absolute number of data bindings (p,r,q) that could
possibly be allowed under the program's structure of segment
definitions and global variable declarations: the scope of
global variable r merely contains both segment p and
segment q, so that segment p could potentially modify
r and segment q could potentially access r. This
count of possible data bindings represents the degree of
potential connectivity, in a "worst-case" sense. It is
computed as the sum of terms s * (s - 1) for each global
variable, where s is the number of segments in that
global's scope; thus, it is heavily influenced (numerically
speaking) by the sheer number of segments in a program.
Third, the relative percentage is a way of normalizing the
absolute numbers of data bindings, since it is simply the
quotient (expressed as a percentage) of actual data bindings
divided by possible data bindings. It represents the degree
of realized connectivity relative to potential connectivity.
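Given static set/fetch and scope information per global variable, the three counts follow directly from the definitions above. The following Python sketch assumes that information has already been extracted from the source; the dictionary layout and function name are illustrative.

```python
def data_bindings(sets, fetches, scopes):
    """Count actual and possible segment-global-segment data bindings
    and their relative percentage.

    sets    : {global r: set of segments that modify r}
    fetches : {global r: set of segments that access r}
    scopes  : {global r: set of segments within r's scope}
    """
    # Actual: ordered pairs (p, r, q) with p setting r, q fetching r,
    # and p different from q (self-bindings are excluded).
    actual = sum(1 for r in sets
                   for p in sets[r]
                   for q in fetches.get(r, ())
                   if p != q)
    # Possible: sum of s * (s - 1) over each global's scope size s.
    possible = sum(len(s) * (len(s) - 1) for s in scopes.values())
    relative = 100.0 * actual / possible if possible else 0.0
    return actual, possible, relative
```

For a single global set by one segment and fetched by two (itself included) within a three-segment scope, the counts are 1 actual and 6 possible, for a relative percentage of about 16.7.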
Actual data bindings may also be subcharacterized on
the basis of the invocation relationship between the two
segments. A data binding (p,r,q) is subfunctional if
either of the two segments p or q can invoke the other,
whether directly or indirectly (via a chain of intermediate
invocations involving other segments). In this situation,
the functioning of the one segment may be viewed as
contributing to the overall functioning of the other
segment. A data binding (p,r,q) is independent if neither
of the two segments p or q can invoke the other, whether
directly or indirectly. The transitive closure of the call
graph among the segments of a program is employed to make
this distinction between subfunctional and independent data
bindings.
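The subfunctional/independent distinction reduces to a reachability test on the transitive closure of the call graph, which can be sketched as follows (illustrative Python; a simple iterative closure rather than the study's actual procedure):

```python
def transitive_closure(calls):
    """Transitive closure of a call graph {caller: set of callees}:
    reach[s] is every segment invocable from s, directly or indirectly."""
    reach = {s: set(t) for s, t in calls.items()}
    changed = True
    while changed:
        changed = False
        for s in reach:
            for t in list(reach[s]):
                extra = reach.get(t, set()) - reach[s]
                if extra:                 # s reaches t, t reaches extra
                    reach[s] |= extra
                    changed = True
    return reach

def is_subfunctional(binding, reach):
    """A data binding (p, r, q) is subfunctional if either segment can
    invoke the other; otherwise it is independent."""
    p, _, q = binding
    return q in reach.get(p, set()) or p in reach.get(q, set())
```

For example, with A calling B and B calling C, a binding between A and C is subfunctional; with M calling both P and Q, a binding between P and Q is independent.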
In some sense, all three measures dealing with segment-
global-segment data bindings--actual, possible, and relative
percentage--reflect quantifiable characteristics of a
program's "data modularization" (i.e., the static
organization of data definitions and references within
segments and modules).
In particular, the possible data bindings metric
reflects the general degree of encapsulation enforced by the
implementation language for global variables. One can
imagine two extremes of encapsulation for the same
collection of global variables and segments. On the one
hand, the program could be written (in the implementation
language SIMPL-T) as a single module containing all the
segments, with each global potentially accessible from every
segment. This modularization would maximize (explosively
so, due to the squaring of the number of segments) the
number of possible data bindings. On the other hand, the
program could be written (in the implementation language
SIMPL-T) as several modules, one for each segment, with
appropriate ENTRY and EXTernal declarations to provide each
segment with potential access to exactly those globals it
actually uses. This modularization would minimize the
number of possible data bindings (to precisely the number of
actual data bindings).
Moreover, the data bindings relative percentage metric
also reflects the general degree of (operational)
"globality" for the global variables declared in a program,
i.e., the extent to which globals are actually modified
(set) and accessed (fetched) by those pairs of segments that
could possibly do so. One can imagine two different
situations in which the relative percentage of data bindings
for a small set of otherwise equivalent global variables
(say, an array and an integer) would be extremely high and
extremely low, respectively. On the one hand, this global
array and global integer could be serving as a stack, and
nearly every segment that refers to these globals could be
both popping the stack to examine its contents and pushing
new items onto it. Here the two global variables are quite
central to the overall operation of that collection of
segments; their data binding relative percentage would be
close to one. On the other hand, this global array and
global integer could be serving as a buffer for a varying-
length vector that is initially produced (set) by one
segment and nondestructively consumed (fetched only) by
several other segments. Here the two global variables are
rather incidental to the overall operation of that
collection of segments, serving merely as a convenient
medium for disseminating information (which could also have
been achieved via parameter passing); their data binding
relative percentage would be close to zero.
Software Science Quantities

The software science quantities are a set of metrics
based upon the tenets of software science [Halstead 77], as
pioneered by Halstead and his colleagues. Billed as

"... a branch of experimental and theoretical science
dealing with the human preparation of computer programs
and other types of written material ...,"

software science is concerned with measurable attributes of
algorithms or programs and with mathematical relationships
among those attributes. Software science is
characteristically actuarial in nature: its measures and
relationships may be inaccurate when applied to individual
programs, but they become surprisingly more accurate when
applied to large numbers of programs, such as are found in
large software development projects.
(The blocked quotations throughout this section are taken
from [Halstead 77].)

The software science quantities are all defined in
terms of certain frequencies of so-called "operators" and
"operands" appearing within an algorithm's functional
specification or a program's source code implementation.
Some of the quantities (e.g., vocabulary, length, volume)
are purely descriptive and provide the building-blocks of
the theory. A few (e.g., estimated length) are predictive
of other descriptive quantities within the theory. Several
(e.g., program level, language level, effort) claim to be
quantifications of fundamentally qualitative and intuitive
concepts. Still other quantities (e.g., estimated time,
estimated bugs) purport to measure--under ideal conditions--
externally observable and quantifiable programming
phenomena.
Identification Criteria
The criteria for identifying "operators" and "operands"
(and their uniqueness) are important since they are the
foundation for measuring the software science quantities.
However, this identification is an area not clearly
addressed by the theory. Except for the Fortran programming
language, in which most of the pioneering work was done,
explicit standards or guidelines for identification do not
exist. For another language, a researcher can only attempt
to adapt and extend the principles that he personally judges
to be behind the Fortran work. The following "operator/
operand" identification criteria were designed for the
SIMPL-T programming language:
1. In general, only the portion of source code pertaining
to executable statements (after expansion of all
DEFINE-macros) is considered.

2. Constants and data variable identifiers are natural
"operands." Data structures (e.g., arrays, files) are
considered single objects and not decomposed into
components.

3. The input stream file and the output stream file are
counted as "operands," with implicit occurrences
recognized for each operation on these files.

4. The normal prefix unary and infix binary operator
symbols (i.e., for arithmetic, logical, and character-
string operations) are natural "operators."

5. The intrinsic procedures (e.g., READ, WRITE, REWIND),
type-coercion functions, and input-output operation
keywords (e.g., EJECT, SKIP) are "operators."

6. Segment invocations (i.e., procedure calls and function
references) are "operators."

7. Different types of statements or constructs are
considered individual "operators," as follows:

    (assignment)
    IF...THEN...END
    IF...THEN...ELSE...END
    CASE...OF...END
    CASE...OF...ELSE...END
    WHILE...DO...END
    EXIT
    RETURN

8. Other delimiter patterns are considered "operators," as
follows: the case-label designation "operator"; the
partword and substring "operators"; the "(..)"
delimiters serving as subexpression, array subscript,
actual argument list, and function return value
"operators"; and the list item separation "operator."

9. Finally, implicit statement list brackets (associated
with pairs of keywords such as THEN...ELSE and
ELSE...END) are considered "operators," as are implicit
statement separators between consecutive statements of
the same statement list.
(The quotation marks flagging "operator" and "operand" as
technical terms are suppressed throughout the remainder of
this section for readability.)
Basic Parameters

The five basic parameters of software science are
determined in accordance with the criteria established for
identifying operators and operands.
The theory defines four basic parameters pertaining to
a program's implementation: the total operator count N1,
the total operand count N2, the unique operator count
n1, and the unique operand count n2. The total counts
include all occurrences of operators/operands, while the
unique counts disregard multiple occurrences of the same
operator/operand. Although the issue of synonymy for
operators has already been dealt with in the identification
criteria, issues of synonymy for operands still remain. In
particular, formal parameters are considered to be
synonymous with corresponding actual arguments; therefore
occurrences of formal parameter identifiers contribute to
the total operand count but not to the unique operand count,
with respect to the entire program. This rule is not,
however, applied in the case of formal parameters passed by
value and modified by the segment; because these are
actually treated as special initialized-upon-entry local
variables in the implementation language SIMPL-T, they are
not considered to be synonymous operands with respect to the
entire program.
The theory defines four additional basic parameters,
analogous to those described above, but pertaining to an
algorithm's or program's "shortest possible or most succinct
form" (i.e., its "one-liner" functional specification,
conceived as an assignment statement or procedure call
involving a single built-in routine). These are the total
potential operator count N1*, the total potential operand
count N2*, the unique potential operator count n1*, and
the unique potential operand count n2*. (The modifier
"potential" and the superscripted asterisk distinguish
quantities pertaining to the functional specification from
analogous quantities pertaining to the implementation.) The
theory assumes, however, that the total potential operator/
operand counts must always equal the unique potential
operator/operand counts, because the most succinct
specification would contain no redundant occurrences of
operators/operands. An assumption is also made that the
unique potential operator count is always equal to the
constant 2, because

"... the minimum possible number of operators ... must
consist of one distinct operator for the name of the
function or procedure and another to serve as an
assignment or grouping symbol."

Thus, all software science quantities pertaining to a
program's specification are completely determined
(numerically speaking) by a single parameter, the unique
potential operand count n2*. It is the fifth basic
parameter of software science theory and is conceptually
equivalent to the number of "logically distinct input/output
parameters" for an algorithm or program. This count holds
considerable significance in both the theory and its
application, but unfortunately it is rather intractable for
most nontrivial programs (i.e., those whose specifications
are not easily stated as "one-liners" without gross
oversimplification). For example, some logically distinct
input parameters may appear as special constants embedded
within the code, and the number of logically distinct output
parameters represented within a printed report is often
unclear.
An alternative and more tractable conceptualization
defines this unique potential operand count simply as the
number of distinct operands busy-on-entry (i.e., initially
containing a value that is utilized or accessed by the
algorithm or program) plus the number of distinct operands
busy-on-exit (i.e., finally containing a value that was
furnished by the algorithm or program to be utilized or
accessed subsequently). For an individual segment, n2*
may be estimated from the implementation by counting all of
the global variables that are referenced, each of the formal
parameters, one for both the input stream file and the
output stream file (if they are read or written), and one
for the function return value (if the segment is a
function). It should be noted that this estimate is a lower
bound since it disregards the possibility that a formal
parameter which is passed by reference should be counted
twice because it is both busy-on-entry and busy-on-exit.
For an entire program, n2* may be estimated from the
implementation by counting one for both the input stream
file and the output stream file if the program reads or
writes them, plus one for the set of control bits or option
letters that might be used to regulate the program's
execution.
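The per-segment counting rule above can be stated as a small function. The following Python sketch is illustrative only (the parameter names are assumptions), and like the rule itself it yields a lower bound:

```python
def estimate_n2_star(globals_referenced, formal_parameters,
                     reads_input, writes_output, is_function):
    """Lower-bound estimate of the unique potential operand count n2*
    for an individual segment: referenced globals, formal parameters,
    one each for the input and output stream files if used, and one
    for the function return value if the segment is a function."""
    count = len(globals_referenced) + len(formal_parameters)
    if reads_input:
        count += 1      # the input stream file
    if writes_output:
        count += 1      # the output stream file
    if is_function:
        count += 1      # the function return value
    return count
```

A function segment referencing two globals and one formal parameter that reads the input stream would thus be estimated at n2* = 5.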
Thus, for the programs examined in this study, the
estimated number of unique potential operands is always
either 2 or 3, depending on whether

    output_listing = compile_and_execute( input_deck )

or

    output_listing = compile_and_execute( input_deck, option_letters )

states their functional specification. It is clear that
this kind of estimation of n2* from the implementation is
considerably more accurate for individual segments than for
entire programs; this fact is partial motivation for the
particular estimation technique employed in the second
method of calculation discussed below.
Derived Properties

The derived properties of software science are defined
in terms of the five basic parameters.
The vocabulary n is defined as

    n = n1 + n2

and represents the cardinality of the set of logically
distinct "symbols" used to implement the program. The
length N is defined as

    N = N1 + N2

and represents the abstract size of the program's
implementation as measured in units of logically distinct
"symbols." This property is closely associated with the
number of syntactic tokens in the source code of a program;
it can be considered a refinement of the rudimentary TOKENS
aspect. The estimated length N^ is defined as

    N^ = (n1 * log2(n1)) + (n2 * log2(n2))

reflecting one of the fundamental conjectures of the theory;
namely, that the observed length of a program's
implementation is a function solely of the number of unique
operators/operands involved.
Considerable empirical evidence has supported the
vatidity of this "length prediction equation" on tht er
fi.e., major software studies have reported correlation
coefficients of between 3.95 and 0.99 for the relationship
het.een N and q [Fitzsimmons & Love 78)). HOwever, its
accuracy for tny livern E rr may be low; the theory
attrioutes this to the presence of so-called "impurities"
tnl.icating a Lack of polish in a program. These impurities
include instances of unnecessary redundancy and needless
constructions, such as inverse operations that cancel each
other, common subexpressions, or unreachable statements.
This has led some researchers to view the discrepancy
between N and N̂ as a possible software quality measure.
For these reasons, it was desirable to examine the
%DIFFERENCE(N,N̂) aspect, calculated as

    (|N̂ - N|) / N ,

which normalizes the degree of discrepancy.
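These length definitions can be illustrated with a small Python sketch; the counts below are hypothetical for illustration, not taken from the study's data:

```python
from math import log2

# Hypothetical basic parameters (illustrative values only):
eta1, eta2 = 20, 30   # unique operators, unique operands
N1, N2 = 120, 80      # total occurrences of operators and of operands

N = N1 + N2                                     # observed length
N_hat = eta1 * log2(eta1) + eta2 * log2(eta2)   # estimated length N-hat
pct_diff = abs(N_hat - N) / N                   # %DIFFERENCE(N, N-hat)
```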
The volume V is defined as

    V = N * log2(n)

and represents the abstract size of the program's
implementation as measured in units of information-theoretic
bits. Specifically, it is the minimum number of bits
required to encode the implementation as a sequence of
fixed-width binary strings (since it is the product of the
total number of "symbols" and the minimum bandwidth required
to distinguish each of the unique "symbols"). The potential
volume V* is defined analogously as

    V* = N* * log2(n*)
       = (N1* + N2*) * log2(n1* + n2*)
       = (2 + n2*) * log2(2 + n2*) .

The potential volume of any algorithm or program is
theoretically independent of any language in which it might
be implemented; thus,

    "provided that n2* is evaluated as the number of
    conceptually unique operands involved, V* appears to
    be a most useful measure of an algorithm's content."

The program level L is defined as

    L = V* / V

and, as a ratio of volumes, can only take on values between
zero and one; it quantifies the intuitive concept of "level
of abstraction" for an implementation. Since the potential
volume of any given algorithm is constant, the formula
indicates an inverse relationship (as desired intuitively)
between level of abstraction (measured by L ) and size
(measured by V ). The theory also attaches meaning to the
reciprocal of program level, defining difficulty D as

    D = 1 / L ,

which may alternatively be viewed as the amount of
redundancy within an implementation.
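A minimal sketch of the volume, potential volume, level, and difficulty formulas, again with hypothetical counts and an assumed n2* of 3:

```python
from math import log2

# Hypothetical counts (illustrative only); eta2_star assumed known from the spec:
eta1, eta2 = 20, 30
N1, N2 = 120, 80
eta2_star = 3          # unique potential operands

V = (N1 + N2) * log2(eta1 + eta2)               # volume, in bits
V_star = (2 + eta2_star) * log2(2 + eta2_star)  # potential volume
L = V_star / V                                  # program level, 0 < L <= 1
D = 1 / L                                       # difficulty, the reciprocal
```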
Unfortunately, this definition of program level is not
particularly useful since it is difficult (as discussed
above) to determine exactly a program's unique potential
operand count n2* or its potential volume V* . Desiring
to be able to measure program level even if these quantities
were unavailable, Halstead conjectured that an estimated
program level L̂ , defined as

    L̂ = (n1* / n1) * (n2 / N2)
      = (2 / n1) * (n2 / N2) ,
could be measured directly from an implementation alone,
without a specification or the unique potential operand
count n2* . With only limited evidence supporting the
validity of this estimate, the theory makes the qualified
claim that

    "... for many purposes L and L̂ may be used
    interchangeably to specify the level at which a program
    has been implemented, at least for smaller programs."

Although most software science studies (e.g., [Elshoff 76;
Love & Bowman 76; Curtis et al. 79]) have had no choice but
to rely upon this "program level prediction equation" to
calculate the derived properties, the program level
estimator L̂ has been criticized publicly [Oldehoeft 77]
for unsatisfactory behavior under certain conditions. The
questionable validity of this prediction equation is the
principal motivation for considering the two alternative
methods of calculation discussed below.
In any event, the theory employs L̂ internally,
defining intelligence content I as

    I = L̂ * V

and proposing it as a measure of "how much is said" in an
algorithm or program (i.e., its information content). This
intelligence content quantity represents the amount of
detail expressed in an implementation but weighted by its
level of expression. By definition, I is determinable
from an implementation alone. If there is a strong
relationship between L and L̂ , intelligence content I
would be approximately equal to potential volume V* . In
fact, Halstead originally demonstrated that the removal of
program "impurities" (as described above) consistently
improved the numerical agreement between V* and I .
Normalizing the degree of discrepancy between these two
quantities, the %DIFFERENCE(V*,I) aspect, calculated as

    (|I - V*|) / V* ,

may be interpreted as another possible software quality
measure, according to the theory.
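The estimator and intelligence content might be computed as follows (hypothetical counts again; note that L̂ uses only quantities measurable from the implementation itself):

```python
from math import log2

eta1, eta2 = 20, 30   # hypothetical unique operator/operand counts
N1, N2 = 120, 80      # hypothetical total occurrences

V = (N1 + N2) * log2(eta1 + eta2)
L_hat = (2 / eta1) * (eta2 / N2)   # estimated program level: no eta2* required
I = L_hat * V                      # intelligence content
```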
The language level λ is defined as

    λ = L * V*
      = L² * V
      = (V*)² / V

and claims to quantify the popular intuitive concept known
by the same name. The theory suggests that λ should
remain relatively constant for any particular implementation
language while the implemented algorithm itself is allowed
to vary. Empirical evidence from a carefully constructed
set of programs, each implemented in several common
programming languages, indicated that the ordering of mean
values for λ (which ranged from about 0.8 for assembly
language to about 1.6 for PL/1) concurred exactly with the
generally accepted intuitive ordering of the languages
themselves.
The effort E is defined as

    E = V * D
      = V / L
      = V² / V* ,

but this quantity does not purport to measure development
effort in the usual sense. Rather, the theory originally
restricted

    "... the concept of programming effort to be the mental
    activity required to reduce a preconceived algorithm to
    an actual implementation in a language in which the
    implementor (writer) is fluent ..."

According to further elaboration of the theory [Gordon 79],
this property represents the effort required (under ideal
conditions) to comprehend an implementation rather than to
produce it; E may thus be interpreted as a measure of
program clarity. The effort property is considered to have
the dimension either of bits or of "elementary mental
discriminations." Borrowing from research in psychology,
the theory converts this amount of mental effort into an
externally observable duration of time, defining the
estimated time T̂ as

    T̂ = E / S

where S is the so-called Stroud rate, i.e., the number of
"elementary mental discriminations" made by a programmer
(comprehender) per second. Psychologists had shown that
5 <= S <= 20 , and Halstead determined empirically that S = 18
was a reasonable value.
Finally, the theory purports to quantify one other
externally observable property, namely, the total number of
"delivered" bugs in an implementation. The estimated bugs
property B̂ is defined as

    B̂ = L * E / E0
       = V / E0

where E0 is defined as "the mean number of elementary
mental discriminations between potential errors in
programming." The theory argues that E0 = 3000 is a
reasonable value. This number of bugs may be interpreted as
either the expected number of errors remaining in a
delivered program or the number of errors observed during
program testing; both interpretations have received some
empirical support.
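The remaining derived properties follow the same pattern; a sketch with hypothetical counts, using the level estimator in place of L and the S and E0 values cited above:

```python
from math import log2

eta1, eta2, N1, N2 = 20, 30, 120, 80   # hypothetical counts
S = 18                                 # Stroud rate (discriminations/second)
E0 = 3000                              # mean discriminations between errors

V = (N1 + N2) * log2(eta1 + eta2)
L = (2 / eta1) * (eta2 / N2)    # level estimator standing in for L
lam = L * L * V                 # language level, lambda = L^2 * V
E = V / L                       # effort
T_hat = E / S                   # estimated time, in seconds
B_hat = L * E / E0              # estimated bugs, equivalently V / E0
```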
Because the validity of the "program level prediction
equation" is suspect (as discussed above), this study
employed two different methods for calculating software
science quantities: one relies directly upon this estimate,
the other does not.
Both methods calculate exact values for some derived
properties via their defining formulas directly from the
implementation's basic parameters. The methods are
therefore identical with regard to the following measured
aspects: VOCABULARY, LENGTH, ESTIMATED LENGTH,
%DIFFERENCE(N,N̂), VOLUME, INTELLIGENCE CONTENT, and
ESTIMATED BUGS. But, because reasonable values for unique
potential operand counts are generally unavailable (from
either the specification or the implementation) for programs
of the size considered in this study, both methods of
calculation can only approximate the remaining derived
properties by relying upon various estimates. Due to the
intrinsically high degree of interrelationship among the
software science quantities, it generally suffices to
approximate just one additional derived property via some
estimation technique; the remaining derived properties can
then all be approximated in turn via their defining formulas
from the known exact values plus the estimated value. The
methods therefore differ in their choice of quantity to be
estimated, in their estimation technique, and with regard to
the following measured aspects: PROGRAM LEVEL, DIFFICULTY,
POTENTIAL VOLUME, %DIFFERENCE(V*,I), LANGUAGE LEVEL, EFFORT,
and ESTIMATED TIME.
The first method relies upon a "theoretical" estimation
of the program level quantity. The estimated program level
L̂ is calculated directly from the program's implementation
via its defining formula and then substituted as an
approximation for the (true) program level L . Under this
method the exact value for intelligence content I is, by
definition, always equal to the approximate value for
potential volume V* ; hence it is pointless to examine the
%DIFFERENCE(V*,I) aspect under this method of calculation.
The second method relies upon an "empirical" estimation
of the language level quantity. A program's language level
is approximated as the mean value of estimates for the
language levels of the segments comprising the program. An
estimate of each segment's language level can be calculated
directly from the implementation (via the defining formulas
for λ , L , and V ), using an estimate of the segment's
unique potential operand count n2* in addition to the
exact values of the segment's other basic parameters N1 ,
N2 , n1 , and n2 . The unique potential operand estimate
is obtained by counting operands that are busy-on-entry or
busy-on-exit (as discussed above); this technique seems
quite reasonable when applied to segments, most of which are
small enough. Use of the mean estimate for λ across the
individual segments of an entire program was inspired by the
experimental treatment of language level given in Halstead's
book. Under this method of calculation, all of the derived
properties defined above are distinct and nonredundantly
calculated.
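As a sketch, the first method might look like this in Python (a hypothetical helper, not the study's own program; the counts are supplied by the caller):

```python
from math import log2

def method1(eta1, eta2, N1, N2, S=18):
    """'Theoretical' estimation: L-hat is substituted for the true program
    level L, and the remaining derived properties follow from the
    defining formulas."""
    N = N1 + N2
    V = N * log2(eta1 + eta2)       # exact volume
    L = (2 / eta1) * (eta2 / N2)    # L-hat approximates the program level
    return {
        "V": V,
        "L": L,
        "D": 1 / L,                 # difficulty
        "V*": L * V,                # approximate potential volume (= I here)
        "lambda": L * L * V,        # language level
        "E": V / L,                 # effort
        "T": V / L / S,             # estimated time
    }
```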
CHAPTER VI
This chapter describes the steps taken to guide the
planning, execution, and analysis of the experimental
investigation reported in this dissertation. The
investigative methodology outlined here was devised as a
vehicle for research in software engineering. It relies
upon established principles and techniques for scientific
research: empirical study, controlled experimentation, and
statistical analysis.
The central feature of the investigative methodology is
a "differentiation-among-groups-by-aspects" paradigm. The
research goal is to answer the question, what differences
exist among the treatment groups (which represent different
programming environments) as indicated by differences on
measured aspects (which reflect quantitative characteristics
of software phenomena)? This use of "difference
discrimination" as the analytical technique dictates a
statistical model of homogeneity hypothesis testing that
influences nearly every element of the investigative
methodology.
Other analytical techniques could have been employed:
estimation of the magnitude of differences between
experimental treatments,
correlations between measured aspects across all
experimental treatments,
multivariate analysis (rather than multiple univariate
analyses in parallel, as is the case here), or
factor analysis (breakdown of variance in one aspect
among the other measured aspects),
to name a few examples. These are useful techniques and may
be used at a later time to answer other research questions.
For the present investigation, difference discrimination was
chosen as a reasonable "first-cut" probe of the empirical
data collected for the research project; by taking this
conservative approach, information may be obtained to help
guide more refined probes in the future.
Although the methodology is built around running an
experiment, collecting data, and making statistical tests,
these activities (i.e., the execution phase) play a small
role within the overall investigative methodology, in
comparison to the planning and analysis phases. This is
readily apparent from the schematic in Figure 2, which
charts some of the relationships among the various elements
(or steps) of the investigative methodology. Another
feature of the investigative methodology is the careful
distinction made during the analysis phase between objective
results (the empirical scores for the metrics and the
statistical conclusions they support) and subjective results
(interpretations of the objective results in light of
intuition, research goals, etc.).
The remainder of this chapter outlines the overall
method by defining each step and discussing how it was
applied. Further details of certain steps are given within
other chapters of the dissertation, as follows:

    Step 5   Research Frameworks        Chapter VIII
    Step 6   Experimental Design        Chapter III
    Step 7   Collected Data             Chapter III
    Step 10  Statistical Conclusions    Chapter VII
    Step 11  Research Interpretations   Chapter VIII
Step 1: Questions of Interest
Several questions of interest were initiated and
refined so that answers might be given in the form of
[Figure 2: schematic charting relationships among the elements (steps) of the investigative methodology]
statistical conclusions and research interpretations.
Questions were formulated on the basis of several concerns:
(1) software development rather than software maintenance,
(2) a desire to assess the effectiveness of disciplined team
programming, in comparison to undisciplined team programming
and individual programming, (3) quantitatively measurable
aspects of the process and product, and (4) the analytical
technique of difference discrimination. The questions of
interest took the final form, "During software development,
what comparisons between the effects of the three
programming environments,
(a) individual programming under an ad hoc approach,
(b) team programming under an ad hoc approach,
(c) team programming under a disciplined methodology,
appear as differences in quantitatively measurable aspects
of the software development process and product?
Furthermore, what kind of differences are exhibited and what
is the direction of these differences?"
Step 2: Research Hypotheses
Since the investigative methodology involves hypothesis
testing, it is necessary to have fairly precise statements,
called research hypotheses, which are to be either supported
or refuted by the evidence. The second step in the method
was to formulate these research hypotheses, disjoint pairs
designated null and alternative, from the questions of
interest.
A precise meaning was given to the notion of
"difference." The investigation considered both (a)
differences in central tendency or average value, and (b)
differences in variability around the central tendency, of
observed values of the quantifiable programming aspects. It
should be noted that this decision to examine both location
and dispersion comparisons among the experimental groups
brought a pervasive duality to the entire investigation
(i.e., two sets of statistical tests, two sets of
statistical results, two sets of conclusions, etc.--always
in parallel and independent of each other), since it
addresses both the typical level and the predictability of
behavior under the experimental treatments.
Some vagueness was removed regarding the size of the
particular programming task by making explicit the implicit
restriction that completion of the task not be beyond the
capability of a single programmer working alone for a
reasonable period of time. Additionally, a large set of
programming aspects was specified; they are discussed in
Chapters IV and V. For each programming aspect there were
similar questions of interest, similar research hypotheses,
and similar experiments conducted in parallel.
The schema for the research hypotheses may be stated as
"In the context of a one-person-do-able software development
project, there < is not | is > a difference in the
< location | dispersion > of the measurements on programming
aspect < X > between individuals (AI), ad hoc teams (AT),
and disciplined teams (DT)." For each programming aspect
< X > in the set under consideration, this schema generates
two pairs of nondirectional research hypotheses, depending
upon the selection of 'is not' or 'is' corresponding to the
null and alternative hypothesis, and the selection of
'location' or 'dispersion' corresponding to the type of
difference.
Step 3: Statistical Model
The choice of a statistical model makes explicit
various assumptions regarding the experimental design, the
dependent variables, the underlying population
distributions, etc. Because the study involves a
homogeneity-of-populations problem with shift and spread
alternatives, the multi-sample model used here requires the
following: independent populations, independent and random
sampling within each population, continuous underlying
distributions for each population, homoscedasticity (equal
variances) of underlying distributions, and interval scale
of measurement [Conover 71, pp. 65-67] for each programming
aspect. Although random sampling was not explicitly
achieved in this study by rigorous sampling procedures, it
was nonetheless assumed on the basis of the apparent
representativeness of the subject pool and the lack of
obvious reasons to doubt otherwise. Due to the small sample
sizes, the unknown shape of the underlying distributions,
and the partially exploratory nature of the study, a
nonparametric statistical model was used.
Whenever statistics is employed to "prove" that some
systematic effect--in this case, a difference among the
groups--exists, it is important to measure the risk of
error. This is usually done by reporting a significance
level [Conover 71, p. 79], which represents the
probability of deciding that a systematic effect exists when
in fact it does not. In the model, the hypothesis testing
for each programming aspect was regarded as a separate
independent experiment. Consequently, the significance
level is controlled and reported experimentwise (i.e., per
aspect). While the assumption of independence between such
experiments is not entirely supportable, this procedure is
valid as long as statistical inferences that couple two or
more of the programming aspects are avoided or properly
qualified. In this study, statements regarding
interrelationships among aspects are made only within the
interpretations in Chapter VIII.
Step 4: Statistical Hypotheses
The research hypotheses must be translated into
statistically tractable form, called statistical hypotheses.
A correspondence, governed by the statistical model, exists
between application-oriented notions in the research
hypotheses (e.g., typical performance of a programming team
under the disciplined methodology) and mathematical notions
in the statistical hypotheses (e.g., expected value of a
random variable defined over the population from which the
disciplined teams are a representative sample). Generally
speaking, only certain mathematical statements involving
pairs of populations are statistically tractable, in the
sense that standard statistical procedures are applicable.
Statements that are not directly tractable may be decomposed
into tractable (sub)components whose results are properly
recombined after having been decided individually.
In this study, the research hypotheses are concerned
with directional differences among three programming
environments. Since the corresponding mathematical
statements are not directly tractable, they were decomposed
into the set of seven statistical hypotheses pairs shown
below. As a shorthand notation for longer English
sentences, symbolic "equations" are used to express these
statistical hypotheses. The ~ symbol denotes negation.
The + symbol denotes pooling. The = , ≠ , and <
symbols indicate comparisons on the basis of either the
location or dispersion of the dependent variables.
The hypotheses pair

    null:              alternative:
    AI = AT = DT       ~(AI = AT = DT)

addresses the existence of an overall difference among the
groups. However, due to the weak nondirectional
alternative, it cannot indicate which groups are different
or in what direction a difference lies. Standard
statistical practice prescribes that a successful test for
overall difference among three or more groups be followed by
tests for pairwise differences. The hypotheses pairs

    null:              alternative:
    AI = AT            AI ≠ AT  or  AI < AT  or  AT < AI
    AT = DT            AT ≠ DT  or  AT < DT  or  DT < AT
    AI = DT            AI ≠ DT  or  AI < DT  or  DT < AI

address the existence and direction of pairwise differences
between groups. The results of these pairwise comparisons
were used to refine the overall comparison. Data collected
for a set of experiments may often be legitimately reused to
"simulate" other closely related experiments, by combining
certain samples together and ignoring the original
distinction(s) between them. It is meaningful, in the
context of this study's experimental design, to compare any
two groups pooled against the third since (1) AI and AT are
both undisciplined, while DT is disciplined; (2) AT and DT
are both teams, and AI is individuals; and (3) under the
assumption that disciplined teams behave like individuals--
which is part of the study's basic premise--DT and AI can be
pooled and compared with AT acting as a control group. The
hypotheses pairs

    null:              alternative:
    AI+AT = DT         AI+AT ≠ DT  or  AI+AT < DT  or  DT < AI+AT
    AT+DT = AI         AT+DT ≠ AI  or  AT+DT < AI  or  AI < AT+DT
    AI+DT = AT         AI+DT ≠ AT  or  AI+DT < AT  or  AT < AI+DT

address the existence and direction of such pooled
differences. The results of these pooled comparisons were
used to corroborate the overall and pairwise comparisons.
Thus, for each programming aspect, the research
hypotheses pair corresponds to seven different pairs (null
and alternative) of statistical hypotheses. The results of
testing each set of seven hypotheses must be abstracted and
organized into one statistical conclusion using the first
research framework discussed in the next step.
Step 5: Research Frameworks
The research frameworks provide the necessary
organizational basis for abstracting and conceptualizing the
massive volume of statistical hypotheses (and statistical
results that follow) into a smaller and more intellectually
manageable set of conclusions. Three separate research
frameworks have been chosen: (1) the framework of possible
overall comparison outcomes for a given programming aspect,
(2) the framework of dependencies and intuitive
relationships among the various programming aspects
considered, and (3) the framework of basic suppositions
regarding expected effects of the experimental treatments on
the comparison outcomes for the entire set of programming
aspects. The first framework is employed in the statistical
conclusions step because it can be applied in a
statistically tractable manner, while the remaining two
frameworks are reserved for employment in the research
interpretations step since they are not statistically
tractable and involve subjective judgement.
Since a finite set of three different programming
environments (AI, AT, and DT) are being compared, there
exists the following finite set of thirteen possible overall
comparison outcomes for each aspect considered:
    AI = AT = DT

    AI < AT = DT       AI < AT < DT
    AT = DT < AI       AI < DT < AT
    AT < DT = AI       AT < DT < AI
    DT = AI < AT       AT < AI < DT
    DT < AI = AT       DT < AI < AT
    AI = AT < DT       DT < AT < AI
There is a hierarchical lattice of increasing separation and
directionality among these possible overall comparison
outcomes, as shown in Figure 3. These thirteen possible
overall comparison outcomes comprise the first research
framework and may be viewed as providing a complete "answer
space" for the questions of interest. It is clear that any
consistent set of two-way comparisons (such as represented
in the statistical hypotheses or statistical results) may be
associated with a unique one of these three-way comparisons.
This framework is the basis for organizing and condensing
the seven statistical results into one statistical
conclusion for each programming aspect considered.
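The association of consistent two-way comparisons with a unique three-way outcome can be sketched as a small Python function (a hypothetical encoding, not the study's own code: each pairwise result is one of '<', '=', '>', and the inputs are assumed consistent):

```python
def three_way_outcome(ai_at, at_dt, ai_dt):
    """Combine pairwise directional results ('<', '=', '>') into one of
    the thirteen possible overall comparison outcomes."""
    rel = {("AI", "AT"): ai_at, ("AT", "DT"): at_dt, ("AI", "DT"): ai_dt}
    groups = ["AI", "AT", "DT"]
    # Score each group by how many others it exceeds; equals share a score.
    wins = {g: 0 for g in groups}
    for (a, b), r in rel.items():
        if r == ">":
            wins[a] += 1
        elif r == "<":
            wins[b] += 1
    ordered = sorted(groups, key=lambda g: wins[g])   # stable sort keeps ties readable
    parts = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        parts.append(" = " if wins[prev] == wins[cur] else " < ")
        parts.append(cur)
    return "".join(parts)
```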
Since a large set of interrelated programming aspects
are being examined, it would be desirable to summarize many
of the "per aspect" hypotheses and results into statements
which refer to several aspects simultaneously. For example,
average number of statements per segment is one aspect
directly dependent on two other aspects: number of segments
and number of statements. Other interrelationships are more
intuitive, less tractable, or only suspected, for example,
the "trade-off" between global variables and formal
parameters. A simple classification of the programming
aspects into groups of intuitively related aspects at least
provides a framework for jointly interpreting the
corresponding statistical conclusions in light of the
underlying issues by which the aspects themselves are
related. The programming aspects considered in this study
FIGURE 3

Figure 3.1  Lattice of Possible Directional Outcomes for Three-Way Comparison
            (levels: null; partially differentiated; completely differentiated)

N.B. The circles indicate which directional outcomes correspond to the same nondirectional outcome.

Figure 3.2  Lattice of Possible Nondirectional Outcomes for Three-Way Comparison

    AI = AT = DT                                     (null)
    AI ≠ AT = DT    AT ≠ DT = AI    DT ≠ AI = AT     (partially differentiated)
    AI ≠ AT ≠ DT                                     (completely differentiated)
were classified according to a particular set of nine
higher-level programming issues (such as data variable
organization, for example); details are given in Chapter
VIII. This second research framework is the basis for
abstracting and interpreting what the study's findings
indicate about these higher-level programming issues, as
well as explicitly mentioning several individual
relationships among the programming aspects and their
conclusions.
Since the design of the experiments, the choice of
treatments, etc., were at least partially motivated by
certain general beliefs regarding software development
(e.g., "disciplined methodology reduces software development
costs"), it should be possible to explicitly state what
comparison outcomes among the experimental treatments were
expected a priori for which programming aspects. A list of
preplanned expectations (so-called "basic suppositions") for
the outcomes of each aspect's experiment would provide a
framework for evaluating how well the experimental findings
as a whole support the underlying general beliefs (by
comparing the actual outcomes with the basic suppositions
across all the programming aspects). Such a list of basic
suppositions was conceived prior to conducting the
experiments, and it constitutes the third research
framework; details are given in Chapter VIII. This
framework is the basis for interpreting the study's findings
as evidence in favor of the basic suppositions and general
beliefs.
Step 6: Experimental Design
The experimental design is the plan according to which
the experiment is actually executed. It is based upon the
statistical model and deals with practical issues such as
experimental units, treatments, local control, etc. The
experimental design employed for this study has been
discussed in considerable detail in Chapter III.
Step 7: Collected Data
The pertinent data to carry out the experimental design
was collected and processed to yield the information to
which the statistical test procedures were applied. Some
details of these activities have been given in Chapter III.
Step 8: Statistical Tests
A statistical test procedure is a decision mechanism,
founded upon general principles of mathematical probability
and combinatorics and upon a specific statistical model
(i.e., requiring certain assumptions), which is used to
convert the statistical hypotheses together with the
collected data into the statistical results. As dictated by
the statistical model, the statistical tests used in the
study were nonparametric tests of homogeneity of populations
against shift alternatives for small samples. Nonparametric
tests are slightly more conservative (in rejecting the null
hypothesis) than their parametric counterparts;
nonparametric tests generally use the ordinal ranks
associated with a linear ordering of a set of scores, rather
than the scores themselves, in their computational formulas.
In particular, the standard Kruskal-Wallis H-test [Siegel
56, pp. 184-193] and Mann-Whitney U-test [Siegel 56, pp.
115-127] were employed in the statistical results step.
Ryan's Method of Adjusted Significance Levels [Kirk 68,
pp. 495-497], a standard procedure for controlling the
experimentwise significance level when several tests are
performed on the same scores as one experiment, was also
employed in the statistical conclusions step.
The Kruskal-Wallis test is used in three-sample
situations to test an X = Y = Z null hypothesis; its test
statistic is computed as

    H = 12*[(Rx²/nx)+(Ry²/ny)+(Rz²/nz)]/[n*(n+1)] - 3*(n+1)

where Rx , Ry , and Rz are the respective sums of the
ranks for scores from the X, Y, and Z samples; n equals
nx+ny+nz , where nx , ny , and nz are the respective
sample sizes. The Mann-Whitney test is used in two-sample
situations to test an X = Y null hypothesis; its test
statistic is computed as

    U = min[ nx*ny + nx*(nx+1)/2 - Rx , nx*ny + ny*(ny+1)/2 - Ry ]

where Rx , Ry , nx , and ny are defined as before.
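The two test statistics can be computed directly from these rank-sum formulas; the sketch below uses hypothetical helper functions (with midranks assigned to ties):

```python
def ranks(all_scores):
    """Assign 1-based ranks, giving tied scores their average (mid)rank."""
    order = sorted(range(len(all_scores)), key=lambda i: all_scores[i])
    r = [0.0] * len(all_scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and all_scores[order[j + 1]] == all_scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_wallis_H(x, y, z):
    """H statistic from the rank-sum formula in the text."""
    n = len(x) + len(y) + len(z)
    r = ranks(list(x) + list(y) + list(z))
    Rx = sum(r[: len(x)])
    Ry = sum(r[len(x): len(x) + len(y)])
    Rz = sum(r[len(x) + len(y):])
    return (12 * (Rx**2 / len(x) + Ry**2 / len(y) + Rz**2 / len(z))
            / (n * (n + 1)) - 3 * (n + 1))

def mann_whitney_U(x, y):
    """U statistic from the rank-sum formula in the text."""
    nx, ny = len(x), len(y)
    r = ranks(list(x) + list(y))
    Rx, Ry = sum(r[:nx]), sum(r[nx:])
    return min(nx * ny + nx * (nx + 1) / 2 - Rx,
               nx * ny + ny * (ny + 1) / 2 - Ry)
```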
For every statistical test, there exists a one-to-one
mapping, usually given in statistical tables, between the
test statistic--a value completely determined by the sample
data scores--and the critical level. The critical level
[Conover 71, p. 81] is defined as the minimum significance
level at which the statistical test procedure would allow
the null hypothesis to be rejected (in favor of the
alternative) for the given sample data. Thus the critical level
represents a concise standardized way to state the full
result of any statistical test procedure. Two-tailed
rejection regions are applied for tests involving
nondirectional alternative hypotheses, and one-tailed
rejection regions are applied for tests involving
directional alternative hypotheses, so that the stated
critical level always pertains directly to the stated
alternative hypothesis. A decision to reject the null
hypothesis and accept the alternative is mandated if the
critical level is low enough to be tolerated; otherwise a
decision to retain the null hypothesis is made.
Ryan's procedure is used in situations involving
multiple pairwise comparisons, in order to properly account
for the fact that each pairwise test is made in conjunction
with the others, using the same sample data. The individual
critical levels δ obtained for each pairwise test in
isolation are adjusted to proper experimentwise critical
levels δ' via the formula

    δ' = [(r+1)*k/2] * δ

where k is the total number of samples, and r is the
number of (other) samples whose rank means fall between the
rank means of the particular pair of samples being compared.
A simple "minimax" step--taking the maximum of the several
adjusted pairwise critical levels, plus the overall
comparison critical level, which are all minimum
significance levels--completes the procedure, yielding a
single critical level associated jointly with the overall
and pairwise comparisons.
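The adjustment and the "minimax" step might be sketched as follows (a hypothetical helper; capping each adjusted level at 1.0 is an added safeguard, not part of the formula above, since a critical level is a probability):

```python
def joint_critical_level(overall_delta, pairwise, k):
    """Ryan adjustment plus the 'minimax' step.

    pairwise -- list of (delta, r) for each pairwise test, where r is the
                number of other samples whose rank means fall between the pair
    k        -- total number of samples
    """
    adjusted = [min(1.0, (r + 1) * k / 2 * delta) for delta, r in pairwise]
    # Joint critical level: the maximum of all the minimum significance levels.
    return max([overall_delta] + adjusted)
```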
These tests and procedures apply straightforwardly when
differences in location are considered. A slight
modification makes them applicable for differences in
dispersion: prior to ranking, each score value is simply
replaced by its absolute deviation from the corresponding
within-group sample median [Nemenyi et al. 77, pp. 266-270].
It should be noted that this modification results in only an
approximate method for solving a tough statistical problem,
namely, testing whether one population is more variable than
another [Nemenyi et al. 77, pp. 279-283]. The modification
is not statistically valid in the general case (it weakens
the power of the test procedures and can yield inaccurate
critical levels when testing for dispersion differences),
but every other available method also has serious
limitations. This method has been shown (empirically via
Monte Carlo techniques) to possess reasonable accuracy, as
long as the underlying distributions are fairly symmetrical,
and is readily adapted to the study's three-way comparison
situation.
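As a concrete rendering of this pre-ranking transformation (a modern Python sketch, not the study's own tooling; the function name is the writer's):

```python
def abs_deviations_from_group_median(groups):
    # Replace each score by its absolute deviation from its own
    # group's sample median; ranking these transformed scores turns
    # a location test into an (approximate) dispersion test.
    transformed = []
    for scores in groups:
        s = sorted(scores)
        n = len(s)
        med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
        transformed.append([abs(x - med) for x in scores])
    return transformed

# Two toy groups with equal medians but unequal spread:
print(abs_deviations_from_group_median([[1, 5, 9], [4, 5, 6]]))
```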
Step 9: Statistical Results
A statistical result is essentially a decision reached
by applying a statistical test procedure to the set of
collected and refined data, regarding which one of the
corresponding pair (null, alternative) of statistical
hypotheses is indeed supported by that data. For each pair
of statistical hypotheses, there is one statistical result
consisting of four components: (1) the null hypothesis
itself; (2) the alternative hypothesis itself; (3) the
critical level, stated as a probability value between 0 and
1; and (4) a decision either to retain the null hypothesis
or to reject it in favor of (i.e., accept) the alternative
hypothesis.
By convention, the null hypothesis is that no
systematic difference appears to exist, and the alternative
hypothesis purports that some systematic difference exists.
The critical level is associated with erroneously accepting
the alternative hypothesis (i.e., claiming a systematic
difference when none in fact exists). The decision to retain
or reject is reached on the basis of some tolerable level of
significance, with which the critical level is compared to
see if it is low enough. In cases where a null hypothesis
is rejected, the appropriate directional alternative
hypothesis (if any) is used to indicate the direction of the
systematic difference, as determined by direct observation
from the sample medians in conjunction with a one-tailed
test.
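The four components and the retain/reject rule can be captured in a small record type (an illustrative modern Python sketch; the names are the writer's, not the study's):

```python
from dataclasses import dataclass

@dataclass
class StatisticalResult:
    null_hypothesis: str         # component (1)
    alternative_hypothesis: str  # component (2)
    critical_level: float        # component (3): probability in [0, 1]

    def decision(self, tolerable_level):
        # Component (4): reject the null (accept the alternative)
        # only if the critical level is low enough to be tolerated.
        return ("reject" if self.critical_level < tolerable_level
                else "retain")

r = StatisticalResult("AI = AT", "AI < AT", 0.046)
print(r.decision(0.20))
```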
Conventional practice is to fix an arbitrary
significance level (e.g., 0.05 or 0.01) in advance, to be
used as the tolerable level; critical levels then serve only
as stepping-stones toward reaching decisions and are not
reported. For this partially exploratory study, it was
deemed more appropriate to fix a tolerable level only for
the purpose of a screening decision (simply to prune those
results with intolerably high critical levels) and to
explicitly attach any surviving critical level to each
statistical result. This unconventional practice yields
statistical results in a more meaningful and flexible form,
since the significance or error risk of each result may be
assessed individually, and results at other more stringent
significance levels may be easily determined. Furthermore,
the necessary information is retained for properly
recombining multiple related results on an experimentwise
basis in the statistical conclusions step.
The tolerable level of significance used throughout
this study to screen critical levels was fixed at under 0.20.
Although fairly high for a confirmatory study, it is
reasonable for a partially exploratory study, such as this
one, seeking to discover even slight trends in the data. A
critical level of 0.20 means that the odds of obtaining test
scores exhibiting the same degree of difference, due to
random chance fluctuations alone, are one in five.
As an example, the seven statistical results for
location comparisons on the programming aspect STATEMENT
TYPE COUNTS\IF are shown below. (N.B. The asterisks will be
explained in Step 10.)
    null            alternative        critical    (screening)

    AI = AT = DT    -(AI = AT = DT)    .063        reject
    AI = AT         AI < AT            .046        reject
    AI = DT         -(AI = DT)         >.999       retain
    AT = DT         DT < AT            .011        reject
    AI+AT = DT      DT < AI+AT         .038        reject   *
    AI+DT = AT      AI+DT < AT         .003        reject
    AT+DT = AI      AI < AT+DT         .335        retain   *
Observe that the stated decisions simply reflect the
application of the 0.20 tolerable level to the stated
critical levels. Results under more stringent levels of
significance can be easily determined by simply applying a
lower tolerable level to form the decisions; e.g., at the
0.05 significance level, only the AI < AT, DT < AT, and
AI+DT < AT alternative hypotheses would be accepted; only
the AI+DT < AT hypothesis would be accepted at the 0.01
level.
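Because the critical levels themselves are reported, decisions at more stringent levels require no recomputation; a minimal sketch (modern Python, using the critical levels of the three rejected directional alternatives from the example above):

```python
results = {
    # alternative hypothesis -> reported critical level
    "AI < AT": 0.046,
    "DT < AT": 0.011,
    "AI+DT < AT": 0.003,
}

def accepted_at(results, significance_level):
    # An alternative is accepted whenever its critical level falls
    # below the chosen tolerable level of significance.
    return sorted(h for h, p in results.items() if p < significance_level)

print(accepted_at(results, 0.05))
print(accepted_at(results, 0.01))
```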
Step 10: Statistical Conclusions
The volume of statistical results is organized and
condensed into statistical conclusions according to the
prearranged research framework(s). A statistical conclusion
is an abstraction of several statistical results, but it
retains the same statistical character, having been derived
via statistically tractable methods and possessing an
associated critical level.
The first research framework mentioned above was
employed to reduce the seven statistical results (with seven
individual critical levels) for each programming aspect to a
single statistical conclusion (with one overall critical
level) for that aspect. The statement portion of a
statistical conclusion is simply one of the thirteen
possible overall comparison outcomes. Each overall
comparison outcome is associated with a particular set of
statistical results whose outcomes support the overall
comparison outcome in a natural way. For example (reading
from the fifth row of the chart in Figure 4), the
DT = AI < AT conclusion is associated with the following
results:

    reject AI = AT = DT in favor of -(AI = AT = DT),
    reject AI = AT in favor of AI < AT,
    retain AI = DT,
    reject AT = DT in favor of DT < AT, and
    reject AI+DT = AT in favor of AI+DT < AT.
Since the other two comparisons (AI+AT versus DT, AT+DT
versus AI) are in a sense orthogonal to the overall
comparison outcome (DT = AI < AT), their results are
considered irrelevant to this conclusion. The chart in
Figure 4 shows exactly which results are associated with
each conclusion: the relevant comparisons, the null
hypotheses to be retained, and the alternative hypotheses to
be accepted. The other portion of a statistical conclusion
is the critical level associated with erroneously accepting
the statement portion. It is computed from the individual
critical levels of certain germane results.
A simple algorithm based on the chart in Figure 4 was
used to generate the statistical conclusions (and compute
the overall critical level) automatically from the
statistical results. For each programming aspect, the
algorithm compared the set of actual results obtained for
the seven statistical hypothesis pairs to the set of results
associated (in the chart) with each conclusion, searching
for a match. Ryan's procedure was used to properly combine
the individual critical levels for the overall result and
the relevant pairwise results, by adjusting them via the
formula and then taking their maximum. The critical levels
for the relevant pooled results were then factored in, by a
simple formula based on the multiplicative rule for the
joint probability of independent events.
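The factoring step can be sketched as follows (modern Python; the function name and the illustrative values are the writer's):

```python
def fold_independent_levels(levels):
    # Multiplicative rule for independent events: the joint critical
    # level is one minus the product of the complements of the
    # individual critical levels.
    joint = 1.0
    for p in levels:
        joint *= 1.0 - p
    return 1.0 - joint

# Folding a pooled-comparison level into a joint pairwise level
# (illustrative values, as in the worked example in the text):
print(round(fold_independent_levels([0.069, 0.003]), 3))
```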
Continuing the example started in Step 9, the
statistical results shown there for location comparisons on
the STATEMENT TYPE COUNTS\IF aspect are reduced to the
statistical conclusion DT = AI < AT with .072 critical level
overall. The five results not marked with an asterisk in
Step 9 match the five results associated above with the
DT = AI < AT outcome. (Note that the other two marked
results represent comparisons that are irrelevant to this
conclusion.) The .046 and .011 critical levels for the two
pairwise differences are adjusted to .069 and .033,
respectively, and the maximum among those adjusted values
and the .063 overall difference critical level is .069. The
relevant pooled comparison critical level of .003 is
factored in by taking the complement of the product of the
complements:

    1 - [(1 - .069) * (1 - .003)] = .072

FIGURE 4

[Figure 4: chart associating each of the thirteen possible
overall comparison outcomes with the set of statistical
results--the relevant comparisons, the null hypotheses to be
retained, and the alternative hypotheses to be accepted--that
supports it.]
Thus, the statistical conclusions are in one-to-one
correspondence with the research hypotheses and provide
concise answers on a "per aspect" basis to the questions of
interest. Further details and a complete listing of the
statistical conclusions for this study are presented in
Chapter VII.
Step 11: Research Interpretations
The final step in the method is to interpret the
statistical conclusions in view of any remaining research
framework(s), the researcher's intuitive understanding, and
the work of other researchers. These research
interpretations provide the opportunity to augment the
objective findings of the study with the researcher's own
professional judgment and insight. The second and third
research frameworks mentioned above--namely, the intuitive
relationships among the various programming aspects and the
basic suppositions governing their expected outcomes--were
considered important for this purpose. However, these
particular research frameworks can only be utilized for the
research interpretations, since they are not amenable to
rigorous manipulation. Nonetheless, within these frameworks
based upon intuitions about the software metrics and
programming environments under consideration, the study
bears some of its most interesting results and implications.
Complete details and discussion of the research
interpretations are presented in Chapter VIII.
CHAPTER VII
This chapter reports the objective results of the
study, namely, the statistical conclusions for each
programming aspect considered. In keeping with the
empirical and statistical character of these conclusions,
the tone of discussion here is purposely somewhat
disinterested and analytical. All interpretive discussion
is deferred to Chapter VIII, in accordance with the
investigative methodology.
Each statistical conclusion is expressed in the concise
form of a three-way comparison outcome "equation." It
states any observed differences, and the directions thereof,
among the programming environments represented by the three
groups examined in the study: ad hoc individuals (AI), ad
hoc teams (AT), and disciplined teams (DT). The equality
AI = AT = DT expresses the null outcome that there is no
systematic difference among the groups. An inequality,
e.g., AI < AT = DT or DT < AI < AT, expresses a non-null (or
alternative) outcome that there are certain systematic
difference(s) among the groups in stated direction(s). A
critical level value is also associated with each non-null
(or alternative) outcome, indicating its individual
reliability. This value is the probability of having
erroneously rejected the null conclusion in favor of the
alternative; it also provides a relative index of how
pronounced the differences were in the sample data.
The remainder of this chapter consists of (a)
presenting the full set of conclusions, (b) evaluating their
impact as a whole, (c) exposing a "relaxed differentiation"
view of the conclusions, (d) exposing a "directionless" view
of the conclusions, and (e) individually highlighting a few
of the more noteworthy conclusions.
Presentation
The complete set of statistical conclusions for both
location and dispersion comparisons appears in Table 2,
arranged by programming aspect. Instances of non-null (or
alternative) conclusions--those indicating some distinction
among the groups on the basis of a measured programming
aspect--are listed by outcome in Tables 4.1 (for location
comparisons) and 4.2 (for dispersion comparisons).
Examination of Table 2 immediately demonstrates that a
large number of the programming aspects considered in this
study, especially product aspects, failed to show any
distinction between the groups. This low "yield" is not
surprising, especially among product aspects, and may be
attributed to the partially exploratory nature of the study,
the small sample sizes, and the general coarseness of many
of the aspects considered. The issue of these null outcome
occurrences and their significance is treated more
thoroughly in the next subsection, Impact Evaluation.
It is worth noting, however, that several of the null
conclusions may indicate characteristics inherent to the
application itself. As one example, the basic symbol-table/
scanner/parser/code-generator nature of a compiler strongly
influences the way the system is modularized and thus
practically determines the number of modules in the final
product (give or take some occasional slight variation due
to other design decisions).
TABLE 2

[Table 2: the complete set of statistical conclusions, for
both location and dispersion comparisons, arranged by
programming aspect.]

The collective impact of these statistical conclusions
may be objectively evaluated according to the following
statistical principle [Tukey 69, pp. 34-95]. Whenever a
series of statistical tests (or experiments) is made, all
at a fixed level of significance (for example, 0.10), a
corresponding percentage (in the example, 10%) of the tests
is expected a priori to reject the null hypothesis in the
complete absence of any true effect (i.e., due to chance
alone). This expected rejection percentage provides a
comparative index of the true impact of the test results as
a whole (in the example, a 25% actual rejection percentage
would indicate that a truly significant effect, other than
chance alone, was operative).
The point here may be illustrated in terms of simple
coin-tossing experiments. The nature of statistics itself
dictates that, out of a series of 100 separate statistical
tests of a hypothetically fair coin at the 0.05 significance
level, roughly 5 of those tests would nonetheless indicate
that the coin was biased; if only 6 out of 110 tests of a
real coin indicate bias at the 0.05 level, those six results
have very little impact since the coin is behaving rather
unbiasedly over the full set of tests.
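The arithmetic behind the coin illustration is a simple expectation (sketched in modern Python; illustrative only):

```python
def expected_rejections(num_tests, alpha):
    # Expected number of chance rejections when every test is run at
    # significance level alpha and no true effect exists at all.
    return alpha * num_tests

# 100 tests of a fair coin at the 0.05 level: about 5 spurious
# rejections expected; 6 rejections out of 110 real-coin tests
# barely exceed the 5.5 expected by chance alone.
print(expected_rejections(100, 0.05))
print(expected_rejections(110, 0.05))
```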
This same "multiplicity" principle applies to the
statistical conclusions of the study, since they represent
the outcomes of a series of separate tests and were assumed
in the statistical model to be separate experiments. It is
appropriate to evaluate the location and dispersion results
separately, since they reflect two separate issues
(expectancy and predictability) of software development
behavior. It is also appropriate to evaluate the process
and product results separately. Finally, it is only fair to
evaluate the confirmatory aspects as a distinct subset of
all aspects examined, since they alone had been honestly
considered prior to collecting and analyzing the data.
Details of this impact evaluation for the study's
objective results, broken down into the appropriate
categories identified above, are presented in the following
table. (This table is an excerpt from Table 3, which
provides an extensive impact evaluation, broken down
hierarchically according to all of the various dichotomies
identified for the programming aspects.) The evaluation was
performed at the 0.20 significance level used for
screening purposes, hence the expected rejection percentage
for any category was 20%. For each category of aspects, the
table gives the number of (nonredundant) programming
aspects, the expected (rounded to whole numbers) and actual
numbers of rejections (of the null conclusion in favor of a
directional alternative), and the expected and actual
rejection percentages. An asterisk marks those categories
demonstrating noticeable statistical impact (i.e., actual
rejection percentage well above expected rejection
percentage).
    category               |  #  | exp.# | act.# | exp.  | act.
                           |     | rej.  | rej.  | rej.% | rej.%
    -----------------------+-----+-------+-------+-------+-------
    location               | 189 |  38   |  53   | 20.0  |  28.0 *
      process              |  10 |   2   |   9   | 20.0  |  90.0 *
        confirmatory only  |   6 |   1   |   6   | 20.0  | 100.0 *
      product              | 179 |  36   |  44   | 20.0  |  24.6
        confirmatory only  |  29 |   6   |  12   | 20.0  |  41.4 *
      confirmatory only    |  35 |   7   |  18   | 20.0  |  51.4 *
    dispersion             | 189 |  38   |  43   | 20.0  |  22.8
      process              |  10 |   2   |   2   | 20.0  |  20.0
        confirmatory only  |   6 |   1   |   0   | 20.0  |   0.0
      product              | 179 |  36   |  41   | 20.0  |  22.9
        confirmatory only  |  29 |   6   |   9   | 20.0  |  31.0 *
      confirmatory only    |  35 |   7   |   9   | 20.0  |  25.7

    #: number of aspects
    exp. # rej.: expected number of rejections
    act. # rej.: actual number of rejections
    exp. rej. %: expected rejection percentage
    act. rej. %: actual rejection percentage
The table shows that the location results, dealing with
the expectancy of software development behavior, do have
statistical impact in several subcategories. Process
aspects have more impact than product aspects on the whole,
but when tempered by consideration of the distinction
between confirmatory and exploratory aspects, the study's
location results bear strong statistical impact for both
process and product. They are better explained as the
consequence of some true effect related to the experimental
treatments, rather than as a random phenomenon.

TABLE 3

Table 3. Statistical Impact Evaluation

[Table 3: an extensive impact evaluation of the statistical
conclusions, broken down hierarchically by the
location/dispersion, process/product, rudimentary/elaborative,
and confirmatory/exploratory dichotomies, giving for each
category the number of aspects, the expected and actual
numbers of rejections, and the expected and actual rejection
percentages at the 0.20 level.]
It is also clear from the table that the dispersion
results, dealing with the predictability of software
development behavior, have little statistical impact in
general. This is due primarily to the diminished power of
the statistical procedures used to test for dispersion
differences, compounded by the small sample sizes involved
and the coarseness of many of the programming aspects
themselves. The lack of strong statistical impact in this
area of the study does not mean that the dispersion issue is
unimportant or undeserving of research attention, but rather
that it is "a tougher nut to crack" than the location issue.
The study's dispersion results are still worth pursuing,
however, as possible hints of where differences might exist,
provided this disclaimer regarding their impact is heeded.
A Relaxed Differentiation View
As described in Chapter VI, the research framework of
possible three-way comparison outcomes provided the basis
for converting the statistical results into the statistical
conclusions. This framework has two inherent structural
characteristics that may be exploited to make additional
observations regarding the statistical conclusions. These
structural characteristics and the supplemental views of the
conclusions that they afford are described here and in the
next subsection.
TABLE 4

[Tables 4.1 and 4.2: the non-null statistical conclusions,
listed by outcome, for location comparisons (4.1) and
dispersion comparisons (4.2).]

The first structural characteristic is that each
completely differentiated outcome is related to a specific
pair of partially differentiated outcomes, as shown in the
lattice of Figure 3.1. For example, AI < AT < DT, a
completely differentiated outcome, naturally weakens to
either AI < AT = DT or AI = AT < DT, two partially
differentiated outcomes.
Each completely differentiated outcome consists of
three pairwise differences (AI < AT, AT < DT, AI < DT in the
example), while each partially differentiated outcome
consists of only two pairwise differences plus one pairwise
equality (AI < DT, AI < AT, AT = DT and AI < DT, AT < DT,
AI = AT in the example). The "outer" difference of the
completely differentiated outcome (AI < DT in the example)
is common to both partially differentiated outcomes, while
each partially differentiated outcome focuses attention on
one of the two "inner" differences (AI < AT and AT < DT in
the example) to the exclusion of the other "inner"
difference, which is "relaxed" to an equality. Within a
statistical environment or model which places a premium on
claiming differences instead of equalities, a partially
differentiated outcome is a safer statement, containing less
error-prone information than a completely differentiated
outcome. Since these outcomes represent statistical
conclusions, the same data scores which support a completely
differentiated outcome at a certain critical level also
support each of the two related partially differentiated
outcomes at lower critical levels.
Thus, every completely differentiated conclusion may
also be considered as two (more significant) partially
differentiated conclusions, each of these three conclusions
having equal and complete statistical legitimacy. The
"outer" difference of a completely differentiated conclusion
is, of course, stronger than either of its two "inner"
differences; but the strengths of the two "inner"
differences (relative to each other) will vary in accordance
with the data scores and indeed are reflected in the
significance levels of the two corresponding partially
differentiated conclusions (relative to each other). Tables
5.1 and 5.2 give the details of this "relaxed
differentiation" analysis for each of the completely
differentiated conclusions found in the study, and an
English paraphrase appears in the two paragraphs immediately
below. All of the partially differentiated conclusions
listed in these tables should be added to those presented in
Tables 2 and 4; they deserve full consideration in any
analysis or interpretation of the study's findings.
However, in the case that one of a partially differentiated
pair is noticeably stronger than the other, it is fair to
consider only the stronger one for the purpose of analysis
or interpretation dealing primarily with partially
differentiated outcomes, since the study is mainly concerned
with the most pronounced difference afforded by each
aspect's data scores.
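The mechanical relationship between a completely differentiated conclusion and its two relaxations can be sketched as follows (modern Python; the representation is the writer's):

```python
def relaxations(low, mid, high):
    # A completely differentiated outcome low < mid < high keeps its
    # "outer" difference (low < high) in both relaxations; each
    # relaxation keeps one "inner" difference and weakens the other
    # to an equality.
    return [f"{low} < {mid} = {high}",  # keeps the inner low < mid
            f"{low} = {mid} < {high}"]  # keeps the inner mid < high

print(relaxations("AI", "AT", "DT"))
```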
On location comparisons, four programming aspects
yielded completely differentiated conclusions. They are
"relaxed" to partially differentiated conclusions as
follows:
1. From DT < AI < AT on the PROGRAM CHANGES aspect, the
DT < AI = AT conclusion dwarfs the DT = AI < AT
conclusion with respect to level of significance.
2. The DT < AT difference is more pronounced than the
AI < DT difference from AI < DT < AT on the LINES
aspect.
3. AT < DT < AI on the (SEGMENT,GLOBAL) USAGE PAIR RELATIVE
PERCENTAGE\ENTRY aspect is more appropriately "relaxed"
to the AT < DT = AI conclusion than to the AT = DT < AI
conclusion.
TABLE 5
4 I U64 11 4 V4
I -6 00 11 ~ 44ma I -- a9 9- a 0 I 6 0 10 --. 9
*-to:~ 00 00 00 00 * Mn 6 * cc 00; cc 00,a~ *4 V1 *4 . . .
aa ec. 1 0: --- - *. -C .9. NNZ ANC - o~ so40 4 4 44~I 4 4 4 4
4 I SO: 4N N60 064i 0o c VIlv o
* 4 00 604 C. NQ 4b C 490II. - a
04. W, ~ 4 4 I2
41 1 4 VI4 r4VI V I 4- 46 4 I I I I
40D 4 4 I
6 . 4 C4 0 04 4, 44 CC 0. 4 6 I 0 00 0 0
CL4 0.: U 4 I
V, Cc 1 0 go .6C s- o V 1
CO 1 COON C 4 as 4 16 0 a
61 1 so 64 4 4 I 4
10 4 1 4 '1 0 N I .6
OW or.I a a ~ ~ 4-I 40 a a4 4 OIU 4 46 462I 4
U~I 46 * f4............. j. -a.l 4. . . .
'AC~C 4L 0 0C C 4
c ~ ~ OU 00 44 I
-Z 6.. C04 -1 -. N N 1 N N -4 I *0 4 -4 464 I 0 4 4 JZ
AD %2. I - 4 I
64C 4 4 C z I..
4~~ 4 9
0~~~~ 4 zz 4 N "' 0 4 44
.I4 4p 4 44
64 .J ..J .. 5 4 1N00a
CHAPTER VII
4. The AT < DT and DT < AI differences from AT < DT < AI on
the (SEGMENT,GLOBAL) USAGE PAIR RELATIVE PERCENTAGE\
ENTRY\MODIFIED aspect are equally strong.
On dispersion comparisons, four programming aspects
yielded completely differentiated conclusions.  They are
"relaxed" to partially differentiated conclusions as
follows:
1. The DT < AI difference is much more pronounced than the
AI < AT difference from DT < AI < AT on the MAXIMUM
UNIQUE COMPILATIONS FOR ANY ONE MODULE aspect.
2. From DT < AI < AT on the STATEMENT TYPE COUNTS\RETURN
aspect, the DT = AI < AT conclusion dwarfs the
DT < AI = AT conclusion with respect to level of
significance.
3. AI < DT < AT on the (SEGMENT,GLOBAL) POSSIBLE USAGE
PAIRS aspect is more appropriately "relaxed" to the
AI < AT = DT conclusion than to the DT = AI < AT
conclusion.
4. The AI < DT difference is more pronounced than the
DT < AT difference from AI < DT < AT on the
(SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS\NONENTRY\
UNMODIFIED aspect.
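The "relaxation" rule applied in the items above can be stated compactly: a completely differentiated outcome X < Y < Z is relaxed to whichever of its two partially differentiated neighbors (X < Y = Z or X = Y < Z) carries the lower critical significance level. A minimal illustrative sketch, using hypothetical critical levels rather than the study's data:

```python
def relax(complete_outcome, critical_levels):
    """Relax a completely differentiated outcome X < Y < Z to the
    stronger of its two partially differentiated neighbors.

    complete_outcome: tuple of group labels, e.g. ("DT", "AI", "AT"),
                      meaning DT < AI < AT.
    critical_levels:  dict mapping each partial outcome string to its
                      critical significance level; lower = stronger.
    """
    x, y, z = complete_outcome
    lower = f"{x} < {y} = {z}"   # merge the two larger groups
    upper = f"{x} = {y} < {z}"   # merge the two smaller groups
    return min((lower, upper), key=lambda o: critical_levels[o])

# Hypothetical critical levels for DT < AI < AT on PROGRAM CHANGES:
levels = {"DT < AI = AT": 0.005, "DT = AI < AT": 0.10}
print(relax(("DT", "AI", "AT"), levels))  # DT < AI = AT dwarfs DT = AI < AT
```

The numeric levels here are invented for illustration; the study's actual critical levels appear in its tables.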
A Directionless View
The second structural characteristic of the possible
outcome framework is that the outcomes may be classified
into another closely related set of directionless outcomes,
as shown in the lattice of Figure 3.2.  For example,
AI < AT = DT and AT = DT < AI, two directional partially
differentiated outcomes, both correspond to AI ≠ AT = DT, a
nondirectional partially differentiated outcome.  All six of
the directional completely differentiated outcomes
correspond to the single nondirectional completely
differentiated outcome AI ≠ AT ≠ DT.
By emphasizing just the existence and not the direction
of distinctions between the treatment groups, these
directionless outcome categories focus attention on the
original research issue of discovering which observable
programming aspects differentiate among the three
programming environments.  In particular, there are three
nondirectional partially differentiated outcomes (each of
the form "one group different from the other two which are
similar"), and it is noteworthy to observe just what set of
programming aspects supports each of these basic
distinctions.  (Table 4 is arranged so that the directional
distinctions listed there can be readily coalesced by eye
into directionless categories.)  It is revealing to note
that, with one exception, the directionless distinctions on
location comparisons segregate cleanly along the process-
versus-product dichotomy line: all of the product
distinctions fall into the AI ≠ AT = DT or AT ≠ DT = AI
categories, while the process distinctions consistently fall
into the DT ≠ AI = AT category.  Interestingly enough, the
one exception is that a number of the cyclomatic complexity
metric variations (which are product aspects) show the
DT ≠ AI = AT directionless outcome (which otherwise
characterizes only process aspect distinctions).
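The collapsing of directional outcomes into directionless categories can be made concrete. The following sketch (an illustration, not the study's apparatus; the lattice itself is given in the study's Figure 3) maps a directional outcome string to its directionless form:

```python
def directionless(outcome):
    """Map a directional outcome to its directionless category.

    "A < B = C" and "B = C < A" both collapse to "A != B = C";
    every completely differentiated outcome (e.g. "A < B < C")
    collapses to "A != B != C"; the null outcome "A = B = C"
    is unchanged.
    """
    parts = outcome.replace("=", " = ").replace("<", " < ").split()
    groups = parts[0::2]           # the three group labels
    ops = parts[1::2]              # the "<" or "=" relations between them
    if ops == ["<", "<"]:          # completely differentiated
        a, b, c = sorted(groups)
        return f"{a} != {b} != {c}"
    if "<" in ops and "=" in ops:  # partially differentiated
        if ops == ["<", "="]:      # odd group on the left
            odd, pair = groups[0], sorted(groups[1:])
        else:                      # odd group on the right
            odd, pair = groups[2], sorted(groups[:2])
        return f"{odd} != {pair[0]} = {pair[1]}"
    return outcome                 # null outcome

print(directionless("AI < AT = DT"))  # AI != AT = DT
print(directionless("AT = DT < AI"))  # AI != AT = DT
```

Both directional partial outcomes in the text's example collapse to the same directionless category, as the two print statements show.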
The purpose of this concluding section is to point out
what seem to be the "top ten" (well, eleven and nine) most
noteworthy conclusions from among the study's objective
results.  These conclusions are interesting individually,
either because the programming aspect merits attention or
because the difference in its expectancy or predictability
is pronounced (as indicated by a low critical significance
level) in the experimental sample data.
Noteworthy location distinctions are mentioned below.
1. According to the DT < AI = AT outcome on the COMPUTER
JOB STEPS aspect, the disciplined teams used very
noticeably fewer computer job steps (i.e., module
compilations, program executions, and miscellaneous job
steps) than either the ad hoc individuals or the ad hoc
teams.
2. This same difference was apparent in the total number of
module compilations, the number of unique (i.e., not
identical to a previous compilation) module
compilations, the number of program executions, and the
number of essential job steps (i.e., unique module
compilations plus program executions), according to the
DT < AI = AT outcomes on the COMPUTER JOB STEPS\MODULE
COMPILATION, COMPUTER JOB STEPS\MODULE COMPILATION\
UNIQUE, COMPUTER JOB STEPS\PROGRAM EXECUTION, and
COMPUTER JOB STEPS\ESSENTIAL aspects, respectively.
3. According to the DT < AI = AT outcome on the PROGRAM
CHANGES aspect, the disciplined teams required fewer
textual revisions to build and debug the software than
the ad hoc individuals and the ad hoc teams.
4. There was a definite trend for the ad hoc individuals to
have produced fewer total symbolic lines (including
comments, compiler directives, statements,
declarations, etc.) than the disciplined teams, who
produced fewer than the ad hoc teams, according to the
AI < DT < AT outcome on the LINES aspect.
5. According to the AI < AT = DT outcome on the SEGMENTS
aspect, the ad hoc individuals organized their software
into noticeably fewer routines (i.e., functions or
procedures) than either the ad hoc teams or the
disciplined teams.
6. The ad hoc individuals displayed a trend toward having a
greater number of executable statements per routine
than did either the ad hoc teams or the disciplined
teams, according to the AT = DT < AI outcome on the
AVERAGE STATEMENTS PER SEGMENT aspect.
7. According to the DT = AI < AT outcomes on the STATEMENT
TYPE COUNTS\IF and STATEMENT TYPE PERCENTAGES\IF
aspects, both the ad hoc individuals and the
disciplined teams coded noticeably fewer IF statements
than the ad hoc teams, in terms of both total number
and percentage of total statements.
8. According to the DT = AI < AT outcome on the DECISIONS
aspect, both the ad hoc individuals and the disciplined
teams tended to code fewer decisions (i.e., IF, WHILE,
or CASE statements) than the ad hoc teams.
9. Both the ad hoc teams and the disciplined teams declared
a noticeably larger number of data variables (i.e.,
scalars or arrays of scalars) than the ad hoc
individuals, according to the AI < AT = DT outcome on
the DATA VARIABLES aspect.
10. According to the AT = DT < AI outcome on the DATA
VARIABLE SCOPE PERCENTAGES\NONGLOBAL\LOCAL aspect, the
ad hoc individuals had a larger percentage of local
variables compared to the total number of declared data
variables than either the ad hoc teams or the
disciplined teams.
11. There was a slight trend for both the ad hoc
individuals and the disciplined teams to have fewer
potential data bindings (i.e., possible communication
paths between segments via global variables, as allowed
by the software's modularization) than the ad hoc
teams, according to the DT = AI < AT outcome on the
(SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS\POSSIBLE aspect.
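Several of the product aspects above, such as STATEMENT TYPE COUNTS\IF and DECISIONS, are simple static counts over the source text. A minimal sketch of such a counter, assuming a SIMPL-T-like keyword syntax in which IF, WHILE, and CASE introduce decisions; this is an illustration only, not the study's actual measurement tool:

```python
import re

# Decision statements per the text: IF, WHILE, or CASE.
DECISION_KEYWORDS = ("IF", "WHILE", "CASE")

def count_decisions(source):
    """Count decision statements in source text by keyword matching."""
    counts = {kw: len(re.findall(rf"\b{kw}\b", source))
              for kw in DECISION_KEYWORDS}
    counts["DECISIONS"] = sum(counts.values())
    return counts

# Hypothetical source fragment for illustration:
src = """
IF X > 0 THEN CALL P(X) END
WHILE X > 0 DO X := X - 1 END
"""
print(count_decisions(src))  # {'IF': 1, 'WHILE': 1, 'CASE': 0, 'DECISIONS': 2}
```

A real counter would have to respect comments and string literals; keyword matching suffices to convey the idea of a count-based product aspect.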
Noteworthy dispersion distinctions are mentioned below.
1. There was a noticeable difference in variability, with
the disciplined teams less than the ad hoc individuals
less than the ad hoc teams, in the maximum number of
unique compilations for any one module, according to
the DT < AI < AT outcome on the MAXIMUM UNIQUE
COMPILATIONS FOR ANY ONE MODULE aspect.
"he ad hoc individuals exhibited noticeably greater
variation than either the ad hoc teams or the
oisciplined teams in the number of miscellaneous job
steps (i.e., auxiliary compilations or executions of
something other than the final software project),
according to the AT = DT < Al outcome on the COMPUTER
JOB STEPS\MISCELLANEOUS aspect.
3. According to the DT = AI < AT outcome on the AVERAGE
SEGMENTS PER MODULE aspect, the ad hoc individuals and
the disciplined teams both exhibited noticeably less
variation in the average number of routines per module
than the ad hoc teams.
4. According to the DT = AI < AT outcomes on the STATEMENT
TYPE COUNTS\RETURN and STATEMENT TYPE PERCENTAGES\
RETURN aspects, the ad hoc teams showed rather
noticeably greater variability in the number (both raw
count and normalized percentage) of RETURN statements
coded than both the disciplined teams and the ad hoc
individuals.
5. In the number of calls to programmer-defined routines,
the ad hoc individuals displayed noticeably greater
variation than both the ad hoc teams and the
disciplined teams, according to the AT = DT < AI
outcome on the INVOCATIONS\NONINTRINSIC aspect.
6. According to the DT < AI = AT outcome on the DATA
VARIABLE SCOPE PERCENTAGES\GLOBAL\NONENTRY\MODIFIED
aspect, the disciplined teams displayed noticeably
smaller variation than either the ad hoc individuals or
the ad hoc teams in the percentage of commonplace
(i.e., ordinary scope and modified during execution)
global variables compared to the total number of data
variables declared.
7. The ad hoc individuals displayed noticeably less
variation in the number of formal parameters passed by
reference than both the ad hoc teams and the
disciplined teams, according to the AI < AT = DT
outcome on the DATA VARIABLE SCOPE COUNTS\NONGLOBAL\
PARAMETER\REFERENCE aspect.
8. According to the AI < DT < AT outcome on the
(SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS aspect, there was
a noticeable difference in variability, with the ad hoc
individuals less than the disciplined teams less than
the ad hoc teams, for the total number of possible
segment-global usage pairs (i.e., occurrences of the
situation where a global variable could be modified or
accessed by a segment).
9. According to the DT = AI < AT outcome on the
(SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS\POSSIBLE aspect,
the ad hoc teams tended toward greater variability than
either the ad hoc individuals or the disciplined teams
in the number of potential data bindings.
CHAPTER VIII
VIII. INTERPRETIVE RESULTS
This chapter reports the interpretive results of the
study, namely the research interpretations based on the
conclusions presented in Chapter VII.  The tone of
discussion here is purposely somewhat subjective and
opinionated, since the study's most important results are
derived from interpreting the experiment's immediate
findings in view of the study's overall goals.  These
interpretations also express the researcher's own estimation
of the study's implications and general import according to
his professional intuitions about programming and software.
The interpretations presented here are neither
exhaustive nor unique.  They only touch upon certain overall
issues and generally avoid attaching meaning to or giving
explanation for individual aspects or outcomes.  It is
anticipated that the reader and other researchers might
formulate additional or alternative interpretations of the
study's factual findings, using their own intuitive
judgments.
Two distinct sets of research interpretations are
discussed in the remainder of this chapter. The first set
states general trends in the conclusions according to the
basic suppositions of the study. The second set states
general trends in the conclusions according to a
classification of the programming aspects which reflects
certain abstract programming notions (e.g., cost,
modularity, data organizations, etc.).
The study's "basic suppositions" (or "hypotheses") are
a set of simpleminded a priori expectations regarding
differences among the experimental programming environments
for location and dispersion comparisons on process and
product aspects.  These basic suppositions are stated in the
following table:
 --------------------------------------------------------------
 | Basic Suppositions  | for Location     | for Dispersion    |
 |                     | Comparisons      | Comparisons       |
 |---------------------|------------------|-------------------|
 | on Process Aspects  | DT < AI = AT     | DT < AI = AT      |
 |---------------------|------------------|-------------------|
 | on Product Aspects  | DT = AI < AT     | DT = AI < AT      |
 |                     |   or             |   or              |
 |                     | AT < DT = AI     | AT < DT = AI      |
 --------------------------------------------------------------
The basic suppositions are founded upon "general
beliefs" regarding software phenomena, which had been
formulated by the researcher prior to conducting the
experiment.  These general beliefs state that
(a) methodological discipline is the key influence on the
general efficiency of the process;
(b) the disciplined methodology reduces the cost and
complexity of the process and enhances the
predictability of the process as well;
(c) the preferred direction for both location and dispersion
differences on process aspects is clear and
undebatable, because of the familiarity of the process
aspects and the direct applicability of expected values
and variances in terms of average cost estimates and
tightness of cost estimates;
(d) "mental cohesiveness" (or conceptual integrity [Brooks
75, pp. 41-50]) is the key influence on the general
quality of the product;
(e) a programming team is naturally burdened (relative to an
individual programmer) by the organizational overhead
and risk of error-prone misunderstanding inherent in
coordinating and interfacing the thoughts and efforts
of those on the team;
(f) the disciplined methodology induces an effective mental
cohesiveness, enabling a programming team to behave
more like an individual programmer with respect to
conceptual control over the program, its design, its
structure, etc., because of the discipline's
antiregressive, complexity-controlling [Belady and
Lehman 76, p. 245] effect that compensates for the
inherent organizational overhead of a team; and
(g) the preferred direction for both location and dispersion
differences on product aspects is not always clear,
because of the unfamiliarity of many of the product
aspects and a general lack of understanding regarding
the implication of dispersion for product aspects.
In view of the general beliefs and basic suppositions
stated above, each possible comparison outcome (cf. Figure
3) may be regarded as "voting" either for or against a given
basic supposition (or as "abstaining"), depending on whether
that outcome would substantiate or contravene the
corresponding general beliefs.  For process aspects,
(1) outcome DT < AI = AT obviously affirms the
supposition;
(2) outcomes DT < AI < AT or DT < AT < AI, which are
completely differentiated variations of the
supposition's main theme, indirectly affirm the
supposition, especially when DT < AI = AT is the
stronger of the corresponding partially
differentiated outcome pair;
(3) outcome AI = AT = DT may negative the supposition,
or it may be considered an abstention for any one
of several reasons
(it is possible that (a) the aspect's
critical level is not low enough, so it
defaults to the null outcome; (b) the aspect
reflects something characteristic of the
application/task or another factor common to
all the groups in the experiment; or (c) the
aspect measures something fundamental to
software development phenomena in general and
would always result in the null outcome); and
(4) all other outcomes -- AI < AT < DT, AI < DT < AT,
AT < AI < DT, AT < DT < AI, AI ≠ AT = DT
(AI < AT = DT, AT = DT < AI), AT ≠ DT = AI
(AT < DT = AI, DT = AI < AT), and AI = AT < DT --
negative the supposition.
For product aspects,
(1) outcomes AT ≠ DT = AI (AT < DT = AI, DT = AI < AT)
obviously affirm the supposition;
(2) outcomes AI < DT < AT or AT < DT < AI, which may be
considered approximations to the supposition (DT
is distinct from AT but falls short of AI, due to
lack of experience or maturity in the disciplined
methodology), indirectly affirm the supposition,
especially when DT = AI < AT or AT < DT = AI
(respectively) is the stronger of the
corresponding partially differentiated outcome
pair;
(3) outcome AI = AT = DT may negative the supposition,
or it may be considered an abstention for any one
of several reasons
(it is possible that (a) the aspect's
critical level is not low enough, so it
defaults to the null outcome; (b) the aspect
reflects something characteristic of the
application/task or another factor common to
all the groups in the experiment; (c) the
aspect measures something fundamental to
software development phenomena in general and
would always result in the null outcome; or
(d) several of the study's hit-and-miss
collection of exploratory product aspects are
duds and may be ignored as useless software
measures);
(4) outcomes AI < AT < DT, AT < AI < DT, DT < AI < AT,
and DT < AT < AI negative the supposition;
(5) outcomes DT ≠ AI = AT (DT < AI = AT, AI = AT < DT)
negative the supposition, especially discrediting
the belief that "mental cohesiveness" is the key
influence on the product; and
(6) outcomes AI ≠ AT = DT (AI < AT = DT, AT = DT < AI)
negative the supposition, especially discrediting
the belief that the disciplined methodology
effectively molds a team into an individual.
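The voting scheme for product aspects can be summarized in code. The sketch below classifies an outcome string against the product supposition per cases (1) through (4); it is a deliberate simplification (the text's cases (5) and (6) subdivide the final "negative" category by which belief is discredited):

```python
def product_vote(outcome):
    """Classify a product-aspect outcome against the basic supposition
    AT != DT = AI, following cases (1)-(6) in the text (simplified)."""
    if outcome in ("AT < DT = AI", "DT = AI < AT"):
        return "affirm"                  # case (1): supposition itself
    if outcome in ("AI < DT < AT", "AT < DT < AI"):
        return "indirect affirm"         # case (2): approximations
    if outcome == "AI = AT = DT":
        return "negative or abstain"     # case (3): null outcome
    return "negative"                    # cases (4)-(6)

print(product_vote("DT = AI < AT"))  # affirm
print(product_vote("DT < AI = AT"))  # negative
```

Applied over all product aspects, such a tally is what the text means by assessing how well the conclusions "vote" for the basic suppositions.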
Thus, interpreting the study's findings according to
the basic suppositions consists of assessing how well the
research conclusions have borne out the basic suppositions
and how well the experimental evidence substantiates the
general beliefs.  On the whole, the study's findings soundly
support the general beliefs presented above, although a few
conclusions exist that are inconsistent with the basic
suppositions or difficult to allay individually.
Support for the general beliefs was relatively stronger
on process aspects than on product aspects, and in location
comparisons rather than in dispersion comparisons.
Overwhelming support came in the category of location
comparisons on process aspects, in which the research
conclusions are distinguished by extremely low critical
levels and by near unanimity with the basic supposition.  In
the category of dispersion comparisons on process aspects,
only two outcomes indicated any distinction among the
groups: one aspect supported the study's general beliefs and
one aspect showed an explainable exception to them.  Fairly
strong support also came in the category of location
comparisons on product aspects, for which the only negative
evidence (besides the neutral AI = AT = DT conclusions)
appeared in the form of several AI ≠ AT = DT conclusions.
They indicate some areas in which the disciplined
methodology was apparently ineffective in modifying a team's
behavior toward that of an individual, probably due to a
lack of fully developed training/experience with the
methodology.  Comparatively weaker support for the study's
beliefs was recorded in the category of dispersion
comparisons on product aspects.  Although the basic
suppositions were borne out in a number of the conclusions,
there were also several distinctions of various forms which
contravene the basic suppositions.
Thus, according to this interpretation, the study's
findings strongly substantiate the claims that
(C1) methodological discipline is the key influence on
the general efficiency of the software development
process, and that
(C2) the disciplined methodology significantly reduces
the material costs of software development.
The claims that
(C3) mental cohesiveness is the key influence on the
general quality of the software development
product, that
(C4) relative to an individual, an ad hoc team is
mentally burdened by organizational overhead, and
that
(C5) the disciplined methodology offsets the mental
burden of organizational overhead and enables a
team to behave more like an individual relative to
the software product,
are moderately substantiated by the study's findings, with
particularly mixed evidence for dispersion comparisons on
product aspects.
It should be noted that there is a simpler, better-
supported interpretive model for the location results alone.
With the beliefs that a disciplined methodology provides for
the minimum process cost and results in a product which in
some aspects approximates the product of an individual and
at worst approximates the product developed by an ad hoc
team, the suppositions are DT < AI, AT with respect to
process and AI ≤ DT < AT or AT ≤ DT < AI with respect to
product.  The study's findings support these suppositions
without exception.
According to a Programming-Aspect Classification
It is desirable to examine the study's findings in view
of the way that higher-level programming issues are
reflected among the individual programming aspects.  For
this purpose, the aspects considered in this study were
grouped into (so-called) programming aspect classes.  Each
class consists of aspects which are related by some common
feature (for example, all aspects relating to the program's
statements, statement types, statement nesting, etc.), and
the classes are not necessarily disjoint (i.e., a given
aspect may be included in two or more classes).  A unique
higher-level programming issue (in the example, control
structure organization) is associated with each class.
The programming aspects of this study were organized
into a hierarchy of nine aspect classes (with about 10%
overlap overall), outlined as follows:
Higher-Level Programming Issues                    Aspect Class

Development Process Efficiency
    Effort (Job Steps)  . . . . . . . . . . . . . .   I
    Errors (Program Changes)  . . . . . . . . . . .   II
Final Product Quality
    Gross Size  . . . . . . . . . . . . . . . . . .   III
    Control-Construct Structure . . . . . . . . . .   IV
    Data Variable Organization  . . . . . . . . . .   V
    Modularity
        Packaging Structure . . . . . . . . . . . .   VI
        Invocation Organization . . . . . . . . . .   VII
    Inter-Segment Communication
        Via Parameters  . . . . . . . . . . . . . .   VIII
        Via Global Variables  . . . . . . . . . . .   IX
The individual aspects comprising each class, together with
the corresponding conclusions, are listed by classes in
Tables 6.1 through 6.9.  For each aspect class, it is
interesting to jointly interpret the individual outcomes in
an overall manner in order to see something of how these
higher-level issues are affected by team size and
methodological discipline.
Class I: Effort (Job Steps)
Within Class I (process aspects dealing with COMPUTER
JOB STEPS), there is strong evidence of an important
difference among the groups, in favor of the disciplined
methodology, with respect to average development costs.  As
a class, these aspects directly reflect the frequency of
computer system activities (i.e., module compilations and
test program executions) during development.  They are one
possible way of measuring machine costs, in units of basic
activities rather than monetary charges.  Assuming that each
computer system activity involves a certain expenditure of
the programmer's time and effort (e.g., effective terminal
contact, test result evaluation), these aspects indirectly
reflect human costs of development (at least that portion
exclusive of design work).
The strength of the evidence supporting a difference
with respect to location comparisons within this class is
TABLE 6

[Table content is illegible in the scanned source.]
based on both (a) the near unanimity [8 out of 9 aspects] of
the DT < AI = AT outcome and (b) the very low critical
levels [<.025 for 5 aspects] involved.  Indeed, the single
exception among the location comparisons (AI = AT = DT on
COMPUTER JOB STEPS\MODULE COMPILATIONS\IDENTICAL) is readily
explained as a direct consequence of the fact that all teams
made essentially similar usage (or nonuse, in this case,
since identical compilations were not uncommon) of the on-
line storage capability (for saving relocatable modules and
thus avoiding identical recompilations).  This was expected
since all teams had been provided with identical storage
capability, but without any training or urging to use it.
The conclusions on location comparisons within this class
are interpreted as demonstrating that
employment of the disciplined methodology by a
programming team reduces the average costs, both
machine and human, of software development, relative to
both individual programmers and programming teams not
employing the methodology.
Examination of the raw data scores themselves indicates the
magnitude of this reduction to be on the order of 2 to 1
(i.e., 50%) or better.
With respect to dispersion comparisons within this
class, the evidence generally failed to make any
distinctions among the groups [AI = AT = DT on 7 out of 9
aspects].  These null conclusions in dispersion comparisons
are interpreted as demonstrating that
variability of software development costs, especially
machine costs, is relatively insensitive to programming
team size and degree of methodological discipline.
The two exceptions on individual process aspects deserve
mention.  The COMPUTER JOB STEPS\MISCELLANEOUS aspect showed
an AT = DT < AI dispersion distinction among the groups,
reflecting the variability (as expected) of individual
programmers relative to programming teams in the area of
building on-line tools to indirectly support software
development (e.g., stand-alone module drivers, one-shot
auxiliary computations, table generators, unanticipated
debugging stubs, etc.).  The MAXIMUM UNIQUE COMPILATIONS FOR
ANY ONE MODULE aspect showed a DT < AI = AT dispersion
distinction among the groups at an extremely low critical
level [<.005], reflecting the lower variation (increased
predictability) of the disciplined teams relative to the ad
hoc teams and individuals in terms of "worst case"
compilation costs for any one module.  The additional
AI < AT distinction for this comparison is attributable to
the fact that several teams in group AT built monolithic
single-module systems, yielding rather inflated raw scores
for this aspect.
Class II: Errors (Program Changes)
Within Class II (the process aspect PROGRAM CHANGES),
there is strong evidence of an important difference among
the groups, again in favor of the disciplined methodology,
with respect to average number of errors encountered during
implementation.  Chapter V contains a detailed explanation
of how program changes are counted.  This aspect directly
reflects the amount of textual revision to the source code
during (postdesign) development.  Claiming that textual
revisions are generally necessitated by errors encountered
while building, testing, and debugging software, independent
research [Dunsmore and Gannon 77] has demonstrated a high
(rank order) correlation between total program changes (as
counted automatically according to a specific algorithm) and
total error occurrences (as tabulated manually from
exhaustive scrutiny of source code and test results) during
software implementation.  This aspect is thus a reasonable
measure of the relative number of programming errors
encountered outside of design work.  Assuming that each
TABLE 6

[Table content is illegible in the scanned source.]
textual revision involves an expenditure of programmer
effort (e.g., planning the revision, on-line editing of
source code), this aspect indirectly reflects the level of
human effort devoted to implementation.
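The program-changes measure is defined by a specific algorithm detailed in Chapter V. Purely to illustrate the underlying idea, counting textual revisions between successive versions of a source file, one could sketch it with a line-based diff; this is not the study's algorithm:

```python
import difflib

def textual_revisions(old_version, new_version):
    """Count lines inserted, deleted, or replaced between two
    successive versions of a source file (illustrative only)."""
    sm = difflib.SequenceMatcher(a=old_version.splitlines(),
                                 b=new_version.splitlines())
    changed = 0
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed

# Hypothetical successive versions of a source fragment:
v1 = "X := 1\nY := 2\nCALL P(X, Y)\n"
v2 = "X := 1\nY := 3\nZ := 0\nCALL P(X, Y)\n"
print(textual_revisions(v1, v2))  # 2: one line replaced, one inserted
```

Summing such counts over all revisions made during development gives a rough textual-revision total in the spirit of, though not identical to, the study's PROGRAM CHANGES aspect.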
With respect to location comparison, the strength of
the evidence supporting a difference among the groups is
based on the very low critical level [<.005] for the
DT < AI = AT outcome.  The additional trend toward AI < AT
is much less pronounced in the data.  The interpretation is
that
the disciplined methodology effectively reduced the
average number of errors encountered during software
implementation.
This was expected since the methodology purposely emphasizes
the criticality of the design phase and subjects the
software design (code) to thorough reading and review prior
to coding (key-in or testing), enhancing error detection and
correction prior to implementation (testing).
With respect to dispersion comparison, no distinction
among the groups was apparent, with the interpretation that
variability in the number of errors encountered during
implementation was essentially uniform across all three
programming environments considered.
Class III: Gross Size
Within Class III (product aspects dealing with the
gross size of the software at various hierarchical levels),
there is evidence of certain consistent differences among
the groups with respect to both average size and variability
of size.  As a class, these aspects directly reflect the
number of objects and the average number of component
(sub)objects per object, according to the hierarchical
TABLE 6

[Table content is illegible in the scanned source.]
organization (imposed by the programming language) of
software into objects such as modules, segments, data
variables, lines, statements, and tokens.
With respect to location comparisons within this class,
the non-null conclusions [7 out of 17 aspects] are nearly
unanimous [5 out of 7] in the AI < AT = DT outcome.  The
interpretation is that individuals tend to produce software
which is smaller (in certain ways) on the average than that
produced by teams.  It is unclear whether such spareness of
expression, primarily in segments, global variables, and
formal parameters, is advantageous or not.  The two non-null
exceptions to this AI < AT = DT trend deserve mention, since
the one is only nominally exceptional and actually
supportive of the tendency upon closer inspection, while the
other indicates a size aspect in which the disciplined
methodology enabled programming teams to break out of the
pattern of distinction from individual programmers.  The
AT = DT < AI outcome on AVERAGE STATEMENTS PER SEGMENT is a
simple consequence of the outcome for the number of
STATEMENTS [AI = AT = DT] and the outcome for the number of
SEGMENTS [AI < AT = DT], and it still fits the overall
pattern of AI ≠ AT = DT on location differences on size
aspects.  On the LINES aspect, the DT = AI < AT distinction
breaks the pattern since DT is associated with AI and not
with AT.  Since the number of statements was roughly the
same for all three groups, this difference must be due
mainly to the stylistic manner of arranging the source code
(which was free-format with respect to line boundaries), to
the amount of documentation comments within the source code,
and to the number of lines taken up in data variable
declarations.
With respect to dispersion comparisons within this
class, the few aspects which do indicate any distinction
among the groups [5 out of 17 aspects] seem to concur on the
AI = AT < DT outcome.  This pattern, which associates
increased variation in certain size aspects with the
disciplined methodology, is somewhat surprising and lacks an
intuitive explanation in terms of the experimental
treatments.  The exception DT = AI < AT on AVERAGE SEGMENTS
PER MODULE is really an exaggeration due to the fact of
several AT teams implementing monolithic single-module
systems, as mentioned above.  The exception AT < DT = AI on
STATEMENTS is only a very slight trend, reflecting the fact
that the AT products rather consistently contained the
largest numbers of statements.
One overall observation for Class III is that while
certain distinctions did consistently appear (especially for
location but also for dispersion comparisons) at the middle
levels of the hierarchical scale (segments, data variables,
lines, and statements), no distinctions appeared at either
the highest (modules) or lowest (tokens) levels of size.
The null conclusions for size in modules and average module
size seem attributable to the fact that particular
programming tasks or application domains often have standard
designs at the topmost conceptual levels which strongly
influence the organization of software systems at this
highest level of gross size.  In this case, the symbol-
table/scanning/parsing/code-generation design is extremely
common for language translation problems (i.e., compilers),
regardless of the particular parsing technique or symbol
table organization employed, and the modules of nearly every
system in the study directly reflected this common design.
The null conclusions for size in tokens are interpretable in
view of Halstead's software science concepts [Halstead 77],
according to which the program length N is predictable
from the number of basic input-output parameters η2* and
the language level λ.  Since the functional specification,
119
CHAP T EP VIII
3 lication area, and implementation Language were all fixed
in this study, uoth rl 2* and X shoulo be constant for
each of tne software systems, imolying virtually constant
crDrdm lengths N Since program length N can oe
re-:aroea as roughly equivalent to the numoer of tokens in a
proram, the study's oata seem to support the software
science concepts in this instance.
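Halstead's length relationships can be illustrated concretely. The sketch below is hypothetical (the study's own instrumentation analyzed SIMPL-T source, and the token stream and operator set here are invented); it shows how the observed length N and the estimated length N̂ = η1·log2(η1) + η2·log2(η2) follow from token counts:

```python
import math
from collections import Counter

def halstead_length(tokens, operators):
    """Observed length N (total tokens) and Halstead's estimate
    N-hat = n1*log2(n1) + n2*log2(n2), where n1 and n2 are the
    counts of distinct operators and distinct operands."""
    counts = Counter(tokens)
    n1 = sum(1 for t in counts if t in operators)  # distinct operators
    n2 = len(counts) - n1                          # distinct operands
    N = sum(counts.values())                       # observed length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2) if n1 and n2 else 0.0
    return N, N_hat

# invented token stream for "x = x + 1 ; y = x * 2"
toks = ["x", "=", "x", "+", "1", ";", "y", "=", "x", "*", "2"]
N, N_hat = halstead_length(toks, operators={"=", "+", "*", ";"})
# N is 11; with 4 distinct operators and 4 distinct operands, N_hat is 16.0
```

Under fixed specification, application area, and language, the distinct-symbol counts change little from system to system, which is why nearly constant lengths N are expected.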
Class IV: Control-Construct Organization

Within Class IV (product aspects dealing with the
software's organization according to statements, constructs,
and control structures), there are only a few distinctions
made between the groups.
With respect to location comparisons, the few [5 out of
24] aspects that showed any distinction at all were
unanimous in concluding DT = AI < AT. Essentially, three
particular issues were involved. The STATEMENT TYPE COUNTS\
IF, STATEMENT TYPE PERCENTAGES\IF, and DECISIONS aspects are
all related to the frequency of programmer-coded decisions
in the software product. Their common outcome DT = AI < AT
is interpreted as demonstrating an important area in which
the disciplined methodology causes a programming team to
behave like an individual programmer. The number of
decisions has been commonly accepted, and even formalized
[McCabe 76], as a measure of program complexity, since more
decisions create more paths through the code. Thus, the
disciplined methodology effectively reduced the average
complexity from what it otherwise would have been. The
STATEMENT TYPE COUNTS\RETURN aspect indicates a difference
between the ad hoc teams and the other two groups. Since
the EXIT and RETURN statements are restricted forms of
GOTOs, this difference seems to hint at another area in
which the disciplined methodology improves conceptual
TABLE 6
control over program structure. The STATEMENT TYPE COUNTS\
(PROC)CALL\INTRINSIC aspect also indicates a slight trend in
the area of the frequency of input-output operations, which
seems interpretable only as a result of stylistic
differences.
With respect to dispersion comparisons, only two
particular issues were involved. The STATEMENT TYPE COUNTS\
RETURN and STATEMENT TYPE PERCENTAGES\RETURN aspects both
indicated a strong DT = AI < AT difference, suggesting that
the frequency of these restricted GOTOs is an area in which
the disciplined methodology reduces variability, causing a
programming team to behave more like an individual
programmer. The STATEMENT TYPE COUNTS\(PROC)CALL and
STATEMENT TYPE COUNTS\(PROC)CALL\NONINTRINSIC aspects both
showed a DT < AI = AT distinction among the groups, which is
dealt with more appropriately within Class VII below.
In summary of Class IV, the interpretation is that the
functional component of control-construct organization is
largely unaffected by team size and methodological
discipline, probably due to the overriding effect of
project/task uniformity/commonality. However, two facets of
the control component that were influenced were the
frequency of decisions (especially IF statements) and the
frequency of restricted GOTOs (especially RETURN
statements). For these aspects, the disciplined methodology
seems to have altered the size of the program's control
structure (and reduced its complexity) from that of a team's
product to that of an individual's product.
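As a concrete illustration of the statement-type aspects discussed above, the following sketch computes raw counts, a decision total, and a percentage from a stream of statement types. The statement names and the set counted as decisions are illustrative assumptions, not the study's exact definitions:

```python
from collections import Counter

def statement_type_profile(stmts, decision_types=("IF", "WHILE", "CASE")):
    """Raw statement-type counts (cf. STATEMENT TYPE COUNTS\\IF), a
    decision total (cf. DECISIONS), and the IF percentage
    (cf. STATEMENT TYPE PERCENTAGES\\IF) for one product."""
    counts = Counter(stmts)
    decisions = sum(counts[t] for t in decision_types)  # missing types count 0
    pct_if = 100.0 * counts["IF"] / len(stmts)
    return counts, decisions, pct_if

# invented statement stream for a small segment
stream = ["IF", "ASSIGN", "WHILE", "ASSIGN", "IF", "RETURN", "ASSIGN", "CALL"]
counts, decisions, pct_if = statement_type_profile(stream)
# 2 IFs + 1 WHILE -> decisions == 3; IFs are 25.0 percent of 8 statements
```

A lower decision count per product, as observed for the DT and AI groups, translates directly into fewer paths through the code.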
Class V: 2a1a arAaolj tjzCa±.&jU2o2
within Class V (product aspects dealing with data
variaoles and their organization within the software), there
are several distinctions among the groups, with an overall
trend for both the location and dispersion comparisons.
Data variable organization was, however, not emphasized in
the disciplined methodology, nor in the academic course
which the participants in group DT were taking. With
respect to location comparisons, all aspects showing any
distinction at all were unanimous in concluding
AI < AT = DT. The trend for individuals to differ from
teams, regardless of the disciplined methodology, appears
not only for the total number of data variables declared,
but also for data variables at each scope level (global,
parameter, local) in one fashion or another. The difference
regarding formal parameters is especially prominent, since
it shows up for their raw count frequency, their normalized
percentage frequency, and their average frequency per
natural enclosure (segment). With respect to dispersion
comparisons, the apparent overall trend for aspects which
show a distinction is toward the AI = AT < DT outcome. No
particular interpretation in view of the experimental
treatments seems appropriate. Exceptions to this trend
appeared for both the raw count and percentage of call-by-
reference parameters (both AI < AT = DT), as well as two
other aspects.
Class VI: Packaging Structure

Within Class VI (product aspects dealing with
modularity in terms of the packaging structure), there are
essentially no distinctions among the groups, except for two
location comparison issues. Most of the aspects in this
class are also members of Class III, Gross Size, but are
(re)considered here to focus attention upon the packaging
characteristics of modularity (i.e., how the source code is
divided into modules and segments, what type of segments,
etc.). The disciplined methodology did not explicitly
include (nor did group DT's course work cover) concepts of
modularization or criteria for evaluating good modularity;
hence, no particular distinctions among the groups were
expected in this area (Classes VI and VII).

With respect to location comparisons, the AI < AT = DT
outcome for the SEGMENTS aspect, along with the companion
outcome AT = DT < AI for the AVERAGE STATEMENTS PER SEGMENT
aspect (as explained under Class III above), indicates one
area of packaging that is apparently sensitive to team size.
Individual programmers built the system with fewer, but
larger (on the average), segments than either the ad hoc
teams or the disciplined teams. The AI < AT = DT outcome
for the AVERAGE NONGLOBAL VARIABLES PER SEGMENT\PARAMETER
aspect indicates that average "calling sequence" length,
curiously enough, is another area of packaging sensitive to
team size. With respect to dispersion comparisons, there
really were no differences, since the single non-null
outcome for AVERAGE SEGMENTS PER MODULE is actually a fluke
(raw scores for AT are exaggerated by the several monolithic
systems) as explained above. The overall interpretation for
this class is that

    modularity, in the sense of packaging code into
    segments and modules, is essentially unaffected by team
    size or methodological discipline, except for a
    tendency by individual programmers toward fewer, longer
    segments than programming teams.
Class VII: Invocation Organization

Within Class VII (product aspects dealing with
modularity in terms of the invocation structure), there are
two distinction trends for location comparisons, but no
clear pattern for the dispersion comparison conclusions.
This class consists of raw counts and average-per-segment
frequencies for invocations (procedure CALL statements or
function references in expressions) and is considered
separately from the previous class since modularity involves
not only the manner in which the system is packaged, but
also the frequency with which the pieces might be invoked.
For the raw count frequencies of calls to intrinsic
procedures and intrinsic functions, the trend is for the
individuals and disciplined teams to exhibit fewer calls
than the ad hoc teams. These intrinsic procedures are
almost exclusively the input-output operations of the
language, while the intrinsic functions are mainly data type
conversion routines. The second trend for location
comparisons occurs for two aspects (a third aspect is
actually redundant) related to the average frequency of
calls to programmer-defined routines, in which the
individuals display higher average frequency than either
type of team. This seems coupled with group AI's preference
for fewer but larger routines, as noted above. With respect
to dispersion comparisons, several distinctions appear
within this class, but no overall interpretation is readily
apparent (except for a consistent reflection of a DT < AI
difference, with AT falling in between, leaning to one side
or the other).
Class VIII: Inter-Segment Communication via Parameters

Within Class VIII (product aspects dealing with inter-
segment communication via formal parameters), there are only
a few distinctions among the groups. With respect to
location comparisons, the total frequency of parameters and
the average frequency of parameters per segment both show a
difference. The interpretation is that

    the individual programmers tend to incorporate less
    inter-segment communication via parameters, on the
    average, than either the ad hoc or the disciplined
    programming teams.

With respect to dispersion comparisons, in addition to the
difference in the raw count of parameters referred to in
Class V, there is a strong difference in the variability of
the number of call-by-reference parameters, also apparent in
the percentages-by-type-of-parameter aspects. The
interpretation is that

    the individual programmers were more consistent as a
    group in their use (in this case, avoidance) of
    reference parameters than either type of programming
    team.
Class IX: Inter-Segment Communication via Global Variables

Within Class IX (product aspects dealing with inter-
segment communication via global variables), there are
several differences among the groups, including two which
indicate the beneficial influence of the disciplined
methodology. This class is composed of aspects dealing with
absolute frequency of globals, average frequency of globals
per module, segment-global usage pairs (frequency of access
paths from segments to globals), and segment-global-segment
data bindings (frequency of communication paths between
segments via global variables).
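These two aspect categories can be made precise with a small sketch. Assuming each segment's modified and referenced globals are known (the segment names and data below are invented, and this is only one plausible formalization of the aspects named above), usage pairs and data bindings reduce to set constructions:

```python
def usage_pairs(uses):
    """Segment-global usage pairs: one pair per access path from a
    segment to a global variable it uses."""
    return {(seg, g) for seg, gs in uses.items() for g in gs}

def data_bindings(mods, refs):
    """Segment-global-segment data bindings: triples (p, g, q) where
    segment p modifies global g and a different segment q references
    g -- a potential communication path from p to q via g."""
    return {(p, g, q)
            for p, gmods in mods.items()
            for g in gmods
            for q, grefs in refs.items()
            if q != p and g in grefs}

# invented compiler-like example
mods = {"scan": {"symtab"}, "parse": {"symtab", "errcnt"}}
refs = {"parse": {"symtab"}, "codegen": {"symtab", "errcnt"}}
bindings = data_bindings(mods, refs)
# 4 bindings, e.g. ("parse", "errcnt", "codegen")
```

Fewer possible bindings means fewer channels through which one segment can silently affect another, which is why lower counts are read as reduced data coupling.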
With respect to location comparisons, there is the
AI < AT = DT distinction in sheer numbers of globals,
particularly globals which are modified during execution, as
noted in Class V. However, when averaged per module, there
appears to be no distinction in the frequency of globals.
The AI < AT = DT difference in the number of possible
segment-global access paths makes sense as the result of
group AI having both fewer segments and fewer globals. All
three groups had essentially similar average levels of
actual segment-global access paths, but several differences
appear in the relative percentage (actual-to-possible ratio)
category. These three instances of AT < DT = AI differences
indicate that the degree of "globality" for global variables
was higher for the individuals and the disciplined teams
than for the ad hoc teams. Finally, another AT > DT = AI
difference appears for the frequency of possible segment-
global-segment data bindings, indicating a positive effect
of the disciplined methodology in reducing the possible data
coupling among segments. It may be noted that these last
two categories of aspects, segment-global usage relative
percentages and segment-global-segment data bindings, also
reflect upon the quality of modularization, since good
modularity should promote the degree of "globality" for
globals and minimize the data coupling among segments. The
interpretation here is that

    certain aspects of inter-segment communication via
    globals seem to be positively influenced, on the
    average, by the disciplined methodology.

With respect to dispersion comparisons, there is a
diversity of differences in this class, without any unifying
interpretation in terms of the experimental treatments.
The cyclomatic complexity and software science metrics,
whose results have not been integrated into the two
interpretive frameworks discussed above, definitely merit
some interpretation.

On location comparisons, the results for the cyclomatic
complexity measures exhibited a common underlying trend,
namely, DT < AT < AI. In fact, the non-null outcomes were
usually either AT = DT < AI or else DT < AI = AT. This says
that either the teams were differentiated from the
individuals or else the disciplined methodology was
differentiated from the ad hoc approach, depending on the
particular variation of cyclomatic complexity involved.
This corresponds well with the intuition that team
programming alone should force a general reduction of
cyclomatic complexity for individual routines, and that use
of the disciplined methodology within team programming
should promote this effect even further. The observed
results for the cyclomatic complexity metrics seem to
display this kind of behavior.
The generally weaker differentiation (i.e., larger
critical levels) observed for the cyclomatic complexity
aspects relative to other aspects considered in the study is
quite understandable in light of the fact that all 19
systems were coded in a structured-programming language
which greatly restricts potential control flow patterns. We
would expect cyclomatic complexity metrics to be more useful
in the context of unrestrictive programming languages such
as Fortran.
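McCabe's measure itself is simple to state: for a connected control-flow graph with e edges and n nodes (single entry, single exit), v(G) = e - n + 2, which for structured code equals the number of decisions plus one. A minimal sketch with an invented flow graph:

```python
def cyclomatic(n_nodes, edges):
    """McCabe's cyclomatic complexity v(G) = e - n + 2 for one
    connected control-flow graph with a single entry and exit."""
    return len(edges) - n_nodes + 2

# invented flow graph of a routine containing one IF-THEN-ELSE
edges = [("entry", "then"), ("entry", "else"),
         ("then", "exit"), ("else", "exit")]
v = cyclomatic(4, edges)
# one decision -> v == 2
```

Because a structured language forbids arbitrary jumps, every routine's graph is built from a few nested patterns, compressing the range of attainable v(G) values and hence the metric's power to separate groups.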
The results for the software science quantities are
somewhat disappointing: surprisingly few distinctions among
the groups were obtained. On location comparisons, only the
vocabulary and estimated length metrics (the latter is a
function solely of the former) yielded non-null conclusions.
Their AI < AT = DT outcome corresponds to that obtained for
the number of segments and data variables, both of which
contribute heavily to the number of "operators/operands."
The overall interpretation here is that the software science
metrics appear to be insensitive to differences in how
software is developed. Maybe these measures, with their
actuarial nature, are sensitive only to gross factors in
software development (e.g., project, application area,
implementation language), all of which were held constant in
this study.
CHAPTER IX
A practical methodology was designed and developed for
experimentally and quantitatively investigating software
development phenomena. It was employed to compare three
particular software development environments and to evaluate
the relative impact of a particular disciplined methodology
(made up of so-called modern programming practices). The
experiments were successful in measuring differences among
programming environments, and the results support the claim
that the disciplined methodology effectively improves both the
process and product of software development.
One way to substantiate the claim for improved process
is to measure the effectiveness of the particular
programming methodology via the number of bugs initially in
the system (i.e., in the initial source code) and the amount
of effort required to remove them. (This criterion has been
suggested independently by Professor M. Shooman of the
Polytechnic Institute of New York [Shooman 73].) Although
neither of these measures was directly computed, they are
each closely associated with one of the process aspects
considered in the study: PROGRAM CHANGES and COMPUTER JOB
STEPS\ESSENTIAL, respectively. The location comparison
statistical conclusions for both these aspects affirmed
DT < AI = AT outcomes at very low (<.01) significance
levels, indicating that on the average the disciplined teams
measured lower than either the ad hoc individuals or the ad
hoc teams, which both measured about the same. Thus, the
evidence collected in this study strongly confirms the
effectiveness of the disciplined methodology in building
reliable software efficiently.
The second claim, that the product of a disciplined
team should closely resemble that of a single individual,
since the disciplined methodology assures a semblance of
conceptual integrity within a programming team, was
partially substantiated. In many product aspects the
products developed using the disciplined methodology were
either similar to or tended toward the products developed by
the individuals. In no case did any of the measures show
the disciplined teams' products to be worse than those
developed by the ad hoc teams. It is felt that the
superficiality of most of the product measures was chiefly
responsible for the lack of stronger support for this second
claim. The need for product measures with increased
sensitivity to critical characteristics of software is very
clear.
The results of these experiments will be used to guide
further experiments and will act as a basis for analysis of
software development products and processes in the Software
Engineering Laboratory at NASA's Goddard Space Flight Center
[Basili et al. 77]. The intention is to pursue this type of
empirical research, especially extending the study to more
sophisticated and promising software metrics.
APPENDIX I

[raw data tables; content illegible in the scanned copy]
REFERENCES
ti33Ker 72) F.T. Baker. Chief programmer team management of
production Programming. H-T SY eMA !JVuAn, vol. 11,
no. 1 (1972), pp. 56-73.
[63ker 75) C.T. Baker. Structured programming in a
production programming environment. EE IrftA!c1tioQ1
oQ loftwari E n~teina, vol. 1, no. 2 (June 1975), pp.
Z41-252.
[33siLi baker 77] V.R. Basiti and F.T. Baker. Tutorial
of Structured Prg2rrmmin . IEEE Catalog No.
75CH1O49-6, Tutorial from the Eleventh IEEE Computer
Society Conference (COMPCON 75 Fall), revised 1977.
[!3sili 9 Reiter 7S] V.R. basili and R.W. Reiter, Jr.
Investigating software development approaches.
Technical Report TR-688, Department of Computer
Science, University of maryland, August, 197a.
C33sili 5 Peiter 79aJ V.R. assili and R.W. Reiter, Jr.
&valuating automatable measures of software
oevelopment. Proceedings of the IEEE/PoLy Workshop on
uantitative Software Models for Reliability,
Complexity, and Cost (October 1979), Kiameshia Lake,
New York.
lasili F, Reiter 79b] V.R. Basiti and R.W. Reiter, Jr. An
investigation of nuvan factors in software development.
-o2uMtI _ "agazine, vol. 12, no. 12 (December 1979), pp.
[dasiLi 9. Reiter 8C] V.R. 6asili and R.W. Reitert Jr, A
controlled experiment quantitatively comparing software
development approaches, to be published in [
ftra!3 1i-202 20 k29-t[r.- Enginjerin2, 1q60.
[33sili . Turner 75] V.P. basili and A.J. Turner.
Iterative enhancement: a practical technioue for
software developmento ILLE TIaD1a1ig 2oz gO e
E!i!nejin/, vol. 1, no. 4 (December 1975), po,
137
REFERENCES
[aasiLi ? Turner 76) V.R. Basili and A.J. Turner. SIMPL-I,
A Structured Prrammino Lafn uaae. Geneva, Illinois:
P3tain House, 1976.
[63sili K Zelkowitz 78] V.R. 9asili and M.V. Zelkowitz.
Analyzing medium-scale software development. IEEE
Catalog No. 7?CH1317-7C, Proceedings of the Thira
international Conference on Software Engineering (may
1975), AtLanta, Georgia, pp. 116-123.
[i3sili S. Zetkowitz 79] V.R. 9asili and M.V. Zelkowitz.
Measuring software develooment characteristics in the
Local environment. C p er _ tu r vol. 10
(1979), pp. 39-43.
(53sili et at. 77) v.R. Basili, M.v. Zelkowitz, F.E.
McGarry, R.W. Reiter, Jr., W.F. Truszkowski, and D.L.
weiss. The software engineering Laboratory. Technical
Report TR-535, Department of Computer Science,
University of maryland, r'-ay, 1977.
[Belady 9 Lehman 761 L.A. Belady and M.M. Lehman. A model
of large program development. IBM In J2murn
vol. 15, no. 3 (1975), Dp. 225-251.
( ery 73J C. Berge. Graphs and Ho2erra2hs. Amsterdam,
The Netherlands: North-Holland, 1973.
[arooks 75J F.P. 9 rooks, Jr. The MyIhicpi Tan-Aonth.
Reading, Massachusetts: Adoison-Wesley, 1975.
[Camooeli 1, Stanley 63) D.T. Campbell and J.C. Stanley.
x6xerimentat and Qujai-E xojr.Linta Desins fgr
rIearch. Reprintei from Handbook of Research on
I ifjbinS , edited by N.L. Gage. Chicago, Illinois: Rand
McNally, 1963.
[Conover 71) w.J. Conover. Prlctija_1 N2noaramtrik
jtttii. New York, %ew york: John wiley & Sons,
1 ?71
[Curtis et at. 79) 9. Curtis, S.3. Sheppard, P. MilLiman,
M.A. Borst, and T. Love. Measuring the psychological
138
RE FERENCES
complexity of software maintenance tasks with Halstead
and McCabe metrics. 1EL _r ania 2n o1.w £
fooqine.riog, voL. 5, no. 2 (March 1979), pp. 95-104.
[D3hlo Dijkstra S Hoare 72) 0.-J. Dahl, E.W. Dijkstra, and
C.A.R. Hoare. St. ruCuLrte Proi£aMming New York, New
York: Academic Press, 1972.
[D3Ley 77) E.B. Daley. Management of software development.
IEEE Trans ctlionj 20 S2tta:r Jniineerjnq, vol. 3, no.
3 ('ay 1977), pp. 229-242.
[Dinsmore 78) H.E. Duns1ore. The influence of programming
factors on programming complexity. Ph.D. Dissertation.
Department of Computer Science, University of Maryland,
July, 1078. available as Technical Report TR-679.
[Dunsmore & Gannon 77) H.E. Dunsmore and J.D. Gannon.
Lxperimental investigation of programming complexity.
Proceedings of the ACM/NBS Sixteenth Annual Technical
Symposium: Systems and Software (June 1q77),
washington, D.C., po. 117-125.
[Elshoff 70a) JL. ELshoff. Measuring commercial PL/I
programs using 4alstead's criteria. AC"M SIGPLAN
No1ice, vol. 11, no. 5 (M ay 1976), pp. 38-46.
[Elshoff 76b] JoL. Elshoff. An analysis of some commercial
FL/i programs. IEEE Transactions on Software
.njineerin., vol. 2, no. 2 (June 1 76), pp. 113-12C,
[F3gan 76) ".E. Fagan. Design and code inspections to
reauce errors in program development. W! S_1CeMs
J2oE01n , vol. 15, no. 3 (19'6), pp. 132-211.
[Fitzsimmons & Love 78) A. Fitzsimmons and T. Love, A
review and evaluation of software science. C2ePutin
.uryex: vol. 10, no. 1 (1arch 1978), pp. 3-18.
[Gannon 75) J.D. Gannon. Language design to enhance
programming reliability. Ph.D. Thesis. Department of
Computer Sciencet University of Toronto, January, 1975.
available as Technical Report CSRG-47.
[Gannon 77) J.D. Gannon. An experimental evaluation of
139
RE FERENCES
uata tyoe conv'.ntions. £QmnEi idLiO1 Q 2 h ILL Vol.
.. , no. 6 (August 1777), op. 5o4-595.
[Gannon ?. Horninq 75) J.D. Gannon and J.J. Homing.
Language design for programming reliability. 1on sf e Enineerin, vol. 1, no. 2
(June 1975), pp. 179-191.
[Gilo 772 T. Gilb. Sofkare Metrics. Cambridge,
Massachusetts: Winthrop, 1977.
[Gordon 79) R.D. Gordon. A qualitative justification for a
measure Of program clarity. UEE _rPQs1JQ!on on
2fiware EnQineerina, vol. 5, no. 2 (varch 1979), pp.
121-12A.
[Green 77] T.R.G. Green. Conditional program statf ments
and their comprehensibility to professional
programmers. Journal of Occupational PsYct21Q29, vol.
50 (1977), pp. 93-1)9.
[HaLstead 77) M. Halstead. ELt~ens 2 2ltare Sience.
New York, New York: ELsevier, 1977.
[Hughes 792 J.W. Hughes. A formalization and explication
of the "ichael Jackson method of program design.
Software--Pra ic Exerience, vol. 9, no. 3 (March
1 79) , op. 191-202.
CJackson 75] %'.A. Jackson. Principles of Program Desi2n.
New York, New York: Academic Press, 1075.
(Kirk 53) R.E. Kirk. Jx12r:inen.J!J Desiqn: Pro eLres 12r
th 3eta rat Sciienqej. Belmont, California:
4 aisworth, 1956 .
[Knutn 71) D.E. Knuth. An empirical study of FORTRANprograms. VofIware--PraCt _n _ ri n , vol. 1,
no. 2 (April-June 1971), pp. 105-133.
(Lin-ier, Mills , witt 79) R.C. Linger, H.D. Mills, and 9.I.
witto StrucJ~ureo Prog!rgmminZ: Ibrc2!r tog !rS-Jl
Reading, "assachusetts: Addison-wesley, 1979.
[Love ! nowman 76) L.T. Love and A.B. dowman. An
independent test of the theory of software physics.
140
REFERENCES
2M 1l No vol. 11, no. 11 (November 1976),
pp. 42-49.
[ ,cCdoe 76. T.J. McCabe. A complexity measure. IEEE
TaasA1.tigoi 20 glt.±.re En2i.nZering, vot 2, no. 4
(0ecember 1976), pp. 308-320.
[Mills 72] H.D. Mills. Mathematical foundations for
structured programming. FSC 72-6012, IBM Corporation,
Gaithersburg, Maryland, February, 1972.
[Mills 73] H.D. Mills. The complexity of programs. In
Program Test Methods, edited by W.C. Hetzel, pp.
225-238. Englewood Cliffs, New Jersey: Prentice-Hall,
1973.
[Mills 76] H.D. Mills. Software development. IEEE
Transactions on Software Engineering, vol. 2, no. 4
(December 1976), pp. 265-273.
[Myers 75] G.J. Myers. Reliable Software through Composite
Design. New York, New York: Petrocelli/Charter, 1975.
[Myers 77] G.J. Myers. An extension to the cyclomatic
measure of program complexity. ACM SIGPLAN Notices,
vol. 12, no. 10 (October 1977), pp. 61-64.
[Myers 78] G.J. Myers. A controlled experiment in program
testing and code walkthroughs/inspections.
Communications of the ACM, vol. 21, no. 9 (September
1978), pp. 760-768.
[Nemenyi et al. 77] P. Nemenyi, S.K. Dixon, N.B. White,
Jr., and M.L. Hedstrom. Statistics from Scratch. San
Francisco, California: Holden-Day, 1977.
[Oldehoeft 77] R.R. Oldehoeft. A contrast between language
level measures. IEEE Transactions on Software
Engineering, vol. 3, no. 6 (November 1977), pp.
476-478.
[Ostle & Mensing 75] B. Ostle and R.W. Mensing. Statistics
in Research, Third Edition. Ames, Iowa: Iowa State
University Press, 1975.
[Putnam 78] L.H. Putnam. A general empirical solution to
the macro software sizing and estimating problem. IEEE
Transactions on Software Engineering, vol. 4, no. 4
(July 1978), pp. 345-361.
[Reiter 79] R.W. Reiter, Jr. Empirical investigation of
computer program development approaches and computer
programming metrics. Ph.D. Dissertation. Department
of Computer Science, University of Maryland, December,
1979.
[Sheppard et al. 79] S.B. Sheppard, B. Curtis, P. Milliman,
and T. Love. Modern coding practices and programmer
performance. Computer Magazine, vol. 12, no. 12
(December 1979), pp. 41-49.
[Shneiderman 76] B. Shneiderman. Exploratory experiments
in programmer behavior. International Journal of
Computer and Information Sciences, vol. 5, no. 2 (June
1976), pp. 123-143.
[Shneiderman 80] B. Shneiderman. Software Psychology:
Human Factors in Computer and Information Systems.
Cambridge, Massachusetts: Winthrop, 1980.
[Shneiderman et al. 77] B. Shneiderman, R. Mayer, D. McKay,
and P. Heller. Experimental investigations of the
utility of detailed flowcharts in programming.
Communications of the ACM, vol. 20, no. 6 (June 1977),
pp. 373-381.
[Shooman 78] M. Shooman. Private communication, in
conjunction with a colloquium (on Software Reliability
Models) given at the Department of Computer Science,
University of Maryland, October, 1978.
[Siegel 56] S. Siegel. Nonparametric Statistics for the
Behavioral Sciences. New York, New York: McGraw-Hill,
1956.
[Sime, Green & Guest 73] M.E. Sime, T.R.G. Green, and D.J.
Guest. Psychological evaluation of two conditional
constructs used in computer languages. International
Journal of Man-Machine Studies, vol. 5, no. 1 (January
1973), pp. 105-113.
[Simon 69] H.A. Simon. The architecture of complexity. In
The Sciences of the Artificial, pp. 84-118. Cambridge,
Massachusetts: MIT Press, 1969.
[Stevens 46] S.S. Stevens. On the theory of scales of
measurement. Science, vol. 103 (1946), pp. 677-680.
[Stevens, Myers & Constantine 74] W.P. Stevens, G.J. Myers,
and L.L. Constantine. Structured design. IBM Systems
Journal, vol. 13, no. 2 (1974), pp. 115-139.
[Tukey 69] J.W. Tukey. Analyzing data: sanctification or
detective work? American Psychologist, vol. 24, no. 2
(February 1969), pp. 83-91.
[Turner 76] A.J. Turner, Jr. Iterative enhancement: a
practical technique for software development. Ph.D.
Dissertation. Department of Computer Science,
University of Maryland, May, 1976.
[Walston & Felix 77] C.E. Walston and C.P. Felix. A method
of programming measurement and estimation. IBM Systems
Journal, vol. 16, no. 1 (1977), pp. 54-73.
[Weinberg 71] G.M. Weinberg. The Psychology of Computer
Programming. New York, New York: Van Nostrand
Reinhold, 1971.
[Weissman 74a] L.M. Weissman. Psychological complexity of
computer programs: an experimental methodology. ACM
SIGPLAN Notices, vol. 9, no. 6 (June 1974), pp. 25-36.
[Weissman 74b] L.M. Weissman. A methodology for studying
the psychological complexity of computer programs.
Ph.D. Thesis. Department of Computer Science,
University of Toronto, August, 1974. Available as
Technical Report CSRG-37.
[Wirth 71] N. Wirth. Program development by stepwise
refinement. Communications of the ACM, vol. 14, no. 4
(April 1971), pp. 221-227.
CURRICULUM VITAE
" e: kobert Williai Reiter, Jr.
Permanent address: 900 Jamieson Road,
Lutherville, Maryland 21043.
Degree and date to be conferred: Ph.D., December, 1979.
Date of birth: June 7, 1950.
Place of birth: Baltimore, Maryland.
Secondary education: Loyola High School, Towson, Maryland,
June, 1968.
Collegiate Institutions      Dates    Degree  Date of Degree
Massachusetts Institute
  of Technology              1968-72  S.B.    June, 1972.
University of Maryland       1972-76  M.S.    May, 1976.
University of Maryland       1976-79  Ph.D.   December, 1979.
Major: Computer Science.
Professional publications:
V.R. Basili, M.V. Zelkowitz, F.E. McGarry, R.W. Reiter,
Jr., W.F. Truszkowski, and D.L. Weiss.
The Software Engineering Laboratory.
Technical Report TR-535, Department of Computer
Science, University of Maryland, May, 1977.
V.R. Basili and R.W. Reiter, Jr.
Investigating software development approaches.
Technical Report TR-688, Department of Computer
Science, University of Maryland, August, 1978.
V.R. Basili and R.W. Reiter, Jr.
Evaluating automatable measures of software
development.
Proceedings of the IEEE/Poly Workshop on Quantitative
Software Models for Reliability, Complexity, and
Cost (IEEE Catalog No. 9), Kiamesha Lake, NY
(October 1979).
V.R. Basili and R.W. Reiter, Jr.
An investigation of human factors in software
development.
Computer Magazine, vol. 12, no. 12 (December 1979),
pp. 21-38.
V.R. Basili and R.W. Reiter, Jr.
A controlled experiment quantitatively comparing
software development approaches.
(to be published in IEEE Transactions on Software
Engineering, 1980)
Professional positions held:
1970-1973 (Summers only) -- Programmer/Analyst
Information Systems Department, Baltimore Gas &
Electric Company, Baltimore, Maryland 21203.
1972-1976 -- Graduate Teaching Assistant
Department of Computer Science, University of Maryland,
College Park, Maryland 20742.
1976-1979 -- Graduate Research Assistant
Department of Computer Science, University of Maryland,
College Park, Maryland 20742.
1979- -- Faculty Research Assistant
Department of Computer Science, University of Maryland,
College Park, Maryland 20742.