
How many types of errors can be detected through static analysis and branch testing? How many man-hours and machine hours do these techniques require? Here are some empirically determined answers.

Error Detection Using Path Testing and Static Analysis

Carolyn Gannon
General Research Corporation

Two software testing techniques, static analysis and dynamic path (branch) testing,1 are receiving a great deal of attention in the world of software engineering these days. However, empirical evidence of their ability to detect errors is very limited, as is data concerning the resource investment their use requires. Researchers such as Goodenough2 and Howden3 have estimated or graded these testing methods, as well as such other techniques as interface consistency, symbolic testing, and special values testing. However, this paper seeks (1) to demonstrate empirically the types of errors one can expect to uncover and (2) to measure the engineering and computer time which may be required by the two testing techniques for each class of errors during system-level testing.

The experiment

To provide the material for our testing demonstration, a medium-size, 5000-source-statement Fortran program was seeded with errors one at a time, and analyzed by a testing tool4 that has both static analysis and dynamic path analysis capabilities.

The test object. The test object program was chosen carefully. In addition to size and functional variety, we needed a program as error-free as possible to avoid camouflaging the seeded errors. The Fortran program5 chosen consists of 56 utility routines that compute coordinate translation, integration, flight maneuvers, input and output handling, and data base presetting and definition. The collection of routines has been extensively used in software projects for over 10 years. The main program directs the functional action according to the user's input data. The program is highly computational and has complex control logic. Sample data and expected output were provided with the program's functional description, but specifications and requirements for the program's algorithms were not available. For this experiment, the correct output for the supplied data sets was used as a specification of proper program behavior.

Error seeding. Of the several studies describing error types and frequencies, TRW's6 is the most relevant. We have used the Project 5 data from that report as the basis for selecting error types and frequencies. Several error categories (namely documentation, problem report, and "other," which includes software design, compilation errors, and time or storage limitations) that were not relevant to the test object or test environment were omitted. Table 1 shows the types of errors represented by the experiment and summarizes the results. Error frequencies by major category are shown in Figure 1. There are seven applicable major error categories.

Table 2 shows the frequencies of errors by category for the Project 5 data and, adjusted for the omitted "other" category, the frequencies and number of errors used in the experiment. Inasmuch as the difference between columns 5 and 6 affects the evaluation of path testing, it requires some explanation.

In addition to generating errors whose type and frequency have their bases in a published study, the location of each error and the program's resulting behavior were also prime concerns in maintaining an objective experiment. In the TRW study, no data linking the error type to a software property (e.g., statement type) is presented. Using the error types made it necessary to establish correlations between each error type and quantifiable test software properties. Furthermore, since the test object consists primarily of general utility subroutines, many having alternative segments of code whose execution depends upon their input parameter data, we felt that the errors should reside on segments of code that are executed by a thorough (in terms of program function and structure) set of test data, and that the errors should manifest themselves by some deviation in the program's normal output. To generate errors with these properties, the following steps were performed:

(1) The test software was analyzed by the test tool4 to classify source statements, to obtain software documentation reference material (e.g., symbol set/usage, module interaction hierarchy, location of all invocations), to guide insertion of errors, and to generate an expanded set of test data that provided thorough path coverage. The percentage of path coverage varied from module to module depending upon the main program's application of the utility subroutines.

(2) A matrix showing error types versus statement classification was manually derived.

(3) The information from steps 1 and 2 was combined into a matrix showing potential sites in the software for each error type.

Table 1. Error detection for each error type.
(Columns: error category and type; static analysis result; path testing phase result; errors manifested in output.)

A. COMPUTATIONAL
   A100 Incorrect operand in equation
   A200 Incorrect use of parentheses
   A400 Units or data conversion error
   A800 Missing computation

B. LOGIC
   B100 Incorrect operand in logical expression
   B200 Logic activities out of sequence
   B300 Wrong variable being checked
   B400 Missing logic or condition tests
   B500 Too many/few statements in loop
   B600 Loop iterated incorrect number of times (including endless loop)

C/E. INPUT/OUTPUT
   C200 Input read from incorrect data file
   E200 Data written according to the wrong format statement
   E300 Data written in wrong format
   E500 Incomplete or missing output
   E600 Output field size too small

D. DATA HANDLING
   D050 Data file not rewound before reading
   D100 Data initialization not done
   D200 Data initialization done improperly
   D400 Variable referred to by the wrong name
   D900 Subscripting error

F. INTERFACE
   F200 Call to subroutine not made or made in wrong place
   F700 Software/software interface error

G. DATA DEFINITION
   G100 Data not properly defined/dimensioned
   G200 Data referenced out of bounds

H. DATA BASE
   H100 Data not initialized in data base
   H200 Data initialized to incorrect value
   H300 Data units are incorrect

Key:
S = Static Analysis (16% of total errors)
P = Path Testing only (25% of total errors)
A = Inspection aided by Path Testing (20% of total errors)
I = Inspection only (41% of total errors)
O = Error not detected (16% of total errors)
* = Error site located or improper correction made (counted as not detected)

(4) From the potential site matrix, a list of candidate error sites was randomly generated (a sketch of this site-selection step appears after this list).

(5) At each site in the list either an error of the designated type was manually inserted or the site was rejected as being unsuitable for the error type.

(6) Errors that caused a compiler or loader diagnostic were eliminated from the error set.

(7) The 86 errors shown in column 5 of Table 2 were selected from the remaining errors using Project 5 error frequency data. Errors from this set were eliminated if they caused no change in the output. Fifteen errors were rejected due to lack of coverage with the test data, and 22 were eliminated for which coverage was achieved without affecting the output. The surviving 49 errors, shown in column 6, were used.
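The site-selection mechanics behind steps (2) through (4) can be pictured with a small sketch. This is an illustrative reconstruction only; the statement classes, error types, and helper names below are hypothetical and are not taken from the experiment or its test tool.

```python
import random

# Hypothetical statement classification produced for each source line (step 1).
statement_classes = {
    10: "assignment",
    20: "logical_if",
    30: "call",
    40: "read",
}

# Hypothetical error-type-versus-statement-class matrix (step 2): which classes
# of statement are plausible sites for each error type.
compatible_classes = {
    "A100 incorrect operand in equation": {"assignment"},
    "B300 wrong variable being checked": {"logical_if"},
    "C200 input read from incorrect data file": {"read"},
    "F200 subroutine call missing or misplaced": {"call"},
}

# Step 3: combine the two into a potential-site matrix (error type -> source lines).
potential_sites = {
    error_type: [line for line, cls in statement_classes.items() if cls in classes]
    for error_type, classes in compatible_classes.items()
}

# Step 4: randomly draw candidate sites for a given error type.
def candidate_sites(error_type, count=1, seed=None):
    rng = random.Random(seed)
    sites = potential_sites.get(error_type, [])
    return rng.sample(sites, min(count, len(sites)))

if __name__ == "__main__":
    print(candidate_sites("B300 wrong variable being checked", count=1, seed=7))
```

In the experiment the drawn sites were then screened by hand, as steps (5) through (7) describe; the sketch stops at candidate generation.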

Figure 1. Error frequency in major categories (for each category, paired bars compare the TRW study percentage with the GRC experiment percentage).

Error site execution or reference was verified by an output message placed, for the case of executable statements, at the error site or, for the case of nonexecutable statements, at the site of reference by some executable statement on a covered path. The impact of this evidence is that path testing with the sole goal of execution coverage is not an adequate verification measure. Most software tool developers whose verification tools include a path testing capability advocate their usage with data that demonstrate all specific functions of the software. Even then, stress and other performance testing should enter into the total test plan.
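To make this verification step concrete, the small sketch below records which seeded sites a run actually reaches. The probe mechanism, site identifiers, and toy routine are invented stand-ins for the output messages described above, not the instrumentation actually used.

```python
# Minimal sketch: confirm that seeded error sites are reached by the test data.
reached_sites = set()

def probe(site_id):
    """Stand-in for the output message placed at (or referring to) an error site."""
    reached_sites.add(site_id)

def seeded_routine(x):
    probe("D100 @ line 120")        # probe at an executable seeded statement
    total = 0                        # (imagine the seeded error lives here)
    if x > 0:
        probe("B300 @ line 140")    # probe on the branch carrying a seeded logic error
        total = x * 2
    return total

if __name__ == "__main__":
    seeded_routine(3)
    for site in ("D100 @ line 120", "B300 @ line 140"):
        status = "reached" if site in reached_sites else "NOT reached by test data"
        print(site, "->", status)
```

A site that was not covered by the test data, or whose error did not change the output, was dropped from the error set, as described in step (7) above.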

The experiment. Errors from the seven major categories were seeded, one at a time, into the Fortran program. The tester (not the error-seeder) was given a compilation and execution listing which gave no clues to the error's location. He was told what was wrong with the output and had, as a specification of proper program performance, a listing of the correct output. The task was to find the error using execution coverage analysis (path testing) or inspection, whichever seemed more appropriate, correct the source, and execute the corrected program to verify the output. Human and computer times were accounted for from the time the tester received the erroneous listing to the time he delivered the corrected listing.

To evaluate the types of errors detected by static analysis, all 49 errors were simultaneously seeded into the program after determining that they did not interfere with each other in the static sense. Only one computer run was required for this evaluation.

Table 2. Error frequency in major categories.

                                    (2)       (3)         (4)         (5)        (6)
(1)                                 Total     Percent     Percent     Errors     Errors Manifested
Project 5 Major Error Categories*   Errors*   of Errors*  Applicable  Generated  in Output
Computational (A)                    92       12.1         14          14          7
Logic (B)                           169       24.5         28          25         13
Data Input (C) and Data Output (E)   55        7.8          9           9          6
Data Handling (D)                    65       11.0         13          10          7
Interface (F)                        48        7.0          8           7          4
Data Definition (G)                  62        8.9         10           8          4
Data Base (H)                       112       16.2         18          13          8
Other (J)                            86       12.5          -           -          -
Total                               689      100.0        100          86         49

*Data derived from Table 4-2 of TRW study.
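The column 4 figures follow from renormalizing the Project 5 percentages after the inapplicable "Other" category is dropped. A quick check of that arithmetic is sketched below; the values come from Table 2, and the exact rounding used in the original table is assumed rather than known.

```python
# Sketch: recompute Table 2 column 4 ("Percent Applicable") from column 3
# by removing the omitted "Other (J)" category (12.5%) and renormalizing.
project5_percent = {
    "Computational (A)": 12.1,
    "Logic (B)": 24.5,
    "Data Input/Output (C/E)": 7.8,
    "Data Handling (D)": 11.0,
    "Interface (F)": 7.0,
    "Data Definition (G)": 8.9,
    "Data Base (H)": 16.2,
}
applicable_total = 100.0 - 12.5  # percentage left once "Other" is excluded

for category, pct in project5_percent.items():
    adjusted = 100.0 * pct / applicable_total
    print(f"{category}: {adjusted:.1f}")
# Computational: 12.1 / 87.5 * 100 = 13.8, i.e., the 14 in column 4; the other
# categories agree with column 4 to within rounding.
```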

Experimental results

Unlike static analysis, which explicitly detects inconsistencies and locates the offending statement(s), path testing is a technique that demands skill to interpret the execution coverage data as well as to recognize improper program performance from the program's output. The analyst must determine, by inspection, which sequences of paths might be infeasible (unexecutable for any data) and which might be critical from a functional point of view. Thus, effective path testing requires considerable knowledge about the program's intended behavior.

For the path testing evaluation phase, we found that the errors were located using three detection methods: path testing alone, inspection aided by path testing, and inspection alone. Some errors were easily detected without the necessity of instrumenting the code to get path coverage. Some errors were found when the path coverage reports narrowed the search to a set of suspicious paths, but then inspection was used to actually determine the error. Other errors were found directly by observing the control path behavior from the coverage reports and the path statement definition listing. In a few cases the wrong "error" was found and only some of the incorrect symptoms disappeared; these are noted in Table 1.
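To make the coverage-interpretation step concrete, the sketch below shows the kind of branch-coverage summary the tester had to read. The module names, branch identifiers, and execution trace are invented for illustration and are not output from the tool used in the experiment.

```python
# Minimal sketch of a branch-coverage summary, the raw material of path testing.
all_branches = {
    "SUBR1": ["10T", "10F", "35T", "35F"],
    "SUBR2": ["20T", "20F"],
}

# Branches observed during an instrumented run (hypothetical trace).
executed = {("SUBR1", "10T"), ("SUBR1", "35T"), ("SUBR1", "35F"), ("SUBR2", "20F")}

for module, branches in all_branches.items():
    missed = [b for b in branches if (module, b) not in executed]
    coverage = 100.0 * (len(branches) - len(missed)) / len(branches)
    print(f"{module}: {coverage:.0f}% branch coverage, not executed: {missed or 'none'}")
# Branches never taken ("10F" here) point the search toward suspicious paths, but
# the analyst must still judge whether an uncovered branch is simply infeasible.
```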

Figure 2 shows the frequencies of error categories detected by the methods described above. The dashed lines show the effect of some degree of path testing coverage by reporting the sum of path testing alone and inspection aided by path testing. As one might expect, logic errors and computation errors (since they often cause a change in control flow) are the best candidates for path testing. Errors in these two categories are often the most difficult to locate, unless a detailed design and specification are also available. Input/output and data definition errors are usually easily detected by inspection alone.

More comprehensive results are shown in Table 1. Note that not all error types were seeded into the program, owing to project limitations.

Figure 2. Path testing: frequency of detected errors by category (computational, logic, input/output, data handling, interface, data definition, data base), broken down by detection method: path testing only, inspection aided by path testing, inspection only, and undetected.

Figure 3. Path testing: average time expended per error, in engineering hours, by error category and detection method (path testing only, inspection aided by path testing, inspection only, undetected error where the tester gave up).

For each error seeded, Table 1 shows the technique used to detect it. An asterisk next to the technique's indicator signifies that the erroneous statement was located but the "correction" was not the proper one, or that more information (such as a specification) was needed to make the proper changes.

To assess the value of path testing, an account was kept of the resources expended. The average engineering time in hours for finding each error is shown in Figure 3. Most of the errors detected by inspection required only about 1-1/2 hours to find and correct. On the other hand, the more difficult errors requiring path testing took about 4 hours.

Static analysis has capabilities for detecting infinite loops, unreachable code, uninitialized variables, and inconsistencies in variable and parameter mode. Some sophisticated compilers have a few of these capabilities. Until this experiment was conducted, we had no evidence concerning the kinds of errors that static analysis might find in programs of substantial size. As a preliminary task7 to the experiment, we applied static analysis and path testing to eight small programs from The Elements of Programming Style.8 In that task we found that static analysis detected 38 percent of the total 26 errors, and path testing found 70 percent. However, these eight programs contained numerous uninitialized variables which weighted the measurement in favor of static analysis.
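As an illustration of what one such check involves, a use-before-definition scan over straight-line code can be sketched as follows. The toy statement representation and variable names are assumptions made for illustration, not the analyzer's actual internal form.

```python
# Toy use-before-definition check over straight-line code.
# Each tuple is (variable defined, variables used); the example program is invented.
statements = [
    ("A", []),          # A = 1.0
    ("B", ["A", "C"]),  # B = A + C   <- C has no prior definition
    ("D", ["B"]),       # D = B * 2.0
]

defined = set()
for target, used in statements:
    for var in used:
        if var not in defined:
            print(f"warning: {var} may be used before it is initialized")
    defined.add(target)
```

A full static analyzer must extend this idea across branches and subroutine calls, which is what makes the real analysis harder than this sketch.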

In our experiment, static analysis detected 16 percent (8 errors) of the total 49 seeded errors. We chose not to use the static assertion consistency checking capability, which is intended to detect inconsistencies in input/output parameters. When the experiment began, the assertions had to be manually inserted. We also had wanted to use the DAVE9 static analysis tool, but the test object had too many violations of ANS Fortran to make the effort feasible. Figure 4 shows the frequency of detected errors by major category, and Table 1 lists each error type found by static analysis.

Figure 4. Static analysis: frequency of detected errors by category.

One error detected by the graph checking capability of the static analyzer was unreachable code due to a missing IF statement. This error (B400) was not detected by either path testing or inspection. Unreachable code can be very difficult to locate in code filled with statement labels and three-way IF statements, as was the test object for the experiment. Unreachable code may or may not be harmless, but it is always a warning of possible dangers or inefficient use of computer resources.

While static analysis did not detect a high percentage of errors, and while most of the errors it did find were also detected by path testing, it has the distinct advantage of being a very economical tool. Only two engineering hours and 24 seconds of CDC 7600 time were required to review the static analysis output and locate the errors. A disadvantage is that if programming practice allows frequent intentional mixed-mode constructs or a mismatching number of actual and formal parameters, the static analyzer will issue frequent warnings and errors (133 in our experiment) that are harmless to the proper execution of the program.
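The parameter-count warnings mentioned above come from a simple cross-module consistency check; a rough sketch of that check, with invented routine names and call-site data, looks like this.

```python
# Sketch: flag calls whose actual argument count differs from the declared
# formal parameter count. All names and numbers here are hypothetical.
formal_params = {"COORD": 3, "INTEG": 4}

call_sites = [
    ("MAIN",   120, "COORD", 3),
    ("MAIN",   205, "INTEG", 3),   # one argument short: flagged
    ("FLTMAN",  88, "COORD", 2),   # also flagged
]

for caller, line, callee, actuals in call_sites:
    expected = formal_params.get(callee)
    if expected is not None and actuals != expected:
        print(f"{caller} line {line}: call to {callee} passes {actuals} "
              f"argument(s), {expected} declared")
```

In old Fortran practice such mismatches were sometimes intentional, which is why warnings of this kind are not all real errors.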

Conclusions

Both the error-seeding and error-detection activities of the experiment provided concrete data for several conclusions about the two testing techniques. While the experiment was designed and implemented in an objective manner and can be repeated by other interested researchers, it is not our intention to apply a metric or statistical significance to the error detection capabilities of the testing methods. It is our purpose, however, to report the types of errors that can be detected by these techniques. The results of the experiment can also be used as a reference for tool developers seeking to sharpen their tools for more rigorous error detection.

The first major conclusion of the experiment is that one can probably expect as many as 25 percent of the errors in a program of 5000 statements to slip through acceptance testing, if acceptance testing is based primarily on executing all possible paths at least once. It is possible that many of the errors are harmless in one specific application of a general-purpose program (e.g., incorrect computations are never used or are corrected before harm is done). It is more likely, however, that the data generated to satisfy path testing requirements (percentage of coverage) causes control flow to execute sequences of paths which do not exhibit the errors. This is one reason why path testing should always be coupled with stress or boundary condition testing. Overall path coverage may not be increased, but the right sequence of paths may be executed to expose the errors.

Of the two testing techniques, path testing is the more effective at detecting logic, computational, and data base errors. The last two error types are found when their incorrect results affect the control flow. In our experiment, path testing alone detected 25 percent of the seeded errors.


This is consistent with the 21 percent (of 28 errors) that Howden stated could be reliably found by path testing.3 Path testing together with inspection aided by path testing detected 45 percent of the errors seeded in the program. This percentage is lower than the 70-percent path testing error detection rate we encountered in testing the eight programs from The Elements of Programming Style. However, in this experiment 41 percent of the errors were found by inspection, without the need to apply the path testing capability.

In a recent report,10 Taylor lists useful testing techniques for each of the TRW error types. Our experience demonstrated that path testing is capable of detecting many more error types than he credited it with. However, his evaluation (not based on empirical data) is aimed at grading the usefulness of testing techniques. We set out to measure capability. The primary disadvantage of path testing is that it is a time-consuming, demanding process that (as currently implemented) does not recognize "key" paths or important sequences of paths.

Static analysis is effective at finding data handling, interface, and data definition errors. According to the TRW study, these error categories usually constitute only about 20 percent of the total program errors. However, static analysis is very economical and should be utilized along with program compilation during software development.

Acknowledgment

This experiment was sponsored by the Air Force Office of Scientific Research under contract F49620-78-C-0103. The United States Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. Nancy Brooks performed the research on error-type-to-software linkages and generated the candidate errors. Reg Meeson performed all path and static analysis testing for the 49 individual errors.

References

1. R. E. Fairley, "Tutorial: Static Analysis and Dynamic Testing of Computer Software," Computer, Vol. 11, No. 4, Apr. 1978, pp. 14-23.

2. J. B. Goodenough, "Survey of Program Testing Issues," Proc. Conf. on Research Directions in Software Technology, Oct. 1977.

3. W. E. Howden, "Theoretical and Empirical Studies of Program Testing," IEEE Trans. Software Eng., Vol. SE-4, July 1978, pp. 293-297.

4. D. M. Andrews and J. P. Benson, Software Quality Laboratory User's Manual, General Research Corporation CR-4-770, May 1978.

5. T. Plambeck, The Compleat Traidsman, General Research Corporation IM-711/2, Sept. 1969.

6. T. A. Thayer, et al., Software Reliability Study, TRW Defense and Space Systems Group, RADC-TR-76-238, Redondo Beach, Calif., 1976.

7. C. Gannon, "Empirical Results for Static Analysis and Path Testing of Small Programs," General Research Corporation RM-2225, Sept. 1978.

8. B. W. Kernighan and P. J. Plauger, The Elements of Programming Style, McGraw-Hill, 1974.

9. L. D. Fosdick and C. Miesse, The DAVE System User's Manual, University of Colorado, CU-CS-106-77, Mar. 1977.

10. R. N. Taylor, Integrated Testing and Verification System for Research Flight Software: Design Document, Boeing Computer Services, NASA Contract Report 159008, Feb. 1979.

Carolyn Gannon is a member of the Software Quality Department of General Research Corporation in Santa Barbara. For five years she has been involved in the development of software testing tools for Fortran, Jovial, and Pascal. Her technical interests also include research in software specification, design, and analysis. Gannon received her BA in mathematics and MSEE in computer science from the University of California at Santa Barbara.
