
Modeling Software Quality with Classification Trees 249

reports using CART to model software project productivity [Kitchenham (1998)], to our knowledge, CART has seldom been used to model software quality.

This paper presents practical lessons learned on building classification trees for software quality modeling. Preliminary results indicated that CART can be useful for software quality modeling [Khoshgoftaar et al. (1998c), Khoshgoftaar et al. (1998d)]. A case study of a very large telecommunications system used CART to build software quality models [Naik (1998)], focusing on problems discovered in the field by customers. The models predicted whether or not modules were fault-prone, based on various sets of software product and process metrics as predictor variables.
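A classification tree is grown by repeatedly choosing the predictor and threshold that best separate the classes. The minimal Python sketch below shows that search for a single numeric metric, assuming Gini impurity (one of the splitting rules CART offers) as the criterion and 0/1 labels where 1 marks a fault-prone module. The function names `gini` and `best_split` and the sample data are hypothetical; a full CART implementation also searches across all predictors, handles missing values, and prunes the grown tree, none of which is shown here.

```python
from typing import List, Optional, Tuple

def gini(labels: List[int]) -> float:
    """Gini impurity of a set of 0/1 labels (1 = fault-prone, 0 = not)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of fault-prone modules
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split(values: List[float], labels: List[int]) -> Optional[Tuple[float, float]]:
    """Find the threshold t for the binary split 'value <= t' that minimizes the
    size-weighted Gini impurity of the two child nodes.

    Returns (threshold, weighted_impurity), or None if no split is possible.
    Candidate thresholds are midpoints between consecutive distinct values."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]   # all values <= lo
        right = [y for _, y in pairs[i:]]  # all values >= hi
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = ((lo + hi) / 2, impurity)
    return best

# Hypothetical data: cyclomatic complexity of eight modules and whether each was fault-prone.
complexity = [2, 3, 4, 5, 12, 15, 18, 20]
fault_prone = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_split(complexity, fault_prone))  # (8.5, 0.0)
```

Here the threshold 8.5 separates the two hypothetical classes perfectly, so both child nodes are pure and the weighted impurity is zero. Applying the same search recursively to each child node yields a tree.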

The remainder of this paper presents background on software quality modeling, background on classification trees (including a summary of the CART algorithm), our modeling methodology, details of our case study, and conclusions.

2. Software Quality Modeling

Due to the high cost of correcting problems discovered by customers, the goal of our modeling is to identify fault-prone modules early in development. Software quality models are tools for focusing fault-finding efforts. Such models yield timely predictions on a module-by-module basis, enabling one to target high-risk modules.
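To make module-by-module prediction concrete, the minimal sketch below represents an already-built classification tree and walks each module's metrics from the root to a leaf. The tree, its thresholds, and the metric names `cyclomatic` (a product metric) and `changes` (a process metric) are hypothetical and chosen only for illustration; they are not results from the case study.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    """A classification-tree node. An internal node tests
    'metrics[metric] <= threshold'; a leaf carries only a class label."""
    label: Optional[str] = None      # set on leaves only
    metric: Optional[str] = None     # predictor tested at an internal node
    threshold: float = 0.0
    left: Optional["Node"] = None    # followed when the value <= threshold
    right: Optional["Node"] = None   # followed when the value > threshold

def classify(tree: Node, metrics: Dict[str, float]) -> str:
    """Follow one root-to-leaf path for a module and return the leaf's label."""
    node = tree
    while node.label is None:
        node = node.left if metrics[node.metric] <= node.threshold else node.right
    return node.label

# Hypothetical tree: split on cyclomatic complexity, then on the number of changes.
tree = Node(metric="cyclomatic", threshold=10,
            left=Node(label="not fault-prone"),
            right=Node(metric="changes", threshold=3,
                       left=Node(label="not fault-prone"),
                       right=Node(label="fault-prone")))

# Hypothetical per-module metrics.
modules = {
    "mod_a": {"cyclomatic": 4, "changes": 7},
    "mod_b": {"cyclomatic": 25, "changes": 9},
    "mod_c": {"cyclomatic": 18, "changes": 1},
}
high_risk = [name for name, m in modules.items() if classify(tree, m) == "fault-prone"]
print(high_risk)  # ['mod_b']
```

Each prediction costs only one comparison per tree level, so an entire system can be screened as soon as its metrics are available, and the resulting list of high-risk modules can guide where reviews and testing effort go.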

The field of software metrics assumes that characteristics of software products and development processes strongly influence the quality of the released product and, in particular, its residual faults. The more complex the product is, the more likely it is that developers will make mistakes that cause faults and consequent failures. Since product characteristics can be measured earlier than quality, software metrics can guide improvements to quality before the product is released.

Software metrics research has emphasized models based on software product metrics. Commonly measured software abstractions include call graphs, control flow graphs, and statements. For example, fan-in and fan-out [Myers (1978)] are attributes of a node in a call graph, where each node is an abstraction of a module and each edge represents a call from one module to another. Many software product metrics are attributes of a control flow graph in which the nodes represent decision statements or branch destinations and the edges represent potential flow of control. McCabe's cyclomatic complexity is one of the best-known metrics in this category [McCabe (1976)]. Lines of code is the best-known statement metric. Other examples are Halstead's counts of operators and operands [Halstead (1977)]. Commercially available "code analyzers" measure more than fifty static software product metrics at a time.

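The graph-based product metrics above can be computed directly from their graphs. This minimal Python sketch counts fan-in and fan-out as the numbers of distinct callers and callees of each module, and computes McCabe's cyclomatic complexity with his formula V(G) = e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components of the control flow graph. Counting distinct modules rather than individual call sites is an assumption; commercial code analyzers may define these metrics differently. The function, module, and node names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source, target)

def fan_in_out(call_edges: Iterable[Edge]) -> Dict[str, Tuple[int, int]]:
    """Map each module in a call graph to (fan_in, fan_out): the number of
    distinct modules that call it and the number of distinct modules it calls."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    callees: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in call_edges:
        callees[caller].add(callee)
        callers[callee].add(caller)
    modules = sorted(set(callers) | set(callees))
    return {m: (len(callers[m]), len(callees[m])) for m in modules}

def cyclomatic_complexity(nodes: List[str], cfg_edges: List[Edge],
                          components: int = 1) -> int:
    """McCabe's V(G) = e - n + 2p for a control flow graph with e edges,
    n nodes, and p connected components (p = 1 for a single routine)."""
    return len(cfg_edges) - len(nodes) + 2 * components

# Hypothetical call graph: main calls parse and report, both of which call util.
calls = [("main", "parse"), ("main", "report"), ("parse", "util"), ("report", "util")]
print(fan_in_out(calls))
# {'main': (0, 2), 'parse': (1, 1), 'report': (1, 1), 'util': (2, 0)}

# Hypothetical if-then-else: one decision node, two branch destinations, one join.
nodes = ["decision", "then", "else", "join"]
edges = [("decision", "then"), ("decision", "else"), ("then", "join"), ("else", "join")]
print(cyclomatic_complexity(nodes, edges))  # 2
```

For the if-then-else graph, V(G) = 4 - 4 + 2 = 2, one more than its single decision, which matches the intuition that cyclomatic complexity counts independent paths through a routine.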
Software process metrics can be derived from project management, problem reporting, and configuration management systems [Henry et al. (1994)]. Process metrics are especially important for legacy systems and for developments with significant reuse [Evanco and Agresti (1994)]. For example, preliminary empirical research on a military system found that software quality models based on process metrics alone had accuracy similar to that of models based on the combination of product and process metrics [Khoshgoftaar et al. (1998a)]. Case studies by Khoshgoftaar