

Modeling Software Quality with Classification Trees 249

reports using CART to model software project productivity [Kitchenham (1998)], to our knowledge, CART has seldom been used to model software quality.

This paper presents practical lessons learned on building classification trees for software quality modeling. Preliminary results indicated that CART can be useful for software quality modeling [Khoshgoftaar et al. (1998c), Khoshgoftaar et al. (1998d)]. A case study of a very large telecommunications system used CART to build software quality models [Naik (1998)], focusing on problems discovered in the field by customers. The models predicted whether or not modules were fault-prone, based on various sets of software product and process metrics as predictor variables.
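
To make the idea of predicting fault-prone modules from a metric concrete, here is a minimal sketch of a single tree split. It assumes the Gini impurity criterion that CART commonly uses for classification; the function names and the six data points are invented for illustration and are not taken from the case study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = fault-prone)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    The threshold is None if no split improves on the unsplit impurity.
    """
    best = (None, gini(labels))
    # Candidate thresholds are the observed values, except the largest one,
    # which would leave the right-hand branch empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up illustration: one complexity metric per module and a fault-prone flag.
complexity = [3, 5, 8, 12, 20, 25]
fault_prone = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(complexity, fault_prone)


def predict(metric_value):
    """Classify a module as fault-prone (1) if its metric exceeds the threshold."""
    return int(metric_value > threshold)
```

A full classification tree repeats this search over every predictor and recursively over each resulting branch; the sketch shows only the first decision node.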

The remainder of this paper presents background on software quality modeling, background on classification trees, including a summary of the CART algorithm, our modeling methodology, details on our case study, and conclusions.

2. Software Quality Modeling

Due to the high cost of correcting problems discovered by customers, the goal of our modeling is identification of fault-prone modules early in development. Software quality models are tools for focusing efforts to find faults. Such models yield timely predictions on a module-by-module basis, enabling one to target high-risk modules.

The field of software metrics assumes that characteristics of software products and development processes strongly influence the quality of the released product, and its residual faults, in particular. The more complex the product is, the more likely it is that developers will make mistakes that cause faults and consequent failures. Since product characteristics can be measured earlier than quality, software metrics can guide improvements to quality before the product is released.

Software metrics research has emphasized models based on software product metrics. Commonly measured software abstractions include call graphs, control flow graphs, and statements. For example, fan-in and fan-out [Myers (1978)] are attributes of a node in a call graph, where each node is an abstraction of a module and each edge represents a call from one to another. Many software product metrics are attributes of a control flow graph in which the nodes represent decision statements or branch destinations and the edges represent potential flow of control. McCabe's cyclomatic complexity is one of the best known in this category [McCabe (1976)]. Lines of code is the best known statement metric. Other examples are Halstead's counts of operators and operands [Halstead (1977)]. Commercially available "code analyzers" measure more than fifty static software product metrics at a time.

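The graph-based metrics described above can be sketched in a few lines. The example assumes one common reading of fan-in and fan-out (the number of distinct callers and callees of a module) and McCabe's formula V(G) = E - N + 2P for a control flow graph with E edges, N nodes, and P connected components. The function names and both small graphs are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def fan_in_out(call_edges):
    """Return {module: (fan_in, fan_out)} for a call graph of (caller, callee) pairs."""
    callers = defaultdict(set)  # callee -> distinct modules that call it
    callees = defaultdict(set)  # caller -> distinct modules it calls
    modules = set()
    for caller, callee in call_edges:
        callers[callee].add(caller)
        callees[caller].add(callee)
        modules.update((caller, callee))
    return {m: (len(callers[m]), len(callees[m])) for m in sorted(modules)}


def cyclomatic_complexity(cfg_edges, components=1):
    """McCabe's V(G) = E - N + 2P for a control flow graph of (src, dst) edges."""
    nodes = {node for edge in cfg_edges for node in edge}
    return len(cfg_edges) - len(nodes) + 2 * components


# Hypothetical call graph: main calls parse and report, and both of those call log.
calls = [("main", "parse"), ("main", "report"), ("parse", "log"), ("report", "log")]
metrics = fan_in_out(calls)

# Hypothetical control flow graph of a routine containing a single if/else.
cfg = [
    ("entry", "test"),
    ("test", "then"),
    ("test", "else"),
    ("then", "exit"),
    ("else", "exit"),
]
v_g = cyclomatic_complexity(cfg)
```

Here `log` has fan-in 2 and fan-out 0, and the routine with one decision has cyclomatic complexity 2, which matches the rule of thumb "number of decisions plus one" for a single-entry, single-exit routine.
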
Software process metrics can be derived from project management, problem reporting, and configuration management systems [Henry et al. (1994)]. Process metrics are especially important for legacy systems and developments with significant reuse [Evanco and Agresti (1994)]. For example, preliminary empirical research on a military system found that software quality models based on process metrics alone had similar accuracy to models based on the combination of product metrics and process metrics [Khoshgoftaar et al. (1998a)]. Case studies by Khoshgoftaar