
  • IMPROVED SIZE AND EFFORT ESTIMATION MODELS

    FOR SOFTWARE MAINTENANCE

    by

    Vu Nguyen

    A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL

    UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the

    Requirements for the Degree DOCTOR OF PHILOSOPHY

    (COMPUTER SCIENCE)

    December 2010

    Copyright 2010 Vu Nguyen

  • ii

    DEDICATION

    Dedicated to my parents,

    Nguyễn Đình Hoàng and Nguyễn Thị Thiện

    You survived and struggled through one of the darkest chapters in the history of Vietnam

    to raise and love your children.

  • iii

    ACKNOWLEDGEMENTS

    I am deeply indebted to my advisor, Dr. Barry Boehm, for his encouragement,

    support, and constructive advice. I have learned not only from his deep knowledge and

    experience in software engineering but also from his remarkable personality. I would also

    like to thank A. Winsor Brown, who brought me to work on the CodeCount project and
    helped me make the tool more useful for the software development community.

    I am grateful to Dr. Bert Steece, whom I regard as my unofficial advisor, for his

    statistical insights that shaped my understanding of the statistical analysis applied to this

    dissertation work. My thanks also go to other members of my qualifying exam and

    defense committees, Dr. Nenad Medvidović, Dr. Ellis Horowitz, and Dr. Rick Selby.

    Their criticism, encouragement, and suggestions effectively helped to shape my work.

    I am thankful to my friends and faculty for their constructive feedback on my

    work, including Dan Port, Tim Menzies, Jo Ann Lane, Brad Clark, Don Reifer,

    Supannika Koolmanojwong, Pongtip Aroonvatanaporn, and Qi Li. Other colleagues, including Julie
    Sanchez of the USC Center for Systems and Software Engineering; Marilyn A. Sperka,
    Ryan E. Pfeiffer, and Michael Lee of the Aerospace Corporation; and Lori Vaughan of
    Northrop Grumman, also assisted me in various capacities.

    This work was made possible with the support for data collection from the

    Affiliates of the Center for Systems and Software Engineering and two organizations in
    Vietnam and Thailand. In particular, Phuong Ngo, Ngoc Do, Phong Nguyen, Long Truong,

  • iv

    Ha Ta, Hoai Tang, Tuan Vo, and Phongphan Danphitsanuphan provided me with
    tremendous help in collecting historical data from the organizations in Vietnam and

    Thailand.

    I owe much to the Fulbright Program for its financial support during my Master's
    program at USC, which gave me the opportunity to fulfill my dream of studying abroad and

    doing research with remarkable researchers in the software engineering research

    community. My cultural and educational experiences in the United States made possible

    by the program are priceless.

  • v

    TABLE OF CONTENTS

    Dedication ... ii
    Acknowledgements ... iii
    List of Tables ... vii
    List of Figures ... ix
    Abbreviations ... x
    Abstract ... xii
    Chapter 1. Introduction ... 1
      1.1 The Problem ... 2
      1.2 A Solution ... 2
      1.3 Research Hypotheses ... 3
      1.4 Definitions ... 6
    Chapter 2. Related Work ... 7
      2.1 Software Sizing ... 7
        2.1.1 Code-based Sizing Metrics ... 7
        2.1.2 Functional Size Measurement (FSM) ... 10
      2.2 Major Cost Estimation Models ... 16
        2.2.1 SLIM ... 17
        2.2.2 SEER-SEM ... 19
        2.2.3 PRICE-S ... 22
        2.2.4 KnowledgePlan (Checkpoint) ... 24
        2.2.5 COCOMO ... 25
      2.3 Maintenance Cost Estimation Models ... 30
        2.3.1 Phase-Level Models ... 30
        2.3.2 Release-Level Models ... 32
        2.3.3 Task-Level Models ... 36
        2.3.4 Summary of Maintenance Estimation Models ... 44
    Chapter 3. The Research Approach ... 46
      3.1 The Modeling Methodology ... 46
      3.2 The Calibration Techniques ... 52
        3.2.1 Ordinary Least Squares Regression ... 52
        3.2.2 The Bayesian Analysis ... 53
        3.2.3 A Constrained Multiple Regression Technique ... 55

  • vi

      3.3 Evaluation Strategies ... 58
        3.3.1 Model Accuracy Measures ... 58
        3.3.2 Cross-Validation ... 60
    Chapter 4. The COCOMO II Model for Software Maintenance ... 62
      4.1 Software Maintenance Sizing Methods ... 62
        4.1.1 The COCOMO II Reuse and Maintenance Models ... 64
        4.1.2 A Unified Reuse and Maintenance Model ... 68
      4.2 COCOMO II Effort Model for Software Maintenance ... 75
    Chapter 5. Research Results ... 80
      5.1 The Controlled Experiment Results ... 80
        5.1.1 Description of the Experiment ... 80
        5.1.2 Experiment Results ... 84
        5.1.3 Limitations of the Experiment ... 91
      5.2 Delphi Survey Results ... 92
      5.3 Industry Sample Data ... 96
      5.4 Model Calibrations and Validation ... 104
        5.4.1 The Bayesian Calibrated Model ... 105
        5.4.2 The Constrained Regression Calibrated Models ... 110
        5.4.3 Reduced Parameter Models ... 114
        5.4.4 Local Calibration ... 117
      5.5 Summary ... 122
    Chapter 6. Contributions and Future Work ... 123
      6.1 Contributions ... 123
      6.2 Future Work ... 125
    Bibliography ... 129
    Appendix A. UNFM and AA Rating Scale ... 138
    Appendix B. Delphi Survey Form ... 139
    Appendix C. Data Collection Forms ... 156
    Appendix D. The COCOMO II.2000 Parameters Used in the Experiment ... 158
    Appendix E. Histograms for the Cost Drivers ... 159
    Appendix F. Correlation Matrix for Effort, Size, and Cost Drivers ... 170

  • vii

    LIST OF TABLES

    Table 2-1. COCOMO Sizing Models ............................................................................... 27

    Table 2-2. COCOMO II Calibrations ............................................................................... 29

    Table 2-3. Maintenance Cost Estimation Models............................................................. 42

    Table 4-1. Maintenance Models Initial Cost Drivers ...................................................... 76

    Table 4-2. Ratings of Personnel Experience Factors (APEX, PLEX, LTEX).................. 77

    Table 4-3. Ratings of RELY ............................................................................................. 78

    Table 5-1. Summary of results obtained from fitting the models ..................................... 89

    Table 5-2. Differences in Productivity Ranges................................................................. 93

    Table 5-3. Rating Values for Cost Drivers from Delphi Survey ...................................... 94

    Table 5-4. RELY Rating Values Estimated by Experts.................................................... 96

    Table 5-5. Maintenance Core Data Attributes .................................................................. 98

    Table 5-6. Summary Statistics of 86 Data Points ........................................................... 101

    Table 5-7. Differences in Productivity Ranges between Bayesian Calibrated Model and COCOMO II.2000......................................................................................... 105

    Table 5-8. Rating Values for Cost Drivers from Bayesian Approach ............................ 107

    Table 5-9. Estimation Accuracies Generated by the Bayesian Approach ...................... 109

    Table 5-10. Estimation Accuracies of COCOMO II.2000 on the Data Set.................... 110

    Table 5-11. Retained Cost Drivers of the Constrained Models ...................................... 111

    Table 5-12. Estimation Accuracies of Constrained Approaches .................................... 112

    Table 5-13. Estimation Accuracies of Constrained Approaches using LOOC Cross-validation ..................................................................................................... 112

  • viii

    Table 5-14. Correlation Matrix for Highly Correlated Cost Drivers .............................. 115

    Table 5-15. Estimation Accuracies of Reduced Calibrated Models ............................... 116

    Table 5-16. Stratification by Organization ..................................................................... 119

    Table 5-17. Stratification by Program ............................................................................ 119

    Table 5-18. Stratification by Organization on 45 Releases ............................................ 120

  • ix

    LIST OF FIGURES

    Figure 3-1. The Modeling Process.................................................................................... 48

    Figure 3-2. A Posteriori Bayesian Update in the Presence of Noisy Data RUSE ............ 54

    Figure 3-3. Boxplot of mean of PRED(0.3) on the COCOMO II.2000 data set .............. 56

    Figure 3-4. Boxplot of mean of PRED(0.3) on the COCOMO 81 data set ...................... 56

    Figure 4-1. Types of Code ................................................................................................ 63

    Figure 4-2. Nonlinear Reuse Effects................................................................................. 65

    Figure 4-3. AAM Curves Reflecting Nonlinear Effects ................................................... 72

    Figure 5-1. Effort Distribution.......................................................................................... 85

    Figure 5-2. Maintenance Project Collection Range.......................................................... 97

    Figure 5-3. Distribution of Equivalent SLOC ................................................................. 101

    Figure 5-4. Correlation between PM and EKSLOC ....................................................... 103

    Figure 5-5. Correlation between log(PM) and log(EKSLOC)........................................ 103

    Figure 5-6. Adjusted Productivity for the 86 Releases ................................................... 104

    Figure 5-7. Adjusted Productivity Histogram for the 86 Releases ................................. 104

    Figure 5-8. Productivity Ranges Calibrated by the Bayesian Approach ........................ 108

    Figure 5-9. Productivity Ranges Generated by CMRE .................................................. 113

    Figure 5-10. Distribution of TIME ................................................................................. 116

    Figure 5-11. Distribution of STOR................................................................................. 116

  • x

    ABBREVIATIONS

    COCOMO Constructive Cost Model

    COCOMO II Constructive Cost Model version II

    CMMI Capability Maturity Model Integration

    EM Effort Multiplier

    PM Person Month

    OLS Ordinary Least Squares

    MSE Mean Square Error

    MAE Mean Absolute Error

    CMSE Constrained Minimum Sum of Square Errors

    CMAE Constrained Minimum Sum of Absolute Errors

    CMRE Constrained Minimum Sum of Relative Errors

    MMRE Mean of Magnitude of Relative Errors

    MRE Magnitude of Relative Errors

    PRED Prediction level

    ICM Incremental Commitment Model

    PR Productivity Range

    SF Scale Factor

    MODEL PARAMETERS

    Size Parameters

    AA Assessment and Assimilation

    AAF Adaptation Adjustment Factor

    AAM Adaptation Adjustment Multiplier

    AKSLOC Kilo Source Lines of Code of the Adapted Modules

    CM Code Modified

    DM Design Modified

    EKSLOC Equivalent Kilo Source Lines of Code

  • xi

    ESLOC Equivalent Source Lines of Code

    IM Integration Modified

    KSLOC Kilo Source Lines of Code

    RKSLOC Kilo Source Lines of Code of the Reused Modules

    SLOC Source Lines of Code

    SU Software Understanding

    UNFM Programmer Unfamiliarity

    Cost Drivers

    ACAP Analyst Capability

    APEX Applications Experience

    CPLX Product Complexity

    DATA Database Size

    DOCU Documentation Match to Life-Cycle Needs

    FLEX Development Flexibility

    LTEX Language and Tool Experience

    PCAP Programmer Capability

    PCON Personnel Continuity

    PERS Personnel Capability

    PLEX Platform Experience

    PMAT Equivalent Process Maturity Level

    PREC Precedentedness of Application

    PREX Personnel Experience

    PVOL Platform Volatility

    RELY Required Software Reliability

    RESL Risk Resolution

    SITE Multisite Development

    STOR Main Storage Constraint

    TEAM Team Cohesion

    TIME Execution Time Constraint

    TOOL Use of Software Tools

  • xii

    ABSTRACT

    Accurately estimating the cost of software projects is one of the most desired

    capabilities in software development organizations. Accurate cost estimates not only help

    the customer make successful investments but also assist the software project manager in

    coming up with appropriate plans for the project and making reasonable decisions during

    the project execution. Although software maintenance has repeatedly been reported to

    account for the majority of total software cost, software estimation research has

    focused largely on new development and much less on maintenance.

    In this dissertation, an extension to the well-known model for software estimation,

    COCOMO II, is introduced for better determining the size of maintained software and

    improving the effort estimation accuracy of software maintenance. While COCOMO II

    emphasizes the cost estimation of software development, the extension captures various

    characteristics of software maintenance through a number of enhancements to the

    COCOMO II size and effort estimation models to support the cost estimation of software

    maintenance.

    Expert input and an industry data set of eighty completed software maintenance

    projects from three software organizations were used to build the model. A number of

    models were derived through various calibration approaches, and these models were then

    evaluated using the industry data set. The full model, which was derived through the

    Bayesian analysis, yields effort estimates within 30% of the actuals 51% of the time,

    outperforming the original COCOMO II model by 34% when it was used to estimate these

  • xiii

    projects. Further performance improvement was obtained when calibrating the

    full model to each individual program, generating effort estimates within 30% of the

    actuals 80% of the time.

  • 1

    Chapter 1. INTRODUCTION

    Software maintenance is an important activity in software engineering. Over the

    decades, software maintenance costs have been continually reported to account for a

    large majority of software costs [Zelkowitz 1979, Boehm 1981, McKee 1984, Boehm

    1988, Erlikh 2000]. This fact is not surprising. On the one hand, software environments

    and requirements are constantly changing, which leads to new software system upgrades

    to keep pace with the changes. On the other hand, the economic benefits of software

    reuse have encouraged the software industry to reuse and enhance the existing systems

    rather than to build new ones [Boehm 1981, 1999]. Thus, it is crucial for project

    managers to estimate and manage the software maintenance costs effectively.

    Software cost estimation plays an important role in software engineering practice,

    often determining the success or failure of contract negotiation and project execution.

    Cost estimation deliverables, such as effort, schedule, and staffing requirements, are

    valuable pieces of information for project formation and execution. They are used as key

    inputs for project bidding and proposal, budget and staff allocation, project planning,

    progress monitoring and control, etc. Unreasonable and unreliable estimates are a major

    cause of project failure, as evidenced by a 2007 CompTIA survey of 1,000 IT

    respondents, which found that two of the three most-cited causes of IT-project failure

    are concerned with unrealistic resource estimation [Rosencrance 2007].

    Recognizing the importance of software estimation, the software engineering

    community has put tremendous effort into developing models in order to help estimators

  • 2

    generate accurate cost estimates for software projects. In the last three decades, many

    software estimation models and methods have been proposed and used, such as

    COCOMO, SLIM, SEER-SEM, and Price-S.

    1.1 THE PROBLEM

    COCOMO is the most popular non-proprietary software estimation model in

    the literature as well as in industry. The model was built using historical data and

    assumptions of software (ab initio) development projects. With a few exceptions, the

    model's properties (e.g., forms, cost factors, and constants) are supposed to be applicable

    to estimating the cost of software maintenance. However, inherent differences exist

    between software development and maintenance. For example, software maintenance

    depends on quality and complexity of the existing architecture, design, source code, and

    supporting documentation. The problem is that these differences make the model's

    properties less relevant in the software maintenance context, resulting in low estimation

    accuracy. Unfortunately, there is a lack of empirical studies that evaluate and

    extend COCOMO or other models, in order to better estimate the cost of software

    maintenance.

    1.2 A SOLUTION

    Instead of using the COCOMO model that was designed for new development,

    what if we build an extension that takes into account characteristics of software

    maintenance? Thus, my thesis is:

  • 3

    Improved COCOMO models that allow estimators to better determine the

    equivalent size of maintained software and estimate maintenance effort can improve the

    accuracy of effort estimation of software maintenance.

    The goal of this study is to investigate such models. I will test a number of

    hypotheses concerning the accuracy of alternative models that predict the effort of software

    maintenance projects, the explanatory power of such cost drivers as execution time and

    memory constraints, and potential effects on quality attributes as better determinants of

    the effort involved in deleting code. Using the industry dataset that I have collected, I will

    also evaluate the differences in the effects of the COCOMO cost drivers on the project's

    effort between new development and maintenance projects. Finally, COCOMO models

    for software maintenance will be introduced and validated using industry data sets.

    1.3 RESEARCH HYPOTHESES

    The main research question that this study attempts to address, reflecting the

    proposed solution, is as follows:

    Are there extended COCOMO models that allow estimators to better determine the

    equivalent size of maintained software and estimate maintenance effort that can improve

    the accuracy of effort estimation of software maintenance?

    To address this question, I implement the modeling process as described in

    Section 3.2. The process involves testing a number of hypotheses. The inclusion of

    hypothesis tests in the process helps frame the discussion and analysis of the results

  • 4

    obtained during the modeling process. This section summarizes the hypotheses to be

    tested in this dissertation.

    It is clear from prior work discussed in Chapter 2 that there is no common

    maintenance size metric used as compared to the size for new development. In software

    (ab initio) development, a certain level of consistency is achieved; either SLOC or

    Function Point is widely used. In software maintenance, on the other hand, some

    maintenance effort estimation models use the sum of SLOC added, modified, and

    deleted, while others do not include the SLOC deleted. Due to this inconsistency, evidence on

    whether the SLOC deleted is a significant predictor of effort is desired. Thus, the

    following hypothesis is investigated.

    Hypothesis 1: The measure of the SLOC deleted from the modified modules is not

    a significant size metric for estimating the effort of software maintenance projects.

    This hypothesis was tested through a controlled experiment of student

    programmers performing maintenance tasks. The SLOC deleted metric was used as a

    predictor of effort in linear regression models, and the significance of the coefficient

    estimate of this predictor was tested using the typical 0.05 level of significance.

    Software maintenance is different from new software development in many ways.

    The software maintenance team works on a system with existing legacy code and

    documentation; thus, it is constrained by the existing system's requirements, architecture,

    design, etc. This characteristic leads to differences in the project activities to be performed,

    system complexity, necessary personnel skill sets, etc. As a result, we expect that the

    impact of each cost driver on the effort of software maintenance would be significantly

  • 5

    different in comparison with the COCOMO II.2000 model. We, therefore, investigate the

    following hypothesis.

    Hypothesis 2: The productivity ranges of the cost drivers in the COCOMO II

    model for maintenance are different from those of the cost drivers in the COCOMO

    II.2000 model.

    The productivity range specifies the maximum impact of a cost driver on effort. In

    other words, it indicates the percentage of effort increase or decrease if the rating of a

    cost driver increases from the lowest to the highest level. This hypothesis was tested by

    performing simple comparisons on the productivity ranges of the cost drivers between the

    two models.
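    One common way to quantify a productivity range is the ratio of the largest to the
    smallest effort-multiplier value of a cost driver. The minimal Python sketch below
    illustrates such a comparison; the driver chosen (RELY) and all numeric values are
    invented for illustration, not calibrated values from this study or from COCOMO II.2000.

        # Hypothetical effort-multiplier values per rating level for one cost driver
        # (RELY), ordered from the lowest to the highest rating. Illustrative only.
        maintenance_rely = [0.85, 0.92, 1.00, 1.12, 1.30]
        cocomo_ii_2000_rely = [0.82, 0.92, 1.00, 1.10, 1.26]

        def productivity_range(multipliers):
            """Ratio of the largest to the smallest effort multiplier: the maximum
            impact the cost driver can have on estimated effort."""
            return max(multipliers) / min(multipliers)

        pr_maintenance = productivity_range(maintenance_rely)
        pr_development = productivity_range(cocomo_ii_2000_rely)
        print(f"Maintenance PR: {pr_maintenance:.2f}")
        print(f"COCOMO II.2000 PR: {pr_development:.2f}")
        print(f"Relative difference: {100 * (pr_maintenance - pr_development) / pr_development:.1f}%")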

    Finally, the estimation model must be validated and compared with other

    estimation approaches regarding its estimation performance. As the model for

    maintenance proposed in this study is an extension of the COCOMO II model, we will

    compare its performance with that of the COCOMO II model and two other

    unsophisticated but commonly used approaches: simple linear regression

    and the productivity index.

    Hypothesis 3: The COCOMO II model for maintenance outperforms the

    COCOMO II.2000 model when estimating the effort of software maintenance projects.

    Hypothesis 4: The COCOMO II model for maintenance outperforms the simple

    linear regression model and the productivity index estimation method.

  • 6

    Hypotheses 3 and 4 were tested by comparing the estimation performance of

    various models using the estimation accuracy metrics MMRE and PRED.
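    For reference, these two accuracy metrics can be computed as in the minimal Python
    sketch below; the effort values shown are invented for illustration and are not data
    from this study.

        def mre(actual, estimate):
            """Magnitude of relative error (MRE) for one project."""
            return abs(actual - estimate) / actual

        def mmre(actuals, estimates):
            """Mean of the magnitudes of relative errors (MMRE) over all projects."""
            return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)

        def pred(actuals, estimates, level=0.30):
            """PRED(level): fraction of projects whose MRE is at most the given level."""
            hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
            return hits / len(actuals)

        # Illustrative actual and estimated efforts in person-months.
        actual_pm = [12.0, 40.0, 8.5, 100.0]
        estimated_pm = [10.0, 52.0, 8.0, 85.0]
        print(f"MMRE = {mmre(actual_pm, estimated_pm):.2f}")
        print(f"PRED(0.30) = {pred(actual_pm, estimated_pm):.2f}")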

    1.4 DEFINITIONS

    This section provides definitions of common terms and phrases referred to

    throughout this dissertation. Software maintenance or maintenance is used to refer to the

    work of modifying, enhancing, and providing cost-effective support to the existing

    software. Software maintenance in this definition has a broader meaning than the IEEE

    definition given in [IEEE 1999] as it includes minor and major functional enhancements

    and error corrections. As opposed to software maintenance, new (ab initio) development

    refers to the work of developing and delivering the new software product.

    COCOMO II is used to refer to a set of estimation models that were developed

    and released in [Boehm 2000b] as a major extension of COCOMO to distinguish itself

    from the version originally created and published in [Boehm 1981]. COCOMO II.2000

    refers to the COCOMO II model whose constants and cost driver ratings were released in

    [Boehm 2000b]. COCOMO II model for maintenance is used to indicate the sizing

    method and effort estimation model for software maintenance.

  • 7

    Chapter 2. RELATED WORK

    Software cost estimation has attracted tremendous attention from the software

    engineering research community. A number of studies have been published to address

    cost estimation-related problems, such as software sizing, software productivity factors, and

    cost estimation models for software development and maintenance. This chapter presents

    a review of software sizing techniques in Section 2.1, major cost estimation models in

    Section 2.2, and maintenance cost estimation models in Section 2.3.

    2.1 SOFTWARE SIZING

    Size is one of the most important attributes of a software product. It is a key

    indicator of software cost and time; it is also a base unit to derive other metrics for

    software project measurements, such as productivity and defect density. This section

    describes the most popular sizing metrics and techniques that have been proposed and

    applied in practice. These techniques can be categorized into code-based sizing metrics

    and functional size measurements.

    2.1.1 CODE-BASED SIZING METRICS

    Code-based sizing metrics measure the size or complexity of software using the

    programmed source code. Because a significant amount of effort is devoted to

    programming, it is believed that an appropriate measure correctly quantifying the code

  • 8

    can be a meaningful indicator of software cost. Halstead's software length equation based

    on a program's operands and operators, McCabe's Cyclomatic Complexity, the number of

    modules, and source lines of code (SLOC), among others, have been proposed and used as

    code-based sizing metrics. Of these, SLOC is the most popular. It is used as a primary

    input by most major cost estimation models, such as SLIM, SEER-SEM, PRICE-S,

    COCOMO, and KnowledgePlan (see Section 2.2).

    Many different definitions of SLOC exist. SLOC can be the number of physical

    lines, the number of physical lines excluding comments and blank lines, or the number of

    statements commonly called logical SLOC, etc. To help provide a consistent SLOC

    measurement, the SEI published a counting framework that consists of SLOC definitions,

    counting rules, and checklists [Park 1992]. Boehm et al. adapted this framework for use

    in the COCOMO models [Boehm 2000b]. USC's Center for Systems and Software

    Engineering (USC CSSE) has published further detailed counting rules for most of the

    major languages along with the CodeCount tool (http://csse.usc.edu). In COCOMO, the number of

    statements, or logical SLOC, is the standard SLOC input. Logical SLOC is less sensitive

    to formats and programming styles, but it is dependent on the programming languages

    used in the source code [Nguyen 2007].
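    As a simple illustration of the difference between physical-line counts with and without
    comments and blanks, the naive Python sketch below computes both for a C-like snippet.
    It is only a toy: real counting standards such as the SEI checklist [Park 1992] or the
    USC CSSE rules are far more detailed, and logical SLOC (statement counting) would
    additionally require a language-aware parser.

        def count_sloc(source_text):
            """Return (total physical lines, physical lines excluding blanks and
            whole-line comments) for C-like source text. Naive: inline comments and
            multi-line block comments are not handled."""
            total = 0
            non_blank_non_comment = 0
            for line in source_text.splitlines():
                total += 1
                stripped = line.strip()
                if stripped and not stripped.startswith(("//", "/*", "*")):
                    non_blank_non_comment += 1
            return total, non_blank_non_comment

        sample = "// add two numbers\nint add(int a, int b) {\n    return a + b;\n}\n"
        print(count_sloc(sample))  # (4, 3) for this snippet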

    For software maintenance, the count of added/new, modified, unmodified,

    adapted, reused, and deleted SLOC can be used. Software estimation models usually

    aggregate these measures in a certain way to derive a single metric commonly called

    effective SLOC or equivalent SLOC [Boehm 2000b, SEER-SEM, Jorgensen 1995, Basili

  • 9

    1996, De Lucia 2005]. Surprisingly, there is a lack of consensus on how to measure

    SLOC for maintenance work. For example, COCOMO, SLIM, and SEER-SEM exclude

    the deleted SLOC metric while KnowledgePlan and PRICE-S include this metric in their

    size measures. Several maintenance effort estimation models proposed in the literature

    use the size metric as the sum of SLOC added, modified, and deleted [Jorgensen 1995,

    Basili 1996].

    SLOC has been widely accepted for several reasons. It has been shown to be

    highly correlated with software cost; thus, it is a relevant input for software

    estimation models [Boehm 1981, 2000b]. In addition, code-based metrics can be easily

    and precisely counted using software tools, eliminating inconsistencies in SLOC counts,

    given that the same counting rules are applied. However, the source code is not available

    in early project stages, which means that it is difficult to accurately measure SLOC until

    the source code is available. Another limitation is that SLOC is dependent on the

    programmer's skills and programming styles. For example, an experienced programmer

    may write fewer lines of code than an inexperienced one for the same purpose, resulting

    in a problem called the productivity paradox [Jones 2008]. A third limitation is the lack of a

    consistent standard for measurements of SLOC. As aforementioned, SLOC could mean

    different things: physical lines of code, physical lines of code excluding comments and

    blanks, or logical SLOC. There is also no consistent definition of logical SLOC [Nguyen

    2007]. The lack of consistency in measuring SLOC can cause low estimation accuracy as

    described in [Jeffery 2000]. A fourth limitation is that SLOC is technology and language

  • 10

    dependent. Thus, it is difficult to compare productivity gains of projects of varying

    technologies.

    2.1.2 FUNCTIONAL SIZE MEASUREMENT (FSM)

    Having faced these limitations, Albrecht developed and published the Function

    Point Analysis (FPA) method as an alternative to code-based sizing methods [Albrecht

    1979]. Albrecht and Gaffney later extended and published the method in [Albrecht and

    Gaffney 1983]. The International Function Point Users Group (IFPUG), a non-profit

    organization, was later established to maintain and promote the practice. IFPUG has

    extended and published several versions of the FPA Counting Practices Manual to

    standardize the application of FPA [IFPUG 2004, 1999]. Other significant extensions to

    the FPA method have been introduced and widely applied in practice, such as Mark II

    FPA [Symons 1988] and COSMIC-FFP [Abran 1998].

    2.1.2.1 FUNCTION POINT ANALYSIS (FPA)

    FPA takes into account both static and dynamic aspects of the system. The static

    aspect is represented by data stored or accessed by the system, and the dynamic aspect

    reflects transactions performed to access and manipulate the data. FPA defines two data

    function types (Logical Internal File, External Interface File) and three transaction

    function types (External Input, External Output, Query). Function point counts are the

    sum of the scores that are assigned to each of the data and the transactional functions.

    The score of each of the data and transaction functions is determined by its type and its

  • 11

    complexity. Function counts are then adjusted by a system complexity factor called Value

    Adjustment Factor (VAF) to obtain adjusted function point counts.

    The IFPUG's Function Point Counting Practices Manual (CPM) provides

    guidelines, rules, and examples for counting function points. The manual specifies three

    different types of function point counts: the Development project count, the Enhancement

    project count, and the Application count. The Development project type refers to the count of

    functions delivered to the user in the first installation of the new software. The

    Enhancement project type refers to function point counts for modifications made to the

    preexisting software. And the Application function point count measures the functions

    provided to the user by a software product and is referred to as the baseline function point

    count. It can be considered the actual function point count of the development project

    developing and delivering the system. Thus, the application function point count can be

    determined by applying the same process given for the development project if the user

    requirements can be obtained from the working software.

    In FPA, the enhancement project involves the changes that result in functions

    added, modified or deleted from the existing system. The procedure and complexity

    scales are the same as those of the development, except that it takes into account changes

    in complexity of the modified functions and the overall system characteristics. The

    process involves identifying added, modified, deleted functions and determining value

    adjustment factors of the system before and after changes.

  • 12

    The Effective Function Point count (EFP) of the enhancement project is computed

    using the formula

    EFP = (ADD + CHGA) * VAFA + DEL * VAFB (Eq. 2-1)

    Where,

    ADD is the unadjusted function point count of added functions.

    CHGA is the unadjusted function point count of functions that are obtained by

    modifying the preexisting ones. It is important to note that CHGA counts any

    function that is modified, regardless of how much the function is changed.

    DEL is the unadjusted function point count of deleted functions.

    VAFA is the Value Adjustment Factor of the application after the

    enhancement project is completed.

    VAFB is the Value Adjustment Factor of the preexisting application.
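    A minimal Python sketch of this calculation (Eq. 2-1) is given below; the function point
    counts and VAF values are invented for illustration.

        def effective_function_points(add, chga, dele, vaf_after, vaf_before):
            """Effective Function Point count of an enhancement project (Eq. 2-1):
            added and changed functions are adjusted by the VAF of the application
            after the enhancement, deleted functions by the VAF before it."""
            return (add + chga) * vaf_after + dele * vaf_before

        # Illustrative counts: 30 FP added, 12 FP changed, 5 FP deleted.
        efp = effective_function_points(add=30, chga=12, dele=5,
                                        vaf_after=1.05, vaf_before=0.98)
        print(f"EFP = {efp:.1f}")  # (30 + 12) * 1.05 + 5 * 0.98 = 49.0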

    2.1.2.2 MARK II FUNCTION POINT ANALYSIS

    In 1988, Symons [1988] proposed Mark II Function Point Analysis (MkII FPA)

    as an extension to Albrecht's FPA. MkII FPA was later extended and published by the

    United Kingdom Software Metrics Association (UKSMA) in the MkII FPA Counting

    Practices Manual [UKSMA 1998]. The MkII FPA method is certified by ISO as an

    international standard FSM method [ISO 2002].

  • 13

    MkII FPA measures the functionality of a software system by viewing the

    software system as consisting of logical transactions. Each logical transaction is the finest-

    grained unit of a self-consistent process that is recognizable by the user. The logical

    transaction consists of three constituent parts: input, processing, and output components.

    The input and output components contain data element types that are sent across the

    application boundary and processed by the processing component. The processing

    component handles the input and output by referencing data entity types. Conceptually,

    MkII FPA's definitions of data entity types and data element types are similar to those of

    ILFs/EIFs and DETs in Albrecht's FPA.

    The size of the input and output components is the count of data element types,

    and the size of the processing component is the count of data entity types. The MkII

    function point count, which is referred to as MFI, is determined by computing a weighted

    sum of the sizes of the input, processing, and output components of all logical

    transactions. That is,

    MFI = Wi x Ni + We x Ne + Wo x No (Eq. 2-2)

    Where, Wi, We, Wo are the weights of input data element types, data entity types,

    and output data element types, respectively; and Ni, Ne, No are the counts of input data

    element types, data entity types, and output data element types, respectively. The weights

    Wi, We, Wo are calibrated using historical data by relating MFI to the effort required to

    develop the respective functions. In the CPM version 1.3.1 [UKSMA 1998], the industry

    average weights are Wi = 0.58, We = 1.66, and Wo = 0.26.
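    The weighted sum in (Eq. 2-2) with these industry-average weights can be sketched in
    Python as follows; the transaction counts are invented for illustration.

        # Industry-average MkII FPA weights from CPM version 1.3.1.
        W_INPUT, W_ENTITY, W_OUTPUT = 0.58, 1.66, 0.26

        def mkii_function_points(transactions):
            """MkII FPA size (Eq. 2-2): a weighted sum, over all logical transactions,
            of input data element types (Ni), data entity types referenced (Ne), and
            output data element types (No)."""
            return sum(W_INPUT * ni + W_ENTITY * ne + W_OUTPUT * no
                       for ni, ne, no in transactions)

        # Each tuple is (Ni, Ne, No) for one logical transaction.
        transactions = [(5, 2, 3), (8, 3, 6), (2, 1, 1)]
        print(f"MFI = {mkii_function_points(transactions):.2f}")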

  • 14

    For sizing the software maintenance work (referred to as "changes" in CPM

    1.3.1), the MkII function point count includes the size of the logical transactions that are

    added, changed, or deleted. This involves counting all individual input/output data

    element types and data entity types that are added, modified, or deleted. The formula (Eq.

    2-2) can be used to calculate the MkII function point count of maintenance work, where

    Ni, Ne, No are treated as the counts of input data element types, data entity types, and

    output data element types that are added, modified, or deleted.

    2.1.2.3 COSMIC FFP

    Full Function Point, which is often referred to as FFP 1.0, is an FSM method

    proposed by St-Pierre et al. [St-Pierre 1997]. FFP was designed as an extension to IFPUG

    FPA to better measure the functional size of real-time software. The method was later

    extended and renamed to COSMIC-FFP by the Common Software Measurement

    International Consortium (COSMIC). Several extensions have been published, and the

    latest version is COSMIC-FFP 2.2 [COSMIC 2003]. COSMIC-FFP has been certified

    as an ISO international standard [ISO 2003].

    COSMIC-FFP views the functionality of a software application as consisting of

    functional processes, each having sub-processes. There are two types of sub-processes,

    data movement type and data manipulation type. COSMIC-FFP does not handle the data

    manipulation type separately, but it assumes some association between the two types. It

    defines four types of data movement sub-processes, including Entry, Exit, Read, and

    Write. An Entry is a movement of the data attributes contained in one data group from

  • 15

    the outside to the inside of the application boundary; an Exit moves a data group in the

    opposite direction to that of the entry; and a Read or a Write refers to a movement of a

    data group from or to storage. The COSMIC-FFP measurement method determines the

    functional size by measuring the data movement sub-processes, each moving exactly one

    data group. That is, the COSMIC-FFP function point count, which is measured in cfsu

    (COSMIC Functional Size Unit), is computed by summing all Entries, Exits, Reads, and

    Writes. The complexity of the measurement procedure lies in identifying application

    layers, application boundaries, and data groups that each sub-process handles. Detailed

    rules and guidelines for this process are given in the COSMIC-FFP measurement manual

    [COSMIC 2003].

    In COSMIC-FFP, the functional size of the software maintenance work is the

    simple sum of all Entry, Exit, Read, and Write data movement sub-processes whose

    functional processes are affected by the change. By this calculation, COSMIC-FFP

    assumes that three different types of change to the functionality (add, modify, and delete)

    have the same level of complexity (e.g., the size of adding a function is counted the same

    as that of modifying or deleting the function). That is, the COSMIC-FFP count for the

    maintenance work (Size_cfsu) is computed as

    Size_cfsu = Size_added + Size_modified + Size_deleted (Eq. 2-3)

    Where Size_added, Size_modified, and Size_deleted are the COSMIC-FFP counts, measured

    in cfsu, of the added, modified, and deleted data movements, respectively.
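    In code, the maintenance size in (Eq. 2-3) is simply the number of affected data
    movements; a minimal Python sketch with invented counts follows.

        def cosmic_maintenance_size(added, modified, deleted):
            """COSMIC-FFP maintenance size in cfsu (Eq. 2-3): every added, modified,
            or deleted data movement (Entry, Exit, Read, or Write) counts as 1 cfsu."""
            return added + modified + deleted

        # Illustrative counts of data movements affected by a change request.
        size_cfsu = cosmic_maintenance_size(added=14, modified=9, deleted=3)
        print(f"Size = {size_cfsu} cfsu")  # 26 cfsu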

  • 16

    The above-discussed functional size measurement methods measure the size of

    software maintenance consistently in the sense that added, modified, and deleted

    functions are all included in the count, and they are assigned the same weight. Thus, the

    number of function points assigned to a function is the same regardless of

    whether the function is added, modified, or deleted. This calculation also implies that the

    effort required to add a function is expected to be the same as the effort to delete or

    modify the same function.

    2.2 MAJOR COST ESTIMATION MODELS

    Many estimation models have been proposed and applied over the years. Instead

    of describing them all, this section provides a brief review of major estimation models

    that have been developed, continue to be applied, and are marketed by their respective

    developers. These models include SLIM, SEER-SEM, PRICE-S, KnowledgePlan, and

    COCOMO. There are several reasons for this selection. First, they represent the core set

    of models that was developed in the early 1980s and 1990s. Second, they are still being

    investigated and used widely in practice and literature. Their long history of extensions

    and adoptions is proof of their robustness and usefulness. Third, these models perform

    estimation for a broad range of software development and maintenance activities,

    covering a number of phases of software lifecycles such as requirements, architecture,

    implementation, testing, and maintenance.

  • 17

    2.2.1 SLIM

    SLIM is one of the most popular cost estimation models that has been in the

    market for decades. The model was originally developed in the late 1970s by Larry

    Putnam of Quantitative Software Measurement (www.qsm.com), and its mathematical formulas and

    analysis were published in [Putnam and Myers 1992]. As the model is proprietary, the

    subsequent upgrades of the model structures and mathematical formulas are not available

    in the public domain.

    Generally, the SLIM model assumes that the staffing profile follows a form of

    Rayleigh probability distribution of project staff buildup over time. The shapes and sizes

    of the Rayleigh curve reflect the project size, manpower buildup index, and other

    productivity parameters. The Rayleigh staffing level at time t is presented as

    p(t) = (K / td^2) x t x exp(-t^2 / (2 x td^2)) (Eq. 2-4)

    Where K is the total lifecycle effort and td the schedule to the peak of the staffing

    curve. The quantity D = K / td^2 is considered the staffing complexity of the project. The total

    lifecycle effort is calculated using the project size S, the technology factor C, and td, and

    is defined as

    K = (S / C)^3 x (1 / td^4) (Eq. 2-5)
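    A numerical sketch of (Eq. 2-4) and (Eq. 2-5) in Python follows. The size, technology
    factor, and schedule values are invented, and the actual SLIM calibrations are
    proprietary, so the numbers only illustrate the shape of the calculation.

        import math

        def slim_total_effort(size_sloc, technology_factor, td_years):
            """Total lifecycle effort K (Eq. 2-5): K = (S / C)^3 / td^4."""
            return (size_sloc / technology_factor) ** 3 / td_years ** 4

        def rayleigh_staffing(t, k, td):
            """Rayleigh staffing level at time t (Eq. 2-4):
            p(t) = (K / td^2) * t * exp(-t^2 / (2 * td^2))."""
            return (k / td ** 2) * t * math.exp(-t ** 2 / (2 * td ** 2))

        # Illustrative inputs: 50 KSLOC, technology factor 5000, staffing peak at 1.5 years.
        k = slim_total_effort(size_sloc=50_000, technology_factor=5000, td_years=1.5)
        for t in (0.5, 1.0, 1.5, 2.5):
            print(f"t = {t:.1f} yr, staffing level ~ {rayleigh_staffing(t, k, td=1.5):.1f}")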

  • 18

    Boehm et al. note that the assumption of a staffing profile following the

    Rayleigh distribution does not always hold in practice [Boehm 2000a]. For example, some

    development practices such as maintenance and incremental development may employ a

    constant level of staff. In subsequent adjustments, SLIM handles this limitation by

    allowing the staffing profile to be adjusted by a staffing curve parameter. There are

    multiple project lifecycle phases defined by the model, such as feasibility study,

    functional design, main build (development), and maintenance. Each lifecycle phase may

    have a different staffing curve parameter. In the main build phase, for example, the

    staffing curve parameter can specify the curve as Medium Front Load (for staff peaking

    at 40% of the phase), Medium Rear Load (peaking at 80% of the phase), or Rear Load

    (peaking at the end of the phase). In the maintenance phase, this parameter specifies a flat

    staffing curve.

    SLIM views software maintenance as a software lifecycle phase following the

    main build or development phase. The maintenance phase may have major

    enhancements, minor enhancements, and baseline support including emergency fixes,

    help desk support, infrastructure upgrades, operational support, small research projects,

    etc. The maintenance phase can be estimated independently of the other phases.

    The model uses the effective SLOC as a unit of project size. Function points and

    user-defined metrics such as the number of modules, screens, etc. can be used, but they

    have to be converted to effective SLOC using a gear factor. SLIM counts new code and

    modified code, but it excludes deleted code. Clearly, the model assumes that new code

    and modified code have the same influence on the maintenance effort.

  • 19

    2.2.2 SEER-SEM

    SEER-SEM is a commercial and proprietary model developed and marketed by

    Galorath, Inc. (www.galorath.com). This model is an extension of the Jensen model [Jensen 1983] from which

    model structures and parameters are extended while sharing the same core formulas. Like

    SLIM, the model uses the Rayleigh probability distribution of staffing profile versus time

    to determine development effort. The size of the project and other parameters can change

    the Rayleigh curve which then gives estimates for effort, time, and peak staff

    accordingly.

    In SEER-SEM, the traditional Rayleigh staffing level at time t is calculated as

    p(t) = (K / td^2) x t x exp(-t^2 / (2 x td^2)) (Eq. 2-6)

    Where K is the total life cycle effort and td the time to the peak of the staffing

    curve; these terms are calculated as

    K = D^0.4 x (Se / Cte)^1.2 (Eq. 2-7)

    td = D^(-0.2) x (Se / Cte)^0.4 (Eq. 2-8)

    Where D is the staffing complexity, Se the effective size, and Cte the effective

    technology. Although sharing the same meaning of D with SLIM, SEER-SEM defines D

  • 20

    as K / td^3, which is slightly different from that of SLIM. The derivative of the Rayleigh curve

    p(t) at t = 0 is defined as the staffing rate, generally measuring the number of people added

    to the project per year. The effective technology Cte is an aggregated factor determined by

    using a set of technology and environment parameters. The effective size Se is the size of

    the project that can be measured in source lines of code (SLOC), function points, or other

    units.

    In SEER-SEM, the software maintenance cost covers necessary maintenance

    activities performed to ensure the operation of the software after its delivery. These

    software maintenance activities involve correcting faults, adapting the software to new

    operating environments, fine-tuning and perfecting the software, and minor

    enhancements. They are usually triggered by change requests and software faults that are

    found during the operation of the software. SEER-SEM uses a number of maintenance-

    specific parameters to estimate the maintenance cost for a given period of time, in

    addition to the development cost of the software. The maintenance cost is allocated into

    corrective, adaptive, perfective maintenance, and minor enhancement cost categories.

    Major enhancements and re-engineering are not included in the software

    maintenance cost, but are treated separately as a new software development project.

    SEER-SEM handles major enhancements and new development similarly except that

    their differences are reflected in the way that the effective size of the software is

    determined.

  • 21

    The effective size, Se, of the software is calculated as

    Se = New + [P x (0.4 A + 0.25 B + 0.35 C)] (Eq. 2-9)

    Where,

    New and P are the new size and the pre-existing software size, respectively. The

    size can be measured in either function points or lines of code. Discarded code

    is not included in calculating the effective size of the pre-existing software (P).

    A, B, and C are the respective percentages of code redesign, code

    reimplementation, and code retest required to adapt the pre-existing software.
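    A small Python sketch of the effective size calculation (Eq. 2-9), with invented sizes
    and adaptation percentages, is shown below.

        def seer_effective_size(new_size, preexisting_size,
                                redesign, reimplementation, retest):
            """SEER-SEM effective size (Eq. 2-9):
            Se = New + P x (0.4*A + 0.25*B + 0.35*C), where A, B, and C are the
            fractions (0.0 to 1.0) of redesign, reimplementation, and retest required
            to adapt the pre-existing software."""
            rework = 0.4 * redesign + 0.25 * reimplementation + 0.35 * retest
            return new_size + preexisting_size * rework

        # Illustrative values: 10 KSLOC new, 80 KSLOC pre-existing,
        # 20% redesign, 10% reimplementation, 50% retest.
        se = seer_effective_size(10_000, 80_000, 0.20, 0.10, 0.50)
        print(f"Effective size = {se:,.0f} SLOC")  # 10,000 + 80,000 * 0.28 = 32,400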

    The element [P x (0.4 A + 0.25 B + 0.35 C)] in formula (Eq. 2-9) is the

    equivalent size of rework or reuse of the pre-existing software. The parameters A, B, and

    C are subjective, and their values range from 0 to 100%. The SEER-SEM user manual

    provides formulas and guidelines to help determine these parameters [Galorath 2002]. It

    does not give details, however, on how to reevaluate these parameters when the

    completed software is available (collecting the actual size data is important because it is

    used to recalibrate the model and to measure the actual productivity, defect density, etc.).

    The 100% upper limit on the parameters A, B, and C might result in underestimation of the

    rework because it does not account for possible code expansion or full retest and

    integration of the pre-existing code.

  • 22

    2.2.3 PRICE-S

    PRICE-S was originally developed by Frank Freiman of RCA for estimating

    acquisition and development of hardware systems in the late 1960s and then released as

    the first commercial software cost estimation model in 1977. After multiple upgrades,

    extensions, and changes of ownership, the current model, True S, is now implemented as

    a component in the TruePlanning tool marketed by PRICE Systems (www.pricesystems.com). As True S is built

    on the same core methodology as its predecessor, this review uses the name PRICE-S to

    maintain its originality.

    PRICE-S is an activity-based estimation model that estimates the effort required

    for each activity. In PRICE-S, an activity represents the work that people, equipment,

    technologies, or facilities perform to produce a product or deliver a service [PRICE-S

    2009]. As described in [IPSA 2008], the effort E of each activity is modeled in the form

    of

    E = S x PB x PA (Eq. 2-10)

    Where,

    S is the software size that can be measured in source lines of code, function

    points, predictive object points (POPs) or use case conversion points (UCCPs).

    POPs is an object-oriented (OO) metric introduced by PRICE Systems to

    measure the size of OO projects [Minkiewicz 1998], and UCCPs is a metric

    used to quantify the size of use cases.


  • 23

    PB is the ideal baseline productivity of an industry or an application domain.

    PA is the productivity adjustment factor that accounts for the overall effects of

    cost drivers on the productivity of the project.

    PRICE-S views the software maintenance phase as an activity that focuses on

    fixing latent defects, deployment, and changing the software release to improve

    performance, efficiency, or portability. Thus, the maintenance cost is determined by these

    activities rather than as a function of how much the software is modified. PRICE-S

    assumes that the maintenance cost only includes changes required to improve

    performance, efficiency, and portability or to correct defects. Other changes not included

    in the maintenance cost (i.e., functional enhancements) are estimated the same way as the

    development project. The model uses a number of maintenance-specific parameters to

    calculate the cost of software maintenance.

    Estimating the costs associated with functional enhancements and adaptations of pre-

    existing software involves specifying the amount of new, adapted, reused, changed, and

    deleted code. Other size measures are also used such as the Percentage of Design

    Adapted, Percentage of Code Adapted, and Percentage of Test Adapted, which are

    similar to those of SEER-SEM. But unlike SEER-SEM, PRICE-S does not use the

    effective size aggregated from different types of code in its model calculations. Rather, it

    uses the different size components separately in various calculations to determine the

    effort.

  • 24

    2.2.4 KNOWLEDGEPLAN (CHECKPOINT)

    KnowledgePlan is a commercial software estimation tool developed and first

    released by Software Productivity Research (SPR, www.spr.com) in 1997. KnowledgePlan is an

    extension of several previous tools, Checkpoint and SPQR/20, which were originally

    based on Capers Jones's work [Jones 1997]. According to the KnowledgePlan user's

    guide, the model relies on knowledge bases of thousands of completed projects to provide

    estimation and scheduling capabilities [KnowledgePlan 2005].

    KnowledgePlan estimates effort at project, phase, and task levels of granularity.

    In addition to effort, the model's main outputs include resources, duration, defects,

    schedules, and dependencies. Estimating and scheduling project tasks are enabled through

    the knowledge bases containing hundreds of standard task categories to represent typical

    activities in software projects. These standard task categories cover planning,

    management, analysis, design, implementation, testing, documentation, installation,

    training, and maintenance. Each of these task categories is associated with predefined

    inclusion rules and algorithms that are used to suggest relevant tasks and determine the

    size of deliverables and productivity for the given size of the project being estimated.

    The KnowledgePlan tool supports different project types such as new

    development, enhancement, reengineering, reverse engineering, and maintenance and

    support of a legacy system. KnowledgePlan uses eight sizing categories called code

    types, including New, Reused, Leveraged, Prototype, Base, Changed, Deleted, and

    System Base. Differences in project types may be reflected in the distribution of size


    estimates of these categories. For example, a new development project typically has a

    high proportion of New code while enhancement may have a high value of Base code.

    2.2.5 COCOMO

    The Constructive Cost Model (COCOMO), a well-known cost and schedule

    estimation model, was originally published in the text Software Engineering Economics

    [Boehm 1981]. This original model is often referred to as COCOMO 81. The model was

    defined based on the analysis of 63 completed projects from different domains during the

    1970s and the early 1980s. To address the issues emerging from advancements and

    changes in technologies and development processes, the USC Center for Systems and

    Software Engineering has developed and published COCOMO II. The model was

    initially released in [Boehm 1995] and then published in the definitive book [Boehm

    2000b]. Among the main upgrades are the introduction of new functional forms that use

    scale factors, new cost drivers, and a new set of parameter values.

    COCOMO II comprises three sub-models, Applications Composition, Early

    Design, and Post-Architecture. The Applications Composition model is used to compute

    the effort and schedule to develop the system that is integrated from reusable components

    and other reusable assets using integrated development tools for design, construction,

    integration, and test. The Applications Composition model has a different estimation

    form from the other models. It uses a size input measured in terms of Application Points

    or Object Points [Kauffman and Kumar 1993, Banker 1994] and a productivity rate to

    calculate effort. The Early Design model is used in the early stages of the project when


    the project information is not detailed enough for a fine-grained estimate. When detailed information becomes available (i.e., the high-level design is complete and the development environment is determined), the Post-Architecture model is used instead. The Early

    Design and Post-Architecture models use source lines of code as the basic size unit and

    follow the same arithmetic form.

    The general form of the effort formulas of the COCOMO 81, Early Design, and

    Post-Architecture models can be written as

    PM = A x Size^B x Π EMi (Eq. 2-11)

    Where,

    PM is the effort estimate in person-months.

    A is a multiplicative constant, which can be calibrated using historical data.

    Size is an estimated size of the software, measured in KSLOC.

    B is an exponential constant (COCOMO 81) or a function of scale factors (COCOMO II).

    EMi are the effort multipliers, which represent the multiplicative component of the equation.

    In COCOMO 81, the B term is an exponential constant, which is usually greater

    than 1.0, indicating diseconomies of scale. In COCOMO II, B is defined as a function of

    scale factors, in the form B = β0 + β1 x Σ(i=1..5) SFi, where β0 and β1 are constants and SFi is one of the five scale factors. The COCOMO 81 model identifies 15 effort multipliers.


    COCOMO II uses 7 in the Early Design model and 17 in the Post-Architecture model.

    The effort multipliers are the cost drivers that have multiplicative impact on the effort of

    the project while the scale factors have exponential impact. The Early Design and Post-

    Architecture models have the same set of scale factors while the cost drivers in the Early

    Design model were derived from those of the Post-Architecture model by combining

    drivers that were found to be highly correlated. The rating values of the COCOMO II Post-Architecture model's cost drivers were calibrated using the Bayesian technique on a database of 161 project data points [Chulani 1999a].
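
    The following Python sketch shows how Eq. 2-11 and the scale-factor exponent combine; the constant values, scale-factor ratings, and effort-multiplier ratings below are assumed illustration values, not the published COCOMO II calibration.

    def cocomo_effort(size_ksloc, A, beta0, beta1, scale_factors, effort_multipliers):
        # Eq. 2-11: PM = A x Size^B x product of EMi, with B = beta0 + beta1 x sum(SFi)
        B = beta0 + beta1 * sum(scale_factors)
        em_product = 1.0
        for em in effort_multipliers:
            em_product *= em
        return A * (size_ksloc ** B) * em_product

    # Assumed illustration values (constants, five SF ratings, and a few EM ratings).
    pm = cocomo_effort(size_ksloc=100, A=2.94, beta0=0.91, beta1=0.01,
                       scale_factors=[3.72, 3.04, 4.24, 3.29, 4.68],
                       effort_multipliers=[1.10, 0.95, 1.17])
    print(round(pm, 1))  # estimated person-months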

    Table 2-1. COCOMO Sizing Models

    Development Type                          Source Code     Types              Sizing Model
    New System                                New             New/Added          New SLOC
                                              Pre-existing    Adapted, Reused    Reuse
    Major enhancements                        Pre-existing    Adapted, Reused    Reuse
    Maintenance (repairs and minor updates)   Pre-existing    Added, Modified    Maintenance

    COCOMO II provides different models to determine the source code size of the

    project, depending on the origin of the code (new, pre-existing) and the type of change

    (addition, modification, reuse, automatic translation). Table 2-1 shows the sizing models

    for various development types. In COCOMO, software maintenance involves repairs and

    minor updates that do not change its primary functions [Boehm 1981]. Major

    enhancements, which include changes that are not considered as maintenance, are


    estimated similarly to new software development except that the size is determined

    differently. The model uses the same ratings of the cost drivers for both maintenance and

    new software development with a few exceptions. The Required Development Schedule

    (SCED) and Required Reusability (RUSE) cost drivers are not used in the estimation of

    effort for maintenance, and the Required Software Reliability (RELY) cost driver has a

    different impact scale (see Section 0).

    2.2.5.1 MODEL CALIBRATIONS AND EXTENSIONS

    As COCOMO is a non-proprietary model, its details are available in the public

    domain, encouraging researchers and practitioners in the software engineering

    community to independently evaluate the model. There have been many extensions

    independently reported, e.g., [Kemerer 1987, Jeffery and Low 1990, Gulezian 1991,

    Menzies 2006]. Menzies et al. use machine learning techniques to generate effort models

    from the original COCOMO model [Menzies 2006]. Gulezian proposed a calibration

    method by transforming the model equation into a linear form and estimating the model

    parameters using standard linear regression techniques [Gulezian 1991]. This calibration

    method has been adopted by the COCOMO development team in their calibration work,

    e.g., [Clark 1998, Chulani 1999b, Yang and Clark 2003, Chen 2006, Nguyen 2008].

    COCOMO has also been a model used to validate new estimation approaches such as

    fuzzy logic and neural networks (e.g., [Idri 2000, Huang 2005, Reddy and Raju 2009]).

    The COCOMO development team continues to calibrate and extend the model

    using different calibration approaches on more augmented data sets [Boehm and Royce


    1989, Clark 1998, Chulani 1999b, Yang and Clark 2003, Chen 2006, Nguyen 2008].

    Table 2-2 shows the best results obtained by the main studies performed by the team.

    PRED(0.30) is the percentage of estimates that fall within 30% of the actuals. For

    example, PRED(0.30) = 52% indicates that the model in Clark [1998] produces estimates

    that are within 30% of the actuals, 52% of the time. Two types of results are often

    reported, fitting and cross-validation. The fitting approach uses the same data set for both

    training and testing the model while the cross-validation approach uses different data

    sets: one for training and the other for testing. The accuracies reported from cross-

    validation better indicate the performance of the model because the main reason to build

    a model is to use it in estimating future projects.
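
    For reference, the accuracy indicators used throughout this chapter can be computed as in the short Python sketch below; the actual and estimated efforts shown are hypothetical.

    def mre(actual, estimate):
        # Magnitude of relative error for one project.
        return abs(actual - estimate) / actual

    def mmre(actuals, estimates):
        # Mean magnitude of relative error over all projects.
        errors = [mre(a, e) for a, e in zip(actuals, estimates)]
        return sum(errors) / len(errors)

    def pred(actuals, estimates, level=0.30):
        # PRED(level): fraction of estimates whose MRE does not exceed the level.
        errors = [mre(a, e) for a, e in zip(actuals, estimates)]
        return sum(e <= level for e in errors) / len(errors)

    # Hypothetical actual and estimated efforts in person-months.
    actuals = [10.0, 24.0, 8.0, 40.0]
    estimates = [12.0, 20.0, 9.0, 55.0]
    print(round(mmre(actuals, estimates), 2), round(pred(actuals, estimates), 2))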

    Table 2-2. COCOMO II Calibrations

    Study                                  Fitting PRED(0.30)    Cross-validation PRED(0.30)
    COCOMOII.1997 [Clark 1998]             52%                   -
    COCOMOII.2000 [Chulani 1999b]          75%                   69%
    COCOMOII.2003 [Yang and Clark 2003]    56%                   -
    Chen [2006]                            -                     76%
    Nguyen [2008]                          80%                   75%

    In COCOMOII.1997, the parameters are weighted averages of data-driven

    regression results and expert-judgment rating scales, in which the latter scales account for

    90% of the weight. The COCOMOII.2000 calibration was based on the Bayesian analysis

    using a data set of 161 data points, and the COCOMOII.2003 calibration used the same

    Bayesian analysis, but it included 43 additional new data points. On the same data set as


    COCOMOII.2000, Chen et al. [2006] used a subset selection technique to prune model

    parameters, and Nguyen et al. [2008] applied constraints on regression models; they both

    attempted to reduce variances in the model. As shown in Table 2-2, their results indicate

    noticeable improvements in estimation accuracies, suggesting that using appropriate

    machine learning techniques can potentially improve the model performance.

    2.3 MAINTENANCE COST ESTIMATION MODELS

    Although the area of software maintenance estimation has received less attention

    as compared to that of new development, given the importance of software maintenance,

    a number of models have been introduced and applied to estimating the maintenance

    costs (see Table 2-3). These models address diverse sets of software maintenance work,

    covering, for instance, error corrections, functional enhancements, technical renovations,

    and reengineering. They can be roughly classified into three types based on the

    granularity level of the estimation focus: phase-, release-, and task-level maintenance

    estimation models.

    2.3.1 PHASE-LEVEL MODELS

    A set of the maintenance models focuses on the effort of routine maintenance

    work for a certain period or the whole phase of the software maintenance. The routine

    maintenance work refers to all activities performed during the operation of a software

    system after it is delivered. It involves fault corrections, minor functional changes and

    enhancements, and technical improvements of which the main purpose is to ensure the

    regular operation of the system. The maintenance models integrated in COCOMO,


    SEER-SEM, PRICE-S, SLIM, and KnowledgePlan are of this kind. In these models,

    maintenance costs are usually a part of the estimates produced when estimating the cost

    of a new system to be developed. Thus, the size of the system is a key input to estimate

    the maintenance effort. Most of these models use additional cost drivers that are specific

    to software maintenance. For example, COCOMO uses two drivers, namely software

    understanding (SU) and the level of unfamiliarity of the programmer (UNFM) for its

    sizing of the maintenance work; SEER-SEM uses such parameters as maintenance size

    growth over time and maintenance rigor (the thoroughness of maintenance activities to

    be performed) in its maintenance cost calculations [Galorath 2002].

    Estimating the maintenance cost for a system whose development cost is being estimated is important for architecture trade-off analysis and for making investment decisions about the system being evaluated. However, because many

    assumptions are made about the system that has yet to be developed, it is difficult to

    estimate the system maintenance cost accurately. This difficulty could be a reason that, to

    the best of my knowledge, there is no empirical study published to evaluate and compare

    the estimation accuracies of these models. Another possible reason is that these models,

    with the exception of COCOMO, are proprietary and their details have not been fully

    published, making them difficult to investigate in a research context.

    To estimate the adaptation and reuse work, COCOMO, SEER-SEM, PRICE-S,

    SLIM, and KnowledgePlan provide methods to size the work and compute the effort and

    schedule estimates using the same models developed for estimating new software

    development. Obviously, these models assume that the adaptation and reuse work has the


    same characteristics as new software development. Unfortunately, this assumption has

    never been validated empirically.

    2.3.2 RELEASE-LEVEL MODELS

    Instead of estimating the cost of the maintenance phase as a whole, another group

    of models focuses on the maintenance cost at a finer-grained level, estimating the effort

    of a planned set of maintenance tasks or a planned release. This approach usually

    involves using data from the past releases and analyzing the changes to estimate the cost

    for the next release.

    In addition to characterizing the effort distribution of maintenance releases, Basili et al. described a simple regression model to estimate the effort for maintenance releases of different types, such as error correction and enhancement [Basili 1996]. The

    model uses a single variable, SLOC, which was measured as the sum of added, modified

    and deleted SLOC including comments and blanks. The prediction accuracy was not

    reported although the coefficient of determination was relatively high (R2 = 0.75),

    indicating that SLOC is a good predictor of the maintenance effort.
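
    A single-variable model of this kind can be sketched in Python as below; the release data and the fitted coefficients are hypothetical and are not those reported by Basili et al.

    import numpy as np

    # Hypothetical releases: summed added+modified+deleted SLOC and effort in hours.
    sloc = np.array([1200.0, 3400.0, 800.0, 5100.0, 2600.0])
    effort = np.array([310.0, 820.0, 190.0, 1250.0, 640.0])

    # Ordinary least squares fit of effort = b0 + b1 * SLOC.
    b1, b0 = np.polyfit(sloc, effort, deg=1)
    predicted = b0 + b1 * sloc

    # Coefficient of determination (R^2) of the fitted line.
    ss_res = np.sum((effort - predicted) ** 2)
    ss_tot = np.sum((effort - effort.mean()) ** 2)
    print(round(b0, 1), round(b1, 3), round(1 - ss_res / ss_tot, 2))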

    Considering the maintenance work after the initial delivery of the system as being

    organized into sequences of operational releases, Ramil and Lehman introduced and

    evaluated linear regression models to estimate the effort required to evolve the system

    from one release to the next [Ramil 2000, 2003]. Their models take into account all

    maintenance tasks necessary to grow the system, which can include error corrections,

    functional enhancements, technical improvements, etc. The model predictors are size


    metrics measured at coarse granularity levels: modules (numbers of added and changed modules), subsystems (numbers of added and changed subsystems), and the number of changes to modules plus all changed modules. The models were calibrated and validated on the data

    sets collected from two case studies. In terms of MMRE and MdMRE, the best model

    achieved MMRE = 19.3%, MdMRE = 14.0%, and PRED(0.25) = 44%. This best model

    seems to be based on coarse-grained metrics (subsystems), which is consistent with a

    prior finding by Lindvall [1998], in which coarse-grained metrics, e.g., the number of

    classes, were shown to estimate change effort more accurately than other finer-grained

    metrics. However, it is important to note that this best model did not generate the best

    (highest) PRED(0.25), indicating that the model evaluation and ensuing inferences are

    likely contingent upon the estimation accuracy indicators used [Myrtweit 2005].

    Caivano et al. described a method and a supporting tool for dynamic calibration

    of the effort estimation model of renewal (reverse engineering and restoration) projects

    [Caivano 2001]. The model accepts the change information gathered during the project

    execution and calibrates itself to better reflect dynamics, current and future trends of the

    project. At the beginning of the project, the model starts with its most common form

    calibrated from completed projects. During the project execution, the model may change

    its predictors and constants using the stepwise regression technique. The method uses fine-grained metrics obtained from the source code, such as the number of lines of source code, McCabe's cyclomatic complexity, Halstead's complexity, and the number of modules obtained after the renewal process. They validated the method using both data from a

    legacy system and simulation. They found that fine-grained estimation and model


    dynamic recalibration are effective for improving the model accuracy and confirmed that

    the estimation model is process-dependent. A later empirical study further verifies these

    conclusions [Baldassarre 2003]. Other studies have also reported some success in

    improving the prediction accuracies by recalibrating estimation models in the iterative

    development environment [Trendowicz 2006, Abrahamsson 2007].

    Sneed proposed a model called ManCost for estimating software maintenance

    cost by applying different sub-models for different types of maintenance tasks [Sneed

    2004]. Sneed grouped maintenance tasks into four different types, hence, four sub-

    models:

    - Error correction: costs of error correction for a release.

    - Routine change: maintenance costs implementing routine change requests.

    - Functional enhancement: costs of adding, modifying, deleting, or improving software functionality. The model treats this type the same as new development, in which the size can be measured in SLOC or Function Points, and the effort is

    estimated using adjusted size, complexity, quality, and other influence factors.

    - Technical renovation: costs of technical improvements, such as performance and

    optimization.

    Sneed suggests that these task types have different characteristics; thus, each

    requires an appropriate estimation sub-model. Nonetheless, the adjusted size and

    productivity index are common measures used in these sub-models. The adjusted size

    was determined by taking into account the effects of complexity and quality factors.


    Although examples were given to explain the use of the sub-models to estimate the

    maintenance effort, there was no validation reported to evaluate the estimation

    performance of these sub-models. Sneed also proposed several extensions to account for

    reengineering tasks [Sneed 2005] and maintenance of web applications [Sneed and

    Huang 2007].

    The wide acceptance of the FSM methods has attracted a number of studies to

    develop and improve maintenance estimation models using the FSM metrics. Most of

    these studies focused on the cost of adaptive maintenance tasks that enhance the system

    by adding, modifying, and deleting existing functions. Having found that the FPA did not

    reflect the size of small changes well, Abran and Maya presented an extension to the FPA

    method by dividing the function complexity levels into finer intervals [Abran and Maya

    1995]. This extension uses smaller size increments and respective weights to discriminate

    small changes that were found to be common in the maintenance environment. They

    validated the model on the data obtained from a financial institution and demonstrated

    that a finer-grained sizing technique better characterizes the size characteristics of small

    maintenance activities.

    Niessink and van Vliet described a Maintenance Function Point (MFP) model to

    predict the effort required to implement non-corrective change requests [Niessink and

    van Vliet 1998]. The model uses the same FPA procedure for enhancement to determine

    the FP count, but the Unadjusted FP count was adjusted by a multiplicative factor,

    namely Maintenance Impact Ratio (MIR), to account for the relative impact of a change.

    The approaches were validated using the data set collected from a large financial


    information system, with the best model producing relatively low prediction accuracies (MMRE = 47% and PRED(0.25) = 28%). The result also shows that the size of the

    component to be changed has a higher impact on the effort than the size of the change.

    This result indicates that the maintainers might have spent time investigating not only the

    functions affected by the change but also other functions related to the change.
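
    The core of the MFP idea can be sketched as follows; the MIR value, the affected FP count, and the hours-per-point productivity are illustrative assumptions rather than values from the study.

    def maintenance_function_points(unadjusted_fp, maintenance_impact_ratio):
        # Scale the unadjusted FP count of the affected functions by the MIR of the change.
        return unadjusted_fp * maintenance_impact_ratio

    # Hypothetical change request: 35 UFP affected, MIR of 0.4, 6.5 hours per adjusted point.
    mfp = maintenance_function_points(unadjusted_fp=35, maintenance_impact_ratio=0.4)
    print(round(mfp, 1), round(6.5 * mfp, 1))  # adjusted size and a rough effort figure in hours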

    Abran et al. reported on the application of the COSMIC-FFP functional size

    measurement method to building effort estimation models for adaptive maintenance

    projects [Abran 2002]. They described the use of the functional size measurement

    method in two field studies, one with the models built on 15 projects implementing

    functional enhancements to a software program for linguistic applications, the other with

    19 maintenance projects of a real-time embedded software program. The two field studies did not use the same set of metrics, but both include three metrics: effort, Cfsu, and the level of difficulty of the project. The authors showed that project effort and functional size have a positive correlation and that this correlation is strong enough to build good

    effort estimation models that use a single size measure. However, as they demonstrated, a

    more reliable estimation model can be derived by taking into account the contribution of

    other categorical factors, such as project difficulty.

    2.3.3 TASK-LEVEL MODELS

    The task-level model estimates the cost of implementing each maintenance task, which comes in the form of an error report or a change request. This type of model deals

    with small effort estimates, usually ranging from a few hours to a month.


    Sneed introduced a seven-step process and a tool called SoftCalc to estimate the

    size and costs required to implement maintenance tasks [Sneed 1995]. The model uses

    various size measures, including SLOC (physical lines of code and statements), function

    points, object-points, and data points (the last two were originally proposed in the same

    paper). The size of the impact domain, the proportion of the affected software, was

    determined and then adjusted by complexity, quality, and project influence factors. The

    maintenance effort was computed using the adjusted size and a productivity index.
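
    A minimal sketch of this size-adjustment-and-productivity scheme is shown below; the multiplicative adjustment and the factor values are assumptions for illustration, not Sneed's published weights.

    def adjusted_size(impact_domain_size, complexity_factor, quality_factor, project_factor):
        # Adjust the size of the impact domain by complexity, quality, and project influence
        # factors (a simple multiplicative scheme assumed here for illustration).
        return impact_domain_size * complexity_factor * quality_factor * project_factor

    def maintenance_effort(adj_size, productivity_index):
        # Effort = adjusted size divided by a productivity index (size units per person-day).
        return adj_size / productivity_index

    size = adjusted_size(impact_domain_size=4000, complexity_factor=1.2,
                         quality_factor=1.1, project_factor=0.9)
    print(round(size), round(maintenance_effort(size, productivity_index=250), 1))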

    Rather than generating an exact effort estimate, it would be beneficial to classify

    maintenance change requests in terms of levels of difficulty or levels of effort required,

    and use this classification information to plan resources accordingly. Briand and Basili

    [1992] proposed a modeling approach to building classification models for the

    maintenance effort of change requests. The modeling procedure involves four high-level

    steps: identifying predictable metrics, identifying significant predictable metrics,

    generating a classification function, and validating the model. The range of each

    predictable variable and effort was divided into intervals, each being represented by a

    number called difficulty index. The effort range has five intervals (below one hour,

    between one hour and one day, between one day and one week, between one week and

    one month, above one month), which were indexed from 1 to 5, respectively. To evaluate

    the approach, Briand and Basili used a data set of 163 change requests from four different

    projects at the NASA Goddard Space Flight Center. The approach produced the

    classification models achieving from 74% to 93% classification correctness.
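
    The interval scheme can be expressed as a small Python function; the working-time conversions (an 8-hour day, 5-day week, 4-week month) are assumptions added here for illustration.

    def difficulty_index(effort_hours, hours_per_day=8, days_per_week=5, weeks_per_month=4):
        # Map task effort to the five ordinal intervals described above:
        # 1: below one hour, 2: one hour to one day, 3: one day to one week,
        # 4: one week to one month, 5: above one month.
        day = hours_per_day
        week = day * days_per_week
        month = week * weeks_per_month
        if effort_hours < 1:
            return 1
        if effort_hours <= day:
            return 2
        if effort_hours <= week:
            return 3
        if effort_hours <= month:
            return 4
        return 5

    print([difficulty_index(h) for h in (0.5, 6, 30, 100, 200)])  # -> [1, 2, 3, 4, 5]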


    Briand and Basili's modeling approach can be implemented to dynamically

    construct models according to specific environments. Organizations can build a general

    model that initially uses a set of predefined metrics, such as types of modification; the

    number of components added, modified, deleted; the number of lines of code added,

    modified, deleted. This general model is applied at the start of the maintenance phase, but it is then refined when sufficient data become available. However, as Basili et al. pointed out, it is difficult to determine the model's inputs correctly as they are not available until the

    change is implemented [Basili 1997].

    Basili et al. presented a classification model that classifies the cost of rework in a

    library of reusable software components, i.e. Ada files [Basili 1997]. The model, which

    was constructed using the C4.5 mining algorithm [Quinlan 1993], determines which

    component versions were associated with errors that require a high correction cost (more

    than 5 hours) or a low correction cost (no more than 5 hours). Three internal product metrics (the number of function calls, the number of declaration statements, and the number of exceptions) were shown to be relevant predictors for the model. As these metrics can be

    collected from the component version to be corrected, the model can be a useful

    estimation tool.
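
    The sketch below illustrates this kind of classifier, with scikit-learn's CART decision tree standing in for the C4.5 algorithm used in the study; the component data and labels are hypothetical.

    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical component versions: [function calls, declaration statements, exceptions],
    # labeled 1 if the correction cost exceeded 5 hours, else 0.
    X = [[12, 40, 1], [55, 120, 4], [8, 25, 0], [70, 200, 6], [20, 60, 2], [90, 150, 5]]
    y = [0, 1, 0, 1, 0, 1]

    # CART decision tree as a stand-in for C4.5.
    model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # Classify a new component version by its three internal product metrics.
    print(model.predict([[60, 130, 3]]))  # -> high-cost (1) or low-cost (0) rework class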

    Jorgensen evaluated eleven different models to estimate the effort of individual

    maintenance tasks using regression, neural networks, and pattern recognition approaches

    [Jorgensen 1995]. In the last approach, the Optimized Set Reduction (OSR) was used to

    select the most relevant subset of variables for the predictors of effort [Briand 1992b]. All

    of the models use the maintenance task size, which is measured as the sum of added,


    updated, and deleted SLOC, as a main variable. Four other predictors were selected as

    they were significantly correlated with the maintenance productivity. These are all

    indicator predictors: Cause (whether or not the task is corrective maintenance), Change (whether or not more than 50% of the effort is expected to be spent on modifying the existing code as opposed to inserting and deleting code), Mode (whether or not more than 50% of the effort is expected to be spent on developing new modules), and Confidence (whether or not the maintainer has high confidence in resolving the maintenance task).
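
    One of the model forms examined, a log-linear regression of task effort on size plus the indicator predictors, can be sketched as follows; the task data and the resulting coefficients are hypothetical.

    import numpy as np

    # Hypothetical tasks: [log(size in SLOC), Cause, Change, Mode, Confidence] with 0/1 indicators.
    X = np.array([
        [np.log(120), 1, 0, 0, 1],
        [np.log(45),  0, 1, 0, 0],
        [np.log(300), 0, 0, 1, 1],
        [np.log(80),  1, 1, 0, 0],
        [np.log(500), 0, 0, 1, 0],
        [np.log(60),  1, 0, 0, 1],
        [np.log(220), 0, 1, 0, 1],
        [np.log(150), 1, 0, 1, 0],
    ])
    log_effort = np.log(np.array([16.0, 9.0, 40.0, 14.0, 70.0, 8.0, 30.0, 22.0]))

    # Ordinary least squares on the log scale: log(effort) = b0 + b1*log(size) + indicator terms.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, log_effort, rcond=None)

    new_task = np.array([1.0, np.log(200), 0, 1, 0, 1])  # intercept term plus predictors
    print(round(float(np.exp(new_task @ coeffs)), 1))    # predicted effort in hours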

    Other variables (type of language, maintainer experience, task priority, application age, and application size) were shown to have no significant correlation with

    the maintenance productivity. As Jorgensen indicated, this result did not demonstrate that

    each of these variables has no influence on maintenance effort. There were possible joint

    effects of both investigated and non-investigated variables. For example, experienced

    maintainers wrote more compact code while being assigned more difficult tasks than

    inexperienced maintainers, and maintainers might be more experienced in large

    applications than in the small ones. To validate the models, Jorgensen used a data set of

    109 randomly selected maintenance tasks collected from different applications in the

    same organization. Of the eleven models built and compared, the best model seems to be

    the types of log linear regression and hybrid type based on pattern recognition and

    regression. The best model could generate effort estimates with MRE = 100% and

    PRED(.25) = 26% using a cross-validation approach [Bradley and Gong 1983]. This

    performance is unsatisfactorily low. Considering low prediction accuracies achieved,

    Jorgensen recommended that a formal model be used as supplementary to expert


    predictions and suggested that the Bayesian analysis be an appropriate approach to

    combining the estimates of the investigated models and expert predictions.

    Fioravanti et al. proposed and evaluated a model and metrics for estimating the

    adaptive maintenance effort of object-oriented systems [Fioravanti 1999, Fioravanti and

    Nesi 2001]. Using linear regression analysis, they derived the model and metrics

    based on classical metrics previously proposed for effort estimation models. They

    evaluated the model and metrics using the data collected from a real project, showing that

    the complexity and the number of interfaces have the highest impact on the adaptive

    maintenance effort.

    Several previous studies have proposed and evaluated models for exclusively

    estimating the effort required to implement corrective maintenance tasks. De Lucia et al. used multiple linear regression to build effort estimation models for corrective

    maintenance [De Lucia 2005]. Three models were built using coarse-grained metrics,

    namely the number of tasks requiring source code modification, the number of tasks

    requiring fixing of data misalignment, the number of other tasks, the total number of

    tasks, and SLOC of the system to be maintained. They evaluated the models on 144

    observations, each corresponding to a one-month period, collected from five corrective

    maintenance projects in the same software services company. The best model, which

    includes all metrics, achieved MMRE = 32.25% and PRED(0.25) = 49.31% using leave-

    more-out cross-validation. In comparison with the non-linear model previously used by

    the company, the authors showed that the linear model with the same variables produces

    higher estimation accuracies. They also showed that taking into account the difference in


    the types of corrective maintenance tasks can improve the performance of the estimation

    model.

    Table 2-3. Maintenance Cost Estimation Models

    Model/Study                    Maintenance Task Type                                            Effort Estimated For                                                      Modeling Approach
    COCOMO                         Regular maintenance; Major enhancement (adaptation and reuse)    Maintenance period; Adaptation and reuse project                          Linear and nonlinear regression; Bayesian analysis
    KnowledgePlan                  Regular maintenance; Major enhancement                           Maintenance period; Adaptation and reuse project                          Arithmetic
    PRICE-S                        Regular maintenance; Major enhancement                           Maintenance period; Adaptation and reuse project                          Arithmetic
    SEER-SEM                       Corrective; Adaptive; Perfective                                 Maintenance period; Adaptation and reuse project                          Arithmetic
    SLIM                           Corrective; Adaptive; Perfective                                 Maintenance period                                                        Heuristic
    Basili 1996                    Error correction and enhancement                                 Release                                                                   Simple linear regression
    Basili 1997                    Corrective                                                       Component version                                                         Classification (C4.5 algorithm)
    Niessink and van Vliet 1998    Adaptive (functional enhancement)                                Individual change request                                                 Linear regression
    Abran 1995                     Adaptive (functional enhancement)                                A set of activities resulting in a maintenance work product (Release)    Arithmetic
    Abran 2002                     Adaptive (functional enhancement)                                Release (project)                                                         Linear and nonlinear regression

    6 The selection of the best model is dependent on the performance indicators used [Myrtweit 2005]. The indicators MMRE and PRED(0.25) are reported, and MMRE is used to indicate the best model.