Software Size and Effort Estimation: An exploration of algorithmic and non-algorithmic techniques

DREXEL UNIVERSITY
Brian Driscoll
11/8/2010

Abstract: There is little debate over the observation that accurate planning and estimation at an early stage of the Software Development Life Cycle (SDLC) is critical to the success of any software project. However, there is also little debate that accurate prediction of the size of a software product at an early project stage is difficult to achieve. Since software product size estimation is a primary driver in the allocation of resources to a project, there is clearly significant value in conducting the software size estimation process in such a way that it yields the most accurate results possible. To date, many different methods have been proposed for software size estimation, and controlled and observational studies have been performed to assess their effectiveness. In this paper I provide a constrained review of the literature on software size estimation, covering both relatively traditional and more recent methods for predicting software product size at early stages in the SDLC. Specifically, I describe the method, scope, advantages, and disadvantages of using Function Point Analysis, Estimation by Analogy, and Use Case Points for software size and effort estimation.


Introduction

There is little debate over the observation that accurate planning and estimation at an early stage of the Software Development Life Cycle (SDLC) is critical to the success of any software project [1], [2], [5], [6], [19]. However, there is also little debate that accurate prediction of the size of a software product at an early project stage is difficult to achieve [3], [4]. Since software product size estimation is a primary driver in the allocation of resources to a project, there is clearly significant value in conducting the software size estimation process in such a way that it yields the most accurate results possible. A project that is over-estimated will result in under-utilization of resources, while a project that is under-estimated will result in over-utilization of resources, which may cause higher defect rates or force the use of premium resources (consultants) to complete the project on or close to schedule. To date, many different methods have been proposed for software size estimation, and controlled and observational studies have been performed to assess their effectiveness. In this paper I shall provide a constrained review of the literature on software size estimation, covering both relatively traditional and more recent methods for predicting software product size at early stages in the SDLC. Specifically, I shall describe the method, scope, advantages, and disadvantages of using Function Point Analysis, Estimation by Analogy, and Use Case Points for software size and effort estimation.

Function Point Analysis

Function Point Analysis (FPA) is an approach to software size estimation that has evolved to become one of the most widely used sizing methods since its introduction to the software development industry by Albrecht in 1979 [7]. Additionally, numerous approaches derived from Albrecht's original work have been developed to date, including MKII Function Points, Feature Points, 3D Function Points, Full Function Points, IFPUG, and COSMIC-FFP [5], [8]. The principles underlying the FPA approach are based upon the functionality that the developed system is to provide to the end user. More specifically, the approach entails counting the number of inputs and outputs to be made available in the application and weighting each by the value it has to the customer or end user. The weighted sum of these input and output counts is referred to as "function points" [17]. Albrecht's claims regarding Function Point Analysis are that the number of function points possessed by a proposed system correlates highly with the final number of Source Lines of Code (SLOC) of the developed system and with the work effort required to produce it [17].
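To make the weighted-sum idea concrete, the short Python sketch below tallies counts of Albrecht-style components and applies complexity weights and a value adjustment factor. The component categories and weights shown are the commonly cited "average complexity" values and should be read as an illustrative simplification of the idea, not as the official counting rules.

    # A minimal sketch of the weighted-sum idea behind function points.
    # The weights below are commonly cited "average complexity" values and
    # are illustrative, not an authoritative counting standard.
    WEIGHTS = {
        "external_input": 4,
        "external_output": 5,
        "external_inquiry": 4,
        "internal_logical_file": 10,
        "external_interface_file": 7,
    }

    def unadjusted_function_points(counts: dict[str, int]) -> int:
        """Weighted sum of component counts (the 'function points' of the text)."""
        return sum(WEIGHTS[kind] * n for kind, n in counts.items())

    def adjusted_function_points(ufp: int, gsc_total: int) -> float:
        """Apply a value adjustment factor derived from 14 general system
        characteristics, each rated 0-5 (so gsc_total ranges from 0 to 70)."""
        vaf = 0.65 + 0.01 * gsc_total
        return ufp * vaf

    if __name__ == "__main__":
        counts = {  # hypothetical project
            "external_input": 12,
            "external_output": 8,
            "external_inquiry": 5,
            "internal_logical_file": 4,
            "external_interface_file": 2,
        }
        ufp = unadjusted_function_points(counts)
        print("UFP:", ufp, "AFP:", adjusted_function_points(ufp, gsc_total=38))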

One advantage of using FPA to estimate system size and the work effort required to produce a system is that the analysis can be performed relatively early in the SDLC, usually at any point after requirements gathering has been performed [3]. Function Point Analysis can be performed so early in the life cycle because only the inputs to and outputs from the system need to be known in order to perform the analysis [16], [17]. According to Fehlmann, the early stage at which FPA can be performed is attractive to business managers because it reduces the temporal and financial costs sunk into determining whether or not a particular project is feasible. Additionally, Fehlmann points out that case studies comparing FPA to other functional analysis methods consistently showed that FPA provides estimates similar to those of methods performed either simultaneously or later in the SDLC [16]. It therefore seems advantageous to use Function Point Analysis to estimate system size: it can be performed at a relatively early stage of the SDLC with results comparable to those provided by other techniques, reducing risk exposure without degrading the quality of the estimate.

Another advantage of using Function Point Analysis to estimate system size is that it readily outperforms other estimation methods that attempt to estimate a system's Source Lines of Code directly [7]. Both informal [7] and formal [3] validation studies comparing the accuracy of Function Point Analysis to SLOC-based estimation methods showed that FPA more accurately predicts system size and development effort than SLOC-based approaches such as SLIM, COCOMO, and expert estimation (though it should be noted that expert estimation of SLOC is exceedingly rare). Additionally, as pointed out in [17] and verified in [3], Function Point Analysis can also be used to estimate a project's SLOC. This proves extremely beneficial for project effort estimation, as it is shown in [3] that estimated SLOC has a much higher correlation with actual man-months than does any type of functional size measure, including Function Point Analysis.

A third and final advantage of using Function Point Analysis to estimate system size is that it is one of the most widely used software size estimation techniques among practitioners [7], [18]. Given that the FPA approach is so widely used, it follows that there is a large community of practice around the approach, as well as a large research community dedicated to studying it; both are readily apparent in a review of the literature on functional sizing methods. That there is a research community dedicated to FPA is important because it indicates that the approach is being continuously examined, revised, and validated even as the types of software being developed have changed.

While there are clearly advantages to using Function Point Analysis in software estimation, there are also disadvantages to employing the approach. One of the more notable disadvantages of Function Point Analysis is that it is a poor estimator of the actual financial cost of developing a software product. A study performed by Heemstra and Kusters found that, for large projects (> 200 man-months), organizations that used FPA to estimate project budget had more budget overruns than those that did not use FPA [7]. Heemstra and Kusters felt that this apparent link between use of FPA and cost overruns could be explained by the fact that successfully implementing FPA is difficult, and often requires more time than can be dedicated to the task. Nonetheless, it seems to follow that within normal business constraints FPA is not an effective cost estimator, and therefore project managers must either choose a complementary cost estimation technique or choose a different estimation technique altogether that provides acceptable estimates of cost as well as other factors.

Another disadvantage of using Function Point Analysis is that it also appears to be a poor estimator of the actual development effort required for a software product. Results from [3] indicate that Albrecht's formula for deriving estimated project effort from function points does not correlate well with the actual effort expended in developing the software product. Specifically, Kemerer's experiments in [3] yielded an R-squared value of only .553 when using function points to predict effort in man-months, whereas other predictors (most notably KSLOC) yielded much higher R-squared values for their respective linear regression formulas. It is also indicated in [3] that the correlation between function points and project effort suffers for projects that are dissimilar to those used to formulate the FPA method, e.g., the projects completed by the DP Services Group at IBM. Thus, Kemerer concludes that it is likely best not to use FPA to predict project effort directly. It follows from Kemerer's analysis that project managers should use alternative approaches to estimate project effort, especially if the project to be estimated is not a business data processing application such as those produced by IBM's DP Services Group at the time the FPA approach was developed.
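The kind of validation Kemerer performed amounts to regressing actual effort on counted function points and examining the goodness of fit. The sketch below reproduces that arithmetic on invented data; the numbers are hypothetical and are not the values from [3].

    # Sketch of a Kemerer-style validation: fit effort (man-months) as a linear
    # function of function points and inspect R-squared. Data are made up.
    def linear_regression(xs, ys):
        """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        b = sxy / sxx
        a = mean_y - b * mean_x
        ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        ss_tot = sum((y - mean_y) ** 2 for y in ys)
        return a, b, 1 - ss_res / ss_tot

    if __name__ == "__main__":
        function_points = [287, 820, 1115, 530, 399, 1345]   # hypothetical
        actual_man_months = [26, 103, 164, 61, 40, 178]      # hypothetical
        a, b, r2 = linear_regression(function_points, actual_man_months)
        print(f"effort = {a:.1f} + {b:.3f} * FP, R^2 = {r2:.3f}")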


Estimation by Analogy

Estimation by Analogy is an approach to software size estimation that uses data recorded from past projects to estimate new ones. More specifically, the method entails characterizing the project to be estimated and then finding a historical project with the same, or very nearly the same, characterization to use as the basis for estimation [4], [9]. The actual values from the historical project are used as the initial estimates for the new project; however, these values may be adjusted to account for conditions that are unique to the new project [9], [10], [11], [12]. The general methods used in estimation by analogy follow from the design and implementation of case-based reasoning systems and fall into the realm of machine learning, in which the tools used for estimation improve their estimates as more historical project data become available.
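A minimal Python sketch of this case-based scheme follows, assuming each project is characterized by a handful of numeric features: normalize the features, find the historical project nearest to the target in Euclidean distance, and take its actual effort as the initial estimate. The feature names and values here are hypothetical, and real tools such as ANGEL and ACE add feature weighting and adaptation steps on top of this basic idea.

    # Minimal analogy-based estimation: nearest historical project in
    # normalized Euclidean distance supplies the initial effort estimate.
    # Features and data are hypothetical.
    import math

    historical = [  # (project characterization, actual effort in person-months)
        ({"screens": 20, "interfaces": 3, "team_size": 4}, 18.0),
        ({"screens": 55, "interfaces": 9, "team_size": 10}, 70.0),
        ({"screens": 35, "interfaces": 5, "team_size": 6}, 33.0),
    ]

    def normalize(projects, target):
        """Scale each feature to [0, 1] across all projects so that no single
        feature dominates the distance calculation."""
        keys = list(target)
        lo = {k: min(min(p[k] for p, _ in projects), target[k]) for k in keys}
        hi = {k: max(max(p[k] for p, _ in projects), target[k]) for k in keys}
        def scale(p):
            return {k: (p[k] - lo[k]) / ((hi[k] - lo[k]) or 1) for k in keys}
        return [(scale(p), e) for p, e in projects], scale(target)

    def estimate_by_analogy(projects, target):
        scaled, starget = normalize(projects, target)
        def distance(features):
            return math.sqrt(sum((features[k] - starget[k]) ** 2 for k in starget))
        _, effort = min(scaled, key=lambda pe: distance(pe[0]))
        return effort  # initial estimate; adjust for project-specific conditions

    if __name__ == "__main__":
        print(estimate_by_analogy(historical,
                                  {"screens": 40, "interfaces": 6, "team_size": 7}))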

One of the distinct advantages of using estimation by analogy is that the process by which estimates are created closely resembles the process followed by human estimators. On the one hand, this is advantageous because the estimation process, including its inputs and outputs, is more easily understood by the user than algorithmic models (such as FPA) [9]. On the other hand, the similarity between estimation by analogy and human estimation is advantageous because the results of the estimation process seem to be more trustworthy than results produced by other methods [11]. Again, this apparent increase in trust seems to be due to the relative opacity of algorithmic estimation techniques when compared to estimation by analogy. Thus, it would seem advantageous to use estimation by analogy because it is more easily understood by the user, and its resulting estimates are therefore more trusted as well.

Another advantage of analogy-based estimation is that it can be applied to problem domains that are difficult to model and in which an algorithmic method might not be suitable [9], [11]. Algorithmic methods tend to fall short when there is a significant amount of "noise", or irrelevant data, in the models used to generate the algorithm. Unfortunately, it is often difficult to know exactly which data are and are not "noisy" when generating models for estimation, and therefore it is often difficult to eliminate such data [9], [4]. It is also difficult to generate algorithmic models when the interactions among the factors that drive project effort and/or cost are not readily known [9]. Fortunately, the methods and tools used in analogy-based estimation are able to tolerate noisy data, and it is not necessary to know how the factors contributing to project cost and effort interact with one another in order to generate an estimate. Thus, it is advantageous to use estimation by analogy when the problem domain in which the target project lies is difficult to model, when there is incomplete information regarding the relationships among the factors that contribute to project cost and effort, or when it is difficult to determine which project data are relevant and which are not.

Of course, just as there are advantages to estimation by analogy, there are disadvantages as well. One of the most obvious, and perhaps most crippling, disadvantages of the approach is that there is relatively little consistent research showing that estimation by analogy is in fact better than algorithmic approaches. For instance, the work done by Shepperd et al. in [4] suggests that estimation by analogy (using the ANGEL estimation tool) is superior to linear regression and stepwise multiple regression, two algorithmic estimation techniques. However, Walkerden and Jeffrey concluded in [9] that neither the results obtained by ANGEL, nor those obtained by ACE, another analogy-based estimation tool, were significantly different from those obtained using linear regression or from an unaided human subject. Unfortunately, there does not seem to be a definitive explanation for why experimental results using estimation by analogy differ, and it is therefore difficult to propose estimation by analogy as a superior alternative to algorithmic methods.

Another disadvantage of using estimation by analogy is that it depends upon the existence of a database of suitable projects from which to select a project analogous to the target project [9]. There are several reasons why there may not be sufficient data available with which to generate an estimate for a target project from a comparable source project. First, if the target project or its parameters are sufficiently novel or unique, there may not be relevant data from which to derive an estimate. Given the rapid pace of change in the software development business, it is not difficult to imagine a situation in which existing source data is obsolete (and therefore useless) by the time a new project must be estimated [9]. Second, it may be the case that no source project is similar enough (or close enough, in terms of n-dimensional Euclidean distance) to the target project to be considered a good analogue [4], [9]. This is potentially the case in emergent software development areas, such as Software-as-a-Service projects in recent years, where the nearest neighbors were either simple web applications or enterprise-grade client-server applications. Thus, estimation by analogy may be unsuitable for projects for which there is not sufficient historical data from which to generate an estimate.


Use Case Points

The employment of use cases as inputs to functional size estimation was first proposed by Karner in 1993 [15]; it is therefore a relatively new approach to functional sizing of a software system. However, this particular approach has received much attention in recent years due to the rise of the Unified Modeling Language (UML) and object-oriented programming techniques. The approach estimates project effort in person-hours from sufficiently detailed use case descriptions. More specifically, the approach assigns Use Case Points based on the actors and use cases in the model, then weights (adjusts) those values based upon technical and environmental factors. The Adjusted Use Case Point values are then used to estimate effort with the equation:

Estimated Effort = (Use Case Points) x (Person-Hours / Use Case Point)
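As a concrete illustration of how the pieces combine, the Python sketch below computes an estimate using the actor and use case weights, adjustment-factor formulas, and productivity rate usually attributed to Karner. These values are included for illustration only; as discussed below, there is no binding standard, so treat them as assumptions rather than fixed rules.

    # Sketch of a Karner-style Use Case Points calculation. The weights, the
    # adjustment-factor formulas, and the 20 person-hours-per-point rate are
    # the commonly cited defaults, used here purely as illustrative values.
    ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
    USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}

    def use_case_points(actor_counts, use_case_counts, tfactor, efactor):
        """actor_counts / use_case_counts: dicts like {"simple": 3, ...};
        tfactor / efactor: weighted sums of the technical and environmental
        factor ratings."""
        uaw = sum(ACTOR_WEIGHTS[k] * n for k, n in actor_counts.items())
        uucw = sum(USE_CASE_WEIGHTS[k] * n for k, n in use_case_counts.items())
        uucp = uaw + uucw                  # unadjusted use case points
        tcf = 0.6 + 0.01 * tfactor         # technical complexity factor
        ecf = 1.4 - 0.03 * efactor         # environmental complexity factor
        return uucp * tcf * ecf

    def estimated_effort(ucp, hours_per_point=20):
        """Estimated Effort = UCP x (Person-Hours / Use Case Point)."""
        return ucp * hours_per_point

    if __name__ == "__main__":
        ucp = use_case_points(
            actor_counts={"simple": 2, "average": 2, "complex": 3},
            use_case_counts={"simple": 5, "average": 10, "complex": 4},
            tfactor=30, efactor=17.5,
        )
        print(f"UCP = {ucp:.1f}, effort = {estimated_effort(ucp):.0f} person-hours")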

While the Use Case Points method bears some similarity to Function Point Analysis, and may have been influenced by it, there are some important differences between the two approaches. The first and most notable difference is that the Use Case Points method strictly depends on the use of use cases to describe the delivered functionality of the completed system; as such, the approach cannot be employed for projects that do not utilize use cases to describe system functionality. On the other hand, while the process of counting Function Points is standardized, the process of counting Use Case Points has not yet been standardized [14]. Therefore, two separate evaluations of a project using Function Point Analysis should yield the same number of Function Points, whereas two evaluations of a project using Use Case Points may not.

The primary advantage of using the Use Case Points method to estimate software system size and required effort is that it is tailored to more modern software design and development techniques, namely the use of UML and use cases in specifying software systems. Current system design techniques and software development practices place an emphasis on creating a high-level system specification up front and deferring detailed specification until it is absolutely necessary [8]. This poses a problem for users of Function Point Analysis, as FPA requires a somewhat more detailed specification (down to the I/O level) in order to provide an estimate with relative confidence. The Use Case Points method can be used as soon as a high-level design has been created, allowing an estimate to be obtained very early in the Software Development Life Cycle. Clearly, this limits exposure to infeasible projects by filtering them out before a significant financial investment has been made.


Another advantage of using the Use Case Points method to estimate software system size and required effort is that it provides results that are at least as accurate as those provided by human estimators, with less effort expended in producing them. In independent studies, both Kusumoto et al. and Anda found that the results provided by Use Case Points estimation tools were as accurate as, if not more accurate than, those of their human counterparts [8], [13]. Observational results provided in [8] showed that estimates produced by the U-EST tool were within 80-120% of the estimates provided by experts. Meanwhile, the more formal results in [13] showed a mean Magnitude of Relative Error (MMRE) of .37 for human estimators, whereas the MMRE for the Use Case Points method implemented by the author was either .21 or .29, depending upon the assignment of environmental factors. What is more, both studies found that results were obtained much more quickly with software tools supporting Use Case Points analysis than from human estimators evaluating the same projects. In current software development practice it is optimal to spend as little time as possible on effort that does not contribute directly to the software product, so it is clearly advantageous to have an estimation method and related tools that provide results comparable to those obtained from human estimators in less time.
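For reference, the MMRE figures cited above are the mean of |actual - estimated| / actual over the evaluated projects. The short sketch below shows the calculation on made-up numbers; the sample values are not the data behind the .37, .21, and .29 results.

    # Mean Magnitude of Relative Error (MMRE): average of |actual - estimated|
    # divided by actual, over a set of projects. Sample values are invented.
    def mmre(actuals, estimates):
        assert len(actuals) == len(estimates) and actuals
        return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

    if __name__ == "__main__":
        actual_hours = [3000, 5200, 1800, 4100]      # hypothetical
        estimated_hours = [2600, 6100, 1750, 3300]   # hypothetical
        print(f"MMRE = {mmre(actual_hours, estimated_hours):.2f}")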

An unfortunate disadvantage of using the Use Case Points method for estimating software size and required effort is that there is no standard by which to evaluate and apply the method. While the general process of the Use Case Points method is singular and universal, adopters can only use their own best judgment to determine how to measure Unadjusted Use Case Points [14]. A side effect of this is that two different implementations of the Use Case Points method may produce completely different estimates for the same project. Of course, any single implementation of the method will be deterministic, and is therefore internally consistent. However, the method itself cannot be considered consistent, because there is no standardized way of counting Use Case Points that applies to all implementations.

Another clear disadvantage of using Use Case Points is its dependency upon the use case as a software design construct. One of the advantages of FPA is that the inputs to the approach can be generalized such that function points can be calculated for 30-year-old systems developed in COBOL as well as for systems developed in contemporary languages. The same cannot be said for the Use Case Points method. Although use cases are widely used today [14], there is no guarantee that they will be as widely used in the future. Thus, an organization that has made a significant investment in adopting the Use Case Points method of estimating system size and required effort will surely be disappointed should the use case fall out of fashion in favor of some other design construct.

Conclusion

It is clear that all three of the approaches explored in this paper, Function Point Analysis, Estimation by Analogy, and Use Case Points, hold promise for providing relatively accurate estimates of software size and required development effort at a relatively early stage in the Software Development Life Cycle. Just the same, it is clear that all three approaches have limitations that prevent any one of them from being the single best approach to estimating project size or effort. Yet despite these limitations, none should be removed from an organization's consideration when choosing an estimation method. Rather, an organization's software development and business processes must factor into its decision to choose one method or another.

Any of the methods described thus far would be an appropriate choice for organizations that require an estimate of project size and effort as early as is reasonable in the Software Development Life Cycle. It is known that Function Point Analysis can be performed as soon as requirements gathering has been completed, and can be performed again after requirements have been clarified. It is also known that the Use Case Points method can be applied as soon as use cases have been written during the requirements gathering process, and can be performed again after use cases have been clarified. Finally, it is known that Estimation by Analogy can be performed as soon as any pertinent information about a proposed project is known, and can be performed again each time more information about the project is discovered.

Concerning consistency as a factor in an organization's decision to choose a particular estimation method, Function Point Analysis is likely the preferred approach. Since the results of Estimation by Analogy techniques are heavily dependent upon a database of relevant historical project data from which to select an analogous project, the results provided by different analogy-based estimators (or even the same estimator at different times) are not likely to be consistent. And since a standard for Use Case Points measurement has not yet been developed, results from different UCP implementations are likely to differ, perhaps significantly, from one another. On the other hand, the process of obtaining function point counts in FPA has been standardized by an international governing body, so the counts provided by one entity for a project should be the same as those provided by another entity for the same project. Therefore, if consistency of results is a deciding factor for an organization in choosing a particular method, then Function Point Analysis is the most desirable of the three approaches discussed.

Then again, it may be the case that some other factor, or combination of factors, is key to an organization's decision to use one estimation approach or another. In this paper I have provided only brief descriptions of each of three approaches to software size and effort estimation that can be used early in the Software Development Life Cycle. There are other approaches not discussed here, as well as other advantages and disadvantages of the included approaches that have not been explored in detail. Further research and experimentation is required to determine empirically which of the three approaches discussed here provides the most accurate results under similar project conditions.

References

[1] Dolado, Jose Javier. "A Validation of the Component-based Method for Software Size Estimation." IEEE Transactions on Software Engineering, vol. 26, no. 10, pp. 1006-1021, 2000.

[2] Offen, Raymond J., and Ross Jeffrey. "Establishing Software Measurement Programs." IEEE Software, March/April issue, pp. 45-53, 1997.

[3] Kemerer, Chris F. "An Empirical Evaluation of Software Cost Estimation Models." Communications of the ACM, vol. 30, no. 5, pp. 416-429, 1987.

[4] Shepperd, Martin, Chris Schofield, and Barbara Kitchenham. "Effort Estimation Using Analogy." Proceedings of ICSE-18, pp. 170-178, 1996.

[5] Hastings, T.E., and A.S.M. Sajeev. "A Vector-Based Approach to Software Size Measurement and Effort Estimation." IEEE Transactions on Software Engineering, vol. 27, no. 4, pp. 337-350, 2001.

[6] Zivkovic, Ales, Marjan Hericko, and Tomaz Kralj. "Empirical Assessment of Methods for Software Size Estimation." Informatica, vol. 27, no. 4, pp. 425-432, 2003.

[7] Heemstra, F.J., and R.J. Kusters. "Function point analysis: evaluation of a software cost estimation model." European Journal of Information Systems, vol. 1, no. 4, pp. 229-237, 1991.

[8] Kusumoto, Shinji, et al. "Estimating Effort by Use Case Points: Method, Tool, and Case Study." Proceedings of the 10th Annual International Symposium on Software Metrics, 2004.

[9] Walkerden, Fiona, and Ross Jeffrey. "An Empirical Study of Analogy-based Software Effort Estimation." Empirical Software Engineering, no. 4, pp. 135-158, 1999.

[10] Idri, Ali, Alain Abran, and Taghi M. Khoshgoftaar. "Fuzzy Analogy: A New Approach for Software Cost Estimation." International Workshop on Software Measurement, pp. 93-101, 2001.

[11] Li, Jingzhou, et al. "A flexible method for software effort estimation by analogy." Empirical Software Engineering, no. 12, pp. 65-106, 2007.

[12] Jeffrey, Ross, Melanie Ruhe, and Isabella Wieczorek. "Using Public Domain Metrics to Estimate Software Development Effort." Proceedings of the Seventh International Software Metrics Symposium, pp. 16-27, 2001.

[13] Anda, Bente. "Comparing Effort Estimates Based on Use Case Points with Expert Estimates." (Unpublished), retrieved from http://de.scientificcommons.org/42390807 on 10/31/10.

[14] Anda, Bente, Hege Dreiem, et al. "Estimating Software Development Effort Based on Use Cases – Experience from Industry." Lecture Notes in Computer Science, iss. 2185, pp. 487-502, 2001.

[15] Mohagheghi, Parastoo, Bente Anda, and Reidar Conradi. "Effort Estimation of Use Cases for Incremental Large-Scale Software Development." Proceedings of ICSE '05, 2005.

[16] Fehlmann, Thomas. "When use COSMIC FFP? When use IFPUG FPA? A Six-Sigma View." IWSM/MetriKon, 2006.

[17] Albrecht, Allan J., and John E. Gaffney. "Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation." IEEE Transactions on Software Engineering, vol. SE-9, no. 6, pp. 639-648, 1983.

[18] Pow-Sang, Jose Antonio, and Ricardo Imbert. "Including the Composition Relationship among Classes to Improve Function Points Analysis." Proceedings of VI Jornadas Peruanas de Computación (JPC'07), 2007.

[19] Fetcke, Thomas, Alain Abran, and Tho-Hau Nguyen. "Mapping the OO-Jacobson Approach into Function Point Analysis." Proceedings of TOOLS-23 '97, pp. 1-11, 1998.