Benevol 2011

Similar Tasks, Different Effort: Why the Same Amount of Functionality Requires Different Development Effort?

Alexander Serebrenik

Bogdan Vasilescu

Mark van den Brand

Why do some systems require more effort?

• Empirical study

• ISBSG version 11

• largest publicly available collection: 5052 projects

• 118 project attributes, including

− amount of functionality

− work effort

• Not all projects are suited for the study

• self-reporting: data quality differs between projects

• different ways of measuring project attributes

/ W&I / MDSE PAGE 1 23-4-2012

Project selection

Criterion        Filter                              Projects remaining
(none)           ISBSG v.11                          5052
Effort           Staff hours (recorded)              3537
                 Full development lifecycle          2261
                 Project-specific activities only    2079
Functionality    IFPUG                               1661
Data quality     “A” or “B”                          1609
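
The selection funnel above can be sketched as a chain of filters that reports how many projects survive each step. This is only an illustration: the field names (`effort_unit`, `lifecycle`, `activities`, `count_approach`, `data_quality`) and their values are hypothetical and do not reproduce the actual ISBSG column names.

```python
# Hypothetical field names and values; the real ISBSG columns may differ.
STEPS = [
    ("effort recorded in staff hours",
     lambda p: p.get("effort_unit") == "staff hours (recorded)"),
    ("full development lifecycle",
     lambda p: p.get("lifecycle") == "full"),
    ("project-specific activities only",
     lambda p: p.get("activities") == "project-specific"),
    ("functionality counted with IFPUG",
     lambda p: p.get("count_approach") == "IFPUG"),
    ("data quality A or B",
     lambda p: p.get("data_quality") in {"A", "B"}),
]


def selection_funnel(projects):
    """Apply the filters in order; return survivors and the count after each step."""
    remaining = list(projects)
    counts = [("all projects", len(remaining))]
    for label, keep in STEPS:
        remaining = [p for p in remaining if keep(p)]
        counts.append((label, len(remaining)))
    return remaining, counts
```

Applied to the full repository, the second element of the returned pair would correspond to the right-hand column of the table, provided the hypothetical predicates match the real selection criteria.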


Effort and Functionality Distributions

• Effort:

• skewed, outliers

• Functionality: adjusted FP or unadjusted FP

• Adjusted is more reliable [Kitchenham et al. JSS, 2002]

• skewed, outliers


More functionality, more effort required

• Log-transformation for the skewness / outliers problem

• The linear model is adequate:

− p-value for the F-statistic ≤ 2.2 * 10^-16

− p-values for intercept and coefficient ≤ 2.2 * 10^-16

− residuals show a chaotic pattern


log(SWE) = 2.92717 + 0.84617 * log(AFP)
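
To make the fitted model concrete, here is a minimal sketch that evaluates it and computes the residual examined on the following slides. Assumptions not stated on the slide: the logarithm is taken to be the natural logarithm, SWE is read as the recorded work effort and AFP as adjusted function points, and all function names are illustrative.

```python
import math

# Coefficients as reported on the slide; natural log is an assumption.
INTERCEPT = 2.92717
SLOPE = 0.84617


def predicted_effort(afp):
    """Effort predicted by the log-log model for a size given in AFP."""
    return math.exp(INTERCEPT + SLOPE * math.log(afp))


def log_residual(afp, effort):
    """Observed minus predicted log-effort.

    Positive: the project needed more effort than the model expects.
    Negative: it needed less.
    """
    return math.log(effort) - (INTERCEPT + SLOPE * math.log(afp))


def fit_log_log(afp, effort):
    """Ordinary least squares of log(effort) on log(afp); returns (intercept, slope)."""
    xs = [math.log(a) for a in afp]
    ys = [math.log(e) for e in effort]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Because the slope is below 1, the model predicts that doubling the amount of functionality less than doubles the effort. The residuals can be negative, which matters for the choice of inequality index later in the talk.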

Why do some systems require more effort?

• Closer look at the residuals

• technical aspects:

− primary programming language, language type, development type, platform, and architecture

• organization type

• intended market

• year of project

• Problem of ISBSG

• missing values due to self-reporting


What attributes impact the development effort?

• Goal: compare different project attributes

• ISBSG – 118 attributes

• Remove projects with missing values

− More attributes, fewer projects

• Keep projects with missing values

− NA-category becomes too important

• We choose

− primary programming language, language type, organization type, intended market, year of project, development type, platform, architecture
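
The two ways of handling missing values can be sketched as follows. This is an illustration under assumptions: projects are represented as plain dictionaries, a missing value is `None` or an absent key, and the function names are made up for this example.

```python
NA = "NA"  # label of the explicit "missing" category (illustrative)


def complete_cases(projects, attributes):
    """Strategy 1: keep only projects that report every chosen attribute."""
    return [p for p in projects
            if all(p.get(a) is not None for a in attributes)]


def with_na_category(projects, attributes):
    """Strategy 2: keep every project; missing values form their own NA group."""
    filled = []
    for p in projects:
        q = dict(p)  # copy, so the input records stay untouched
        for a in attributes:
            if q.get(a) is None:
                q[a] = NA
        filled.append(q)
    return filled
```

With the first strategy every additional attribute can only shrink the sample, which is the trade-off noted above; with the second the sample size is preserved, but the NA group may dominate the partition.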


Explanation of impact

• Partition individuals into groups

• Partition = explanation [Cowell, Jenkins 1995]

• Inequality within the groups and between the groups

− Inequality indices

• Better explanation: more inequality between the groups

− In the figure, the purple partition is better than the red one

− Refining a partition does not deteriorate the explanation

/ SET / W&I / TU/e PAGE 7

Which inequality index?

• We need a decomposable index applicable to negative values (regression residuals can be negative)
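
As an illustration of what "decomposable" and "applicable to negative values" mean, the sketch below uses the population variance as a stand-in index. The slides do not say which index the authors chose, so the variance is an assumption made purely for the example; it splits exactly into a within-group and a between-group part and is defined for negative inputs. The explanation percentage of a partition is then the between-group share of the total.

```python
from collections import defaultdict


def variance(values):
    """Population variance; defined for negative values as well."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def decompose(values, labels):
    """Return (total, within, between) for the partition induced by labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n = len(values)
    mean = sum(values) / n
    # Within: size-weighted average of the group variances.
    within = sum(len(g) * variance(g) for g in groups.values()) / n
    # Between: variance left after replacing each value by its group mean.
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2
                  for g in groups.values()) / n
    return variance(values), within, between


def explained_share(values, labels):
    """Fraction of the total inequality attributed to differences between groups."""
    total, _, between = decompose(values, labels)
    return between / total
```

For this stand-in index, refining a partition never lowers the explained share, which matches the refinement property stated on the previous slide.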


Results


Project attribute               Explanation %
                                No missing values    Missing values
                                (N = 151)            (N = 1609)
Primary programming language    25.37%               16.11%
Organisation type               17.59%               18.36%
Year of the project             10.88%               5.41%
Architecture                    8.68%                3.35%
Development platform            5.43%                5.05%
Intended market                 4.61%                1.57%
Language type                   2.45%                1.28%
Development type                0.05%                0.07%

For comparison, explanation percentages in other data sets:

• Indonesia: expenditure by province 18.9%

• Indonesia: expenditure by education level 32.6%

• Indonesia: expenditure by gender 2.6%

• Linux: LOC by package 17.4%

• Linux: LOC by implementation language 5.32%

• Linux: LOC by maintainer 4.45%

Conclusions

• Three groups of attributes

• High-impact: primary programming language, organization type

• Middle-impact

− year of the project [cf. Kitchenham et al. 2002]

− architecture, development platform

• Low-impact: intended market, language type, development type

• A new technique for analysing the relation between effort and function points (FP)


Future work

• Partition should be MECE (mutually exclusive, collectively exhaustive)

− “Wholesale & Retail Trade” and “Financial, Property & Business Services”

• New aggregation/explanation techniques

• Conjecture: relative importance of attributes will be the same for other datasets

• Models based on data from multiple companies are not applicable when data from a single company is considered [Ruhe 1999]

• Both multi-company and company-specific studies are needed

