Benevol 2011

Alexander used these slides for his presentation at BeNeVol 2011 in Brussels, Belgium. That is, after he blew the fuses in the entire building. Paper: Serebrenik A, Vasilescu B and van den Brand M (2011), "Similar tasks, different effort: Why the same amount of functionality requires different development effort?", In Proceedings of the 10th Belgian-Netherlands Software Evolution Seminar, pp. 4-5.

Page 1: Benevol 2011

Similar Tasks, Different Effort: Why the Same Amount of Functionality Requires Different Development Effort?

Alexander Serebrenik

Bogdan Vasilescu

Mark van den Brand

Page 2: Benevol 2011

Why do some systems require more effort?

• Empirical study

• ISBSG version 11

• largest publicly available collection: 5052 projects

• 118 project attributes, including

− amount of functionality

− work effort

• Not all projects are suited for the study

• self-reporting: data quality differs between projects

• different ways of measuring project attributes

/ W&I / MDSE PAGE 1 23-4-2012

Page 3: Benevol 2011

Project selection

Criterion                           Projects remaining
ISBSG v.11 (all projects)           5052
Effort: staff hours (recorded)      3537
Full development lifecycle          2261
Project-specific activities only    2079
Functionality: IFPUG                1661
Data quality: “A” or “B”            1609
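
The selection criteria above are applied one after another, and each step shrinks the set of usable projects. A minimal sketch of such a funnel follows. The record fields (`effort_unit`, `full_lifecycle`, `project_specific_only`, `fp_standard`, `data_quality`) and the six sample records are hypothetical and are not the actual ISBSG attribute names or data.

```python
# Hypothetical project records. The field names are illustrative only and are
# not the actual ISBSG attribute names.
projects = [
    {"effort_unit": "staff hours (recorded)", "full_lifecycle": True,
     "project_specific_only": True, "fp_standard": "IFPUG", "data_quality": "A"},
    {"effort_unit": "staff hours (derived)", "full_lifecycle": True,
     "project_specific_only": True, "fp_standard": "IFPUG", "data_quality": "A"},
    {"effort_unit": "staff hours (recorded)", "full_lifecycle": False,
     "project_specific_only": True, "fp_standard": "IFPUG", "data_quality": "B"},
    {"effort_unit": "staff hours (recorded)", "full_lifecycle": True,
     "project_specific_only": True, "fp_standard": "COSMIC", "data_quality": "A"},
    {"effort_unit": "staff hours (recorded)", "full_lifecycle": True,
     "project_specific_only": True, "fp_standard": "IFPUG", "data_quality": "C"},
    {"effort_unit": "staff hours (recorded)", "full_lifecycle": True,
     "project_specific_only": True, "fp_standard": "IFPUG", "data_quality": "B"},
]

# Selection criteria, in the order listed on the slide.
criteria = [
    ("effort in recorded staff hours",
     lambda p: p["effort_unit"] == "staff hours (recorded)"),
    ("full development lifecycle", lambda p: p["full_lifecycle"]),
    ("project-specific activities only", lambda p: p["project_specific_only"]),
    ("functionality measured with IFPUG", lambda p: p["fp_standard"] == "IFPUG"),
    ("data quality A or B", lambda p: p["data_quality"] in ("A", "B")),
]


def selection_funnel(records, steps):
    """Apply each criterion in order and record how many projects survive."""
    remaining = list(records)
    counts = [("all projects", len(remaining))]
    for name, keep in steps:
        remaining = [r for r in remaining if keep(r)]
        counts.append((name, len(remaining)))
    return remaining, counts


selected, counts = selection_funnel(projects, criteria)
for name, n in counts:
    print(f"{name}: {n}")
```

Because the criteria are applied cumulatively, the counts can only stay equal or decrease from one row to the next, as in the slide's table.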

Page 4: Benevol 2011

Effort and Functionality Distributions

• Effort:

• skewed, outliers

• Functionality: adjusted FP or unadjusted FP

• Adjusted is more reliable [Kitchenham et al. JSS, 2002]

• skewed, outliers

Page 5: Benevol 2011

More functionality, more effort required

• Log-transformation for the skewness / outliers problem

• Adequate

• p-value for the F-stat ≤ 2.2 × 10^-16

• p-values for intercept and coefficient ≤ 2.2 × 10^-16

• residuals show a chaotic pattern

log(SWE) = 2.92717 + 0.84617 * log(AFP)
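
To make the fitted model concrete, the sketch below turns the reported coefficients into an effort prediction and a log-scale residual. It assumes natural logarithms (the slide does not state the base) and that SWE is work effort in staff hours and AFP is size in adjusted function points. The function names are illustrative.

```python
import math

# Coefficients reported on the slide: log(SWE) = 2.92717 + 0.84617 * log(AFP).
INTERCEPT = 2.92717
SLOPE = 0.84617


def predicted_effort(afp):
    """Predicted work effort for a size in adjusted function points.

    Assumption: the model uses natural logarithms.
    """
    return math.exp(INTERCEPT + SLOPE * math.log(afp))


def log_residual(afp, actual_effort):
    """Residual on the log scale: positive means more effort than predicted."""
    return math.log(actual_effort) - (INTERCEPT + SLOPE * math.log(afp))


# Doubling the size multiplies the predicted effort by 2 ** SLOPE,
# regardless of the logarithm base.
growth = predicted_effort(200) / predicted_effort(100)
print(f"effort growth when size doubles: {growth:.2f}")
```

Since the slope is below 1, doubling the functionality multiplies the predicted effort by roughly 1.8 rather than 2. The residuals are what the rest of the talk tries to explain: a project with a positive residual needed more effort than its size alone suggests.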

Page 6: Benevol 2011

Why do some systems require more effort?

• Closer look at the residuals

• technical aspects:

− primary programming language, language type, development type, platform, and architecture

• organization type

• intended market

• year of project

• Problem of ISBSG

• missing values due to self-reporting

Page 7: Benevol 2011

What attributes impact the development effort?

• Goal: compare different project attributes

• ISBSG – 118 attributes

• Remove projects with missing values

• More attributes, fewer projects

• Keep projects with missing values

• NA-category becomes too important

• We choose

• primary programming language, language type, organization type, intended market, year of project, development type, platform, architecture

Page 8: Benevol 2011

Explanation of impact

• Partition individuals into groups

• Partition = explanation [Cowell, Jenkins 1995]

• Inequality within the groups and between the groups

− Inequality indices

• Better explanation: more inequality between the groups

− The lilac partition is better than the red one

− Partition refinement doesn’t deteriorate the explanation

/ SET / W&I / TU/e PAGE 7

Page 9: Benevol 2011

Which inequality index?

• We need a decomposable index applicable to negative values
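
The requirement above can be illustrated with a small sketch. Negative values presumably matter because regression residuals can be negative. The code uses plain variance as a stand-in: it is decomposable into a within-group and a between-group part and is defined for negative values. It is not claimed to be the index used in the study, and the sample residuals and group labels are made up.

```python
import math
from collections import defaultdict
from statistics import fmean


def explanation_share(values, groups):
    """Share of the total variance that lies between the groups (0 to 1).

    Variance is only a stand-in for a decomposable inequality index.
    """
    grand_mean = fmean(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        raise ValueError("all values are equal, nothing to explain")
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    # Between-group part: each group is replaced by its mean.
    between = sum(len(vs) * (fmean(vs) - grand_mean) ** 2
                  for vs in by_group.values())
    return between / total


# Made-up residuals (note the negative values) and two nested partitions.
residuals = [-1.0, -0.8, 0.9, 1.1, -0.1, 0.1]
coarse = ["X", "X", "X", "X", "Y", "Y"]
fine = ["A", "A", "B", "B", "C", "C"]  # refines the coarse partition

share_coarse = explanation_share(residuals, coarse)
share_fine = explanation_share(residuals, fine)
print(f"coarse partition explains {share_coarse:.1%}")
print(f"fine partition explains {share_fine:.1%}")
```

In this sketch the finer partition explains at least as much as the coarser one, which mirrors the observation that partition refinement does not deteriorate the explanation.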

Page 10: Benevol 2011

Results

Project attribute               Explanation %
                                No missing values    Missing values
                                N = 151              N = 1609
Primary programming language    25.37%               16.11%
Organisation type               17.59%               18.36%
Year of the project             10.88%                5.41%
Architecture                     8.68%                3.35%
Development Platform             5.43%                5.05%
Intended Market                  4.61%                1.57%
Language type                    2.45%                1.28%
Development Type                 0.05%                0.07%

Reference values:
Indonesia: expenditure by province      18.9%
Indonesia: expenditure by educ. level   32.6%
Indonesia: expenditure by gender         2.6%
Linux: LOC by package                   17.4%
Linux: LOC by impl lang                  5.32%
Linux: LOC by maintainer                 4.45%

Page 11: Benevol 2011

Conclusions

• Three groups of attributes

• High-impact: primary programming language, organization type

• Middle-impact

− year of the project [cf. Kitchenham et al. 2002]

− architecture, development platform

• Low-impact: intended market, language type, development type

• A new technique for analysis of effort fp

Page 12: Benevol 2011

Future work

• Partition should be MECE (mutually exclusive, collectively exhaustive)

• “Wholesale & Retail Trade” and “Financial, Property & Business Services”

• New aggregation/explanation techniques

• Conjecture: relative importance of attributes will be the same for other datasets

• Models based on data from multiple companies are not applicable when data from a single company is considered [Ruhe 1999]

• Both multi-company and company-specific studies are needed
