A defect prediction model based on the relationships between developers and changed files

University of Salerno

26th September 2014

Candidate: Dario Di Nucci

Advisor: Andrea De Lucia

Outline

1. Motivation

2. Related work

3. A new metric for defect prediction based on the relationships between developers and files

4. Case study: empirical evaluation of the prediction model

5. Case study: combination of prediction models

6. Conclusion and future work

Software Evolution and Defect Prediction

50% of development costs

B. Beizer. Software Testing Techniques (2nd ed.). Van Nostrand Reinhold Co., New York, NY, USA, 1990.

G. Myers. The Art of Software Testing. Wiley, 2004. ISBN 978-0-471-46912-4.

Knowing which software components contain defects could be crucial

Defect prediction process

V. R. Basili, L. C. Briand, and W. L. Melo. "A validation of object-oriented design metrics as quality indicators." IEEE Transactions on Software Engineering, 22(10): 751–761, 1996.

Many defect prediction metrics exist

Based on the state of the project

Based on the history of the project

A. Bernstein, J. Ekanayake, and M. Pinzger. "Improving defect prediction using temporal features and non linear models." Proceedings of IWPSE 2007, 2007, pp. 11–18.

No one considers developers!

Even the best developer cannot do a good job when working on different tasks at the same time

Introducing Developer Based Changes Model

A new model for defect prediction based on the relationships between developers and files

Developer tree construction

Two metrics for analyzing developer confusion

Developer Semantical Confusion

Developer Structural Confusion

A new metric for file complexity based on developer confusion

Semantical File Complexity

Structural File Complexity

DBCM Process

Classifier training & test set

Lehman, M. M., “On Understanding Laws, Evolution, and Conservation in the Large-Program Life Cycle” - Journal of Systems and Software (1980)

The system evolution processes are self-regulating with the distribution of product and process measures close to normal


How long should a period be?

3 Months

Ahmed E. Hassan. "Predicting faults using the complexity of code changes." ICSE 2009: 78–88.
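As an illustration of the 3-month period choice, the change history can be cut into consecutive periods before any metric is computed. The following is only a minimal sketch: the function names, the `(date, path)` commit representation, and the rule that a period runs from a given day of the month to the same day three months later are assumptions made for the example, not details taken from the slides.

```python
from collections import defaultdict
from datetime import date


def period_index(commit_day: date, start: date, months_per_period: int = 3) -> int:
    """Return the 0-based index of the period that contains commit_day.

    Periods are assumed to be consecutive blocks of `months_per_period`
    calendar months, counted from `start` (hypothetical convention).
    """
    if commit_day < start:
        raise ValueError("commit precedes the start of the observed history")
    months_elapsed = (commit_day.year - start.year) * 12 + (commit_day.month - start.month)
    # A month only counts as elapsed once the start day-of-month is reached.
    if commit_day.day < start.day:
        months_elapsed -= 1
    return months_elapsed // months_per_period


def group_by_period(commits, start: date, months_per_period: int = 3):
    """Group (date, changed_path) pairs by period index."""
    periods = defaultdict(list)
    for day, path in commits:
        periods[period_index(day, start, months_per_period)].append(path)
    return dict(periods)
```

For example, with a history starting on 15 January 2014, a commit on 14 April 2014 still falls in period 0, while one on 15 April 2014 opens period 1.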


Which classifier to use?

Decision Table Majority

Empirical Evaluation

Case study design

Apache Ant

Apache Cassandra

Apache JMeter

Apache Lenya

Apache Log4j

Apache POI

Apache Tomcat 7

Apache Xerces-J

Although the systems belong to the same ecosystem, they differ in size and history

RQ1: What is the accuracy of the prediction made by DBCM?

RQ2: How does DBCM compare with techniques based on the number of changes?

We selected the BCCM proposed by Hassan as the competing approach

Ahmed E. Hassan. "Predicting faults using the complexity of code changes." ICSE 2009: 78–88.

The higher the number of changes applied to a component, the higher the probability that the component is buggy
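The intuition stated above can be illustrated by counting how often each component changes in a period and ranking components by that count. This is a sketch of the intuition only, not of Hassan's BCCM itself; the function name and the input format (one path per recorded change) are assumptions made for the example.

```python
from collections import Counter


def rank_by_change_count(changed_paths):
    """Rank components from most to least changed.

    `changed_paths` holds one entry per recorded change (hypothetical input
    format). Under the stated intuition, components near the top of the
    ranking are the more likely candidates to be buggy.
    """
    return Counter(changed_paths).most_common()
```

Calling `rank_by_change_count(["A.java", "B.java", "A.java"])` places `A.java` first with two changes, followed by `B.java` with one.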

Results

DBCM and BCCM seem to capture the same phenomenon

Principal Component Analysis

DBCM & BCCM: A Combined Approach

RQ3: Is it possible to combine the two approaches in order to increase the prediction accuracy?



“Big Bang” Combination

Selection Algorithm


Step 1: Identification of the characteristics of the periods influencing the accuracy of the models

numOfChanges < 0.05 : DBCM
numOfChanges >= 0.05
|   numOfCommittors >= 3
|   |   averageCommitSize < 0.35 : BCCM
|   |   averageCommitSize >= 0.35 : DBCM
|   numOfCommittors < 3 : BCCM
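The decision tree above can be read as a small selection function. The sketch below assumes that the two `averageCommitSize` tests are nested under `numOfCommittors >= 3`, and that `numOfChanges` and `averageCommitSize` are already normalized values, as the thresholds 0.05 and 0.35 suggest. The function and parameter names are illustrative, not taken from the original work.

```python
def select_model(num_of_changes: float,
                 num_of_committors: int,
                 average_commit_size: float) -> str:
    """Pick the prediction model for a period from its characteristics.

    Thresholds are copied from the slide; the nesting of the tests is an
    assumed reading of the flattened tree.
    """
    if num_of_changes < 0.05:
        return "DBCM"
    # From here on, numOfChanges >= 0.05.
    if num_of_committors < 3:
        return "BCCM"
    # numOfCommittors >= 3: decide on the average commit size.
    if average_commit_size < 0.35:
        return "BCCM"
    return "DBCM"
```

Under this reading, a period with many changes, at least three committers, and large average commits is assigned to DBCM, while periods with few committers go to BCCM.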


Step 2: Applying the models using the selection algorithm


+5%

Summarizing

?
