A defect prediction model based on the relationships between developers and changed files

University of Salerno, 26th September 2014. Candidate: Dario Di Nucci. Advisor: Andrea De Lucia.


Page 1: A defect prediction model based on the relationships between developers and changed files

University of Salerno

26th September 2014

A DEFECT PREDICTION MODEL BASED ON THE RELATIONSHIPS BETWEEN DEVELOPERS AND CHANGED FILES

Candidate: Dario Di Nucci

Advisor: Andrea De Lucia

Page 2

Outline

1. Motivation

2. Related work

3. A new metric for defect prediction based on the relationships between developers and files

4. Case study: empirical evaluation of the prediction model

5. Case study: combination of prediction models

6. Conclusions and future work

Page 3

Software Evolution and Defect Prediction

Page 4

Page 5

50% of development costs

B. Beizer. Software Testing Techniques (2nd ed.). Van Nostrand Reinhold Co., New York, NY, USA, 1990.

Page 6

G. Myers. The Art of Software Testing. Wiley, 2004. ISBN 978-0-471-46912-4.

Page 7

Knowing which software components contain defects could be crucial

Page 8

Defect prediction process

Page 9

There are a lot of prediction metrics

Based on the state of the project:

V. R. Basili, L. C. Briand, and W. L. Melo. "A validation of object-oriented design metrics as quality indicators." IEEE Transactions on Software Engineering, 22(10): 751–761, 1996.

Based on the history of the project:

A. Bernstein, J. Ekanayake, and M. Pinzger. "Improving defect prediction using temporal features and non linear models." Proceedings of IWPSE 2007, 2007, pp. 11–18.

Page 10

There are a lot of prediction metrics

V. R. Basili, L. C. Briand, and W. L. Melo. "A validation of object-oriented design metrics as quality indicators." IEEE Transactions on Software Engineering, 22(10): 751–761, 1996.

No one considers developers!

A. Bernstein, J. Ekanayake, and M. Pinzger. "Improving defect prediction using temporal features and non linear models." Proceedings of IWPSE 2007, 2007, pp. 11–18.

Page 11

Even the best developer cannot do a good job when working on different tasks at the same time

Page 12

Introducing the Developer Based Changes Model (DBCM)

A new model for defect prediction based on the relationships between developers and files

Page 13

Developer tree construction

Page 14

Two metrics for analyzing developer confusion

Developer Semantical Confusion

Developer Structural Confusion

Page 15

A new metric for file complexity based on developer confusion

Semantical File Complexity

Structural File Complexity

Page 16

DBCM Process

Page 22

Classifier training & test set

M. M. Lehman. "On Understanding Laws, Evolution, and Conservation in the Large-Program Life Cycle." Journal of Systems and Software, 1980.

The system evolution processes are self-regulating with the distribution of product and process measures close to normal

Page 23

Classifier training & test set

How long should a period be?

Page 24

Classifier training & test set

How long should a period be?

3 Months

Ahmed E. Hassan. "Predicting faults using the complexity of code changes." ICSE 2009: 78–88.
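Splitting a change history into 3-month periods could look like the following minimal sketch. It assumes each change is a (date, file path) pair and that periods are aligned to calendar months counted from the month of the first period; the names `period_index` and `group_by_period` are hypothetical and not taken from the slides.

```python
from datetime import date


def period_index(commit_day, start_day, months_per_period=3):
    """Return the 0-based index of the period that contains commit_day.

    Periods are consecutive windows of `months_per_period` calendar months,
    counted from the month of start_day (an assumption of this sketch).
    """
    if commit_day < start_day:
        raise ValueError("commit_day precedes start_day")
    months_elapsed = (commit_day.year - start_day.year) * 12 + (
        commit_day.month - start_day.month
    )
    return months_elapsed // months_per_period


def group_by_period(commits, start_day, months_per_period=3):
    """Group (commit_day, file_path) pairs by period index."""
    periods = {}
    for commit_day, file_path in commits:
        idx = period_index(commit_day, start_day, months_per_period)
        periods.setdefault(idx, []).append(file_path)
    return periods
```

Each resulting period can then serve as the training set for predicting defects in the following one.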

Page 25

Classifier training & test set

Which classifier to use?

Page 26

Classifier training & test set

Which classifier to use?

Decision Table Majority

Page 27

Empirical Evaluation

Page 28

Case study design

Apache Ant

Apache Cassandra

Apache JMeter

Apache Lenya

Apache Log4j

Apache Poi

Apache Tomcat 7

Apache Xerces-J

Although the systems belong to the same ecosystem, they differ in size and history

Page 29

Case study design

RQ1: What is the accuracy of the prediction made by DBCM?

RQ2: How does DBCM compare with techniques based on the number of changes?
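One way to quantify prediction accuracy for a question such as RQ1 is sketched below. It assumes a binary buggy / non-buggy label per file; the function name `evaluate` and the choice of measures (accuracy, precision, recall, F-measure) are assumptions of this sketch, since the slides do not state which measures were used.

```python
def evaluate(predicted_buggy, actually_buggy, all_files):
    """Compare predicted buggy files with actually buggy files.

    Both predicted_buggy and actually_buggy are assumed to be subsets
    of all_files.
    """
    predicted = set(predicted_buggy)
    actual = set(actually_buggy)
    files = set(all_files)

    tp = len(predicted & actual)          # predicted buggy, really buggy
    fp = len(predicted - actual)          # predicted buggy, actually clean
    fn = len(actual - predicted)          # missed buggy files
    tn = len(files - predicted - actual)  # correctly predicted clean

    accuracy = (tp + tn) / len(files) if files else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Running the same function on the output of two models over the same periods gives a like-for-like comparison of the kind RQ2 asks for.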

Page 30

Case study design

Page 31

Case study design

As the competing approach, we selected the BCCM proposed by Hassan

Ahmed E. Hassan. "Predicting faults using the complexity of code changes." ICSE 2009: 78–88.

The higher the number of changes applied to a component, the higher the probability that the component is buggy
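The intuition stated on this slide can be illustrated with a minimal sketch that counts how often each file changed in a period and ranks files accordingly. This only illustrates the "more changes, more risk" idea; it is not Hassan's model, and the name `rank_by_change_count` is hypothetical.

```python
from collections import Counter


def rank_by_change_count(changed_files):
    """Rank files by how often they were changed in a period, most changed first.

    changed_files is a list with one entry per change, e.g. one file path
    per (commit, file) pair.
    """
    counts = Counter(changed_files)
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Files at the top of the ranking would be the first candidates for inspection under this assumption.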

Page 32

Results

DBCM and BCCM seem to capture the same phenomenon

Page 33

Principal Component Analysis

Page 34

DBCM & BCCM: A Combined Approach

RQ3: Is it possible to combine the two approaches in order to increase the prediction accuracy?

Page 35

DBCM & BCCM: A Combined Approach

RQ3: Is it possible to combine the two approaches in order to increase the prediction accuracy?

“Big Bang” Combination

Page 36

DBCM & BCCM: A Combined Approach

RQ3: Is it possible to combine the two approaches in order to increase the prediction accuracy?

“Big Bang” Combination

Selection Algorithm

Page 37

DBCM & BCCM: A Combined Approach

Step 1: Identification of the characteristics of the periods that influence the accuracy of the models

numOfChanges < 0.05 : DBCM
numOfChanges >= 0.05
|   numOfCommittors >= 3
|   |   averageCommitSize < 0.35 : BCCM
|   |   averageCommitSize >= 0.35 : DBCM
|   numOfCommittors < 3 : BCCM
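The decision tree above can be read as a small selection function. The sketch below assumes that the `numOfCommittors < 3` rule is nested under `numOfChanges >= 0.05` (as a sibling of `numOfCommittors >= 3`) and that `numOfChanges` and `averageCommitSize` are normalized values, which the thresholds suggest but the slide does not state. The function name `select_model` is hypothetical.

```python
def select_model(num_of_changes, num_of_committors, average_commit_size):
    """Pick the prediction model for a period, following the tree on the slide.

    Returns "DBCM" or "BCCM".
    """
    # Periods with very few changes: use DBCM.
    if num_of_changes < 0.05:
        return "DBCM"
    # From here on, num_of_changes >= 0.05.
    if num_of_committors < 3:
        return "BCCM"
    # At least 3 committors: decide on the average commit size.
    if average_commit_size < 0.35:
        return "BCCM"
    return "DBCM"
```

For example, a period with `num_of_changes=0.2`, `num_of_committors=4` and `average_commit_size=0.5` would be handled by DBCM, while the same period with only 2 committors would be handled by BCCM.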

Page 38

DBCM & BCCM: A Combined Approach

Step 2: Applying the models using the selection algorithm

Page 39

DBCM & BCCM: A Combined Approach

Step 2: Applying the models using the selection algorithm

+5%

Page 40

Summarizing

Page 49

Summarizing

?