A defect prediction model based on the relationships between developers and changed files
University of Salerno
26th September 2014
A DEFECT PREDICTION MODEL BASED ON THE RELATIONSHIPS BETWEEN DEVELOPERS AND
CHANGED FILES
Candidate: Dario Di Nucci
Advisor: Andrea De Lucia
Outline
1. Motivation
2. Related work
3. A new metric for defect prediction based on the
relationships between developers and files
4. Case study: empirical evaluation of the prediction
model
5. Case study: combination of prediction models
6. Conclusion and future work
Software Evolution and Defect Prediction
50% of development costs
B. Beizer. - Software Testing Techniques (2nd ed.).
Van Nostrand Reinhold Co., New York, NY, USA, 1990
G. Myers - The Art of Software Testing
Wiley. ISBN 978-0-471-46912-4 (2004)
Knowing which software components contain defects could be crucial
Defect prediction process
V. R. Basili, L. C. Briand, and W. L. Melo
A validation of object-oriented design metrics as
quality indicators
IEEE Transactions on Software Engineering, 22 (10):
751 – 761, 1996.
There are a lot of prediction metrics
Based on the state of the project
Based on the history of the project
A. Bernstein, J. Ekanayake, and M. Pinzger
“Improving defect prediction using temporal features
and non linear models,”
Proceedings of IWPSE 2007, 2007, pp. 11–18.
No one considers developers!
Even the best developer may not do a good job when working on different tasks at the same time
Introducing the Developer Based Changes Model (DBCM)
A new model for defect prediction based on the relationships between developers and files
Developer tree construction
Two metrics for analyzing the developer confusion
Developer Semantical Confusion
Developer Structural Confusion
A new metric for file complexity based on developer confusion
Semantical File Complexity
Structural File Complexity
DBCM Process
Classifier training & test set
Lehman, M. M., “On Understanding Laws, Evolution, and Conservation in the Large-Program Life Cycle” - Journal of Systems and Software (1980)
The system evolution processes are self-regulating with the distribution of product and process measures close to normal
How long should a period be?
3 Months
Ahmed E. Hassan
Predicting faults using the complexity of code changes.
ICSE 2009: 78 - 88
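As a rough illustration of the three-month periods mentioned above, the following sketch groups a commit history into consecutive periods. The tuple layout `(date, developer, file)`, the function names, and the choice to align periods to calendar months starting from the month of the first commit are all assumptions made for this example; the slides do not say how period boundaries were computed.

```python
from collections import defaultdict
from datetime import date


def period_index(commit_date, start, months_per_period=3):
    """Return the zero-based index of the period that contains commit_date.

    Periods are counted in whole calendar months from the month of `start`
    (an assumption of this sketch; the day of the month is ignored).
    """
    months_elapsed = (commit_date.year - start.year) * 12 + (
        commit_date.month - start.month
    )
    return months_elapsed // months_per_period


def group_by_period(commits, months_per_period=3):
    """Group (date, developer, file) tuples into consecutive periods."""
    start = min(commit[0] for commit in commits)
    periods = defaultdict(list)
    for commit in commits:
        index = period_index(commit[0], start, months_per_period)
        periods[index].append(commit)
    return dict(periods)


# Hypothetical commit history, used only to exercise the functions.
commits = [
    (date(2014, 1, 10), "alice", "Parser.java"),
    (date(2014, 3, 2), "bob", "Parser.java"),
    (date(2014, 4, 20), "alice", "Lexer.java"),
    (date(2014, 9, 1), "bob", "Lexer.java"),
]
periods = group_by_period(commits)
```

With periods built this way, one plausible setup is to compute the metrics on one period and use them to predict defects in the next, although the exact training and test split is not given in the slides.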
Which classifier to use?
Decision Table Majority
Empirical Evaluation
Case study design
Apache Ant
Apache Cassandra
Apache JMeter
Apache Lenya
Apache Log4j
Apache Poi
Apache Tomcat 7
Apache Xerces-J
Although the systems belong to the same ecosystem, they differ in size and history
RQ1: What is the accuracy of the prediction made by DBCM?
RQ2: How does DBCM compare with techniques based on the number of changes?
As the competing approach, we selected the BCCM proposed by Hassan
Ahmed E. Hassan
Predicting faults using the complexity of code changes.
ICSE 2009: 78 - 88
The higher the number of changes applied to a component, the higher the probability that the component is buggy
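The intuition above can be sketched as a simple ranking of components by how often they changed in a period. This is only an illustration of the stated intuition, not Hassan's BCCM itself; the function name and the input format (one file name per recorded change) are assumptions made for the example.

```python
from collections import Counter


def rank_by_changes(changed_files):
    """Rank components by number of changes, most frequently changed first.

    `changed_files` holds one entry per recorded change (hypothetical format).
    """
    return Counter(changed_files).most_common()


# Hypothetical change log for one period.
changes = ["A.java", "B.java", "A.java", "C.java", "A.java", "B.java"]
ranking = rank_by_changes(changes)
```

Under this reading, the components at the top of the ranking are the ones a change-based predictor would flag as most likely to be buggy.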
Results
DBCM and BCCM seem to capture the same phenomenon
Principal Component Analysis
DBCM & BCCM: A Combined Approach
RQ3: Is it possible to combine the two approaches in order to increase the prediction accuracy?
“Big Bang” Combination
Selection Algorithm
Step 1: Identification of the characteristics of the periods influencing the accuracy of the models
numOfChanges < 0.05 : DBCM
numOfChanges >= 0.05
|   numOfCommittors >= 3
|   |   averageCommitSize < 0.35 : BCCM
|   |   averageCommitSize >= 0.35 : DBCM
|   numOfCommittors < 3 : BCCM
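The selection rules above can be sketched as a small function. This sketch assumes the rules form a nested decision tree in which the `numOfCommittors` tests apply only when `numOfChanges >= 0.05`, and that `numOfChanges` and `averageCommitSize` are already normalized to the scale used by the thresholds; neither point is stated explicitly in the slides. The function and parameter names are hypothetical.

```python
def select_model(num_of_changes, num_of_committors, average_commit_size):
    """Pick the prediction model for a period from its characteristics.

    Assumes the nested reading of the rules: committer count and commit size
    are only consulted when num_of_changes is at least 0.05.
    """
    if num_of_changes < 0.05:
        return "DBCM"
    if num_of_committors < 3:
        return "BCCM"
    if average_commit_size < 0.35:
        return "BCCM"
    return "DBCM"


# Hypothetical period: many changes, four committers, large commits.
chosen = select_model(0.2, 4, 0.5)
```

Each period is then predicted with whichever model the function returns, which is the step described next.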
Step 2: Applying the models using the selection algorithm
+5%
Summarizing
?