An Empirical Study of the Relationship Between Code Bad Smells and Software Faults

Min Zhang
School of Computer Science
University of Hertfordshire

Introduction

What is a Code Bad Smell?
Problems using Code Bad Smells
An overview of the empirical study
Code Bad Smell detection
Fault identification
Results and discussion
Conclusion
Q/A

Code Bad Smells

The 22 Code Bad Smells are bad structures in source code informally identified by Fowler et al. (1999).

Fowler et al. (1999) suggest that Code Bad Smells can give “indications that there is trouble that can be solved by a refactoring”.

They are widely used for detecting refactoring opportunities in software (Mens and Tourwe, 2004).

Problems in Using Code Bad Smells

Fowler et al. (1999) claim that Code Bad Smells are structures which cause detrimental effects on software. However, little empirical evidence has been provided.

Most existing Code Bad Smell detection tools are metric-based. We question their accuracy.

An Empirical Study of the Relationship between Code Bad Smells and Faults

Objective: Capture the relationship between Code Bad Smells and faults

Targeted Code Bad Smells: Data Clumps, Message Chains, Middle Man, Speculative Generality, and Switch Statements

Research Data:
Eclipse Core Packages (Releases 3.0, 3.0.1, 3.0.2, 3.1 and 3.2)
Apache Commons Packages (Commons IO, Commons Logging, Commons Codec, Commons DbUtils, Commons DBCP, and Commons Net)

Code Bad Smell Detection

Pattern-based Code Bad Smell detection

Define each Code Bad Smell as particular code patterns

Ideas from Gamma et al.’s (1995) definition of the GoF Design Patterns

Use Recoder API to analyse Java source code

An Example: The Pattern-based Definition of the Message Chains Bad Smell

Fowler et al.’s definition

You see message chains when a client asks one object for another object, which the client then asks for yet another object, which the client then asks for yet another another object, and so on. You may see these as a long line of getThis methods, or as a sequence of temps. (Fowler et al., 1999)

Pattern-based definition

An instance of the Message Chains Bad Smell is in one of the following situations:

Situation 1:
1. In order to access a data field in another class, a statement needs to call more than a threshold number of getter methods in sequence. (E.g. int a=b.getC().getD();)
2. This method call statement and the declarations of the getter methods are in different classes.

Situation 2:
1. In order to access a data field in another class, the source code uses more than a threshold number of temp variables.
2. A temp variable is a variable that only accesses data members (data fields/getter methods) of other classes or other temp variables. (E.g. ClassC tmpC=b.getC(); int a=a1.getD();)
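As a rough illustration of condition 1 of Situation 1, the sketch below counts chained getter calls in a single Java statement and compares the count with a threshold. It is a simplification and not the study's detector: the study analyses Java source with the Recoder API, whereas this sketch uses a regular expression over the statement text. It assumes getters are no-argument methods named getXxx, and it does not check condition 2 (that the getters are declared in a different class), which would require type resolution. The function names and the default threshold of 1 are hypothetical.

```python
import re

# One or more chained no-argument getter calls, e.g. ".getC().getD()".
# Assumption: a getter is any no-argument method whose name starts with "get".
GETTER_CHAIN = re.compile(r"(?:\.get[A-Z]\w*\(\s*\))+")
GETTER_CALL = re.compile(r"\.get[A-Z]\w*\(\s*\)")


def longest_getter_chain(statement: str) -> int:
    """Return the length of the longest run of chained getter calls."""
    longest = 0
    for match in GETTER_CHAIN.finditer(statement):
        longest = max(longest, len(GETTER_CALL.findall(match.group(0))))
    return longest


def is_message_chain(statement: str, threshold: int = 1) -> bool:
    """Flag a statement whose getter chain is longer than the threshold.

    The threshold value is illustrative; the slides do not state the one used.
    """
    return longest_getter_chain(statement) > threshold


if __name__ == "__main__":
    print(is_message_chain("int a=b.getC().getD();"))  # chain of two getters
    print(is_message_chain("int a=b.getC();"))         # single getter
```

A text-based check like this cannot see through temp variables, so Situation 2 would need data-flow information from a parsed syntax tree.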

Fault Identification

Zimmermann et al.’s (2007) fault identification approach:

1. Locate “bug”, “fix(ed)” and “update(d)” tokens in CVS comment messages.

2. If a version entry in CVS contains one or more of the above tokens and those tokens are followed by numbers, this version entry is seen as a bug-fixing update.

3. Those numbers are treated as bug IDs.

4. Confirm the bug IDs using the Bugzilla database.
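The four steps above can be sketched as a small text filter over commit messages. This is a minimal illustration under stated assumptions, not the tooling used in the study: the exact token matching rules (case-insensitivity, allowing "#" or ":" between token and number) are guesses, and the Bugzilla lookup in step 4 is replaced by a hypothetical in-memory set of known bug IDs passed in by the caller.

```python
import re

# Steps 1-2: one of the tokens, optionally followed by whitespace, ':' or '#',
# then a number. The separators allowed here are an assumption.
BUG_REF = re.compile(
    r"\b(?:bug|fix(?:ed)?|update(?:d)?)\b[\s:#]*(\d+)", re.IGNORECASE
)


def candidate_bug_ids(message):
    """Step 3: treat the numbers that follow a token as candidate bug IDs."""
    return [int(number) for number in BUG_REF.findall(message)]


def confirmed_bug_ids(message, known_ids):
    """Step 4: keep only candidates present in the bug database.

    known_ids stands in for a Bugzilla query (hypothetical stand-in).
    """
    return [bug_id for bug_id in candidate_bug_ids(message) if bug_id in known_ids]


if __name__ == "__main__":
    known = {12345}
    message = "Fixed bug 12345 and updated 2004 copyright"
    print(candidate_bug_ids(message))
    print(confirmed_bug_ids(message, known))
```

The confirmation step matters because a bare number after "updated" or "fix" can be a year or a version number rather than a bug ID.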

Results and Discussion: Binary Coding of the Existence of Code Bad Smells (1)

Existence of Code Bad Smells

Data Clumps  Message Chains  Middle Men  Speculative Generality  Switch Statements  Coding
0            0               0           0                       0                  0
1            0               0           0                       0                  1
0            1               0           0                       0                  2
1            1               0           0                       0                  3
0            0               1           0                       0                  4
1            0               1           0                       0                  5
0            1               1           0                       0                  6
1            1               1           0                       0                  7
0            0               0           1                       0                  8
1            0               0           1                       0                  9
0            1               0           1                       0                  10
1            1               0           1                       0                  11
0            0               1           1                       0                  12
1            0               1           1                       0                  13
0            1               1           1                       0                  14
1            1               1           1                       0                  15

Result and Discussion: Binary Coding of the Existence of Code Bad Smells (2)

Existence of Code Bad Smells

Data Clumps  Message Chains  Middle Men  Speculative Generality  Switch Statements  Coding
0            0               0           0                       1                  16
1            0               0           0                       1                  17
0            1               0           0                       1                  18
1            1               0           0                       1                  19
0            0               1           0                       1                  20
1            0               1           0                       1                  21
0            1               1           0                       1                  22
1            1               1           0                       1                  23
0            0               0           1                       1                  24
1            0               0           1                       1                  25
0            1               0           1                       1                  26
1            1               0           1                       1                  27
0            0               1           1                       1                  28
1            0               1           1                       1                  29
0            1               1           1                       1                  30
1            1               1           1                       1                  31
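The coding in the two tables is a five-bit binary number in which Data Clumps is the least significant bit (value 1) and Switch Statements the most significant (value 16). A minimal sketch of that mapping follows; the function and constant names are hypothetical and chosen only for illustration.

```python
# Smells in bit order, least significant bit first, as read from the table.
SMELLS = (
    "Data Clumps",             # bit 0, value 1
    "Message Chains",          # bit 1, value 2
    "Middle Man",              # bit 2, value 4
    "Speculative Generality",  # bit 3, value 8
    "Switch Statements",       # bit 4, value 16
)


def profile_code(present):
    """Return the profile coding (0-31) for a collection of detected smells."""
    code = 0
    for bit, smell in enumerate(SMELLS):
        if smell in present:
            code |= 1 << bit
    return code


def smells_in_profile(code):
    """Return the smells encoded by a profile coding, in bit order."""
    return [smell for bit, smell in enumerate(SMELLS) if code & (1 << bit)]


if __name__ == "__main__":
    print(profile_code({"Message Chains", "Switch Statements"}))  # 2 + 16
    print(smells_in_profile(5))
```

Profile zero therefore denotes source code in which none of the five smells was detected, and profiles 1, 2, 4, 8 and 16 denote code containing exactly one smell.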

Result and Discussion: One-way Analysis of Variance Eclipse Data (1)

Result and Discussion: One-way Analysis of Variance Eclipse Data (2)

The five profiles that each indicate the existence of exactly one of the five Code Bad Smells have a significantly lower mean number of faults than profile zero (no Code Bad Smells detected).

All profiles that have a higher mean number of faults than profile zero contain the Message Chains and the Switch Statements Bad Smells.
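For readers unfamiliar with the test, a one-way analysis of variance compares the variation between group means with the variation inside the groups. The sketch below computes the F statistic by hand, with each group standing for the fault counts of the source code samples in one profile. The numbers in the usage example are toy values chosen for easy arithmetic and are not data from the study; the function name is hypothetical, and the slides do not say which statistics package was used.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                              # number of groups (profiles)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2
        for group in groups
        for value in group
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Toy fault counts for three hypothetical profiles (not study data).
    toy_profiles = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova_f(toy_profiles))  # approximately 13
```

A large F value indicates that at least one profile mean differs from the others; identifying which profiles differ from profile zero requires a follow-up comparison.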

Result and Discussion: the Message Chains and Switch Statements

All source code samples associated with more than 10 faults contain the Message Chains Bad Smell.

The Switch Statements Bad Smell does not show a clear relationship with a high number of faults.

Result and Discussion: One-way Analysis of Variance Apache Data (1)

Result and Discussion: One-way Analysis of Variance Apache Data (2)

The five profiles that each indicate the existence of exactly one of the five Code Bad Smells have a lower mean number of faults than profile zero.

None of the profiles containing the Message Chains Bad Smell shows a higher mean number of faults than profile zero.

A Detailed Investigation of Message Chains

Objective:
To test whether the Message Chains Bad Smell is directly associated with faults.
To test whether the Message Chains Bad Smell is directly associated with particular types of faults.

Method: Manually investigate 20 source code samples from the Eclipse project

A Detailed Investigation of Message Chains: Direct Association with Faults

Association Type                       Detail of Change          Number of Instances
Message Chains Touched During Fix      Message Chains Increased  4
                                       Message Chains Reduced    5
Message Chains Not Touched During Fix                            45
Total                                                            54

A Detailed Investigation of Message Chains: Fault Classification

Classification Schema: An adapted version of Seaman et al.’s (2008) fault classification schema

Results:

Type of Fault           Number of Instances
Algorithm / Method      4
Checking                1
External Interface      2
Internal Interface      2
Non-functional Defects  0
Other                   0

A Detailed Investigation of Message Chains: Result

The Message Chains Bad Smell is not likely to be directly associated with faults, but it indicates a complicated software context.

The Message Chains Bad Smell is likely to be associated with Algorithm/Method faults.

Conclusion

Source code containing only one of the five Code Bad Smells is not likely to be fault prone.

The Message Chains Bad Smell could cause a high number of faults and is likely to be associated with Algorithm/Method faults, so it deserves further attention.

The Message Chains Bad Smell may not be directly associated with faults but it may indicate a complicated software context.

Q/A

References

FOWLER, M., BECK, K., BRANT, J., OPDYKE, W. & ROBERTS, D. (1999) Refactoring: Improving the Design of Existing Code, Addison Wesley.

GAMMA, E., HELM, R., JOHNSON, R. & VLISSIDES, J. (1995) Design Patterns: Elements of Reusable Object-Oriented Software, Reading, Mass., Addison-Wesley.

MENS, T. & TOURWE, T. (2004) A survey of software refactoring. Software Engineering, IEEE Transactions on, 30, 126-139.

SEAMAN, C. B., SHULL, F., REGARDIE, M., ELBERT, D., FELDMANN, R. L., GUO, Y. & GODFREY, S. (2008) Defect categorization: making use of a decade of widely varying historical data. Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement. Kaiserslautern, Germany, ACM.

ZIMMERMANN, T., PREMRAJ, R. & ZELLER, A. (2007) Predicting Defects for Eclipse. IN PREMRAJ, R. (Ed.) Predictor Models in Software Engineering, 2007. PROMISE'07: ICSE Workshops 2007. International Workshop on.