Issues of consistency in defining slices for slicing metrics: ensuring comparability in research findings

Tracy Hall, Brunel University
David Bowes, University of Hertfordshire
Andrew Kerr, University of Hertfordshire
Schedule
Why are we interested in replicating slices?
What are slice-based coupling and cohesion metrics?
What did Meyers & Binkley do in their study?
What did we do in our replication of M&B's study?
How do our results compare to M&B's?
Do slice results matter?
What are the implications of our findings?
Why are we interested in replicating slices?
We aimed to investigate whether slice-based metrics can predict fault-prone code.
We needed to validate that we were collecting slice-based metrics data correctly.
We tried to identically reproduce Meyers and Binkley's (2004, 2007) metrics values.
Our replication highlights many ways in which the identification of program slices can vary.
Our results identify a need for consistency and/or full specification of slicing variables.
What are slice-based metrics?
The original set of cohesion metrics was proposed by Weiser in 1981 and extended by Ott et al. in the 1990s.
Harman et al. (1997) introduced slice-based coupling.
Green et al. (2009) present a detailed overview showing the evolution of slice-based coupling and cohesion metrics.
Slice-based coupling metrics
Meyers and Binkley (2007, p.8) use Harman et al.'s (1997) definition of coupling to define the coupling of a function f as a weighted average of its coupling to all other functions in the program.
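Such a length-weighted average can be written roughly as follows (the notation here is illustrative, not M&B's exact formula; see Meyers & Binkley 2007 for the precise definition):

```latex
\mathit{Coupling}(f) \;=\;
\frac{\sum_{g \neq f} \mathit{Coupling}(f, g) \times \mathit{length}(g)}
     {\sum_{g \neq f} \mathit{length}(g)}
```

That is, each pairwise coupling value is weighted by the length of the other function, so coupling to large functions dominates the average.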
Slice-based cohesion metrics (Ott & Thuss, 1993)
Coverage: the average ratio of the size of a slice to the size of the module (the average length of each slice compared to the length of the module).
Min Coverage: the smallest ratio of the size of a slice to the size of the module (the ratio of the shortest slice compared to the length of the module).
Max Coverage: the largest ratio of the size of a slice to the size of the module (the ratio of the longest slice compared to the length of the module).
Overlap: the average ratio of the size of the intersection to the size of a slice (the average proportion of the common intersection compared to each slice).
Tightness: the ratio of the size of the intersection to the size of the module (the proportion of the module which is common to all slices).
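Given slices represented as sets of program-dependence-graph vertices, the five definitions above can be sketched in Python. This is a minimal illustration under our own assumptions (function name and set-of-vertices representation are ours, not M&B's implementation):

```python
from functools import reduce

def cohesion_metrics(module_vertices, slices):
    """Ott & Thuss-style slice-based cohesion metrics.

    module_vertices: set of vertex ids making up the module
    slices: non-empty list of sets, one slice per output variable
    """
    m = len(module_vertices)
    n = len(slices)
    # vertices common to every slice (the slice intersection)
    common = reduce(set.intersection, slices)
    return {
        "Tightness":    len(common) / m,
        "Coverage":     sum(len(s) for s in slices) / (n * m),
        "MinCoverage":  min(len(s) for s in slices) / m,
        "MaxCoverage":  max(len(s) for s in slices) / m,
        "Overlap":      sum(len(common) / len(s) for s in slices) / n,
    }

# toy module of 10 vertices with two overlapping slices
module = set(range(1, 11))
result = cohesion_metrics(module, [set(range(1, 9)), set(range(3, 11))])
```

Note how Tightness and Overlap both depend on the intersection of all slices, so a single small slice drags both down.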
What did Meyers & Binkley do in their study?
Meyers and Binkley (2004, 2007) were the first to collect and analyse large-scale slice-based metrics data.
They collected slice-based metrics data on 63 open source C projects.
They produced a longitudinal study showing the evolution of coupling and cohesion over many releases of the Barcode and Gnugo projects.
They used CodeSurfer to slice and wrote scripts to collect slice-based metrics data.
The problem in replicating studies
Fermat's Last Theorem: "I have discovered a truly marvelous proof that it is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second into two like powers. This margin is too narrow to contain it." (1637)
Replicated by Wiles, A. (1995).
There is insufficient space in a published paper to describe the methods fully enough to allow for replication.
What did we do in our replication?
We replicated only M&B's longitudinal results for the evolution of cohesion in Barcode.
Barcode has 65 functions and 49 releases. The highest preset build option was used in CodeSurfer.
We tried to replicate the method reported by M&B, and discussed with Dave Binkley methodological issues that were unclear.
We wrote our own Scheme scripts (and were provided with scripts from CREST (Youssef)).
Longitudinal cohesion
[Chart: Barcode, M&B results alongside our results, releases 0.89 to 0.99; series: Tightness, Min Coverage, Coverage, Max Coverage, Overlap; metric values on the y-axis from 0.6 to 1.0]
Longitudinal cohesion
[Chart: Barcode, M&B results alongside our results with full vertex removal, releases 0.89 to 0.99; series: Tightness, Min Coverage, Coverage, Max Coverage, Overlap; metric values on the y-axis from 0.6 to 1.0]
Trying to understand where we were going wrong
We looked in detail at one data point (release 0.98) and tried to examine all variations in the way that this data point could be calculated.
We sliced both on files and on projects.
We varied the way lines of code are included in slices using:
1. Formal Ins: input parameters for the function specified in the module declaration.
2. Formal Outs: return variables.
3. Globals: variables used by or affected by the module.
4. Printf: variables which appear as Formal Outs in the list of parameters in an output statement.
(Based on the variations reported in previous studies analysed by Green et al., 2009.)
Combinations of slicing settings tested
[Table: the 16 combinations of the four individual slicing settings (Formal Ins, Formal Outs, Globals, Printf) that were tested; one combination was not possible.]
NB: all these settings were sliced both on a file and a project basis.
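The 16 rows follow from toggling each of the four settings independently (2^4 = 16). A quick enumeration sketch (setting names taken from the study; the dict representation is ours):

```python
from itertools import product

# the four individual slicing settings varied in the study
settings = ["Formal Ins", "Formal Outs", "Globals", "Printf"]

# every on/off combination of the four settings: 2**4 = 16
combinations = [
    dict(zip(settings, flags))
    for flags in product([False, True], repeat=len(settings))
]

print(len(combinations))  # 16
```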
Average module metrics for different combinations of variables
(each row is one tested combination of I, O, G and pF; see the key below)

        Sliced as a project                      Files sliced individually
Overlap  Tightness  Coverage  Min C  Max C   Overlap  Tightness  Coverage  Min C  Max C
0.859    0.814      0.919     0.828  0.984   0.649    0.481      0.691     0.523  0.901
0.861    0.820      0.926     0.833  0.984   0.643    0.482      0.705     0.524  0.901
0.903    0.857      0.917     0.870  0.984   0.712    0.551      0.717     0.588  0.898
0.905    0.852      0.926     0.863  0.977   0.759    0.563      0.712     0.587  0.892
0.898    0.837      0.918     0.842  0.966   0.745    0.519      0.671     0.543  0.845
0.911    0.869      0.929     0.881  0.984   0.728    0.560      0.743     0.590  0.898
0.891    0.840      0.927     0.852  0.981   0.772    0.518      0.653     0.538  0.820
0.947    0.895      0.928     0.905  0.975   0.839    0.672      0.764     0.694  0.885
0.920    0.844      0.915     0.847  0.953   0.767    0.521      0.653     0.544  0.761
0.911    0.869      0.929     0.881  0.984   0.728    0.560      0.743     0.590  0.898
0.949    0.883      0.914     0.886  0.956   0.820    0.591      0.688     0.610  0.792
0.972    0.929      0.951     0.933  0.975   0.944    0.823      0.856     0.832  0.885
1.000    0.897      0.897     0.897  0.897   1.000    0.612      0.612     0.612  0.612
0.907    0.859      0.941     0.866  0.971   0.851    0.538      0.639     0.547  0.717
0.917    0.851      0.896     0.866  0.968   0.749    0.464      0.597     0.496  0.778

I = Formal Ins, O = Formal Out, G = Globals, pF = printf. NB: both forward and backward slices were used in all cases.
Meyers & Binkley results: Overlap = 0.51, Tightness = 0.26, Coverage = 0.54, Min = 0.30, Max = 0.71.
What issues impact on slice-based data?
Only use PDGs which are 'user-defined', and remove PDGs with zero vertices.
Keep globals identified n times?
String constants considered as output variables (?).
Slices are based on both data and control edges.
Slices of length zero are removed (this would have a significant impact on Tightness).
Intersect all slices with the PDG vertices to remove vertices found outside of the PDG.
Remove vertex indices with an identifier < 1.
Remove vertices associated with body '{' and '}'.
Declaration vertices are removed, as they are not consistently included with forward and backward slices.
Return has an auto-generated value, so if a variable is output via a global, or is written as well as returned, the script may catch the same (source code) variable twice.
Global outputs from a function f include globals modified transitively by calls from f ("outgoing variables"), resulting in numerous slices.
Selection of actual inputs to output functions is naive; sometimes we may want the format string in printf statements.
Dealing with placeholder functions: if they have size zero after vertices are pruned, they are ignored.
Should some types of variables, e.g. string types, be excluded from slicing criteria?
Should forward slices use may-kill or declaration vertices?
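The zero-length-slice issue can be shown directly: because Tightness is the intersection of all slices over the module size, a single empty slice forces Tightness to zero. A toy sketch with hypothetical vertex sets (not data from the study):

```python
from functools import reduce

def tightness(module_size, slices):
    # Tightness = |intersection of all slices| / |module vertices|
    common = reduce(set.intersection, slices)
    return len(common) / module_size

slices = [{1, 2, 3, 4}, {2, 3, 4, 5}, set()]  # one zero-length slice

with_empty = tightness(5, slices)                       # empty slice kills the intersection
without_empty = tightness(5, [s for s in slices if s])  # intersection {2, 3, 4}
```

Whether zero-length slices are kept or removed is therefore one of the slicing variables that must be specified for results to be comparable.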
Time for variant performance analysis?
What are the implications of our findings?
For slice-based metrics:
◦ Specifying precisely all parameters of a slice and a metric is important but difficult.
◦ Identifying the 'best' variant of a metric may be useful.
For replicating studies:
◦ Studies need to publish basic information that allows replication.
For software engineering:
◦ We need to build bodies of evidence, and this must include replicated studies.
References
1. Green, P., Lane, P., Rainer, A., Scholz, S.-B. (2009). An Introduction to Slice-Based Cohesion and Coupling Metrics. Technical Report No. 488, University of Hertfordshire, School of Computer Science.
2. Harman, M., Okunlawon, M., Sivagurunathan, B., Danicic, S. (1997). Slice-Based Measurement of Coupling. IEEE/ACM ICSE Workshop on Process Modelling and Empirical Studies of Software Evolution, pp. 28-32. Boston, Massachusetts.
3. Meyers, T. M., Binkley, D. (2004). A Longitudinal and Comparative Study of Slice-Based Metrics. International Software Metrics Symposium, Chicago, USA, IEEE Proceedings.
4. Meyers, T. M., Binkley, D. (2007). An Empirical Study of Slice-Based Cohesion and Coupling Metrics. ACM Transactions on Software Engineering and Methodology, 17(1), pp. 1-25.
5. Ott, L. M., & Thuss, J. J. (1993). Slice-Based Metrics for Estimating Cohesion. In Proceedings of the International Software Metrics Symposium, IEEE-CS, pp. 71-81.
Any questions?
Tracy Hall
Reader in Software Engineering
Brunel University
Uxbridge, [email protected]

David Bowes
Senior Lecturer in Computing
University of Hertfordshire
Hatfield, [email protected]
The impact of slice variants
Some variants have a better relationship with fault-prone code than other variants.
Another cohesion metric: the Normalised Hamming Distance
◦ Proposed by Counsell et al. (2006).
◦ Adapted for program slices, where:
l = the number of slices
k = the number of vertices in the module
c = the number of vertices for the slice based on <variable, locus>_j