Searching for Configurations in Clone Evaluation: A Replication Study
C. Ragkhitwetsagul, M. Paixao, M. Adham, S. Busari, J. Krinke, J. H. Drake
Centre for Research on Evolution, Search and Testing, Department of Computer Science, University College London
Code Clone
Clone Detectors
if (x==0) then y=y+1;
if (check==0) then count=count+1;
$p ($p==0) $p $p=$p+1;
$p ($p==0) $p $p=$p+1;
if_s
if ( cond_e ) then assign_e
if_s
if ( cond_e ) then assign_e
Deckard, CCFinder, Simian, NiCad
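A minimal sketch of the token normalisation shown above, assuming every word-like token (identifiers and keywords alike) is replaced by the placeholder $p. The function name and the regular expression are illustrative and are not taken from any of the listed tools.

```python
import re

# Assumption: any token that starts with a letter or underscore becomes "$p",
# which reproduces the normalised form shown on the slide.
WORD = re.compile(r"[A-Za-z_]\w*")

def normalise(fragment: str) -> str:
    """Replace each word-like token with the placeholder $p."""
    return WORD.sub("$p", fragment)

first = normalise("if (x==0) then y=y+1;")
second = normalise("if (check==0) then count=count+1;")

print(first)            # $p ($p==0) $p $p=$p+1;
print(first == second)  # True: the two fragments match after normalisation
```

Both fragments collapse to the same token sequence, which is why a token-based detector can report them as clones even though the identifiers differ.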
Oracle Problem in Code Clone
Without the possibility of establishing a ground truth, we cannot know whether code is actually cloned.
Agreement
Parameter Tuning
EvaClone
T. Wang, M. Harman, Y. Jia, and J. Krinke. Searching for Better Configurations: A Rigorous Approach to Clone Evaluation. In FSE'13.
6 clone detectors: PMD, iClones, ConQAT, Simian, NiCad, CCFinder
8 software projects: weltab, cook, snns, psql, javadoc, ant, jdtcore, swing
15 years
Maximising Agreement
[Figure: maximise agreement among clone detectors C, D, N, S]
EvaClone (cont.)
EvaClone favours recall over precision, so more clone candidates are reported.
Replication Study
Fitness Function
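$$\text{Fitness} = \frac{4x_4 + 3x_3 + 2x_2 + 1x_1}{4 \times (\text{All clone lines})}$$
where $x_i$ is the number of lines reported as cloned by exactly $i$ tools.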
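A minimal sketch of how such an agreement fitness could be computed, assuming each cloned line is weighted by the number of tools that report it and the total is normalised by the number of tools times the number of distinct cloned lines. The function name, the input layout (tool name mapped to a set of (file, line) pairs), and the example data are hypothetical.

```python
from collections import Counter

def agreement_fitness(reports):
    """reports: dict mapping tool name -> set of (file, line) pairs reported as cloned."""
    n_tools = len(reports)
    # votes[line] = number of tools that report this line as cloned
    votes = Counter(line for lines in reports.values() for line in lines)
    if not votes:
        return 0.0
    # Summing the vote counts equals 1*x1 + 2*x2 + ... + n*xn
    weighted = sum(votes.values())
    return weighted / (n_tools * len(votes))

# Hypothetical reports from four tools over three lines of one file.
reports = {
    "C": {("A.java", 1), ("A.java", 2)},
    "D": {("A.java", 1), ("A.java", 2)},
    "N": {("A.java", 1)},
    "S": {("A.java", 1), ("A.java", 3)},
}
fitness = agreement_fitness(reports)
print(round(fitness, 4))  # 0.5833, i.e. (4*1 + 2*1 + 1*1) / (4 * 3)
```

Under this assumption the value is 1.0 only when every tool reports exactly the same set of lines.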
Replication Study (cont.)
Tools: Deckard, CCFinder, Simian, NiCad (25 parameters)
Population size: 100
No. of generations: 100
Crossover: 0.8
Mutation: 0.1
Elitism: 0.25
2 x 10^12
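A rough sketch of a genetic algorithm using the settings above. The slides do not name the operators, so binary tournament selection, uniform crossover, per-gene mutation, and elitism read as the fraction of the population carried over unchanged are all assumptions. The three-parameter search space and the toy fitness are hypothetical stand-ins for running the clone detectors and measuring agreement.

```python
import random

POP_SIZE, GENERATIONS = 100, 100
P_CROSSOVER, P_MUTATION, ELITE_FRACTION = 0.8, 0.1, 0.25

# Hypothetical search space: three integer parameters with small ranges.
RANGES = [(10, 100), (5, 20), (0, 6)]

def toy_fitness(cfg):
    # Stand-in fitness: closeness to an arbitrary target configuration.
    target = (50, 12, 3)
    return -sum(abs(c - t) for c, t in zip(cfg, target))

def random_config(rng):
    return tuple(rng.randint(lo, hi) for lo, hi in RANGES)

def evolve(seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(POP_SIZE)]
    n_elite = int(ELITE_FRACTION * POP_SIZE)
    best_per_gen = []
    for _ in range(GENERATIONS):
        pop.sort(key=toy_fitness, reverse=True)
        best_per_gen.append(toy_fitness(pop[0]))
        nxt = pop[:n_elite]  # elitism: keep the best individuals unchanged
        while len(nxt) < POP_SIZE:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=toy_fitness)
            p2 = max(rng.sample(pop, 2), key=toy_fitness)
            child = list(p1)
            if rng.random() < P_CROSSOVER:  # uniform crossover
                child = [rng.choice(pair) for pair in zip(p1, p2)]
            for i, (lo, hi) in enumerate(RANGES):
                if rng.random() < P_MUTATION:  # per-gene mutation
                    child[i] = rng.randint(lo, hi)
            nxt.append(tuple(child))
        pop = nxt
    return max(pop, key=toy_fitness), best_per_gen

best, history = evolve()
print(best, history[-1])
```

Because the elite is copied unchanged, the best fitness in this sketch never decreases from one generation to the next.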
Ver.      0.9   1.0   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9   1.10  2.0.0  2.0.44
SLOC (k)  5.5   6.7   6.78  6.82  7.2   7.6   8.4   8.9   10.1  12.4  17.9  22.8  23.6   25.3
%Inc      N/A   21%   2%    1%    6%    5%    11%   7%    13%   23%   44%   28%   3%     8%
Note: two complete libraries (cglib and asm) are embedded in releases 1.5 to 1.9; they were removed before the analysis.
RQ1: Optimised Agreement
How do the default parameters perform in terms of clone agreement on each Mockito release compared to the optimised ones?
[Figure: fitness value (y-axis, 0.30 to 0.50) per Mockito release (x-axis, 0.9 to 2.0.44); series: Default, EvaClone Highest, EvaClone Lowest]
Comparison of optimised tools' agreement (the highest and the lowest in 20 runs) to the default agreement over 14 Mockito releases
RQ2: Stability of Optimised Parameters
Are there noticeable differences in the values of optimised parameters over releases?
DF = default value; the remaining columns give the optimised value per release.

Tool      Parameter    DF     0.9    1.0    1.1    1.2    1.3    1.4    1.5    1.6    1.7    1.8    1.9    1.10   2.0.0  2.0.44
CCFinder  MinToken     50     10     70     70     70     80     80     80     80     10     10     10     10     10     10
CCFinder  TKS          12     10     16     18     19     18     18     19     20     14     17     10     10     10     10
Deckard   MinToken     30     30     50     50     50     50     50     50     50     50     50     50     50     50     50
Deckard   Stride       5      inf    8      8      8      8      8      8      8      16     5      inf    inf    inf    inf
Deckard   Similarity   0.9    0.9    1.0    1.0    1.0    1.0    1.0    1.0    1.0    0.95   1.0    0.9    0.9    0.9    0.9
NiCad     MinLine      6      5      7      7      7      6      6      6      7      6      5      5      5      5      5
NiCad     MaxLine      1K     200    100    100    400    400    200    200    200    200    100    100    100    200    200
NiCad     UPI          0.3    0.3    0.0    0.1    0.0    0.0    0.1    0.1    0.0    0.3    0.1    0.3    0.3    0.3    0.3
NiCad     Blind        0      1      0      0      0      0      0      0      1      1      1      1      1      1      1
NiCad     Abstract     0      4      6      6      6      6      5      5      6      6      2      4      4      4      4
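One simple way to quantify stability is to count how often an optimised value changes between consecutive releases. The sketch below does this for the two CCFinder parameters, using the optimised values from the table above (releases 0.9 to 2.0.44, default column excluded). The helper name and the choice of metric are illustrative assumptions, not part of the study.

```python
# Optimised values per release (0.9 ... 2.0.44), copied from the CCFinder rows.
min_token = [10, 70, 70, 70, 80, 80, 80, 80, 10, 10, 10, 10, 10, 10]
tks = [10, 16, 18, 19, 18, 18, 19, 20, 14, 17, 10, 10, 10, 10]

def count_changes(values):
    """Number of consecutive release pairs whose optimised value differs."""
    return sum(a != b for a, b in zip(values, values[1:]))

print(count_changes(min_token))  # 3
print(count_changes(tks))        # 9
```

With 14 releases there are 13 consecutive pairs, so TKS changes in 9 of 13 transitions and MinToken in 3 of 13.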
RQ2: Stability of Optimised Parameters
DF = default value; the remaining columns give the optimised value per release.

Tool    Parameter              DF  0.9  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  1.10  2.0.0  2.0.44
Simian  ignoreCurlyBraces      0   0    0    0    0    0    0    0    0    0    1    0    0     0      0
Simian  ignoreIdentifiers      0   1    0    0    0    0    0    0    0    1    1    1    1     1      1
Simian  ignoreIdentifierCase   0   ✱    ✱    ✱    ✱    ✱    ✱    ✱    ✱    ✱    ✱    ✱    ✱     ✱      ✱
Simian  ignoreStrings          0   1    0    0    0    0    0    0    0    1    0    ✱    ✱     ✱      ✱
Simian  ignoreStringCase       1   ✱    1    1    0    0    0    0    0    ✱    0    ✱    ✱     ✱      ✱
Simian  ignoreNumbers          0   1    0    1    0    1    1    0    1    1    0    ✱    ✱     ✱      ✱
Simian  ignoreCharacters       0   0    0    1    0    0    0    1    0    0    1    ✱    ✱     ✱      ✱
Simian  ignoreCharacterCase    1   0    0    ✱    1    1    0    ✱    1    1    ✱    ✱    ✱     ✱      ✱
Simian  ignoreLiterals         0   0    0    0    0    0    0    0    0    0    0    1    1     1      1
Simian  ignoreSubtypeNames     0   1    0    0    0    0    0    0    0    0    0    0    0     1      1
Simian  ignoreModifiers        1   1    1    0    1    0    0    0    0    0    0    1    1     1      1
Simian  ignoreVariableNames    0   1    0    0    0    0    0    0    0    1    1    0    0     0      1
Simian  balanceParentheses     0   0    1    1    1    1    1    1    1    0    0    0    0     0      0
Simian  balanceSquareBrackets  0   1    0    0    0    1    1    0    1    1    1    1    1     1      0
Simian  MinLine                6   5    6    6    6    6    6    6    6    7    7    5    5     5      5
RQ3: Clones over Releases
How many clones in Mockito are reported with the highest agreement over releases?
[Figure legend: Default, EvaClone]
Maximising Agreement
[Figure: maximise agreement among clone detectors C, D, N, S]
Open Challenge
“A better fitness function for EvaClone is needed”
It must not rely only on the number of cloned lines, but also include other aspects:
- How often is a line found to be cloned to other places?
- Precision vs. recall?
- Location of clones
Optimised parameters vs. default parameters
Optimised parameters are not stable over releases
The fitness function needs improvement