Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Investigation of Coding Patterns over Version History
Hironori Date, Takashi Ishio, Katsuro Inoue
Osaka University, Japan
Coding Patterns
• Frequent sequence of call elements and control elements
  – Call element
    • Method call element
    • Constructor call element
  – Control element
    • IF, END-IF
    • LOOP, END-LOOP
    • etc.
• Implement a particular kind of concern
  – Spread around source code
2012/10/26
JHotDraw Ver. 5.4b1
Previous Research [1]
• Extracted coding patterns from 5 applications
• Coding pattern type
  – API usage patterns
  – Application-specific patterns
[1] T. Ishio, H. Date, T. Miyake, and K. Inoue, “Mining coding patterns to detect crosscutting concerns in java programs,” in Proceedings of the 15th Working Conference on Reverse Engineering, 2008, pp. 123–132.
Coding patterns are candidates for reusable code
Previous Research [1]
Similar Patterns
• <a(), b()>
• <a(), b(), c()>
• <a(), c(), b()>
• <IF, a(), b(), END-IF>
Which patterns are easier to reuse?
Assumption: Stable patterns are reusable
Research Question
RQ: Are the coding patterns generally stable over the version history?
To answer this question:
1. Extract coding patterns from multiple versions of applications
2. Investigate the life-span of coding patterns
Life-span: the number of versions where we find the identical pattern
Outline of Experiment
• Mining coding patterns
  1. Normalization of source code
  2. Sequential pattern mining for each version
• Tracking coding patterns
  – Compute life-span of each pattern
[Figure: Source Code (.java files) of Ver. 1, Ver. 2, …, Ver. N → Mining Coding Patterns (using Fung) → Coding Patterns (.xml) for each version → Tracking Coding Patterns → Life-span]
         Ver. 1   Ver. 2   …   Ver. N   Life-span
Pat. 1   3        4        …   3        6
Pat. 2   0        0        …   2        4
…        …        …        …   …        …
Pat. M   3        0        …   2        3
Normalization in Pattern Mining
• Translate each method into a sequence
  – Call elements
  – Control elements
• Normalize control elements (Table I)
Source File
    public class A {
        void a() {
            int i = x + y;
            callA();
            callB();
            callB();
        }
        void b() {
            if (cond()) {
                callA();
                callB();
            }
        }
    }
        ↓ Normalization
Sequence Database
    A.a(): <callA(), callB(), callB()>
    A.b(): <cond(), IF, callA(), callB(), END-IF>
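The translation above can be sketched in a few lines of Java. This is only an illustration on a hand-built, hypothetical mini-AST: the names `NormalizationSketch`, `Node`, `call`, `ifNode` and `loop` are assumptions made for this sketch and do not come from the paper or from Fung, and nothing here parses real Java source. The IF rule follows the A.b() example above; the LOOP rule is an assumption inferred from the `<hasMoreTokens(), LOOP, nextToken(), hasMoreTokens(), END-LOOP>` pattern that appears among the stable patterns in dnsjava. The authoritative rule set is Table I of the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: flattens a hand-built statement tree into an element sequence. */
class NormalizationSketch {
    /** Hypothetical AST node: appends its elements to the output sequence. */
    interface Node {
        void emit(List<String> out);
    }

    /** A method or constructor call becomes a single call element. */
    static Node call(String name) {
        return out -> out.add(name);
    }

    /** if (condition) { body }  ->  condition calls, IF, body, END-IF. */
    static Node ifNode(List<Node> condition, List<Node> body) {
        return out -> {
            emitAll(condition, out);
            out.add("IF");
            emitAll(body, out);
            out.add("END-IF");
        };
    }

    /** Assumed loop rule: condition calls, LOOP, body, condition calls again, END-LOOP. */
    static Node loop(List<Node> condition, List<Node> body) {
        return out -> {
            emitAll(condition, out);
            out.add("LOOP");
            emitAll(body, out);
            emitAll(condition, out);
            out.add("END-LOOP");
        };
    }

    static void emitAll(List<Node> nodes, List<String> out) {
        for (Node n : nodes) {
            n.emit(out);
        }
    }

    /** Translates one method body into its sequence of call and control elements. */
    static List<String> normalize(List<Node> methodBody) {
        List<String> out = new ArrayList<>();
        emitAll(methodBody, out);
        return out;
    }

    public static void main(String[] args) {
        // A.a(): "int i = x + y;" contains no call, so it contributes no element.
        List<Node> a = Arrays.asList(call("callA()"), call("callB()"), call("callB()"));
        // A.b(): if (cond()) { callA(); callB(); }
        List<Node> b = Arrays.asList(
            ifNode(Arrays.asList(call("cond()")),
                   Arrays.asList(call("callA()"), call("callB()"))));
        System.out.println("A.a(): " + normalize(a));
        System.out.println("A.b(): " + normalize(b));
    }
}
```

Each method yields one sequence, so a whole version of an application becomes one sequence database with an entry per method.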
Sequential Pattern Mining
Sequence Database (obtained by normalizing the source file)
    A.a(): <callA(), callB(), callB()>
    A.b(): <cond(), IF, callA(), callB(), END-IF>
        ↓ Sequential Pattern Mining
Coding Pattern
    <callA(), callB()>
    Instances: class A { void a() { … } } and class A { void b() { … } }
Parameters
• Minimum Length: 2 (threshold on the number of elements in a pattern)
• Minimum Support: 2 (threshold on the number of pattern instances)
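The two parameters can be illustrated with a small support check. This is a sketch, not the mining algorithm used by Fung: it only tests one given candidate against the thresholds, whereas a real miner also has to enumerate the candidates. It assumes that a pattern instance is a method whose sequence contains the pattern as a subsequence with gaps allowed, which is the usual convention in sequential pattern mining but is not spelled out on the slide. `SupportSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: checks one candidate pattern against the two mining parameters. */
class SupportSketch {
    /** True if the pattern occurs in the sequence as a subsequence (gaps allowed; an assumption). */
    static boolean containsSubsequence(List<String> sequence, List<String> pattern) {
        int matched = 0;
        for (String element : sequence) {
            if (matched < pattern.size() && element.equals(pattern.get(matched))) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    /** Number of sequences (methods) that contain the pattern at least once. */
    static int support(List<List<String>> database, List<String> pattern) {
        int count = 0;
        for (List<String> sequence : database) {
            if (containsSubsequence(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    /** A candidate qualifies only if it meets both the length and the support threshold. */
    static boolean isCodingPattern(List<List<String>> database, List<String> pattern,
                                   int minLength, int minSupport) {
        return pattern.size() >= minLength && support(database, pattern) >= minSupport;
    }

    public static void main(String[] args) {
        // Sequence database from the slide: A.a() and A.b().
        List<List<String>> database = Arrays.asList(
            Arrays.asList("callA()", "callB()", "callB()"),
            Arrays.asList("cond()", "IF", "callA()", "callB()", "END-IF"));
        List<String> candidate = Arrays.asList("callA()", "callB()");
        System.out.println("support = " + support(database, candidate));
        System.out.println("coding pattern = " + isCodingPattern(database, candidate, 2, 2));
    }
}
```

With minimum length 2 and minimum support 2, `<callA(), callB()>` qualifies because both A.a() and A.b() contain it, while `<callB(), callB()>` does not because only A.a() contains it.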
Identical Patterns Between Versions
• Exact match of pattern sequence
• The number of instances is not considered
Example
    Ver. X                 Ver. Y
    <a(), b(), c()>        <a(), b(), c(), d()>    NOT Identical
    <a(), b(), c()>        <a(), b(), c()>         Identical
    …                      …
[Figure: the instances of each pattern are methods such as class A { void a() { … } }, class B { void b() { … } } and class C { void c() { … } }; the set of instances may differ between versions]
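The matching rule can be stated compactly in code. The sketch below is an assumption-laden illustration: `MinedPattern` is a hypothetical holder for one mined pattern in one version, and the instance counts in `main` are made up to show that they play no role in the comparison.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: two patterns are identical when their sequences match exactly. */
class IdenticalPatternSketch {
    /** Hypothetical record of a pattern mined from one version. */
    static final class MinedPattern {
        final List<String> sequence;
        final int instances;

        MinedPattern(int instances, String... elements) {
            this.sequence = Arrays.asList(elements);
            this.instances = instances;
        }
    }

    /** Exact, element-by-element match of the sequences; instance counts are ignored. */
    static boolean identical(MinedPattern p, MinedPattern q) {
        return p.sequence.equals(q.sequence);
    }

    public static void main(String[] args) {
        MinedPattern inX = new MinedPattern(2, "a()", "b()", "c()");         // hypothetical count
        MinedPattern longerInY = new MinedPattern(2, "a()", "b()", "c()", "d()");
        MinedPattern sameInY = new MinedPattern(3, "a()", "b()", "c()");     // hypothetical count
        System.out.println(identical(inX, longerInY)); // extra element d(): not identical
        System.out.println(identical(inX, sameInY));   // same sequence, different count: identical
    }
}
```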
Tracking Coding Patterns
1. List all coding patterns from all versions
2. Look up the number of pattern instances in each version
3. Compute life-span
[Figure: the coding patterns (.xml) of Ver. 1, Ver. 2 and Ver. 3 are combined into a pattern/version table]
Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                           2        3        3        3
<IF, b(), c(), END-IF>               0        0        2        1
<a(), IF, d(), ELSE, c(), END-IF>    4        2        3        3
<d(), e(), f()>                      2        0        2        2
Tracking Coding Patterns: Example
For each pattern, look up the number of instances in the coding patterns (.xml) of Ver. 1, Ver. 2 and Ver. 3:
• <a(), b()>: 2 instances in Ver. 1, 3 instances in Ver. 2, 3 instances in Ver. 3 → life-span 3
• <IF, b(), c(), END-IF>: not found in Ver. 1, not found in Ver. 2, 2 instances in Ver. 3 → life-span 1
• <a(), IF, d(), ELSE, c(), END-IF>: 4 instances in Ver. 1, 2 instances in Ver. 2, 3 instances in Ver. 3 → life-span 3
• <d(), e(), f()>: 2 instances in Ver. 1, not found in Ver. 2, 2 instances in Ver. 3 → life-span 2
Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                           2        3        3        3
<IF, b(), c(), END-IF>               0        0        2        1
<a(), IF, d(), ELSE, c(), END-IF>    4        2        3        3
<d(), e(), f()>                      2        0        2        2
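The three tracking steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: each version is modelled as a map from a pattern sequence to its instance count (the real input is the .xml output of the miner, which is not parsed here), and `TrackingSketch`, `track`, `lifeSpan` and `slideExample` are hypothetical names. The data is the three-version example from the table.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: builds the pattern/version table and computes life-spans. */
class TrackingSketch {
    /** Steps 1 and 2: one row per distinct pattern, one instance count per version (0 = not found). */
    static Map<List<String>, int[]> track(List<Map<List<String>, Integer>> versions) {
        Map<List<String>, int[]> table = new LinkedHashMap<>();
        for (int v = 0; v < versions.size(); v++) {
            for (Map.Entry<List<String>, Integer> e : versions.get(v).entrySet()) {
                // List.equals compares element by element, i.e. exact match of the sequence.
                table.computeIfAbsent(e.getKey(), k -> new int[versions.size()])[v] = e.getValue();
            }
        }
        return table;
    }

    /** Step 3: life-span is the number of versions in which the pattern is found. */
    static int lifeSpan(int[] counts) {
        int versionsFound = 0;
        for (int c : counts) {
            if (c > 0) {
                versionsFound++;
            }
        }
        return versionsFound;
    }

    /** The three-version example from the slide. */
    static Map<List<String>, int[]> slideExample() {
        List<String> ab = Arrays.asList("a()", "b()");
        List<String> ifBc = Arrays.asList("IF", "b()", "c()", "END-IF");
        List<String> aIf = Arrays.asList("a()", "IF", "d()", "ELSE", "c()", "END-IF");
        List<String> def = Arrays.asList("d()", "e()", "f()");

        Map<List<String>, Integer> v1 = new LinkedHashMap<>();
        v1.put(ab, 2); v1.put(aIf, 4); v1.put(def, 2);
        Map<List<String>, Integer> v2 = new LinkedHashMap<>();
        v2.put(ab, 3); v2.put(aIf, 2);
        Map<List<String>, Integer> v3 = new LinkedHashMap<>();
        v3.put(ab, 3); v3.put(ifBc, 2); v3.put(aIf, 3); v3.put(def, 2);

        return track(Arrays.asList(v1, v2, v3));
    }

    public static void main(String[] args) {
        for (Map.Entry<List<String>, int[]> row : slideExample().entrySet()) {
            System.out.println(row.getKey() + " " + Arrays.toString(row.getValue())
                + " life-span=" + lifeSpan(row.getValue()));
        }
    }
}
```

Note that `<d(), e(), f()>` has life-span 2 although it is absent from Ver. 2: under the definition used here, the versions in which a pattern is found do not have to be consecutive.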
Experiments
• Target applications
  Source archives of release versions were downloaded from the project web sites
  – dnsjava
    Version: 0.1 to 2.0.1 (51 versions)
  – JmDNS
    Version: 0.2 to 3.4.1 (20 versions)
• Pattern mining parameters
  – Minimum length: 2
    • Threshold of the number of elements of a pattern sequence
  – Minimum support: 2
    • Threshold of the number of pattern instances
Result of Experiment
• LOC and the number of patterns
  – Figure 2 and Figure 3
• Distribution of life-span
  – Figure 4 and Figure 5
• Distribution of life-span and pattern length
  – Figure 6 and Figure 7
• Sample code of patterns with the longest life-span
  – Picked from Table III and Table IV
LOC and the Number of Patterns in dnsjava (Figure 2)
• 51 versions
• 5,084 LOC to 33,330 LOC
• 512 to 4,405 patterns (in a single version)
• 17,284 patterns in total (no duplication)
• Correlation coefficient (LOC & #Pattern): 0.912
[Figure 2: LOC (axis 0 to 35,000) and #Pattern (axis 0 to 5,000) for each version; x-axis: Version]
Versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8.1, 0.8.2, 0.8.3, 0.9, 0.9.1, 0.9.2, 0.9.3, 0.9.4, 0.9.5, 1.0, 1.0.1, 1.0.2, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.6.6, 2.0.0, 2.0.1
LOC and the Number of Patterns in JmDNS (Figure 3)
• 20 versions
• 3,408 LOC to 17,252 LOC
• 237 to 2,419 patterns (in a single version)
• 8,625 patterns in total (no duplication)
• Correlation coefficient (LOC & #Pattern): 0.721
[Figure 3: LOC (axis 0 to 18,000) and #Pattern (axis 0 to 2,500) for each version; x-axis: Version]
Versions: 0.2, 1.0.RC1, 1.0.RC2, 1.0-Final, 2.0, 2.1, 3.0, 3.1, 3.1.2, 3.1.3, 3.1.4, 3.1.5, 3.1.6, 3.1.7, 3.1.8, 3.2.0, 3.2.1, 3.2.2, 3.4.0, 3.4.1
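A correlation coefficient between LOC and #Pattern can be computed as sketched below. The slides do not say which coefficient was used; this sketch assumes the Pearson product-moment coefficient. `CorrelationSketch` and `pearson` are hypothetical names, and the numbers in `main` are made-up toy values, not the per-version measurements behind Figures 2 and 3 (those are not listed in the slides).

```java
/** Illustrative sketch only: Pearson correlation between two equally long series. */
class CorrelationSketch {
    /** Returns cov(x, y) / (sd(x) * sd(y)); assumes neither series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        // The 1/n factors cancel between numerator and denominator.
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        // Toy values for illustration only (one entry per version).
        double[] loc = {5000, 9000, 14000, 21000, 33000};
        double[] patterns = {500, 1100, 1500, 2900, 4400};
        System.out.println("r = " + pearson(loc, patterns));
    }
}
```

A value close to 1, such as the 0.912 reported for dnsjava, means that the number of patterns grows roughly in proportion to the code size.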
Life-span of Patterns in dnsjava (Figure 4)
[Figure 4: histogram; x-axis: Life-span (1 to 51), y-axis: Frequency (0 to 5,000); annotation: Unstable Pattern (short life-span) to Stable Pattern (long life-span)]
Frequency by life-span:
    Life-span 1 to 10:   4929, 3422, 1591, 1167, 782, 467, 794, 647, 377, 454
    Life-span 11 to 20:  320, 313, 273, 136, 403, 149, 134, 121, 228, 48
    Life-span 21 to 30:  43, 51, 29, 44, 26, 25, 30, 39, 25, 19
    Life-span 31 to 40:  24, 20, 13, 16, 14, 13, 5, 6, 5, 8
    Life-span 41 to 51:  3, 10, 7, 21, 8, 1, 1, 4, 5, 0, 14
• Total: 17,284 patterns in 51 versions
• Median: 3
• 14 patterns appear in all versions (Table III)
Life-span of Patterns in JmDNS (Figure 5)
[Figure 5: histogram; x-axis: Life-span (1 to 20), y-axis: Frequency (0 to 2,500); annotation: Unstable Pattern (short life-span) to Stable Pattern (long life-span)]
Frequency by life-span:
    Life-span 1 to 10:   2153, 2244, 1531, 870, 415, 426, 357, 50, 188, 65
    Life-span 11 to 20:  46, 41, 84, 22, 16, 43, 9, 2, 42, 21
• Total: 8,625 patterns in 20 versions
• Median: 2
• 21 patterns appear in all versions (Table IV)
Life-span of Patterns
• dnsjava (51 versions)
  – Half of the coding patterns disappeared within 3 versions (median is 3)
• JmDNS (20 versions)
  – Half of the coding patterns disappeared within 2 versions (median is 2)
The life-span of coding patterns tends to be short
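The medians can be re-derived from the life-span histograms. The sketch below is an illustration, not the authors' computation: the frequency arrays are the bar values as read from Figures 4 and 5 (they sum to the stated totals of 17,284 and 8,625), and the sketch returns the lower median, which is an assumption since the slides do not say how the median was taken. `MedianLifeSpanSketch` and `medianLifeSpan` are hypothetical names.

```java
/** Illustrative sketch only: median life-span from a life-span histogram. */
class MedianLifeSpanSketch {
    /** Bar values read from Figure 4; DNSJAVA[i] = number of patterns with life-span i + 1. */
    static final int[] DNSJAVA = {
        4929, 3422, 1591, 1167, 782, 467, 794, 647, 377, 454,
        320, 313, 273, 136, 403, 149, 134, 121, 228, 48,
        43, 51, 29, 44, 26, 25, 30, 39, 25, 19,
        24, 20, 13, 16, 14, 13, 5, 6, 5, 8,
        3, 10, 7, 21, 8, 1, 1, 4, 5, 0, 14
    };

    /** Bar values read from Figure 5; JMDNS[i] = number of patterns with life-span i + 1. */
    static final int[] JMDNS = {
        2153, 2244, 1531, 870, 415, 426, 357, 50, 188, 65,
        46, 41, 84, 22, 16, 43, 9, 2, 42, 21
    };

    /** Smallest life-span whose cumulative frequency reaches the middle rank (lower median). */
    static int medianLifeSpan(int[] frequency) {
        long total = 0;
        for (int f : frequency) {
            total += f;
        }
        if (total == 0) {
            throw new IllegalArgumentException("histogram is empty");
        }
        long middleRank = (total + 1) / 2;
        long cumulative = 0;
        for (int i = 0; i < frequency.length; i++) {
            cumulative += frequency[i];
            if (cumulative >= middleRank) {
                return i + 1; // life-spans are 1-based
            }
        }
        throw new IllegalStateException("unreachable: cumulative always reaches total");
    }

    public static void main(String[] args) {
        System.out.println("dnsjava median life-span: " + medianLifeSpan(DNSJAVA));
        System.out.println("JmDNS median life-span: " + medianLifeSpan(JMDNS));
    }
}
```

In dnsjava, 8,351 of the 17,284 patterns have a life-span of 1 or 2, just under half, so the middle pattern falls in the life-span 3 bar; in JmDNS, 4,397 of the 8,625 patterns have a life-span of 1 or 2, so the median is 2.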
Life-span and Pattern Length in dnsjava (Figure 6)
[Figure 6: distribution of life-span and pattern length; one region of the plot is labeled "No Patterns"]
• Coding patterns with a short life-span include a small number of elements
• Coding patterns with a long life-span have a short pattern length
• Coding patterns that include a large number of elements survive only a short period
Life-span and Pattern Length in JmDNS (Figure 7)
[Figure 7: distribution of life-span and pattern length; one region of the plot is labeled "No Patterns"]
• Coding patterns with a long life-span have a short pattern length
• A lot of patterns with a short life-span include a small number of elements
• Coding patterns that include a large number of elements survive only a short period
Stable Patterns in dnsjava
Stable Pattern in dnsjava: Application-specific pattern
    public SetResponse
    addMessage(Message in) {
        boolean isAuth = in.getHeader().getFlag(Flags.AA);
        Record question = in.getQuestion();
        Name qname;
        Name curname;
        int qtype;
        int qclass;
        int cred;
        int rcode = in.getHeader().getRcode();
        boolean haveAnswer = false;
        ...
    }
org.xbill.DNS.Cache (ver. 2.0.1)
<getHeader(), getRcode()>: 5 instances in ver. 2.0.1
Stable Pattern in dnsjava: Object generation pattern
    private void
    findResolvConf(String file) {
        InputStream in = null;
        try {
            in = new FileInputStream(file);
        }
        catch (FileNotFoundException e) {
            return;
        }
        InputStreamReader isr = new InputStreamReader(in);
        BufferedReader br = new BufferedReader(isr);
        ...
    }
org.xbill.DNS.spi.ResolverConfig (ver. 2.0.1)
<java.io.InputStreamReader.<init>(java.io.InputStream), java.io.BufferedReader.<init>(java.io.Reader)>: 5 instances in ver. 2.0.1
Stable Pattern in dnsjava: Iteration-related idiom
    protected DNSJavaNameService() {
        ...
        if (nameServers != null) {
            StringTokenizer st = new StringTokenizer(nameServers, ",");
            String [] servers = new String[st.countTokens()];
            int n = 0;
            while (st.hasMoreTokens())
                servers[n++] = st.nextToken();
            try {
                Resolver res = new ExtendedResolver(servers);
                Lookup.setDefaultResolver(res);
            }
            catch (UnknownHostException e) {
                ...
            }
        }
        ...
    }
org.xbill.DNS.spi.DNSJavaNameService (ver. 2.0.1)
<hasMoreTokens(), LOOP, nextToken(), hasMoreTokens(), END-LOOP>: 6 instances in ver. 2.0.1
Stable Patterns in JmDNS
Stable Pattern in JmDNS: Multi-thread idiom with synchronized keyword
    public synchronized String getPropertyString(String name) {
        byte data[] = this.getProperties().get(name);
        if (data == null) {
            return null;
        }
        if (data == NO_VALUE) {
            return "true";
        }
        return readUTF(data, 0, data.length);
    }
javax.jmdns.impl.ServiceInfoImpl (ver. 3.4.1)
<SYNCHRONIZED, getProperties(), get(java.lang.Object), END-SYNCHRONIZED>: 2 instances in ver. 3.4.1
Answer the Research Question
• Coding patterns with a short life-span account for a large part of all patterns
• Few coding patterns have a long life-span
RQ: Are the coding patterns generally stable over the version history?
Answer: No, the coding patterns are NOT generally stable.
Conclusion
• Investigation of the stability of coding patterns across versions
  – Method
    • Extract coding patterns from versions of code
    • Compute life-span
  – Target
    • dnsjava (51 versions)
    • JmDNS (20 versions)
• Result
  – Coding patterns are not generally stable
    • Coding patterns may not be suitable for reuse
• Future work
  – Further investigation with more applications