Investigation of Coding Patterns over Version History


Page 1: Investigation of Coding Patterns over Version History

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Investigation of Coding Patterns over Version History

Hironori Date, Takashi Ishio, Katsuro Inoue

Osaka University, Japan

Page 2: Investigation of Coding Patterns over Version History

Coding Patterns

• Frequent sequence of call elements and control elements
  – Call element
    • Method call element
    • Constructor call element
  – Control element
    • IF, END-IF
    • LOOP, END-LOOP
    • etc.
• Implement a particular kind of concern
  – Spread around source code

2012/10/26

JHotDraw Ver. 5.4b1

Page 3: Investigation of Coding Patterns over Version History

Previous Research [1]

• Extracted coding patterns from 5 applications
• Coding pattern types
  – API usage patterns
  – Application-specific patterns

[1] T. Ishio, H. Date, T. Miyake, and K. Inoue, “Mining coding patterns to detect crosscutting concerns in Java programs,” in Proceedings of the 15th Working Conference on Reverse Engineering, 2008, pp. 123–132.

Coding patterns are candidates for reusable code.

Page 4: Investigation of Coding Patterns over Version History

Previous Research [1]

Similar patterns:
<a(), b()>
<a(), b(), c()>
<a(), c(), b()>
<IF, a(), b(), END-IF>

Which patterns are easier to reuse?

Assumption: stable patterns are reusable.

Page 5: Investigation of Coding Patterns over Version History

Research Question

RQ: Are the coding patterns generally stable over the version history?

To answer this question:
1. Extract coding patterns from multiple versions of applications
2. Investigate the life-span of coding patterns

Life-span: the number of versions in which the identical pattern is found

Page 6: Investigation of Coding Patterns over Version History

Outline of Experiment

• Mining coding patterns (using Fung)
  1. Normalization of source code
  2. Sequential pattern mining for each version
• Tracking coding patterns
  – Compute the life-span of each pattern

Figure: source code (.java files) of Ver. 1, Ver. 2, …, Ver. N → Mining Coding Patterns → coding patterns (.xml files) of each version → Tracking Coding Patterns → life-span of each pattern

Pattern   Ver. 1   Ver. 2   …   Ver. N   Life-span
Pat. 1       3        4     …      3         6
Pat. 2       0        0     …      2         4
…            …        …     …      …         …
Pat. M       3        0     …      2         3


Page 9: Investigation of Coding Patterns over Version History

Normalization in Pattern Mining

• Translate each method into a sequence of
  – Call elements
  – Control elements
• Normalize control elements (Table I)

Source file:

public class A {
    void a() {
        int i = x + y;
        callA();
        callB();
        callB();
    }

    void b() {
        if (cond()) {
            callA();
            callB();
        }
    }
}

Normalization produces the sequence database:

A.a(): <callA(), callB(), callB()>
A.b(): <cond(), IF, callA(), callB(), END-IF>

Statements without calls, such as int i = x + y;, contribute no elements.
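The normalization above can be sketched as a recursive walk that emits one element per call and brackets each if body with IF and END-IF. This is a minimal sketch under stated assumptions: the statement model (Stmt, Call, If, Other) is hypothetical and stands in for a real Java parser, the if condition is reduced to a single call, and the other control elements of Table I (LOOP, ELSE, and so on) are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class Normalizer {
    // Hypothetical, minimal statement model standing in for a real Java AST.
    sealed interface Stmt permits Call, If, Other {}

    // A call statement such as callA();
    record Call(String name) implements Stmt {}

    // if (<condition call>) { <body> }; the condition is modeled as one call.
    record If(String conditionCall, List<Stmt> body) implements Stmt {}

    // A statement without any call, such as int i = x + y;
    record Other() implements Stmt {}

    // Flattens a method body into a sequence of call and control elements.
    static List<String> normalize(List<Stmt> body) {
        List<String> seq = new ArrayList<>();
        append(body, seq);
        return seq;
    }

    private static void append(List<Stmt> stmts, List<String> seq) {
        for (Stmt s : stmts) {
            if (s instanceof Call c) {
                seq.add(c.name() + "()");
            } else if (s instanceof If i) {
                // The condition call is evaluated before the branch is entered.
                seq.add(i.conditionCall() + "()");
                seq.add("IF");
                append(i.body(), seq);
                seq.add("END-IF");
            }
            // Other statements contribute no elements.
        }
    }

    public static void main(String[] args) {
        // A.a(): int i = x + y; callA(); callB(); callB();
        List<Stmt> a = List.of(new Other(), new Call("callA"), new Call("callB"), new Call("callB"));
        // A.b(): if (cond()) { callA(); callB(); }
        List<Stmt> b = List.of(new If("cond", List.of(new Call("callA"), new Call("callB"))));
        System.out.println("A.a(): " + normalize(a)); // A.a(): [callA(), callB(), callB()]
        System.out.println("A.b(): " + normalize(b)); // A.b(): [cond(), IF, callA(), callB(), END-IF]
    }
}
```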

Page 10: Investigation of Coding Patterns over Version History

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

public class A { void a() { int i = x + y; callA(); callB(); callB(); }

void b() { if (cond()) { callA(); callB(); } }}

Source File

Norm

alization

Coding Pattern

Sequential Pattern Mining

Sequence Database

<callA(), callB(), callB()>

<cond(), IF, callA(), callB(), END-IF>

2012/10/2610

A.a()

A.b()

Sequential Pattern Mining

• Minimum Length: 2 threshold of #pattern element • Minimum Support: 2 threshold of #pattern instance

class A { void a() { … }}

class A { void b() { … }}

<callA(), callB()>

Parameters
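A brute-force version of the mining step, just enough to reproduce the toy result above, is sketched below. It is not Fung's algorithm: it enumerates every subsequence (order kept, gaps allowed) of each sequence, which is exponential and only usable on tiny inputs. It also assumes that support is the number of methods containing the pattern, so repeated occurrences inside one method count once. NaiveMiner, subsequences, and mine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NaiveMiner {
    // Every distinct subsequence of seq with at least minLength elements.
    // Exponential in seq.size(): for illustration on tiny examples only.
    static Set<List<String>> subsequences(List<String> seq, int minLength) {
        Set<List<String>> out = new HashSet<>();
        int n = seq.size();
        for (long mask = 1; mask < (1L << n); mask++) {
            if (Long.bitCount(mask) < minLength) continue;
            List<String> sub = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << i)) != 0) sub.add(seq.get(i));
            }
            out.add(sub);
        }
        return out;
    }

    // Support = number of sequences (methods) that contain the pattern.
    static Map<List<String>, Integer> mine(List<List<String>> db, int minLength, int minSupport) {
        Map<List<String>, Integer> support = new LinkedHashMap<>();
        for (List<String> seq : db) {
            // subsequences() returns a set, so each method counts once per pattern.
            for (List<String> sub : subsequences(seq, minLength)) {
                support.merge(sub, 1, Integer::sum);
            }
        }
        support.values().removeIf(count -> count < minSupport);
        return support;
    }

    public static void main(String[] args) {
        List<List<String>> db = List.of(
            List.of("callA()", "callB()", "callB()"),                 // A.a()
            List.of("cond()", "IF", "callA()", "callB()", "END-IF")); // A.b()
        System.out.println(mine(db, 2, 2)); // {[callA(), callB()]=2}
    }
}
```

With minimum length 2 and minimum support 2, only <callA(), callB()> survives: <callB(), callB()> and <callA(), callB(), callB()> occur in A.a() alone.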

Page 11: Investigation of Coding Patterns over Version History

Identical Patterns Between Versions

• Exact match of the pattern sequence
• The number of instances is ignored

Ver. X: <a(), b(), c()>, instances in A.a() and B.b()
Ver. Y: <a(), b(), c(), d()>, instances in A.a() and B.b()
Ver. Y: <a(), b(), c()>, instances in A.a(), B.b(), and C.c()

Page 12: Investigation of Coding Patterns over Version History

Identical Patterns Between Versions

<a(), b(), c()> in Ver. X vs. <a(), b(), c(), d()> in Ver. Y: NOT identical (the sequences differ)

Page 13: Investigation of Coding Patterns over Version History

Identical Patterns Between Versions

<a(), b(), c()> in Ver. X (2 instances) vs. <a(), b(), c()> in Ver. Y (3 instances): Identical (same sequence; the number of instances is ignored)
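Because matching is an exact comparison of element sequences, Java's List.equals (element by element, order-sensitive) is sufficient, and keeping instance counts as map values rather than as part of the key means the number of instances never affects the match. A minimal sketch with hypothetical names (IdenticalPatterns, identical, instancesIn):

```java
import java.util.List;
import java.util.Map;

final class IdenticalPatterns {
    // Patterns from two versions are identical only if their element
    // sequences match exactly: same elements, same order, same length.
    static boolean identical(List<String> p, List<String> q) {
        return p.equals(q);
    }

    // Instance count of a pattern in a version (0 if absent); the count
    // plays no part in deciding whether two patterns are identical.
    static int instancesIn(Map<List<String>, Integer> version, List<String> pattern) {
        return version.getOrDefault(pattern, 0);
    }

    public static void main(String[] args) {
        List<String> abc = List.of("a()", "b()", "c()");
        // Hypothetical Ver. Y: pattern sequence -> number of instances
        Map<List<String>, Integer> verY = Map.of(
            List.of("a()", "b()", "c()", "d()"), 2,
            List.of("a()", "b()", "c()"), 3);
        System.out.println(identical(abc, List.of("a()", "b()", "c()", "d()"))); // false
        System.out.println(identical(abc, List.of("a()", "b()", "c()")));        // true
        System.out.println(instancesIn(verY, abc));                             // 3
    }
}
```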

Page 14: Investigation of Coding Patterns over Version History

Tracking Coding Patterns

1. List all coding patterns from all versions
2. Look up the number of pattern instances in each version
3. Compute the life-span

Figure: coding patterns (.xml files) of Ver. 1, Ver. 2, and Ver. 3 → pattern × version table

Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                              2        3        3         3
<IF, b(), c(), END-IF>                  0        0        2         1
<a(), IF, d(), ELSE, c(), END-IF>       4        2        3         3
<d(), e(), f()>                         2        0        2         2

The life-span counts the versions in which a pattern has at least one instance; those versions need not be consecutive (<d(), e(), f()> is missing from Ver. 2 but still has a life-span of 2).
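The three tracking steps can be sketched as follows, assuming each version's mining result is available as a map from pattern sequence to instance count, so that a pattern not mined in a version counts as not found there. LifeSpanTracker and track are hypothetical names; the instance counts are those of the example table.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LifeSpanTracker {
    // versions.get(i) maps each pattern mined from version i to its #instance.
    // Returns pattern -> life-span (number of versions with at least one instance).
    static Map<List<String>, Integer> track(List<Map<List<String>, Integer>> versions) {
        // Step 1: list all coding patterns from all versions, without duplicates.
        Set<List<String>> all = new LinkedHashSet<>();
        for (Map<List<String>, Integer> v : versions) {
            all.addAll(v.keySet());
        }
        Map<List<String>, Integer> lifeSpan = new LinkedHashMap<>();
        for (List<String> pattern : all) {
            int span = 0;
            for (Map<List<String>, Integer> v : versions) {
                // Step 2: look up #instance (0 if the pattern is not found).
                if (v.getOrDefault(pattern, 0) > 0) {
                    span++; // Step 3: count versions, consecutive or not.
                }
            }
            lifeSpan.put(pattern, span);
        }
        return lifeSpan;
    }

    public static void main(String[] args) {
        List<String> ab = List.of("a()", "b()");
        List<String> ifBc = List.of("IF", "b()", "c()", "END-IF");
        List<String> aIfElse = List.of("a()", "IF", "d()", "ELSE", "c()", "END-IF");
        List<String> def = List.of("d()", "e()", "f()");
        List<Map<List<String>, Integer>> versions = List.of(
            Map.of(ab, 2, aIfElse, 4, def, 2),           // Ver. 1
            Map.of(ab, 3, aIfElse, 2),                   // Ver. 2
            Map.of(ab, 3, ifBc, 2, aIfElse, 3, def, 2)); // Ver. 3
        Map<List<String>, Integer> spans = track(versions);
        System.out.println(spans.get(ab) + " " + spans.get(ifBc) + " "
            + spans.get(aIfElse) + " " + spans.get(def)); // 3 1 3 2
    }
}
```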


Page 17: Investigation of Coding Patterns over Version History

Tracking Coding Patterns

Looking up <a(), b()> in the coding patterns of each version:
• Ver. 1: 2 instances (A.a(), B.b())
• Ver. 2: 3 instances (A.a(), C.c(), B.b())
• Ver. 3: 3 instances (A.a(), C.c(), B.b())

Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                              2        3        3         3

Page 18: Investigation of Coding Patterns over Version History

Tracking Coding Patterns

Looking up <IF, b(), c(), END-IF> in the coding patterns of each version:
• Ver. 1: not found
• Ver. 2: not found
• Ver. 3: 2 instances (A.a(), B.b())

Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                              2        3        3         3
<IF, b(), c(), END-IF>                  0        0        2         1

Page 19: Investigation of Coding Patterns over Version History

Tracking Coding Patterns

Looking up <a(), IF, d(), ELSE, c(), END-IF> in the coding patterns of each version:
• Ver. 1: 4 instances
• Ver. 2: 2 instances
• Ver. 3: 3 instances

Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                              2        3        3         3
<IF, b(), c(), END-IF>                  0        0        2         1
<a(), IF, d(), ELSE, c(), END-IF>       4        2        3         3

Page 20: Investigation of Coding Patterns over Version History

Tracking Coding Patterns

Looking up <d(), e(), f()> in the coding patterns of each version:
• Ver. 1: 2 instances (A.a(), B.b())
• Ver. 2: not found
• Ver. 3: 2 instances (A.a(), B.b())

Pattern                              Ver. 1   Ver. 2   Ver. 3   Life-span
<a(), b()>                              2        3        3         3
<IF, b(), c(), END-IF>                  0        0        2         1
<a(), IF, d(), ELSE, c(), END-IF>       4        2        3         3
<d(), e(), f()>                         2        0        2         2

Page 21: Investigation of Coding Patterns over Version History

Experiments

• Target applications (source archives of release versions downloaded from the project web sites)
  – dnsjava: versions 0.1 to 2.0.1 (51 versions)
  – JmDNS: versions 0.2 to 3.4.1 (20 versions)
• Pattern mining parameters
  – Minimum length: 2 (threshold of the number of elements in a pattern sequence)
  – Minimum support: 2 (threshold of the number of pattern instances)

Page 22: Investigation of Coding Patterns over Version History

Result of Experiment

• LOC and the number of patterns (Figure 2 and Figure 3)
• Distribution of life-span (Figure 4 and Figure 5)
• Distribution of life-span and pattern length (Figure 6 and Figure 7)
• Sample code of the patterns with the longest life-span, picked from Table III and Table IV

Page 23: Investigation of Coding Patterns over Version History

LOC and the Number of Patterns in dnsjava (Figure 2)

• 51 versions
• 5,084 LOC to 33,330 LOC
• 512 to 4,405 patterns in a single version
• 17,284 distinct patterns in total (no duplication)
• Correlation coefficient between LOC and #Pattern: 0.912

Figure 2: LOC and #Pattern of each dnsjava version, 0.1 to 2.0.1 (x-axis: Version; y-axes: LOC, 0 to 35,000, and #Pattern, 0 to 5,000)
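The slides do not say which correlation coefficient was used; assuming it is Pearson's r, r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²), with x = LOC and y = #Pattern per version, it can be computed as sketched below. The sample values in main are made up for illustration and are not the study's data.

```java
final class Correlation {
    // Pearson's correlation coefficient of two equally long samples.
    // Undefined (NaN) if either sample is constant.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Made-up LOC and #Pattern values for four versions (NOT the study's data).
        double[] loc = {5000, 10000, 20000, 30000};
        double[] patterns = {500, 1200, 2500, 4000};
        System.out.printf("r = %.3f%n", pearson(loc, patterns));
    }
}
```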

Page 24: Investigation of Coding Patterns over Version History

LOC and the Number of Patterns in JmDNS (Figure 3)

• 20 versions
• 3,408 LOC to 17,252 LOC
• 237 to 2,419 patterns in a single version
• 8,625 distinct patterns in total (no duplication)
• Correlation coefficient between LOC and #Pattern: 0.721

Figure 3: LOC and #Pattern of each JmDNS version, 0.2 to 3.4.1 (x-axis: Version; y-axes: LOC, 0 to 18,000, and #Pattern, 0 to 2,500)

Page 25: Investigation of Coding Patterns over Version History

Life-span of Patterns in dnsjava (Figure 4)

Figure 4: histogram of pattern life-span in dnsjava (x-axis: life-span, 1 to 51; y-axis: frequency; legend: Stable Pattern, Unstable Pattern)

• Total: 17,284 patterns in 51 versions
• Median life-span: 3
• 14 patterns appear in all versions (Table III)

Page 26: Investigation of Coding Patterns over Version History

Life-span of Patterns in JmDNS (Figure 5)

Figure 5: histogram of pattern life-span in JmDNS (x-axis: life-span, 1 to 20; y-axis: frequency; legend: Stable Pattern, Unstable Pattern)

• Total: 8,625 patterns in 20 versions
• Median life-span: 2
• 21 patterns appear in all versions (Table IV)

Page 27: Investigation of Coding Patterns over Version History

Life-span of Patterns

• dnsjava (51 versions)
  – Half of the coding patterns disappeared within 3 versions (median life-span: 3)
• JmDNS (20 versions)
  – Half of the coding patterns disappeared within 2 versions (median life-span: 2)

The life-span of coding patterns tends to be short.
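A median such as the ones above can be read directly off a life-span histogram: it is the smallest life-span at which the cumulative count reaches half of all patterns. The sketch below assumes the lower-median convention (the slides do not state one), and the histogram in main is hypothetical, not Figure 4 or Figure 5.

```java
final class HistogramMedian {
    // freq[k] = number of patterns whose life-span is k + 1.
    // Returns the lower median: the smallest life-span L such that
    // at least half of all patterns have a life-span <= L.
    static int medianLifeSpan(long[] freq) {
        long total = 0;
        for (long f : freq) total += f;
        if (total == 0) throw new IllegalArgumentException("empty histogram");
        long cumulative = 0;
        for (int k = 0; k < freq.length; k++) {
            cumulative += freq[k];
            if (2 * cumulative >= total) return k + 1;
        }
        throw new AssertionError("unreachable: cumulative reaches total");
    }

    public static void main(String[] args) {
        // Hypothetical histogram over 5 versions (not the study's data):
        // 40 patterns live 1 version, 30 live 2, ..., 5 appear in all 5.
        long[] freq = {40, 30, 15, 10, 5};
        System.out.println("median life-span = " + medianLifeSpan(freq)); // median life-span = 2
    }
}
```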

Page 28: Investigation of Coding Patterns over Version History

Life-span and Pattern Length in dnsjava (Figure 6)

Figure 6: life-span and pattern length of the patterns in dnsjava; regions without any pattern are marked "No Patterns"

• Many coding patterns with a short life-span include a small number of elements
• Coding patterns with a long life-span have a short pattern length
• Coding patterns that include a large number of elements survive only a short period

Page 29: Investigation of Coding Patterns over Version History

Life-span and Pattern Length in JmDNS (Figure 7)

Figure 7: life-span and pattern length of the patterns in JmDNS; regions without any pattern are marked "No Patterns"

• Coding patterns with a long life-span have a short pattern length
• Many patterns with a short life-span include a small number of elements
• Coding patterns that include a large number of elements survive only a short period
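The slides do not describe how Figures 6 and 7 were built. One plausible way, sketched here as an assumption, is to count tracked patterns per (pattern length, life-span) cell, so that cells with no entry are the "No Patterns" regions. LengthVsLifeSpan, Tracked, and crossTab are hypothetical names; main reuses the four example patterns and life-spans from the tracking table.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LengthVsLifeSpan {
    // A tracked pattern: its element sequence and its computed life-span.
    record Tracked(List<String> elements, int lifeSpan) {}

    // Outer key: pattern length (#elements); inner map: life-span -> #patterns.
    static Map<Integer, Map<Integer, Integer>> crossTab(List<Tracked> patterns) {
        Map<Integer, Map<Integer, Integer>> table = new TreeMap<>();
        for (Tracked t : patterns) {
            table.computeIfAbsent(t.elements().size(), k -> new TreeMap<>())
                 .merge(t.lifeSpan(), 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Tracked> patterns = List.of(
            new Tracked(List.of("a()", "b()"), 3),
            new Tracked(List.of("IF", "b()", "c()", "END-IF"), 1),
            new Tracked(List.of("a()", "IF", "d()", "ELSE", "c()", "END-IF"), 3),
            new Tracked(List.of("d()", "e()", "f()"), 2));
        System.out.println(crossTab(patterns)); // {2={3=1}, 3={2=1}, 4={1=1}, 6={3=1}}
    }
}
```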

Page 30: Investigation of Coding Patterns over Version History


Stable Patterns in dnsjava


Page 31: Investigation of Coding Patterns over Version History

Stable Pattern in dnsjava: Application-specific pattern

public SetResponse addMessage(Message in) {
    boolean isAuth = in.getHeader().getFlag(Flags.AA);
    Record question = in.getQuestion();
    Name qname;
    Name curname;
    int qtype;
    int qclass;
    int cred;
    int rcode = in.getHeader().getRcode();
    boolean haveAnswer = false;
    ...
}

org.xbill.DNS.Cache (ver. 2.0.1)

Pattern <getHeader(), getRcode()>: 5 instances in ver. 2.0.1

Page 32: Investigation of Coding Patterns over Version History

Stable Pattern in dnsjava: Object generation pattern

private void findResolvConf(String file) {
    InputStream in = null;
    try {
        in = new FileInputStream(file);
    }
    catch (FileNotFoundException e) {
        return;
    }
    InputStreamReader isr = new InputStreamReader(in);
    BufferedReader br = new BufferedReader(isr);
    ...
}

org.xbill.DNS.spi.ResolverConfig (ver. 2.0.1)

Pattern <java.io.InputStreamReader.<init>(java.io.InputStream), java.io.BufferedReader.<init>(java.io.Reader)>: 5 instances in ver. 2.0.1
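Constructor call elements are written in the JVM's internal style, where a constructor is named <init>. The sketch below shows how such an element name could be produced with Java reflection. It is an assumption about the naming scheme inferred from this single example, not the implementation used in the study: the ", " separator between multiple parameter types is a guess, and array parameter types would print in their JVM-internal form (e.g. [Ljava.lang.String;). ConstructorElementName and elementName are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.stream.Collectors;

final class ConstructorElementName {
    // Formats a constructor as "<declaring class>.<init>(<parameter types>)".
    static String elementName(Constructor<?> ctor) {
        String params = Arrays.stream(ctor.getParameterTypes())
            .map(Class::getName)
            .collect(Collectors.joining(", ")); // separator is an assumption
        return ctor.getDeclaringClass().getName() + ".<init>(" + params + ")";
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // java.io.InputStreamReader.<init>(java.io.InputStream)
        System.out.println(elementName(InputStreamReader.class.getConstructor(InputStream.class)));
        // java.io.BufferedReader.<init>(java.io.Reader)
        System.out.println(elementName(BufferedReader.class.getConstructor(Reader.class)));
    }
}
```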

Page 33: Investigation of Coding Patterns over Version History

Stable Pattern in dnsjava: Iteration-related idiom

protected DNSJavaNameService() {
    ...
    if (nameServers != null) {
        StringTokenizer st = new StringTokenizer(nameServers, ",");
        String [] servers = new String[st.countTokens()];
        int n = 0;
        while (st.hasMoreTokens())
            servers[n++] = st.nextToken();
        try {
            Resolver res = new ExtendedResolver(servers);
            Lookup.setDefaultResolver(res);
        }
        catch (UnknownHostException e) {
            ...
        }
    }
    ...
}

org.xbill.DNS.spi.DNSJavaNameService (ver. 2.0.1)

Pattern <hasMoreTokens(), LOOP, nextToken(), hasMoreTokens(), END-LOOP>: 6 instances in ver. 2.0.1
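The iteration idiom can be run in isolation with the standard java.util.StringTokenizer API. Judging from the mined sequence, the loop condition hasMoreTokens() seems to be recorded once before LOOP and again just before END-LOOP, reflecting its re-evaluation after each iteration; that reading is an inference from this one pattern. TokenizerIdiom and split are hypothetical names, and the input string is made up.

```java
import java.util.StringTokenizer;

final class TokenizerIdiom {
    // The hasMoreTokens()/nextToken() loop from DNSJavaNameService, isolated.
    static String[] split(String list) {
        StringTokenizer st = new StringTokenizer(list, ",");
        String[] items = new String[st.countTokens()];
        int n = 0;
        // Apparently normalized as: hasMoreTokens(), LOOP, nextToken(),
        // hasMoreTokens(), END-LOOP (condition checked before each iteration).
        while (st.hasMoreTokens()) {
            items[n++] = st.nextToken();
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical comma-separated name-server list.
        System.out.println(String.join(" | ", split("192.0.2.1,192.0.2.2"))); // 192.0.2.1 | 192.0.2.2
    }
}
```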

Page 34: Investigation of Coding Patterns over Version History


Stable Patterns in JmDNS


Page 35: Investigation of Coding Patterns over Version History

Stable Pattern in JmDNS: Multi-thread idiom with the synchronized keyword

public synchronized String getPropertyString(String name) {
    byte data[] = this.getProperties().get(name);
    if (data == null) {
        return null;
    }
    if (data == NO_VALUE) {
        return "true";
    }
    return readUTF(data, 0, data.length);
}

javax.jmdns.impl.ServiceInfoImpl (ver. 3.4.1)

Pattern <SYNCHRONIZED, getProperties(), get(java.lang.Object), END-SYNCHRONIZED>: 2 instances in ver. 3.4.1

Page 36: Investigation of Coding Patterns over Version History

Answer the Research Question

RQ: Are the coding patterns generally stable over the version history?

• Coding patterns with a short life-span account for a large part of all patterns
• Few coding patterns have a long life-span

Answer: No, the coding patterns are NOT generally stable.

Page 37: Investigation of Coding Patterns over Version History

Conclusion

• Investigation of the stability of coding patterns across versions
  – Method
    • Extract coding patterns from versions of code
    • Compute life-span
  – Target
    • dnsjava (51 versions)
    • JmDNS (20 versions)
• Result
  – Coding patterns are not generally stable
    • Coding patterns may not be suitable for reuse
• Future work
  – Further investigation with more applications