Graph-Based Code Completion

Graph-Based Pattern-Oriented, Context-Sensitive Source Code Completion. Nguyen, T.T.; Nguyen, H.A.; Tamrawi, A.; Nguyen, H.V.; Al-Kofahi, J.; Nguyen, T.N. Presented by: Mohammad Masudur Rahman

Page 1: Graph-Based Code Completion

Graph-Based Pattern-Oriented, Context-Sensitive Source Code Completion

Nguyen, T.T. ;  Nguyen, H.A. ;  Tamrawi, A. ;  Nguyen, H.V. ;  Al-Kofahi, J. ;  Nguyen, T.N.

Presented By: Mohammad Masudur Rahman

Page 2: Graph-Based Code Completion

Contents

Code Completion
Thesis Statement
Motivating Example
Terminologies
Methodology
Empirical Evaluation & Results
My Observation & Future Thoughts

Page 3: Graph-Based Code Completion

Code Completion

Built-in feature of all modern IDEs
Speeds up development
Makes longer identifier names practical, which helps program comprehension
Less overhead for developers
Mostly supports single variables and methods from API packages
Template-based support for control structures, event handling, and others

Page 4: Graph-Based Code Completion

Thesis Statement

Novel approach with graph-based code completion

Graph-based feature extraction, searching and ranking of API usage patterns, and matching of those patterns against the editing context of the current code.

Empirical evaluation shows correctness and usefulness: 95% precision, 92% recall, and 93% F-score over 24 real-world systems.

Page 5: Graph-Based Code Completion

Motivating Example (Single-line)

Fig 1: Current State of Code Completion (Eclipse 3.6)

Page 6: Graph-Based Code Completion

Motivating Example (Multi-line)

Fig 2: SWT Usage Example

Page 7: Graph-Based Code Completion

Motivating Example (Query)

Fig 3: SWT Query Example

Page 8: Graph-Based Code Completion

Terminologies

GRAPACC
API Usage Pattern
Groum-Based Model
Context-Sensitive Weight

Page 9: Graph-Based Code Completion

GRAPACC

Graph-Based Pattern-Oriented Context-Sensitive Code Completion

Page 10: Graph-Based Code Completion

API Usage Pattern

Fig 4: SWT API Usage

Page 11: Graph-Based Code Completion

Groum Based Model

Fig 5: Groum Conversion

Page 12: Graph-Based Code Completion

Context-Sensitive Weight

$$w_f(q) = \frac{1}{d + 1}$$

wf(q) = context-sensitive weight of feature q

q = a feature of the query Q

d = distance to the closest token in the Groum model

Page 13: Graph-Based Code Completion

Methodology

Query Processing and Feature Extraction
Pattern Managing, Searching and Ranking
Pattern-Oriented Code Completion

Page 14: Graph-Based Code Completion

Query Processing and Feature Extraction

Tokenizing
Partial Parsing
Groum Building
Feature Extracting and Weighting

Page 15: Graph-Based Code Completion

Tokenizing, Partial Parsing

Lexical analysis
Keywords related to control structures are preserved; the rest are removed but saved
The Eclipse Java parser with the PPA tool returns an AST (Abstract Syntax Tree)
Unresolved nodes are assigned 'Unknown Type'

Page 16: Graph-Based Code Completion

Groum Building

The Groum is built from the AST
Unresolved nodes are discarded but still considered as tokens
The query is converted to the following Groum

Fig 6: Groum of Query

Page 17: Graph-Based Code Completion

Feature Extraction & Weighting

Groum nodes are mapped to the tokens from the tokenization step
Features are extracted from the Groum as paths of length L <= 3
Three factors contribute to the feature weight:
  Structure-based factor (size)
  Structure-based factor (centrality)
  User-based factor
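
The path-based extraction can be sketched as follows. This is only an illustration under stated assumptions: the Groum is represented as a plain adjacency map from node label to successor labels, the limit L <= 3 is read as "at most three nodes per path", and the function name `extract_path_features` and the SWT-style labels are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed representation: node label -> labels of its successor nodes.
Groum = Dict[str, List[str]]


def extract_path_features(groum: Groum, max_size: int = 3) -> List[Tuple[str, ...]]:
    """Return every directed path with 1..max_size nodes, as tuples of labels."""
    features: List[Tuple[str, ...]] = []

    def extend(path: Tuple[str, ...]) -> None:
        features.append(path)
        if len(path) == max_size:
            return
        for successor in groum.get(path[-1], []):
            if successor not in path:  # keep paths simple (no repeated node)
                extend(path + (successor,))

    for node in groum:
        extend((node,))
    return features


# Hypothetical query Groum: Display.<init> -> Shell.<init> -> Shell.open
query_groum: Groum = {
    "Display.<init>": ["Shell.<init>"],
    "Shell.<init>": ["Shell.open"],
    "Shell.open": [],
}
paths = extract_path_features(query_groum)
```

For this three-node chain the sketch yields three single-node features, two two-node paths, and one three-node path.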

Page 18: Graph-Based Code Completion

Feature Extraction & Weighting

ws(q) = size-based weight for feature q of query Q: ws(q) = 1 + size(q), with 1 <= size(q) <= 3

wc(q) = centrality-based weight for feature q of query Q: wc(q) = n / s, where n = number of neighbors and s = size

wf(q) = user-based weight for feature q of query Q: wf(q) = 1 / (d + 1), where d = distance between the focus node and the closest token of the feature path in the Groum model
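
A minimal sketch of the three factors, written directly from the formulas on this slide. The function names and the sample inputs are illustrative assumptions; how size, neighbor count, and distance are measured on a real Groum is not modeled here.

```python
def size_weight(size: int) -> int:
    """ws(q) = 1 + size(q), with 1 <= size(q) <= 3."""
    if not 1 <= size <= 3:
        raise ValueError("size(q) must be between 1 and 3")
    return 1 + size


def centrality_weight(neighbors: int, size: int) -> float:
    """wc(q) = n / s, where n = number of neighbors and s = size."""
    if size <= 0:
        raise ValueError("size must be positive")
    return neighbors / size


def focus_weight(distance: int) -> float:
    """wf(q) = 1 / (d + 1), where d = distance from the focus node."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (distance + 1)


# Hypothetical feature: 2 nodes, 4 neighbors, 1 step away from the focus node.
ws = size_weight(2)           # 3
wc = centrality_weight(4, 2)  # 2.0
wf = focus_weight(1)          # 0.5
```

A feature sitting right at the focus node (d = 0) gets the maximum user-based weight of 1, and the weight decays as the feature moves away from where the developer is editing.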

Page 19: Graph-Based Code Completion

Feature Extraction & Weighting

w(q) = total weight for feature q of query Q

ws(q) = size-based weight for feature q of query Q

wc(q) = centrality-based weight for feature q of query Q

wf(q) = user-based weight for feature q of query Q

Page 20: Graph-Based Code Completion

Pattern Managing, Searching and Ranking

Pr(P) is the popularity of pattern P, i.e. the frequency of P

The weight of feature p in pattern P is computed using inverse indexing

Np,P = number of occurrences of feature p in P; NP = total number of features in P

Np = number of patterns containing p; N = total number of patterns in the database
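
The four counts above are the usual ingredients of a tf-idf style weight. The sketch below assumes the standard form, (Np,P / NP) × log(N / Np); that exact formula is an assumption made for illustration and is not confirmed by the slide, and the function name and sample counts are hypothetical.

```python
import math


def feature_weight(n_p_in_P: int, n_P: int, n_p: int, n: int) -> float:
    """Assumed tf-idf form: (Np,P / NP) * log(N / Np)."""
    term_frequency = n_p_in_P / n_P            # how prominent p is inside pattern P
    inverse_frequency = math.log(n / n_p)      # how rare p is across all patterns
    return term_frequency * inverse_frequency


# Hypothetical counts: p occurs 2 times among 10 features of P,
# and 5 of the 100 patterns in the database contain p.
weight = feature_weight(2, 10, 5, 100)

# A feature that appears in every pattern carries no discriminating weight.
common = feature_weight(1, 4, 50, 50)
```

Under this assumed form, a feature shared by every pattern in the database contributes nothing to ranking, while a feature that is frequent in P but rare elsewhere dominates.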

Page 21: Graph-Based Code Completion

Pattern Managing, Searching and Ranking

For each feature p, L(p) is the list of patterns from which p can be extracted

p denotes a pattern feature, q a query feature

If sim(p, q) > δ, then p is added to F, the set of mapped features for q

For each p ∈ F, the top n ranked patterns from L(p) are added to C, the set of candidate patterns for relevance computation

For each P in C, compute fit(P, Q)
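
The candidate search above can be sketched with an inverted index. This is a simplified illustration, not the tool's implementation: features are plain strings, each list L(p) is assumed to be already sorted by rank, the index is scanned linearly, and the exact-match `toy_sim` stands in for the real similarity function. All names are hypothetical.

```python
from typing import Callable, Dict, List, Set


def candidate_patterns(
    query_features: List[str],
    index: Dict[str, List[str]],        # L(p): pattern feature -> ranked pattern ids
    sim: Callable[[str, str], float],   # sim(p, q)
    threshold: float,                   # the similarity threshold
    top_n: int,
) -> Set[str]:
    """Collect C, the candidate patterns reachable from features similar to the query."""
    candidates: Set[str] = set()
    for q in query_features:
        # F: pattern features whose similarity to q exceeds the threshold.
        mapped = [p for p in index if sim(p, q) > threshold]
        for p in mapped:
            candidates.update(index[p][:top_n])  # top n ranked patterns from L(p)
    return candidates


def toy_sim(p: str, q: str) -> float:
    """Stand-in similarity: 1.0 for identical labels, 0.0 otherwise."""
    return 1.0 if p == q else 0.0


index = {
    "Shell.open": ["P1", "P2", "P3"],
    "Display.<init>": ["P2", "P4"],
}
found = candidate_patterns(["Shell.open"], index, toy_sim, threshold=0.5, top_n=2)
```

Only the patterns in the returned set need the more expensive fit(P, Q) computation, which is the point of filtering through L(p) first.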

Page 22: Graph-Based Code Completion

Feature Similarity

sim(p, q) is a name-based similarity between two features, given that a feature is a collection of labels of the form X.Y.Z, where X = package name, Y = class name, and Z = method name

Page 23: Graph-Based Code Completion

Name-based Similarity (nsim)

wsim(X, X') is the word-based similarity

X and X' are broken down into two sequences of words, L(X) and L(X')

Similarity is computed as Lo/Lm

Lo is the length of the LCS; Lm is the average length of the two sequences
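
The Lo/Lm ratio can be sketched as follows. Two assumptions are made here and are not confirmed by the slide: names are split into words at camel-case boundaries, and LCS is read as the longest common subsequence over those words. The helper names are hypothetical.

```python
import re
from typing import List


def split_words(name: str) -> List[str]:
    """Split an identifier such as 'getActiveShell' into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def wsim(x: str, y: str) -> float:
    """Lo / Lm: LCS length over the average length of the two word sequences."""
    a, b = split_words(x), split_words(y)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / ((len(a) + len(b)) / 2)


score = wsim("getActiveShell", "getShell")  # words: [get, active, shell] vs [get, shell]
```

Here the common subsequence is (get, shell), so Lo = 2 and Lm = 2.5, giving a similarity of 0.8.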

Page 24: Graph-Based Code Completion

Pattern Matching (Relevance)

Page 25: Graph-Based Code Completion

Pattern Matching

SM(P, Q) = total weight of the matched feature pairs

fit(P, Q) = relevance degree between P and Q

Pr(P) = popularity of pattern P

Page 26: Graph-Based Code Completion

Pattern Oriented Code Completion

A matched pattern is selected and its nodes are matched to the corresponding nodes in the query Groum

The missing nodes are filled in with code

Page 27: Graph-Based Code Completion

Empirical Evaluation

Metrics: precision, recall, F-score
APIs used as the library: java.io, java.util
28 real-world open-source systems: 4 for training, 24 for testing
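
The F-score is the harmonic mean of precision and recall. As a quick consistency check, the sketch below plugs in the 95% precision and 92% recall quoted on the thesis statement slide; the function name is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported = f_score(0.95, 0.92)
rounded = round(reported, 2)
```

The result rounds to 0.93, which agrees with the 93% F-score reported alongside those two figures.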

Page 28: Graph-Based Code Completion

Empirical Evaluation

Page 29: Graph-Based Code Completion

My Observation

Planning to use semantic web technology

Data and control dependency relationships can be improved using semantic relationships such as conceptual similarity

Pattern matching is complex and error-prone; a semantic score can be beneficial

Page 30: Graph-Based Code Completion

Thanks

Questions?