Graph-Based Code Completion

Graph-Based Pattern-Oriented, Context-Sensitive Source Code Completion

Nguyen, T.T. ; Nguyen, H.A. ; Tamrawi, A. ; Nguyen, H.V. ; Al-Kofahi, J. ; Nguyen, T.N.

Presented By: Mohammad Masudur Rahman

2

Contents

Code Completion

Thesis Statement

Motivating Example

Terminologies

Methodology

Empirical Evaluation & Results

My Observation & Future Thoughts

3

Code Completion

Built-in feature of all modern IDEs

Speeds up development

Makes longer identifier names practical, which helps program comprehension

Less overhead for developers

Mostly supports single variables and methods from API packages

Template-based support for control structures, event handling, and others

4

Thesis Statement

A novel graph-based approach to code completion

Graph-based feature extraction, searching, and ranking of API usage patterns, matched against the editing context of the current code

Empirical evaluation shows correctness and usefulness: 95% precision, 92% recall, and 93% F-score over 24 real-world systems

5

Motivating Example (Single-line)

Fig 1: Current State of Code Completion (Eclipse 3.6)

6

Motivating Example (Multi-line)

Fig 2: SWT Usage Example

7

Motivating Example (Query)

Fig 3: SWT Query Example

8

Terminologies

GRAPACC

API Usage Pattern

Groum Based Model

Context-Sensitive Weight

9

GRAPACC

Graph-Based Pattern-Oriented Context-Sensitive Code Completion

10

API Usage Pattern

Fig 4: SWT API Usage

11

Groum Based Model

Fig 5: Groum Conversion

12

Context-Sensitive Weight

$$ w_f(q) = \frac{1}{d + 1} $$

wf(q) = context-sensitive weight of feature q

q = a feature of the query Q

d = distance to the closest token in the Groum model
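As a quick worked illustration, taking the weight as 1/(d + 1): the distances below are made-up values, not figures from the paper. A feature at distance 0 gets the full weight, and the weight shrinks as the feature moves away.

```latex
\begin{align*}
d = 0 &: \quad w_f(q) = \frac{1}{0 + 1} = 1 \\
d = 1 &: \quad w_f(q) = \frac{1}{1 + 1} = 0.5 \\
d = 3 &: \quad w_f(q) = \frac{1}{3 + 1} = 0.25
\end{align*}
```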

13

Methodology

Query Processing and Feature Extraction

Pattern Managing, Searching and Ranking

Pattern Oriented Code Completion

14

Query Processing and Feature Extraction

Tokenizing

Partial Parsing

Groum Building

Feature Extracting and Weighting

15

Tokenizing, Partial Parsing

Lexical analysis

Keywords related to control structures are preserved; the remaining tokens are removed but saved

Eclipse Java parser (PPA tool) returns an AST (Abstract Syntax Tree)

Unresolved nodes are assigned ‘Unknown Type’

16

Groum Building

The Groum is built from the AST

Unresolved nodes are discarded but still considered as tokens

The query is converted to the following Groum

Fig 6: Groum of Query

17

Feature Extraction & Weighting

Groum nodes are mapped to the tokens from the tokenization step

Features are extracted from Groum paths of length L <= 3

Three factors contribute to a feature's weight: a structure-based factor (size), a structure-based factor (centrality), and a user-based factor
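The path-based extraction can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: it assumes the Groum is given as a plain adjacency mapping keyed by node label (a real Groum may contain several nodes with the same label), and the function name `extract_path_features` and the toy labels are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: node label -> labels of successor nodes.
Graph = Dict[str, List[str]]


def extract_path_features(graph: Graph, max_len: int = 3) -> List[Tuple[str, ...]]:
    """Enumerate every directed path that has between 1 and max_len nodes."""
    features: List[Tuple[str, ...]] = []

    def extend(path: Tuple[str, ...]) -> None:
        features.append(path)
        if len(path) == max_len:
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:  # guard against cycles in the input
                extend(path + (nxt,))

    for node in graph:
        extend((node,))
    return features


# Toy graph; the labels are illustrative only.
toy = {
    "Display.<init>": ["Shell.<init>"],
    "Shell.<init>": ["Shell.open"],
    "Shell.open": [],
}
paths = extract_path_features(toy)
for p in paths:
    print(" -> ".join(p))
```

Here a path's length is counted in nodes, which matches the size bound of 1 to 3 used for the size-based weight.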

18


ws(q) = size-based weight for feature q of query Q: ws(q) = 1 + size(q), with 1 <= size(q) <= 3

wc(q) = centrality-based weight for feature q of query Q: wc(q) = n / s, where n = number of neighbors and s = size

wf(q) = user-based weight for feature q of query Q: wf(q) = 1 / (d + 1), where d is the distance between the focus node and the closest token of the feature path in the Groum model
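The three factors can be written down directly as small functions. This is only a sketch of the formulas as stated on the slide; the function names are hypothetical, and the way the three values are combined into a total weight is deliberately left out because the transcript does not state it.

```python
def size_weight(size: int) -> int:
    """ws(q) = 1 + size(q), with size(q) restricted to 1..3."""
    if not 1 <= size <= 3:
        raise ValueError("feature size must be between 1 and 3")
    return 1 + size


def centrality_weight(neighbors: int, size: int) -> float:
    """wc(q) = n / s, with n neighbors and size s."""
    if size <= 0:
        raise ValueError("size must be positive")
    return neighbors / size


def focus_weight(distance: int) -> float:
    """wf(q) = 1 / (d + 1), with d the distance to the focus node."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (distance + 1)


# Made-up example: a feature of size 2 with 4 neighbors, at distance 1.
print(size_weight(2), centrality_weight(4, 2), focus_weight(1))
```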

19


w(q) = total weight for feature q of query Q

ws(q) = size-based weight for feature q of query Q

wc(q) = centrality-based weight for feature q of query Q

wf(q) = user-based weight for feature q of query Q

20

Pattern Managing, Searching and Ranking

Pr(P) = popularity of pattern P, taken as the frequency of P

The weight of feature p in pattern P is computed using inverse indexing

Np,P = number of occurrences of feature p in P; NP = total number of features in P

Np = number of patterns containing p; N = total number of patterns in the database
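The formula itself did not survive in this transcript. The four counts it defines are the usual ingredients of a TF-IDF weighting, so the sketch below assumes a standard TF-IDF form, (Np,P / NP) * log(N / Np). That form is an assumption for illustration and may differ from the one on the original slide; the function name `tfidf_weights` and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List


def tfidf_weights(
    patterns: Dict[str, List[Hashable]]
) -> Dict[str, Dict[Hashable, float]]:
    """Weight each feature p in each pattern P with an assumed TF-IDF form."""
    n_patterns = len(patterns)  # N: total number of patterns
    containing: Counter = Counter()  # Np: number of patterns containing p
    for feats in patterns.values():
        containing.update(set(feats))

    weights: Dict[str, Dict[Hashable, float]] = {}
    for name, feats in patterns.items():
        counts = Counter(feats)  # Np,P: occurrences of p in P
        total = len(feats)  # NP: total number of features in P
        weights[name] = {
            p: (c / total) * math.log(n_patterns / containing[p])
            for p, c in counts.items()
        }
    return weights


# Toy database of two patterns with made-up features.
weights = tfidf_weights({"P1": ["a", "b", "b"], "P2": ["a", "c"]})
print(weights)
```

Under this assumed form, a feature that appears in every pattern (such as "a" above) gets weight 0, so it does not help discriminate between patterns.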

21

Pattern Managing, Searching and Ranking

For each feature p, L(p) is the list of patterns from which p can be extracted

p denotes a pattern feature and q a query feature

If sim(p, q) > δ (a similarity threshold), then p is added to F, the set of mapped features for q

For each p ∈ F, the top n ranked patterns from L(p) are added to C, the set of candidate patterns for relevance computation

For each P in C, compute fit(P, Q)
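The candidate search can be sketched as below. This is a rough illustration under several assumptions: the inverted index maps each pattern feature p to L(p) already sorted best-first, the similarity function is passed in, and the exact-match `toy_sim` merely stands in for the name-based similarity. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def candidate_patterns(
    query_features: Iterable[str],
    index: Dict[str, List[str]],
    sim: Callable[[str, str], float],
    threshold: float,
    top_n: int,
) -> Set[str]:
    """Collect the candidate set C for later relevance computation."""
    candidates: Set[str] = set()
    for q in query_features:
        # F: pattern features whose similarity to q exceeds the threshold.
        mapped = [p for p in index if sim(p, q) > threshold]
        for p in mapped:
            # Take the top n patterns from L(p), assumed pre-ranked.
            candidates.update(index[p][:top_n])
    return candidates


def toy_sim(p: str, q: str) -> float:
    """Placeholder similarity: 1.0 on exact match, else 0.0."""
    return 1.0 if p == q else 0.0


index = {
    "Shell.open": ["P1", "P2", "P3"],
    "Display.<init>": ["P2"],
    "File.read": ["P4"],
}
result = candidate_patterns(["Shell.open", "Display.<init>"], index, toy_sim, 0.5, 2)
print(sorted(result))
```

Each pattern in the resulting set would then be scored with fit(P, Q).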

22

Feature Similarity

Feature similarity is a name-based similarity between two features. A feature is a collection of labels, each of the form X.Y.Z, where X = package name, Y = class name, and Z = method name

23

Name-based Similarity (nsim)

wsim(X, X') is a word-based similarity

X and X' are broken down into two sequences of words, L(X) and L(X')

The similarity is computed as Lo / Lm

Lo is the length of the LCS (longest common subsequence); Lm is the average length of the two sequences
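The word-based similarity can be sketched as follows. The slide only says that names are broken down into words; splitting on camel-case boundaries is an assumption made here for illustration, and the helper names are hypothetical.

```python
import re
from typing import List, Sequence


def split_words(name: str) -> List[str]:
    """Split an identifier into lowercase words (assumed camel-case split)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [w.lower() for w in parts]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def wsim(x: str, y: str) -> float:
    """Lo / Lm: LCS length over the average length of the two word sequences."""
    a, b = split_words(x), split_words(y)
    if not a and not b:
        return 0.0  # choice made here: two empty names count as dissimilar
    return lcs_length(a, b) / ((len(a) + len(b)) / 2)


print(wsim("getFileName", "getFilePath"))
```

In this example both names split into three words and share two of them in order ("get", "file"), so the similarity is 2 / 3.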

24

Pattern Matching (Relevance)

25

Pattern Matching

SM(P, Q) = total weight of the matched feature pairs

Fit(P, Q) = relevance degree between P and Q

Pr(P) = popularity of pattern P

26

Pattern Oriented Code Completion

The matched pattern is selected and its nodes are matched against the corresponding nodes in the Groum

The missing nodes are filled in with code

27

Empirical Evaluation

Metrics: precision, recall, F-score

APIs used as libraries: java.io, java.util

28 real-world open-source systems: 4 for training, 24 for testing
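For reference, the three metrics relate as sketched below. How the paper counts a suggestion as correct is not stated in this transcript, so the set-based `precision_recall` helper is a generic assumption; the F-score formula is the standard harmonic mean, and it reproduces the reported 93% from 95% precision and 92% recall.

```python
from typing import Set, Tuple


def precision_recall(suggested: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Generic set-based precision and recall (assumed counting scheme)."""
    hits = len(suggested & expected)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Check against the figures reported in the thesis statement.
print(round(f_score(0.95, 0.92), 2))
```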

28

Empirical Evaluation

29

My Observation

Planning to use semantic web technology

Data and control dependency relationships can be improved using semantic relationships such as conceptual similarity

Pattern matching is complex and error-prone; a semantic score could be beneficial

30

Thanks

Questions?
