Graph-Based Code Completion
Graph-Based Pattern-Oriented, Context-Sensitive Source Code Completion
Nguyen, T.T. ; Nguyen, H.A. ; Tamrawi, A. ; Nguyen, H.V. ; Al-Kofahi, J. ; Nguyen, T.N.
Presented By: Mohammad Masudur Rahman
2
Contents
Code Completion
Thesis Statement
Motivating Example
Terminologies
Methodology
Empirical Evaluation & Results
My Observation & Future Thoughts
3
Code Completion
A built-in feature of all modern IDEs
Speeds up development
Allows longer identifier names, which aid program comprehension
Less overhead for developers
Mostly supports single variables and methods from API packages
Template-based support: control structures, event handling and others
4
Thesis Statement
A novel, graph-based approach to code completion
Graph-based feature extraction, searching and ranking of API usage patterns, and matching them against the editing context of the current code
Empirical evaluation shows correctness and usefulness: 95% precision, 92% recall and 93% F-score over 24 real-world systems
5
Motivating Example (Single-line)
Fig 1: Current State of Code Completion (Eclipse 3.6)
6
Motivating Example (Multi-line)
Fig 2: SWT Usage Example
7
Motivating Example (Query)
Fig 3: SWT Query Example
8
Terminologies
GRAPACC
API Usage Pattern
Groum-Based Model
Context-Sensitive Weight
9
GRAPACC
Graph-Based Pattern-Oriented Context-Sensitive Code Completion
10
API Usage Pattern
Fig 4: SWT API Usage
11
Groum-Based Model
Fig 5: Groum Conversion
12
Context-Sensitive Weight
$$w_f(q) = \frac{1}{d + 1} \qquad (1)$$
w_f(q) = context-sensitive weight of feature q
q = a feature of query Q
d = distance to the closest token in the Groum model
13
Methodology
Query Processing and Feature Extraction
Pattern Managing, Searching and Ranking
Pattern-Oriented Code Completion
14
Query Processing and Feature Extraction
Tokenizing
Partial Parsing
Groum Building
Feature Extracting and Weighting
15
Tokenizing, Partial Parsing
Tokenizing: lexical analysis preserves the keywords related to control structures; the rest are removed but saved.
Partial parsing: the Eclipse Java parser and the PPA (Partial Program Analysis) tool return an AST (Abstract Syntax Tree); unresolved nodes are assigned 'Unknown Type'.
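The tokenizing step can be illustrated with a small sketch. This is not the Eclipse lexer; it is a regex-based stand-in, and the keyword sets below are a hypothetical, incomplete subset of Java keywords chosen for illustration. Control-structure keywords and identifiers stay in the token stream, while other keywords are removed from it but saved in a separate list.

```python
import re

# Illustrative subset of Java keywords (assumption: not the full keyword list).
JAVA_KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case", "try",
                 "catch", "finally", "new", "return", "final", "int", "void",
                 "boolean", "public", "private", "static", "this", "null"}
# Keywords related to control structures are preserved in the token stream.
CONTROL_KEYWORDS = {"if", "else", "for", "while", "do", "switch", "case",
                    "try", "catch", "finally"}

# An identifier/keyword, or any single non-space character (operators, braces).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")


def tokenize(code):
    """Return (kept_tokens, saved_keywords) for a partial Java snippet."""
    kept, saved = [], []
    for tok in TOKEN_RE.findall(code):
        if tok in JAVA_KEYWORDS and tok not in CONTROL_KEYWORDS:
            saved.append(tok)  # removed from the stream, but saved
        else:
            kept.append(tok)   # identifiers, control keywords, symbols
    return kept, saved


kept, saved = tokenize("while (true) { Button b = new Button(shell, SWT.PUSH);")
```

Here `while` survives as a control-structure token, while `new` is moved to the saved list.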
16
Groum Building
A Groum is built from the AST. Unresolved nodes are discarded but still considered as tokens.
The query is converted into the following Groum:
Fig 6: Groum of Query
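A minimal data structure for a Groum might look like the sketch below. It assumes labeled nodes of two kinds (action and control), a single set of directed dependency edges, and a list that keeps the labels of discarded, unresolved nodes as tokens; the class, method and label names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Groum:
    """Minimal Groum sketch: labeled nodes plus directed dependency edges."""
    labels: dict = field(default_factory=dict)  # node id -> label, e.g. "Button.setText"
    kinds: dict = field(default_factory=dict)   # node id -> "action" or "control"
    edges: set = field(default_factory=set)     # (src, dst) dependency edges
    tokens: list = field(default_factory=list)  # labels of discarded, unresolved nodes

    def add_node(self, nid, label, kind="action", resolved=True):
        if not resolved:
            # Unresolved nodes are left out of the graph but kept as tokens.
            self.tokens.append(label)
            return
        self.labels[nid] = label
        self.kinds[nid] = kind

    def add_edge(self, src, dst):
        # Only connect nodes that actually made it into the graph.
        if src in self.labels and dst in self.labels:
            self.edges.add((src, dst))


g = Groum()
g.add_node(1, "Button.<init>")
g.add_node(2, "Button.setText")
g.add_node(3, "foo", resolved=False)  # e.g. a type the parser could not resolve
g.add_edge(1, 2)
g.add_edge(2, 3)                      # dropped: node 3 was discarded
```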
17
Feature Extraction & Weighting
Groum nodes are mapped to the tokens from the tokenization step.
Features are extracted from the Groum as paths of length L ≤ 3.
Three factors contribute to a feature's weight:
Structure-based factor (size)
Structure-based factor (centrality)
User-based factor
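The slides do not show the extraction procedure itself. As a sketch, assuming a feature is a simple directed path of one to three nodes identified by its sequence of node labels, features could be enumerated as follows; the function name and example labels are hypothetical.

```python
def path_features(labels, edges, max_len=3):
    """Enumerate label paths of 1..max_len nodes in a directed graph.

    labels: dict mapping node id -> label
    edges:  iterable of (src, dst) node-id pairs
    """
    succ = {n: [] for n in labels}
    for src, dst in edges:
        succ[src].append(dst)

    features = []

    def extend(path):
        # Record the current path as a feature, then try to grow it.
        features.append(tuple(labels[n] for n in path))
        if len(path) < max_len:
            for nxt in succ[path[-1]]:
                if nxt not in path:  # simple paths only
                    extend(path + [nxt])

    for n in labels:
        extend([n])
    return features


labels = {1: "Button.<init>", 2: "Button.setText", 3: "Button.addListener"}
feats = path_features(labels, {(1, 2), (2, 3)})
```

A chain of three nodes yields six features: three single nodes, two 2-node paths and one 3-node path.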
18
Feature Extraction & Weighting
ws(q) = size-based weight of feature q of query Q: ws(q) = 1 + size(q), where 1 ≤ size(q) ≤ 3
wc(q) = centrality-based weight of feature q of query Q: wc(q) = n / s, where n = number of neighbors and s = size
wf(q) = context-sensitive weight of feature q: wf(q) = 1/(d+1), where d is the distance between the focus node and the closest token of the feature path in the Groum model
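The three weighting factors above translate directly into code. This sketch only evaluates each factor; how they are combined into the total weight w(q) is not given in the transcript, so no combination is assumed here. The function names and example values are hypothetical.

```python
def size_weight(size):
    """ws(q) = 1 + size(q), for a feature of 1 to 3 nodes."""
    if not 1 <= size <= 3:
        raise ValueError("feature size must be between 1 and 3")
    return 1 + size


def centrality_weight(n_neighbors, size):
    """wc(q) = n / s: number of neighbors of the feature divided by its size."""
    return n_neighbors / size


def context_weight(d):
    """wf(q) = 1 / (d + 1): d is the distance from the focus node."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1 / (d + 1)


# A hypothetical 2-node feature with 4 neighbors, 1 hop from the focus node.
ws, wc, wf = size_weight(2), centrality_weight(4, 2), context_weight(1)
```

Features closer to the editing point get a larger wf: 1.0 at distance 0, 0.5 at distance 1, and so on.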
19
Feature Extraction & Weighting
w(q) = total weight of feature q of query Q, computed from:
ws(q) = size-based weight of feature q
wc(q) = centrality-based weight of feature q
wf(q) = user-based weight of feature q
20
Pattern Managing, Searching and Ranking
Pr(P), the popularity of pattern P, is the frequency of P
The weight of feature p in pattern P is computed using inverse indexing, where:
N_{p,P} = number of occurrences of feature p in P; N_P = total number of features in P
N_p = number of patterns containing p; N = total number of patterns in the database
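The formula itself did not survive in this transcript. The quantities listed match a standard tf-idf scheme, so the sketch below assumes w(p, P) = (N_{p,P} / N_P) · log(N / N_p); treat that form as an assumption rather than the paper's exact definition. Patterns are modeled as lists of feature labels, and the labels are hypothetical.

```python
import math
from collections import Counter


def feature_weight(p, pattern, patterns):
    """Assumed tf-idf form: (N_{p,P} / N_P) * log(N / N_p)."""
    if p not in pattern:
        return 0.0
    n_pP = Counter(pattern)[p]                  # occurrences of p in P
    n_P = len(pattern)                          # total number of features in P
    n_p = sum(1 for P in patterns if p in P)    # patterns containing p
    N = len(patterns)                           # total number of patterns
    return (n_pP / n_P) * math.log(N / n_p)


# Hypothetical pattern database: each pattern is a list of feature labels.
patterns = [["a", "b", "b"], ["a", "c"], ["c", "d"]]
w_b = feature_weight("b", patterns[0], patterns)
w_a = feature_weight("a", patterns[0], patterns)
```

Under this assumption, "b" outweighs "a" in the first pattern: it occurs twice there and appears in no other pattern.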
21
Pattern Managing, Searching and Ranking
For each feature p, L(p) is the list of patterns from which p can be extracted.
Here p denotes a pattern feature and q a query feature. If sim(p, q) > δ, then p is added to F, the set of features mapped to q.
For each p ∈ F, the top n ranked patterns from L(p) are added to C, the set of candidate patterns for relevance computation.
Finally, fit(P, Q) is computed for each P in C.
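The candidate-collection step can be sketched with a plain dictionary as the inverted index. This assumes each L(p) is already sorted from highest to lowest rank and that sim is supplied by the caller; the exact-match similarity, pattern ids and parameter values below are hypothetical placeholders.

```python
def candidate_patterns(query_features, index, sim, delta, top_n):
    """Collect the candidate set C for a query.

    index: dict mapping each pattern feature p to L(p), its list of
           pattern ids, assumed sorted from highest to lowest rank.
    sim:   feature similarity function sim(p, q).
    """
    candidates = set()
    for q in query_features:
        mapped = [p for p in index if sim(p, q) > delta]  # the set F for q
        for p in mapped:
            candidates.update(index[p][:top_n])           # top-n of L(p)
    return candidates


def exact_sim(p, q):
    # Placeholder similarity: 1.0 for identical labels, else 0.0.
    return 1.0 if p == q else 0.0


index = {
    "Button.setText": ["P1", "P2", "P3"],
    "Shell.open": ["P4"],
    "List.add": ["P5"],
}
C = candidate_patterns(["Button.setText", "Shell.open"], index,
                       exact_sim, delta=0.5, top_n=2)
```

fit(P, Q) would then be computed only for the patterns in C, which keeps relevance computation away from the full database.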
22
Feature Similarity
Feature similarity is a name-based similarity between two features. Each feature is a collection of labels of the form X.Y.Z, where X = package name, Y = class name and Z = method name.
23
Name-based Similarity (nsim)
wsim(X, X') is a word-based similarity: X and X' are broken down into two sequences of words, L(X) and L(X').
The similarity is computed as Lo/Lm,
where Lo is the length of the longest common subsequence (LCS) of the two sequences and Lm is their average length.
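A sketch of wsim under one assumption not stated on the slide: names are broken into words at camelCase and acronym boundaries and compared case-insensitively. The LCS is the textbook dynamic-programming version over word sequences.

```python
import re


def split_words(name):
    """Break an identifier such as 'addSelectionListener' into lowercase words.

    Assumption: words are split at camelCase and acronym boundaries.
    """
    return [w.lower()
            for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two word sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def wsim(x, x2):
    """Word-based similarity Lo / Lm (LCS length over average length)."""
    a, b = split_words(x), split_words(x2)
    lm = (len(a) + len(b)) / 2
    if lm == 0:
        return 0.0
    return lcs_length(a, b) / lm
```

For example, `addSelectionListener` and `addListener` share the subsequence [add, listener], so Lo = 2, Lm = 2.5 and wsim = 0.8.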
24
Pattern Matching (Relevance)
25
Pattern Matching
SM(P, Q) = total weight of the matched feature pairs
Fit(P, Q) = relevance degree between P and Q
Pr(P) = popularity of pattern P
26
Pattern Oriented Code Completion
A pattern is selected and its nodes are matched to the corresponding nodes in the query Groum
The missing nodes are then filled in with code
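Finding what to fill in can be sketched as a set difference over the selected pattern's nodes. This assumes the pattern's nodes are listed in code order and that node matching has already produced the set of pattern node ids with a counterpart in the query; code generation from the missing nodes is not shown, and all names are hypothetical.

```python
def missing_nodes(pattern_nodes, matched_ids):
    """Return labels of pattern nodes with no counterpart in the query.

    pattern_nodes: list of (node id, label) pairs, assumed in code order.
    matched_ids:   ids of pattern nodes already matched to query nodes.
    """
    return [label for nid, label in pattern_nodes if nid not in matched_ids]


pattern = [(1, "Button.<init>"), (2, "Button.setText"), (3, "Button.addListener")]
to_fill = missing_nodes(pattern, matched_ids={1})
```

Keeping the pattern's order means the missing calls can be emitted in the same sequence the pattern uses them.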
27
Empirical Evaluation
Metrics: precision, recall and F-score
Library: the java.io and java.util APIs
Subjects: 28 real-world open-source systems, 4 for training and 24 for testing
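F-score is the harmonic mean of precision and recall. The sketch below checks that the reported 95% precision and 92% recall are consistent with the reported 93% F-score. What counts as a "correct" suggestion in the evaluation is not specified here, so the counts passed to `precision_recall` are hypothetical.

```python
def precision_recall(correct, suggested, expected):
    """correct: correct suggestions; suggested: all suggestions made;
    expected: all code elements that should have been suggested."""
    return correct / suggested, correct / expected


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_score(0.95, 0.92), 2))  # prints 0.93
```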
28
Empirical Evaluation
29
My Observation
Planning to use semantic web technology
Data and control dependency relationships could be improved with semantic relationships such as conceptual similarity
Pattern matching is complex and error-prone; a semantic score could be beneficial
30
Thanks
Questions?