An Algorithm for Keyword Search on an Execution Path
Toshihiro Kamiya, Future University Hakodate
CSMR-WCRE-2014 Era Track 2
Background #1: Code searching
Developers do search!
➤ To find reusable components for a function of a product
➤ To find similar code fragments before modifying code
➤ To find code samples showing the usage of a given class or component
Background #2: Emerging fine-grained module technologies
More and more fine-grained modules are used.
● Object/Closure
– extracts data and its manipulation
● Aspect
– extracts concerns, i.e. a set of code invoked by a specific condition or event
● Dependency Injection
– splits code at each dependency
Problem: Searching on fine-grained modules
Fine-grained modules make code search difficult.
(Old days) a search result was contained in a single file
↓
(Now) a search result is a set of several parts of several files
This affects code-search methods in both
● Algorithm
– “how to find”
● Displaying/Visualizing
– “how to show search results”
[Figure: a search result in the old days vs. now]
Solution: Keyword Search on an Execution Path
● Static analysis
● Find the execution paths that include given keywords
– from all possible execution paths of a target program
● Idea: a compact data structure (And/Or/Call graph) of execution paths + a search algorithm on it
● A prototype implementation
– applied to up to 183k lines of Java source code
Related work
● Prospector [8]
● PARSEWeb [9]
And/Or/Call Graph
● A DAG that contains all execution paths in a compact form
● Generated by the following translation rules
– Sequence structure ➡ And node
– Selection structure ➡ Or node
– Repetitive structure ➡ Or node having And nodes as children
  (a selection among sequences of 0-time repetition, 1-time repetition, 2-times repetition, ...)
– Method call ➡ Call node
● Dynamic dispatching

Source code | Graphical form | Textual expression
s1;s2;s3; | And node over s1, s2, s3 | (s1 ∧ s2 ∧ s3) or (s1 · s2 · s3)
if (...) { st; } else { se; } | Or node over st, se | (st ∨ se)
interface I { m(); } class B implements I { m() {...} } class C implements I { m() {...} } I i; ... i.m(); | Or node over B//m, C//m | (B//m { } ∨ C//m { })
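The translation rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the prototype's code: the class names `Leaf`, `And`, `Or`, `Call`, the helpers `repeat`, `to_text`, `paths`, and the repetition bound `max_times` are all assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Union


@dataclass
class Leaf:
    """A statement or primitive operation, identified by its name."""
    name: str


@dataclass
class And:
    """Sequence structure: every child runs, in order."""
    children: List["Node"]


@dataclass
class Or:
    """Selection structure: exactly one child runs."""
    children: List["Node"]


@dataclass
class Call:
    """Method call: the named method followed by its body."""
    name: str
    body: "Node"


Node = Union[Leaf, And, Or, Call]


def repeat(body: Node, max_times: int) -> Or:
    """Repetition as a selection among 0, 1, ..., max_times copies of body.
    The bound max_times is an assumption of this sketch."""
    return Or([And([body] * k) for k in range(max_times + 1)])


def to_text(n: Node) -> str:
    """Render a node in the textual expression used on the slide."""
    if isinstance(n, Leaf):
        return n.name
    if isinstance(n, Call):
        return f"{n.name} {{ {to_text(n.body)} }}"
    sep = " ∧ " if isinstance(n, And) else " ∨ "
    return "(" + sep.join(to_text(c) for c in n.children) + ")"


def paths(n: Node) -> List[List[str]]:
    """Enumerate every execution path as a list of names."""
    if isinstance(n, Leaf):
        return [[n.name]]
    if isinstance(n, Call):
        return [[n.name] + p for p in paths(n.body)]
    if isinstance(n, Or):
        return [p for c in n.children for p in paths(c)]
    # And node: concatenate one path from each child, in order.
    combos = product(*(paths(c) for c in n.children))
    return [[name for part in combo for name in part] for combo in combos]


seq = And([Leaf("s1"), Leaf("s2"), Leaf("s3")])
sel = Or([Leaf("st"), Leaf("se")])
dispatch = Or([Call("B//m", seq), Call("C//m", sel)])  # i.m() with two receivers
print(to_text(seq))
print(to_text(sel))
print(paths(dispatch))
```

Enumerating paths as done in `paths` grows exponentially with the number of Or nodes, which is why searching directly on the compact graph is attractive.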
Example
[Figure: graph of the example program, with nodes main, getToday, getDay, getDayOfWeek, Calendar//getInstance (×2), split, parseInt (×3), Calendar//set, Calendar//get, and printf]
Search Algorithm
● Input: Keywords to identify nodes
● Output: Connected sub-graphs including the nodes identified with the keywords
  “connected sub-graph” → continuous execution path
● Heuristics
– Find deepest nodes
  ← Assumption: a small operation is easy to understand
– Extract shallowest sub-graph (treecut)
  ← Assumption: a deep method-invocation chain is difficult to understand
Label and Summary
Labels and summaries are the “index” data of the search algorithm.
● Label
– A set of names put on a node
– Compared with the keywords in a query
● Summary
– A node n’s summary S(n) is the set of names of the (child and) descendant nodes of n.
Properties
– For any node n and any child node c of n, S(n) ⊇ S(c).
– A root node has a locally maximal summary.
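The definition of S(n) can be sketched as a bottom-up pass over a tree. This is a minimal sketch, not the tool's implementation: the `Node` class and `build_summary` are hypothetical names, each label is simplified to a single name, and the tree shape below is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Node:
    label: str  # simplification: one name per node instead of a set of names
    children: List["Node"] = field(default_factory=list)
    summary: FrozenSet[str] = frozenset()


def build_summary(n: Node) -> FrozenSet[str]:
    """Set n.summary to the names of all child and descendant nodes of n."""
    names = set()
    for c in n.children:
        names.add(c.label)          # the child itself
        names |= build_summary(c)   # everything below the child
    n.summary = frozenset(names)
    return n.summary


# Hypothetical tree shape, for illustration only.
get_day = Node("getDay", [
    Node("Calendar//getInstance"), Node("split"),
    Node("parseInt"), Node("parseInt"), Node("parseInt"),
    Node("Calendar//set"),
])
main = Node("main", [
    get_day,
    Node("getDayOfWeek", [Node("Calendar//get")]),
    Node("printf"),
])
build_summary(main)
print(sorted(get_day.summary))
print(sorted(main.summary))
```

Because each summary is the union of the children's labels and summaries, the property S(n) ⊇ S(c) holds by construction.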
Example of a summary: { “Calendar//getInstance”, “Calendar//set”, “split”, “parseInt” }
Example of a larger summary: { “Calendar//getInstance”, “Calendar//get”, “Calendar//set”, “getDay”, “getDayOfWeek”, “split”, “parseInt”, “printf” }
Steps of search algorithm
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
– by comparing the summary of each node with the query
(S2) makes the shallowest treecut
– by removing deeper leaf nodes as long as the treecut still fulfills the query
(S3) removes uncontributing leaf nodes
– Uncontributing = its label does not match any of the query keywords
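The three steps can be sketched on a plain tree as follows. This is one possible reading of the steps, not the prototype's algorithm: all names are hypothetical, summaries are recomputed instead of indexed, (S2) cuts level by level rather than leaf by leaf, and the example tree shape is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def summary(n: Node) -> Set[str]:
    """Names of all descendants of n (recomputed each time, for brevity)."""
    names: Set[str] = set()
    for c in n.children:
        names.add(c.label)
        names |= summary(c)
    return names


def deepest_fulfilling(n: Node, query: Set[str]) -> List[Node]:
    """(S1) Deepest nodes whose summary still contains every keyword."""
    if not query <= summary(n):
        return []
    hits = [r for c in n.children for r in deepest_fulfilling(c, query)]
    return hits or [n]


def cut(n: Node, depth: int) -> Node:
    """Copy of the sub-tree at n, truncated below the given depth."""
    if depth == 0:
        return Node(n.label)
    return Node(n.label, [cut(c, depth - 1) for c in n.children])


def shallowest_treecut(root: Node, query: Set[str]) -> Node:
    """(S2) Smallest depth at which the cut tree still fulfills the query."""
    depth = 0
    while not query <= summary(cut(root, depth)):
        depth += 1  # terminates: root itself fulfills the query
    return cut(root, depth)


def prune(n: Node, query: Set[str], is_root: bool = True) -> Optional[Node]:
    """(S3) Drop leaves, repeatedly, whose label matches no keyword."""
    kept = []
    for c in n.children:
        p = prune(c, query, False)
        if p is not None:
            kept.append(p)
    if not kept and not is_root and n.label not in query:
        return None
    return Node(n.label, kept)


def search(root: Node, query: Set[str]) -> List[Node]:
    return [prune(shallowest_treecut(r, query), query)
            for r in deepest_fulfilling(root, query)]


def to_text(n: Node) -> str:
    if not n.children:
        return n.label
    return n.label + " { " + " ".join(to_text(c) for c in n.children) + " }"


# Hypothetical tree shape, for illustration only.
main = Node("main", [
    Node("getDay", [Node("Calendar//getInstance"), Node("split"),
                    Node("parseInt"), Node("Calendar//set")]),
    Node("getDayOfWeek", [Node("Calendar//get")]),
    Node("printf"),
])
results = search(main, {"Calendar//get", "Calendar//set"})
print([to_text(r) for r in results])
```

With this tree, neither `getDay` nor `getDayOfWeek` alone covers both keywords, so (S1) settles on `main`, and (S3) drops `printf`, `split`, `parseInt`, and `Calendar//getInstance`.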
Example
Query { “Calendar//get”, “Calendar//set” }
(S1) finds query-fulfilling sub-trees of the (local) maximum depths
(S2) makes the shallowest treecut in each of the sub-trees
(S3) removes uncontributing leaf nodes
Search result
main { getDay { Calendar//set } getDayOfWeek { Calendar//get } }
Prototype tool
Implementation
● Target: Java source code
– Analysis of Java's dynamic dispatch
● Written in 8k lines of Python
● Applied to a product of up to 183 kLOC (jEdit)
Limitations
● Keywords
– Names of class or method
– Text in string literal
● Exception handling
– Does not search in the execution paths that throw
● Entry points
– main() and static initializers
– Does not search for entry points such as @Test
[Figure: architecture of the prototype tool]
● Phases: Indexing, Searching
● Processing steps: Method-body analysis, Dynamic-dispatch analysis, Whole-program graph building, Node summary building, Keyword-query search, Formatting
● Data: Java class files (bytecode), Line number table, Type hierarchy, Method signature, Method calls, Control flow, And/Or/Call graph of method body, Dynamic-dispatch resolver, And/Or/Call graph, Node label, Node summary, Query, Sub-graph / Execution path, Search result
Applied to jEdit
● H/W
– CPU Xeon E5520 2.27GHz
– 32GiB mem.
● Indexing
– 48.8 sec. in elapsed time
– 644 MiB peak mem.
● Searching
– 3.09 ∼ 72.2 (ave. 5.71) sec. in elapsed time
– up to 1412 MiB peak mem.
Summary
● Background
– #1: Code searching
– #2: Emerging fine-grained module technologies
● Problem: Searching on fine-grained modules
● Solution: Keyword search on an execution path
– And/Or/Call graph, Label/summary
– Search algorithm
● Prototype implementation
– Applied to jEdit
● GitHub
– https://github.com/tos-kamiya/agoat/