Graph-Based Source Code Analysis of JavaScript Repositories
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group
Dániel Stein, Gábor Szárnyas
Content
1. Context
2. Tooling
3. Use Cases
4. Neo4j Observations
Continuous Integration (CI)
– Developers working together
– Prevent integration problems
– Examples
– Jenkins
– Hudson
– Travis CI
[Figure: CI workflow: Development → Version Control System → Compilation → Unit and Integration Tests]
whoops
Apple,https://blog.codecentric.de/en/2014/02/curly-braces/
Static Analysis
– No need for compilation or execution of the application
– Formatting, structural and semantic rule checking
– Can extend the workflow of continuous integration and improve it
– In this research we used code analysis utilizing pattern matching
– Java
– FindBugs
– PMD
– CheckStyle
– JavaScript
– ESLint
– Facebook Infer, Flow
– Tern
– TAJS
[Figure: CI workflow extended with static analysis: Development → Version Control System → Compilation → Unit and Integration Tests → Static Analysis]
Problems to Solve
– Thorough code analysis is time-consuming and resource-intensive
– For large projects it can be too slow
– Temporary solution: batching
– Present results as soon and as fast as possible.
[Figure: day and night timeline (☼ ☾) showing when unit tests and static analysis run]
Problems to Solve
– Memory limits appear when...
  – Global rules are checked
  – Storing the structure in-memory
  – For large code repositories
– Not being incremental
  – Batched execution simply does not cut it
  – A small change induces a complete recheck
Our Approach
– Incremental methodology
  – Instead of batched execution
– Update the prepared results with the effects of the change
– Only store the required parts in memory
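The incremental idea above can be sketched as follows. This is a minimal illustration, not the framework's implementation: `IncrementalAnalyzer`, `analyze_file` and the content-hash change detection are assumptions made for the example, and the per-file check is a deliberately naive stand-in for a real rule.

```python
import hashlib


def file_digest(text: str) -> str:
    """Content hash used to decide whether a file changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def analyze_file(text: str) -> list[str]:
    """Hypothetical per-file check: report lines that contain '/ 0'."""
    return [
        f"line {number}: possible division by zero"
        for number, line in enumerate(text.splitlines(), start=1)
        if "/ 0" in line
    ]


class IncrementalAnalyzer:
    """Keeps the previous results and re-analyzes only the changed files."""

    def __init__(self) -> None:
        self.digests: dict[str, str] = {}
        self.results: dict[str, list[str]] = {}

    def update(self, workspace: dict[str, str]) -> list[str]:
        """Apply a new workspace snapshot; return the files that were re-analyzed."""
        # Forget files that were deleted from the workspace.
        for path in list(self.digests):
            if path not in workspace:
                del self.digests[path]
                del self.results[path]
        reanalyzed = []
        for path, text in workspace.items():
            digest = file_digest(text)
            if self.digests.get(path) != digest:
                self.digests[path] = digest
                self.results[path] = analyze_file(text)
                reanalyzed.append(path)
        return reanalyzed


analyzer = IncrementalAnalyzer()
first = analyzer.update({"Main.js": "var foo = 1 / 0;", "Parser.js": "var x = 1;"})
second = analyzer.update({"Main.js": "var foo = 1 / 2;", "Parser.js": "var x = 1;"})
```

The first call analyzes both files; the second call touches only `Main.js`, because the content of `Parser.js` did not change.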
Architecture overview
[Figure: architecture overview. Stages: VCS, Workspace, Abstract Syntax Tree, Abstract Semantic Graph, Database (neo4j). Input: file-level changes (Main.js, Dependency.js, FIterator.js, Parser.js) processed by the analyzer. Uses: Automatic Well-formedness Rule Evaluation (Well-formedness Rules, Validation Report), Manual Execution and Data Extraction (Query Execution), Querying and Transformation.]
Code Processing Steps
source code → tokenizer → tokens → parser → AST → scope analyzer → ASG
source code – sequence of statements formalized in a given language.
token – the shortest character sequence still having meaning.
Token        Token type
VAR          Keyword
IDENTIFIER   Ident
ASSIGN       Punctuator
NUMBER       NumericLiteral
DIV          Punctuator
NUMBER       NumericLiteral
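As an illustration of the tokenizing step, here is a minimal regular-expression tokenizer. It is a sketch under assumptions: the input `var foo = 1 / 0` is inferred from the token list and is not shown in the transcript, the patterns cover only a tiny subset of JavaScript, and the output pairs the token type with the matched text rather than using the names VAR, ASSIGN and so on.

```python
import re

# Token types follow the table above; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("NumericLiteral", r"\d+(?:\.\d+)?"),
    ("Keyword", r"\b(?:var|let|const)\b"),
    ("Ident", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("Punctuator", r"[=/;+\-*]"),
    ("Whitespace", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split the source into (token type, text) pairs, skipping whitespace."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character at offset {position}")
        if match.lastgroup != "Whitespace":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


tokens = tokenize("var foo = 1 / 0")
```

For this input the result has six entries, in the same order as the token types listed above: Keyword, Ident, Punctuator, NumericLiteral, Punctuator, NumericLiteral.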
Abstract Syntax Tree (AST)
– Tree representation of the grammar structure of the sequence of tokens.
Module
  items: VariableDeclarationStatement
    declaration: VariableDeclaration
      declarators: VariableDeclarator
        binding: BindingIdentifier (name = `foo`)
        init: BinaryExpression (operator = `Div`)
          left: LiteralNumericExpression (value = 1.0)
          right: LiteralNumericExpression (value = 0.0)
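The parsing step that produces such a tree can be sketched for one statement shape. This is an assumption-laden toy, not the Shift parser: `parse_var_declaration` is a hypothetical helper that accepts only `var <ident> = <number> / <number>`, and the nested-dict layout is chosen for the example; only the node and edge names come from the slides.

```python
def parse_var_declaration(tokens):
    """Parse `var <ident> = <number> / <number>` into a nested-dict AST."""
    expected = ["Keyword", "Ident", "Punctuator", "NumericLiteral", "Punctuator", "NumericLiteral"]
    kinds = [kind for kind, _ in tokens]
    texts = [text for _, text in tokens]
    if kinds != expected or texts[0] != "var" or texts[2] != "=" or texts[4] != "/":
        raise SyntaxError("this sketch only supports `var <ident> = <number> / <number>`")
    return {
        "type": "Module",
        "items": [{
            "type": "VariableDeclarationStatement",
            "declaration": {
                "type": "VariableDeclaration",
                "declarators": [{
                    "type": "VariableDeclarator",
                    "binding": {"type": "BindingIdentifier", "name": texts[1]},
                    "init": {
                        "type": "BinaryExpression",
                        "operator": "Div",
                        "left": {"type": "LiteralNumericExpression", "value": float(texts[3])},
                        "right": {"type": "LiteralNumericExpression", "value": float(texts[5])},
                    },
                }],
            },
        }],
    }


TOKENS = [
    ("Keyword", "var"), ("Ident", "foo"), ("Punctuator", "="),
    ("NumericLiteral", "1"), ("Punctuator", "/"), ("NumericLiteral", "0"),
]
ast = parse_var_declaration(TOKENS)
declarator = ast["items"][0]["declaration"]["declarators"][0]
```

Here `declarator["binding"]["name"]` is `foo` and the right operand of `declarator["init"]` has the value 0.0, matching the tree on the slide.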
Abstract Semantic Graph (ASG)
– Graph, not necessarily tree.
– Semantic information besides the syntactic structure.
– Contains cross-edges
[Figure: the AST extended with scope nodes GlobalScope, Scope, Variable (name = `foo`), Reference (accessibility = `Write`) and Declaration (kind = `Var`), connected by the edges variables, references, children, declarations, node and astNode]
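How a tree turns into a graph with cross-edges can be sketched like this. The node types and edge labels are taken from the figure; everything else is an assumption for illustration: `ast_to_graph` and `add_scope_info` are hypothetical helpers, the scope handling is reduced to a single global scope, and every binding is treated as a `Var` declaration with one `Write` reference.

```python
import itertools


def ast_to_graph(ast):
    """Flatten a nested-dict AST into nodes {id: properties} and edges (source, label, target)."""
    nodes, edges = {}, []
    ids = itertools.count()

    def visit(node):
        node_id = next(ids)
        nodes[node_id] = {k: v for k, v in node.items() if not isinstance(v, (dict, list))}
        for label, child in node.items():
            children = child if isinstance(child, list) else [child]
            for item in children:
                if isinstance(item, dict):
                    edges.append((node_id, label, visit(item)))
        return node_id

    visit(ast)  # the root node receives id 0
    return nodes, edges


def add_scope_info(nodes, edges):
    """Add scope nodes and cross-edges for every binding identifier (simplified)."""
    ids = itertools.count(max(nodes) + 1)
    global_scope = next(ids)
    nodes[global_scope] = {"type": "GlobalScope"}
    edges.append((global_scope, "astNode", 0))
    for node_id, props in list(nodes.items()):
        if props.get("type") != "BindingIdentifier":
            continue
        variable, declaration, reference = next(ids), next(ids), next(ids)
        nodes[variable] = {"type": "Variable", "name": props["name"]}
        nodes[declaration] = {"type": "Declaration", "kind": "Var"}
        nodes[reference] = {"type": "Reference", "accessibility": "Write"}
        edges.extend([
            (global_scope, "variables", variable),
            (variable, "declarations", declaration),
            (variable, "references", reference),
            (declaration, "node", node_id),  # cross-edge back into the syntax tree
            (reference, "node", node_id),    # cross-edge back into the syntax tree
        ])
    return nodes, edges


AST = {"type": "Module", "items": [{
    "type": "VariableDeclarationStatement",
    "declaration": {"type": "VariableDeclaration", "declarators": [{
        "type": "VariableDeclarator",
        "binding": {"type": "BindingIdentifier", "name": "foo"},
        "init": {"type": "LiteralNumericExpression", "value": 1.0},
    }]},
}]}
nodes, edges = add_scope_info(*ast_to_graph(AST))
cross_edges = [edge for edge in edges if edge[1] == "node"]
```

The two `node` edges both point at the `BindingIdentifier`, which now has more than one incoming edge; that is what makes the structure a graph rather than a tree.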
AST vs ASG
– 1 SLOC → 20-40-50 nodes
Overview of the Approach
– Version Control System, Integrated Development Environment: Git, Visual Studio Code
– Tokenizer, parser, scope analyzer (source code → tokens → AST → ASG): ShapeSecurity Shift
– Transformation: Java, Cypher
– Graph database: Neo4j
– Result processing
Graph Pattern Matching
– Graph pattern
  – A declarative, graph-like formalism expressing constraints.
– Graph pattern query expressed in Cypher looking for a division by zero
[Figure: pattern over VariableDeclarator, BindingIdentifier (name = `foo`), BinaryExpression (operator = `Div`) and LiteralNumericExpression (value = 1.0, value = 0.0) nodes; labels binding, be, right]
– Results of the pattern matching: BindingIdentifier name = `foo`
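The Cypher query from the slide is not preserved in this transcript. As a stand-in, the same division-by-zero pattern can be sketched in plain Python over a node and edge list. The `init` edge label and the helper names are assumptions; the node types, properties and the `binding` and `right` labels come from the figure.

```python
# Graph fragment from the slide; the integer ids are arbitrary.
nodes = {
    1: {"type": "VariableDeclarator"},
    2: {"type": "BindingIdentifier", "name": "foo"},
    3: {"type": "BinaryExpression", "operator": "Div"},
    4: {"type": "LiteralNumericExpression", "value": 1.0},
    5: {"type": "LiteralNumericExpression", "value": 0.0},
}
edges = [(1, "binding", 2), (1, "init", 3), (3, "left", 4), (3, "right", 5)]


def targets(source, label):
    """Ids of all nodes reachable from `source` over one edge with the given label."""
    return [dst for src, lbl, dst in edges if src == source and lbl == label]


def find_division_by_zero():
    """Names bound to a `Div` expression whose right operand is the literal 0."""
    matches = []
    for declarator, props in nodes.items():
        if props["type"] != "VariableDeclarator":
            continue
        for be in targets(declarator, "init"):
            if nodes[be].get("operator") != "Div":
                continue
            for right in targets(be, "right"):
                if nodes[right].get("value") == 0.0:
                    matches.extend(nodes[b]["name"] for b in targets(declarator, "binding"))
    return matches


result = find_division_by_zero()  # ["foo"]
```

A declarative query language expresses the same constraints as a single pattern; the nested loops here only spell out what the matcher has to check.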
Use Cases: static analysis
– Searching for local bad smells (linter warnings)
  – without a case
  – value set more than once
  – Unused variable
– Global rules
  – Unreachable code parts
– Framework
  – Freely extendable
  – User-defined rules
  – Easier to use than visitor pattern solutions
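One of the local rules above, the unused variable check, can be sketched over scope information. The data layout is an assumption for the example: each variable carries the accessibility of its references, using the value names `Read`, `Write` and `ReadWrite`; only `Write` appears in the slides.

```python
# Hypothetical scope-analysis output: variable name plus the accessibility of each reference.
variables = [
    {"name": "foo", "references": ["Write"]},
    {"name": "bar", "references": ["Write", "Read"]},
    {"name": "baz", "references": ["ReadWrite"]},
    {"name": "qux", "references": []},
]


def unused_variables(variables):
    """A variable counts as unused when none of its references reads it."""
    return [
        variable["name"]
        for variable in variables
        if not any(access.startswith("Read") for access in variable["references"])
    ]


unused = unused_variables(variables)  # ["foo", "qux"]
```

Expressed as a graph pattern, the same rule asks for Variable nodes without a reading Reference, which avoids writing a dedicated visitor for the check.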
Use Cases: transformation
Control Flow Graph (CFG)
– Graph representation of every possible statement sequence during code execution.
[Figure: control flow graph built up step by step from statement, if, condition, return and error nodes]
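Building such a graph from structured code can be sketched as below. The program representation is invented for the example: plain strings are statements with unique names, a tuple `("if", condition, then_block, else_block)` is a branch, and the names `return` and `error` end the flow. None of this is the framework's own transformation.

```python
def add_block(block, preds, cfg):
    """Wire the statements of `block` after the dangling exits `preds`; return the new exits."""
    for statement in block:
        if isinstance(statement, tuple):  # ("if", condition, then_block, else_block)
            _, condition, then_block, else_block = statement
            cfg.setdefault(condition, [])
            for node, label in preds:
                cfg[node].append((condition, label))
            then_exits = add_block(then_block, [(condition, "true")], cfg)
            else_exits = add_block(else_block, [(condition, "false")], cfg)
            preds = then_exits + else_exits
        else:
            cfg.setdefault(statement, [])
            for node, label in preds:
                cfg[node].append((statement, label))
            # `return` and `error` terminate the flow: nothing follows them.
            preds = [] if statement in ("return", "error") else [(statement, "next")]
    return preds


def build_cfg(block):
    """Return {node: [(successor, edge label), ...]} for a structured program."""
    cfg = {}
    add_block(block, [], cfg)
    return cfg


program = ["s1", "s2", ("if", "condition", ["s3", "error"], ["s4"]), "s5", "return"]
cfg = build_cfg(program)
```

In the result, `condition` has a `true` edge to `s3` and a `false` edge to `s4`, while `error` and `return` have no successors.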
Use Cases: test generation
– Inspecting control flows
  – Is the given statement reachable given the constraints on the edges?
  – Which one is the shortest route?
– Producing test input for dynamic testing
[Figure: control flow graph with statement, if, condition, return and error nodes, with routes highlighted step by step]
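Both questions above reduce to a search over the control flow graph. The following breadth-first search is a minimal sketch: the example graph, the edge labels and the `forbidden_labels` parameter (a crude stand-in for constraints on the edges) are assumptions made for illustration.

```python
from collections import deque

# Hypothetical control flow graph: node -> [(successor, edge label), ...]
CFG = {
    "s1": [("s2", "next")],
    "s2": [("condition", "next")],
    "condition": [("s3", "true"), ("s4", "false")],
    "s3": [("error", "next")],
    "s4": [("s5", "next")],
    "s5": [("return", "next")],
    "error": [],
    "return": [],
}


def shortest_route(cfg, start, goal, forbidden_labels=frozenset()):
    """Breadth-first search; returns the shortest list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for successor, label in cfg[path[-1]]:
            if label not in forbidden_labels and successor not in seen:
                seen.add(successor)
                queue.append(path + [successor])
    return None


route = shortest_route(CFG, "s1", "error")
blocked = shortest_route(CFG, "s1", "error", forbidden_labels={"true"})
```

`route` is `["s1", "s2", "condition", "s3", "error"]`; `blocked` is `None`, because the error is only reachable when the condition holds. The conditions collected along a route are the starting point for producing a test input that drives execution down that route.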
Use Cases: type inference
– Supporting dynamically typed languages
  – Python
  – JavaScript / ECMAScript
http://marijnhaverbeke.nl/blog/tern.html
Use Cases: impact analysis
– Adapting to the continuous integration workflow
– Handling multiple branches
– Following the modifications in a branch
– File-level incremental granularity
– Giving differential reports to the developers
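A differential report can be as simple as comparing the findings of two analysis runs. In this sketch a finding is a hypothetical `(file, rule, line)` tuple and `differential_report` is an illustrative helper, not part of the framework.

```python
def differential_report(before, after):
    """Compare two sets of (file, rule, line) findings from consecutive runs."""
    return {
        "fixed": sorted(before - after),
        "introduced": sorted(after - before),
        "unchanged": sorted(before & after),
    }


previous_run = {("Main.js", "division-by-zero", 3), ("Parser.js", "unused-variable", 10)}
current_run = {("Parser.js", "unused-variable", 10), ("Parser.js", "unreachable-code", 42)}
report = differential_report(previous_run, current_run)
```

Here the division by zero in `Main.js` is reported as fixed and the unreachable code in `Parser.js` as introduced, so a developer sees only what the latest change affected.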
Why Neo4j?
+++
– Quick prototyping
– Supporting transactions
– Great tooling
--
– Not scaling well
– Only disk-based
Remarks: MERGE
– MATCH or CREATE
– Great for the lazy
– Can be expensive
– Possible solutions:
  – Less MERGE
  – Separating queries
  – Create first if not present
  – Use MATCH instead of MERGE
– Prevention
  – Prepare the structure when inserting the data
Remarks: if-then-else
– Not a language element in Cypher
– Can be solved with a trick
  – Verrrrrry sloww
– Solution:
  – Two smaller, disjoint cases
– ∞ vs 15 sec
These are not chickens.
Remarks: reachability
– Transitive closure without length constraints is slow.
– Transitive closure over repeating node/edge pattern is only possible using tricks.
[Figure: nodes A and B connected by a * (variable-length) path]
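The effect of a length constraint can be illustrated with a bounded search. This sketch is an analogy in plain Python, not a model of how Neo4j evaluates variable-length paths; the edge list and the `reachable_within` helper are assumptions.

```python
def reachable_within(edges, start, max_hops):
    """Nodes (other than `start`) reachable from `start` using at most `max_hops` edges."""
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        # Expand one hop and keep only nodes not visited before.
        frontier = {dst for src, dst in edges if src in frontier} - seen
        if not frontier:
            break
        seen |= frontier
    return seen - {start}


EDGES = [("A", "x"), ("x", "y"), ("y", "B"), ("B", "x")]
two_hops = reachable_within(EDGES, "A", 2)    # {"x", "y"}
three_hops = reachable_within(EDGES, "A", 3)  # {"x", "y", "B"}
```

With a bound, the search stops after a fixed number of expansions; without one, it has to continue until no new node turns up, which on a large graph can mean visiting most of it.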
Conclusions
– Source code analyzer framework
– Searching for global error patterns
– Close to real time feedback
– Type inference possible
– Test input generation possible
– Approach for both dynamically and statically typed languages
– Using Neo4j for
– Storing
– Pattern matching
– Transforming
– Version control
– Storing metadata
Project Details
– The framework prototype is open-source.
  https://github.com/ftsrg/codemodel-rifle
– Our work was supported by:
  – ÚNKP*
  – Microsoft Azure for Research
  – MTA-BME Lendület Program
*Supported by the ÚNKP-16-2-I. New National Excellence Program of the Ministry of Human Capacities.
Project Details
– Supervisors
– Ádám Lippai
– Dávid Honfi
– Gábor Szárnyas
– Helped my research
– Tamás Soma Lucz
– Industrial case study