Refinement-Based Context-Sensitive Points-To Analysis for Java
1
Refinement-Based Context-Sensitive Points-To Analysis for Java
Manu Sridharan, Rastislav Bodík
UC Berkeley
PLDI 2006
2
What Does Refinement Buy You?
Increased scalability: enable new clients
• Memory: orders of magnitude savings
• Time: answer for a variable comes back in 1 second
⇒ Suitable for IDE
Precision: Cast Safety Client
[Bar chart: % of casts proved safe (y-axis, 0 to 50) for each algorithm: Andersen, Whaley/Lam, 1-ObjSens (Milanova / Lhotak), Refine]
3
Approach: Focus on the Client
Demand-driven: only do requested work
Client-driven refinement: stop when client satisfied
Example:
• client asks: “can x point to o?”
• we refine until we answer NO (the good answer) or we time out
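The client-driven loop on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `ClientDrivenLoop`, the `Answer` enum, and the idea of modeling the timeout as a maximum refinement level are all assumptions made for the example. The predicate stands in for "the analysis at this refinement level still says x may point to o".

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: names and the level-based budget are illustrative assumptions.
final class ClientDrivenLoop {
    enum Answer { NO, MAYBE }

    /**
     * Refines level by level and stops as soon as the client is satisfied,
     * that is, as soon as some level proves "x cannot point to o".
     * maxLevel stands in for the timeout.
     */
    static Answer query(IntPredicate mayPointToAtLevel, int maxLevel) {
        for (int level = 0; level <= maxLevel; level++) {
            if (!mayPointToAtLevel.test(level)) {
                return Answer.NO; // the good answer: no further refinement needed
            }
        }
        return Answer.MAYBE; // budget exhausted, fall back to the conservative answer
    }
}
```

A query that becomes provable at level 2 returns `NO` without ever running level 3 or above, which is the sense in which only requested work is done.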
4
Context-Sensitive Analysis Costly
Context-sensitive analysis (def):
• Compute result as if all calls inlined
• But, collapse recursive methods
Exponential blowup (code growth)
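To see where the blowup comes from, the sketch below counts how many method-body copies exist once every call is inlined. It is only an illustration: the class `InliningGrowth` and the call-graph encoding (method name mapped to one entry per call site) are assumptions, and the call graph is assumed acyclic, matching the slide's note that recursive methods are collapsed.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the call-graph encoding is an illustrative assumption.
final class InliningGrowth {
    /**
     * Number of method-body copies after inlining every call reachable from
     * the given method. callSites maps a method to its callees, one entry per
     * call site. Assumes no recursion (an acyclic call graph).
     */
    static long inlinedCopies(String method, Map<String, List<String>> callSites) {
        long copies = 1; // the body of the method itself
        for (String callee : callSites.getOrDefault(method, Collections.emptyList())) {
            copies += inlinedCopies(callee, callSites);
        }
        return copies;
    }
}
```

With a chain m0, m1, m2, m3 in which each method calls the next one from two call sites, four methods become 15 inlined bodies, and each extra level roughly doubles the count.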
5
Why Not Existing Techniques?
Most analyses approximate same way in all code
• E.g., k-CFA
• Precision lost, esp. for data structures
Our analysis focuses precision where it matters
• Fully precise in the limit
• Only small amount of code analyzed precisely
• First refinement algorithm for Java
6
Points-To Analysis Overview
Compute objects each variable can point to
• For each var x, points-to set pt(x)
Model objects with abstract locations
• 1: x = new Foo() yields pt(x) = { o1 }
Flow-insensitive: statements in any order
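A minimal sketch of a flow-insensitive points-to computation for allocations and copies is shown below. It is a plain fixpoint iteration, not the refinement algorithm of this talk; the class `FlowInsensitivePointsTo` and the two-element array encoding of statements are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: statement encoding is an illustrative assumption.
final class FlowInsensitivePointsTo {
    /**
     * allocs: {var, abstractLocation} for each "var = new ...".
     * copies: {dst, src} for each "dst = src".
     * Statement order is ignored: copies are re-applied until nothing changes.
     */
    static Map<String, Set<String>> solve(List<String[]> allocs, List<String[]> copies) {
        Map<String, Set<String>> pt = new HashMap<>();
        for (String[] a : allocs) {
            pt.computeIfAbsent(a[0], k -> new TreeSet<>()).add(a[1]);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : copies) {
                Set<String> src = pt.getOrDefault(c[1], Collections.emptySet());
                Set<String> dst = pt.computeIfAbsent(c[0], k -> new TreeSet<>());
                if (dst.addAll(src)) {
                    changed = true; // pt(dst) grew, so another pass is needed
                }
            }
        }
        return pt;
    }
}
```

Because the loop runs to a fixpoint, listing `w = z` before `z = x` gives the same result as the opposite order, which is what "statements in any order" means here.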
7
Points-To Analysis as CFL-Reachability
1) Assignments
x = new Obj(); // o1
y = new Obj(); // o2
z = x;
flowsTo: path exists

2) Method calls
id(p) { return p; }
a = id(x);
b = id(y);
flowsTo: balanced call parens

3) Heap accesses
c.f = x;
c.g = y;
d = c.f;
flowsTo: balanced call and field parens

pt(x) = { o | o flowsTo x }

[Graph: nodes o1, o2, x, y, z, a, b, c, d, p_id, ret_id; call edges labeled (1 )1 and (2 )2; field edges labeled [f, [g, ]f]
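The call-paren filter can be sketched as a graph search that carries a stack of open call sites. This is an illustration under stated assumptions, not the talk's algorithm: the class `CallParenReachability` and its edge-label encoding are invented for the example, field parens are left out, a return edge on an empty stack is rejected for simplicity, and the graph is assumed free of recursion so the stack stays bounded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: labels are "" (assignment), "(i" (enter call site i), ")i" (return to site i).
final class CallParenReachability {
    private final Map<String, List<String[]>> succ = new HashMap<>();

    void edge(String from, String label, String to) {
        succ.computeIfAbsent(from, k -> new ArrayList<>()).add(new String[] {label, to});
    }

    /** True if some path from src to dst never returns to the wrong call site. */
    boolean flowsTo(String src, String dst) {
        return search(src, dst, new ArrayDeque<>(), new HashSet<>());
    }

    private boolean search(String node, String dst, Deque<String> calls, Set<String> seen) {
        if (node.equals(dst)) return true;
        if (!seen.add(node + "|" + calls)) return false; // state (node, stack) already explored
        for (String[] e : succ.getOrDefault(node, Collections.emptyList())) {
            String label = e[0];
            if (label.startsWith("(")) {
                calls.push(label.substring(1));
                boolean found = search(e[1], dst, calls, seen);
                calls.pop();
                if (found) return true;
            } else if (label.startsWith(")")) {
                // Only follow a return edge that matches the most recent open call.
                if (calls.isEmpty() || !calls.peek().equals(label.substring(1))) continue;
                String site = calls.pop();
                boolean found = search(e[1], dst, calls, seen);
                calls.push(site);
                if (found) return true;
            } else if (search(e[1], dst, calls, seen)) {
                return true;
            }
        }
        return false;
    }
}
```

On the `id` example, o1 reaches a through `(1 ... )1`, but the path from o1 to b would need `(1 ... )2`, so it is filtered out; without the filter both a and b would appear to point to o1 and o2.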
8
Summary of Formulation
Graph represents program
Compute reachability with two filters
• Language of balanced call parens
• Language of balanced field parens
9
Single path problem
Problem: show path is unbalanced
Goal: reduce number of visited edges
Insight: enough to find one unbalanced paren
[Figure: graph with nodes o, x, o2 and t0 through t12; edges carry field parens ([f, [h, ]j, [p, ]g, ]k) and call parens ((1, )1, )5, (7, )8). One path is labeled [f (1 )1 [h … ]
10
Approximation via Match Edges
Match edges connect matched field parens
• From source of open to sink of close
• Initially, all pairs connected
Use match edges to skip subpaths
[Figure: path from o to x through t0, t1, t2, t3, t4 with field parens [f [g [h ]h ]j ]f; match edges skip subpaths]
11
Refining the Approximation
Refine by removing some match edges
• Exposes more of original path for checking
Soundness: Traverse match edge ⇒ assume field parens balanced on skipped path
Remove where unbalanced parens expected
• Explore deeper levels of pointer indirection
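[Figure: the same path from o to x after refinement; [f and [g are now explicit, a match edge still skips [h ]h, then ]j ]f follow. Full field string: [f [g [h ]h ]j ]f]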
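The effect of removing a match edge can be sketched on a single path. This is a simplified illustration, not the authors' data structure: the class `MatchEdgeSketch`, the token encoding (`[f` opens field f, `]f` closes it), and the representation of match edges as a map from a start index to the index just past the skipped subpath are all assumptions. Every match edge is assumed to point forward.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: path tokens and index-based match edges are illustrative assumptions.
final class MatchEdgeSketch {
    /**
     * Walks the path left to right. Where a match edge starts, the skipped
     * subpath is assumed balanced and is not inspected. Everything else is
     * checked explicitly against a stack of open fields.
     * Returns false only when an unbalanced paren is actually found.
     */
    static boolean mayBeBalanced(List<String> path, Map<Integer, Integer> matchEdges) {
        Deque<String> open = new ArrayDeque<>();
        int i = 0;
        while (i < path.size()) {
            Integer skipTo = matchEdges.get(i);
            if (skipTo != null) {
                i = skipTo; // traverse the match edge: sound over-approximation
                continue;
            }
            String token = path.get(i);
            String field = token.substring(1);
            if (token.charAt(0) == '[') {
                open.push(field);
            } else if (open.isEmpty() || !open.pop().equals(field)) {
                return false; // one unbalanced paren is enough to reject the path
            }
            i++;
        }
        return open.isEmpty();
    }
}
```

For the path `[f [g [h ]h ]j ]f` with match edges for f (index 0 to 6) and h (index 2 to 4), the walk takes the f edge and reports that the path may be balanced. After the f edge is removed, the walk reads `[f`, `[g`, skips `[h ]h`, and then finds `]j` against an open `[g`, so the path is rejected without ever looking inside the h subpath.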
12
Refinement With Both Languages
[Figure: path from o to x through nodes t0 to t6, labeled with the call parens and field parens listed below]
Match edges enable approximation of calls
• Only can check calls on match-free subpaths
Match edge removal ⇒ more call checking
• Key point: refine heap and calls together
Calls: (1 )1 (2 )3
Fields: [f [g ]g ]f
14
Experimental Configuration
Implemented in Soot framework
Tested on large benchmarks × 2 clients
• SPECjvm98, DaCapo suite
• Downcast checking, factory method props
Refine context-insensitive result
Timeout for long-running queries
15
Precision: Cast Checking
[Bar chart: % of casts proved safe (y-axis, 0 to 100) per benchmark (compress, db, jack, javac, jess, mpegaudio, mtrt, soot-c, sablecc-j, polyglot, antlr, bloat, chart, jython, pmd, ps), comparing 1-ObjSens (Milanova / Lhotak) with Refine]
16
Scalability: Time and Memory
Average query time less than 1 second
• Interactive performance (for IDE)
• At most 13 minutes for casts, 4 minutes for factory client
Very low memory usage: at most 35MB
• Of this, 30MB for context-insensitive result
• Compare with >2GB for 1-ObjSens analysis
17
Demand-Driven vs. Exhaustive
Demand advantage: no caching required
• Hence, low memory overhead
• No engineering of efficient sets
• Good for changing code; just re-compute
Demand advantage: faster for many clients
• Often only care about some variables
Demand disadvantage: slower querying all vars
• At most 90 minutes for all app. vars
• But, still good precision, memory