Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability
![Page 1: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/1.jpg)
Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability
Seth Hallem and Eric Watkins
![Page 2: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/2.jpg)
Exhaustive Analysis Papers
• “Precise Interprocedural Dataflow Analysis via Graph Reachability”
– Reps, Horwitz, Sagiv, POPL 1995
– applies CFL reachability to context-sensitive, interprocedural dataflow analysis
• “Program Analysis via Graph Reachability”
– Reps, ILPS 1997
– describes two additional applications: interprocedural program slicing and shape analysis
![Page 3: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/3.jpg)
The Reduction to CFL Reachability
• Question 1: What problems can we solve?
• Question 2: How do we set up the problem?
• Question 3: How do we solve the problem?
• Question 4: What is the complexity of this approach?
• Running example: possibly uninitialized variables
![Page 4: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/4.jpg)
What problems can we solve?
• IFDS problems
– Finite set of dataflow facts ($D$)
– Mapping from edges in the CFG to transfer functions $f : 2^D \to 2^D$
– Each $f$ is distributive wrt the meet operator $\sqcap$ (union or intersection):
• $f(a \sqcap b) = f(a) \sqcap f(b)$
• Possibly uninitialized vars:
– Each program variable corresponds to a dataflow fact. When that fact holds, the variable may be uninitialized.
– Transfer functions: a variable is possibly uninitialized if it was just declared without an initializer or if it is assigned an expression containing possibly uninitialized variables.
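The distributivity requirement can be checked by brute force on a small fact set. The sketch below is illustrative only: `assign`, `needs_all`, and `is_distributive` are hypothetical helper names, the meet is taken to be set union (as in the possibly-uninitialized problem), and `needs_all` is an invented counterexample, not something from the papers.

```python
from itertools import combinations

D = frozenset({"x", "y", "z"})

def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def assign(target, used):
    """`target = expr`: target may be uninitialized iff some operand in `used` may be."""
    used = frozenset(used)
    def f(facts):
        facts = frozenset(facts)
        return facts | {target} if facts & used else facts - {target}
    return f

def needs_all(target, used):
    """Hypothetical counterexample: target is flagged only if ALL operands are."""
    used = frozenset(used)
    def f(facts):
        facts = frozenset(facts)
        return facts | {target} if used <= facts else facts - {target}
    return f

def is_distributive(f, domain=D):
    # Check f(a ∪ b) == f(a) ∪ f(b) for every pair of subsets of the domain.
    subsets = powerset(domain)
    return all(f(a | b) == f(a) | f(b) for a in subsets for b in subsets)

print(is_distributive(assign("y", {"x", "y"})))     # True
print(is_distributive(needs_all("z", {"x", "y"})))  # False
```

The second function fails because the fact for `z` depends on two input facts jointly, which is exactly what a per-fact edge encoding cannot express.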
![Page 5: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/5.jpg)
Simple Example
int z;
int main (void) {
    int x, y = 0; /* {x, z} */
    y = y + x;    /* {x, y, z} */
    z = 0;        /* {x, y} */
}
• D = {x, y, z}, domain/range of transfer functions is the power set of D ($2^D$)
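The annotations in the example can be reproduced by applying one transfer function per statement. This is a minimal sketch under two assumptions: the helper names `declare` and `assign` are invented here, and, following the slide's annotations, the global `z` is treated as possibly uninitialized on entry to `main`.

```python
def declare(var):
    # A declaration without an initializer makes the variable possibly uninitialized.
    return lambda facts: facts | {var}

def assign(target, used):
    # The target is possibly uninitialized iff some operand it reads is.
    def f(facts):
        return facts | {target} if facts & set(used) else facts - {target}
    return f

program = [
    ("int x, y = 0;", [declare("x"), assign("y", [])]),
    ("y = y + x;", [assign("y", ["x", "y"])]),
    ("z = 0;", [assign("z", [])]),
]

facts = {"z"}  # assumption: the global z starts out possibly uninitialized
trace = []
for text, transfers in program:
    for f in transfers:
        facts = f(facts)
    trace.append((text, sorted(facts)))
    print(f"{text:15} {sorted(facts)}")
```

Each printed set matches the comment beside the corresponding statement in the example.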
![Page 6: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/6.jpg)
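How do we set up and solve IFDS problems?
• Inputs to the algorithm:
– Exploded supergraph (next couple of slides)
• Outputs from the algorithm:
– meet-over-all-realizable-paths solution:
• $\mathrm{MRP}_n = \sqcap_{q \in \mathrm{Rpaths}(start_{main},\, n)}\ pf_q(\top)$, where $pf_q$ is the composition of the transfer functions along path $q$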
![Page 7: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/7.jpg)
The Supergraph
![Page 8: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/8.jpg)
Representation Relations
• Each dataflow function $f$ is converted to a representation relation, which is drawn as a graph with $2|D| + 2$ nodes
– $|D|$ input nodes, one for each dataflow fact, plus the node Λ (or 0), which corresponds to the empty set.
– $|D|$ output nodes plus the node Λ
– There is always an edge from input Λ to output Λ, and an edge from input Λ to output node $d_2$ if $d_2 \in f(\emptyset)$
– There is an edge from input node $d_1$ to output node $d_2$ if $d_2 \in f(\{d_1\})$ and $d_2 \notin f(\emptyset)$
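A representation relation can be computed directly from a distributive function by probing it with the empty set and with each singleton. The sketch below assumes facts are strings and uses the invented names `representation_relation` and `interpret`; the two sample functions model `int x;` and `y = y + x;` from the running example.

```python
ZERO = "Λ"          # the distinguished node standing for the empty set
D = ["x", "y", "z"]

def representation_relation(f, facts=D):
    base = f(frozenset())                      # facts generated from nothing
    rel = {(ZERO, ZERO)}
    rel |= {(ZERO, d2) for d2 in base}
    rel |= {(d1, d2) for d1 in facts
            for d2 in f(frozenset({d1})) if d2 not in base}
    return rel

def interpret(rel):
    """Recover the function: follow edges from Λ and from every fact in S."""
    def f(S):
        sources = set(S) | {ZERO}
        return frozenset(d2 for d1, d2 in rel if d1 in sources and d2 != ZERO)
    return f

declare_x = lambda S: S | {"x"}
assign_y = lambda S: S | {"y"} if S & {"x", "y"} else S - {"y"}

rel_decl = representation_relation(declare_x)
rel_assign = representation_relation(assign_y)
print(sorted(rel_decl))
print(sorted(rel_assign))
```

Note that `rel_decl` has no edge from `x` to `x`: the fact is already produced from Λ, so the extra edge would be redundant.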
![Page 9: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/9.jpg)
More Representation Relations
• (a) and (b) show representation relations for two functions (nodes $s_{main}$ and $n_1$)
• (c) and (d) show two ways to compose these relations
– (d) illustrates the need for the Λ node in each relation
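Composing two relations is a relational join on the shared middle nodes. The sketch below is a hypothetical illustration, not the figure from the slide: it composes the relations for `int x;` and `y = y + x;`, once with the Λ node and once with every Λ edge stripped, to show what is lost without it.

```python
ZERO = "Λ"

def compose(r1, r2):
    # (a, c) is in the result when some middle node b links a to c.
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

def apply(rel, S, use_zero=True):
    sources = set(S) | ({ZERO} if use_zero else set())
    return {d2 for d1, d2 in rel if d1 in sources and d2 != ZERO}

rel_decl = {(ZERO, ZERO), (ZERO, "x"), ("y", "y"), ("z", "z")}
rel_assign = {(ZERO, ZERO), ("x", "x"), ("x", "y"), ("y", "y"), ("z", "z")}

with_zero = compose(rel_decl, rel_assign)
strip = lambda rel: {(a, b) for a, b in rel if ZERO not in (a, b)}
without_zero = compose(strip(rel_decl), strip(rel_assign))

print(sorted(apply(with_zero, set())))                     # ['x', 'y']
print(sorted(apply(without_zero, set(), use_zero=False)))  # []
```

Starting from the empty set, the two statements really do leave `x` and `y` possibly uninitialized; only the version that keeps Λ records this, because the fact for `x` is generated from nothing.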
![Page 10: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/10.jpg)
Exploding the Supergraph
![Page 11: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/11.jpg)
CFL Reachability
• Want to solve the dataflow problem with a reachability query on the exploded supergraph.
• Not all paths in G# are valid, though. Must match calls w/returns.
• Insight: context-sensitivity = matching parens; language of matching parens is a CFL
![Page 12: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/12.jpg)
Context-Sensitivity = CFL
• Assign a unique index to each callsite, define a CFL of matching calls and returns.
• Suppose we have two call-sites to function P(), which we label i and k
– $(_i\ (_k\ )_k\ )_i$ is a valid path
– $(_i\ (_k\ )_k$ is a valid path
– $(_i\ (_k\ )_i$ is not
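The three cases above can be checked with a stack. This is a minimal sketch with an invented encoding: a path is a list of `(kind, site)` pairs, where `kind` is `"call"`, `"ret"`, or anything else for an intraprocedural edge. Unmatched calls are allowed (the path may end inside a callee), but every return must match the most recent open call.

```python
def is_realizable(path):
    stack = []
    for kind, site in path:
        if kind == "call":
            stack.append(site)
        elif kind == "ret":
            # A return must close the innermost pending call site.
            if not stack or stack.pop() != site:
                return False
    return True

print(is_realizable([("call", "i"), ("call", "k"), ("ret", "k"), ("ret", "i")]))  # True
print(is_realizable([("call", "i"), ("call", "k"), ("ret", "k")]))                # True
print(is_realizable([("call", "i"), ("call", "k"), ("ret", "i")]))                # False
```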
![Page 13: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/13.jpg)
Reachability Algorithm
• Dynamic programming is the key
– Start at the entry point to the program. Follow the edges in G#, recording what dataflow facts we can reach.
– At a procedure call, follow the call. To avoid re-doing any work, though, maintain a cache of edges that summarize pieces of the computation.
– Summary edges record the results of an entire procedure: they start at a callsite and end at the corresponding return-site.
– Path edges record the suffix of a valid path.
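A compact worklist version of this idea is sketched below. It is a simplified reading of the tabulation approach, not the papers' exact pseudocode: the encoding of the exploded supergraph as a dict, the node names, and the tiny two-call program (`main` calls `P` at `c1`, sets the global `g = 0`, then calls `P` again at `c2`) are all assumptions made for illustration.

```python
from collections import defaultdict, deque

ZERO = "Λ"

def tabulate(edges, proc_of, start, exit_node, calls, main):
    s0 = (start[main], ZERO)
    path = {(s0, s0)}            # path edges: (<s_p, d1>, <n, d2>)
    summary = defaultdict(set)   # <call, d> -> {<return-site, d'>}
    work = deque(path)

    def propagate(edge):
        if edge not in path:
            path.add(edge)
            work.append(edge)

    while work:
        src, (n, d2) = work.popleft()
        succs = edges.get((n, d2), ())
        if n in calls:                               # call node
            callee, _ = calls[n]
            for m, d3 in succs:
                if m == start[callee]:
                    propagate(((m, d3), (m, d3)))    # enter the callee
                else:
                    propagate((src, (m, d3)))        # call-to-return-site edge
            for tgt in summary[(n, d2)]:             # reuse cached results
                propagate((src, tgt))
        elif n == exit_node[proc_of[n]]:             # exit node: build summaries
            for c, (callee, ret_site) in calls.items():
                if callee != proc_of[n]:
                    continue
                entries = [cd for cd, out in edges.items() if cd[0] == c and src in out]
                for cd in entries:
                    for t in (t for t in succs if t[0] == ret_site):
                        if t not in summary[cd]:
                            summary[cd].add(t)
                            for a, b in list(path):
                                if b == cd:
                                    propagate((a, t))
        else:
            for tgt in succs:
                propagate((src, tgt))
    result = defaultdict(set)
    for _, (n, d) in path:
        if d != ZERO:
            result[n].add(d)
    return result

def plain_reach(edges, root):
    """Ordinary reachability, ignoring call/return matching."""
    seen, todo = {root}, [root]
    while todo:
        for nxt in edges.get(todo.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

edges = {
    ("sm", ZERO): {("c1", ZERO), ("c1", "g")},   # g starts possibly uninitialized
    ("c1", ZERO): {("sP", ZERO), ("r1", ZERO)},
    ("c1", "g"): {("sP", "g")},
    ("sP", ZERO): {("eP", ZERO)},
    ("sP", "g"): {("eP", "g")},
    ("eP", ZERO): {("r1", ZERO), ("r2", ZERO)},
    ("eP", "g"): {("r1", "g"), ("r2", "g")},
    ("r1", ZERO): {("c2", ZERO)},                # g = 0 kills the fact g
    ("c2", ZERO): {("sP", ZERO), ("r2", ZERO)},
    ("c2", "g"): {("sP", "g")},
    ("r2", ZERO): {("em", ZERO)},
    ("r2", "g"): {("em", "g")},
}
proc_of = {"sm": "main", "c1": "main", "r1": "main", "c2": "main",
           "r2": "main", "em": "main", "sP": "P", "eP": "P"}
start = {"main": "sm", "P": "sP"}
exit_node = {"main": "em", "P": "eP"}
calls = {"c1": ("P", "r1"), "c2": ("P", "r2")}

facts_at = tabulate(edges, proc_of, start, exit_node, calls, "main")
naive = plain_reach(edges, ("sm", ZERO))
print(sorted(facts_at["r1"]), sorted(facts_at["r2"]), ("r2", "g") in naive)
```

In this toy program, plain reachability reports `g` at `r2` by entering `P` from `c1` and returning to `r2`; the tabulation run does not, because the summary edge for `c2` is only used when a path edge actually reaches `<c2, g>`.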
![Page 14: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/14.jpg)
Dynamic Programming Details
![Page 15: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/15.jpg)
Complexity
• Worst case for general CFL reachability is cubic in the number of nodes in the graph
• Can do better for dataflow analysis: $O(ED^3)$ for any distributive problem, $O(Call \cdot D^3 + hED^2)$ for h-sparse problems
– possibly uninitialized variables is 2-sparse when aliasing is ignored: a variable’s status as initialized or uninitialized can only affect itself and one other variable (if it is assigned to that variable)
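To get a feel for the gap between the two bounds, the expressions can be evaluated for concrete sizes. The numbers below are made up for illustration (they describe no measured program), and the functions return only the dominant terms with constant factors ignored.

```python
def distributive_bound(E, D):
    # O(E * D^3): any distributive problem
    return E * D**3

def sparse_bound(call, E, D, h):
    # O(Call * D^3 + h * E * D^2): h-sparse problems
    return call * D**3 + h * E * D**2

# Hypothetical sizes: 10,000 supergraph edges, 100 facts, 500 call sites, h = 2.
general = distributive_bound(E=10_000, D=100)
sparse = sparse_bound(call=500, E=10_000, D=100, h=2)
print(general, sparse, round(general / sparse, 1))
```

For these sizes the sparse bound is smaller by roughly a factor of 14; the advantage grows as the number of facts grows relative to the number of call sites.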
![Page 16: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/16.jpg)
Other Applications
• Interprocedural slicing
– identify all pieces of a program relevant to a particular statement
• Shape Analysis
– For any DAG data structure, determines a superset of the possible shapes for that data structure.
– Each dataflow fact corresponds to a single possible shape.
– Problem: infinite number of shapes. Solution is to define shape at program point q in terms of shape at previous program points.
– The ILPS paper has an example of shape analysis of a linked list.
![Page 17: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/17.jpg)
The other papers
• “Demand Interprocedural Dataflow Analysis”
– Horwitz, Reps, Sagiv, FSE 1995
• “Demand-driven Computation of Interprocedural Data Flow”
– Duesterwald, Gupta, Soffa, POPL 1995
• Provide two possible frameworks for transforming any IFDS analysis into a demand-driven analysis
![Page 18: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/18.jpg)
Steps to Demand-driven analysis
• Define problem in the IFDS framework
• Reverse the flow functions, or reverse the flow edges
• Start with initial query < d, n >
• Propagate the query backwards until solved
![Page 19: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/19.jpg)
Reversing dataflow
• In Duesterwald et al., the dataflow problem is specified with flow functions
– Reverse the functions
• For CFL problems, the problem is represented as a set of edges
– Just reverse the edges
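For the edge-based formulation, a query can be answered by flipping every exploded-graph edge and searching backwards from the queried node. The sketch below is intraprocedural only (no call/return matching) and uses an invented encoding of the running example: `n1`, `n2`, `n3` are the points after `int x, y = 0;`, `y = y + x;`, and `z = 0;`.

```python
from collections import defaultdict, deque

ZERO = "Λ"

edges = [
    (("n0", ZERO), ("n1", ZERO)), (("n0", ZERO), ("n1", "x")), (("n0", ZERO), ("n1", "z")),
    (("n1", ZERO), ("n2", ZERO)), (("n1", "x"), ("n2", "x")), (("n1", "x"), ("n2", "y")),
    (("n1", "y"), ("n2", "y")), (("n1", "z"), ("n2", "z")),
    (("n2", ZERO), ("n3", ZERO)), (("n2", "x"), ("n3", "x")), (("n2", "y"), ("n3", "y")),
]

def reverse(edges):
    back = defaultdict(set)
    for src, dst in edges:
        back[dst].add(src)
    return back

def query(back, fact, node, root=("n0", ZERO)):
    """Does `fact` hold at `node`? True iff the root is backward-reachable."""
    seen = {(node, fact)}
    work = deque(seen)
    while work:
        cur = work.popleft()
        if cur == root:
            return True          # early termination: the query is answered
        for prev in back.get(cur, ()):
            if prev not in seen:
                seen.add(prev)
                work.append(prev)
    return False

back = reverse(edges)
print(query(back, "y", "n3"), query(back, "z", "n3"), query(back, "z", "n2"))
```

Only the part of the graph that can influence the queried fact is visited, which is where the savings over exhaustive analysis come from.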
![Page 20: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/20.jpg)
Example: CCP
Notation
• $x$ – set of dataflow facts
• $x_w$ – dataflow fact for variable w
• $f_n(x)_w$ – transfer fn for variable w at node n
• $[w = c]$ – set of dataflow facts, where the fact for variable w equals c
![Page 21: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/21.jpg)
Query Algorithm
• Worklist holds the set of outstanding queries
• While not empty, remove a query
• Propagate backwards one node in the flowgraph
• For a function call, create a backwards summary for that function and apply that
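The worklist loop can be sketched with reversed flow functions for a single procedure. To keep it short, the sketch uses the possibly-uninitialized problem rather than CCP and omits the function-call step (backwards summaries); the statement encoding `(kind, target, used)` and the name `may_be_uninit` are assumptions made here, not notation from the papers.

```python
from collections import deque

def may_be_uninit(preds, stmts, var, node, entry_uninit=frozenset()):
    """Query: may `var` be uninitialized just after `node`?"""
    work = deque([(var, node)])
    seen = {(var, node)}
    while work:
        v, n = work.popleft()
        kind, target, used = stmts[n]
        if kind == "entry":
            if v in entry_uninit:
                return True
            continue
        if kind == "decl" and v == target:
            return True                      # early termination
        # Reverse flow function: which facts before n imply v after n?
        wanted = set(used) if (kind == "assign" and v == target) else {v}
        for p in preds[n]:
            for w in wanted:
                if (w, p) not in seen:
                    seen.add((w, p))
                    work.append((w, p))
    return False

stmts = {
    0: ("entry", None, ()),
    1: ("decl", "x", ()),            # int x;
    2: ("assign", "y", ()),          # y = 0;
    3: ("assign", "y", ("x", "y")),  # y = y + x;
    4: ("assign", "z", ()),          # z = 0;
}
preds = {1: [0], 2: [1], 3: [2], 4: [3]}

print(may_be_uninit(preds, stmts, "y", 4, {"z"}))
print(may_be_uninit(preds, stmts, "z", 4, {"z"}))
print(may_be_uninit(preds, stmts, "z", 3, {"z"}))
```

An assignment from a constant reverses to the empty set of queries, so a query for its target is dropped at that node and contributes nothing further.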
![Page 22: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/22.jpg)
Query Propagation
More notation
• $r_p$ – entry node for procedure p
• m, n – normal nodes
• $f_m$ – reverse dataflow fn for node m
• $N_{call}$ – all nodes that are callsites
• call(m) – the procedure called at node m
• $(r_p, e_p)$ – summary fn for procedure p
![Page 23: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/23.jpg)
Backwards edge propagation
![Page 24: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/24.jpg)
Query Algorithm Efficiency
• Optimizations: function summaries, early termination, query result cache
• In the worst case, it’s the same as exhaustive analysis
• Some problems work better than others for demand-driven analysis.
– Depends how much information you need to answer queries, or how many queries need to be made.
![Page 25: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/25.jpg)
Conclusions
• Demand-driven analysis is a powerful idea
• Saves time and space, but in the worst case it’s no better than exhaustive analysis
• Only works for distributive problems
• Two approaches for demand-driven analysis are equivalent
![Page 26: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability](https://reader033.fdocuments.us/reader033/viewer/2022052414/56812d87550346895d9299a4/html5/thumbnails/26.jpg)
Discussion
• Are these algorithms generally applicable?
• Are they fast?
– No evidence in the papers, but the answer is yes (see ESP in a couple of weeks)
• Why are they efficient (beyond the complexity guarantee)?
• Is it always cheap to compute the exploded supergraph?
– How can an imprecise alias analysis influence this step and the overall performance of the algorithm?