CRPC
Interprocedural Compilation: Algorithms and Applications
Ken Kennedy
Center for Research on Parallel Computation
Rice University
1. Why Should You Care?
2. Why Is It Hard?
3. Solution Methods
   - an example: linear-time flow-insensitive analysis
4. Whole-Program Management
5. New Compiler Architectures
http://www.cs.rice.edu/~ken/Presentations/Interprocedural.pdf
Why Should You Care?
• Performance is important
  - even in Java
• Machines are becoming more complicated
  - performance penalties for mistakes are much larger
  - examples: memory hierarchy management, automatic parallelization, insertion of communication
• Languages are becoming more complex
  - Fortran 90, C++, Java, HPF are more complex to compile
  - examples: type analysis in C++ and Java, distribution analysis in HPF
Interprocedural Distribution Analysis in HPF
• HPF allows distributions to be inherited from callers
  - support for libraries
SUBROUTINE FOO(A,N)
REAL A(N)
!HPF$ INHERIT A
...
A(2:N-1) = (A(1:N-2) + A(3:N))/2.0
What is the distribution of A here?
Impact of Inlining in Java
Programming in the Future
• Challenges
  - programming is hard
  - professional programmers are in short supply
  - high performance will continue to be important
• A Solution: Make the End User a Programmer
  - professional programmers develop components
  - users integrate components using:
    problem-solving environments (PSEs)
    scripting languages (possibly graphical)
    examples: Tcl, Visual Basic, AVS, Khoros
• Compilation for High Performance
  - translate scripts and components to common intermediate language
  - optimize the resulting program using interprocedural methods
Script-Based Programming System
[Diagram: Script, Program Component (two boxes), Translation System, Intermediate Code, Whole-System Compiler, Portable IC Compiler, Native IC Compiler, Target Machine 1, Target Machine 2, Target Machine 3]
Why Is It Hard?
• Problems can be large
  - in some cases, thousands of procedures and call sites
  - simply reading all program components may be too expensive to do in any single step
    fear of recompiling "the world"
• Compilation time must be manageable
  - compilation algorithms should be linear or near-linear in the size of the whole program
  - recompilation after small change should not take time proportional to the size of the entire program
• Information may be difficult to collect
  - when do you know the composition of the entire program?
MOD and REF
COMMON X, Y
...
DO I = 1, N
s0  CALL S
    X(I) = X(I) + Y(I)
ENDDO
Does this vectorize?
Yes, if X∉REF(s0) and X∉MOD(s0) and Y∉MOD(s0)
REF: "reference" side effect set
MOD: "modification" side effect set
ALIAS
SUBROUTINE S(A,X,N)
REAL A(*), X, Y
COMMON Y
DO 100 I = 1, N
s0  X = X + Y*A(I)
100 CONTINUE
END
Can the scalars in the loop be kept in a register?
But what if the call is:
CALL S(A,Y,N)
Register possible only if Y∉ALIAS(S,X)
USE
DO I = 1, N
s0  CALL S(T,A)
    T = X(I)*C
    A(I) = T + B(I)
ENDDO
Is there an upwards exposed use?
If not (T∉USE(s0)), we may parallelize the loop as follows, assuming T is not live on exit:
PARALLEL DO I = 1, N
  LOCAL t
  CALL S(t,A)
  t = X(I)*C
  A(I) = t + B(I)
ENDDO
One copy per iteration: "Privatization"
KILL
DO I = 1, N
s0  CALL INIT(T,I)
    T = T + B(I)
    A(I) = A(I) + T
ENDDO
Can we privatize T?
Yes, if there is no upwards exposed use in INIT and there is an assignment to T on every path through INIT.
T∉USE(s0) and T∈KILL(s0)
Constant Propagation
SUBROUTINE S(A,B,N,IS,I1)
REAL A(*), B(*)
DO I = 0, N-1
10  A(IS*I + I1) = A(IS*I + I1) + B(I+1)
ENDDO
END
Vectorizable or a reduction?
The code is vectorizable if the compiler can show that the value of IS is a constant not equal to zero (e.g., IS = 1).
IS∈CONST(S) and valin(IS,S) = 1
Problem Classification
• May vs Must (Barth 78)
  - May: MOD
  - Must: KILL
  - not a deep distinction: the complement of a May problem is a Must problem
• Flow Sensitive vs Flow Insensitive (Banning 79)
  - Insensitive: MOD
  - Sensitive: KILL
  - a deep distinction
• Propagation vs Side Effect
  - based on the direction of data flow
  - Side Effect: MOD
  - Propagation: CONST
Problem Classification Matrix
              Propagation   Side Effect
Insensitive   ALIAS (1)     MOD, REF
Sensitive     CONST         USE, KILL

(1) with no pointer variables
Call Graph Construction
SUBROUTINE S(X,P)
s0  CALL P(X)
RETURN
END
What can be called here?
Answer: any procedure in CALL(s0)
To build the call graph, insert an edge between the node for procedure S and the node for each procedure P in CALL(s), for each call site s in S.
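The edge-insertion rule above can be sketched in Python. This is only an illustration, assuming CALL(s) has already been computed for every call site. The `call_sites` and `call_set` dictionaries, the procedure names, and the site labels are hypothetical inputs, not taken from the slides.

```python
from collections import defaultdict

def build_call_graph(call_sites, call_set):
    """Return {caller: set of callees}.

    call_sites: {procedure: list of call-site labels inside that procedure}
    call_set:   {call-site label: set of procedures that may be invoked there},
                i.e. CALL(s), assumed to be precomputed.
    """
    graph = defaultdict(set)
    for proc, sites in call_sites.items():
        graph[proc]  # touch the key so every procedure gets a node
        for s in sites:
            for callee in call_set[s]:
                graph[proc].add(callee)
    return dict(graph)

# Hypothetical program: S calls its procedure parameter P at site s0,
# and P may be bound to either F or G.
call_sites = {"MAIN": ["m0", "m1"], "S": ["s0"], "F": [], "G": []}
call_set = {"m0": {"S"}, "m1": {"S"}, "s0": {"F", "G"}}
graph = build_call_graph(call_sites, call_set)
```

The hard part in practice is computing CALL(s) when procedures are passed as parameters; the sketch takes that result as given.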
Interprocedural Type Analysis
class Foo {
    private int x = 0;
    public void inc() {
        x++;
    }
    public void useGoo(Goo goo) {
        for (int i = 0; i < 10; i++) {
            goo.dec();
            x--;
            inc();
        }
    }
}
The Goo used here was instantiated as an object of which class?
This is a direct analog of flow-sensitive call graph construction
Some Results
• Flow Insensitive Problems
  - precise solutions in O((N+E)V) (Cooper and Kennedy 84, 88, 89)
    N nodes and E edges in call graph
    V variables in the program
• Flow Sensitive Problems
  - precise solutions intractable (Myers 80)
  - approximate solutions in near-linear time (Callahan, Cooper, Kennedy, Torczon 86; Callahan 88; numerous others)
• Call Graph Construction
  - precise solutions in exponential time (Ryder 79; Callahan, Carle, Hall, Kennedy 90)
    manageable in practice
  - approximate solutions in linear time (Hall, Kennedy 92)
A Sample Problem: MOD
• Subdivide the problem into ones that can be managed
[Diagram of subproblems: MOD, Alias-Free MOD, ALIAS Analysis, ALIAS Integration, Parameter Propagation, Reachability, Pair Propagation, Mapping Analysis]
Solving Alias-Free MOD
GMOD(p): the set of variables that may be modified as a side effect due to the invocation of procedure p.

procedure p(x)
  ...
  x = expr       direct assignment: x∈IMOD(p)
  ...
  call q(x)      indirect assignment: x mapped to y∈GMOD(q)
end

$$\mathrm{GMOD}(p) = \mathrm{IMOD}(p) \cup \bigcup_{s=(p,q)} \{\, z \mid z \text{ maps to } y \in \mathrm{GMOD}(q) \,\}$$

View it as a data-flow problem on the call graph.
Note: MOD(s) is the set of variables that map to GMOD(p), where p is called at s
Bad News about GMOD
Problem is neither fast nor rapid, so fast data-flow methods cannot be used.

procedure foo(f0,f1,...,fn)
  ...
  f0 = expr
  ...
  call foo(f1,f2,...,fn,y)
end

What is in GMOD(foo)?
"shift-register effect"
Assumption: maximum number of parameters to any procedure does not grow with the size of the program, i.e., there is a constant upper bound.
Subdividing the Problem
$$\mathrm{GMOD}(p) = \mathrm{IMOD}^{+}(p) \cup \bigcup_{s=(p,q)} \bigl(\mathrm{GMOD}(q) \cap \overline{\mathrm{LOCAL}}\bigr)$$
Separate global variables from reference formal parameters.
IMOD+(p): the set of variables in p that are modified in p either directly or indirectly as a side effect of being passed by reference at a call site.
This can be solved by a reachability algorithm: a variable is in GMOD(p) if it is in IMOD+(p) or there is a call chain to q and it is in IMOD+(q).
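The GMOD equation can be illustrated with a small Python sketch. It uses plain fixed-point iteration, which is simpler than the linear-time reachability algorithm the slides refer to, so it shows what is computed rather than how fast. It assumes LOCAL means the variables local to the callee q; the `calls`, `imod_plus`, and `local` inputs and all variable names are hypothetical.

```python
def solve_gmod(calls, imod_plus, local):
    """Iterate GMOD(p) = IMOD+(p) | union over calls (p,q) of (GMOD(q) - LOCAL(q)).

    calls:     {procedure p: list of procedures q called from p}
    imod_plus: {procedure: IMOD+ set, assumed to be computed already}
    local:     {procedure: variables local to it (assumed meaning of LOCAL)}
    """
    gmod = {p: set(imod_plus[p]) for p in calls}
    changed = True
    while changed:
        changed = False
        for p, callees in calls.items():
            for q in callees:
                visible = gmod[q] - local[q]   # drop q's locals
                if not visible <= gmod[p]:
                    gmod[p] |= visible
                    changed = True
    return gmod

# Hypothetical program in which p and q are mutually recursive.
calls = {"main": ["p"], "p": ["q"], "q": ["p"]}
imod_plus = {"main": set(), "p": {"g1", "p.t"}, "q": {"g2", "q.f"}}
local = {"main": set(), "p": {"p.t"}, "q": {"q.f"}}
gmod = solve_gmod(calls, imod_plus, local)
```

Here the global g2, modified only in q, reaches GMOD(main) through the call chain main, p, q, while the locals p.t and q.f never leave their own procedures.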
Computing IMOD+
1. Initially let IMOD+(p) = IMOD(p) for each p in the program.
2. For each p in the program and for each formal parameter f of p, if f is in IMOD+(p) put f on the worklist W.
3. While W is non-empty, take an arbitrary formal parameter f of p from it.
   a. for each call site (q,p), if the variable x mapped to f at the call site is not in IMOD+(q), add it and,
   b. if x is a formal parameter of q, put it on W.
Linear because number of parameters is no more than a constant factor larger than the size of the call graph.
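The three steps above can be sketched as a Python worklist loop. The data layout is an assumption made for illustration: variable names are unique across the program (for example "p.x" for formal x of p), and each call site is a hypothetical tuple of caller, callee, and a dictionary binding callee formals to the caller's actual variables.

```python
from collections import defaultdict

def compute_imod_plus(imod, formals, call_sites):
    """Worklist computation of IMOD+.

    imod:       {procedure: set of variables directly modified in it}
    formals:    {procedure: set of its reference formal parameters}
    call_sites: list of (caller q, callee p, {formal f of p: actual x in q})
    """
    imod_plus = {p: set(vs) for p, vs in imod.items()}           # step 1

    # Index bindings by callee formal so step 3a finds them directly.
    bound_to = defaultdict(list)
    for q, p, binding in call_sites:
        for f, x in binding.items():
            bound_to[f].append((q, x))

    worklist = [f for p, fs in formals.items()
                for f in fs if f in imod_plus[p]]                # step 2
    while worklist:                                              # step 3
        f = worklist.pop()
        for q, x in bound_to[f]:
            if x not in imod_plus[q]:                            # step 3a
                imod_plus[q].add(x)
                if x in formals[q]:                              # step 3b
                    worklist.append(x)
    return imod_plus

# Hypothetical chain: main passes a to p, p passes its formal on to q,
# and q assigns to its formal directly.
imod = {"main": set(), "p": set(), "q": {"q.y"}}
formals = {"main": set(), "p": {"p.x"}, "q": {"q.y"}}
call_sites = [("main", "p", {"p.x": "a"}), ("p", "q", {"q.y": "p.x"})]
imod_plus = compute_imod_plus(imod, formals, call_sites)
```

Each formal enters the worklist at most once, because it is added only at the moment it first joins an IMOD+ set.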
Integrating Aliases
• Initially MOD contains direct modifications
  - must add to MOD(s) any variable that may be aliased to a variable in MOD(s) in the procedure containing s
• ALIAS(x,p):
  - every variable that may be aliased to x on entry to procedure p
  - can be computed in O((N+E)V) time
• How do we integrate ALIAS with MOD?
  - must be done in O((N+E)V) time for the whole program
  - is this possible?
    1. every MOD(s) can be O(V) in size
    2. every ALIAS(x,p) can be O(V) in size
    3. for each of the O(E) call sites in the program, must take the union of O(V) sets of size O(V) → O(EV²) time: TOO MUCH!
Integrating Aliases: The Trick
• Two Types of Variables in MOD(s):
  - global variables
    O(V) of these, but each can only be aliased to formal parameters of the procedure p containing s (at most a small constant m)
  - formal parameters of p
    up to m of these, but each can be aliased to O(V) global variables and other parameters
• Integrating Aliases:
  - At each call site s
    1. take the union of the sets of aliases (size m) of global variables in MOD(s) and add the result to MOD(s): O(mV) = O(V) time
    2. for each formal parameter f in MOD(s) add ALIAS(f,p) to MOD(s): O(mV) = O(V) time
  - Total Time = O(EV)
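The per-call-site step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `alias` is a hypothetical dictionary holding ALIAS(x,p) for the procedure p that contains s, and the variable names are invented.

```python
def integrate_aliases(mod_s, formals, alias):
    """Return MOD(s) extended with aliases, for a call site s inside p.

    mod_s:   alias-free MOD(s)
    formals: the (at most m) formal parameters of p
    alias:   {variable: ALIAS(variable, p)}; a missing entry means no aliases
    """
    result = set(mod_s)
    for x in mod_s:
        if x in formals:
            # Step 2: a formal may be aliased to many globals and other formals.
            result |= alias.get(x, set())
        else:
            # Step 1: a global can only be aliased to formals of p,
            # so it contributes at most m names.
            result |= alias.get(x, set()) & formals
    return result

formals = {"f1", "f2"}
alias = {"g": {"f1"}, "f2": {"h", "f1"}}
extended = integrate_aliases({"g", "f2"}, formals, alias)
unchanged = integrate_aliases({"k"}, formals, alias)
```

The two branches mirror the two variable types: globals each add at most m names, and there are at most m formals that each add O(V) names, which is where the O(V) bound per call site comes from.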
Parallelizing Loops with Calls
REAL A(100,100)
DO I = 1,N
s0  CALL XCOL(A,I)
ENDDO
Can this loop be parallelized?
Answer: yes, if different iterations of the loop access completely different subsections of array A.
Let W(A, s0, i) be the region written by XCOL on iteration i.
Let R(A, s0, j) be the region read by XCOL on iteration j.
Parallelizable if W(A, s0, i) ∩ R(A, s0, j) ≠ ∅ implies i=j
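The condition can be checked by brute force on a small example. The Python sketch below enumerates regions as explicit sets of (row, column) pairs, which a real compiler would not do; it would reason symbolically about sections. The behaviour attributed to XCOL here (touching only column I) is an assumption for illustration.

```python
from itertools import product

def column(j, rows=100):
    """All elements of column j of an array with the given number of rows."""
    return {(i, j) for i in range(1, rows + 1)}

def parallelizable(written, read, iterations):
    """True if W(i) and R(j) overlap only when i == j."""
    return all(
        i == j or not (written(i) & read(j))
        for i, j in product(iterations, repeat=2)
    )

iters = range(1, 11)
# Assumed behaviour: XCOL(A, I) reads and writes only column I.
independent = parallelizable(column, column, iters)
# Counter-example: a variant that also reads the previous column.
carried = parallelizable(column, lambda j: column(j) | column(j - 1), iters)
```

In the second case the column written on iteration 1 is read on iteration 2, so the test fails and the loop carries a dependence.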
Loops With Calls—Good News and Bad
• Good News
  - Algorithms for interprocedural analysis can be extended to work on sections that form a lattice
  - Finite descending chain property
• Bad News
  - complexity proportional to depth of lattice
  - bit-vector algorithms cannot be used
Regular Section Lattice
∅                           no access
A(I,J)  A(K,J)  A(I,L)      element
A(*,J)  A(I,*)              row or column
A(*,*)                      whole array
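Combining two sections in this lattice can be sketched in Python. This is a simplified model, assuming a section is a tuple of subscripts where each subscript is either a symbol or "*", and `None` stands for no access; bounded regular sections with ranges are not modelled, and the function name `merge` is hypothetical.

```python
def merge(a, b):
    """Smallest section in the lattice that covers both a and b."""
    if a is None:       # no access on one side: the other side is the answer
        return b
    if b is None:
        return a
    # Keep a subscript where both sides agree, otherwise widen it to "*".
    return tuple(x if x == y else "*" for x, y in zip(a, b))

col = merge(("I", "J"), ("K", "J"))      # two elements in the same column
row = merge(("I", "J"), ("I", "L"))      # two elements in the same row
whole = merge(col, row)                  # a column and a row
same = merge(None, ("I", "J"))           # nothing merged with an element
```

Each merge can only turn subscripts into "*", so a section of a two-dimensional array can be widened at most twice. That bound is the finite chain property, and it is also why the cost of the analysis depends on the depth of the lattice.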
Interprocedural Compilation Management
When do you know the whole program?
• at link time
  - many commercial systems now feature link-time interprocedural optimization steps
• at load time
  - Java JIT compilers do some interprocedural transformations
• at program definition time
  - requires some sort of program management environment
  - two key functions
    component import (information gathering)
    composition definition (information integration)
Program Management System
[Diagram: Program Compiler, Source Importer, Optimizing Compiler, Program Specifier; labels: Local Information, Program Composition, Interprocedural Analysis, Local and Global Inlining, Interprocedural Information]
Rn Environment
Recompilation Analysis
[Diagram: Program Compiler, Source Importer, Optimizing Compiler, Program Specifier; labels: Source Changes, File Changes, Determine Files to Recompile, Interprocedural Info Relied Upon, Inlining Info]
Compilation with Data
[Diagram: Program, Frequently Changing Data, Rarely Changing Data, Extended Optimizing Compiler, Object Program, Answers]
New Compiler Architecture
• Flexible Definition of Computation
  - Parameters
    program scheme
    subprogram source files (s1, s2, ..., sn)
    run history (r1, r2, ..., rk)
    data sets (d1, d2, ..., dm)
    target configuration
• Compilation = Partial Evaluation
  - may be several compilation steps
    information available at different times
• Program Management
  - Must decide when to back out of previous compilation decisions in response to change
  - Must decide when to invalidate certain inputs
    previous run histories
Trusted Compiler
[Diagram: Source from Company A, Source from Company B, Library Source from Vendor C, Trusted Compiler, Trusted Loader, Target Machine]
Source encrypted using public key for <compiler,machine>
Validated to compile on machine x
Summary
• Interprocedural compilation is becoming fundamental
  - essential to parallelization, inlining in object-oriented languages
• Solution technology has been developed
  - problem classification
    flow-sensitive vs insensitive, forward vs backward
  - fast algorithms
    linear time for flow-insensitive problems
    near linear approximations for flow-sensitive problems
• Methods can be extended to complex data structures
  - regular sections and bounded regular sections for arrays
• Program management systems will be required
  - no compilation-order dependences
  - recompilation analysis
  - inclusion of input data and run history