Interprocedural Compilation: Algorithms and Applications

Ken Kennedy
Center for Research on Parallel Computation (CRPC)
Rice University

1. Why Should You Care?
2. Why Is It Hard?
3. Solution Methods (an example: linear-time flow-insensitive analysis)
4. Whole-Program Management
5. New Compiler Architectures

http://www.cs.rice.edu/~ken/Presentations/Interprocedural.pdf


Why Should You Care?

• Performance is important
  - even in Java

• Machines are becoming more complicated
  - performance penalties for mistakes are much larger
  - examples:
      memory hierarchy management
      automatic parallelization
      insertion of communication

• Languages are becoming more complex
  - Fortran 90, C++, Java, HPF are more complex to compile
  - examples:
      type analysis in C++ and Java
      distribution analysis in HPF


Interprocedural Distribution Analysis in HPF

• HPF allows distributions to be inherited from callers
  - support for libraries

      SUBROUTINE FOO(A,N)
      REAL A(N)
!HPF$ INHERIT A
      ...
      A(2:N-1) = (A(1:N-2) + A(3:N))/2.0

What is the distribution of A here?


Impact of Inlining in Java


Programming in the Future

• Challenges
  - programming is hard
  - professional programmers are in short supply
  - high performance will continue to be important

• A Solution: Make the End User a Programmer
  - professional programmers develop components
  - users integrate components using:
      problem-solving environments (PSEs)
      scripting languages (possibly graphical)
      examples: Tcl, Visual Basic, AVS, Khoros

• Compilation for High Performance
  - translate scripts and components to common intermediate language
  - optimize the resulting program using interprocedural methods


Script-Based Programming System

[Diagram; recoverable box labels: Script, Program Component (two instances), Translation System, Intermediate Code (IC), Whole-System Compiler, Portable IC Compiler, Native IC Compiler, Target Machine 1, Target Machine 2, Target Machine 3]


Why Is It Hard?

• Problems can be large
  - in some cases, thousands of procedures and call sites
  - simply reading all program components may be too expensive to do in any single step
      fear of recompiling "the world"

• Compilation time must be manageable
  - compilation algorithms should be linear or near-linear in the size of the whole program
  - recompilation after a small change should not take time proportional to the size of the entire program

• Information may be difficult to collect
  - when do you know the composition of the entire program?


MOD and REF

      COMMON X, Y
      ...
      DO I = 1, N
s0      CALL S
        X(I) = X(I) + Y(I)
      ENDDO

Does this vectorize?

Yes, if X∉REF(s0) and X∉MOD(s0) and Y∉MOD(s0)

REF: "reference" side effect set
MOD: "modification" side effect set


ALIAS

      SUBROUTINE S(A,X,N)
      REAL A(*), X, Y
      COMMON Y
      DO 100 I = 1, N
s0       X = X + Y*A(I)
100   CONTINUE
      END

Can X in s0 be kept in a register?

But what if the call is:

      CALL S(A,Y,N)

A register is possible only if Y∉ALIAS(S,X)


USE

      DO I = 1, N
s0      CALL S(T,A)
        T = X(I)*C
        A(I) = T + B(I)
      ENDDO

Is there an upwards exposed use?

If not, that is, if T∉USE(s0), we may parallelize the loop as follows, assuming T is not live on exit:

      PARALLEL DO I = 1, N
        LOCAL t
        CALL S(t,A)
        t = X(I)*C
        A(I) = t + B(I)
      ENDDO

LOCAL t gives one copy of t per iteration ("privatization").


KILL

      DO I = 1, N
s0      CALL INIT(T,I)
        T = T + B(I)
        A(I) = A(I) + T
      ENDDO

Can we privatize T?

Yes, if there is no upwards exposed use in INIT and there is an assignment to T on every path through INIT:

T∉USE(s0) and T∈KILL(s0)
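The USE and KILL conditions above can be illustrated on a toy model of INIT. This is only a sketch under a strong assumption: the procedure body is given as an explicit list of acyclic paths, each a list of ("use", v) or ("def", v) events. The function name `use_and_kill` and the sample paths are hypothetical, and real analyses work on a control-flow graph rather than enumerated paths.

```python
def use_and_kill(paths):
    """Return (USE, KILL) for a procedure given as a list of event paths.

    USE  : variables with an upward-exposed use on SOME path (a May problem).
    KILL : variables assigned on EVERY path (a Must problem).
    """
    use, kill = set(), None
    for path in paths:
        defined = set()
        for kind, v in path:
            if kind == "use" and v not in defined:
                use.add(v)            # read before any assignment on this path
            elif kind == "def":
                defined.add(v)
        # Intersect across paths: only variables assigned on all of them survive.
        kill = defined if kill is None else kill & defined
    return use, (kill or set())


def can_privatize(var, paths):
    """Apply the slide's test: var not in USE(s0) and var in KILL(s0)."""
    use, kill = use_and_kill(paths)
    return var not in use and var in kill


# Hypothetical INIT that assigns T on both of its paths.
init_ok = [[("use", "I"), ("def", "T")], [("def", "T")]]
# Hypothetical INIT that reads T on one path without assigning it.
init_bad = [[("def", "T")], [("use", "T")]]

print(can_privatize("T", init_ok))   # True
print(can_privatize("T", init_bad))  # False
```

Both sets depend on the order of events along each path, which is what makes USE and KILL flow-sensitive.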


Constant Propagation

      SUBROUTINE S(A,B,N,IS,I1)
      REAL A(*), B(*)
      DO I = 0, N-1
10      A(IS*I + I1) = A(IS*I + I1) + B(I+1)
      ENDDO
      END

Vectorizable or a reduction?

The code is vectorizable if the compiler can show that the value of IS is a constant not equal to zero (e.g., IS = 1).

IS∈CONST(S) and valin(IS,S) = 1
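One way to picture how a compiler might establish IS∈CONST(S) is to combine the values passed for IS at every call site of S. The sketch below assumes each call site passes either a known integer or an unknown value; the names `BOTTOM`, `meet`, `valin`, and `vectorizable` are hypothetical, and this simple scheme is an illustration rather than the specific algorithm behind the slide.

```python
BOTTOM = object()  # lattice value meaning "not a constant"


def meet(a, b):
    """Combine two lattice values; None means 'no call site seen yet'."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else BOTTOM


def valin(actuals):
    """Value of a formal on entry, given the actual passed at each call site."""
    result = None
    for v in actuals:
        result = meet(result, v)
    return result


def vectorizable(is_actuals):
    """The slide's condition: IS is a known constant and that constant is nonzero."""
    v = valin(is_actuals)
    return v is not None and v is not BOTTOM and v != 0


print(vectorizable([1, 1]))       # True: every caller passes IS = 1
print(vectorizable([1, 2]))       # False: callers disagree
print(vectorizable([0, 0]))       # False: a zero stride makes the loop a reduction
print(vectorizable([1, BOTTOM]))  # False: one caller passes an unknown value
```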


Problem Classification

• May vs Must (Barth 78)
  - May: MOD
  - Must: KILL
  - not a deep distinction: the complement of a May problem is a Must problem

• Flow Sensitive vs Flow Insensitive (Banning 79)
  - Insensitive: MOD
  - Sensitive: KILL
  - a deep distinction

• Propagation vs Side Effect
  - based on the direction of data flow
  - Side Effect: MOD
  - Propagation: CONST


Problem Classification Matrix

              Propagation    Side Effect
Insensitive   ALIAS (1)      MOD, REF
Sensitive     CONST          USE, KILL

(1) with no pointer variables


Call Graph Construction

      SUBROUTINE S(X,P)
s0    CALL P(X)
      RETURN
      END

What can be called here?

Answer: any procedure in CALL(s0)

To build the call graph, insert an edge between the node for procedure S and the node for each procedure P in CALL(s) for each call site s in S.
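The edge-insertion rule above can be sketched directly, assuming the CALL(s) sets have already been computed by some earlier analysis. The function name `build_call_graph`, the dictionary layout, and the sample program are hypothetical.

```python
from collections import defaultdict


def build_call_graph(call_sites, CALL):
    """Build {procedure: set of callees}.

    call_sites : {procedure: list of call-site ids in that procedure}
    CALL       : {call-site id: set of procedures that may be invoked there}
    """
    graph = defaultdict(set)
    for proc, sites in call_sites.items():
        graph[proc]                      # ensure every procedure has a node
        for s in sites:
            for callee in CALL[s]:
                graph[proc].add(callee)  # edge proc -> callee
                graph[callee]            # ensure the callee has a node too
    return dict(graph)


# Hypothetical program: MAIN calls S twice; inside S the procedure
# parameter P may be bound to either F or G.
call_sites = {"MAIN": ["m1", "m2"], "S": ["s0"], "F": [], "G": []}
CALL = {"m1": {"S"}, "m2": {"S"}, "s0": {"F", "G"}}

graph = build_call_graph(call_sites, CALL)
print(sorted(graph["S"]))  # ['F', 'G']
```

With procedure-valued parameters, the difficult part is computing CALL(s0) itself; once those sets exist, building the graph is a single pass over the call sites.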


Interprocedural Type Analysis

class Foo {
    private int x = 0;
    public void inc() {
        x++;
    }
    public void useGoo(Goo goo) {
        for (int i = 0; i < 10; i++) {
            goo.dec();
            x--;
            inc();
        }
    }
}

The Goo used here was instantiated as an object of which class?

This is a direct analog of flow-sensitive call graph construction


Some Results

• Flow Insensitive Problems
  - precise solutions in O((N+E)V) (Cooper and Kennedy 84, 88, 89)
      N nodes and E edges in call graph
      V variables in the program

• Flow Sensitive Problems
  - precise solutions intractable (Myers 80)
  - approximate solutions in near-linear time (Callahan, Cooper, Kennedy, Torczon 86; Callahan 88; numerous others)

• Call Graph Construction
  - precise solutions in exponential time (Ryder 79; Callahan, Carle, Hall, Kennedy 90)
      manageable in practice
  - approximate solutions in linear time (Hall, Kennedy 92)


A Sample Problem: MOD

• Subdivide the problem into ones that can be managed

[Diagram: decomposition of the MOD problem; recoverable labels: MOD, Alias-Free MOD, ALIAS Analysis, Parameter Propagation, Reachability, ALIAS Integration, Pair Propagation, Mapping Analysis]


Solving Alias-Free MOD

GMOD(p): the set of variables that may be modified as a side effect due to the invocation of procedure p.

      procedure p(x)
        ...
        x = expr          direct assignment: x∈IMOD(p)
        ...
        call q(x)         indirect assignment: x mapped to y∈GMOD(q)
      end

GMOD(p) = IMOD(p) ∪ ⋃_{s=(p,q)} {z such that z maps to y∈GMOD(q)}

View it as a data-flow problem on the call graph.

Note: MOD(s) is the set of variables that map to GMOD(p), where p is called at s
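The GMOD equation can be read as a fixed-point computation over the call sites. The sketch below is a plain round-robin iteration, not the linear-time method developed on the following slides. It assumes variable names are unique across the program, that globals map to themselves, and that each call site carries a binding from the callee's formals to the caller's variables; `solve_gmod` and the sample data are hypothetical.

```python
def solve_gmod(imod, calls, globals_):
    """Iterate GMOD(p) = IMOD(p) U {z : z maps to y in GMOD(q)} to a fixed point.

    imod     : {procedure: set of directly modified variables}
    calls    : list of (caller, callee, binding), binding = {formal: actual variable}
    globals_ : set of global variable names (a global maps to itself)
    """
    gmod = {p: set(vs) for p, vs in imod.items()}
    changed = True
    while changed:
        changed = False
        for caller, callee, binding in calls:
            for y in list(gmod[callee]):
                # A formal maps to its actual, a global to itself,
                # and a local of the callee maps to nothing.
                z = binding.get(y, y if y in globals_ else None)
                if z is not None and z not in gmod[caller]:
                    gmod[caller].add(z)
                    changed = True
    return gmod


# Hypothetical program: p(x) calls q(x); q(y) assigns y, global g, and local t.
imod = {"p": set(), "q": {"y", "g", "t"}}
calls = [("p", "q", {"y": "x"})]

gmod = solve_gmod(imod, calls, globals_={"g"})
print(sorted(gmod["p"]))  # ['g', 'x']
```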


Bad News about GMOD

Problem is neither fast nor rapid, so fast data-flow methods cannot be used.

      procedure foo(f0,f1,...,fn)
        ...
        f0 = expr
        ...
        call foo(f1,f2,...,fn,y)
      end

What is in GMOD(foo)? ("shift-register effect")

Assumption: maximum number of parameters to any procedure does not grow with the size of the program, i.e., there is a constant upper bound.


Subdividing the Problem

GMOD(p) = IMOD+(p) ∪ ⋃_{s=(p,q)} (GMOD(q) ∩ ¬LOCAL)

Separate global variables from reference formal parameters.

IMOD+(p): the set of variables in p that are modified in p either directly or indirectly as a side effect of being passed by reference at a call site.

This can be solved by a reachability algorithm: a variable is in GMOD(p) if it is in IMOD+(p) or there is a call chain from p to q and it is in IMOD+(q).
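The reachability view can be sketched as a graph search from each procedure. Two assumptions are made here: ¬LOCAL is modeled by keeping only global names from the procedures reached through calls, and IMOD+ is already available. Running a separate search per procedure, as this sketch does, is not linear; a linear-time version would share work across strongly connected components. The name `gmod_by_reachability` and the sample data are hypothetical.

```python
def gmod_by_reachability(imod_plus, call_graph, globals_):
    """GMOD(p) = IMOD+(p) plus every global in IMOD+(q) for q reachable from p.

    imod_plus  : {procedure: set of variables}
    call_graph : {procedure: set of callees}
    globals_   : set of global variable names
    """
    gmod = {}
    for p in call_graph:
        result = set(imod_plus[p])
        seen, stack = {p}, [p]
        while stack:
            q = stack.pop()
            # Only non-local names survive the return from a call.
            result |= imod_plus[q] & globals_
            for r in call_graph[q]:
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        gmod[p] = result
    return gmod


# Hypothetical program: main -> a, and a and b call each other.
call_graph = {"main": {"a"}, "a": {"b"}, "b": {"a"}}
imod_plus = {"main": set(), "a": {"f"}, "b": {"G", "t"}}

gmod = gmod_by_reachability(imod_plus, call_graph, globals_={"G"})
print(sorted(gmod["a"]))  # ['G', 'f']
```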


Computing IMOD+

1. Initially let IMOD+(p) = IMOD(p) for each p in the program.
2. For each p in the program and for each formal parameter f of p, if f is in IMOD+(p) put f on the worklist W.
3. While W is non-empty, take an arbitrary formal parameter f of p from it.
   a. for each call site (q,p), if the variable x mapped to f at the call site is not in IMOD+(q), add it and,
   b. if x is a formal parameter of q, put it on W.

Linear because number of parameters is no more than a constant factor larger than the size of the call graph.
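The three steps above translate almost line for line into a worklist loop. In this sketch each call site is a (caller, callee, binding) triple whose binding maps the callee's formals to the caller's variables; that representation, the name `compute_imod_plus`, and the sample program are assumptions made for illustration.

```python
from collections import defaultdict


def compute_imod_plus(imod, formals, call_sites):
    """Worklist computation of IMOD+ following steps 1-3.

    imod       : {procedure: set of directly modified variables}
    formals    : {procedure: list of formal parameter names}
    call_sites : list of (caller q, callee p, {formal of p: variable of q})
    """
    # Step 1: start from the direct modifications.
    imod_plus = {p: set(vs) for p, vs in imod.items()}

    # Index the call sites by callee so step 3a can find them quickly.
    sites_into = defaultdict(list)
    for q, p, binding in call_sites:
        sites_into[p].append((q, binding))

    # Step 2: seed the worklist with formals that are already modified.
    work = [(p, f) for p in formals for f in formals[p] if f in imod_plus[p]]

    # Step 3: propagate each modified formal back to the actual bound to it.
    while work:
        p, f = work.pop()
        for q, binding in sites_into[p]:
            x = binding.get(f)             # None if the actual is not a variable
            if x is not None and x not in imod_plus[q]:
                imod_plus[q].add(x)        # step 3a
                if x in formals[q]:
                    work.append((q, x))    # step 3b
    return imod_plus


# The shift-register procedure with three formals, plus a hypothetical caller.
formals = {"foo": ["f0", "f1", "f2"], "main": []}
imod = {"foo": {"f0"}, "main": set()}
call_sites = [
    ("foo", "foo", {"f0": "f1", "f1": "f2", "f2": "y"}),
    ("main", "foo", {"f0": "a", "f1": "b", "f2": "c"}),
]

result = compute_imod_plus(imod, formals, call_sites)
print(sorted(result["foo"]))   # ['f0', 'f1', 'f2', 'y']
print(sorted(result["main"]))  # ['a', 'b', 'c']
```

Each (procedure, formal) pair enters the worklist at most once, because it is pushed only when it is first added to IMOD+.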


Integrating Aliases

• Initially MOD contains direct modifications
  - must add to MOD(s) any variable that may be aliased to a variable in MOD(s) in the procedure containing s

• ALIAS(x,p):
  - every variable that may be aliased to x on entry to procedure p
  - can be computed in O((N+E)V) time

• How do we integrate ALIAS with MOD?
  - must be done in O((N+E)V) time for the whole program
  - is this possible?
      1. every MOD(s) can be O(V) in size
      2. every ALIAS(x,p) can be O(V) in size
      3. for each of the O(E) call sites in the program, must take the union of O(V) sets of size O(V) → O(EV²) time: TOO MUCH!


Integrating Aliases: The Trick

• Two Types of Variables in MOD(s):
  - global variables
      O(V) of these, but each can only be aliased to formal parameters of the procedure p containing s (at most a small constant m)
  - formal parameters of p
      up to m of these, but each can be aliased to O(V) global variables and other parameters

• Integrating Aliases:
  - At each call site s
      1. take the union of the sets of aliases (size m) of global variables in MOD(s) and add the result to MOD(s): O(mV) = O(V) time
      2. for each formal parameter f in MOD(s) add ALIAS(f,p) to MOD(s): O(mV) = O(V) time
  - Total Time = O(EV)
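The two integration steps can be sketched for a single call site as follows. The alias sets are assumed to be precomputed and stored in a dictionary keyed by (variable, procedure); the name `integrate_aliases` and the sample data, which reuse the X/Y aliasing from the earlier ALIAS slide, are hypothetical.

```python
def integrate_aliases(mod_s, p, formals, alias):
    """Add to MOD(s) every variable aliased to a member of MOD(s) in procedure p.

    mod_s   : alias-free MOD set for call site s, which lies in procedure p
    formals : {procedure: list of formal parameter names}
    alias   : {(variable, procedure): set of variables it may be aliased to}
    """
    fs = set(formals[p])
    result = set(mod_s)

    # Step 1: globals in MOD(s). There may be O(V) of them, but each alias
    # set contains only formals of p, so it has at most m members.
    for g in mod_s - fs:
        result |= alias.get((g, p), set())

    # Step 2: formals of p in MOD(s). There are at most m of them, but each
    # alias set may contain O(V) variables.
    for f in mod_s & fs:
        result |= alias.get((f, p), set())

    return result


# Inside S, formal X may be aliased to global Y (after CALL S(A,Y,N)).
formals = {"S": ["A", "X", "N"]}
alias = {("X", "S"): {"Y"}, ("Y", "S"): {"X"}}

print(sorted(integrate_aliases({"X"}, "S", formals, alias)))  # ['X', 'Y']
```

Both loops do O(mV) work per call site, which is where the O(EV) total on the slide comes from.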


Parallelizing Loops with Calls

      REAL A(100,100)
      DO I = 1,N
s0      CALL XCOL(A,I)
      ENDDO

Can this loop be parallelized?

Answer: yes, if different iterations of the loop access completely different subsections of array A.

Let W(A, s0, i) be the region written by XCOL on iteration i.

Let R(A, s0, j) be the region read by XCOL on iteration j.

Parallelizable if W(A, s0, i) ∩ R(A, s0, j) ≠ ∅ implies i=j
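The condition above can be checked by brute force when the regions are small enough to enumerate. This sketch models a region as an explicit set of (row, column) index pairs, which a compiler would replace with a symbolic section descriptor; `parallelizable`, `column`, and the two access patterns for XCOL are hypothetical.

```python
def parallelizable(written, read, iterations):
    """Check that W(i) and R(j) intersect only when i == j.

    written, read : functions from an iteration number to a set of (row, col) pairs
    iterations    : iterable of iteration numbers
    """
    its = list(iterations)
    for i in its:
        for j in its:
            if i != j and written(i) & read(j):
                return False   # iteration j reads what iteration i writes
    return True


def column(i, rows=100):
    """All elements of column i in an array with the given number of rows."""
    return {(r, i) for r in range(1, rows + 1)}


# XCOL touches only column I: iterations are independent.
print(parallelizable(column, column, range(1, 6)))  # True

# Hypothetical variant that also reads the previous column.
reads_prev = lambda j: column(j) | column(j - 1)
print(parallelizable(column, reads_prev, range(1, 6)))  # False
```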


Loops With Calls—Good News and Bad

• Good News
  - Algorithms for interprocedural analysis can be extended to work on sections that form a lattice
  - Finite descending chain property

• Bad News
  - complexity proportional to depth of lattice
  - bit-vector algorithms cannot be used


Regular Section Lattice

[Lattice diagram; levels from most precise to least precise:]

  no access:       ∅
  element:         A(I,J)   A(K,J)   A(I,L)
  row or column:   A(I,*)   A(*,J)
  whole array:     A(*,*)
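Combining two sections in this lattice can be sketched by comparing subscripts dimension by dimension. The representation is an assumption: a section is a tuple of subscript names with "*" for a whole dimension, and None stands for no access. The function name `merge` is hypothetical.

```python
def merge(a, b):
    """Smallest section in the lattice that covers both a and b."""
    if a is None:          # no access combined with b is just b
        return b
    if b is None:
        return a
    # Keep a subscript where both agree; otherwise widen that dimension.
    return tuple(x if x == y else "*" for x, y in zip(a, b))


print(merge(("I", "J"), ("K", "J")))  # ('*', 'J')  two elements of column J
print(merge(("I", "J"), ("I", "L")))  # ('I', '*')  two elements of row I
print(merge(("I", "*"), ("*", "J")))  # ('*', '*')  a row and a column
print(merge(None, ("I", "J")))        # ('I', 'J')
```

A section can only widen, and for a two-dimensional array it can widen at most three times, from ∅ down to A(*,*). That bound is the finite descending chain property mentioned on the previous slide.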


Interprocedural Compilation Management

When do you know the whole program?

• at link time
  - many commercial systems now feature link-time interprocedural optimization steps

• at load time
  - Java JIT compilers do some interprocedural transformations

• at program definition time
  - requires some sort of program management environment
  - two key functions
      component import (information gathering)
      composition definition (information integration)


Program Management System

[Diagram of the Rn Environment; recoverable labels: Source Importer, Program Specifier, Program Compiler, Optimizing Compiler, Local Information, Program Composition, Interprocedural Analysis, Local and Global Inlining, Interprocedural Information]


Recompilation Analysis

[Diagram; recoverable labels: Source Importer, Program Specifier, Program Compiler, Optimizing Compiler, Source Changes, File Changes, Determine Files to Recompile, Interprocedural Info Relied Upon, Inlining Info]
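One plausible reading of the "Determine Files to Recompile" step is sketched below. It rests on an assumption that the slide does not spell out: each file records the interprocedural facts it relied upon when last compiled, and a file is recompiled if it was edited or if any of those facts has since changed. The function name, the fact keys, and the file names are all hypothetical.

```python
def files_to_recompile(changed_files, relied_upon, current_facts):
    """Return the set of files that need recompilation.

    changed_files : files whose source was edited
    relied_upon   : {file: {fact key: value assumed at the last compilation}}
    current_facts : {fact key: value from the latest interprocedural analysis}
    """
    stale = set(changed_files)
    for f, facts in relied_upon.items():
        # A file is stale if any fact it depended on no longer holds.
        if any(current_facts.get(k) != v for k, v in facts.items()):
            stale.add(f)
    return stale


# Hypothetical state: a.f relied on MOD(s0) = {X}; b.f relied on IS = 1.
relied_upon = {
    "a.f": {("MOD", "s0"): frozenset({"X"})},
    "b.f": {("CONST", "IS"): 1},
}
current_facts = {("MOD", "s0"): frozenset({"X"}), ("CONST", "IS"): 2}

print(sorted(files_to_recompile({"c.f"}, relied_upon, current_facts)))  # ['b.f', 'c.f']
```

Under this scheme an edit to one file triggers recompilation of another only when a fact the other file actually used has changed, which fits the earlier goal that recompilation after a small change should not cost time proportional to the whole program.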


Compilation with Data

[Diagram; recoverable labels: Program, Frequently Changing Data, Rarely Changing Data, Extended Optimizing Compiler, Object Program, Answers]


New Compiler Architecture

• Flexible Definition of Computation
  - Parameters
      program scheme
      subprogram source files (s1, s2, ..., sn)
      run history (r1, r2, ..., rk)
      data sets (d1, d2, ..., dm)
      target configuration

• Compilation = Partial Evaluation
  - may be several compilation steps
      information available at different times

• Program Management
  - Must decide when to back out of previous compilation decisions in response to change
  - Must decide when to invalidate certain inputs
      previous run histories


Trusted Compiler

[Diagram; recoverable labels: Source from Company A, Source from Company B, Library Source from Vendor C, Trusted Compiler, Trusted Loader, Target Machine]

Annotations:
  - Source encrypted using public key for <compiler,machine>
  - Validated to compile on machine x


Summary

• Interprocedural compilation is becoming fundamental
  - essential to parallelization, inlining in object-oriented languages

• Solution technology has been developed
  - problem classification
      flow-sensitive vs insensitive, forward vs backward
  - fast algorithms
      linear time for flow-insensitive problems
      near linear approximations for flow-sensitive problems

• Methods can be extended to complex data structures
  - regular sections and bounded regular sections for arrays

• Program management systems will be required
  - no compilation-order dependences
  - recompilation analysis
  - inclusion of input data and run history