Optimizing Compilers for Modern Architectures
Preliminary Transformations
Chapter 4 of Allen and Kennedy
Overview
• Why do we need this?
— Requirements of dependence testing
– Stride 1
– Normalized loop
– Linear subscripts
– Subscripts composed of functions of loop induction variables
— Higher dependence test accuracy
— Easier implementation of dependence tests
An Example
INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      KI = KI + INC
      U(KI) = U(KI) + W(J)
   ENDDO
   S(I) = U(KI)
ENDDO
• Programmer-optimized code
— Confusing to smart compilers
An Example
INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      ! Deleted: KI = KI + INC
      U(KI + J*INC) = U(KI + J*INC) + W(J)
   ENDDO
   KI = KI + 100 * INC
   S(I) = U(KI)
ENDDO
• Applying induction-variable substitution
— Replace references to the auxiliary induction variable (AIV) with functions of the loop index
An Example
INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      U(KI + (I-1)*100*INC + J*INC) = U(KI + (I-1)*100*INC + J*INC) + W(J)
   ENDDO
   ! Deleted: KI = KI + 100 * INC
   S(I) = U(KI + I * (100*INC))
ENDDO
KI = KI + 100 * 100 * INC
• Second application of IVS
— Remove all references to KI
An Example
INC = 2
! Deleted: KI = 0
DO I = 1, 100
   DO J = 1, 100
      U(I*200 + J*2 - 200) = U(I*200 + J*2 - 200) + W(J)
   ENDDO
   S(I) = U(I*200)
ENDDO
KI = 20000
• Applying Constant Propagation
— Substitute the constants
An Example
DO I = 1, 100
   DO J = 1, 100
      U(I*200 + J*2 - 200) = U(I*200 + J*2 - 200) + W(J)
   ENDDO
   S(I) = U(I*200)
ENDDO
• Applying Dead Code Elimination
— Removes all unused code
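A quick way to gain confidence in the chain of transformations in this example is to run the first and last versions side by side. The sketch below is a Python transliteration, not part of the original slides; the function names and the test data are assumptions made for illustration, and index 0 of each list is left unused to mimic Fortran's 1-based arrays.

```python
def original(U, W):
    """The programmer-optimized nest, with KI as a running subscript."""
    U, S = list(U), [0] * 101          # index 0 unused: arrays start at 1
    INC, KI = 2, 0
    for I in range(1, 101):
        for J in range(1, 101):
            KI = KI + INC
            U[KI] = U[KI] + W[J]
        S[I] = U[KI]
    return U, S


def transformed(U, W):
    """The nest after IVS, constant propagation and dead code elimination."""
    U, S = list(U), [0] * 101
    for I in range(1, 101):
        for J in range(1, 101):
            U[I * 200 + J * 2 - 200] = U[I * 200 + J * 2 - 200] + W[J]
        S[I] = U[I * 200]
    return U, S


# Arbitrary (hypothetical) input data; the largest subscript used is 20000.
U0 = list(range(20001))
W0 = [0] + [3 * j + 1 for j in range(1, 101)]
same = original(U0, W0) == transformed(U0, W0)
```

Both versions touch the same elements of U in the same order, which is why the results agree; only the form of the subscript differs.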
Information Requirements
• Transformations need knowledge
— Loop stride
— Loop-invariant quantities
— Constant-valued assignments
— Usage of variables
Loop Normalization
• Lower Bound 1 with Stride 1
• To make dependence testing as simple as possible
• Serves as an information-gathering phase
Loop Normalization
• Algorithm
Procedure normalizeLoop(L0);
   i = a unique compiler-generated LIV
   S1: replace the loop header for L0
          DO I = L, U, S
       with the adjusted loop header
          DO i = 1, (U - L + S) / S;
   S2: replace each reference to I within the loop by
          i * S - S + L;
   S3: insert a finalization assignment
          I = i * S - S + L;
       immediately after the end of the loop;
end normalizeLoop;
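The effect of steps S1 to S3 can be checked with a small simulation. This is a minimal Python sketch rather than a compiler pass: `run_original`, `run_normalized` and `trunc_div` are hypothetical helper names. It assumes Fortran-style integer division that truncates toward zero and a trip count clamped at zero, neither of which is spelled out on the slide.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero (assumed Fortran behaviour)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def run_original(L, U, S):
    """Values taken by I in DO I = L, U, S, plus the value of I after the loop."""
    values, I = [], L
    while (S > 0 and I <= U) or (S < 0 and I >= U):
        values.append(I)
        I += S
    return values, I


def run_normalized(L, U, S):
    """Same loop after normalization: DO i = 1, (U - L + S) / S."""
    trips = max(0, trunc_div(U - L + S, S))     # assumption: zero-trip loops clamp to 0
    values = [i * S - S + L for i in range(1, trips + 1)]   # step S2
    i = trips + 1                   # the loop variable ends one past the upper bound
    return values, i * S - S + L    # step S3: finalization assignment
```

For example, `run_original(3, 20, 4)` and `run_normalized(3, 20, 4)` both visit 3, 7, 11, 15, 19 and leave I at 23.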
Loop Normalization
• Caveat— Un-normalized:
DO I = 1, M
DO J = I, N
A(J, I) = A(J, I - 1) + 5
ENDDO
ENDDO
Has a direction vector of (<,=)
— Normalized:
DO I = 1, M
DO J = 1, N - I + 1
A(J + I - 1, I) = A(J + I - 1, I - 1) + 5
ENDDO
ENDDO
Has a direction vector of (<,>)
Loop Normalization
• Caveat
— Consider interchanging loops
– (<,=) becomes (=,<): OK
– (<,>) becomes (>,<): Problem
– Handled by another transformation
— What if the step size is symbolic?
– Prohibits dependence testing
– Workaround: use step size 1
– Less precise, but allows dependence testing
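The direction vectors quoted for the un-normalized and normalized nests, and the interchange check, can be reproduced by brute force on small bounds. The following Python sketch is illustrative only: it enumerates true (write-then-read) dependences for M = 4 and N = 6, and the helper names `direction_vectors` and `interchange_is_legal` are assumptions, not names from the book.

```python
def direction_vectors(iterations, written, read):
    """Direction vectors of true dependences, found by enumeration."""
    iterations = list(iterations)
    writer = {written(*it): it for it in iterations}   # each element is written once here
    vectors = set()
    for sink in iterations:
        source = writer.get(read(*sink))
        if source is not None and source < sink:       # tuple order = execution order
            vectors.add(tuple("<" if a < b else "=" if a == b else ">"
                              for a, b in zip(source, sink)))
    return vectors


def interchange_is_legal(vector):
    """After swapping the two loops, the leftmost non-'=' entry must be '<'."""
    for d in (vector[1], vector[0]):
        if d != "=":
            return d == "<"
    return True


M, N = 4, 6
unnormalized = direction_vectors(
    ((I, J) for I in range(1, M + 1) for J in range(I, N + 1)),
    written=lambda I, J: (J, I),
    read=lambda I, J: (J, I - 1))
normalized = direction_vectors(
    ((I, J) for I in range(1, M + 1) for J in range(1, N - I + 2)),
    written=lambda I, J: (J + I - 1, I),
    read=lambda I, J: (J + I - 1, I - 1))
```

Here `unnormalized` contains only (<,=), which survives interchange, while `normalized` contains only (<,>), which does not.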
Definition-use Graph
• Traditionally called Definition-use Chains
• Provides a map of variable usage
• Heavily used by the transformations
Definition-use Graph
• Definition-use graph is a graph that contains an edge from each definition point in the program to every possible use of the variable at run time
• uses(b): the set of all variables used within the block b that have no prior definitions within the block
• defsout(b): the set of all definitions within block b that are not killed within the block
• killed(b): the set of all definitions that define variables killed by other definitions within block b
$$\mathit{reaches}(b) = \bigcup_{p \in P(b)} \Bigl( \mathit{defsout}(p) \cup \bigl( \mathit{reaches}(p) \cap \neg\,\mathit{killed}(p) \bigr) \Bigr)$$
Definition-use Graph
• Computing reaches for one block b may immediately change all other reaches sets, including that of b itself, since reaches(b) is an input to other reaches equations
• Achieving correct solutions requires solving all the individual equations simultaneously
— There is a workaround for this
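One common way around solving the equations simultaneously is to iterate them until nothing changes. The Python sketch below illustrates that fixed-point approach on a three-block flow graph; it is not necessarily the algorithm used in the book. The block names, definition labels (d1, d2, d3) and the `compute_reaches` function are all made up for illustration, and `preds[b]` stands for the predecessors of block b.

```python
def compute_reaches(preds, defsout, killed):
    """Iterate the reaches equations until no set changes."""
    reaches = {b: set() for b in preds}
    changed = True
    while changed:
        changed = False
        for b in preds:
            new = set()
            for p in preds[b]:
                # defsout(p) union (reaches(p) minus killed(p))
                new |= defsout[p] | (reaches[p] - killed[p])
            if new != reaches[b]:
                reaches[b] = new
                changed = True
    return reaches


# B1: d1 defines x, d2 defines y.  B2 (a loop body): d3 redefines x.  B3: exit.
preds = {"B1": [], "B2": ["B1", "B2"], "B3": ["B2"]}
defsout = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": set()}
killed = {"B1": {"d3"}, "B2": {"d1"}, "B3": set()}
result = compute_reaches(preds, defsout, killed)
```

Because sets only grow from one pass to the next and are bounded by the total number of definitions, the loop terminates.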
Dead Code Elimination
• Removes all dead code
• What is dead code?
— Code whose results are never used in any "useful statements"
• What are useful statements?
— Are they simply output statements?
— Output statements, input statements, control flow statements, and the statements they require
• Makes code cleaner
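The idea of keeping only useful statements and the statements they require can be sketched for straight-line code with a single backward pass. This is a simplified Python illustration under stated assumptions: there is no control flow, each statement defines at most one variable, and the `Stmt` record and `eliminate_dead_code` function are hypothetical. A general implementation would instead follow definition-use edges.

```python
from collections import namedtuple

# text: source text, target: variable defined (or None),
# uses: variables read, useful: True for I/O and control-flow statements
Stmt = namedtuple("Stmt", "text target uses useful")


def eliminate_dead_code(stmts):
    """Keep useful statements and the definitions they transitively need."""
    live, kept = set(), []
    for s in reversed(stmts):
        if s.useful or s.target in live:
            kept.append(s)
            live.discard(s.target)   # this definition satisfies later uses
            live |= s.uses           # its operands are now needed
    return [s.text for s in reversed(kept)]


program = [
    Stmt("READ X", "X", set(), True),
    Stmt("INC = 2", "INC", set(), False),
    Stmt("T = X * 200", "T", {"X"}, False),
    Stmt("KI = 20000", "KI", set(), False),
    Stmt("PRINT T", None, {"T"}, True),
]
cleaned = eliminate_dead_code(program)
```

In this made-up program the assignments to INC and KI feed no useful statement, so they are dropped.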
Constant Propagation
• Replace all variables that have constant values at runtime with those constant values
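A minimal sketch of the idea for straight-line assignments is given below. It uses Python's standard `ast` module to substitute known constants into expressions; the `propagate` function, the `(target, expression)` statement format and the variable IDX are assumptions for illustration. Real constant propagation works over the definition-use graph and handles control flow, which this sketch ignores.

```python
import ast


class _Substitute(ast.NodeTransformer):
    """Replace names with known constant values."""

    def __init__(self, consts):
        self.consts = consts

    def visit_Name(self, node):
        if node.id in self.consts:
            return ast.copy_location(ast.Constant(self.consts[node.id]), node)
        return node


def propagate(stmts):
    """stmts: list of (target, expression string); returns the rewritten list."""
    consts, out = {}, []
    for target, expr in stmts:
        tree = _Substitute(consts).visit(ast.parse(expr, mode="eval"))
        ast.fix_missing_locations(tree)
        if any(isinstance(n, ast.Name) for n in ast.walk(tree)):
            consts.pop(target, None)            # target is no longer a known constant
            out.append((target, ast.unparse(tree)))
        else:
            # No variables left: fold the expression to a single value.
            value = eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}})
            consts[target] = value
            out.append((target, str(value)))
    return out


rewritten = propagate([
    ("INC", "2"),
    ("KI", "0"),
    ("IDX", "KI + (I - 1) * 100 * INC + J * INC"),
])
```

After the call, the expression for IDX mentions only I and J, mirroring the step in the running example where INC and KI disappear from the subscripts.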
Static Single-Assignment
• Reduces the number of definition-use edges
• Improves performance of algorithms
Static Single-Assignment
• Example
Forward Expression Substitution
• Example
DO I = 1, 100
   K = I + 2
   A(K) = A(K) + 5
ENDDO
becomes
DO I = 1, 100
   A(I+2) = A(I+2) + 5
ENDDO
Forward Expression Substitution
• Need definition-use edges and control flow analysis
• Need to guarantee that the definition is always executed on a loop iteration before the statement into which it is substituted
• Data structure to find out if a statement S is in loop L
— Test whether the level-K loop containing S is equal to L
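The membership test in the last bullet can be sketched as follows, taking K to be the nesting level of L. The `Loop` class and both helper functions are hypothetical; the sketch assumes each statement records its innermost enclosing loop and each loop records its parent and nesting level.

```python
class Loop:
    """A loop in the nest tree; the outermost loops have level 1."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.level = 1 if parent is None else parent.level + 1


def loop_at_level(innermost, k):
    """The level-k loop containing a statement whose innermost loop is given."""
    loop = innermost
    while loop is not None and loop.level > k:
        loop = loop.parent
    return loop if loop is not None and loop.level == k else None


def statement_in_loop(innermost, L):
    """True if the statement (identified by its innermost loop) lies inside L."""
    return loop_at_level(innermost, L.level) is L


outer = Loop("I")
inner = Loop("J", parent=outer)
other = Loop("K")
```

With this nest, a statement in the J loop is inside both `inner` and `outer`, but not inside the unrelated loop `other`.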
Induction Variable Substitution
• Definition: an auxiliary induction variable in a DO loop headed by DO I = LB, UB, S is any variable that can be correctly expressed as cexpr * I + iexpr_L at every location L where it is used in the loop, where cexpr and iexpr_L are expressions that do not vary in the loop, although different locations in the loop may require substitution of different values of iexpr_L
Induction Variable Substitution
• Example:
DO I = 1, N
A(I) = B(K) + 1
K = K + 4
…
D(K) = D(K) + A(I)
ENDDO
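In this example K fits the definition of an auxiliary induction variable with cexpr = 4: the use in B(K) sees K0 + 4*(I - 1) and the use in D(K) sees K0 + 4*I, where K0 is the value of K on loop entry. The Python sketch below checks those closed forms against a direct simulation. It assumes the elided statements do not modify K, and the function names are illustrative.

```python
def original_subscripts(K, N):
    """Subscripts of B and D per iteration, tracking K as the loop does."""
    pairs = []
    for I in range(1, N + 1):
        b = K            # A(I) = B(K) + 1
        K = K + 4
        d = K            # D(K) = D(K) + A(I)
        pairs.append((b, d))
    return pairs, K


def substituted_subscripts(K0, N):
    """The same subscripts written as functions of the loop index I."""
    pairs = [(K0 + 4 * (I - 1), K0 + 4 * I) for I in range(1, N + 1)]
    return pairs, K0 + 4 * N     # value of K after the loop


before = original_subscripts(10, 5)
after = substituted_subscripts(10, 5)
```

The two uses share the same multiplier but need different loop-invariant offsets, which is exactly the case the definition allows for.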
Induction Variable Substitution
• Induction Variable Recognition
Induction Variable Substitution
• More complex example
DO I = 1, N, 2
K = K + 1
A(K) = A(K) + 1
K = K + 1
A(K) = A(K) + 1
ENDDO
• An alternative strategy is to recognize region invariance
DO I = 1, N, 2
A(K+1) = A(K+1) + 1
K = K+1 + 1
A(K) = A(K) + 1
ENDDO
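The two forms above can be compared by listing the subscripts of A they touch. The Python sketch below also includes a fully substituted variant; its closed forms (K0 + I and K0 + I + 1, for a loop running I = 1, 3, 5, ...) are derived here for illustration and do not appear on the slide. All function names are hypothetical.

```python
def original(K, N):
    subs = []
    for I in range(1, N + 1, 2):
        K = K + 1
        subs.append(K)           # first A(K)
        K = K + 1
        subs.append(K)           # second A(K)
    return subs, K


def region_invariant(K, N):
    subs = []
    for I in range(1, N + 1, 2):
        subs.append(K + 1)       # A(K+1): K is unchanged in this region
        K = K + 1 + 1
        subs.append(K)
    return subs, K


def substituted(K0, N):
    subs = []
    for I in range(1, N + 1, 2):
        subs += [K0 + I, K0 + I + 1]
    trips = (N + 1) // 2         # trip count of DO I = 1, N, 2 for N >= 0
    return subs, K0 + 2 * trips


results = [f(10, 7) for f in (original, region_invariant, substituted)]
```

All three variants visit A(11) through A(18) in order and leave K at 18.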
Induction Variable Substitution
• Driver
IVSub without loop normalization
DO I = L, U, S
K = K + N
… = A(K)
ENDDO
DO I = L, U, S
… = A(K + (I - L + S) / S * N)
ENDDO
K = K + (U - L + S) / S * N
IVSub without loop normalization
• Problem:
— Inefficient code
— Nonlinear subscript
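The substituted subscript for the un-normalized loop can be checked against a direct simulation. This Python sketch assumes a positive stride and U >= L so that Python's floor division matches Fortran's truncating division; the function names and sample values are made up for illustration.

```python
def original(K, L, U, S, N):
    """DO I = L, U, S with K incremented by N each iteration (S > 0 assumed)."""
    subs = []
    for I in range(L, U + 1, S):
        K = K + N
        subs.append(K)           # subscript used in A(K)
    return subs, K


def substituted(K, L, U, S, N):
    """Subscript written in terms of I, as in the un-normalized form."""
    subs = [K + (I - L + S) // S * N for I in range(L, U + 1, S)]
    return subs, K + (U - L + S) // S * N


before = original(100, 3, 20, 4, 7)
after = substituted(100, 3, 20, 4, 7)
```

The formula is correct, but every subscript evaluation now carries a division by S, and when S is symbolic the expression is no longer a linear function of I that a dependence test can work with.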
IVSub with Loop Normalization
I = 1
DO i = 1, (U-L+S)/S, 1
K = K + N
… = A (K)
I = I + 1
ENDDO
IVSub with Loop Normalization
I = 1
DO i = 1, (U - L + S) / S, 1
… = A (K + i * N)
ENDDO
K = K + (U - L + S) / S * N
I = I + (U - L + S) / S
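With the loop normalized first, the substituted subscript K + i * N is linear in the new loop variable i. The Python sketch below checks this for K only and leaves out the bookkeeping for I. It assumes S > 0 and U >= L, and the function names and sample values are illustrative.

```python
def normalized_original(K, L, U, S, N):
    """Normalized loop with K still updated inside the body."""
    subs = []
    for i in range(1, (U - L + S) // S + 1):
        K = K + N
        subs.append(K)
    return subs, K


def normalized_substituted(K, L, U, S, N):
    """Normalized loop after induction-variable substitution."""
    trips = (U - L + S) // S
    subs = [K + i * N for i in range(1, trips + 1)]
    return subs, K + trips * N


before = normalized_original(100, 3, 20, 4, 7)
after = normalized_substituted(100, 3, 20, 4, 7)
# Consecutive subscripts differ by the constant N, i.e. the subscript is linear in i.
steps = {b - a for a, b in zip(after[0], after[0][1:])}
```

The division by S now appears only in the loop bound and the final update of K, not in the subscript itself.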
Summary
• Transformations to put more subscripts into standard form
— Loop Normalization
— Constant Propagation
— Induction Variable Substitution
• Do loop normalization before induction-variable substitution
• Leave optimizations to compilers