Optimizing Compilers for Modern Architectures

Preliminary Transformations

Chapter 4 of Allen and Kennedy

Overview

• Why do we need this?
  - Requirements of dependence testing
    - Stride 1
    - Normalized loop
    - Linear subscripts
    - Subscripts composed of functions of loop induction variables
  - Higher dependence test accuracy
  - Easier implementation of dependence tests

An Example

INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      KI = KI + INC
      U(KI) = U(KI) + W(J)
   ENDDO
   S(I) = U(KI)
ENDDO

• Programmer-optimized code
  - Confusing to smart compilers

An Example

INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      ! Deleted: KI = KI + INC
      U(KI + J*INC) = U(KI + J*INC) + W(J)
   ENDDO
   KI = KI + 100 * INC
   S(I) = U(KI)
ENDDO

• Applying induction-variable substitution (IVS)
  - Replaces references to an auxiliary induction variable (AIV) with functions of the loop index

An Example

INC = 2
KI = 0
DO I = 1, 100
   DO J = 1, 100
      U(KI + (I-1)*100*INC + J*INC) = U(KI + (I-1)*100*INC + J*INC) + W(J)
   ENDDO
   ! Deleted: KI = KI + 100 * INC
   S(I) = U(KI + I * (100*INC))
ENDDO
KI = KI + 100 * 100 * INC

• Second application of IVS
  - Removes all updates of KI from the loop nest

An Example

INC = 2
! Deleted: KI = 0
DO I = 1, 100
   DO J = 1, 100
      U(I*200 + J*2 - 200) = U(I*200 + J*2 - 200) + W(J)
   ENDDO
   S(I) = U(I*200)
ENDDO
KI = 20000

• Applying Constant Propagation
  - Substitutes the constants

An Example

DO I = 1, 100
   DO J = 1, 100
      U(I*200 + J*2 - 200) = U(I*200 + J*2 - 200) + W(J)
   ENDDO
   S(I) = U(I*200)
ENDDO

• Applying Dead Code Elimination
  - Removes all unused code
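As a sanity check on the running example, the sketch below simulates the first version of the loop nest and the final one in Python and compares the arrays they produce. The function names, the 1-based list layout (index 0 unused), and the array sizes are assumptions made for this illustration; they are not part of the slides.

```python
def original(w):
    """First version: KI is updated inside the inner loop."""
    u = [0] * 20001          # U(1..20000); index 0 is unused
    s = [0] * 101            # S(1..100)
    inc, ki = 2, 0
    for i in range(1, 101):
        for j in range(1, 101):
            ki += inc
            u[ki] += w[j]
        s[i] = u[ki]
    return u, s


def transformed(w):
    """Final version: subscripts are linear functions of I and J."""
    u = [0] * 20001
    s = [0] * 101
    for i in range(1, 101):
        for j in range(1, 101):
            u[i * 200 + j * 2 - 200] += w[j]
        s[i] = u[i * 200]
    return u, s


w = list(range(101))         # W(1..100); index 0 is unused
same = original(w) == transformed(w)
print(same)
```

Both versions touch the same elements of U in the same order, which is why the comparison holds for any contents of W.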

Information Requirements

• Transformations need knowledge of
  - Loop stride
  - Loop-invariant quantities
  - Constant-valued assignments
  - Usage of variables

Loop Normalization

• Lower Bound 1 with Stride 1

• To make dependence testing as simple as possible

• Serves as an information-gathering phase

Loop Normalization

• Algorithm

procedure normalizeLoop(L0);
   i = a unique compiler-generated LIV
   S1: replace the loop header for L0
          DO I = L, U, S
       with the adjusted loop header
          DO i = 1, (U - L + S) / S;
   S2: replace each reference to I within the loop by
          i * S - S + L;
   S3: insert a finalization assignment
          I = i * S - S + L;
       immediately after the end of the loop;
end normalizeLoop;
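The index mapping in steps S1 and S2 can be checked numerically. The following is a minimal sketch, assuming a positive integer stride and integer bounds; the function names are made up for this illustration, and the `max(..., 0)` guard models a zero-trip loop, which the pseudocode above does not spell out.

```python
def original_index_values(lower, upper, stride):
    """Values taken by I in DO I = lower, upper, stride (stride > 0 assumed)."""
    return list(range(lower, upper + 1, stride))


def normalized_index_values(lower, upper, stride):
    """Values of i*S - S + L for the normalized loop DO i = 1, (U - L + S) / S."""
    trips = max((upper - lower + stride) // stride, 0)
    return [i * stride - stride + lower for i in range(1, trips + 1)]


for bounds in [(1, 100, 1), (3, 20, 4), (5, 4, 2), (2, 2, 7)]:
    assert original_index_values(*bounds) == normalized_index_values(*bounds)
```

The replacement expression reproduces every value the original index takes, so references to I inside the body keep their meaning after normalization.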

Loop Normalization

• Caveat
  - Un-normalized:

      DO I = 1, M
         DO J = I, N
            A(J, I) = A(J, I - 1) + 5
         ENDDO
      ENDDO

    Has a direction vector of (<,=)

  - Normalized:

      DO I = 1, M
         DO J = 1, N - I + 1
            A(J + I - 1, I) = A(J + I - 1, I - 1) + 5
         ENDDO
      ENDDO

    Has a direction vector of (<,>)
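The two direction vectors can be reproduced by brute force. The sketch below enumerates the iterations of each loop nest for small, arbitrarily chosen bounds (M = 4, N = 6), pairs every read with the earlier iteration that wrote the same element, and collects the resulting direction vectors. The helper names are assumptions for this illustration, and the code relies on each array element being written exactly once, which holds for both loops.

```python
def direction(source, sink):
    """Direction vector from the source iteration to the sink iteration."""
    return tuple("<" if s < d else ">" if s > d else "=" for s, d in zip(source, sink))


def direction_vectors(iterations, write_elem, read_elem):
    """Collect direction vectors of all write-then-read dependences."""
    order = {it: pos for pos, it in enumerate(iterations)}   # execution order
    writer = {write_elem(*it): it for it in iterations}      # element -> writing iteration
    found = set()
    for sink in iterations:
        source = writer.get(read_elem(*sink))
        if source is not None and order[source] < order[sink]:
            found.add(direction(source, sink))
    return found


M, N = 4, 6
unnormalized = [(i, j) for i in range(1, M + 1) for j in range(i, N + 1)]
normalized = [(i, j) for i in range(1, M + 1) for j in range(1, N - i + 2)]

dv_unnormalized = direction_vectors(
    unnormalized, lambda i, j: (j, i), lambda i, j: (j, i - 1))
dv_normalized = direction_vectors(
    normalized, lambda i, j: (j + i - 1, i), lambda i, j: (j + i - 1, i - 1))

print(dv_unnormalized)   # {('<', '=')}
print(dv_normalized)     # {('<', '>')}
```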

Loop Normalization

• Caveat
  - Consider interchanging the loops
    - (<,=) becomes (=,<): OK
    - (<,>) becomes (>,<): problem
    - Handled by another transformation
  - What if the step size is symbolic?
    - Prohibits dependence testing
    - Workaround: use step size 1
    - Less precise, but allows dependence testing
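The interchange problem comes down to a simple test: after swapping the two components of a direction vector, the leftmost entry that is not "=" must be "<". A small sketch of that test follows; the function names are hypothetical and the vectors are plain tuples of strings.

```python
def is_valid(direction_vector):
    """A dependence is preserved if the leftmost non-'=' entry is '<'."""
    for d in direction_vector:
        if d == "<":
            return True
        if d == ">":
            return False
    return True   # all '=': loop-independent dependence


def interchange(direction_vector, a=0, b=1):
    """Direction vector after interchanging loops a and b."""
    dv = list(direction_vector)
    dv[a], dv[b] = dv[b], dv[a]
    return tuple(dv)


for dv in [("<", "="), ("<", ">")]:
    swapped = interchange(dv)
    print(dv, "->", swapped, "legal" if is_valid(swapped) else "illegal")
```

For the un-normalized loop the swapped vector is (=,<), which is still valid; for the normalized loop it is (>,<), which would reverse the dependence.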

Definition-use Graph

• Traditionally called Definition-use Chains

• Provides a map of variable usage

• Heavily used by the transformations

Definition-use Graph

• The definition-use graph is a graph that contains an edge from each definition point in the program to every possible use of the variable at run time

• uses(b): the set of all variables used within the block b that have no prior definitions within the block

• defsout(b): the set of all definitions within block b that are not killed within the block

• killed(b): the set of all definitions that define variables killed by other definitions within block b

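\[
\mathit{reaches}(b) = \bigcup_{p \in P(b)} \Bigl( \mathit{defsout}(p) \cup \bigl( \mathit{reaches}(p) \cap \neg\,\mathit{killed}(p) \bigr) \Bigr)
\]

where P(b) is the set of predecessors of block b in the control flow graph.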

Definition-use Graph

• Computing reaches(b) for one block b may change the reaches sets of other blocks, including b itself, because reaches(b) is an input to the other reaches equations

• Achieving a correct solution requires solving all the individual equations simultaneously
  - There is a workaround for this
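One common way to solve such mutually dependent equations, assumed here rather than taken from the slides, is to iterate until nothing changes. The sketch below applies reaches(b) = union over predecessors p of defsout(p) ∪ (reaches(p) − killed(p)) to a tiny three-block flow graph. The block names, the definition labels d1 and d2, and the function name are all made up for this illustration.

```python
def solve_reaches(preds, defsout, killed):
    """Fixed-point iteration over the reaches equations."""
    reaches = {b: set() for b in preds}
    changed = True
    while changed:
        changed = False
        for b in preds:
            new = set()
            for p in preds[b]:
                new |= defsout[p] | (reaches[p] - killed[p])
            if new != reaches[b]:
                reaches[b] = new
                changed = True
    return reaches


# entry: d1: X = 0      loop: d2: X = X + 1 (with a back edge)      exit: uses X
preds = {"entry": [], "loop": ["entry", "loop"], "exit": ["loop"]}
defsout = {"entry": {"d1"}, "loop": {"d2"}, "exit": set()}
killed = {"entry": {"d2"}, "loop": {"d1"}, "exit": set()}

reaches = solve_reaches(preds, defsout, killed)
print(reaches["loop"])   # both d1 and d2 reach the top of the loop
print(reaches["exit"])   # only d2 reaches the exit block
```

The sets only grow from one pass to the next and are bounded by the total number of definitions, so the iteration terminates.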

Dead Code Elimination

• Removes all dead code

• What is dead code?
  - Code whose results are never used in any "useful statements"

• What are useful statements?
  - Are they simply output statements?
  - Output statements, input statements, control flow statements, and the statements they require

• Makes code cleaner
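The idea of keeping only useful statements and the statements they require can be sketched for straight-line code. This is a simplified backward sweep under stated assumptions, not the algorithm from the missing slide: each statement is a hypothetical `(target, used_variables, is_useful)` triple, there is no control flow, and a statement is kept if it is marked useful or if its target is needed by a statement that was kept later in the sequence.

```python
def eliminate_dead_code(statements):
    """Keep useful statements and everything they transitively require."""
    needed = set()        # variables whose values are still required
    kept = []
    for target, uses, is_useful in reversed(statements):
        if is_useful or target in needed:
            needed.discard(target)   # this statement supplies the value
            needed.update(uses)      # and requires its own operands
            kept.append((target, uses, is_useful))
    kept.reverse()
    return kept


program = [
    ("INC", [], False),        # INC = 2        (never used afterwards)
    ("X", [], False),          # X = 5
    ("KI", ["INC"], False),    # KI = 100 * INC (never used afterwards)
    ("T", ["X"], False),       # T = X + 1
    (None, ["T"], True),       # PRINT T        (output statement)
]

for statement in eliminate_dead_code(program):
    print(statement)
```

Note that INC is removed even though KI uses it, because KI itself is dead; a forward pass that only looked for "unused" variables would need several rounds to find this.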

Constant Propagation

• Replace all variables that have constant values at runtime with those constant values
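A minimal sketch of the idea for straight-line assignments is shown below. It is an illustration under stated assumptions, not the algorithm from the slides: statements are hypothetical `(target, expression)` pairs written in Python expression syntax, only `+`, `-` and `*` are folded, and control flow is ignored. It uses the standard `ast` module (`ast.unparse` needs Python 3.9 or later).

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class Folder(ast.NodeTransformer):
    """Replace known-constant names and fold constant subexpressions."""

    def __init__(self, constants):
        self.constants = constants

    def visit_Name(self, node):
        if node.id in self.constants:
            return ast.Constant(value=self.constants[node.id])
        return node

    def visit_BinOp(self, node):
        self.generic_visit(node)                    # fold operands first
        fold = FOLDABLE.get(type(node.op))
        if (fold and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            return ast.Constant(value=fold(node.left.value, node.right.value))
        return node


def propagate_constants(assignments):
    constants, result = {}, []
    for target, expression in assignments:
        tree = Folder(constants).visit(ast.parse(expression, mode="eval"))
        if isinstance(tree.body, ast.Constant):
            constants[target] = tree.body.value     # target is now a known constant
        else:
            constants.pop(target, None)             # target is no longer constant
        result.append((target, ast.unparse(tree.body)))
    return result


program = [("INC", "2"), ("KI", "0"), ("T", "I * INC"), ("KI", "KI + 100 * 100 * INC")]
print(propagate_constants(program))
```

The last statement folds to `KI = 20000`, matching the value that appears in the running example after constant propagation.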

Static Single-Assignment

• Reduces the number of definition-use edges

• Improves the performance of algorithms that traverse those edges
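The edge reduction is easiest to see in a scenario that is assumed here for illustration, since the slide's own example did not survive: several definitions of one variable on different branches all reach a join point, and several uses follow the join. Without SSA every definition is linked to every use; with SSA all definitions feed a single φ-function at the join, and the uses link only to that φ.

```python
def edges_without_ssa(num_defs, num_uses):
    """Each definition reaching the join has an edge to each use after it."""
    return [(f"d{d}", f"u{u}") for d in range(num_defs) for u in range(num_uses)]


def edges_with_ssa(num_defs, num_uses):
    """Definitions feed one phi-function; uses read the phi result."""
    edges = [(f"d{d}", "phi") for d in range(num_defs)]
    edges += [("phi", f"u{u}") for u in range(num_uses)]
    return edges


print(len(edges_without_ssa(3, 4)))   # 12 edges: 3 * 4
print(len(edges_with_ssa(3, 4)))      # 7 edges: 3 + 4
```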

Forward Expression Substitution

• Example

DO I = 1, 100
   K = I + 2
   A(K) = A(K) + 5
ENDDO

becomes

DO I = 1, 100
   A(I+2) = A(I+2) + 5
ENDDO
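The substitution step itself is mechanical, as the sketch below suggests. It rewrites one statement by replacing reads of a scalar with the expression that defines it, using Python's `ast` module on statements written in Python syntax (`A[K]` stands in for `A(K)`). The function and class names are hypothetical, `ast.unparse` needs Python 3.9 or later, and the sketch performs none of the legality checks a compiler would have to make before substituting.

```python
import ast
import copy


class Substitute(ast.NodeTransformer):
    """Replace every read of `name` with a copy of `expression`."""

    def __init__(self, name, expression):
        self.name = name
        self.expression = expression

    def visit_Name(self, node):
        if node.id == self.name and isinstance(node.ctx, ast.Load):
            return copy.deepcopy(self.expression)
        return node


def forward_substitute(definition, statement):
    """Substitute the right-hand side of `definition` into `statement`."""
    assign = ast.parse(definition).body[0]
    target = assign.targets[0].id
    tree = Substitute(target, assign.value).visit(ast.parse(statement))
    return ast.unparse(tree)


print(forward_substitute("K = I + 2", "A[K] = A[K] + 5"))
# A[I + 2] = A[I + 2] + 5
```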

Forward Expression Substitution

• Need definition-use edges and control flow analysis

• Need to guarantee that the definition is always executed on a loop iteration before the statement into which it is substituted

• Data structure to find out if a statement S is in loop L
  - Test whether the level-K loop containing S is equal to L

Induction Variable Substitution

• Definition: an auxiliary induction variable in a DO loop headed by DO I = LB, UB, S is any variable that can be correctly expressed as cexpr * I + iexpr_L at every location L where it is used in the loop. Here cexpr and iexpr_L are expressions that do not vary in the loop, although different locations in the loop may require different values of iexpr_L.

Induction Variable Substitution

• Example:

DO I = 1, N
   A(I) = B(K) + 1
   K = K + 4
   D(K) = D(K) + A(I)
ENDDO
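In this loop K is an auxiliary induction variable: before the increment it equals K0 + (I-1)*4 and after it equals K0 + I*4, where K0 is the value of K on loop entry. The sketch below, derived from that observation rather than copied from the slides, simulates the loop as written and a substituted version in Python and checks that they agree. The function names, array sizes, and test data are assumptions for this illustration.

```python
def original(n, k, b, d):
    """The loop as written: K is updated on every iteration."""
    a = [0] * (n + 1)
    d = list(d)
    for i in range(1, n + 1):
        a[i] = b[k] + 1
        k += 4
        d[k] = d[k] + a[i]
    return a, d, k


def substituted(n, k0, b, d):
    """After induction-variable substitution: subscripts are linear in I."""
    a = [0] * (n + 1)
    d = list(d)
    for i in range(1, n + 1):
        a[i] = b[k0 + (i - 1) * 4] + 1
        d[k0 + i * 4] = d[k0 + i * 4] + a[i]
    return a, d, k0 + n * 4       # K is updated once, after the loop


b = [3 * x for x in range(50)]
d = [x * x for x in range(50)]
print(original(10, 3, b, d) == substituted(10, 3, b, d))
```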

Induction Variable Substitution

• Induction Variable Recognition

Induction Variable Substitution

• More complex example

DO I = 1, N, 2
   K = K + 1
   A(K) = A(K) + 1
   K = K + 1
   A(K) = A(K) + 1
ENDDO

• Alternative strategy is to recognize region invariance

DO I = 1, N, 2
   A(K+1) = A(K+1) + 1
   K = K+1 + 1
   A(K) = A(K) + 1
ENDDO

Induction Variable Substitution

• Driver

IVSub without loop normalization

DO I = L, U, S
   K = K + N
   … = A(K)
ENDDO

becomes

DO I = L, U, S
   … = A(K + (I - L + S) / S * N)
ENDDO
K = K + (U - L + S) / S * N

IVSub without loop normalization

• Problem:
  - Inefficient code
  - Nonlinear subscript
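The substituted subscript is correct but contains an integer division by the stride, which is what makes it nonlinear in I. The sketch below checks the formula against a direct simulation for a few sample loops. It assumes a positive integer stride; the function names and sample values are made up, and the `max(..., 0)` guard for a zero-trip loop is an addition that the slide does not show.

```python
def original_subscripts(lower, upper, stride, k, n):
    """Subscripts of A produced by updating K inside the loop."""
    subscripts = []
    for _ in range(lower, upper + 1, stride):
        k += n
        subscripts.append(k)
    return subscripts, k


def substituted_subscripts(lower, upper, stride, k, n):
    """Subscripts from K + (I - L + S) / S * N, with K updated after the loop."""
    subscripts = [k + (i - lower + stride) // stride * n
                  for i in range(lower, upper + 1, stride)]
    trips = max((upper - lower + stride) // stride, 0)
    return subscripts, k + trips * n


for args in [(1, 100, 1, 0, 2), (3, 20, 4, 10, 5), (5, 4, 2, 7, 3)]:
    assert original_subscripts(*args) == substituted_subscripts(*args)
```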

IVSub with Loop Normalization

I = L
DO i = 1, (U - L + S) / S, 1
   K = K + N
   … = A(K)
   I = I + S
ENDDO

IVSub with Loop Normalization

I = L
DO i = 1, (U - L + S) / S, 1
   … = A(K + i * N)
ENDDO
K = K + (U - L + S) / S * N
I = I + (U - L + S) / S * S
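With a normalized loop the subscript K + i * N is linear in the new index i, and it still visits the same elements as the loop DO I = L, U, S that updates K by N on every iteration. A short check under the same assumptions as before (positive integer stride, hypothetical function names, zero-trip guard added) is sketched below.

```python
def updated_in_loop(lower, upper, stride, k, n):
    """Subscripts of A when K = K + N runs on every iteration of the original loop."""
    subscripts = []
    for _ in range(lower, upper + 1, stride):
        k += n
        subscripts.append(k)
    return subscripts, k


def normalized_and_substituted(lower, upper, stride, k, n):
    """Subscripts K + i*N over the normalized index i = 1 .. trip count."""
    trips = max((upper - lower + stride) // stride, 0)
    subscripts = [k + i * n for i in range(1, trips + 1)]
    return subscripts, k + trips * n


for args in [(1, 100, 1, 0, 2), (3, 20, 4, 10, 5), (5, 4, 2, 7, 3)]:
    assert updated_in_loop(*args) == normalized_and_substituted(*args)
```

The division by the stride now appears only in the loop bound and in the updates after the loop, not in the subscript that dependence testing has to analyze.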

Summary

• Transformations to put more subscripts into standard form
  - Loop Normalization
  - Constant Propagation
  - Induction Variable Substitution

• Do loop normalization before induction-variable substitution

• Leave optimizations to compilers