DEPENDENCE-DRIVEN LOOP MANIPULATION
Based on notes by David Padua, University of Illinois at Urbana-Champaign
DEPENDENCES
DEPENDENCES (continued)
DEPENDENCE AND PARALLELIZATION (SPREADING)
OpenMP Implementation
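The body of this slide is not in the transcript. As a minimal sketch of the idea, a loop with no cross-iteration dependences can be spread across threads with an OpenMP worksharing construct (array names here are illustrative; compile with -fopenmp, otherwise the pragma is ignored and the loop runs sequentially with the same result):

```c
#include <assert.h>

/* Spreading with OpenMP: iterations carry no cross-iteration
 * dependences, so a parallel-for worksharing construct is safe. */
void vector_add(const double *b, const double *c, double *a, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```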
RENAMING: To Remove Memory-Related Dependences
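The renaming idea can be illustrated with a small C sketch (not from the slides; names are illustrative). Reusing the scalar t creates anti- and output dependences; renaming the second use removes them, leaving only true (flow) dependences:

```c
#include <assert.h>

/* Before renaming: t is reused, so S3 has an output dependence on S1
 * and an anti-dependence on S2, serializing the two halves. */
void reuse_t(const double *x, double *y) {
    double t;
    t = x[0] + x[1];    /* S1 */
    y[0] = t * 2.0;     /* S2 */
    t = x[2] + x[3];    /* S3: redefines t */
    y[1] = t * 2.0;     /* S4 */
}

/* After renaming the second t to t2, only the flow dependences
 * S1->S2 and S3->S4 remain, so the two pairs are independent. */
void renamed(const double *x, double *y) {
    double t1 = x[0] + x[1];
    double t2 = x[2] + x[3];
    y[0] = t1 * 2.0;
    y[1] = t2 * 2.0;
}
```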
DEPENDENCES IN LOOPS
DEPENDENCES IN LOOPS (Cont.)
DEPENDENCE ANALYSIS
DEPENDENCE ANALYSIS (continued)
LOOP PARALLELIZATION AND VECTORIZATION
A loop whose dependence graph is cycle-free can be parallelized or vectorized.
The reason is that if there are no cycles in the dependence graph, then there will be no races in the parallel loop.
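As an illustration (not from the slides), a first-order recurrence is the classic counterexample: the statement depends on its own previous iteration, forming a cycle in the dependence graph, so the loop cannot be parallelized or vectorized as written:

```c
#include <assert.h>

/* a[i] = a[i-1] + b[i]: a flow dependence from iteration i-1 to i,
 * i.e., a cycle in the dependence graph. The loop must run
 * sequentially (or be replaced by a parallel-prefix algorithm). */
void running_sum(const double *b, double *a, int n) {
    a[0] = b[0];
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + b[i];
}
```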
ALGORITHM REPLACEMENT
• Some patterns occur frequently in programs. They can be replaced with a parallel algorithm.
LOOP DISTRIBUTION
• To isolate these patterns, we can decompose loops into several loops, one for each strongly-connected component (π-block) in the dependence graph.
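A hypothetical two-statement loop sketches the idea: S1 forms a strongly-connected component (a recurrence) while S2 does not, so distribution yields one sequential loop and one DOALL candidate:

```c
#include <assert.h>

/* Original loop: S1 and S2 share one header, but only S1 carries
 * a cross-iteration cycle. */
void fused(double *a, double *b, const double *c, int n) {
    for (int i = 1; i < n; i++) {
        a[i] = a[i-1] + c[i];   /* S1: recurrence (its own pi-block) */
        b[i] = c[i] * 2.0;      /* S2: independent pi-block */
    }
}

/* After distribution: one loop per pi-block. The b loop is now a
 * DOALL candidate, and the a loop can be replaced by a parallel
 * algorithm (e.g., parallel prefix). */
void distributed(double *a, double *b, const double *c, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i-1] + c[i];
    for (int i = 1; i < n; i++)
        b[i] = c[i] * 2.0;
}
```

Distribution is legal here because S2 does not feed S1, so splitting preserves all dependences.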
LOOP DISTRIBUTION (continued)
LOOP INTERCHANGING
• The dependence information determines whether or not the loop headers can be interchanged.
• The headers of the following loop can be interchanged
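The loop from the slide is not in the transcript; as a stand-in sketch, a nest with no cross-iteration dependences, where interchanging the i and j headers is legal and preserves the result:

```c
#include <assert.h>

/* Each out[i][j] is written exactly once and depends only on the
 * input array, so any iteration order is legal. */
void orig_order(double out[4][4], const double in[4][4]) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            out[i][j] = in[j][i];
}

/* Headers interchanged: same result, possibly different locality. */
void swapped(double out[4][4], const double in[4][4]) {
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            out[i][j] = in[j][i];
}
```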
LOOP INTERCHANGING (continued)
• The headers of the following loop cannot be interchanged
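Again the slide's loop is missing from the transcript; a classic stand-in is a nest with dependence distance (1, -1), where interchange is illegal: with j outermost, elements are read before the pass that should have updated them, and the result changes:

```c
#include <assert.h>
#define M 4

/* Legal original order: a[i-1][j+1] was written in an earlier
 * i-iteration (dependence distance (1,-1)). */
void orig_nest(int a[M][M]) {
    for (int i = 1; i < M; i++)
        for (int j = 0; j < M - 1; j++)
            a[i][j] = a[i-1][j+1] + 1;
}

/* Interchanged order: iteration (j,i) reads a[i-1][j+1] before the
 * (j+1)-column pass has written it, so the result differs --
 * the interchange is illegal. */
void swapped_nest(int a[M][M]) {
    for (int j = 0; j < M - 1; j++)
        for (int i = 1; i < M; i++)
            a[i][j] = a[i-1][j+1] + 1;
}
```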
DEPENDENCE REMOVAL: Scalar Expansion
• Some cycles in the dependence graph can be eliminated by using elementary transformations.
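An illustrative C sketch of scalar expansion: every iteration writes the same scalar t, creating anti- and output-dependence cycles across iterations; expanding t into an array gives each iteration a private copy and breaks those cycles:

```c
#include <assert.h>

/* Before: the shared scalar t serializes the loop. */
void scalar_before(const double *b, const double *c, double *d, int n) {
    double t;
    for (int i = 0; i < n; i++) {
        t = b[i] + c[i];
        d[i] = t * t;
    }
}

/* After scalar expansion: t[i] is private to iteration i, so the
 * loop is now a DOALL candidate. */
void scalar_expanded(const double *b, const double *c, double *d,
                     int n, double *t) {
    for (int i = 0; i < n; i++) {
        t[i] = b[i] + c[i];
        d[i] = t[i] * t[i];
    }
}
```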
Induction variable recognition
• Induction variable: a variable that is increased or decreased by a fixed amount in every iteration of a loop
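An illustrative sketch: once k is recognized as an induction variable, it can be replaced by a closed-form function of the loop index, removing the cross-iteration dependence it carried:

```c
#include <assert.h>

/* Before: k += 3 each iteration creates a cross-iteration
 * dependence on k. */
int with_induction(int *a, int n) {
    int k = 0;
    for (int i = 0; i < n; i++) {
        k = k + 3;
        a[i] = k;
    }
    return k;
}

/* After recognition: k = 3*(i+1) is a function of i alone, so the
 * dependence is gone and the loop can run in parallel. */
int closed_form(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = 3 * (i + 1);
    return 3 * n;
}
```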
More about the DO to PARALLEL DO transformation
• When the dependence graph inside a loop has no cross-iteration dependences, the loop can be transformed into a PARALLEL DO loop.
do i = 1, n
S1:   a(i) = b(i) + c(i)
S2:   d(i) = x(i) + 1
end do
do i = 1, n
S1:   a(i) = b(i) + c(i)
S2:   d(i) = a(i) + 1
end do
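Both loops above qualify: the second has a flow dependence from S1 to S2, but it stays inside one iteration, so iterations remain independent. A C rendering of the second loop as a parallel loop (compile with -fopenmp; without it the pragma is ignored and the loop runs sequentially with the same result):

```c
#include <assert.h>

void doall(const double *b, const double *c, double *a, double *d, int n) {
    /* The S1 -> S2 dependence is intra-iteration, so this is a DOALL */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + c[i];   /* S1 */
        d[i] = a[i] + 1.0;    /* S2 */
    }
}
```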
Loop Alignment
• When there are cross-iteration dependences but no cycles, DO loops can be aligned to be transformed into parallel loops (DOALLs)
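An illustrative sketch: in the first loop below, S2 reads the a element produced by S1 one iteration earlier. Shifting S2 by one iteration (with peeled boundary statements) makes the dependence intra-iteration, so the remaining loop is a DOALL candidate:

```c
#include <assert.h>

/* Before alignment: S2 reads a[i-1], a cross-iteration dependence
 * of distance 1 (but no cycle). */
void unaligned(const double *b, double *a, double *d, int n) {
    for (int i = 1; i < n; i++) {
        a[i] = b[i] * 2.0;      /* S1 */
        d[i] = a[i-1] + 1.0;    /* S2 */
    }
}

/* After alignment: iteration i runs S1(i) and S2(i+1), so the
 * dependence is intra-iteration; one statement is peeled at each
 * end to cover the full index range. */
void aligned(const double *b, double *a, double *d, int n) {
    d[1] = a[0] + 1.0;               /* peeled S2(1): uses initial a[0] */
    for (int i = 1; i < n - 1; i++) {
        a[i] = b[i] * 2.0;           /* S1(i) */
        d[i+1] = a[i] + 1.0;         /* S2(i+1) */
    }
    a[n-1] = b[n-1] * 2.0;           /* peeled S1(n-1) */
}
```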
Loop Distribution
• Another method for eliminating cross-iteration dependences
Loop Coalescing for DOALL loops
Consider a perfectly nested DOALL loop such as
This could be trivially transformed into a singly nested loop with a tuple of variables as index:
This coalescing transformation is convenient for scheduling and could reduce the overhead involved in starting DOALL loops.
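The loop nest on the slide is not in the transcript; as a stand-in, a doubly nested DOALL and its coalesced form, where the index pair (i, j) is recovered from the single index t:

```c
#include <assert.h>
#define NI 3
#define NJ 4

/* Perfectly nested DOALL */
void nested(double a[NI][NJ]) {
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            a[i][j] = i + 10.0 * j;
}

/* Coalesced: one loop of NI*NJ independent iterations. A single
 * index space simplifies scheduling and starts one parallel loop
 * instead of NI of them, reducing DOALL startup overhead. */
void coalesced(double a[NI][NJ]) {
    for (int t = 0; t < NI * NJ; t++) {
        int i = t / NJ, j = t % NJ;   /* recover the index tuple */
        a[i][j] = i + 10.0 * j;
    }
}
```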
Extra Slides
Why Loop Interchange: Matrix Multiplication Example
A classic example for locality-aware programming is matrix multiplication:
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    for (k = 0; k < N; k++)
      c[i][j] += a[i][k] * b[k][j];
There are 6 possible orders for the three loops: i-j-k, i-k-j, j-i-k, j-k-i, k-i-j, k-j-i
Each order corresponds to a different access pattern for the matrices
Let's focus on the inner loop, as it is the one that's executed most often
Inner Loop Memory Accesses
Each matrix element can be accessed in one of three modes in the inner loop:
• Constant: doesn't depend on the inner loop's index
• Sequential: contiguous addresses
• Strided: non-contiguous addresses (N elements apart)

c[i][j] += a[i][k] * b[k][j];
(access modes of c, a, and b, in that order)
i-j-k: Constant   Sequential  Strided
i-k-j: Sequential Constant    Sequential
j-i-k: Constant   Sequential  Strided
j-k-i: Strided    Strided     Constant
k-i-j: Sequential Constant    Sequential
k-j-i: Strided    Strided     Constant
Loop Order and Performance
• Constant access is better than sequential access: it's always good to have constants in loops because they can be kept in registers
• Sequential access is better than strided access because it utilizes the cache better
Let's go back to the previous slide
Best Loop Ordering?
c[i][j] += a[i][k] * b[k][j];
i-j-k: Constant   Sequential  Strided
i-k-j: Sequential Constant    Sequential
j-i-k: Constant   Sequential  Strided
j-k-i: Strided    Strided     Constant
k-i-j: Sequential Constant    Sequential
k-j-i: Strided    Strided     Constant

k-i-j and i-k-j should have the best performance (no strided access)
i-j-k and j-i-k should be worse (1 strided)
j-k-i and k-j-i should be the worst (2 strided)
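The ordering changes only the access pattern, not the result; a small check of the slide's i-j-k order against the cache-friendlier i-k-j order (N kept small so it runs quickly):

```c
#include <assert.h>
#define N 16

static double a[N][N], b[N][N], c1[N][N], c2[N][N];

void init(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
            c1[i][j] = c2[i][j] = 0.0;
        }
}

/* i-j-k order from the slide: b is accessed with stride N */
void mm_ijk(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c1[i][j] += a[i][k] * b[k][j];
}

/* i-k-j order: all inner-loop accesses are constant or sequential,
 * so it should use the cache better; the result is identical */
void mm_ikj(void) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c2[i][j] += a[i][k] * b[k][j];
}
```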