Lecture 11: Code Optimization CS 540 George Mason University.

CS 540 Spring 2009 GMU 2

Code Optimization

REQUIREMENTS:
• Meaning must be preserved (correctness)
• Speedup must occur on average
• Work done must be worth the effort

OPPORTUNITIES:
• Programmer (algorithm, directives)
• Intermediate code
• Target code

Code Optimization

[Figure: compiler phases. Source language -> Scanner (lexical analysis) -> tokens -> Parser (syntax analysis) -> syntactic structure -> Semantic Analysis (IC generator) -> syntactic/semantic structure -> Code Optimizer -> Code Generator -> target language. The Symbol Table is shared by the phases.]

Levels

• Window – peephole optimization

• Basic block

• Procedural – global (control flow graph)

• Program level – interprocedural (program dependence graph)

Peephole Optimizations

• Constant Folding
    x := 32
    x := x + 32        becomes        x := 64

• Unreachable Code
    goto L2
    x := x + 1         (unneeded)

• Flow of control optimizations
    goto L1            becomes        goto L2
    …
    L1: goto L2

Peephole Optimizations

• Algebraic Simplification
    x := x + 0         (unneeded)

• Dead code
    x := 32            (where x is not used after the statement)
    y := x + y         becomes        y := y + 32

• Reduction in strength
    x := x * 2         becomes        x := x + x

Peephole Optimizations

• Local in nature

• Pattern driven

• Limited by the size of the window
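The patterns on the preceding slides can be sketched as a small rewriting pass over a two-instruction window. This is a minimal illustration only: the tuple encoding of instructions (`assign`, `binop`, `goto`, `label`) and the function name `peephole` are assumptions made for the example, not notation from the lecture.

```python
def peephole(code):
    """Apply window-sized rewrites repeatedly until none applies."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            ins = code[i]
            nxt = code[i + 1] if i + 1 < len(code) else None
            # Constant folding: x := c1 ; x := x + c2  =>  x := c1 + c2
            if (ins[0] == "assign" and isinstance(ins[2], int)
                    and nxt is not None and nxt[0] == "binop"
                    and nxt[1:4] == (ins[1], "+", ins[1])
                    and isinstance(nxt[4], int)):
                out.append(("assign", ins[1], ins[2] + nxt[4]))
                i, changed = i + 2, True
                continue
            # Unreachable code: a non-label instruction right after a goto
            if ins[0] == "goto" and nxt is not None and nxt[0] != "label":
                out.append(ins)
                i, changed = i + 2, True
                continue
            if ins[0] == "binop":
                _, dst, op, a, b = ins
                # Algebraic simplification: x := x + 0 does nothing
                if op == "+" and dst == a and b == 0:
                    i, changed = i + 1, True
                    continue
                # Reduction in strength: x := a * 2  =>  x := a + a
                if op == "*" and b == 2:
                    out.append(("binop", dst, "+", a, a))
                    i, changed = i + 1, True
                    continue
            out.append(ins)
            i += 1
        code = out
    return code


example = [
    ("assign", "x", 32),
    ("binop", "x", "+", "x", 32),
    ("binop", "y", "+", "y", 0),
    ("binop", "z", "*", "z", 2),
    ("goto", "L2"),
    ("binop", "x", "+", "x", 1),
    ("label", "L2"),
]
optimized = peephole(example)
```

Each rule inspects at most two adjacent instructions, which is why the technique is local, pattern driven, and limited by the window size.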

Basic Block Level

• Common Subexpression elimination

• Constant Propagation

• Dead code elimination

• Plus many others such as copy propagation, value numbering, partial redundancy elimination, …

Simple example: a[i+1] = b[i+1]

Generated code:
    t1 = i + 1
    t2 = b[t1]
    t3 = i + 1
    a[t3] = t2

Common expression can be eliminated:
    t1 = i + 1
    t2 = b[t1]
    t3 = i + 1         (no longer live)
    a[t1] = t2

Now, suppose i is a constant:
    i = 4
    t1 = i + 1
    t2 = b[t1]
    a[t1] = t2

Fold the constant expression:
    i = 4
    t1 = 5
    t2 = b[t1]
    a[t1] = t2

Propagate t1:
    i = 4
    t1 = 5
    t2 = b[5]
    a[5] = t2

Final Code:
    i = 4
    t2 = b[5]
    a[5] = t2
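The basic-block steps above (common subexpression elimination, constant propagation and folding, removal of dead temporaries) can be sketched in one pass over the block. This is a simplified illustration under two stated assumptions: every name is assigned at most once in the block, and the names listed in `temps` are not used after the block. The tuple encoding and the name `optimize_block` are made up for the example.

```python
def optimize_block(block, temps):
    """Local constant propagation/folding, CSE and dead-temporary removal.

    Assumes each name is assigned at most once within the block.
    """
    consts = {}   # name -> known integer value
    avail = {}    # ("add", x, y) -> name that already holds that value
    alias = {}    # eliminated name -> name to use instead
    out = []

    def val(x):
        x = alias.get(x, x)
        return consts.get(x, x)

    for ins in block:
        kind = ins[0]
        if kind == "copy":                        # dst = src
            _, dst, src = ins
            src = val(src)
            if isinstance(src, int):
                consts[dst] = src
            out.append(("copy", dst, src))
        elif kind == "add":                       # dst = x + y
            _, dst, x, y = ins
            x, y = val(x), val(y)
            if isinstance(x, int) and isinstance(y, int):
                consts[dst] = x + y               # constant folding
                out.append(("copy", dst, x + y))
            elif ("add", x, y) in avail:
                alias[dst] = avail[("add", x, y)]  # common subexpression
            else:
                avail[("add", x, y)] = dst
                out.append(("add", dst, x, y))
        elif kind == "load":                      # dst = array[index]
            _, dst, arr, idx = ins
            out.append(("load", dst, arr, val(idx)))
        elif kind == "store":                     # array[index] = src
            _, arr, idx, src = ins
            out.append(("store", arr, val(idx), val(src)))

    # Dead code: walk backwards, dropping temporaries that are never read.
    live, result = set(), []
    for ins in reversed(out):
        if ins[0] != "store" and ins[1] in temps and ins[1] not in live:
            continue
        operands = ins[1:] if ins[0] == "store" else ins[2:]
        live.update(o for o in operands if isinstance(o, str))
        result.append(ins)
    return result[::-1]


body = [("add", "t1", "i", 1), ("load", "t2", "b", "t1"),
        ("add", "t3", "i", 1), ("store", "a", "t3", "t2")]
temps = {"t1", "t2", "t3"}
without_constant = optimize_block(body, temps)
with_constant = optimize_block([("copy", "i", 4)] + body, temps)
```

With `i` unknown only the repeated `i + 1` disappears; with `i = 4` the result corresponds to the final code on the slide.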

Control Flow Graph - CFG

CFG = < V, E, Entry >, where

V = vertices or nodes, each representing an instruction or basic block (group of statements).

E ⊆ V × V = edges, representing potential flow of control.

Entry is an element of V, the unique program entry.

Two sets used in algorithms:
• Succ(v) = {x in V | exists e in E, e = v -> x}
• Pred(v) = {x in V | exists e in E, e = x -> v}

[Figure: example CFG with nodes 1 to 5]
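One possible in-memory form of this definition is sketched below. The class name `CFG` and the example edges over nodes 1 to 5 are hypothetical; the transcript does not record which edges the slide's figure contained.

```python
from collections import defaultdict


class CFG:
    """CFG = <V, E, Entry> with Succ and Pred lookups."""

    def __init__(self, entry, edges):
        self.entry = entry
        self.E = set(edges)
        self.V = {entry} | {v for edge in self.E for v in edge}
        self._succ, self._pred = defaultdict(set), defaultdict(set)
        for src, dst in self.E:
            self._succ[src].add(dst)
            self._pred[dst].add(src)

    def succ(self, v):
        # Succ(v) = {x in V | (v, x) in E}
        return set(self._succ[v]) if v in self._succ else set()

    def pred(self, v):
        # Pred(v) = {x in V | (x, v) in E}
        return set(self._pred[v]) if v in self._pred else set()


# Hypothetical edges, chosen only to exercise the lookups.
cfg = CFG(entry=1, edges=[(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 2)])
```

Precomputing both maps matters because forward analyses walk Pred and backward analyses walk Succ.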

Definitions

• point - any location between adjacent statements and before and after a basic block.

• A path in a CFG from point p1 to pn is a sequence of points p1, p2, ..., pn such that for all i, 1 <= i < n, either pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or pi is the end of some block and pi+1 is the start of a successor block.

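CFG

[Figure: control flow graph made of the basic blocks below; the figure also marks the points between statements and one example path.]

    c = a + b
    d = a * c
    i = 1

    f[i] = a + b
    c = c * 2
    if c > d

    g = a * c

    g = d * d

    i = i + 1
    if i > 10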

Optimizations on CFG

• Must take control flow into account
  – Common Sub-expression Elimination
  – Constant Propagation
  – Dead Code Elimination
  – Partial Redundancy Elimination
  – …

• Applying one optimization may create opportunities for other optimizations.

Redundant Expressions

An expression x op y is redundant at a point p if it has already been computed at some point(s) and no intervening operations redefine x or y.

    Source          Three-address code        After elimination
    m = 2*y*z       t0 = 2*y                  t0 = 2*y
                    m = t0*z                  m = t0*z
    n = 3*y*z       t1 = 3*y                  t1 = 3*y
                    n = t1*z                  n = t1*z
    o = 2*y-z       t2 = 2*y   (redundant)    o = t0-z
                    o = t2-z

Redundant Expressions

    c = a + b          <- definition site of a + b
    d = a * c
    i = 1

    f[i] = a + b       <- since a + b is available here, redundant!
    c = c * 2
    if c > d

    g = a * c

    g = d * d

    i = i + 1
    if i > 10

Candidates: a + b, a * c, d * d, c * 2, i + 1

Redundant Expressions

    c = a + b
    d = a * c          <- definition site of a * c
    i = 1

    f[i] = a + b
    c = c * 2          <- kill site of a * c
    if c > d

    g = a * c          <- a * c is not available here, so not redundant

    g = d * d

    i = i + 1
    if i > 10

Candidates: a + b, a * c, d * d, c * 2, i + 1

Redundant Expressions

• An expression e is defined at some point p in the CFG if its value is computed at p. (definition site)

• An expression e is killed at point p in the CFG if one or more of its operands is defined at p. (kill site)

• An expression e is available at point p in a CFG if every path leading to p contains a prior definition of e and e is not killed between that definition and p.
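These three definitions can be turned into a small forward "must" analysis. The sketch below runs it on the running example. Two things are assumptions, not taken from the slides: the block names B1 to B5 and the edge list in `PREDS` (the transcript does not preserve the figure's edges). Branch conditions are left out because comparisons are not among the candidate expressions.

```python
# Each statement is (target, expression); expressions are (x, op, y) tuples.
BLOCKS = {
    "B1": [("c", ("a", "+", "b")), ("d", ("a", "*", "c")), ("i", None)],
    "B2": [("f[i]", ("a", "+", "b")), ("c", ("c", "*", 2))],
    "B3": [("g", ("a", "*", "c"))],
    "B4": [("g", ("d", "*", "d"))],
    "B5": [("i", ("i", "+", 1))],
}
# Assumed edges: B1->B2, B2->B3, B2->B4, B3->B5, B4->B5, B5->B2.
PREDS = {"B1": [], "B2": ["B1", "B5"], "B3": ["B2"], "B4": ["B2"],
         "B5": ["B3", "B4"]}


def local_sets(block, universe):
    """GEN: expressions defined and not killed later in the block.
    KILL: expressions with an operand defined in the block."""
    gen, kill = set(), set()
    for target, expr in block:
        if expr is not None:
            gen.add(expr)                       # definition site
        killed = {e for e in universe if target in (e[0], e[2])}
        gen -= killed                           # kill site
        kill |= killed
    return gen, kill


def available_expressions(blocks, preds, entry):
    universe = {e for b in blocks.values() for _, e in b if e is not None}
    sets = {n: local_sets(b, universe) for n, b in blocks.items()}
    avail_in = {n: set() for n in blocks}
    avail_out = {n: set(universe) for n in blocks}   # optimistic start
    changed = True
    while changed:
        changed = False
        for n in blocks:
            if n != entry:
                # "every path": intersect over all predecessors
                avail_in[n] = set.intersection(*(avail_out[p] for p in preds[n]))
            gen, kill = sets[n]
            new_out = gen | (avail_in[n] - kill)
            if new_out != avail_out[n]:
                avail_out[n], changed = new_out, True
    return avail_in, avail_out


avail_in, avail_out = available_expressions(BLOCKS, PREDS, "B1")
```

Under these assumptions `a + b` is available on entry to B2, so `f[i] = a + b` is redundant, while `a * c` is not available on entry to B3 because `c = c * 2` kills it.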

Removing Redundant Expressions

    t1 = a + b
    c = t1
    d = a * c
    i = 1

    f[i] = t1
    c = c * 2
    if c > d

    g = a * c

    g = d * d

    i = i + 1
    if i > 10

Candidates: a + b, a * c, d * d, c * 2, i + 1

Constant Propagation

    Original                 Step 1                   Step 2
    b = 5                    b = 5                    b = 5
    c = 4*b                  c = 20                   c = 20
    c > b   (branch t/f)     c > 5   (branch t/f)     20 > 5   (branch t/f)
    d = b + 2                d = 7                    d = 7
    e = a + b                e = a + 5                e = a + 5

Constant Propagation

    b = 5
    c = 20
    20 > 5   (branch t/f)
    d = 7
    e = a + 5

The condition 20 > 5 is always true, so the branch can be removed:

    b = 5
    c = 20
    d = 7
    e = a + 5

Copy Propagation

    Original                 After copy propagation
    b = a                    b = a
    c = 4*b                  c = 4*a
    c > b   (branch)         c > a   (branch)
    d = b + 2                d = a + 2
    e = a + b                e = a + a

Simple Loop Optimizations: Code Motion

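Source level (valid when limit is not changed inside the loop):

    while (i <= limit - 2)

becomes

    t := limit - 2
    while (i <= t)

Intermediate code, before:

    L1: t1 = limit – 2
        if (i > t1) goto L2
        body of loop
        goto L1
    L2:

Intermediate code, after:

        t1 = limit – 2
    L1: if (i > t1) goto L2
        body of loop
        goto L1
    L2: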

Simple Loop Optimizations: Strength Reduction

• Induction Variables control loop iterations

    Loop body before:        Loop body after:
    j = j – 1                j = j – 1
    t4 = 4 * j               t4 = t4 - 4
    t5 = a[t4]               t5 = a[t4]
    if t5 > v                if t5 > v

    Added before the loop:   t4 = 4*j
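A quick way to convince yourself that the rewrite is safe is to run both loop bodies and compare the sequence of t4 values. The function names below are made up for the illustration, and the loop is run for a fixed number of steps rather than until `t5 > v`.

```python
def offsets_with_multiply(j, steps):
    """Original loop body: t4 is recomputed with a multiply each time."""
    seq = []
    for _ in range(steps):
        j = j - 1
        t4 = 4 * j
        seq.append(t4)
    return seq


def offsets_with_subtract(j, steps):
    """Strength-reduced body: t4 is initialized once, then decremented."""
    seq = []
    t4 = 4 * j                 # hoisted in front of the loop
    for _ in range(steps):
        j = j - 1
        t4 = t4 - 4            # keeps the invariant t4 == 4 * j
        seq.append(t4)
    return seq


before = offsets_with_multiply(10, 5)
after = offsets_with_subtract(10, 5)
```

The invariant `t4 == 4 * j` holds after every iteration because j decreases by 1 exactly when t4 decreases by 4.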

Simple Loop Optimizations

• Loop transformations are often used to expose other optimization opportunities:
  – Normalization
  – Loop Interchange
  – Loop Fusion
  – Loop Reversal
  – …

Consider Matrix Multiplication

    for i = 1 to n do
      for j = 1 to n do
        for k = 1 to n do
          C[i,j] = C[i,j] + A[i,k] * B[k,j]
        end
      end
    end

[Figure: access pattern for C[i,j], A[i,k] and B[k,j]]

Memory Usage

• For A: Elements are accessed across rows, so spatial locality is exploited for cache (assuming row-major storage)

• For B: Elements are accessed along columns; unless the cache can hold all of B, the cache will have problems.

• For C: Single element computed per loop – use a register to hold it

Matrix Multiplication Version 2 (loop interchange)

    for i = 1 to n do
      for k = 1 to n do
        for j = 1 to n do
          C[i,j] = C[i,j] + A[i,k] * B[k,j]
        end
      end
    end

[Figure: access pattern for C[i,j], A[i,k] and B[k,j] after interchanging the j and k loops]
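The two loop orders can be compared directly. The sketch below uses 0-based Python lists and made-up function names; it only shows that the interchange preserves the result, not the cache behavior, which plain Python lists do not model.

```python
def matmul_ijk(A, B):
    """Version 1: j in the middle, k innermost (B walked by column)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_ikj(A, B):
    """Version 2: j and k interchanged (B and C walked by row)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a_ik = A[i][k]          # single element of A for the inner loop
            for j in range(n):
                C[i][j] += a_ik * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product_v1 = matmul_ijk(A, B)
product_v2 = matmul_ikj(A, B)
```

For a fixed C[i][j], both versions add the terms in the same order k = 0, 1, ..., n-1, so the interchange does not reorder the sum for any element.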

Memory Usage

• For A: Single element loaded for loop body

• For B: Elements are accessed along rows to exploit spatial locality.

• For C: Extra loading/storing, but across rows

Simple Loop Optimizations

• How to determine safety?
  – Does the new multiply give the same answer?
  – Can a loop be reversed?

    for (I = 1 to N) a[I] = a[I+1]    – can this loop be safely reversed?
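Running the loop in both directions on a small array answers the question for this particular loop. The function names are made up, and the indices are shifted to Python's 0-based lists.

```python
def forward(a):
    """Original order: each a[i] reads a[i+1] before it is overwritten."""
    a = list(a)
    for i in range(len(a) - 1):
        a[i] = a[i + 1]
    return a


def reversed_loop(a):
    """Reversed order: each a[i] reads an a[i+1] that was already overwritten."""
    a = list(a)
    for i in range(len(a) - 2, -1, -1):
        a[i] = a[i + 1]
    return a


shifted = forward([1, 2, 3, 4])
smeared = reversed_loop([1, 2, 3, 4])
```

The forward loop shifts the array left, while the reversed loop copies the last element everywhere, so reversal is not safe here: it turns the read-before-write on a[i+1] into a write-before-read.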

Data Dependencies

• Flow Dependencies - write/read
    x := 4; y := x + 1

• Output Dependencies - write/write
    x := 4; x := y + 1

• Antidependencies - read/write
    y := x + 1; x := 4

    x := 4
    y := 6
    p := x + 2
    z := y + p
    x := z
    y := p

[Figure: dependence graph over these six statements, with edges labeled Flow, Output, and Anti]
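The edges of such a dependence graph can be computed mechanically for straight-line code. The sketch below numbers the six statements 1 to 6 and reports each dependence as a pair (earlier, later). The encoding of statements as (target, variables read) and the function name are assumptions for the example, and only the nearest dependences are reported.

```python
def dependences(stmts):
    """Classify dependences in straight-line code.

    stmts: list of (target, variables_read); statements are numbered from 1.
    """
    flow, output, anti = set(), set(), set()
    last_def = {}   # variable -> statement that most recently wrote it
    readers = {}    # variable -> statements that read it since that write
    for j, (target, uses) in enumerate(stmts, start=1):
        for v in uses:
            if v in last_def:
                flow.add((last_def[v], j))        # write then read
            readers.setdefault(v, set()).add(j)
        if target in last_def:
            output.add((last_def[target], j))     # write then write
        # read then write (a statement does not depend on itself)
        anti |= {(k, j) for k in readers.get(target, ()) if k != j}
        last_def[target] = j
        readers[target] = set()
    return flow, output, anti


program = [
    ("x", []),          # 1: x := 4
    ("y", []),          # 2: y := 6
    ("p", ["x"]),       # 3: p := x + 2
    ("z", ["y", "p"]),  # 4: z := y + p
    ("x", ["z"]),       # 5: x := z
    ("y", ["p"]),       # 6: y := p
]
flow, output, anti = dependences(program)
```

Any reordering of the statements has to keep every reported pair in its original order.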

Global Data Flow Analysis

Collecting information about the way data is used in a program.

• Takes control flow into account
• High-level control constructs
  – Simpler – syntax driven
  – Useful for data flow analysis of source code
• General control constructs – arbitrary branching

Information needed for optimizations such as: constant propagation, common sub-expressions, partial redundancy elimination …

Dataflow Analysis: Iterative Techniques

• First, compute local (block level) information.

• Iterate until no changes:

    change = true
    while change do
      change = false
      for each basic block
        apply equations updating IN and OUT
        if either IN or OUT changes, set change to true
    end

Live Variable Analysis

A variable x is live at a point p if there is some path from p where x is used before it is defined.

Want to determine for some variable x and point p whether the value of x could be used along some path starting at p.

• Information flows backwards
• May – ‘along some path starting at p’

[Figure: CFG with a marked point and the question "is x live here?"]

Global Live Variable Analysis

Want to determine for some variable x and point p whether the value of x could be used along some path starting at p.

• DEF[B] - set of variables assigned values in B prior to any use of that variable

• USE[B] - set of variables used in B prior to any definition of that variable

• OUT[B] - variables live immediately after the block
    OUT[B] = ∪ IN[S] for all S in succ(B)

• IN[B] - variables live immediately before the block
    IN[B] = USE[B] ∪ (OUT[B] - DEF[B])

[Figure: CFG with six basic blocks]

    Block   Statements                            DEF       USE
    B1      d1: a = 1;      d2: b = 2             a,b       (none)
    B2      d3: c = a + b;  d4: d = c - a         c,d       a,b
    B3      d5: d = b * d                         (none)    b,d
    B4      d6: d = a + b;  d7: e = e + 1         d         a,b,e
    B5      d8: b = a + b;  d9: e = c - 1         e         a,b,c
    B6      d10: a = b * d; d11: b = a - d        a         b,d

            Pass 1                      Pass 2                      Pass 3
    Block   IN           OUT            IN           OUT            IN           OUT
    B1      { }          a,b            { }          a,b            e            a,b,e
    B2      a,b          a,b,c,d        a,b,e        a,b,c,d,e      a,b,e        a,b,c,d,e
    B3      a,b,c,d,e    a,b,c,e        a,b,c,d,e    a,b,c,d,e      a,b,c,d,e    a,b,c,d,e
    B4      a,b,c,e      a,b,c,d,e      a,b,c,e      a,b,c,d,e      a,b,c,e      a,b,c,d,e
    B5      a,b,c,d      a,b,d          a,b,c,d      a,b,d,e        a,b,c,d      a,b,d,e
    B6      b,d          { }            b,d          { }            b,d          { }

OUT[B] = ∪ IN[S] for all S in succ(B)
IN[B] = USE[B] ∪ (OUT[B] - DEF[B])

    Block   DEF      USE
    B1      {a,b}    { }
    B2      {c,d}    {a,b}
    B3      { }      {b,d}
    B4      {d}      {a,b,e}
    B5      {e}      {a,b,c}
    B6      {a}      {b,d}

[Figure: the CFG annotated with the final live sets]

    Block   IN             OUT
    B1      {e}            {a,b,e}
    B2      {a,b,e}        {a,b,c,d,e}
    B3      {a,b,c,d,e}    {a,b,c,d,e}
    B4      {a,b,c,e}      {a,b,c,d,e}
    B5      {a,b,c,d}      {a,b,d,e}
    B6      {b,d}          { }
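The live variable computation for this example can be sketched as a direct transcription of the two equations. The DEF and USE sets are the ones listed for blocks B1 to B6. The successor lists in `SUCC` are an assumption: the transcript does not preserve the figure's edges, so they were chosen to be consistent with the IN/OUT sets shown.

```python
USE = {"B1": set(), "B2": {"a", "b"}, "B3": {"b", "d"},
       "B4": {"a", "b", "e"}, "B5": {"a", "b", "c"}, "B6": {"b", "d"}}
DEF = {"B1": {"a", "b"}, "B2": {"c", "d"}, "B3": set(),
       "B4": {"d"}, "B5": {"e"}, "B6": {"a"}}
# Assumed edges (not given in the text).
SUCC = {"B1": ["B2"], "B2": ["B3"], "B3": ["B4", "B5"],
        "B4": ["B3"], "B5": ["B2", "B6"], "B6": []}


def live_variables(use, defs, succ):
    live_in = {b: set() for b in use}
    live_out = {b: set() for b in use}
    changed = True
    while changed:
        changed = False
        for b in use:
            # OUT[B] = union of IN[S] for all S in succ(B)
            out_b = set().union(*(live_in[s] for s in succ[b]))
            # IN[B] = USE[B] U (OUT[B] - DEF[B])
            in_b = use[b] | (out_b - defs[b])
            if out_b != live_out[b] or in_b != live_in[b]:
                live_out[b], live_in[b], changed = out_b, in_b, True
    return live_in, live_out


live_in, live_out = live_variables(USE, DEF, SUCC)
```

Intermediate passes depend on the order in which blocks are visited, so only the final sets are expected to match the slide's table.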

Dataflow Analysis Problem #2: Reachability

• A definition of a variable x is a statement that may assign a value to x.

• A definition may reach a program point p if there exists some path from the point immediately following the definition to p such that the assignment is not killed along that path.

• Concept: relationship between definitions and uses

What blocks do definitions d2 and d4 reach?

    d1: i = m – 1
    d2: j = n
    d3: i = i + 1
    d4: j = j - 1

[Figure: CFG with blocks B1 to B5 containing these definitions; d2 and d4 are highlighted]

Reachability Analysis: Unstructured Input

1. Compute GEN and KILL at block level

2. Compute IN[B] and OUT[B] for each block B
     IN[B] = ∪ OUT[P] where P is a predecessor of B
     OUT[B] = GEN[B] ∪ (IN[B] - KILL[B])

3. Repeat step 2 until there are no changes to OUT sets

Reachability Analysis: Step 1

For each block, compute local (block level) information = GEN/KILL sets
– GEN[B] = set of definitions generated by B
– KILL[B] = set of definitions that cannot reach the end of B

This information does not take control flow between blocks into account.

Reasoning about Basic Blocks

Effect of single statement: a = b + c
• Uses variables {b,c}
• Kills all definitions of {a}
• Generates new definition (i.e. assigns a value) of {a}

Local Analysis:
• Analyze the effect of each instruction
• Compose these effects to derive information about the entire block

Example

    Block   Definitions                                  Gen      Kill
    B1      d1: i = m – 1;  d2: j = n;  d3: a = u1       1,2,3    4,5,6,7
    B2      d4: i = i + 1;  d5: j = j - 1                4,5      1,2,7
    B3      d6: a = u2                                   6        3
    B4      d7: i = u2                                   7        1,4

Reachability Analysis: Step 2

Compute IN/OUT for each block in a forward direction. Start with IN[B] = ∅.
– IN[B] = set of defns reaching the start of B
        = ∪ (OUT[P]) for all predecessor blocks P in the CFG
– OUT[B] = set of defns reaching the end of B
         = GEN[B] ∪ (IN[B] – KILL[B])

Keep computing IN/OUT sets until a fixed point is reached.

Reaching Definitions Algorithm

• Input: Flow graph with GEN and KILL for each block
• Output: in[B] and out[B] for each block

    for each block B do out[B] := gen[B];    (true if in[B] = emptyset)
    change := true;
    while change do begin
      change := false;
      for each block B do begin
        in[B] := ∪ out[P], where P is a predecessor of B;
        oldout := out[B];
        out[B] := gen[B] ∪ (in[B] - kill[B]);
        if out[B] != oldout then change := true;
      end
    end

Initialization (OUT[B] = GEN[B]):

    Block   Gen      Kill       IN       OUT
    B1      1,2,3    4,5,6,7             1,2,3
    B2      4,5      1,2,7               4,5
    B3      6        3                   6
    B4      7        1,4                 7

IN[B] = ∪ (OUT[P]) for all predecessor blocks P in the CFG
OUT[B] = GEN[B] ∪ (IN[B] – KILL[B])

First pass:

    Block   Initial OUT   IN                             OUT
    B1      1,2,3                                        1,2,3
    B2      4,5           OUT[1] ∪ OUT[4] = 1,2,3,7      4,5 ∪ (1,2,3,7 – 1,2,7) = 3,4,5
    B3      6             OUT[2] = 3,4,5                 6 ∪ (3,4,5 – 3) = 4,5,6
    B4      7             OUT[2] ∪ OUT[3] = 3,4,5,6      7 ∪ (3,4,5,6 – 1,4) = 3,5,6,7

IN[B] = ∪ (OUT[P]) for all predecessor blocks P in the CFG
OUT[B] = GEN[B] ∪ (IN[B] – KILL[B])

Second pass:

    Block   Initial OUT   Pass 1 IN   Pass 1 OUT   Pass 2 IN                           Pass 2 OUT
    B1      1,2,3                     1,2,3                                            1,2,3
    B2      4,5           1,2,3,7     3,4,5        OUT[1] ∪ OUT[4] = 1,2,3,5,6,7       4,5 ∪ (1,2,3,5,6,7 – 1,2,7) = 3,4,5,6
    B3      6             3,4,5       4,5,6        OUT[2] = 3,4,5,6                    6 ∪ (3,4,5,6 – 3) = 4,5,6
    B4      7             3,4,5,6     3,5,6,7      OUT[2] ∪ OUT[3] = 3,4,5,6           7 ∪ (3,4,5,6 – 1,4) = 3,5,6,7

IN[B] = ∪ (OUT[P]) for all predecessor blocks P in the CFG
OUT[B] = GEN[B] ∪ (IN[B] – KILL[B])
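The worked tables can be reproduced with a short transcription of the reaching definitions algorithm. The Gen and Kill sets are those of the example. The predecessor lists are read off the IN computations in the tables (for instance, IN of B2 combines OUT[1] and OUT[4]). The function and variable names are made up for the sketch.

```python
GEN = {"B1": {1, 2, 3}, "B2": {4, 5}, "B3": {6}, "B4": {7}}
KILL = {"B1": {4, 5, 6, 7}, "B2": {1, 2, 7}, "B3": {3}, "B4": {1, 4}}
PRED = {"B1": [], "B2": ["B1", "B4"], "B3": ["B2"], "B4": ["B2", "B3"]}


def reaching_definitions(gen, kill, pred):
    reach_in = {b: set() for b in gen}
    reach_out = {b: set(gen[b]) for b in gen}      # out[B] = gen[B]
    changed = True
    while changed:
        changed = False
        for b in gen:
            # in[B] = union of out[P] over the predecessors P of B
            reach_in[b] = set().union(*(reach_out[p] for p in pred[b]))
            new_out = gen[b] | (reach_in[b] - kill[b])
            if new_out != reach_out[b]:
                reach_out[b], changed = new_out, True
    return reach_in, reach_out


reach_in, reach_out = reaching_definitions(GEN, KILL, PRED)
```

Visiting the blocks in the order B1, B2, B3, B4 gives the same intermediate values as the two passes shown, followed by one more pass that detects no change.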

Forward vs. Backward

Forward flow vs. Backward flow

Forward: Compute OUT for given IN, GEN, KILL
– Information propagates from the predecessors of a vertex.
– Examples: Reachability, available expressions, constant propagation

Backward: Compute IN for given OUT, GEN, KILL
– Information propagates from the successors of a vertex.
– Example: Live variable analysis

Forward vs. Backward Equations

– Forward:
  • IN[B] - process OUT[P] for all P in predecessors(B)
  • OUT[B] = local ∪ (IN[B] – local)

– Backward:
  • OUT[B] - process IN[S] for all S in successors(B)
  • IN[B] = local ∪ (OUT[B] – local)

May vs. Must

Must – true on all paths
  Ex: constant propagation – variable must provably hold appropriate constant on all paths in order to do a substitution

May – true on some path
  Ex: Live variable analysis – a variable is live if it could be used on some path; reachability – a definition reaches a point if it can reach it on some path

May vs. Must Equations

– May: IN[B] = ∪ (out[P]) for all P in pred(B)
– Must: IN[B] = ∩ (out[P]) for all P in pred(B)

• Reachability
  – IN[B] = ∪ (out[P]) for all P in pred(B)
  – OUT[B] = GEN[B] ∪ (IN[B] – KILL[B])

• Live Variable Analysis
  – OUT[B] = ∪ (IN[S]) for all S in succ(B)
  – IN[B] = USE[B] ∪ (OUT[B] - DEF[B])

• Constant Propagation
  – IN[B] = ∩ (out[P]) for all P in pred(B)
  – OUT[B] = DEF_CONST[B] ∪ (IN[B] – KILL_CONST[B])

Discussion

• Why does this work?
  – Finite set – can be represented as bit vectors
  – Theory of lattices

• Is this guaranteed to terminate?
  – Sets only grow and since finite in size …

• Can we find ways to reduce the number of iterations?

Choosing visit order for Dataflow Analysis

In forward flow analysis situations, if we visit the blocks in depth first order, we can reduce the number of iterations.

Suppose definition d follows block path 3 5 19 35 16 23 45 4 10 17 where the block numbering corresponds to the preorder depth-first numbering.

Then we can compute the reach of this definition in 3 iterations of our algorithm.
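The count of 3 can be checked with a small calculation, assuming each pass visits the blocks in increasing depth-first number. Within one pass the definition keeps moving forward as long as the next block on the path has a higher number; each step to a lower-numbered block has to wait for the next pass. The function name is made up for the sketch.

```python
def passes_to_propagate(path):
    """Passes needed for a fact to travel along `path` when every pass
    visits blocks in increasing depth-first number."""
    retreats = sum(1 for a, b in zip(path, path[1:]) if b < a)
    return retreats + 1


path = [3, 5, 19, 35, 16, 23, 45, 4, 10, 17]
needed = passes_to_propagate(path)
```

Here the path steps to a lower number twice (35 to 16 and 45 to 4), so the first pass covers 3, 5, 19, 35, the second covers 16, 23, 45, and the third covers 4, 10, 17. The iterative algorithm still runs one further pass to observe that nothing changed.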
