Optimizing Compilers CISC 673 Spring 2011 Global Instruction Scheduling

University of Delaware • Computer & Information Sciences Department
Optimizing Compilers CISC 673, Spring 2011: Global Instruction Scheduling
John Cavazos (Ben Perry), University of Delaware




Overview

- Introduction
- Pipelining
  - Instruction Pipeline
  - Pipeline Execution
- Constraints and Dependences


Current Processors

Can execute several operations in a single cycle

“How fast can a program run on a processor with instruction-level parallelism?” The answer depends on:
- the potential parallelism in the program
- the available parallelism on the processor
- our ability to parallelize a sequential program
- our ability to find the best schedule given the constraints


Best targets

Programs whose operations are completely dependent on each other are poor targets: there is little to reorder, so the effort goes into satisfying constraints instead of scheduling.

Numeric applications with large aggregate data structures are good targets.


Pipelines

Instruction pipelines are found in virtually every processor.

Instructions go through multiple steps in the pipeline: fetch, decode, execute, access memory, and write result.

In a pipelined processor, a new instruction can be fetched while the current instruction is still being processed.

Each step in the pipeline takes one clock cycle.


Example pipeline

Cycle  i         i+1       i+2       i+3       i+4
1      Fetch
2      Identify  Fetch
3      Execute   Identify  Fetch
4      Read      Execute   Identify  Fetch
5      Write     Read      Execute   Identify  Fetch
6                Write     Read      Execute   Identify
7                          Write     Read      Execute
8                                    Write     Read
9                                              Write
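The diagram follows a simple rule. Assuming every stage takes exactly one cycle and nothing stalls, instruction k (counting i as 0) is in stage s during cycle k + s + 1, so n instructions need (number of stages) + n - 1 cycles; here 5 + 5 - 1 = 9. The helper `pipeline_table` below is a hypothetical, minimal sketch that regenerates the diagram from that rule.

```python
# Stage names as used in the example pipeline.
STAGES = ["Fetch", "Identify", "Execute", "Read", "Write"]


def pipeline_table(num_instructions, stages=STAGES):
    """Return (cycle, [stage of each instruction, or '' if idle]) rows.

    Assumes one cycle per stage and no stalls.
    """
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for cycle in range(1, total_cycles + 1):
        row = []
        for k in range(num_instructions):
            # Instruction k enters the pipeline (Fetch) at cycle k + 1.
            s = cycle - 1 - k
            row.append(stages[s] if 0 <= s < len(stages) else "")
        rows.append((cycle, row))
    return rows


if __name__ == "__main__":
    names = ["i"] + [f"i+{k}" for k in range(1, 5)]
    print("Cycle  " + "".join(f"{n:<10}" for n in names))
    for cycle, row in pipeline_table(5):
        print(f"{cycle:<7}" + "".join(f"{cell:<10}" for cell in row))
```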


Pipelines – Speculative Computing

Load the next instruction even if it may be branched over (speculative execution).

When a branch is taken, the pipeline is emptied and the branch target must be fetched, which causes a delay.

Hardware can predict which branch to fetch, but the prediction may be wrong.


Pipeline Execution

Execution of an instruction is pipelined if succeeding instructions that do not depend on its result are allowed to proceed.

Hardware can often detect dependences (as superscalar machines do) and stall execution if an operand isn't available yet.


Pipeline Execution

Some processors (an Android phone's, perhaps) leave the batching of instructions to the compiler.

The compiler creates very long instruction words (VLIW), each indicating a batch of instructions to execute in parallel.

Advanced schedulers can reorder instructions out of their original order; this is best done in software because of hardware limitations.


Code-scheduling Constraints

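Control dependence – all operations executed in the original program must be executed in the scheduled one.

Data dependence – the scheduled program must produce the same results as the original.

Resource – the schedule must not oversubscribe the resources of the machine.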


Data dependence

X = 5; Y = 6
Obviously, we can reorder these operations.

X = 5; Y = X
Obviously, we cannot reorder these: Y = X reads the value written by X = 5.


Data dependence

RAW – Read after write. True dependence. If a write is followed by a read of the same location, the read depends on the value written.

WAR – Write after read. Anti-dependence. If a read is followed by a write to the same location and the write is moved before the read, the read will get the wrong value.


Dependence

WAW – Write after write. Output dependence. If two writes go to the same location and their order is changed, the location ends up holding the wrong value.

WAR and WAW dependences can be eliminated by using different locations to store different values.
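A minimal sketch of that last point, assuming straight-line code given as hypothetical (destination, sources) tuples: giving every write a fresh name, in the spirit of register renaming, removes WAR and WAW dependences while keeping the RAW ones intact. The `rename` helper and the tuple format are illustrative assumptions, not the course's implementation.

```python
def rename(code):
    """Give every write a fresh name and point each read at the newest one.

    code is a list of (dest, [sources]) tuples for straight-line code.
    RAW dependences are preserved; WAR and WAW ones disappear because no
    name is ever written twice.
    """
    current = {}  # original name -> fresh name holding its latest value
    count = {}    # original name -> number of writes seen so far
    result = []
    for dest, srcs in code:
        new_srcs = [current.get(s, s) for s in srcs]  # reads see the latest write
        count[dest] = count.get(dest, 0) + 1
        new_dest = f"{dest}{count[dest]}"
        current[dest] = new_dest
        result.append((new_dest, new_srcs))
    return result


if __name__ == "__main__":
    # x = a; y = x; x = b; z = x
    # Originally, "x = b" has a WAR with "y = x" and a WAW with "x = a".
    code = [("x", ["a"]), ("y", ["x"]), ("x", ["b"]), ("z", ["x"])]
    for dest, srcs in rename(code):
        print(dest, "=", ", ".join(srcs))
```

After renaming, `x2 = b` no longer conflicts with the first two statements and can be scheduled as early as its own input allows. A real compiler would also have to make sure the last value (here `x2`) ends up wherever later code expects `x`.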


Finding dependences

Compiler: GUILTY until proven innocent! Always assume that two operations may refer to the same location unless it can be proven otherwise.

Pointers p and (p + 10) cannot possibly refer to the same location.

Array data dependence analysis: for i = 0 to n: a[2i] = a[2i + 1]. There is no dependence on the array in this loop, because the writes touch only even-indexed elements and the reads only odd-indexed ones.
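One standard way to prove this is the GCD test (named here as a swapped-in technique; the slide does not say which test is meant). A write a[c1*i + k1] and a read a[c2*j + k2] can only touch the same element if c1*i - c2*j = k2 - k1 has an integer solution, which requires gcd(c1, c2) to divide k2 - k1. For a[2i] and a[2i + 1] that means 2 must divide 1, so there is no dependence. The sketch below applies the test and cross-checks it by brute force for a small n; both function names are hypothetical.

```python
from math import gcd


def gcd_test_may_depend(c1, k1, c2, k2):
    """GCD test for accesses a[c1*i + k1] and a[c2*j + k2].

    Returns False only when the two accesses can never touch the same
    element; True means a dependence *may* exist (the test is conservative).
    """
    g = gcd(c1, c2)
    if g == 0:
        # Both subscripts are constants: they conflict only if equal.
        return k1 == k2
    return (k2 - k1) % g == 0


def brute_force_may_depend(c1, k1, c2, k2, n):
    """Enumerate iterations 0..n and look for a shared array element."""
    writes = {c1 * i + k1 for i in range(n + 1)}
    reads = {c2 * j + k2 for j in range(n + 1)}
    return bool(writes & reads)


if __name__ == "__main__":
    # for i = 0 to n: a[2i] = a[2i + 1]
    print(gcd_test_may_depend(2, 0, 2, 1))          # False
    print(brute_force_may_depend(2, 0, 2, 1, 100))  # False
```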


Finding dependences

Pointer alias analysis: two pointers are aliased if they refer to the same object. This is a difficult problem.

Interprocedural analysis: needed when parameters are passed by reference or when globals are passed, since the same location can then be reached through different names.


Register allocation

LD temporary_register1, a
ST b, temporary_register1
LD temporary_register2, c
ST d, temporary_register2

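There are two RAW dependences, but the two load/store pairs are independent of each other and can be reordered. If temporary_register1 and temporary_register2 get mapped to the same physical register, we create additional dependences (WAR and WAW on that register) that prevent this reordering.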
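To make the extra dependences concrete, here is a small sketch that lists every RAW, WAR, and WAW pair before and after both temporaries share one physical register. It assumes each instruction is summarized by the registers it writes and reads, ignores the memory operands (a, b, c, d are distinct), and uses hypothetical names `t1`, `t2`, and `R` for the registers.

```python
from itertools import combinations


def dependences(instrs):
    """Return (i, j, kind) for every pair i < j that must stay ordered.

    Each instruction is (text, writes, reads), where writes and reads are
    sets of register names. The pairwise check is conservative: it does not
    notice when an intermediate write kills a value.
    """
    deps = []
    for (i, (_, wi, ri)), (j, (_, wj, rj)) in combinations(enumerate(instrs), 2):
        if wi & rj:
            deps.append((i, j, "RAW"))
        if ri & wj:
            deps.append((i, j, "WAR"))
        if wi & wj:
            deps.append((i, j, "WAW"))
    return deps


two_temps = [
    ("LD t1, a", {"t1"}, set()),
    ("ST b, t1", set(), {"t1"}),
    ("LD t2, c", {"t2"}, set()),
    ("ST d, t2", set(), {"t2"}),
]

# The same code after t1 and t2 are both assigned to physical register R.
one_reg = [
    ("LD R, a", {"R"}, set()),
    ("ST b, R", set(), {"R"}),
    ("LD R, c", {"R"}, set()),
    ("ST d, R", set(), {"R"}),
]

if __name__ == "__main__":
    print(dependences(two_temps))
    print(dependences(one_reg))
```

With two temporaries only the two RAW pairs appear; with one register the second load picks up a WAW with the first load and a WAR with the first store, which pins the pairs in their original order. The (0, 3) RAW pair reported for `one_reg` is a false positive of the simple pairwise check, since instruction 2 overwrites R in between.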


Control dependence

All operations in a basic block are guaranteed to execute, but basic blocks are small and their operations are often highly related.

Optimizing across basic blocks is therefore crucial.


Control dependence

An instruction i1 is control dependent on instruction i2 if the outcome of i2 determines whether i1 is to be executed.

Executing instructions speculatively lets us schedule them across different basic blocks.


Speculative computing

Prefetching: bring data from memory into the cache before it is needed.

Poison bits: don't throw exceptions when computing speculatively. Instead, set a poison bit on the result register. If the poisoned register is really used, then throw the exception.
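A toy model of poison bits, assuming a register that carries a value plus a poison flag. The names `Reg`, `speculative_load`, `use`, and `SpeculationFault` are made up for illustration. The speculative load records the fault instead of raising it, and the exception surfaces only if the poisoned register is actually used.

```python
from dataclasses import dataclass
from typing import Optional


class SpeculationFault(Exception):
    """Deferred exception raised when a poisoned register is really used."""


@dataclass
class Reg:
    value: Optional[int] = None
    poison: bool = False


def speculative_load(memory, address):
    """Load that never raises: a bad address sets the poison bit instead."""
    if address in memory:
        return Reg(value=memory[address])
    return Reg(poison=True)


def use(reg):
    """A real (non-speculative) use of a register."""
    if reg.poison:
        raise SpeculationFault("poisoned register used")
    return reg.value


if __name__ == "__main__":
    memory = {0x100: 42}
    # Load hoisted above a branch: the address is invalid, but nothing is raised yet.
    r = speculative_load(memory, 0x200)
    take_path = False  # the path that would use r is not taken
    if take_path:
        print(use(r))
    print(use(speculative_load(memory, 0x100)))  # prints 42
```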


Speculative computing

Predicated execution: change

if (a == 0) b = c

to

st r4, r3
movif r2, r4, r1

The processor supports a conditional instruction (movif here), which lets the basic blocks be combined into one.


Basic Block List Scheduling

Finding an optimal schedule is NP-complete, but don't give up: basic blocks are typically small. Start with a data-dependence graph:
- Nodes are instructions, annotated with the resources they use.
- Edges are data dependences, each labeled with the delay the destination has to wait (some instructions may take 10 cycles, others only 1).


List Scheduling

The data-dependence graph cannot have cycles, so we can build a topological ordering of its nodes. Several such orderings may exist, though some are better than others.

Choose an ordering of the nodes such that no node depends on a node that follows it.
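One common way to build such an ordering is Kahn's algorithm (named here as an assumption; the slides do not specify a method). The sketch represents the data-dependence graph as a dict mapping each node to a list of (successor, delay) pairs; the node names and delays are made up.

```python
from collections import deque


def topological_order(graph):
    """Kahn's algorithm; graph maps node -> list of (successor, delay)."""
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for s, _ in succs:
            indegree[s] += 1
    # Start with nodes that depend on nothing.
    ready = deque(n for n in graph if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s, _ in graph[n]:
            indegree[s] -= 1
            if indegree[s] == 0:  # all of s's predecessors are placed
                ready.append(s)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical basic block: two loads feed an add, whose result is stored.
    ddg = {
        "ld1": [("add", 2)],
        "ld2": [("add", 2)],
        "add": [("st", 1)],
        "st": [],
    }
    print(topological_order(ddg))  # ['ld1', 'ld2', 'add', 'st']
```

Any order that keeps both loads before the add and the add before the store is valid here; which one is "better" is the question the next slides take up.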


List Scheduling

RT = an empty reservation table
Foreach n in SortedNodes:
  - Find the earliest time the instruction could begin
  - Delay the instruction until resources are available
  - Schedule the node after all delays
  - Claim its resources in RT
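The loop above can be sketched as follows. This is a minimal version under simplifying assumptions: nodes arrive already topologically sorted, each instruction uses exactly one unit of one resource type for a single cycle (real machines have richer reservation patterns), and edge delays are in cycles. All names are illustrative.

```python
from collections import defaultdict


def list_schedule(sorted_nodes, preds, resource_of, units):
    """Basic-block list scheduling with a simple reservation table.

    sorted_nodes: nodes in a topological order
    preds:        node -> list of (predecessor, delay)
    resource_of:  node -> resource type it needs for one cycle
    units:        resource type -> number of units available per cycle
    Returns node -> start cycle (0-based).
    """
    rt = defaultdict(int)  # reservation table: (cycle, resource) -> units in use
    start = {}
    for n in sorted_nodes:
        # Earliest time the instruction could begin: after every predecessor's delay.
        t = max((start[p] + d for p, d in preds.get(n, [])), default=0)
        # Delay the instruction until its resource is free in that cycle.
        r = resource_of[n]
        while rt[(t, r)] >= units[r]:
            t += 1
        # Schedule the node and claim its resource.
        start[n] = t
        rt[(t, r)] += 1
    return start


if __name__ == "__main__":
    nodes = ["ld1", "ld2", "add", "st"]
    preds = {"add": [("ld1", 2), ("ld2", 2)], "st": [("add", 1)]}
    resource_of = {"ld1": "mem", "ld2": "mem", "add": "alu", "st": "mem"}
    units = {"mem": 1, "alu": 1}
    print(list_schedule(nodes, preds, resource_of, units))
    # {'ld1': 0, 'ld2': 1, 'add': 3, 'st': 4}
```

With only one memory unit, the second load is pushed back one cycle, which in turn delays the add until both loads' 2-cycle delays have elapsed.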


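List Scheduling – better topological orderings

The longest path through the data-dependence graph bounds the schedule from below: no schedule can be shorter than that path. Prioritizing instructions on this critical path tends to give shorter schedules.

Available resources also constrain the schedule. The critical resource is the one with the largest ratio of uses to the number of units of that resource available; operations that use it may be given higher priority.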
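Both measures can be computed directly. The sketch below uses a made-up graph (each node mapped to (successor, delay) pairs) and an assumed number of units per resource. It finds the longest delay-weighted path from each node to the end of the block, usable as a scheduling priority, and the uses-to-units ratio of each resource, rounded up to whole cycles. Function names are hypothetical.

```python
from functools import lru_cache
from math import ceil


def path_heights(graph):
    """Longest delay-weighted path from each node to any sink."""
    @lru_cache(maxsize=None)
    def height(n):
        return max((d + height(s) for s, d in graph[n]), default=0)
    return {n: height(n) for n in graph}


def resource_bounds(resource_of, units):
    """Uses of each resource divided by its units, rounded up to whole cycles."""
    uses = {}
    for r in resource_of.values():
        uses[r] = uses.get(r, 0) + 1
    return {r: ceil(uses[r] / units[r]) for r in uses}


if __name__ == "__main__":
    ddg = {"ld1": [("add", 2)], "ld2": [("add", 2)], "add": [("st", 1)], "st": []}
    print(path_heights(ddg))  # {'ld1': 3, 'ld2': 3, 'add': 1, 'st': 0}
    bounds = resource_bounds(
        {"ld1": "mem", "ld2": "mem", "add": "alu", "st": "mem"}, {"mem": 1, "alu": 1}
    )
    print(bounds, "critical:", max(bounds, key=bounds.get))
```

Scheduling ready nodes in decreasing height is the critical-path heuristic. In this example the memory unit, used three times with one unit available, is the critical resource.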


Global Code Scheduling

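Optimize the use of resources across blocks.

Global code scheduling: moving instructions from one basic block to another.

It must respect data AND control dependences. All instructions must still be performed, and speculative computation cannot be disruptive: it must not cause unwanted side effects on paths where the original program would not have executed it.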


Global Code Scheduling example

if (!a) {c=b;}e=d+d

What are the data dependences?
What are the control dependences?
What can intuitively be run in parallel?


Global Code Scheduling Example

if (!a) {c=b;}e=d+d

Loads take two clock ticks and always hit in the cache.
R1 = a, R2 = b, …
The processor can execute two instructions per clock cycle.


if (!a) {c=b;}e=d+d

Original schedule (each row is one clock cycle; the two columns are the two instruction slots):

Block 1
  load r6, r1       idle
  noop              idle
  jumpz r6, b3      idle

Block 2
  load r7, r2       idle
  noop              idle
  store r3, r7      idle

Block 3
  load r8, r4       idle
  noop              idle
  add r8, r8, r8    idle
  st r5, r8         idle


if (!a) {c=b;}e=d+d

After global scheduling:

Block 1
  load r6, r1       load r8, r4
  load r7, r2       idle
  add r8, r8, r8    jumpz r6, b3

Block 2
  st r5, r8         st r3, r7

Block 3
  st r5, r8         idle

The loads and the add move up into Block 1, where they fill otherwise idle slots; load r7, r2 now runs on both paths, i.e., speculatively. The store st r5, r8 is copied into both Block 2 and Block 3.


Code movement

Definitions:
- Dominates – A dominates B if every path from the entry to B passes through A.
- Post-dominates – B post-dominates A if every path from A to the exit passes through B.
- Downward – move an operation down along the control flow, toward the exit.
- Upward – move an operation up along the control flow, toward the entry.
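Dominators can be computed with the classic iterative formulation (a standard technique, not one the slides spell out): Dom(entry) = {entry}, and Dom(n) = {n} ∪ ⋂ Dom(p) over the predecessors p of n, repeated until nothing changes. Post-dominators are the same computation on the reversed graph. The sketch assumes the three blocks of the earlier example, with Block 1 branching to Block 2 or Block 3 and Block 2 falling through to Block 3; the function names are hypothetical.

```python
def dominators(succ, entry):
    """Iterative dominator sets; succ maps block -> list of successor blocks."""
    nodes = set(succ)
    pred = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            preds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*preds) if preds else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def post_dominators(succ, exit_block):
    """Post-dominators are the dominators of the reversed control-flow graph."""
    rev = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            rev[s].append(n)
    return dominators(rev, exit_block)


if __name__ == "__main__":
    cfg = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}
    print(dominators(cfg, "B1"))       # B3 is dominated only by B1 and itself
    print(post_dominators(cfg, "B3"))  # B3 post-dominates every block
```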


Upward Code Movement

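Consider moving an instruction from block src up to block dest, where src comes after dest in the topologically sorted control-flow graph. Assume no dependences prevent the move.

If dest dominates src and src post-dominates dest, the two blocks are control equivalent (one executes exactly when the other does), so we're done: no speculation and no extra copies are needed.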


Upward Code Movement

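If src does not post-dominate dest, some paths through dest never reach src, so the moved instruction is computed speculatively on those paths. This is only desirable if the operation is cheap, and it is only useful when src is actually reached.

If dest does not dominate src, src can be reached along paths that bypass dest, so copies of the instruction are needed on those paths.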
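The two conditions can be turned into a small checklist. The sketch assumes dominator and post-dominator sets are already available; they are hard-coded here for the three-block example, in which Block 1 branches to Block 2 or Block 3 and Block 2 falls through to Block 3. The function name and the returned labels are illustrative.

```python
def upward_move_needs(src, dest, dom, pdom):
    """What moving an instruction up from src to dest requires.

    dom[b]: blocks that dominate b; pdom[b]: blocks that post-dominate b.
    Assumes no data dependences block the move.
    """
    needs = []
    if src not in pdom[dest]:
        # Some paths from dest never reach src: the moved op runs speculatively.
        needs.append("speculation")
    if dest not in dom[src]:
        # src is reachable without passing through dest: copy the op onto those paths.
        needs.append("copies")
    return needs  # empty: dest and src are control equivalent


# Hard-coded for the three-block example: B1 -> B2, B1 -> B3, B2 -> B3.
DOM = {"B1": {"B1"}, "B2": {"B1", "B2"}, "B3": {"B1", "B3"}}
PDOM = {"B1": {"B1", "B3"}, "B2": {"B2", "B3"}, "B3": {"B3"}}

if __name__ == "__main__":
    print(upward_move_needs("B3", "B1", DOM, PDOM))  # []
    print(upward_move_needs("B2", "B1", DOM, PDOM))  # ['speculation']
    print(upward_move_needs("B3", "B2", DOM, PDOM))  # ['copies']
```

Moving load r7, r2 from Block 2 into Block 1 is the "speculation" case, matching the rescheduled code in the example.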


Downward Code Movement

Consider moving an instruction from block src down to block dest, where src comes before dest in the topologically sorted control-flow graph. Assume no dependences prevent the move.

If src dominates dest and dest post-dominates src, the two blocks are control equivalent and we're done.


Downward Code Movement

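If src does not dominate dest, dest can be reached along paths that skip src, and the moved operation would also execute on those paths. Since the operations moved downward are often writes, this can overwrite values that should have survived, so extra operations will be needed:
- Replicate basic blocks and place the operation in a new copy of dest that is reached only through src.
- Alternatively, use predicated instructions (speculative).

If dest does not post-dominate src, some paths from src never reach dest, so compensation code is needed on those paths.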
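The same kind of checklist works for downward motion. As before, this is a sketch with dominator and post-dominator sets hard-coded for the three-block example (Block 1 branches to Block 2 or Block 3, Block 2 falls through to Block 3); the function name and labels are illustrative.

```python
def downward_move_needs(src, dest, dom, pdom):
    """What moving an instruction down from src to dest requires.

    dom[b]: blocks that dominate b; pdom[b]: blocks that post-dominate b.
    Assumes no data dependences block the move.
    """
    needs = []
    if src not in dom[dest]:
        # dest is reachable without passing through src: the moved op would
        # run there too, so replicate blocks or predicate the op.
        needs.append("replicate or predicate")
    if dest not in pdom[src]:
        # Some paths from src never reach dest: add compensation code on them.
        needs.append("compensation code")
    return needs  # empty: src and dest are control equivalent


# Hard-coded for the three-block example: B1 -> B2, B1 -> B3, B2 -> B3.
DOM = {"B1": {"B1"}, "B2": {"B1", "B2"}, "B3": {"B1", "B3"}}
PDOM = {"B1": {"B1", "B3"}, "B2": {"B2", "B3"}, "B3": {"B3"}}

if __name__ == "__main__":
    print(downward_move_needs("B1", "B3", DOM, PDOM))  # []
    print(downward_move_needs("B2", "B3", DOM, PDOM))  # ['replicate or predicate']
    print(downward_move_needs("B1", "B2", DOM, PDOM))  # ['compensation code']
```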


Conclusion

Processors can execute several instructions in parallel

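We take advantage of this by moving code, both within and across basic blocks.

Code can be moved when no dependences prevent it, but sometimes at a cost, such as speculative execution, duplicated blocks, or compensation code.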