Dynamic Scheduling I
people.ee.duke.edu/~sorin/ece252/lectures/4.1-tomasulo.pdf

ECE 252 / CPS 220 Lecture Notes: Dynamic Scheduling I

© 2009 by Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti

Dynamic Scheduling I

• basic pipeline started with single, in-order issue, single-cycle operations
• have extended this basic pipeline with
  • multi-cycle operations
  • multiple issue (superscalar)
• now: dynamic scheduling (out-of-order issue)
  • Scoreboard: OoO without solving WAW/WAR
  • Tomasulo’s algorithm: OoO + register renaming to fix WAR/WAW
• next half unit: dynamic scheduling II
  • dynamic scheduling + precise state + speculation
  • advanced topic: dynamic load scheduling

Readings

H+P

• chapter 2

Recent Research Papers (can read these soon)

• Pentium4

• Complexity-Effective Superscalar

• Checkpoint Processing and Recovery

Dynamic Scheduling: Motivation

• cycle 4: addf stalls due to RAW hazard
  • OK, fundamental problem
• also cycle 4: mulf stalls due to pipeline hazard (addf stalls)
  • why? mulf can’t proceed into ID because addf is there
  • but that’s the only reason ⇒ not good enough!
• why can’t we decode mulf in cycle 4 and execute it in c5?
  • no fundamental reason why we can’t do this!

                 1   2   3   4   5   6   7   8   9   10
divf f0,f2,f4    F   D   E/  E/  E/  E/  W
addf f6,f0,f2        F   D   d*  d*  d*  E+  E+  W
mulf f8,f2,f4            F   p*  p*  p*  D   E*  E*  W
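The claim that mulf has no fundamental reason to wait can be checked by classifying the register dependences between each pair of instructions. The sketch below is illustrative only: the `Insn` tuple and the `hazards` helper are hypothetical names, not part of the lecture.

```python
from typing import NamedTuple, Set, Tuple


class Insn(NamedTuple):
    op: str
    dst: str
    srcs: Tuple[str, ...]


def hazards(older: Insn, younger: Insn) -> Set[str]:
    """Return the register hazards the younger instruction has on the older one."""
    found = set()
    if older.dst in younger.srcs:
        found.add("RAW")  # younger reads what older writes
    if younger.dst in older.srcs:
        found.add("WAR")  # younger overwrites what older reads
    if younger.dst == older.dst:
        found.add("WAW")  # both write the same register
    return found


divf = Insn("divf", "f0", ("f2", "f4"))
addf = Insn("addf", "f6", ("f0", "f2"))
mulf = Insn("mulf", "f8", ("f2", "f4"))

print(hazards(divf, addf))  # addf needs f0 from divf
print(hazards(divf, mulf))  # mulf is independent of divf
print(hazards(addf, mulf))  # mulf is independent of addf
```

Only addf depends on divf (RAW on f0). mulf has no register dependence on either older instruction, so its stall comes purely from the in-order pipeline structure.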

Dynamic Scheduling

dynamic scheduling (out-of-order execution)

• execute instructions in non-sequential (non-von Neumann) order
  + reduce stalls
  + improve functional unit utilization
  + enable parallel execution (not in-order ⇒ can be in parallel)
• make it appear like sequential execution: precise interrupts
  • very important
  – but hard
  • next unit of this course

Scheduling

scheduling: re-arranging instructions to maximize performance

• requires knowledge about structure of processor

• requires knowledge about latencies and dependences

two options for who should schedule instructions

• static scheduling: by compiler

• dynamic scheduling: by hardware

Before We Start

why build complicated hardware if we can do this in software?

+ performance portability
  • don’t want to recompile for new machines
+ more information available to hardware
  • addresses, branch directions, cache misses unknown to compiler
+ more resources available to hardware
  • may not have enough architectural registers to fix WAR/WAW
+ easier to speculate in hardware
  • easier to recover from mis-speculation
– but compiler can look at more instructions
  • it’s possible to do combination of both
  • compiler does as much as it can, hardware does rest

The Problem with In-Order Pipelines

in-order pipeline

• simple 4-stage: IF, ID, EX (multiple cycle, includes M), WB
• structural hazard: 1 instruction register (latch) per stage
  • 1 instruction per stage per cycle (unless pipe is replicated)
  • younger instruction can’t pass older without “killing” it

out-of-order pipeline

• must implement “passing” functionality

[Figure: in-order pipeline: PC, I$, IF, F/D latch, ID, D/X latch, EX, X/W latch, WB, regfile]

Instruction Buffer

trick: instruction buffer (many names for this buffer)

• basically: a bunch of latches for holding instructions
  • this is the scope of instructions that the scheduler can see
• split ID into two pieces
  • accumulate decoded instructions in buffer in-order
  • buffer sends instructions down rest of pipe out-of-order

[Figure: pipeline with ID split around the instruction buffer: PC, I$, IF, F/D latch, ID1, instruction buffer, ID2, D/X latch, EX, X/W latch, WB, regfile]

Dispatch and Issue

• dispatch (DS): first part of ID
  • allocate resources in instruction buffer
  – new kind of structural hazard (instruction buffer could be full)
  • dispatch is in-order, and stall propagates to younger instructions
• issue (IS): second part of ID
  • send instructions from instruction buffer to execution units
  • out-of-order, wait does NOT propagate to younger instructions

[Figure: pipeline with dispatch and issue stages: PC, I$, IF, F/D latch, DS, instruction buffer, IS, D/X latch, EX, X/W latch, WB, regfile]
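The difference between in-order dispatch and out-of-order issue can be sketched as a small buffer model. This is a minimal illustration, not a hardware description: the class name `InstructionBuffer`, its capacity parameter, and the `is_ready` callback are all assumptions made for the example.

```python
from typing import Callable, List, Optional


class InstructionBuffer:
    """Toy instruction buffer: in-order dispatch, out-of-order issue."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[str] = []  # kept in program (dispatch) order

    def dispatch(self, insn: str) -> bool:
        """Allocate an entry; False means a structural-hazard stall."""
        if len(self.entries) >= self.capacity:
            return False  # buffer full: this and all younger insns stall
        self.entries.append(insn)
        return True

    def issue(self, is_ready: Callable[[str], bool]) -> Optional[str]:
        """Send the oldest *ready* instruction onward, skipping waiting ones."""
        for i, insn in enumerate(self.entries):
            if is_ready(insn):
                return self.entries.pop(i)
        return None  # nothing ready this cycle


buf = InstructionBuffer(capacity=2)
buf.dispatch("addf f6,f0,f2")  # waits on f0
buf.dispatch("mulf f8,f2,f4")  # independent
stalled = not buf.dispatch("subf f10,f2,f4")  # buffer full
issued = buf.issue(lambda insn: not insn.startswith("addf"))
```

Here the waiting addf stays in the buffer while the younger mulf issues past it, which is the "passing" behavior an in-order pipeline cannot provide.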

DS Method #1: Scoreboarding

instruction buffer ⇒ scoreboard

• centralized control scheme
  • no bypassing
  • no elimination of WAR/WAW hazards
• first implementation: CDC 6600 [1964]
  • 16 separate non-pipelined functional units
  • 4 FP, 5 memory, 7 integer
• our example: Simple Scoreboard
  • 5 functional units: 1 ALU, 1 load, 1 store, 2 FP (3-cycle, pipelined)
  • for simplicity, assume 1-wide pipeline (not superscalar)

Scoreboard Data Structures

• instruction status: 1 entry per “active” instruction
  • which stage instruction is in (presence in scoreboard implies DS)
• functional unit (FU) status: 1 entry per FU
  • busy: FU is busy, op: current operation
  • R1, R2, R: source and destination registers
  • T1, T2: tags of FUs producing source registers
  • T: tag of FU producing destination register
• register status: 1 entry per architectural register
  • T: tag of FU (if any) that will write the register
• tag fields interpreted as “ready bits” (conversely “busy bits”)
  • tag == 0: register value is ready (in register file)
  • tag != 0: register value is not ready (will be supplied by [tag])
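One way to picture the three tables in software is sketched below. The class and field names (`FUStatus`, `InstStatus`, `Scoreboard`, `is_ready`) are hypothetical, the dictionary key plays the role of the FU's own tag T, and `None` stands in for the `tag == 0` "ready" encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FUStatus:
    """One entry per functional unit."""
    busy: bool = False
    op: Optional[str] = None
    R: Optional[str] = None   # destination register
    R1: Optional[str] = None  # first source register
    R2: Optional[str] = None  # second source register
    T1: Optional[str] = None  # tag of FU producing R1 (None = ready)
    T2: Optional[str] = None  # tag of FU producing R2 (None = ready)


@dataclass
class InstStatus:
    """One entry per active instruction: cycle in which each stage finished."""
    text: str
    DS: Optional[int] = None
    IS: Optional[int] = None
    EX: Optional[int] = None
    WB: Optional[int] = None


@dataclass
class Scoreboard:
    fu_status: Dict[str, FUStatus]
    reg_status: Dict[str, Optional[str]]  # register -> tag of FU that will write it
    inst_status: List[InstStatus] = field(default_factory=list)

    def is_ready(self, reg: str) -> bool:
        """A register is ready when no FU is tagged as its pending writer."""
        return self.reg_status.get(reg) is None


sb = Scoreboard(
    fu_status={name: FUStatus() for name in ("ALU", "load", "store", "FP1", "FP2")},
    reg_status={"f0": None, "f2": None, "f4": None, "r1": None},
)

# Example state: a load into f0 has been dispatched to the load unit in cycle 1.
sb.fu_status["load"] = FUStatus(busy=True, op="ldf", R="f0", R1="r1")
sb.reg_status["f0"] = "load"
sb.inst_status.append(InstStatus("ldf f0,X(r1)", DS=1))
```

After this dispatch, `sb.is_ready("f0")` is false because the load unit will supply f0, while f2 is still ready in the register file.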

Simple Scoreboard

• instruction fields and status bits
• tags
• values

[Figure: simple scoreboard datapath: fetched insns, inst status (IS, EX, WB), FU status (R, R1, R2, T1, T2, T), reg status (T), RF (value), FU, tag comparators (==)]

Scoreboard Pipeline

new pipeline structure: IF, DS, IS, EX, WB

• DS (dispatch) from fetch to the scoreboard
  • (no scoreboard entry/structural hazard/WAW) ? (stall) : (allocate)
• IS (issue) to the functional units
  • (RAW hazard) ? (wait) : (read registers, go directly to execute)
• EX (execute)
  • execute operation, notify scoreboard when done
• WB (writeback)
  • (WAR hazard) ? (wait) : (write register, free scoreboard entry)
• assume
  • WB and RAW-dependent IS can take place in same cycle
  • WB and structural-dependent DS can take place in same cycle
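The three stall/wait conditions above can be written as predicates over the scoreboard tables. The sketch below is one possible reading, not the lecture's definition: the `FUEntry` class, the `issued` flag (meaning "operands already read"), and the exact WAR test (another unissued entry still needs to read the old value of the destination from the register file) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUEntry:
    busy: bool = False
    op: str = ""
    R: Optional[str] = None    # destination register
    R1: Optional[str] = None   # source registers
    R2: Optional[str] = None
    T1: Optional[str] = None   # producer tags; None means the source is ready
    T2: Optional[str] = None
    issued: bool = False       # hypothetical flag: operands already read


def can_dispatch(fu: str, dest: Optional[str],
                 fus: Dict[str, FUEntry],
                 reg_status: Dict[str, Optional[str]]) -> bool:
    """DS: stall on a structural hazard (FU busy) or WAW (dest has a pending writer)."""
    structural = fus[fu].busy
    waw = dest is not None and reg_status.get(dest) is not None
    return not (structural or waw)


def can_issue(entry: FUEntry) -> bool:
    """IS: wait on RAW, i.e. while either source tag is still set."""
    return entry.T1 is None and entry.T2 is None


def can_writeback(fu: str, fus: Dict[str, FUEntry]) -> bool:
    """WB: wait on WAR, i.e. while an unissued entry must still read the old dest value."""
    dest = fus[fu].R
    if dest is None:
        return True  # e.g. a store writes no register
    for name, e in fus.items():
        if name == fu or not e.busy or e.issued:
            continue
        if (e.R1 == dest and e.T1 is None) or (e.R2 == dest and e.T2 is None):
            return False
    return True


# Example state: add (ALU) has executed; stf still waits on f4 and has not read r1.
fus = {
    "ALU": FUEntry(busy=True, op="add", R="r1", R1="r1", issued=True),
    "load": FUEntry(busy=True, op="ldf", R="f0", R1="r1", T1="ALU"),
    "store": FUEntry(busy=True, op="stf", R1="f4", R2="r1", T1="FP1"),
    "FP1": FUEntry(busy=True, op="mulf", R="f4", R1="f0", R2="f2", issued=True),
    "FP2": FUEntry(),
}
reg_status = {"f0": "load", "f2": None, "f4": "FP1", "r1": "ALU"}
```

In this state a second `mulf` writing f4 cannot dispatch to FP2 (WAW), a second `stf` cannot dispatch (store unit busy), the waiting `stf` cannot issue (RAW on f4), and the `add` cannot write r1 until the `stf` has read it (WAR).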

Scoreboard: Dispatch (DS)

• stall for WAW and structural hazards, but otherwise:
  • allocate scoreboard entry
  • copy status for input registers
  • set status for output register

[Figure: scoreboard datapath for the DS step: fetched insns, inst status (IS, EX, WB), FU status (R, R1, R2, T1, T2, T), reg status (T), RF (value), FU, tag comparators (==)]
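The dispatch actions can be sketched with plain dictionaries. The function name `dispatch`, its parameters, and the dictionary layout are assumptions for illustration; `None` again means "ready". Note that the input status is copied before the output status is set, so an instruction such as `add r1,r1,#4` does not end up waiting on itself.

```python
from typing import Dict, Optional


def dispatch(fu_status: Dict[str, dict],
             reg_status: Dict[str, Optional[str]],
             fu: str, op: str,
             dest: Optional[str],
             src1: str,
             src2: Optional[str] = None) -> bool:
    """Try to dispatch one instruction; return False to signal a stall."""
    if fu_status[fu]["busy"]:
        return False  # structural hazard: FU entry in use
    if dest is not None and reg_status.get(dest) is not None:
        return False  # WAW hazard: dest already has a pending writer
    # Allocate the entry and copy the status of the input registers.
    fu_status[fu] = {
        "busy": True, "op": op, "R": dest, "R1": src1, "R2": src2,
        "T1": reg_status.get(src1),
        "T2": reg_status.get(src2) if src2 is not None else None,
    }
    # Set the status of the output register last.
    if dest is not None:
        reg_status[dest] = fu
    return True


fu_status = {name: {"busy": False} for name in ("ALU", "load", "store", "FP1", "FP2")}
reg_status: Dict[str, Optional[str]] = {}

ok_ldf = dispatch(fu_status, reg_status, "load", "ldf", "f0", "r1")
ok_mulf = dispatch(fu_status, reg_status, "FP1", "mulf", "f4", "f0", "f2")
ok_stf = dispatch(fu_status, reg_status, "store", "stf", None, "f4", "r1")
ok_add = dispatch(fu_status, reg_status, "ALU", "add", "r1", "r1")
ok_mulf2 = dispatch(fu_status, reg_status, "FP2", "mulf", "f4", "f0", "f2")
```

The first four dispatches succeed and record that mulf waits on the load unit and stf waits on FP1. The last call returns False because f4 already has a pending writer.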

Scoreboard: Issue (IS)

• wait for RAW hazards (T1 or T2 not empty), but otherwise:
  • read registers

[Figure: scoreboard datapath for the IS step: fetched insns, inst status (IS, EX, WB), FU status (R, R1, R2, T1, T2, T), reg status (T), RF (value), FU, tag comparators (==)]

Scoreboard: Execute (EX)

[Figure: scoreboard datapath for the EX step: fetched insns, inst status (IS, EX, WB), FU status (R, R1, R2, T1, T2, T), reg status (T), RF (value), FU, tag comparators (==)]

Scoreboard: Writeback (WB)

• wait for WAR hazards, but otherwise:
  • writeback result
  • compare tags with waiting instructions
  • on match: clear tag (set input to “ready”)

[Figure: scoreboard datapath for the WB step: fetched insns, inst status (IS, EX, WB), FU status (R, R1, R2, T1, T2, T), reg status (T), RF (value), FU, tag comparators (==)]
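The writeback actions amount to a tag broadcast: the finishing FU's tag is compared against every waiting source tag and cleared on a match. The sketch below assumes the WAR check has already passed; the function name `writeback`, the `regfile` dictionary, and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional


def writeback(fu_status: Dict[str, dict],
              reg_status: Dict[str, Optional[str]],
              regfile: Dict[str, float],
              fu: str, value: float) -> None:
    """Write the result, wake up waiting instructions, and free the FU entry."""
    dest = fu_status[fu].get("R")
    if dest is not None:
        regfile[dest] = value          # writeback result
        if reg_status.get(dest) == fu:
            reg_status[dest] = None    # register value is now ready
    # Compare the broadcast tag with every waiting source tag.
    for entry in fu_status.values():
        if entry.get("T1") == fu:
            entry["T1"] = None         # on match: clear tag (input ready)
        if entry.get("T2") == fu:
            entry["T2"] = None
    fu_status[fu] = {"busy": False}    # free scoreboard entry


fu_status = {
    "load": {"busy": True, "op": "ldf", "R": "f0", "R1": "r1", "R2": None,
             "T1": None, "T2": None},
    "FP1": {"busy": True, "op": "mulf", "R": "f4", "R1": "f0", "R2": "f2",
            "T1": "load", "T2": None},
}
reg_status = {"f0": "load", "f4": "FP1"}
regfile: Dict[str, float] = {}

writeback(fu_status, reg_status, regfile, "load", 3.0)
```

After the load writes back, the mulf entry's T1 is cleared, so it no longer has a RAW hazard and can issue by reading f0 from the register file.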

Running Example

SAX: simplified SAXPY

  DO I = 1,N
    Z[I] = A*X[I]

assembly code:

loop: ldf  f0,X(r1)     // f0=X[i], assume I in r1
      mulf f4,f0,f2     // assume A in f2
      stf  f4,Z(r1)     // Z[i]=A*X[i]
      add  r1,r1,#4     // I=I+4
      ble  r1,r2,loop   // assume 4N in r2

consider two iterations, ignore branch

Scoreboard Data Structures

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    No
load   No
store  No
FP1    No
FP2    No

Register Status
register  T
f0
f2
f4
r1

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)
mulf f4,f0,f2
stf f4,Z(r1)
add r1,r1,#4
ldf f0,X(r1)
mulf f4,f0,f2
stf f4,Z(r1)

Scoreboard Example: Cycle 1

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    No
load   Yes   ldf   f0   r1
store  No
FP1    No
FP2    No

Register Status
register  T
f0        load
f2
f4
r1

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1
mulf f4,f0,f2
stf f4,Z(r1)
add r1,r1,#4
ldf f0,X(r1)
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: allocate

Scoreboard Example: Cycle 2

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    No
load   Yes   ldf   f0   r1
store  No
FP1    Yes   mulf  f4   f0   f2   load
FP2    No

Register Status
register  T
f0        load
f2
f4        FP1
r1

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2
mulf f4,f0,f2    c2
stf f4,Z(r1)
add r1,r1,#4
ldf f0,X(r1)
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: allocate

Scoreboard Example: Cycle 3

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    No
load   Yes   ldf   f0   r1
store  Yes   stf        f4   r1   FP1
FP1    Yes   mulf  f4   f0   f2   load
FP2    No

Register Status
register  T
f0        load
f2
f4        FP1
r1

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3
mulf f4,f0,f2    c2
stf f4,Z(r1)     c3
add r1,r1,#4
ldf f0,X(r1)
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: allocate; stalled on RAW

Scoreboard Example: Cycle 4

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    Yes   add   r1   r1
load   No
store  Yes   stf        f4   r1   FP1
FP1    Yes   mulf  f4   f0   f2   load
FP2    No

Register Status
register  T
f0        load
f2
f4        FP1
r1        ALU

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4
stf f4,Z(r1)     c3
add r1,r1,#4     c4
ldf f0,X(r1)
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: result written; free; clear status; allocate; f0 now ready

Scoreboard Example: Cycle 5

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    Yes   add   r1   r1
load   Yes   ldf   f0   r1        ALU
store  Yes   stf        f4   r1   FP1
FP1    Yes   mulf  f4   f0   f2
FP2    No

Register Status
register  T
f0        load
f2
f4        FP1
r1        ALU

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4   c5
stf f4,Z(r1)     c3
add r1,r1,#4     c4   c5
ldf f0,X(r1)     c5
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: allocate

Scoreboard Example: Cycle 6

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    Yes   add   r1   r1
load   Yes   ldf   f0   r1        ALU
store  Yes   stf        f4   r1   FP1
FP1    Yes   mulf  f4   f0   f2
FP2    No

Register Status
register  T
f0        load
f2
f4        FP1
r1        ALU

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4   c5+
stf f4,Z(r1)     c3
add r1,r1,#4     c4   c5   c6
ldf f0,X(r1)     c5
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: DS stall: WAW hazard w/ mulf (f4)

Scoreboard Example: Cycle 7

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    Yes   add   r1   r1
load   Yes   ldf   f0   r1        ALU
store  Yes   stf        f4   r1   FP1
FP1    Yes   mulf  f4   f0   f2
FP2    No

Register Status
register  T
f0        load
f2
f4        FP1
r1        ALU

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4   c5+
stf f4,Z(r1)     c3
add r1,r1,#4     c4   c5   c6
ldf f0,X(r1)     c5
mulf f4,f0,f2
stf f4,Z(r1)

Annotations: DS stall: WAW hazard w/ mulf (f4); WB stall: WAR hazard w/ stf (r1)

Scoreboard Example: Cycle 8

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    Yes   add   r1   r1
load   Yes   ldf   f0   r1        ALU
store  Yes   stf        f4   r1   FP1
FP1    No
FP2    Yes   mulf  f4   f0   f2   load

Register Status
register  T
f0        load
f2
f4        FP1 → FP2
r1        ALU

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4   c5+  c8
stf f4,Z(r1)     c3   c8
add r1,r1,#4     c4   c5   c6
ldf f0,X(r1)     c5
mulf f4,f0,f2    c8
stf f4,Z(r1)

Annotations: WB stall due to WAR hazard; free, allocate; f4 is ready; first mulf (FP1) is finished

Scoreboard Example: Cycle 9

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    No
load   Yes   ldf   f0   r1        ALU
store  Yes   stf        f4   r1
FP1    No
FP2    Yes   mulf  f4   f0   f2   load

Register Status
register  T
f0        load
f2
f4        FP2
r1        ALU

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4   c5+  c8
stf f4,Z(r1)     c3   c8   c9
add r1,r1,#4     c4   c5   c6   c9
ldf f0,X(r1)     c5   c9
mulf f4,f0,f2    c8
stf f4,Z(r1)

Annotations: add wrote; free entry; r1 is ready; DS stall due to structural hazard

Scoreboard Example: Cycle 10

Functional unit status
T      busy  op    R    R1   R2   T1    T2
ALU    No
load   Yes   ldf   f0   r1
store  Yes   stf        f4   r1   FP2
FP1    No
FP2    Yes   mulf  f4   f0   f2   load

Register Status
register  T
f0        load
f2
f4        FP2
r1

Instruction Status
instruction      DS   IS   EX   WB
ldf f0,X(r1)     c1   c2   c3   c4
mulf f4,f0,f2    c2   c4   c5+  c8
stf f4,Z(r1)     c3   c8   c9   c10
add r1,r1,#4     c4   c5   c6   c9
ldf f0,X(r1)     c5   c9   c10
mulf f4,f0,f2    c8
stf f4,Z(r1)     c10

Annotations: WB and dependent DS in same cycle; free then allocate

Scoreboard Redux

+ cheap hardware
  • scoreboard is cheap (~1 FU in area)
• pretty good performance
  • 1.7X for FORTRAN programs
  • 2.5X for hand-coded assembly (how would a compiler do?)
– no bypassing
  • RAW dependences handled through registers
– limited scheduling scope
  • WAW/structural hazards force in-order dispatch
  • WAR hazards delay writeback and issue of dependent operations
• can solve these problems with register renaming!