Lecture 6: Scoreboarding and Tomasulo Algorithm

Page 1: Lecture 6: Scoreboarding and Tomasulo Algorithmusers.utcluj.ro/~sebestyen/_Word_docs/Cursuri/SSC... · scoreboard updates its internal data structure If an instruction is stalled

Zhao Zhang, CPRE 581, Fall 2005 2

History

• 1966: scoreboarding in the CDC 6600, implementing limited dynamic scheduling
• Three years later: Tomasulo's algorithm in the IBM 360/91, introducing register renaming and reservation stations
• Now appearing in today's DEC Alpha, SGI MIPS, Sun UltraSPARC, Intel Pentium, IBM PowerPC, and others

Adapted from UCB CS252 S98, Copyright 1998 UCB

Scoreboarding Overview

Basic idea:
• Use a scoreboard to track data (RAW) dependences through registers.

Main points of design:
• Instructions are sent to a functional unit (FU) only if no outstanding name dependence blocks them.
• RAW data dependences are tracked and enforced by the scoreboard.
• Register values are passed through the register file: a child instruction starts execution after the last parent finishes execution.
• The pipeline stalls if any name dependence (WAR or WAW) is detected: WAW stalls issue, and WAR stalls write result.

Machine Correctness

Let E(D,P) be the execution of program P under dynamic scheduling D, and E(S,P) its sequential execution S. Then E(D,P) = E(S,P) if:

1. E(D,P) and E(S,P) execute the same set of instructions.

2. For any instruction i, i receives in E(D,P) the outputs of its parents in E(S,P).

3. In E(D,P), any register or memory word ends up with the output of instruction j, where j is the last instruction that writes to that register or memory word in E(S,P).

Scoreboarding merit: it can execute independent instructions in parallel without violating condition 2.
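The following Python sketch is not from the original slides. It makes the three conditions concrete by checking a dynamic execution against the sequential one. It relies on a hypothetical encoding:
• A program is a list of (destination, sources) pairs in program order.
• E(D,P) is given as a time-ordered trace of register read and write events.
• Memory words are ignored.

The example program is this lecture's WAR example (DIVD F0,F2,F4; ADDD F10,F0,F8; SUBD F8,F8,F14).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encodings (not from the lecture):
#   program: [(dest, (src, ...)), ...] in program order, i.e. E(S,P)
#   trace:   time-ordered events ("read" | "write", instruction index, register), i.e. E(D,P)
Program = List[Tuple[str, Tuple[str, ...]]]
Trace = List[Tuple[str, int, str]]


def sequential_parents(prog: Program) -> Dict[Tuple[int, str], Optional[int]]:
    """Parent of each (instruction, source register) in E(S,P); None = initial value."""
    last_writer: Dict[str, int] = {}
    parents: Dict[Tuple[int, str], Optional[int]] = {}
    for i, (dest, srcs) in enumerate(prog):
        for reg in srcs:
            parents[(i, reg)] = last_writer.get(reg)
        last_writer[dest] = i
    return parents


def is_correct(prog: Program, trace: Trace) -> bool:
    parents = sequential_parents(prog)
    holder: Dict[str, int] = {}      # register -> instruction whose output it holds
    executed = set()
    for kind, i, reg in trace:
        executed.add(i)
        if kind == "read" and holder.get(reg) != parents[(i, reg)]:
            return False             # condition 2: operand came from the wrong parent
        if kind == "write":
            holder[reg] = i
    if executed != set(range(len(prog))):
        return False                 # condition 1: different set of instructions
    last = {dest: i for i, (dest, _) in enumerate(prog)}
    return all(holder.get(reg) == j for reg, j in last.items())  # condition 3


# WAR example: DIVD F0,F2,F4 / ADDD F10,F0,F8 / SUBD F8,F8,F14
prog = [("F0", ("F2", "F4")), ("F10", ("F0", "F8")), ("F8", ("F8", "F14"))]
ok = [("read", 0, "F2"), ("read", 0, "F4"), ("read", 2, "F8"), ("read", 2, "F14"),
      ("write", 0, "F0"), ("read", 1, "F0"), ("read", 1, "F8"),
      ("write", 2, "F8"), ("write", 1, "F10")]
# Same events, but SUBD writes F8 before ADDD has read it.
bad = ok[:4] + [("write", 2, "F8")] + ok[4:7] + [("write", 1, "F10")]
print(is_correct(prog, ok), is_correct(prog, bad))  # True False
```

In the second trace, SUBD overwrites F8 before ADDD has read it. ADDD therefore gets its F8 operand from the wrong parent, and condition 2 fails. This is the WAR case that the scoreboard's write-result stage guards against.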

Four Stages of Scoreboard Control

Fetch first, then:

1. Issue: decode instructions and check for structural hazards
   Wait conditions: (1) the required FU is free; (2) no other active instruction will write the same destination register (to avoid WAW)
   Actions: (1) the instruction proceeds to the FU; (2) the scoreboard updates its internal data structure
   If an instruction is stalled at this stage, no other instructions can proceed.

2. Read operands: wait until there are no data hazards, then read operands
   Wait condition: all source operands are available
   Actions: the functional unit reads the register operands and starts execution in the next cycle

Four Stages of Scoreboard Control (continued)

3. Execution: operate on operands (EX)
   Actions: the functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB)
   Wait condition: no other instruction/FU still has to read the old value of this instruction's destination register (to avoid WAR)
   Actions: write the register and update the scoreboard

WAR example:
   DIVD F0,F2,F4
   ADDD F10,F0,F8
   SUBD F8,F8,F14

The CDC 6600 scoreboard would stall SUBD in its write-result stage until ADDD reads its operands.
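The wait conditions of the issue, read-operands and write-result stages can be written as small predicates. The Python sketch below is illustrative only. Three things in it are assumptions rather than CDC 6600 structures:
• the `Waiting` record;
• the dictionaries standing in for the scoreboard tables;
• the functional-unit names `Add1`/`Add2` used so that ADDD and SUBD can be in flight together.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Waiting:
    """An issued instruction that has not read its operands yet (hypothetical record)."""
    fu: str                 # functional unit holding the instruction
    srcs: List[str]         # source registers
    ready: Dict[str, bool]  # source -> value already available (like Rj/Rk = Yes)


def can_issue(fu: str, dest: str, fu_busy: Dict[str, bool],
              reg_result: Dict[str, str]) -> bool:
    """Issue: the FU must be free (structural) and dest must have no pending writer (WAW)."""
    return not fu_busy.get(fu, False) and dest not in reg_result


def can_read_operands(w: Waiting) -> bool:
    """Read operands: all source operands must be available (RAW)."""
    return all(w.ready[s] for s in w.srcs)


def can_write_result(fu: str, dest: str, waiting: List[Waiting]) -> bool:
    """Write result: no other FU may still need the old value of dest (WAR)."""
    return not any(w.fu != fu and dest in w.srcs and w.ready[dest] for w in waiting)


# Cycle 2 of the example: the Integer unit is busy with LD F6, so the 2nd LD cannot issue.
print(can_issue("Integer", "F2", {"Integer": True}, {"F6": "Integer"}))   # False

# WAR example with two hypothetical adders: ADDD F10,F0,F8 (in Add1) still waits
# for F0 and has not read F8, so SUBD F8,F8,F14 (in Add2) must not write F8 yet.
addd = Waiting("Add1", ["F0", "F8"], {"F0": False, "F8": True})
print(can_read_operands(addd))                  # False: F0 not produced yet
print(can_write_result("Add2", "F8", [addd]))   # False: stall SUBD
print(can_write_result("Add2", "F8", []))       # True once ADDD has read its operands
```

A source whose ready flag is still false is waiting for the very result being written, so it does not block the write. Only a source that is ready but not yet read signals a WAR conflict.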

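Code Example

  LD    F6,34(R2)
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

[Figure: dependence graph of the code, with LD1 and LD2 on the first level, MULTD and SUBD on the second, and DIVD and ADDD on the third.]

Operation latencies: load/store 2 cycles, add/sub 2 cycles, multiply 10 cycles, divide 40 cycles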
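The dependences that the scoreboard (and later Tomasulo's algorithm) must respect in this code can be listed mechanically. The Python sketch below uses the hypothetical labels LD1/LD2 for the two loads. It pairs each register access with the nearest conflicting access and classifies the pair as RAW, WAR or WAW.

```python
from typing import Dict, List, Tuple

# The example code as (name, destination, sources); LD1/LD2 label the two loads.
CODE = [
    ("LD1",   "F6",  ["R2"]),
    ("LD2",   "F2",  ["R3"]),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]


def dependences(code: List[Tuple[str, str, List[str]]]) -> List[Tuple[str, str, str, str]]:
    """Return (kind, earlier, later, register) for each RAW, WAR and WAW dependence,
    pairing each access only with the nearest conflicting access."""
    deps: List[Tuple[str, str, str, str]] = []
    last_writer: Dict[str, str] = {}    # register -> most recent writer
    readers: Dict[str, List[str]] = {}  # register -> readers since that write
    for name, dest, srcs in code:
        for r in srcs:                        # RAW: read after the latest write
            if r in last_writer:
                deps.append(("RAW", last_writer[r], name, r))
        for reader in readers.get(dest, []):  # WAR: write after earlier reads
            deps.append(("WAR", reader, name, dest))
        if dest in last_writer:               # WAW: write after the latest write
            deps.append(("WAW", last_writer[dest], name, dest))
        for r in srcs:
            lst = readers.setdefault(r, [])
            if name not in lst:
                lst.append(name)
        last_writer[dest] = name
        readers[dest] = []                    # later writers conflict only with later readers
    return deps


for kind, earlier, later, reg in dependences(CODE):
    print(f"{kind}: {earlier} -> {later} on {reg}")
```

Among the results are "WAR: DIVD -> ADDD on F6" and "WAW: LD1 -> ADDD on F6". The WAR pair is the one that, in the scoreboard example, keeps ADDD from writing its result until DIVD has read F6.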

Scoreboard Connections

[Figure: the scoreboard exchanges control/status signals with the registers and with each functional unit: two FP multipliers, an FP divider, an FP adder, and an integer unit.]

Three Parts of the Scoreboard

1. Instruction status: which of the 4 steps the instruction is in

2. Functional unit status: indicates the state of the functional unit (FU), with 9 fields for each functional unit:
   Busy: indicates whether the unit is busy or not
   Op: operation to perform in the unit (e.g., + or -)
   Fi: destination register
   Fj, Fk: source-register numbers
   Qj, Qk: functional units producing source registers Fj, Fk
   Rj, Rk: flags indicating when Fj, Fk are ready

3. Register result status: indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register
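The Python sketch below shows how these fields get filled in. It models one functional-unit status row as a dataclass and the register result status as a plain dictionary, then replays issue and write-result bookkeeping for the first instructions of the example.

It is a simplification built on assumptions:
• the wait conditions are assumed to have been checked already;
• the read-operands step is omitted;
• DIVD is not issued, so F10 never appears.

The printed values correspond to the Mult1 and Add entries of the cycle 6 to cycle 8b tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (the nine fields above)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # FU producing Fj / Fk (None: nothing pending)
    qk: Optional[str] = None
    rj: bool = False          # Fj / Fk ready
    rk: bool = False


def issue(fus: Dict[str, FUStatus], reg_result: Dict[str, str], fu: str,
          op: str, fi: str, fj: Optional[str], fk: Optional[str]) -> None:
    """Fill in the FU row and the register result status at issue."""
    qj = reg_result.get(fj) if fj else None
    qk = reg_result.get(fk) if fk else None
    fus[fu] = FUStatus(True, op, fi, fj, fk, qj, qk,
                       rj=fj is not None and qj is None,
                       rk=fk is not None and qk is None)
    reg_result[fi] = fu


def write_result(fus: Dict[str, FUStatus], reg_result: Dict[str, str], fu: str) -> None:
    """Tell waiting FUs their operand is ready, clear the register entry, free the FU."""
    dest = fus[fu].fi
    for s in fus.values():
        if s.qj == fu:
            s.qj, s.rj = None, True
        if s.qk == fu:
            s.qk, s.rk = None, True
    if reg_result.get(dest) == fu:
        del reg_result[dest]
    fus[fu] = FUStatus()


fus = {name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
reg_result: Dict[str, str] = {}
issue(fus, reg_result, "Integer", "Load", "F2", None, "R3")  # LD F2,45(R3), cycle 5
issue(fus, reg_result, "Mult1", "Mult", "F0", "F2", "F4")    # MULTD, cycle 6
issue(fus, reg_result, "Add", "Sub", "F8", "F6", "F2")       # SUBD, cycle 7
print(fus["Mult1"].qj, fus["Mult1"].rj, fus["Mult1"].rk)     # Integer False True
write_result(fus, reg_result, "Integer")                     # LD F2 writes, cycle 8
print(fus["Mult1"].rj, fus["Add"].rk, reg_result)            # True True {'F0': 'Mult1', 'F8': 'Add'}
```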

Scoreboard Example

Instruction status (Read = read operands, Exec = execution complete, Write = write result):
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status (Time = execution cycles remaining; Fi = destination; Fj, Fk = sources S1, S2; Qj, Qk = FU producing j, k; Rj, Rk = Fj?, Fk? ready):
  Not busy: Integer, Mult1, Mult2, Add, Divide

Register result status (F0, F2, F4, ..., F30): all blank

Scoreboard Example Cycle 1

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F6, Fk=R2, Rk=Yes
  Not busy: Mult1, Mult2, Add, Divide

Register result status: F6=Integer

Scoreboard Example Cycle 2

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F6, Fk=R2, Rk=Yes
  Not busy: Mult1, Mult2, Add, Divide

Register result status: F6=Integer

• Issue 2nd LD?

Scoreboard Example Cycle 3

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F6, Fk=R2, Rk=Yes
  Not busy: Mult1, Mult2, Add, Divide

Register result status: F6=Integer

• Issue MULTD?

Scoreboard Example Cycle 4

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F6, Fk=R2, Rk=Yes
  Not busy: Mult1, Mult2, Add, Divide

Register result status: F6=Integer

Scoreboard Example Cycle 5

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F2, Fk=R3, Rk=Yes
  Not busy: Mult1, Mult2, Add, Divide

Register result status: F2=Integer

Scoreboard Example Cycle 6

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6
  MULTD F0,F2,F4         6
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F2, Fk=R3, Rk=Yes
  Mult1: Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Qj=Integer, Rj=No, Rk=Yes
  Not busy: Mult2, Add, Divide

Register result status: F0=Mult1, F2=Integer

Scoreboard Example Cycle 7

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7
  MULTD F0,F2,F4         6
  SUBD  F8,F6,F2         7
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F2, Fk=R3, Rk=Yes
  Mult1: Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Qj=Integer, Rj=No, Rk=Yes
  Add: Busy=Yes, Op=Sub, Fi=F8, Fj=F6, Fk=F2, Qk=Integer, Rj=Yes, Rk=No
  Not busy: Mult2, Divide

Register result status: F0=Mult1, F2=Integer, F8=Add

• Read multiply operands?

Scoreboard Example Cycle 8a

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7
  MULTD F0,F2,F4         6
  SUBD  F8,F6,F2         7
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2

Functional unit status:
  Integer: Busy=Yes, Op=Load, Fi=F2, Fk=R3, Rk=Yes
  Mult1: Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Qj=Integer, Rj=No, Rk=Yes
  Add: Busy=Yes, Op=Sub, Fi=F8, Fj=F6, Fk=F2, Qk=Integer, Rj=Yes, Rk=No
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Mult2

Register result status: F0=Mult1, F2=Integer, F8=Add, F10=Divide

Scoreboard Example Cycle 8b

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6
  SUBD  F8,F6,F2         7
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2

Functional unit status:
  Mult1: Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Busy=Yes, Op=Sub, Fi=F8, Fj=F6, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F8=Add, F10=Divide

Scoreboard Example Cycle 9

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2

Functional unit status:
  Mult1: Time=10, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Time=2, Busy=Yes, Op=Sub, Fi=F8, Fj=F6, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F8=Add, F10=Divide

• Read operands for MULTD & SUBD? Issue ADDD?

Scoreboard Example Cycle 11

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2

Functional unit status:
  Mult1: Time=8, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Time=0, Busy=Yes, Op=Sub, Fi=F8, Fj=F6, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F8=Add, F10=Divide

Scoreboard Example Cycle 12

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2

Functional unit status:
  Mult1: Time=7, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2, Add

Register result status: F0=Mult1, F10=Divide

• Read operands for DIVD?

Scoreboard Example Cycle 13

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13

Functional unit status:
  Mult1: Time=6, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

Scoreboard Example Cycle 14

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14

Functional unit status:
  Mult1: Time=5, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Time=2, Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

Scoreboard Example Cycle 15

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14

Functional unit status:
  Mult1: Time=4, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Time=1, Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

Scoreboard Example Cycle 16

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14    16

Functional unit status:
  Mult1: Time=3, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Time=0, Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

Scoreboard Example Cycle 17

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14    16

Functional unit status:
  Mult1: Time=2, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

• Write result of ADDD?

Scoreboard Example Cycle 18

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14    16

Functional unit status:
  Mult1: Time=1, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

Scoreboard Example Cycle 19

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14    16

Functional unit status:
  Mult1: Time=0, Busy=Yes, Op=Mult, Fi=F0, Fj=F2, Fk=F4, Rj=Yes, Rk=Yes
  Add: Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Qj=Mult1, Rj=No, Rk=Yes
  Not busy: Integer, Mult2

Register result status: F0=Mult1, F6=Add, F10=Divide

Scoreboard Example Cycle 20

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19     20
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8
  ADDD  F6,F8,F2        13    14    16

Functional unit status:
  Add: Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Rj=Yes, Rk=Yes
  Not busy: Integer, Mult1, Mult2

Register result status: F6=Add, F10=Divide

Scoreboard Example Cycle 21

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19     20
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8    21
  ADDD  F6,F8,F2        13    14    16

Functional unit status:
  Add: Busy=Yes, Op=Add, Fi=F6, Fj=F8, Fk=F2, Rj=Yes, Rk=Yes
  Divide: Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Rj=Yes, Rk=Yes
  Not busy: Integer, Mult1, Mult2

Register result status: F6=Add, F10=Divide

Scoreboard Example Cycle 22

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19     20
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8    21
  ADDD  F6,F8,F2        13    14    16     22

Functional unit status:
  Divide: Time=40, Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Rj=Yes, Rk=Yes
  Not busy: Integer, Mult1, Mult2, Add

Register result status: F10=Divide

Scoreboard Example Cycle 61

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19     20
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8    21    61
  ADDD  F6,F8,F2        13    14    16     22

Functional unit status:
  Divide: Time=0, Busy=Yes, Op=Div, Fi=F10, Fj=F0, Fk=F6, Rj=Yes, Rk=Yes
  Not busy: Integer, Mult1, Mult2, Add

Register result status: F10=Divide

Scoreboard Example Cycle 62

Instruction status:
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19     20
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8    21    61     62
  ADDD  F6,F8,F2        13    14    16     22

Functional unit status:
  Not busy: Integer, Mult1, Mult2, Add, Divide

Register result status: all blank

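Scoreboard Scheduling

Cycle in which each instruction finished each step (Read = read operands, Exec = execution complete, Write = write result):
  Instruction        Issue  Read  Exec  Write
  LD    F6,34(R2)        1     2     3      4
  LD    F2,45(R3)        5     6     7      8
  MULTD F0,F2,F4         6     9    19     20
  SUBD  F8,F6,F2         7     9    11     12
  DIVD  F10,F0,F6        8    21    61     62
  ADDD  F6,F8,F2        13    14    16     22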
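The scheduling table can be checked with a little arithmetic. In the Python sketch below, execution complete equals read operands plus an execution latency. The latencies are the ones visible in the example tables' Time column:
• 1 cycle for loads, inferred from read 2 and complete 3;
• 2 cycles for add/sub;
• 10 cycles for multiply;
• 40 cycles for divide.

The code-example slide quotes 2 cycles for loads; this sketch follows the tables instead. Any extra cycles between issue and read operands are RAW waits, and extra cycles between execution complete and write result are WAR waits.

```python
# (issue, read operands, execution complete, write result) per instruction
SCHEDULE = {
    "LD F6": (1, 2, 3, 4),
    "LD F2": (5, 6, 7, 8),
    "MULTD": (6, 9, 19, 20),
    "SUBD":  (7, 9, 11, 12),
    "DIVD":  (8, 21, 61, 62),
    "ADDD":  (13, 14, 16, 22),
}
# Execution latencies as they appear in the example's Time column
# (the load latency of 1 is inferred from read 2 -> complete 3).
LATENCY = {"LD F6": 1, "LD F2": 1, "MULTD": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}


def stalls(schedule):
    """Cycles lost waiting to read operands (RAW) and waiting to write (WAR)."""
    out = {}
    for inst, (issue, read, done, write) in schedule.items():
        assert done == read + LATENCY[inst], inst   # execution takes exactly its latency
        out[inst] = (read - issue - 1, write - done - 1)
    return out


for inst, (raw_wait, war_wait) in stalls(SCHEDULE).items():
    print(f"{inst:6} read-operand wait {raw_wait:2}, write wait {war_wait}")
```

The sketch reports two notable waits:
• DIVD waits 12 cycles to read operands, because F0 from MULTD is written only in cycle 20.
• ADDD waits 5 cycles to write, because of the WAR hazard on F6 until DIVD reads its operands in cycle 21.

ADDD's late issue in cycle 13 is a structural stall instead: the single Add unit is busy with SUBD until SUBD writes in cycle 12.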

CDC 6600 Scoreboard

• Speedup of 1.7 from compiler scheduling and 2.5 from hand scheduling, BUT slow memory (no cache) limits the benefit
• First implementation of dynamic scheduling (though limited)
• Limitations of the 6600 scoreboard for dynamic scheduling:
  - Stalls on name dependences (WAR and WAW), which is not really necessary
  - Instruction parallelism is limited by the number of functional units
  - No forwarding hardware

Tomasulo Overview

Basic idea:
• Remove name dependences by renaming registers during execution
• Introduce tag broadcasting in instruction scheduling

Main points of design:
• Instructions are decoded and then renamed
• Renamed instructions are sent to reservation stations
• Reservation stations track and enforce register data (RAW) dependences
• A child instruction can start execution after the last parent finishes writeback and broadcasts its result; register values are thus passed through broadcasting
• An early register write is prevented from overwriting the value of a later register write, which enforces name dependences (WAR and WAW)

Three Stages of Tomasulo Algorithm

After fetch and decode:

1. Issue: get an instruction from the FP Op Queue
   If a reservation station is free (no structural hazard), control issues the instruction and sends operands (renames registers).

2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.

3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting units; mark the reservation station available.

Issue: builds dependences for the new instruction
Writeback: wakes up dependent instructions

Issue Stage and Renaming Table

• Renames the instruction's two source registers (source renaming)
• Assigns it to a free RS
• Updates the renaming table (destination renaming)
• Also decodes the instruction and reads register values in parallel

How would the following instructions be renamed?
  ADD $16, $8, $9
  ADD $17, $16, $16
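One way to work through the question above is a small renaming pass, sketched below in Python. It assumes hypothetical reservation-station tags RS1 and RS2. It borrows the R(reg) notation of the Tomasulo example tables to mean a value read from the register file. As listed above, source renaming happens before destination renaming.

```python
from typing import Dict, List, Tuple


def rename(insts: List[Tuple[str, str, str]], free_rs: List[str]):
    """Rename (dest, src1, src2) instructions in program order.

    A source becomes the tag of its pending producer if the renaming table has one,
    otherwise R(reg), i.e. the value read from the register file.
    """
    table: Dict[str, str] = {}   # architectural register -> RS tag of latest producer
    renamed = []
    for dest, s1, s2 in insts:
        srcs = tuple(table.get(s, f"R({s})") for s in (s1, s2))  # source renaming first
        rs = free_rs.pop(0)      # assign a free RS (running out is not modeled)
        table[dest] = rs         # then destination renaming
        renamed.append((rs, dest) + srcs)
    return renamed, table


code = [("$16", "$8", "$9"), ("$17", "$16", "$16")]
renamed, table = rename(code, ["RS1", "RS2"])
for entry in renamed:
    print(entry)
# ('RS1', '$16', 'R($8)', 'R($9)')
# ('RS2', '$17', 'RS1', 'RS1')
```

Both sources of the second ADD become the tag RS1. It therefore waits for the first ADD's broadcast instead of reading $16 from the register file.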

Execute Stage

• Only “ready” instructions can join the competition for the functional units
• Select logic chooses which instructions go to the FUs for execution
  Some policy may be used, e.g., age based
• Non-ready instructions can be “woken up” during the writeback of their parent instruction
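Below is a minimal Python sketch of age-based select, interpreting "age based" as oldest first. Each cycle, every free functional unit is granted to the oldest ready instruction that needs it. The `Entry` record, the FU class names and the instruction window are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    name: str    # reservation station name
    age: int     # issue order: smaller means older
    fu: str      # FU class the instruction needs
    ready: bool  # all source operands available


def select(entries: List[Entry], free_units: Dict[str, int]) -> List[str]:
    """Age-based select: give each free FU to the oldest ready instruction needing it."""
    free = dict(free_units)            # don't modify the caller's counts
    granted = []
    for e in sorted(entries, key=lambda e: e.age):
        if e.ready and free.get(e.fu, 0) > 0:
            free[e.fu] -= 1
            granted.append(e.name)
    return granted


window = [
    Entry("Add3", 7, "add", True),     # younger ready add: loses to Add1
    Entry("Mult1", 3, "mult", True),
    Entry("Add1", 4, "add", True),
    Entry("Mult2", 5, "div", False),   # not ready: waits for a wakeup
]
print(select(window, {"add": 1, "mult": 1, "div": 1}))  # ['Mult1', 'Add1']
```

Add3 is ready but loses the only adder to the older Add1. Mult2 does not compete until a writeback wakes it up.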

Writeback and Common Data Bus

• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
  64 bits of data + 4 bits of source index (tag)
  Broadcasts to every instruction in flight
• Child instructions do tag matching and update their ready bits and value fields (if the tag matches theirs)
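The tag matching can be sketched in Python as below. The `Station` fields mirror the Busy, Op, Vj, Vk, Qj and Qk columns of the Tomasulo example tables. The operand values (5.0 for M(34+R2), 2.0 for M(45+R3)) are made up, and the snapshot loosely follows cycles 7 to 8, when Add1 (SUBD) writes its result. The register file is updated only if the register result status still names the broadcasting tag, so an older result cannot overwrite a register that a later instruction has renamed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    name: str                    # the tag other instructions wait on, e.g. "Add1"
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None   # operand values (valid when the Q field is None)
    vk: Optional[float] = None
    qj: Optional[str] = None     # tags of producing stations
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


def broadcast(tag: str, value: float, stations: List[Station],
              reg_status: Dict[str, str], regs: Dict[str, float]) -> None:
    """Put (value, tag) on the CDB; every listener compares the tag with its own."""
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None
        if s.name == tag:
            s.busy = False               # producing station becomes available
    for reg, t in list(reg_status.items()):
        if t == tag:                     # only if no later instruction renamed reg
            regs[reg] = value
            del reg_status[reg]


add1 = Station("Add1", True, "SUBD", vj=5.0, vk=2.0)
add2 = Station("Add2", True, "ADDD", vk=2.0, qj="Add1")
mult2 = Station("Mult2", True, "DIVD", vk=5.0, qj="Mult1")
stations = [add1, add2, mult2]
reg_status = {"F0": "Mult1", "F6": "Add2", "F8": "Add1", "F10": "Mult2"}
regs: Dict[str, float] = {}
broadcast("Add1", add1.vj - add1.vk, stations, reg_status, regs)
print(add2.ready(), add2.vj, mult2.ready(), regs)  # True 3.0 False {'F8': 3.0}
```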

Tomasulo Example Cycle 0

Instruction status (Exec = execution complete, Write = write result):
  Instruction        Issue  Exec  Write
  LD    F6,34(R2)
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Load buffers (Busy, Address):
  Not busy: Load1, Load2, Load3

Reservation stations (Time = execution cycles remaining; Vj, Vk = source values S1, S2; Qj, Qk = RS producing j, k):
  Not busy: Add1, Add2, Add3, Mult1, Mult2

Register result status (F0, F2, F4, ..., F30; each entry names the producing unit, or shows the value once written): all blank

Tomasulo Example Cycle 1

Instruction status:
  Instruction        Issue  Exec  Write
  LD    F6,34(R2)        1
  LD    F2,45(R3)
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Load buffers:
  Load1: Busy=Yes, Address=34+R2
  Not busy: Load2, Load3

Reservation stations:
  Not busy: Add1, Add2, Add3, Mult1, Mult2

Register result status: F6=Load1

Tomasulo Example Cycle 2

Instruction status:
  Instruction        Issue  Exec  Write
  LD    F6,34(R2)        1
  LD    F2,45(R3)        2
  MULTD F0,F2,F4
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Load buffers:
  Load1: Busy=Yes, Address=34+R2
  Load2: Busy=Yes, Address=45+R3
  Not busy: Load3

Reservation stations:
  Not busy: Add1, Add2, Add3, Mult1, Mult2

Register result status: F2=Load2, F6=Load1

Tomasulo Example Cycle 3

Instruction status:
  Instruction        Issue  Exec  Write
  LD    F6,34(R2)        1     3
  LD    F2,45(R3)        2
  MULTD F0,F2,F4         3
  SUBD  F8,F6,F2
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Load buffers:
  Load1: Busy=Yes, Address=34+R2
  Load2: Busy=Yes, Address=45+R3
  Not busy: Load3

Reservation stations:
  Mult1: Busy=Yes, Op=MULTD, Vk=R(F4), Qj=Load2
  Not busy: Add1, Add2, Add3, Mult2

Register result status: F0=Mult1, F2=Load2, F6=Load1

• Note: register names are removed (“renamed”) in reservation stations
• Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4

Instruction status:
  Instruction        Issue  Exec  Write
  LD    F6,34(R2)        1     3      4
  LD    F2,45(R3)        2     4
  MULTD F0,F2,F4         3
  SUBD  F8,F6,F2         4
  DIVD  F10,F0,F6
  ADDD  F6,F8,F2

Load buffers:
  Load2: Busy=Yes, Address=45+R3
  Not busy: Load1, Load3

Reservation stations:
  Add1: Busy=Yes, Op=SUBD, Vj=M(34+R2), Qk=Load2
  Mult1: Busy=Yes, Op=MULTD, Vk=R(F4), Qj=Load2
  Not busy: Add2, Add3, Mult2

Register result status: F0=Mult1, F2=Load2, F6=M(34+R2), F8=Add1

• Load2 completing; what is waiting for it?

Tomasulo Example Cycle 5

Instruction status:
  Instruction        Issue  Exec  Write
  LD    F6,34(R2)        1     3      4
  LD    F2,45(R3)        2     4      5
  MULTD F0,F2,F4         3
  SUBD  F8,F6,F2         4
  DIVD  F10,F0,F6        5
  ADDD  F6,F8,F2

Load buffers:
  Not busy: Load1, Load2, Load3

Reservation stations:
  Add1: Time=2, Busy=Yes, Op=SUBD, Vj=M(34+R2), Vk=M(45+R3)
  Mult1: Time=10, Busy=Yes, Op=MULTD, Vj=M(45+R3), Vk=R(F4)
  Mult2: Busy=Yes, Op=DIVD, Vk=M(34+R2), Qj=Mult1
  Not busy: Add2, Add3

Register result status: F0=Mult1, F2=M(45+R3), F6=M(34+R2), F8=Add1, F10=Mult2

Tomasulo Example Cycle 6

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
1     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD             M(45+R3)  Add1
0     Add3   No
9     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 6):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      Add2       Add1     Mult2
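Cycles 5 and 6 show the issue step: each source register is read either as a value (no producer pending) or as the pending producer's tag, and the destination register's status is pointed at the new station. A minimal Python sketch of that step, assuming the register status holds only pending tags and values live in a separate dict; `read_operand` and `issue` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Station:
    name: str
    op: str
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None


def read_operand(reg: str, reg_status: Dict[str, str],
                 regs: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, tag): a tag if a producer is still pending, else the value."""
    tag = reg_status.get(reg)
    return (None, tag) if tag is not None else (regs[reg], None)


def issue(rs_name: str, op: str, dest: str, src_j: str, src_k: str,
          reg_status: Dict[str, str], regs: Dict[str, str]) -> Station:
    vj, qj = read_operand(src_j, reg_status, regs)
    vk, qk = read_operand(src_k, reg_status, regs)
    # Rename: from now on, readers of `dest` wait for this station's tag.
    reg_status[dest] = rs_name
    return Station(rs_name, op, vj, vk, qj, qk)


# Register state in cycle 5, after Load2 has written F2 (see tables above).
reg_status = {"F0": "Mult1", "F8": "Add1"}
regs = {"F2": "M(45+R3)", "F4": "R(F4)", "F6": "M(34+R2)"}

divd = issue("Mult2", "DIVD", "F10", "F0", "F6", reg_status, regs)  # cycle 5
addd = issue("Add2", "ADDD", "F6", "F8", "F2", reg_status, regs)    # cycle 6
print(divd.qj, divd.vk)                      # Mult1 M(34+R2)
print(addd.qj, addd.vk)                      # Add1 M(45+R3)
print(reg_status["F6"], reg_status["F10"])   # Add2 Mult2
```

Because DIVD copied M(34+R2) into Vk before ADDD renamed F6 to Add2, the WAR dependence on F6 no longer constrains ADDD.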

Tomasulo Example Cycle 7

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD             M(45+R3)  Add1
0     Add3   No
8     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 7):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      Add2       Add1     Mult2

• Add1 completing; what is waiting for it?
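The Time column counts down the execution cycles left. Reading the tables, an entry starts executing the cycle after both its issue and its last operand broadcast, and the countdown is shown from the cycle that last operand arrives. A small Python sketch of that arithmetic; the function names are hypothetical, and the latencies (2 for SUBD, 10 for MULTD, 40 for DIVD) are inferred from the tables. The slides print 0 for entries that are idle or still waiting; the sketch returns None there instead.

```python
from typing import Optional, Tuple


def exec_window(issue: int, last_operand_write: int, latency: int) -> Tuple[int, int]:
    """Return (first execute cycle, exec-complete cycle) for one RS entry."""
    # Execution begins the cycle after the entry has issued AND its last
    # operand has been broadcast on the CDB.
    start = max(issue, last_operand_write) + 1
    return start, start + latency - 1


def time_left(start: int, complete: int, clock: int) -> Optional[int]:
    """Time column at `clock`; None while still waiting for operands or once done."""
    if start - 1 <= clock <= complete:
        return complete - clock
    return None


# SUBD in Add1: issued in cycle 4, F2 broadcast by Load2 in cycle 5, 2-cycle add.
start, done = exec_window(issue=4, last_operand_write=5, latency=2)
print(start, done, [time_left(start, done, c) for c in range(4, 9)])
# 6 7 [None, 2, 1, 0, None]

# DIVD in Mult2: issued in cycle 5, F0 broadcast by Mult1 in cycle 16, 40-cycle divide.
print(exec_window(issue=5, last_operand_write=16, latency=40))  # (17, 56)
```

The same arithmetic gives Mult1's 10 in cycle 5 and Mult2's 40 in cycle 16.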

Tomasulo Example Cycle 8

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
2     Add2   Yes   ADDD   M()-M()   M(45+R3)
0     Add3   No
7     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 8):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      Add2       M()-M()  Mult2

Tomasulo Example Cycle 9

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
1     Add2   Yes   ADDD   M()-M()   M(45+R3)
0     Add3   No
6     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 9):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      Add2       M()-M()  Mult2

Tomasulo Example Cycle 10

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   Yes   ADDD   M()-M()   M(45+R3)
0     Add3   No
5     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 10):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      Add2       M()-M()  Mult2

• Add2 completing; what is waiting for it?

Tomasulo Example Cycle 11

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
4     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 11):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      (M-M)+M()  M()-M()  Mult2

• Write result of ADDD here vs. scoreboard?
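The question contrasts the two schemes on the WAR hazard on F6: the earlier DIVD reads F6, and ADDD writes it. A minimal Python sketch, assuming a simplified scoreboard that tracks, for each earlier instruction, the source registers it has not yet read from the register file. This is a simplification of the scoreboard's bookkeeping, and `PendingRead` and `scoreboard_may_write` are hypothetical names. In a scoreboard, DIVD reads both operands together only once F0 is ready, so its read of F6 is still pending when ADDD wants to write.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PendingRead:
    """An earlier instruction that still has to read these registers."""
    name: str
    unread_sources: Tuple[str, ...]


def scoreboard_may_write(dest: str, earlier: List[PendingRead]) -> bool:
    # Scoreboard: stall Write Result while any earlier instruction still
    # has to read `dest` from the register file (WAR hazard).
    return all(dest not in p.unread_sources for p in earlier)


# Cycle 11: DIVD is still waiting for F0, so it has read neither F0 nor F6.
divd = PendingRead("DIVD", ("F0", "F6"))
print(scoreboard_may_write("F6", [divd]))  # False: ADDD's write would stall

# Tomasulo: DIVD copied F6 into its Vk field when it issued in cycle 5,
# so ADDD can overwrite F6 in cycle 11 without affecting DIVD.
regs: Dict[str, str] = {"F6": "M(34+R2)"}
divd_vk = regs["F6"]         # captured at issue (cycle 5)
regs["F6"] = "(M-M)+M()"     # ADDD writes its result (cycle 11)
print(divd_vk, regs["F6"])   # M(34+R2) (M-M)+M()
```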

Tomasulo Example Cycle 12

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
3     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 12):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      (M-M)+M()  M()-M()  Mult2

Tomasulo Example Cycle 13

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
2     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 13):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      (M-M)+M()  M()-M()  Mult2

Tomasulo Example Cycle 14

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
1     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 14):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      (M-M)+M()  M()-M()  Mult2

Tomasulo Example Cycle 15

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3      15
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status (clock 15):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   Mult1  M(45+R3)      (M-M)+M()  M()-M()  Mult2

• Mult1 completing; what is waiting for it?

Tomasulo Example Cycle 16

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3      15             16
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
40    Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status (clock 16):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   M*F4   M(45+R3)      (M-M)+M()  M()-M()  Mult2

• Note: now just waiting for the divide

Tomasulo Example Cycle 55

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3      15             16
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
1     Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status (clock 55):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   M*F4   M(45+R3)      (M-M)+M()  M()-M()  Mult2

Tomasulo Example Cycle 56

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3      15             16
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5      56
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status (clock 56):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   M*F4   M(45+R3)      (M-M)+M()  M()-M()  Mult2

• Mult2 completing; what is waiting for it?

Tomasulo Example Cycle 57

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      4              5
MULTD F0     F2   F4   3      15             16
SUBD  F8     F6   F2   4      7              8
DIVD  F10    F0   F6   5      56             57
ADDD  F6     F8   F2   6      10             11

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations (Vj, Vk = S1, S2; Qj, Qk = RS for j, k):
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Register result status (clock 57):
     F0     F2        F4  F6         F8       F10     F12 ... F30
FU   M*F4   M(45+R3)      (M-M)+M()  M()-M()  M*F4/M

• Again, in-order issue, out-of-order execution and completion
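The Issue, Exec complete and Write result columns of the whole example can be reproduced by a small dataflow calculation. A Python sketch under stated assumptions: one instruction issues per cycle in program order with no structural stalls, there are no CDB conflicts (true for this example), R2 and R3 are ready, latencies are inferred from the tables (load 2, add/subtract 2, multiply 10, divide 40), and a source counts as available in the cycle its producer writes its result. `LATENCY`, `PROGRAM` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Latencies inferred from the tables above (assumption).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: Tuple[str, ...]  # FP sources only; base registers R2/R3 assumed ready


PROGRAM = [
    Instr("LD", "F6", ()),
    Instr("LD", "F2", ()),
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]


def schedule(program: List[Instr]) -> List[Tuple[int, int, int]]:
    """Return (issue, exec complete, write result) for each instruction."""
    write_cycle: Dict[str, int] = {}  # register -> write cycle of its latest producer
    rows = []
    for i, ins in enumerate(program):
        issue = i + 1  # in-order issue, one per cycle, no structural stalls
        ready = max((write_cycle.get(r, 0) for r in ins.srcs), default=0)
        start = max(issue, ready) + 1
        complete = start + LATENCY[ins.op] - 1
        write = complete + 1
        # Renaming: only later instructions see this producer, so ADDD's
        # write of F6 never delays the earlier readers SUBD and DIVD.
        write_cycle[ins.dest] = write
        rows.append((issue, complete, write))
    return rows


rows = schedule(PROGRAM)
for ins, row in zip(PROGRAM, rows):
    print(ins.op, ins.dest, row)

# Completion (write-result) order differs from program order.
order = sorted(range(len(PROGRAM)), key=lambda i: rows[i][2])
print([f"{PROGRAM[i].op} {PROGRAM[i].dest}" for i in order])
# ['LD F6', 'LD F2', 'SUBD F8', 'ADDD F6', 'MULTD F0', 'DIVD F10']
```

The computed triples match the final instruction-status table at cycle 57.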

Review Dependences

How are dependences enforced or removed in the Tomasulo algorithm?

• Data dependence (RAW)
• Antidependence (WAR)
• Output dependence (WAW)

Dependences can be through registers or memory.
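As a concrete instance of the three kinds, a short Python sketch that lists the register dependences in this lecture's six-instruction example. Memory dependences are not modeled, and `register_dependences` and `written_between` are hypothetical helpers. It finds the RAW chains that the tags enforce, a WAR on F6 (SUBD and DIVD read it, ADDD writes it) and a WAW on F6 (LD F6 and ADDD both write it).

```python
from itertools import combinations
from typing import List, Tuple

# (op, destination, sources) for the example program; loads' base registers omitted.
PROGRAM = [
    ("LD", "F6", ()),
    ("LD", "F2", ()),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]


def written_between(program, a: int, b: int, reg: str) -> bool:
    """True if some instruction strictly between a and b writes reg."""
    return any(program[k][1] == reg for k in range(a + 1, b))


def register_dependences(program) -> List[Tuple[str, int, int, str]]:
    """Return (kind, earlier index, later index, register) for each dependence."""
    deps = []
    for a, b in combinations(range(len(program)), 2):
        _, dst_a, src_a = program[a]
        _, dst_b, src_b = program[b]
        if dst_a in src_b and not written_between(program, a, b, dst_a):
            deps.append(("RAW", a, b, dst_a))   # b reads what a wrote
        if dst_b in src_a and not written_between(program, a, b, dst_b):
            deps.append(("WAR", a, b, dst_b))   # b overwrites what a reads
        if dst_a == dst_b and not written_between(program, a, b, dst_a):
            deps.append(("WAW", a, b, dst_a))   # both write the same register
    return deps


for kind, a, b, reg in register_dependences(PROGRAM):
    print(f"{kind}: {PROGRAM[a][0]} {PROGRAM[a][1]} -> {PROGRAM[b][0]} {PROGRAM[b][1]} on {reg}")
```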

Data Dependence Through Register

For any inst i, i receives in E(D,P) the outputs of its parents in E(S,P).

Assume i depends on k through register Rx. How does i receive k's output?

1. If k.WriteResult → i.Issue:
2. If i.Issue → k.WriteResult:
3. If i.Issue ↔ k.WriteResult:

(a → b: event a happens before event b; a ↔ b: both happen in the same cycle.)

In all cases, i must receive the right value.
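One way to cover all three cases when i issues is sketched below in Python; `read_at_issue` is a hypothetical helper. Case 1: no producer is pending, so the value comes from the register file. Case 2: the producer's tag is recorded and the value is caught later on the CDB. Case 3 is handled here, as an assumption, by letting issue also snoop the CDB in the same cycle. The cycle 4 tables are consistent with this: SUBD issued in the cycle that Load1 wrote F6, and Add1 already holds M(34+R2).

```python
from typing import Dict, Optional, Tuple

Operand = Tuple[Optional[str], Optional[str]]  # (value, tag)


def read_at_issue(reg: str, reg_status: Dict[str, str], regs: Dict[str, str],
                  cdb: Optional[Tuple[str, str]]) -> Operand:
    """Fetch one source operand for an instruction issuing this cycle.

    reg_status maps a register to the tag of its pending producer;
    cdb is the (tag, value) pair broadcast this cycle, or None.
    """
    tag = reg_status.get(reg)
    if tag is None:                        # case 1: k wrote before i issued
        return regs[reg], None
    if cdb is not None and cdb[0] == tag:  # case 3: k writes in i's issue cycle
        return cdb[1], None
    return None, tag                       # case 2: wait for k's tag on the CDB


# Cycle 4: SUBD F8, F6, F2 issues while Load1 broadcasts F6's value.
reg_status = {"F0": "Mult1", "F2": "Load2", "F6": "Load1"}
regs = {"F4": "R(F4)"}
cdb = ("Load1", "M(34+R2)")
print(read_at_issue("F6", reg_status, regs, cdb))  # ('M(34+R2)', None)
print(read_at_issue("F2", reg_status, regs, cdb))  # (None, 'Load2')
print(read_at_issue("F4", reg_status, regs, cdb))  # ('R(F4)', None)
```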

Name Dependence Through Register

• For any inst i, i receives in E(D,P) the outputs of its parents in E(S,P).
• In E(D,P), any register or memory word receives the output of inst j, where j is the last instruction that writes to that register or memory word in E(S,P).

Assume i is prior to j.

1. WAR dependence: i reads some Rx and j writes to it. What would happen if j.WriteResult → i.Issue, or j.WriteResult → i.Execute?
2. WAW dependence: both i and j write to some Rx. What would happen to the register content if j.WriteResult → i.WriteResult?
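For question 2, a common Tomasulo rule is that a CDB write updates a register only if the register status still names the writing tag. The Python sketch below (hypothetical `write_result` and `naive_write`) replays a hypothetical ordering in which ADDD (tag Add2) writes F6 before the older LD F6 (tag Load1), which does not happen in the example above. With the tag check, F6 keeps the younger value; an unconditional write would leave the stale load value. Stations waiting on Load1 would still capture its value from the CDB; only the register-file update is skipped.

```python
from typing import Dict


def write_result(tag: str, value: str, reg_status: Dict[str, str],
                 regs: Dict[str, str]) -> None:
    """Update a register only if `tag` is still its latest producer."""
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            regs[reg] = value
            del reg_status[reg]


def naive_write(dest: str, value: str, regs: Dict[str, str]) -> None:
    """No tag check: whichever write happens last wins."""
    regs[dest] = value


# LD F6 (tag Load1) issued first; ADDD F6 (tag Add2) then renamed F6.
reg_status = {"F6": "Add2"}
regs: Dict[str, str] = {}

# Hypothetical order: j = ADDD writes before i = LD.
write_result("Add2", "(M-M)+M()", reg_status, regs)
write_result("Load1", "M(34+R2)", reg_status, regs)  # no register maps to Load1
print(regs["F6"])   # (M-M)+M()

naive: Dict[str, str] = {}
naive_write("F6", "(M-M)+M()", naive)
naive_write("F6", "M(34+R2)", naive)
print(naive["F6"])  # M(34+R2): the older value wins, violating WAW order
```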

What Is a Tag?

Tag is a modern name.

• In Tomasulo, the RS or load/store buffer index is used as the tag.
• Renaming assigns each new instruction a unique tag.
• RSs store tags to preserve dependences.
• The CDB broadcasts the tag with the data, for data passing and wakeup.

Why is the tag chosen this way?

What does a tag really represent?

What can be used as a tag?
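In the example, the tags are the station and buffer names (Load1-Load3, Add1-Add3, Mult1-Mult2). One reason such an index works is that an entry stays busy from issue until its result is broadcast, so two in-flight instructions never share a tag. A minimal Python sketch of that bookkeeping; the `StationPool` class is hypothetical, and so is the third multiply, since the example places only MULTD and DIVD in the multiply stations.

```python
from typing import Dict, Optional


class StationPool:
    """Reservation stations of one kind; each station's name serves as its tag."""

    def __init__(self, prefix: str, count: int) -> None:
        self.busy: Dict[str, bool] = {f"{prefix}{n}": False for n in range(1, count + 1)}

    def allocate(self) -> Optional[str]:
        # Hand out a tag only when its entry is free, so no two in-flight
        # instructions can hold the same tag.
        for name, busy in self.busy.items():
            if not busy:
                self.busy[name] = True
                return name
        return None  # all entries busy: structural hazard, issue stalls

    def release(self, name: str) -> None:
        # The entry (and its tag) is freed once its result has been broadcast.
        self.busy[name] = False


mult = StationPool("Mult", 2)
print(mult.allocate())  # Mult1  (MULTD, cycle 3)
print(mult.allocate())  # Mult2  (DIVD, cycle 5)
print(mult.allocate())  # None   (a hypothetical third multiply would stall)
mult.release("Mult1")   # MULTD writes its result in cycle 16
print(mult.allocate())  # Mult1
```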

Tomasulo Summary

Reservation stations:
• Increase the effective number of registers
• Distribute the scheduling logic

Register renaming: avoids WAR and WAW dependences

Tag + data broadcasting for waking up child instructions

Pros: can be effectively combined with speculative execution

Cons: CDB broadcasting adds a one-cycle delay
