CDA 5155
![Page 1: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/1.jpg)
CDA 5155
Out-of-order execution:
Scoreboarding and Tomasulo
Week 2
![Page 2: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/2.jpg)
A more realistic pipeline
![Page 3: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/3.jpg)
Dependencies
• Read after Write (RAW): true dependency
  – add 1 2 3 ; nand 3 4 5
• Write after Read (WAR): anti-dependency
  – add 1 2 3 ; nand 5 6 2
• Write after Write (WAW): output dependency
  – add 1 2 3 ; nand 5 6 3
• Read after Read (RAR): usually not a problem
  – add 1 2 3 ; nand 1 5 6
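The four cases above can be checked mechanically. The sketch below is illustrative only: it assumes the operand order `op srcA srcB dest`, which is inferred from the slide's examples, and the names `Instr`, `parse`, and `dependencies` are hypothetical.

```python
from typing import NamedTuple, Set


class Instr(NamedTuple):
    op: str
    src_a: int
    src_b: int
    dest: int


def parse(text: str) -> Instr:
    # Assumed format: "op srcA srcB dest", e.g. "add 1 2 3" means r3 = r1 + r2.
    op, a, b, d = text.split()
    return Instr(op, int(a), int(b), int(d))


def dependencies(first: Instr, second: Instr) -> Set[str]:
    """Classify the register dependencies of `second` on the earlier `first`."""
    found = set()
    first_reads = {first.src_a, first.src_b}
    second_reads = {second.src_a, second.src_b}
    if first.dest in second_reads:
        found.add("RAW")   # second reads what first writes
    if second.dest in first_reads:
        found.add("WAR")   # second overwrites what first reads
    if second.dest == first.dest:
        found.add("WAW")   # both write the same register
    if first_reads & second_reads:
        found.add("RAR")   # both read a common register
    return found


add = parse("add 1 2 3")
raw = dependencies(add, parse("nand 3 4 5"))
war = dependencies(add, parse("nand 5 6 2"))
waw = dependencies(add, parse("nand 5 6 3"))
rar = dependencies(add, parse("nand 1 5 6"))
```

Each of the slide's instruction pairs yields exactly one category, which is why they work as minimal examples.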
![Page 4: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/4.jpg)
Out-of-order execution:
• Variable latencies make out-of-order execution desirable
• How do we prevent WAR and WAW hazards?
• How do we deal with variable latency?
  – Forwarding for RAW hazards harder.

Pipeline stages per instruction (cycles 1 to 14):
add 1 2 3: IF ID EX M WB
mul 4 5 6: IF ID E1 E2 E3 E4 E5 E6 E7 M WB
div 6 7 8: IF ID x x x x x x E1 E2 E3 E4 …
add 1 2 7: IF ID EX M WB
sub 1 2 8: IF ID EX M WB
![Page 5: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/5.jpg)
CDC 6600
![Page 6: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/6.jpg)
Scoreboard: a bookkeeping technique
• Out-of-order execution divides ID stage:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
• Scoreboards date to CDC 6600 in 1963
• Instructions execute whenever not dependent on previous instructions and no hazards.
• CDC 6600: In-order issue, out-of-order execution, out-of-order commit (or completion)
  – No forwarding!
  – Imprecise interrupt/exception model
![Page 7: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/7.jpg)
Scoreboard Architecture (CDC 6600)
[Figure: block diagram showing Fetch/Issue, Registers, Functional Units (FP Mult, FP Mult, FP Divide, FP Add, Integer), Memory, and the SCOREBOARD.]
![Page 8: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/8.jpg)
Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards?
• Solutions for WAR:
  – Stall writeback until all prior uses of the destination register have been read
  – Read registers only during Read Operands stage
• Solution for WAW:
  – Detect hazard and stall issue of new instruction until other instruction completes
• Need to have multiple instructions in execution phase => multiple execution units or pipelined execution units
• Scoreboard keeps track of dependencies between instructions that have already issued.
• Scoreboard replaces ID, EX, M and WB with 4 stages
![Page 9: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/9.jpg)
Four Stages of Scoreboard Control
• Issue: decode instructions & check for structural hazards (ID1)
  – Instructions issued in program order (for hazard checking)
  – Don’t issue if structural hazard
  – Don’t issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
• Read operands: wait until no data hazards, then read operands (ID2)
  – All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
  – No forwarding of data in this model!
![Page 10: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/10.jpg)
Four Stages of Scoreboard Control
• Execution: operate on operands (EX)
  – The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution. This includes memory ops.
• Write result: finish execution (WB)
  – Stall until no WAR hazards with previous instructions:
    Example: DIV 1 2 3 ; ADD 3 4 5 ; NAND 6 7 4
    CDC 6600 scoreboard would stall NAND until ADD reads operands
![Page 11: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/11.jpg)
Three Parts of the Scoreboard
• Instruction status: which state the instruction is in
• Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit
  – Busy: Indicates whether the pipeline unit is busy or not
  – Op: Operation to perform in the unit (e.g., + or –)
  – Fi: Destination register (for writeback and WAW check)
  – Fj, Fk: Source-register numbers (for reg read and WAR check)
  – Qj, Qk: Functional units producing source registers Fj, Fk (wake up)
  – Rj, Rk: Flags indicating when Fj, Fk are ready (and when already read)
• Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register
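One way to picture the three parts is as three small tables. The sketch below is only an illustration of the fields listed above; the class and attribute names are hypothetical, and `None` stands in for a blank table cell.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class FunctionalUnitStatus:
    """The 9 fields kept for each functional unit."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units that will produce Fj / Fk
    qk: Optional[str] = None
    rj: bool = False           # Fj / Fk ready and not yet read
    rk: bool = False


@dataclass
class InstructionStatus:
    """Cycle in which each stage finished (None = not yet)."""
    issue: Optional[int] = None
    read_operands: Optional[int] = None
    exec_complete: Optional[int] = None
    write_result: Optional[int] = None


@dataclass
class Scoreboard:
    instructions: Dict[str, InstructionStatus] = field(default_factory=dict)
    units: Dict[str, FunctionalUnitStatus] = field(default_factory=dict)
    # register name -> functional unit that will write it; absent = blank
    register_result: Dict[str, str] = field(default_factory=dict)


unit_names = ["Integer", "Mult1", "Mult2", "Add", "Divide"]
sb = Scoreboard(units={name: FunctionalUnitStatus() for name in unit_names})
field_names = [f.name for f in fields(FunctionalUnitStatus)]
```

The unit names match the functional units used in the scoreboard example that follows.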
![Page 12: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/12.jpg)
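Scoreboard Example

Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | | | | |
| LD | F2 | 45+ | R3 | | | | |
| MULTD | F0 | F2 | F4 | | | | |
| SUBD | F8 | F6 | F2 | | | | |
| DIVD | F10 | F0 | F6 | | | | |
| ADDD | F6 | F8 | F2 | | | | |

Functional unit status (Fi = dest; Fj, Fk = S1, S2; Qj, Qk = FU producing S1, S2; Rj, Rk = Fj?, Fk? ready flags):

| Time | Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| | Mult1 | No | | | | | | | | |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| | FU | | | | | | | | | |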
![Page 13: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/13.jpg)
Detailed Scoreboard Pipeline Control

• Issue
  – Wait until: Not Busy(FU) and not Result(D) (pipeline available and no WAW possible)
  – Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← 'D'; Fj(FU) ← 'S1'; Fk(FU) ← 'S2'; Qj ← Result('S1'); Qk ← Result('S2'); Rj ← not Qj; Rk ← not Qk; Result('D') ← FU
• Read operands
  – Wait until: Rj and Rk (both src operands are available)
  – Bookkeeping: Rj ← No; Rk ← No
• Execution complete
  – Wait until: Functional unit done
• Write result
  – Wait until: ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No)) (no WAR hazard: no earlier instr has yet to read the dest reg to be written)
  – Bookkeeping: ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No
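The wait conditions and bookkeeping above translate almost line for line into code. This is a minimal sketch, not the CDC 6600 logic itself: the `FU` class and function names are hypothetical, a missing source operand (such as a load's immediate) is treated as trivially ready, and clearing Qj/Qk on write-back is an assumption taken from how the example tables blank those columns.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    """One row of the functional unit status table."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing fj / fk
    qk: Optional[str] = None
    rj: bool = False           # operand ready and not yet read
    rk: bool = False


def can_issue(units: Dict[str, FU], result: Dict[str, str], fu: str, dest: str) -> bool:
    # Not Busy(FU) and not Result(D): no structural hazard, no WAW hazard.
    return not units[fu].busy and dest not in result


def issue(units, result, fu, op, dest, s1, s2) -> None:
    u = units[fu]
    u.busy, u.op, u.fi, u.fj, u.fk = True, op, dest, s1, s2
    u.qj, u.qk = result.get(s1), result.get(s2)
    u.rj, u.rk = u.qj is None, u.qk is None   # Rj <- not Qj; Rk <- not Qk
    result[dest] = fu


def can_read_operands(units, fu) -> bool:
    return units[fu].rj and units[fu].rk


def read_operands(units, fu) -> None:
    units[fu].rj = units[fu].rk = False


def can_write_result(units, fu) -> bool:
    # No unit may still need to read the old value of our destination (WAR).
    dest = units[fu].fi
    return all((f.fj != dest or not f.rj) and (f.fk != dest or not f.rk)
               for f in units.values())


def write_result(units, result, fu) -> None:
    for f in units.values():           # wake up units waiting on this result
        if f.qj == fu:
            f.qj, f.rj = None, True
        if f.qk == fu:
            f.qk, f.rk = None, True
    del result[units[fu].fi]           # Result(Fi(FU)) <- 0
    units[fu] = FU()                   # Busy(FU) <- No


units = {name: FU() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result: Dict[str, str] = {}

issue(units, result, "Integer", "Load", "F2", None, "R3")   # LD F2 45+ R3
issue(units, result, "Mult1", "Mult", "F0", "F2", "F4")     # MULTD F0 F2 F4
mult_stalled = not can_read_operands(units, "Mult1")        # RAW on F2
waw_blocked = not can_issue(units, result, "Add", "F0")     # F0 has a pending writer
read_operands(units, "Integer")
if can_write_result(units, "Integer"):
    write_result(units, result, "Integer")
mult_ready = can_read_operands(units, "Mult1")
```

The short run at the end mirrors the load-then-multiply part of the example: MULTD issues but cannot read F2 until the load writes back.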
![Page 14: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/14.jpg)
Scoreboard Example: Cycle 1Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 YesMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Integer
![Page 15: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/15.jpg)
Scoreboard Example: Cycle 2Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 YesMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Integer
• Issue 2nd LD? No, Integer pipeline is busy.
![Page 16: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/16.jpg)
Scoreboard Example: Cycle 3Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Integer
• Issue MULT? No, still trying to issue LD
![Page 17: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/17.jpg)
Scoreboard Example: Cycle 4Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU 0
![Page 18: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/18.jpg)
Scoreboard Example: Cycle 5Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 YesMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Integer
![Page 19: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/19.jpg)
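Scoreboard Example: Cycle 6

Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | | |
| MULTD | F0 | F2 | F4 | 6 | | | |
| SUBD | F8 | F6 | F2 | | | | |
| DIVD | F10 | F0 | F6 | | | | |
| ADDD | F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | Yes | Load | F2 | | R3 | | | | Yes |
| | Mult1 | Yes | Mult | F0 | F2 | F4 | Integer | | No | Yes |
| | Mult2 | No | | | | | | | | |
| | Add | No | | | | | | | | |
| | Divide | No | | | | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 6 | FU | Mult1 | Integer | | | | | | | |

• Issue MULT? Yes, but can't read F2 until LD completes.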
![Page 20: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/20.jpg)
Scoreboard Example: Cycle 7Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6SUBD F8 F6 F2 7DIVD F10 F0 F6ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 NoMult1 Yes Mult F0 F2 F4 Integer No YesMult2 NoAdd Yes Sub F8 F6 F2 Integer Yes NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 Integer Add
• Read multiply operands? No, F2 is not ready
![Page 21: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/21.jpg)
Scoreboard Example: Cycle 8a(First half of clock cycle)Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7MULTD F0 F2 F4 6SUBD F8 F6 F2 7DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 NoMult1 Yes Mult F0 F2 F4 Integer No YesMult2 NoAdd Yes Sub F8 F6 F2 Integer Yes NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Integer Add Divide
• LD completes. Update MULTD and SUBD, which are waiting on F2.
![Page 22: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/22.jpg)
Scoreboard Example: Cycle 8b(Second half of clock cycle)Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6SUBD F8 F6 F2 7DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 Yes Mult F0 F2 F4 Yes YesMult2 NoAdd Yes Sub F8 F6 F2 Yes YesDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Add Divide
![Page 23: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/23.jpg)
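Scoreboard Example: Cycle 9

Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | | |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | | | | |

Functional unit status:

| Time | Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 10 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | Yes | Yes |
| | Mult2 | No | | | | | | | | |
| 2 | Add | Yes | Sub | F8 | F6 | F2 | | | Yes | Yes |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | FU | Mult1 | | | | Add | Divide | | | |

• Read operands for MULT & SUB? Yes
• Issue ADDD? No, add pipeline busy
• Note: the Time column shows the execution cycles remaining.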
![Page 24: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/24.jpg)
Scoreboard Example: Cycle 10Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No9 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No1 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 Add Divide
![Page 25: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/25.jpg)
Scoreboard Example: Cycle 11Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No8 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No0 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 Add Divide
![Page 26: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/26.jpg)
Scoreboard Example: Cycle 12Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No7 Mult1 Yes Mult F0 F2 F4 No No
Mult2 NoAdd NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 Divide
• Read operands for DIVD? No, F0 is not available
![Page 27: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/27.jpg)
Scoreboard Example: Cycle 13Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No6 Mult1 Yes Mult F0 F2 F4 No No
Mult2 NoAdd Yes Add F6 F8 F2 Yes YesDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 Add Divide
![Page 28: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/28.jpg)
Scoreboard Example: Cycle 14Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No5 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No2 Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 Add Divide
![Page 29: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/29.jpg)
Scoreboard Example: Cycle 15Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No4 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No1 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 Add Divide
![Page 30: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/30.jpg)
Scoreboard Example: Cycle 16Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No3 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No0 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU Mult1 Add Divide
![Page 31: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/31.jpg)
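Scoreboard Example: Cycle 17

Instruction status:

| Instruction | Dest | j | k | Issue | Read Oper | Exec Comp | Write Result |
|---|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD | F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD | F0 | F2 | F4 | 6 | 9 | | |
| SUBD | F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD | F10 | F0 | F6 | 8 | | | |
| ADDD | F6 | F8 | F2 | 13 | 14 | 16 | |

Functional unit status:

| Time | Name | Busy | Op | Fi | Fj | Fk | Qj | Qk | Rj | Rk |
|---|---|---|---|---|---|---|---|---|---|---|
| | Integer | No | | | | | | | | |
| 2 | Mult1 | Yes | Mult | F0 | F2 | F4 | | | No | No |
| | Mult2 | No | | | | | | | | |
| | Add | Yes | Add | F6 | F8 | F2 | | | No | No |
| | Divide | Yes | Div | F10 | F0 | F6 | Mult1 | | No | Yes |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 17 | FU | Mult1 | | | Add | | Divide | | | |

• Why not write result of ADD? WAR hazard: DIVD has not yet read F6 (Rk is still Yes for Divide).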
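The stall in cycle 17 follows directly from the write-result condition. The sketch below plugs in the Fi/Fj/Fk/Rj/Rk values from the cycle 17 functional unit table; the dictionary layout and the helper name `war_blockers` are hypothetical.

```python
# Busy functional units in cycle 17, copied from the status table.
busy_units = {
    "Mult1":  {"Fi": "F0",  "Fj": "F2", "Fk": "F4", "Rj": False, "Rk": False},
    "Add":    {"Fi": "F6",  "Fj": "F8", "Fk": "F2", "Rj": False, "Rk": False},
    "Divide": {"Fi": "F10", "Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
}


def war_blockers(units, writer):
    """Units that still have to read the register `writer` wants to overwrite."""
    dest = units[writer]["Fi"]
    return [name for name, f in units.items()
            if (f["Fj"] == dest and f["Rj"]) or (f["Fk"] == dest and f["Rk"])]


# Cycle 17: ADDD wants to write F6, but DIVD has F6 ready and unread.
blockers_cycle_17 = war_blockers(busy_units, "Add")

# After DIVD reads its operands (cycle 21), its ready flags are cleared
# and MULTD has already written back, so nothing blocks the write.
del busy_units["Mult1"]
busy_units["Divide"]["Rj"] = busy_units["Divide"]["Rk"] = False
blockers_after_read = war_blockers(busy_units, "Add")
```

An empty blocker list corresponds to the wait condition being satisfied, which is when ADDD is allowed to write F6.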
![Page 32: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/32.jpg)
Scoreboard Example: Cycle 18Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No1 Mult1 Yes Mult F0 F2 F4 No No
Mult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
18 FU Mult1 Add Divide
![Page 33: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/33.jpg)
Scoreboard Example: Cycle 19Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No0 Mult1 Yes Mult F0 F2 F4 No No
Mult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
19 FU Mult1 Add Divide
![Page 34: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/34.jpg)
Scoreboard Example: Cycle 20Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Yes Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
20 FU Add Divide
![Page 35: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/35.jpg)
Scoreboard Example: Cycle 21Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd Yes Add F6 F8 F2 No NoDivide Yes Div F10 F0 F6 Yes Yes
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
21 FU Add Divide
• WAR hazard is now gone (DIVD has read its operands, so its Rj and Rk become No). Complete the add.
![Page 36: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/36.jpg)
Scoreboard Example: Cycle 22Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd No
39 Divide Yes Div F10 F0 F6 No No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
22 FU Divide
![Page 37: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/37.jpg)
…
![Page 38: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/38.jpg)
Scoreboard Example: Cycle 61Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21 61ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd No
0 Divide Yes Div F10 F0 F6 No No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
61 FU Divide
![Page 39: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/39.jpg)
Scoreboard Example: Cycle 62Instruction status: Read Exec Write
Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 5 6 7 8MULTD F0 F2 F4 6 9 19 20SUBD F8 F6 F2 7 9 11 12DIVD F10 F0 F6 8 21 61 62ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer NoMult1 NoMult2 NoAdd NoDivide No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
62 FU
• In-order issue; out-of-order execute & commit
![Page 41: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/41.jpg)
CDC 6600 Scoreboard
• Historical context:
  – Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache) limits benefit
• Limitations of 6600 scoreboard:
  – No forwarding hardware
  – Limited to instructions in basic block (small window)
  – Small number of functional units (structural hazards), especially integer/load store units
  – Do not issue on structural hazards
  – Wait for WAR hazards
  – Prevent WAW hazards
• Precise interrupts?
![Page 42: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/42.jpg)
IBM 360/91
![Page 43: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/43.jpg)
Another Dynamic Algorithm: Tomasulo’s Algorithm
• For IBM 360/91 about 3 years after CDC 6600 (1966)
• Goal: High Performance without special compilers
• Differences between IBM 360 & CDC 6600 ISA
  – IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
  – IBM has 4 FP registers vs. 8 in CDC 6600
  – IBM has memory-register ops
• Small number of floating point registers prevented interesting compiler scheduling of operations
  – This led Tomasulo to try to figure out how to get more effective registers: renaming in hardware!
• Why Study? The descendants of this have flourished!
  – Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …
![Page 44: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/44.jpg)
Tomasulo Algorithm vs. Scoreboard
• Control & buffers distributed with Function Units (FU) vs. centralized in scoreboard
  – FU buffers called “reservation stations”; have pending operands
• Registers in instructions replaced by values or pointers to reservation stations (RS); called register renaming
  – avoids WAR, WAW hazards
  – More reservation stations than registers, so can do optimizations compilers can’t
• Results to FU from RS, not through registers, over Common Data Bus that broadcasts results to all FUs
• Load and Stores treated as FUs with RSs as well
• Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue
![Page 45: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/45.jpg)
Tomasulo Organization
[Figure: block diagram showing the FP Op Queue, FP Registers, Load Buffers (Load1 to Load6, from memory), Store Buffers (to memory), Reservation Stations (Add1, Add2, Add3 for the FP adders; Mult1, Mult2 for the FP multipliers), and the Common Data Bus (CDB).]
![Page 46: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/46.jpg)
Reservation Station Components
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Value of source operands
  – Store buffers have V field, result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)
  – Note: No ready flags as in Scoreboard; Qj,Qk=0 => ready
  – Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy

Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.
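A reservation station entry can be sketched as a small record. The field names follow the slide; the class name, the `ready` property, and the use of `None` in place of 0 for "no pending producer" are assumptions made for illustration. Operand values are kept as the symbolic strings used in the example tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # operand values, once known
    vk: Optional[str] = None
    qj: Optional[str] = None   # station that will produce the operand
    qk: Optional[str] = None

    @property
    def ready(self) -> bool:
        # The slide writes "Qj,Qk=0 => ready"; None plays the role of 0 here.
        return self.busy and self.qj is None and self.qk is None


# Mult1 as it appears in cycle 3 of the Tomasulo example:
# Vk already holds R(F4), while Qj points at Load2.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk="R(F4)", qj="Load2")
ready_before = mult1.ready

# When Load2 delivers its result, the pointer is replaced by the value.
mult1.vj, mult1.qj = "M(A2)", None
ready_after = mult1.ready
```

Because readiness is derived from the Q fields alone, no separate Rj/Rk flags are needed, which is the contrast with the scoreboard that the slide points out.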
![Page 47: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/47.jpg)
Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – Does the broadcast
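The issue and write-result stages can be sketched as two functions: one that renames operands at issue, and one that plays the role of the Common Data Bus broadcast. This is a simplified illustration, not the 360/91 hardware: execution timing is not modeled, values are the symbolic strings used in the example tables, and all names (`RS`, `issue`, `broadcast`, `reg_status`, `reg_file`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RS:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None


def issue(stations, reg_status, reg_file, name, op, dest, s1, s2) -> bool:
    rs = stations[name]
    if rs.busy:
        return False                      # structural hazard: station taken
    rs.busy, rs.op = True, op
    # Renaming: a pending producer becomes a pointer, otherwise copy the value.
    rs.qj, rs.qk = reg_status.get(s1), reg_status.get(s2)
    rs.vj = reg_file[s1] if rs.qj is None else None
    rs.vk = reg_file[s2] if rs.qk is None else None
    reg_status[dest] = name               # dest is now produced by this station
    return True


def broadcast(stations, reg_status, reg_file, source, value) -> None:
    # "Come from" bus: every waiting station matches on the source tag.
    for rs in stations.values():
        if rs.qj == source:
            rs.vj, rs.qj = value, None
        if rs.qk == source:
            rs.vk, rs.qk = value, None
    # Only registers still mapped to this source take the value.
    for reg in [r for r, producer in reg_status.items() if producer == source]:
        reg_file[reg] = value
        del reg_status[reg]
    if source in stations:
        stations[source] = RS()           # mark reservation station available


stations: Dict[str, RS] = {n: RS() for n in ("Add1", "Add2", "Add3", "Mult1", "Mult2")}
reg_file = {f"F{i}": f"R(F{i})" for i in range(0, 32, 2)}
reg_status = {"F6": "Load1", "F2": "Load2"}   # the two loads are already issued

issue(stations, reg_status, reg_file, "Mult1", "MULTD", "F0", "F2", "F4")
broadcast(stations, reg_status, reg_file, "Load1", "M(A1)")
issue(stations, reg_status, reg_file, "Add1", "SUBD", "F8", "F6", "F2")
broadcast(stations, reg_status, reg_file, "Load2", "M(A2)")
issue(stations, reg_status, reg_file, "Mult2", "DIVD", "F10", "F0", "F6")
issue(stations, reg_status, reg_file, "Add2", "ADDD", "F6", "F8", "F2")

# ADDD later overwrites F6, but DIVD keeps the copy it captured at issue.
broadcast(stations, reg_status, reg_file, "Add1", "(M-M)")
broadcast(stations, reg_status, reg_file, "Add2", "(M-M)+M(A2)")
divd_operand = stations["Mult2"].vk
```

DIVD copied the old value of F6 into Vk when it issued, so the later write of F6 by ADDD cannot disturb it; this is how renaming removes the WAR stall seen in the scoreboard example.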
![Page 48: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/48.jpg)
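Tomasulo Example

Instruction status:

| Instruction | Dest | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | | | |
| LD | F2 | 45+ | R3 | | | |
| MULTD | F0 | F2 | F4 | | | |
| SUBD | F8 | F6 | F2 | | | |
| DIVD | F10 | F0 | F6 | | | |
| ADDD | F6 | F8 | F2 | | | |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

Reservation Stations (Vj, Vk = S1, S2 values; Qj, Qk = RS producing S1, S2):

| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FU | | | | | | | | | |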
![Page 49: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/49.jpg)
Tomasulo Example Cycle 1Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Load1
![Page 50: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/50.jpg)
Tomasulo Example Cycle 2Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 NoMult2 No
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Load2 Load1
Note: Unlike 6600, can have multiple loads outstanding (this was not an inherent limitation of scoreboarding)
![Page 51: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/51.jpg)
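Tomasulo Example Cycle 3

Instruction status:

| Instruction | Dest | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | |
| LD | F2 | 45+ | R3 | 2 | | |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | | | |
| DIVD | F10 | F0 | F6 | | | |
| ADDD | F6 | F8 | F2 | | | |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | Yes | 34+R2 |
| Load2 | Yes | 45+R3 |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F4) | Load2 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | FU | Mult1 | Load2 | | Load1 | | | | | |

• Note: register names are removed (“renamed”) in Reservation Stations; MULT issued in cycle 3 vs. cycle 6 with the scoreboard
• Load1 completing; what is waiting for Load1?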
![Page 52: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/52.jpg)
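Tomasulo Example Cycle 4

Instruction status:

| Instruction | Dest | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| LD | F6 | 34+ | R2 | 1 | 3 | 4 |
| LD | F2 | 45+ | R3 | 2 | 4 | |
| MULTD | F0 | F2 | F4 | 3 | | |
| SUBD | F8 | F6 | F2 | 4 | | |
| DIVD | F10 | F0 | F6 | | | |
| ADDD | F6 | F8 | F2 | | | |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | Yes | 45+R3 |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |
|---|---|---|---|---|---|---|---|
| | Add1 | Yes | SUBD | M(A1) | | | Load2 |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | Yes | MULTD | | R(F4) | Load2 | |
| | Mult2 | No | | | | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | FU | Mult1 | Load2 | | M(A1) | Add1 | | | | |

• Load2 completing; what is waiting for Load2?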
![Page 53: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/53.jpg)
Tomasulo Example Cycle 5Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
2 Add1 Yes SUBD M(A1) M(A2)Add2 NoAdd3 No
10 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Mult1 M(A2) M(A1) Add1 Mult2
![Page 54: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/54.jpg)
Tomasulo Example Cycle 6Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
1 Add1 Yes SUBD M(A1) M(A2)Add2 Yes ADDD M(A2) Add1Add3 No
9 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 M(A2) Add2 Add1 Mult2
• Issue ADDD here vs. scoreboard?
![Page 55: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/55.jpg)
Tomasulo Example Cycle 7Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
0 Add1 Yes SUBD M(A1) M(A2)Add2 Yes ADDD M(A2) Add1Add3 No
8 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 M(A2) Add2 Add1 Mult2
• Add1 completing; what is waiting for it?
![Page 56: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/56.jpg)
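Tomasulo Example Cycle 8

Instruction status:

| Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD F0 | F2 | F4 | 3 | | |
| SUBD F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD F10 | F0 | F6 | 5 | | |
| ADDD F6 | F8 | F2 | 6 | | |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| 2 | Add2 | Yes | ADDD | (M-M) | M(A2) | | |
| | Add3 | No | | | | | |
| 7 | Mult1 | Yes | MULTD | M(A2) | R(F4) | | |
| | Mult2 | Yes | DIVD | | M(A1) | Mult1 | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | FU | Mult1 | M(A2) | | Add2 | (M-M) | Mult2 | | | |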
![Page 57: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/57.jpg)
Tomasulo Example Cycle 9Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 No1 Add2 Yes ADDD (M-M) M(A2)
Add3 No6 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 M(A2) Add2 (M-M) Mult2
![Page 58: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/58.jpg)
Tomasulo Example Cycle 10Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 No0 Add2 Yes ADDD (M-M) M(A2)
Add3 No5 Mult1 Yes MULTD M(A2) R(F4)
Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 M(A2) Add2 (M-M) Mult2
• Add2 completing; what is waiting for it?
![Page 59: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/59.jpg)
Tomasulo Example Cycle 11Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
4 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
• Write result of ADDD here vs. scoreboard?
• All quick instructions complete in this cycle!
![Page 60: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/60.jpg)
Tomasulo Example Cycle 12Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
3 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 61: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/61.jpg)
Tomasulo Example Cycle 13Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
2 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 62: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/62.jpg)
Tomasulo Example Cycle 14Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
1 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 63: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/63.jpg)
Tomasulo Example Cycle 15Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 No
0 Mult1 Yes MULTD M(A2) R(F4)Mult2 Yes DIVD M(A1) Mult1
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 M(A2) (M-M+M)(M-M) Mult2
![Page 64: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/64.jpg)
Tomasulo Example Cycle 16Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 No
40 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU M*F4 M(A2) (M-M+M)(M-M) Mult2
![Page 65: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/65.jpg)
Faster than light computation (skip a couple of cycles)
![Page 66: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/66.jpg)
Tomasulo Example Cycle 55Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 No
1 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
55 FU M*F4 M(A2) (M-M+M)(M-M) Mult2
![Page 67: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/67.jpg)
Tomasulo Example Cycle 56Instruction status: Exec Write
Instruction j k Issue Comp Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTD F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56ADDD F6 F8 F2 6 10 11
Reservation Stations: S1 S2 RS RSTime Name Busy Op Vj Vk Qj Qk
Add1 NoAdd2 NoAdd3 NoMult1 No
0 Mult2 Yes DIVD M*F4 M(A1)
Register result status:Clock F0 F2 F4 F6 F8 F10 F12 ... F30
56 FU M*F4 M(A2) (M-M+M)(M-M) Mult2
• Mult2 is completing; what is waiting for it?
![Page 68: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/68.jpg)
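Tomasulo Example Cycle 57

Instruction status:

| Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD F0 | F2 | F4 | 3 | 15 | 16 |
| SUBD F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD F10 | F0 | F6 | 5 | 56 | 57 |
| ADDD F6 | F8 | F2 | 6 | 10 | 11 |

Load buffers:

| Name | Busy | Address |
|---|---|---|
| Load1 | No | |
| Load2 | No | |
| Load3 | No | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk (RS) |
|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | |
| | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| | Mult1 | No | | | | | |
| | Mult2 | Yes | DIVD | M*F4 | M(A1) | | |

Register result status:

| Clock | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|
| 57 | FU | M*F4 | M(A2) | | (M-M+M) | (M-M) | Result | | | |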
• Once again: In-order issue, out-of-order execution and completion.
![Page 69: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/69.jpg)
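Compare to Scoreboard Cycle 62

Instruction status (scoreboard columns first, then Tomasulo):

| Instruction | j | k | Scoreboard Issue | Scoreboard Read Oper | Scoreboard Exec Comp | Scoreboard Write Result | Tomasulo Issue | Tomasulo Exec Compl | Tomasulo Write Result |
|---|---|---|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 | 2 | 4 | 5 |
| MULTD F0 | F2 | F4 | 6 | 9 | 19 | 20 | 3 | 15 | 16 |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 | 4 | 7 | 8 |
| DIVD F10 | F0 | F6 | 8 | 21 | 61 | 62 | 5 | 56 | 57 |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | 22 | 6 | 10 | 11 |

• Why take longer on scoreboard/6600?
– Structural hazards
– Lack of forwarding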
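To see where the cycles go, the write-result columns of the comparison can be subtracted instruction by instruction. The snippet below only restates numbers from the table above; the variable names are illustrative.

```python
# (scoreboard write-result cycle, Tomasulo write-result cycle) per instruction
write_result = {
    "LD F6 34+ R2":   (4, 4),
    "LD F2 45+ R3":   (8, 5),
    "MULTD F0 F2 F4": (20, 16),
    "SUBD F8 F6 F2":  (12, 8),
    "DIVD F10 F0 F6": (62, 57),
    "ADDD F6 F8 F2":  (22, 11),
}

# Cycles saved by Tomasulo for each instruction
saved = {instr: sb - tom for instr, (sb, tom) in write_result.items()}
for instr, cycles in saved.items():
    print(f"{instr:16s} finishes {cycles:2d} cycles earlier under Tomasulo")

scoreboard_done = max(sb for sb, _ in write_result.values())
tomasulo_done = max(tom for _, tom in write_result.values())
print(scoreboard_done, tomasulo_done)
```

The largest gap is ADDD (11 cycles): on the scoreboard it cannot write F6 until DIVD has read F6, while Tomasulo has already buffered that operand in Mult2.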
![Page 70: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/70.jpg)
Tomasulo v. Scoreboard (IBM 360/91 v. CDC 6600)

| Tomasulo (IBM 360/91) | Scoreboard (CDC 6600) |
|---|---|
| Pipelined functional units | Multiple functional units |
| (6 load, 3 store, 3 +, 2 ×/÷) | (1 load/store, 1 +, 2 ×, 1 ÷) |
| Window size: ≤ 14 instructions | ≤ 5 instructions |
| No issue on structural hazard | Same |
| WAR: renaming avoids | Stall completion |
| WAW: renaming avoids | Stall issue |
| Broadcast results from FU | Write/read registers |
| Control: reservation stations | Central scoreboard |
![Page 71: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/71.jpg)
Tomasulo Drawbacks
• Complexity
– delays of 360/91, MIPS 10000, IBM 620?
• Many associative stores (CDB) at high speed
• Performance limited by Common Data Bus
– Each CDB must go to multiple functional units ⇒ high capacitance, high wiring density
– Number of functional units that can complete per cycle limited to one!
– Multiple CDBs ⇒ more FU logic for parallel assoc stores
• Non-precise interrupts!
– We will address this later
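The single-CDB limit can be illustrated with a small arbitration sketch. Everything here is hypothetical: the `schedule_broadcasts` function, the first-come-first-served policy, and the scenario of three results becoming ready in the same cycle are assumptions made for illustration, not details of the 360/91.

```python
from collections import deque


def schedule_broadcasts(ready, num_cdbs=1):
    """Map each result tag to the cycle in which it gets a CDB.

    ready: non-empty dict of cycle -> list of tags ready to broadcast then.
    Results are served first come, first served (an assumed policy).
    """
    pending = deque()
    granted = {}
    cycle, last = min(ready), max(ready)
    while cycle <= last or pending:
        pending.extend(ready.get(cycle, []))
        for _ in range(min(num_cdbs, len(pending))):
            granted[pending.popleft()] = cycle
        cycle += 1
    return granted


# Hypothetical case: three functional units finish together.
ready = {10: ["Add2", "Mult1", "Load1"]}
one_bus = schedule_broadcasts(ready)
two_buses = schedule_broadcasts(ready, num_cdbs=2)
print(one_bus)
print(two_buses)
```

With one bus the third result waits two extra cycles; a second bus removes most of the wait, but every reservation station then needs a second set of tag comparators.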
![Page 72: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/72.jpg)
Tomasulo Loop Example

Loop: LD F0 0 R1
MULTD F4 F0 F2
SD F4 0 R1
SUBI R1 R1 #8
BNEZ R1 Loop

• Assume Multiply takes 4 clocks
• Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)
• To be clear, will show clocks for SUBI, BNEZ
• Reality: integer instructions ahead
![Page 73: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/73.jpg)
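Loop Example

Instruction status:

| ITER | Instruction | j | k | Issue | Exec Comp | Write Result |
|---|---|---|---|---|---|---|
| 1 | LD F0 | 0 | R1 | | | |
| 1 | MULTD F4 | F0 | F2 | | | |
| 1 | SD F4 | 0 | R1 | | | |
| 2 | LD F0 | 0 | R1 | | | |
| 2 | MULTD F4 | F0 | F2 | | | |
| 2 | SD F4 | 0 | R1 | | | |

Load and store buffers:

| Name | Busy | Addr | Fu |
|---|---|---|---|
| Load1 | No | | |
| Load2 | No | | |
| Load3 | No | | |
| Store1 | No | | |
| Store2 | No | | |
| Store3 | No | | |

Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk | Code |
|---|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | | LD F0 0 R1 |
| | Add2 | No | | | | | | MULTD F4 F0 F2 |
| | Add3 | No | | | | | | SD F4 0 R1 |
| | Mult1 | No | | | | | | SUBI R1 R1 #8 |
| | Mult2 | No | | | | | | BNEZ R1 Loop |

Register result status:

| Clock | R1 | | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80 | Fu | | | | | | | | | |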
![Page 74: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/74.jpg)
Loop Example Cycle 1Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F301 80 Fu Load1
![Page 75: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/75.jpg)
Loop Example Cycle 2Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
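Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk | Code |
|---|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | | LD F0 0 R1 |
| | Add2 | No | | | | | | MULTD F4 F0 F2 |
| | Add3 | No | | | | | | SD F4 0 R1 |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | | SUBI R1 R1 #8 |
| | Mult2 | No | | | | | | BNEZ R1 Loop |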
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F302 80 Fu Load1 Mult1
![Page 76: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/76.jpg)
Loop Example Cycle 3Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
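Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk | Code |
|---|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | | LD F0 0 R1 |
| | Add2 | No | | | | | | MULTD F4 F0 F2 |
| | Add3 | No | | | | | | SD F4 0 R1 |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | | SUBI R1 R1 #8 |
| | Mult2 | No | | | | | | BNEZ R1 Loop |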
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F303 80 Fu Load1 Mult1
• Implicit renaming sets up “DataFlow” graph
![Page 77: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/77.jpg)
Loop Example Cycle 4Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
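Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk | Code |
|---|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | | LD F0 0 R1 |
| | Add2 | No | | | | | | MULTD F4 F0 F2 |
| | Add3 | No | | | | | | SD F4 0 R1 |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | | SUBI R1 R1 #8 |
| | Mult2 | No | | | | | | BNEZ R1 Loop |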
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F304 80 Fu Load1 Mult1
• Dispatching SUBI Instruction
![Page 78: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/78.jpg)
Loop Example Cycle 5Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
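Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk | Code |
|---|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | | LD F0 0 R1 |
| | Add2 | No | | | | | | MULTD F4 F0 F2 |
| | Add3 | No | | | | | | SD F4 0 R1 |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | | SUBI R1 R1 #8 |
| | Mult2 | No | | | | | | BNEZ R1 Loop |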
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F305 72 Fu Load1 Mult1
• And, BNEZ instruction
![Page 79: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/79.jpg)
Loop Example Cycle 6Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No
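Reservation Stations:

| Time | Name | Busy | Op | Vj (S1) | Vk (S2) | Qj (RS) | Qk | Code |
|---|---|---|---|---|---|---|---|---|
| | Add1 | No | | | | | | LD F0 0 R1 |
| | Add2 | No | | | | | | MULTD F4 F0 F2 |
| | Add3 | No | | | | | | SD F4 0 R1 |
| | Mult1 | Yes | Multd | | R(F2) | Load1 | | SUBI R1 R1 #8 |
| | Mult2 | No | | | | | | BNEZ R1 Loop |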
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F306 72 Fu Load2 Mult1
• Notice that F0 never sees Load from location 80
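Why F0 never receives the value from location 80 can be traced with a short sketch. The `broadcast` helper and the dictionary layout are hypothetical simplifications. The second LD retags F0 to Load2 before Load1 returns, so Load1's result reaches only the reservation station that captured its tag.

```python
def broadcast(tag, value, waiting, reg_status, reg_file):
    """Deliver `value` to every consumer that is still waiting on `tag`."""
    for station in waiting:
        if station["Qj"] == tag:
            station["Vj"], station["Qj"] = value, None
    for reg, producer in reg_status.items():
        if producer == tag:
            reg_file[reg] = value
            reg_status[reg] = None   # register now holds a real value


reg_file = {"F0": "old F0"}
reg_status = {"F0": "Load1"}                           # iteration 1 LD issues
mult1 = {"name": "Mult1", "Qj": "Load1", "Vj": None}   # iteration 1 MULTD issues
reg_status["F0"] = "Load2"                             # iteration 2 LD retags F0
mult2 = {"name": "Mult2", "Qj": "Load2", "Vj": None}   # iteration 2 MULTD issues

broadcast("Load1", "M[80]", [mult1, mult2], reg_status, reg_file)
f0_after_load1 = reg_file["F0"]
broadcast("Load2", "M[72]", [mult1, mult2], reg_status, reg_file)
print(mult1["Vj"], mult2["Vj"], f0_after_load1, reg_file["F0"])
```

Mult1 still gets M[80], so the first iteration computes the right product, while the architectural F0 goes straight from its old value to M[72].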
![Page 80: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/80.jpg)
Loop Example Cycle 7Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 No2 SD F4 0 R1 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F307 72 Fu Load2 Mult2
• Register file completely detached from computation
• First and Second iteration completely overlapped
![Page 81: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/81.jpg)
Loop Example Cycle 8Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F308 72 Fu Load2 Mult2
![Page 82: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/82.jpg)
Loop Example Cycle 9Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F309 72 Fu Load2 Mult2
• Load1 completing: who is waiting?
• Note: Dispatching SUBI
![Page 83: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/83.jpg)
Loop Example Cycle 10Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 10 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
4 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3010 64 Fu Load2 Mult2
• Load2 completing: who is waiting?
• Note: Dispatching BNEZ
![Page 84: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/84.jpg)
Loop Example Cycle 11Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
3 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #84 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3011 64 Fu Load3 Mult2
• Next load in sequence
![Page 85: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/85.jpg)
Loop Example Cycle 12Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
2 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #83 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3012 64 Fu Load3 Mult2
• Why not issue third multiply?
![Page 86: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/86.jpg)
Loop Example Cycle 13Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
1 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #82 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3013 64 Fu Load3 Mult2
![Page 87: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/87.jpg)
Loop Example Cycle 14Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1
0 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #81 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3014 64 Fu Load3 Mult2
• Mult1 completing. Who is waiting?
![Page 88: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/88.jpg)
Loop Example Cycle 15Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8
0 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3015 64 Fu Load3 Mult2
• Mult2 completing. Who is waiting?
![Page 89: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/89.jpg)
Loop Example Cycle 16Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 No
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3016 64 Fu Load3 Mult1
![Page 90: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/90.jpg)
Loop Example Cycle 17Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3017 64 Fu Load3 Mult1
![Page 91: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/91.jpg)
Loop Example Cycle 18Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3018 64 Fu Load3 Mult1
![Page 92: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/92.jpg)
Loop Example Cycle 19Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 19 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3019 64 Fu Load3 Mult1
![Page 93: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/93.jpg)
Loop Example Cycle 20Instruction status: Exec Write
ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 No2 SD F4 0 R1 8 19 20 Store3 Yes 64 Mult1
Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:
Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop
Register result status
Clock R1 F0 F2 F4 F6 F8 F10 F12 ... F3020 64 Fu Load3 Mult1
![Page 94: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/94.jpg)
Why can Tomasulo overlap iterations of loops?
• Register renaming
– Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
• Reservation stations
– Permit instruction issue to advance past integer control flow operations
– Also buffer old values of registers, totally avoiding the WAR stall that we saw in the scoreboard.
• Other idea: Tomasulo building “DataFlow” graph on the fly.
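The dataflow graph built by renaming can be sketched by issuing two copies of the loop body and recording which tag each instruction waits on. This is a simplified model under stated assumptions: `build_dataflow` and `UNIT` are hypothetical names, stations are assumed never to run out, and no result is broadcast while the instructions issue.

```python
# Which kind of buffer or station each opcode uses (illustrative mapping).
UNIT = {"LD": "Load", "MULTD": "Mult", "SD": "Store"}


def build_dataflow(program):
    """Return (tag, op, inputs) triples; inputs are producer tags or register names."""
    counters = {}      # how many stations of each kind have been handed out
    reg_status = {}    # architectural register -> tag of its latest producer
    graph = []
    for op, dest, srcs in program:
        unit = UNIT[op]
        counters[unit] = counters.get(unit, 0) + 1
        tag = f"{unit}{counters[unit]}"
        # A source is either a pending tag or, if no producer is pending, the register itself
        inputs = [reg_status.get(src, src) for src in srcs]
        if dest is not None:
            reg_status[dest] = tag
        graph.append((tag, op, inputs))
    return graph


body = [("LD", "F0", []), ("MULTD", "F4", ["F0", "F2"]), ("SD", None, ["F4"])]
graph = build_dataflow(body * 2)   # two iterations of the loop body
for node in graph:
    print(node)
```

Both iterations name F0 and F4, yet the graph has two independent chains (Load1 to Mult1 to Store1, and Load2 to Mult2 to Store2), matching the Fu column of the store buffers in the loop example.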
![Page 95: CDA 5155](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815a93550346895dc80c47/html5/thumbnails/95.jpg)
Summary #1
• Reservation stations: implicit register renaming to larger set of registers + buffering source operands
– Prevents registers as bottleneck
– Avoids WAR, WAW hazards of Scoreboard
– Allows loop unrolling in HW
• Not limited to basic blocks (integer unit gets ahead, beyond branches)
• Helps cache misses as well
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation (see fig 3.5 – execute stage)
• 360/91 descendants are Pentium II; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264