CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling 1
CIS 501: Computer Architecture
Unit 9: Static & Dynamic Scheduling
Slides originally developed by Drew Hilton, Amir Roth, Milo Martin and Joe Devietti at University of Pennsylvania
This Unit: Static & Dynamic Scheduling
• Code scheduling
  • To reduce pipeline stalls
  • To increase ILP (insn level parallelism)
• Static scheduling by the compiler
  • Approach & limitations
• Dynamic scheduling in hardware
  • Register renaming
  • Instruction selection
  • Handling memory operations
[Figure: system layers: App, App, App / System software / Mem, CPU, I/O]
Readings
• Textbook (MA:FSPTCM)
  • Sections 3.3.1 – 3.3.4 (but not “Sidebar:”)
  • Sections 5.0-5.2, 5.3.3, 5.4, 5.5
• Paper for group discussion and questions:
  • “Memory Dependence Prediction using Store Sets” by Chrysos & Emer
• Suggested reading
  • “The MIPS R10000 Superscalar Microprocessor” by Kenneth Yeager
Code Scheduling & Limitations
Code Scheduling
• Scheduling: act of finding independent instructions
  • “Static” done at compile time by the compiler (software)
  • “Dynamic” done at runtime by the processor (hardware)
• Why schedule code?
  • Scalar pipelines: fill in load-to-use delay slots to improve CPI
  • Superscalar: place independent instructions together
    • As above, load-to-use delay slots
    • Allow multiple-issue decode logic to let them execute at the same time
Compiler Scheduling
• Compiler can schedule (move) instructions to reduce stalls
  • Basic pipeline scheduling: eliminate back-to-back load-use pairs
  • Example code sequence: a = b + c; d = f – e;
    • sp is the stack pointer, sp+0 is “a”, sp+4 is “b”, etc…
Before
ld [sp+4]➜r2
ld [sp+8]➜r3
add r2,r3➜r1 //stall
st r1➜[sp+0]
ld [sp+16]➜r5
ld [sp+20]➜r6
sub r6,r5➜r4 //stall
st r4➜[sp+12]
After
ld [sp+4]➜r2
ld [sp+8]➜r3
ld [sp+16]➜r5
add r2,r3➜r1 //no stall
ld [sp+20]➜r6
st r1➜[sp+0]
sub r6,r5➜r4 //no stall
st r4➜[sp+12]
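The effect of this reordering can be checked with a small Python sketch. It assumes a scalar in-order pipeline in which a load followed immediately by a consumer of its result costs one stall; the tuple encoding and the name `count_load_use_stalls` are illustrative and not from the slides.

```python
# Each instruction is (opcode, destination register or None, source registers).
# Assumption: only a load directly followed by a user of its result stalls.

def count_load_use_stalls(insns):
    stalls = 0
    for prev, cur in zip(insns, insns[1:]):
        op, dst, _ = prev
        if op == "ld" and dst in cur[2]:
            stalls += 1
    return stalls

before = [
    ("ld", "r2", ("sp",)),
    ("ld", "r3", ("sp",)),
    ("add", "r1", ("r2", "r3")),
    ("st", None, ("r1", "sp")),
    ("ld", "r5", ("sp",)),
    ("ld", "r6", ("sp",)),
    ("sub", "r4", ("r6", "r5")),
    ("st", None, ("r4", "sp")),
]
after = [
    ("ld", "r2", ("sp",)),
    ("ld", "r3", ("sp",)),
    ("ld", "r5", ("sp",)),
    ("add", "r1", ("r2", "r3")),
    ("ld", "r6", ("sp",)),
    ("st", None, ("r1", "sp")),
    ("sub", "r4", ("r6", "r5")),
    ("st", None, ("r4", "sp")),
]

stalls_before = count_load_use_stalls(before)
stalls_after = count_load_use_stalls(after)
```

Under that assumption the original order stalls twice and the scheduled order not at all, matching the //stall and //no stall annotations.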
Compiler Scheduling Requires
• Large scheduling scope
  • Independent instruction to put between load-use pairs
  + Original example: large scope, two independent computations
  – This example: small scope, one computation
• Compiler can create larger scheduling scopes
  • For example: loop unrolling & function inlining
Before
ld [sp+4]➜r2
ld [sp+8]➜r3
add r2,r3➜r1 //stall
st r1➜[sp+0]
After (same!)
ld [sp+4]➜r2
ld [sp+8]➜r3
add r2,r3➜r1 //stall
st r1➜[sp+0]
Scheduling Scope Limited by Branches
r1 and r2 are inputs
loop: jz r1, not_found
      ld [r1+0]➜r3
      sub r2,r3➜r4
      jz r4, found
      ld [r1+4]➜r1
      jmp loop
Legal to move load up past branch?
No: if r1 is null, will cause a fault
Aside: what does this code do?
Searches a linked list for an element
Compiler Scheduling Requires
• Enough registers
  • To hold additional “live” values
  • Example code contains 7 different values (including sp)
  • Before: max 3 values live at any time ➜ 3 registers enough
  • After: max 4 values live ➜ 3 registers not enough
Original
ld [sp+4]➜r2
ld [sp+8]➜r1
add r1,r2➜r1 //stall
st r1➜[sp+0]
ld [sp+16]➜r2
ld [sp+20]➜r1
sub r2,r1➜r1 //stall
st r1➜[sp+12]
Wrong!
ld [sp+4]➜r2
ld [sp+8]➜r1
ld [sp+16]➜r2
add r1,r2➜r1 // wrong r2
ld [sp+20]➜r1
st r1➜[sp+0] // wrong r1
sub r2,r1➜r1
st r1➜[sp+12]
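The live-value counts can be reproduced with a rough liveness calculation. The sketch below assumes every value has its own name (as in the earlier version of this example that uses r1 to r6), treats sp as live on entry, and counts a value as live from the instruction that defines it up to, but not including, its last use. `max_live` is a hypothetical helper, not something from the slides.

```python
# Each instruction is (destination or None, source registers).

def max_live(insns, live_in=("sp",)):
    first_def = {v: -1 for v in live_in}   # values live before the snippet starts
    last_use = {}
    for i, (dst, srcs) in enumerate(insns):
        for s in srcs:
            last_use[s] = i
        if dst is not None:
            first_def.setdefault(dst, i)
    best = 0
    for i in range(len(insns)):
        # Values that must be held in a register after instruction i completes.
        live = [v for v in first_def if first_def[v] <= i < last_use.get(v, -1)]
        best = max(best, len(live))
    return best

before = [
    ("r2", ("sp",)), ("r3", ("sp",)), ("r1", ("r2", "r3")), (None, ("r1", "sp")),
    ("r5", ("sp",)), ("r6", ("sp",)), ("r4", ("r6", "r5")), (None, ("r4", "sp")),
]
after = [
    ("r2", ("sp",)), ("r3", ("sp",)), ("r5", ("sp",)), ("r1", ("r2", "r3")),
    ("r6", ("sp",)), (None, ("r1", "sp")), ("r4", ("r6", "r5")), (None, ("r4", "sp")),
]

live_before = max_live(before)
live_after = max_live(after)
```

With these definitions the original order never needs more than 3 registers, while the scheduled order needs 4, which is why reusing r1 and r2 breaks the scheduled version.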
Compiler Scheduling Requires
• Alias analysis
  • Ability to tell whether load/store reference same memory locations
    • Effectively, whether load/store can be rearranged
  • Previous example: easy, loads/stores use same base register (sp)
  • New example: can compiler tell that r8 != r9?
  • Must be conservative
Before
ld [r9+4]➜r2
ld [r9+8]➜r3
add r3,r2➜r1 //stall
st r1➜[r9+0]
ld [r8+0]➜r5
ld [r8+4]➜r6
sub r5,r6➜r4 //stall
st r4➜[r8+8]
Wrong(?)
ld [r9+4]➜r2
ld [r9+8]➜r3
ld [r8+0]➜r5 //does r8==r9?
add r3,r2➜r1
ld [r8+4]➜r6 //does r8+4==r9?
st r1➜[r9+0]
sub r5,r6➜r4
st r4➜[r8+8]
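A conservative alias check might look like the sketch below. It assumes each access is described as a (base register, offset) pair with 4-byte operands, and that nothing is known about how different base registers relate; `may_alias` is an illustrative name.

```python
def may_alias(a, b, size=4):
    """Return True unless the two accesses provably touch different bytes."""
    (base_a, off_a), (base_b, off_b) = a, b
    if base_a == base_b:
        # Same base register: the offsets alone decide whether the accesses overlap.
        return abs(off_a - off_b) < size
    # Different base registers: without more information, assume they may overlap.
    return True

same_base = may_alias(("sp", 0), ("sp", 4))    # store to a, load of b
unknown_base = may_alias(("r9", 0), ("r8", 4)) # st [r9+0] vs ld [r8+4]
```

The sp-based accesses are provably distinct, so they can be reordered; the r8/r9 accesses cannot be separated this way, so a compiler using such a check would leave the loads below the store.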
A Good Case: Static Scheduling of SAXPY
• SAXPY (Single-precision A X Plus Y)
  • Linear algebra routine (used in solving systems of equations)
for (i=0;i<N;i++) Z[i]=(A*X[i])+Y[i];
0: ldf [X+r1]➜f1 // loop
1: mulf f0,f1➜f2 // A in f0
2: ldf [Y+r1]➜f3 // X,Y,Z are constant addresses
3: addf f2,f3➜f4
4: stf f4➜[Z+r1]
5: addi r1,4➜r1 // i in r1
6: blt r1,r2,0 // N*4 in r2
• Static scheduling works great for SAXPY
  • All loop iterations independent
  • Use loop unrolling to increase scheduling scope
  • Aliasing analysis is tractable (just ensure X, Y, Z are independent)
  • Still limited by number of registers
Unrolling & Scheduling SAXPY
• Fuse two (in general K) iterations of loop
  • Fuse loop control: induction variable (i) increment + branch
  • Adjust register names & induction uses (constants ➜ constants+4)
• Reorder operations to reduce stalls
Original (two iterations)
ldf [X+r1]➜f1
mulf f0,f1➜f2
ldf [Y+r1]➜f3
addf f2,f3➜f4
stf f4➜[Z+r1]
addi r1,4➜r1
blt r1,r2,0
ldf [X+r1]➜f1
mulf f0,f1➜f2
ldf [Y+r1]➜f3
addf f2,f3➜f4
stf f4➜[Z+r1]
addi r1,4➜r1
blt r1,r2,0
Fused
ldf [X+r1]➜f1
mulf f0,f1➜f2
ldf [Y+r1]➜f3
addf f2,f3➜f4
stf f4➜[Z+r1]
ldf [X+r1+4]➜f5
mulf f0,f5➜f6
ldf [Y+r1+4]➜f7
addf f6,f7➜f8
stf f8➜[Z+r1+4]
addi r1,8➜r1
blt r1,r2,0
Reordered
ldf [X+r1]➜f1
ldf [X+r1+4]➜f5
mulf f0,f1➜f2
mulf f0,f5➜f6
ldf [Y+r1]➜f3
ldf [Y+r1+4]➜f7
addf f2,f3➜f4
addf f6,f7➜f8
stf f4➜[Z+r1]
stf f8➜[Z+r1+4]
addi r1,8➜r1
blt r1,r2,0
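The same transformation can be written at source level. The Python sketch below mirrors the reordered schedule (both loads of X, both multiplies, both loads of Y, both adds, both stores). The clean-up step for an odd element count is an addition of this sketch; the slide's code does not show one.

```python
def saxpy(a, x, y):
    # Reference loop: Z[i] = A*X[i] + Y[i]
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_unrolled2(a, x, y):
    n = len(x)
    z = [0.0] * n
    i = 0
    while i + 1 < n:
        # Two fused iterations, operations interleaved as in the reordered code.
        x0, x1 = x[i], x[i + 1]
        m0, m1 = a * x0, a * x1
        y0, y1 = y[i], y[i + 1]
        z[i] = m0 + y0
        z[i + 1] = m1 + y1
        i += 2            # single induction update and branch for both iterations
    if i < n:             # leftover iteration when n is odd (assumption of this sketch)
        z[i] = a * x[i] + y[i]
    return z

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 20.0, 30.0, 40.0, 50.0]
result = saxpy_unrolled2(2.0, xs, ys)
```

Because every iteration is independent, the unrolled version performs exactly the same arithmetic per element and produces the same result as the reference loop.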
Compiler Scheduling Limitations
• Scheduling scope
  • Example: can’t generally move memory operations past branches
• Limited number of registers (set by ISA)
• Inexact “memory aliasing” information
  • Often prevents reordering of loads above stores by compiler
• Cache misses (or any runtime event) confound scheduling
  • How can the compiler know which loads will miss vs hit?
  • Can impact the compiler’s scheduling decisions
501 News
• Paper Review #5 out
  • due on Wed 20 Nov
• only 2 homework assignments to go!
Dynamic (Hardware) Scheduling
Can Hardware Overcome These Limits?
• Dynamically-scheduled processors
  • Also called “out-of-order” processors
  • Hardware re-schedules insns…
  • …within a sliding window of von Neumann insns
  • As with pipelining and superscalar, ISA unchanged
    • Same hardware/software interface, appearance of in-order
• Increases scheduling scope
  • Does loop unrolling transparently!
  • Uses branch prediction to “unroll” branches
• Examples:
  • Pentium Pro/II/III (3-wide), Core 2 (4-wide), Alpha 21264 (4-wide), MIPS R10000 (4-wide), Power5 (5-wide)
Example: In-Order Limitations #1
• In-order pipeline, three-cycle load-use penalty
  • 2-wide
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F D X M1 M2 W
add r2 + r3 ➜ r4: F D d* d* d* X M1 M2 W
xor r4 ^ r5 ➜ r6: F D d* d* d* X M1 M2 W
ld [r7] ➜ r4: F D p* p* p* X M1 M2 W
• Why not the following?
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F D X M1 M2 W
add r2 + r3 ➜ r4: F D d* d* d* X M1 M2 W
xor r4 ^ r5 ➜ r6: F D d* d* d* X M1 M2 W
ld [r7] ➜ r4: F D X M1 M2 W
Example: In-Order Limitations #2
• In-order pipeline, three-cycle load-use penalty
  • 2-wide
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [p1] ➜ p2: F D X M1 M2 W
add p2 + p3 ➜ p4: F D d* d* d* X M1 M2 W
xor p4 ^ p5 ➜ p6: F D d* d* d* X M1 M2 W
ld [p7] ➜ p8: F D p* p* p* X M1 M2 W
• Why not the following:
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [p1] ➜ p2: F D X M1 M2 W
add p2 + p3 ➜ p4: F D d* d* d* X M1 M2 W
xor p4 ^ p5 ➜ p6: F D d* d* d* X M1 M2 W
ld [p7] ➜ p8: F D X M1 M2 W
Out-of-Order to the Rescue
• “Dynamic scheduling” done by the hardware
• Still 2-wide superscalar, but now out-of-order, too
  • Allows instructions to issue when dependences are ready
• Longer pipeline
  • In-order front end: Fetch, “Dispatch”
  • Out-of-order execution core:
    • “Issue”, “RegisterRead”, Execute, Memory, Writeback
  • In-order retirement: “Commit”
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [p1] ➜ p2: F Di I RR X M1 M2 W C
add p2 + p3 ➜ p4: F Di I RR X W C
xor p4 ^ p5 ➜ p6: F Di I RR X W C
ld [p7] ➜ p8: F Di I RR X M1 M2 W C
Out-of-Order Pipeline
[Figure: pipeline stages Fetch ➜ Decode ➜ Rename ➜ Dispatch ➜ (buffer of instructions) ➜ Issue ➜ Reg-read ➜ Execute ➜ Writeback ➜ Commit; regions labeled in-order front end, out-of-order execution, in-order commit]
Out-of-Order Execution
• Also called “Dynamic scheduling”
  • Done by the hardware on-the-fly during execution
• Looks at a “window” of instructions waiting to execute
  • Each cycle, picks the next ready instruction(s)
• Two steps to enable out-of-order execution:
  Step #1: Register renaming – to avoid “false” dependencies
  Step #2: Dynamically schedule – to enforce “true” dependencies
• Key to understanding out-of-order execution:
  • Data dependencies
Dependence types
• RAW (Read After Write) = “true dependence” (true)
  mul r0 * r1 ➜ r2
  …
  add r2 + r3 ➜ r4
• WAW (Write After Write) = “output dependence” (false)
  mul r0 * r1 ➜ r2
  …
  add r1 + r3 ➜ r2
• WAR (Write After Read) = “anti-dependence” (false)
  mul r0 * r1 ➜ r2
  …
  add r3 + r4 ➜ r1
• WAW & WAR are “false”, can be totally eliminated by “renaming”
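The three cases can be told apart mechanically by comparing register names. The sketch below is a minimal illustration; it assumes each instruction is given as (destination, sources), and `classify` is a hypothetical helper name.

```python
def classify(older, younger):
    """Return the dependence types from an older instruction to a younger one."""
    d1, s1 = older
    d2, s2 = younger
    kinds = []
    if d1 in s2:
        kinds.append("RAW")   # younger reads what older writes (true)
    if d1 == d2:
        kinds.append("WAW")   # both write the same register (output, false)
    if d2 in s1:
        kinds.append("WAR")   # younger overwrites what older reads (anti, false)
    return kinds

mul = ("r2", ("r0", "r1"))                 # mul r0 * r1 -> r2
raw = classify(mul, ("r4", ("r2", "r3")))  # add r2 + r3 -> r4
waw = classify(mul, ("r2", ("r1", "r3")))  # add r1 + r3 -> r2
war = classify(mul, ("r1", ("r3", "r4")))  # add r3 + r4 -> r1
```

Only the RAW case carries a value from one instruction to the other; the WAW and WAR cases exist purely because a register name is reused.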
Step #1: Register Renaming
• To eliminate register conflicts/hazards
• “Architected” vs “Physical” registers – level of indirection
  • Names: r1,r2,r3
  • Locations: p1,p2,p3,p4,p5,p6,p7
  • Original mapping: r1➜p1, r2➜p2, r3➜p3, p4–p7 are “available”
• Renaming – conceptually write each register once
  + Removes false dependences
  + Leaves true dependences intact!
• When to reuse a physical register? After overwriting insn done

| MapTable r1 | r2 | r3 | FreeList | Original insns | Renamed insns |
|---|---|---|---|---|---|
| p1 | p2 | p3 | p4,p5,p6,p7 | add r2,r3➜r1 | add p2,p3➜p4 |
| p4 | p2 | p3 | p5,p6,p7 | sub r2,r1➜r3 | sub p2,p4➜p5 |
| p4 | p2 | p5 | p6,p7 | mul r2,r3➜r3 | mul p2,p5➜p6 |
| p4 | p2 | p6 | p7 | div r1,4➜r1 | div p4,4➜p7 |
Register Renaming Algorithm
• Two key data structures:
  • maptable[architectural_reg] ➜ physical_reg
  • Free list: allocate (new) & free registers (implemented as a queue)
• Algorithm: at “decode” stage for each instruction:
  insn.phys_input1 = maptable[insn.arch_input1]
  insn.phys_input2 = maptable[insn.arch_input2]
  insn.old_phys_output = maptable[insn.arch_output]
  new_reg = new_phys_reg()
  maptable[insn.arch_output] = new_reg
  insn.phys_output = new_reg
• At “commit”
  • Once all older instructions have committed, free register
  free_phys_reg(insn.old_phys_output)
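The pseudocode above maps fairly directly onto a small Python class. This is a sketch rather than a hardware description: the `Renamer` class, its dictionary-based instruction records, and the handling of immediates (passed through unchanged) are assumptions made for illustration. An empty free list raises `IndexError` here, where real hardware would stall.

```python
from collections import deque

class Renamer:
    def __init__(self, maptable, free_regs):
        self.maptable = dict(maptable)   # architectural reg -> physical reg
        self.free = deque(free_regs)     # free list, used as a queue

    def rename(self, op, srcs, dst):
        # Look up inputs first, so a source equal to dst reads the old mapping.
        phys_srcs = [self.maptable.get(s, s) for s in srcs]  # immediates pass through
        old = self.maptable[dst]         # mapping being overwritten
        new = self.free.popleft()        # allocate a fresh physical register
        self.maptable[dst] = new
        return {"op": op, "srcs": phys_srcs, "dst": new, "old_dst": old}

    def commit(self, insn):
        # The overwritten register can be reused once this instruction commits.
        self.free.append(insn["old_dst"])

renamer = Renamer({"r1": "p1", "r2": "p2", "r3": "p3"}, ["p4", "p5", "p6", "p7"])
program = [
    ("add", ["r2", "r3"], "r1"),
    ("sub", ["r2", "r1"], "r3"),
    ("mul", ["r2", "r3"], "r3"),
    ("div", ["r1", 4], "r1"),
]
renamed = [renamer.rename(*insn) for insn in program]
```

Run on the four-instruction example from the renaming table, this yields add p2,p3➜p4, sub p2,p4➜p5, mul p2,p5➜p6 and div p4,4➜p7, and committing the add returns p1 to the free list.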
Out-of-order Pipeline
[Figure: pipeline stages Fetch ➜ Decode ➜ Rename ➜ Dispatch ➜ (buffer of instructions) ➜ Issue ➜ Reg-read ➜ Execute ➜ Writeback ➜ Commit; regions labeled in-order front end, out-of-order execution, in-order commit]
Have unique register names
Now put into out-of-order execution structures
Step #2: Dynamic Scheduling
• Instructions fetched/decoded/renamed into Instruction Buffer
  • Also called “instruction window” or “instruction scheduler”
• Instructions (conceptually) check ready bits every cycle
  • Execute oldest “ready” instruction, set output as “ready”
[Figure: pipeline datapath with I$, BP, insn buffer, regfile, D$]
Insn buffer (in program order):
add p2,p3➜p4
sub p2,p4➜p5
mul p2,p5➜p6
div p4,4➜p7
Ready Table (one row per step, time advancing downward):

| P2 | P3 | P4 | P5 | P6 | P7 |
|---|---|---|---|---|---|
| Yes | Yes | | | | |
| Yes | Yes | Yes | | | |
| Yes | Yes | Yes | Yes | | Yes |
| Yes | Yes | Yes | Yes | Yes | Yes |
Dynamic Scheduling/Issue Algorithm
• Data structures:
  • Ready table[phys_reg] ➜ yes/no (part of “issue queue”)
• Algorithm at “issue” stage (prior to read registers):
  foreach instruction:
    if table[insn.phys_input1] == ready && table[insn.phys_input2] == ready then insn is “ready”
  select the oldest “ready” instruction
  table[insn.phys_output] = ready
• Multiple-cycle instructions? (such as loads)
  • For an insn with latency of N, set “ready” bit N-1 cycles in future
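A cycle-by-cycle sketch of this algorithm is shown below. It is a simplification under stated assumptions: instructions are (name, source registers, destination) tuples with unique names, up to `width` of the oldest ready instructions issue per cycle, and a result becomes usable by dependents `latency` cycles after its producer issues (default 1). The function name `schedule` and the `latency` dictionary are illustrative.

```python
def schedule(insns, initially_ready, width=2, latency=None):
    """Return {instruction name: issue cycle}; insns are in program (age) order."""
    latency = latency or {}
    ready = set(initially_ready)   # ready table: physical registers marked "yes"
    pending = {}                   # cycle -> registers whose ready bit is set then
    waiting = list(insns)
    issue_cycle = {}
    cycle = 0
    while waiting:
        ready |= pending.pop(cycle, set())
        # Oldest-first selection among instructions whose inputs are all ready.
        picked = [i for i in waiting if all(s in ready for s in i[1])][:width]
        if not picked and not pending:
            raise ValueError("no waiting instruction can ever become ready")
        for insn in picked:
            name, _, dst = insn
            issue_cycle[name] = cycle
            wake = cycle + latency.get(name, 1)
            pending.setdefault(wake, set()).add(dst)
            waiting.remove(insn)
        cycle += 1
    return issue_cycle

program = [
    ("add", ("p2", "p3"), "p4"),
    ("sub", ("p2", "p4"), "p5"),
    ("mul", ("p2", "p5"), "p6"),
    ("div", ("p4",), "p7"),
]
single_cycle = schedule(program, {"p2", "p3"})
slow_add = schedule(program, {"p2", "p3"}, latency={"add": 3})
```

With single-cycle operations the add issues first, sub and div issue together once p4 is ready, and mul follows; this is an out-of-order issue, since div is younger than mul. Giving the add a latency of 3 simply pushes its dependents back by two cycles.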
Register Renaming
Register Renaming Algorithm (Simplified)
• Two key data structures:
  • maptable[architectural_reg] ➜ physical_reg
  • Free list: allocate (new) & free registers (implemented as a queue)
• Algorithm: at “decode” stage for each instruction:
  insn.phys_input1 = maptable[insn.arch_input1]
  insn.phys_input2 = maptable[insn.arch_input2]
  new_reg = new_phys_reg()
  maptable[insn.arch_output] = new_reg
  insn.phys_output = new_reg
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Map table: r1➜p1, r2➜p2, r3➜p3, r4➜p4, r5➜p5
Free-list: p6, p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜
Map table: r1➜p1, r2➜p2, r3➜p3, r4➜p4, r5➜p5
Free-list: p6, p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6
Map table: r1➜p1, r2➜p2, r3➜p3, r4➜p4, r5➜p5
Free-list: p6, p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6
Map table: r1➜p1, r2➜p2, r3➜p6, r4➜p4, r5➜p5
Free-list: p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜
Map table: r1➜p1, r2➜p2, r3➜p6, r4➜p4, r5➜p5
Free-list: p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7
Map table: r1➜p1, r2➜p2, r3➜p6, r4➜p4, r5➜p5
Free-list: p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7
Map table: r1➜p1, r2➜p2, r3➜p6, r4➜p7, r5➜p5
Free-list: p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜
Map table: r1➜p1, r2➜p2, r3➜p6, r4➜p7, r5➜p5
Free-list: p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8
Map table: r1➜p1, r2➜p2, r3➜p6, r4➜p7, r5➜p5
Free-list: p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8
Map table: r1➜p1, r2➜p2, r3➜p8, r4➜p7, r5➜p5
Free-list: p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜
Map table: r1➜p1, r2➜p2, r3➜p8, r4➜p7, r5➜p5
Free-list: p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Map table: r1➜p1, r2➜p2, r3➜p8, r4➜p7, r5➜p5
Free-list: p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Map table: r1➜p9, r2➜p2, r3➜p8, r4➜p7, r5➜p5
Free-list: p10
Out-of-order Pipeline
[Figure: pipeline stages Fetch ➜ Decode ➜ Rename ➜ Dispatch ➜ (buffer of instructions) ➜ Issue ➜ Reg-read ➜ Execute ➜ Writeback ➜ Commit]
Have unique register names
Now put into out-of-order execution structures
Dynamic Scheduling Mechanisms
Dispatch
• Put renamed instructions into out-of-order structures
• Re-order buffer (ROB)
  • Holds instructions until commit
• Issue Queue
  • Central piece of scheduling logic
  • Holds un-executed instructions
  • Tracks ready inputs
    • Physical register names + ready bit
    • “AND” the bits to tell if ready
[Figure: issue queue entry with fields Insn, Inp1, R, Inp2, R, Dst, Bday; the two R bits are ANDed to produce “Ready?”]
Dispatch Steps
• Allocate Issue Queue (IQ) slot
  • Full? Stall
• Read ready bits of inputs
  • 1-bit per physical reg
• Clear ready bit of output in table
  • Instruction has not produced value yet
• Write data into Issue Queue (IQ) slot
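The four steps can be sketched as a single function. The code below is illustrative only: the `dispatch` name, the dictionary layout of an issue-queue entry, and the capacity of 4 are assumptions, and instructions with a single register input (such as addi) simply list one source.

```python
def dispatch(iq, ready_bits, insn, bday, capacity=4):
    """Try to place one renamed instruction in the issue queue; False means stall."""
    if len(iq) >= capacity:                              # 1. allocate a slot, or stall
        return False
    op, srcs, dst = insn
    inputs = [(s, ready_bits[s]) for s in srcs]          # 2. read ready bits of inputs
    ready_bits[dst] = False                              # 3. output not produced yet
    iq.append({"op": op, "inputs": inputs, "dst": dst, "bday": bday})  # 4. write entry
    return True

ready_bits = {f"p{i}": True for i in range(1, 10)}       # p1..p9 all ready initially
iq = []
program = [
    ("xor", ("p1", "p2"), "p6"),
    ("add", ("p6", "p4"), "p7"),
    ("sub", ("p5", "p2"), "p8"),
    ("addi", ("p8",), "p9"),
]
for bday, insn in enumerate(program):
    dispatch(iq, ready_bits, insn, bday)
```

Dispatching the four renamed instructions in order leaves p6 through p9 marked not ready, with add waiting on p6 and addi waiting on p8.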
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Issue Queue (empty):

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|---|---|---|---|---|---|---|

Ready bits: p1 y, p2 y, p3 y, p4 y, p5 y, p6 y, p7 y, p8 y, p9 y
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Issue Queue:

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|---|---|---|---|---|---|---|
| xor | p1 | y | p2 | y | p6 | 0 |

Ready bits: p1 y, p2 y, p3 y, p4 y, p5 y, p6 n, p7 y, p8 y, p9 y
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Issue Queue:

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|---|---|---|---|---|---|---|
| xor | p1 | y | p2 | y | p6 | 0 |
| add | p6 | n | p4 | y | p7 | 1 |

Ready bits: p1 y, p2 y, p3 y, p4 y, p5 y, p6 n, p7 n, p8 y, p9 y
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Issue Queue:

| Insn | Inp1 | R | Inp2 | R | Dst | Bday |
|---|---|---|---|---|---|---|
| xor | p1 | y | p2 | y | p6 | 0 |
| add | p6 | n | p4 | y | p7 | 1 |
| sub | p5 | y | p2 | y | p8 | 2 |

Ready bits: p1 y, p2 y, p3 y, p4 y, p5 y, p6 n, p7 n, p8 n, p9 y
Dispatch Example
Insn Inp1 R Inp2 R Dst Bday
xor p1 y p2 y p6 0
add p6 n p4 y p7 1
sub p5 y p2 y p8 2
addi p8 n --- y p9 3
Issue Queue
p1 y
p2 y
p3 y
p4 y
p5 y
p6 n
p7 n
p8 n
p9 n
Ready bits
xor p1 ^ p2 ➜ p6
add p6 + p4 ➜ p7
sub p5 - p2 ➜ p8
addi p8 + 1 ➜ p9
![Page 52: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/52.jpg)
Out-of-order pipeline
• Execution (out-of-order) stages
  • Select ready instructions
  • Send for execution
  • Wakeup dependents
Issue
Reg-read
Execute
Writeback
![Page 53: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/53.jpg)
Dynamic Scheduling/Issue Algorithm
• Data structures:
  • Ready table[phys_reg] ➜ yes/no (part of issue queue)
• Algorithm at “schedule” stage (prior to read registers):
    foreach instruction:
      if table[insn.phys_input1] == ready && table[insn.phys_input2] == ready
        then insn is “ready”
    select the oldest “ready” instruction
    table[insn.phys_output] = ready
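The schedule-stage algorithm above can be sketched in Python. This is only an illustration under stated assumptions: the `IQEntry` class, the `select_oldest_ready` function, and the dict used as the ready table are hypothetical names for this example, and every operation is treated as single-cycle so the output register is marked ready at select time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IQEntry:
    """One issue-queue entry (hypothetical layout for this sketch)."""
    name: str
    src1: Optional[str]   # physical input register, or None if unused
    src2: Optional[str]
    dst: str              # physical output register
    bday: int             # birthday: smaller means older


def select_oldest_ready(issue_queue, ready):
    """Pick the oldest entry whose inputs are all ready, or None."""
    def is_ready(entry):
        return all(s is None or ready[s] for s in (entry.src1, entry.src2))

    candidates = [e for e in issue_queue if is_ready(e)]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda e: e.bday)
    issue_queue.remove(chosen)   # issued instructions leave the queue
    ready[chosen.dst] = True     # table[insn.phys_output] = ready
    return chosen
```

Calling `select_oldest_ready` repeatedly on the four-instruction dispatch example models a 1-wide machine; a 2-wide machine would select two entries per cycle before updating the ready table.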
![Page 54: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/54.jpg)
Issue = Select + Wakeup
• Select oldest of “ready” instructions
  • “xor” is the oldest ready instruction below
  • “xor” and “sub” are the two oldest ready instructions below
  • Note: may have resource constraints, e.g. load/store/floating point
Insn Inp1 R Inp2 R Dst Bday
xor p1 y p2 y p6 0 ← Ready!
add p6 n p4 y p7 1
sub p5 y p2 y p8 2 ← Ready!
addi p8 n --- y p9 3
![Page 55: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/55.jpg)
Issue = Select + Wakeup
• Wakeup dependent instructions
  • Search for destination (Dst) in inputs & set “ready” bit
  • Implemented with a special memory array circuit called a Content Addressable Memory (CAM)
  • Also update ready-bit table for future instructions
• For multi-cycle operations (loads, floating point)
  • Wakeup deferred a few cycles
  • Include checks to avoid structural hazards
Insn Inp1 R Inp2 R Dst Bday
xor p1 y p2 y p6 0
add p6 y p4 y p7 1
sub p5 y p2 y p8 2
addi p8 y --- y p9 3
p1 y
p2 y
p3 y
p4 y
p5 y
p6 y
p7 n
p8 y
p9 n
Ready bits
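A software analogue of the wakeup step, as a rough sketch: the `wakeup` function and the dict-based entry layout are assumptions for illustration. A real CAM compares the broadcast tag against all entries in parallel; the loop below does the same comparison sequentially.

```python
def wakeup(issue_queue, ready_table, dst_tag):
    """Broadcast a destination tag to waiting instructions.

    issue_queue: list of dicts with keys inp1, r1, inp2, r2 (hypothetical layout).
    ready_table: dict mapping physical register name -> bool.
    """
    # Update the ready-bit table so instructions dispatched later see it.
    ready_table[dst_tag] = True
    # Match the tag against both inputs of every waiting entry.
    for entry in issue_queue:
        if entry["inp1"] == dst_tag:
            entry["r1"] = True
        if entry["inp2"] == dst_tag:
            entry["r2"] = True
```

Broadcasting p6 and p8 (the destinations of xor and sub) marks the first inputs of add and addi ready, which is the state shown in the table above.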
![Page 56: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/56.jpg)
Issue
• Select/Wakeup in one cycle
  • Dependent instructions execute on back-to-back cycles
  • Next cycle: add/addi are ready (table below)
• Issued instructions are removed from issue queue
  • Free up space for subsequent instructions
Insn Inp1 R Inp2 R Dst Bday
add p6 y p4 y p7 1
addi p8 y --- y p9 3
![Page 57: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/57.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 0
p7 0
p8 0
p9 0
Issue: xor (RDY), add, sub (RDY), addi
![Page 58: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/58.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 0
p7 0
p8 0
p9 0
Issue: add (RDY), addi (RDY)
Reg-read: xor p1 ^ p2 ➜ p6, sub p5 - p2 ➜ p8
![Page 59: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/59.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 0
p7 0
p8 0
p9 0
Reg-read: add p6 + p4 ➜ p7, addi p8 + 1 ➜ p9
Execute: xor 7 ^ 3 ➜ p6, sub 6 - 3 ➜ p8
![Page 60: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/60.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 0
p7 0
p8 0
p9 0
Execute: add _ + 9 ➜ p7, addi _ + 1 ➜ p9
Writeback: 4 ➜ p6, 3 ➜ p8
![Page 61: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/61.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 4
p7 0
p8 3
p9 0
Writeback: 13 ➜ p7, 4 ➜ p9
![Page 62: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/62.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 4
p7 13
p8 3
p9 4
![Page 63: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/63.jpg)
OOO execution (2-wide)
p1 7
p2 3
p3 4
p4 9
p5 6
p6 4
p7 13
p8 3
p9 4
Note similarity to in-order
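The 2-wide walkthrough above can be reproduced with a short simulation. This is a minimal sketch, not a pipeline model: `run_ooo` and `OPS` are hypothetical names, every operation is assumed to take a single cycle, and the register-read and writeback stages are collapsed into one step per cycle.

```python
import operator

# Opcode -> function; "addi" is an add whose second source is an immediate.
OPS = {"xor": operator.xor, "add": operator.add,
       "sub": operator.sub, "addi": operator.add}


def run_ooo(insns, regs, ready, width=2):
    """Issue up to `width` oldest ready instructions per cycle.

    insns: list of (op, src1, src2, dst) in program order; a source is a
    physical register name (str) or an immediate (int).
    Returns the list of opcodes issued in each cycle.
    """
    def is_ready(src):
        return isinstance(src, int) or ready[src]

    def value(src):
        return src if isinstance(src, int) else regs[src]

    pending = list(insns)
    schedule = []
    while pending:
        # pending stays in program order, so the first `width` ready
        # instructions are also the oldest ready ones.
        issued = [i for i in pending
                  if is_ready(i[1]) and is_ready(i[2])][:width]
        if not issued:
            raise RuntimeError("no instruction can issue")
        # Compute every result before writing any register, so instructions
        # issued together do not see each other's outputs.
        results = [(dst, OPS[op](value(s1), value(s2)))
                   for op, s1, s2, dst in issued]
        for dst, val in results:
            regs[dst] = val
            ready[dst] = True
        pending = [i for i in pending if i not in issued]
        schedule.append([op for op, *_ in issued])
    return schedule
```

With the register values from the slides (p1=7, p2=3, p4=9, p5=6), xor and sub issue together first, then add and addi, leaving p6=4, p7=13, p8=3, p9=4.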
![Page 64: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/64.jpg)
When Does Register Read Occur?
• Current approach: after select, right before execute
  • Not during in-order part of pipeline, in out-of-order part
  • Read physical register (renamed)
  • Or get value via bypassing (based on physical register name)
  • This is Pentium 4, MIPS R10k, Alpha 21264, IBM Power4, Intel’s “Sandy Bridge” (2011)
  • Physical register file may be large
    • Multi-cycle read
• Older approach:
  • Read as part of “issue” stage, keep values in Issue Queue
  • At commit, write them back to “architectural register file”
  • Pentium Pro, Core 2, Core i7
  • Simpler, but may be less energy efficient (more data movement)
![Page 65: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/65.jpg)
Renaming Revisited
![Page 66: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/66.jpg)
Re-order Buffer (ROB)
• ROB entry holds all info for recovery/commit
  • All instructions & in order
  • Architectural register names, physical register names, insn type
  • Not removed until very last thing (“commit”)
• Operation
  • Dispatch: insert at tail (if full, stall)
  • Commit: remove from head (if not yet done, stall)
• Purpose: tracking for in-order commit
  • Maintain appearance of in-order execution
  • Done to support:
    • Misprediction recovery
    • Freeing of physical registers
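The dispatch/commit behavior above amounts to a bounded FIFO. The sketch below is an assumption-laden illustration: the `ReorderBuffer` class and its entry fields (`insn`, `to_free`, `done`) are hypothetical, and a "stall" is modeled simply by returning `None`.

```python
from collections import deque


class ReorderBuffer:
    """Bounded FIFO: insert at tail on dispatch, remove from head on commit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def dispatch(self, insn, to_free):
        """Insert at tail; return None (stall) if the ROB is full."""
        if len(self.entries) == self.capacity:
            return None
        entry = {"insn": insn, "to_free": to_free, "done": False}
        self.entries.append(entry)
        return entry

    def commit(self):
        """Remove from head; return None (stall) if empty or head not done."""
        if not self.entries or not self.entries[0]["done"]:
            return None
        return self.entries.popleft()
```

Execution marks an entry's `done` flag out of order, but `commit` only ever looks at the head, which is what keeps commit in program order.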
![Page 67: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/67.jpg)
Renaming revisited
• Track (or “log”) the “overwritten register” in ROB
  • Free this register at commit
  • Also used to restore the map table on “recovery”
    • Branch mis-prediction recovery
![Page 68: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/68.jpg)
Register Renaming Algorithm (Full)
• Two key data structures:
  • maptable[architectural_reg] ➜ physical_reg
  • Free list: allocate (new) & free registers (implemented as a queue)
• Algorithm: at “decode” stage for each instruction:
    insn.phys_input1 = maptable[insn.arch_input1]
    insn.phys_input2 = maptable[insn.arch_input2]
    insn.old_phys_output = maptable[insn.arch_output]
    new_reg = new_phys_reg()
    maptable[insn.arch_output] = new_reg
    insn.phys_output = new_reg
• At “commit”
  • Once all older instructions have committed, free register
    free_phys_reg(insn.old_phys_output)
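A direct Python transcription of the renaming algorithm above, offered as a sketch: the function names `rename` and `commit` and the dict-based instruction format are assumptions for this example, and an empty free list raises an exception where real hardware would stall.

```python
from collections import deque


def rename(insn, maptable, free_list):
    """Rename one instruction at decode.

    insn: dict with 'op', 'srcs' (architectural register names, or ints for
    immediates) and 'dst' (architectural register name).
    """
    # Look up inputs first, so an instruction that reads and writes the same
    # architectural register (add r3 + r4 -> r4) reads the old mapping.
    phys_srcs = [maptable[s] if isinstance(s, str) else s
                 for s in insn["srcs"]]
    old_phys = maptable[insn["dst"]]   # the overwritten register, logged
    new_reg = free_list.popleft()      # new_phys_reg()
    maptable[insn["dst"]] = new_reg
    return {"op": insn["op"], "srcs": phys_srcs,
            "dst": new_reg, "old_dst": old_phys}


def commit(renamed, free_list):
    """At commit, return the overwritten register to the free list."""
    free_list.append(renamed["old_dst"])
```

Starting from r1..r5 mapped to p1..p5 and a free list of p6..p10, renaming xor, add, sub, addi in order allocates p6, p7, p8, p9 and logs p3, p4, p6, p1 as the overwritten registers.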
![Page 69: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/69.jpg)
Recovery
• Completely remove wrong path instructions
  • Flush from IQ
  • Remove from ROB
  • Restore map table to before misprediction
  • Free destination registers
• How to restore map table?
  • Option #1: log-based reverse renaming to recover each instruction
    • Tracks the old mapping to allow it to be reversed
    • Done sequentially for each instruction (slow)
    • See next slides
  • Option #2: checkpoint-based recovery
    • Checkpoint state of maptable and free list each cycle
    • Faster recovery, but requires more state
  • Option #3: hybrid (checkpoint for branches, unwind for others)
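Option #2 can be illustrated with a very small sketch. The helper names `take_checkpoint` and `restore_checkpoint` are hypothetical, and copying whole Python containers stands in for whatever snapshot mechanism the hardware actually uses.

```python
def take_checkpoint(maptable, free_list):
    """Snapshot the rename state (copies, so later renames do not alter it)."""
    return dict(maptable), list(free_list)


def restore_checkpoint(checkpoint):
    """Return fresh copies of the saved map table and free list."""
    saved_map, saved_free = checkpoint
    return dict(saved_map), list(saved_free)
```

Recovery is a single restore no matter how many wrong-path instructions were renamed, which is the "faster, but more state" trade-off listed above.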
![Page 70: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/70.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Map table: r1 p1, r2 p2, r3 p3, r4 p4, r5 p5
Free-list: p6, p7, p8, p9, p10
![Page 71: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/71.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ _
Overwritten regs: [ p3 ]
Map table: r1 p1, r2 p2, r3 p3, r4 p4, r5 p5
Free-list: p6, p7, p8, p9, p10
![Page 72: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/72.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6
Overwritten regs: [ p3 ]
Map table: r1 p1, r2 p2, r3 p6, r4 p4, r5 p5
Free-list: p7, p8, p9, p10
![Page 73: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/73.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ _
Overwritten regs: [ p3 ][ p4 ]
Map table: r1 p1, r2 p2, r3 p6, r4 p4, r5 p5
Free-list: p7, p8, p9, p10
![Page 74: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/74.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7
Overwritten regs: [ p3 ][ p4 ]
Map table: r1 p1, r2 p2, r3 p6, r4 p7, r5 p5
Free-list: p8, p9, p10
![Page 75: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/75.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ _
Overwritten regs: [ p3 ][ p4 ][ p6 ]
Map table: r1 p1, r2 p2, r3 p6, r4 p7, r5 p5
Free-list: p8, p9, p10
![Page 76: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/76.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8
Overwritten regs: [ p3 ][ p4 ][ p6 ]
Map table: r1 p1, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p9, p10
![Page 77: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/77.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ _
Overwritten regs: [ p3 ][ p4 ][ p6 ][ p1 ]
Map table: r1 p1, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p9, p10
![Page 78: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/78.jpg)
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p3 ][ p4 ][ p6 ][ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10
![Page 79: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/79.jpg)
Recovery Example
Original insns: bnz r1, loop; xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: bnz p1, loop; xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ ][ p3 ][ p4 ][ p6 ][ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10
Now, let’s use this info to recover from a branch misprediction
![Page 80: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/80.jpg)
Recovery Example
Original insns: bnz r1, loop; xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: bnz p1, loop; xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ ][ p3 ][ p4 ][ p6 ][ p1 ]
Map table: r1 p1, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10, p9
![Page 81: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/81.jpg)
Recovery Example
Original insns: bnz r1, loop; xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3
Renamed insns: bnz p1, loop; xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8
Overwritten regs: [ ][ p3 ][ p4 ][ p6 ]
Map table: r1 p1, r2 p2, r3 p6, r4 p7, r5 p5
Free-list: p10, p9, p8
![Page 82: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/82.jpg)
Recovery Example
Original insns: bnz r1, loop; xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4
Renamed insns: bnz p1, loop; xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7
Overwritten regs: [ ][ p3 ][ p4 ]
Map table: r1 p1, r2 p2, r3 p6, r4 p4, r5 p5
Free-list: p10, p9, p8, p7
![Page 83: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/83.jpg)
Recovery Example
Original insns: bnz r1, loop; xor r1 ^ r2 ➜ r3
Renamed insns: bnz p1, loop; xor p1 ^ p2 ➜ p6
Overwritten regs: [ ][ p3 ]
Map table: r1 p1, r2 p2, r3 p3, r4 p4, r5 p5
Free-list: p10, p9, p8, p7, p6
![Page 84: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/84.jpg)
Recovery Example
Original insns: bnz r1, loop
Renamed insns: bnz p1, loop
Overwritten regs: [ ]
Map table: r1 p1, r2 p2, r3 p3, r4 p4, r5 p5
Free-list: p10, p9, p8, p7, p6
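The recovery example above is the log-based option: walk the ROB from youngest to oldest, undoing one rename per step. A minimal sketch follows, assuming a hypothetical `unwind` helper and a log of `(arch_dst, new_phys, old_phys)` tuples; a real ROB holds more per entry.

```python
def unwind(rob_log, maptable, free_list):
    """Undo wrong-path renames, youngest first.

    rob_log: list of (arch_dst, new_phys, old_phys) in program order,
    containing only the instructions after the mispredicted branch.
    """
    while rob_log:
        arch_dst, new_phys, old_phys = rob_log.pop()  # youngest entry
        maptable[arch_dst] = old_phys   # restore the overwritten mapping
        free_list.append(new_phys)      # free the destination register
```

Applied to the four wrong-path instructions, this restores r1..r5 to p1..p5 and leaves the free list as p10, p9, p8, p7, p6, matching the final slide of the example.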
![Page 85: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/85.jpg)
Commit
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p3 ][ p4 ][ p6 ][ p1 ]
• Commit: instruction becomes architected state
• In-order, only when instructions are finished
• Free overwritten register (why?)
![Page 86: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/86.jpg)
Freeing over-written register
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p3 ][ p4 ][ p6 ][ p1 ]
• p3 was r3 before xor
• p6 is r3 after xor
  • Anything older than xor should read p3
  • Anything younger than xor should read p6 (until another insn writes r3)
• At commit of xor, no older instructions exist
  • So nothing can still read p3, and it is safe to free
![Page 87: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/87.jpg)
Commit Example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p3 ][ p4 ][ p6 ][ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10
![Page 88: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/88.jpg)
Commit Example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p3 ][ p4 ][ p6 ][ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10, p3
![Page 89: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/89.jpg)
Commit Example
Original insns: add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p4 ][ p6 ][ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10, p3, p4
![Page 90: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/90.jpg)
Commit Example
Original insns: sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Overwritten regs: [ p6 ][ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10, p3, p4, p6
![Page 91: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/91.jpg)
Commit Example
Original insns: addi r3 + 1 ➜ r1
Renamed insns: addi p8 + 1 ➜ p9
Overwritten regs: [ p1 ]
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10, p3, p4, p6, p1
![Page 92: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/92.jpg)
Commit Example
Map table: r1 p9, r2 p2, r3 p8, r4 p7, r5 p5
Free-list: p10, p3, p4, p6, p1
![Page 93: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/93.jpg)
Textbook OoO Terminology
![Page 94: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/94.jpg)
Textbook : Lecture “Map Table”
Textbook | Lecture
instruction buffer | -
decode buffer | -
register map | map table
reservation station | issue queue entry
ROB | ROB
logical register file | register file
physical register file | register file
![Page 95: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/95.jpg)
Lecture version of Textbook Table 3.4
Step/Stage | resources read | resources written/utilized
Fetch | PC, branch predictor, I$ | PC
Decode-rename | map table | map table, ROB
Dispatch | ready table | ROB, issue queue
Issue | issue queue, regfile | functional units
Execute | D$ | functional units, issue queue, ROB, D$, branch predictor, regfile
Commit | ROB | ROB, map table, D$
![Page 96: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/96.jpg)
Dynamic Scheduling Example
![Page 97: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/97.jpg)
Dynamic Scheduling Example
• The following slides are a detailed but concrete example
• Yet, it contains enough detail to be overwhelming
  • Try not to worry about the details
• Focus on the big picture take-away:
  Hardware can reorder instructions to extract instruction-level parallelism
![Page 98: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/98.jpg)
Recall: Motivating Example
• How would this execution occur cycle-by-cycle?
• Execution latencies assumed in this example:
  • Loads have two-cycle load-to-use penalty
    • Three cycle total execution latency
  • All other instructions have single-cycle execution latency
• “Issue queue”: hold all waiting (un-executed) instructions
  • Holds ready/not-ready status
  • Faster than looking up in ready table each cycle
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [p1] ➜ p2: F Di I RR X M1 M2 W C
add p2 + p3 ➜ p4: F Di I RR X W C
xor p4 ^ p5 ➜ p6: F Di I RR X W C
ld [p7] ➜ p8: F Di I RR X M1 M2 W C
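The latency assumptions above determine when each instruction can issue. The sketch below is a simplified model, not the pipeline itself: `issue_cycles`, `LATENCY`, and `EXAMPLE` are hypothetical names, issue width and structural hazards are ignored, an instruction is assumed to issue no earlier than the cycle after dispatch, and the dispatch cycles are taken from the cycle-by-cycle walkthrough that follows.

```python
def issue_cycles(insns, latency):
    """Earliest issue cycle for each instruction, in program order.

    insns: list of dicts with 'op', 'dispatch' (cycle) and 'producers'
    (indices of earlier instructions whose results are consumed).
    latency: cycles between a producer's issue and its consumer's issue.
    """
    issue = []
    for insn in insns:
        earliest = insn["dispatch"] + 1
        for p in insn["producers"]:
            earliest = max(earliest, issue[p] + latency[insns[p]["op"]])
        issue.append(earliest)
    return issue


# Loads: three-cycle total latency; everything else: single cycle.
LATENCY = {"ld": 3, "add": 1, "xor": 1}
EXAMPLE = [
    {"op": "ld", "dispatch": 1, "producers": []},    # ld [r1] -> r2
    {"op": "add", "dispatch": 1, "producers": [0]},  # add r2 + r3 -> r4
    {"op": "xor", "dispatch": 2, "producers": [1]},  # xor r4 ^ r5 -> r6
    {"op": "ld", "dispatch": 2, "producers": []},    # ld [r7] -> r4
]
```

Under this model the first load issues in cycle 2, the independent second load in cycle 3, and the dependent add waits until cycle 5, so the younger load runs ahead of older instructions.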
![Page 99: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/99.jpg)
Out-of-Order Pipeline – Cycle 0
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F
add r2 + r3 ➜ r4: F
xor r4 ^ r5 ➜ r6:
ld [r7] ➜ r4:
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
(empty)
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 ---, p10 ---, p11 ---, p12 ---
Map Table
r1 p8, r2 p7, r3 p6, r4 p5, r5 p4, r6 p3, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld --- no
add --- no
![Page 100: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/100.jpg)
Out-of-Order Pipeline – Cycle 1a
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di
add r2 + r3 ➜ r4: F
xor r4 ^ r5 ➜ r6:
ld [r7] ➜ r4:
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 ---, p11 ---, p12 ---
Map Table
r1 p8, r2 p9, r3 p6, r4 p5, r5 p4, r6 p3, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add --- no
![Page 101: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/101.jpg)
Out-of-Order Pipeline – Cycle 1b
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di
add r2 + r3 ➜ r4: F Di
xor r4 ^ r5 ➜ r6:
ld [r7] ➜ r4:
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
add p9 no p6 yes p10 1
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 no, p11 ---, p12 ---
Map Table
r1 p8, r2 p9, r3 p6, r4 p10, r5 p4, r6 p3, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add p5 no
![Page 102: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/102.jpg)
Out-of-Order Pipeline – Cycle 1c
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di
add r2 + r3 ➜ r4: F Di
xor r4 ^ r5 ➜ r6: F
ld [r7] ➜ r4: F
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
add p9 no p6 yes p10 1
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 no, p11 ---, p12 ---
Map Table
r1 p8, r2 p9, r3 p6, r4 p10, r5 p4, r6 p3, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add p5 no
xor --- no
ld --- no
![Page 103: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/103.jpg)
Out-of-Order Pipeline – Cycle 2a
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di I
add r2 + r3 ➜ r4: F Di
xor r4 ^ r5 ➜ r6: F
ld [r7] ➜ r4: F
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
add p9 no p6 yes p10 1
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 no, p11 ---, p12 ---
Map Table
r1 p8, r2 p9, r3 p6, r4 p10, r5 p4, r6 p3, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add p5 no
xor --- no
ld --- no
![Page 104: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/104.jpg)
Out-of-Order Pipeline – Cycle 2b
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di I
add r2 + r3 ➜ r4: F Di
xor r4 ^ r5 ➜ r6: F Di
ld [r7] ➜ r4: F
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
add p9 no p6 yes p10 1
xor p10 no p4 yes p11 2
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 no, p11 no, p12 ---
Map Table
r1 p8, r2 p9, r3 p6, r4 p10, r5 p4, r6 p11, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add p5 no
xor p3 no
ld --- no
![Page 105: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/105.jpg)
Out-of-Order Pipeline – Cycle 2c
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di I
add r2 + r3 ➜ r4: F Di
xor r4 ^ r5 ➜ r6: F Di
ld [r7] ➜ r4: F Di
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
add p9 no p6 yes p10 1
xor p10 no p4 yes p11 2
ld p2 yes --- yes p12 3
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 no, p11 no, p12 no
Map Table
r1 p8, r2 p9, r3 p6, r4 p12, r5 p4, r6 p11, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add p5 no
xor p3 no
ld p10 no
![Page 106: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/106.jpg)
Out-of-Order Pipeline – Cycle 3
Cycle: 0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2: F Di I RR
add r2 + r3 ➜ r4: F Di
xor r4 ^ r5 ➜ r6: F Di
ld [r7] ➜ r4: F Di I
Issue Queue
Insn Src1 R? Src2 R? Dest Bday
ld p8 yes --- yes p9 0
add p9 no p6 yes p10 1
xor p10 no p4 yes p11 2
ld p2 yes --- yes p12 3
Ready Table
p1 yes, p2 yes, p3 yes, p4 yes, p5 yes, p6 yes, p7 yes, p8 yes, p9 no, p10 no, p11 no, p12 no
Map Table
r1 p8, r2 p9, r3 p6, r4 p12, r5 p4, r6 p11, r7 p2, r8 p1
Reorder Buffer
Insn To-Free Done?
ld p7 no
add p5 no
xor p3 no
ld p10 no
![Page 107: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/107.jpg)
Out-of-Order Pipeline – Cycle 4
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X
add r2 + r3 ➜ r4    F Di
xor r4 ^ r5 ➜ r6    F Di
ld [r7] ➜ r4        F Di I RR
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   no   p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 yes  p8 yes  p9 yes
p10 no  p11 no  p12 no
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       no
add   p5       no
xor   p3       no
ld    p10      no
![Page 108: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/108.jpg)
Out-of-Order Pipeline – Cycle 5a
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1
add r2 + r3 ➜ r4    F Di I
xor r4 ^ r5 ➜ r6    F Di
ld [r7] ➜ r4        F Di I RR X
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 yes  p8 yes  p9 yes
p10 yes  p11 no  p12 no
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       no
add   p5       no
xor   p3       no
ld    p10      no
![Page 109: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/109.jpg)
Out-of-Order Pipeline – Cycle 5b
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1
add r2 + r3 ➜ r4    F Di I
xor r4 ^ r5 ➜ r6    F Di
ld [r7] ➜ r4        F Di I RR X
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 yes  p8 yes  p9 yes
p10 yes  p11 no  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       no
add   p5       no
xor   p3       no
ld    p10      no
![Page 110: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/110.jpg)
Out-of-Order Pipeline – Cycle 6
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2
add r2 + r3 ➜ r4    F Di I RR
xor r4 ^ r5 ➜ r6    F Di I
ld [r7] ➜ r4        F Di I RR X M1
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 yes  p8 yes  p9 yes
p10 yes  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       no
add   p5       no
xor   p3       no
ld    p10      no
![Page 111: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/111.jpg)
Out-of-Order Pipeline – Cycle 7
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W
add r2 + r3 ➜ r4    F Di I RR X
xor r4 ^ r5 ➜ r6    F Di I RR
ld [r7] ➜ r4        F Di I RR X M1 M2
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 yes  p8 yes  p9 yes
p10 yes  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       no
xor   p3       no
ld    p10      no
![Page 112: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/112.jpg)
Out-of-Order Pipeline – Cycle 8a
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W C
add r2 + r3 ➜ r4    F Di I RR X
xor r4 ^ r5 ➜ r6    F Di I RR
ld [r7] ➜ r4        F Di I RR X M1 M2
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 ---  p8 yes  p9 yes
p10 yes  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       no
xor   p3       no
ld    p10      no
![Page 113: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/113.jpg)
Out-of-Order Pipeline – Cycle 8b
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W C
add r2 + r3 ➜ r4    F Di I RR X W
xor r4 ^ r5 ➜ r6    F Di I RR X
ld [r7] ➜ r4        F Di I RR X M1 M2 W
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 yes  p6 yes  p7 ---  p8 yes  p9 yes
p10 yes  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       yes
xor   p3       no
ld    p10      yes
![Page 114: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/114.jpg)
Out-of-Order Pipeline – Cycle 9a
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W C
add r2 + r3 ➜ r4    F Di I RR X W C
xor r4 ^ r5 ➜ r6    F Di I RR X
ld [r7] ➜ r4        F Di I RR X M1 M2 W
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 ---  p6 yes  p7 ---  p8 yes  p9 yes
p10 yes  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       yes
xor   p3       no
ld    p10      yes
![Page 115: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/115.jpg)
Out-of-Order Pipeline – Cycle 9b
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W C
add r2 + r3 ➜ r4    F Di I RR X W C
xor r4 ^ r5 ➜ r6    F Di I RR X W
ld [r7] ➜ r4        F Di I RR X M1 M2 W
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 yes  p4 yes  p5 ---  p6 yes  p7 ---  p8 yes  p9 yes
p10 yes  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       yes
xor   p3       yes
ld    p10      yes
![Page 116: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/116.jpg)
Out-of-Order Pipeline – Cycle 10
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W C
add r2 + r3 ➜ r4    F Di I RR X W C
xor r4 ^ r5 ➜ r6    F Di I RR X W C
ld [r7] ➜ r4        F Di I RR X M1 M2 W C
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 ---  p4 yes  p5 ---  p6 yes  p7 ---  p8 yes  p9 yes
p10 ---  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       yes
xor   p3       yes
ld    p10      yes
![Page 117: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/117.jpg)
Out-of-Order Pipeline – Done!
0 1 2 3 4 5 6 7 8 9 10 11 12
ld [r1] ➜ r2        F Di I RR X M1 M2 W C
add r2 + r3 ➜ r4    F Di I RR X W C
xor r4 ^ r5 ➜ r6    F Di I RR X W C
ld [r7] ➜ r4        F Di I RR X M1 M2 W C
Issue Queue
Insn  Src1  R?   Src2  R?   Dest  Bdy
ld    p8    yes  ---   yes  p9    0
add   p9    yes  p6    yes  p10   1
xor   p10   yes  p4    yes  p11   2
ld    p2    yes  ---   yes  p12   3
Ready Table
p1 yes  p2 yes  p3 ---  p4 yes  p5 ---  p6 yes  p7 ---  p8 yes  p9 yes
p10 ---  p11 yes  p12 yes
Map Table
r1 p8
r2 p9
r3 p6
r4 p12
r5 p4
r6 p11
r7 p2
r8 p1
Reorder Buffer
Insn  To Free  Done?
ld    p7       yes
add   p5       yes
xor   p3       yes
ld    p10      yes
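The renaming step behind this walkthrough can be sketched in a few lines of Python. This is only an illustration: the function name `rename`, the tuple encoding of instructions, and the initial map table and free list are assumptions (the initial mappings are inferred from the "To Free" column, not stated on the slides).

```python
def rename(insns, map_table, free_list):
    """Rename (dest, srcs) instructions in program order.

    Returns one (renamed_srcs, new_dest, to_free) tuple per instruction.
    """
    out = []
    for dest, srcs in insns:
        phys_srcs = [map_table[s] for s in srcs]  # read source mappings first
        to_free = map_table[dest]                 # old mapping, freed at commit
        new_dest = free_list.pop(0)               # allocate a fresh physical reg
        map_table[dest] = new_dest                # later readers see the new one
        out.append((phys_srcs, new_dest, to_free))
    return out

# Assumed initial state (inferred from the "To Free" entries).
map_table = {"r1": "p8", "r2": "p7", "r3": "p6", "r4": "p5",
             "r5": "p4", "r6": "p3", "r7": "p2", "r8": "p1"}
free_list = ["p9", "p10", "p11", "p12"]

program = [("r2", ["r1"]),        # ld  [r1] -> r2
           ("r4", ["r2", "r3"]),  # add r2 + r3 -> r4
           ("r6", ["r4", "r5"]),  # xor r4 ^ r5 -> r6
           ("r4", ["r7"])]        # ld  [r7] -> r4

renamed = rename(program, map_table, free_list)
for srcs, dest, to_free in renamed:
    print(srcs, "->", dest, "(frees", to_free, "at commit)")
```

Under these assumptions the second load frees p10 (the add's destination) rather than p5, which is why r4 ends up mapped to p12 in the final Map Table.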
![Page 118: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/118.jpg)
Handling Memory Operations
![Page 119: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/119.jpg)
Recall: Types of Dependencies
• RAW (Read After Write) = “true dependence”
    mul r0 * r1 ➜ r2
    …
    add r2 + r3 ➜ r4
• WAW (Write After Write) = “output dependence”
    mul r0 * r1 ➜ r2
    …
    add r1 + r3 ➜ r2
• WAR (Write After Read) = “anti-dependence”
    mul r0 * r1 ➜ r2
    …
    add r3 + r4 ➜ r1
• WAW & WAR are “false”
  • Can be totally eliminated by “renaming”
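As a small illustration of the three definitions, the following Python sketch classifies the register dependences between an earlier and a later instruction. The `(dest, srcs)` encoding and the name `classify` are assumptions made for this example only.

```python
def classify(first, second):
    """Return the register dependences of `second` on the earlier `first`.

    Each instruction is a (dest, srcs) pair of register names.
    """
    d1, s1 = first
    d2, s2 = second
    deps = set()
    if d1 in s2:
        deps.add("RAW")  # second reads what first wrote
    if d1 == d2:
        deps.add("WAW")  # both write the same register
    if d2 in s1:
        deps.add("WAR")  # second overwrites what first read
    return deps

mul = ("r2", ["r0", "r1"])                  # mul r0 * r1 -> r2
print(classify(mul, ("r4", ["r2", "r3"])))  # add r2 + r3 -> r4
print(classify(mul, ("r2", ["r1", "r3"])))  # add r1 + r3 -> r2
print(classify(mul, ("r1", ["r3", "r4"])))  # add r3 + r4 -> r1
```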
![Page 120: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/120.jpg)
Also Have Dependencies via Memory
• If the values in “r2” and “r3” are the same…
• RAW (Read After Write) – True dependency
    st r1 ➜ [r2]
    …
    ld [r3] ➜ r4
• WAW (Write After Write)
    st r1 ➜ [r2]
    …
    st r4 ➜ [r3]
• WAR (Write After Read)
    ld [r2] ➜ r1
    …
    st r4 ➜ [r3]
• WAR/WAW are “false dependencies”
  - But can’t rename memory in same way as registers
  - Why? Addresses are not known at rename
  - Need to use other tricks
![Page 121: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/121.jpg)
Let’s Start with Just Stores
• Stores: Write data cache, not registers
  • Can we rename memory?
  • No (at least not easily)
  • Cache writes unrecoverable
• Solution: write stores into cache only when certain
  • When are we certain? At “commit”
![Page 122: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/122.jpg)
Handling Stores
• Can “st p4 ➜ [p6+8]” issue in cycle 3?
  • Its register inputs are ready
  • Why or why not?
0 1 2 3 4 5 6 7 8 9 10 11 12
mul p1 * p2 ➜ p3     F Di I RR X1 X2 X3 X4 W C
jump-not-zero p3     F Di I RR X W C
st p5 ➜ [p3+4]       F Di I RR X M W C
st p4 ➜ [p6+8]       F Di I?
![Page 123: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/123.jpg)
Problem #1: Out-of-Order Stores
• Can “st p4 ➜ [p6+8]” write the cache in cycle 6?
  • “st p5 ➜ [p3+4]” has not yet executed
• What if p3+4 == p6+8?
  • The two stores write the same address! WAW dependency!
  • Not known until their “X” stages (cycle 5 & 8)
• Unappealing solution: all stores execute in-order
• We can do better…
0 1 2 3 4 5 6 7 8 9 10 11 12
mul p1 * p2 ➜ p3     F Di I RR X1 X2 X3 X4 W C
jump-not-zero p3     F Di I RR X W C
st p5 ➜ [p3+4]       F Di I RR X M W C
st p4 ➜ [p6+8]       F Di I? RR X M W C
![Page 124: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/124.jpg)
Problem #2: Speculative Stores
• Can “st p4 ➜ [p6+8]” write the cache in cycle 6?
  • Store is still “speculative” at this point
• What if “jump-not-zero” is mis-predicted?
  • Not known until its “X” stage (cycle 8)
• How does it “undo” the store once it hits the cache?
  • Answer: it can’t; stores write the cache only at commit
  • Guaranteed to be non-speculative at that point
0 1 2 3 4 5 6 7 8 9 10 11 12
mul p1 * p2 ➜ p3     F Di I RR X1 X2 X3 X4 W C
jump-not-zero p3     F Di I RR X W C
st p5 ➜ [p3+4]       F Di I RR X M W C
st p4 ➜ [p6+8]       F Di I? RR X M W C
![Page 125: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/125.jpg)
Store Queue (SQ)
• Solves two problems
  • Allows for recovery of speculative stores
  • Allows out-of-order stores
• Store Queue (SQ)
  • At dispatch, each store is given a slot in the Store Queue
  • First-in-first-out (FIFO) queue
  • Each entry contains: “address”, “value”, and “bday”
• Operation:
  • Dispatch (in-order): allocate entry in SQ (stall if full)
  • Execute (out-of-order): write store value into store queue
  • Commit (in-order): read value from SQ and write into data cache
  • Branch recovery: remove entries from the store queue
• Also solves problems with loads
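The four store queue operations listed above can be sketched as a small Python class. This is a hedged software model, not a description of real hardware: the class and method names, the dictionary-based cache, and the use of an integer `bday` as the age tag are all assumptions for illustration.

```python
from collections import deque

class StoreQueue:
    """Toy FIFO store queue; the oldest store sits at the left end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def dispatch(self, bday):
        """In-order: allocate an entry. Returns False (stall) if the SQ is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append({"bday": bday, "addr": None, "value": None})
        return True

    def execute(self, bday, addr, value):
        """Out-of-order: record the address and value of an already-dispatched store."""
        for entry in self.entries:
            if entry["bday"] == bday:
                entry["addr"], entry["value"] = addr, value
                return
        raise KeyError(bday)

    def commit(self, cache):
        """In-order: the oldest store leaves the SQ and finally writes the cache."""
        entry = self.entries.popleft()
        assert entry["addr"] is not None, "store must execute before it commits"
        cache[entry["addr"]] = entry["value"]

    def squash(self, bday):
        """Branch recovery: discard every store at least as young as `bday`."""
        while self.entries and self.entries[-1]["bday"] >= bday:
            self.entries.pop()

cache = {100: 13, 200: 17}
sq = StoreQueue(capacity=2)
sq.dispatch(1)
sq.dispatch(2)
sq.execute(2, 200, 9)   # the younger store executes first
sq.execute(1, 100, 5)
sq.commit(cache)        # only now does address 100 change in the cache
sq.squash(2)            # store 2 was speculative: it never touches the cache
print(cache)
```

Because the cache is written only in `commit`, squashing a speculative store is just a matter of dropping its queue entry.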
![Page 126: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/126.jpg)
Memory Forwarding
• Can “ld [p7] ➜ p8” issue and begin execution?
  • Why or why not?
0 1 2 3 4 5 6 7 8 9 10 11 12
fdiv p1 / p2 ➜ p9    F Di I RR X1 X2 X3 X4 X5 X6 W C
st p4 ➜ [p5+4]       F Di I RR X W C
st p3 ➜ [p6+8]       F Di I RR X W C
ld [p7] ➜ p8         F Di I? RR X M1 M2 W C
![Page 127: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/127.jpg)
Memory Forwarding
• Can “ld [p7] ➜ p8” issue and begin execution?
  • Why or why not?
• If the load reads from either of the stores’ addresses…
  • Load must get correct value, but stores don’t write cache until commit…
• Solution: “memory forwarding”
  • Loads also search the Store Queue (in parallel with cache access)
  • Conceptually like register bypassing, but different implementation
  • Why? Addresses unknown until execute
0 1 2 3 4 5 6 7 8 9 10 11 12
fdiv p1 / p2 ➜ p9    F Di I RR X1 X2 X3 X4 X5 X6 W C
st p4 ➜ [p5+4]       F Di I RR X SQ C
st p3 ➜ [p6+8]       F Di I RR X SQ C
ld [p7] ➜ p8         F Di I? RR X M1 M2 W C
![Page 128: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/128.jpg)
Problem #3: WAR Hazards
• What if “p3+4 == p6 + 8”?
  • WAR: need to make sure that load doesn’t read store’s result
  • Need to get values based on “program order” not “execution order”
• Bad solution: require all stores/loads to execute in-order
• Good solution: add “age” fields to store queue (SQ)
  • Loads read from youngest older matching store
  • Another reason the SQ is a FIFO queue
0 1 2 3 4 5 6 7 8 9 10 11 12
mul p1 * p2 ➜ p3     F Di I RR X1 X2 X3 X4 W C
jump-not-zero p3     F Di I RR X W C
ld [p3+4] ➜ p5       F Di I RR X M1 M2 W C
st p4 ➜ [p6+8]       F Di I RR X SQ C
![Page 129: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/129.jpg)
Memory Forwarding via Store Queue
• Store Queue (SQ)
  • Holds all in-flight stores
  • CAM: searchable by address
  • Age logic: determine youngest matching store older than load
• Store rename/dispatch
  • Allocate entry in SQ
• Store execution
  • Update SQ
    • Address + Data
• Load execution
  • Search SQ to identify youngest older matching store
  • Match? Read SQ
  • No Match? Read cache
[Figure: Store Queue (SQ) with address, value, and age fields and head/tail pointers, compared against the load’s address at the load position; data cache with address, data in, and data out]
![Page 130: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/130.jpg)
Store Queue (SQ)
• On load execution, select the store that is:
  • To same address as load
  • Older than the load (before the load in program order)
• Of these, select the youngest store
  • The store to the same address that immediately precedes the load
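The selection rule above (same address, older than the load, youngest of those) can be written as a short Python function. It is a sketch only: the name `load_value`, the tuple layout `(bday, addr, value)`, and the dictionary cache are assumptions, and real hardware does this search with a CAM plus age logic rather than a loop.

```python
def load_value(load_bday, addr, store_queue, cache):
    """Return the value a load should observe.

    store_queue holds (bday, addr, value) tuples for executed in-flight stores.
    """
    older_matches = [(bday, value) for (bday, a, value) in store_queue
                     if a == addr and bday < load_bday]
    if older_matches:
        # Youngest older store = largest bday among the older matches.
        return max(older_matches, key=lambda entry: entry[0])[1]
    return cache[addr]  # no match: read the data cache

cache = {100: 13, 200: 17}
# Both stores write address 100; the load (bday 3) must see store 2's value.
print(load_value(3, 100, [(1, 100, 5), (2, 100, 9)], cache))
```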
![Page 131: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/131.jpg)
When Can Loads Execute?
• Can “ld [p6+8] ➜ p7” issue in cycle 3?
  • Why or why not?
0 1 2 3 4 5 6 7 8 9 10 11 12
mul p1 * p2 ➜ p3     F Di I RR X1 X2 X3 X4 W C
jump-not-zero p3     F Di I RR X W C
st p5 ➜ [p3+4]       F Di I RR X SQ C
ld [p6+8] ➜ p7       F Di I? RR X M1 M2 W C
![Page 132: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/132.jpg)
When Can Loads Execute?
• Aliasing! Does p3+4 == p6+8?
  • If no, load should get value from memory
    • Can it start to execute?
  • If yes, load should get value from store
    • By reading the store queue?
    • But the value isn’t put into the store queue until cycle 9
• Key challenge: don’t know addresses until execution!
  • One solution: require all loads to wait for all earlier (prior) stores
0 1 2 3 4 5 6 7 8 9 10 11 12
mul p1 * p2 ➜ p3     F Di I RR X1 X2 X3 X4 W C
jump-not-zero p3     F Di I RR X W C
st p5 ➜ [p3+4]       F Di I RR X SQ C
ld [p6+8] ➜ p7       F Di I? RR X M1 M2 W C
![Page 133: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/133.jpg)
Conservative Load Scheduling
• Conservative load scheduling:
  • A load may issue only once all older stores have executed
  • Some architectures: split store address / store data
    • Only requires knowing addresses (not the store values)
  • Advantage: always safe
  • Disadvantage: performance (limits ILP)
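The conservative rule amounts to a one-line check at issue time. The sketch below assumes a hypothetical list of `(bday, executed)` pairs for in-flight stores; the name `load_may_issue` is illustrative.

```python
def load_may_issue(load_bday, stores):
    """Conservative rule: every older in-flight store must already have executed.

    stores is a list of (bday, executed) pairs.
    """
    return all(executed for bday, executed in stores if bday < load_bday)

# A load with bday 4 behind one unexecuted older store must wait.
print(load_may_issue(4, [(1, True), (3, False)]))
# Younger stores do not hold the load back.
print(load_may_issue(4, [(1, True), (3, True), (7, False)]))
```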
![Page 134: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/134.jpg)
Conservative Load Scheduling
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
ld [p1] ➜ p4         F Di I Rr X M1 M2 W C
ld [p2] ➜ p5         F Di I Rr X M1 M2 W C
add p4, p5 ➜ p6      F Di I Rr X W C
st p6 ➜ [p3]         F Di I Rr X SQ C
ld [p1+4] ➜ p7       F Di I Rr X M1 M2 W C
ld [p2+4] ➜ p8       F Di I Rr X M1 M2 W C
add p7, p8 ➜ p9      F Di I Rr X W C
st p9 ➜ [p3+4]       F Di I Rr X SQ C
• Conservative load scheduling: can’t issue ld [p1+4] until cycle 7!
• Might as well be an in-order machine on this example
• Can we do better? How?
![Page 135: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/135.jpg)
Optimistic Load Scheduling
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
ld [p1] ➜ p4         F Di I Rr X M1 M2 W C
ld [p2] ➜ p5         F Di I Rr X M1 M2 W C
add p4, p5 ➜ p6      F Di I Rr X W C
st p6 ➜ [p3]         F Di I Rr X SQ C
ld [p1+4] ➜ p7       F Di I Rr X M1 M2 W C
ld [p2+4] ➜ p8       F Di I Rr X M1 M2 W C
add p7, p8 ➜ p9      F Di I Rr X W C
st p9 ➜ [p3+4]       F Di I Rr X SQ C
• Optimistic load scheduling: can actually benefit from out-of-order!
• But how do we know when our speculation (optimism) fails?
![Page 136: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/136.jpg)
Load Speculation
• Speculation requires two things…
  • 1. Detection of mis-speculations
    • How can we do this?
  • 2. Recovery from mis-speculations
    • Squash offending load and all newer insns
    • Same method as branch mis-prediction recovery
![Page 137: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/137.jpg)
Load Queue
• Detects load ordering violations
• Load execution: Write address into LQ
  • Also note any store forwarded from
• Store execution: Search LQ
  • Younger load with same addr?
  • Did younger load forward from younger store? [See slide 151, “Bad/Good Interleaving”]
[Figure: load queue (LQ) with address and age fields and head/tail pointers, searched at the store position to produce a “flush?” signal; shown alongside the SQ and the data cache]
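The store-side check can be sketched as follows. This is an illustrative model under stated assumptions: each load queue entry is a dictionary with hypothetical keys `bday`, `addr`, and `forwarded_from` (the bday of the store the load forwarded from, or `None` if it read the cache), and the function name `store_executes` is made up for this example.

```python
def store_executes(store_bday, addr, load_queue):
    """Search the LQ when a store executes.

    Returns the bday of the oldest load that must be squashed, or None.
    """
    violators = []
    for load in load_queue:
        if load["bday"] < store_bday or load["addr"] != addr:
            continue  # older load, or a different address: no conflict
        src = load["forwarded_from"]
        if src is not None and src > store_bday:
            continue  # load already got its value from a younger store
        violators.append(load["bday"])
    return min(violators) if violators else None

# Load 3 read the cache before store 2 (same address) executed: violation.
print(store_executes(2, 100, [{"bday": 3, "addr": 100, "forwarded_from": None}]))
# Load 3 forwarded from store 2, then the older store 1 executes: no squash needed.
print(store_executes(1, 100, [{"bday": 3, "addr": 100, "forwarded_from": 2}]))
```

On a violation, recovery squashes the returned load and everything younger, as with a branch misprediction.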
![Page 138: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/138.jpg)
Store Queue + Load Queue
• Store Queue: handles forwarding
  • Entry per store (allocated @ dispatch, deallocated @ commit)
  • Written by stores (@ execute)
  • Searched by loads (@ execute)
  • Read from SQ to write data cache (@ commit)
• Load Queue: detects ordering violations
  • Entry per load (allocated @ dispatch, deallocated @ commit)
  • Written by loads (@ execute)
  • Searched by stores (@ execute)
• Both together
  • Allows aggressive load scheduling
  • Stores don’t constrain load execution
![Page 139: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/139.jpg)
Optimistic Load Scheduling Problem
• Allows loads to issue before older stores
  • Increases ILP
  + Good: When no conflict, increases performance
  - Bad: Conflict => squash => worse performance than waiting
• Can we have our cake AND eat it too?
![Page 140: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/140.jpg)
Predictive Load Scheduling
• Predict which loads must wait for stores
• Fool me once, shame on you; fool me twice?
  • Loads default to aggressive
  • Keep table of load PCs that have caused squashes
    • Schedule these conservatively
  + Simple predictor
  - Makes “bad” loads wait for all older stores
• More complex predictors used in practice
  • Predict which stores loads should wait for
  • “Store Sets” paper
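The simple predictor can be modeled as a set of load PCs. This is a minimal sketch: the class name `LoadWaitPredictor` and its methods are hypothetical, and an unbounded Python set stands in for what would be a finite hardware table.

```python
class LoadWaitPredictor:
    """Loads issue aggressively until their PC has caused a squash once."""

    def __init__(self):
        self.bad_load_pcs = set()  # assumption: unbounded, unlike a real table

    def should_wait(self, pc):
        """True if this load should be scheduled conservatively."""
        return pc in self.bad_load_pcs

    def record_squash(self, pc):
        """Called when a load at `pc` caused an ordering-violation squash."""
        self.bad_load_pcs.add(pc)

predictor = LoadWaitPredictor()
print(predictor.should_wait(0x400))  # first encounter: issue optimistically
predictor.record_squash(0x400)       # the load conflicted with an older store
print(predictor.should_wait(0x400))  # from now on, wait for all older stores
```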
![Page 141: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/141.jpg)
Load/Store Queue Examples
![Page 142: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/142.jpg)
Initial State
144
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

State (the same in all three panels):
  RegFile:      p1 5   p2 100   p3 9   p4 200   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  (empty)
  Cache:        Addr 100 Val 13;  Addr 200 Val 17
(Stores to different addresses)
![Page 143: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/143.jpg)
Good Interleaving
145
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

Cache (same in every panel): Addr 100 Val 13;  Addr 200 Val 17

After 1. St p1 ➜ [p2]:
  RegFile:      p1 5   p2 100   p3 9   p4 200   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 1, Addr 100, Val 5
After 2. St p3 ➜ [p4]:
  RegFile:      p1 5   p2 100   p3 9   p4 200   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 200, Val 9
After 3. Ld [p5] ➜ p6:
  RegFile:      p1 5   p2 100   p3 9   p4 200   p5 100   p6 5   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 200, Val 9
(Shows importance of address check)
![Page 144: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/144.jpg)
Different Initial State
146
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

State (the same in all three panels):
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  (empty)
  Cache:        Addr 100 Val 13;  Addr 200 Val 17
(All to same address)
![Page 145: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/145.jpg)
Good Interleaving #1
147
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

Cache (same in every panel): Addr 100 Val 13;  Addr 200 Val 17

After 1. St p1 ➜ [p2]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 1, Addr 100, Val 5
After 2. St p3 ➜ [p4]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 100, Val 9
After 3. Ld [p5] ➜ p6:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 9   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 100, Val 9
(Program Order)
![Page 146: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/146.jpg)
Good Interleaving #2
148
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

Cache (same in every panel): Addr 100 Val 13;  Addr 200 Val 17

After 2. St p3 ➜ [p4]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 2, Addr 100, Val 9
After 1. St p1 ➜ [p2]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 100, Val 9
After 3. Ld [p5] ➜ p6:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 9   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 100, Val 9
(Stores reordered)
![Page 147: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/147.jpg)
149
Bad Interleaving #1
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

Cache (same in every panel): Addr 100 Val 13;  Addr 200 Val 17

After 3. Ld [p5] ➜ p6:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 13   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  (empty)
After 2. St p3 ➜ [p4]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 13   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  Bdy 2, Addr 100, Val 9
(Load reads the cache)
![Page 148: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/148.jpg)
150
Bad Interleaving #2
Program order:
1. St p1 ➜ [p2]
2. St p3 ➜ [p4]
3. Ld [p5] ➜ p6

Cache (same in every panel): Addr 100 Val 13;  Addr 200 Val 17

After 1. St p1 ➜ [p2]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 ---   p7 ---   p8 ---
  Load Queue:   (empty)
  Store Queue:  Bdy 1, Addr 100, Val 5
After 3. Ld [p5] ➜ p6:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 5   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  Bdy 1, Addr 100, Val 5
After 2. St p3 ➜ [p4]:
  RegFile:      p1 5   p2 100   p3 9   p4 100   p5 100   p6 5   p7 ---   p8 ---
  Load Queue:   Bdy 3, Addr 100
  Store Queue:  Bdy 1, Addr 100, Val 5;  Bdy 2, Addr 100, Val 9
(Load gets value from wrong store)
![Page 149: CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling1 CIS 501: Computer Architecture Unit 9: Static & Dynamic Scheduling Slides originally developed.](https://reader038.fdocuments.us/reader038/viewer/2022102808/56649e855503460f94b879da/html5/thumbnails/149.jpg)
Bad/Good Interleaving
Program order: 1. St p1 ➜ [p2]; 2. St p3 ➜ [p4]; 3. Ld [p5] ➜ p6
Execution order: 2. St p3 ➜ [p4], then 3. Ld [p5] ➜ p6, then 1. St p1 ➜ [p2]
After 2. St p3 ➜ [p4]:
RegFile: p1 5, p2 100, p3 9, p4 100, p5 100, p6 ---, p7 ---, p8 ---
Load Queue (Bdy, Addr): empty
Store Queue (Bdy, Addr, Val): 2 100 9
Cache (Addr, Val): 100 13; 200 17
After 3. Ld [p5] ➜ p6:
RegFile: p1 5, p2 100, p3 9, p4 100, p5 100, p6 9, p7 ---, p8 ---
Load Queue (Bdy, Addr): 3 100
Store Queue (Bdy, Addr, Val): 2 100 9
Cache (Addr, Val): 100 13; 200 17
After 1. St p1 ➜ [p2]:
RegFile: p1 5, p2 100, p3 9, p4 100, p5 100, p6 9, p7 ---, p8 ---
Load Queue (Bdy, Addr): 3 100
Store Queue (Bdy, Addr, Val): 1 100 5; 2 100 9
Cache (Addr, Val): 100 13; 200 17
?
(Load gets value from correct store, but does it work?)
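The three interleavings can be replayed with a small Python sketch. It assumes the load searches the store queue for the youngest older store (by Bdy) to the same address and falls back to the cache otherwise. The names `MemoryUnit` and `run` are hypothetical, and commit and violation detection are deliberately not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryUnit:
    """Toy load/store queues; stores stay queued (commit is not modeled)."""
    cache: dict
    store_queue: list = field(default_factory=list)  # (bdy, addr, val)
    load_queue: list = field(default_factory=list)   # (bdy, addr)

    def execute_store(self, bdy, addr, val):
        self.store_queue.append((bdy, addr, val))

    def execute_load(self, bdy, addr):
        self.load_queue.append((bdy, addr))
        # Forward from the youngest *older* store to the same address, if any.
        older = [(b, v) for (b, a, v) in self.store_queue if a == addr and b < bdy]
        if older:
            return max(older)[1]
        return self.cache[addr]  # no matching store has executed yet


def run(order):
    """Execute insns 1-3 from the slides in the given order; return p6."""
    regs = {"p1": 5, "p2": 100, "p3": 9, "p4": 100, "p5": 100}
    mem = MemoryUnit(cache={100: 13, 200: 17})
    p6 = None
    for bdy in order:
        if bdy == 1:
            mem.execute_store(1, regs["p2"], regs["p1"])   # 1. St p1 -> [p2]
        elif bdy == 2:
            mem.execute_store(2, regs["p4"], regs["p3"])   # 2. St p3 -> [p4]
        else:
            p6 = mem.execute_load(3, regs["p5"])           # 3. Ld [p5] -> p6
    return p6


results = {
    "program order": run([1, 2, 3]),  # 9: forwarded from store 2
    "bad #1": run([3, 2]),            # 13: stale value from the cache
    "bad #2": run([1, 3, 2]),         # 5: forwarded from the wrong store
    "bad/good": run([2, 3, 1]),       # 9: correct value
}
print(results)
```

Only the last interleaving reproduces the program-order value of 9. Whether the hardware can tell that this case was safe when store 1 finally executes is the question the slide raises, and this sketch does not answer it.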
Out-of-Order: Benefits & Challenges
Dynamic Scheduling Operation (Recap)
• Dynamic scheduling
  • Totally in the hardware (not visible to software)
  • Also called “out-of-order execution” (OoO)
• Fetch many instructions into instruction window
  • Use branch prediction to speculate past (multiple) branches
  • Flush pipeline on branch misprediction
• Rename registers to avoid false dependencies
• Execute instructions as soon as possible
  • Register dependencies are known
  • Handling memory dependencies is harder
• “Commit” instructions in order
  • Anything strange happens before commit, just flush the pipeline
• How much out-of-order? Core i7 “Sandy Bridge”:
  • 168-entry reorder buffer, 160 integer registers, 54-entry scheduler
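A minimal Python sketch of the "rename registers to avoid false dependencies" step, assuming a simple map table plus free list. The function name `rename`, the four-instruction program, and the initial mapping of each architectural register r_i to physical register p_i are illustrative assumptions, not taken from the slides. Returning registers to the free list at commit is not modeled.

```python
def rename(insns, arch_regs, num_phys):
    """Rename (dest, src1, src2) triples using a map table and a free list."""
    # Assumed initial state: r_i lives in p_i; the remaining physical regs are free.
    map_table = {r: f"p{i + 1}" for i, r in enumerate(arch_regs)}
    free_list = [f"p{i + 1}" for i in range(len(arch_regs), num_phys)]
    renamed = []
    for dest, src1, src2 in insns:
        # Read sources through the current map *before* remapping the destination.
        s1, s2 = map_table[src1], map_table[src2]
        new_dest = free_list.pop(0)   # every write gets a fresh physical register
        map_table[dest] = new_dest
        renamed.append((new_dest, s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r2"),  # true (RAW) dependence on r1
    ("r1", "r3", "r3"),  # rewrites r1: WAW with insn 0, WAR with insn 1
    ("r2", "r1", "r4"),  # rewrites r2 (WAR with insns 0, 1); reads the new r1
]
renamed = rename(program, ["r1", "r2", "r3", "r4"], num_phys=8)
for insn in renamed:
    print(insn)
```

After renaming, each destination is a distinct physical register, so only the true (RAW) dependences remain: the second instruction still reads `p5` written by the first, and the fourth reads `p7` and `p6`.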
OoO Execution is all around us
• Joe’s phone: Qualcomm Krait processor
  • based on ARM Cortex A15 processor
  • out-of-order 1.5GHz dual-core
  • 3-wide fetch/decode
  • 4-wide issue
  • 11-stage integer pipeline
  • 28nm process technology
  • 4/4KB DM L1$, 16/16KB 4-way SA L2$, 1MB 8-way SA L3$
Out of Order: Benefits
• Allows speculative re-ordering
  • Loads / stores
  • Branch prediction to look past branches
• Done by hardware
  • Compiler may want different schedule for different hw configs
  • Hardware has only its own configuration to deal with
• Schedule can change due to cache misses
• Memory-level parallelism
  • Executes “around” cache misses to find independent instructions
  • Finds and initiates independent misses, reducing memory latency
  • Especially good at hiding L2 hits (~12 cycles in Core i7)
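The memory-level parallelism point can be illustrated with a deliberately idealized Python model. It assumes independent misses overlap perfectly and ignores issue bandwidth, queue capacity, and the one-cycle offsets between back-to-back requests. The function name `stall_cycles` and the two-access example are hypothetical; only the ~12-cycle L2 hit latency comes from the slide.

```python
def stall_cycles(miss_latencies, overlapped):
    """Cycles spent waiting on a group of independent cache misses."""
    if overlapped:
        # Independent misses are in flight together, so the longest one dominates.
        return max(miss_latencies)
    # Blocking behavior: each miss must finish before the next one starts.
    return sum(miss_latencies)


l2_hit = 12                      # ~12 cycles in Core i7, per the slide
accesses = [l2_hit, l2_hit]      # assumed: two independent L1 misses that hit in L2

serialized = stall_cycles(accesses, overlapped=False)
parallel = stall_cycles(accesses, overlapped=True)
print(serialized, parallel)
```

Under these assumptions the two accesses cost 24 cycles when serialized and 12 when overlapped, which is the sense in which initiating independent misses early reduces the memory latency the program sees.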
Challenges for Out-of-Order Cores
• Design complexity
  • More complicated than in-order? Certainly!
  • But, we have managed to overcome the design complexity
• Clock frequency
  • Can we build a “high ILP” machine at high clock frequency?
  • Yep, with some additional pipe stages, clever design
• Limits to (efficiently) scaling the window and ILP
  • Large physical register file
  • Fast register renaming/wakeup/select/load queue/store queue
    • Active areas of micro-architectural research
  • Branch & memory depend. prediction (limits effective window size)
    • 95% branch prediction accuracy: 1 misprediction in 20 branches, or 1 in 100 insn.
  • Plus all the issues of building “wide” in-order superscalar
• Power efficiency
  • Today, even mobile phone chips are out-of-order cores
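The "1 in 20 branches, or 1 in 100 insn." arithmetic can be checked with a short Python sketch, reading the 95% figure as prediction accuracy. The branch fraction of one branch per five instructions is an assumption inferred from the slide's own two numbers, and the function name `insns_per_mispredict` is hypothetical.

```python
from fractions import Fraction


def insns_per_mispredict(accuracy, branch_fraction):
    """Average number of instructions between branch mispredictions."""
    mispredicts_per_insn = (1 - accuracy) * branch_fraction
    return 1 / mispredicts_per_insn


accuracy = Fraction(95, 100)       # 95% of branches predicted correctly
branch_fraction = Fraction(1, 5)   # assumed: about 1 in 5 instructions is a branch

branches_per_mispredict = 1 / (1 - accuracy)
insns = insns_per_mispredict(accuracy, branch_fraction)
print(branches_per_mispredict, insns)
```

With a misprediction roughly every 100 instructions on average, a window much larger than that will often contain wrong-path work, which is why prediction accuracy limits the effective window size.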
Redux: HW vs. SW Scheduling
• Static scheduling
  • Performed by compiler, limited in several ways
• Dynamic scheduling
  • Performed by the hardware, overcomes limitations
• Static limitation ➜ dynamic mitigation
  • Number of registers in the ISA ➜ register renaming
  • Scheduling scope ➜ branch prediction & speculation
  • Inexact memory aliasing information ➜ speculative memory ops
  • Unknown latencies of cache misses ➜ execute when ready
• Which to do? Compiler does what it can, hardware the rest
  • Why? dynamic scheduling needed to sustain more than 2-way issue
  • Helps with hiding memory latency (execute around misses)
  • Intel Core i7 is four-wide execute w/ scheduling window of 100+
  • Even mobile phones have dynamically scheduled cores (ARM A9, A15)
Summary: Scheduling
• Code scheduling
  • To reduce pipeline stalls
  • To increase ILP (insn-level parallelism)
• Static scheduling by the compiler
  • Approach & limitations
• Dynamic scheduling in hardware
  • Register renaming
  • Instruction selection
  • Handling memory operations
• Up next: multicore