Post on 17-Apr-2020
Prof. Hakim WeatherspoonCS 3410, Spring 2015Computer ScienceCornell University
See P&H Chapter: 4.6‐4.8
Prelim next weekTuesday at 7:30. Go to location based on netid[a‐g]* → MRS146: Morrison Hall 146[h‐l]* → RRB125: Riley‐Robb Hall 125[m‐n]*→ RRB105: Riley‐Robb Hall 105[o‐s]* → MVRG71: M Van Rensselaer Hall G71[t‐z]* → MVRG73: M Van Rensselaer Hall G73
Prelim reviewsTODAY, Tue, Feb 24 @ 7:30pm in Olin 255Sat, Feb 28 @ 7:30pm in Upson B17
Prelim conflictsContact Deniz Altinbuken <deniz@cs.cornell.edu>
Prelim1:• Time: We will start at 7:30pm sharp, so come early• Location: on previous slide• Closed Book
• Cannot use electronic device or outside material
• Practice prelims are online in CMS
• Material covered everything up to end of this week• Everything up to and including data hazards• Appendix B (logic, gates, FSMs, memory, ALUs) • Chapter 4 (pipelined [and non] MIPS processor with hazards)• Chapters 2 (Numbers / Arithmetic, simple MIPS instructions)• Chapter 1 (Performance)• HW1, Lab0, Lab1, Lab2, C‐Lab0, C‐Lab1
RISC and Pipelined Processor: Putting it all togetherData Hazards• Data dependencies• Problem, detection, and solutions
– (delaying, stalling, forwarding, bypass, etc)• Hazard detection unit• Forwarding unit
Next time• Control Hazards
What is the next instruction to execute ifa branch is taken? Not taken?
Simplicity favors regularity• 32 bit instructions
Smaller is faster• Small register file
Make the common case fast• Include support for constants
Good design demands good compromises• Support for different type of interpretations/classes
All MIPS instructions are 32 bits long, has 3 formats
R‐type
I‐type
J‐type
op rs rt rd shamt func6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
op rs rt immediate6 bits 5 bits 5 bits 16 bits
op immediate (target address)6 bits 26 bits
Arithmetic/Logical• R‐type: result and two source registers, shift amount• I‐type: 16‐bit immediate with sign/zero extension
Memory Access• load/store between registers and memory• word, half‐word and byte operations
Control flow• conditional branches: pc‐relative addresses• jumps: fixed offsets, register absolute
Arithmetic/Logical• ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU• ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRLV, SRAV, SLTI, SLTIU
• MULT, DIV, MFLO, MTLO, MFHI, MTHIMemory Access
• LW, LH, LB, LHU, LBU, LWL, LWR• SW, SH, SB, SWL, SWR
Control flow• BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ• J, JR, JAL, JALR, BEQL, BNEL, BLEZL, BGTZL
Special• LL, SC, SYSCALL, BREAK, SYNC, COPROC
Principle:Throughput increased by parallel executionBalanced pipeline very important
Else slowest stage dominates performance
Pipelining:• Identify pipeline stages• Isolate stages from each other• Resolve pipeline hazards (this and next lecture)
Five stage “RISC” load‐store architecture
1. Instruction fetch (IF)– get instruction from memory, increment PC
2. Instruction Decode (ID)– translate opcode into control signals and read registers
3. Execute (EX)– perform ALU operation, compute jump/branch targets
4.Memory (MEM)– access memory if needed
5.Writeback (WB)– update register file
• Each instruction goes through the 5 stages• Each stage takes one clock cycle
• So slowest stage determines clock cycle time
1 2 3 4 5 6 7 8 9Clock cycle
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
add
lw
The pipeline achievesA) Latency: 1, throughput: 1 instr/cycleB) Latency: 5, throughput: 1 instr/cycleC) Latency: 1, throughput: 1/5 instr/cycleD) Latency: 5, throughput: 5 instr/cycleE) None of the above
1 2 3 4 5 6 7 8 9Clock cycle
Latency: 5 cyclesThroughput: 1 instr/cycleConcurrency: 5
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
IF ID EX MEM WB
add
lw
CPI = 1CPI =
• Each instruction goes through the 5 stages• Each stage takes one clock cycle
• So slowest stage determines clock cycle time
• Stages must share information. How?• Add pipeline registers (flip‐flops) to pass results
between different stages
extend
registerfile
control
alu
memory
din dout
addrPC
memory
newpc
computejump/branch
targets
+4
Fetch Decode Execute Memory WB
Write‐BackMemory
InstructionFetch Execute
InstructionDecode
extend
registerfile
control
alu
memory
din dout
addrPC
memory
newpc
inst
IF/ID ID/EX EX/MEM MEM/WB
imm
BA
ctrl
ctrl
ctrl
BD D
M
computejump/branch
targets
+4
• Each instruction goes through the 5 stages• Each stage takes one clock cycle
• So slowest stage determines clock cycle time
• Stages must share information. How?• Add pipeline registers (flip‐flops) to pass results
between different stages
And is this it? Not quite….
3 kinds• Structural hazards
– Multiple instructions want to use same unit
• Data hazards– Results of instruction needed before ready
• Control hazards– Don’t know which side of branch to take
Will get back to thisFirst, how to pipeline when no hazards
Write‐BackMemory
InstructionFetch Execute
InstructionDecode
extend
registerfile
control
alu
memory
din dout
addrPC
memory
newpc
inst
IF/ID ID/EX EX/MEM MEM/WB
imm
BA
ctrl
ctrl
ctrl
BD D
M
computejump/branch
targets
+4
add r3, r1, r2; nand r6, r4, r5; lw r4, 20(r2); add r5, r2, r5; sw r7, 12(r3);
Assume eight‐register machineRun the following code on a pipelined datapath
add r3 r1 r2 ; reg 3 = reg 1 + reg 2nand r6 r4 r5 ; reg 6 = ~(reg 4 & reg 5)lw r4 20 (r2) ; reg 4 = Mem[reg2+20]add r5 r2 r5 ; reg 5 = reg 2 + reg 5sw r7 12(r3) ; Mem[reg3+12] = reg 7
Slides thanks to Sally McKee
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
op
Rt
imm
valB
valA
PC+4PC+4target
ALUresult
op
dest
valB
op
dest
ALUresult
mdata
instruction
0
R2
R3
R4
R5
R1
R6
R0
R7
regAregB
Bits 26‐31
data
dest
IF/ID ID/EX EX/MEM MEM/WB
extend
MUX
Rd
data
dest
IF/ID ID/EX EX/MEM MEM/WB
extend
0MUX
0
At time 1, Fetchadd r3 r1 r2
0
Time: 0
4
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
nop
0
0
0
040
0
nop
0
0
nop
0
0
0
0
add 3 1 2
912187
36
41
0
22
R2
R3
R4
R5
R1
R6
R0
R7
Bits 26‐31
data
dest
Fetch:add 3 1 2
add 3 1 2
IF/ID ID/EX EX/MEM MEM/WB
extend
0MUX
0
/ add
/ 3
/ 4
/ 36
/ 9
/ 2
Time: 1/ 2
48
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
add
3
9
36
480
0
nop
0
0
nop
0
0
0
0nand 6 4 5
912187
36
41
0
22
R2
R3
R4
R5
R1
R6
R0
R7
12
Bits 26‐31
data
dest
Fetch:nand 6 4 5
nand 6 4 5 add 3 1 2
IF/ID ID/EX EX/MEM MEM/WB
extend
2MUX
3
/ 45
/ add/ nand
/ 9/ 6
/ 8
/ 18
/ 7
/ 5 / 3
/ 4
36
9
3
Time: 2/ 3
812
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
nand
6
7
18
8124
45
add
3
9
nop
0
0
0
0lw 4 20(2)
912187
36
41
0
22
R2
R3
R4
R5
R1
R6
R0
R7
45
Bits 26‐31
data
dest
Fetch:lw 4 20(2)
lw 4 20(2) nand 6 4 5 add 3 1 2
36
9
3
IF/ID ID/EX EX/MEM MEM/WB
extend
5MUX
6 32
/ 45
/ 3
nand (18 · 7)
18 = 01 00107 = 00 0111‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐‐3 = 11 1101
/ ‐3
/ add
/ 18
/ 7
/ nand
/ 7
/ 6
/ 8
Time: 3/ 4
1216
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
lw
20
18
9
12168
‐3
nand
6
7
add
3
45
0
0add 5 2 5
912187
36
41
0
22
R2
R3
R4
R5
R1
R6
R0
R7
24
Bits 26‐31
data
dest
Fetch:add 5 2 5
add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2
18
7
6
45
3
IF/ID ID/EX EX/MEM MEM/WB
extend
4MUX
0 65
Time: 4
1620
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
add
5
7
9
162012
29
lw
4
18
nand
6
‐3
0
0sw 7 12(3)
945187
36
41
0
22
R2
R3
R4
R5
R1
R6
R0
R7
25
Bits 26‐31
data
dest
Fetch:sw 7 12(3)
sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 6 4 5 add 3 1 2
9
20
4
‐3
6
45
3
IF/ID ID/EX EX/MEM MEM/WB
extend
5MUX
5 04
Time: 5
2024
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
sw
12
22
45
2016
16
add
5
7
lw
4
29
99
0945187
36
‐3
0
22
R2
R3
R4
R5
R1
R6
R0
R7
37
Bits 26‐31
data
dest
No moreinstructions
sw 7 12(3) add 5 2 5 lw 4 20(2) nand 6 4 5
9
7
5
29
4
‐3
6
IF/ID ID/EX EX/MEM MEM/WB
extend
7MUX
0 55
Time: 6
2428
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
20
57
sw
7
22
add
5
16
0
0945997
36
‐3
0
22
R2
R3
R4
R5
R1
R6
R0
R7
Bits 26‐31
data
dest
No moreinstructions
nop nop sw 7 12(3) add 5 2 5 lw 4 20(2)
45
7
12
16
5
99
4
IF/ID ID/EX EX/MEM MEM/WB
extend
MUX
07
Time: 7
2832
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
sw
7
57
0
9459916
36
‐3
0
22
R2
R3
R4
R5
R1
R6
R0
R7
Bits 26‐31
data
dest
No moreinstructions
nop nop nop sw 7 12(3) add 5 2 5
2257
22
16
5
Slides thanks to Sally McKee
IF/ID ID/EX EX/MEM MEM/WB
extend
MUX
Time: 8
3236
PC
Register file
MUXA
LU
MUX
4
Datamem
+
MUX
Bits 11‐15
Bits 16‐20
9459916
36
‐3
0
22
R2
R3
R4
R5
R1
R6
R0
R7
Bits 21‐23
data
dest
No moreinstructions
nop nop nop nop sw 7 12(3)
IF/ID ID/EX EX/MEM MEM/WB
extend
MUX
Time: 9
3640
Pipelining is a powerful technique to mask latencies and increase throughput
• Logically, instructions execute one at a time• Physically, instructions execute in parallel
– Instruction level parallelism
Abstraction promotes decoupling• Interface (ISA) vs. implementation (Pipeline)
See P&H Chapter: 4.7‐4.8
3 kinds• Structural hazards
– Multiple instructions want to use same unit
• Data hazards– Results of instruction needed before
• Control hazards– Don’t know which side of branch to take
What about data dependencies (also known as a data hazard in a pipelined processor)?i.e. add r3, r1, r2
sub r5, r3, r4
Need to detect and then fix such hazards
Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written
– i.e instruction may need values that are being computed further down the pipeline
– in fact, this is quite common
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
Clock cycle1 2 3 4 5 6 7 8 9
sub r5, r3, r4
lw r6, 4(r3)
or r5, r3, r5
sw r6, 12(r3)
add r3, r1, r2
time
How many data hazards due to r3 only
A) 1B) 2C) 3D) 4E) 5
sub r5, r3, r4
lw r6, 4(r3)
or r5, r3, r5
sw r6, 12(r3)
add r3, r1, r2
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
Clock cycle1 2 3 4 5 6 7 8 9
2. sub r5, r3, r4
3. lw r6, 4(r3)
4. or r5, r3, r5
5. sw r6, 12(r3)
1. add r3, r1, r2
time
r3 = 10
r3 = 20
r3 = 10 r3 = 20
r3 = 10
r3 = 10
r3 = 10
OK
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
IF ID MEM WB
Clock cycle1 2 3 4 5 6 7 8 9
sub r5, r3, r4
lw r6, 4(r3)
or r5, r3, r5
sw r6, 12(r3)
add r3, r1, r2r3 = 10
r3 = 20
OK
r3 = 10 r3 = 20time
r3 = 10
r3 = 10
r3 = 10
r3 = 20
r6 = Mem[r3 + 4]
Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written
i.e. add r3, r1, r2sub r5, r3, r4
How to detect?
IF/ID
+4
ID/EX EX/MEM MEM/WB
mem
din dout
addrinst
PC+4
OP
BA
Rt
BD
MD
PC+4
imm
OP
Rd
OP
Rd
PC
instmem
Rd
Ra Rb
DB
A
Rd
Detecting Data Hazards
IF/ID.Ra ≠ 0 &&(IF/ID.Ra==ID/Ex.RdIF/ID.Ra==Ex/M.RdIF/ID.Ra==M/W.Rd)
Data Hazards• register file reads occur in stage 2 (ID) • register file writes occur in stage 5 (WB)• next instructions may read values about to be written
How to detect? Logic in ID stage:stall = (IF/ID.Ra != 0 &&
(IF/ID.Ra == ID/EX.Rd || IF/ID.Ra == EX/M.Rd || IF/ID.Ra == M/WB.Rd))
|| (same for Rb)
IF/ID
+4
ID/EX EX/MEM MEM/WB
mem
din dout
addr
PC
instmem
Rd
Ra Rb
DB
A
detecthazard Rd
add r3, r1, r2sub r5, r3, r5or r6, r3, r4 add r6, r3, r8
inst
PC+4
OP
BA
Rt
BD
MD
PC+4
imm
OP
Rd
OP
Rd
Data hazards occur when a operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards.
What to do if data hazard detected?
What to do if data hazard detected?A) Wait/StallB) Reorder in Software (SW)C) Forward/BypassD) All the aboveE) None. We will use some other method
What to do if data hazard detected?A) Wait/StallB) Reorder in Software (SW)C) Forward/BypassD) All the aboveE) None. We will use some other method
Discuss today
What to do if data hazard detected?
Options• Nothing
– Change the ISA to match implementation
• Stall– Pause current and subsequent instructions till safe
• Forward/bypass– Forward data value to where it is needed
How to stall an instruction in ID stage• prevent IF/ID pipeline register update
– stalls the ID stage instruction
• convert ID stage instr into nop for later stages– innocuous “bubble” passes through pipeline
• prevent PC update– stalls the next (IF stage) instruction
IF/ID
+4
ID/EX EX/MEM MEM/WB
mem
din dout
addr
PC
instmem
Rd
Ra Rb
DB
A
detecthazard Rd
add r3, r1, r2sub r5, r3, r5or r6, r3, r4 add r6, r3, r8
inst
PC+4
OP
BA
Rt
BD
MD
PC+4
imm
OP
Rd
OP
Rd
If detect hazard
WE=0
MemWr=0RegWr=0
Clock cycle1 2 3 4 5 6 7 8
add r3, r1, r2
sub r5, r3, r5
or r6, r3, r4
add r6, r3, r8
time
Clock cycle1 2 3 4 5 6 7 8
add r3, r1, r2
sub r5, r3, r5
or r6, r3, r4
add r6, r3, r8
r3 = 10
r3 = 20
time
IF ID Ex M W
IF ID Ex M W
IF ID Ex M
ID ID ID
IF IF IF
IF ID Ex
Stalls3 Stall
datamem
B
A
B
D
M
Dinstmem
DrD B
A
Rd RdRd
WE
WE
Op
WE
Op
rA rB
PC
+4
Opnop
inst
/stall
add r3,r1,r2
(MemWr=0RegWr=0)
NOP = If(IF/ID.rA ≠ 0 &&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.RdIF/ID.rA==M/W.Rd))
sub r5,r3,r5
or r6,r3,r4 (WE=0)
datamem
B
A
B
D
M
Dinstmem
DrD B
A
Rd RdRd
WE
WE
Op
WE
Op
rA rB
PC
+4
Opnop
inst
/stall
nop
(MemWr=0RegWr=0)
NOP = If(IF/ID.rA ≠ 0 &&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.RdIF/ID.rA==M/W.Rd))
add r3,r1,r2sub r5,r3,r5
(MemWr=0RegWr=0)
or r6,r3,r4 (WE=0)
datamem
B
A
B
D
M
Dinstmem
DrD B
A
Rd RdRd
WE
WE
Op
WE
Op
rA rB
PC
+4
Opnop
inst
/stall
(MemWr=0RegWr=0)
NOP = If(IF/ID.rA ≠ 0 &&(IF/ID.rA==ID/Ex.RdIF/ID.rA==Ex/M.RdIF/ID.rA==M/W.Rd))
add r3,r1,r2sub r5,r3,r5 nop nop
(MemWr=0RegWr=0)
(MemWr=0RegWr=0)
or r6,r3,r4 (WE=0)
Clock cycle1 2 3 4 5 6 7 8
add r3, r1, r2
sub r5, r3, r5
or r6, r3, r4
add r6, r3, r8
r3 = 10
r3 = 20
time
IF ID Ex M W
IF ID Ex M W
IF ID Ex M
ID ID ID
IF IF IF
IF ID Ex
Stalls3 Stall
How to stall an instruction in ID stage• prevent IF/ID pipeline register update
– stalls the ID stage instruction
• convert ID stage instr into nop for later stages– innocuous “bubble” passes through pipeline
• prevent PC update– stalls the next (IF stage) instruction
Data hazards occur when a operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards.
Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards.
Stalling introduces NOPs (“bubbles”) into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing writes to IF/ID registers from changing, and (3) preventing writes to memory and register file. *Bubbles in pipeline significantly decrease performance.
What to do if data hazard detected?A) Wait/StallB) Reorder in Software (SW)C) Forward/Bypass
Forwarding bypasses some pipelined stages forwarding a result to a dependent instruction operand (register).
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (WEx)• RegisterFile Bypass
datamem
imm
B
A
B
D
M
Dinstmem
DB
A
Rd Rd
Rb
WE
WE
MC
Ra
MC
detecthazard
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass
IF/ID ID/Ex Ex/Mem Mem/WB
forwardunit
datamem
imm
B
A
B
D
M
Dinstmem
DB
A
Rd Rd
Rb
WE
WE
MC
Ra
MC
forwardunit
detecthazard
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass
IF/ID ID/Ex Ex/Mem Mem/WB
Ex/MEM to EX Bypass• EX needs ALU result that is still in MEM stage• Resolve:
Add a bypass from EX/MEM.D to start of EX
How to detect? Logic in Ex Stage:forward = (Ex/M.WE && EX/M.Rd != 0 &&
ID/Ex.Ra == Ex/M.Rd)|| (same for Rb)
datamem
imm
B
A
B
D
M
Dinstmem
DB
A
Rd Rd
Rb
WE
WE
MC
Ra
MC
forwardunit
detecthazard
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass
IF/ID ID/Ex Ex/Mem Mem/WB
add r3, r1, r2
sub r5, r3, r1
datamem
instmem
DB
A
IF ID Ex M W
IF ID Ex M W
Mem/WB to EX Bypass• EX needs value being written by WB• Resolve:
Add bypass from WB final value to start of EX
How to detect? Logic in Ex Stage:forward = (M/WB.WE && M/WB.Rd != 0 &&
ID/Ex.Ra == M/WB.Rd &¬ (Ex/M.WE && Ex/M.Rd != 0 &&
ID/Ex.Ra == Ex/M.Rd)|| (same for Rb)
Check pg. 311
datamem
imm
B
A
B
D
M
Dinstmem
DB
A
Rd Rd
Rb
WE
WE
MC
Ra
MC
forwardunit
detecthazard
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass
IF/ID ID/Ex Ex/Mem Mem/WB
add r3, r1, r2
sub r5, r3, r1
or r6, r3, r4
datamem
instmem
DB
A
IF ID Ex M W
IF IDIF W
Ex M WID Ex M
Register File Bypass• Reading a value that is currently being written
Detect:((Ra == MEM/WB.Rd) or (Rb == MEM/WB.Rd))and (WB is writing a register)
Resolve:Add a bypass around register file (WB to ID)Better: (Hack) just negate register file clock– writes happen at end of first half of each clock cycle– reads happen during second half of each clock cycle
datamem
imm
B
A
B
D
M
Dinstmem
DB
A
Rd Rd
Rb
WE
WE
MC
Ra
MC
forwardunit
detecthazard
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• RegisterFile Bypass
IF/ID ID/Ex Ex/Mem Mem/WB
add r3, r1, r2
sub r5, r3, r1
or r6, r3, r4
add r6, r3, r8
datamem
instmem
DB
A
IF ID Ex M W
IF IDIF W
Ex M WID Ex MIF ID Ex M W
Clock cycle1 2 3 4 5 6 7 8
add r3, r1, r2
sub r5, r3, r5
or r6, r3, r4
add r6, r3, r8
IF ID Ex M W
IF ID
IF W
Ex M W
ID Ex M
IF ID Ex
r3 = 10
r3 = 20
time
M W
Clock cycle1 2 3 4 5 6 7 8
add r3, r1, r2
sub r5, r3, r4
lw r6, 4(r3)
or r5, r3, r5
sw r6, 12(r3)
IF ID Ex M W
IF ID
IF W
Ex M W
ID Ex M
IF ID Ex
time
M W
IF ID Ex M W
datamem
imm
B
A
B
D
M
Dinstmem
DB
A
Rd Rd
Rb
WE
WE
MC
Ra
MC
forwardunit
detecthazard
Three types of forwarding/bypass• Forwarding from Ex/Mem registers to Ex stage (MEx)• Forwarding from Mem/WB register to Ex stage (W Ex)• Register File Bypass
IF/ID ID/Ex Ex/Mem Mem/WB
Data hazards occur when a operand (register) depends on the result of a previous instruction that may not be computed yet. A pipelined processor needs to detect data hazards.
Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs (“bubbles”) into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing writes to IF/ID registers from changing, and (3) preventing writes to memory and register file. Bubbles (nops) in pipeline significantly decrease performance.
Forwarding bypasses some pipelined stages forwarding a result to a dependent instruction operand (register). Better performance than stalling.
Stall• Pause current and all subsequent instructions
Forward/Bypass• Try to steal correct value from elsewhere in pipeline• Otherwise, fall back to stalling or require a delay slot
Tradeoffs?