Pipelined Datapath
Lecture notes from MKP, H. H. Lee and S. Yalamanchili
(2)
Reading
• Sections 4.5 – 4.10
• Practice Problems: 1, 3, 8, 12
• Note: Appendices A-E in the hardcopy text correspond to chapters 7-11 in the online text.
(3)
Pipeline Performance
• Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
• Compare pipelined datapath with single-cycle datapath

| Instr | Instr fetch | Register read | ALU op | Memory access | Register write | Total time |
|---|---|---|---|---|---|---|
| lw | 200ps | 100ps | 200ps | 200ps | 100ps | 800ps |
| sw | 200ps | 100ps | 200ps | 200ps | | 700ps |
| R-format | 200ps | 100ps | 200ps | | 100ps | 600ps |
| beq | 200ps | 100ps | 200ps | | | 500ps |
(4)
Pipeline Performance
Single-cycle (Tc= 800ps)
Pipelined (Tc= 200ps)
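The two clock periods follow directly from the stage times: the single-cycle clock must fit the slowest instruction (lw, which uses every stage), while the pipelined clock only has to fit the slowest stage. The Python sketch below is illustrative only; the names `STAGE_PS`, `single_cycle_period`, `pipelined_period` and `total_time_ps` are assumptions, not from the lecture, and hazards are ignored.

```python
# Stage latencies in picoseconds, taken from the stage-time table in these notes.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_period(stage_ps):
    """Single-cycle clock must fit the slowest instruction (lw uses every stage)."""
    return sum(stage_ps.values())


def pipelined_period(stage_ps):
    """Pipelined clock is set by the slowest stage."""
    return max(stage_ps.values())


def total_time_ps(n_instr, stage_ps, pipelined):
    """Time to run n_instr instructions, ignoring hazards (an idealization)."""
    if pipelined:
        # The first instruction needs one cycle per stage;
        # every later instruction completes one cycle after the previous one.
        return (len(stage_ps) + n_instr - 1) * pipelined_period(stage_ps)
    return n_instr * single_cycle_period(stage_ps)


if __name__ == "__main__":
    for n in (3, 1_000_000):
        t_single = total_time_ps(n, STAGE_PS, pipelined=False)
        t_pipe = total_time_ps(n, STAGE_PS, pipelined=True)
        print(n, t_single, t_pipe, round(t_single / t_pipe, 2))
```

For a handful of instructions the fill time dominates and the speedup is well below 4; for long runs it approaches 800/200 = 4, not 5, because the stages are not balanced.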
(5)
Pipeline Speedup
• If all stages are balanced
  - i.e., all take the same time

$$\text{Inter-instruction gap}_{\text{pipelined}} = \frac{\text{Inter-instruction gap}_{\text{nonpipelined}}}{\text{Number of stages}}$$

• If not balanced, speedup is less
• Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease
(6)
Basic Idea
• All instructions are 32 bits
• Few & regular instruction formats
• Alignment of memory operands
(7)
Pipelining
• What makes it easy?
  - All instructions are the same length
  - Simple instruction formats
  - Memory operands appear only in loads and stores
• What makes it hard?
  - structural hazards: suppose we had only one memory
  - control hazards: need to worry about branch instructions
  - data hazards: an instruction depends on a previous instruction
• What really makes it hard:
  - exception handling
  - trying to improve performance with out-of-order execution, etc.
(8)
Pipeline registers
• Need registers between stages
  - To hold information produced in previous cycle
Pipeline stage execution time
(9)
Graphically Representing Pipelines
• Shading indicates the unit is being used by the instruction
• Shading on the right half of the register file (ID or WB) or memory means the element is being read in that stage
• Shading on the left half means the element is being written in that stage
[Figure: pipeline diagram for lw followed by add; each instruction passes through IF, ID, EX, MEM, WB, with time (2 to 10) on the horizontal axis]
(10)
Graphically Representing Pipelines
• Can help with answering questions like:
  - how many cycles does it take to execute this code?
  - what is the ALU doing during cycle 4?
  - use this representation to help understand datapaths
[Figure: multi-cycle pipeline diagram over clock cycles CC1 to CC6; program execution order is lw $10, 20($1) then sub $11, $2, $3; each instruction uses IM, Reg, ALU, DM, Reg]
(11)
Structural Hazard
[Figure: pipeline diagram (IF, ID, EX, MEM, WB; time 2 to 10) for four instructions in program order: lw, add, sub, add]
Need to separate instruction and data memory
(12)
IF for Load, Store, …
Pipeline stage execution time
(13)
ID for Load, Store, …
Pipeline stage execution time
(14)
EX for Load
Pipeline stage execution time
(15)
MEM for Load
Pipeline stage execution time
(16)
WB for Load
Wrong register number
(17)
Corrected Datapath for Load
Pipeline stage execution time
(18)
EX for Store
Pipeline stage execution time
(19)
MEM for Store
Pipeline stage execution time
(20)
WB for Store
Pipeline stage execution time
(21)
Pipelining Example
[Figure: pipelined datapath (PC, instruction memory, registers, ALU, data memory, with IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers) holding five instructions at once. Instructions as labeled across the figure, left to right: add $14, $5, $6; lw $13, 24($1); add $12, $3, $4; sub $11, $2, $3; lw $10, 20($1)]
Note what is happening in the register file
(22)
Pipelined Control (Simplified)
(23)
Pipelined Control
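• Control signals derived from instruction
  - As in single-cycle implementation
• Pass control signals along like data

Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage control lines: Branch, MemRead, MemWrite
Write-back stage control lines: RegWrite, MemtoReg

| Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg |
|---|---|---|---|---|---|---|---|---|---|
| R-format | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| lw | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| sw | X | 0 | 0 | 1 | 0 | 0 | 1 | 0 | X |
| beq | X | 0 | 1 | 0 | 1 | 0 | 0 | 0 | X |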
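One way to picture "pass control signals along like data" is to split the nine control bits into the EX, M and WB groups and drop one group at each pipeline register. The sketch below is a rough model under that reading; `FIELDS`, `CONTROL`, `decode` and `pass_along` are illustrative names, and the bit strings are copied from the control table for this slide.

```python
# Control values per instruction class, copied from the control table.
# "X" marks a don't-care value.
FIELDS = [
    "RegDst", "ALUOp1", "ALUOp0", "ALUSrc",   # used in EX
    "Branch", "MemRead", "MemWrite",          # used in MEM
    "RegWrite", "MemtoReg",                   # used in WB
]
CONTROL = {
    "R-format": "110000010",
    "lw":       "000101011",
    "sw":       "X0010010X",
    "beq":      "X0101000X",
}


def decode(instr_class):
    """Return the control word produced in ID, split into EX, M and WB groups."""
    bits = dict(zip(FIELDS, CONTROL[instr_class]))
    return {
        "EX": {name: bits[name] for name in FIELDS[0:4]},
        "M":  {name: bits[name] for name in FIELDS[4:7]},
        "WB": {name: bits[name] for name in FIELDS[7:9]},
    }


def pass_along(groups):
    """Drop the group consumed by the current stage; forward the remainder."""
    names = list(groups)
    return {name: groups[name] for name in names[1:]}


if __name__ == "__main__":
    id_ex = decode("lw")          # contents of ID/EX: EX, M, WB
    ex_mem = pass_along(id_ex)    # contents of EX/MEM: M, WB
    mem_wb = pass_along(ex_mem)   # contents of MEM/WB: WB
    print(id_ex, ex_mem, mem_wb, sep="\n")
```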
(24)
Pipelined Control
(25)
Datapath with Control
[Figure: pipelined datapath with control signals]
IF: lw $10, 9($1)
(26)
Datapath with Control
[Figure: pipelined datapath with control signals; control values for lw are generated in ID]
IF: sub $11, $2, $3 ID: lw $10, 9($1)
(27)
Datapath with Control
[Figure: pipelined datapath with control signals; control values for sub are generated in ID]
IF: and $12, $4, $5 ID: sub $11, $2, $3 EX: lw $10, 9($1)
(28)
Datapath with Control
[Figure: pipelined datapath with control signals; control values for and are generated in ID]
IF: or $13, $6, $7 ID: and $12, $4, $5 EX: sub $11, $2, $3 MEM: lw $10, 9($1)
(29)
Datapath with Control
[Figure: pipelined datapath with control signals; control values for or are generated in ID]
IF: add $14, $8, $9 ID: or $13, $6, $7 EX: and $12, $4, $5 MEM: sub $11, .. WB: lw $10, 9($1)
(30)
Datapath with Control
[Figure: pipelined datapath with control signals; control values for add are generated in ID]
IF: xxxx ID: add $14, $8, $9 EX: or $13, $6, $7 MEM: and $12… WB: sub $11, ..
(31)
Datapath with Control
[Figure: pipelined datapath with control signals]
IF: xxxx ID: xxxx EX: add $14, $8, $9 MEM: or $13, .. WB: and $12…
(32)
Datapath with Control
[Figure: pipelined datapath with control signals]
IF: xxxx ID: xxxx EX: xxxx MEM: add $14, .. WB: or $13…
(33)
Datapath with Control
[Figure: pipelined datapath with control signals]
IF: xxxx ID: xxxx EX: xxxx MEM: xxxx WB: add $14..
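The cycle-by-cycle walk-through in the "Datapath with Control" slides (lw, sub, and, or, add moving from IF to WB) can be reproduced with a short Python sketch. It assumes no stalls and one instruction entering IF per cycle; `STAGES`, `PROGRAM` and `occupancy` are illustrative names, and "xxxx" marks an empty stage as on the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $10, 9($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Map each stage to the instruction it holds in the given cycle (1-based).

    Instruction i (0-based) enters IF in cycle i + 1; no stalls are modeled.
    """
    result = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s  # index of the instruction currently in this stage
        result[stage] = program[i] if 0 <= i < len(program) else "xxxx"
    return result


if __name__ == "__main__":
    total_cycles = len(PROGRAM) + len(STAGES) - 1  # 5 + 5 - 1 = 9
    for cycle in range(1, total_cycles + 1):
        row = occupancy(PROGRAM, cycle)
        print(cycle, " | ".join(f"{st}: {row[st]}" for st in STAGES))
```

Five instructions in a five-stage pipeline take 9 cycles, one per slide in the walk-through.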
(34)
Data Hazards (4.7)
• An instruction depends on completion of data access by a previous instruction
  - add $s0, $t0, $t1
    sub $t2, $s0, $t3
(35)
Dependencies
• Problem with starting next instruction before first is finished
  - dependencies that “go backward in time” are data hazards
[Figure: multi-cycle pipeline diagram (IM, Reg, ALU, DM, Reg) over clock cycles CC 1 to CC 9 for the program below. Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9]
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
(36)
Software Solution
• Have compiler guarantee no hazards
• Where do we insert the “nops”?
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
• Problem: this really slows us down!
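As a rough illustration of the software approach, the Python sketch below inserts nops until every consumer sits far enough behind its producer. It assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer must issue at least 3 slots after its producer; that distance, the hand-decoded register tuples, and the names `insert_nops`, `NOP` and `MIN_DISTANCE` are all assumptions made for this example.

```python
# Each entry: (text, destination register or None, source registers).
# Registers are hand-decoded here; a real tool would parse the assembly.
PROGRAM = [
    ("sub $2, $1, $3",   2,    (1, 3)),
    ("and $12, $2, $5",  12,   (2, 5)),
    ("or $13, $6, $2",   13,   (6, 2)),
    ("add $14, $2, $2",  14,   (2, 2)),
    ("sw $15, 100($2)",  None, (15, 2)),
]
NOP = ("nop", None, ())
MIN_DISTANCE = 3  # assumed: consumer must issue >= 3 slots after its producer


def insert_nops(program, min_distance=MIN_DISTANCE):
    """Pad the program with nops so no source is read too soon after it is written."""
    window = min_distance - 1  # how many preceding slots can still be hazardous
    out = []
    for instr in program:
        sources = instr[2]
        # Keep padding while a recent slot writes a register this instruction reads.
        while window > 0 and any(
            prev[1] is not None and prev[1] != 0 and prev[1] in sources
            for prev in out[-window:]
        ):
            out.append(NOP)
        out.append(instr)
    return out


if __name__ == "__main__":
    for text, _, _ in insert_nops(PROGRAM):
        print(text)
```

Under these assumptions two nops land between sub and and; the later uses of $2 are already far enough away.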
(37)
A Better Solution
• Consider this sequence:
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
• We can resolve hazards with forwarding
  - How do we detect when to forward?
(38)
Dependencies & Forwarding
Do not wait for results to be written to the register file – find them in the pipeline → forward to ALU
(39)
Forwarding
[Figure: pipelined datapath with control signals]
IF: add $14, $8, $9 ID: or $13, $6, $7 EX: and $6, $4, $5 MEM: sub $11, .. WB: lw $10, 9($1)
(40)
Forwarding (simplified)
[Figure: simplified datapath: Register File, ID/EX, ALU, EX/MEM, Data Memory, MEM/WB, MUX]
(41)
Forwarding (from EX/MEM)
[Figure: simplified datapath with muxes on the ALU inputs; forwarding path from the EX/MEM pipeline register]
(42)
Forwarding (from MEM/WB)
[Figure: simplified datapath with muxes on the ALU inputs; forwarding path from the MEM/WB pipeline register]
(43)
Forwarding (operand selection)
[Figure: simplified datapath with muxes on the ALU inputs and a Forwarding Unit]
(44)
Forwarding (operand propagation)
[Figure: simplified datapath with Forwarding Unit; labels: Rs, Rt, Rd, EX/MEM Rd, MEM/WB Rd]
Combinational Logic!
(45)
Detecting the Need to Forward
• Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
• ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
• Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
      (1a, 1b: fwd from EX/MEM pipeline reg)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
      (2a, 2b: fwd from MEM/WB pipeline reg)
(46)
Detecting the Need to Forward
• But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
• And only if Rd for that instruction is not $zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
(47)
Forwarding Paths
(48)
Forwarding Conditions
• EX hazard
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
  - if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
• MEM hazard
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
(49)
Double Data Hazard
• Consider the sequence:
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
• Both hazards occur
  - Want to use the most recent
• Revise MEM hazard condition
  - Only forward if EX hazard condition isn’t true
(50)
Revised Forwarding Condition
• MEM hazard
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
  - if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
  - Checking precedence of EX hazard
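The forwarding conditions, including the EX-over-MEM precedence, can be written as a small combinational function. This is a sketch in Python rather than hardware; `StageInfo`, `forward_select` and `forwarding_unit` are illustrative names, and returning "00" when nothing is forwarded is an assumed default (operand comes from the register file).

```python
from dataclasses import dataclass


@dataclass
class StageInfo:
    """Fields of a pipeline register that the forwarding unit inspects."""
    reg_write: bool
    rd: int


def forward_select(ex_mem, mem_wb, id_ex_reg):
    """Return the 2-bit mux select for one ALU operand.

    "10" = take the EX/MEM result, "01" = take the MEM/WB result,
    "00" = no forwarding (assumed default).
    """
    ex_hazard = ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == id_ex_reg
    mem_hazard = mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == id_ex_reg
    if ex_hazard:
        return "10"
    if mem_hazard:  # reached only when the EX hazard condition is false
        return "01"
    return "00"


def forwarding_unit(ex_mem, mem_wb, rs, rt):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return forward_select(ex_mem, mem_wb, rs), forward_select(ex_mem, mem_wb, rt)


if __name__ == "__main__":
    # Third "add $1, $1, $4" is in EX; both earlier adds wrote $1.
    second_add = StageInfo(reg_write=True, rd=1)  # in EX/MEM
    first_add = StageInfo(reg_write=True, rd=1)   # in MEM/WB
    print(forwarding_unit(second_add, first_add, rs=1, rt=4))
```

In the double-hazard case the function picks the EX/MEM value, which is the most recent result.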
(51)
Datapath with Forwarding
(52)
Concurrent Execution
• Correct execution is about managing dependencies
  - Producer-consumer
  - Structural (using the same hardware component)
• We will come across other types of dependencies later!
(53)
Load-Use Data Hazard
Need to stall for one cycle
(54)
Forwarding
[Figure: pipelined datapath with control signals]
IF: add $14, $8, $9 ID: or $13, $6, $7 EX: and $6, $4, $11 MEM: lw $11, 0($2) WB: lw $10, 9($1)
(55)
Load-Use Hazard Detection
• Check when using instruction is decoded in ID stage
• ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
• Load-use hazard when
  - ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt))
• If detected, stall and insert bubble
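The load-use check is a single boolean expression over fields of the ID/EX and IF/ID pipeline registers. The Python sketch below mirrors it directly; the function names and the keys returned by `stall_controls` are illustrative labels for this example, not signal names taken from the slides.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in EX is a load whose target register
    is read by the instruction currently being decoded in ID."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


def stall_controls(hazard):
    """What the hazard unit would request (illustrative key names):
    freeze PC and IF/ID, and turn the ID/EX control values into a bubble."""
    return {
        "pc_write": not hazard,
        "if_id_write": not hazard,
        "zero_id_ex_control": hazard,
    }


if __name__ == "__main__":
    # lw $11, 0($2) in EX (rt = 11), and $6, $4, $11 in ID (rs = 4, rt = 11)
    hazard = load_use_hazard(True, 11, 4, 11)
    print(hazard, stall_controls(hazard))
```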
(56)
Code Scheduling to Avoid Stalls
• Reorder code to avoid use of load result in the next instruction
• C code for A = B + E; C = B + F;

Original order (13 cycles):
    lw $t1, 0($t0)
    lw $t2, 4($t0)
    (stall)
    add $t3, $t1, $t2
    sw $t3, 12($t0)
    lw $t4, 8($t0)
    (stall)
    add $t5, $t1, $t4
    sw $t5, 16($t0)

Reordered (11 cycles):
    lw $t1, 0($t0)
    lw $t2, 4($t0)
    lw $t4, 8($t0)
    add $t3, $t1, $t2
    sw $t3, 12($t0)
    add $t5, $t1, $t4
    sw $t5, 16($t0)
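The 13-cycle and 11-cycle totals can be checked by counting load-use stalls. The Python sketch below assumes a 5-stage pipeline with full forwarding, so the only stall is one cycle when an instruction reads the destination of the lw directly before it; the tuple encoding and the names `count_stalls` and `total_cycles` are assumptions for this example.

```python
PIPELINE_DEPTH = 5

# (opcode, destination, sources) for the original ordering.
ORIGINAL = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw",  None,  ("$t5", "$t0")),
]
# Same instructions with the third lw hoisted above the first add.
REORDERED = [ORIGINAL[i] for i in (0, 1, 4, 2, 3, 5, 6)]


def count_stalls(program):
    """One stall per instruction that reads the result of the lw just before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1
    return stalls


def total_cycles(program):
    """Cycles to drain the pipeline: fill time plus one per instruction plus stalls."""
    return len(program) + (PIPELINE_DEPTH - 1) + count_stalls(program)


if __name__ == "__main__":
    print(total_cycles(ORIGINAL), total_cycles(REORDERED))
```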
(57)
How to Stall the Pipeline
• Force control values in ID/EX register to 0
  - EX, MEM and WB perform a nop (no-operation)
• Prevent update of PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for lw
    o Can subsequently forward to EX stage
(58)
Stall/Bubble in the Pipeline
Stall inserted here
(59)
Stall/Bubble in the Pipeline
Or, more accurately…
(60)
Datapath with Hazard Detection
Pipeline stage execution time
ALUSrc mux is missing!
(61)
Control Hazards (4.8)
• Branch instruction determines flow of control
  - Fetching next instruction depends on branch outcome
  - Pipeline cannot always fetch correct instruction
    o Still working on ID stage of branch
• In MIPS pipeline
  - Need to compare registers and determine the branch condition
(62)
Branch Hazards
• If branch outcome determined in MEM
[Figure: pipeline diagram for a branch resolved in MEM; labels: PC, “Flush these instructions (set control values to 0)”]
(63)
Reducing Branch Delay
• Move hardware to determine outcome to ID stage
  - Target address adder
  - Register comparator
  - Add IF.Flush signal to squash IF/ID register
• Example: branch taken
    36: sub $10, $4, $8
    40: beq $1, $3, 72
    44: and $12, $2, $5
    48: or $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
    ...
    72: lw $4, 50($7)
(64)
Example: Branch Taken
(65)
Example: Branch Taken
(66)
Data Hazards for Branches
• If a comparison register is a destination of 2nd or 3rd preceding ALU instruction
  - Can resolve using forwarding to ID
  - Need to add forwarding logic!

    add $4, $5, $6        IF ID EX MEM WB
    add $1, $2, $3           IF ID EX MEM WB
    …                           IF ID EX MEM WB
    beq $1, $4, target             IF ID EX MEM WB
(67)
Data Hazards for Branches
• If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
  - Need 1 stall cycle

    lw $1, addr           IF ID EX MEM WB
    add $4, $5, $6           IF ID EX MEM WB
    beq stalled                 IF ID
    beq $1, $4, target             ID EX MEM WB
(68)
Data Hazards for Branches
• If a comparison register is a destination of immediately preceding load instruction
  - Need 2 stall cycles

    lw $1, addr           IF ID EX MEM WB
    beq stalled              IF ID
    beq stalled                 ID
    beq $1, $0, target             ID EX MEM WB
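The three branch data-hazard cases reduce to one rule: how many instructions after the producer its result can first reach the comparator in ID. The Python sketch below encodes that rule for a pipeline that resolves branches in ID with forwarding, as described on these slides; the function name and the `ready_after` constants are an illustrative summary, not something stated in the lecture.

```python
def branch_stall_cycles(producer_kind, distance):
    """Stall cycles for a branch resolved in ID.

    producer_kind: "alu" or "load", the instruction writing a compared register.
    distance: 1 means the producer immediately precedes the branch.
    """
    if producer_kind not in ("alu", "load"):
        raise ValueError("producer_kind must be 'alu' or 'load'")
    if distance < 1:
        raise ValueError("distance must be at least 1")
    # An ALU result can be forwarded to ID two instructions later;
    # a load result only three instructions later (it comes out of MEM).
    ready_after = 2 if producer_kind == "alu" else 3
    return max(0, ready_after - distance)


if __name__ == "__main__":
    for kind in ("alu", "load"):
        for distance in (1, 2, 3):
            print(kind, distance, branch_stall_cycles(kind, distance))
```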
(69)
Delay Slot (MIPS)
• Expose pipeline
• Load and jump/branch entail a “delay slot”
• The instruction right after the jump or branch is executed before the jump/branch

    jal function_A
    add $4, $5, $6 ; executed before jmp
    lw $12, 8($4) ; executed after return

• Jump/branch and the delay slot instruction are considered “indivisible”
• In the delay slot, the compiler needs to schedule
  - A useful instruction (either before the jmp, or after the jmp w/o side effect)
  - otherwise a NOP
(70)
Branch Prediction
• Longer pipelines cannot readily determine branch outcome early
  - Stall penalty becomes unacceptable
• Predict outcome of branch
  - Only stall if prediction is wrong
• In MIPS pipeline
  - Can predict branches not taken
  - Fetch instruction after branch, with no delay
(71)
MIPS with Predict Not Taken
Prediction correct
Prediction incorrect
(72)
1-Bit Predictor: Shortcoming
• Inner loop branches mispredicted twice!
    outer: …
           …
    inner: …
           …
           beq …, …, inner
           …
           beq …, …, outer

  - Mispredict as taken on last iteration of inner loop
  - Then mispredict as not taken on first iteration of inner loop next time around
(73)
2-Bit Predictor: State Machine
• Only change prediction on two successive mispredictions
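A small simulation makes the difference between the 1-bit and 2-bit schemes concrete for the inner-loop branch. The 2-bit predictor below is a common saturating-counter variant, used here as an assumption; the state machine in the lecture figure may differ in its exact transitions. All function names and the initial predictor states are illustrative.

```python
def branch_outcomes(inner_iterations, outer_iterations):
    """Outcome stream of the inner-loop branch: taken until the last iteration."""
    one_pass = [True] * (inner_iterations - 1) + [False]
    return one_pass * outer_iterations


def mispredictions_1bit(outcomes, state=True):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken
    return misses


def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter (0..3): predict taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


if __name__ == "__main__":
    outcomes = branch_outcomes(inner_iterations=10, outer_iterations=5)
    print(mispredictions_1bit(outcomes), mispredictions_2bit(outcomes))
```

With 10 inner iterations and 5 outer iterations, the 1-bit predictor misses twice per pass after the first, while the 2-bit counter misses only on each loop exit.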
(74)
More-Realistic Branch Prediction
• Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    o Predict backward branches taken
    o Predict forward branches not taken
• Dynamic branch prediction
  - Hardware measures actual branch behavior
    o e.g., record recent history of each branch
  - Assume future behavior will continue the trend
    o When wrong, stall while re-fetching, and update history
(75)
AMD Bobcat
http://hothardware.com
Later in this course
ECE 6100
ECE 6100
Later in this course
Instruction Level Parallelism (ILP)
(76)
Intel Sandy Bridge
bdti.com
(77)
Exceptions and Interrupts (4.9)
• “Unexpected” events requiring change in flow of control
  - Different ISAs use the terms differently
• Exception
  - Arises within the CPU
    o e.g., undefined opcode, overflow, syscall, …
• Interrupt
  - From an external I/O controller
• Updates to the data path
  - Recording the cause of the exception and transferring control to the OS
  - Consider the impact of hardware modifications on the critical path
(78)
Handling Exceptions
• In MIPS, exceptions managed by a System Control Coprocessor (CP0)
• Save PC of offending (or interrupted) instruction
  - In MIPS: Exception Program Counter (EPC)
• Save indication of the problem
  - In MIPS: Cause register
  - We’ll assume 1-bit
    o 0 for undefined opcode, 1 for overflow
• Jump to handler at 80000180
(79)
Exception Handling: Operations
• Add two registers to the datapath
  - EPC and Cause registers
• Add a state for each exception condition
  - Use the ALU to compute the EPC contents
  - Write the Cause register with exception condition
  - Update the PC with OS handler address
  - Generate control signals for each operation
• See Appendix A.7 for details of MIPS 2000/3000 implementation
(80)
The OS Interactions
• The MIPS 32 Status and Cause Registers
  - Status Register: Interrupt Mask (bits 15-8), User mode (bit 4), Exception level (bit 1), Interrupt enable (bit 0)
  - Cause Register: Branch Delay (bit 31), Pending Interrupts (bits 15-8), Exception Code (bits 6-2)
• Operating System handlers interrogate these registers
• Manage all state saving requirements
• Read example in A.7
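As a rough illustration of how a handler might interrogate these registers, the sketch below extracts the named fields with shifts and masks. The bit positions are an assumption read off the tick marks in the register diagrams (31, 15, 8, 6, 2, 0 for Cause and 31, 15, 8, 4, 1, 0 for Status); the helper names `bits`, `decode_cause`, and `decode_status` are hypothetical.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def decode_cause(cause: int) -> dict:
    # Assumed layout: branch delay = bit 31, pending interrupts = 15..8,
    # exception code = 6..2
    return {
        "branch_delay": bits(cause, 31, 31),
        "pending_interrupts": bits(cause, 15, 8),
        "exception_code": bits(cause, 6, 2),
    }


def decode_status(status: int) -> dict:
    # Assumed layout: interrupt mask = 15..8, user mode = bit 4,
    # exception level = bit 1, interrupt enable = bit 0
    return {
        "interrupt_mask": bits(status, 15, 8),
        "user_mode": bits(status, 4, 4),
        "exception_level": bits(status, 1, 1),
        "interrupt_enable": bits(status, 0, 0),
    }


# Illustrative values only: exception code 12 with the branch-delay bit set
cause_fields = decode_cause((1 << 31) | (12 << 2))
status_fields = decode_status(0x0000FF13)
print(cause_fields)
print(status_fields)
```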
![Page 41: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/41.jpg)
(81)
An Alternate Mechanism
• Vectored Interrupts
  - Handler address determined by the cause
• Example:
  - Undefined opcode: C000 0000
  - Overflow: C000 0020
  - …: C000 0040
• Instructions either
  - Deal with the interrupt, or
  - Jump to real handler
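The address pattern in this example can be written as base + index × stride. The sketch below assumes a base of C000 0000 and a stride of 0x20 bytes, both read off the example addresses; the cause-to-index numbering in `CAUSE_INDEX` is hypothetical.

```python
VECTOR_BASE = 0xC0000000  # first entry in the slide's example
VECTOR_STRIDE = 0x20      # spacing between entries in the slide's example

# Hypothetical numbering of causes; the slide only lists the addresses
CAUSE_INDEX = {"undefined_opcode": 0, "overflow": 1}


def vector_address(cause: str) -> int:
    """Handler address is determined by the cause."""
    return VECTOR_BASE + CAUSE_INDEX[cause] * VECTOR_STRIDE


for name in CAUSE_INDEX:
    print(name, hex(vector_address(name)))
```

With 4-byte MIPS instructions, a 0x20-byte slot holds only 8 instructions, which is why the code at each entry either deals with the interrupt directly or jumps to the real handler.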
(82)
Handler Actions
• Read cause, and transfer to relevant handler
• Determine action required
• If restartable
  - Take corrective action
  - Use EPC to return to program
• Otherwise
  - Terminate program
  - Report error using EPC, cause, …
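The decision structure above can be sketched as a small dispatch function. This is only an outline: `handle_exception`, the `restartable` set, and the returned tuples are hypothetical, and the corrective action itself is left as a comment.

```python
def handle_exception(cause: int, epc: int, restartable: set) -> tuple:
    """Return ("resume", pc) for restartable causes, else ("terminate", report)."""
    if cause in restartable:
        # Corrective action would go here, then return via EPC
        return ("resume", epc)
    # Not restartable: terminate and report using EPC and cause
    return ("terminate", f"error: cause={cause} at EPC={epc:#010x}")


# Illustrative only: treat cause 1 as restartable and cause 0 as fatal
print(handle_exception(1, 0x4C, restartable={1}))
print(handle_exception(0, 0x4C, restartable={1}))
```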
![Page 42: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/42.jpg)
(83)
Exceptions in a Pipeline
• Another form of control hazard
• Consider overflow on add in EX stage
    add $1, $2, $1
  - Prevent $1 from being clobbered
  - Complete previous instructions
  - Flush add and subsequent instructions
  - Set Cause and EPC register values
  - Transfer control to handler
• Similar to mispredicted branch
  - Use much of the same hardware
(84)
Pipeline with Exceptions
![Page 43: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/43.jpg)
(85)
Exception Properties
• Restartable exceptions
  - Pipeline can flush the instruction
  - Handler executes, then returns to the instruction
    o Re-fetched and executed from scratch
• PC saved in EPC register
  - Identifies causing instruction
  - Actually PC + 4 is saved
    o Handler must adjust
(86)
Exception Example
• Exception on add in
    40 sub $11, $2, $4
    44 and $12, $2, $5
    48 or $13, $2, $6
    4C add $1, $2, $1
    50 slt $15, $6, $7
    54 lw $16, 50($7)
    …
• Handler
    80000180 sw $25, 1000($0)
    80000184 sw $26, 1004($0)
    …
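To see which instructions are affected when the add overflows in EX, the sketch below takes a snapshot of a classic five-stage pipeline (IF, ID, EX, MEM, WB) for this code sequence. It is a simplified model under stated assumptions: no stalls, one instruction per stage, and PC + 4 of the offending instruction saved in EPC; the function names are hypothetical.

```python
PROGRAM = [
    (0x40, "sub $11, $2, $4"),
    (0x44, "and $12, $2, $5"),
    (0x48, "or $13, $2, $6"),
    (0x4C, "add $1, $2, $1"),
    (0x50, "slt $15, $6, $7"),
    (0x54, "lw $16, 50($7)"),
]
HANDLER_ADDR = 0x80000180
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def snapshot(program, ex_index):
    """Stage occupancy in the cycle when program[ex_index] is in EX."""
    stages = {}
    # IF holds the instruction two behind EX in the stream, WB the one two ahead
    for offset, stage in zip((2, 1, 0, -1, -2), STAGES):
        i = ex_index + offset
        if 0 <= i < len(program):
            stages[stage] = program[i]
    return stages


def take_exception(program, ex_index):
    """Flush IF/ID/EX, let MEM/WB complete, and redirect fetch to the handler."""
    stages = snapshot(program, ex_index)
    return {
        "flushed": [stages[s][1] for s in ("IF", "ID", "EX") if s in stages],
        "completing": [stages[s][1] for s in ("MEM", "WB") if s in stages],
        "epc": program[ex_index][0] + 4,  # PC + 4 is saved; handler must adjust
        "next_pc": HANDLER_ADDR,
    }


result = take_exception(PROGRAM, ex_index=3)  # index 3 is the add at 0x4C
print(result)
```

In this model the sub has already left the pipeline, the and and or complete normally, and the add, slt, and lw are flushed before the first handler instruction is fetched.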
![Page 44: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/44.jpg)
(87)
Exception Example
(88)
Exception Example
![Page 45: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/45.jpg)
(89)
Multiple Exceptions
• Pipelining overlaps multiple instructions
  - Could have multiple exceptions at once
• Simple approach: deal with exception from earliest instruction
  - Flush subsequent instructions
  - “Precise” exceptions
• In complex pipelines
  - Multiple instructions issued per cycle
  - Out-of-order completion
  - Maintaining precise exceptions is difficult!
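For the simple approach, selecting the exception to service comes down to finding the earliest instruction. In an in-order five-stage pipeline the earliest instruction is the one furthest along, so a sketch can scan the stages from WB back to IF. The function name and the `pending` dictionary are hypothetical.

```python
STAGE_ORDER = ["IF", "ID", "EX", "MEM", "WB"]  # a later stage holds an earlier instruction


def select_exception(pending: dict):
    """pending maps a stage name to the cause raised there in this cycle.

    Returns (stage, cause) for the earliest instruction, or None if no
    exception is pending.
    """
    for stage in reversed(STAGE_ORDER):
        if stage in pending:
            return stage, pending[stage]
    return None


# Overflow in EX and an undefined opcode in ID in the same cycle:
# the instruction in EX is earlier in program order, so it is serviced first.
print(select_exception({"ID": "undefined opcode", "EX": "overflow"}))
```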
(90)
Imprecise Exceptions
• Just stop pipeline and save state
  - Including exception cause(s)
• Let the handler work out
  - Which instruction(s) had exceptions
  - Which to complete or flush
    o May require “manual” completion
• Simplifies hardware, but more complex handler software
• Not feasible for complex multiple-issue out-of-order pipelines
![Page 46: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/46.jpg)
(91)
Performance
• How do we assess the impact of stall cycles?
• How close do we approach the ideal of one instruction per cycle execution time?
• Back to the CPI model!
(92)
Recall: Program Execution time
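Program Execution Time ≈ Instruction_count × CPI_avg × clock_cycle_time

The three factors are influenced by algorithms/compiler, architecture, and technology, respectively.

$$\mathrm{CPI}_{avg} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \left( \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

The fraction in parentheses is the relative frequency of instruction class i, and n is the number of instruction classes.

$$\text{ExecutionTime} = \left( \sum_{i=1}^{n} C_i \times \mathrm{CPI}_i \right) \times \text{cycle\_time}$$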
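A short numeric sketch of the CPI model follows. The instruction mix, per-class CPI values, instruction count, and cycle time below are made-up illustration values, not measurements from the lecture; a class with CPI above 1 stands in for instructions that incur stall cycles.

```python
def cpi_avg(mix):
    """mix: list of (relative_frequency, cpi) pairs, one per instruction class."""
    total = sum(freq for freq, _ in mix)
    assert abs(total - 1.0) < 1e-9, "relative frequencies must sum to 1"
    return sum(freq * cpi for freq, cpi in mix)


def execution_time(instruction_count, cpi, cycle_time):
    """Execution time = instruction count x average CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time


# Hypothetical mix: 50% ALU (CPI 1), 25% loads (CPI 2 with stalls), 25% other (CPI 1)
mix = [(0.5, 1.0), (0.25, 2.0), (0.25, 1.0)]
avg = cpi_avg(mix)
time_ps = execution_time(1_000_000, avg, 200)  # cycle time in ps
print(avg, time_ps)
```

With these numbers the average CPI is 1.25, so stalls cost 25% over the ideal of one instruction per cycle.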
![Page 47: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/47.jpg)
(93)
Study Guide
• Given a code block and initial register values (those that are accessed), be able to determine the state of all pipeline registers at some future clock cycle.
• Determine the size of each pipeline register
• Track pipeline state in the case of forwarding and branches
• Compute the number of cycles to execute a code block
• Modify the datapath to include forwarding and hazard detection for branches (this is trickier and time consuming but well worth it)
(94)
Study Guide (cont.)
• Schedule code (manually) to improve performance, for example to eliminate hazards and fill delay slots
• Modify the data path to add new instructions such as j
• Modify the data path to accommodate a two cycle data memory access, i.e., the data memory itself is a two cycle pipeline
  - Modify the forwarding and hazard control logic
• Given a code sequence, be able to compute the number of stall cycles
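As a starting point for counting stall cycles, the sketch below handles one case only: load-to-use hazards in a five-stage pipeline with full forwarding, where a load followed immediately by a dependent instruction costs one stall. Branch penalties and other hazards are ignored, and the tuple encoding of instructions is a hypothetical convenience.

```python
def count_load_use_stalls(instrs):
    """instrs: list of (op, dest, sources) tuples.

    Counts one stall for each lw whose destination register is read by the
    very next instruction (full forwarding assumed).
    """
    stalls = 0
    for prev, cur in zip(instrs, instrs[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


def total_cycles(instrs, stages=5):
    """Fill time (stages - 1) + one cycle per instruction + stall cycles."""
    return (stages - 1) + len(instrs) + count_load_use_stalls(instrs)


code = [
    ("lw", "$1", ["$2"]),
    ("add", "$3", ["$1", "$4"]),   # uses $1 right after the load: 1 stall
    ("lw", "$5", ["$6"]),
    ("sub", "$7", ["$8", "$9"]),   # independent of $5: no stall
]
print(count_load_use_stalls(code), total_cycles(code))
```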
![Page 48: Pipelined Datapath - Professional Web Presencepwp.gatech.edu/ece-ece3056-sy/wp-content/uploads/sites/546/2019/02/pipelined-datapath.pdf11 (21) Pipelining Example Instruction memory](https://reader030.fdocuments.us/reader030/viewer/2022040522/5e7cf5a82acfb93925585047/html5/thumbnails/48.jpg)
(95)
Study Guide (cont.)
• Track the state of the 2-bit branch predictor over a sequence of branches in a code segment, for example a for-loop
• Show the pipeline state before and after an exception has taken place.
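For practice with the predictor exercise, the sketch below tracks a 2-bit predictor modeled as a saturating counter (states 0 and 1 predict not taken, states 2 and 3 predict taken). This is one common encoding and an assumption here; the state machine used in the lecture may label or connect its states differently.

```python
def simulate_2bit(outcomes, state=0):
    """Run a 2-bit saturating counter over a list of branch outcomes.

    outcomes: list of bools, True = branch taken.
    Returns (predictions, final_state, mispredictions).
    """
    predictions = []
    mispredictions = 0
    for taken in outcomes:
        predict_taken = state >= 2
        predictions.append(predict_taken)
        if predict_taken != taken:
            mispredictions += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at the ends
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return predictions, state, mispredictions


# Loop branch of a for-loop: taken 4 times, then falls through
loop = [True, True, True, True, False]
first = simulate_2bit(loop, state=0)
second = simulate_2bit(loop, state=first[1])  # same loop executed again
print(first)
print(second)
```

Starting from state 0, the first pass mispredicts three times (two warm-up misses plus the loop exit); the second pass starts in state 2 and mispredicts only the exit.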
(96)
Glossary
• Branch prediction
• Branch hazards
• Branch delay
• Control hazard
• Data hazard
• Delay slot
• Dynamic instruction issue
• Forwarding
• Imprecise exception
• Instruction scheduling
• Instruction level parallelism (ILP)
• Load-to-use hazard
• Pipeline bubbles
• Stall cycles
• Static instruction issue
• Structural hazard