EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.
Date posted: 20-Dec-2015
![Page 1: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/1.jpg)
EECS 470
Pipeline Control Hazards
Lecture 5
Coverage: Chapter 3 & Appendix A
![Page 2: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/2.jpg)
Pipeline function for BEQ
• Fetch: read instruction from memory
• Decode: read source operands from the register file
• Execute: calculate target address and test for equality
• Memory: send target to PC if the operands are equal
• Writeback: nothing left to do
![Page 3: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/3.jpg)
Control Hazards
beq 1 1 10
sub 3 4 5
Timeline (time runs left to right, one stage per cycle):
beq: fetch, decode, execute, memory, writeback
sub: (starts one cycle later) fetch, decode, execute
![Page 4: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/4.jpg)
Approaches to handling control hazards
• Avoidance
  – Make sure there are no hazards in the code
• Detect and Stall
  – Delay fetch until the branch is resolved
• Speculate and Squash if wrong
  – Go ahead and fetch more instructions in case the guess is correct, but stop them if they should not have been executed
![Page 5: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/5.jpg)
Handling branch hazards: avoid all hazards
• Don’t have branch instructions!
  – Maybe a little impractical
• Predication can eliminate some branches
  – If-conversion
  – Hyperblocks
![Page 6: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/6.jpg)
if-conversion
Source:
if (a == b) { x++; y = n / d; }
With a branch:
sub t1 a, b
jnz t1, PC+2
add x x, #1
div y n, d
With predicated instructions:
sub t1 a, b
add(t1) x x, #1
div(t1) y n, d
With conditional moves:
sub t1 a, b
add t2 x, #1
div t3 n, d
cmov(t1) x t2
cmov(t1) y t3
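A minimal Python sketch of the equivalence that the conditional-move version above relies on. The function names and the guarded divide are assumptions for illustration only; the slide does not say how a speculative `div` with `d == 0` is handled, and real hardware needs some way to keep it from faulting.

```python
def with_branch(a, b, x, y, n, d):
    """Original code: the body runs only when a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def with_cmov(a, b, x, y, n, d):
    """If-converted form: compute both results, then conditionally commit."""
    t1 = a - b                      # sub t1 a, b   (t1 == 0 means a == b)
    t2 = x + 1                      # add t2 x, #1  (always executed)
    t3 = n // d if d != 0 else 0    # div t3 n, d   (assumed non-faulting divide)
    x = t2 if t1 == 0 else x        # cmov(t1) x t2
    y = t3 if t1 == 0 else y        # cmov(t1) y t3
    return x, y
```

Both functions return the same `(x, y)` whenever `d` is nonzero; the if-converted form always pays for the add and the divide, which is the price of removing the branch.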
![Page 7: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/7.jpg)
Removing hazards by redefining a branch instruction
• Redefine branch instructions: ptbeq regA regB offset
  – "prepare to branch if equal"
  – If (R[regA] == R[regB]), execute the instructions at PC+1, PC+2, PC+3 and then continue at PC+1+offset
![Page 8: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/8.jpg)
ptbnz example
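Before:
t = 5
n = 7
g = c + 2
bnz g, PC + 1
m = 5
a = 3
After (independent instructions moved behind the branch, noop where nothing useful is left):
g = c + 2
bnz g, PC + 4
t = 5
n = 7
noop
m = 5
a = 3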
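The "prepare to branch" semantics can be sketched as a tiny interpreter. This is an illustrative model, not the course's simulator: the tuple encoding, the `set` and `noop` operations, and the `run` function are all hypothetical, and a ptbeq placed inside another ptbeq's delay slots is not modelled.

```python
DELAY_SLOTS = 3  # the slide's ptbeq always executes PC+1, PC+2 and PC+3


def run(program, regs):
    """Execute `program` and return the list of PCs in execution order.

    Hypothetical instruction tuples:
      ("set", reg, value)          regs[reg] = value
      ("noop",)                    do nothing
      ("ptbeq", regA, regB, off)   prepare to branch if regs[regA] == regs[regB]
    """
    trace = []
    pending = None  # (delay slots still to run, target PC) of a taken ptbeq
    pc = 0
    while pc < len(program):
        instr = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            slots_left, target = pending
            slots_left -= 1
            if slots_left == 0:
                next_pc, pending = target, None  # last delay slot: jump now
            else:
                pending = (slots_left, target)
        if instr[0] == "set":
            regs[instr[1]] = instr[2]
        elif instr[0] == "ptbeq":
            _, reg_a, reg_b, offset = instr
            if regs[reg_a] == regs[reg_b]:
                pending = (DELAY_SLOTS, pc + 1 + offset)
        pc = next_pc
    return trace


program = [
    ("ptbeq", "r1", "r2", 5),  # PC 0: target is PC+1+offset = 6
    ("set", "t", 5),           # PC 1: delay slot
    ("set", "n", 7),           # PC 2: delay slot
    ("noop",),                 # PC 3: delay slot with nothing useful to do
    ("set", "m", 5),           # PC 4: skipped when the branch is taken
    ("set", "a", 3),           # PC 5: skipped when the branch is taken
    ("set", "z", 1),           # PC 6: branch target
]
taken_trace = run(program, {"r1": 4, "r2": 4})
not_taken_trace = run(program, {"r1": 4, "r2": 9})
```

With equal registers the three delay-slot instructions run before control reaches PC 6; with unequal registers every instruction runs in order.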
![Page 9: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/9.jpg)
Problems with this solution
• Old programs (legacy code) may not run correctly on new implementations
  – Longer pipelines tend to need more noops
• Programs get larger as noops are included
  – Especially a problem for machines that try to execute more than one instruction every cycle
  – Harder to find useful instructions
• Program execution is slower
  – CPI is one, but some of the instructions are noops
![Page 10: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/10.jpg)
Handling control hazards: detect and stall
• Detection:
  – Must wait until decode
  – Compare opcode to beq or jalr
  – Alternately, this is just another control signal
• Stall:
  – Keep the current instruction in fetch
  – Pass noop to decode stage (not execute!)
![Page 11: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/11.jpg)
[Figure: five-stage pipeline datapath (PC, instruction memory, register file, ALU, data memory, sign extend, control; pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB), annotated with "bnz r1"]
![Page 12: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/12.jpg)
[Figure: the same pipeline datapath with an added MUX, annotated with "noop"]
![Page 13: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/13.jpg)
Control Hazards
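beq 1 1 10
sub 3 4 5
Timeline (time runs left to right, one stage per cycle):
beq: fetch, decode, execute, memory, writeback
sub: fetch, fetch, fetch (held in fetch while the branch resolves), then fetch of either sub or Target: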
![Page 14: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/14.jpg)
Problems with detect and stall
• CPI increases every time a branch is detected!
• Is that necessary? Not always!
  – Only about ½ of the time is the branch taken
• Let’s assume that it is NOT taken…
  – In this case, we can ignore the beq (treat it like a noop)
  – Keep fetching PC + 1
• What if we are wrong?
  – OK, as long as we do not COMPLETE any instructions we mistakenly executed (i.e. don’t perform writeback)
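The argument above can be put in rough numbers. The sketch below assumes a 3-cycle penalty (the branch resolves in the memory stage, so three fetch slots are lost) and a branch frequency of 20%; the 20% figure and the names `cpi`, `BRANCH_FRACTION`, `PENALTY_CYCLES` and `TAKEN_FRACTION` are illustrative assumptions, not values from the lecture.

```python
def cpi(branch_fraction, penalized_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when some branches cost extra cycles."""
    return base_cpi + branch_fraction * penalized_fraction * penalty_cycles


BRANCH_FRACTION = 0.20  # assumed share of instructions that are branches
PENALTY_CYCLES = 3      # assumed: fetch slots lost until the branch resolves
TAKEN_FRACTION = 0.5    # "only about 1/2 of the time is the branch taken"

# Detect and stall: every branch pays the penalty.
cpi_stall = cpi(BRANCH_FRACTION, 1.0, PENALTY_CYCLES)

# Assume not taken: only the taken branches pay the penalty.
cpi_speculate = cpi(BRANCH_FRACTION, TAKEN_FRACTION, PENALTY_CYCLES)
```

Under these assumptions the stalling pipeline runs at a CPI of about 1.6 and the assume-not-taken pipeline at about 1.3.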
![Page 15: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/15.jpg)
Handling control hazards: speculate and squash
• Speculate: assume not equal
  – Keep fetching from PC+1 until we know that the branch is really taken
• Squash: stop bad instructions if taken
  – Send a noop to:
    • Decode, Execute and Memory
  – Send target address to PC
![Page 16: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/16.jpg)
[Figure: pipeline datapath with an "equal" signal and an added MUX, showing the instruction sequence beq, sub, add, nand moving through the stages and noops replacing the instructions behind the beq]
![Page 17: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/17.jpg)
Problems with fetching PC+1
• CPI increases every time a branch is taken!
  – About ½ of the time
• Is that necessary?
  – No! But how can you fetch from the target before you even know the previous instruction is a branch, much less whether it is taken?
![Page 18: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/18.jpg)
[Figure: pipeline datapath with an added MUX in fetch, annotated with "beq", "bpc", "target" and an "eq?" comparison]
![Page 19: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/19.jpg)
Branch Target Buffer
Fetch PC → send PC to BTB → found?
• Yes: use target (predicted target PC)
• No: use PC+1
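The lookup flow above can be sketched with a dictionary standing in for the hardware table. This is a simplified model under stated assumptions: a real BTB is a small, fixed-size tagged structure, and the class name, method names and the evict-on-not-taken policy here are hypothetical choices for illustration.

```python
class BranchTargetBuffer:
    """Maps the PC of a branch to the target it is predicted to jump to."""

    def __init__(self):
        self._targets = {}  # fetch PC -> predicted target PC (unbounded here)

    def predict_next_pc(self, fetch_pc):
        """Found in the BTB: use the stored target. Not found: use PC+1."""
        return self._targets.get(fetch_pc, fetch_pc + 1)

    def update(self, branch_pc, taken, target_pc):
        """Record the resolved outcome of a branch (assumed policy)."""
        if taken:
            self._targets[branch_pc] = target_pc
        else:
            self._targets.pop(branch_pc, None)  # predict PC+1 next time


btb = BranchTargetBuffer()
first_guess = btb.predict_next_pc(40)      # nothing stored yet
btb.update(40, taken=True, target_pc=12)   # branch at PC 40 jumped to 12
second_guess = btb.predict_next_pc(40)     # now predicted from the BTB
```

The key property is that the prediction needs only the fetch PC, so it is available before the instruction has even been decoded.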
![Page 20: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/20.jpg)
Branch prediction
• Predict not taken: ~50% accurate
  – No BTB needed; always use PC+1
• Predict backward taken: ~65% accurate
  – BTB holds targets for backward branches (loops)
• Predict same as last time: ~80% accurate
  – Update BTB for any taken branch
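To make the three policies concrete, here is a small sketch that scores each one on a synthetic trace. The trace (one backward loop branch, taken nine times and then not taken, repeated ten times) and all function names are assumptions for illustration; the resulting accuracies describe only this trace and are not the measured figures quoted above.

```python
def predict_not_taken(pc, target, last_outcome):
    return False


def predict_backward_taken(pc, target, last_outcome):
    return target < pc  # backward branches are usually loop branches


def predict_same_as_last(pc, target, last_outcome):
    return last_outcome.get(pc, False)  # no history yet: guess not taken


def accuracy(predictor, trace):
    """Fraction of branches in `trace` whose direction is guessed correctly."""
    last_outcome = {}  # branch PC -> direction seen last time
    correct = 0
    for pc, target, taken in trace:
        if predictor(pc, target, last_outcome) == taken:
            correct += 1
        last_outcome[pc] = taken
    return correct / len(trace)


# Hypothetical trace of (branch PC, target PC, taken?) tuples.
one_loop = [(10, 2, True)] * 9 + [(10, 2, False)]
trace = one_loop * 10

acc_not_taken = accuracy(predict_not_taken, trace)
acc_backward = accuracy(predict_backward_taken, trace)
acc_last_time = accuracy(predict_same_as_last, trace)
```

On this trace "same as last time" mispredicts twice per loop, once on entry and once on exit, which is the weakness that larger predictors try to remove.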
![Page 21: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/21.jpg)
What about indirect branches?
• Could use same approach
  – PC+1 unlikely indirect target
  – Indirect jumps often have multiple targets (for same instruction)
    • Switch statements
    • Virtual function calls
    • Shared library (DLL) calls
![Page 22: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/22.jpg)
Indirect jump: Special Case
• Return address stack
  – Function returns have deterministic behavior (usually)
    • Return to different locations (BTB doesn’t work well)
    • Return location known ahead of time
      – In some register at the time of the call
  – Build a specialized structure for return addresses
    • Call instructions write return address to R31 AND RAS
    • Return instructions pop predicted target off stack
  – Issues: finite size (save or forget on overflow?)
  – Issues: long jumps (clear when wrong?)
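A return address stack can be sketched in a few lines. The class below is a hypothetical model: it resolves the overflow question by forgetting the oldest entry, which is only one of the options raised above, and the default capacity is arbitrary.

```python
class ReturnAddressStack:
    """Predicts return targets by mirroring the call/return nesting."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._stack = []

    def on_call(self, return_pc):
        """A call pushes the address of the instruction after it."""
        if len(self._stack) == self.capacity:
            self._stack.pop(0)  # overflow: forget the oldest entry (assumed)
        self._stack.append(return_pc)

    def predict_return(self):
        """A return pops its predicted target; None means no prediction."""
        return self._stack.pop() if self._stack else None


ras = ReturnAddressStack(capacity=2)
ras.on_call(100)   # outer call returns to PC 100
ras.on_call(200)   # nested call returns to PC 200
ras.on_call(300)   # third level overflows and drops the entry for 100
predictions = [ras.predict_return() for _ in range(3)]
```

The two innermost returns are predicted correctly, while the outermost one has been forgotten and falls back to whatever the rest of the fetch logic predicts.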
![Page 23: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/23.jpg)
Branch prediction
• Pentium: ~85% accurate
• Pentium Pro: ~92% accurate
• Best paper designs: ~96% accurate
![Page 24: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/24.jpg)
Costs of branch prediction/speculation
• Performance costs?
  – Minimal: no difference between waiting and squashing, and it is a huge gain when the prediction is correct!
• Power?
  – Large: in very long/wide pipelines many instructions can be squashed
    • Squashed = # mispredictions × pipeline length/width before target resolved
• Area?
  – Can be large: predictors can get very big, as we will see next time
• Complexity?
  – Designs are more complex
  – Testing becomes more difficult
![Page 25: EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.](https://reader036.fdocuments.us/reader036/viewer/2022062407/56649d415503460f94a1c0f0/html5/thumbnails/25.jpg)
What else can be speculated?
• Dependencies
  – I think this data is coming from that store instruction
• Values
  – I think I will load a 0 value
• Accuracy?
  – Branch prediction (direction) is Boolean (T, NT)
  – Branch targets are stable or predictable (RAS)
  – Dependencies are limited
  – Values cover a huge space (0 – 4B)