Anne Bracy CS 3410• Add pipeline registers (flip-flops)for isolation • Each stage begins by...
Anne Bracy, CS 3410
Computer Science, Cornell University
The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.
Single-cycle
insn0.fetch, dec, exec | insn1.fetch, dec, exec

Pipelined
insn0.fetch | insn0.dec  | insn0.exec
            | insn1.fetch | insn1.dec  | insn1.exec
5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards
[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Components: PC, +4 adder, instruction memory, control, register file, immediate extend, ALU, jump/branch target computation, data memory (din, dout, addr).]
Cycle   1    2    3    4    5    6    7    8    9
add     IF   ID   EX   MEM  WB
nand         IF   ID   EX   MEM  WB
lw                IF   ID   EX   MEM  WB
add                    IF   ID   EX   MEM  WB
sw                          IF   ID   EX   MEM  WB

Latency: 5 cycles
Throughput: 1 insn/cycle (CPI = 1)

• Break datapath into multiple cycles (here 5)
  • Parallel execution increases throughput
  • Balanced pipeline very important
    • Slowest stage determines clock rate
    • Imbalance kills performance
• Add pipeline registers (flip-flops) for isolation
  • Each stage begins by reading values from latch
  • Each stage ends by writing values to latch
• Resolve hazards
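The timing diagram above can be reproduced with a few lines of Python. This is only a sketch of an ideal pipeline with no hazards; the function names `pipeline_cycles` and `timing_rows` are made up for illustration and are not part of the course material.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

def pipeline_cycles(num_insns, num_stages=len(STAGES)):
    """Total cycles for num_insns on an ideal pipeline (no stalls).

    The first instruction needs num_stages cycles; every later
    instruction finishes one cycle after its predecessor.
    """
    if num_insns == 0:
        return 0
    return num_stages + (num_insns - 1)

def timing_rows(insns, stages=STAGES):
    """Map each instruction to the cycle in which it occupies each stage."""
    rows = []
    for i, name in enumerate(insns):
        # Instruction i enters the first stage in cycle i + 1.
        rows.append((name, {stage: i + 1 + s for s, stage in enumerate(stages)}))
    return rows

program = ["add", "nand", "lw", "add", "sw"]
for name, cycles in timing_rows(program):
    print(f"{name:5s}", cycles)
print("total cycles:", pipeline_cycles(len(program)))
```

For the five instructions in the diagram this gives 9 cycles in total, with `sw` reaching WB in cycle 9.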
Stage | Perform Functionality | Latch values of interest
Fetch | Use PC to index Program Memory, increment PC | Instruction bits (to be decoded), PC+4 (to compute branch targets)
Decode | Decode instruction, generate control signals, read register file | Control information, Rd index, immediates, offsets, register values (Ra, Rb), PC+4 (to compute branch targets)
Execute | Perform ALU operation. Compute targets (PC+4+offset, etc.) in case this is a branch, decide if branch taken | Control information, Rd index, etc. Result of ALU operation, value in case this is a store instruction
Memory | Perform load/store if needed, address is ALU result | Control information, Rd index, etc. Result of load, pass result from execute
Writeback | Select value, write to register file |
Stage 1: Instruction Fetch
[Figure: PC addresses instruction memory (mc 00 = read word); inst and PC+4 are latched into IF/ID; pc-sel chooses the next PC.]
pc-sel inputs:
• PC+4
• pc-reg (PC registers: JR)
• pc-rel (PC-relative: BEQ, BNE)
• pc-abs (PC absolute: J and JAL)

Stage 2: Instruction Decode
[Figure: decode logic, register file (read ports Ra, Rb; write port Rd, WE, result, dest), and immediate extend between IF/ID and ID/EX; ctrl, A, B, imm, and PC+4 are latched into ID/EX.]

Stage 3: Execute
[Figure: ALU and branch-target adder between ID/EX and EX/MEM; produces D, B, target, branch?, pc-sel, and pc-reg; ctrl is passed on.]

Stage 4: Memory
[Figure: data memory (din, dout, addr, mc) between EX/MEM and MEM/WB; target, branch?, and pc-sel feed back to fetch as pc-rel, pc-abs, pc-reg; D and M are latched into MEM/WB.]

[Figure: MEM/WB supplies result and dest back to the register file.]

[Figure: complete five-stage pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB registers carrying inst, PC+4, OP, A, B, imm, Rt, Rd, D, M, and target.]
Consider a non-pipelined processor with clock period C (e.g., 50 ns). If you divide the processor into N stages (e.g., 5), your new clock period will be:
A. C
B. N
C. less than C/N
D. C/N
E. greater than C/N
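One way to reason about this question is to model the pipelined clock period as the slowest stage plus the cost of latching into a pipeline register. The sketch below assumes a fixed per-stage latch overhead; the 1 ns overhead and the unbalanced stage delays are hypothetical values chosen only for illustration.

```python
def pipelined_clock_period(stage_delays_ns, latch_overhead_ns):
    """Clock period = slowest stage + pipeline-register overhead (assumed model)."""
    return max(stage_delays_ns) + latch_overhead_ns

C = 50.0   # non-pipelined clock period from the slide, in ns
N = 5      # number of stages from the slide

# Perfectly balanced stages: each takes C/N, plus a hypothetical 1 ns latch cost.
balanced = [C / N] * N
balanced_period = pipelined_clock_period(balanced, latch_overhead_ns=1.0)

# Hypothetical unbalanced split of the same 50 ns of logic.
unbalanced = [5.0, 10.0, 15.0, 10.0, 10.0]
unbalanced_period = pipelined_clock_period(unbalanced, latch_overhead_ns=1.0)

print("ideal C/N        :", C / N)
print("balanced period  :", balanced_period)
print("unbalanced period:", unbalanced_period)
```

Under these assumptions both periods come out above C/N, and the unbalanced split is worse because the slowest stage sets the clock.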
• Instructions same length
  • 32 bits, easy to fetch and then decode
• 3 types of instruction formats
  • Easy to route bits between stages
  • Can read a register source before even knowing what the instruction is
• Memory access through lw and sw only
  • Access memory after ALU
5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

add  r3 ← r1, r2
nand r6 ← r4, r5
lw   r4 ← 20(r2)
add  r5 ← r2, r5
sw   r7 → 12(r3)

Assume 8-register machine
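As a cross-check on the walkthrough, the five instructions can be executed one at a time in Python. This is a sketch of the architectural result only, not of the pipeline itself. The initial register values and the memory word 99 at address 29 are read off the worked example's figures; the tuple encoding of the instructions is an assumption made for this example.

```python
def run(program, regs, mem):
    """Execute the tiny ISA used in the worked example, one instruction at a time."""
    for op, x, y, z in program:
        if op == "add":      # add rX <- rY, rZ
            regs[x] = regs[y] + regs[z]
        elif op == "nand":   # nand rX <- rY, rZ
            regs[x] = ~(regs[y] & regs[z])
        elif op == "lw":     # lw rX <- Y(rZ)
            regs[x] = mem[regs[z] + y]
        elif op == "sw":     # sw rX -> Y(rZ)
            mem[regs[z] + y] = regs[x]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs, mem

# R0..R7 as shown in the initial register file of the worked example.
regs = [0, 36, 9, 12, 18, 7, 41, 22]
mem = {29: 99}   # word that lw reads at address 20 + r2

program = [
    ("add", 3, 1, 2),    # add  r3 <- r1, r2
    ("nand", 6, 4, 5),   # nand r6 <- r4, r5
    ("lw", 4, 20, 2),    # lw   r4 <- 20(r2)
    ("add", 5, 2, 5),    # add  r5 <- r2, r5
    ("sw", 7, 12, 3),    # sw   r7 -> 12(r3)
]

regs, mem = run(program, regs, mem)
print(regs)
print(mem)
```

The pipelined execution must leave the register file and memory in exactly this state; that is what "correctness" means once hazards enter the picture.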
[Figure sequence: cycle-by-cycle snapshots (Time 0 to Time 9) of the pipelined datapath running the program above. Components shown: PC, instruction memory, register file, ALU, data memory, MUXes, sign extend, and the IF/ID, ID/EX, EX/MEM, MEM/WB registers.]

Initial register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22

Time | IF | ID | EX | MEM | WB
0 | nop | nop | nop | nop | nop
1 | add 3 1 2 | nop | nop | nop | nop
2 | nand 6 4 5 | add 3 1 2 | nop | nop | nop
3 | lw 4 20(2) | nand 6 4 5 | add 3 1 2 | nop | nop
4 | add 5 2 5 | lw 4 20(2) | nand 6 4 5 | add 3 1 2 | nop
5 | sw 7 12(3) | add 5 2 5 | lw 4 20(2) | nand 6 4 5 | add 3 1 2
6 | nop | sw 7 12(3) | add 5 2 5 | lw 4 20(2) | nand 6 4 5
7 | nop | nop | sw 7 12(3) | add 5 2 5 | lw 4 20(2)
8 | nop | nop | nop | sw 7 12(3) | add 5 2 5
9 | nop | nop | nop | nop | sw 7 12(3)

Fetch order: add 3 1 2 (Time 1), nand 6 4 5 (Time 2), lw 4 20(2) (Time 3), add 5 2 5 (Time 4), sw 7 12(3) (Time 5), then no more instructions.

Values computed along the way:
• add 3 1 2: 36 + 9 = 45, written to R3
• nand 6 4 5: 18 = 010010, 7 = 000111, result 111101 = -3, written to R6
• lw 4 20(2): address 9 + 20 = 29, loads 99 into R4
• add 5 2 5: 9 + 7 = 16, written to R5
• sw 7 12(3): address 45 + 12 = 57, stores 22

Final register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 16, R6 = -3, R7 = 22

Slides thanks to Sally McKee
Pipelining is great because:
A. You can fetch and decode the same instruction at the same time.
B. You can fetch two instructions at the same time.
C. You can fetch one instruction while decoding another.
D. Instructions only need to visit the pipeline stages that they require.
E. C and D
5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

Correctness problems associated w/ processor design

1. Structural hazards
   Same resource needed for different purposes at the same time (Possible: ALU, Register File, Memory)
2. Data hazards
   Instruction output needed before it's available
3. Control hazards
   Next instruction PC unknown at time of Fetch
add r3, r2, r1
nop
nop
add r6, r5, r4

[Figure: pipelined datapath (inst mem, register file, data mem) with the four instructions above staggered through IF ID Ex M W; the first add is in W while the second add is in ID.]

Problem: Need to read from and write to Register File at the same time
Solution: negate RF clock: write first half, read second half
Dependence: relationship between two insns
• Data: two insns use same storage location
• Control: 1 insn affects whether another executes at all
• Not a bad thing, programs would be boring otherwise
• Enforced by making older insn go before younger one
  – Happens naturally in single-/multi-cycle designs
  – But not in a pipeline

Hazard: dependence & possibility of wrong insn order
• Effects of wrong insn order cannot be externally visible
• Hazards are a bad thing: most solutions either complicate the hardware or reduce performance
Data Hazards
• register file (RF) reads occur in stage 2 (ID)
• RF writes occur in stage 5 (WB)
• RF written in first ½ of cycle, read in second ½ of cycle

x10: add r3 ← r1, r2
x14: sub r5 ← r3, r4

1. Is there a dependence?
2. Is there a hazard?

A) Yes
B) No
C) Cannot tell with the information given.

Which of the following statements is true?
A. Whether there is a data dependence between two instructions depends on the machine the program is running on.
B. Whether there is a data hazard between two instructions depends on the machine the program is running on.
C. Both A & B
D. Neither A nor B
[Figure: pipeline timing over clock cycles 1 to 9 for the sequence below, with the data dependences drawn as arrows between stages.]

add r3, r1, r2
sub r5, r3, r4
lw  r6, 4(r3)
or  r5, r3, r5
sw  r6, 12(r3)

backwards arrows require time travel
Detecting Data Hazards

[Figure: pipelined datapath with comparators checking IF/ID.Ra against ID/EX.Rd and EX/MEM.Rd, plus a check that IF/ID.Ra ≠ 0.]

add r3, r1, r2
sub r5, r3, r4

Stall = (IF/ID.Ra != 0 && (IF/ID.Ra == ID/EX.Rd || IF/ID.Ra == EX/M.Rd))
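The stall condition above translates directly into a predicate. The sketch below is a plain-Python rendering of that one formula; the function name and parameter names are invented for illustration, and like the slide it only checks the Ra source (a real detector would repeat the test for Rb).

```python
def must_stall(if_id_ra, id_ex_rd, ex_m_rd):
    """Stall = IF/ID.Ra != 0 && (IF/ID.Ra == ID/EX.Rd || IF/ID.Ra == EX/M.Rd)."""
    return if_id_ra != 0 and (if_id_ra == id_ex_rd or if_id_ra == ex_m_rd)

# add r3, r1, r2 is in EX (ID/EX.Rd = 3) while sub r5, r3, r4 is in ID (Ra = 3).
print(must_stall(if_id_ra=3, id_ex_rd=3, ex_m_rd=0))   # hazard: stall

# One cycle later add has moved to MEM (EX/M.Rd = 3); sub still has to wait.
print(must_stall(if_id_ra=3, id_ex_rd=0, ex_m_rd=3))   # hazard: stall

# Register 0 never causes a stall, and unrelated destinations do not either.
print(must_stall(if_id_ra=0, id_ex_rd=0, ex_m_rd=0))
print(must_stall(if_id_ra=3, id_ex_rd=5, ex_m_rd=6))
```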
1. Do Nothing
   • Change the ISA to match implementation
   • "Hey compiler: don't create code w/ data hazards!"
     (We can do better than this)
2. Stall
   • Pause current and subsequent instructions till safe
3. Forward/bypass
   • Forward data value to where it is needed
     (Only works if value actually exists already)

How to stall an instruction in ID stage
• prevent IF/ID pipeline register update
  – stalls the ID stage instruction
• convert ID stage insn into nop for later stages
  – innocuous "bubble" passes through pipeline
• prevent PC update
  – stalls the next (IF stage) instruction
[Figure: pipelined datapath with a "detect hazard" unit comparing register indices held in IF/ID, ID/EX, EX/MEM, and MEM/WB.]

add r3, r1, r2
sub r5, r3, r5
or  r6, r3, r4
add r6, r3, r8

If hazard: WE=0, MemWr=0, RegWr=0
[Figure sequence: three snapshots of the datapath stalling sub r5, r3, r5 behind add r3, r1, r2, with or r6, r3, r4 waiting in fetch.]

NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA == ID/Ex.Rd || IF/ID.rA == Ex/M.Rd))

Snapshot 1: sub r5, r3, r5 is in decode, one stage behind add r3, r1, r2; or r6, r3, r4 is in fetch. STALL CONDITION MET: IF/ID is held (WE=0) and a nop is sent on (MemWr=0, RegWr=0).
Snapshot 2: add has advanced one stage with a nop behind it; sub is still in decode. STALL CONDITION MET: IF/ID is held (WE=0) and a second nop is sent on (MemWr=0, RegWr=0).
Snapshot 3: add has advanced again, followed by two nops. NO STALL CONDITION MET (WE=1): sub allowed to leave decode stage.
Clock cycle       1    2    3    4    5    6    7    8
add r3, r1, r2    IF   ID   Ex   M    W
sub r5, r3, r5         IF   ID*  ID*  ID   Ex   M    W
or  r6, r3, r4              IF*  IF*  IF   ID   Ex   M
add r6, r3, r8                             IF   ID   Ex

r3 = 10 before add writes back, r3 = 20 after
2 Stall Cycles
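The two stall cycles can be derived mechanically. The sketch below schedules an in-order pipeline that stalls in decode, with no forwarding, under the rule stated earlier that the register file is written in the first half of a cycle and read in the second half, so a consumer may finish decode in the same cycle its producer is in write-back. The `schedule` function and its tuple encoding are assumptions made for this example.

```python
def schedule(program):
    """program: list of (text, dest_reg, source_regs). Returns one dict per insn."""
    wb_of_reg = {}          # register -> cycle in which its latest producer is in WB
    rows = []
    next_fetch = 1          # cycle in which the next instruction enters IF
    prev_decode_done = 0    # last ID cycle of the previous instruction
    for text, dest, sources in program:
        fetch = next_fetch
        # ID is free only after the previous instruction has left it.
        enter_id = max(fetch + 1, prev_decode_done + 1)
        # Every source must have reached WB (r0 is never a hazard).
        ready = max([wb_of_reg.get(r, 0) for r in sources if r != 0], default=0)
        decode_done = max(enter_id, ready)
        writeback = decode_done + 3          # Ex, M, W follow decode
        if dest != 0:
            wb_of_reg[dest] = writeback
        rows.append({"insn": text, "fetch": fetch, "decode": decode_done,
                     "writeback": writeback, "stalls": decode_done - enter_id})
        next_fetch = enter_id                # IF frees up when this insn enters ID
        prev_decode_done = decode_done
    return rows

program = [
    ("add r3, r1, r2", 3, (1, 2)),
    ("sub r5, r3, r5", 5, (3, 5)),
    ("or  r6, r3, r4", 6, (3, 4)),
    ("add r6, r3, r8", 6, (3, 8)),
]
rows = schedule(program)
for row in rows:
    print(row)
print("total stall cycles:", sum(row["stalls"] for row in rows))
```

Only `sub` stalls in decode; `or` and the second `add` are merely delayed behind it, which matches the IF* entries in the diagram.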
1. Do Nothing
   • Change the ISA to match implementation
   • "Compiler: don't create code with data hazards!"
     (Nice try, we can do better than this)
2. Stall
   • Pause current and subsequent instructions till safe
3. Forward/bypass
   • Forward data value to where it is needed
     (Only works if value actually exists already)
[Figure: pipelined datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with a forward unit feeding the ALU inputs and a detect hazard unit.]

Two types of forwarding/bypass
• Forwarding from Ex/Mem registers to Ex stage (M → Ex)
• Forwarding from Mem/WB register to Ex stage (W → Ex)
add r3, r1, r2      IF   ID   Ex   M    W
sub r5, r3, r1           IF   ID   Ex   M    W

[Figure: datapath with a bypass from the Ex/Mem register back to the ALU input.]

Problem: EX needs ALU result that is in MEM stage
Solution: add a bypass from EX/MEM.D to start of EX

Detection Logic in Ex Stage:
forward = (Ex/M.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd) || (same for Rb)
add r3, r1, r2      IF   ID   Ex   M    W
sub r5, r3, r1           IF   ID   Ex   M    W
or  r6, r3, r4                IF   ID   Ex   M

[Figure: datapath with a bypass from the Mem/WB register back to the ALU input.]

Problem: EX needs value being written by WB
Solution: Add bypass from WB final value to start of EX

Detection Logic:
forward = (M/WB.WE && M/WB.Rd != 0 && ID/Ex.Ra == M/WB.Rd &&
           not (Ex/M.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd))
          || (same for Rb)
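The two detection rules combine into a single operand selector in which the Ex/Mem bypass wins over the Mem/WB bypass, since Ex/Mem holds the more recent value. The following is a minimal sketch for one ALU operand; the function name, argument names, and string labels are hypothetical, and the same check would be repeated for Rb.

```python
def forward_select(id_ex_ra, ex_m_we, ex_m_rd, m_wb_we, m_wb_rd):
    """Pick the source of one ALU operand: 'EX/MEM', 'MEM/WB', or 'REGFILE'."""
    # M -> Ex bypass: the instruction one ahead is about to write this register.
    if ex_m_we and ex_m_rd != 0 and id_ex_ra == ex_m_rd:
        return "EX/MEM"
    # W -> Ex bypass: only used when the newer Ex/Mem value does not apply.
    if m_wb_we and m_wb_rd != 0 and id_ex_ra == m_wb_rd:
        return "MEM/WB"
    # No hazard: use the value read from the register file in decode.
    return "REGFILE"

# sub r5, r3, r1 in Ex while add r3, r1, r2 sits in Ex/Mem.
print(forward_select(3, ex_m_we=True, ex_m_rd=3, m_wb_we=False, m_wb_rd=0))

# or r6, r3, r4 in Ex: sub (Rd = 5) is in Ex/Mem, add (Rd = 3) is in Mem/WB.
print(forward_select(3, ex_m_we=True, ex_m_rd=5, m_wb_we=True, m_wb_rd=3))

# Both older instructions write r3: the Ex/Mem copy is the newest one.
print(forward_select(3, ex_m_we=True, ex_m_rd=3, m_wb_we=True, m_wb_rd=3))
```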
[Figure: pipeline timing for the sequence below with forwarding; each instruction starts one cycle after the previous one and runs IF ID Ex M W without stalling.]

add r3, r1, r2
sub r5, r3, r4
lw  r6, 4(r3)
or  r5, r3, r5
sw  r6, 12(r3)
Data dependency after a load instruction:
• Value not available until after the M stage
→ Next instruction cannot proceed if dependent

THE KILLER HAZARD
lw r4, 20(r8)
or r6, r4, r1

[Figure sequence: datapath snapshots showing that the loaded value is not available when or reaches Ex; a NOP is inserted between lw and or.]

lw r4, 20(r8)     IF   ID   Ex   M    W
or r6, r3, r4          IF   ID*  ID   Ex   M    W

Stall
[Figure: pipelined datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with forward unit and detect hazard unit.]

Stall = If(ID/Ex.MemRead && IF/ID.Ra == ID/Ex.Rd
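A load-use detector along the lines of the condition above might look like the sketch below. The slide's formula is cut off after the Ra comparison, so the Rb comparison and the check that the destination is not register 0 are assumptions added here, as are the function and parameter names.

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_ra, if_id_rb):
    """True when the insn in decode reads the register a load in Ex will produce.

    Assumptions beyond the slide: Rb is checked the same way as Ra,
    and a destination of register 0 never triggers a stall.
    """
    return bool(id_ex_mem_read
                and id_ex_rd != 0
                and id_ex_rd in (if_id_ra, if_id_rb))

# lw r4, 20(r8) in Ex, or r6, r4, r1 in decode: must stall one cycle.
print(load_use_stall(True, 4, if_id_ra=4, if_id_rb=1))

# Same load, but the dependent register is the second source (or r6, r3, r4).
print(load_use_stall(True, 4, if_id_ra=3, if_id_rb=4))

# An ALU instruction in Ex writing r4 can be forwarded, so no stall is needed.
print(load_use_stall(False, 4, if_id_ra=4, if_id_rb=1))
```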
Most frequent 3410 non-solution to load-use hazards
Why is this "solution" so so so so so so awful?

[Figure: pipelined datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with forward unit and detect hazard unit.]
Forwarding values directly from Memory to the Execute stage without storing them in a register first:
A. Does not remove the need to stall.
B. Adds one too many possible inputs to the ALU.
C. Will cause the pipeline register to have the wrong value.
D. Halves the frequency of the processor.
E. Both A & D
Two MIPS Solutions:
• MIPS 2000/3000: delay slot
  – ISA says results of loads are not available until one cycle later
  – Assembler inserts nop, or reorders to fill delay slot
• MIPS 4000 onwards: stall
  – But really, programmer/compiler reorders to avoid stalling in the load delay slot
5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards
for (i = 0; i < max; i++) {
    n += 2;
}
i = 7;
n--;

Assume: i → r1, n → r2, max → r3

x10        addi r1, r0, 0     # i = 0
x14  Loop: addi r2, r2, 2     # n += 2
x18        addi r1, r1, 1     # i++
x1C        blt  r1, r3, Loop  # i < max?
x20        addi r1, r0, 7     # i = 7
x24        subi r2, r2, 1     # n--

Control Hazards
• instructions are fetched in stage 1 (IF)
• branch and jump decisions occur in stage 3 (EX)
→ next PC not known until 2 cycles after branch/jump

x1C  blt  r1, r3, Loop
x20  addi r1, r0, 7
x24  subi r2, r2, 1
Branch not taken? No Problem!
Branch taken? Just fetched 2 addi's → Zap & Flush
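The cost of zapping can be estimated with the usual back-of-the-envelope CPI model: every taken branch throws away the instructions fetched behind it. The sketch below assumes a base CPI of 1 and the 2-cycle penalty implied by deciding branches in EX; the branch frequency and taken rate are hypothetical numbers, not measurements.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles=2, base_cpi=1.0):
    """Average CPI when each taken branch wastes penalty_cycles fetch slots."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles

# Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
cpi_ex = effective_cpi(0.20, 0.60, penalty_cycles=2)   # branch decided in EX
cpi_id = effective_cpi(0.20, 0.60, penalty_cycles=1)   # branch decided in ID

print(f"decide in EX: CPI = {cpi_ex:.2f}")
print(f"decide in ID: CPI = {cpi_id:.2f}")
```

Under these made-up numbers, moving the decision one stage earlier halves the branch overhead, which is the motivation for the options that follow.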
1C     blt  r1, r3, L     IF   ID   Ex   M    W
20     addi r1, r0, 7          IF   ID   NOP  NOP  NOP
24     subi r2, r2, 1               IF   NOP  NOP  NOP  NOP
14  L: addi r2, r2, 2                    IF   ID   Ex   M    W

[Figure: datapath with branch calc and decide branch logic in Ex feeding the PC; New PC = 1C.]

If branch Taken → Zap
• prevent PC update
• clear IF/ID latch
• branch continues

For every taken branch? OUCH!!!
1. Delay Slot
   • You MUST do this
   • MIPS ISA: 1 insn after ctrl insn always executed
     • Whether branch taken or not
2. Resolve Branch at Decode
   • Some groups do this for Project 2, your choice
   • Move branch calc from EX to ID
   • Alternative: just zap 2nd instruction when branch taken
3. Branch Prediction
   • Not in 3410, but every processor worth anything does this (no offense!)
[Figure: datapath with branch calc and decide branch logic feeding the PC; New PC = 1C.]

1C  blt  r1, r3, Loop     F   D   X
20  addi r1, r0, 7            F   D
24  subi r2, r2, 1                F
for (i = 0; i < max; i++) {
    n += 2;
}
i = 7;
n--;

Assume: i → r1, n → r2, max → r3

x10        addi r1, r0, 0     # i = 0
x14  Loop: addi r2, r2, 2     # n += 2
x18        addi r1, r1, 1     # i++
x1C        blt  r1, r3, Loop  # i < max?
x20        nop
x24        addi r1, r0, 7     # i = 7
x28        subi r2, r2, 1     # n--
[Figure: datapath with branch calc and decide branch logic feeding the PC; New PC = 1C.]

Branch not taken:
1C  blt  r1, r3, Loop     F   D   X
20  nop                       F   D
24  addi r1, r0, 7                F

Branch taken:
1C        blt  r1, r3, Loop     F   D   X
20        nop                       F   D
14  Loop: addi r2, r2, 2                F
x10        addi r1, r0, 0     # i = 0
x14  Loop: addi r2, r2, 2     # n += 2
x18        addi r1, r1, 1     # i++
x1C        blt  r1, r3, Loop  # i < max?
x20        nop

Compiler transforms code

x10        addi r1, r0, 0     # i = 0
x14  Loop: addi r1, r1, 1     # i++
x18        blt  r1, r3, Loop  # i < max?
x1C        addi r2, r2, 2     # n += 2
[Figure: datapath with branch calc and decide branch logic feeding the PC; New PC = 1C.]

1C        blt  r1, r3, Loop     F   D   X
20        addi r2, r2, 2            F   D
14  Loop: addi r1, r1, 1                F
Most processors support Speculative Execution
• Guess direction of the branch
  – Allow instructions to move through pipeline
  – Zap them later if guess turns out to be wrong
• A must for long pipelines
Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. Pipelined processors need to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID registers from changing, and (3) preventing writes to memory and the register file. NOPs significantly decrease performance.

Forwarding bypasses some pipeline stages, forwarding a result to a dependent instruction's operand (register). Better performance than stalling.

Control hazards occur because the PC following a control instruction is not known until the control instruction is executed. If the branch is taken → need to zap instructions. 1 cycle performance penalty.

Delay slots can potentially recover performance lost to control hazards. The instruction in the delay slot will always be executed. Requires software (compiler) to make use of the delay slot. Put a nop in the delay slot if not able to put a useful instruction there.

We can reduce the cost of a control hazard by moving the branch decision and calculation from the Ex stage to the ID stage. With a delay slot, this removes the need to flush instructions on taken branches.