CS222: Pipeline: Branch Performance · 2017. 4. 12. · Pipeline: Branch Performance &...
CS222: Pipeline: Branch Performance & Superscalar/VLIW
Dr. A. Sahu
Dept of Comp. Sc. & Engg.
Indian Institute of Technology Guwahati
Outline
• Improving Branch Performance
– Previous class: Branch Elimination, Branch Speed Up
– Branch Prediction
  • Fixed, Static, Dynamic
– Branch Target Capture
  • BTB, BTAC, BTIC
• Introduction to VLIW and Superscalar
Improving Branch Performance
• Branch Elimination
– replace branch with other instructions
• Branch Speed Up
– reduce time for computing CC and TIF
• Branch Prediction
– guess the outcome and proceed, undo if necessary
• Branch Target Capture
– make use of history
Branch Elimination
Use conditional instructions (predicated execution)
[Flowchart: condition C; on T execute S, on F skip S]
With branch:
  OP1
  BC CC = Z, * + 2
  ADD R3, R2, R1
  OP2
With conditional instruction:
  OP1
  ADD R3, R2, R1, NZ
  OP2
Branch Speed Up: early target address generation
• Assume each instruction is a branch
• Generate target address while decoding
• If target is in the same page, omit translation
• After decoding, discard target address if not a branch
[Figure: pipeline timing for a BC instruction (stages IF, D, AG, TIF)]
Branch Speed Up: increase CC - branch gap
Increase the gap between condition checking and branching
• Early CC setting
• Delayed branch
Branch Prediction
• Treat conditional branches as unconditional branches / NOP
• Undo if necessary
Strategies:
– Fixed (always guess inline or guess target)
– Static (guess on the basis of instruction type)
– Dynamic (guess based on recent history)
Static Branch Prediction
Instr      %      Guess    Branch taken   Correct
uncond     14.5   always   100%           14.5%
cond       58     never    54%            27%
loop       9.8    always   91%            9%
call/ret   17.7   always   100%           17.7%
Total                                     68.2%
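The "Correct" column can be recomputed from the other columns. The sketch below assumes the fourth column is the fraction of branches of that class that are actually taken, so a "never" guess is right for the remaining fraction; the names `ROWS`, `correct_share` and `total_correct` are illustrative, not from the slides.

```python
# Rows of the static prediction table:
# (instruction class, share of all branches in %, static guess, fraction taken).
# Assumption: the fourth column is the fraction of that class actually taken.
ROWS = [
    ("uncond",   14.5, "always", 1.00),
    ("cond",     58.0, "never",  0.54),
    ("loop",      9.8, "always", 0.91),
    ("call/ret", 17.7, "always", 1.00),
]


def correct_share(share, guess, taken):
    """Share of all branches (in %) that this class predicts correctly."""
    hit_rate = taken if guess == "always" else 1.0 - taken
    return share * hit_rate


def total_correct(rows=ROWS):
    """Overall percentage of branches predicted correctly."""
    return sum(correct_share(s, g, t) for _, s, g, t in rows)


if __name__ == "__main__":
    for name, s, g, t in ROWS:
        print(f"{name:9s} {correct_share(s, g, t):5.1f}%")
    print(f"total     {total_correct():5.1f}%")
```

Without rounding the per-row values the total comes to about 67.8%; the 68.2% on the slide is the sum of the rounded entries (14.5 + 27 + 9 + 17.7).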
Branch Prediction: guess inline, go inline
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG AG TIF TIF); I+1 and I+2 follow inline (IF IF D)]
delay = 0
Branch Prediction: guess inline, goto target
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG AG TIF TIF); rows T and T+1 (IF IF D)]
delay = 6
Branch Prediction: guess target, go inline
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG AG TIF TIF); rows T, I+1 and I+2]
delay = 5
Branch Prediction: guess target, goto target
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG AG TIF TIF); rows T and T+1 (IF IF D)]
delay = 4
Same as unconditional branch
Static prediction strategy
Let p = probability of taking branch
guess target: delay_t = 4p + 5(1 - p) = 5 - p
guess inline: delay_i = 6p + 0(1 - p) = 6p
⇒ if (delay_t < delay_i) guess target else guess inline
(delay_t < delay_i) ⇒ 5 - p < 6p
⇒ p > 5/7 = .71
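As a quick check of the rule above, the two expected delays can be evaluated for a given p. This is a minimal sketch using the delays 4, 5, 6 and 0 from the preceding slides; the function names are illustrative.

```python
def delay_guess_target(p):
    """Expected delay when guessing target: 4 if taken, 5 if not taken."""
    return 4 * p + 5 * (1 - p)


def delay_guess_inline(p):
    """Expected delay when guessing inline: 6 if taken, 0 if not taken."""
    return 6 * p + 0 * (1 - p)


def best_guess(p):
    """Return the static guess with the lower expected delay for taken-probability p."""
    return "target" if delay_guess_target(p) < delay_guess_inline(p) else "inline"


if __name__ == "__main__":
    for p in (0.54, 0.71, 0.72, 0.91):
        print(p, delay_guess_target(p), delay_guess_inline(p), best_guess(p))
```

For example, p = 0.54 falls below 5/7 and gives "inline", while p = 0.91 gives "target".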
Static prediction strategy - thresholds for different instructions
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG AG TIF TIF)]
Delay:
           actual T   actual I
guess T       4          5
guess I       6          0
guess target if 4p + 5(1 - p) < 6p + 0(1 - p)
i.e. p > .71
Static prediction strategy - thresholds for different instructions
Loop control
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG AG TIF TIF EX EX)]
Delay:
           actual T   actual I
guess T       4          6
guess I       7          1
guess target if 4p + 6(1 - p) < 7p + 1(1 - p)
i.e. p > .62
Static prediction strategy - thresholds for different instructions
Register address
[Pipeline timing diagram: I-1 (IF IF D AG AG DF DF EX EX) sets CC; branch I (IF IF D AG TIF TIF)]
Delay:
           actual T   actual I
guess T       3          5
guess I       6          0
guess target if 3p + 5(1 - p) < 6p + 0(1 - p)
i.e. p > .62
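The three thresholds follow from one formula. Writing d_xy for the delay when the guess is x and the actual outcome is y, the condition d_TT p + d_TI (1 - p) < d_IT p + d_II (1 - p) rearranges to p > (d_TI - d_II) / ((d_TI - d_II) + (d_IT - d_TT)). The sketch below applies it to the three delay tables; the function name and the case labels (in particular "base case" for the first, unlabelled table) are assumptions made for illustration.

```python
from fractions import Fraction


def threshold(d_tt, d_ti, d_it, d_ii):
    """Taken-probability above which guessing target has the lower expected delay.

    d_xy is the delay when the guess is x and the actual outcome is y
    (t = target, i = inline). Assumes the denominator is positive, which
    holds for all three tables in the slides.
    """
    return Fraction(d_ti - d_ii, (d_ti - d_ii) + (d_it - d_tt))


# Delay tables from the three slides; the labels are illustrative.
CASES = {
    "base case":        (4, 5, 6, 0),
    "loop control":     (4, 6, 7, 1),
    "register address": (3, 5, 6, 0),
}

if __name__ == "__main__":
    for name, delays in CASES.items():
        t = threshold(*delays)
        print(f"{name:17s} p > {t} = {float(t):.3f}")
```

The exact values are 5/7 for the first table and 5/8 for the other two, which the slides show truncated to .71 and .62.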
Dynamic Branch Prediction
Dynamic Branch Prediction - basic idea
Predict based on the history of the previous branch
loop: xxx
      xxx
      xxx
      xxx
      BC loop
2 mispredictions for every occurrence of the loop (one at loop exit, one at the next entry)
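The two mispredictions per loop occurrence can be reproduced with a small simulation. This is a sketch under the assumption that the predictor simply repeats the last outcome of the branch and that the closing BC is taken on every iteration except the last; the function names are illustrative.

```python
def loop_outcomes(iterations, runs):
    """Outcomes of the closing BC: taken (True) except on the last iteration of each run."""
    one_run = [True] * (iterations - 1) + [False]
    return one_run * runs


def count_mispredictions(outcomes, initial=True):
    """Predict that the branch repeats its previous outcome; count wrong guesses."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


if __name__ == "__main__":
    # 5 executions of a 10-iteration loop
    print(count_mispredictions(loop_outcomes(10, 5)))
```

With an initial prediction of taken, the first execution of the loop misses once (at exit) and every later execution misses twice (at re-entry and at exit), giving 9 misses for 5 executions.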
Dynamic Branch Prediction - 2 bit prediction scheme
[State diagram: four states 0, 1, 2, 3 with transitions labelled T (taken) and N (not taken); states 0/1 predict taken, states 3/2 predict not taken]
Dynamic Branch Prediction - second scheme
Predict based on the history of the previous n branches, e.g., if n = 3 then
3 branches taken ⇒ predict taken
2 branches taken ⇒ predict taken
1 branch taken ⇒ predict not taken
0 branches taken ⇒ predict not taken
Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters
[State diagram: counter states 0, 1, 2, 3; T and N move the counter one step in opposite directions, saturating at 0 and 3]
One counter per branch or one counter per cache line - merge results if multiple branches
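A saturating counter can be sketched in a few lines. The direction of counting and the prediction rule are assumptions here (taken increments, not taken decrements, and a counter value of 2 or 3 predicts taken), since the state diagram does not survive in this transcript; the class and function names are illustrative.

```python
class SaturatingCounter:
    """2-bit saturating counter: states 0..3.

    Assumptions: a taken branch increments, a not-taken branch decrements,
    and states 2 and 3 predict taken.
    """

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, counter=None):
    """Run the counter over a sequence of outcomes and count wrong predictions."""
    if counter is None:
        counter = SaturatingCounter()
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


if __name__ == "__main__":
    # Closing branch of a 10-iteration loop executed 5 times
    outcomes = ([True] * 9 + [False]) * 5
    print(count_mispredictions(outcomes))
```

On this loop pattern the counter misses only at each loop exit, because a single not-taken outcome moves it from 3 to 2 and the prediction stays "taken" on re-entry.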
Dynamic Branch Prediction - History of last n occurrences
Each entry holds the outcome of the last three occurrences of this branch (0: not taken, 1: taken)
current entry: 1 1 0
actual outcome: 'taken'
updated entry: 1 1 1
prediction using majority decision
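The history entry behaves like a small shift register with a majority vote on top. The sketch below assumes the newest outcome enters on the left and the oldest drops off the right, which is consistent with the 1 1 0 to 1 1 1 example; the class name is illustrative.

```python
from collections import deque


class HistoryPredictor:
    """Keeps the last n outcomes of one branch and predicts by majority.

    Assumption: the newest outcome is shifted in on the left and the oldest
    falls off the right. n should be odd so that the majority is never tied.
    """

    def __init__(self, n=3, initial=(1, 1, 0)):
        # A deque with maxlen discards from the opposite end when full.
        self.history = deque(initial, maxlen=n)

    def predict(self):
        """Predict taken when more than half of the recorded outcomes were taken."""
        return 2 * sum(self.history) > len(self.history)

    def update(self, taken):
        self.history.appendleft(1 if taken else 0)


if __name__ == "__main__":
    p = HistoryPredictor()
    print(list(p.history), p.predict())
    p.update(True)  # actual outcome 'taken'
    print(list(p.history), p.predict())
```

With n = 3 the majority rule gives the same decisions as the table for the previous n branches: two or three taken outcomes predict taken, zero or one predict not taken.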
Correlation between branches
B1: if (x)
      ...
B2: if (y)
      ...
z = x && y
B3: if (z)
      ...
• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
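The effect of correlation can be illustrated with a toy experiment. The sketch below compares a predictor for B3 indexed by the outcomes of B1 and B2 with one that only remembers B3's own last outcome. The random inputs, the table layout and the function name are assumptions for illustration and do not describe a specific hardware design.

```python
import random


def run(trials=1000, seed=0):
    """Return (misses of the correlated predictor, misses of a last-outcome predictor).

    The correlated predictor stores, for each (B1, B2) outcome pair, the B3
    outcome last seen with that pair. The first sighting of a pair is treated
    as warm-up and not counted.
    """
    rng = random.Random(seed)
    table = {}            # (B1 outcome, B2 outcome) -> B3 outcome last seen
    correlated_misses = 0
    last_b3 = False       # state of the predictor that ignores B1 and B2
    local_misses = 0
    for _ in range(trials):
        x = rng.random() < 0.5   # outcome of B1
        y = rng.random() < 0.5   # outcome of B2
        z = x and y              # outcome of B3
        key = (x, y)
        if key in table and table[key] != z:
            correlated_misses += 1
        table[key] = z
        if last_b3 != z:
            local_misses += 1
        last_b3 = z
    return correlated_misses, local_misses


if __name__ == "__main__":
    print(run())
```

Because z is fully determined by x and y, the correlated predictor never misses once each (B1, B2) pair has been seen, while the last-outcome predictor keeps missing whenever B3 changes direction.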
Branch Target Capture
• Branch Target Buffer (BTB)
• Target Instruction Buffer (TIB)
Entry fields: instr addr | pred stats | target (target addr or target instr)
prob of target change < 5%
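A BTB lookup can be sketched as a table keyed by the branch instruction address. This is a simplified model under several assumptions: the table is an unbounded dictionary rather than a fixed-size cache, the pred stats field is omitted, and the names `BranchTargetBuffer`, `record` and `next_fetch_address` as well as the 4-byte instruction size are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch's instruction address to its last target address.

    Assumptions: unbounded size, no prediction statistics, no replacement policy.
    """

    def __init__(self):
        self.entries = {}

    def lookup(self, instr_addr):
        """Return the stored target address, or None on a BTB miss."""
        return self.entries.get(instr_addr)

    def record(self, instr_addr, target_addr):
        """Store or overwrite the target after a taken branch resolves."""
        self.entries[instr_addr] = target_addr


def next_fetch_address(btb, pc, instr_size=4):
    """On a BTB hit go to the target; on a miss continue inline."""
    target = btb.lookup(pc)
    return target if target is not None else pc + instr_size


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(hex(next_fetch_address(btb, 0x100)))  # miss: fall through
    btb.record(0x100, 0x40)
    print(hex(next_fetch_address(btb, 0x100)))  # hit: redirect to target
```

Storing only the last target is reasonable when, as stated above, the probability of a target change is below 5%.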
BTB Performance
decision                  prob   result   prob   delay
BTB miss, go inline        .4    inline    .8      0
                                 target    .2      5
BTB hit, go to target      .6    inline    .2      4
                                 target    .8      0
expected delay = .4*.8*0 + .4*.2*5 + .6*.2*4 + .6*.8*0 = 0.88
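The 0.88 figure is a weighted sum over the four paths of the decision tree. The sketch below encodes the tree as nested probabilities and delays; the assignment of each leaf to "inline" or "target" in the comments is read off the slide layout and should be treated as an assumption, and the names `TREE` and `expected_delay` are illustrative.

```python
# Each entry: (probability of the decision, [(probability of the result, delay), ...])
TREE = [
    (0.4, [(0.8, 0), (0.2, 5)]),  # BTB miss, go inline: result inline / target
    (0.6, [(0.2, 4), (0.8, 0)]),  # BTB hit, go to target: result inline / target
]


def expected_delay(tree=TREE):
    """Sum of decision probability * result probability * delay over all leaves."""
    return sum(p_decision * p_result * delay
               for p_decision, leaves in tree
               for p_result, delay in leaves)


if __name__ == "__main__":
    print(round(expected_delay(), 2))
```

Only the two mispredicted paths contribute: 0.4 * 0.2 * 5 = 0.40 and 0.6 * 0.2 * 4 = 0.48.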
BTC: Structure of Tables
Instruction fetch path with
• BTAC (Branch Target Address Cache)
• BTIC (Branch Target Instruction Cache)
Compute/fetch scheme (no dynamic branch prediction)
[Block diagram: the instruction fetch address register (IFAR) addresses the I-cache, which holds I, I+1, I+2, I+3 and BTI, BTI+1, BTI+2, BTI+3; the next fetch address is either the next sequential address (+) or the computed BTA]
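BTAC scheme
[Block diagram: the instruction fetch address register (IFAR) addresses the I-cache (I, I+1, I+2, I+3 and BTI, BTI+1, BTI+2, BTI+3) and a BTAC with entries (BA, BTA); the next fetch address is either the next sequential address (+) or the BTA from the BTAC]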
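BTIC scheme - 1
[Block diagram: the instruction fetch address register (IFAR) addresses the I-cache and a BTIC with entries (BA, BTI, BTA+); the next fetch address is either the next sequential address (+) or BTA+; the fetched instruction goes to the decoder]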
Superscalar/VLIW
• Instruction level parallelism
• EndSem Exam: covers only the post-Midsem part
• VLIW (Intel Itanium, TI OMAP)
• Superscalar (Pentium, Athlon)
– Parallel Issue, Parallel Decode
– Dependency Check (Reservation Station, Renaming)
– Parallel Execute, Serial Commit