Dynamic Predication
10/25/2006 1
ACAL Group Seminar
Alok Garg
What is Predicated Execution?
Conditional instruction: executed if its condition is true, treated as a NOP if the condition is false
Eliminates simple branches, e.g. if (A == 0) { S = T; }
Converts control dependencies into data dependencies
Branch code (R1 = A, R2 = S, R3 = T):
BNEZ R1, L
ADDU R2, R3, R0
L:
Predicated code:
CMOVZ R2, R3, R1
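To make the conversion concrete, here is a small Python model of the two sequences as functions of the register values. The function names are hypothetical, and CMOVZ is assumed to mean "R2 := R3 if R1 == 0", matching the if statement above. The predicated version builds a mask from R1 and selects with it, so the result depends on R1 through data only, never through a branch. Register widths are not modeled.

```python
def branch_version(r1: int, r2: int, r3: int) -> int:
    """Models BNEZ R1, L; ADDU R2, R3, R0; L: and returns the final R2."""
    if r1 != 0:        # BNEZ R1, L: control dependence, skip the move
        return r2
    return r3 + 0      # ADDU R2, R3, R0 (R0 is hard-wired to zero)


def cmovz_version(r1: int, r2: int, r3: int) -> int:
    """Models CMOVZ R2, R3, R1 (assumed: R2 := R3 if R1 == 0); returns the final R2."""
    mask = -int(r1 == 0)                # all ones if R1 == 0, otherwise 0
    return (r3 & mask) | (r2 & ~mask)   # branch-free select: data dependence only


# Both forms agree for every combination tried here.
for r1 in (0, 1, -7):
    for r2, r3 in ((5, 9), (-3, 4)):
        assert branch_version(r1, r2, r3) == cmovz_version(r1, r2, r3)
print(cmovz_version(0, 5, 9), cmovz_version(1, 5, 9))  # 9 5
```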
Simple Example
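Figure: simple hammock CFG (A ends in a conditional branch with taken (T) and not-taken (NT) edges to B and C; B and C merge at D, followed by E)
Normal execution (branch mispredicted): A [B D E] C D E
The bracketed wrong-path blocks are flushed from the pipeline when the misprediction is detected.
Predicated execution: A [C(!p) B(p)] D E
B and C become conditional instructions guarded by predicate p and its complement, so no flush is needed.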
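Limitations of software predication:
1. If the branch is not taken 98% of the time, it is easy to predict, and predicating it only adds the cost of fetching the rarely used path
2. Execution of blocks B and C is delayed until the predicate p is resolved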
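Limitation 1 can be framed as a simple cost comparison. The sketch below is a back-of-the-envelope model that is not from the slides: it assumes a fixed flush penalty per misprediction and a fixed per-execution predication overhead (fetching and executing the unused path), both in cycles. The function names and the numbers in the usage lines are hypothetical and purely illustrative.

```python
def expected_flush_cost(mispredict_rate: float, flush_penalty: float) -> float:
    """Average cycles lost per execution if the branch is left as a branch."""
    return mispredict_rate * flush_penalty


def predication_pays_off(mispredict_rate: float, flush_penalty: float,
                         predication_overhead: float) -> bool:
    """Predicate only if the expected flush cost exceeds the fixed overhead."""
    return expected_flush_cost(mispredict_rate, flush_penalty) > predication_overhead


# Illustrative numbers only: 20-cycle flush penalty, 3 cycles of predication overhead.
# A branch that is not taken 98% of the time mispredicts about 2% of the time.
print(predication_pays_off(0.02, 20, 3))  # False: keep the branch
print(predication_pays_off(0.30, 20, 3))  # True: hard-to-predict branch, predicate it
```

The model ignores limitation 2 (waiting for the predicate), which would only raise the effective overhead and make predication of easy branches look worse.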
Limitations of Predication
1. ISA support: requires predicate registers and predicated instructions
2. Performance overhead: instructions are fetched from both paths, and predicated instructions cannot execute until the predicate value is resolved (ideal predication speedup: 16.4%)
3. Limited coverage: only a small subset of the control-flow graph is covered, because the compiler cannot if-convert complex control flow (ideal predication of all conditional branches: 37.4%)
Motivation
Some branches are still very hard to predict with conventional branch predictors
Mispredictions lead to costly pipeline flushes, which hurt both performance and energy
Predication is used to avoid pipeline flushes for those hard-to-predict branches
Papers Covered
Dynamic Hammock Predication for Non-predicated Instruction Set Architectures. Artur Klauser, Todd Austin, Dirk Grunwald, and Brad Calder – PACT 1998
Wish Branches: Combining Conditional Branching with Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt – MICRO 2005, IEEE Micro Top Picks 2006
Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths. Hyesoon Kim, Jose A. Joao, Onur Mutlu, and Yale N. Patt – MICRO 2006
Types of Control-Flow Graphs
Figure: example control-flow graphs (panels: simple hammock, nested hammock, frequently-hammock)
Figure (continued): example control-flow graphs (panels: loop, non-merging control flow)
Distribution of mispredicted branches
Simple + nested hammocks: 16% of all mispredictions
All CFG types except non-merging control flow: 66% of all mispredictions
Dynamic Hammock Predication
Targets the first limitation of software predication: removes the need for ISA support
Dynamically predicates simple hammocks, which account for 11% of all mispredictions
Requires compiler support to mark simple-hammock boundaries
Predication decision: either dynamic or static (profile-based)
Support for Dynamic Predication
Fork context:
a) R1 := …
b) R2 := …
c) R3 := …
d) R4 := …
e) B-cc (i)   (branch to (i) if cc is true)
Then context (cc is false):
f) R1 := R1 + R2
g) R3 := R1 x 2
h) BR (k)   (jump to (k))
Else context (cc is true):
i) R2 := R1 - R2
j) R3 := R2 x 2
Join context:
k) RA := R1
l) RB := R2
m) RC := R3
n) RD := R4
Support for Dynamic Predication (after renaming)
Fork context:
a) R1.a := …
b) R2.b := …
c) R3.c := …
d) R4.d := …
e) PL.e   (the branch now produces predicate PL)
Rename table after the fork: R1 → a, R2 → b, R3 → c, R4 → d
Then context (cc is false, predicate value = 0):
f) R1.f := R1.a + R2.b
g) R3.g := R1.f x 2
h) removed (the unconditional branch is no longer needed)
Then-context rename table: R1 → f, R2 → b, R3 → g, R4 → d
Else context (cc is true, predicate value = 1):
i) R2.i := R1.a - R2.b
j) R3.j := R2.i x 2
Else-context rename table: R1 → a, R2 → i, R3 → j, R4 → d
Join context (select micro-ops merge the two contexts):
k) R1.k := PL.e ? R1.a : R1.f
l) R2.l := PL.e ? R2.i : R2.b
m) R3.m := PL.e ? R3.j : R3.g
n) RA.n := R1.k
o) RB.o := R2.l
p) RC.p := R3.m
q) RD.q := R4.d
Rename table after the join: R1 → k, R2 → l, R3 → m, R4 → d
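The renaming scheme can be checked with a small Python model. It is a sketch under stated assumptions: register values are plain integers, the physical tags a through m follow the slide's names, both contexts are executed eagerly, and the predicate PL.e is simply the branch condition cc (true selects the else context). The function names are hypothetical, the copies n) to q) are not modeled, and the result is compared against ordinary branchy execution of the same hammock.

```python
from typing import Dict, Tuple

ARCH_REGS = ("R1", "R2", "R3", "R4")


def branchy(regs: Dict[str, int], cc: bool) -> Dict[str, int]:
    """Reference semantics: e) branches to the else context (i) when cc is true."""
    r = dict(regs)
    if not cc:                       # then context: f), g), h)
        r["R1"] = r["R1"] + r["R2"]
        r["R3"] = r["R1"] * 2
    else:                            # else context: i), j)
        r["R2"] = r["R1"] - r["R2"]
        r["R3"] = r["R2"] * 2
    return r


def dynamically_predicated(regs: Dict[str, int], cc: bool) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Runs both contexts on renamed registers, then merges them with select micro-ops."""
    values: Dict[str, int] = {}                  # physical register file: tag -> value
    fork = dict(zip(ARCH_REGS, "abcd"))          # a)-d): rename table at the fork
    for reg, tag in fork.items():
        values[tag] = regs[reg]
    pl = cc                                      # e) PL.e: the branch becomes a predicate

    then_map = dict(fork)                        # then context (PL.e = 0)
    values["f"] = values[fork["R1"]] + values[fork["R2"]]
    then_map["R1"] = "f"
    values["g"] = values[then_map["R1"]] * 2
    then_map["R3"] = "g"                         # h) the BR is removed

    else_map = dict(fork)                        # else context (PL.e = 1)
    values["i"] = values[fork["R1"]] - values[fork["R2"]]
    else_map["R2"] = "i"
    values["j"] = values[else_map["R2"]] * 2
    else_map["R3"] = "j"

    join: Dict[str, str] = {}
    select_tags = iter("klm")
    for reg in ARCH_REGS:
        if then_map[reg] == else_map[reg]:
            join[reg] = then_map[reg]            # same tag in both contexts: no select (R4 -> d)
        else:
            tag = next(select_tags)              # k)-m): tag := PL.e ? else value : then value
            values[tag] = values[else_map[reg]] if pl else values[then_map[reg]]
            join[reg] = tag
    return join, {reg: values[tag] for reg, tag in join.items()}


start = {"R1": 5, "R2": 3, "R3": 0, "R4": 9}
for cc in (False, True):
    table, result = dynamically_predicated(start, cc)
    assert result == branchy(start, cc)
    print(cc, table, result)
```

For either value of cc the final rename table is R1 → k, R2 → l, R3 → m, R4 → d, and only the three registers written on some path need a select.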
Wish Branches
Targets the second and third limitations of software predication:
Dynamic predication decision based on a confidence estimator
Improved coverage by predicating loops
Uses compiler-generated predicated blocks, with added “wish” instructions that let the hardware decide dynamically
Defines how simple loops can be included in predication
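The dynamic decision can be sketched as follows. The slides do not say which confidence estimator is used, so this assumes a common design: a per-branch resetting counter that counts consecutive correct predictions and resets on a misprediction. The class, function, threshold, and PC values are all hypothetical. High confidence means the wish branch is treated as an ordinary predicted branch; low confidence means the compiler-generated predicated code is executed instead.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResettingConfidenceEstimator:
    """Hypothetical per-branch confidence estimator built from resetting counters.

    Each counter counts consecutive correct predictions of one static branch
    (indexed by PC) and resets to zero on a misprediction.
    """
    threshold: int = 4
    max_count: int = 15
    counters: Dict[int, int] = field(default_factory=dict)

    def is_high_confidence(self, pc: int) -> bool:
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc: int, prediction_correct: bool) -> None:
        if prediction_correct:
            self.counters[pc] = min(self.counters.get(pc, 0) + 1, self.max_count)
        else:
            self.counters[pc] = 0


def wish_branch_mode(estimator: ResettingConfidenceEstimator, pc: int) -> str:
    """High confidence: behave like a normal branch. Low confidence: run predicated code."""
    return "branch" if estimator.is_high_confidence(pc) else "predicate"


est = ResettingConfidenceEstimator(threshold=2)
pc = 0x400                          # hypothetical wish-jump address
print(wish_branch_mode(est, pc))    # predicate (no history yet)
est.update(pc, True)
est.update(pc, True)
print(wish_branch_mode(est, pc))    # branch
est.update(pc, False)
print(wish_branch_mode(est, pc))    # predicate
```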
Wish Jumps and Wish Joins
Figure: the same code shown as branch code, predicated code, and wish jump/join code
Wish Loops
Figure: normal code vs. wish loop code
Exit type | Fetched blocks | High confidence | Low confidence
Normal exit | X1 X2 X3 Y | No overhead | Predication delay
Early exit | X1 X2 Y | Flush (Y) | Flush (Y)
Late exit | X1 X2 X3 X4 X5 Y | Flush (X4 X5 Y) | Predicate (X4 X5)
No exit | X1 X2 X3 X4 … XN | Flush (X4 X5 … XN) | Flush (X4 X5 … XN)
In this example the loop actually executes three iterations (X1 X2 X3) before exiting to Y.
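The wish-loop table can be encoded as a small function. This is a sketch that follows the table's example: the loop really executes `actual_iters` iterations before exiting to Y, `predicted_iters` is how many iterations the front end fetched before predicting the exit (None for the no-exit case, where `fetched_iters` iterations were fetched), and the confidence flag picks the column. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def wish_loop_overhead(actual_iters: int, predicted_iters: Optional[int],
                       high_confidence: bool, fetched_iters: int = 0) -> Tuple[str, List[str]]:
    """Returns (action, blocks) for one wish-loop exit, following the wish-loop table.

    actual_iters:    iterations the loop really executes before exiting to Y
    predicted_iters: iterations fetched before the predicted exit, or None if
                     the front end never predicted the exit (no-exit case)
    fetched_iters:   iterations fetched in the no-exit case (XN)
    """
    def iters(first: int, last: int) -> List[str]:
        return [f"X{k}" for k in range(first, last + 1)]

    if predicted_iters is None:
        # No exit: Y was never fetched, so the extra iterations are flushed
        # regardless of confidence.
        return "flush", iters(actual_iters + 1, fetched_iters)
    if predicted_iters == actual_iters:
        # Normal exit: only low-confidence (predicated) mode pays a delay.
        return ("no overhead", []) if high_confidence else ("predication delay", [])
    if predicted_iters < actual_iters:
        # Early exit: Y was fetched too soon and is on the wrong path.
        return "flush", ["Y"]
    # Late exit: extra iterations were fetched before Y.
    extra = iters(actual_iters + 1, predicted_iters)
    if high_confidence:
        return "flush", extra + ["Y"]   # handled like a normal mispredicted branch
    return "predicate", extra           # extra iterations become NOPs, no flush


print(wish_loop_overhead(3, 5, high_confidence=True))   # ('flush', ['X4', 'X5', 'Y'])
print(wish_loop_overhead(3, 5, high_confidence=False))  # ('predicate', ['X4', 'X5'])
```

The late-exit, low-confidence case is where wish loops pay off: the extra iterations are squashed by their predicate instead of triggering a flush.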
Dynamic Number of Wish Branches
Performance improvement: 10.7% over predicated code
Dynamic Number of Wish Loops
Performance improvement: 13.3% over predicated code
Diverge-Merge Processor (DMP)
Targets all three limitations of software predication:
Dynamic predication with little compiler support
Dynamic decision based on confidence estimation
Predicates only frequently executed control-flow paths
Software support: the compiler marks all diverge and merge points
Hardware support (similar to dynamic hammock predication): enters predication mode at a diverge point and predicates only the frequently executed paths
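Restricting predication to frequently executed paths can move the merge point closer to the diverge branch. One way to illustrate this is to compute the merge point as the immediate postdominator of the diverge block in a CFG that keeps only edges above a probability threshold. This is a hypothetical, profile-based sketch, not the procedure from the DMP paper; the example CFG and its edge probabilities are invented, loosely following the frequently-hammock panel (frequent paths B, D, E, H and C, E, H).

```python
from typing import Dict, Optional, Set

CFG = Dict[str, Dict[str, float]]  # block -> {successor: edge probability}

# Hypothetical frequently-hammock CFG: the rare edges B->F, C->G, D->G bypass E.
EXAMPLE_CFG: CFG = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"D": 0.95, "F": 0.05},
    "C": {"E": 0.9, "G": 0.1},
    "D": {"E": 0.9, "G": 0.1},
    "E": {"H": 1.0},
    "F": {"H": 1.0},
    "G": {"H": 1.0},
    "H": {},
}


def postdominators(cfg: CFG, min_prob: float) -> Dict[str, Set[str]]:
    """Iterative postdominator sets over edges with probability >= min_prob.

    Blocks left with no kept successors are treated as exits.
    """
    succ = {b: [s for s, p in edges.items() if p >= min_prob] for b, edges in cfg.items()}
    all_blocks = set(cfg)
    pdom = {b: ({b} if not succ[b] else set(all_blocks)) for b in cfg}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if not succ[b]:
                continue
            new = {b} | set.intersection(*(pdom[s] for s in succ[b]))
            if new != pdom[b]:
                pdom[b] = new
                changed = True
    return pdom


def merge_point(cfg: CFG, diverge: str, min_prob: float = 0.0) -> Optional[str]:
    """Immediate postdominator of the diverge block in the filtered CFG."""
    pdom = postdominators(cfg, min_prob)
    strict = pdom[diverge] - {diverge}
    if not strict:
        return None
    # Postdominators form a chain; the nearest one has the largest pdom set.
    return max(strict, key=lambda b: len(pdom[b]))


print(merge_point(EXAMPLE_CFG, "A", 0.0))  # H: every path reconverges only at H
print(merge_point(EXAMPLE_CFG, "A", 0.2))  # E: frequent paths already merge at E
```

With every edge kept, the only merge point is H; once the rare edges are ignored, E becomes the merge point of the frequently executed paths, so fewer blocks need to be predicated.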
Frequently Executed Control-Flow Paths
1. Dynamically predicate only the frequently executed blocks (B, C, and E in the slide's example), which reduces predication overhead
2. Improve predication coverage by including complex control-flow graphs
Comparison of Various Predication Schemes
Model | Simple H | Nested H | Frequently H | Loop | Non-merging | Coverage
DHP | B,C,D,E,F | - | - | - | - | 11%
SP | B,C,D,E,F | B,C,D,E,F,G,H,I | - | - | - | 16%
WB | B,C,D,E,F | B,C,D,E,F,G,H,I | - | A,A,B,C | - | 26%
DMP | B,C,D,E,F | B,C,D,G,H,I | B,C,D,E,H | A,A,B,C | - | 66%
Dual path | Path1: B,D,E,F; Path2: C,D,E,F | Path1: B,D,H,I; Path2: C,G,H,I | Path1: B,D,E,H; Path2: C,E,H | Path1: A,A,B,C; Path2: B,C | Path1: B…; Path2: C… |
DHP: dynamic hammock predication; SP: software predication; WB: wish branches; DMP: diverge-merge processor; H: hammock. Entries list the blocks covered in each example CFG; Coverage is the fraction of all mispredicted branches covered.
Performance
19.3% average performance improvement
38% reduction in pipeline flushes
9% less energy consumed
Conclusion
Most hard-to-predict branches (66% of all mispredictions) have a convergence point
Dynamic predication is more effective than software predication in terms of:
number of mispredicted branches covered
accuracy of coverage
Dynamic predication effectively avoids a large number of pipeline flushes