Dynamic Predication

Post on 14-Jan-2016

10/25/2006 1

Dynamic Predication

ACAL Group Seminar

Alok Garg

What is Predicated Execution?

Conditional instruction
  - Executed: if condition is true
  - NOP: if condition is false

Eliminate simple branches, e.g. if (A == 0) { S = T; }

Convert control dependencies into data dependencies

Branch version:
    BNEZ  R1, L
    ADDU  R2, R3, R0
L:

Predicated version:
    CMOVZ R2, R3, R1
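As a rough illustration of the conversion above, the following Python sketch models both instruction sequences on a tiny register file. The function names and the dictionary-based register file are assumptions made for this example, and `ADDU R2, R3, R0` is read as a copy of R3 into R2, taking R0 to be a hardwired zero register as in MIPS.

```python
def run_branch_version(regs):
    """BNEZ R1, L / ADDU R2, R3, R0 / L:  (control dependence on R1)."""
    regs = dict(regs)                # work on a copy of the register file
    if regs["R1"] != 0:              # BNEZ R1, L: skip the ADDU when R1 != 0
        return regs
    regs["R2"] = regs["R3"] + 0      # ADDU R2, R3, R0, with R0 assumed to be 0
    return regs


def run_cmovz_version(regs):
    """CMOVZ R2, R3, R1  (data dependence on R1, no branch)."""
    regs = dict(regs)
    # Move R3 into R2 only if R1 is zero; otherwise R2 keeps its old value.
    regs["R2"] = regs["R3"] if regs["R1"] == 0 else regs["R2"]
    return regs


# Both versions should leave the register file in the same state.
for r1 in (0, 1, 7):
    start = {"R1": r1, "R2": 11, "R3": 42}
    assert run_branch_version(start) == run_cmovz_version(start)
```

The point of the sketch is that the conditional move always executes; only its effect depends on R1, so there is no branch left to mispredict.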

Simple Example

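[Figure: control-flow graph in which block A branches to B or C (edges labeled T and NT); both paths merge at D, which is followed by E]

Normal execution: A [B D E] C D E
  (the bracketed blocks are flushed from the pipeline due to the misprediction)

Predicated execution: A [C[!p] B[p]] D E
  (the bracketed blocks are conditional instructions guarded by predicate p)

Limitations of software predication:

1. Wasted work when the branch is easy to predict, e.g. if the branch is NT 98% of the time
2. Delayed execution of blocks B or C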
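A back-of-the-envelope cost model can make the first limitation concrete. The sketch below is only illustrative: the function names, the block sizes, the flush penalties and the one-instruction-per-cycle assumption are all invented for this example and do not come from the slides.

```python
def branch_cost(block_b, block_c, p_b, mispredict_rate, flush_penalty):
    """Expected cycles with branch prediction.

    Only the block that is actually needed counts as useful work, and each
    misprediction adds a pipeline flush penalty.
    p_b is the probability that block B (rather than C) is executed.
    """
    useful = p_b * block_b + (1 - p_b) * block_c
    return useful + mispredict_rate * flush_penalty


def predication_cost(block_b, block_c):
    """Expected cycles with predication: both blocks are always fetched,
    but there is never a flush."""
    return block_b + block_c


# Heavily biased branch (B executed 2% of the time, 2% mispredictions):
biased = branch_cost(10, 10, p_b=0.02, mispredict_rate=0.02, flush_penalty=20)
# Hard-to-predict branch (40% mispredictions, 30-cycle flush):
hard = branch_cost(10, 10, p_b=0.5, mispredict_rate=0.4, flush_penalty=30)
always_both = predication_cost(10, 10)
```

Under these made-up numbers the biased branch is cheaper with plain branch prediction, while the hard-to-predict branch is cheaper when predicated.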

Limitations of Predication

1. ISA support
   - Predicate registers
   - Predicated instructions

2. Performance overhead
   - Instructions are fetched from both paths
   - Predicated instructions cannot execute until the predicate value is resolved
   - Ideal predication speedup: 16.4%

3. Only a small subset of the control-flow graph is covered
   - The compiler cannot if-convert complex control flow
   - Ideal predication for all conditional branches: 37.4%

Motivation

Some branches are still very hard to predict with conventional branch predictors

Mispredictions lead to costly pipeline flushes, which hurt both performance and energy

Predication is used to avoid pipeline flushes for those hard-to-predict branches

Papers Covered

Dynamic Hammock Predication for Non-predicated Instruction Set Architectures. Artur Klauser, Todd Austin, Dirk Grunwald, and Brad Calder (PACT 1998)

Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt (MICRO 2005, IEEE Micro Top Picks 2006)

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths. Hyesoon Kim, Jose A. Joao, Onur Mutlu, and Yale N. Patt (MICRO 2006)

Types of Control-Flow Graphs

[Figure: three control-flow graphs]
  - Simple hammock: blocks A, B, C, D, E, F
  - Nested hammock: blocks A, B, C, D, E, F, G, H, I
  - Frequently-hammock: blocks A, B, C, D, E, F, G, H
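To make the "simple hammock" shape concrete, here is a minimal Python sketch that checks whether a branch block heads an if-then-else hammock. The edge lists are assumptions: the figure's edges are not recoverable from the transcript, so `SIMPLE` is a guess at the simple hammock and `NON_MERGING` is a hypothetical graph whose two sides never rejoin. The check is also deliberately narrow (one block per side, no one-sided hammocks, no predecessor checks).

```python
def find_simple_hammock(cfg, branch):
    """Return the join block if `branch` heads a simple hammock, else None.

    cfg maps each block name to the list of its successor blocks.
    """
    succs = cfg.get(branch, [])
    if len(succs) != 2:              # must be a two-way branch
        return None
    targets = []
    for side in succs:
        side_succs = cfg.get(side, [])
        if len(side_succs) != 1:     # each side must be straight-line code
            return None
        targets.append(side_succs[0])
    # Both sides must merge at the same block.
    return targets[0] if targets[0] == targets[1] else None


# Assumed edges for the simple hammock (A splits into B and C, merging at D).
SIMPLE = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": ["F"], "F": []}
# Hypothetical graph whose two paths never merge.
NON_MERGING = {"A": ["B", "C"], "B": ["D"], "C": ["F"], "D": [], "F": []}
```

With these assumed graphs, `find_simple_hammock(SIMPLE, "A")` identifies D as the join block, and the non-merging graph yields `None`.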

Types of Control-Flow Graphs (continued)

[Figure: two control-flow graphs]
  - Loop: node labels L, A, B, C
  - Non-merging control flow: blocks A, B, C, D, E, F, G

Distribution of mispredicted branches

Simple + nested hammocks: 16% of all mispredictions
All types except non-merging: 66% of all mispredictions

Dynamic Hammock Predication

Targets the first limitation of software predication
  - Removes the need for ISA support

Dynamic predication of simple hammocks
  - Covers 11% of all mispredictions
  - Needs compiler support to mark simple hammock boundaries

Predication decision
  - Dynamic decision
  - Static, profile-based decision

Support for Dynamic Predication

Fork context:
    a) R1 := …
    b) R2 := …
    c) R3 := …
    d) R4 := …
    e) B-cc (i)

Then context (cc is false):
    f) R1 := R1 + R2
    g) R3 := R1 x 2
    h) BR (k)

Else context (cc is true):
    i) R2 := R1 - R2
    j) R3 := R2 x 2

Join context:
    k) RA := R1
    l) RB := R2
    m) RC := R3
    n) RD := R4

Support for Dynamic Predication

Fork context:
    a) R1.a := …
    b) R2.b := …
    c) R3.c := …
    d) R4.d := …
    e) PL.e

Then context (cc is false, predicate value = 0):
    f) R1.f := R1.a + R2.b
    g) R3.g := R1.f x 2
    h) Removed

Else context (cc is true, predicate value = 1):
    i) R2.i := R1.a - R2.b
    j) R3.j := R2.i x 2

Rename table before the join:
          fork   then   else
    R1    a      f      -
    R2    b      -      i
    R3    c      g      j
    R4    d      -      -

Join context:
    k) R1.k := PL.e : R1.a : R1.f
    l) R2.l := PL.e : R2.i : R2.b
    m) R3.m := PL.e : R3.j : R3.g
    n) RA.n := R1.k
    o) RB.o := R2.l
    p) RC.p := R3.m
    q) RD.q := R4.d

Rename table after the join:
          fork   then   else
    R1    k      -      -
    R2    l      -      -
    R3    m      -      -
    R4    d      -      -
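The renamed code above can be checked against the original branching code with a small Python sketch. It is a functional model only, not a description of the hardware: the function names are made up for this example, and the three-operand form `PL.e : x : y` is read here as "x if the predicate is 1, otherwise y", which is an interpretation of the slide's notation.

```python
def run_branching(r1, r2, cc):
    """Reference semantics: only one side of the hammock executes."""
    if not cc:                 # then context (cc is false)
        r1 = r1 + r2           # f) R1 := R1 + R2
        r3 = r1 * 2            # g) R3 := R1 x 2
    else:                      # else context (cc is true)
        r2 = r1 - r2           # i) R2 := R1 - R2
        r3 = r2 * 2            # j) R3 := R2 x 2
    return r1, r2, r3


def run_dynamically_predicated(r1_a, r2_b, cc):
    """Both sides execute on renamed registers; selects merge the results."""
    pl_e = 1 if cc else 0                       # e) predicate replaces the branch
    r1_f = r1_a + r2_b                          # f) then context
    r3_g = r1_f * 2                             # g)
    r2_i = r1_a - r2_b                          # i) else context
    r3_j = r2_i * 2                             # j)

    def select(pred, if_one, if_zero):          # assumed meaning of "p : x : y"
        return if_one if pred else if_zero

    r1_k = select(pl_e, r1_a, r1_f)             # k)
    r2_l = select(pl_e, r2_i, r2_b)             # l)
    r3_m = select(pl_e, r3_j, r3_g)             # m)
    return r1_k, r2_l, r3_m


# The two models should agree for either outcome of the condition.
for cc in (False, True):
    assert run_branching(5, 3, cc) == run_dynamically_predicated(5, 3, cc)
```

Because both contexts read the fork-context names (R1.a, R2.b), neither side can corrupt the other's inputs, and the select operations at the join pick the live value for each architectural register.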

Wish Branches

Targets the second and third limitations of software predication
  - Dynamic decision based on a confidence estimator
  - Improved coverage by predicating loops

Uses compiler-generated predicated blocks
  - Adds "wish" code for the dynamic decision

Defines how to include simple loops in predication
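The dynamic decision can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the resetting-counter estimator, its threshold, and the names `ConfidenceEstimator` and `wish_branch_mode` are not taken from the slides, which only state that the decision is based on a confidence estimator.

```python
class ConfidenceEstimator:
    """Per-branch resetting counter: counts consecutive correct predictions."""

    def __init__(self, threshold=4, max_count=15):
        self.threshold = threshold
        self.max_count = max_count
        self.counters = {}                       # branch PC -> counter value

    def is_high_confidence(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, prediction_was_correct):
        if prediction_was_correct:
            count = self.counters.get(pc, 0) + 1
            self.counters[pc] = min(count, self.max_count)   # saturate
        else:
            self.counters[pc] = 0                # reset on a misprediction


def wish_branch_mode(estimator, pc):
    """High confidence: trust the branch predictor and skip the other path.
    Low confidence: fall back to the compiler-generated predicated code."""
    if estimator.is_high_confidence(pc):
        return "branch-prediction"
    return "predicated"


est = ConfidenceEstimator(threshold=2)
mode_before = wish_branch_mode(est, pc=0x40)     # no history yet
est.update(0x40, True)
est.update(0x40, True)
mode_after = wish_branch_mode(est, pc=0x40)      # two correct predictions
```

The design choice this illustrates is that the same binary can behave like branch code for easy branches and like predicated code for hard ones, with the choice made per dynamic instance.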

Wish Jumps and Wish Joins

[Figure: the same code shown as branch code, predicated code, and wish jump/join code]

Wish Loops

[Figure: the same loop shown as normal code and as wish loop code]

Branch Type       Normal Exit        Early Exit    Late Exit              No Exit
                  (X1 X2 X3 Y)       (X1 X2 Y)     (X1 X2 X3 X4 X5 Y)     (X1 X2 X3 X4 … XN)
High Confidence   No overhead        Flush (Y)     Flush (X4 X5 Y)        Flush (X4 X5 … XN)
Low Confidence    Predication delay  Flush (Y)     Predicate (X4 X5)      Flush (X4 X5 … XN)
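The table above can be read as a small decision function. The Python sketch below encodes it under stated assumptions: the function name, the use of iteration counts to distinguish the exit cases, and the `None` convention for the "no exit" case are choices made for this example rather than anything defined on the slides.

```python
def wish_loop_outcome(fetched_iters, actual_iters, high_confidence):
    """Classify one wish-loop execution.

    fetched_iters: iterations the front end fetched before leaving the loop,
                   or None if it never left the loop on its own ("no exit").
    actual_iters:  iterations the loop should really execute.
    """
    if fetched_iters is None:
        return "flush"                            # no exit: flush in both modes
    if fetched_iters == actual_iters:             # normal exit
        return "no overhead" if high_confidence else "predication delay"
    if fetched_iters < actual_iters:              # early exit
        return "flush"
    # Late exit: the extra iterations are flushed in high-confidence mode,
    # but simply predicated off in low-confidence mode.
    return "flush" if high_confidence else "predicate extra iterations"
```

The only case where low-confidence mode avoids a flush is the late exit, which is where the table shows "Predicate (X4 X5)" instead of a flush.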

Dynamic Number of Wish Branches

Performance improvement: 10.7% over predicated code

Dynamic Number of Wish Loops

Performance improvement: 13.3% over predicated code

Diverge-Merge Processor (DMP)

Targets all three limitations of software predication
  - Dynamic predication: little compiler support needed
  - Dynamic decision based on confidence estimation
  - Predicates only frequently executed control-flow paths

Software support
  - The compiler marks all diverge and merge points

Hardware support (similar to Dynamic Hammock Predication)
  - Enters predication mode at a diverge point
  - Predicates only frequently executed paths

Frequently Executed Control-Flow Paths

1. Dynamically predicate blocks B, C, and E: reduces predication overhead
2. Improve predication coverage by including complex control-flow graphs

Comparison of Various Predication Schemes

Model      Simple H       Nested H         Freq H         Loop           Non-Merging  Coverage
DHP        B,C,D,E,F      -                -              -              -            11%
SP         B,C,D,E,F      B,C,D,E,F,G,H,I  -              -              -            16%
WB         B,C,D,E,F      B,C,D,E,F,G,H,I  -              A,A,B,C        -            26%
DMP        B,C,D,E,F      B,C,D,G,H,I      B,C,D,E,H      A,A,B,C        -            66%
Dual Path  Path1:B,D,E,F  Path1:B,D,H,I    Path1:B,D,E,H  Path1:A,A,B,C  Path1:B…
           Path2:C,D,E,F  Path2:C,G,H,I    Path2:C,E,H    Path2:B,C      Path2:C…

Key: DHP = Dynamic Hammock Predication, SP = software predication, WB = wish branches, DMP = Diverge-Merge Processor; "H" = hammock

[Figure: the five control-flow graph types (simple hammock, nested hammock, frequently-hammock, loop, non-merging control flow), repeated from the earlier slides]

Performance

19.3% average performance improvement
38% reduction in pipeline flushes
9% less energy consumed

Conclusion

Most hard-to-predict branches (66%) have a convergence point

Dynamic predication is more effective than software predication in terms of:
  - Number of mispredicted branches covered
  - Accuracy of coverage

It effectively avoids a large number of pipeline flushes