
Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution

Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt

Electrical and Computer Engineering, The University of Texas at Austin
*Oregon Microarchitecture Lab, Intel Corporation


Talk Outline

Problem

Wish Branches

Experimental Methodology

Results

Conclusion


Predicated Execution

Convert control flow dependency to data dependency
Pro: Eliminate hard-to-predict branches
Cons: (1) Fetch blocks B and C all the time (2) Wait until p1 is resolved

Source code:
  if (cond) { b = 0; } else { b = 1; }

(normal branch code)
  A:          p1 = (cond)
              branch p1, TARGET
  B:          mov b, 1
              jmp JOIN
  C: TARGET:  mov b, 0
  D: JOIN:    add x, b, 1

(predicated code)
  A:  p1 = (cond)
  B:  (!p1) mov b, 1
  C:  (p1) mov b, 0
  D:  add x, b, 1
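The if-conversion on this slide can be sketched in Python. This is an illustrative model only: the function names `branch_version` and `predicated_version` are made up for the example, and a Python conditional expression stands in for a predicated `mov`.

```python
def branch_version(cond: bool) -> int:
    """Normal branch code: only one of blocks B and C is executed."""
    if cond:              # branch p1, TARGET
        b = 0             # block C
    else:
        b = 1             # block B
    return b + 1          # block D: add x, b, 1


def predicated_version(cond: bool) -> int:
    """Predicated code: B and C are both fetched; p1 selects which one writes b."""
    p1 = cond                   # block A: p1 = (cond)
    b = None
    b = 1 if not p1 else b      # block B: (!p1) mov b, 1
    b = 0 if p1 else b          # block C: (p1)  mov b, 0
    return b + 1                # block D depends on p1 through b


# Both forms compute the same x for either outcome of the condition.
for cond in (False, True):
    assert branch_version(cond) == predicated_version(cond)
```

The predicated form has no branch to mispredict, but block D cannot produce `x` until `p1` is known, which is the data dependency listed under Cons.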


The Overhead of Predicated Execution

[Figure: normalized execution time for gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf and AVG. Series: PREDICATED CODE, NO-DEPENDENCY, NO-DEPENDENCY + NO-FETCH. Reference label: non-predicated. Y-axis from 0 to 1.2; one bar is clipped and labeled 2.02. Chart annotations: -2%, 13%, 16%.]

(Predicated code)
  A:  p1 = (cond)
  B:  (!p1) mov b, 1
  C:  (p1) mov b, 0
  D:  add x, b, 1

The slide also shows B and C with the predicates replaced by constants:
  B:  (0) mov b,1
  C:  (1) mov b,0

If all overhead is ideally eliminated, predicated execution would provide a 16% improvement in average execution time.


The Problem

Due to the predication overhead, predicated execution sometimes reduces performance.

Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior. The compiler cannot accurately estimate the run-time behavior of branches.


Wish Branches

A new type of control flow instruction
  3 types: wish jump, wish join, and wish loop

The compiler generates code (with wish branches) that can be executed either as predicated code or as non-predicated code (normal branch code).

The hardware decides at run-time whether to execute predicated code or normal branch code, based on the confidence of the branch prediction.
  Easy to predict: normal branch code
  Hard to predict: predicated code
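The run-time choice described above can be written as a tiny decision function. This is a minimal sketch, not the hardware design: the `Prediction` record and the `fetch_mode` name are hypothetical, and the confidence verdict is simply taken as an input.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    taken: bool       # direction supplied by the branch predictor
    confident: bool   # verdict supplied by a confidence estimator


def fetch_mode(pred: Prediction) -> str:
    """Pick how the front end treats one wish branch."""
    if pred.confident:
        # Easy to predict: follow the predicted direction like a normal branch.
        return "normal branch code"
    # Hard to predict: do not trust the predictor, fetch the predicated path.
    return "predicated code"
```

For example, `fetch_mode(Prediction(taken=True, confident=False))` selects the predicated path regardless of the predicted direction.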


Wish Jump/Join

normal branch code
  A:          p1 = (cond)
              branch p1, TARGET
  B:          mov b, 1
              jmp JOIN
  C: TARGET:  mov b,0
  D: JOIN:

predicated code
  A:  p1 = (cond)
  B:  (!p1) mov b,1
  C:  (p1) mov b,0
  D:

wish jump/join code
  A:          p1 = (cond)
              wish.jump p1 TARGET      (wish jump)
  B:          (!p1) mov b,1
              wish.join !p1 JOIN       (wish join)
  C: TARGET:  (p1) mov b,0
  D: JOIN:

High Confidence: the wish jump is predicted Taken or Not-Taken like a normal branch, and the predicates are shown as (1)
  B:          (1) mov b,1
              wish.join (1) JOIN
  C: TARGET:  (1) mov b,0

Low Confidence: the wish jump and wish join are shown as nop, so blocks B and C both execute as predicated code
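The two execution modes of the wish jump/join code can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the function name `run_wish_jump_join` and its return tuple are invented for illustration, and a misprediction is reduced to a single `flushed` flag.

```python
def run_wish_jump_join(cond: bool, high_confidence: bool, predicted_taken: bool):
    """Model one pass over blocks A-D.

    Returns (b, fetched, flushed), where `fetched` lists the blocks on the
    final path (after any recovery) and `flushed` says whether the pipeline
    had to be flushed because the wish jump was mispredicted.
    """
    p1 = cond                      # block A: p1 = (cond)
    fetched = ["A"]
    flushed = False
    b = None
    if high_confidence:
        # The wish jump behaves like a normal conditional branch.
        if predicted_taken != p1:
            flushed = True         # wrong direction: flush and refetch
        if p1:
            fetched.append("C")    # TARGET: mov b,0
            b = 0
        else:
            fetched.append("B")    # mov b,1 followed by a jump to JOIN
            b = 1
    else:
        # Wish jump and wish join fall through; B and C are both fetched
        # and the predicate decides which mov takes effect.
        fetched += ["B", "C"]
        if not p1:
            b = 1                  # (!p1) mov b,1
        if p1:
            b = 0                  # (p1) mov b,0
    fetched.append("D")
    return b, fetched, flushed
```

In low-confidence mode the model never sets `flushed`, at the cost of always fetching both B and C. In high-confidence mode it fetches only one block but pays a flush when the prediction is wrong.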


Wish Loop

Source code:
  do {
    a++;
    i++;
  } while (i<N);

normal backward branch code
  X: LOOP:  add a, a, 1
            add i, i, 1
            p1 = (i<N)
            branch p1, LOOP
  Y: EXIT:

wish loop code
  H:        mov p1, 1
  X: LOOP:  (p1) add a, a, 1
            (p1) add i, i, 1
            (p1) p1 = (cond)
            wish.loop p1, LOOP
  Y: EXIT:

High Confidence: the three predicates in block X are shown as (1)
Low Confidence: block X executes as predicated code
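The effect of predicating the loop body can be illustrated with a small Python model. It is a sketch, not a simulator: `normal_loop` and `wish_loop_low_confidence` are hypothetical names, and `fetched_iterations` stands for however many copies of block X the front end happened to fetch.

```python
def normal_loop(a: int, i: int, n: int):
    """do { a++; i++; } while (i < n);"""
    while True:
        a += 1
        i += 1
        if not (i < n):
            break
    return a, i


def wish_loop_low_confidence(a: int, i: int, n: int, fetched_iterations: int):
    """Execute `fetched_iterations` copies of block X, all guarded by p1."""
    p1 = True                          # block H: mov p1, 1
    for _ in range(fetched_iterations):
        if p1:                         # (p1) guards every instruction in X
            a += 1                     # (p1) add a, a, 1
            i += 1                     # (p1) add i, i, 1
            p1 = i < n                 # (p1) p1 = (cond)
        # Once p1 is false, the rest of the fetched iterations act as nops.
    return a, i, p1


# Fetching two iterations too many leaves the architectural state unchanged.
assert wish_loop_low_confidence(0, 0, 3, 5)[:2] == normal_loop(0, 0, 3)
```

When the returned `p1` is still true, the loop was left too early and more iterations are required.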


Mispredicted Case 1: Early-Exit

Correct execution:             X1 (T) X2 (T) X3 (N) Y
Early-exit (low confidence):   X1 (T) X2 (N) Y, flush pipeline, then X3 (N) Y

Compared to normal branch code: predicate data dependency and one extra instruction (-)


Mispredicted Case 2: Late-Exit

Correct execution:            X1 (T) X2 (T) X3 (N) Y
Late-exit (low confidence):   X1 (T) X2 (T) X3 (T) X4 (T) X5 (N) Y …
                              X4 and X5 become nops; no pipeline flush

Compared to normal branch code:
  pro: reduce flush penalty (+++)
  cons: predicate data dependency and one extra instruction (-)


Mispredicted Case 3: No-Exit

Correct execution:          X1 (T) X2 (T) X3 (N) Y
No-exit (low confidence):   X1 (T) X2 (T) X3 (T) X4 (T) X5 (T) X6 (T) …, flush pipeline, then Y

Compared to normal branch code: predicate data dependency and one extra instruction (-)
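The three mispredicted cases differ only in how many loop iterations the front end fetched and whether it had already left the loop when the misprediction was detected. The classifier below is a sketch of that distinction. The function name, its parameters, and the extra "correct" and "in-progress" outcomes are assumptions added for completeness; they do not appear on the slides.

```python
def classify_wish_loop(actual_iters: int, fetched_iters: int, front_end_exited: bool):
    """Return (case, flush_needed) for a low-confidence wish loop.

    actual_iters      iterations the program really executes
    fetched_iters     iterations of block X fetched by the front end
    front_end_exited  True if the front end has already predicted the exit
    """
    if front_end_exited:
        if fetched_iters == actual_iters:
            return ("correct", False)
        if fetched_iters < actual_iters:
            return ("early-exit", True)     # must flush and resume the loop
        return ("late-exit", False)         # extra iterations are nops
    if fetched_iters > actual_iters:
        return ("no-exit", True)            # still looping: flush, go to Y
    return ("in-progress", False)           # nothing mispredicted so far
```

With the slide's example of three real iterations, fetching two and exiting is an early-exit, fetching five and exiting is a late-exit, and fetching six without exiting is a no-exit. Only the late-exit case avoids the flush.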


Advantages/Disadvantages of Wish Branches

Advantages compared to predicated execution
  Reduce the overhead of predication
  Increase the benefits of predicated code by allowing the compiler to generate more aggressively-predicated code
  Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops)
  Make predicated code less dependent on machine configuration (e.g., branch predictor)


Advantages/Disadvantages of Wish Branches

Disadvantages compared to predicated execution

Extra branch instructions use machine resources

Extra branch instructions increase the contention for branch predictor table entries

May constrain the compiler’s scope for code optimizations


Wish Branch Support

ISA Support
  Predicated execution, wish branch instruction

Compiler Support
  Wish branch generation algorithms
  The compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches

Hardware Support
  Confidence estimator
  Front-end and branch misprediction detection/recovery module


Experimental Infrastructure

IA-64 provides full support for predication
Convert IA-64 traces to micro-ops to simulate an out-of-order superscalar processor model

[Tool flow: Source Code -> IA-64 Compiler (ORC) -> IA-64 Binary -> Trace generation module -> IA-64 Trace -> Micro-op Translator -> µops -> Micro-op Simulator]


Simulation Methodology

Nine SPEC 2000 integer benchmarks

Baseline Processor Configuration
  Front End
    Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local)
    Minimum 30-cycle branch misprediction penalty
    64KB, 2-cycle latency I-cache
  Execution Core
    8-wide out-of-order processor
    512-entry instruction window
  Confidence Estimator
    1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al. MICRO-29)
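A confidence estimator of this kind can be sketched as a table of resetting counters: a counter is incremented when the branch prediction turns out correct and cleared when it is wrong, and a prediction counts as high confidence once the counter reaches a threshold. The sketch below is only in the spirit of that scheme. The class name, table size, counter width, threshold, and index hash are assumptions, and the tags mentioned on the slide are omitted.

```python
class ResettingCounterEstimator:
    """Untagged table of saturating counters that reset on a misprediction."""

    def __init__(self, entries=1024, counter_max=15, threshold=15, history_bits=16):
        self.table = [0] * entries
        self.counter_max = counter_max
        self.threshold = threshold
        self.history_mask = (1 << history_bits) - 1
        self.history = 0                      # global branch outcome history

    def _index(self, pc: int) -> int:
        # Assumed hash: branch address XOR global history, folded into the table.
        return (pc ^ self.history) % len(self.table)

    def is_high_confidence(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc: int, prediction_correct: bool, taken: bool) -> None:
        idx = self._index(pc)
        if prediction_correct:
            self.table[idx] = min(self.table[idx] + 1, self.counter_max)
        else:
            self.table[idx] = 0               # one miss removes all confidence
        # Shift the resolved outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

A branch that was recently mispredicted stays low confidence until it has been predicted correctly `threshold` times in a row under the same history, which is the situation where a wish branch would run as predicated code.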


Performance Improvement

[Figure: normalized execution time for gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, AVG and AVGnomcf. Series: SELECTIVE-PREDICATION, AGGRESSIVE-PREDICATION, wish jump/join, wish jump/join/loop. Reference label: non-predicated. Y-axis from 0 to 1.2; one bar is clipped and labeled 2.02. Chart annotations: 24%, 8%, 14%, -4%.]

SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis
AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated

Callouts on the chart (grouped as on the slide):
  16% over conditional branch prediction (w/o mcf)
  11% over selective-predication (w/o mcf)
  7% over aggressive-predication (w/o mcf)

  14% over conditional branch prediction
  13% over selective-predication
  16% over aggressive-predication

  12% over conditional branch prediction
  11% over selective-predication
  13% over aggressive-predication


Conclusion

New control flow instructions: wish branches (jump/join/loop)

Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture
  Compiler: analyzes the control-flow graph and generates code
  Microarchitecture: makes the run-time decision to use predication

Wish branches provide significant performance benefits
  16% compared to conditional branch prediction
  13% compared to selectively predicated code

Wish branches can make predicated execution more viable and effective in high performance processors
  By enabling adaptive and aggressive predicated execution