Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

G. Pokam, F. Bodin. CPC 2004, Chiemsee, Germany, July 7-9.


Transcript of Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Page 1: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

G. Pokam, F. Bodin

CPC 2004, Chiemsee, Germany, July 7-9

Page 2: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Motivation

• Source of complexity on high-performance VLIW processors: hardware duplication
  • many FUs of different types (ALUs, LSUs, FPUs, BR, etc.)
  • need for a large register file
• Power growth factor: Power ~ IPC^[exponent symbol missing] (slide annotations: compiler, architecture complexity)

Page 3: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Motivation

• Assume a fixed [symbol missing]: does compiling for higher ILP result in dissipating less power?
• Which issues (architecture, software, etc.) affect power when compiling for ILP?
• Try to figure out what happens analytically!

Page 4: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Agenda

• Motivation
• Used metrics
• Energy model
• Tradeoff analysis
• Hyperblock example
• Experiments
• Conclusions

Page 5: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

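Metric

Performance to energy ratio (PTE) [Gonzales, R. et al.]

\[ \mathrm{PTE} = \frac{\text{performance}}{\text{energy}} = \frac{1}{\mathrm{Energy}_{BB} \times \mathrm{Delay}_{BB}} = \frac{\mathrm{IPC}}{N \times E_{BB}} \]

• N: nb. of oper. per Basic Block
• IPC: average nb. of oper. per bundle
• E_BB: energy per Basic Block

higher is better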
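A minimal sketch of the metric, assuming the delay of a basic block is its bundle count N / IPC, so that 1 / (energy × delay) reduces to IPC / (N × E_BB). The function name and the numbers are hypothetical and only illustrate why a higher IPC at equal energy raises PTE.

```python
def pte(num_ops: int, ipc: float, energy_bb: float) -> float:
    """Performance-to-energy ratio of one basic block.

    Assumes delay = num_ops / ipc (number of bundles), so that
    PTE = 1 / (energy * delay) = ipc / (num_ops * energy).
    """
    if num_ops <= 0 or ipc <= 0 or energy_bb <= 0:
        raise ValueError("num_ops, ipc and energy_bb must be positive")
    delay = num_ops / ipc  # bundles needed to issue all operations
    return 1.0 / (energy_bb * delay)


# Hypothetical block: 12 operations, 30 energy units, two schedules.
narrow = pte(num_ops=12, ipc=1.5, energy_bb=30.0)
wide = pte(num_ops=12, ipc=3.0, energy_bb=30.0)
print(f"PTE at IPC 1.5: {narrow:.5f}, at IPC 3.0: {wide:.5f}")
```

In practice a denser schedule rarely leaves N and E_BB unchanged, which is the tradeoff the rest of the talk examines.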


Page 7: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Energy Model

The execution of a bundle w_n dissipates an energy EPB_{w_n}:

\[ \mathrm{EPB}_{w_n} = E_c + \mathrm{IPC}_{w_n} \, E_{op} + m \, p \, E_s + l \, q \, E_{miss} \]

• E_c: energy base cost
• IPC_{w_n} E_op: energy due to execution of the bundle
• m p E_s: energy due to D-cache misses
• l q E_miss: energy due to I-cache misses

Consider loop intensive kernels …
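The per-bundle energy can be sketched as a sum of four terms: base cost, bundle execution, D-cache misses and I-cache misses. The slide does not define m, p, l and q; the sketch below assumes m and l are miss probabilities and p and q are miss penalties in cycles. All names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyParams:
    e_c: float     # base cost per bundle
    e_op: float    # energy per operation
    e_s: float     # energy per D-cache stall cycle (assumed meaning)
    e_miss: float  # energy per I-cache miss cycle (assumed meaning)


def energy_per_bundle(ipc: float, m: float, p: float, l: float, q: float,
                      params: EnergyParams) -> float:
    """EPB = E_c + IPC * E_op + m * p * E_s + l * q * E_miss."""
    base = params.e_c
    execution = ipc * params.e_op
    dcache = m * p * params.e_s      # assumed: miss probability * penalty
    icache = l * q * params.e_miss   # assumed: miss probability * penalty
    return base + execution + dcache + icache


# Hypothetical parameter values, not taken from the Lx energy model.
params = EnergyParams(e_c=1.0, e_op=0.5, e_s=2.0, e_miss=4.0)
print(energy_per_bundle(ipc=2.0, m=0.1, p=10.0, l=0.0, q=0.0, params=params))
```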


Page 9: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Analysis

• Use PTE as a lever for power exploration
• Assume R is a CFG region to be transformed into an ILP region H
• A sufficient condition for this is given by

\[ \mathrm{PTE}_H \ge \mathrm{PTE}_R \]

Page 10: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Analysis

Idea:

• keep track of IPC values that improve energy efficiency
• solve the PTE inequality for IPC:

\[ \mathrm{IPC}_H \ge \mathrm{ILPtransform}(\mathrm{IPC}_R) \]

• IPC_H: avg. # of oper. per bundle in the transformed region
• IPC_R: avg. # of oper. per bundle in the CFG region R

Page 11: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

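Analysis

\[ \mathrm{ILPtransform}(\mathrm{IPC}_R) = \frac{A \cdot \mathrm{IPC}_R}{B + C \cdot \mathrm{IPC}_R} \]

where

\[ A = f_H N_H n_H E_c + N_H s_H E_s \]

\[ B = \sum_m \left( N_R f_R n_R E_c + N_R s_R E_s \right) \]

\[ C = \left( \sum_m N_R f_R n_R - f_H N_H n_H \right) E_{op} \]

• f: exec. freq.
• N: # of oper.
• n: # of bundles
• s: # of stalls due to dmiss
• m: # of BBs in region

C is a measure of extra work!

Shape of the ILPtransform function depends on the sign of C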
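The rational form of ILPtransform can be recovered from the PTE inequality. The derivation below is a reconstruction under stated assumptions, not necessarily the authors' own: the I-cache term is dropped, s counts total stalls, N = n × IPC within a block, and A, B, C are read as A = f_H N_H n_H E_c + N_H s_H E_s, B = Σ_m (N_R f_R n_R E_c + N_R s_R E_s), and C = (Σ_m N_R f_R n_R − f_H N_H n_H) E_op.

```latex
\begin{align*}
\mathrm{PTE}_H &= \frac{\mathrm{IPC}_H}{A + f_H N_H n_H E_{op}\,\mathrm{IPC}_H},
\qquad
\mathrm{PTE}_R = \frac{\mathrm{IPC}_R}{B + \bigl(\sum_m N_R f_R n_R\bigr) E_{op}\,\mathrm{IPC}_R} \\[4pt]
\mathrm{PTE}_H \ge \mathrm{PTE}_R
&\iff \mathrm{IPC}_H \Bigl( B + \bigl(\textstyle\sum_m N_R f_R n_R\bigr) E_{op}\,\mathrm{IPC}_R \Bigr)
 \ge \mathrm{IPC}_R \bigl( A + f_H N_H n_H E_{op}\,\mathrm{IPC}_H \bigr) \\[4pt]
&\iff \mathrm{IPC}_H \bigl( B + C\,\mathrm{IPC}_R \bigr) \ge A\,\mathrm{IPC}_R \\[4pt]
&\iff \mathrm{IPC}_H \ge \frac{A\,\mathrm{IPC}_R}{B + C\,\mathrm{IPC}_R}
 \qquad \text{provided } B + C\,\mathrm{IPC}_R > 0 .
\end{align*}
```

Under this reading, C compares the operation energy weight of the original region with that of the transformed one, so a negative C means the transformed region executes more work.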

Page 12: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

IPC_H vs. IPC_R

C < 0:
• exponential shape means high extra work!
• dependence height mismatch
• resource contention
• e.g. Hyperblock: compensation code

C = 0:
• linear shape
• negligible extra work

C > 0:
• optimal scenario
• logarithmic shape
• e.g. Hyperblock: instruction merging
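The three shapes can be checked numerically. The sketch below assumes the threshold function has the form A × IPC_R / (B + C × IPC_R) with A, B > 0 and classifies its curvature by comparing the midpoint value with the chord between two points. The coefficient values are hypothetical.

```python
def ilp_transform(ipc_r: float, a: float, b: float, c: float) -> float:
    """Minimum IPC_H for which the transformed region is not worse in PTE.

    Assumed form: A * IPC_R / (B + C * IPC_R), valid while B + C * IPC_R > 0.
    """
    denom = b + c * ipc_r
    if denom <= 0:
        raise ValueError("no finite IPC_H satisfies the inequality here")
    return a * ipc_r / denom


def shape(a: float, b: float, c: float, lo: float = 1.0, hi: float = 3.0) -> str:
    """Classify the curve on [lo, hi] by comparing midpoint and chord."""
    mid = ilp_transform((lo + hi) / 2, a, b, c)
    chord = (ilp_transform(lo, a, b, c) + ilp_transform(hi, a, b, c)) / 2
    if abs(mid - chord) < 1e-12:
        return "linear"
    # Below the chord: growth accelerates (the slide's "exponential" case).
    # Above the chord: growth saturates (the slide's "logarithmic" case).
    return "convex" if mid < chord else "concave"


for c in (-0.2, 0.0, 0.2):  # hypothetical values of C with A = B = 1
    print(c, shape(1.0, 1.0, c))
```

With C < 0 the required IPC_H grows faster than IPC_R and eventually becomes unreachable, which is how extra work shows up as wasted energy in this model.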


Page 14: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Hyperblock framework

• predication model via the select instruction: slct dest = cond, src1, src2
• only hammock regions are considered
• single entry, single exit hyperblock
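Select-based predication of a hammock can be sketched as follows. The sketch assumes `slct dest = cond, src1, src2` writes src1 when cond is true and src2 otherwise. The example region and the function names are hypothetical.

```python
def slct(cond: bool, src1: int, src2: int) -> int:
    """Model of `slct dest = cond, src1, src2` (assumed: cond ? src1 : src2)."""
    return src1 if cond else src2


def hammock_branching(a: int, b: int) -> int:
    """Original hammock: one entry, two arms, one exit."""
    if a > b:
        x = a - b
    else:
        x = b - a
    return x


def hammock_if_converted(a: int, b: int) -> int:
    """Same region after if-conversion: both arms execute, slct picks one."""
    cond = a > b
    then_val = a - b   # operation from the taken arm
    else_val = b - a   # operation from the fall-through arm
    return slct(cond, then_val, else_val)


for a, b in [(5, 2), (2, 5), (3, 3)]:
    assert hammock_branching(a, b) == hammock_if_converted(a, b)
```

The converted form removes the branch but always executes both arms plus the select, so the operation count N grows even when the schedule gets denser.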

Page 15: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Transformation heuristic

1. build the loop tree
2. traverse the loop tree from innermost to outermost loop
3. evaluate profit for each candidate loop region
4. propagate profit to CFG after transformation

\[ \text{profit} = \frac{\mathrm{PTE}_{transformed} - \mathrm{PTE}_{original}}{\mathrm{PTE}_{original}} \]
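Steps 1 to 3 can be sketched as a post-order walk over a loop tree, assuming profit is the relative PTE gain and that a region is transformed only when its profit is positive. The positive-profit threshold, the `LoopNode` type and the PTE values are assumptions for illustration. Step 4 (propagating profit to the CFG) is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class LoopNode:
    name: str
    pte_original: float      # PTE of the region as compiled
    pte_transformed: float   # estimated PTE after hyperblock formation
    children: list = field(default_factory=list)  # nested (inner) loops


def profit(pte_original: float, pte_transformed: float) -> float:
    """Relative PTE gain: (PTE_transformed - PTE_original) / PTE_original."""
    return (pte_transformed - pte_original) / pte_original


def select_regions(root: LoopNode) -> list:
    """Visit inner loops before their parents; keep regions with profit > 0."""
    selected = []

    def visit(node: LoopNode) -> None:
        for child in node.children:
            visit(child)  # innermost loops are evaluated first
        if profit(node.pte_original, node.pte_transformed) > 0:
            selected.append(node.name)

    visit(root)
    return selected


# Hypothetical loop nest: the inner loop gains, the outer loop loses.
inner = LoopNode("inner", pte_original=1.0, pte_transformed=1.2)
outer = LoopNode("outer", pte_original=2.0, pte_transformed=1.8, children=[inner])
print(select_regions(outer))
```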


Page 17: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Platform

• Lx Platform from STMicroelectronics
  • 4-issue VLIW machine
  • 64 GPRs, 8 CBRs
  • 4 ALUs, 1 LD/ST, 2 MULs, 1 BU
• Instruction-based energy model from STMicroelectronics
• Lx compiler
  • prefetch disabled
  • only scalar optimizations (-O2)

Page 18: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Methodology

Post-pass optimization

[Figure: tool flow with the boxes source, Lx Compiler, .s file, SALTO, .s file, absciss]

• phase 1: instrumentation
  • BB frequency
  • Dmiss per BB
• phase 2:
  • hyperblock formation
  • hyperblock optimization: instr. promotion, instr. merging, instr. renaming
• code versions: original CFG, selective hyperblock, all hyperblock

Page 19: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Results

• negligible IPC improvement
• relatively larger increase of operation count and static schedule length


Page 21: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

Conclusions

• Analytical scheme to understand the impact of ILP compilation on energy
• Heuristic shows 17% energy-delay improvement on a restricted hyperblock scheme
• Programs suffer from limited ILP, which quickly turns into wasted energy
• Need to go beyond compiler-centric approaches in order to overcome ILP limitations
• What is missing:
  • impact of post-optimization passes has not been determined
  • only a restricted hyperblock scheme has been evaluated

Page 22: Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture


Thanks!