ASAP 2005 Samos, Greece July 23-25, 2005 1 Exploring Design Space of VLIW Architectures Giuseppe...
ASAP 2005, Samos, Greece, July 23-25, 2005
Exploring Design Space of VLIW Architectures
Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi and Davide Patti
Università di Catania
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni
DIIT - University of Catania, Italy

Outline
- Introduction
- VLIW in past & future
- Design Exploration Framework
- ILP oriented compilation
- Genetic Design Space Exploration
- Conclusions

Instruction Level Parallelism
- High performance processors in the 1980s: maximize ILP
- Issue more than one instruction in a given clock cycle
- Who decides which instructions can be executed in parallel?
- Two different philosophies:
  - Superscalar
  - Very Long Instruction Word (VLIW)

ILP philosophy: Superscalar
- Hide the process of finding ILP
- ILP is discovered dynamically at run time by the control hardware of the processor
[Figure: the compiler translates Foo.c into a sequential instruction stream (Op1, Op2, Op3, Op4, Op5, ...); at run time the HW groups the operations that can execute in parallel (Op1,Op2 | Op3 | Op4,Op5 | ...)]

ILP philosophy: VLIW
- Hardware resources are architecturally visible to the compiler
- The compiler creates a sequence of Very Long Instructions that defines the plan of execution
- The HW simply executes the plan
[Figure: the compiler takes Foo.c and the hardware resources configuration and produces the plan of execution (Op1,Op2 | Op3 | Op4,Op5), which the HW follows at run time]
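
As an illustration of how a compiler might turn a sequential operation list into a plan of execution, the following is a minimal greedy scheduling sketch. It is not the scheduler used by the authors or by any particular compiler. The dependencies between Op1..Op5, the single "int" unit type, the unit count and the one-cycle latency are all assumptions made for this example.

```python
def schedule(ops, deps, fu_type, fu_count):
    """Greedily pack operations into long instructions (bundles).

    ops      : operation names in program order
    deps     : op -> set of ops whose results it needs (hypothetical)
    fu_type  : op -> functional unit type it runs on (hypothetical)
    fu_count : unit type -> number of units visible to the compiler
    Every operation is assumed to complete in one cycle.
    """
    done = set()
    remaining = list(ops)
    plan = []
    while remaining:
        free = dict(fu_count)      # units still free in this cycle
        bundle = []
        for op in remaining:
            ready = deps.get(op, set()) <= done
            if ready and free.get(fu_type[op], 0) > 0:
                free[fu_type[op]] -= 1
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic dependency or missing functional unit")
        done.update(bundle)        # results become visible next cycle
        remaining = [op for op in remaining if op not in done]
        plan.append(bundle)
    return plan


# Hypothetical dependencies chosen so that the plan matches the figure.
ops = ["Op1", "Op2", "Op3", "Op4", "Op5"]
deps = {"Op3": {"Op1", "Op2"}, "Op4": {"Op3"}, "Op5": {"Op3"}}
fu_type = {op: "int" for op in ops}
plan = schedule(ops, deps, fu_type, {"int": 2})
```

With two units, the five operations need three long instructions; with a single unit the same code needs five, which is the kind of effect the exploration measures.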

VLIW past & future
- Decline of VLIWs for general purpose systems:
  - Couldn't be integrated in a single chip
  - Binary compatibility between implementations
- Rediscovery of VLIW in embedded systems:
  - No more integrability issues
  - Binary incompatibility not relevant
- Advantages of VLIW:
  - Simplified hardware
  - The architecture can be optimized ad hoc to achieve ILP

Reference architecture (HPL-PD)
[Figure: block diagram of the reference architecture. Blocks: L2 Unified Cache; L1 Instruction Cache; L1 Data Cache; Prefetch Cache; Prefetch Unit; Fetch Unit; Instruction Queue; Decode and Control Logic; Predicate, Branch, General Purpose, Floating Point and Control Registers; Load/Store, Branch, Integer and Floating Point Units]

Configuration Space
Three main parameter categories:
- VLIW core:
  - Number of registers in each register file (from 16 to 256)
  - Number of instances of functional units of each type (from 1 to 6)
- Memory hierarchy:
  - Size, block size and associativity for each of the caches (L1 Instruction, L1 Data, L2)
- Compiler:
  - Conservative compilation strategy (basic blocks)
  - Aggressive ILP oriented compilation strategy (hyperblocks)
Total space size: 1.47 x 10^13 configurations!
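
The size of such a space is the product of the number of options of each parameter, which is why it grows so quickly. The sketch below shows only the arithmetic on a small subset of parameters. The option lists are assumptions (for example, power-of-two register file sizes and the L1 data cache sizes), so the result is not meant to reproduce the total quoted above.

```python
from math import prod

# Hypothetical option lists for a small subset of the parameters.
space = {
    "gpr_size": [16, 32, 64, 128, 256],       # assumed power-of-two steps
    "fpr_size": [16, 32, 64, 128, 256],       # assumed power-of-two steps
    "integer_units": [1, 2, 3, 4, 5, 6],
    "float_units": [1, 2, 3, 4, 5, 6],
    "l1d_size_kb": [1, 2, 4, 8],              # assumed values
    "compiler": ["basic_blocks", "hyperblocks"],
}


def space_size(space):
    """Number of distinct configurations: product of option counts."""
    return prod(len(options) for options in space.values())


size = space_size(space)  # 5 * 5 * 6 * 6 * 4 * 2
```

Every additional parameter multiplies the count, so even modest option lists for three caches, several register files and several unit types quickly rule out exhaustive simulation.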

Required Tools
- High level estimation models
- Design Space Exploration strategy
[Figure: exploration loop. The Exploration Algorithm proposes a Configuration; Compiler, Simulator and Estimator evaluate Application.c on it and return Performance, Power, ...; the result of the exploration is the set of Pareto configurations]

An Open Platform: EPIC Explorer
- Interfaces to the Trimaran framework, which provides a VLIW compiler and a simulator for dynamic statistics
- Estimator component implementing high level models
- Explorer component implementing multi-objective design space exploration algorithms

The Exploration Data Flow
[Figure: exploration data flow. Elements: Foo.c, IMPACT, ELCOR, Emulib, foo.exe, Execution statistics, Estimator (Energy, Power, Cycles), Explorer, System configuration (Processor, Memory)]

Energy estimation
- Subdivide the architecture into Functional Block Units (FBUs)
  - Instruction decode logic, integer units, floating point units, register files
- For each FBU (from ST Microelectronics LX):
  - Active power: average power dissipated when the FBU is used
  - Inactive power: average power dissipated when the FBU is not used
- From the execution statistics, we know how many cycles each FBU has been active/inactive:
  $E_{FBU} = (P_{active} \cdot cycles_{active} + P_{inactive} \cdot cycles_{inactive}) \cdot T_{clock}$
- Fair degree of accuracy (about 25%): sufficient to investigate relative power savings between designs
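
The per-FBU energy formula above translates directly into code. The sketch below is only an illustration: the `FBU` record, the function names and all numeric values are hypothetical and are not taken from EPIC Explorer or from the ST LX data.

```python
from dataclasses import dataclass


@dataclass
class FBU:
    name: str
    p_active: float       # average power when used, in watts
    p_inactive: float     # average power when not used, in watts
    cycles_active: int    # from the execution statistics
    cycles_inactive: int  # from the execution statistics


def fbu_energy(fbu, t_clock):
    """E_FBU = (P_active*cycles_active + P_inactive*cycles_inactive) * T_clock."""
    return (fbu.p_active * fbu.cycles_active
            + fbu.p_inactive * fbu.cycles_inactive) * t_clock


def total_energy(fbus, t_clock):
    """Sum the energy of all functional block units, in joules."""
    return sum(fbu_energy(f, t_clock) for f in fbus)


# Hypothetical numbers, for illustration only.
units = [
    FBU("integer_unit", 2.0, 0.5, 100, 300),
    FBU("register_file", 1.0, 0.25, 400, 0),
]
energy = total_energy(units, t_clock=1e-9)  # 1 ns clock period
```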

Reference Application Set
Chosen from the MediaBench suite

Application     Category
G721 encode     Voice compression
Gsm encode      Speech transcoding
Gsm decode      Speech transcoding
Ieee 810        IEEE 1180 inverse DCT
JPEG            Image compression
MPEG2 decode    Video decoding
ADPCM encode    Speech encoding
ADPCM decode    Speech decoding
Fir             FIR filter

Exploration Methodology
- Preliminary analysis of compilation
  - Impact of ILP oriented code transformations
  - Predict the right compilation strategy:
    - Basic blocks (conservative)
    - Hyperblocks (aggressive, ILP-oriented)
- Multi-objective Design Space Exploration
  - Extract the Pareto set
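
Extracting the Pareto set means keeping only the configurations that no other configuration beats on every objective. A minimal sketch follows, assuming all objectives are to be minimized and each configuration is reduced to a tuple such as (delay, power). The sample points are made up.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_set(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (delay, power) pairs.
evaluated = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
front = pareto_set(evaluated)
```

This quadratic scan is adequate for the few thousand configurations that an exploration run visits.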

Preliminary Analysis (1/3)
- For each objective, an unpaired two-sample t-test estimates the average effect of hyperblock formation
- Is the mean effect on the objective significant with respect to the chosen critical difference?
[Figure: two random subsets of n configurations, CN and CH, are drawn from the configuration space and compiled without (N) and with (H) hyperblock formation; the resulting objective values ON and OH are compared with a t-test]
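
The comparison of the two samples can be sketched with the standard library alone. The code below uses Welch's form of the unpaired two-sample t-test, which does not assume equal variances; whether the authors used this form or the pooled-variance form is not stated, so treat that choice as an assumption. The sample values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_n, sample_h):
    """Unpaired two-sample t-test (Welch's form).

    Returns (difference of means, t statistic, degrees of freedom).
    sample_n: objective values without hyperblock formation
    sample_h: objective values with hyperblock formation
    """
    diff = mean(sample_n) - mean(sample_h)
    vn = variance(sample_n) / len(sample_n)   # squared standard errors
    vh = variance(sample_h) / len(sample_h)
    t = diff / sqrt(vn + vh)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vn + vh) ** 2 / (vn ** 2 / (len(sample_n) - 1)
                           + vh ** 2 / (len(sample_h) - 1))
    return diff, t, df


# Hypothetical execution times for two small random subsets.
diff, t, df = welch_t([2, 4, 6], [1, 2, 3])
```

The t statistic and the degrees of freedom are then compared with a tabulated critical value to decide whether the mean difference is significant.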

Preliminary Analysis (2/3)
- Example of a metric for the critical difference in means: d > 50% M

Preliminary Analysis (3/3)

Application   Time (ms)             Power (W)            Energy (mJ)
              Δ       μN-μH         Δ      μN-μH         Δ       μN-μH
ieee810       16.64   6.76±1.84     1.64   0.38±0.16     49.01   30.82±4.55
gsm-enc       36.62   33.25±4.79    0.88   -0.48±0.14    79.28   55.84±9.82
jpeg          4.07    -0.97±0.51    0.89   -0.07±0.09    9.72    -2.31±1.01
adpcm-enc     15.8    8.17±2.2      1.25   -0.89±0.14    46.12   -8.56±3.73
MPEG dec      33.39   -5.28±4.85    0.88   0.25±0.16     62.50   -3.48±9.88
G721-enc      22.76   -7.23±2.95    0.76   -0.39±0.08    65.53   -32.4±5.9
adpcm-dec     24.2    -6.19±3.31    1.02   -0.5±0.12     58.54   -27.74±7.3
Fir           0.68    -0.26±0.08    0.79   -0.27±0.09    1.40    -0.97±0.12
gsm-dec       21.55   -23.83±2.58   0.54   -0.24±0.09    59.60   -56.6±6.43

ILP-oriented compilation impact (positive, negative)

DSE: Genetic Mapping
[Figure: the system (VLIW core, Cache, Bus ctrl, Mem) is mapped onto a chromosome whose genes encode cache Size, BSize and Assoc, the functional units and the register files]

DSE: Genetic Iteration
[Figure: genetic iteration loop. Each Individual of the Current Population corresponds to an Architecture configuration; Fitness Evaluation uses Simulation and Estimation to obtain Performance and Power; selected individuals undergo Crossover and Mutation, and each Descendant is a new Architecture configuration]

DSE: Experimental Results
- Parameters:
  - Initial population: 30 individuals
  - Crossover probability: 0.8
  - Mutation probability: 0.1
  - Generations: 50
- Example of two different scenarios:
  - G721 encode: exploration should include the exploration of compilation strategy
  - Gsm-encode: hyperblock formation is predicted to be a better choice
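
A genetic iteration with the parameters listed above might look like the sketch below. It is a simplified stand-in, not the EPIC Explorer implementation: the gene option lists are hypothetical, the `cost` function is a toy replacement for simulation and estimation, the two objectives are collapsed into one product instead of using Pareto-based fitness, and applying the mutation probability per gene is an assumption.

```python
import random

POP_SIZE, P_CROSS, P_MUT, GENERATIONS = 30, 0.8, 0.1, 50

# Hypothetical gene option lists; an individual stores one index per gene.
GENES = [
    [16, 32, 64, 128, 256],   # register file size
    [1, 2, 3, 4, 5, 6],       # number of integer units
    [1, 2, 4, 8, 16, 32],     # cache size in KB (assumed values)
]


def decode(ind):
    """Map gene indices to the parameter values they select."""
    return [options[i] for options, i in zip(GENES, ind)]


def cost(ind):
    """Toy stand-in for simulation + estimation (lower is better)."""
    regs, units, cache = decode(ind)
    delay = 100.0 / units + 64.0 / regs + 32.0 / cache
    power = 0.01 * regs + 0.5 * units + 0.05 * cache
    return delay * power


def tournament(pop, rng):
    """Binary tournament selection."""
    a, b = rng.sample(pop, 2)
    return a if cost(a) <= cost(b) else b


def crossover(p1, p2, rng):
    """Single-point crossover applied with probability P_CROSS."""
    if rng.random() < P_CROSS:
        cut = rng.randrange(1, len(p1))
        return p1[:cut] + p2[cut:]
    return p1[:]


def mutate(ind, rng):
    """Replace each gene with a random option with probability P_MUT."""
    return [rng.randrange(len(GENES[k])) if rng.random() < P_MUT else g
            for k, g in enumerate(ind)]


def evolve(seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(o)) for o in GENES] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop = [mutate(crossover(tournament(pop, rng),
                                tournament(pop, rng), rng), rng)
               for _ in range(POP_SIZE)]
    return min(pop, key=cost)


best = evolve()
```

In the real flow each cost evaluation is a compile-and-simulate run, so the number of visited configurations, here at most 30 x 51, dominates the elapsed time.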

Pareto Set (G721 encode)

Pareto Set (GSM-encode)

Conclusions
- Open platform for VLIW space exploration
  - Estimate Power, Energy and Performance
  - Preliminary analysis of ILP-oriented compilation
  - Genetic multi-objective design space exploration
- Future developments
  - Clustered VLIW
  - Network-on-chip multiprocessors
- Open source: http://epic-explorer.sourceforge.net

Thanks for your attention!

Appendix
- Bus Power Estimation
- Implemented Algorithms
- Multiobjective Fitness assignment
- How Many Generations?

Summarizing Table

Benchmark   Visited configurations   Elapsed time   Pareto set   Power trade-off   Exec time trade-off
Mpeg2dec    1137                     47h            73           7x                6.8x
Jpeg        1012                     17h            83           6x                8.2x
Adpcm-enc   1543                     56h            64           4x                3x
Adpcm-dec   1433                     44h            76           3.5x              4x
G721-enc    1256                     83h            94           2.5x              2x

Power Estimation (buses)
- Bus line transitions are computed from the list of data/address memory accesses
  $P_{bus} = 0.5 \cdot V_{dd}^2 \cdot \alpha \cdot f \cdot C_l$
  - $V_{dd}$: supply voltage
  - $\alpha$: switching activity
  - $f$: clock frequency
  - $C_l$: capacitance of a bus line
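
One plausible way to obtain the switching activity from an access list is to count the bit flips between consecutive words on the bus. The sketch below does that and then applies the bus power formula. Defining the switching activity as the average number of transitions per line per access, and multiplying by the bus width to cover all lines, are assumptions of this example; the function names and the sample values are hypothetical.

```python
def count_transitions(words, width):
    """Total bit flips between consecutive words on a bus of `width` lines."""
    mask = (1 << width) - 1
    return sum(bin((prev ^ cur) & mask).count("1")
               for prev, cur in zip(words, words[1:]))


def switching_activity(words, width):
    """Average transitions per bus line per access (assumed definition)."""
    if len(words) < 2:
        return 0.0
    return count_transitions(words, width) / ((len(words) - 1) * width)


def bus_power(words, width, vdd, freq, c_line):
    """P = 0.5 * Vdd^2 * alpha * f * C_l per line, times `width` lines."""
    alpha = switching_activity(words, width)
    return 0.5 * vdd ** 2 * alpha * freq * c_line * width


# Hypothetical 4-bit address trace.
trace = [0b0000, 0b1111, 0b1010]
alpha = switching_activity(trace, width=4)
power = bus_power(trace, width=4, vdd=1.0, freq=1e6, c_line=1e-12)
```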

Design Space Exploration
Implemented algorithms:
- Exhaustive: intuitive, simple and … infeasible
- Dependency analysis (dep), Givargis et al., [TVLSI'02]
- GA-based DSE (ga), Palesi et al., [CODES'01]
- Sensitivity Analysis, Fornaciari et al., [DAES'02]
- Pareto-based Sensitivity Analysis (pbsa), Palesi et al., [VLSI-SOC'01]

Multiobjective Fitness assignment
- Strength Pareto Approach [Zitzler, Thiele]
- From the current population P, an external set P* is extracted, containing the nondominated configurations of P
- Fitness of P* element j: $f_j = n/(N+1)$
  - N = total size of P
  - n = number of P configurations dominated by j
- Fitness of P element i: $f_i = 1/S$
  - S is the sum of the fitness values of the P* elements that dominate i
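
The fitness assignment described above can be sketched as follows, assuming every objective is minimized and each configuration is represented by a tuple of objective values. The function names and sample points are hypothetical, and the code follows the formulas as stated here rather than any particular published SPEA implementation.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better once."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def strength_fitness(population):
    """Return (fitness of P* elements, fitness of dominated P elements)."""
    n_total = len(population)                       # N
    external = [p for p in population
                if not any(dominates(q, p) for q in population)]   # P*
    # f_j = n / (N + 1), n = number of configurations dominated by j
    f_ext = {j: sum(dominates(j, p) for p in population) / (n_total + 1)
             for j in external}
    # f_i = 1 / S, S = sum of f_j over the P* elements that dominate i
    f_dom = {}
    for i in population:
        if i in f_ext:
            continue
        s = sum(f for j, f in f_ext.items() if dominates(j, i))
        f_dom[i] = 1.0 / s
    return f_ext, f_dom


# Hypothetical (delay, power) pairs.
P = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
f_ext, f_dom = strength_fitness(P)
```

S is never zero here: a configuration outside P* is dominated by at least one element of P*, and that element therefore has n >= 1.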

How Many Generations?
- Fixed number of generations
- Autostop criteria
  - Based on convergence
[Figure: plot with axes power and delay]