Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
1
Single-ISA Heterogeneous Multi-Core Architectures:
The Potential for Processor Power Reduction
Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi,
Parthasarathy Ranganathan, Dean M. Tullsen
Presenter: Borys Bradel
2
Introduction
Different programs have different requirements (e.g. ILP)
  This extends to phases of a single program
Heterogeneous cores
  Use the core that matches the requirements
Reuse existing cores
  Use multiple generations of the same family of processors
3
Outline
Methodology
  Hardware
  Assumptions
  Power
Experiments
  Optimal – energy/energy delay product
  Heuristic based – static/dynamic
Related Work
Conclusion
4
Single ISA Multi-Core Benefits
Small area overhead because of the growth in core sizes between generations
Clock frequencies of older cores would scale with technology
  P3 1 GHz = P4 1.4 GHz
  Increased pipeline depth precisely because could not scale
5
Hardware – Alpha Family
2 in-order cores
  EV4 = 21064
  EV5 = 21164
2 out-of-order cores
  EV6 = 21264
  EV8- = 21464 (multithreading support removed)
7
Assumptions
Can switch cores dynamically
Private L1 caches and a common L2 cache
All cores use 0.10 micron technology
A single process executes on a single core at any one time
2.1 GHz clock (= 21264 at 0.35 micron, 600 MHz)
Input voltage 1.2 V
Cores shut down when idle
1000 cycle restart cost (staged, phase lock loop left alone)
150 ns memory access
Stall cycles through CACTI
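The 2.1 GHz figure is consistent with scaling the 21264's 600 MHz clock linearly with feature size from 0.35 micron to 0.10 micron. A minimal sketch of that arithmetic follows, assuming frequency is inversely proportional to feature size; the function name is hypothetical, and the restart time is simply the 1000 cycle cost converted at the scaled clock.

```python
def scaled_clock_mhz(base_mhz, base_feature_um, target_feature_um):
    """First-order assumption: clock frequency scales inversely with feature size."""
    return base_mhz * base_feature_um / target_feature_um


# 21264 (EV6): 600 MHz at 0.35 micron, scaled to the 0.10 micron process assumed here.
ev6_mhz = scaled_clock_mhz(600.0, 0.35, 0.10)
print(f"scaled clock: {ev6_mhz / 1000:.1f} GHz")

# 1000 cycle restart cost expressed in microseconds (MHz = cycles per microsecond).
restart_us = 1000 / ev6_mhz
print(f"restart cost: {restart_us:.2f} us")
```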
9
Power Model
Use Wattch to account for activity based dissipation
Use scaling and offset factors to account for other factors
This hybrid model is closer to manufacturer’s data points
Peak power: data sheets less L2 cache and output pins
Typical power: scaled based on Intel chips
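The slides do not give the functional form of the hybrid model. One plausible reading, shown as a sketch only, is a linear correction applied to the activity-based estimate: a scaling factor times the Wattch number plus a constant offset. The function names and all wattage values below are hypothetical placeholders, not data from the paper.

```python
def fit_scale_offset(sim_lo, ref_lo, sim_hi, ref_hi):
    """Solve ref = scale * sim + offset from two calibration points.

    sim_* are simulated (activity-based) power numbers; ref_* are the
    reference values they should match, e.g. typical and peak power.
    """
    scale = (ref_hi - ref_lo) / (sim_hi - sim_lo)
    offset = ref_lo - scale * sim_lo
    return scale, offset


def hybrid_power(activity_power_w, scale, offset_w):
    """Assumed hybrid model: scaled activity-based power plus a fixed offset."""
    return scale * activity_power_w + offset_w


# Hypothetical calibration points (watts), for illustration only.
scale, offset = fit_scale_offset(sim_lo=10.0, ref_lo=25.0, sim_hi=30.0, ref_hi=65.0)
print(hybrid_power(20.0, scale, offset))
```

With a fit like this, the model reproduces both calibration points exactly and interpolates linearly between them.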
11
Performance Modeling
Use SMTSIM, a cycle accurate simulator
SimPoint is used to identify representative portions of each program and how many instructions need to be fast-forwarded to reach them
18
Others
Voltage/frequency scaling – not as good
Static core selection
  Only EV6 and EV8- are used
Dynamic heuristic
  Running average performance within 10%
  Every 100 time intervals (100 million instructions) cores are sampled for 5 intervals
  Select best core based on sampling
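The dynamic heuristic can be sketched roughly as follows. The period (100 intervals), sample length (5 intervals) and 10% bound come from the slide; everything else is an assumption made for illustration: that the 10% bound is measured against the fastest core, that "best" means lowest energy-delay per instruction, and the sample numbers themselves.

```python
SAMPLE_PERIOD = 100    # intervals between sampling phases (from the slide)
SAMPLE_LENGTH = 5      # intervals spent sampling cores (from the slide)
PERF_TOLERANCE = 0.10  # allowed performance loss (from the slide)


def is_sampling_interval(interval_index):
    """True during the first SAMPLE_LENGTH intervals of every SAMPLE_PERIOD."""
    return interval_index % SAMPLE_PERIOD < SAMPLE_LENGTH


def select_core(samples, reference_core, tolerance=PERF_TOLERANCE):
    """Pick the core with the lowest energy-delay among those fast enough.

    samples maps core name -> {"ipc": ..., "energy": energy per instruction}.
    Assumption: a core is eligible if its IPC is within `tolerance` of the
    reference core's IPC; the reference core itself is always eligible.
    """
    ref_ipc = samples[reference_core]["ipc"]
    eligible = {
        name: s for name, s in samples.items()
        if s["ipc"] >= (1.0 - tolerance) * ref_ipc
    }
    # Delay per instruction is proportional to 1 / IPC at a fixed clock.
    return min(eligible, key=lambda n: eligible[n]["energy"] / eligible[n]["ipc"])


# Hypothetical measurements from one sampling phase.
samples = {
    "EV5": {"ipc": 0.80, "energy": 1.0},
    "EV6": {"ipc": 1.45, "energy": 2.0},
    "EV8-": {"ipc": 1.50, "energy": 5.0},
}
print(select_core(samples, reference_core="EV8-"))
```

In this made-up example EV5 is rejected for losing too much performance, and EV6 wins over EV8- because it costs far less energy for nearly the same speed.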
21
Related Work
Gating based power optimization
  Cannot gate at a fine enough granularity
  May still have leakage
  This could be thought of as gating to reduce capabilities of different units
Voltage and frequency scaling
  Chip wide – one size does not fit all
  Fine grained – granularity problems