Reducing Peak Power with a Table-Driven Adaptive Processor Core
![Page 1: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/1.jpg)
Reducing Peak Power with a Table-Driven Adaptive Processor Core
Vasileios Kontorinis (UCSD), Amirali Shayan (UCSD), Rakesh Kumar (UIUC), Dean Tullsen (UCSD)
![Page 2: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/2.jpg)
The Power Problem
Micro'09: Kontorinis, Shayan, Kumar, Tullsen 2
Power related issues:
- Wall power costs
- Processor design constraints
- Power delivery network
- Thermals
- Packaging
- Reliability
Cost labels on the slide: $, $$$
![Page 3: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/3.jpg)
The Power Problem
Average Power
![Page 4: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/4.jpg)
The Power Problem
Peak Power
![Page 5: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/5.jpg)
Theoretical Peak vs Execution Peak
[Figure: Power (y-axis) over Time (x-axis)]
![Page 6: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/6.jpg)
Theoretical Peak vs Execution Peak
[Figure: Power (y-axis) over Time (x-axis), with the Average level marked]
![Page 7: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/7.jpg)
Theoretical Peak vs Execution Peak
[Figure: Power (y-axis) over Time (x-axis), with the Average and Execution Peak levels marked]
![Page 8: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/8.jpg)
Theoretical Peak vs Execution Peak
[Figure: Power (y-axis) over Time (x-axis), with the Average, Execution Peak, and Theoretical Peak levels marked]
![Page 9: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/9.jpg)
Our Approach
Motivation:
- Most applications have few resource bottlenecks.
- Ample opportunity to disable core components without hurting performance
Goal:
- Partially disable core components to limit Peak Power
Method:
- Each resource can be maximally configured
- Not all resources maximized at the same time (centralized control mechanism).
![Page 10: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/10.jpg)
Motivating Experiment:
We reduce 10 core resources

| Resource | MIN | MAX |
|---|---|---|
| INT inst. Queue | 16 | 32 |
| FP Queue | 16 | 32 |
| INT regs | 64 | 128 |
| FP regs | 64 | 128 |
| INT alus | 2 | 4 |
| FP alus | 1 | 3 |
| LdSt units | 1 | 2 |
| ROB | 128 | 256 |
| Icache | 4K | 32K |
| Dcache | 4K | 32K |

[Figure: Speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, Average; series: Max configuration; reference levels: Min config, Max config]
![Page 11: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/11.jpg)
Motivating Experiment:
- We reduce 10 core resources
- We selectively maximize resources
1 out of 10 parameters max
[Figure: Speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, Average; series: All_param_max, 1_param_max; reference levels: Min config, 10 params max]
![Page 12: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/12.jpg)
Motivating Experiment:
- We reduce 10 core resources
- We selectively maximize resources
2 out of 10 parameters max
[Figure: Speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, Average; series: 2_param_max, 1_param_max; reference levels: Min config, 10 params max]
![Page 13: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/13.jpg)
Motivating Experiment:
- We reduce 10 core resources
- We selectively maximize resources
3 out of 10 parameters max
[Figure: Speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, Average; legend as extracted: 2_param_max, 1_param_max; reference levels: Min config, 10 params max]
![Page 14: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/14.jpg)
Motivating Experiment:
- We reduce 10 core resources
- We selectively maximize resources
[Figure: Speedup over min config (y-axis, 1.00 to 1.80) for Media, Olden, Spec-int, Spec-fp, nas, Average; legend as extracted: 2_param_max, 1_param_max]
We can aggressively reduce core components and give up little performance
![Page 15: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/15.jpg)
Outline
- Introduction
- Architecture
- Results
- Conclusions
![Page 16: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/16.jpg)
![Page 17: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/17.jpg)
Baseline Architecture
![Page 18: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/18.jpg)
Baseline Architecture with Average Power Management
![Page 19: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/19.jpg)
Proposed Architecture with Peak Power Management
![Page 20: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/20.jpg)
Proposed Architecture with Peak Power Management
- Config ROM: holds possible core configurations
- Adaptation manager: does bookkeeping and enforces configurations
![Page 21: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/21.jpg)
Two Critical Issues
- Which configurations to make available? (contents of Config ROM)
- How to transition among the available configurations? (Adaptation manager policies)
![Page 22: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/22.jpg)
![Page 23: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/23.jpg)
Finding Appropriate Configurations
Config ROM - 70% of core peak power

| Iq | Fq | ialu | falu | ldstu | rob | Iregs | fregs | icache | dcache |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 |
| 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 1 |
![Page 24: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/24.jpg)
![Page 25: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/25.jpg)
Finding Appropriate Configurations
Config ROM - 70% of core peak power

| Iq | Fq | ialu | falu | ldstu | rob | Iregs | fregs | icache | dcache | Power (% of peak) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 69% |
| 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 71% |
| … | … | … | … | … | … | … | … | … | … | |

- Consider all possible configurations
![Page 26: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/26.jpg)
Finding Appropriate Configurations
Config ROM - 70% of core peak power

| Iq | Fq | ialu | falu | ldstu | rob | Iregs | fregs | icache | dcache | Power (% of peak) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 69% |
| 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 2 | 1 | 71% |
| … | … | … | … | … | … | … | … | … | … | |

- Consider all possible configurations
- Remove configs exceeding targeted peak power threshold
![Page 27: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/27.jpg)
Finding Appropriate Configurations
Config ROM - 70% of core peak power

| Iq | Fq | ialu | falu | ldstu | rob | Iregs | fregs | icache | dcache | Power (% of peak) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 69% |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 68% |
| … | … | … | … | … | … | … | … | … | … | |

- Consider all possible configurations
- Remove configs exceeding targeted peak power threshold
![Page 28: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/28.jpg)
Finding Appropriate Configurations
Config ROM - 70% of core peak power

| Iq | Fq | ialu | falu | ldstu | rob | Iregs | fregs | icache | dcache | Power (% of peak) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 69% |
| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 68% |
| … | … | … | … | … | … | … | … | … | … | |

- Consider all possible configurations
- Remove configs exceeding targeted peak power threshold
- Remove redundant configs
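The three steps on this slide (consider all configurations, drop those over the threshold, drop redundant ones) can be sketched as follows. This is a minimal sketch, not the authors' tool: the power numbers in `POWER_PERCENT` are made-up placeholders over a three-resource toy design space, `prune` is a hypothetical helper, and "redundant" is assumed to mean that another configuration under the threshold offers at least as much of every resource.

```python
from itertools import product

# Hypothetical power cost (percent of core peak) per level index of each resource.
# Placeholder values for illustration; not measurements from the talk.
POWER_PERCENT = {
    "ialu":   [10, 20],
    "falu":   [10, 15, 20],
    "dcache": [20, 30, 40, 60],
}
RESOURCES = list(POWER_PERCENT)

def peak_power(config):
    """Estimated peak power (percent) of a configuration given as level indices."""
    return sum(POWER_PERCENT[r][lvl] for r, lvl in zip(RESOURCES, config))

def dominates(a, b):
    """True if a differs from b and offers at least as much of every resource."""
    return a != b and all(x >= y for x, y in zip(a, b))

def prune(threshold):
    # Step 1: consider all possible configurations.
    all_configs = list(product(*(range(len(POWER_PERCENT[r])) for r in RESOURCES)))
    # Step 2: remove configs exceeding the targeted peak power threshold.
    feasible = [c for c in all_configs if peak_power(c) <= threshold]
    # Step 3: remove redundant configs (dominated by another feasible config).
    return [c for c in feasible if not any(dominates(o, c) for o in feasible)]

rom = prune(threshold=70)
print(rom)
```

With no power constraint (threshold 100 in this toy model) only the fully maximized configuration survives, since it dominates every other one.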
![Page 29: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/29.jpg)
Contents of the Config ROM
- Manageable number of configurations
- We find the best configuration faster

| Relative power threshold | # of possible configurations | # of non-redundant configurations |
|---|---|---|
| 70% | 493 | 132 |
| 75% | 1658 | 279 |
| 80% | 3418 | 360 |
| 85% | 4987 | 285 |
| 100% | 6144 | 1 |
![Page 30: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/30.jpg)
Implementation Overhead
- Area: <1.25% increase (~0.5KB for Config ROM)
- Peak Power: <1.1% overhead
- Average Power: negligible (infrequent epoch-based adaptation)
- Power-gating delays of up to 650 cycles
- Verification cost: higher than a non-adaptive core, less than a fully-adaptive core
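As a rough sanity check on the ~0.5KB figure, the ROM size can be estimated from the number of non-redundant configurations reported for each threshold. The encoding assumed here (one minimal-width field per resource, using the level counts from the Design Space backup slide) is an assumption; the slides do not describe the actual ROM layout.

```python
from math import ceil, log2

# Number of settings per resource, from the Design Space backup slide.
LEVELS = {"iq": 2, "fq": 2, "iregs": 2, "fregs": 2, "ialu": 2,
          "falu": 3, "ldst": 2, "rob": 2, "icache": 4, "dcache": 4}

# Non-redundant configurations per relative power threshold (percent),
# from the Contents of the Config ROM slide.
ROM_ENTRIES = {70: 132, 75: 279, 80: 360, 85: 285}

# Assumed encoding: each resource uses the fewest bits that can index its levels.
bits_per_config = sum(ceil(log2(n)) for n in LEVELS.values())

def rom_bytes(entries):
    """Bytes needed to store `entries` configurations, rounded up."""
    return ceil(entries * bits_per_config / 8)

for threshold, entries in ROM_ENTRIES.items():
    print(f"{threshold}%: {entries} configs -> {rom_bytes(entries)} bytes")
```

Under this assumed encoding each entry takes 13 bits, so the largest table (360 entries at the 80% threshold) needs 585 bytes, which is of the same order as the ~0.5KB quoted on the slide.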
![Page 31: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/31.jpg)
Outline
- Introduction
- Architecture
- Results
  - Dynamic Adaptation vs Static Tuning
  - Realistic Adaptive Techniques
  - Voltage Variation and Decoupling Capacitance Benefits
- Conclusions
![Page 32: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/32.jpg)
Dynamic Adaptation vs Static Tuning
[Figure: Speedup over BEST_STATIC (y-axis, 0 to 1.8) for Media, Olden, SpecINT, SpecFP, NAS, average; series: BEST_STATIC, IDEAL_ADAPT, MAX_CONF; peak power budget: 70% of core peak]
Best Static Configuration: iqs:32, fqs:32, ialu:2, falu:1, ldst:1, ics:16KB, dcs:16KB, ipr:64, fpr:64, rob:256
![Page 33: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/33.jpg)
Dynamic Adaptation vs Static Tuning
[Figure, panel 1: Speedup over BEST_STATIC (y-axis, 0 to 1.8) for the benchmarks parser, g721d, mg.big; annotations: INT ALUs needed, FP REGs needed, Nothing needed]
[Figure, panel 2: Speedup over BEST_STATIC (y-axis, 0 to 1.8) for Media, Olden, SpecINT, SpecFP, NAS, average; series: BEST_STATIC, IDEAL_ADAPT, MAX_CONF; peak power budget: 70% of core peak]
Best Static Configuration: iqs:32, fqs:32, ialu:2, falu:1, ldst:1, ics:16KB, dcs:16KB, ipr:64, fpr:64, rob:256
![Page 34: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/34.jpg)
Two Critical Issues
- Which configurations to make available? (contents of Config ROM)
- How to transition among the available configurations? (Adaptation manager policies)
![Page 35: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/35.jpg)
Realistic Adaptive Techniques

When to adapt?
- Interval (INTV): every fixed interval of cycles (2M cycles)
- EventDriven (EVDRIV): capture phase changes by adapting when IPC or cache misses/instr. change by more than 30%
- AdaptiveInterval (INTVAD): mitigate adaptation costs by extending the interval when no better configuration can be found, and shrinking it otherwise (0.5M to 8M cycles)

Which configuration to choose?
- RANDOM: randomly pick the next configuration
- SCORE: evaluate configurations based on which provides more of the bottleneck resource. Choose the highest score.

How to evaluate a configuration?
- NONE: pick the chosen configuration, do not evaluate
- SAMPLE: sample different configurations and pick the one with the highest instructions per cycle (IPC)
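The EVDRIV and INTVAD triggers described on this slide can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the doubling/halving step for INTVAD is an assumption (the slide gives only the 0.5M to 8M cycle range and the 30% threshold).

```python
MIN_INTERVAL = 500_000     # 0.5M cycles (lower bound from the slide)
MAX_INTERVAL = 8_000_000   # 8M cycles (upper bound from the slide)
PHASE_THRESHOLD = 0.30     # EVDRIV: relative change that signals a phase change

def phase_changed(prev, curr, threshold=PHASE_THRESHOLD):
    """True if a monitored metric moved by more than `threshold` relative to `prev`."""
    if prev == 0:
        return curr != 0
    return abs(curr - prev) / abs(prev) > threshold

def should_adapt(prev_ipc, ipc, prev_mpi, mpi):
    """EVDRIV trigger: adapt when IPC or cache misses per instruction changes phase."""
    return phase_changed(prev_ipc, ipc) or phase_changed(prev_mpi, mpi)

def next_interval(interval, found_better):
    """INTVAD: shrink the interval if a better config was found, extend it otherwise.

    The factor of 2 is an assumed step size, not taken from the slides.
    """
    interval = interval // 2 if found_better else interval * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, interval))

# Example: starting at 2M cycles, two failed searches extend the interval to the cap.
interval = 2_000_000
for _ in range(2):
    interval = next_interval(interval, found_better=False)
print(interval)
```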
![Page 36: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/36.jpg)
![Page 37: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/37.jpg)
![Page 38: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/38.jpg)
![Page 39: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/39.jpg)
Realistic Adaptive Techniques
A policy name combines one choice for each question (when to adapt, which configuration to choose, how to evaluate it), e.g. INTVAD_SCORE_SAMPLE
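A SCORE-then-SAMPLE selection, as in the `_SCORE_SAMPLE` policies, might look like the sketch below. The details are assumptions made for illustration: the score is simply the amount of the bottleneck resource, `measure_ipc` is a hypothetical callback standing in for a short sampling run, and the number of sampled candidates is arbitrary.

```python
def score(config, bottleneck):
    """SCORE: a configuration scores higher if it provides more of the bottleneck resource."""
    return config[bottleneck]

def choose_config(rom, bottleneck, measure_ipc, samples=3):
    """Rank configurations by score, sample the top few, and keep the highest-IPC one.

    rom         -- list of configurations (dicts: resource name -> amount)
    bottleneck  -- name of the resource identified as the current bottleneck
    measure_ipc -- hypothetical callback: runs a config briefly, returns its IPC
    samples     -- assumed number of top-scoring configs to sample
    """
    ranked = sorted(rom, key=lambda c: score(c, bottleneck), reverse=True)
    candidates = ranked[:samples]
    return max(candidates, key=measure_ipc)

# Toy usage with made-up configurations and IPC values.
rom = [
    {"ialu": 4, "fregs": 64, "dcache_kb": 8},
    {"ialu": 2, "fregs": 128, "dcache_kb": 16},
    {"ialu": 4, "fregs": 64, "dcache_kb": 4},
    {"ialu": 2, "fregs": 64, "dcache_kb": 32},
]
fake_ipc = {8: 1.3, 16: 1.1, 4: 1.0, 32: 1.6}
best = choose_config(rom, "ialu", lambda c: fake_ipc[c["dcache_kb"]], samples=2)
print(best)
```

Scoring narrows the search to configurations that relieve the bottleneck, and sampling then guards against a high-scoring configuration that performs poorly in practice.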
![Page 40: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/40.jpg)
Realistic Adaptive Techniques
[Figure: Speedup over BEST_STATIC (y-axis, 0.8 to 1.2) for Media, Olden, Spec-int, Spec-fp, NAS, average; series: INTV_RANDOM, INTV_SCORE_NONE, INTV_SCORE_SAMPLE, EVDRIV_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE]
![Page 41: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/41.jpg)
Realistic Adaptive Techniques
[Figure: Speedup over BEST_STATIC (y-axis, 0.8 to 1.2) for Media, Olden, Spec-int, Spec-fp, NAS, average; series: INTV_RANDOM, INTV_SCORE_NONE, INTV_SCORE_SAMPLE, EVDRIV_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE]
Most configs in Config ROM perform poorly
![Page 42: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/42.jpg)
Realistic Adaptive Techniques
[Figure: Speedup over BEST_STATIC (y-axis, 0.8 to 1.2) for Media, Olden, Spec-int, Spec-fp, NAS, average; series: INTV_RANDOM, INTV_SCORE_NONE, INTV_SCORE_SAMPLE, EVDRIV_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE]
SCORE marginally better than BEST_STATIC
![Page 43: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/43.jpg)
Realistic Adaptive Techniques
[Figure: Speedup over BEST_STATIC (y-axis, 0.8 to 1.2) for Media, Olden, Spec-int, Spec-fp, NAS, average; series: INTV_RANDOM, INTV_SCORE_NONE, INTV_SCORE_SAMPLE, EVDRIV_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE]
SAMPLING a big win!
![Page 44: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/44.jpg)
Results Across Peak Power Budgets vs Maximized Core
[Figure: Speedup over BEST_STATIC (y-axis, 0.8 to 1.3) at peak power constraints of 70%, 75%, 80%; series: BEST_STATIC, INTVAD_SCORE_SAMPLE, INTVAD_SCORE_SAMPLE_red, IDEAL_ADAPT, MAX_CONFIG]
- Reducing the configurations in Config ROM further improves performance
- At 75%: within 5% of maximized core
- At 80%: within 2.5% of maximized core
![Page 45: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/45.jpg)
So what have we gained?
Metrics:
- Power efficiency: AP_ratio = Average Power / Peak Power
- Decoupling Capacitance (% of total core area)
- Voltage Variation (% of Vdd)
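As a small worked example of the AP_ratio metric, the sketch below computes it from a power trace. The trace values are made up, and taking the peak as the maximum observed sample is a simplification; in the talk the peak is the design-time bound on core power.

```python
def ap_ratio(power_trace):
    """Average power divided by peak power over a list of power samples (W)."""
    if not power_trace:
        raise ValueError("power trace is empty")
    average = sum(power_trace) / len(power_trace)
    peak = max(power_trace)
    return average / peak

# Made-up trace in watts, for illustration only.
trace = [10.0, 12.0, 20.0, 14.0, 16.0]
ratio = ap_ratio(trace)
print(f"AP_ratio = {ratio:.0%}")
```

A ratio closer to 100% means the power delivery system, which must be provisioned for the peak, is used more efficiently on average.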
![Page 46: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/46.jpg)
Power Efficiency
[Figure: Power in W (y-axis, 0 to 30) at peak power constraints of 100%, 85%, 80%, 75%, 70%; series: Average, Peak]
- Both average and peak power decrease
![Page 47: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/47.jpg)
Power Efficiency
[Figure: Power in W (y-axis, 0 to 30) at peak power constraints of 100%, 85%, 80%, 75%, 70%; series: Average, Peak]
- Both average and peak power decrease
- AP_ratio improves as we constrain the peak power

| Peak power constraint | 100% | 85% | 80% | 75% | 70% |
|---|---|---|---|---|---|
| AP_ratio | 56% | 61% | 63% | 64% | 67% |
![Page 48: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/48.jpg)
Voltage variation and Decoupling Capacitance benefits
| Relative power threshold | On-chip Decap (% of total core area), at constant voltage variation | Max. Voltage Variation (% of VDD), at constant decoupling cap. |
|---|---|---|
| 70% | 9% | 4.48% |
| 75% | 9.7% | 4.80% |
| 80% | 10.5% | 5.12% |
| 85% | 11.5% | 5.44% |
| 100% | 15% | 6.48% |
![Page 49: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/49.jpg)
![Page 50: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/50.jpg)
![Page 51: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/51.jpg)
Voltage variation and Decoupling Capacitance benefits
Reduced Peak Power:
- Less required on-chip decap
- Smaller Voltage Variation
![Page 52: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/52.jpg)
Conclusions
- Peak power is a first-class design constraint
  - Impacts the efficiency and cost of power delivery
  - Affects on-chip decoupling capacitance and voltage variation
- Table-driven adaptation can be employed to limit peak power while giving up little performance
  - Reduces peak power by 25% while giving up less than 5% performance
![Page 53: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/53.jpg)
![Page 54: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/54.jpg)
Backup slides
![Page 55: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/55.jpg)
Design Space
| Resource | Possible settings |
|---|---|
| INT instruction queue | 16, 32 entries |
| FP instruction queue | 16, 32 entries |
| INT registers | 64, 128 |
| FP registers | 64, 128 |
| INT alus | 2, 4 |
| FP alus | 1, 2, 3 |
| Load/Store units | 1, 2 |
| Reorder Buffer | 128, 256 entries |
| Icache | 1, 2, 4, 8 ways of 4K each |
| Dcache | 1, 2, 4, 8 ways of 4K each |
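The size of this design space can be checked by multiplying the number of settings per resource, which gives the 6144 possible configurations reported for the 100% threshold. The dictionary keys below are illustrative names, not identifiers from the talk.

```python
from itertools import product
from math import prod

# Allowed settings per resource, as listed on the Design Space slide.
DESIGN_SPACE = {
    "int_iq": [16, 32],
    "fp_iq": [16, 32],
    "int_regs": [64, 128],
    "fp_regs": [64, 128],
    "int_alus": [2, 4],
    "fp_alus": [1, 2, 3],
    "ldst_units": [1, 2],
    "rob": [128, 256],
    "icache_ways": [1, 2, 4, 8],
    "dcache_ways": [1, 2, 4, 8],
}

# Total configurations = product of the number of settings per resource.
total = prod(len(settings) for settings in DESIGN_SPACE.values())

def all_configs():
    """Lazily enumerate every configuration as a dict (resource name -> setting)."""
    names = list(DESIGN_SPACE)
    for values in product(*DESIGN_SPACE.values()):
        yield dict(zip(names, values))

print(total)  # 6144
```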
![Page 56: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/56.jpg)
Multiple Config ROMs and their potential applications
- Dynamic Thermal Management
- Hot Spot Avoidance
- Combat process variation
- Budget Peak Power across multiple cores to maximize throughput
![Page 57: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/57.jpg)
Benchmarks performing better with fewer resources
Explanation: with more resources, the core runs further down mispredicted (wrong) paths, which puts extra pressure on the memory subsystem and may negatively affect performance.
[Figure: Speedup over BEST_STATIC (y-axis, 0 to 1.8) for the benchmarks vpr-route and crafty; series: BEST_STATIC, IDEAL_ADAPT, MAX_CONF]
![Page 58: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/58.jpg)
Decoupling capacitance on die
[Figure: die layout showing the Core, the Decoupling Capacitors, and the Decoupling Ring]
![Page 59: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/59.jpg)
Delays
![Page 60: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/60.jpg)
Adaptation transition

| | Iq | Fq | ialu | falu | ldstu | rob | Iregs | fregs | Icache | dcache |
|---|---|---|---|---|---|---|---|---|---|---|
| Config 1 | 32 | 16 | 2 | 1 | 1 | 256 | 128 | 0 | 32k | 16k |
| Config 2 | 16 | 16 | 2 | 2 | 1 | 256 | 128 | 0 | 32k | 16k |

[Figure, panel 1: Instructions in Iqueue (32 down to 16) over Time in cycles (0 to 3000); events: adaptation triggered and register renaming throttled; Iqueue power-gating begins; Iqueue power-gating ends and register renaming restarts]
[Figure, panel 2: Active falus (1 up to 2) over Time in cycles; events: Falu power-up begins; Falu power-up ends]
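The transition on this slide shrinks one resource (Iq, 32 to 16) and grows another (falu, 1 to 2). A minimal sketch of how a controller might split a reconfiguration into those two groups is shown below; `plan_transition` is a hypothetical helper, and only a subset of the columns from the slide is included.

```python
# Subset of the two configurations shown on the slide.
CONFIG_1 = {"iq": 32, "fq": 16, "ialu": 2, "falu": 1, "ldstu": 1, "rob": 256}
CONFIG_2 = {"iq": 16, "fq": 16, "ialu": 2, "falu": 2, "ldstu": 1, "rob": 256}

def plan_transition(old, new):
    """Split a reconfiguration into resources to power-gate and resources to power up."""
    shrink = {r: (old[r], new[r]) for r in old if new[r] < old[r]}
    grow = {r: (old[r], new[r]) for r in old if new[r] > old[r]}
    return shrink, grow

shrink, grow = plan_transition(CONFIG_1, CONFIG_2)
print("power-gate:", shrink)  # {'iq': (32, 16)}
print("power-up:", grow)      # {'falu': (1, 2)}
```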
![Page 61: Reducing Peak Power with a Table-Driven Adaptive Processor Core](https://reader036.fdocuments.us/reader036/viewer/2022062410/56816471550346895dd65644/html5/thumbnails/61.jpg)