Power-Optimal Pipelining in Deep Submicron Technology
Seongmoo Heo and Krste Asanović
Computer Architecture Group, MIT CSAIL
ISLPED 2004, 8/10/2004

Traditional Pipelining
• Goal: Maximum performance
[Figure: timing of one pipeline stage at supply voltage Vdd between clocked (Clk) flip-flops; labels: Clk-Q, Propagation Delay, Setup]

Pipelining as a Low-Power Tool
• Goal: Low-Power, Fixed Throughput
[Figure: timing of pipeline stages at supply voltage Vdd between clocked (Clk) flip-flops; labels: Clk-Q, Propagation Delay, Setup, Time Slack]
Time slack: traded for power (supply voltage scaling)
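
The trade on this slide can be illustrated with the textbook CMOS dynamic power estimate, P ≈ a·C·Vdd²·f. What follows is a minimal Python sketch, not a model taken from the talk: the function names and the example voltages are hypothetical, and leakage and flip-flop overhead are deliberately left out.

```python
def switching_power(activity, capacitance_f, vdd_v, clock_hz):
    """Textbook dynamic power estimate: a * C * Vdd^2 * f (watts)."""
    return activity * capacitance_f * vdd_v ** 2 * clock_hz


def relative_power_after_scaling(vdd_before_v, vdd_after_v):
    """Switching power ratio when only Vdd changes and the clock stays fixed."""
    return (vdd_after_v / vdd_before_v) ** 2


if __name__ == "__main__":
    # Hypothetical voltages: time slack lets Vdd drop from 1.0 V to 0.7 V.
    ratio = relative_power_after_scaling(1.0, 0.7)
    print(f"relative switching power: {ratio:.2f}")
```

Because throughput is fixed, the clock term does not change, so in this simplified view any Vdd reduction that the time slack allows shows up quadratically in switching power.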

Pipelining as a Low-Power Tool
[Figure: Power vs. Delay; annotations: Pipelining (time slack), Flip-flop Power Overhead]
* Clock frequency fixed

Pipelining as a Low-Power Tool
[Figure: Power vs. Delay; annotations: Supply voltage scaling, Power Saving]
* Clock frequency fixed

Power-Optimal Pipelining
• Power reduction from pipelining limited by power overhead of increased number of flip-flops → Power-Optimal Pipelining
[Figure: Power vs. Delay; annotations: Too shallow pipelining, Optimal pipelining, Too deep pipelining, Optimal Power Saving]

Contribution
• Pipelining is an old idea.
• Research focus has been on performance impact of pipelining.
• Idea of using pipelining [Chandrakasan ’92] to lower power has not been fully explored in deep submicron technology.
• Analysis and circuit-level simulation of Power-Optimal Pipelining for different regimes of Vth, activity factor, clock gating

Bottom-to-Top Approach
1. Impact of pipelining on power components
2. Impact of pipelining on total power (with/without clock-gating)
[Figure: Total Power (clock-gated), Power vs. Time over active, inactive, and active periods; components: Switching Power, Leakage Power, Idle Power]

Bottom-to-Top Approach
[Figure: Total Power (not clock-gated), Power vs. Time over active, inactive, and active periods; components: Switching Power, Leakage Power, Idle Power]
* Idle power = power consumed when circuit is idle and not clock-gated

Methodology
• Target digital system: Fixed throughput, Highly parallel computation, Logic-dominant
• Test bench
  – BPTM (Berkeley Predictive Technology Model) 70nm process: LVT (0.17/-0.2), MVT (0.19/-0.22), HVT (0.21/-0.24)
  – Hspice simulation at 100°C, Clock = 2 GHz
[Figure: One Pipeline Stage; baseline of N FO4 inverters (N = 2 ~ 24) between TG flip-flops]

Pipelining and Switching Power: Analytical Trend
[Figure: Switching Power vs. Number of FO4 per stage, N; marked: Optimal FO4, Optimal Saving]
• O(N^2) curve: quadratic reduction of logic switching power (∝ Vdd^2 ∝ N^2)
• O(1/N) curve: flip-flop overhead
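
The two trends on this slide can be combined into a toy cost curve to see where an optimum comes from. This is a minimal Python sketch under stated assumptions, not the authors' model: it assumes power of the form a·N^p + b/N, and the coefficients `logic_coeff` and `ff_coeff` are hypothetical placeholders with no relation to the simulated circuits.

```python
def relative_power(n, logic_coeff, ff_coeff, exponent=2.0):
    """Toy model: logic term grows as N^exponent, flip-flop overhead as 1/N."""
    return logic_coeff * n ** exponent + ff_coeff / n


def optimal_depth(logic_coeff, ff_coeff, exponent=2.0):
    """Closed-form minimum of a*N^p + b/N, from p*a*N^(p-1) = b/N^2."""
    return (ff_coeff / (exponent * logic_coeff)) ** (1.0 / (exponent + 1.0))


if __name__ == "__main__":
    a, b = 1.0, 16.0  # hypothetical coefficients, chosen only for illustration
    n_star = optimal_depth(a, b)
    # Cross-check against a scan over the integer depths used in the test bench.
    n_grid = min(range(2, 25), key=lambda n: relative_power(n, a, b))
    print(n_star, n_grid)
```

In this toy form the optimum moves to shallower logic depth per stage as the flip-flop coefficient shrinks relative to the logic coefficient, which is the qualitative behaviour the O(N^2) and O(1/N) curves suggest.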

Pipelining and Leakage Power: Analytical Trend
[Figure: Leakage Power vs. Number of FO4 per stage, N; marked: Optimal FO4, Optimal Saving]
• O(N^α) curve (1 < α < 2): superlinear reduction of logic leakage power (∝ Vdd · e^(ηVdd) ∝ N^α; DIBL effect)
• O(1/N) curve: flip-flop overhead
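
One way to read the superlinear exponent is as the local log-log slope of Vdd · e^(ηVdd), which works out analytically to 1 + η·Vdd. The Python sketch below checks that numerically. The values of `eta` and `vdd` are hypothetical, and the link from Vdd to N relies on the same roughly linear Vdd-to-N scaling that the switching power slide uses.

```python
import math


def leakage_scale(vdd, eta):
    """Leakage trend from the slide, up to a constant: Vdd * exp(eta * Vdd)."""
    return vdd * math.exp(eta * vdd)


def local_exponent(vdd, eta, rel_step=1e-6):
    """Numerical d(ln P) / d(ln Vdd) using a central difference."""
    lo, hi = vdd * (1.0 - rel_step), vdd * (1.0 + rel_step)
    rise = math.log(leakage_scale(hi, eta)) - math.log(leakage_scale(lo, eta))
    run = math.log(hi) - math.log(lo)
    return rise / run


if __name__ == "__main__":
    vdd, eta = 0.8, 0.5  # hypothetical values, not taken from the talk
    print(local_exponent(vdd, eta), 1.0 + eta * vdd)
```

The slope stays between 1 and 2 whenever 0 < η·Vdd < 1. Whether that condition holds for the 70nm models used here is not stated on the slide.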

Pipelining and Idle Power: Analytical Trend
• Clock-gating is not always possible
  – Increased control complexity
  – Insufficient setup time of clock enable signal
• Idle power = Leakage Power + Flip-flop Switching Power
  – Scales between leakage power scaling and flip-flop switching power scaling, depending on leakage level

Pipelining and Idle Power: Analytical Trend
[Figure: Idle Power, two panels of Relative Power vs. Number of FO4 per stage, N; each panel marks Optimal FO4 and Optimal Saving]
• Leakage Power Scale panel: O(N^α) (1 < α < 2) and O(1/N) curves
• Flip-flop Switching Power Scale panel: O(N) and O(1/N) curves; linear reduction of flip-flop switching power (∝ 1/N · Vdd^2 ∝ N)
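
The claim that idle power scales somewhere between the leakage trend and the flip-flop switching trend can be sketched as a weighted blend of the two. This is an illustrative Python sketch only: `leakage_share` (the fraction of idle power that is leakage at a reference depth `n_ref`) and `alpha` are hypothetical parameters, and the O(1/N) overhead term is omitted.

```python
def relative_idle_power(n, n_ref, leakage_share, alpha):
    """Idle power at depth n relative to depth n_ref (overhead term omitted)."""
    x = n / n_ref
    return leakage_share * x ** alpha + (1.0 - leakage_share) * x


def idle_exponent_at_reference(leakage_share, alpha):
    """Log-log slope of relative_idle_power evaluated at n == n_ref."""
    return leakage_share * alpha + (1.0 - leakage_share)


if __name__ == "__main__":
    for share in (0.0, 0.4, 1.0):  # hypothetical leakage levels
        print(share, idle_exponent_at_reference(share, alpha=1.5))
```

With no leakage the blend follows the linear flip-flop switching trend, and with leakage dominating it follows the N^α leakage trend, so the effective exponent sits between 1 and α.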

Simulation Results: Power Components
Fixed Throughput @ 2 GHz

Power Components | Right hand side curve      | Saving*
Switching Power  | O(N^2)                     | 79 (HVT) ~ 82 (LVT) %
Leakage Power    | O(N^α) (1 < α < 2)         | 70 (LVT) ~ 75 (HVT) %
Idle Power       | O(N) or O(N^α) (1 < α < 2) | 55 (HVT) ~ 70 (LVT) %

N* column values as extracted (row order not preserved): 8, 6, 6
N = Number of FO4 inverters per stage (not including flip-flop delay)
N* = Optimal N
Saving* = Optimal power saving by pipelining

Optimal Power Saving
[Figure: two panels of relative power vs. activity factor. Clock Gating panel: Optimal FO4 = 6. No Clock Gating panel: Optimal FO4 = 6~8]
* 2 GHz
* Flip-flop delay not included in optimal FO4

Optimal Power Saving
[Figure: two panels of relative power vs. activity factor, annotated with Switching Power, Leakage Power, and Idle Power labels. Clock Gating panel: Optimal FO4 = 6. No Clock Gating panel: Optimal FO4 = 6~8]

Optimal Power Saving
[Figure: two panels of relative power vs. activity factor, annotated with LVT. Clock Gating panel: Optimal FO4 = 6. No Clock Gating panel: Optimal FO4 = 6~8]

Discussion
• LVT can be fast and power-efficient
  – enables lower Vdd
• Flip-flop delay more important than flip-flop power for power-optimal pipelining

Limitation of This Work

Factor                            | Effect on optimal logic depth | Effect on optimal power saving
Reduced glitches                  | ↓                             | ↑
Parasitic wire capacitance        | ↑                             | ↓
Additional memory                 | ↑                             | ↓
Super-linear growth of flip-flops | ↑                             | ↓

Conclusion
• Pipelining is an effective low-power tool when used to support voltage scaling in digital systems implementing highly parallel computation.
• Optimal Logic Depth: 6-8 FO4
  – ~8-10 FO4 including flip-flop delay
• Optimal Power Saving: 55-80%
  – It depends on Vth, AF, Clock-Gating
• Insights:
  – Pipelining is more effective with high AF
    • Pipelining is most effective at saving switching power
  – Pipelining is more effective with lower Vth
    • Except when leakage power is dominant
  – Pipelining is more effective with clock-gating
    • Reduced flip-flop overhead

Acknowledgments
• Thanks to SCALE group members and anonymous reviewers
• Funded by NSF CAREER award CCR-0093354, NSF ITR award CCR-0219545, and a donation from Intel Corporation.
BACKUP SLIDES