Author: Meg Dunford (Project Officer School programs, NSW ...
Mehdi Alimadadi, Samad Sheikhaei,
Guy Lemieux, Shahriar Mirabbasi, William Dunford
University of British Columbia, Canada
Patrick Palmer
University of Cambridge, UK
Energy Recovery from High-frequency Clocks using DC-DC Converters
Problem
Clock power in high-performance CPUs
CPU                      Year  Clock    Power   % Power for Clock  Clock Power
Intel McKinley (180nm)   2002  1 GHz    130W    33%                43W
Intel Montecito (90nm)   2005  2.5 GHz  85W     30%                25W
IBM Power 6 (65nm)       2007  5 GHz    > 100W  22%                > 22W
• Cause
– Charge big clock capacitor Cclk with energy
– Discharge Cclk energy to GND (WASTE IT!!)
– Repeat every clock cycle
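The charge/discharge cycle described here can be put into numbers with the standard switching-power relation P = C * V^2 * f. The sketch below is a back-of-the-envelope check rather than anything taken from the slides: it assumes a full-rail swing on every cycle with no clock gating, it borrows the Power6 figures quoted in the Comparative Results slide of this talk (13 pF/mm2, 341 mm2, 1 V), and the function name is hypothetical.

```python
def clock_power_watts(c_clk_farads: float, vdd_volts: float, f_clk_hz: float) -> float:
    """Dynamic power spent charging and discharging a clock capacitance.

    Assumes a full-rail swing on every cycle (no gating), so P = C * V^2 * f.
    """
    return c_clk_farads * vdd_volts ** 2 * f_clk_hz


# Power6 figures quoted in the Comparative Results slide of this talk.
C_PER_MM2 = 13e-12   # F per mm^2
DIE_AREA = 341.0     # mm^2
VDD = 1.0            # V
F_CLK = 5e9          # Hz (Power6 row of the Problem table)

c_clk = C_PER_MM2 * DIE_AREA
p_clk = clock_power_watts(c_clk, VDD, F_CLK)
print(f"Cclk = {c_clk * 1e9:.2f} nF, clock power = {p_clk:.1f} W")
```

Under these assumptions the estimate comes out at roughly 22 W, which is consistent with the "> 22W" clock-power entry for Power6 in the table.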
Primary Contribution of This Work
• Primary contribution
– Discharge Cclk using DC-DC converter instead of GND
    • Use converter to power useful load (Rload)
    • Integrated clock drivers with DC-DC converters
    • Net savings in power
[Figure labels: Voltage feedback (for regulation); Useful Load]
Summary Results
• Explore 3 main DC-DC power converter topologies
– Buck converter: our previous work [ ISSCC 2007 ]
– Boost converter: this paper [ ISVLSI 2008 ]
– Buck-boost converter: this paper [ ISVLSI 2008 ]
• 90nm layouts, 3GHz operation, < 0.3mm²
Converter                      Clock-only power (input)  Extra power to operate converter (input)  Converter output power  % clock energy recovered
Buck converter [ ISSCC 2007 ]  40mW                      16mW                                      26mW                    50%
Boost converter                100mW                     25mW                                      28mW                    20%
Buck-boost converter           100mW                     72mW                                      48mW                    30%
Background
Background – Typical Clocking Architecture
[Figure: clock distribution network. Labels: Clock Source; Level 1 & Level 2 H-tree; Level 3 Gaters & Final drivers; Final H-tree; Bottom mesh]
Background – Typical Clocking Architecture
• Clock distribution
– Majority of energy used by final drivers
– Levels 1, 2
    • H-trees
    • Tunable delays (CVDs) to eliminate skew
    • Low-swing, differential: low power, noise immunity
    • ~5W of power
– Level 3
    • Gaters reduce clock activity 50-85% (Power6)
– Can’t eliminate all activity: still need a clock to compute
• Final clock drivers
– Full-rail swing: tapered inverters drive hundreds of latches, high power
    • H-tree with ends shorted by mesh: low skew, high power
    • ~15W to 40W of power
Background – Reducing Clock Power
• Clock distribution
– Low-swing (differential) signals
    • Final drivers need full-rail
– Resonant clocking (saves 80%)
    • Final drivers need square clock
• Final clock drivers
– Adiabatic switching
    • Low-performance, < 100MHz
– Double-edge clocking
    • Feasible, but complex flip-flops, larger loads
    • Compatible with energy recovery in this paper
Background – Switch Mode Power Supplies
• Basic DC-DC converter topologies
– Buck
    • Step down
    • 0 ≤ Vout ≤ VDD
– Boost
    • Step up
    • VDD ≤ Vout
– Buck-boost
    • Negative step up/down
    • Vout ≤ 0
[Figure: schematics of the three topologies, each built from S, D, LF, CF and RL]
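As a quick illustration of the three output ranges, the sketch below uses the textbook lossless, continuous-conduction conversion ratios for standalone converters fed from a fixed VDD. These ratios are an assumption brought in for illustration; they are not stated on this slide, and the integrated designs later in the talk quote their own 0th-order expressions. The function name and the 1 V supply are hypothetical.

```python
def ideal_vout(topology: str, duty: float, vdd: float) -> float:
    """Textbook lossless, continuous-conduction output voltage.

    Standard standalone-converter formulas, used here as an assumption;
    they are not taken from the slides.
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    if topology == "buck":
        return duty * vdd
    if topology == "boost":
        return vdd / (1.0 - duty)
    if topology == "buck-boost":
        return -duty / (1.0 - duty) * vdd
    raise ValueError(f"unknown topology: {topology}")


VDD = 1.0  # illustrative supply voltage
for d in (0.25, 0.5, 0.75):
    buck = ideal_vout("buck", d, VDD)
    boost = ideal_vout("boost", d, VDD)
    buck_boost = ideal_vout("buck-boost", d, VDD)
    # The step-down, step-up and negative ranges hold at every duty cycle.
    assert 0.0 <= buck <= VDD and boost >= VDD and buck_boost <= 0.0
    print(f"D={d:.2f}  buck={buck:.2f} V  boost={boost:.2f} V  buck-boost={buck_boost:.2f} V")
```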
Background – Switch Mode Power Supplies
• DC-DC buck converter
– CMOS inverter as power switches
• Implementation of zero-voltage switching (ZVS)
– Turn on NMOS when Vinv = 0
– Turn on PMOS when Vinv = Vdd
[Figure: buck converter schematics; labels Vin, Vdd, Vgate, Vinv, S, D, IL, L, C, R, Vout]
Background
ISSCC 2007 Design
• ZVS delay circuit
• Integrated clock driver / power converter
Integration of Clock and SMPS
• CPU clock: 3GHz clock and large Cclk
• SMPS: large Mp, Mn drive chain
Integration of Clock and SMPS
• Combine the driver circuits
[Figure: combined clock and SMPS driver circuit; labels CLK in, Mp, Mn, Vclk, Cclk, Lf, Cf, Rload, Vout]
Key Concept: Energy Recycling
• Benefits
– Shared driver chain
– Cclk added to SMPS
• Red path
– NMOS drains Cclk: wastes charge!
• Blue path
– Delay NMOS turn-on: recovers clock charge!
– ZVS (zero voltage switching) in power electronics
[Figure: red and blue discharge paths for Cclk; labels CLK in, Vclk, Cclk, Lf, Cf, Rload, Vout]
ZVS Detailed Operation
• ZVS delay circuit
– Delay only rising edge of Vn
– Implemented inside the clock chain
[Figure: power stage; labels Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Vout, Lf, Cclk, Cf, Rload]
ZVS Detailed Operation (Mode 1)
• Mode 1 (0 < t < DTsw)
– Mp is ON
– Current builds up in the inductor
– Cclk charges up
[Figure: power stage in Mode 1; labels Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Vout, Lf, Cclk, Cf, Rload]
D = Duty cycle
Tsw = Switching period
ZVS Detailed Operation (Mode 2)
• Mode 2 (DTsw < t < DTsw+Tzvs)
– Both power transistors are OFF
– Inductor current discharges Cclk
– Cclk charge is recycled to output load
[Figure: power stage in Mode 2; labels Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Vout, Lf, Cclk, Cf, Rload]
D = Duty cycle
Tsw = Period
Tzvs = ZVS delay
ZVS Detailed Operation (Mode 3)
• Mode 3 (DTsw+Tzvs < t < Tsw)
– Mn turns ON when Vclk reaches 0
    • ZVS for Mn
– Inductor current decreases linearly
[Figure: power stage in Mode 3; labels Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Vout, Lf, Cclk, Cf, Rload]
D = Duty cycle
Tsw = Period
Tzvs = ZVS delay
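The three modes partition one switching period by time. A minimal sketch of that partition follows, assuming the interval boundaries given on the Mode 1 to Mode 3 slides; the function name is hypothetical, the 30 ps ZVS delay is an illustrative value that does not come from the talk, and boundary instants (which the slides leave open) are assigned to the later mode.

```python
def zvs_mode(t: float, duty: float, t_sw: float, t_zvs: float) -> int:
    """Return the operating mode (1, 2 or 3) at time t.

    Mode 1: 0 <= t < D*Tsw             (Mp ON, inductor current builds, Cclk charges)
    Mode 2: D*Tsw <= t < D*Tsw + Tzvs  (both OFF, inductor current discharges Cclk)
    Mode 3: D*Tsw + Tzvs <= t < Tsw    (Mn ON, inductor current decreases)
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be in (0, 1)")
    if t_zvs < 0.0 or duty * t_sw + t_zvs >= t_sw:
        raise ValueError("ZVS delay must fit inside the switching period")
    t = t % t_sw  # fold any time into a single period
    if t < duty * t_sw:
        return 1
    if t < duty * t_sw + t_zvs:
        return 2
    return 3


T_SW = 1.0 / 3e9   # 3 GHz operation, as quoted in the talk
DUTY = 0.5         # illustrative duty cycle
T_ZVS = 30e-12     # hypothetical ZVS delay, not a value from the slides

for t_ps in (100, 180, 250):
    print(f"t = {t_ps} ps -> mode {zvs_mode(t_ps * 1e-12, DUTY, T_SW, T_ZVS)}")
```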
Detailed Operation
• ZVS delay circuit for Mn
– Delay rising edge of Vn
[Figure: ZVS Delay Circuit (M1–M4) with power stage Mp, Mn; labels Vdd, GND, Vp, Vn, Vm, Vclk, Vout, Lf, Cf, Cclk, Rload; markers 1–4]
Detailed Operation
• ZVS delay circuit for Mn
– Falling edges of Vp and Vn are synchronized
[Figure: ZVS Delay Circuit (M1–M4) with power stage Mp, Mn; labels Vdd, GND, Vp, Vn, Vm, Vclk, Vout, Lf, Cf, Cclk, Rload; markers 1, 2]
Simulation Voltages
[Plot: Voltage (V) vs. Time (nSec), 0 to 1 nSec; traces Vclk, Vclk-ref, Vload]
[Figure: ZVS Delay Circuit schematic with power stage Mp, Mn]
Simulation Currents
[Plot: Voltage (V) vs. Time (nSec), 0 to 1 nSec; traces Vclk, Vclk-ref, Vload]
[Plot: Current (mA) vs. Time (nSec), 0 to 1 nSec; traces Lf, Mn, Mp]
[Figure: ZVS Delay Circuit schematic with power stage Mp, Mn]
Effective Efficiency
• How to measure power efficiency after clock drivers are integrated with DC-DC converters?
– Converter gets “free energy” from clock
– Effective efficiency: how efficient a regular (standalone) power converter must be to equal the efficiency of the integrated clock/power converter
Raw efficiency: $\eta_{raw} = \frac{P_{out}}{P_{in1}} \times 100$
Effective efficiency: $\eta_{effective} = \frac{P_{out}}{P_{in1} - P_{in2}} \times 100$
[Figure: Raw Efficiency: Pin1 into the Integrated Clock Driver and Power Converter (or a Stand-alone Power Converter), Pout out. Effective Efficiency: Pin2 into a dummy Clock Driver Portion, Pin1 – Pin2 into the Power Converter Portion, Pout out; Recycled Energy is not counted as input power]
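A small worked example may help make the two definitions concrete: raw efficiency is 100 * Pout / Pin1 and effective efficiency is 100 * Pout / (Pin1 - Pin2). The sketch below applies them to the Summary Results figures, under the assumption (not stated explicitly in the slides) that "clock-only power" plays the role of Pin2 and that Pin1 is clock-only power plus the extra power to operate the converter. The function names are hypothetical.

```python
def raw_efficiency(p_out: float, p_in1: float) -> float:
    """Raw efficiency in percent: output power over total input power."""
    return 100.0 * p_out / p_in1


def effective_efficiency(p_out: float, p_in1: float, p_in2: float) -> float:
    """Effective efficiency in percent: the clock driver's own input power
    (p_in2) is subtracted, so recycled clock energy is not counted as input."""
    return 100.0 * p_out / (p_in1 - p_in2)


# (clock-only power, extra converter power, output power) in mW,
# copied from the Summary Results table of this talk.
designs = {
    "buck": (40.0, 16.0, 26.0),
    "boost": (100.0, 25.0, 28.0),
    "buck-boost": (100.0, 72.0, 48.0),
}

for name, (p_clk, p_extra, p_out) in designs.items():
    p_in1 = p_clk + p_extra  # assumption: total input of the integrated circuit
    p_in2 = p_clk            # assumption: input of the clock-only (dummy) driver
    print(f"{name:10s} raw = {raw_efficiency(p_out, p_in1):5.1f}%  "
          f"effective = {effective_efficiency(p_out, p_in1, p_in2):5.1f}%")
```

With this mapping the effective efficiency reduces to output power divided by the extra converter power, which is why it can exceed 100% when the recovered clock energy is larger than the extra input.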
Buck Converter – Simulation Results
[Plot: Effective Efficiency (%) vs. Iout (mA), 40 to 100 mA; curves D=30%, 40%, 50%, 60%, 70%]
[Plot: Vout (V) vs. Duty Ratio (%), 10 to 80%; curves Iout=30, 50, 70, 100]
• Open loop converter (no regulation)
– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
ISSCC 2007
• 90nm test chip 1mm², buck converter 0.27mm²
Buck Converter – Chip Measurement vs. Simulation Results
[Plots: panels titled "Chip Measurement" and "Simulation (3GHz)"]
[Plot: Effective Efficiency (%) vs. Iout (mA), 40 to 100 mA; curves D=30%, 40%, 50%, 60%, 70%]
[Plot: Fsw Sweep (D=50%): Effective Efficiency (%) vs. Iout (mA), 30 to 110 mA; curves 3.5GHz, 3GHz, 2.5GHz, 2GHz]
ISVLSI 2008
New Design 1
Boost Converter
Boost Converter
• Basic operation
– Vclk provides power & timing
• 0th order result… Vout = D/(1-D)*Vdd
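To get a feel for the 0th-order boost expression Vout = D/(1-D)*Vdd, the sketch below tabulates it over the duty cycles that appear in the boost simulation legend. One possible reading, offered as an interpretation and not stated on the slide, is the textbook boost ratio 1/(1-D) applied to the average clock voltage D*Vdd, since Vclk provides the power. The function name and the 1 V supply are assumptions.

```python
def boost_vout_0th_order(duty: float, vdd: float) -> float:
    """0th-order boost output quoted on the slide: Vout = D / (1 - D) * Vdd."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be in (0, 1)")
    return duty / (1.0 - duty) * vdd


VDD = 1.0  # assumed supply voltage, for illustration only

for duty_percent in (40, 50, 60, 70, 80):
    d = duty_percent / 100.0
    print(f"D = {duty_percent}%  ->  Vout = {boost_vout_0th_order(d, VDD):.2f} V")
```

This ideal estimate ignores losses and load current, both of which matter for the open-loop converter, so it should be read as an upper-level trend only.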
Boost Converter
[Figure: boost converter schematic. Labeled values: Cclk = Cshift = 21pF, CF = 378pF, LF = 310pH, 2.2pF, 1kΩ; nodes Vpulse, Vclk, Vclk_scaled, Cclk_scaled, Vshift, Vout, VDD; devices Mp1–Mp3, Mn1–Mn3, Dshift; Clock Load Capacitance; transistor sizes given as W/L, from 16/0.1 to 4096/0.1 and up to 36720/0.75]
Boost Converter – Simulation Results
• Open loop converter (no regulation)
– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Plot: Vout (V) vs. Duty Ratio (%), 30 to 80%; curves Iout=10mA, 30mA, 50mA, 70mA, 100mA]
[Plot: Effective Efficiency (%) vs. Iout (mA), 0 to 100 mA; curves D=40%, 50%, 60%, 70%, 80%]
ISVLSI 2008
New Design 2
Buck-boost Converter
Buck-boost Converter
• Basic operation
– Vclk provides power & timing
• 0th order result… Vout = -D²/(1-D)*Vdd
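The 0th-order buck-boost result can be checked numerically. The sketch below takes the numerator as D squared and compares it with one possible reading of where that comes from: the textbook buck-boost ratio -D/(1-D) applied to the average clock voltage D*Vdd. That reading is an interpretation, not something the slide states, and the function names and the 1 V supply are assumptions.

```python
def buck_boost_vout_0th_order(duty: float, vdd: float) -> float:
    """0th-order buck-boost output, reading the slide as Vout = -D^2 / (1 - D) * Vdd."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be in (0, 1)")
    return -(duty ** 2) / (1.0 - duty) * vdd


def textbook_ratio_on_average_clock(duty: float, vdd: float) -> float:
    """Interpretation only: textbook ratio -D/(1-D) applied to an input of D*Vdd."""
    return -duty / (1.0 - duty) * (duty * vdd)


VDD = 1.0  # assumed supply voltage, for illustration only

for duty_percent in (20, 30, 40, 50, 60, 70):
    d = duty_percent / 100.0
    v = buck_boost_vout_0th_order(d, VDD)
    # Both forms are algebraically the same expression.
    assert abs(v - textbook_ratio_on_average_clock(d, VDD)) < 1e-12
    print(f"D = {duty_percent}%  ->  Vout = {v:.2f} V")
```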
Buck-boost Converter
[Figure: buck-boost converter schematic. Labeled values: Cclk = 21pF, Cshift = 21pF, CF = 356pF, LF = 310pH, Cbias, 10.4kΩ, 1kΩ; nodes Vpulse, Vclk, Vinv, Vshift, Vbias, Vout, VDD; devices Mp1–Mp5, Mn1–Mn3, Dshift; Clock Load Capacitance; Deep N-Well Structures; Three Diodes in Series, Each: 128/0.1; transistor sizes given as W/L, from 16/0.1 to 4096/0.1 and up to 34560/0.75]
Buck-boost Converter
[Plot: Vout (V) vs. Duty Ratio (%), 10 to 70%, Vout from -2 to 0 V; curves Iout=10mA, 30mA, 50mA, 70mA, 90mA]
• Open loop converter (no regulation)
– Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Plot: Effective Efficiency (%) vs. Iout (mA), 0 to 100 mA; curves D=20%, 30%, 40%, 50%, 60%, 70%]
Results and Comparisons
Summary Results
• 90nm layouts, 3GHz operation, < 0.3mm²
Converter                      Clock-only power (input)  Extra power to operate converter (input)  Converter output power  % clock energy recovered
Buck converter [ ISSCC 2007 ]  40mW                      16mW                                      26mW                    50%
Boost converter                100mW                     25mW                                      28mW                    20%
Buck-boost converter           100mW                     72mW                                      48mW                    30%
Comparative Results
• IBM Power6: 100W @ 1V, 341mm², Cclk = 13pF/mm²
• Other work: fully on-chip DC-DC buck converter
– S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356–357
– 27mm², 45MHz
– 65% power efficiency
• This work
– 0.27, 0.26, 0.20 mm², including 0.1mm² inductor area, 3GHz
    • Cclk ≈ 20pF, equiv to 1.6mm² of Power6 area
    • DC-DC converter adds 12.5% area overhead
– LC filter: 310pH inductor, 350pF capacitor
    • L and C similar and dominate layout area: can stack to cut area in half
– Buck: 75 – 185% effective power efficiency (50% recovered)
– Boost: 25 – 110% effective power efficiency (20% recovered)
– Buck-boost: 20 – 66% effective power efficiency (30% recovered)
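The area comparison on this slide is simple arithmetic, sketched below as a sanity check. It assumes the equivalent Power6 area is Cclk divided by the quoted 13 pF/mm² density, and that the overhead is converter area divided by that equivalent area; the slide does not spell out either step, and the names are hypothetical.

```python
C_CLK_PF = 20.0           # clock load of this work, as quoted on the slide
POWER6_PF_PER_MM2 = 13.0  # Power6 clock capacitance density from the slide

# Equivalent Power6 area that would present the same clock capacitance.
equiv_area_mm2 = C_CLK_PF / POWER6_PF_PER_MM2


def area_overhead_percent(converter_area_mm2: float, clocked_area_mm2: float) -> float:
    """Converter area as a percentage of the area whose clock it drives."""
    return 100.0 * converter_area_mm2 / clocked_area_mm2


SLIDE_EQUIV_AREA_MM2 = 1.6  # rounded value used on the slide

print(f"Equivalent Power6 area: {equiv_area_mm2:.2f} mm^2 (slide quotes 1.6)")
for area in (0.27, 0.26, 0.20):  # the three layout areas listed on the slide
    overhead = area_overhead_percent(area, SLIDE_EQUIV_AREA_MM2)
    print(f"{area:.2f} mm^2 converter -> {overhead:.1f}% overhead")
```

Under these assumptions the 12.5% figure on the slide corresponds to the 0.20 mm² layout; the two larger layouts come out somewhat higher.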
Conclusion
• Key concepts
– High switching frequency saves area
– Combined drivers save area and switching loss
– Recycled charge: converter load discharges Cclk
– ZVS delay circuit: lower power loss
• Limitations
– Regulation needs variable duty cycle clock
    • May introduce additional clock jitter
    • Mostly suitable for edge-triggered blocks (no latches)
• Future work
– Lots of improvements to make!
Thank you!
Questions?