Mehdi Alimadadi, Samad Sheikhaei,
Guy Lemieux, Shahriar Mirabbasi, William Dunford
University of British Columbia, Canada
Patrick Palmer
University of Cambridge, UK
Energy Recovery from High-frequency Clocks using DC-DC Converters
Problem
Clock power in high-performance CPUs

CPU               Year            Clock     Power     % Power for Clock   Clock Power
Intel McKinley    2002 (180nm)    1 GHz     130W      33%                 43W
Intel Montecito   2005 (90nm)     2.5 GHz   85W       30%                 25W
IBM Power 6       2007 (65nm)     5 GHz     > 100W    22%                 > 22W
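The clock-power column can be cross-checked as total power times the clock share. A minimal sketch: the function name is illustrative only, the figures are the ones in the table, and the Power 6 entries are lower bounds in the source.

```python
def clock_power_w(total_power_w: float, clock_fraction: float) -> float:
    """Clock power in watts, given total chip power and the share spent on the clock."""
    return total_power_w * clock_fraction


# (CPU, total power in W, clock share) as listed in the table.
# The Power 6 values are lower bounds ("> 100W"), so its result is a lower bound too.
cpus = [
    ("Intel McKinley", 130.0, 0.33),
    ("Intel Montecito", 85.0, 0.30),
    ("IBM Power 6", 100.0, 0.22),
]

for name, total_w, share in cpus:
    print(f"{name}: about {clock_power_w(total_w, share):.1f} W of clock power")
```

The products land within about 1W of the table's rounded clock-power column.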
• Cause
  – Charge big clock capacitor Cclk with energy
  – Discharge Cclk energy to GND (WASTE IT!!)
  – Repeat every clock cycle
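As a rough illustration of the cause listed above, charging Cclk to Vdd and dumping it to GND every cycle draws about Cclk·Vdd²·f from the supply. The sketch below uses Cclk = 21pF and 3GHz, which appear later in the deck; Vdd = 1.0V is an assumption for illustration, and the function name is hypothetical.

```python
def wasted_clock_power_w(c_clk_f: float, vdd_v: float, f_clk_hz: float) -> float:
    """Supply power when c_clk_f is charged to vdd_v and fully discharged to GND each cycle."""
    # Energy drawn from the supply per cycle is C*V^2: half is dissipated in the
    # pull-up while charging, the other half in the pull-down while discharging.
    energy_per_cycle_j = c_clk_f * vdd_v ** 2
    return energy_per_cycle_j * f_clk_hz


# Cclk = 21pF and 3GHz are taken from later slides; Vdd = 1.0V is assumed.
p_w = wasted_clock_power_w(21e-12, 1.0, 3e9)
print(f"{p_w * 1e3:.0f} mW")
```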
Primary Contribution of This Work
• Primary contribution
  – Discharge Cclk using DC-DC converter instead of GND
    • Use converter to power useful load (Rload)
    • Integrated clock drivers with DC-DC converters
    • Net savings in power
[Figure labels: voltage feedback (for regulation); useful load]
Summary Results
• Explore 3 main DC-DC power converter topologies
  – Buck converter: our previous work [ISSCC 2007]
  – Boost converter: this paper [ISVLSI 2008]
  – Buck-boost converter: this paper [ISVLSI 2008]
• 90nm layouts, 3GHz operation, < 0.3mm²

                                 Clock-only      Extra power to operate   Converter       % clock energy
                                 power (input)   converter (input)        output power    recovered
Buck converter [ISSCC 2007]      40mW            16mW                     26mW            50%
Boost converter                  100mW           25mW                     28mW            20%
Buck-boost converter             100mW           72mW                     48mW            30%
Background – Typical Clocking Architecture
[Figure labels: clock source; Level 1 & Level 2 H-tree; Level 3 gaters & final drivers; final H-tree; bottom mesh]
Background – Typical Clocking Architecture
• Clock distribution
  – Majority of energy used by final drivers
  – Levels 1, 2
    • H-trees
    • Tunable delays (CVDs) to eliminate skew
    • Low-swing, differential: low power, noise immunity
    • ~5W of power
  – Level 3
    • Gaters reduce clock activity 50-85% (Power6)
      – Can't eliminate all activity: still need a clock to compute
• Final clock drivers
  – Full-rail swing tapered inverters: drive hundreds of latches, high power
  – H-tree with ends shorted by mesh: low skew, high power
    • ~15W to 40W of power
Background – Reducing Clock Power
• Clock distribution
  – Low-swing (differential) signals
    • Final drivers need full-rail
  – Resonant clocking (saves 80%)
    • Final drivers need square clock
• Final clock drivers
  – Adiabatic switching
    • Low-performance, < 100MHz
  – Double-edge clocking
    • Feasible, but complex flip-flops, larger loads
    • Compatible with energy recovery in this paper
Background – Switch Mode Power Supplies
• Basic DC-DC converter topologies
  – Buck
    • Step down
    • 0 ≤ Vout ≤ VDD
  – Boost
    • Step up
    • VDD ≤ Vout
  – Buck-boost
    • Negative step up/down
    • Vout ≤ 0
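For orientation, the output ranges listed for the three topologies follow from the textbook ideal (lossless, continuous-conduction) conversion ratios of stand-alone converters fed directly from VDD. These ratios are standard results, not taken from the slides, and the clock-driven variants later in the deck use different 0th-order expressions. The function name is illustrative.

```python
def ideal_vout(topology: str, duty: float, vdd: float) -> float:
    """Ideal steady-state output voltage of a stand-alone converter fed from vdd."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    if topology == "buck":
        return duty * vdd                   # step down: 0 <= Vout <= VDD
    if topology == "boost":
        return vdd / (1.0 - duty)           # step up: Vout >= VDD
    if topology == "buck-boost":
        return -duty * vdd / (1.0 - duty)   # inverting: Vout <= 0
    raise ValueError(f"unknown topology: {topology}")


for name in ("buck", "boost", "buck-boost"):
    print(name, ideal_vout(name, 0.5, 1.0))
```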
[Figure: buck, boost and buck-boost schematics, each drawn with S, D, LF, CF and RL]
Background – Switch Mode Power Supplies
• DC-DC buck converter
  – CMOS inverter as power switches
• Implementation of zero-voltage switching (ZVS)
  – Turn on NMOS when Vinv = 0
  – Turn on PMOS when Vinv = Vdd
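[Figure: buck converter drawn with switch S and diode D (Vin, Vinv, L, IL, C, R, Vout), and with a CMOS inverter as the power switches (Vdd, Vgate, Vinv, L, IL, C, R, Vout)]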
Integration of Clock and SMPS
• CPU clock: 3GHz clock and large Cclk
• SMPS: large Mp, Mn drive chain
Integration of Clock and SMPS
• Combine the driver circuits
[Figure: combined driver circuit; labels: CLK in, Mp, Mn, Vclk, Cclk, Lf, Cf, Rload, Vout]
Key Concept: Energy Recycling
• Benefits
  – Shared driver chain
  – Cclk added to SMPS
• Red path
  – NMOS drains Cclk: wastes charge!
• Blue path
  – Delay NMOS turn-on: recovers clock charge!
  – ZVS (zero voltage switching) in power electronics
[Figure: red and blue current paths; labels: CLK in, Vclk, Cclk, Lf, Cf, Rload, Vout]
ZVS Detailed Operation
• ZVS delay circuit
  – Delay only rising edge of Vn
  – Implemented inside the clock chain
[Figure: converter schematic; labels: Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]
ZVS Detailed Operation (Mode 1)
• Mode 1 (0 < t < DTsw)
– Mp is ON
– Current builds up in the inductor
– Cclk charges up
[Figure: converter schematic in Mode 1; labels: Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]
D = Duty cycle
Tsw = Switching period
ZVS Detailed Operation (Mode 2)
• Mode 2 (DTsw < t < DTsw+Tzvs)
  – Both power transistors are OFF
  – Inductor current discharges Cclk
  – Cclk charge is recycled to output load
[Figure: converter schematic in Mode 2; labels: Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]
D = Duty cycle
Tsw = Period
Tzvs = ZVS delay
ZVS Detailed Operation (Mode 3)
• Mode 3 (DTsw+Tzvs < t < Tsw)
  – Mn turns ON when Vclk reaches 0
    • ZVS for Mn
  – Inductor current decreases linearly
[Figure: converter schematic in Mode 3; labels: Vdd, GND, Mp, Mn, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]
D = Duty cycle
Tsw = Period
Tzvs = ZVS delay
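The three operating modes partition one switching period by time. A minimal sketch of that partition follows; the function name is illustrative, and assigning each boundary instant to the later mode is an assumption, since the slides give strict inequalities only.

```python
def zvs_mode(t: float, duty: float, t_sw: float, t_zvs: float) -> int:
    """Return the operating mode (1, 2 or 3) at time t, for duty cycle D = duty."""
    if not 0.0 < duty < 1.0 or t_sw <= 0.0 or t_zvs < 0.0:
        raise ValueError("need 0 < duty < 1, t_sw > 0 and t_zvs >= 0")
    if duty * t_sw + t_zvs >= t_sw:
        raise ValueError("ZVS delay does not fit inside the switching period")
    t = t % t_sw                      # fold t into a single period
    if t < duty * t_sw:
        return 1                      # Mp ON: inductor current builds up, Cclk charges
    if t < duty * t_sw + t_zvs:
        return 2                      # both OFF: inductor current discharges Cclk into the load
    return 3                          # Mn ON at zero voltage: inductor current ramps down


# Normalised example: Tsw = 1, D = 50%, Tzvs = 10% of the period.
for t in (0.25, 0.55, 0.80):
    print(t, zvs_mode(t, duty=0.5, t_sw=1.0, t_zvs=0.1))
```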
Detailed Operation
• ZVS delay circuit for Mn
– Delay rising edge of Vn
[Figure: ZVS delay circuit (M1, M2, M3, M4) with the power stage; labels: Vdd, GND, Mp, Mn, Vm, Vn, Vp, Vclk, Cclk, Lf, Cf, Rload, Vout]
Detailed Operation
• ZVS delay circuit for Mn
  – Falling edges of Vp and Vn are synchronized
[Figure: ZVS delay circuit (M1, M2, M3, M4) with the power stage; labels: Vdd, GND, Mp, Mn, Vm, Vn, Vp, Vclk, Cclk, Lf, Cf, Rload, Vout]
Simulation Voltages
[Figure: Vclk, Vclk-ref and Vload; axes: Time (nSec), Voltage (V)]
[Figure: ZVS delay circuit (M1, M2, M3, M4) with the power stage; labels: Vdd, GND, Mp, Mn, Vm, Vn, Vp, Vclk, Cclk, Lf, Cf, Rload, Vout]
Simulation Currents
[Figure: Vclk, Vclk-ref and Vload; axes: Time (nSec), Voltage (V)]
[Figure: currents in Lf, Mn and Mp; axes: Time (nSec), Current (mA)]
[Figure: ZVS delay circuit (M1, M2, M3, M4) with the power stage; labels: Vdd, GND, Mp, Mn, Vm, Vn, Vp, Vclk, Cclk, Lf, Cf, Rload, Vout]
Effective Efficiency
• How to measure power efficiency after clock drivers are integrated with DC-DC converters?
  – Converter gets "free energy" from clock
  – Effective efficiency: how efficient a regular (stand-alone) power converter must be to equal the efficiency of the integrated clock/power converter

Raw efficiency:        $\eta_{raw} = \frac{P_{out}}{P_{in1}} \times 100$
Effective efficiency:  $\eta_{effective} = \frac{P_{out}}{P_{in1} - P_{in2}} \times 100$

[Figure, raw efficiency: Pin1, integrated clock driver and power converter (or stand-alone power converter), Pout]
[Figure, effective efficiency: Pin1, Pin2, dummy, clock driver portion, power converter portion, Pin1 – Pin2, Pout; recycled energy (not counted as input power)]
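The two efficiency definitions can be tried on the Summary Results figures. The sketch below assumes that Pin2 corresponds to the "clock-only power" column and that Pin1 is the sum of the two input columns; that mapping is a reading of the slides rather than something they state, and the function names are illustrative.

```python
def raw_efficiency(p_out: float, p_in1: float) -> float:
    """Raw efficiency in percent: output power over total input power."""
    return 100.0 * p_out / p_in1


def effective_efficiency(p_out: float, p_in1: float, p_in2: float) -> float:
    """Effective efficiency in percent: the clock's share of the input (p_in2) is not counted."""
    if p_in1 <= p_in2:
        raise ValueError("p_in1 must exceed p_in2")
    return 100.0 * p_out / (p_in1 - p_in2)


# (clock-only mW, extra converter mW, output mW) from the Summary Results slide.
# Assumption: Pin2 = clock-only power, Pin1 = clock-only + extra converter power.
summary = {
    "buck": (40.0, 16.0, 26.0),
    "boost": (100.0, 25.0, 28.0),
    "buck-boost": (100.0, 72.0, 48.0),
}

for name, (p_clk, p_extra, p_out) in summary.items():
    p_in1 = p_clk + p_extra
    print(f"{name}: raw {raw_efficiency(p_out, p_in1):.1f}%, "
          f"effective {effective_efficiency(p_out, p_in1, p_clk):.1f}%")
```

An effective efficiency above 100% is possible here because the recycled clock energy is not counted as converter input.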
Buck Converter – Simulation Results
[Figure: effective efficiency (%) versus Iout (mA) for D = 30%, 40%, 50%, 60%, 70%]
[Figure: Vout (V) versus duty ratio (%) for Iout = 30, 50, 70, 100]
• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
Buck Converter – Chip Measurement vs. Simulation Results
[Figure: effective efficiency (%) versus Iout (mA) for D = 30%, 40%, 50%, 60%, 70%; panels: chip measurement, simulation (3GHz)]
[Figure: Fsw sweep (D=50%): effective efficiency (%) versus Iout (mA) at 3.5GHz, 3GHz, 2.5GHz, 2GHz]
Boost Converter
• Basic operation
  – Vclk provides power & timing
• 0th order result… Vout = D/(1-D)*Vdd
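The boost 0th-order result can be evaluated directly. This is a sketch of the slide's expression only: it ignores losses, load current and the recovered clock charge, and the function name and the Vdd = 1.0V in the example are assumptions for illustration.

```python
def boost_vout_0th_order(duty: float, vdd: float) -> float:
    """0th-order boost output voltage from the slide: Vout = D/(1-D) * Vdd."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    return duty / (1.0 - duty) * vdd


# Sweep the duty ratio; Vdd = 1.0V is assumed for illustration.
for duty_percent in range(30, 90, 10):
    d = duty_percent / 100.0
    print(f"D = {duty_percent}%: Vout = {boost_vout_0th_order(d, 1.0):.2f} V")
```

With this expression, Vout exceeds Vdd only for D above 50%.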
Boost Converter
[Figure: boost converter schematic; labels: Vpulse, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, clock load capacitance, Vclk, Vclk_scaled, Cclk_scaled, Vshift, Dshift, ILf, VDD, Vout; values: Cclk = Cshift = 21pF, LF = 310pH, CF = 378pF, 2.2pF, 1kΩ; device sizes (W/L) from 16/0.1 to 4096/0.1, plus 216/0.75, 2016/0.75 and 36720/0.75]
Boost Converter – Simulation Results
• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure: Vout (V) versus duty ratio (%) for Iout = 10mA, 30mA, 50mA, 70mA, 100mA]
[Figure: effective efficiency (%) versus Iout (mA) for D = 40%, 50%, 60%, 70%, 80%]
Buck-boost Converter
• Basic operation
  – Vclk provides power & timing
• 0th order result… Vout = -D^2/(1-D)*Vdd
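The buck-boost 0th-order result can be evaluated the same way, taking the exponent on D to be 2 (D squared). This is a sketch of the slide's expression only, with no losses or load dependence; the function name and the Vdd = 1.0V in the example are assumptions for illustration.

```python
def buck_boost_vout_0th_order(duty: float, vdd: float) -> float:
    """0th-order buck-boost output voltage: Vout = -D^2/(1-D) * Vdd (negative output)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty must be in [0, 1)")
    return -(duty ** 2) / (1.0 - duty) * vdd


# Sweep the duty ratio; Vdd = 1.0V is assumed for illustration.
for duty_percent in range(10, 80, 10):
    d = duty_percent / 100.0
    print(f"D = {duty_percent}%: Vout = {buck_boost_vout_0th_order(d, 1.0):.2f} V")
```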
Buck-boost Converter
[Figure: buck-boost converter schematic; labels: Vpulse, Mp1, Mp2, Mp3, Mp4, Mp5, Mn1, Mn2, Mn3, clock load capacitance, Vclk, Vinv, Vshift, Dshift, Vbias, Cbias, ILf, VDD, Vout, deep N-well structures, three diodes in series (each 128/0.1); values: Cclk = 21pF, Cshift = 21pF, LF = 310pH, CF = 356pF, 1kΩ, 10.4kΩ; device sizes (W/L) from 16/0.1 to 4096/0.1, plus 2016/0.75 and 34560/0.75]
Buck-boost Converter
[Figure: Vout (V) versus duty ratio (%) for Iout = 10mA, 30mA, 50mA, 70mA, 90mA]
• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk
[Figure: effective efficiency (%) versus Iout (mA) for D = 20%, 30%, 40%, 50%, 60%, 70%]
Summary Results
• 90nm layouts, 3GHz operation, < 0.3mm²

                                 Clock-only      Extra power to operate   Converter       % clock energy
                                 power (input)   converter (input)        output power    recovered
Buck converter [ISSCC 2007]      40mW            16mW                     26mW            50%
Boost converter                  100mW           25mW                     28mW            20%
Buck-boost converter             100mW           72mW                     48mW            30%
Comparative Results
• IBM Power6: 100W@1V, 341mm², Cclk = 13pF/mm²
• Other work: fully on-chip DC-DC buck converter
  – S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356–357
  – 27mm², 45MHz
  – 65% power efficiency
• This work
  – 0.27, 0.26, 0.20 mm², including 0.1mm² inductor area, 3GHz
    • Cclk ≈ 20pF, equiv to 1.6mm² of Power6 area
    • DC-DC converter adds 12.5% area overhead
  – LC filter: 310pH inductor, 350pF capacitor
    • L and C similar and dominate layout area: can stack to cut area in half
  – Buck: 75 – 185% effective power efficiency (50% recovered)
  – Boost: 25 – 110% effective power efficiency (20% recovered)
  – Buck-boost: 20 – 66% effective power efficiency (30% recovered)
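The area figures can be reproduced with two divisions. In this sketch the 12.5% overhead is obtained by dividing the smallest listed converter area (0.20mm²) by the slide's 1.6mm² of equivalent Power6 area; which converter area the slide actually used is an assumption, and the function names are illustrative.

```python
def equivalent_area_mm2(c_clk_pf: float, density_pf_per_mm2: float) -> float:
    """Chip area whose clock load equals c_clk_pf at the given clock-capacitance density."""
    return c_clk_pf / density_pf_per_mm2


def area_overhead_percent(converter_area_mm2: float, clocked_area_mm2: float) -> float:
    """Converter area as a percentage of the chip area whose clock it drives."""
    return 100.0 * converter_area_mm2 / clocked_area_mm2


# 13 pF/mm^2 is the Power6 density on the slide; 20 pF and 21 pF both appear in the
# deck as Cclk values. The slide rounds the equivalent area to 1.6 mm^2.
print(f"{equivalent_area_mm2(20.0, 13.0):.2f} mm^2 for 20 pF")
print(f"{equivalent_area_mm2(21.0, 13.0):.2f} mm^2 for 21 pF")

# Assumption: the 12.5% figure uses the 0.20 mm^2 converter and the rounded 1.6 mm^2.
print(f"{area_overhead_percent(0.20, 1.6):.1f}% area overhead")
```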
Conclusion
• Key concepts
  – High switching frequency saves area
  – Combined drivers save area and switching loss
  – Recycled charge: converter load discharges Cclk
  – ZVS delay circuit: lower power loss
• Limitations
  – Regulation needs variable duty cycle clock
    • May introduce additional clock jitter
    • Mostly suitable for edge-triggered blocks (no latches)
• Future work
  – Lots of improvements to make!