Mehdi Alimadadi, Samad Sheikhaei, Guy Lemieux , Shahriar Mirabbasi, William Dunford University of British Columbia, Canada Patrick Palmer University of Cambridge, UK Energy Recovery from High-frequency Clocks using DC-DC Converters

Mehdi Alimadadi, Samad Sheikhaei, Guy Lemieux, Shahriar Mirabbasi, William Dunford
University of British Columbia, Canada

Patrick Palmer
University of Cambridge, UK

Energy Recovery from High-frequency Clocks using DC-DC Converters

2

Problem

• Clock power in high-performance CPUs

CPU               Process   Year   Clock     Power     % Power for Clock   Clock Power
Intel McKinley    180nm     2002   1 GHz     130W      33%                 43W
Intel Montecito   90nm      2005   2.5 GHz   85W       30%                 25W
IBM Power 6       65nm      2007   5 GHz     > 100W    22%                 > 22W

• Cause
  – Charge big clock capacitor Cclk with energy
  – Discharge Cclk energy to GND (WASTE IT!!)
  – Repeat every clock cycle
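The "Clock Power" column of the table above is simply the total chip power multiplied by the clock's share. A minimal sketch of that arithmetic follows; the function name and the tuple layout are illustrative choices, and the Power 6 row uses 100W as a lower bound because the table only gives "> 100W".

```python
# Rows from the table: (CPU, total power in W, fraction of power used by the clock).
CPUS = [
    ("Intel McKinley", 130.0, 0.33),
    ("Intel Montecito", 85.0, 0.30),
    ("IBM Power 6", 100.0, 0.22),  # table says "> 100W", so this is a lower bound
]


def clock_power(total_w, clock_fraction):
    """Clock power in watts: total chip power times the clock's share."""
    return total_w * clock_fraction


if __name__ == "__main__":
    for name, total_w, fraction in CPUS:
        print(f"{name}: about {clock_power(total_w, fraction):.1f} W for the clock")
```

The results agree with the table's rounded 43W, 25W and > 22W entries.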

3

Primary Contribution of This Work

• Primary contribution
  – Discharge Cclk using DC-DC converter instead of GND
     • Use converter to power useful load (Rload)
     • Integrated clock drivers with DC-DC converters
     • Net savings in power

[Figure: integrated clock driver and DC-DC converter, with labels "Voltage feedback (for regulation)" and "Useful Load"]

4

Summary Results

• Explore 3 main DC-DC power converter topologies
  – Buck converter: our previous work [ISSCC 2007]
  – Boost converter: this paper [ISVLSI 2008]
  – Buck-boost converter: this paper [ISVLSI 2008]
• 90nm layouts, 3GHz operation, < 0.3mm²

Converter                     Clock-only power   Extra power to operate   Converter output   % clock energy
                              (input)            converter (input)        power              recovered
Buck converter [ISSCC 2007]   40mW               16mW                     26mW               50%
Boost converter               100mW              25mW                     28mW               20%
Buck-boost converter          100mW              72mW                     48mW               30%

Background

6

Background – Typical Clocking Architecture

[Figure: typical clocking architecture, with labels "Clock Source", "Level 1 & Level 2 H-tree", "Level 3 Gaters & Final drivers", "Final H-tree" and "Bottom mesh"]

7

Background – Typical Clocking Architecture

• Clock distribution
  – Majority of energy used by final drivers
  – Levels 1, 2
     • H-trees
     • Tunable delays (CVDs) to eliminate skew
     • Low-swing, differential: low power, noise immunity
     • ~5W of power
  – Level 3
     • Gaters reduce clock activity 50-85% (Power6)
     • Can't eliminate all activity: still need a clock to compute
• Final clock drivers
  – Full-rail swing: tapered inverters drive hundreds of latches, high power
  – H-tree with ends shorted by mesh: low skew, high power
  – ~15W to 40W of power

8

Background –Reducing Clock Power

• Clock distribution
  – Low-swing (differential) signals
     • Final drivers need full-rail
  – Resonant clocking (saves 80%)
     • Final drivers need square clock
• Final clock drivers
  – Adiabatic switching
     • Low-performance, < 100MHz
  – Double-edge clocking
     • Feasible, but complex flip-flops, larger loads
     • Compatible with energy recovery in this paper

9

Background – Switch Mode Power Supplies

• Basic DC-DC converter topologies
  – Buck
     • Step down
     • 0 ≤ Vout ≤ VDD
  – Boost
     • Step up
     • VDD ≤ Vout
  – Buck-boost
     • Negative step up/down
     • Vout ≤ 0

[Figure: buck, boost and buck-boost schematics, each drawn with switch S, diode D, inductor LF, capacitor CF and load RL]

10

Background – Switch Mode Power Supplies

• DC-DC buck converter
  – CMOS inverter as power switches
• Implementation of zero-voltage switching (ZVS)
  – Turn on NMOS when Vinv = 0
  – Turn on PMOS when Vinv = Vdd

[Figure: two buck converter schematics, one with switch S and diode D fed from Vin, one with a CMOS inverter fed from Vdd; labels Vgate, Vinv, IL, L, C, R, Vout]

Background

ISSCC 2007 Design

• ZVS delay circuit
• Integrated clock driver / power converter

12

Integration of Clock and SMPS

• CPU clock: 3GHz clock and large Cclk

• SMPS: large Mp, Mn drive chain

13

Integration of Clock and SMPS

• Combine the driver circuits

[Figure: combined clock driver and converter; labels CLK in, Mp, Mn, Vclk, Cclk, Lf, Cf, Rload, Vout]

14

Key Concept: Energy Recycling

• Benefits
  – Shared driver chain
  – Cclk added to SMPS
• Red path
  – NMOS drains Cclk: wastes charge!
• Blue path
  – Delay NMOS turn-on: recovers clock charge!
  – ZVS (zero voltage switching) in power electronics

[Figure: combined clock driver and converter with red and blue current paths; labels CLK in, Vclk, Cclk, Lf, Cf, Rload, Vout]

15

ZVS Detailed Operation

• ZVS delay circuit
  – Delay only rising edge of Vn
  – Implemented inside the clock chain

[Figure: power stage schematic; labels Mp, Mn, Vdd, GND, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]

16

ZVS Detailed Operation (Mode 1)

• Mode 1 (0 < t < DTsw)
  – Mp is ON
  – Current builds up in the inductor
  – Cclk charges up

D = Duty cycle, Tsw = Switching period

[Figure: power stage schematic; labels Mp, Mn, Vdd, GND, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]

17

ZVS Detailed Operation (Mode 2)

• Mode 2 (DTsw < t < DTsw+Tzvs)
  – Both power transistors are OFF
  – Inductor current discharges Cclk
  – Cclk charge is recycled to output load

D = Duty cycle, Tsw = Period, Tzvs = ZVS delay

[Figure: power stage schematic; labels Mp, Mn, Vdd, GND, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]

18

ZVS Detailed Operation (Mode 3)

• Mode 3 (DTsw+Tzvs < t < Tsw)
  – Mn turns ON when Vclk reaches 0
     • ZVS for Mn
  – Inductor current decreases linearly

D = Duty cycle, Tsw = Period, Tzvs = ZVS delay

[Figure: power stage schematic; labels Mp, Mn, Vdd, GND, Vp, Vn, Vclk, Cclk, Lf, Cf, Rload, Vout]
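The three modes partition one switching period into consecutive time windows. The sketch below classifies a time instant into Mode 1, 2 or 3 using the interval boundaries from the slides. The 3GHz switching frequency is taken from the deck; the 50% duty cycle and the 30ps ZVS delay are hypothetical values chosen only for illustration, as is the function name.

```python
def zvs_mode(t, duty, t_sw, t_zvs):
    """Return the mode (1, 2 or 3) active at time t within one switching period.

    Mode 1: 0 <= t < D*Tsw            (Mp ON, inductor current builds, Cclk charges)
    Mode 2: D*Tsw <= t < D*Tsw + Tzvs (both OFF, inductor current discharges Cclk)
    Mode 3: D*Tsw + Tzvs <= t < Tsw   (Mn ON under ZVS, inductor current ramps down)
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    if t_zvs < 0.0 or duty * t_sw + t_zvs >= t_sw:
        raise ValueError("ZVS delay does not fit inside the switching period")
    t = t % t_sw  # fold any time into a single period
    if t < duty * t_sw:
        return 1
    if t < duty * t_sw + t_zvs:
        return 2
    return 3


T_SW = 1.0 / 3e9   # 3GHz switching frequency (from the slides)
DUTY = 0.5         # hypothetical duty cycle
T_ZVS = 30e-12     # hypothetical ZVS delay of 30ps

if __name__ == "__main__":
    for t_ps in (100, 180, 250):
        print(f"t = {t_ps} ps: mode {zvs_mode(t_ps * 1e-12, DUTY, T_SW, T_ZVS)}")
```

With these assumed values, Mode 1 ends at about 167ps and Mode 2 at about 197ps, so the three sample instants fall in Modes 1, 2 and 3 respectively.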

19

Detailed Operation

• ZVS delay circuit for Mn
  – Delay rising edge of Vn

[Figure: power stage with ZVS delay circuit (M1 to M4); labels Mp, Mn, Vdd, GND, Vp, Vn, Vm, Vclk, Cclk, Lf, Cf, Rload, Vout; numbered markers 1 to 4]

20

Detailed Operation

• ZVS delay circuit for Mn
  – Falling edges of Vp and Vn are synchronized

[Figure: power stage with ZVS delay circuit (M1 to M4); labels Mp, Mn, Vdd, GND, Vp, Vn, Vm, Vclk, Cclk, Lf, Cf, Rload, Vout; numbered markers 1 and 2]

21

Simulation Voltages

[Plot: Vclk, Vclk-ref and Vload; voltage (V), -0.4 to 1.2, versus time (nSec), 0 to 1]

[Figure: power stage with ZVS delay circuit (M1 to M4); labels Mp, Mn, Vdd, GND, Vp, Vn, Vm, Vclk, Cclk, Lf, Cf, Rload, Vout]

22

Simulation Currents

[Plot: Vclk, Vclk-ref and Vload; voltage (V), -0.4 to 1.2, versus time (nSec), 0 to 1]

[Plot: currents in Lf, Mn and Mp; current (mA), -0.1 to 0.3, versus time (nSec), 0 to 1]

[Figure: power stage with ZVS delay circuit (M1 to M4); labels Mp, Mn, Vdd, GND, Vp, Vn, Vm, Vclk, Cclk, Lf, Cf, Rload, Vout]

23

Effective Efficiency

• How to measure power efficiency after clock drivers are integrated with DC-DC converters?
  – Converter gets “free energy” from clock
  – Effective efficiency: how efficient a regular (stand-alone) power converter must be to equal the efficiency of the integrated clock/power converter

Raw efficiency:

  \eta_{raw} = \frac{P_{out}}{P_{in1}} \times 100

Effective efficiency:

  \eta_{effective} = \frac{P_{out}}{P_{in1} - P_{in2}} \times 100

[Figure: block diagrams for the two definitions. Raw efficiency: an "Integrated Clock Driver and Power Converter or Stand-alone Power Converter" block with input Pin1 and output Pout. Effective efficiency: input Pin1 split into a "Clock Driver Portion" (Pin2, labeled "dummy") and a "Power Converter Portion" (Pin1 – Pin2) with output Pout; "Recycled Energy (not counted as input power)"]
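To make the two definitions concrete, the sketch below evaluates raw efficiency, Pout/Pin1 × 100, and effective efficiency, Pout/(Pin1 – Pin2) × 100, on the numbers in the Summary Results table. The mapping is an assumption rather than something the slides state explicitly: Pin2 is taken to be the "clock-only power" column and Pin1 – Pin2 the "extra power to operate converter" column. The function and variable names are illustrative.

```python
def raw_efficiency(p_out, p_in1):
    """Raw efficiency in percent: output power over total input power."""
    return 100.0 * p_out / p_in1


def effective_efficiency(p_out, p_in1, p_in2):
    """Effective efficiency in percent: the clock driver's share (p_in2) is excluded."""
    return 100.0 * p_out / (p_in1 - p_in2)


# Summary Results rows in mW: (clock-only power, extra converter power, output power).
# Assumption: Pin2 = clock-only power, Pin1 = clock-only power + extra power.
SUMMARY_MW = {
    "buck": (40.0, 16.0, 26.0),
    "boost": (100.0, 25.0, 28.0),
    "buck-boost": (100.0, 72.0, 48.0),
}

if __name__ == "__main__":
    for name, (p_clock, p_extra, p_out) in SUMMARY_MW.items():
        p_in1 = p_clock + p_extra
        print(
            f"{name}: raw {raw_efficiency(p_out, p_in1):.1f}%, "
            f"effective {effective_efficiency(p_out, p_in1, p_clock):.1f}%"
        )
```

Under that assumed mapping the effective efficiencies come out near 162% (buck), 112% (boost) and 67% (buck-boost). Values above 100% are possible because the recycled clock energy is not counted as converter input.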

24

Buck Converter – Simulation Results

[Plot: effective efficiency (%), 0 to 300, versus Iout (mA), 40 to 100, for D = 30%, 40%, 50%, 60%, 70%]

[Plot: Vout (V), 0 to 1, versus duty ratio (%), 10 to 80, for Iout = 30, 50, 70, 100]

• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk

25

ISSCC 2007

• 90nm test chip 1mm², buck converter 0.27mm²

26

Buck Converter – Chip Measurement vs. Simulation Results

[Plot: effective efficiency (%), 0 to 300, versus Iout (mA), 40 to 100, for D = 30%, 40%, 50%, 60%, 70%; panel titles "Chip Measurement" and "Simulation (3GHz)"]

[Plot: "Fsw Sweep (D=50%)"; effective efficiency (%), 0 to 240, versus Iout (mA), 30 to 110, at 3.5GHz, 3GHz, 2.5GHz and 2GHz]

ISVLSI 2008 New Design 1

Boost Converter

28

Boost Converter

• Basic operation
  – Vclk provides power & timing
• 0th order result: Vout = D/(1-D)*Vdd
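The zeroth-order expression on this slide is easy to tabulate. The sketch below evaluates Vout = D/(1-D)*Vdd exactly as written on the slide; it does not substitute the textbook ideal boost ratio of 1/(1-D). The 1V supply is an assumed value for illustration, and the function name is hypothetical. Being a zeroth-order, unloaded estimate, it is not expected to match the loaded simulation curves.

```python
def boost_vout_zeroth_order(duty, vdd):
    """Zeroth-order output voltage as given on the slide: Vout = D / (1 - D) * Vdd."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return duty / (1.0 - duty) * vdd


VDD = 1.0  # assumed supply voltage in volts, for illustration only

if __name__ == "__main__":
    for duty_percent in (40, 50, 60, 70, 80):
        vout = boost_vout_zeroth_order(duty_percent / 100.0, VDD)
        print(f"D = {duty_percent}%: Vout is about {vout:.2f} V")
```

In this expression the output equals Vdd at D = 50% and rises steeply as D approaches 100%.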

29

Boost Converter

[Figure: boost converter schematic. Labeled nodes and devices: Vpulse, Vclk, Vclk_scaled, Cclk_scaled, Vshift, Dshift, ILf, Vout, VDD, "Clock Load Capacitance", Mp1 to Mp3, Mn1 to Mn3. Labeled values: LF = 310pH, CF = 378pF, Cclk = 21pF, Cshift = 21pF (Cclk = Cshift), a 2.2pF capacitor, a 1kΩ resistor. Drive-chain sizes Wp/Lp = 576/0.1, 192/0.1, 192/0.1, 64/0.1, 48/0.1, 16/0.1; other device sizes 64/0.1, 192/0.1, 512/0.1 x2, 1024/0.1, 2048/0.1, 4096/0.1 (twice), 216/0.75, 2016/0.75, 36720/0.75]

30

Boost Converter – Simulation Results

• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk

[Plot: Vout (V), 0 to 2.5, versus duty ratio (%), 30 to 80, for Iout = 10mA, 30mA, 50mA, 70mA, 100mA]

[Plot: effective efficiency (%), 0 to 125, versus Iout (mA), 0 to 100, for D = 40%, 50%, 60%, 70%, 80%]

ISVLSI 2008 New Design 2

Buck-boost Converter

32

Buck-boost Converter

• Basic operation
  – Vclk provides power & timing
• 0th order result: Vout = -D2/(1-D)*Vdd

33

Buck-boost Converter

[Figure: buck-boost converter schematic. Labeled nodes and devices: Vpulse, Vclk, Vinv, Vshift, Dshift, Vbias, ILf, LF, Vout, VDD, "Clock Load Capacitance", "Deep N-Well Structures", Mp1 to Mp5, Mn1 to Mn3, "Three Diodes in Series, Each: 128/0.1". Labeled values: LF = 310pH, CF = 356pF (34560/0.75), Cclk = 21pF, Cshift = 21pF, a further 21pF capacitor, Cbias (2016/0.75), resistors of 1kΩ and 10.4kΩ. Drive-chain sizes Wp/Lp = 576/0.1, 192/0.1, 192/0.1, 64/0.1, 48/0.1, 16/0.1; other device sizes 64/0.1, 128/0.1, 192/0.1, 1024/0.1 (twice), 4096/0.1 (three times), 2016/0.75]

34

Buck-boost Converter – Simulation Results

[Plot: Vout (V), -2 to 0, versus duty ratio (%), 10 to 70, for Iout = 10mA, 30mA, 50mA, 70mA, 90mA]

• Open loop converter (no regulation)
  – Higher efficiency at lowest duty cycle because only a fixed amount of energy is available from Cclk

[Plot: effective efficiency (%), 0 to 100, versus Iout (mA), 0 to 100, for D = 20%, 30%, 40%, 50%, 60%, 70%]

Results and Comparisons

36

Summary Results

• 90nm layouts, 3GHz operation, < 0.3mm²

Converter                     Clock-only power   Extra power to operate   Converter output   % clock energy
                              (input)            converter (input)        power              recovered
Buck converter [ISSCC 2007]   40mW               16mW                     26mW               50%
Boost converter               100mW              25mW                     28mW               20%
Buck-boost converter          100mW              72mW                     48mW               30%

37

Comparative Results

• IBM Power6: 100W @ 1V, 341mm², Cclk = 13pF/mm²
• Other work: fully on-chip DC-DC buck converter
  – S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A Multi-Stage Interleaved Synchronous Buck Converter with Integrated Output Filter in a 0.18µm SiGe Process," ISSCC 2006, pp. 356–357
  – 27mm², 45MHz
  – 65% power efficiency
• This work
  – 0.27, 0.26, 0.20 mm², including 0.1mm² inductor area, 3GHz
     • Cclk of about 20pF, equivalent to 1.6mm² of Power6 area
     • DC-DC converter adds 12.5% area overhead
  – LC filter: 310pH inductor, 350pF capacitor
     • L and C are similar in size and dominate layout area; can stack to cut area in half
  – Buck: 75 – 185% effective power efficiency (50% recovered)
  – Boost: 25 – 110% effective power efficiency (20% recovered)
  – Buck-boost: 20 – 66% effective power efficiency (30% recovered)
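The area and capacitance figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the deck (13pF/mm², 341mm², 20pF, 1.6mm², 0.20mm², 1V, and the 5GHz Power 6 clock from the Problem slide). The estimate P = C·V²·f, with all of Cclk swinging rail to rail once per cycle, is an assumption added for illustration, not a formula stated in the slides. Function names are hypothetical.

```python
def equivalent_area_mm2(cclk_pf, density_pf_per_mm2):
    """Chip area whose clock capacitance equals cclk_pf at the given density."""
    return cclk_pf / density_pf_per_mm2


def area_overhead_percent(converter_area_mm2, clocked_area_mm2):
    """Converter area as a percentage of the clocked area it serves."""
    return 100.0 * converter_area_mm2 / clocked_area_mm2


def clock_power_w(c_farads, v_volts, f_hz):
    """Assumed estimate: full-swing switching power P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hz


DENSITY_PF_PER_MM2 = 13.0   # Power6 Cclk density from the slide
POWER6_AREA_MM2 = 341.0     # Power6 die area from the slide

if __name__ == "__main__":
    area = equivalent_area_mm2(20.0, DENSITY_PF_PER_MM2)
    print(f"20pF corresponds to about {area:.2f} mm2 (the slide rounds to 1.6mm2)")
    print(f"0.20mm2 over 1.6mm2 is {area_overhead_percent(0.20, 1.6):.1f}% overhead")
    total_c = DENSITY_PF_PER_MM2 * POWER6_AREA_MM2 * 1e-12  # farads
    print(f"Whole-chip Cclk at 1V, 5GHz: about {clock_power_w(total_c, 1.0, 5e9):.1f} W")
```

The 12.5% overhead corresponds to the smallest quoted converter area (0.20mm²) over 1.6mm², and the whole-chip estimate of roughly 22W is consistent with the "> 22W" clock power listed for Power 6 in the Problem slide.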

38

Conclusion

• Key concepts
  – High switching frequency saves area
  – Combined drivers save area and switching loss
  – Recycled charge: converter load discharges Cclk
  – ZVS delay circuit: lower power loss
• Limitations
  – Regulation needs variable duty cycle clock
     • May introduce additional clock jitter
     • Mostly suitable for edge-triggered blocks (no latches)
• Future work
  – Lots of improvements to make!
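The first key concept, that a high switching frequency saves area, can be illustrated with the LC filter values quoted in the Comparative Results (310pH and 350pF). The sketch below computes the corner frequency of an ideal second-order LC low-pass filter, f0 = 1/(2π√(LC)). Treating the output filter as an ideal LC low-pass is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def lc_corner_frequency_hz(l_henries, c_farads):
    """Corner frequency of an ideal LC low-pass filter: 1 / (2 * pi * sqrt(L * C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))


L_F = 310e-12   # 310pH inductor from the slides
C_F = 350e-12   # 350pF capacitor from the slides
F_SW = 3e9      # 3GHz switching frequency from the slides

if __name__ == "__main__":
    f0 = lc_corner_frequency_hz(L_F, C_F)
    print(f"LC corner frequency: about {f0 / 1e6:.0f} MHz")
    print(f"Switching frequency is about {F_SW / f0:.1f} times the corner frequency")
```

Because the corner frequency depends on the product L·C, keeping the same ratio between switching and corner frequency at a lower switching frequency would require a proportionally larger L·C product, and therefore larger passive components.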

Thank you!

Questions ?