Resonant Clock Design for a Power-efficient, High-volume...

35
Resonant Clock Design for a Power-efficient, High-volume x86-64 Microprocessor x86 64 Microprocessor Visvesh Sathe 1 Srikanth Arekapudi 1 Alexander Ishii 2 Visvesh Sathe , Srikanth Arekapudi , Alexander Ishii , Charles Ouyang 1 , Marios Papaefthymiou 2,3 , Samuel Naffziger 1 1 Advanced Micro Devices, Inc. 2 Cyclos Semiconductor, Inc. 3 The University of Michigan, Ann Arbor

Transcript of Resonant Clock Design for a Power-efficient, High-volume...

Page 1: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Resonant Clock Design for a Power-efficient, High-volume

x86-64 Microprocessorx86 64 Microprocessor

Visvesh Sathe1 Srikanth Arekapudi1 Alexander Ishii2 Visvesh Sathe , Srikanth Arekapudi , Alexander Ishii , Charles Ouyang1, Marios Papaefthymiou2,3,

Samuel Naffziger1

1Advanced Micro Devices, Inc.,2Cyclos Semiconductor, Inc.

3The University of Michigan, Ann Arbor

Page 2: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Outline

• Piledriver (PD) overview

• Resonant clocking• Resonant clocking

• Clock driver design

• Inductor design

• Other resonant clock componentsOther resonant clock components

• Putting it all together

• Measurement results

• Conclusion

Paper 3.7

Conclusion

Page 3: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Piledriver

• Extension of Bulldozer Architecture - 2-core module:• Shared I-fetch, Decode, Br. Predict, FP, L2, Cache unit, , , , ,• Per-core Integer schedule/execute, Load/Store

• 32nm CMOS• HKMG SOI

BP, DecodeI-Fetch• HKMG, SOI

• 11 metal layers• 33.3 mm2 w/L2

I Fetch

e U

nit

Core 0 Core 133.3 mm w/L2• 216M transistors• 0.8-1.3 V

FPUCa

ch

Core 0 Core 1

2MB L2 Cache

Paper 3.7

FPU Cache

Page 4: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

The Piledriver Global ClockSi ifi t l b l l k l di• Significant global clock loading

– 7ps clock grid skew target across 21mm2 core area

– Constrained clock latency from grid to timing elements

24% of average application• 24% of average applicationpower in global clock

Paper 3.7

Page 5: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Resonant Clocking Operation

• Rely on efficient LC resonance between spiral inductors and grid capacitance near resonant frequencyand grid capacitance near resonant frequency

• Efficient operation around natural frequency • Driving clock at much lower frequencies Reduced efficiency Warped clock waveform

Page 6: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Resonant Clocking: Mutual Inductance

• Significant mutual inductance interaction from– Signal nets under and around inductor windings.g g– Power and ground nets serving circuits under inductor

• Mutual inductance causes L Q Cl k – L, Q, Clock power

• Maintaining keep-out regions is prohibitive (~5% area penalty)

Page 7: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Resonant Clocking on PileDriver

plsEn

3rclk_x

CT/2TSw

preclk

p

drvEn

MSw

C /2

T/clkτ

3drvEn rclk

Clk Tank

CT/2On-chipSpiral Inductor

• Dual mode clock system with Mode switch (MSw)

Clock Driver Grid CapInductor

y ( )– resonant clocking (rclk) when MSw is closed– conventional clocking (cclk) when MSw is open

• CT ≈ 6xCclk to serve as effective AC ground• Throttle Switch (TSw) to address transient voltage spikes

Page 8: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Piledriver Resonant Clocking

• 92 distributed inductors• 92 distributed inductors• MSw, TankCap, TSw

inductors contained in HCK

Paper 3.7

Tree Macros• Conventional L2CLK, NBCLK

Page 9: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Rclk Operation (1/6)

• Rclk operation can be partitioned into 6 phases• Voltage across Ctank ~Vdd/2• Nmos conducting Pmos off• Nmos conducting, Pmos off• R-L Current buildup through the Nmos• IR drop across Nmos and grid resistance clock voltage IR drop across Nmos and grid resistance clock voltage

Page 10: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Rclk Operation (2/6)

• Both, Nmos and Pmos are off• LC oscillation with initial inductor current to charge clk• Clk voltage transition a function of L,C and initial current.

Page 11: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Rclk Operation (3/6)

• Pmos on, Nmos off.• Both LC and pull-up mechanisms in effect

Page 12: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Rclk Operation (4/6)

• Pmos on, Nmos off• RL current buildup through the Pmos• IR drop across Pmos and grid clock voltage increase

Page 13: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Rclk Operation (5/6)

• Both driver devices are off• LC oscillation with initial inductor current to discharge clk• Clk voltage transition a function of L,C and initial current

Page 14: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Basic Rclk Operation (6/6)

• Nmos on, Pmos off• Both LC and pull-down mechanisms in effect

Page 15: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Cclk and Rclk Waveforms• Reduced driver strength required for rclk• Lower rclk slew insertion delay increase (phase offset)

D l d t f d i d i lk l h ff t• Delayed onset of driver devices rclk_pulse phase offset

rclk square

cclk

rclk_squarerclk_pulse

cclk

Paper 3.7

Page 16: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Clock Driver Design

• Clock driver palette with 24 driversUp to 4 blocks Each block can contain up to 6 subBlocks– Up to 4 blocks. Each block can contain up to 6 subBlocks

– Effective granularity and efficiency tradeoff

• Run-time programmable drive strength modulation support

Paper 3.7

p g g pp– Each subBlock consists of 3 banks (2:3:2 ratio)– drvEn[2:0] signals allow for n/7 (n=2,3,4,5,7) drive modulation

Page 17: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Driver Design

• Split buffer design p g– Skewed pre-drivers for reduced crossover current– Allows insertion delay management of rclk w.r.t. cclk, NB and L2

clocks

Paper 3.7

clocks

Page 18: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Driver Design: Pulse Mode• Subtractive pulse-generator

scheme – Delay chain used to delay Delay chain used to delay

asserting edges of nmos and pmos devicesDe asserting transitions not – De-asserting transitions not delayed

– Ontime is a function of input duty cycle and delay amount

• Benefits over traditional pulse generationpulse generation– Lower variation (smaller delay)– Support for Off P-state operation

Paper 3.7

– Allows PLL duty cycle tuning

Page 19: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Inductor Design

Paper 3.7

• Clk macro, bump pitch constrain inductor size

Page 20: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Inductor Design (cont’d)

• Metal sharing with existing power cut-aways

Paper 3.7

• Metal sharing with existing power cut-aways• Center power straps, HCK tree through inductor

for mutual inductance cancellation

Page 21: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Inductor Design (cont’d)

• Custom “loopless” power under inductor to avoid Q

Paper 3.7

Custom loopless power under inductor to avoid Q degradation due to power grid eddys

• TankCap built in Si, metal to meet capacitance, ESR target

Page 22: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Other rclk Components: Mode Switchlk i t lk• rclk resistance ↔ cclk

loading tradeoff• Mode switch offers fet

resistance– Voltage dependent– Lower overdrive → Negative Lower overdrive → Negative

Temperature Coefficient

• cclk → rclk mode causes excessive clock grid loadingexcessive clock grid loading– V(n1) not always well

defined in cclk modeR lt i d d l k – Results in reduced clock amplitude and degraded slew → timing impact

Page 23: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Mode Switch (Contd.)

St i t h i d• Staging techniques used– Turn on Mode Switch in stages (like in power gating)– “Warm” up n0 before making a low resistance connection to grid.

Page 24: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Voltage Overshoot on Mode Switch

• rclk → cclk transition can result in voltage overshoot on n1. • Oxide stress poses a reliability issue.

Page 25: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Throttle Switch• Throttle switch

connected across inductor– Low resistance help p

damp overshoot– Turns on as Mode

Switch turns offSwitch turns off

Page 26: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Other rclk Components: CTank

• CTank acts as an A.C ground connection.

• C, ESR to serve as a low C, ESR to serve as a low loss ground connections– Low resistance power,

ground inductor ground, inductor connection

– High bandwidth decap• Implemented underneath

inductor using Si and metal• Distributed CTank connection Distributed CTank connection

using inductor winding for low resistance contact

• Metal cap built to meet C • Metal cap built to meet C, ESR requirements

Page 27: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Clock Tuning• 7ps grid skew target across

22mm2

• Heavy clock grid loading requires • Heavy clock grid loading requires effective strategies for:– Grid wire tuning– Clock driver tuning– Inductor tuning

• Elmore delay-aware local wire • Elmore delay-aware local wire routing solution.

• Clock wire tuning algorithm t t t k ithmeets target skew with

– Inductance-aware clock spine geometries

– Minimal clock spine capacitance

Page 28: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Clock Tuning (contd.)• Global clock load varies significantly across core• Effective driver and Inductor allocation key to maintaining

clock skew clock skew – Driver palette size : 24– Inductor palette size : 5 (0.5—1.3nH range)

• Iterative Linear Programming-based algorithm for driver and inductor allocation 1 Start with initial driver/inductor assignment1. Start with initial driver/inductor assignment2. Linearize problem – Obtain sensitivity matrix for each

driver/inductor location3. Setup L.P, solve for optimal assignment4. Run full chip clock skew analysis5 If skew budget not met goto 25. If skew budget not met, goto 2

Page 29: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Clock Tuning (contd)Cl k k t l f lk d lk• Clock skew control for cclk and rclk– Wire tuning algorithm to constrain clock latency– Iterative LP formulation for optimal driver and inductor Iterative LP formulation for optimal driver and inductor

assignment– Interleaved driver and inductor placement

Additi l lk k i t 1– Additional rclk skew impact <1ps

Clock MeshClock MeshLoad

Global ClockSpine

Driver-MSw Shorting Bar

Page 30: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Putting It All Together• Clock configuration programming

during P-State transitions F i d d f t bl

ProgramSequencerNorth Bridge

Fuse Box

• Frequency-indexed fuse table to access configuration bits

– Mode selection (rclk,cclk)

g

Clk Mode Parameter Broadcast (in HCK tree4)( , )

– Driver strength– Pulse_en, Pulse duty cycle Clk Mode Parameter

Broadcast (in HCK tree1)• Source-synchronous transfer• Cclk mode during P-State

transition Clk Mode Parameter

Broadcast (in HCK tree1)

transition Broadcast (in HCK tree0)driver[0]

Clk ModeParameter

D d

driver[26]Clk Mode

Parameter Decode

SourceSynchronous

Receiver

DecodeDecode

Page 31: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Measured Efficiency (%) vs. Frequency

rclk_pulse_3750C250C

rclk square 3

rclk_square_2

rclk_square_3

• Efficiency : Percentage clock power savings over cclk• rclk_square_x Clock driver strength modulation of x/7

Page 32: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

fmax Measurements

3.9

4.0

3.7

3.8

(GH

z)

3.5

3.6

f max

3.3

3.4

cclk rclk sq 3 rclk sq 2

• Frequency-limiting patterns in HST setup• 0 MHz median, 5 MHz mean (0.13%) frequency overhead in rclk

_ q_ _ q_

( ) q y– rclk cclk phase offset– Low-slew rclk waveforms variation in timing elements

Page 33: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Rclk Measurement Summary• Successfully ran SST (System Stress Test) over 2 weeks• Latest Fmax impact data on a larger set of parts shows an Fmax

overhead of ~0.2%overhead of 0.2%• Up to 34% energy efficiency achieved in the global clock using

Pulse ModeN t d ti d d t i F i t d i b – Not production ready due to excessive Fmax impact driven by phase offset to NB and L2 clock interface

– Phase offset issue resolved in current design• Temperature effect : Overall efficiency degradation with T

– Positive temperature coefficient due to metal resistanceN ti t t ffi i t d t l d i MS– Negative temperature coefficient due to low-overdrive MSw

• Traces with higher activity provide additional efficiency– Increased clock load dominates additional crossover current.

Page 34: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

Conclusion• Dual-mode resonant clock design in 32nm SOI

– Conventional mode : < 2.9GHzR d 2 9GH f– Resonant mode : > 2.9GHz to fmax

• Power savingsClock power: 25%– Clock power: 25%

– Average application power (core): 4.5% – Idle power (core): 10%Idle power (core): 10%

• Built upon existing clock infrastructure• No CMOS technology modificationNo CMOS technology modification

Paper 3.7

Page 35: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume

AcknowledgmentsAMD

• Manivannan Bhoopathy• Kyle Viau

Cyclos Semiconductor• Fred Brauchler• Joohee Kim• Kyle Viau

• Alan Arakawa• Tom Meneghini

• Joohee Kim• Jerry Kao

• Denny Renfrow• Vasant Palisetti• Kevin Hurd• David Johnson• Rohit Kumar1

• Alvin Loke• Alvin Loke• Bill McGee

1 Currently with Apple Computers

Paper 3.7

y pp p