24. Skew-Tolerant Design - University of Texas at...

28
D. Z. Pan 24. Skew-Tolerant Design 1 24. Skew-Tolerant Design Previous Unit: – Power • Dynamic • Static Design for low power This Unit: Clock Distribution Clock Skew Skew-Tolerant Static Circuits Skew-Tolerant Domino Circuits

Transcript of 24. Skew-Tolerant Design - University of Texas at...

Page 1: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design1

24. Skew-Tolerant Design• Previous Unit:

– Power• Dynamic• Static

– Design for low power• This Unit:

– Clock Distribution– Clock Skew– Skew-Tolerant Static Circuits– Skew-Tolerant Domino Circuits

Page 2: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design2

Clocking• Synchronous systems use a clock to keep

operations in sequence– Distinguish this from previous or next– Determine speed at which machine operates

• Clock must be distributed to all the sequencing elements– Flip-flops and latches

• Also distribute clock to other elements – Domino circuits and memories

Page 3: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design3

Clock Distribution• On a small chip, the clock distribution

network is just a wire– And possibly an inverter for clkb

• On practical chips, the RC delay of the wire resistance and gate load is very long– Variations in this delay cause clock to get to

different elements at different times– This is called clock skew

• Most chips use repeaters to buffer the clock and equalize the delay– Reduces but doesn’t eliminate skew

Page 4: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design4

Example• Skew comes from differences in gate and

wire delay– With right buffer sizing, clk1 and clk2 could

ideally arrive at the same time.– But power supply noise changes buffer delays– clk2 and clk3 will always see RC skew

3 mm

1.3 pF

3.1 mmgclk

clk1

0.5 mm

clk2clk3

0.4 pF 0.4 pF

Page 5: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design5

Review: Skew Impact

F1 F2

clk

clk clk

Combinational Logic

Tc

Q1 D2

Q1

D2

tskew

CL

Q1

D2

F1

clk

Q1

F2

clk

D2

clk

tskew

tsetup

tpcq

tpdq

tcd

thold

tccq

( )setup skew

sequencing overhead

hold skew

pd c pcq

cd ccq

t T t t t

t t t t

≤ − + +

≥ − +

144424443

• Ideally full cycle isavailable for work

• Skew adds sequencingoverhead

• Increases hold time too

Page 6: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design6

Cycle Time Trends• Much of CPU performance comes from

higher f– f improving faster than simple process shrinks– Sequencing overhead is bigger part of cycle

0.01

0.1

1

10

100

8038680486PentiumPentium II / III

Spec

Int9

5

1985 1988 1991 1994 1997 2000

1.2 0.8 0.6 0.35 2.0

Process

100

200

500VDD = 5 VDD = 3.3

VDD = 2.5

50Fano

ut-o

f-4 (F

O4)

Inve

rter D

elay

(ps)

0.25

10

100

1000

8038680486PentiumPentium II / III

MH

z

1988 1991 1994 1997 20001985

10

100

1985 1988 1991 1994 1997

8038680486PentiumPentium II / IIIFO

4 in

verte

r del

ays

/ cyc

le

50

20

2000

Page 7: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design7

Solutions• Reduce clock skew

– Careful clock distribution network design– Plenty of metal wiring resources

• Analyze clock skew– Only budget actual, not worst case skews– Local vs. global skew budgets

• Tolerate clock skew– Choose circuit structures insensitive to skew

Page 8: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design8

Clock Distribution Networks• Ad hoc• Grids• H-tree• Hybrid

Page 9: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design9

Clock Grids• Use grid on two or more levels to carry

clock• Make wires wide to reduce RC delay• Ensures low skew between nearby points• But possibly large skew across die

Page 10: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design10

Alpha Clock Grids

PLL

gclk grid

Alpha 21064 Alpha 21164 Alpha 21264

gclk grid

Alpha 21064 Alpha 21164 Alpha 21264

Page 11: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design11

H-Trees• Fractal structure

– Gets clock arbitrarily close to any point– Matched delay along all paths

• Delay variations cause skew• A and B might see big skew A B

Page 12: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design12

Itanium 2 H-Tree• Four levels of buffering:

– Primary driver– Repeater– Second-level

clock buffer– Gater

• Route aroundobstructions Primary Buffer

Repeaters

Typical SLCBLocations

Page 13: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design13

Hybrid Networks• Use H-tree to distribute clock to many

points• Tie these points together with a grid

• Ex: IBM Power4, PowerPC– H-tree drives 16-64 sector buffers– Buffers drive total of 1024 points– All points shorted together with grid

Page 14: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design14

Skew Tolerance• Flip-flops are sensitive to skew because

of hard edges– Data launches at latest rising edge of clock– Must setup before earliest next rising edge

of clock– Overhead would shrink if we can soften

edge• Latches tolerate moderate amounts of

skew– Data can arrive anytime latch is transparent

Page 15: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design15

Skew: Latches

Q1

L1

φ1

φ2

L2 L3

φ1 φ1φ2

CombinationalLogic 1

CombinationalLogic 2

Q2 Q3D1 D2 D3( )

( )

sequencing overhead

1 2 hold nonoverlap skew

borrow setup nonoverlap skew

2

,

2

pd c pdq

cd cd ccq

c

t T t

t t t t t t

Tt t t t

≤ −

≥ − − +

≤ − + +

123

2-Phase Latches

( )

( )

setup skew

sequencing overhead

hold skew

borrow setup skew

max ,pd c pdq pcq pw

cd pw ccq

pw

t T t t t t t

t t t t t

t t t t

≤ − + − +

≥ + − +

≤ − +

1444442444443

Pulsed Latches

Page 16: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design16

Dynamic Circuit Review• Static circuits are slow because fat pMOS

load input• Dynamic gates use precharge to remove

pMOS transistors from the inputs– Precharge: φ = 0 output forced high– Evaluate: φ = 1 output may pull low

A B

C DY

A B C D

A

B

C

D

Page 17: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design17

Domino Circuits• Dynamic inputs must monotonically rise

during evaluation– Place inverting stage between each dynamic

gate– Dynamic / static pair called domino gate

• Domino gates can be safely cascaded

A

W

φ

B

X

domino AND

dynamicNAND

staticinverter

Page 18: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design18

Domino Timing• Domino gates are 1.5 – 2x faster than

static CMOS– Lower logical effort because of reduced Cin

• Challenge is to keep precharge off critical path

• Look at clocking schemes for precharge and eval– Traditional schemes have severe overhead– Skew-tolerant domino hides this overhead

Page 19: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design19

Traditional Domino Ckts• Hide precharge time by ping-ponging

between half-cycles– One evaluates while other precharges– Latches hold results during precharge

Tc

Sta

tic

Dyn

amic

Latc

h

clk

Sta

tic

Dyn

amic

clkS

tatic

Dyn

amic

clk

Dyn

amic

clk clk

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

clk clk clk clk clk

clk

clk

tpdq tpdq

2pd c pdqt T t= −

Page 20: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design20

Clock Skew• Skew increases sequencing overhead

– Traditional domino has hard edges– Evaluate at latest rising edge– Setup at latch by earliest falling edge

Sta

tic

Dyn

amic

Latc

hclk

Sta

tic

Dyn

amic

clkD

ynam

icclk clk

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

Dyn

amic

clk clk clk clk

clk

clk

tskewtsetup

setup skew2 2pd ct T t t= − −

Page 21: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design21

Time Borrowing• Logic may not exactly fit half-cycle

– No flexibility to borrow time to balance logic between half cycles

• Traditional domino sequencing overhead is about 25% of cycle time in fast systems!

Sta

tic

Dyn

amic

Latc

hclk

Sta

tic

Dyn

amic

clk clk

Sta

tic

Dyn

amic

Latc

h

Sta

tic

Dyn

amic

clk clk clk

clk

clk

tskewtsetup

Page 22: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design22

Relaxing the Timing• Sequencing overhead caused by hard

edges– Data departs dynamic gate on late rising edge– Must setup at latch on early falling edge

• Latch functions– Prevent glitches on inputs of domino gates– Holds results during precharge

• Is the latch really necessary?– No glitches if inputs come from other domino– Can we hold the results in another way?

Page 23: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design23

Skew-Tolerant Domino• Use overlapping clocks to eliminate latches

at phase boundaries.– Second phase evaluates using results of first

a

Sta

tic

Dyn

amic

φ1

Sta

tic

Dyn

amic

φ2

b c d

a

φ1

φ2

b

c

a

φ1

φ2

b

c

No latch atphase boundary

Page 24: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design24

Full Keeper• After second phase evaluates, first phase

precharges• Input to second phase falls

– Violates monotonicity?• But we no longer need the value• Now the second gate has a floating output

– Need full keeper to hold it either high or lowφ

weak fullkeepertransistors

f

XH

Page 25: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design25

Time Borrowing• Overlap can be used to

– Tolerate clock skew– Permit time borrowing

• No sequencing overheadtskew

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Sta

tic

φ1

φ2

φ1 φ1 φ1 φ1 φ1 φ2 φ2 φ2

Phase 1 Phase 2

toverlap

tborrow

pd ct T=

Page 26: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design26

Multiple Phases• With more clock phases, each phase

overlaps more– Permits more skew tolerance and time

borrowingS

tatic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Sta

tic

Dyn

amic

Dyn

amic

Sta

tic

Sta

tic

φ3

φ4

φ1 φ1 φ2 φ2 φ3 φ3 φ4 φ4

Phase 1 Phase 2 Phase 3 Phase 4

φ1

φ2

Page 27: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design27

Clock Generation

clken

φ1

φ2

φ3

φ4

Page 28: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends

D. Z. Pan 24. Skew-Tolerant Design28

Summary• Clock skew effectively increases setup and

hold times in systems with hard edges• Managing skew

– Reduce: good clock distribution network– Analyze: local vs. global skew– Tolerate: use systems with soft edges

• Flip-flops and traditional domino are costly• Latches and skew-tolerant domino perform

at full speed even with moderate clock skews.