skew-tolerant circuit design slides - Harvey Mudd...

Skew-Tolerant Circuit Design David Harris [email protected] December, 2000 Harvey Mudd College Claremont, CA


Page 1: skew-tolerant circuit design slides - Harvey Mudd Collegepages.hmc.edu/harris/class/e158/01/lect21.pdf · 2020-03-02 · Skew-Tolerant Circuit Design David Harris Page 3 of 50 Sequencing

Skew-Tolerant Circuit Design

David Harris

December, 2000

Harvey Mudd College

Claremont, CA


Outline

o Introduction
o Skew-Tolerant Static Circuits
o Traditional Domino Circuits
o Skew-Tolerant Domino Circuits
o Design Methodology
o Example

Talk based on:
• Skew-Tolerant Circuit Design, Morgan Kaufmann Publishers, 2001.


Sequencing Overhead in Flip-Flop Systems

Ideally, each full clock cycle Tc is available for useful computation

But we must keep token in current pipestage from catching up with token in next
• Purpose of sequencing elements such as flip-flops is not to “remember” data but merely to retard fast paths so they don’t corrupt tokens further along in the pipeline
• This inherently retards the slower paths as well
• Introduces “sequencing overhead” inherent to keeping tokens sequenced

For flip-flops:

[Figure: pipeline of two flip-flops clocked by clk with static logic between them; clk timing diagram for Cycle 1 marking Tc, ∆CQ, ∆logic, and ∆DC]

$$\Delta_{logic} = T_c - \Delta_{CQ} - \Delta_{DC}$$


Clock Skew & Time Borrowing

Clock skew increases sequencing overhead

• Data may launch on the latest skewed clock edge
• Must setup before the earliest skewed clock edge
• Clock skew directly cuts into time available for computation

Any unused time at end of cycle is wasted
• No time borrowing possible to balance logic between stages

[Figure: flip-flop pipeline with static logic; clk timing diagram for Cycle 1 marking Tc, ∆CQ, ∆logic, ∆DC, and tskew]

$$\Delta_{logic} = T_c - \Delta_{CQ} - \Delta_{DC} - t_{skew}$$
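A quick numeric check of the flip-flop budget above. This is only a sketch: the function name and the 100 ps and 50 ps element delays are assumptions chosen for illustration, not values from the talk.

```python
def flop_logic_time(tc, d_cq, d_dc, t_skew=0.0):
    """Time left for logic in one flip-flop cycle (all values in ps)."""
    return tc - d_cq - d_dc - t_skew


# Hypothetical numbers: 2 ns cycle, 100 ps clock-to-Q, 50 ps setup
tc, d_cq, d_dc = 2000.0, 100.0, 50.0

no_skew = flop_logic_time(tc, d_cq, d_dc)
with_skew = flop_logic_time(tc, d_cq, d_dc, t_skew=200.0)
print(no_skew, with_skew)  # 1850.0 1650.0
```

Every picosecond of skew comes straight out of the logic budget, which is the point of the slide.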


Cycle Time Trends

Much of CPU performance improvement comes from higher frequencies
• Clock rates increasing faster than simple process improvement allows
• Fixed sequencing overhead becomes greater portion of cycle time

[Figure: four plots versus year (1985 to 2000) for the 80386, 80486, Pentium, and Pentium II / III: SpecInt95; fanout-of-4 (FO4) inverter delay (ps) annotated with process (µm) and VDD (5, 3.3, 2.5); clock frequency (MHz); FO4 inverter delays per cycle]


SIA Frequency Roadmap

Semiconductor Industry Association has predicted clock frequencies
• Estimate cycle time in FO4 inverter delays
• Cycles are getting very short
• Minimizing sequencing overhead will be ever more important

Process (µm)  Year  Frequency (MHz)  Cycle time (FO4)
0.25          1997  750 (600)        13.3 (16.7)
0.18          1999  1250 (733)       11.1 (18.9)
0.15          2001  1500 (> 1500)    11.1 (< 11.1)
0.13          2003  2100             9.2
0.10          2006  3500             7.1
0.07          2009  6000             6.0
0.05          2012  10000            5.0
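The cycle-time column can be reproduced from the other two. The sketch below assumes an FO4 delay of roughly 400 ps per µm of feature size; that constant is not stated on the slide, it is inferred here because it matches every row of the table. The function name is likewise illustrative.

```python
FO4_PS_PER_UM = 400.0  # assumption inferred from the table, not from the slide


def cycle_time_fo4(freq_mhz, process_um):
    """Cycle time expressed in FO4 inverter delays."""
    cycle_ps = 1e6 / freq_mhz            # 1 MHz corresponds to a 1e6 ps period
    fo4_ps = FO4_PS_PER_UM * process_um  # estimated FO4 delay for this process
    return cycle_ps / fo4_ps


# (process in µm, predicted frequency in MHz), copied from the table
roadmap = [(0.25, 750), (0.18, 1250), (0.15, 1500), (0.13, 2100),
           (0.10, 3500), (0.07, 6000), (0.05, 10000)]

for process_um, freq_mhz in roadmap:
    print(process_um, freq_mhz, round(cycle_time_fo4(freq_mhz, process_um), 1))
```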


Outline

o Introduction
o Skew-Tolerant Static Circuits
o Traditional Domino Circuits
o Skew-Tolerant Domino Circuits
o Design Methodology
o Example


Transparent Latches

Transparent latch-based systems use a latch in each half-cycle
• Latch is transparent when controlling clock is high, opaque when low
• Two latches required so a token does not skip ahead to the next pipeline stage

[Figure: latch pipeline clocked by clk and clk_b with static logic in Half-Cycle 1 and Half-Cycle 2; timing diagram over Tc marking the transparent and opaque intervals and the two ∆DQ delays]

$$\Delta_{logic} = T_c - 2\Delta_{DQ}$$


Transparent Latches with Skew

Flip-flops are sensitive to skew because of the hard edges
• Data not launched until latest rising edge of clock
• Data must setup before earliest falling edge of clock
• Overhead would shrink if we could soften an edge

Latch-based systems may still use entire cycle even if there is skew

[Figure: latch pipeline clocked by clk and clk_b with static logic in Half-Cycle 1 and Half-Cycle 2; timing diagram over Tc]

$$\Delta_{logic} = T_c - 2\Delta_{DQ}$$


Pulsed Latches

Can we keep tokens in sequence with only one latch per cycle?
• Pulse the latch transparent briefly compared to delay through pipestage

[Figure: pulsed-latch pipeline clocked by φp with static logic in Cycle 1; timing diagrams marking Tc, ∆DQ, ∆logic, tpw, ∆DC, tskew, and the wasted time for the two cases below]

Case 1: pulse wider than setup time + clock skew
• Data arrives while latch is transparent, no time wasted

Case 2: pulse narrower than setup time + clock skew
• Data may arrive while latch is opaque, leaving wasted time

$$\Delta_{logic} = \begin{cases} T_c - \Delta_{DQ} & \text{if } t_{pw} \ge \Delta_{DC} + t_{skew} \\ T_c + t_{pw} - \Delta_{DQ} - \Delta_{DC} - t_{skew} & \text{otherwise} \end{cases}$$
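The two cases collapse into one expression if the wasted time is written as max(0, ∆DC + tskew − tpw). A minimal sketch of that, with a hypothetical function name and made-up delays in ps:

```python
def pulsed_latch_logic_time(tc, d_dq, d_dc, t_skew, t_pw):
    """Logic time per cycle for a pulsed-latch pipeline (all values in ps)."""
    # Time lost when the pulse is too narrow to cover setup plus skew
    wasted = max(0.0, d_dc + t_skew - t_pw)
    return tc - d_dq - wasted


# Hypothetical numbers: 2 ns cycle, 100 ps D-to-Q, 50 ps setup, 200 ps skew
wide = pulsed_latch_logic_time(2000.0, 100.0, 50.0, 200.0, t_pw=300.0)
narrow = pulsed_latch_logic_time(2000.0, 100.0, 50.0, 200.0, t_pw=150.0)
print(wide, narrow)  # 1900.0 1800.0
```

With the 300 ps pulse (Case 1) only ∆DQ is lost; with the 150 ps pulse (Case 2) another 100 ps is wasted.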


Min-Delay

Min-Delay (a.k.a hold time or race problem) is especially insidious
• Setup time problems can be fixed by slowing clock
• Hold time problems require redesign of chip

Consider min-delay constraint for flip-flops with little intervening logic

[Figure: Flop 1 and Flop 2 clocked by clk with logic delay δlogic between them; timing diagram marking δCQ, δlogic, tskew, and ∆CD, showing when the input arrives at Flop 2 and when it is safe for the Flop 2 input to change]

$$\delta_{CQ} + \delta_{logic} \ge t_{skew} + \Delta_{CD}$$

$$\delta_{logic} \ge \Delta_{CD} + t_{skew} - \delta_{CQ}$$


Min-Delay (Transparent Latches)

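Latches may use nonoverlapping clocks to reduce min-delay problems
• But min-delay constraints exist between each latch (twice per cycle)

[Figure: Latch 1 and Latch 2 clocked by clk and clk_b with logic delay δlogic between them; timing diagram marking tnonoverlap, δCQ, δlogic, tskew, and ∆CD, showing when the input arrives at Latch 2 and when it is safe for the Latch 2 input to change]

$$\delta_{CQ} + \delta_{logic} \ge t_{skew} + \Delta_{CD} - t_{nonoverlap}$$

For paths in each half-cycle:

$$\delta_{logic} \ge \Delta_{CD} + t_{skew} - \delta_{CQ} - t_{nonoverlap}$$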


Min-Delay (Pulsed Latches)

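Pulsed latches with wide pulses are especially susceptible to min-delay
• Data launches off earliest rising edge of pulse
• Must hold until after latest falling edge of pulse

[Figure: Latch 1 and Latch 2 clocked by φp with logic delay δlogic between them; timing diagram marking tpw, δCQ, δlogic, tskew, and ∆CD, showing when the input arrives at Latch 2 and when it is safe for the Latch 2 input to change]

$$\delta_{logic} \ge t_{pw} + \Delta_{CD} + t_{skew} - \delta_{CQ}$$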
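The three min-delay constraints (flip-flops, transparent latches, pulsed latches) differ only in the nonoverlap and pulse-width terms, so one helper can cover them. This is an illustrative sketch; the function name and the delay values are assumptions, not figures from the talk.

```python
def min_logic_delay(d_cd, t_skew, d_cq_min, t_nonoverlap=0.0, t_pw=0.0):
    """Smallest contamination delay the logic must provide (ps).

    A result of zero or less means the constraint holds with no logic at all.
    """
    return t_pw + d_cd + t_skew - d_cq_min - t_nonoverlap


# Hypothetical element delays in ps
d_cd, t_skew, d_cq_min = 30.0, 50.0, 40.0

flop = min_logic_delay(d_cd, t_skew, d_cq_min)
latch = min_logic_delay(d_cd, t_skew, d_cq_min, t_nonoverlap=60.0)
pulsed = min_logic_delay(d_cd, t_skew, d_cq_min, t_pw=150.0)
print(flop, latch, pulsed)  # 40.0 -20.0 190.0
```

Nonoverlap relaxes the requirement, while the pulse width adds to it one for one.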


Time Borrowing

Logic between latches may occupy more than exactly 1/2 cycle
• Borrow time so long as setup time at next latch is met
• Soft edges provide flexibility to partition and balance logic (unlike flops)

[Figure: two latches clocked by clk and clk_b with combinational logic spanning Half-Cycle 1 and Half-Cycle 2; timing diagram over Tc marking Tc/2, tborrow, ∆DC, and tskew]

$$t_{borrow} = \frac{T_c}{2} - \Delta_{DC} - t_{skew}$$


Time Borrowing with Pulses

What if the clock duty cycle is less than 1/2 cycle (pulses or nonoverlap)?
• Still meet setup time before end of transparency pulse

[Figure: two latches clocked by clk and clk_b with combinational logic spanning Half-Cycle 1 and Half-Cycle 2; timing diagram over Tc marking Tc/2, tpw, tnonoverlap, tborrow, ∆DC, and tskew]

$$t_{borrow} = \begin{cases} \frac{T_c}{2} - t_{nonoverlap} - \Delta_{DC} - t_{skew} & \text{transparent latches} \\ t_{pw} - \Delta_{DC} - t_{skew} & \text{pulsed latches} \end{cases}$$
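To get a feel for how much less a pulsed latch can borrow, the two expressions can be evaluated side by side. The helper names and the numbers below are hypothetical, picked only to make the comparison concrete.

```python
def borrow_transparent(tc, d_dc, t_skew, t_nonoverlap=0.0):
    """Time a transparent-latch stage may borrow from the next half-cycle (ps)."""
    return tc / 2 - t_nonoverlap - d_dc - t_skew


def borrow_pulsed(t_pw, d_dc, t_skew):
    """Time a pulsed-latch stage may borrow from the next cycle (ps)."""
    return t_pw - d_dc - t_skew


# Hypothetical numbers: 2 ns cycle, 50 ps setup, 200 ps skew
print(borrow_transparent(2000.0, 50.0, 200.0))                      # 750.0
print(borrow_transparent(2000.0, 50.0, 200.0, t_nonoverlap=100.0))  # 650.0
print(borrow_pulsed(300.0, 50.0, 200.0))                            # 50.0
```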


Skew-Tolerant Static Circuits Summary

Flip-Flops:
• Popular because CAD tools and junior engineers understand them
• Worst sequencing overhead and no time borrowing

Transparent Latches:
• Better sequencing overhead
• Excellent time borrowing
• Understood by most but not all engineers and CAD tools

Pulsed Latches:
• Lowest sequencing overhead
• Can be treated like flip-flops
• Requires thorough min-delay analysis
• Minimal time borrowing

Element: Flip-flop
  Sequencing overhead: $\Delta_{CQ} + \Delta_{DC} + t_{skew}$
  Time borrowing: $0$
  Min-delay δlogic: $\Delta_{CD} + t_{skew} - \delta_{CQ}$

Element: Transparent latch
  Sequencing overhead: $2\Delta_{DQ}$
  Time borrowing: $\frac{T_c}{2} - \Delta_{DC} - t_{skew} - t_{nonoverlap}$
  Min-delay δlogic (in each half-cycle): $\Delta_{CD} + t_{skew} - \delta_{CQ} - t_{nonoverlap}$

Element: Pulsed latch
  Sequencing overhead: $\Delta_{DQ} + \max(0, \Delta_{DC} + t_{skew} - t_{pw})$
  Time borrowing: $t_{pw} - \Delta_{DC} - t_{skew}$
  Min-delay δlogic: $t_{pw} + \Delta_{CD} + t_{skew} - \delta_{CQ}$


Outline

o Introduction
o Skew-Tolerant Static Circuits
o Traditional Domino Circuits
o Skew-Tolerant Domino Circuits
o Design Methodology
o Example


Dynamic Circuit Review

Static CMOS circuits are slow because fat PMOS devices load input

Dynamic gates use precharged circuitry to remove PMOS from input
• Precharge: φ = 0, output set high through clocked PMOS transistor
• Evaluate: φ = 1, output possibly pulls low through logic transistors

[Figure: schematics of a static NOR4 and a dynamic NOR4 (inputs A, B, C, D; outputs labeled X and Y); waveforms of φ and the output through Precharge, Evaluate, Precharge]


Domino Circuits

Dynamic inputs must be monotonically rising during evaluation
• Place inverting static gate between each dynamic gate
• Dynamic / static pair is called a domino gate

Domino gates can be safely cascaded
• Inputs from static CMOS logic must still be latched to avoid glitches

Domino evaluation delays 1.5 - 2x faster than static CMOS
• Challenge is how to prevent precharge time from impacting critical path
• Traditional domino clocking approaches introduce severe overhead
• Skew-tolerant domino circuits can hide this overhead

[Figure: domino gate, a dynamic gate clocked by φ followed by a static gate]


Traditional Domino Circuits

How do we prevent precharge time from appearing in the critical path?

Hide precharge time by ping-ponging between half-cycles
• One evaluates while other precharges
• Latches hold results during precharge

[Figure: traditional domino pipeline; latches separate Half-Cycle 1 (gates clocked by clk) and Half-Cycle 2 (gates clocked by clk_b), each a chain of dynamic and static gates; timing diagram over Tc showing the evaluate and precharge intervals and ∆DQ]

$$\Delta_{logic} = T_c - 2\Delta_{DQ}$$


Clock Skew

Clock skew increases sequencing overhead

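• Evaluation begins at latest rising edge
• Latch input setup before earliest falling edge
• Budget clock skew twice each cycle

[Figure: Half-Cycle 1 of a traditional domino pipeline (latch, dynamic and static gates) with skewed clocks clk1 and clk2; timing diagram over Tc marking where evaluation begins, the setup deadline, and ∆DC + tskew]

$$\Delta_{logic} = T_c - 2\Delta_{DC} - 2t_{skew}$$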


Balancing Logic

Logic may not exactly fit half-cycle

• No flexibility to borrow time to balance logic between half-cycles

[Figure: traditional domino pipeline with latches and dynamic / static gates in Half-Cycle 1 (clk) and Half-Cycle 2 (clk_b); timing diagram over Tc marking the unusable time]

$$\Delta_{logic} = T_c - 2\Delta_{DC} - 2t_{skew} - t_{imbalanced}$$


How bad is it?

Consider 500 MHz system similar to Alpha 21164
• Tc = 2 ns
• tskew = 200 ps (reported budget)
• ∆DC = 50 ps (optimistic estimate)

25% of cycle consumed by skew & setup!

[Figure: 2 ns cycle split into Half-Cycle 1 and Half-Cycle 2, each divided into logic, setup, and skew]
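The 25% figure follows from budgeting setup and skew once per half-cycle, as in the traditional domino equation ∆logic = Tc − 2∆DC − 2tskew. A short check using the slide's numbers (the variable names are just illustrative):

```python
tc_ps = 2000.0     # 500 MHz cycle
t_skew_ps = 200.0  # reported skew budget
d_dc_ps = 50.0     # optimistic setup estimate

# Setup and skew are each paid twice per cycle (once per half-cycle)
overhead_ps = 2 * (d_dc_ps + t_skew_ps)
fraction = overhead_ps / tc_ps
logic_ps = tc_ps - overhead_ps

print(f"overhead = {overhead_ps:.0f} ps ({fraction:.0%} of cycle), logic = {logic_ps:.0f} ps")
# overhead = 500 ps (25% of cycle), logic = 1500 ps
```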


Relaxing the Timing Constraints

Sequencing overhead caused by hard edges (just as for flip-flops)
• Data not launched until latest rising edge of clock
• Data must setup before earliest falling edge of clock
• Overhead would shrink if we could soften an edge
• Let’s look at the role of the latch on the falling edge

Latch function
(1) Prevent glitches on inputs of domino gates
(2) Hold results during precharge

Latches would be unnecessary if we could do these tasks in another way
• Glitches will not occur if inputs come from domino logic
• Can we build domino circuits which consume the results before previous half-cycle precharges?


Outline

o Introduction
o Skew-Tolerant Static Circuits
o Traditional Domino Circuits
o Skew-Tolerant Domino Circuits
o Design Methodology
o Example


Skew-Tolerant Domino Circuits

Overlap clocks so gate Y evaluates before X precharges

• We can remove latches within domino pipeline
• Eliminate all sequencing overhead within the domino pipeline
• Still budget skew and setup at static / domino interface

[Figure: two-phase skew-tolerant domino pipeline; static logic and a latch feed Phase 1 (φ1) and Phase 2 (φ2) chains of dynamic / static gates, including gates X and Y; waveforms of φ1 and φ2 over Tc marking the overlap and ∆DC + tskew]


Skew-Tolerant Domino Waveforms

In general, we can use N phases

• Each phase delayed Tc/N from the previous
• What is the best duty cycle (te and tp)?

[Figure: four-phase skew-tolerant domino pipeline (Phase 1 through Phase 4, clocked by φ1 through φ4); waveforms of φ1 through φ4 over Tc marking te, tp, and toverlap]


Precharge Time (tp)

Precharge time set by precharge of two gates in a phase with skewed clocks

• First gate begins precharging late
• Requires tprech to fully precharge
• Precharge must complete before second gate reenters evaluation

[Figure: two dynamic / static gate pairs in the same phase clocked by skewed clocks φ1a and φ1b; waveforms marking tp, tprech, and tskew]

$$t_p = t_{prech} + t_{skew}$$


Evaluation Time (te)

Evaluation time set by next phase consuming result before first phase precharges

• First phase begins evaluation early
• Must not precharge until some hold time thold after second phase evaluates

[Figure: a Phase 1 gate clocked by φ1b feeding a Phase 2 gate clocked by φ2a; waveforms marking te, Tc / N, thold, and tskew]

$$t_e = \frac{T_c}{N} + t_{hold} + t_{skew}$$


Skew Tolerance

We’ve found the best duty cycle
• Precharge: $t_p = t_{prech} + t_{skew}$
• Evaluation: $t_e = \frac{T_c}{N} + t_{hold} + t_{skew}$

Solve for the maximum tolerable skew (using te + tp = Tc)

$$t_{skew\text{-}max} = \frac{\frac{N-1}{N} T_c - t_{prech} - t_{hold}}{2}$$

Observations
• Skew tolerance increases with N
• Precharge and hold time cut into skew tolerance
• In the limit of large N and long cycles, skew tolerance ~ 1/2 cycle
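The expression for the maximum tolerable skew can be checked by plugging it back into te + tp = Tc. The sketch below does that for a few phase counts; the function name is an assumption, and the 2 ns cycle with 500 ps precharge is borrowed from the example later in the talk.

```python
def max_skew(tc, n, t_prech, t_hold=0.0):
    """Maximum tolerable skew for N-phase skew-tolerant domino (ps)."""
    return ((n - 1) / n * tc - t_prech - t_hold) / 2


tc, t_prech, t_hold = 2000.0, 500.0, 0.0

for n in (2, 4, 8):
    skew = max_skew(tc, n, t_prech, t_hold)
    t_p = t_prech + skew            # precharge time
    t_e = tc / n + t_hold + skew    # evaluation time
    # At the maximum skew the two intervals exactly fill the cycle
    print(n, skew, t_e + t_p)
```

The tolerance grows with N (250 ps, 500 ps, and 625 ps here) while te + tp stays at 2000 ps.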


Local Skew

Local clock domains
• Most circuits experience less skew in a small area than across chip
• Reduce skew within a phase of logic, thereby tolerating more skew globally

$$t_p = t_{prech} + t_{skew}^{local}$$

$$t_e = \frac{T_c}{N} + t_{hold} + t_{skew}^{global}$$

$$t_{skew\text{-}max}^{global} = \frac{N-1}{N} T_c - t_{prech} - t_{hold} - t_{skew}^{local}$$

• This is the nominal overlap between phases


Time Borrowing

Time borrowing
• Excess overlap allows time borrowing into next phase (e.g. gate X)

[Figure: Phase 1 (φ1) and Phase 2 (φ2) domino gates including gate X; waveforms of φ1 and φ2 over Tc marking tborrow, tskew global, and tskew-max global]

$$t_{borrow} = t_{skew\text{-}max}^{global} - t_{skew}^{global} = \frac{N-1}{N} T_c - t_{prech} - t_{hold} - t_{skew}^{local} - t_{skew}^{global}$$


Example and Summary

Again consider a 500 MHz processor:
• Tc = 2 ns, tskew global = 200 ps, tskew local = 50 ps, thold = 0 (conservative)

Domino style: Traditional
  Sequencing overhead: $2\Delta_{DC} + 2t_{skew} + t_{imbalanced}$
  Skew tolerance / time borrowing: n/a, 0

Domino style: Skew-Tolerant
  Sequencing overhead: $0$
  Skew tolerance: $\frac{N-1}{N} T_c - t_{prech} - t_{hold} - t_{skew}^{local}$
  Time borrowing: $\frac{N-1}{N} T_c - t_{prech} - t_{hold} - t_{skew}^{local} - t_{skew}^{global}$

[Figure: Available Time Borrowing vs. # Clock Phases (assuming tprech = 500 ps); x-axis N (# of clock phases) = 2, 3, 4, 6, 8; y-axis tborrow (ps) from 0 to 1000; plotted values 250, 583, 750, 917, 1000]
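The plotted values can be reproduced from the time-borrowing expression with the numbers given on this slide (Tc = 2 ns, tprech = 500 ps, thold = 0, 50 ps local skew, 200 ps global skew). The function name below is illustrative.

```python
def time_borrow(tc, n, t_prech, t_hold, skew_local, skew_global):
    """Available time borrowing for N-phase skew-tolerant domino (ps)."""
    return (n - 1) / n * tc - t_prech - t_hold - skew_local - skew_global


phases = (2, 3, 4, 6, 8)
borrow_ps = [round(time_borrow(2000.0, n, 500.0, 0.0, 50.0, 200.0)) for n in phases]
print(dict(zip(phases, borrow_ps)))
# {2: 250, 3: 583, 4: 750, 6: 917, 8: 1000}
```

Most of the gain comes from moving from two to four phases; beyond that the curve flattens.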


Outline

o Introduction
o Skew-Tolerant Static Circuits
o Traditional Domino Circuits
o Skew-Tolerant Domino Circuits
o Design Methodology
o Results


Clock Generation

Two-Phase Clock Generator

[Figure: two-phase generator: gclk, low-skew complement generator, and clock choppers producing φ1 and φ2]

Four-Phase Clock Generator
• Inverters used to delay clock by quarter of nominal cycle time

[Figure: four-phase generator: gclk, low-skew complement generator, 1/4 cycle delay, and optional clock choppers producing φ1, φ2, φ3, and φ4]


Generator-Induced Clock Skew

Variation in clock generator delays appears as clock skew

Variation in delay 20-30% relative to other clocks
• 4-Phase is still a large net benefit

[Figure: bar chart of overlap (ps, 0 to 750) for 2-Phase and 4-Phase clocking, divided into complement skew, chopper skew, delay skew, and tborrow]


Interface with Static Logic

Real systems mix both static and domino logic
• Transparent latches use φ1 and φ3
• Pulsed latches use φp (see book)
• Four-phase skew-tolerant domino uses φ1, φ2, φ3, and φ4
• Which connections are legal?

Domino outputs must be latched before driving static logic
• Use N-C2MOS Latch for speed and to avoid clock skew problems

[Figure: dynamic gate (from domino logic) followed by an N-C2MOS latch clocked by φ, with input D and output Q (to static logic)]


Timing Types

Timing types define legal interfaces between gates
• Signal suffixes define when data is legal to use, allow static verification
• _s indicates that a signal is stable throughout that phase
• _v indicates that a signal becomes valid before the end of that phase

[Figure: waveforms of φ1 through φ4 with example signals x_s23 (output of a φ1 latch) and y_v2 (output of a φ2 dynamic gate)]


4-Phase Skew-Tolerant Domino Timing Types

Static gate output suffix is same as input suffix

Clocked gates and latches set output suffix

Element Type       Clock  Input                            Output
Domino             φ1     _s1, _v4 (first) / _v1 (others)  _v1
                   φ2     _s2, _v1 (first) / _v2 (others)  _v2
                   φ3     _s3, _v2 (first) / _v3 (others)  _v3
                   φ4     _s4, _v3 (first) / _v4 (others)  _v4
Transparent Latch  φ1     _s1                              _s23
                   φ3     _s3                              _s41
N-C2MOS Latch      φ2     _v2                              _s34
                   φ4     _v4                              _s12
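Because the suffixes encode timing, legality can be checked mechanically. The sketch below is one possible reading of the domino rows of the table, not the author's tool: it assumes a stable input is acceptable whenever its stable phases include the gate's clock phase, that a _v input from the previous phase is accepted only by the first gate of a phase, and that a _v input from the same phase is accepted only by later gates. All names are hypothetical.

```python
def parse_suffix(suffix):
    """Split '_s23' into ('s', {2, 3}) and '_v1' into ('v', {1})."""
    suffix = suffix.lstrip("_")
    kind = suffix[0]
    phases = {int(ch) for ch in suffix[1:]}
    return kind, phases


def domino_input_ok(clock_phase, suffix, first_in_phase):
    """Check one input of a domino gate clocked by phase 1..4 (assumed rules)."""
    kind, phases = parse_suffix(suffix)
    if kind == "s":
        # Assumption: stable inputs must be stable during the gate's own phase
        return clock_phase in phases
    if kind == "v":
        previous = 4 if clock_phase == 1 else clock_phase - 1
        wanted = previous if first_in_phase else clock_phase
        return phases == {wanted}
    return False


# Output suffix set by each clocked element, copied from the table
OUTPUT_SUFFIX = {
    ("domino", 1): "_v1", ("domino", 2): "_v2",
    ("domino", 3): "_v3", ("domino", 4): "_v4",
    ("transparent_latch", 1): "_s23", ("transparent_latch", 3): "_s41",
    ("nc2mos_latch", 2): "_s34", ("nc2mos_latch", 4): "_s12",
}

latch_out = OUTPUT_SUFFIX[("transparent_latch", 1)]
print(domino_input_ok(2, latch_out, first_in_phase=True))   # True
print(domino_input_ok(4, latch_out, first_in_phase=True))   # False
```

Static gates would simply pass their input suffix through, so only clocked elements need an entry in the lookup.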


Example of Legal Connections

[Figure: four-phase pipeline of dynamic / static gate pairs clocked φ1, φ2, φ2, φ3, φ3, φ4, φ4, φ1, with transparent latches (φ1, φ3) and N-C2MOS latches (φ2, φ4) at the static interfaces; domino signals are labeled v1 through v4 and static signals s12, s23, s3, s34, s41; waveforms of φ1 through φ4]


Example of Illegal Connections

[Figure: the four-phase pipeline of dynamic / static gate pairs clocked φ1, φ2, φ2, φ3, φ3, φ4, φ4, φ1 with latches on φ1 and φ3, drawn with illegal connections; domino signals are labeled v1 through v4 and static signals s23, s3, s41; waveforms of φ1 through φ4]


Testability

To test chips, we would like to be able to stop the clock and observe and control one element in each path through each cycle of logic.
• Stop with gclk low (φ1 and φ2 low, φ3 and φ4 high)
• Add full keeper to first φ3 gate of each cycle to avoid floating high or low
• Different from half keeper of ordinary dynamic gates which only float high

[Figure: φ3 dynamic gate with a weak full keeper; input x_v2 comes from φ2 domino and the output is y_v3; waveforms of φ2, φ3, x_v2, and y_v3 showing where the output would float]


Transparent Latch Scan

Add slave latch and access transistor to scan φ1 static latches

Scan Procedure:
1. Stop gclk low
2. Toggle SCA and SCB to march data through the scan chain
3. Restart gclk

How do we scan cycles of domino logic with no transparent latches?

[Figure: φ1 transparent latch (D, Q) with a slave latch and scan signals SDI, SDO, SCA, and SCB]


Domino Scan

Scanning domino logic is very similar. Which gate should be scanned?

• Next gate should be precharging so glitches do not propagate
• When normal operation resumes, scan data should remain until consumed
• Thus, choose last φ4 domino gate of each cycle for scan

Scan Procedure
1. Stop gclk low
2. Stop φ4s low
3. Toggle SCA and SCB to march data through the scan chain
4. Restart gclk
5. Release φ4s once scannable gate begins precharge (race)

[Figure: scannable φ4 domino gate with pulldown network, clocks φ4 and φ4s, a slave latch, and scan signals SDI, SDO, SCA, and SCB]


Improved Clock Generator

Putting together skew-tolerant domino, stop clock, and scan:

• SR latch solves the race reenabling φ4s

1. Stop gclk low
2. Toggle SCA and SCB to march data through the scan chain. The first pulse of SCA will force φ4s low.
3. Restart gclk. The falling edge of φ4 will release φ4s to track φ4.

[Figure: clock generator with inputs gclk and clken producing φ1, φ2, φ3, φ4, and φ4s; an SR latch driven by SCA and φ4 controls φ4s]


Delayed Clocking / Delayed Reset Domino

Alternative approach uses a single clock phase per gate:
• Locally generate clocks with matched delay buffers

[Figure: chain of dynamic / static gates, each dynamic gate clocked by its own phase φ1 through φ6 derived from gclk; waveforms of φ1 through φ6]


Outline

o Introduction
o Skew-Tolerant Static Circuits
o Traditional Domino Circuits
o Skew-Tolerant Domino Circuits
o Design Methodology
o Example


Example

64-bit superscalar ALU self-bypass path

[Figure: Traditional Domino implementation of the path: bypass mux, 64-bit adder, and result mux built from static and dynamic gates with latches, clocked by clk and clk_b; wires of 1 mm, 2 mm, and 1 mm; loads labeled "Other ALU Blocks (150 fF)" and "To Data Cache"]

[Figure: Skew-Tolerant Domino implementation of the same path without latches, clocked by φ1, φ2, φ3, φ3, φ4]


Simulation Results

Adders were simulated in 0.6µ HP14 technology

• Skew-tolerant domino 25%+ faster

[Figure: bar chart of time (FO4 inverter delays) for Traditional Latency, Traditional Cycle Time, Skew-tolerant Latency, and Skew-tolerant Cycle Time, with bars for no skew and for 1 FO4 local skew; extracted bar values: 13.0, 16.6, 11.9, 11.9, 15.0, 18.6]


Skew-Tolerant Domino Conclusions

Domino is very attractive for high speed ICs

Traditional domino limited by sequencing overhead
• Latch delay
• Clock skew
• Inability to balance logic through time borrowing

Skew-tolerant domino eliminates overhead
• Use overlapping clocks and remove latches
• Overhead at static / domino interface motivates building critical loops entirely from domino

Domino design issues
• Four-phase skew-tolerant domino is a reasonable choice
• Timing types specify legal connections among static and dynamic gates
• Latchless pipelines can still be fully scanned for testability
• Simple clock generators

Now in widespread use among top design teams
• DEC Alpha ALUs, Intel “OTB Domino,” IBM “Delayed-Reset Domino,” Sun “Delayed Clocking,” etc.