skew-tolerant circuit design slides - Harvey Mudd...
Transcript of skew-tolerant circuit design slides - Harvey Mudd...
Skew-Tolerant Circuit Design
David [email protected]
December, 2000
Harvey Mudd College
Claremont, CA
Skew-Tolerant Circuit Design David Harris Page 2 of 50
Outline
o Introductiono Skew-Tolerant Static Circuitso Traditional Domino Circuitso Skew-Tolerant Domino Circuitso Design Methodologyo Example
Talk based on:• Skew-Tolerant Circuit Design, Morgan Kaufmann Publishers, 2001.
Skew-Tolerant Circuit Design David Harris Page 3 of 50
Sequencing Overhead in Flip-Flop Systems
Ideally, each full clock cycle Tc is available for useful computation
But we must keep token in current pipestage from catching up with token in next• Purpose of sequencing elements such as flip-flops is not to “remember”
data but merely to retard fast paths so they don’t corrupt tokens further along in the pipeline
• This inherently retards the slower paths as well• Introduces “sequencing overhead” inherent to keeping tokens sequenced
For flip-flops: Flop
clk
Static
Static
Static
Static
Static
Static
Flopclk
clk
Cycle 1
∆logic
Tc
∆CQ∆DC
∆logic Tc ∆CQ– ∆DC–=
Skew-Tolerant Circuit Design David Harris Page 4 of 50
Clock Skew & Time Borrowing
Clock skew increases sequencing overhead
• Data may launch on the latest skewed clock edge• Must setup before the earliest skewed clock edge• Clock skew directly cuts into time available for computation
Any unused time at end of cycle is wasted• No time borrowing possible to balance logic between stages
Flopclk
Static
Static
Static
Static
Static
Flopclk
clk
Cycle 1
∆logic
Tc
∆CQ∆DC
tskew
∆logic Tc ∆CQ– ∆DC– tskew–=
Skew-Tolerant Circuit Design David Harris Page 5 of 50
Cycle Time Trends
Much of CPU performance improvement comes from higher frequencies• Clock rates increasing faster than simple process improvement allows• Fixed sequencing overhead becomes greater portion of cycle time
0.01
0.1
1
10
100
8038680486PentiumPentium II / III
Spe
cInt
95
1985 1988 1991 1994 1997 2000
1.2 0.8 0.6 0.35 2.0
Process
100
200
500VDD = 5 VDD = 3.3
VDD = 2.5
50Fano
ut-o
f-4 (F
O4)
Inve
rter D
elay
(ps)
0.25
10
100
1000
8038680486PentiumPentium II / III
MH
z
1988 1991 1994 1997 20001985
10
100
1985 1988 1991 1994 1997
8038680486PentiumPentium II / IIIFO
4 in
verte
r del
ays
/ cyc
le
50
20
2000µ µ µ µ µ µ
Skew-Tolerant Circuit Design David Harris Page 6 of 50
SIA Frequency Roadmap
Semiconductor Industry Association has predicted clock frequencies• Estimate cycle time in FO4 inverter delays• Cycles are getting very short• Minimizing sequencing overhead will be ever more important
Process (mm) Year Frequency (MHz) Cycle time (FO4)0.25 1997 750 (600) 13.3 (16.7)0.18 1999 1250 (733) 11.1 (18.9)0.15 2001 1500 (> 1500) 11.1 (<11.1)0.13 2003 2100 9.20.10 2006 3500 7.10.07 2009 6000 6.00.05 2012 10000 5.0
Skew-Tolerant Circuit Design David Harris Page 7 of 50
Outline
o Introductiono Skew-Tolerant Static Circuitso Traditional Domino Circuitso Skew-Tolerant Domino Circuitso Design Methodologyo Example
Skew-Tolerant Circuit Design David Harris Page 8 of 50
Transparent Latches
Transparent latch-based systems use a latch in each half-cycle• Latch is transparent when controlling clock is high, opaque when low• Two latches required so tokens does not skip ahead to next pipeline stage
clk
Static
Static
Static
Static
Static
Static
clk_b
clk
Half-Cycle 1
Tc
Latch
Latch
Half-Cycle 2
clk_b
∆DQ ∆DQ
Transparent
Transparent
Opaque
Opaque
∆logic Tc 2∆DQ–=
Skew-Tolerant Circuit Design David Harris Page 9 of 50
Transparent Latches with Skew
Flip-flops are sensitive to skew because of the hard edges• Data not launched until latest rising edge of clock• Data must setup before earliest falling edge of clock• Overhead would shrink if we could soften an edge
Latch-based systems may still use entire cycle even if there is skew
clk
Static
Static
Static
Static
Static
Static
clk_b
clk
Half-Cycle 1
Tc
Latch
Latch
Half-Cycle 2
clk_b
∆logic Tc 2∆DQ–=
Skew-Tolerant Circuit Design David Harris Page 10 of 50
Pulsed Latches
Can we keep tokens in sequence with only one latch per cycle?• Pulse the latch transparent briefly compared to delay through pipestage
Static
Static
Static
Static
Static
Tc
Latch
∆DQ
Static
Static
Cycle 1
φp
∆logic
tpw ∆DC
∆DC wasted time
Case 1: pulse wider than setup time + clock skewdata arrives while latch is transparent, no time wasted
Case 2: pulse narrower than setup time + clock skewdata may arrive while latch is opaque, leaving wasted time
tpw
tskew
φpφp
φp
tpw
tskew
∆logicTc ∆DQ – if tpw ∆DC tskew+≥Tc tpw ∆DQ– ∆DC– tskew–+ otherwise
=
Skew-Tolerant Circuit Design David Harris Page 11 of 50
Min-Delay
Min-Delay (a.k.a hold time or race problem) is especially insidious• Setup time problems can be fixed by slowing clock• Hold time problems require redesign of chip
Consider min-delay constraint for flip-flops with little intervening logic
Flop 1clk
Flop 2
clk
δlogicδCQ
∆CD
tskew
δlogic
δCQ δlogic+ tskew ∆CD+≥
input arrives at Flop 2
safe for Flop 2 input to change
clk
δlogic ∆CD tskew δCQ–+≥
Skew-Tolerant Circuit Design David Harris Page 12 of 50
Min-Delay (Transparent Latches)
Latches may use nonoverlapping clocks to reduce min-delay problems• But min-delay constraints exist between each latch (twice per cycle)
(in each half-cycle paths)
clk
clk_bδlogicδCQ∆CDtskew
δlogic
Latch 1Latch 2
clk
clk_b
tnonoverlap
input arrives at Latch 2
safe for Latch 2 input to changeδCQ δlogic+ tskew ∆CD tnonoverlap–+≥
δlogic ∆CD tskew δCQ– tnonoverlap–+≥
Skew-Tolerant Circuit Design David Harris Page 13 of 50
Min-Delay (Pulsed Latches)
Pulsed latches with wide pulses are especially susceptible to min-delay• Data launches off earliest rising edge of pulse• Must hold until after latest falling edge of pulse
δlogicδCQ
∆CDtskew
δlogic
input arrives at Latch 2
safe for Latch 2 input to change
Latch 1
φp
Latch 2
φp
φp
tpw
δlogic tpw ∆+ CD tskew δCQ–+≥
Skew-Tolerant Circuit Design David Harris Page 14 of 50
Time Borrowing
Logic between latches may occupy more than exactly 1/2 cycle• Borrow time so long as setup time at next latch is met• Soft edges provide flexibility to partition and balance logic (unlike flops)
clk
clk_b
clk
Half-Cycle 1
Tc
Latch
Latch
Half-Cycle 2
clk_b
CombinationalLogic
tskew
∆DC
Tc/2
tborrow
tborrowTc2------ ∆DC– tskew–=
Skew-Tolerant Circuit Design David Harris Page 15 of 50
Time Borrowing with Pulses
What if the clock duty cycle is less than 1/2 cycle (pulses or nonoverlap)?• Still meet setup time before end of transparency pulse
clk
clk_b
clk
Half-Cycle 1
Tc
Latch
Latch
Half-Cycle 2
clk_b
CombinationalLogic
tskew
∆DC
Tc/2
tborrow tnonoverlap
tpw
tborrow
tpw ∆DC– tskew– pulsed latches
Tc2------ tnonoverlap– ∆DC tskew–– transparent latches
=
Skew-Tolerant Circuit Design David Harris Page 16 of 50
Skew-Tolerant Static Circuits Summary
Flip-Flops:• Popular because CAD tools and junior engineers understand them• Worst sequencing overhead and no time borrowing
Transparent Latches:• Better sequencing overhead• Excellent time borrowing• Understood by most but not all engineers and CAD tools
Pulsed Latches:• Lowest sequencing overhead• Can be treated like flip-flops• Requires thorough min-delay analysis• Minimal time borrowing
Element Sequencing overhead Time borrowing Min-delay δlogic
Flip-flop
Transparent latch
(in each half-cycle)Pulsed latch
∆CQ ∆DC tskew+ + 0 ∆CD tskew δCQ–+
2∆DQ Tc2------ ∆DC tskew tnonoverlap–––
∆CD tskew δCQ– tnonoverlap–+
∆DQ max 0 ∆DC tskew tpw–+,( )+ tpw ∆– DC tskew– tpw ∆+ CD tskew δCQ–+
Skew-Tolerant Circuit Design David Harris Page 17 of 50
Outline
o Introductiono Skew-Tolerant Static Circuitso Traditional Domino Circuitso Skew-Tolerant Domino Circuitso Design Methodologyo Example
Skew-Tolerant Circuit Design David Harris Page 18 of 50
Dynamic Circuit Review
Static CMOS circuits are slow because fat PMOS devices load inputDynamic gates use precharged circuitry to remove PMOS from input
• Precharge: φ = 0 output set high through clocked PMOS transistor• Evaluate: φ = 1 output possibly pulls low through logic transistors
A B C D
Dynamic NOR4
A B C D
Static NOR4
Y
X
Y
Evaluate PrechargePrecharge
φ
φ
φ
Skew-Tolerant Circuit Design David Harris Page 19 of 50
Domino Circuits
Dynamic inputs must be monotonically rising during evaluation• Place inverting static gate between each dynamic gate• Dynamic / static pair is called a domino gate
Domino gates can be safely cascaded• Inputs from static CMOS logic must still be latched to avoid glitches
Domino evaluation delays 1.5 - 2x faster than static CMOS• Challenge is how to prevent precharge time from impacting critical path• Traditional domino clocking approaches introduce severe overhead• Skew-tolerant domino circuits can hide this overhead
φ
φ
Dynamic Static
Domino Gate
Skew-Tolerant Circuit Design David Harris Page 20 of 50
Traditional Domino Circuits
How do we prevent precharge time from appearing in the critical path?Hide precharge time by ping-ponging between half-cycles
• One evaluates while other precharges• Latches hold results during precharge
∆DQ
clk
clk_b
clk
Half-Cycle 1
Tc
Half-Cycle 2
clk_b
Latch
Dynam
icS
taticD
ynamic
Static
Dynam
ic
Latch
Dynam
icS
taticD
ynamic
Static
Dynam
ic
clk
clkclk
clk_b
clk_bclk_b
Evaluate
Precharge Evaluate
Precharge
∆logic Tc 2∆DQ–=
Skew-Tolerant Circuit Design David Harris Page 21 of 50
Clock Skew
Clock skew increases sequencing overhead
• Evaluation begins at latest rising edge• Latch input setup before earliest falling edge• Budget clock skew twice each cycle
∆DC tskew+
Latch
Dynam
icS
taticD
ynamic
Eval begins
Setup before here
clk1
clk1
Half-Cycle 1
Tc
clk1
clk2
clk2
∆logic Tc 2∆DC– 2tskew–=
Skew-Tolerant Circuit Design David Harris Page 22 of 50
Balancing Logic
Logic may not exactly fit half-cycle
• No flexibility to borrow time to balance logic between half-cyclesLatch
Dynam
icS
taticD
ynamic
Latch
Dynam
icS
taticD
ynamic
clk
clk_b
clk
Half-Cycle 1
Tc
Half-Cycle 2
clk_b
clk
clk
clk_b
clk_b
Unusable time
∆logic Tc 2∆DC– 2tskew– timbalanced–=
Skew-Tolerant Circuit Design David Harris Page 23 of 50
How bad is it?
Consider 500 MHz system similar to Alpha 21164• Tc = 2ns• tskew = 200 ps (reported budget)
• = 50 ps (optimistic estimate)
25% of cycle consumed by skew & setup!
∆DC
2 ns
Half-Cycle 1 Half-Cycle 2
Setup Skew Setup SkewLogic Logic
Skew-Tolerant Circuit Design David Harris Page 24 of 50
Relaxing the Timing Constraints
Sequencing overhead caused by hard edges (just as for flip-flops)• Data not launched until latest rising edge of clock• Data must setup before earliest falling edge of clock• Overhead would shrink if we could soften an edge• Let’s look at the role of the latch on the falling edge
Latch function(1) Prevent glitches on inputs of domino gates(2) Hold results during precharge
Latches would be unnecessary if we could do these tasks in another way• Glitches will not occur if inputs come from domino logic• Can we build domino circuits which consume the results before previous
half-cycle precharges?
Skew-Tolerant Circuit Design David Harris Page 25 of 50
Outline
o Introductiono Skew-Tolerant Static Circuitso Traditional Domino Circuitso Skew-Tolerant Domino Circuitso Design Methodologyo Example
Skew-Tolerant Circuit Design David Harris Page 26 of 50
Skew-Tolerant Domino Circuits
Overlap clocks so gate Y evaluates before X precharges
• We can remove latches within domino pipeline• Eliminate all sequencing overhead within the domino pipeline• Still budget skew and setup at static / domino interface
∆DC tskew+
φ1 φ2
φ1
Phase 1
Tc
Phase 2
φ2
Dynam
icS
taticD
ynamic
Dynam
icS
taticD
ynamic
Static
Dynam
icS
tatic
Static
Dynam
icS
taticφ1 φ1 φ2 φ2
overlap
X Y
LatchS
taticS
tatic
Skew-Tolerant Circuit Design David Harris Page 27 of 50
Skew-Tolerant Domino Waveforms
In general, we can use N phases
• Each phase delayed Tc/N from the previous• What is the best duty cycle (te and tp)?
φ1 φ3
φ1
Phase 1 Phase 2
φ3
Dynam
icS
taticD
ynamic
Dynam
icS
taticD
ynamic
Static
Dynam
icS
tatic
Static
Dynam
icS
tatic
φ1 φ2 φ3 φ4
toverlap
φ2
φ4
te tp
Phase 3 Phase 4
Tc
Skew-Tolerant Circuit Design David Harris Page 28 of 50
Precharge Time (tp)
Precharge time set by precharge of two gates in a phase with skewed clocks
• First gate begins precharging late• Requires tprech to fully precharge• Precharge must complete before second gate reenters evaluation
φ1a
φ1a
Dynam
icS
tatic
tp
φ1b
Dynam
icS
taticφ1b
tprech tskew
tp tprech tskew+=
Skew-Tolerant Circuit Design David Harris Page 29 of 50
Evaluation Time (te)
Evaluation time set by next phase consuming result before first phase precharges
• First phase begins evaluation early• Must not precharge until some hold time thold after second phase evaluates
φ2a
φ1b
Dynam
icS
tatic
thold
Dynam
icS
tatic
φ1b
φ2a
Phase 1 Phase 2
te
tskewTc / N
teTcN------ thold+ tskew+=
Skew-Tolerant Circuit Design David Harris Page 30 of 50
Skew Tolerance
We’ve found the best duty cycle• Precharge:
• Evaluation:
Solve for the maximum tolerable skew
Observations• Skew tolerance increases with N• Precharge and hold time cut into skew tolerance• In the limit of large N and long cycles, skew tolerance ~ 1/2 cycle
tp tprech tskew+=
teTcN------ thold+ tskew+=
tskew-max
N 1–N
-------------Tc tprech– thold–
2---------------------------------------------------------=
Skew-Tolerant Circuit Design David Harris Page 31 of 50
Local Skew
Local clock domains• Most circuits experience less skew in a small area than across chip• Reduce skew within a phase of logic, thereby tolerating more skew globally
•
•
•
• This is the nominal overlap between phases
tp tprech tskewlocal+=
teTcN------ thold+ tskew
global+=
tskew-maxglobal N 1–
N-------------Tc tprech– thold– tskew
local–=
Skew-Tolerant Circuit Design David Harris Page 32 of 50
Time Borrowing
Time borrowing• Excess overlap allows time borrowing into next phase (e.g. gate X)
•
φ1
φ1
Phase 1 Phase 2
Dynam
icS
taticD
ynamic
Static
Dynam
icS
tatic
φ1 φ1
φ2
Tc
tborrow
Dynam
icS
tatic
φ2
tskew-maxglobal
tskewglobal
X
tborrow tskew-maxglobal tskew
global– N 1–N
-------------Tc tprech– thold– tskewlocal– tskew
global–= =
Skew-Tolerant Circuit Design David Harris Page 33 of 50
Example and Summary
Again consider a 500 MHz processor:• Tc = 2ns tskewG = 200 ps / tskewL = 50 ps thold = 0 (conservative)
Domino Style Sequencing overhead Skew Tolerance Time borrowingTraditional n/a
Skew-Tolerant
2 3 4 6 8
N (# of clock phases)
t bor
row
(ps)
0
250
500
750
1000
Available Time Borrowing vs. # Clock Phases(assuming tprech = 500 ps)
250583
750 9171000
2∆DC 2tskew– timbalanced– 0
0 N 1–N
-------------Tc tprech– thold– tskewlocal– N 1–
N-------------Tc tprech– thold– tskew
local– tskewglobal–
Skew-Tolerant Circuit Design David Harris Page 34 of 50
Outline
o Introductiono Skew-Tolerant Static Circuitso Traditional Domino Circuitso Skew-Tolerant Domino Circuitso Design Methodologyo Results
Skew-Tolerant Circuit Design David Harris Page 35 of 50
Clock Generation
Two-Phase Clock Generator
Four-Phase Clock Generator
• Inverters used to delay clock by quarter of nominal cycle time
gclk
φ1
φ2
Low-skewcomplementgenerator
Clock choppers
gclk
φ1
φ2
Low-skewcomplementgenerator
φ3
φ4
Optionalclock choppers1/4 cycle delay
Skew-Tolerant Circuit Design David Harris Page 36 of 50
Generator-Induced Clock Skew
Variation in clock generator delays appears as clock skew
Variation in delay 20-30% relative to other clocks• 4-Phase is still a large net benefit
2-Phase 4-Phase
Ove
rlap
(ps)
250
500
750
Chopper Skew
Delay Skew
tborrow
0
Complement Skew
Skew-Tolerant Circuit Design David Harris Page 37 of 50
Interface with Static Logic
Real systems mix both static and domino logic• Transparent latches use φ1 and φ3• Pulsed latches use φp (see book)• Four-phase skew-tolerant domino uses φ1, φ2, φ3, and φ4• Which connections are legal?
Domino outputs must be latched before driving static logic• Use N-C2MOS Latch for speed and to avoid clock skew problems
Q
D
φ
Dynamic gate N-C2MOS latch
From domino logic
To static logic
Skew-Tolerant Circuit Design David Harris Page 38 of 50
Timing Types
Timing types define legal interfaces between gates• Signal suffixes define when data is legal to use, allow static verification• _s indicates that a signal is stable throughout that phase• _v indicates that a signal becomes valid before the end of that phase
φ1
φ2
φ3
x_s23
φ4
y_v2
φ2
Dynam
ic
Latch
φ1
y_v2x_s23
Skew-Tolerant Circuit Design David Harris Page 39 of 50
4-Phase Skew-Tolerant Domino Timing Types
Static gate output suffix is same as input suffix
Clocked gates and latches set output suffix
Element Type Clock Input Output
Domino φ1 _s1, _v4 (first) / v1 (others) _v1
φ2 _s2, _v1 (first) / _v2 (others) _v2
φ3 _s3, _v2 (first) / _v3 (others) _v3
φ4 _s4, _v3 (first) / _v4 (others) _v4
Transparent Latch φ1 _s1 _s23
φ3 _s3 _s41
N-C2MOS Latch φ2 _v2 _s34
φ4 _v4 _s12
Skew-Tolerant Circuit Design David Harris Page 40 of 50
Example of Legal Connections
φ1
Dynam
ic
Static
Latch
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
NC
2MO
S
Latch
NC
2MO
S
Dynam
ic
Static
Static
Static
Static
Latch
Static
Static
Static
Static
φ1 φ2 φ2 φ3 φ3 φ4 φ4 φ1
φ4φ2
φ1 φ3 φ1
s23
v1
s23
s34
s23 s23 s3 s41 s41 s41 s41 s41 s23
s12
v1 v1 v1 v2 v2 v2 v2 v3 v3 v3 v3 v4 v4 v4 v4 v1
φ1
φ2
φ3
φ4
Skew-Tolerant Circuit Design David Harris Page 41 of 50
Example of Illegal Connections
φ1
Dynam
ic
Static
Latch
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Dynam
ic
Static
Latch
Dynam
ic
Static
Static
Static
Static
Latch
Static
Static
Static
Static
φ1 φ2 φ2 φ3 φ3 φ4 φ4 φ1
φ1 φ3 φ1
s23
v1
s23 s23 s23 s3 s41 s41 s41 s41 s41 s23
v1 v1 v1 v2 v2 v2 v2 v3 v3 v3 v3 v4 v4 v4 v4 v1
φ1
φ2
φ3
φ4
Skew-Tolerant Circuit Design David Harris Page 42 of 50
Testability
To test chips, we would like to be able to stop the clock and observe and control one element in each path through each cycle of logic.• Stop with gclk low (φ1 and φ2 low, φ3 and φ4 high)• Add full keeper to first φ3 gate of each cycle to avoid floating high or low• Different from half keeper of ordinary dynamic gates which only float high
φ3
weak
from φ2 dominox_v2
y_v3
x_v2
y_v3
φ2
φ3
output would float
Skew-Tolerant Circuit Design David Harris Page 43 of 50
Transparent Latch Scan
Add slave latch and access transistor to scan φ1 static latches
Scan Procedure:1. Stop gclk low2. Toggle SCA and SCB to march data through the scan chain3. Restart gclk
How do we scan cycles of domino logic with no transparent latches?
φ1
SDI
SDOSCA
SCB
D
Qφ1 slave latch
Skew-Tolerant Circuit Design David Harris Page 44 of 50
Domino Scan
Scanning domino logic is very similar. Which gate should be scanned?
• Next gate should be precharging so glitches do not propagate• When normal operation resumes, scan data should remain until consumed• Thus, choose last φ4 domino gate of each cycle for scan
Scan Procedure1. Stop gclk low2. Stop φ4s low3. Toggle SCA and SCB to march data through the scan chain4. Restart gclk5. Release φ4s once scannable gate begins precharge (race)
SDOSCA
slave latchSDI
SCBφ4
φ4s
pulldownnetwork
Skew-Tolerant Circuit Design David Harris Page 45 of 50
Improved Clock Generator
Putting together skew-tolerant domino, stop clock, and scan:
• SR latch solves the race reenabling φ4s1. Stop gclk low2. Toggle SCA and SCB to march data through the scan chain. The
first pulse of SCA will force φ4s low.3. Restart gclk. The falling edge of φ4 will release φ4s to track φ4.
gclkclken
φ1
φ2
φ3
φ4
φ4s
RS
SCAφ4
Skew-Tolerant Circuit Design David Harris Page 46 of 50
Delayed Clocking / Delayed Reset Domino
Alternative approach uses a single clock phase per gate:• Locally generate clocks with matched delay buffers
Dynam
icS
taticD
ynamic
Dynam
icS
taticD
ynamic
Static
Dynam
icS
tatic
Static
Dynam
icS
tatic
gclk
φ1 φ2 φ3 φ4 φ5 φ6
φ1
φ2
φ3
φ4
φ5
φ6
Dynam
icS
tatic
φ1
Skew-Tolerant Circuit Design David Harris Page 47 of 50
Outline
o Introductiono Skew-Tolerant Static Circuitso Traditional Domino Circuitso Skew-Tolerant Domino Circuitso Design Methodologyo Example
Skew-Tolerant Circuit Design David Harris Page 48 of 50
Example
64-bit superscalar ALU self-bypass path
To Data Cache
clk
x4
Static
Static
Dynam
ic
Dynam
ic
Dynam
ic
Latch
Latch
Result MuxBypass Mux
1 mm 2 mmOther
ALU Blocks (150 fF)64-bit Adder
x2
1 mm
clk
clk
clk
clk_b
clk_b
clk_bTraditional Domino
To Data Cache
x4
Static
Static
Dynam
ic
Dynam
ic
Dynam
icResult Mux
Bypass Mux
1 mm 2 mmOther
ALU Blocks (150 fF)64-bit Adder
x2
1 mm
φ1 φ2 φ3 φ3 φ4
Skew-Tolerant Domino
Skew-Tolerant Circuit Design David Harris Page 49 of 50
Simulation Results
Adders were simulated in 0.6µ HP14 technology
• Skew-tolerant domino 25%+ faster
TraditionalLatency
TraditionalCycle Time
Skew-tolerantLatency
Skew-tolerantCycle Time
Tim
e (F
O4
inve
rter d
elay
s)
13.016.6
11.9 11.9
15.0
18.6 No Skew1 FO4 local skew
Skew-Tolerant Circuit Design David Harris Page 50 of 50
Skew-Tolerant Domino Conclusions
Domino is very attractive for high speed ICsTraditional domino limited by sequencing overhead
• Latch delay• Clock skew• Inability to balance logic through time borrowing
Skew-tolerant domino eliminates overhead• Use overlapping clocks and remove latches• Overhead at static / domino interface motivates building critical loops
entirely from dominoDomino design issues
• Four-phase skew-tolerant domino is a reasonable choice• Timing types specify legal connections among static and dynamic gates• Latchless pipelines can still be fully scanned for testability• Simple clock generators
Now in widespread use among top design teams• DEC Alpha ALUs, Intel “OTB Domino,” IBM “Delayed-Reset Domino,” Sun
“Delayed Clocking,” etc.