24. Skew-Tolerant Design - University of Texas at...
Transcript of 24. Skew-Tolerant Design - University of Texas at...
![Page 1: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/1.jpg)
D. Z. Pan 24. Skew-Tolerant Design1
24. Skew-Tolerant Design• Previous Unit:
– Power• Dynamic• Static
– Design for low power• This Unit:
– Clock Distribution– Clock Skew– Skew-Tolerant Static Circuits– Skew-Tolerant Domino Circuits
![Page 2: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/2.jpg)
D. Z. Pan 24. Skew-Tolerant Design2
Clocking• Synchronous systems use a clock to keep
operations in sequence– Distinguish this from previous or next– Determine speed at which machine operates
• Clock must be distributed to all the sequencing elements– Flip-flops and latches
• Also distribute clock to other elements – Domino circuits and memories
![Page 3: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/3.jpg)
D. Z. Pan 24. Skew-Tolerant Design3
Clock Distribution• On a small chip, the clock distribution
network is just a wire– And possibly an inverter for clkb
• On practical chips, the RC delay of the wire resistance and gate load is very long– Variations in this delay cause clock to get to
different elements at different times– This is called clock skew
• Most chips use repeaters to buffer the clock and equalize the delay– Reduces but doesn’t eliminate skew
![Page 4: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/4.jpg)
D. Z. Pan 24. Skew-Tolerant Design4
Example• Skew comes from differences in gate and
wire delay– With right buffer sizing, clk1 and clk2 could
ideally arrive at the same time.– But power supply noise changes buffer delays– clk2 and clk3 will always see RC skew
3 mm
1.3 pF
3.1 mmgclk
clk1
0.5 mm
clk2clk3
0.4 pF 0.4 pF
![Page 5: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/5.jpg)
D. Z. Pan 24. Skew-Tolerant Design5
Review: Skew Impact
F1 F2
clk
clk clk
Combinational Logic
Tc
Q1 D2
Q1
D2
tskew
CL
Q1
D2
F1
clk
Q1
F2
clk
D2
clk
tskew
tsetup
tpcq
tpdq
tcd
thold
tccq
( )setup skew
sequencing overhead
hold skew
pd c pcq
cd ccq
t T t t t
t t t t
≤ − + +
≥ − +
144424443
• Ideally full cycle isavailable for work
• Skew adds sequencingoverhead
• Increases hold time too
![Page 6: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/6.jpg)
D. Z. Pan 24. Skew-Tolerant Design6
Cycle Time Trends• Much of CPU performance comes from
higher f– f improving faster than simple process shrinks– Sequencing overhead is bigger part of cycle
0.01
0.1
1
10
100
8038680486PentiumPentium II / III
Spec
Int9
5
1985 1988 1991 1994 1997 2000
1.2 0.8 0.6 0.35 2.0
Process
100
200
500VDD = 5 VDD = 3.3
VDD = 2.5
50Fano
ut-o
f-4 (F
O4)
Inve
rter D
elay
(ps)
0.25
10
100
1000
8038680486PentiumPentium II / III
MH
z
1988 1991 1994 1997 20001985
10
100
1985 1988 1991 1994 1997
8038680486PentiumPentium II / IIIFO
4 in
verte
r del
ays
/ cyc
le
50
20
2000
![Page 7: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/7.jpg)
D. Z. Pan 24. Skew-Tolerant Design7
Solutions• Reduce clock skew
– Careful clock distribution network design– Plenty of metal wiring resources
• Analyze clock skew– Only budget actual, not worst case skews– Local vs. global skew budgets
• Tolerate clock skew– Choose circuit structures insensitive to skew
![Page 8: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/8.jpg)
D. Z. Pan 24. Skew-Tolerant Design8
Clock Distribution Networks• Ad hoc• Grids• H-tree• Hybrid
![Page 9: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/9.jpg)
D. Z. Pan 24. Skew-Tolerant Design9
Clock Grids• Use grid on two or more levels to carry
clock• Make wires wide to reduce RC delay• Ensures low skew between nearby points• But possibly large skew across die
![Page 10: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/10.jpg)
D. Z. Pan 24. Skew-Tolerant Design10
Alpha Clock Grids
PLL
gclk grid
Alpha 21064 Alpha 21164 Alpha 21264
gclk grid
Alpha 21064 Alpha 21164 Alpha 21264
![Page 11: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/11.jpg)
D. Z. Pan 24. Skew-Tolerant Design11
H-Trees• Fractal structure
– Gets clock arbitrarily close to any point– Matched delay along all paths
• Delay variations cause skew• A and B might see big skew A B
![Page 12: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/12.jpg)
D. Z. Pan 24. Skew-Tolerant Design12
Itanium 2 H-Tree• Four levels of buffering:
– Primary driver– Repeater– Second-level
clock buffer– Gater
• Route aroundobstructions Primary Buffer
Repeaters
Typical SLCBLocations
![Page 13: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/13.jpg)
D. Z. Pan 24. Skew-Tolerant Design13
Hybrid Networks• Use H-tree to distribute clock to many
points• Tie these points together with a grid
• Ex: IBM Power4, PowerPC– H-tree drives 16-64 sector buffers– Buffers drive total of 1024 points– All points shorted together with grid
![Page 14: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/14.jpg)
D. Z. Pan 24. Skew-Tolerant Design14
Skew Tolerance• Flip-flops are sensitive to skew because
of hard edges– Data launches at latest rising edge of clock– Must setup before earliest next rising edge
of clock– Overhead would shrink if we can soften
edge• Latches tolerate moderate amounts of
skew– Data can arrive anytime latch is transparent
![Page 15: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/15.jpg)
D. Z. Pan 24. Skew-Tolerant Design15
Skew: Latches
Q1
L1
φ1
φ2
L2 L3
φ1 φ1φ2
CombinationalLogic 1
CombinationalLogic 2
Q2 Q3D1 D2 D3( )
( )
sequencing overhead
1 2 hold nonoverlap skew
borrow setup nonoverlap skew
2
,
2
pd c pdq
cd cd ccq
c
t T t
t t t t t t
Tt t t t
≤ −
≥ − − +
≤ − + +
123
2-Phase Latches
( )
( )
setup skew
sequencing overhead
hold skew
borrow setup skew
max ,pd c pdq pcq pw
cd pw ccq
pw
t T t t t t t
t t t t t
t t t t
≤ − + − +
≥ + − +
≤ − +
1444442444443
Pulsed Latches
![Page 16: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/16.jpg)
D. Z. Pan 24. Skew-Tolerant Design16
Dynamic Circuit Review• Static circuits are slow because fat pMOS
load input• Dynamic gates use precharge to remove
pMOS transistors from the inputs– Precharge: φ = 0 output forced high– Evaluate: φ = 1 output may pull low
A B
Yφ
C DY
A B C D
A
B
C
D
![Page 17: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/17.jpg)
D. Z. Pan 24. Skew-Tolerant Design17
Domino Circuits• Dynamic inputs must monotonically rise
during evaluation– Place inverting stage between each dynamic
gate– Dynamic / static pair called domino gate
• Domino gates can be safely cascaded
A
W
φ
B
X
domino AND
dynamicNAND
staticinverter
![Page 18: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/18.jpg)
D. Z. Pan 24. Skew-Tolerant Design18
Domino Timing• Domino gates are 1.5 – 2x faster than
static CMOS– Lower logical effort because of reduced Cin
• Challenge is to keep precharge off critical path
• Look at clocking schemes for precharge and eval– Traditional schemes have severe overhead– Skew-tolerant domino hides this overhead
![Page 19: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/19.jpg)
D. Z. Pan 24. Skew-Tolerant Design19
Traditional Domino Ckts• Hide precharge time by ping-ponging
between half-cycles– One evaluates while other precharges– Latches hold results during precharge
Tc
Sta
tic
Dyn
amic
Latc
h
clk
Sta
tic
Dyn
amic
clkS
tatic
Dyn
amic
clk
Dyn
amic
clk clk
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
clk clk clk clk clk
clk
clk
tpdq tpdq
2pd c pdqt T t= −
![Page 20: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/20.jpg)
D. Z. Pan 24. Skew-Tolerant Design20
Clock Skew• Skew increases sequencing overhead
– Traditional domino has hard edges– Evaluate at latest rising edge– Setup at latch by earliest falling edge
Sta
tic
Dyn
amic
Latc
hclk
Sta
tic
Dyn
amic
clkD
ynam
icclk clk
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
Dyn
amic
clk clk clk clk
clk
clk
tskewtsetup
setup skew2 2pd ct T t t= − −
![Page 21: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/21.jpg)
D. Z. Pan 24. Skew-Tolerant Design21
Time Borrowing• Logic may not exactly fit half-cycle
– No flexibility to borrow time to balance logic between half cycles
• Traditional domino sequencing overhead is about 25% of cycle time in fast systems!
Sta
tic
Dyn
amic
Latc
hclk
Sta
tic
Dyn
amic
clk clk
Sta
tic
Dyn
amic
Latc
h
Sta
tic
Dyn
amic
clk clk clk
clk
clk
tskewtsetup
![Page 22: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/22.jpg)
D. Z. Pan 24. Skew-Tolerant Design22
Relaxing the Timing• Sequencing overhead caused by hard
edges– Data departs dynamic gate on late rising edge– Must setup at latch on early falling edge
• Latch functions– Prevent glitches on inputs of domino gates– Holds results during precharge
• Is the latch really necessary?– No glitches if inputs come from other domino– Can we hold the results in another way?
![Page 23: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/23.jpg)
D. Z. Pan 24. Skew-Tolerant Design23
Skew-Tolerant Domino• Use overlapping clocks to eliminate latches
at phase boundaries.– Second phase evaluates using results of first
a
Sta
tic
Dyn
amic
φ1
Sta
tic
Dyn
amic
φ2
b c d
a
φ1
φ2
b
c
a
φ1
φ2
b
c
No latch atphase boundary
![Page 24: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/24.jpg)
D. Z. Pan 24. Skew-Tolerant Design24
Full Keeper• After second phase evaluates, first phase
precharges• Input to second phase falls
– Violates monotonicity?• But we no longer need the value• Now the second gate has a floating output
– Need full keeper to hold it either high or lowφ
weak fullkeepertransistors
f
XH
![Page 25: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/25.jpg)
D. Z. Pan 24. Skew-Tolerant Design25
Time Borrowing• Overlap can be used to
– Tolerate clock skew– Permit time borrowing
• No sequencing overheadtskew
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Sta
tic
φ1
φ2
φ1 φ1 φ1 φ1 φ1 φ2 φ2 φ2
Phase 1 Phase 2
toverlap
tborrow
pd ct T=
![Page 26: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/26.jpg)
D. Z. Pan 24. Skew-Tolerant Design26
Multiple Phases• With more clock phases, each phase
overlaps more– Permits more skew tolerance and time
borrowingS
tatic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Sta
tic
Dyn
amic
Dyn
amic
Sta
tic
Sta
tic
φ3
φ4
φ1 φ1 φ2 φ2 φ3 φ3 φ4 φ4
Phase 1 Phase 2 Phase 3 Phase 4
φ1
φ2
![Page 27: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/27.jpg)
D. Z. Pan 24. Skew-Tolerant Design27
Clock Generation
clken
φ1
φ2
φ3
φ4
![Page 28: 24. Skew-Tolerant Design - University of Texas at Austinusers.ece.utexas.edu/~dpan/2009Spring_EE382M/lectures/24-skew.pdf · D. Z. Pan 24. Skew-Tolerant Design 6 Cycle Time Trends](https://reader030.fdocuments.us/reader030/viewer/2022040809/5e4daf67c8ad44173f58b48a/html5/thumbnails/28.jpg)
D. Z. Pan 24. Skew-Tolerant Design28
Summary• Clock skew effectively increases setup and
hold times in systems with hard edges• Managing skew
– Reduce: good clock distribution network– Analyze: local vs. global skew– Tolerate: use systems with soft edges
• Flip-flops and traditional domino are costly• Latches and skew-tolerant domino perform
at full speed even with moderate clock skews.