High-Speed and Low-Power On-Chip Global Link Using Continuous-Time Linear Equalizer

Yulei Zhang (1), James F. Buckwalter (1), and Chung-Kuan Cheng (2)
(1) Dept. of ECE, (2) Dept. of CSE, UC San Diego, La Jolla, CA

19th Conference on Electrical Performance of Electronic Packaging and Systems
Oct 25, 2010, Austin, USA




Outline

- Introduction
- Equalized On-Chip Global Link
  - Overall structure
  - Basic working principle
- Driver Design for On-Chip Transmission-Line
  - Guideline for tapered CML driver
  - Driver design example
- Continuous-Time Linear Equalizer (CTLE) Design
  - CTLE modeling
  - CTLE design example
- Driver-Receiver Co-Design for Low Energy per Bit
  - Methodology
  - Overall link design example
- Conclusion


Research Motivation

- Global interconnect planning becomes a challenge in ultra-deep sub-micron (UDSM) processes
  - Performance gap between global wires and logic gates
  - Conventional buffer insertion brings a large extra power overhead
- Uninterrupted wire configurations are used to tackle the on-chip global communication issues
  - On-chip T-lines to reduce interconnect power
  - Equalization to improve the bandwidth
- State of the art [Kim2009]
  - 2Gb/s/um, < 1pJ/b signaling over a 10mm global wire in 90nm


Our Contributions

Contributions
- Build a novel equalized on-chip T-line structure for global communication
  - Tapered CML driver + CTLE receiver
- Accurate small-signal modeling of the CTLE receiver to improve the optimization quality
- A design methodology for driver-wire-receiver co-optimization to reduce the total energy per bit

Results of our design
- 20Gbps signaling over a 10mm, 2.2um-pitch on-chip T-line
- 11ps/mm latency and 0.2pJ/b energy per bit in 45nm
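The headline figures can be cross-checked with simple unit arithmetic. The sketch below is only a sanity check on the quoted numbers; the bandwidth-density line assumes that density is defined as data rate divided by the quoted 2.2um pitch, which the slide does not state explicitly.

```python
# Figures quoted on the slide.
DATA_RATE_GBPS = 20.0      # Gb/s
LENGTH_MM = 10.0           # mm
PITCH_UM = 2.2             # um
LATENCY_PS_PER_MM = 11.0   # ps/mm
ENERGY_PJ_PER_BIT = 0.2    # pJ/b


def link_summary():
    """Derive end-to-end numbers from the quoted per-unit figures."""
    total_latency_ps = LATENCY_PS_PER_MM * LENGTH_MM
    bit_time_ps = 1000.0 / DATA_RATE_GBPS
    # pJ/b * Gb/s = mW  (1e-12 J/b * 1e9 b/s = 1e-3 W)
    power_mw = ENERGY_PJ_PER_BIT * DATA_RATE_GBPS
    # Assumption: density = data rate / pitch (definition not given on the slide).
    density_gbps_per_um = DATA_RATE_GBPS / PITCH_UM
    return {
        "total_latency_ps": total_latency_ps,
        "bit_time_ps": bit_time_ps,
        "bits_in_flight": total_latency_ps / bit_time_ps,
        "power_mw": power_mw,
        "density_gbps_per_um": density_gbps_per_um,
    }


if __name__ == "__main__":
    for name, value in link_summary().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the link latency is 110ps (a little over two bit times at 20Gbps) and the implied total power is about 4mW.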


Equalized On-Chip Global Link

Overall structure
- Tapered current-mode logic (CML) drivers
- Terminated differential on-chip T-line
- Continuous-time linear equalizer (CTLE) receiver
- Sense-amplifier based latch


Basic Working Principle

Tapered CML driver
- Provides low-swing differential signals to drive the T-line
- Parameters: taper factor u, number of stages N, fan-out X, final-stage current ISS, driver resistance RS

T-line
- Differential wire with P/G shielding
- Parameters: geometries (width, pitch) and termination resistance RT

CTLE receiver
- Recovers the signal and improves eye quality
- Parameters: load resistance RL, source degeneration resistance RD and capacitance CD, over-drive voltage Vod

Sense-amplifier based latch
- Synchronizes the signal and converts it back to digital levels


Tapered CML Driver Design

Output swing constraint

Design guideline [Tsuchiya2006, Heydari2004]
- Begin from the final stage
  - For given VSW, output resistance RS optimized with RT to increase eye-opening
  - Transistor size
- Tapered factor u = 2.7 for delay reduction
  - Number of stages
- Each previous stage is designed backward by scaling with the factor u

Need to design:
1) Output resistance RS
2) Tail current ISS
3) Size of transistors W
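The backward scaling described in the guideline can be sketched as follows. This is an illustrative sketch, not the authors' sizing procedure: the function names are hypothetical, the stage-count rule N = ceil(ln X / ln u) is the classical tapered-buffer rule assumed here because the slide's own expression did not survive, and scaling the load resistance up by u (to keep the swing ISS x R constant) is likewise an assumption. The 2mA final-stage current and 10um width in the usage lines are placeholders, not values from the slides.

```python
import math


def stages_for_fanout(fanout_x, u=2.7):
    """Assumed rule: number of stages N = ceil(ln X / ln u), at least 1."""
    if fanout_x <= 1.0:
        return 1
    return max(1, math.ceil(math.log(fanout_x) / math.log(u)))


def taper_chain(iss_final_ma, w_final_um, rs_final_ohm, n_stages, u=2.7):
    """Size the chain backward from the final stage.

    Each earlier stage gets 1/u of the tail current and transistor width
    of the stage it drives. Its load resistance is multiplied by u so the
    swing ISS * R stays constant (assumption, see lead-in).
    Returns the stages ordered from first to final.
    """
    stages = []
    for k in range(n_stages):  # k = 0 is the final stage
        scale = u ** k
        stages.append({
            "iss_ma": iss_final_ma / scale,
            "w_um": w_final_um / scale,
            "r_ohm": rs_final_ohm * scale,
        })
    return list(reversed(stages))


if __name__ == "__main__":
    n = stages_for_fanout(20.0)  # hypothetical fan-out X = 20
    for i, stage in enumerate(taper_chain(2.0, 10.0, 47.0, n), start=1):
        print(f"stage {i}: ISS={stage['iss_ma']:.3f} mA, "
              f"W={stage['w_um']:.2f} um, R={stage['r_ohm']:.0f} ohm")
```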


CML Driver Study w/ Loaded T-line

- Assume 45nm 1P11M CMOS
- T-line built on M9 with M1 as reference
- T = 1.2um, H = 3.5um (fixed)
- Optimize W and S for eye-opening

[Figure: change of the eye-opening with width for fixed 2um pitch]

[Figure: change of the eye-opening with pitch for equal width/spacing]


CML Driver Design Example

Experimental observations
- Optimal eye happens when width = spacing
- Eye-opening improves with larger pitch

Design methodology
- Choose the minimum pitch that satisfies the wire-end eye-opening requirement

Design example


Accurate CTLE Modeling

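Small-signal circuit used to derive H(s): input v_in at the gate (G), output v_out at the drain (D); elements g_m v_gs and r_ds between drain and source, R_L and C_L at the drain, R_D and C_D at the source (S).

\[
H(s) = \mathrm{Gain}_{DC}\,\frac{1 + s R_D C_D}{1 + a s + b s^2}
\]

\[
\mathrm{Gain}_{DC} = \frac{g_m r_{ds} R_L}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}
\]

\[
a = \frac{r_{ds}\,(R_L C_L + R_D C_D) + (g_m r_{ds} + 1)\, R_D R_L C_L + R_L R_D C_D}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}
\]

\[
b = \frac{r_{ds}\, R_D C_D\, R_L C_L}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}
\]

\[
\omega_z = \frac{1}{R_D C_D}, \qquad \omega_{p1} \approx \frac{1}{a}, \qquad \omega_{p2} \approx \frac{a}{b}
\]

Device terms g_m, r_ds, and W/L follow from the bias point (I_Bias, V_od, V_dd, V_ic, R_L) and the coefficient K.

Parasitic capacitances:

\[
C_S^{para} = 1.5\,\mathrm{fF/\mu m} \cdot W, \qquad C_D^{para} = 1.5\,\mathrm{fF/\mu m} \cdot W
\]

\[
C_D = C_D^{ex} + C_S^{para}, \qquad C_L = C_L^{ex} + C_D^{para}
\]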

Design Variables: RL, RD, CD, Vod(Size)

[Hanumolu2005]
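The second-order CTLE model on this slide, H(s) = Gain_DC (1 + s RD CD) / (1 + a s + b s^2), can be evaluated numerically as sketched below. The sketch also evaluates the same half-circuit directly from its impedances (source degeneration RD in parallel with CD, load RL in parallel with CL, finite rds) as a cross-check. RL, RD and CD in the usage lines are the Methodology B values from the performance table; gm, rds and CL are placeholder values assumed purely for illustration.

```python
import math


def ctle_coefficients(gm, rds, RL, RD, CD, CL):
    """Return (gain_dc, a, b) of H(s) = gain_dc*(1 + s*RD*CD)/(1 + a*s + b*s^2)."""
    den = (gm * rds + 1.0) * RD + rds + RL
    gain_dc = gm * rds * RL / den
    a = (rds * (RL * CL + RD * CD)
         + (gm * rds + 1.0) * RD * RL * CL
         + RL * RD * CD) / den
    b = rds * RD * CD * RL * CL / den
    return gain_dc, a, b


def ctle_response(f_hz, gm, rds, RL, RD, CD, CL):
    """Complex gain of the closed-form model at frequency f_hz."""
    s = 2j * math.pi * f_hz
    gain_dc, a, b = ctle_coefficients(gm, rds, RL, RD, CD, CL)
    return gain_dc * (1.0 + s * RD * CD) / (1.0 + a * s + b * s * s)


def ctle_response_direct(f_hz, gm, rds, RL, RD, CD, CL):
    """Same gain computed from the small-signal impedances, as a cross-check."""
    s = 2j * math.pi * f_hz
    z_s = RD / (1.0 + s * RD * CD)  # RD in parallel with CD
    z_l = RL / (1.0 + s * RL * CL)  # RL in parallel with CL
    return gm * rds * z_l / ((gm * rds + 1.0) * z_s + rds + z_l)


def zero_and_poles(gm, rds, RL, RD, CD, CL):
    """Zero and approximate (well-separated) poles, in rad/s."""
    _, a, b = ctle_coefficients(gm, rds, RL, RD, CD, CL)
    return 1.0 / (RD * CD), 1.0 / a, a / b


if __name__ == "__main__":
    # RL, RD, CD from Methodology B; gm, rds, CL are assumed placeholders.
    params = dict(gm=5e-3, rds=5e3, RL=890.0, RD=1430.0, CD=150e-15, CL=10e-15)
    for f in (1e8, 1e9, 1e10):
        print(f"{f:.0e} Hz: |H| = {abs(ctle_response(f, **params)):.3f}")
```

With a zero placed below the first pole, the magnitude rises with frequency before rolling off, which is the high-frequency boost the CTLE relies on.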


CTLE Modeling Validation

- Test case: 10mm, 16mV eye at the wire end
- Blue lines: simple model that does not consider rds or parasitics
- Red line: considers rds only
- Black line: the proposed accurate model

Results
- <10% correlation error
- >20% eye-opening increase


CTLE Design Example

Observations of CTLE study
- Eye-opening improves with relaxed power constraints but tends to saturate

Design example
- Based on the pre-optimized CML driver + T-line design
- Eye-opening improved by 4X after CTLE


Driver-Receiver Co-Design Methodology

- Optimize driver-wire-receiver together by setting Veye/Power as the cost function
- Choose pre-designed CML/T-line/CTLE as initial solution

Optimization flow
- Driver-to-receiver step-response generation based on SPICE simulation and CTLE modeling
- Eye-opening estimation based on step response
- SQP-based non-linear optimization
- Variables: [ISS, RT, RL, RD, CD, Vod]

Performance comparison
- Option A: driver/receiver independent design
- Option B: low-power driver/receiver co-design
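One common way to estimate a worst-case eye from a step response is peak-distortion analysis: form the single-bit pulse response as p(t) = s(t) - s(t - T), sample it once per bit, and subtract the sum of the absolute inter-symbol interference taps from the main cursor. The sketch below uses that technique as an assumption; the slides do not say which estimator the authors used. The first-order RC step response in `rc_step` is a stand-in for the SPICE-generated driver-to-receiver response, and all function names are hypothetical.

```python
import math


def pulse_from_step(step, samples_per_bit):
    """Single-bit pulse response p[n] = s[n] - s[n - T], with s = 0 before t = 0."""
    return [
        step[n] - (step[n - samples_per_bit] if n >= samples_per_bit else 0.0)
        for n in range(len(step))
    ]


def worst_case_eye(step, samples_per_bit):
    """Worst-case eye height over all sampling phases (peak-distortion analysis)."""
    pulse = pulse_from_step(step, samples_per_bit)
    best = float("-inf")
    for phase in range(samples_per_bit):
        taps = pulse[phase::samples_per_bit]   # one sample per bit time
        cursor = max(taps)                     # main cursor
        isi = sum(abs(t) for t in taps) - abs(cursor)
        best = max(best, cursor - isi)
    return best


def rc_step(tau_samples, n_samples):
    """Stand-in step response: unit step through a single-pole RC channel."""
    return [1.0 - math.exp(-n / tau_samples) for n in range(n_samples)]


if __name__ == "__main__":
    spb = 8  # samples per bit
    for tau in (2.0, 8.0):
        eye = worst_case_eye(rc_step(tau, 400), spb)
        print(f"tau = {tau} samples: worst-case eye = {eye:.4f}")
```

In a co-design loop, an estimate like this divided by the simulated power would play the role of the Veye/Power cost that the optimizer evaluates at each iteration.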


Low Energy-per-Bit Optimization Flow

Flow chart, top to bottom:
1. Driver-receiver co-design initial solution: pre-designed CML driver and pre-designed CTLE receiver
2. Co-design cost-function estimation
   - SPICE-generated T-line step response
   - Receiver step response using CTLE modeling
   - Step-response based eye estimation
   - Cost function: Veye/Power
3. Internal SQP (sequential quadratic programming) routine changes the variables [ISS, RT, RL, RD, CD, Vod] to generate the best solution
4. Output: best set of design variables in terms of overall energy per bit


Simulated Eye Diagrams


Methodology A: driver/receiver separate design

Methodology B: driver/receiver co-design for low-power


Summary of Performance Comparison

Methodology A: driver/receiver separate design
Methodology B: driver/receiver co-design for low power

Parameter                  Methodology A   Methodology B
RS (ohm)                   47              148
RT (ohm)                   94              1100
RL (ohm)                   440             890
RD (ohm)                   110             1430
CD (fF)                    680             150
Vod (mV)                   60              58
Eye-opening at CTLE (mV)   91              113
Power consumption (mW)     8.1             3.8

Note: the driver/receiver co-design methodology uses much larger driver and termination resistances to reduce power, which closes the eye at the driver output and at the wire end. The final eye is recovered by fully utilizing the CTLE.
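The comparison can be restated as energy per bit and as the Veye/Power figure of merit used in the co-design cost function. The sketch below takes the power and eye-opening values from the table and assumes both designs run at the 20Gbps rate quoted for the link.

```python
DATA_RATE_GBPS = 20.0  # assumed to apply to both methodologies

# Power and eye-opening values from the comparison table.
DESIGNS = {
    "A (separate design)": {"power_mw": 8.1, "eye_mv": 91.0},
    "B (co-design)": {"power_mw": 3.8, "eye_mv": 113.0},
}


def energy_per_bit_pj(power_mw, rate_gbps=DATA_RATE_GBPS):
    """mW / (Gb/s) = pJ/b."""
    return power_mw / rate_gbps


def eye_per_power(eye_mv, power_mw):
    """Veye/Power figure of merit, in mV per mW."""
    return eye_mv / power_mw


def power_saving_percent(power_a_mw, power_b_mw):
    """Relative power reduction of design B versus design A."""
    return 100.0 * (1.0 - power_b_mw / power_a_mw)


if __name__ == "__main__":
    for name, d in DESIGNS.items():
        print(f"{name}: {energy_per_bit_pj(d['power_mw']):.3f} pJ/b, "
              f"{eye_per_power(d['eye_mv'], d['power_mw']):.1f} mV/mW")
    saving = power_saving_percent(8.1, 3.8)
    print(f"power saving of B over A: {saving:.1f}%")
```

Under that assumption, Methodology B works out to about 0.19pJ/b, consistent with the 0.2pJ/b headline figure, against roughly 0.4pJ/b for Methodology A.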


Conclusion

- We propose a novel equalized on-chip global link using a CML driver and a CTLE receiver
- Accurate CTLE modeling achieves <10% correlation error and improves eye-opening optimization quality
- Our design achieves 20Gbps signaling over a 10mm, 2.2um-pitch on-chip T-line, with 11ps/mm latency and 0.2pJ/b energy


Thank You!

Q & A
