
A 4x, 3-level blind ADC-based CDR in 65nm CMOS

by

Neno Kovacevic

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

Copyright © 2014 by Neno Kovacevic

A 4x, 3-level blind ADC-based CDR in 65nm CMOS

Neno Kovacevic

Master of Applied Science, 2014

Graduate Department of Electrical and Computer Engineering

University of Toronto

Abstract

This thesis presents the design, implementation, and measurement of a 4 times oversampled, 3-level blind ADC-based CDR. The goal of this work was to provide a blind ADC-based design that reduced the overall power consumption. This was achieved by reducing the ADC resolution to 3 levels, while increasing the oversampling ratio to 4. Also, by non-uniformly distributing the threshold levels of the ADC, the design incorporated a speculative DFE. The speculative DFE is implemented with comparator pairs whose symmetric offsets correspond to the first post-cursor ISI. The samples from the comparator pairs are then passed to the digital CDR, which uses them to recover the clock phase and data. The digital power was measured to be 18 mW, while the measured power of the ADC and clock divider was estimated at 35.5 mW. The estimated total measured power of the chip was 53.5 mW.


Acknowledgements

Firstly, I want to thank Prof. Sheikholeslami for his guidance and for providing an enriching academic experience.

I also thank my committee members Prof. Chan Carusone, Prof. Liscidini, and Prof. Prodic for giving me further insight into this work.

I would like to thank Joshua Liang for his invaluable help and experience during measurement. Josh, this work would definitely not have been possible without your help.

I am also grateful to Sadegh Jalali for his comparator design contribution. More importantly, thank you Sadegh for the continuous help during the analog design process, and for your patience in answering my questions when times were stressful.

My learning experience would have taken a serious dent were I not in the presence of three former students who were not only extremely willing to help, but also very clever. Thank you Safeen, Cliff, and Ravi.

I am also very glad to have had two great colleagues with whom I shared this Masters experience. It was a long journey, from the bleak early days of relentless VLSI work to the very end, when the light at the end of the tunnel began to shine. Thank you Luke and Jeff for being great friends.


Shayan and Mario, thank you for just being there to joke around with. Many thanks also go to the rest of the BA5000 bunch for being friendly colleagues.

Most importantly, I am thankful to my family for their love and support through a stressful but extremely life-enriching experience.


Contents

1 Introduction
1.1 Motivation
1.2 Thesis Objectives
1.3 Thesis Outline

2 Background
2.1 Equalization
2.2 Clock and Data Recovery
2.3 ADC-based CDRs

3 Proposed Design
3.1 Proposed Full Rate Architecture
3.2 Front-End Architecture
3.2.1 Front-End Speculative DFE
3.2.2 Comparator Architecture
3.3 Phase Detection
3.4 Data Decision
3.5 Complete System Architecture
3.5.1 Cycle Slip

4 Simulations and Measurement
4.1 Behavioural Simulations
4.1.1 DFE Performance
4.1.2 Effect of Comparator Offset on Jitter Tolerance
4.2 Experimental Results
4.2.1 Receiver Layout and equipment setup
4.2.2 Offset Cancellation
4.2.3 Jitter Tolerance
4.2.4 Power Comparison

5 Conclusion
5.1 Thesis Contributions
5.2 Future Work

References


List of Tables

4.1 Channel models used for simulation.
4.2 Pin description.
4.3 Channels used for measurements.
4.4 Channels used for measurements.


List of Figures

2.1 Block diagram of the transceiver system.
2.2 Typical channel frequency response.
2.3 Typical channel frequency response.
2.4 Inter-symbol interference (ISI).
2.5 Ideal linear equalization.
2.6 Ideal linear equalization.
2.7 Decision Feedback Equalizer architecture for 1-tap.
2.8 Speculative DFE architecture.
2.9 Sampling of input data with recovered clock.
2.10 Architecture of a clock and data recovery (CDR) circuit.
2.11 Architecture of a phase tracking ADC-based CDR.
2.12 Interpolation of phase and data.
2.13 Architecture of a digital DFE.
2.14 Power Comparison of previous ADC-based CDRs of various ADC resolution and oversampling rate [4] [5] [6] [8].
2.15 Number of comparators as a function of the oversampling rate and the ADC resolution. The 2x-5bit designs [4] [5] [6] used 62 comparators per UI, and the 3x-3bit [8] used 21 comparators per UI (assuming a flash ADC is used).
3.1 A Full rate system block diagram of the proposed speculative ADC-based CDR.
3.2 MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.
3.3 Full-rate Digital CDR.
3.4 Pulse Response of a 7dB Channel at Nyquist.
3.5 MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.
3.6 The comparator has offset cancellation in order to have a tuneable offset.
3.7 The UI window of 5 consecutive samples.
3.8 Phase estimation is done by determining the transition point in the UI.
3.9 The phase estimate from the POS and NEG sets is averaged to obtain the true zero-crossing. This is the operation of the system's full-rate phase detector.
3.10 The phase estimate from the phase detector, $\phi_x$, is subtracted by the loop filter average phase, $\phi_{AVG}$, to produce $\phi_{ERR}$, which is fed back to the loop filter.
3.11 Behavioural simulation showing the comparison of a zero-crossing PD scheme and the proposed one.
3.12 The selection of the correct sample is based on the position of $\phi_{PICK}$.
3.13 The complete system architecture.
3.14 The complete digital CDR architecture.
3.15 The selection of the correct set of samples is done with a chain of MUXes.
3.16 Cycle Slip; data rate is faster than blind sampling clock.
3.17 Cycle Slip; data rate is slower than blind sampling clock.
4.1 Simulink Model of the entire system.
4.2 Simulated eye diagrams for Channel 1, before and after DFE.
4.3 Simulated eye diagrams for Channel 2, before and after DFE.
4.4 Jitter tolerance curves for Channel 1 vs. α.
4.5 Jitter tolerance curves for Channel 2 vs. α.
4.6 Simulated eye diagrams of Channel 3, before CTLE, after CTLE, and after DFE.
4.7 Jitter tolerance curves for Channel 3 vs. α.
4.8 Jitter tolerance curves for Channel 1 vs. standard deviation of comparator offset.
4.9 Jitter tolerance curves for Channel 2 vs. standard deviation of comparator offset.
4.10 Measurement setup.
4.11 Chip die photo.
4.12 Detailed equipment setup.
4.13 Measured eye diagrams of the channels.
4.14 Measured jitter tolerance vs. α for Channel B.
4.15 Measured jitter tolerance vs. channel length.
4.16 Comparison of power between this work and blind ADC-based CDR with 3x oversampling [8].


List of Acronyms

ADC Analog to Digital Converter

BER Bit-Error Rate

CDR Clock and Data Recovery

CTLE Continuous Time Linear Equalizer

DFE Decision Feedback Equalizer

ENOB Equivalent Number of Bits

FFE Feed Forward Equalizer

FIR Finite Impulse Response

FSM Finite State Machine

Gb/s Gigabits per second

GSa/s Giga-samples per second

IIR Infinite Impulse Response

ISI Inter Symbol Interference

MUX Multiplexor

NRZ Non-return to Zero


PCB Printed Circuit Board

PCIe Peripheral Component Interconnect Express

PD Phase Detector

PI Phase Interpolator

PLL Phase-Locked Loop

PRBS Pseudo-Random Binary Sequence

SATA Serial Advanced Technology Attachment

UI Unit Interval

VCO Voltage Controlled Oscillator

VGA Variable Gain Amplifier


1 Introduction

As silicon technology continues to scale to smaller geometries, the density of integrated circuits continues to increase. With increased complexity and computational power, faster data rates are required for chip-to-chip communication. New generations of media standards such as PCI Express (PCIe) and SATA continue to push the limits on data rates. This industry demand drives the need for innovation in circuit design [1] [2] [3].

1.1. Motivation

In chip-to-chip communication, data is sent over a channel that consists of a few inches of printed circuit board (PCB) trace. In an ideal communication system, the transmitted data would reach the receiver chip without any attenuation. However, in practical systems the channel exhibits a finite bandwidth, attenuating the higher-frequency content of the transmitted signal. This spreads the bits sent from the transmitter into adjacent unit intervals (UI), making the task of clock and data recovery (CDR) difficult.

Conventional phase-tracking CDRs with analog equalizers are used to recover these bits. Alternatively, blind ADC-based CDRs offer appealing benefits such as the elimination of analog feedback. They also simplify the design process and, because they are easy to implement with fully digital circuits, open the door to more complex detection schemes. One of the major flaws of these designs is that they tend to be power hungry. Previous designs have used 5-bit ADCs with an oversampling ratio of 2 (2x) [4] [5] [6] [7], and 3-bit ADCs with an oversampling ratio of 3 (3x) [8]. One of the driving forces for the high resolution was to enable equalization entirely in the digital domain. However, this has come at the price of high power consumption. This leads to the question of whether it is possible to reduce power without sacrificing jitter tolerance.

1.2. Thesis Objectives

This thesis presents an alternative design technique for equalization in ADC-based CDRs in an attempt to reduce power without sacrificing performance. The main objectives of this thesis are:

• To provide insight into the background of high-speed signalling, and in particular into ADC-based CDRs.

• To propose an alternative architecture for an ADC-based CDR in order to reduce power consumption.

• To provide implementation, simulation, and measured results of a working chip and compare it against other ADC-based designs.

1.3. Thesis Outline

This thesis is organized as follows:

• Chapter 2 provides a background on equalization, clock and data recovery, and ADC-based CDRs.

• Chapter 3 presents the proposed design. The block level diagrams as well as circuit diagrams are illustrated.

• Chapter 4 first presents and discusses the simulated results, followed by the measurement process and results, as well as a comparison with previous works.

• Chapter 5 summarizes the thesis and provides directions for future research.

2 Background

In this chapter, we briefly review some of the key concepts in chip-to-chip communications, including the limitations of the channel, conventional techniques in equalization and clock and data recovery, and techniques pertaining to ADC-based equalization and clock and data recovery.

As shown in Figure 2.1, the transceiver model consists of two parts, each containing a digital core designed for a specific application. When communicating, one transceiver may act as a transmitter and one as a receiver.

[Figure: Chip 1 (digital core and transmitter, TX) connected through the channel to Chip 2 (receiver, RX, and digital core).]

Figure 2.1: Block diagram of the transceiver system.

The transmitter needs to send information to the receiver over a medium, known as the channel. This channel can take the form of optical fibres or Ethernet cables in large servers, or just printed circuit board (PCB) traces in chip-to-chip signalling. However, these channels are limited in bandwidth: they cannot pass signals above a certain frequency without attenuating them. A typical response for the insertion loss of the channel (the ratio of the power received at the receiver to the power sent at the transmitter) is shown in Figure 2.2. As can be seen, the channel attenuation increases with frequency, exhibiting a low-pass characteristic.

[Plot: insertion loss frequency response (dB), axis from −35 to 0, versus frequency (Hz), axis from 10^7 to 10^10.]

Figure 2.2: Typical channel frequency response.

The main challenge in designing transceivers is to increase the data rate ($f_b$) while being able to tolerate the signal loss. In the transceiver model, the transmitter sends almost ideal data (data with sharp edges) at the baud rate, but when it arrives at the receiver, the loss in the frequency response translates to a spread of the signal in the time domain. This is demonstrated in Figure 2.3. The transmit pulse of one unit interval ($\mathrm{UI} = 1/f_b$) takes some time to arrive at the receiver, and it spreads over several UIs.

The main issue with the non-ideal received pulse is that the pulse response corresponding to one UI spreads over the adjacent UIs. This phenomenon is known as inter-symbol interference (ISI). For the receiver to operate properly, it must detect the correct bits at an acceptable bit error rate (BER). Usually, the standard for the receiver is to make fewer than one error per trillion bits, equivalently a BER of less than $10^{-12}$.


[Figure: a one-UI-wide transmit pulse (TX Pulse, levels 0 and 1) and the attenuated, spread-out received pulse (RX Pulse).]

Figure 2.3: Typical channel frequency response.

[Figure: received pulse sampled at 1 UI intervals, showing the main cursor h0 and the post-cursors h1 and h2.]

Figure 2.4: Inter-symbol interference (ISI).

However, the presence of ISI makes the detection of error-free data increasingly difficult. The desirable part of the received pulse with ISI is known as the main cursor, defined as $h_0$ in Figure 2.4. The interference in the current UI from the previous bit is known as the first post-cursor, $h_1$; similarly, the interference from the $n$th previous bit is $h_n$. These post-cursors are detrimental to the recovery of the desired main cursor. The elimination of ISI is one of the main challenges in transceiver design.
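As a rough illustration of how a band-limited channel produces a main cursor and post-cursors, the following sketch samples the pulse response of a single-pole low-pass channel once per UI. The single-pole model, the time constant `TAU`, and the choice to sample at the peak of the pulse response are assumptions made for illustration only; they are not the channel models used in this thesis.

```python
import math

def step_response(t, tau):
    """Step response of a single-pole low-pass channel (zero for t <= 0)."""
    return 1.0 - math.exp(-t / tau) if t > 0 else 0.0

def pulse_response(t, ui, tau):
    """Response to a 1-UI-wide rectangular pulse: difference of two steps."""
    return step_response(t, tau) - step_response(t - ui, tau)

def cursors(ui, tau, n_post=3):
    """Sample the pulse response once per UI, starting at its peak (t = ui).

    Element 0 is the main cursor h0; elements 1.. are post-cursors h1, h2, ...
    """
    return [pulse_response(ui * (1 + k), ui, tau) for k in range(n_post + 1)]

UI = 200e-12      # one unit interval at 5 Gb/s
TAU = 0.8 * UI    # hypothetical channel time constant

h = cursors(UI, TAU)
print(["%.3f" % v for v in h])
```

In this simple model each post-cursor is a fixed fraction of the previous one, so a slower channel (larger `TAU`) leaves more energy in the UIs that follow the main cursor.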


2.1. Equalization

The process of eliminating ISI is known as equalization. Various equalizer circuits exist in both the analog and digital domains [4] [6] [9] [10] [11] [12] [13] [14] [15].

The frequency response of a band-limited channel is similar to that of a low-pass filter with several poles. A simple solution to extend the bandwidth is to implement a circuit whose transfer function contains a zero near the channel's cut-off frequency. This transfer function thus provides amplification to counter the signal attenuation caused by the channel. This technique, which is known as linear equalization, is illustrated in Figure 2.5.

[Figure: three frequency responses, each marked at fb/2: channel × equalizer = equalized response.]

Figure 2.5: Ideal linear equalization.

There is a downside, however, to designing a circuit with an ever-increasing frequency response. White noise at the input of such a circuit would be detrimental, as the high-frequency noise would be greatly amplified. Consequently, this would make detection of the received signal impossible. An alternative solution is to extend the bandwidth by providing a boost only to frequency components that lie slightly above the cut-off. Figure 2.6 shows this technique of linear equalization. Linear equalizers are implemented either as infinite impulse response (IIR) [2] [10] [14] or finite impulse response (FIR) filters [9] [16] [17], both in the analog domain. However, if the input is sampled by an ADC at the front end, an equivalent filter can be implemented as a feed-forward equalizer (FFE), allowing for digital implementation [5] [6] [18].
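The bandwidth-extension idea can be sketched numerically with a toy model: a single-pole channel followed by an equalizer whose zero cancels the channel pole and whose own pole, placed higher, bounds the high-frequency boost. The pole and zero frequencies below are hypothetical values chosen for illustration, and a real channel has several poles rather than one.

```python
import math

def channel(f, f_pole):
    """Single-pole low-pass channel model."""
    return 1.0 / complex(1.0, f / f_pole)

def equalizer(f, f_zero, f_pole):
    """One zero to counter the channel pole, one higher pole to bound the boost."""
    return complex(1.0, f / f_zero) / complex(1.0, f / f_pole)

def db(h):
    """Magnitude of a complex response in dB."""
    return 20.0 * math.log10(abs(h))

F_NYQ = 2.5e9            # fb/2 for 5 Gb/s data
F_CH = 1.0e9             # hypothetical channel pole
F_EQ_POLE = 4.0 * F_CH   # hypothetical equalizer pole

loss_raw = db(channel(F_NYQ, F_CH))
loss_eq = db(channel(F_NYQ, F_CH) * equalizer(F_NYQ, F_CH, F_EQ_POLE))
print(f"loss at Nyquist: {loss_raw:.1f} dB raw, {loss_eq:.1f} dB equalized")
```

Because the equalizer pole limits the gain above `F_EQ_POLE`, the boost is confined to a band just above the channel cut-off instead of growing without bound.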

[Figure: three frequency responses, each marked at fb/2: channel × equalizer = equalized response.]

Figure 2.6: Ideal linear equalization.

Approaching the problem of ISI from the time domain, it becomes apparent that if the post-cursor ISI can be eliminated at the sampling instant, then only the main cursor remains. At the centre of the $n$th UI, the received signal $x_n$ can be represented as in Equation 2.1 below [19]. Here $v_n$ is the white Gaussian noise at the sampling instant, $h_n$ is the impulse response of a band-limited channel, and $I_n$ is the sequence of symbols from the transmitter, drawn from the non-return-to-zero (NRZ) set $\{-1, 1\}$.

$$x_n = I_n h_0 + \mathrm{ISI} + v_n = I_n h_0 + \sum_{k=0,\, k \neq n}^{\infty} I_k h_{n-k} + v_n, \qquad n = 0, 1, 2, \ldots \tag{2.1}$$

Without any equalization, a slicer would simply produce a high or low output depending on the sign of the signal. Ideally, it would detect the sign of the main cursor term, $I_n h_0$. However, enough post-cursor ISI can be introduced that the sampled signal has a sign opposite to that of the main cursor. Eliminating the post-cursor ISI circumvents this issue. A type of non-linear equalizer that employs this technique is known as the Decision Feedback Equalizer (DFE) [1] [20] [21] [22] [23], shown in Figure 2.7.


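[Figure: 1-tap DFE. The previous decision bn-1, held in a flip-flop (FF), is scaled by w1 to form a1 = bn-1·w1, which is subtracted from the input xn to give yn = xn - a1; yn is sliced to produce bn. The feedback loop is marked as the critical path.]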

Figure 2.7: Decision Feedback Equalizer architecture for 1-tap.

A DFE operates on the assumption that the previous bits were detected correctly. Based on this assumption, and on prior knowledge of the values of the post-cursors relative to the main cursor, the portion of the signal due to these components is subtracted at the summing node of the circuit. Thus, if the signs of the previous bits are stored, then scaling them by weights corresponding to the ratios of their post-cursors to the main cursor eliminates the ISI due to these cursors. For example, if the first post-cursor dominates the ISI, then by scaling the previous bit by the weight $w_1$ (this is a 1-tap DFE architecture), the signal being sliced becomes:

$$y_n = I_n h_0 + I_{n-1} h_1 - w_1 b_{n-1} + v_n = I_n h_0 + v_n \tag{2.2}$$

This decision feedback structure can easily be expanded to more weights by simply adding flip-flops in the feedback path. The circuit design of the DFE is straightforward if implemented as a current-summing structure, where the current source of each post-cursor differential pair is scaled relative to the input differential pair of the current bit [24].
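The effect of the 1-tap feedback can be checked with a small behavioural sketch. It builds noise-free received samples from Equation 2.1, then compares a plain slicer with a 1-tap DFE whose weight equals the first post-cursor. The cursor values in `H` and the bit pattern are hypothetical, chosen so that the total post-cursor ISI exceeds the main cursor; the function names are illustrative and do not come from the thesis.

```python
def channel_output(bits, h):
    """Noise-free samples x_n = sum_j h[j] * I[n-j] (Equation 2.1 with v_n = 0)."""
    return [sum(h[j] * bits[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(bits))]

def slicer(y):
    """Decide +1 or -1 from the sign of the input."""
    return 1 if y >= 0 else -1

def dfe_1tap(x, w1):
    """Slice y_n = x_n - w1 * b_{n-1}, feeding each decision back."""
    decisions, prev = [], 0   # prev = 0: no feedback before the first bit
    for xn in x:
        prev = slicer(xn - w1 * prev)
        decisions.append(prev)
    return decisions

H = [1.0, 0.6, 0.5]   # hypothetical cursors h0, h1, h2
tx = [1, 1, -1, -1, 1, -1, 1, 1, 1, -1]

rx = channel_output(tx, H)
plain = [slicer(v) for v in rx]
equalized = dfe_1tap(rx, w1=H[1])

print("slicer errors:", sum(a != b for a, b in zip(plain, tx)))
print("DFE errors:", sum(a != b for a, b in zip(equalized, tx)))
```

With these values the residual ISI after removing the first post-cursor is smaller than the main cursor, so the DFE recovers every bit while the plain slicer does not. A second tap would be added by also storing $b_{n-2}$ and subtracting a second weighted term.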

The conventional DFE architecture is often difficult to design due to the stringent timing constraint of the feedback path. The delay of this path must be less than one bit period. With multiple DFE weights in particular, the loading at the summing node increases further and limits the overall speed. An efficient alternative design eliminates the feedback loop of the decision entirely, relaxing the timing constraint [25]. Parallelism is employed by having two dedicated paths, one assuming the previous bit was a one and the other assuming the previous bit was a zero. This architecture, shown in Figure 2.8, is known as the look-ahead architecture or the speculative DFE. The two paths provide speculative bits $b_n^+$ and $b_n^-$, from which the correct bit is selected by a MUX whose select signal is the correct previous bit. This design allows for higher data rates, at the price of increased hardware requirements [26].

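[Figure: speculative DFE. The input xn feeds two paths, y+n = xn - a1 and y-n = xn + a1, which are sliced to give the speculative bits b+n and b-n. A MUX, selected by the previous bit bn-1 held in a flip-flop (FF), outputs bn.]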

Figure 2.8: Speculative DFE architecture.
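A minimal behavioural sketch of the speculative structure is given below. Both hypotheses are resolved for every sample without waiting for the previous decision, and only the final selection depends on the previous bit. The direct-feedback version is included purely as a reference for comparison; the sample values and the initial previous bit are hypothetical.

```python
def slicer(y):
    """Decide +1 or -1 from the sign of the input."""
    return 1 if y >= 0 else -1

def feedback_dfe(x, a1, first_prev=1):
    """Direct-feedback 1-tap DFE, used here only as a reference."""
    out, prev = [], first_prev
    for xn in x:
        prev = slicer(xn - a1 * prev)
        out.append(prev)
    return out

def speculative_dfe(x, a1, first_prev=1):
    """Resolve both hypotheses for every sample, then select with a MUX."""
    b_plus = [slicer(xn - a1) for xn in x]    # assumes previous bit was +1
    b_minus = [slicer(xn + a1) for xn in x]   # assumes previous bit was -1
    out, prev = [], first_prev
    for bp, bm in zip(b_plus, b_minus):
        prev = bp if prev == 1 else bm        # MUX selected by previous bit
        out.append(prev)
    return out

samples = [1.6, 0.1, -1.1, -0.1, -0.9, 0.9, 1.1, 2.1, 0.1]   # hypothetical
print(speculative_dfe(samples, a1=0.6))
print(feedback_dfe(samples, a1=0.6))
```

The two functions produce the same decisions. The difference is in hardware timing: the subtraction and slicing in the speculative version sit outside the loop, so only the MUX remains in the one-bit-period critical path.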

2.2. Clock and Data Recovery

Equalization is often incorporated at both the transmitter and the receiver. Once the receiver has equalized the incoming data, its next and ultimate goal is to sample the data precisely to obtain the correct bits that were sent. The receiver circuit generates a clock from the received data such that the rising edge of the clock (for example) is aligned with the centre of the data. This process of recovering the data, as well as extracting a clock to sample the received data, is called clock and data recovery, as illustrated in Figure 2.9. In essence, sampling the data at the centre of the UI, where it is most correlated with the main cursor, gives the highest probability of obtaining the correct bit.

[Figure: the recovered clock (Recovered CLK) aligned with the input data so that the data is sampled at the UI centre.]

Figure 2.9: Sampling of input data with recovered clock.

The block diagram of a simple CDR architecture is shown in Figure 2.10. The phase detector (PD) uses the information from the transitions of the data to adjust the clock toward the optimal sampling position. Any misalignment in phase between the clock and the data, $\phi_{ERR}$, is forced to zero through stable negative feedback. The charge pump uses the relative phase between the data and the clock to increase or decrease a voltage, which is filtered and fed to the voltage controlled oscillator (VCO) or phase interpolator (PI). The VCO can also adjust for any frequency offset that might occur between the data and the clock.

This architecture, known as a phase tracking CDR [27] [28] [29], is similar to that of a phase-locked loop (PLL), except that the input data is not periodic. To account for this difference, the phase detector of the CDR only updates the phase information that is propagated through the loop when there is a data transition.


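[Figure: CDR loop. The input Din drives the phase detector (PD), followed by the charge pump (CP), the loop filter (LF), and the VCO. The VCO clock CK returns to the PD and clocks the flip-flop (FF) that samples Din to produce Dout.]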

Figure 2.10: Architecture of a clock and data recovery (CDR) circuit.
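The negative-feedback behaviour of the loop can be sketched with a first-order behavioural model. This is a deliberate simplification: the charge pump, loop filter, and VCO of Figure 2.10 are collapsed into a single proportional gain, which is an assumption for illustration and not the circuit described above. Phases are expressed in UI, and `None` marks a bit with no data transition, for which the loop holds its state.

```python
def track_phase(edge_phases, gain=0.1, phi_clk=0.0):
    """First-order phase tracking loop.

    edge_phases: measured data-edge phase in UI for each bit, or None when
    the bit has no transition. Returns the clock phase after every bit.
    """
    history = []
    for edge in edge_phases:
        if edge is not None:
            # Phase error wrapped into [-0.5, 0.5) UI
            phi_err = (edge - phi_clk + 0.5) % 1.0 - 0.5
            phi_clk = (phi_clk + gain * phi_err) % 1.0
        history.append(phi_clk)
    return history

# Hypothetical input: edges at 0.3 UI, with some bits having no transition
edges = [0.3, None, 0.3, 0.3, None] * 40
hist = track_phase(edges)
print(round(hist[0], 4), round(hist[-1], 4))
```

The loop gain trades tracking speed against sensitivity to noisy edge measurements; a larger `gain` converges faster but follows jitter on individual edges more closely.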

2.3. ADC-based CDRs

An alternative to the conventional phase tracking CDR is the ADC-based CDR architecture, which samples the data and converts it to the digital domain. The benefit of ADC-based CDRs is that the equalization can take place entirely in the digital domain. Figure 2.11(a) shows the basic architecture of a phase tracking ADC-based CDR. In this architecture, equalization can take place in both the analog and digital domains; however, analog feedback is still required to adjust the sampling phase of the clock. The benefit of phase tracking CDRs is that they tend to be able to tolerate lossier channels [30] [31].

[Figure: (a) Phase tracking ADC-based receiver: RXIN passes through an analog equalizer and an N-bit ADC to the digital CDR+DFE, which outputs DOUT; analog feedback adjusts the recovered clock CLKREC. (b) Blind ADC-based receiver: the same chain, with the ADC driven by a blind clock and no analog feedback. (c) Recovered clock: CLKREC at phase ΦREC for (a), and the phases ΦAVG and ΦPICK relative to the data for (b).]

Figure 2.11: Architecture of a phase tracking ADC-based CDR.

It is possible to eliminate the analog feedback and sample the data with a blind clock. This CDR structure, depicted in Figure 2.11(b), is known as a blind ADC-based CDR [4] [5] [6] [7] [8]. The obvious benefit of this structure is that the VCO/PI is completely eliminated from the design, and the clock and data recovery becomes entirely digital. With such a structure being in the digital domain, the design can be easily ported to other technologies. On the other hand, since the data is blindly sampled, the digital CDR must internally estimate the average transition phase, $\phi_{AVG}$, from the incoming samples. In order to have an accurate phase estimate, blind ADC-based CDRs normally oversample the data. The blind sampling ADC-based CDR uses the phase estimate to precisely select the correct sample in the current UI window. A comparison between the recovered clock in the phase tracking CDR and the recovered phases in the blind ADC-based CDR is shown in Figure 2.11(c). The recovered phase in the ADC-based CDR also contains the estimate of the UI centre phase, $\phi_{PICK}$. Usually, the UI window is represented with a digital number; therefore, $\phi_{PICK}$ is simply equal to $\phi_{AVG} + 0.5$ UI. The data decision scheme is very flexible due to the ease of design in digital. In previous works, interpolating and extrapolating schemes have been employed [4] [5] [6] [7] [8].
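The relation between the average transition phase and the UI centre can be sketched as follows, with phases expressed as fractions of a UI. The wrap into the range [0, 1) and the nearest-sample selection are assumptions for illustration; the cited works use interpolating and extrapolating schemes rather than simply picking the nearest sample.

```python
def pick_phase(phi_avg):
    """UI-centre estimate: half a UI after the average transition phase,
    wrapped into [0, 1) UI."""
    return (phi_avg + 0.5) % 1.0

def nearest_sample(phi_pick, osr):
    """Index of the blind sample closest to phi_pick, for samples taken at
    k/osr UI, k = 0 .. osr-1. An index of 0 after wrapping refers to the
    first sample of the following UI window."""
    return round(phi_pick * osr) % osr

for phi_avg in (0.0, 0.25, 0.8):
    p = pick_phase(phi_avg)
    print(phi_avg, round(p, 3), nearest_sample(p, osr=4))
```

Because the phase is held as a fraction of the UI window, adding 0.5 and wrapping is all that is needed to move from the transition estimate to the centre estimate.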

Figure 2.12 illustrates the operation of an ADC-based CDR with an oversampling rate of 2 (2x). In this case, the CDR uses interpolation for both phase detection and data selection; this was the technique used in previous works [5] [6]. In the 2x case, the ADC has 5 bits. This high resolution is necessary to obtain an accurate estimate of the phase via interpolation between adjacent samples. The average UI centre, $\phi_{PICK}$, is used to interpolate between two adjacent samples. The value of the interpolated data sample is $D_n$. The high ADC resolution also permitted these works to implement an FFE, allowing for higher channel losses.


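[Figure: a sampled waveform with samples S1 to S9 and the UI window marked. The interpolated transition phases Φx and, 0.5 UI later, the pick phases ΦPICK are shown together with the interpolated data values D1 to D4.]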

Figure 2.12: Interpolation of phase and data.
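The two interpolation steps can be sketched as below. Linear interpolation is assumed here for illustration; the text above states only that interpolation between adjacent samples is used. The sample values are hypothetical ADC codes, and sample times are in UI with 2x oversampling (samples 0.5 UI apart).

```python
def zero_crossing(t_a, s_a, t_b, s_b):
    """Linearly interpolated time at which the signal crosses zero between
    samples (t_a, s_a) and (t_b, s_b). Requires strictly opposite signs."""
    if s_a * s_b >= 0:
        raise ValueError("samples do not straddle zero")
    return t_a + (t_b - t_a) * s_a / (s_a - s_b)

def interpolate(t, t_a, s_a, t_b, s_b):
    """Linearly interpolated sample value at time t between two samples."""
    return s_a + (s_b - s_a) * (t - t_a) / (t_b - t_a)

# Phase detection: transition between a negative and a positive sample
phi_x = zero_crossing(0.0, -6, 0.5, 10)

# Data selection: value at a pick phase of 0.7 UI, between samples at 0.5 and 1.0 UI
d_n = interpolate(0.7, 0.5, 10, 1.0, 14)

print(phi_x, d_n)
```

The accuracy of both results depends on how finely the sample amplitudes are resolved, which is why this approach relies on a higher-resolution ADC.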

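[Figure: digital DFE. Blocks shown: two DFE constant inputs, two flip-flops (FF), and a 0/1 MUX selected by the previous bit, with input Dn and output bn.]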

Figure 2.13: Architecture of a digital DFE.

A DFE was viable in both the 5-bit and 3-bit ADC-based CDRs [5] [6] [8]. The digital DFE worked as follows: once the data sample was obtained through either interpolation or extrapolation, a DFE constant, represented as a digital number, would be subtracted and added, as in the speculative DFE architecture. Finally, a MUX would select the sign of the correct speculative bit while discarding the other, based on the previous bit. Clearly, the DFE only makes a difference if the unequalized sample is slightly below or above zero, such that the DFE coefficient pushes it across the threshold and flips the sign. A DFE was feasible for both designs due to a substantial number of levels in the voltage domain. Clearly, if the ADC resolution is too coarse, then equalization has little effect.

[Bar chart "Power of ADC-based CDRs": total power (mW), axis from 0 to 300, for the 2x 5-bit CDRs [5], [6], [4] and the 3x 3-bit CDR [8].]

Figure 2.14: Power Comparison of previous ADC-based CDRs of various ADC resolution and oversampling rate [4] [5] [6] [8].

As seen in Figure 2.14, one of the main drawbacks of the previous ADC-based CDR designs, all operating at 5 Gb/s, is their power consumption: the 2x designs all consumed more than 100 mW, and the 3x design consumed 95 mW [4] [5] [6] [8]. However, a closer comparison of the 5-bit 2x and 3-bit 3x cases makes it clear that the latter consumes less power. The key contributor to this reduction is the lower comparator use per UI, as seen in Figure 2.15.

In the 5-bit 2x case, 62 comparators sample the data in one UI. This is in contrast with the 3-bit 3x case, where only 21 comparators sample the data per UI. However, the reduction in resolution comes with a penalty. The 5-bit 2x case is able to implement an FFE, which provides significant benefits for higher channel loss, while the implementation of one in the 3x case is not feasible. Following the trend, as the resolution continues to decrease, the FFE becomes impractical, but, as mentioned previously, so does the DFE.

[3-D bar chart "Comparator Use UI vs. ADC Bits and OSR": comparator use per UI (axis from 0 to 200) against oversampling rate (1 to 5) and ADC bits (1 to 5). The bar labels run from 1 (1 bit, 1x) to 155 (5 bits, 5x) and include 62 for 5 bits at 2x and 21 for 3 bits at 3x.]

Figure 2.15: Number of comparators as a function of the oversampling rate and the ADC resolution. The 2x-5bit designs [4] [5] [6] used 62 comparators per UI, and the 3x-3bit [8] used 21 comparators per UI (assuming a flash ADC is used).

The next chapter proposes a design that reduces the power even further by reducing the total number of comparators per UI.

3 Proposed Design

Previous works on blind oversampling ADC-based CDRs have all sat somewhere on the equivalent number of bits (ENOB) vs. oversampling rate spectrum, as seen in Figure 2.15. The higher the product of the oversampling rate and $2^{\mathrm{ENOB}} - 1$, the greater the number of comparators the system uses per UI. Consequently, this increases the power of the analog front-end and, to a lesser extent, of the digital CDR. However, the benefit of a higher ENOB is that it allows digital equalization to be more precise. For example, for binary oversampling, digital equalization is not possible because there is no amplitude information in the samples. On the other hand, a 2x-oversampled 5-bit ADC-based CDR has 32 levels in the voltage domain. This enables the implementation of a digital DFE by simply subtracting a number that corresponds to the ISI at the sampling instant. Of course, schemes for adjusting the DFE coefficient based on the sampling instant are needed, but this has been done [4] [5] [6] [8].

The focus of this thesis is an architecture that reduces the analog power by reducing

the ENOB to 1.5 bits, or equivalently 3 ADC levels, while increasing the oversampling

rate to 4. This architecture, which makes 8 comparator uses per UI, is implemented

by two comparators with offsets that are symmetric about the common-mode. The two

comparators form a 3-level ADC with adjustable thresholds. Alternatively, this can be

looked at as a speculative DFE. In this view, the comparators’ offsets represent the tap

weight corresponding to the 1st post-cursor ISI. The outputs of the comparators are


speculative binary bits, which are used for data-decision inside the Digital CDR.
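The comparator counts quoted in this chapter follow from a simple product. As a minimal sketch, assuming a flash ADC with one comparator per threshold (the function name is hypothetical and not taken from the thesis), the comparator uses per UI for the designs mentioned above can be computed as:

```python
def comparator_uses_per_ui(oversampling_rate, adc_levels):
    """Comparator uses per UI for a flash ADC (assumed: one comparator per threshold)."""
    thresholds = adc_levels - 1  # an N-level flash ADC needs N - 1 thresholds
    return oversampling_rate * thresholds


print(comparator_uses_per_ui(2, 2 ** 5))  # 2x, 5-bit designs
print(comparator_uses_per_ui(3, 2 ** 3))  # 3x, 3-bit design
print(comparator_uses_per_ui(4, 3))       # proposed 4x, 3-level design
```

These evaluate to 62, 21, and 8 respectively, which agrees with the values quoted for the previous works and for the proposed architecture.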

3.1. Proposed Full Rate Architecture

In the full rate architecture shown in Figure 3.1, the receiver input is first equalized by

a CTLE, to allow for channels with higher losses. The CTLE design [32] (produced by Clifford Ting, a former MASc student in the group) is a source-degenerated pair, with

tunable one-hot coded resistors. The eye diagrams shown above the system correspond

to a behavioural simulation from a channel with 18dB attenuation at Nyquist. It can be

seen that before the CTLE, the eye is completely closed, and the CTLE opens the eye

by 0.1 UI.

[Figure 3.1 block diagram: 5Gb/s RX input, CTLE/VGA, POS (+α) and NEG (-α) comparator pair with offset calibration engines, 20 GHz CLK divided by 4 to 5 GHz, and Digital CDR with DOUT and CS Flag outputs. Eye-diagram panels: Eye Before CTLE; Eye After CTLE (0.1 UI); Composite Eye (0.45 UI).]

Figure 3.1: A full-rate system block diagram of the proposed speculative ADC-based CDR.


This equalized data is then sampled by two comparators, one with a positive offset (POS)

and one with a negative offset (NEG). The two comparators sample the 5 Gb/s data at

20GSa/s, equivalently 4 samples per UI per comparator. A composite eye after the DFE

is generated by combining the speculative outputs of the comparators. The composite

eye is opened even further to 0.45 UI.

Figure 3.2 shows a behavioural simulation of a 5Gb/s input to the receiver from a channel

with a 7dB attenuation at Nyquist.

[Figure 3.2 plot: Analog Input to CDR for a 7dB Loss at Nyquist Channel; x-axis: time (ns); y-axis: Voltage (V).]

Figure 3.2: MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.

However, at this stage, it is not known what the previous bit was or which sample corresponds to the centre of the UI. Therefore, all of the samples must be sent to the Digital

CDR where this information can be processed. The full-rate Digital CDR is shown in

Figure 3.3.

In the full-rate architecture, there are 8 samples in one CDR clock cycle, 4 from each comparator path. The CDR also stores one sample from the next cycle. This is vital for phase detection, where the entire UI must be represented by 5 samples, the 5th sample being the first of the following UI. Note that, for the system to be causal, a delay is added to the four current samples so that the 5th sample in the UI window is available. This data is then used to extract the phase information from the samples, and to make a data decision for each UI window. The data decision block simply chooses, from each path, the sample that is closest to the centre of the UI via the estimate φPICK. Two speculative bits are thus extracted from the data decision for one UI; the correct speculative bit is then chosen in the DFE Decision block.

[Figure 3.3 block diagram: FR Digital CDR clocked by the 5 GHz CLK; inputs S[1:4]P and S[1:4]N; blocks: Arrange Data (S[1:5]P, S[1:5]N), PD (POS, NEG), DD (POS, NEG), DFE Decisions, LF, Cycle Slip Detection; signals: ΦxP, ΦxN, Φx, ΦAVG, bP, bN, bPREV; outputs: DOUT and CS Flag.]

Figure 3.3: Full-rate Digital CDR.

3.2. Front-End Architecture

3.2.1. Front-End Speculative DFE

The equalization of the proposed design is achieved on the analog end by adjusting the DC offset of the comparators sampling the data. For the purpose of the DFE, this is done in pairs of comparators whose offsets are symmetric about the common mode. The offset is chosen to match the magnitude of the first post-cursor ISI and is defined as α, with one comparator having a positive differential offset and its complementary comparator having a negative one. Figure 3.4 shows the pulse response at the output of a channel with 7dB loss at Nyquist. In this case, the magnitude of the differential offset would correspond to the voltage at the first post-cursor.

[Figure 3.4 plot: Pulse Response for a 7dB Loss at Nyquist Channel; x-axis: time (ns); y-axis: Voltage (V); annotations: h0, h0 = |α|.]

Figure 3.4: Pulse Response of a 7dB Channel at Nyquist.

For instance, if the previous bit was a 1, the previous bit’s pulse response would introduce

an α amount of undesired positive ISI at the precise centre sampling point of the current

bit. In this case, by sampling the current bit using the comparator with the positive

offset, we are effectively cancelling the first post-cursor ISI. Similarly, if the previous bit

is 0, we will use the comparator with a negative offset to sample the current bit.
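The selection rule above can be illustrated with a minimal sketch. It assumes an idealized comparator that outputs 1 whenever the differential sample exceeds its threshold; the function names and the millivolt values are hypothetical. The second function is a conventional 1-tap DFE, included only to show that choosing between the two offset comparators gives the same decision as subtracting the ISI of the previous bit.

```python
def speculative_dfe_bit(sample, prev_bit, alpha):
    """Bit decision for one centre sample using the POS/NEG comparator pair.

    Hypothetical model: each comparator outputs 1 when the sample exceeds its threshold.
    """
    pos_out = 1 if sample > +alpha else 0  # POS comparator, threshold +alpha
    neg_out = 1 if sample > -alpha else 0  # NEG comparator, threshold -alpha
    # Previous bit 1 adds +alpha of ISI, so the POS output is the valid one.
    return pos_out if prev_bit == 1 else neg_out


def direct_dfe_bit(sample, prev_bit, alpha):
    """Conventional 1-tap DFE: subtract the ISI of the previous bit, then slice at zero."""
    isi = alpha if prev_bit == 1 else -alpha
    return 1 if (sample - isi) > 0 else 0


# Example with samples in millivolts and alpha = 50 mV (illustrative values only).
for sample in (-120, -30, 30, 120):
    for prev_bit in (0, 1):
        print(sample, prev_bit, speculative_dfe_bit(sample, prev_bit, 50))
```

Both comparator outputs are produced for every sample, so the choice between them can be deferred until the previous bit is known.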

The only issue is that in the implemented system the previous bit is unknown at the comparator output stage. If the decision feedback were to be implemented at full-rate, a flip-flop would have to be designed to operate at 5Gb/s. The proposed system relaxes this timing requirement by storing the outputs of both comparators, thus acquiring both cases, and feeding the DeMUXed data forward to the digital CDR. In the digital CDR, a finite state machine (FSM) determines the correct comparator output to be chosen.

Figure 3.5 depicts the analog and digital inputs for the same waveform snippet as in

Figure 3.2.

It is important to observe the differences between the POS comparator waveform and the NEG comparator waveform, specifically how the window length for a given bit differs between the two; this difference is even more pronounced in the presence of jitter. When the decision scheme has to decide between two samples that are opposite, selecting the correct comparator output is crucial; this is the operation of the DFE.

3.2.2. Comparator Architecture

The implementation of the comparator is shown in Figure 3.6. This comparator (designed by Sadegh Jalali, a Ph.D. student in the group) is a strong-arm latch with offset cancellation transistors connected to both sides.

There are three offset transistors per side, which are binary weighted. This comparator architecture is capable of tolerating a total offset of 250mV in either direction. However, this would correspond to the absolute worst case with regard to mismatch. The offset transistors all have an identical off-chip bias, allowing for tunability. Therefore, the resolution of the offset transistors corresponds directly to the worst case total offset of any comparator.

The offset can be calibrated either manually or automatically by a small finite state machine, via the control signal calib. During calibration, the two input pairs are forced to have the same input, so that the differential input is zero. The offset due to mismatch is then calibrated. During normal operation, the input pair is forced to the differential offset of α, corresponding to the first post-cursor ISI. The differential α is directly connected to two input pins on the chip. Therefore, it can be easily tuned with a DC-voltage supply.

Figure 3.5: MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.

[Figure 3.6 schematic: clocked comparator with input pairs (VinP, VinM, RefP, RefM), positive offset cancellation (OSP2, OSP1, OSP0), negative offset cancellation (OSN2, OSN1, OSN0), reset and positive feedback, calib switches, Vbias, and outputs VoutP and VoutM.]

Figure 3.6: The comparator has offset cancellation in order to have a tuneable offset.

3.3. Phase Detection

The purpose of the phase detection block is to detect the zero-crossings in each of the

UIs. For this full-rate system with an oversampling rate of 4, the UI window consists

of five consecutive samples as seen in Figure 3.7. The fifth sample is the first sample of

the next UI. To make the system causal, a delay is added such that the four incoming

samples belong to the previous CDR cycle, while the fifth sample is from the current


cycle.

[Figure 3.7 diagram: samples A, B, C, D, E spanning one UI window.]

Figure 3.7: The UI window of 5 consecutive samples.

The phase detection is done by XORing the adjacent binary samples of each path, as seen in Figure 3.8. If a transition exists, it is assigned a 3-bit weight corresponding to its position in the UI. However, this will produce two sets of zero crossings: one with respect to +α and another with respect to -α. In order to obtain the true zero crossing with respect to the common mode, the complementary phases (φx^POS and φx^NEG) of the two paths are averaged.
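The XOR-and-average scheme described above can be sketched as follows. This is a minimal illustration, not the implemented logic: the weights 1/8, 3/8, 5/8, and 7/8 are those of the phase detector slice, while the handling of a window with no transition or with several transitions is an assumption of this sketch, and the function names are hypothetical.

```python
WEIGHTS = (1 / 8, 3 / 8, 5 / 8, 7 / 8)  # transition positions as fractions of a UI


def pd_slice(samples):
    """Phase estimate for one path from 5 consecutive binary samples (A..E).

    Returns the weight of the first transition found, or None if there is none.
    Taking the first transition is an assumption of this sketch.
    """
    for k in range(4):
        if samples[k] ^ samples[k + 1]:  # XOR of adjacent samples flags a transition
            return WEIGHTS[k]
    return None


def phase_estimate(pos_samples, neg_samples):
    """Average the POS and NEG estimates to approximate the true zero crossing."""
    p, n = pd_slice(pos_samples), pd_slice(neg_samples)
    if p is None or n is None:
        return None  # assumption: skip the UI unless both paths see a transition
    return (p + n) / 2


# Rising edge: the -alpha threshold is crossed early, the +alpha threshold late.
print(phase_estimate([0, 0, 0, 1, 1], [0, 1, 1, 1, 1]))
```

In this example the NEG path reports 1/8 and the POS path reports 5/8, so the averaged estimate is 3/8 of a UI.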

[Figure 3.8 diagram: phase detector slice; adjacent samples A, B, C, D, E are XORed in pairs and weighted 1/8, 3/8, 5/8, 7/8 to produce Φx for the POS or NEG path.]

Figure 3.8: Phase estimation is done by determining the transition point in the UI.


This is shown in Figure 3.9, where φx^POS and φx^NEG from the POS and NEG paths are summed and divided by two to produce φx. The average phase from the filter is then subtracted from φx to produce φERR. The φERR is averaged by the filter to provide an estimate of the average zero crossing, φAVG, as seen in Figure 3.10. Ideally, as in any closed-loop system, the absolute φERR is minimized, and its value provides an observation point for debugging tracking issues. The filter is third order, with the ability to track frequency ramps.

[Figure 3.9 diagram: POS PD slice and NEG PD slice (weights 1/8, 3/8, 5/8, 7/8) producing Φx^POS and Φx^NEG, which are summed and scaled by ½ to give Φx.]

Figure 3.9: The phase estimate from the POS and NEG sets is averaged to obtain the true zero-crossing. This is the operation of the system’s full-rate phase detector.

Through behavioural simulations, we observed that the proposed PD operates just as effectively as a phase detection scheme whose samples are the outputs of zero-offset comparators. Note that the zero-crossing PD is not the optimal PD, as the data is unequalized. However, as will be shown shortly, the proposed technique has significant benefits in opening the horizontal eye. Figure 3.11 shows the comparison between a zero-crossing PD that also XORs adjacent samples and the proposed PD. Both of these phase detection systems operate on samples that have been oversampled by 4.


[Figure 3.10 block diagram: labels Phase Detector, Loop Filter, FF, K1, K2, K3, φx, φERR, φAVG.]

Figure 3.10: The phase estimate from the phase detector, φx, is subtracted by the loop filter average phase, φAVG, to produce φERR, which is fed back to the loop filter.

[Figure 3.11 plot: Phase Detection of Zero Crossing PD vs. Proposed PD; x-axis: CDR Clock Cycle; y-axis: φAVG (UI = 1024); curves: Proposed PD, Zero-Crossing PD; annotation: Frequency Offset = 20ppm.]

Figure 3.11: Behavioural simulation showing the comparison of a zero-crossing PD scheme and the proposed one.

3.4. Data Decision

The 8 samples from the POS and NEG paths (4 samples each) are passed to the Data Decision along with φAVG. The first sample from the following UI is not passed, as it is only needed for phase detection. Since φAVG estimates the average zero crossing, the average UI centre is defined as φPICK, where φPICK = φAVG + 0.5UI.


The data decision scheme is straightforward: the sign of the sample in the UI window closest to φPICK is chosen as the bit of that UI window. Figure 3.12 illustrates the selection process. Importantly, the selection of the centre sample in each UI window is done for both the POS set of 4 samples and the NEG set. Each of these two paths will independently provide a speculative bit, based on φPICK. Then the correct bit, either bP or bN, is selected based on the bit from the previous cycle, bprevious. If bprevious is 1 then b = bP, otherwise b = bN.
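The selection process can be sketched in a few lines. This is an illustration under stated assumptions rather than the implemented logic: phases are integers with 1 UI = 1024 units, samples A to D are taken to sit at 0, 1/4, 1/2, and 3/4 of the UI window, φPICK wraps modulo one UI, and both A bins at the window edges map to sample A. The function names are hypothetical.

```python
UI = 1024  # phase units per UI, as on the phase axes used in this chapter


def pick_index(phi_avg):
    """Index (0..3 for samples A..D) of the sample nearest to phi_pick."""
    phi_pick = (phi_avg + UI // 2) % UI             # phi_pick = phi_avg + 0.5 UI
    return ((phi_pick + UI // 8) // (UI // 4)) % 4  # round to the nearest quarter UI


def data_decision(pos, neg, phi_avg, b_previous):
    """Pick the speculative bits bP and bN, then resolve them with the previous bit."""
    i = pick_index(phi_avg)
    b_p, b_n = pos[i], neg[i]
    return b_p if b_previous == 1 else b_n


# Zero crossings at the window edge (phi_avg = 0) place phi_pick on sample C.
print(pick_index(0))
print(data_decision([0, 0, 1, 1], [1, 1, 1, 1], phi_avg=0, b_previous=1))
```

Keeping the index computation separate from the final selection mirrors the text: the same φPICK addresses both paths, and only the last step depends on the previous bit.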

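[Figure 3.12 diagram: one UI window on a phase axis from 0 to 1024, divided into sample bins A, B, C, D, A, with the position of φPICK marked.]

Figure 3.12: The selection of the correct sample is based on the position of φPICK.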

As indicated above, the DFE selection process for the full-rate system is a simple MUX,

which is controlled by the bit from the previous CDR cycle. However, the complete

system is not full-rate and is a DEMUXed system, which will be presented in the next

section.

3.5. Complete System Architecture

The complete system architecture is shown in Figure 3.13. The input 5Gb/s data is sampled by 8 comparator pairs. The sampling clocks are generated by a clock divider, which divides a 10GHz external clock into eight 2.5GHz phases (one phase per comparator pair). The timing between consecutive phases is 50ps, giving a total sampling rate of 20GSa/s. In one cycle of the 2.5GHz clock, the comparator chain obtains 8 sample pairs, corresponding to 2 UIs. These samples are then DEMUXed by a factor of 4, producing 32 POS samples and 32 NEG samples which correspond to 8 UI windows. These samples are then fed to the digital CDR, shown in Figure 3.14, which is clocked at a divided-by-four rate of one of the clock phases. The digital implementation of the CDR is just a scaled-rate version of the full-rate blocks. The phase detector and the data decision for the complete system have 8 slices of the full-rate versions, the filter remains identical for both cases, and the DFE Decisions becomes a chain of MUXes instead of a single one (as illustrated in Section 3.4).
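The rates quoted in this paragraph are mutually consistent, which a short calculation makes explicit. The variable names below are chosen for this sketch only; the input numbers are the ones stated in the text.

```python
DATA_RATE = 5e9      # b/s
PHASE_CLOCK = 2.5e9  # Hz, frequency of each of the 8 sampling phases
NUM_PHASES = 8       # one phase per comparator pair
DEMUX_FACTOR = 4

sampling_rate = PHASE_CLOCK * NUM_PHASES              # aggregate samples per second
phase_spacing = 1 / sampling_rate                     # seconds between consecutive phases
samples_per_ui = sampling_rate / DATA_RATE            # oversampling rate
uis_per_phase_cycle = NUM_PHASES / samples_per_ui     # UIs covered by one 2.5 GHz cycle
samples_per_cdr_cycle = NUM_PHASES * DEMUX_FACTOR     # per path (POS or NEG)
uis_per_cdr_cycle = samples_per_cdr_cycle / samples_per_ui
cdr_clock = PHASE_CLOCK / DEMUX_FACTOR                # digital CDR clock

print(f"{sampling_rate / 1e9:.0f} GSa/s, {phase_spacing * 1e12:.0f} ps spacing")
print(f"{samples_per_ui:.0f} samples/UI, {uis_per_phase_cycle:.0f} UIs per 2.5 GHz cycle")
print(f"{samples_per_cdr_cycle} samples and {uis_per_cdr_cycle:.0f} UIs per CDR cycle")
print(f"CDR clock: {cdr_clock / 1e6:.0f} MHz")
```

The result is 20 GSa/s with 50 ps phase spacing, 4 samples per UI, 32 samples (8 UI windows) per path in each CDR cycle, and a 625 MHz CDR clock.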

[Figure 3.13 block diagram: 5Gb/s RX input, CTLE/VGA, 8 comparator pairs (POS, +α; NEG, -α) with offset calibration engines, clock divider from the 10 GHz CLK to 8 phases at 2.5 GHz and a 625 MHz CDR clock, two 8:32 DEMUXes (S[1:8]P to S[1:32]P and S[1:8]N to S[1:32]N), and Digital CDR with DOUT and CS Flag outputs.]

Figure 3.13: The complete system architecture.

In the complete system, the Data Decision block detects 8 bit pairs per CDR cycle. These eight bit pairs are then passed on from the Data Decision to the subsequent block, DFE Decision. This block is just an expanded version of the full-rate DFE, a chain of MUXes shown in Figure 3.15. Each MUX chooses between the POS bit and the NEG bit based on the previous bit. Specifically, if the previous bit was a 1 the POS bit is chosen, and the NEG bit is chosen if it was a 0. The previous bit simply corresponds to the output of the previous MUX, so the outputs of the chain are the final bit decisions for the 8 UI windows. The very first MUX is controlled by the final bit of the previous CDR cycle, as in the full-rate case.

[Figure 3.14 block diagram: Digital CDR clocked by the 625 MHz CLK; inputs S[1:32]P and S[1:32]N; blocks: Arrange Data (S[1:33]P, S[1:33]N), PD (POS, NEG), DD (POS, NEG), DFE Decisions, LF, Cycle Slip Detection; signals: Φx[1:8]P, Φx[1:8]N, Φx[1:8], ΦAVG, b[1:8]P, b[1:8]N, b[8]; outputs: DOUT and CS Flag.]

Figure 3.14: The complete digital CDR architecture.
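The MUX chain described above amounts to a short sequential selection. The following sketch is only a behavioural illustration with hypothetical names; it takes the 8 speculative POS bits, the 8 speculative NEG bits, and the last resolved bit of the previous CDR cycle.

```python
def dfe_mux_chain(b_pos, b_neg, prev_b8):
    """Resolve speculative bit pairs into final bits, one MUX per UI window."""
    resolved = []
    prev = prev_b8  # the first MUX is steered by the last bit of the previous cycle
    for bp, bn in zip(b_pos, b_neg):
        prev = bp if prev == 1 else bn  # previous bit 1 selects POS, 0 selects NEG
        resolved.append(prev)
    return resolved


b_pos = [1, 0, 1, 1, 0, 0, 1, 0]
b_neg = [1, 1, 1, 0, 1, 0, 1, 1]
print(dfe_mux_chain(b_pos, b_neg, prev_b8=0))
```

Because each selection depends on the previous output, the chain is evaluated in order within a CDR cycle, and its last element becomes the control bit for the first MUX of the next cycle.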


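[Figure 3.15 diagram: chain of MUXes; each MUX selects between P(bk) and N(bk); the first is controlled by Prev_b8 and each later one by the previous output; the outputs b[1:8] go to the PRBS comparator.]

Figure 3.15: The selection of the correct set of samples is done with a chain of MUXes.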

3.5.1. Cycle Slip

The digital CDR obtains 32 sample pairs corresponding to 8 UI windows. The 8 UI

windows will detect 8 bits if there is no frequency offset between the sampling clock and

the data. However, if there is a frequency offset, then the system must be able to account

for the following two cases:

1. If the received data rate is faster than the blind clock, this implies that φPICK is

increasing, and a bit must be dropped when φPICK transitions from bin D to A.

2. If the received data rate is slower than the blind clock, this implies that φPICK is

decreasing, and a bit must be added when φPICK transitions from bin A to D.

The boundary conditions of the two cases are illustrated in Figures 3.16 and 3.17 respectively. The two figures show the last two UI windows of the previous cycle as well as the first two of the current cycle. In Figure 3.16, the data frequency is faster than the clock, meaning that the samples arrive earlier and earlier with respect to their clock edge. This means that φAVG and φPICK are both increasing (shifting to the right) with respect to that sample. This happens until the condition is reached where φPICK rolls over from bin D to bin A. In this case there would be a duplicate bit, as can be seen from the figure. Therefore, the first bit must be removed.

[Figure 3.16 timing diagram: samples S25 to S32 of CDR cycle (n-1) and S1 to S10 of CDR cycle (n), with UI windows, sample bins A B C D A, and the positions of φPICK(n-1) and φPICK(n).]

Figure 3.16: Cycle Slip; data rate is faster than blind sampling clock.

[Figure 3.17 timing diagram: samples S25 to S32 of CDR cycle (n-1) and S1 to S10 of CDR cycle (n), with sample bins A B C D A and the positions of φPICK(n-1) and φPICK(n).]

Figure 3.17: Cycle Slip; data rate is slower than blind sampling clock.

In Figure 3.17, the data frequency is slower than the clock, meaning that the samples

arrive later and later with respect to their clock edge. This means that φAVG and φPICK

are both decreasing (shifting to the left) with respect to that sample. This happens until

the condition is reached where φPICK rolls over from bin A to bin D. In this case, there

would be a missing bit as can be seen from the figure. This missing bit is added by taking

the sample from bin A.

In any CDR cycle, the 8 UI windows can therefore yield 7, 8, or 9 bits.

This information is represented via the CS Flag signal.
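The two rollover cases and the resulting bit count can be summarized in a small sketch. It is a simplification with hypothetical names: it only looks at the bin of φPICK before and after an update and ignores where within the cycle the rollover occurs.

```python
def bits_in_cycle(prev_bin, curr_bin):
    """Number of bits produced by the 8 UI windows of one CDR cycle.

    prev_bin and curr_bin are the bins ('A'..'D') of phi_pick before and
    after the update; this two-argument form is an assumption of the sketch.
    """
    if prev_bin == "D" and curr_bin == "A":
        return 7  # phi_pick rolled over D -> A: the duplicate bit is dropped
    if prev_bin == "A" and curr_bin == "D":
        return 9  # phi_pick rolled over A -> D: the missing bit is added
    return 8      # no rollover: one bit per UI window


for transition in (("C", "D"), ("D", "A"), ("A", "D")):
    print(transition, bits_in_cycle(*transition))
```

A value of this kind is what a flag such as CS Flag would need to convey to the downstream logic.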

We present the simulation and measurement results in the next chapter.

4 Simulations and Measurement

This chapter first describes the behavioural simulations of the entire transceiver model, followed by the measurement process and the measured results of the chip.

4.1. Behavioural Simulations

The entire transmitter-receiver system was modelled in Simulink, shown in Figure 4.1, using an event-driven model [33]. The main benefit of using the event-driven model is its high speed, which allows more UIs to be processed compared to the conventional simulation technique. The transmitter sends PRBS-7 or PRBS-31 data at a rate of 5Gb/s. This data is then fed through the channel model, which is a time domain model of the step response. The channel model step response also takes into account the aperture window of our sampling function. The ISI depth is 90 UIs including precursors and postcursors. The output of the channel model is then sampled by 8 comparator pairs, each triggered by one phase of an 8-phase 2.5GHz clock. Finally, this sampled data is DEMUXed and given to the model of the proposed CDR for processing. Verification of the CDR’s detected bits is done by a model of a BERT with a 32 bit long FIFO. The simulation is done for 10^6 bits, where the first 2.5 × 10^4 bits are not counted towards the bit error count. Those initial bits are ignored in order to give the loop filter time to achieve lock. Jitter is added on the transmitter clock; both random jitter (RJ) and sinusoidal jitter (SJ) are included in simulations. The frequency offset between the transmitter and receiver, ΔfTX−RX, is distributed symmetrically between the transmitter and receiver clocks.
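The pattern source and the error counting with an initial lock period can be sketched as below. This is not the Simulink model: the PRBS-7 generator assumes the common polynomial x^7 + x^6 + 1, which the text does not specify, and the function names and the short sequence lengths are chosen only for illustration.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS-7 sequence (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F  # 7-bit LFSR state, must be non-zero
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def count_errors(tx_bits, rx_bits, skip):
    """Count bit errors, ignoring the first `skip` bits while the loop locks."""
    return sum(t != r for t, r in zip(tx_bits[skip:], rx_bits[skip:]))


tx = prbs7(1000)
rx = list(tx)
rx[10] ^= 1   # an error during the ignored lock period
rx[500] ^= 1  # an error after lock
print(count_errors(tx, rx, skip=100))
```

In the simulations described above, the ignored prefix would be the first 2.5 × 10^4 of 10^6 bits; the example uses much shorter sequences so that it runs instantly.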

[Figure 4.1 block diagram: PRBS Generator, Channel Model, 8x Comparator Pairs, 8:32 DeMUX, Digital CDR (7..9 bits per cycle), and BERT (Error Count); 5 GHz TX Clock with Sinusoidal Jitter Source; 5 GHz RX Clock with ÷2 and ÷8 dividers in the receiver.]

Figure 4.1: Simulink Model of the entire system.

The channel models used for the Simulink behavioural simulations are shown in Table

4.1, below. Channels 1 and 2 were obtained from the channel measurements that were

done by Shayan Shahramian, a former member of the group. These two channels are the

measurements from two daughtercards on a backplane.

Name        Description         Loss at 2.5 GHz (dB)
Channel 1   8” FR4 Tyco         6.9
Channel 2   24” FR4 Tyco        10.7
Channel 3   Simulated Channel   18

Table 4.1: Channel models used for simulation.

The S-parameters were obtained; then, a transfer function was fitted to the S21 parameter, and a step response was correspondingly generated to be used for the system’s channel model. These two channels exhibit 7dB and 11dB loss at the Nyquist frequency. Additionally, higher attenuation channels were measured. However, obtaining a fitted transfer function proved to be difficult due to several notches being present in the frequency response. Thus, Channel 3 was generated manually to simulate higher attenuation channels that require the use of the CTLE.

4.1.1. DFE Performance

The main metric in assessing the performance of the DFE is the CDR’s tolerance to

input jitter. Channel 1 served as an initial stepping-stone to verify the functionality of

the CDR as well as its sensitivity to the DFE coefficient, which in the proposed system

corresponds to the comparator offset, α. Figure 4.2 shows the eye from Channel 1 going

into the receiver. As can be seen the eye has a significant horizontal and vertical eye

opening of 0.74UI and 360mV.

As would be expected for such a case, the DFE weight does not play a significant role in increasing the eye opening of an already open eye. This can be observed in the reconstructed eye, after the DFE, corresponding to an optimal offset of α = 50mV. The 0.1UI increase in the horizontal eye opening can directly be observed in the jitter tolerance simulations. The optimal offset was found by sweeping α in 25mV increments in order to obtain the optimal jitter tolerance curve, as can be seen in Figure 4.4. The high frequency jitter tolerance increases as α is increased, until the optimal point of α = 50mV is reached. Beyond this point, any further increase in α causes a decrease in the high frequency jitter tolerance, as the data is being over-equalized.

One important observation from the eye diagrams and the jitter tolerance curve is that the jitter tolerance is mostly dependent on the horizontal eye opening. This is the case because the data decision just takes the sign of the sample that is nearest to the estimate of the UI centre, φPICK. The key determinant of how well the CDR performs in the presence of jitter is the phase detector. The vertical information only comes into effect when the phase estimate is near the edge of the eye, where the likelihood of the speculative samples being opposites is high. In this case, the DFE opens the eye if we select the correct speculative bit.

The effect of the DFE is more visible for a channel with more loss, as is the case with Channel 2. In the case of Channel 2, the increased attenuation is clearly visible by inspection of the eye opening, as seen in Figure 4.3. With no DFE, α = 0mV, the eye is almost completely closed. With the DFE enabled, the horizontal eye opening has a much more significant increase relative to the case in Channel 1.

Figure 4.2: Simulated eye diagrams for Channel 1, before and after DFE.

Figure 4.3: Simulated eye diagrams for Channel 2, before and after DFE.

In Figure 4.5, the jitter tolerance curves obtained by sweeping the comparator offset show that the optimal offset is somewhere between 50mV and 75mV. With the DFE disabled (when α = 0mV), the CDR is on the brink of breaking when tracking jitter.

Though the CTLE opens the eyes of both Channel 1 and Channel 2 even further, the

benefits of having a CTLE enabled are best observed for a channel with even more at-

tenuation. Channel 3 has 18dB loss at Nyquist frequency, and from Figure 4.6 it can be

seen that the eye prior to the CTLE is completely closed.


[Figure 4.4 plot: x-axis: Frequency (Hz), 1E+05 to 1E+10; y-axis: Jitter Tolerance (UI pp); curves: Offset = 0, 25mV, 50mV, 75mV, 100mV; Data Type: PRBS-7; Target BER: 10^-6; ΔfTX-RX = 200ppm.]

Figure 4.4: Jitter tolerance curves for Channel 1 vs. α.

[Figure 4.5 plot: x-axis: Frequency (Hz), 1E+05 to 1E+10; y-axis: Jitter Tolerance (UI pp); curves: Offset = 0, 25mV, 50mV, 75mV, 100mV; ΔfTX-RX = 200ppm.]

Figure 4.5: Jitter tolerance curves for Channel 2 vs. α.


Figure 4.6: Simulated eye diagrams of Channel 3, before CTLE, after CTLE, and after DFE.

The CTLE opens up the eye a little bit; however, more equalization is needed. The CTLE frequency response was obtained from Cadence circuit simulations and exported to MATLAB. Then, to get the equivalent channel model plus CTLE, the two impulse responses were convolved. The effect of the DFE is well illustrated on the input data, which is already equalized by the CTLE. The eye is opened by almost 0.45UIpp, for α = 100mV. Figure 4.7 shows the CDR’s jitter tolerance for Channel 3 with the CTLE enabled. The figure confirms that with the DFE disabled, or even with α < 50mV, the CDR cannot obtain error-free data. However, once the DFE coefficient is increased to α = 100mV, the CDR is able to recover error-free data for a high frequency jitter of 0.1UIpp.
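The step of combining the channel and the CTLE by convolving their impulse responses can be illustrated with a discrete-time sketch. The thesis performed this in MATLAB; the pure-Python function below and the short impulse responses are hypothetical and assume both responses are sampled with the same time step (any scaling by that time step is omitted).

```python
def convolve(a, b):
    """Full discrete convolution of two sampled impulse responses."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Illustrative values only, not taken from the measured channel or the CTLE.
channel_ir = [0.1, 0.5, 0.3, 0.1]  # low-pass-like channel
ctle_ir = [1.5, -0.5]              # high-frequency boost
print(convolve(channel_ir, ctle_ir))
```

The combined response has length len(a) + len(b) - 1 and can then be used in place of the channel response alone.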

[Figure 4.7 plot: x-axis: Frequency (Hz), 1E+05 to 1E+10; y-axis: Jitter Tolerance (UI pp); curves: Offset = 50mV, 100mV, 125mV, 150mV; ΔfTX-RX = 200ppm; CTLE Enabled.]

Figure 4.7: Jitter tolerance curves for Channel 3 vs. α.

4.1.2. Effect of Comparator Offset on Jitter Tolerance

To simulate the effect of non-ideal comparator offsets due to mismatch, each of the comparators was assigned a random offset from a Gaussian distribution with a mean of 0mV and a specified standard deviation. This was done initially to give insight into the required precision of the offset calibration blocks. Figure 4.8 shows the CDR’s jitter tolerance for Channel 1 for a sweep of standard deviation values, with the optimal α for Channel 1 of 50mV. It can be seen that the CDR is able to recover error-free data for a standard deviation of 40mV but cannot operate at all for a standard deviation of 60mV.
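The way mismatch enters such a simulation can be sketched as follows. This is an assumed formulation with hypothetical names: each of the 16 comparators (8 pairs) receives an independent zero-mean Gaussian offset that is added to its nominal threshold of +α or -α.

```python
import random


def comparator_thresholds(alpha_mv, sigma_mv, num_pairs=8, seed=None):
    """Return (POS, NEG) thresholds in mV with random Gaussian mismatch added."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    pairs = []
    for _ in range(num_pairs):
        pos = +alpha_mv + rng.gauss(0.0, sigma_mv)  # POS comparator threshold
        neg = -alpha_mv + rng.gauss(0.0, sigma_mv)  # NEG comparator threshold
        pairs.append((pos, neg))
    return pairs


for pos, neg in comparator_thresholds(alpha_mv=50.0, sigma_mv=20.0, seed=0):
    print(f"POS {pos:7.1f} mV   NEG {neg:7.1f} mV")
```

Sweeping sigma_mv and repeating the jitter tolerance simulation for each value would correspond to the sweep of standard deviations described above.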

[Figure 4.8 plot: x-axis: Frequency (Hz), 1E+05 to 1E+10; y-axis: Jitter Tolerance (UI pp); curves: Sigma = 0, 20mV, 40mV, 60mV; ΔfTX-RX = 200ppm; Offset = 50mV.]

Figure 4.8: Jitter tolerance curves for Channel 1 vs. standard deviation of comparator offset.

The effect of non-ideal comparator offsets was also simulated for Channel 2, to observe this impact for lossier channels. Figure 4.9 shows the CDR’s jitter tolerance for Channel 2 for a sweep of standard deviation values, with an optimal α for Channel 2 of 60 mV. As can be seen, the CDR is able to tolerate even less comparator offset for a lossier channel. In fact, for Channel 2, the CDR is able to recover error-free data in the presence of high frequency jitter only for an offset standard deviation of 20 mV or less. Unlike in the case of Channel 1, it cannot operate for a standard deviation of 40 mV. These simulations provided an initial estimate that the precision of the offset calibration needs to be better than 20 mV.


[Figure 4.9 plot: x-axis: Frequency (Hz), 1E+05 to 1E+10; y-axis: Jitter Tolerance (UI pp); curves: Sigma = 0, 20mV, 40mV, 60mV; ΔfTX-RX = 200ppm; Offset = 50mV.]

Figure 4.9: Jitter tolerance curves for Channel 2 vs. standard deviation of comparator offset.

4.2. Experimental Results

This section will present the measured results of the fabricated test chip. First the re-

ceiver layout and testing environment will be shown followed by the measured results

of CDR operation. Finally, the power dissipation of the chip will be assessed against

previous works.

4.2.1. Receiver Layout and equipment setup

The chip was fabricated in Fujitsu’s 65nm technology. The test chip consists of the digital CDR, a clock divider, test registers, and an on-chip BERT. The test setup overview is shown in Figure 4.10. The 5Gb/s data is generated with a PRBS-7 generator and passed through an FR-4 channel of various lengths (16”, 32”, 48”). The chip clock is derived from a 10 GHz source. An FPGA is used to control the test registers which contain the CDR’s settings. An on-chip BERT is used to verify the error count of the system. The error count is output from the chip and observed on the logic analyzer. Figure 4.11 shows the die photo along with the pin names.

[Figure 4.10 block diagram: PRBS-7 Generator with 5 GHz TX CLK, FR4 Channel, and the DUT (CTLE, Comp. Core, Calib., CLK Divider producing 8 2.5GHz CLKs from the 10 GHz RX CLK, ÷4, Digital CDR, PRBS Comparator with Error Count, Test Registers), controlled by an FPGA.]

Figure 4.10: Measurement setup.

Table 4.2 describes the purpose of each of the pins. The data pins are outputs from the

FPGA that write to the test registers on-chip, in order to control both the CDR and

CTLE settings. These same data[0:5] pins are used to manually write to each of the

test registers of the calibration block. The pin Prog_Calib controls both the input to

the comparators during calibration, as well as the data pins from the FPGA, enabling

writing to the calibration blocks rather than the digital CDR.

[Figure 4.11 die photo: 2mm x 2mm die with labelled pins (listed in Table 4.2). Labelled blocks (A to D): CLK Divider, ADC & DMX & Offset Calib., Digital CDR, CTLE. Block areas as listed: 65 x 100 μm2, 300 x 240 μm2, 240 x 165 μm2, 180 x 180 μm2. Process: 65nm CMOS; Data Rate: 5Gb/s; Supply: 1.2V.]

Figure 4.11: Chip die photo.

Pin Name                            Description
RX_in/RX_inx                        Differential Input Data
VrefP/VrefN                         Differential Comparator Offset, α
CLKp/CLKn                           Differential CLK Input
Vbias_Clkgen                        CLK Divider Bias
Vcm_Clkgen                          CLK Divider Common Mode
Vcm_CTLE                            CTLE Common Mode
Vbias_Eq, Vbias_Eq2a, Vbias_Eq2b    CTLE/VGA biases
VB                                  Offset Cancellation Transistor Bias
Rst_Cal                             Reset Calibration
Prog_Calib                          Enable Calibration
Rst_Dig                             Reset Digital
Data_Stb                            Data strobe for test register
Adr_Stb                             Address strobe for test register
Reg_CLK                             Clock for test registers
Data[0...5]                         Address/Data bits for test register
Dout[0...7]                         Output Data bits to Logic Analyzer
AVD_ADC/AVS_ADC                     ADC Core + DEMUX Power Supply
AVD_EQ/AVS_EQ                       Equalizer Power Supply
VDD_DIG/VSS_DIG                     Digital CDR Power Supply
VDD_IO/VSS_IO                       Input/Output Buffer Power Supply

Table 4.2: Pin description.

The measurement setup is shown in Figure 4.12. The equipment used is:

• Signal Generator 1: Agilent 83712B 10MHz-20GHz Synthesized CW Generator
• Signal Generator 2: Centellax TG1C1-A Clock Synthesizer
• PRBS Generator: Centellax TG1B1-A 10G PRBS
• Logic Analyzer: Tektronix TLA714 Logic Analyzer
• Narda 4346 180° Hybrid (2-18GHz)

The chip was measured with a probe card. An Agilent 83712B clock generator was used to generate a 10GHz single-ended clock, which was then converted to differential and passed to the chip as CLKp and CLKn. A Centellax TG1C1-A Clock Synthesizer was used to generate a single-ended 5GHz clock with 10kHz to 100MHz of sinusoidal jitter. The clock was then used to generate 5Gb/s PRBS data with jitter. This data was then sent through an FR4 channel to the chip as RX_in and RX_inx. An FPGA was used to program the test registers of the CDR and the Offset Calibration blocks. Finally, a Tektronix TLA714 Logic Analyzer was used to observe the error count from the on-chip BERT.


[Figure 4.12 diagram: Signal Gen. (1) and Signal Gen. (2) linked by a 10MHz reference (10MHzOUT to 10MHzIN); 5 GHz CLK with 1kHz-100MHz SJ into the PRBS Generator; 5 Gb/s PRBS through the FR4 Channel to RX_IN and RX_INX; 10 GHz single-ended CLK through the 180° Hybrid to differential CLKP and CLKN; Probe Card and DUT; DC Power Supplies (AVD_ADC, AVD_EQ, VDD_DIG, VDD_IO, Vcm_CTLE, Vcm_Clkgen, Vb, Vbias_Clkgen, Vbias_Eq, Vrefp, Vrefn); FPGA (Data_Stb, Adr_Stb, Prog_calib, Rst_cal, Rst_dig, Reg_CLK, Data[0...5]); Logic Analyzer on DOUT[0...7].]

Figure 4.12: Detailed equipment setup.

4.2.2. Offset Cancellation

The first step in the measurement process was to calibrate out the comparator offset due
to mismatch. Since multiple dies were available, each die had to be calibrated to cancel
its own offset. It was recognized early in the testing period that the automatic
calibration was not working. Debugging traced this to an unintentionally floating node in
the digital calibration blocks. However, manual calibration was still possible. The
floating node did not affect the functionality of the system but, as will be presented
later, it increased the total power consumption. When calibrating for the offset, the
logic analyzer was vital in determining the correct offset code for each comparator.
During calibration, the differential voltage applied to the comparator was set to zero by
setting VrefP and VrefN to the common-mode voltage. If there was a significant amount of
offset, the output would be a constant high or low. In the case of a small offset, the
comparator would sit on the edge between high and low, and its output would be seen
toggling on the logic analyzer. This observation proved to be the key indicator for
determining when the offset was minimized. Thus, as the current on either side of the
input pair was increased or decreased by adjusting the offset cancellation codes, the
output would begin toggling when the optimal code was set. Upon completion of the offset
calibration, we began evaluating the CDR’s performance.
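The manual calibration search described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the comparator is modelled as "high when residual offset plus Gaussian noise is positive", and the code range, the 2 mV step per code, and the 0.5 mV noise are hypothetical values chosen for illustration, not parameters of the chip. The search picks the code whose output is closest to toggling half of the time, which mirrors the "output begins toggling" criterion.

```python
import random

def fraction_high(residual_mv, noise_mv, n_samples, rng):
    """Fraction of comparator outputs that read high with zero differential input.

    Hypothetical model: the output is high when the residual offset plus
    Gaussian noise is positive.
    """
    highs = sum(1 for _ in range(n_samples)
                if residual_mv + rng.gauss(0.0, noise_mv) > 0.0)
    return highs / n_samples

def find_offset_code(offset_mv, codes, step_mv=2.0, noise_mv=0.5,
                     n_samples=2000, seed=0):
    """Return the cancellation code whose output toggles most evenly (closest to 50% high)."""
    rng = random.Random(seed)
    best_code, best_distance = None, None
    for code in codes:
        residual = offset_mv - code * step_mv  # offset left after correction
        distance = abs(fraction_high(residual, noise_mv, n_samples, rng) - 0.5)
        if best_distance is None or distance < best_distance:
            best_code, best_distance = code, distance
    return best_code

# Example: a comparator with a (hypothetical) 6 mV offset and codes -8..8.
print(find_offset_code(6.0, range(-8, 9)))
```

Codes that leave a large residual offset give an output stuck high or low (fraction near 1 or 0), while the code that cancels the offset gives a fraction near 0.5, which is what appears as toggling on the logic analyzer.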

4.2.3. Jitter Tolerance

The channels used for the CDR were FR4 traces of several lengths. Table 4.3 describes
the channels that were used for measurements. The corresponding eye diagrams for 5Gb/s
PRBS-7 data through the three channels are shown in Figure 4.13. It can be seen that for
Channel C, the eye is completely closed coming into the CDR.

Name        Description      Loss at 2.5 GHz (dB)
Channel A   16” FR4 Trace    5.9
Channel B   32” FR4 Trace    9.3
Channel C   48” FR4 Trace    12.9

Table 4.3: Channels used for measurements.
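To give a feel for the losses in Table 4.3, the short sketch below converts each loss figure into the fraction of signal amplitude remaining at 2.5 GHz. It assumes the usual voltage (20·log10) convention for channel loss, which the text does not state explicitly.

```python
def db_loss_to_amplitude_ratio(loss_db):
    """Convert a loss in dB to the remaining amplitude ratio (20*log10 convention assumed)."""
    return 10 ** (-loss_db / 20)

# Loss at 2.5 GHz for each measured channel, taken from Table 4.3.
channels = {"Channel A": 5.9, "Channel B": 9.3, "Channel C": 12.9}

for name, loss_db in channels.items():
    ratio = db_loss_to_amplitude_ratio(loss_db)
    print(f"{name}: {loss_db} dB -> {ratio:.2f} of the original amplitude")
```

Under this convention a 2.5 GHz component keeps roughly half of its amplitude through Channel A but under a quarter through Channel C, which is consistent with the closed eye observed for Channel C.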

The effect of α on the jitter tolerance is very noticeable for Channel B, whose attenuation
is 9.3 dB at 2.5GHz. Figure 4.14 shows the effect of various α values on the jitter
tolerance of the CDR for 5Gb/s PRBS-7 data, targeting a BER of 10^-12. The CDR operates
optimally for this channel when α = 60mV, achieving a high-frequency jitter tolerance of
0.39 UIPP. If the DFE is disabled, corresponding to α = 0mV, the jitter tolerance
worsens, as seen by the bumps in the corresponding curve. On the other hand, if the data
is over-equalized, the performance drops as well, as seen from the curve corresponding to
an α of 100mV.


Figure 4.13: Measured eye diagrams of the channels.


[Plot: Jitter Tolerance (UIpp, log scale from 0.01 to 10) versus Frequency (Hz, 10^4 to 10^8), with curves for α = 0mV, α = 60mV, and α = 100mV.]

Figure 4.14: Measured jitter tolerance vs. α for Channel B.

In the case of Channel B, the CTLE did not need to be enabled. However, as the channel
length is increased to 48” and the loss consequently increases to 12.9dB at the Nyquist
frequency, the need for a CTLE becomes apparent. Figure 4.15 shows the jitter tolerance
curves for different channel lengths. For Channel A, the CDR operates to the limit of our
jitter source, tolerating 0.63 UIPP of high-frequency jitter. For Channel C, the CDR
cannot operate with the CTLE disabled. With it enabled, and with α = 60mV, the CDR
tolerates up to 0.31 UIPP of high-frequency jitter.

The channels used for simulation were not available for measurement, so a direct
one-to-one comparison could not be made. Comparing the simulated performance for
Channel 2 (10.7dB loss at 2.5GHz) with the measured performance for Channel B (9.3dB
loss at 2.5GHz), the measured high-frequency jitter tolerance is higher than the
simulated one, 0.39 UIPP versus 0.21 UIPP. This higher jitter tolerance in the
measurement results can be attributed to the smaller channel attenuation.


[Plot: Jitter Tolerance (UIpp, log scale from 0.01 to 10) versus Frequency (Hz, 10^4 to 10^8), with curves for 16-inch FR4 with CTLE off, 48-inch FR4 with CTLE off, and 48-inch FR4 with CTLE on.]

Figure 4.15: Measured jitter tolerance vs. channel length.

4.2.4. Power Comparison

Reducing the power consumption of ADC-based CDRs was one of the main goals in this

work. The chip consisted of the following three power grids:

1. ADC core, DEMUX, CLK Divider, Offset Calibration Blocks

2. Equalizer

3. Digital CDR

The measured power of the digital CDR was 18 mW. Unexpectedly, the measured power of
the ADC grid was 45.6 mW. Comparing this with the ADC and CLK Divider power of the
3-bit ADC, 3x oversampling case [8], which was 52.8 mW, it was clear that this figure
was too high: the 3x design has 21 comparators per UI, versus 8 per UI in this work.
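The comparator counts quoted above follow from a simple product. The sketch below assumes one comparator per threshold (levels minus one) at each sampling phase; the function name is hypothetical and used only for illustration.

```python
def comparators_per_ui(levels, oversampling_ratio):
    """Comparators active per unit interval, assuming one comparator per threshold."""
    thresholds = levels - 1
    return thresholds * oversampling_ratio

ref8 = comparators_per_ui(levels=2 ** 3, oversampling_ratio=3)  # 3-bit ADC, 3x [8]
this_work = comparators_per_ui(levels=3, oversampling_ratio=4)   # 3-level ADC, 4x

print(ref8, this_work, round(this_work / ref8, 2))
```

With well under half as many comparators per UI, an ADC-grid power close to that of the 3x design was a strong hint that something other than the comparators was drawing current.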

Re-simulating the design revealed a bug in the offset calibration blocks. A node had been
left floating unintentionally, leading to a voltage of around 0.6V appearing on the input
of several gates. As a result, each calibration block drew a constant current, and the 8
blocks together burned 10 mW of power. The simulated ADC power with this bug included
was 43 mW. This value was used to obtain a realistic estimate of the measured power with
the bug fixed. The bug-free simulated ADC power was 33 mW; scaling this by the ratio of
the measured to simulated power of the design with the bug gave an estimated ADC and CLK
Divider power of 35.5 mW. The comparison of this work with previous blind ADC-based
CDRs is summarized in Table 4.4.
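The scaling step can be written out as a short calculation. This is a sketch using only the rounded figures quoted in the text (45.6 mW measured, 43 mW and 33 mW simulated); the variable names are illustrative. With these rounded inputs the result comes out close to 35 mW, slightly below the reported 35.5 mW. The difference presumably reflects unrounded simulation values, which is an assumption here rather than something the text states.

```python
# Rounded figures quoted in the text (all in mW).
measured_with_bug = 45.6    # measured ADC-grid power, floating node present
simulated_with_bug = 43.0   # simulated ADC-grid power, floating node present
simulated_bug_free = 33.0   # simulated ADC-grid power, floating node fixed

# Ratio of measured to simulated power for the same (buggy) design.
scale = measured_with_bug / simulated_with_bug

# Apply the same ratio to the bug-free simulation to estimate the measured power.
estimated_bug_free = simulated_bug_free * scale

print(f"scale = {scale:.3f}, estimate = {estimated_bug_free:.1f} mW")
```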

CDR                    Data Rate  Tech.  Channel Loss  ADC Power  Dig. Power  CLK Power  Total Power
                       (Gb/s)     (nm)   (dB)          (mW)       (mW)        (mW)       (mW)
[5]                    5          65     10            110        68.4        NA         178.4
[4]                    5          65     15            NA         NA          NA         280
[6]                    5          65     13.3          NA         57.6        NA         211.2
[7]                    10         65     10            109        111.6       NA         306
[8]                    5          65     5.8           38.4       42          14.4       94.8
This Work (CTLE Off)   5          65     9.7           *20.5      18          *15        *53.5
This Work (CTLE On)    5          65     12.9          *20.5      18          *15        *76.3

*The result reported is the estimate of the measured power as described in the text.

Table 4.4: Comparison of this work with previous blind ADC-based CDRs.

Compared to previous blind ADC-based CDRs, this work consumes almost half the power of
the most efficient alternative. Importantly, this reduction in power does not come at the
cost of the ability to tolerate high channel loss. The power reduction of this design is
illustrated graphically in Figure 4.16, which compares the estimated measured power of
this design with that of the most efficient alternative, the 3x case [8]. Both the digital
and ADC powers are roughly halved.
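The component-level comparison can be checked directly from the Table 4.4 entries. The sketch below uses only the numbers in the table; the dictionary layout and names are illustrative. The equalizer figure at the end is inferred as the difference between the two "This Work" totals, which is an assumption about how the CTLE-on total was formed.

```python
# Power in mW, taken from Table 4.4 (values for this work are estimates of measured power).
ref8 = {"ADC": 38.4, "Digital": 42.0, "CLK": 14.4}
this_work = {"ADC": 20.5, "Digital": 18.0, "CLK": 15.0}

total_ref8 = sum(ref8.values())
total_this = sum(this_work.values())

for block in ref8:
    print(f"{block}: {this_work[block] / ref8[block]:.2f} x the power in [8]")
print(f"Total: {total_this:.1f} mW vs {total_ref8:.1f} mW "
      f"({total_this / total_ref8:.2f} x)")

# Inferred equalizer power when the CTLE is enabled (76.3 mW total, from the table).
equalizer_mw = 76.3 - total_this
print(f"Inferred equalizer power: {equalizer_mw:.1f} mW")
```

The ADC and digital blocks each come out at roughly half of their counterparts in [8], while the clocking power is essentially unchanged, so the saving comes from the reduced ADC resolution and the simpler digital back end.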


[Bar chart: Power (mW, 0 to 45) for ADC Power, Digital Power, and CLK Power, comparing the 3x Blind CDR with This Work.]

Figure 4.16: Comparison of power between this work and blind ADC-based CDR with 3x oversampling [8].

5 Conclusion

This thesis presented a 4x oversampling, 3-level blind ADC-based CDR in which the input
is sampled by comparator pairs whose symmetric offsets are used for equalization. The
goal of this work was to reduce the overall power consumption of ADC-based CDRs without
sacrificing performance for channels with high loss.

This work incorporates a non-uniformly distributed 3-level ADC. The ADC is composed of
two comparators whose off-chip tunable thresholds are set to the first post-cursor ISI,
which is equivalent to implementing a speculative DFE. A 2-bit, 4-level ADC with a linear
distribution would be too coarse for operation as a DFE. Furthermore, this architecture
differs from conventional DFEs in that the clock phase is obtained without recovering a
clock; rather, the clock phase is estimated in a digital filter, as is done in blind
ADC-based CDR architectures.
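The equivalence between a comparator pair with thresholds at ±α and a 1-tap speculative DFE can be sketched as follows. This is a simplified baud-rate illustration, not the thesis implementation: it ignores the 4x oversampling and the digital phase estimation, and the channel model with main cursor h0 and first post-cursor h1 uses hypothetical values. Both comparators decide every sample, and the previous decision selects which result to keep.

```python
def channel(bits, h0=1.0, h1=0.6, prev_symbol=-1.0):
    """Apply a main cursor h0 and one post-cursor tap h1 to NRZ (+1/-1) symbols."""
    out = []
    for b in bits:
        sym = 1.0 if b else -1.0
        out.append(h0 * sym + h1 * prev_symbol)
        prev_symbol = sym
    return out

def speculative_dfe(samples, alpha, prev_bit=0):
    """1-tap speculative DFE: two comparators at +alpha and -alpha, selected by the previous bit."""
    decisions = []
    for x in samples:
        high = 1 if x > +alpha else 0  # valid if the previous bit was 1
        low = 1 if x > -alpha else 0   # valid if the previous bit was 0
        prev_bit = high if prev_bit == 1 else low
        decisions.append(prev_bit)
    return decisions

def plain_slicer(samples):
    """Single comparator at zero, for comparison (no equalization)."""
    return [1 if x > 0.0 else 0 for x in samples]

bits = [1, 0, 0, 1, 1, 0, 1, 0]
# Hypothetical heavy-ISI case: post-cursor larger than the main cursor.
rx = channel(bits, h0=1.0, h1=1.2)
print(speculative_dfe(rx, alpha=1.2) == bits, plain_slicer(rx) == bits)
```

The two comparator outputs can only take three joint values (both low, only the -α comparator high, both high), which is why the pair behaves as a 3-level ADC with non-uniform thresholds. In the heavy-ISI example the single zero-threshold slicer makes errors while the comparator pair with α set to the post-cursor recovers the data.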

Measurements with 5Gb/s PRBS input data showed that the CDR was able to operate for
channel lengths up to 48 inches, corresponding to an attenuation of 12.9 dB at Nyquist.
For a 32-inch channel, the effect of the DFE was evident in the jitter tolerance curve.
With α = 60mV, the CDR attained a high-frequency jitter tolerance of 0.39 UIPP.
Meanwhile, with the DFE disabled (α = 0mV), the CDR was on the brink of operation, as
evidenced by the bumps in the jitter tolerance curve.


A mistake during the RTL design of the digital offset calibration blocks left a node
floating, which caused the calibration blocks to consume a constant 10mW of power. After
re-simulation of the ADC power grid to obtain a realistic estimate of the power of an
error-free circuit, it was determined that the ADC and clock divider consumed 35.5 mW.
Meanwhile, the digital CDR consumed only 18 mW (measured), for a total estimated
measured power of 53.5 mW. This is nearly half the power of the most efficient
alternative (3x) [8], which consumed 94.8 mW. Without a CTLE, the CDR was able to
tolerate up to 9.7dB of loss, comparable to the 10dB, 15dB, 13.3dB, and 5.8dB of previous
works. With the CTLE enabled, the CDR was able to tolerate up to 12.9 dB of channel loss
while consuming only 76.3 mW, less than all previous designs.

5.1. Thesis Contributions

The contributions of this thesis are summarized below:

1. Design, simulation, and measurement of a 4x oversampling, 3-level blind ADC-

based CDR.

2. A paper entitled “A 4x, 3-Level Blind ADC-Based Receiver”, submitted to the
Symposium on VLSI Circuits.

5.2. Future Work

Two possible avenues for future research in this area are:

1. Further reduction in power consumption. The work presented in this thesis consumes
53.5mW for a 5Gb/s data rate. This translates to 10.7mW/Gb/s, which is at least a
factor of two higher than the industry standard.


2. Clock phase distribution to time-interleaved ADCs. It is well known that any skew
between the clock phases of the time-interleaved ADCs will result in a loss of jitter
tolerance. Techniques need to be developed to either precisely distribute the phases
or to calibrate them afterwards.
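The efficiency figure quoted in the first item is simply total power divided by data rate, and 1 mW/Gb/s equals 1 pJ/bit. As a sketch, the same calculation can be applied to every row of Table 4.4; the dictionary below only restates the table values, and the names are illustrative.

```python
# (total power in mW, data rate in Gb/s), taken from Table 4.4.
cdrs = {
    "[5]": (178.4, 5),
    "[4]": (280.0, 5),
    "[6]": (211.2, 5),
    "[7]": (306.0, 10),
    "[8]": (94.8, 5),
    "This work (CTLE off)": (53.5, 5),
    "This work (CTLE on)": (76.3, 5),
}

def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/bit (numerically equal to mW per Gb/s)."""
    return power_mw / rate_gbps

for name, (power_mw, rate_gbps) in cdrs.items():
    print(f"{name}: {energy_per_bit_pj(power_mw, rate_gbps):.2f} pJ/bit")
```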

References

[1] Timothy O. Dickson, John F. Bulzacchelli, and Daniel J. Friedman. A 12-Gb/s

11-mW Half-Rate Sampled 5-Tap Decision Feedback Equalizer With Current-

Integrating Summers in 45-nm SOI CMOS Technology. IEEE Journal of Solid-State

Circuits, 44(4):1298–1305, April 2009.

[2] Yasuo Hidaka. A 4-Channel 10.3Gb/s Backplane Transceiver Macro with 35dB
Equalizer and Sign-Based Zero-Forcing Adaptive Control. ISSCC Dig. Tech. Papers,
pages 188–190, 2009.

[3] Jri Lee, Ming-Shuan Chen, Yu-Nan Shih, Chen-Lun Lin, and Hao-Wei Hung. A 40Gb/s
TX and RX Chip Set in 65nm CMOS. ISSCC Dig. Tech. Papers, 11(22):541–542,
2011.

[4] Hisakatsu Yamaguchi, Hirotaka Tamura, Yoshiyasu Doi, Yasumoto Tomita,

Takayuki Hamada, Masaya Kibune, Shuhei Ohmoto, Keita Tateishi, Oleksiy

Tyshchenko, Ali Sheikholeslami, Tomokazu Higuchi, Junji Ogawa, Tamio Saito,

Hideki Ishida, and Kohtaroh Gotoh. CDR and CMA Adaptive Equalizer in 65nm

CMOS. ISSCC Dig. Tech. Papers, 44(4):168–170, 2010.

[5] Oleksiy Tyshchenko, Ali Sheikholeslami, Hirotaka Tamura, Masaya Kibune,
Hisakatsu Yamaguchi, and Junji Ogawa. A 5-Gb/s ADC-Based Feed-Forward CDR.
IEEE Journal of Solid-State Circuits, 45(6):1091–1098, 2010.


[6] Siamak Sarvari, Tina Tahmoureszadeh, Ali Sheikholeslami, Hirotaka Tamura, and

Masaya Kibune. A 5Gb/s speculative DFE for 2x blind ADC-based receivers in

65-nm CMOS. Symposium on VLSI Circuits, 2:69–70, June 2010.

[7] Clifford Ting, Joshua Liang, Ali Sheikholeslami, Masaya Kibune, and Hirotaka

Tamura. 7.4 A Blind Baud-Rate ADC-Based CDR. ISSCC Dig. Tech. Papers,

pages 122–124, 2013.

[8] M Sadegh Jalali, Clifford Ting, Behrooz Abiri, Ali Sheikholeslami, Masaya Kibune,

and Hirotaka Tamura. A 3x Blind ADC-based CDR. IEEE Asian Solid-State Cir-

cuits Conference (A-SSCC), 4:349–352, 2013.

[9] Afshin Momtaz and Michael M Green. An 80mW 40 Gb/s 7-Tap T/2-Spaced Feed-

Forward Equalizer in 65 nm CMOS. IEEE Journal of Solid-State Circuits, 45(3):629–

639, 2010.

[10] Jian-hao Lu and Shen-iuan Liu. A Merged CMOS Digital Near-End Crosstalk Can-

celler and Analog Equalizer for Multi-Lane Serial-Link Receivers. IEEE Journal of

Solid-State Circuits, 45(2):433–446, 2010.

[11] Horace Cheng, Faisal A Musa, and Anthony Chan Carusone. A 32/16-Gb/s Dual-Mode
Pulsewidth Modulation 30-dB Loss Compensation Using a High-Speed CML Design
Methodology. IEEE Transactions on Circuits and Systems, 56(8):1794–1806, 2009.

[12] Yong Liu, Byungsub Kim, Timothy O Dickson, John F Bulzacchelli, and Daniel J
Friedman. A 10Gb/s Compact Low-Power Serial I/O with DFE-IIR Equalization
in 65nm CMOS. ISSCC Dig. Tech. Papers, pages 182–184, 2009.

[13] Massimo Pozzoni, Simone Erba, Davide Sanzogni, Marcello Ganzerli, Paolo Viola,

Daniele Baldi, Matteo Repossi, Giorgio Spelgatti, and Francesco Svelto. 8.5 A


12Gb/s 39dB Loss-Recovery Unclocked-DFE Receiver with Bi-dimensional Equal-

ization. ISSCC Dig. Tech. Papers, 11:541–542, 2010.

[14] Dong Hun Shin, Ji Eun Jang, Frank O’Mahony, and C. Patrick Yue. A 1-mW 12-

Gb/s continuous-time adaptive passive equalizer in 90-nm CMOS. IEEE Custom

Integrated Circuits Conference, (Cicc):117–120, September 2009.

[15] Ricky Yuen, Marcus van Ierssel, Ali Sheikholeslami, William Walker, and Hiro-

taka Tamura. A 5Gb/s Transmitter with Reflection Cancellation for Backplane

Transceivers. IEEE Custom Integrated Circuits Conference, (Cicc):413–416, Septem-

ber 2006.

[16] Jonathan Sewter and Anthony Chan Carusone. A 3-Tap FIR Filter With Cascaded
Distributed Tap Amplifiers for Equalization Up to 40 Gb/s in 0.18-µm CMOS.
IEEE Journal of Solid-State Circuits, 41(8):1919–1929, 2006.

[17] Altan Hazneci and Sorin P Voinigescu. A 49-Gb/s, 7-Tap Transversal Filter in

0.18um SiGe BiCMOS for Backplane Equalization. IEEE CSIC Digest, pages 4–7,

2004.

[18] Behrooz Abiri, Ali Sheikholeslami, and Hirotaka Tamura. An Adaptation Engine
for a 2x Blind ADC-Based CDR in 65 nm CMOS. IEEE Journal of Solid-State Circuits,
46(12):3140–3149, 2011.

[19] John Proakis. Digital Communications. McGraw-Hill Science/Engineering/Math, 4

edition, 2000.

[20] Yue Lu and Elad Alon. 2.2 A 66Gb/s 46mW 3-Tap Decision-Feedback Equalizer in

65nm CMOS. ISSCC Dig. Tech. Papers, pages 30–32, 2013.


[21] Hideyuki Sugita, Kazuhisa Sunaga, Koichi Yamaguchi, and Masayuki Mizuno. 8.4

A 16Gb/s 1st-Tap FFE and 3-Tap DFE in 90nm CMOS. ISSCC Dig. Tech. Papers,

pages 368–369, 2010.

[22] Mike Harwood. A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC
with Digital Receiver Equalization and Clock Recovery. ISSCC Dig. Tech. Papers,
pages 436–438, 2007.

[23] Aida Varzaghani and Chih-Kong Ken Yang. A 4.8 GS/s 5-bit ADC-Based Receiver
With Embedded DFE for Signal Equalization. ISSCC Dig. Tech. Papers,
44(3):901–915, 2009.

[24] J.S. Bal. Circuit Blocks for an Analog CMOS Decision Feedback Equalizer. 1992.

[25] S. Kasturia and J.H. Winters. Techniques for high-speed implementation of nonlinear

cancellation. IEEE Journal on Selected Areas in Communications, 9(5):711–717,

June 1991.

[26] Adesh Garg, Anthony Chan Carusone, and Sorin P Voinigescu. A 1-Tap 40-Gb/s
Look-Ahead Decision Feedback Equalizer in 0.18-µm SiGe BiCMOS Technology.
IEEE Journal of Solid-State Circuits, 41(10):2224–2232, 2006.

[27] Shunichi Kaeriyama. A 10Gb/s/ch 50mW 120x130um2 Clock and Data Recovery

Circuit. ISSCC Dig. Tech. Papers, pages 264–265, 2005.

[28] Declan Dalton, Kwet Chai, Eric Evans, Mark Ferriss, Dave Hitchcox, Paul Murray,
Sivanendra Selvanayagam, Paul Shepherd, and Lawrence Devito. A 12.5-Mb/s to
2.7-Gb/s Continuous-Rate CDR With Automatic Frequency Acquisition and Data-Rate
Readback. IEEE Journal of Solid-State Circuits, 40(12):2713–2725, 2005.


[29] Ke-Chung Wu and Jri Lee. A 20Gb/s Full-Rate Linear CDR Circuit with Automatic
Frequency Acquisition. ISSCC Dig. Tech. Papers, 2:366–368, 2009.

[30] Bo Zhang, Ali Nazemi, Adesh Garg, Namik Kocaman, Mahmoud Reza Ahmadi,
Mehdi Khanpour, Heng Zhang, Jun Cao, and Afshin Momtaz. A 195mW/55mW
Dual-Path Receiver AFE for Multistandard 8.5-to-11.5 Gb/s Serial Links in 40nm
CMOS. pages 34–36, 2013.

[31] Jun Cao and Bo Zhang. A 500 mW ADC-Based CMOS AFE With Digital Calibra-

tion for 10 Gb/s Serial Links Over Multimode Fiber. IEEE Journal of Solid-State

Circuits, 45(6):1172–1185, 2010.

[32] Clifford Ting. A Blind Baud-Rate CDR and Zero Forcing Adaptive DFE For an

ADC-Based Receiver. MASc Thesis, (University of Toronto), 2013.

[33] Marcus Van Ierssel, Hisakatsu Yamaguchi, Ali Sheikholeslami, Hirotaka Tamura,
and William W Walker. Event-Driven Modeling of CDR Jitter Induced by Power-Supply
Noise, Finite Decision-Circuit Bandwidth, and Channel ISI. IEEE Journal of
Solid-State Circuits, 55(5):1306–1315, 2008.
