A 4x, 3-level blind ADC-based CDR in 65nm CMOS
by
Neno Kovacevic
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2014 by Neno Kovacevic
A 4x, 3-level blind ADC-based CDR in 65nm CMOS
Neno Kovacevic
Master of Applied Science, 2014
Graduate Department of Electrical and Computer Engineering
University of Toronto
Abstract
This thesis presents the design, implementation, and measurement of a 4-times
oversampled, 3-level blind ADC-based CDR. The goal of this work was to provide a blind
ADC-based design that reduced the overall power consumption. This was achieved by
reducing the ADC resolution to 3 levels, while increasing the oversampling ratio to 4.
Also, by non-uniformly distributing the threshold levels of the ADC, the design
incorporated a speculative DFE. The speculative DFE is implemented with comparator pairs
whose symmetric offsets correspond to the first post cursor ISI. The samples from the
comparator pairs are then passed to the digital CDR which uses them to recover the
clock phase and data. The digital power was measured to be 18mW, while an estimate
for the ADC and CLK Divider measured power was found to be 35.5mW. The estimate
of the total measured power for the chip was 53.5mW.
Acknowledgements
Firstly, I want to thank Prof. Sheikholeslami for his guidance and for providing an
enriching academic experience.
I also thank my committee members Prof. Chan Carusone, Prof. Liscidini, and Prof.
Prodic for giving me further insight into this work.
I would like to thank Joshua Liang for his invaluable help and experience during
measurement. Josh, this work would definitely not have been possible without your help.
I am also grateful to Sadegh Jalali for his comparator design contribution. More
importantly, thank you Sadegh for the continuous help during the analog design process, and
for your patience in answering my questions when times were stressful.
My learning experience would have taken a serious dent were I not in the presence of
three former students who were not only extremely willing to help, but also very clever.
Thank you Safeen, Cliff, and Ravi.
I am also very glad to have had two great colleagues with whom I shared this Masters
experience. It was a long journey, from the bleak early days of relentless VLSI work to
the very end, when the light at the end of the tunnel began to shine. Thank you Luke
and Jeff for being great friends.
Shayan and Mario, thank you for just being there to joke around with. Many thanks also
go to the rest of the BA5000 bunch for being friendly colleagues.
Most importantly, I am thankful to my family for their love and support through a
stressful but extremely life-enriching experience.
Contents
1 Introduction
1.1 Motivation
1.2 Thesis Objectives
1.3 Thesis Outline
2 Background
2.1 Equalization
2.2 Clock and Data Recovery
2.3 ADC-based CDRs
3 Proposed Design
3.1 Proposed Full Rate Architecture
3.2 Front-End Architecture
3.2.1 Front-End Speculative DFE
3.2.2 Comparator Architecture
3.3 Phase Detection
3.4 Data Decision
3.5 Complete System Architecture
3.5.1 Cycle Slip
4 Simulations and Measurement
4.1 Behavioural Simulations
4.1.1 DFE Performance
4.1.2 Effect of Comparator Offset on Jitter Tolerance
4.2 Experimental Results
4.2.1 Receiver Layout and equipment setup
4.2.2 Offset Cancellation
4.2.3 Jitter Tolerance
4.2.4 Power Comparison
5 Conclusion
5.1 Thesis Contributions
5.2 Future Work
References
List of Tables
4.1 Channel models used for simulation.
4.2 Pin description.
4.3 Channels used for measurements.
4.4 Channels used for measurements.
List of Figures
2.1 Block diagram of the transceiver system.
2.2 Typical channel frequency response.
2.3 Typical channel frequency response.
2.4 Inter-symbol interference (ISI).
2.5 Ideal linear equalization.
2.6 Ideal linear equalization.
2.7 Decision Feedback Equalizer architecture for 1-tap.
2.8 Speculative DFE architecture.
2.9 Sampling of input data with recovered clock.
2.10 Architecture of a clock and data recovery (CDR) circuit.
2.11 Architecture of a phase tracking ADC-based CDR.
2.12 Interpolation of phase and data.
2.13 Architecture of a digital DFE.
2.14 Power Comparison of previous ADC-based CDRs of various ADC resolution and oversampling rate [4] [5] [6] [8].
2.15 Number of comparators as a function of the oversampling rate and the ADC resolution. The 2x-5bit designs [4] [5] [6] used 62 comparators per UI, and the 3x-3bit [8] used 21 comparators per UI (Assuming a flash ADC is used).
3.1 A Full rate system block diagram of the proposed speculative ADC-based CDR.
3.2 MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.
3.3 Full-rate Digital CDR.
3.4 Pulse Response of a 7dB Channel at Nyquist.
3.5 MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.
3.6 The comparator has offset cancellation in order to have a tuneable offset.
3.7 The UI window of 5 consecutive samples.
3.8 Phase estimation is done by determining the transition point in the UI.
3.9 The phase estimate from the POS and NEG sets is averaged to obtain the true zero-crossing. This is the operation of the system's full-rate phase detector.
3.10 The phase estimate from the phase detector, φx, is subtracted by the loop filter average phase, φAVG, to produce φERR, which is fed back to the loop filter.
3.11 Behavioural simulation showing the comparison of a zero-crossing PD scheme and the proposed one.
3.12 The selection of the correct sample is based on the position of φPICK.
3.13 The complete system architecture.
3.14 The complete digital CDR architecture.
3.15 The selection of the correct set of samples is done with a chain of MUXes.
3.16 Cycle Slip; data rate is faster than blind sampling clock.
3.17 Cycle Slip; data rate is slower than blind sampling clock.
4.1 Simulink Model of the entire system.
4.2 Simulated eye diagrams for Channel 1, before and after DFE.
4.3 Simulated eye diagrams for Channel 2, before and after DFE.
4.4 Jitter tolerance curves for Channels 1 vs. α.
4.5 Jitter tolerance curves for Channels 2 vs. α.
4.6 Simulated eye diagrams of channel 3, before CTLE, after CTLE, and after DFE.
4.7 Jitter tolerance curves for Channel 3 vs. α.
4.8 Jitter tolerance curves for Channel 1 vs. standard deviation of comparator offset.
4.9 Jitter tolerance curves for Channel 2 vs. standard deviation of comparator offset.
4.10 Measurement setup.
4.11 Chip die photo.
4.12 Detailed equipment setup.
4.13 Measured eye diagrams of the channels.
4.14 Measured jitter tolerance vs. α for Channel B.
4.15 Measured jitter tolerance vs. channel length.
4.16 Comparison of power between this work and blind ADC-based CDR with 3x oversampling [8].
List of Acronyms
ADC Analog to Digital Converter
BER Bit-Error Rate
CDR Clock and Data Recovery
CTLE Continuous Time Linear Equalizer
DFE Decision Feedback Equalizer
ENOB Equivalent Number of Bits
FFE Feed Forward Equalizer
FIR Finite Impulse Response
FSM Finite State Machine
Gb/s Gigabits per second
GSa/s Giga-samples per second
IIR Infinite Impulse Response
ISI Inter Symbol Interference
MUX Multiplexor
NRZ Non-return to Zero
PCB Printed Circuit Board
PCIe Peripheral Component Interconnect Express
PD Phase Detector
PI Phase Interpolator
PLL Phase-Locked Loop
PRBS Pseudo-Random Binary Sequence
SATA Serial Advanced Technology Attachment
UI Unit Interval
VCO Voltage Controlled Oscillator
VGA Variable Gain Amplifier
1 Introduction
As silicon technology continues to scale to smaller geometries, the density of integrated
circuits continues to increase. With increased complexity and computational power,
faster data rates are required for chip to chip communication. New generations of media
standards such as PCI Express (PCIe) and SATA continue to push the limits on data
rates. This industry demand drives the need for innovation in circuit
design [1] [2] [3].
1.1. Motivation
In chip-to-chip communication, data is sent over a channel that consists of a few inches of
printed circuit board (PCB) trace. In an ideal communication system, the transmit data
would be received by the receiver chip without any attenuation. However, in practical
systems the channel exhibits a finite bandwidth, attenuating higher-frequency content of
the transmit signal. This spreads the bits, sent from the transmitter, into adjacent unit
intervals (UI), making the task of clock and data recovery (CDR) difficult.
Conventional phase-tracking CDRs with analog equalizers are used to recover these bits.
Alternatively, blind ADC-based CDRs offer appealing benefits such as the elimination of
analog feedback. They also simplify the design process and open the window for more
complex detection schemes via the ease of implementation with fully digital circuits. One
of the major flaws with these designs is that they tend to be power hungry. Previous
designs have used 5-bit ADCs with an oversampling ratio of 2 (2x) [4] [5] [6] [7], and 3-bit
ADCs with an oversampling ratio of 3 (3x) [8]. One of the driving forces for the high
resolution was to enable equalization entirely in the digital domain. However, this has
come with the price of high power consumption. This leads to the question of whether
it is possible to reduce power without sacrificing jitter tolerance.
1.2. Thesis Objectives
This thesis presents an alternative design technique for equalization in ADC-based CDRs
in an attempt to reduce power without sacrificing performance. The main objectives of
this thesis are:
• To provide insight to the background of high-speed signalling, and in particular to
the ADC-based CDRs.
• To propose an alternative architecture for an ADC-based CDR in order to reduce
power consumption.
• To provide implementation, simulation, and measured results of a working chip and
compare them against other ADC-based designs.
1.3. Thesis Outline
This thesis is organized as follows:
• Chapter 2 provides a background on equalization, clock and data recovery, and
ADC-based CDRs.
• Chapter 3 presents the proposed design. The block level diagrams as well as circuit
diagrams are illustrated.
• Chapter 4 first presents and discusses the simulated results, followed by the
measurement process and results, as well as a comparison with previous works.
• Chapter 5 summarizes the thesis and provides directions for future research.
2 Background
In this chapter, we briefly review some of the key concepts in chip-to-chip
communications, including the limitations of the channel, conventional techniques in equalization
and clock and data recovery, and techniques pertaining to ADC-based equalization and
clock and data recovery.
As shown in Figure 2.1, the transceiver model consists of two parts, each containing a
digital core designed for a specific application. When communicating, one transceiver
may act as a transmitter and one as a receiver.
Figure 2.1: Block diagram of the transceiver system.
[Blocks: Chip 1 with a digital core and transmitter (TX); the channel; Chip 2 with a receiver (RX) and digital core.]
The transmitter needs to send information to the receiver over a medium, known as the
channel. This channel can take the form of optical fibres or Ethernet cables in large
servers, or just printed circuit board (PCB) traces in chip-to-chip signalling. However,
these channels are limited in bandwidth and are unable to pass signals above a certain
frequency without attenuating them. A typical response for the insertion loss of the channel
(the ratio of the power received at the receiver to the power sent at the transmitter) is
shown in Figure 2.2. As can be seen, the channel attenuation increases with frequency,
exhibiting a low pass characteristic.
Figure 2.2: Typical channel frequency response.
[Plot axes: insertion loss frequency response (dB), 0 to −35, versus frequency (Hz), 10^7 to 10^10.]
The main challenge in designing transceivers is to increase the data rate (fb) while being
able to tolerate the signal loss. In the transceiver model, the transmitter sends almost
ideal data (data with sharp edges) at baud rate, but when it arrives at the receiver, the
loss in the frequency response translates to a spread of the signal in the time domain.
This is demonstrated in Figure 2.3. The transmit pulse of one unit interval (UI=1/fb),
takes some time to arrive at the receiver, and it spreads to several UIs.
The main issue with the non-ideal received pulse is that the pulse response corresponding
to one UI is spreading over the adjacent UIs. This phenomenon is known as inter-symbol
interference (ISI). For the receiver to operate properly, it must detect the correct bits at
an acceptable bit error rate (BER). Usually, the standard for the receiver is to make fewer
than one error per trillion bits, or equivalently a BER of less than $10^{-12}$.
Figure 2.3: Typical channel frequency response.
[Plot labels: a TX pulse of amplitude 1 and width 1 UI, and the resulting RX pulse.]
Figure 2.4: Inter-symbol interference (ISI).
[Plot labels: received pulse with cursors h0, h1 and h2, spaced 1 UI apart.]
However, the presence of ISI makes the detection of error free data increasingly difficult.
The desirable part of the received pulse with ISI is known as the main cursor, defined
as h0 in Figure 2.4. The interference in the current UI from the previous bit is known as
the first post cursor, h1; similarly, the interference from the nth previous bit is hn. These
post-cursors are detrimental to the recovery of the desired main cursor. The elimination
of ISI is one of the main challenges in transceiver design.
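As a rough illustration of how a band-limited channel produces the cursors h0, h1, h2, the following sketch passes a 1-UI pulse through a single-pole low-pass model and samples the response once per UI. The single-pole model, the time constant `tau_ui`, and the choice of sampling at the end of each UI (where this particular response peaks) are assumptions made for illustration; they are not the channel models used in this thesis.

```python
import math

def pulse_response_cursors(tau_ui: float, n_cursors: int = 4) -> list:
    """Sample the response of a single-pole low-pass channel to a 1-UI pulse.

    tau_ui is the channel time constant in units of UI (assumed model).
    Samples are taken at t = 1, 2, 3, ... UI, so the first value is the
    main cursor h0 and the remaining values are post-cursors h1, h2, ...
    """
    def step(t: float) -> float:
        # Unit step response of a first-order low-pass filter.
        return 1.0 - math.exp(-t / tau_ui) if t > 0 else 0.0

    # A 1-UI pulse is a step at t = 0 minus a step at t = 1 UI.
    return [step(k) - step(k - 1) for k in range(1, n_cursors + 1)]

cursors = pulse_response_cursors(tau_ui=0.8)
for k, h in enumerate(cursors):
    print(f"h{k} = {h:.3f}")
```

In this model each post-cursor is the previous cursor scaled by exp(-1/tau_ui), so a slower channel (larger `tau_ui`) leaves more energy in the UIs that follow.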
2.1. Equalization
The process of eliminating ISI is known as equalization. Various equalizer circuits exist
both in the analog and digital domains [4] [6] [9] [10] [11] [12] [13] [14] [15].
The frequency response of a band-limited channel is similar to that of a low pass filter
with several poles. A simple solution to extend the bandwidth is to implement a circuit
whose transfer function contains a zero near the channel’s cut-off frequency. Thus this
transfer function provides an amplification to counter the signal attenuation caused by
the channel. This technique, which is known as linear equalization, is illustrated in Figure
2.5.
Figure 2.5: Ideal linear equalization.
[Three frequency-response sketches, each marked at fb/2: channel × equalizer = equalized response.]
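The pole-zero cancellation behind linear equalization can be checked numerically. The sketch below assumes a channel with a single pole (the text describes several) and an equalizer with a single zero at the same frequency; `F_POLE` is an arbitrary illustrative value, and `F_NYQUIST` is fb/2 for 5 Gb/s data.

```python
import math

def gain_db(f_hz, pole_hz=None, zero_hz=None):
    """Magnitude in dB of a response with one optional pole and one optional zero."""
    mag = 1.0
    if zero_hz is not None:
        mag *= math.hypot(1.0, f_hz / zero_hz)   # |1 + j f/fz|
    if pole_hz is not None:
        mag /= math.hypot(1.0, f_hz / pole_hz)   # 1 / |1 + j f/fp|
    return 20.0 * math.log10(mag)

F_POLE = 1.0e9      # assumed channel cut-off frequency (illustrative)
F_NYQUIST = 2.5e9   # fb/2 for 5 Gb/s data

channel = gain_db(F_NYQUIST, pole_hz=F_POLE)
equalizer = gain_db(F_NYQUIST, zero_hz=F_POLE)
print(f"channel   : {channel:6.2f} dB")
print(f"equalizer : {equalizer:6.2f} dB")
print(f"cascade   : {channel + equalizer:6.2f} dB")
```

The equalizer gain in this sketch keeps rising with frequency, which is the noise-amplification concern raised in the next paragraph.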
There is a downside, however, to the design of a circuit with an ever-increasing frequency
response. The presence of white noise at the input of such a circuit would prove to
be detrimental as the high frequency noise would be greatly amplified. Consequently
this would make detection of the received signal impossible. An alternate solution is to
extend the bandwidth by providing a boost to frequency components which lie slightly
higher than the cut-off. Figure 2.6 shows this technique of linear equalization. The linear
equalizers are implemented either as infinite impulse response (IIR) [2] [10] [14] or finite
impulse response (FIR) filters [9] [16] [17], both in the analog domain. However, if the
input is sampled by an ADC at the front end, an equivalent filter can be implemented
as a feed-forward equalizer (FFE), allowing for digital implementation [5] [6] [18].
Figure 2.6: Ideal linear equalization.
[Three frequency-response sketches, each marked at fb/2: channel × equalizer = equalized response.]
Approaching the problem of ISI from the time domain, it becomes apparent that if the
post-cursor ISI can be eliminated at the sampling instance, then only the main cursor
is present. At the centre of the nth UI, the received signal $x_n$ can be represented as in
Equation 2.1 below [19], where $v_n$ is the white Gaussian noise at the sampling instance, $h_n$ is
the impulse response of a bandlimited channel, and $I_n$ is a sequence of symbols from the
transmitter corresponding to non-return-to-zero (NRZ) data drawn from the set {−1, 1}.
\[ x_n = I_n h_0 + \mathrm{ISI} + v_n = I_n h_0 + \sum_{k=0,\, k \neq n}^{\infty} I_k h_{n-k} + v_n, \quad n = 0, 1, 2, \ldots \tag{2.1} \]
Without any equalization, a slicer would simply produce high or low depending on the
sign of the signal. Ideally, it would detect the sign of the main cursor, h0. However,
enough post-cursor ISI can be introduced that the sampled signal has a sign
opposite to that of the main cursor. By eliminating the post-cursor ISI, this
issue can be circumvented. A type of non-linear equalizer that employs this technique is
known as the Decision Feedback Equalizer (DFE) [1] [20] [21] [22] [23], shown in Figure
2.7.
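Figure 2.7: Decision Feedback Equalizer architecture for 1-tap.
[Diagram labels: input xn; summing node yn = xn − a1 with a1 = bn−1 · w1; slicer output bn; flip-flop (FF) holding bn−1; the feedback loop is marked as the critical path.]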
A DFE operates on the assumption that the previous bits are detected correctly. Based
on this assumption, and earlier knowledge of the relative values of the post-cursors to
the main cursor, the portion of the signal due to these components is subtracted at
the summing node of the circuit. Thus, if the signs of the previous bits are stored, then
by scaling the signs of these bits with weights corresponding to their ratios to the main
cursor, the ISI due to these cursors is eliminated. For example, if the first post-cursor
dominates the ISI, then by scaling the previous bit by the weight w1 (this is a 1-tap DFE
architecture), the signal being sliced becomes:
\[ y_n = I_n h_0 + I_{n-1} h_1 - w_1 b_{n-1} + v_n = I_n h_0 + v_n \quad (w_1 = h_1,\ b_{n-1} = I_{n-1}) \tag{2.2} \]
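The effect of Equation 2.2 can be seen on a small noise-free example. The following sketch compares a plain slicer with a 1-tap DFE on a channel whose first post-cursor is larger than its main cursor. The cursor values, the bit pattern, and the assumption that the bit preceding the sequence equals the first bit are all chosen for illustration only.

```python
def slice_bit(y: float) -> int:
    """Binary slicer: returns +1 or -1 according to the sign of y."""
    return 1 if y >= 0 else -1

def receive(symbols, h0, h1):
    """Noise-free received samples with main cursor h0 and first post-cursor h1."""
    # Assumption: the bit before the first symbol repeats the first symbol.
    prev = [symbols[0]] + list(symbols[:-1])
    return [h0 * i + h1 * p for i, p in zip(symbols, prev)]

def dfe_1tap(samples, w1, first_prev):
    """1-tap DFE: subtract w1 times the previous decision before slicing."""
    decisions, prev = [], first_prev
    for x in samples:
        prev = slice_bit(x - w1 * prev)
        decisions.append(prev)
    return decisions

tx = [1, 1, -1, 1, -1, -1, 1, -1]
rx = receive(tx, h0=0.4, h1=0.5)   # post-cursor larger than main cursor
plain = [slice_bit(x) for x in rx]
equalized = dfe_1tap(rx, w1=0.5, first_prev=tx[0])
print("plain slicer errors:", sum(a != b for a, b in zip(plain, tx)))
print("1-tap DFE errors   :", sum(a != b for a, b in zip(equalized, tx)))
```

With w1 set equal to h1 the DFE recovers every bit here, whereas the plain slicer fails whenever a bit differs from its predecessor.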
This decision feedback structure can easily be expanded to more weights, by simply
adding additional flip flops in the feedback path. The circuit design of the DFE is
straightforward if implemented in a current summing structure where the current source
of each post-cursor differential pair is scaled relative to the input differential pair of the
current bit [24].
The conventional DFE architecture is often difficult to design due to the stringent timing
constraint of the feedback path. The delay of this path must be less than one bit period.
With multiple DFE weights in particular, the loading at the summing node is
further increased and limits the overall speed. An efficient alternative design eliminates
the feedback loop of the decision entirely, relaxing the timing constraint [25]. Parallelism
is employed by having two dedicated paths, one assuming the previous bit was a one and
the other assuming the previous bit was a zero. This architecture, shown in Figure 2.8, is
known as the look-ahead architecture, or speculative DFE. The two paths
provide speculative bits $b_n^+$ and $b_n^-$, from which the correct bit is then selected by a
MUX whose select signal is the correct previous bit. This design allows for higher data
rates, at the price of increased hardware requirements [26].
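Figure 2.8: Speculative DFE architecture.
[Diagram labels: two paths, y+n = xn − a1 and y−n = xn + a1, producing b+n and b−n; a MUX (inputs 1 and 0) selected by bn−1 from the flip-flop (FF) outputs bn.]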
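A behavioural sketch of the look-ahead idea is given below: both hypotheses are sliced in parallel and the previous decision only drives a MUX. A direct-feedback version is included to show that the two produce the same decisions. The sample values and the tap weight `a1` are illustrative assumptions, and symbols are represented as +1 and −1.

```python
def sign(y: float) -> int:
    """Slicer output as +1 or -1."""
    return 1 if y >= 0 else -1

def speculative_dfe(samples, a1, first_prev):
    """Look-ahead DFE: slice both hypotheses, then let the previous bit pick one."""
    decisions, prev = [], first_prev
    for x in samples:
        b_plus = sign(x - a1)    # valid if the previous bit was +1
        b_minus = sign(x + a1)   # valid if the previous bit was -1
        prev = b_plus if prev == 1 else b_minus   # the MUX
        decisions.append(prev)
    return decisions

def direct_dfe(samples, a1, first_prev):
    """Conventional 1-tap DFE with the subtraction inside the feedback loop."""
    decisions, prev = [], first_prev
    for x in samples:
        prev = sign(x - a1 * prev)
        decisions.append(prev)
    return decisions

rx = [0.9, 0.9, 0.1, -0.1, 0.1, -0.9, -0.1, 0.1]
spec = speculative_dfe(rx, a1=0.5, first_prev=1)
direct = direct_dfe(rx, a1=0.5, first_prev=1)
print(spec)
print(spec == direct)
```

The subtraction and slicing no longer sit inside the feedback path; only the MUX select does, which is what relaxes the timing constraint at the cost of a second slicer.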
2.2. Clock and Data Recovery
Equalization is often incorporated both at the transmitter and at the receiver. Once the
receiver has equalized the incoming data, its next and ultimate goal is to sample the data
precisely to obtain the correct bits that were sent. The receiver circuit generates a clock
from the received data such that the rising edge of the clock (for example) is aligned with
the centre of the data. This process of recovering the data, as well as extracting a clock
to sample the received data is called clock and data recovery, as illustrated in Figure 2.9.
In essence, sampling the data at the centre of the UI when it is most correlated to the
main cursor has the highest probability of obtaining the correct bit.
Figure 2.9: Sampling of input data with recovered clock.
[Waveform labels: recovered CLK, input data, data sampled at UI centre.]
The block diagram for a simple CDR architecture is shown in Figure 2.10. The phase
detector (PD) uses the information from the transitions of the data to adjust the optimal
sampling position of the clock. Any misalignment in phase between the clock and the
data, φERR, is forced to zero through a stable negative feedback. The charge pump uses
the relative information in phase between the data and the clock to increase or decrease
a voltage which is filtered and fed to the voltage controlled oscillator (VCO) or phase
interpolator (PI). The VCO can also adjust for any frequency offset that might occur
between the data and the clock.
This architecture, known as a phase tracking CDR [27] [28] [29], is similar to that of
a phase-locked loop (PLL), except for the fact that the input data is not periodic. To
account for this difference, the phase detector for the CDR only updates the phase
information that is propagated through the loop when there is a data transition.
Figure 2.10: Architecture of a clock and data recovery (CDR) circuit.
[Blocks: phase detector (PD), charge pump (CP), loop filter (LF), VCO producing CK; a flip-flop (FF) clocked by CK samples Din to produce Dout.]
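The behaviour of the loop, with the phase error driven to zero and updates occurring only on data transitions, can be sketched with a first-order discrete-time model. This is a deliberately simplified illustration: the charge pump, loop filter and VCO/PI are lumped into a single assumed `gain`, phases are expressed in UI, and the phase detector is assumed to report the error exactly.

```python
def track_phase(bits, data_phase, gain=0.1, clk_phase=0.0):
    """First-order phase-tracking loop (behavioural sketch).

    The clock phase moves toward the data phase by gain * error, but only
    in bit periods where the data has a transition, since a run of
    identical bits gives the phase detector no edge to measure.
    Returns the clock phase (in UI) after each bit.
    """
    history = []
    prev = bits[0]
    for b in bits[1:]:
        if b != prev:                      # edge present: PD output is valid
            err = data_phase - clk_phase   # phase error, phi_ERR
            clk_phase += gain * err        # CP, LF and VCO/PI lumped together
        history.append(clk_phase)
        prev = b
    return history

locked = track_phase([0, 1] * 50, data_phase=0.3)
idle = track_phase([0] * 20, data_phase=0.3)
print(f"alternating data: final clock phase = {locked[-1]:.4f} UI")
print(f"no transitions  : final clock phase = {idle[-1]:.4f} UI")
```

With alternating data the clock phase converges to the data phase, while a run of identical bits leaves it where it started.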
2.3. ADC-based CDRs
An alternative to the conventional phase tracking CDR is the ADC-based CDR
architecture, which samples the data and converts it to digital. The benefit of ADC-based
CDRs is that the equalization can take place entirely in the digital domain. Figure 2.11(a)
shows the basic architecture for a phase tracking ADC-based CDR. In this
architecture, equalization can take place in both the analog and digital domains; however, an
analog feedback is still required to adjust the sampling phase of the clock. The
benefit of phase tracking CDRs is that they tend to be able to tolerate lossier channels [30] [31].
It is possible to eliminate the analog feedback and sample the data by a blind clock.
This CDR structure, depicted in Figure 2.11 (b), is known as a blind ADC-based CDR
[4] [5] [6] [7] [8]. The obvious benefits of this structure are that the VCO/PI is completely
eliminated from the design, and the clock and data recovery becomes entirely digital.
With such a structure being in the digital domain, the design could be easily ported to
other technologies. On the other hand, since the data is blindly sampled, the digital
CDR must internally estimate the average transition phase, φAVG, from the incoming
samples. In order to have an accurate phase estimate, blind ADC-based CDRs normally
oversample the data. The blind sampling ADC-based CDR uses the phase estimate to
precisely select the correct sample in the current UI window. A comparison between
the recovered clock in the phase tracking CDR and the recovered phases in the ADC-based CDR
is shown in Figure 2.11(c). The recovered phase in the ADC-based CDR also contains
the estimate of the UI centre phase, φPICK. Usually, the UI window is represented with
a digital number; therefore, φPICK is simply equal to φAVG + 0.5 UI. The data decision
scheme is very flexible due to the ease of design in digital. In previous works, interpolating
and extrapolating schemes have been employed [4] [5] [6] [7] [8].
Figure 2.11: Architecture of a phase tracking ADC-based CDR.
[Panels: (a) phase tracking ADC-based receiver: RXIN, analog equalizer, N-bit ADC, digital CDR+DFE, DOUT, with analog feedback and CLKREC; (b) blind ADC-based receiver: the same blocks with a blind CLK and ΦREC; (c) recovered clock: CLKREC for (a), and ΦREC with ΦAVG and ΦPICK marked against the data for (b).]
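The relation φPICK = φAVG + 0.5 UI can be written out directly. In the sketch below, phases are fractions of a UI and the sum is wrapped back into the UI window; choosing the blind sample nearest to φPICK, and placing the samples at k/OSR, are assumptions made for illustration rather than the data decision scheme of any of the cited works.

```python
def pick_phase(phi_avg: float) -> float:
    """UI-centre phase: half a UI after the average transition phase, wrapped to [0, 1)."""
    return (phi_avg + 0.5) % 1.0

def nearest_sample_index(phi_pick: float, osr: int) -> int:
    """Index (0..osr-1) of the blind sample closest to phi_pick.

    Assumes the osr blind samples in a UI sit at phases k / osr.
    """
    return round(phi_pick * osr) % osr

for phi_avg in (0.30, 0.90):
    phi_pick = pick_phase(phi_avg)
    idx = nearest_sample_index(phi_pick, osr=4)
    print(f"phi_avg = {phi_avg:.2f} UI -> phi_pick = {phi_pick:.2f} UI -> sample {idx}")
```

The modulo in `nearest_sample_index` handles the case where φPICK lies closest to the first sample of the next UI window.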
Figure 2.12 illustrates the operation of an ADC-based CDR with an oversampling rate
of 2 (2x). In this case, the CDR uses interpolation for both phase detection and for data
selection; this was the technique used in previous works [5] [6]. In the 2x case, the ADC
has 5 bits. This high resolution is necessary to obtain an accurate estimate of the phase
via interpolation between adjacent samples. The average UI centre, φPICK, is used to
interpolate between two adjacent samples. The value of the interpolated data sample is
Dn. This high resolution in the ADC also permitted these works to implement an FFE,
allowing for higher channel losses.
Figure 2.12: Interpolation of phase and data.
[Plot labels: samples S1 to S9 across UI windows; transition phases Φx; ΦPICK placed 0.5 UI after each Φx; interpolated data values D1 to D4.]
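Both uses of interpolation in the 2x scheme reduce to linear interpolation between two adjacent samples. The sketch below assumes linear interpolation specifically (the text does not state the interpolation order) and uses made-up sample values; times are in UI with samples spaced 0.5 UI apart.

```python
def zero_crossing_phase(s_a: float, s_b: float, t_a: float, t_b: float) -> float:
    """Linearly interpolate the time at which the signal crosses zero between two samples."""
    if s_a * s_b >= 0:
        raise ValueError("samples do not straddle zero")
    return t_a + (t_b - t_a) * s_a / (s_a - s_b)

def interpolate_sample(s_a: float, s_b: float, t_a: float, t_b: float, t_pick: float) -> float:
    """Linearly interpolate the signal value at t_pick between two adjacent samples."""
    frac = (t_pick - t_a) / (t_b - t_a)
    return s_a + frac * (s_b - s_a)

# Transition between samples at 0.0 UI and 0.5 UI (illustrative values).
phi_x = zero_crossing_phase(-0.3, 0.5, 0.0, 0.5)
phi_pick = phi_x + 0.5
# Data value at phi_pick, between samples at 0.5 UI and 1.0 UI.
d_n = interpolate_sample(0.5, 0.9, 0.5, 1.0, phi_pick)
print(f"phi_x = {phi_x:.4f} UI, phi_pick = {phi_pick:.4f} UI, D_n = {d_n:.3f}")
```

The accuracy of both results depends on how finely the sample amplitudes are resolved, which is why the 2x designs relied on a higher-resolution ADC.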
Figure 2.13: Architecture of a digital DFE.
[Diagram labels: input Dn; two DFE constant paths, each with a flip-flop (FF); a MUX (inputs 0 and 1) selected by the previous bit; output bn.]
A DFE was viable in both the 5-bit and 3-bit ADC-based CDRs [5] [6] [8]. The digital
DFE worked as follows: once the data bit was obtained either through interpolation or
extrapolation, a DFE constant, represented as a digital number, would be subtracted and
added, as in the case of the speculative DFE architecture. Finally, a MUX would select
the sign of the correct speculative bit while discarding the other, based on the previous
bit. Clearly, the DFE only makes a difference if the unequalized sample is
slightly below or above zero, so that the DFE coefficient moves it across
the threshold and the sign is flipped. A DFE was feasible for both designs
due to a substantial number of levels in the voltage domain. Clearly, if the ADC resolution
is too coarse, then equalization has little effect.
Figure 2.14: Power comparison of previous ADC-based CDRs of various ADC resolution and oversampling rate [4] [5] [6] [8].
[Bar chart titled "Power of ADC-based CDRs": total power (mW), axis 0 to 300, for the 2x 5-bit CDRs [5], [6], [4] and the 3x 3-bit CDR [8].]
As seen in Figure 2.14, one of the main drawbacks with ADC-based CDRs of the previous
designs, all at 5 Gb/s operation, is that they consumed on the order of 100 mW or more
(95 mW for the 3x case) [4] [5] [6] [8]. However, by taking a closer look at the comparison
in power between the case of 5-bit 2x vs. 3-bit 3x, it is clear that the latter consumes
less power. The key contributor to this reduction is the reduction of comparator use per
UI as seen in Figure 2.15.
In the 5-bit 2x case, 62 comparators sample the data in one UI. This is in contrast with
the 3-bit 3x case, where only 21 comparators sample the data per UI. However, reducing
the resolution comes with a penalty. The 5-bit 2x case is able to implement
an FFE, which provides significant benefits for higher channel loss, whereas implementing
one in the 3x case is not feasible. Following the trend, as the resolution continues
to decrease, the FFE becomes impractical and, as mentioned previously, so does the DFE.
Figure 2.15: Number of comparators as a function of the oversampling rate and the ADC resolution. The 2x-5bit designs [4] [5] [6] used 62 comparators per UI, and the 3x-3bit [8] used 21 comparators per UI (assuming a flash ADC is used).
[3-D bar chart titled "Comparator Use per UI vs. ADC Bits and OSR": comparator use per UI (0 to 200) versus oversampling rate (1 to 5) and ADC bits (1 to 5).]
The next chapter proposes a design that reduces the power even further by
reducing the total number of comparators per UI.
3 Proposed Design
The previous works on blind oversampling ADC-based CDRs have all been somewhere on
the equivalent number of bits (ENOB) vs. oversampling rate spectrum, as seen in Figure
2.15. The higher the product of the oversampling rate and $2^{\mathrm{ENOB}} - 1$, the greater the
number of comparators the system uses per UI. Consequently, this increases the power of
the analog front-end and to a lesser extent also of the digital CDR. However, the benefit
of having a higher ENOB is that it allows digital equalization to be more precise. For
example, for binary oversampling, digital equalization is not possible because there is no
amplitude information in the samples. On the other hand, a 2x-oversampled 5-bit ADC-based
CDR has 32 levels in the voltage domain. This enables the implementation of a
digital DFE by simply subtracting a number that corresponds to the ISI at the sampling
instance. Of course, schemes for adjusting the DFE coefficient based on the sampling
instance would be needed, but it has been done [4] [5] [6] [8].
The focus of this thesis is an architecture that reduces the analog power by reducing
the ENOB to 1.5 bits, or equivalently 3 ADC levels, while increasing the oversampling
rate to 4. This architecture, which makes 8 comparator uses per UI, is implemented
by two comparators with offsets that are symmetric about the common-mode. The two
comparators form a 3-level ADC with adjustable thresholds. Alternatively, this can be
looked at as a speculative DFE. In this view, the comparators’ offsets represent the tap
weight corresponding to the 1st post-cursor ISI. The outputs of the comparators are
speculative binary bits, which are used for data-decision inside the Digital CDR.
3.1. Proposed Full Rate Architecture
In the full rate architecture shown in Figure 3.1, the receiver input is first equalized by
a CTLE, to allow for channels with higher losses. The CTLE design [32] (produced by
Clifford Ting, a former MASc student in the group) is a source-degenerated pair with
tunable one-hot coded resistors. The eye diagrams shown above the system correspond
to a behavioural simulation from a channel with 18dB attenuation at Nyquist. It can be
seen that before the CTLE, the eye is completely closed, and the CTLE opens the eye
by 0.1 UI.
[Figure 3.1 block diagram: 5Gb/s RX input, CTLE/VGA, comparator pair (POS with +α, NEG with -α) with offset calibration engines, 20 GHz CLK divided by 4 to 5 GHz, samples S[1:4]P and S[1:4]N into the Digital CDR, outputs DOUT and CS Flag. Panels: Eye Before CTLE, Eye After CTLE (0.1 UI), Composite Eye (0.45 UI).]
Figure 3.1: A full-rate system block diagram of the proposed speculative ADC-based CDR.
This equalized data is then sampled by two comparators, one with a positive offset (POS)
and one with a negative offset (NEG). The two comparators sample the 5 Gb/s data at
20GSa/s, equivalently 4 samples per UI per comparator. A composite eye after the DFE
is generated by combining the speculative outputs of the comparators. The composite
eye is opened even further to 0.45 UI.
Figure 3.2 shows a behavioural simulation of a 5Gb/s input to the receiver from a channel
with a 7dB attenuation at Nyquist.
[Figure 3.2 plot: "Analog Input to CDR for a 7dB Loss at Nyquist Channel"; x-axis: time (ns), y-axis: Voltage (V).]
Figure 3.2: MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.
However at this stage, it is not known what the previous bit was or which sample corre-
sponds to the centre of the UI. Therefore, all of the samples must be sent to the Digital
CDR where this information can be processed. The full-rate Digital CDR is shown in
Figure 3.3.
[Figure 3.3 block diagram: FR Digital CDR clocked at 5 GHz; Arrange Data takes S[1:4]P and S[1:4]N and produces S[1:5]P and S[1:5]N; PD (POS, NEG) produces ΦxP and ΦxN, averaged (÷2) into Φx; LF produces ΦAVG; DD (POS, NEG) produces bP and bN; DFE Decisions with bPREV. feedback through a FF produce DOUT; Cycle Slip Detection produces the CS Flag.]
Figure 3.3: Full-rate Digital CDR.

In the full-rate architecture, there are 8 samples in one CDR clock cycle, 4 from each
comparator path. The CDR also stores one sample from the next cycle. This is vital for
phase detection, where the entire UI must be represented by 5 samples, the 5th
sample being the first of the following UI. Note that for a causal system a delay is added to
the four current samples such that the 5th sample in the UI window is available. This
data is then used to extract the phase information from the samples, and to make a
data decision for each UI window. The data decision block simply chooses the sample
from both paths that is closest to the centre of the UI via the estimate of φPICK. Two
speculative bits are extracted from the data decision for one UI; the correct speculative
bit is then chosen in the DFE Decision block.
3.2. Front-End Architecture
3.2.1. Front-End Speculative DFE
The equalization of the proposed design is achieved on the analog end by adjusting the
DC offset of the comparators sampling the data. Naturally for the purpose of DFE, this
is done in pairs of comparators whose offsets are symmetric about the common mode.
The offset is chosen to match the magnitude of the first post-cursor ISI and is defined as
α, with one comparator having a positive differential offset and its complementary pair
having a negative one. Figure 3.4 shows the pulse response at the output of a channel
with 7dB loss at Nyquist. In this case, the magnitude of the differential offset would
correspond to the voltage at the first post-cursor.
[Figure 3.4 plot: "Pulse Response for a 7dB Loss at Nyquist Channel"; x-axis: time (ns), y-axis: Voltage (V); annotations: h0, h0 = |α|.]
Figure 3.4: Pulse Response of a 7dB Channel at Nyquist.
For instance, if the previous bit was a 1, the previous bit’s pulse response would introduce
an α amount of undesired positive ISI at the precise centre sampling point of the current
bit. In this case, by sampling the current bit using the comparator with the positive
offset, we are effectively cancelling the first post-cursor ISI. Similarly, if the previous bit
is 0, we will use the comparator with a negative offset to sample the current bit.
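The speculation can be illustrated with a small behavioural sketch. It assumes an idealized comparator that outputs 1 when the differential input exceeds its threshold; the names speculative_samples and resolve, and the voltages used, are hypothetical and chosen only for illustration.

```python
def speculative_samples(v, alpha):
    """Return (POS, NEG) outputs of ideal comparators with thresholds +alpha and -alpha."""
    return int(v > alpha), int(v > -alpha)

def resolve(v, alpha, prev_bit):
    """Pick the speculative output whose threshold matches the previous bit."""
    pos, neg = speculative_samples(v, alpha)
    return pos if prev_bit == 1 else neg

alpha = 0.1  # assumed first post-cursor ISI magnitude, in volts

# A 0 sent after a 1: main cursor of -0.3 V plus +alpha of post-cursor ISI.
print(resolve(-0.3 + alpha, alpha, prev_bit=1))
# A 1 sent after a 0: main cursor of +0.3 V minus alpha of post-cursor ISI.
print(resolve(0.3 - alpha, alpha, prev_bit=0))
# A marginal input of 0.05 V: the two comparators disagree, so the previous bit decides.
print(speculative_samples(0.05, alpha))
```

The last call shows why both outputs must be kept: for inputs between -α and +α the two comparators disagree, and only knowledge of the previous bit resolves the decision.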
The only issue is that in the implemented system the previous bit is unknown at the
comparator output stage. If the decision feedback were to be implemented at full rate, a
flip-flop would have to be designed to operate at 5Gb/s. The proposed system relaxes
this timing requirement by storing the outputs of both comparators, thus acquiring both
cases, and feeding the DeMUXed data forward to the digital CDR. In the digital CDR,
a finite state machine (FSM) determines the correct comparator output to be chosen.
Figure 3.5 depicts the analog and digital inputs for the same waveform snippet as in
Figure 3.2.
It is important to observe the differences between the POS comparator waveform and the
NEG comparator waveform, specifically how the window length for a given bit differs
between the two cases; this is even more pronounced in the presence of jitter. If the
decision scheme has to decide between two samples that are opposite, then selecting the
correct comparator output is crucial; this is the role of the DFE.
3.2.2. Comparator Architecture
The implementation of the comparator is shown in Figure 3.6. This comparator (designed
by Sadegh Jalali, a Ph.D. student in the group) is a StrongARM latch with offset
cancellation transistors connected to both sides.
There are three offset transistors per side, which are binary weighted. This comparator
architecture is capable of tolerating a total offset of 250mV in either direction. However,
this would correspond to the absolute worst case in regards to mismatch. The offset
transistors all have an identical off-chip bias allowing for tunability. Therefore, the res-
olution of the offset transistors corresponds directly to the worst case total offset of any
comparator.
The offset can be calibrated either manually or automatically by a small finite state
machine, via the control signal calib. During calibration, the two input pairs are forced
to have the same input, so that the differential input is zero. The offset due to
mismatch is then calibrated. During normal operation, the input pair is forced
to the differential offset of α, corresponding to the first post-cursor ISI. The differential
α is directly connected to two input pins on the chip. Therefore, it can be easily tuned
with a DC voltage supply.

Figure 3.5: MATLAB Behavioural Simulation of 5Gb/s Receiver Input from a 7dB Channel at Nyquist.

[Figure 3.6 schematic: input pairs (VinP, VinM, RefP, RefM), positive offset cancellation (OSP2, OSP1, OSP0), negative offset cancellation (OSN2, OSN1, OSN0), reset and positive feedback, clk, calib, Vbias, outputs VoutP and VoutM.]
Figure 3.6: The comparator has offset cancellation in order to have a tuneable offset.
3.3. Phase Detection
The purpose of the phase detection block is to detect the zero-crossings in each of the
UIs. For this full-rate system with an oversampling rate of 4, the UI window consists
of five consecutive samples, as seen in Figure 3.7. The fifth sample is the first sample of
the next UI. To make the system causal, a delay is added such that the four incoming
samples belong to the previous CDR cycle, while the fifth sample is from the current
cycle.

[Figure 3.7 diagram: UI window spanning samples A, B, C, D, E.]
Figure 3.7: The UI window of 5 consecutive samples.
The phase detection is done by XORing the adjacent binary samples for each
path, as seen in Figure 3.8. If a transition exists, it is assigned a 3-bit weight
corresponding to its position in the UI. However, this will produce two sets of zero
crossings: one with respect to +α and another with respect to -α. In order to obtain the
true zero crossing with respect to the common mode, the complementary phases (φx^POS
and φx^NEG) of the two paths are averaged.
[Figure 3.8 diagram: Phase Detector Slice; adjacent samples A, B, C, D, E are XORed in pairs, with the four XOR outputs weighted 1/8, 3/8, 5/8, and 7/8 to produce Φx for the POS or NEG path.]
Figure 3.8: Phase estimation is done by determining the transition point in the UI.
This is shown in Figure 3.9, where φx^POS and φx^NEG from the POS and NEG paths are
summed and divided by two to produce φx. The average phase from the filter is then
subtracted from φx to produce φERR. The φERR is averaged by the filter to provide an estimate
for the average zero crossing, φAVG, as seen in Figure 3.10. Ideally, as in any closed-loop
system, the absolute φERR is minimized, and its value provides an observation point for
debugging tracking issues. The filter is third order, with the ability to track frequency
ramps.
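The phase detector slice and the POS/NEG averaging described above can be sketched behaviourally as follows. This is a simplified illustration rather than the implemented logic: it assumes at most one transition of interest per UI window (the first one found is reported) and returns None when either path sees no transition. The function names are hypothetical.

```python
# Position weights of the four XOR outputs, in fractions of a UI (Figure 3.8).
WEIGHTS = (1 / 8, 3 / 8, 5 / 8, 7 / 8)

def pd_slice(window):
    """window: five binary samples A..E. Return the position of the first
    transition as a fraction of the UI, or None if there is no transition."""
    for left, right, weight in zip(window, window[1:], WEIGHTS):
        if left ^ right:
            return weight
    return None

def phase_estimate(pos_window, neg_window):
    """Average the POS and NEG transition positions to estimate the crossing
    with respect to the common mode."""
    p = pd_slice(pos_window)
    n = pd_slice(neg_window)
    if p is None or n is None:
        return None
    return (p + n) / 2

# Rising edge: the NEG comparator (threshold -alpha) trips before the POS one.
print(phase_estimate([0, 0, 0, 1, 1], [0, 1, 1, 1, 1]))
```

In this example the POS path reports 5/8 and the NEG path 1/8, so the averaged estimate is 3/8 of a UI.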
[Figure 3.9 diagram: a POS PD slice (weights 1/8, 3/8, 5/8, 7/8) produces Φx POS and a NEG PD slice (same weights) produces Φx NEG; the two are summed and scaled by 1/2 to produce Φx.]
Figure 3.9: The phase estimate from the POS and NEG sets is averaged to obtain the true zero-crossing. This is the operation of the system's full-rate phase detector.
Through behavioural simulations, we observed that the proposed PD operates just as
effectively as a phase detection scheme whose samples are the outputs of zero-offset
comparators. Note that the zero-crossing PD is not the optimal PD, as the data is unequalized.
However, as will be shown shortly, the proposed technique has significant benefits in
opening the horizontal eye. Figure 3.11 shows the comparison between a zero-crossing
PD that also XORs adjacent samples and the proposed PD. Both of these phase detection
systems operate on samples that have been oversampled by 4.
[Figure 3.10 block diagram: the Phase Detector output φX minus φAVG gives φERR, which drives the Loop Filter (gains K1, K2, K3 with flip-flops) that produces φAVG.]
Figure 3.10: The phase estimate from the phase detector, φx, is subtracted by the loop filter average phase, φAVG, to produce φERR, which is fed back to the loop filter.
[Figure 3.11 plot: "Phase Detection of Zero Crossing PD vs. Proposed PD"; x-axis: CDR Clock Cycle, y-axis: φAVG (UI = 1024); curves: Proposed PD, Zero-Crossing PD; Frequency Offset = 20ppm.]
Figure 3.11: Behavioural simulation showing the comparison of a zero-crossing PD scheme and the proposed one.
3.4. Data Decision
The 8 samples from the POS and NEG paths (4 samples each) are passed to the Data
Decision block along with φAVG. The first sample from the following UI is not passed, as it
is only needed for phase detection. Since φAVG gives the average zero-crossing phase,
the average UI centre is defined as φPICK, where φPICK = φAVG + 0.5UI.
The data decision scheme is straightforward: the sign of the sample in the UI window
closest to φPICK is chosen as the bit of that UI window. Figure 3.12 illustrates the
selection process. Importantly, the selection of the centre sample in each UI window is
done for both the POS set of 4 samples and the NEG set. Each of these two paths
independently provides a speculative bit, based on φPICK. Then the correct bit, either bP
or bN, is selected based on the bit from the previous cycle, bprevious. If bprevious is 1 then
b = bP, otherwise b = bN.
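[Figure 3.12 diagram: one UI window spanning phase 0 to 1024, divided into sample bins A, B, C, D (followed by A of the next UI), with the position of φPICK marked.]
Figure 3.12: The selection of the correct sample is based on the position of φPICK.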
As indicated above, the DFE selection process for the full-rate system is a simple MUX,
which is controlled by the bit from the previous CDR cycle. However, the complete
system is not full-rate and is a DEMUXed system, which will be presented in the next
section.
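The sample selection of Section 3.4 can be sketched as follows. This is only an illustration under stated assumptions: the phase is expressed in units where one UI equals 1024 (as on the axis of Figure 3.12), the four samples A to D are assumed to occupy four equal bins of the UI window, and the function names are hypothetical.

```python
UI = 1024  # phase units per UI, as on the axis of Figure 3.12

def pick_index(phi_avg):
    """Index (0..3 for bins A..D) of the sample bin that contains phi_PICK."""
    phi_pick = (phi_avg + UI // 2) % UI  # phi_PICK = phi_AVG + 0.5 UI, wrapped
    return phi_pick * 4 // UI

def speculative_bits(pos, neg, phi_avg):
    """Return (bP, bN): the POS and NEG samples nearest the estimated UI centre."""
    k = pick_index(phi_avg)
    return pos[k], neg[k]

# Average zero crossing at phase 0 puts phi_PICK at 512, i.e. in bin C.
print(pick_index(0))
print(speculative_bits([0, 0, 0, 1], [0, 1, 1, 1], phi_avg=0))
```

The pair returned by speculative_bits is what the DFE decision then resolves using the previous bit.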
3.5. Complete System Architecture
The complete system architecture is shown in Figure 3.13. The input 5Gb/s data is
sampled by 8 comparator pairs. The clocking network is generated via a clock divider,
which divides a 10GHz external clock into eight 2.5GHz phases (one phase per comparator
pair). The timing between consecutive phases is 50ps, giving a total sampling
rate of 20GSa/s. In one cycle of the 2.5GHz clock, the comparator chain obtains 8
sample pairs, corresponding to 2 UIs. These samples are then DEMUXed by a factor of
4, producing 32 POS samples and 32 NEG samples, which correspond to 8 UI windows.
These samples are then fed to the digital CDR, shown in Figure 3.14, which is triggered
on a divided-by-four version of one of the clock phases. The digital implementation of the
CDR is just a scaled-rate version of the full-rate blocks. The phase detector and the data
decision for the complete system have 8 slices of the full-rate versions, the filter remains
identical for both cases, and the DFE Decisions block becomes a chain of MUXes instead
of a single one (as illustrated in Section 3.4).
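The rates quoted in this paragraph are mutually consistent, which a short calculation confirms. The sketch below only restates the numbers given in the text; the function and key names are illustrative.

```python
def clocking_summary(data_rate_hz=5_000_000_000, n_phases=8,
                     phase_clk_hz=2_500_000_000, demux=4):
    """Derive the sampling and CDR rates from the clocking parameters in the text."""
    sample_rate_hz = n_phases * phase_clk_hz           # aggregate sampling rate
    samples_per_ui = sample_rate_hz // data_rate_hz    # oversampling ratio
    return {
        "sample_rate_hz": sample_rate_hz,
        "phase_spacing_ps": 1e12 / sample_rate_hz,
        "samples_per_ui": samples_per_ui,
        "ui_per_phase_cycle": n_phases // samples_per_ui,
        "samples_per_cdr_cycle": n_phases * demux,
        "ui_per_cdr_cycle": n_phases * demux // samples_per_ui,
        "cdr_clk_hz": phase_clk_hz // demux,
    }

for name, value in clocking_summary().items():
    print(name, value)
```

With the defaults this gives 20GSa/s, a 50ps phase spacing, 4 samples per UI, 2 UIs per 2.5GHz cycle, 32 samples (8 UI windows) per CDR cycle, and a 625MHz CDR clock.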
[Figure 3.13 block diagram: 5Gb/s RX input, CTLE/VGA, 8 comparator pairs (POS with +α, NEG with -α) with offset calibration engines, clock divider from a 10 GHz CLK to 2.5 GHz phases and a 625 MHz CDR clock, samples S[1:8]P and S[1:8]N through two 8:32 DEMUXes to S[1:32]P and S[1:32]N, Digital CDR with outputs DOUT and CS Flag.]
Figure 3.13: The complete system architecture.
In the complete system, the Data Decision block detects 8 bit pairs per CDR cycle. These
eight bit pairs are then passed on from the Data Decision to the subsequent block, DFE
Decision. This block is just an expanded version of the full-rate DFE, a chain of MUXes
shown in Figure 3.15. A MUX chooses between the POS bit and the NEG bit based on
the selection of the previous bit. Specifically, if the previous bit was a 1 the POS bit is
chosen, and the NEG bit is chosen if it was a 0. The previous bit simply corresponds
to the output of the previous MUX. As can be seen from Figure 3.15, the outputs of the
MUX chain are the final bit decisions for the 8 UI windows. The
very first MUX is controlled by the final bit of the previous CDR cycle, as in the full-rate
case.

[Figure 3.14 block diagram: Digital CDR clocked at 625 MHz; Arrange Data takes S[1:32]P and S[1:32]N and produces S[1:33]P and S[1:33]N; PD (POS, NEG) produces Φx[1:8]P and Φx[1:8]N, averaged (÷2) into Φx[1:8]; LF produces ΦAVG; DD (POS, NEG) produces b[1:8]P and b[1:8]N; DFE Decisions with b[8] feedback through a FF produce DOUT; Cycle Slip Detection produces the CS Flag.]
Figure 3.14: The complete digital CDR architecture.
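[Figure 3.15 diagram: chain of MUXes; the first MUX selects between P(b1) and N(b1) under control of Prev_b8, each subsequent MUX selects between P(bk) and N(bk) under control of the previous output, up to b8; the outputs b[1:8] go to the PRBS comparator.]
Figure 3.15: The selection of the correct set of samples is done with a chain of MUXes.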
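The MUX chain of Figure 3.15 amounts to a short sequential selection. A minimal behavioural sketch, with a hypothetical function name and example bit patterns chosen only for illustration:

```python
def dfe_mux_chain(b_pos, b_neg, prev_b8):
    """Resolve the speculative bit pairs of one CDR cycle.

    b_pos, b_neg: speculative bits from the POS and NEG paths, one per UI window.
    prev_b8: final decided bit of the previous CDR cycle.
    """
    decided = []
    prev = prev_b8
    for p, n in zip(b_pos, b_neg):
        prev = p if prev == 1 else n  # one MUX: previous bit selects POS or NEG
        decided.append(prev)
    return decided

b_pos = [1, 0, 1, 1, 0, 0, 1, 0]
b_neg = [1, 1, 1, 0, 0, 1, 1, 0]
print(dfe_mux_chain(b_pos, b_neg, prev_b8=0))
```

Because each selection depends on the previous output, the chain is inherently serial within a CDR cycle; the last element of the returned list is the value that would control the first MUX of the next cycle.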
3.5.1. Cycle Slip
The digital CDR obtains 32 sample pairs corresponding to 8 UI windows. The 8 UI
windows will detect 8 bits if there is no frequency offset between the sampling clock and
the data. However, if there is a frequency offset, then the system must be able to account
for two cases:
1. If the received data rate is faster than the blind clock, this implies that φPICK is
increasing, and a bit must be dropped when φPICK transitions from bin D to A.
2. If the received data rate is slower than the blind clock, this implies that φPICK is
decreasing, and a bit must be added when φPICK transitions from bin A to D.
The boundary conditions of the two cases are illustrated in Figures 3.16 and 3.17,
respectively. The two figures show the last two UI windows of the previous cycle as well as the
first two of the current cycle. In Figure 3.16, the data frequency is faster than the clock,
meaning that the samples arrive earlier and earlier with respect to their clock edge. This
means that φAVG and φPICK are both increasing (shifting to the right) with respect to
that sample. This happens until the condition is reached where φPICK rolls over from
bin D to bin A. In this case there would be a duplicate bit, as can be seen from the figure.
Therefore, the first bit must be removed.

In Figure 3.17, the data frequency is slower than the clock, meaning that the samples
arrive later and later with respect to their clock edge. This means that φAVG and φPICK
are both decreasing (shifting to the left) with respect to that sample. This happens until
the condition is reached where φPICK rolls over from bin A to bin D. In this case, there
would be a missing bit, as can be seen from the figure. This missing bit is added by taking
the sample from bin A.

[Figures 3.16 and 3.17 diagrams: samples S25 to S32 of CDR cycle (n-1) and S1 to S10 of CDR cycle (n), grouped into UI windows with sample bins A, B, C, D, and the positions of φPICK(n-1) and φPICK(n).]
Figure 3.16: Cycle Slip; data rate is faster than blind sampling clock.
Figure 3.17: Cycle Slip; data rate is slower than blind sampling clock.

For any CDR cycle, the 8 UI windows may yield 7, 8, or 9 bits.
This information is represented via the CS Flag signal.
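The two roll-over cases can be summarized in a small sketch. It assumes, purely for illustration, that a cycle slip is detected by comparing the bin of φPICK in consecutive CDR cycles; the thesis does not spell out the detection logic at this level, and the function name is hypothetical.

```python
def bits_in_cycle(prev_bin, curr_bin, ui_windows=8):
    """Number of bits output in a CDR cycle, given the phi_PICK bin in the
    previous and current cycles (bins are 'A', 'B', 'C', 'D')."""
    if (prev_bin, curr_bin) == ("D", "A"):
        return ui_windows - 1  # data faster than clock: duplicate bit is dropped
    if (prev_bin, curr_bin) == ("A", "D"):
        return ui_windows + 1  # data slower than clock: missing bit is added
    return ui_windows          # no roll-over: one bit per UI window

for prev_bin, curr_bin in [("C", "C"), ("D", "A"), ("A", "D")]:
    print(prev_bin, curr_bin, bits_in_cycle(prev_bin, curr_bin))
```

The three possible return values (7, 8, or 9) correspond to the information carried by the CS Flag.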
We present the simulation and measurement results in the next chapter.
4 Simulations and Measurement
This section first describes the behavioural simulations of the entire transceiver model
followed by measured results as well as the measurement process of the chip.
4.1. Behavioural Simulations
The entire transmitter-receiver system was modelled in Simulink, shown in Figure 4.1,
using an event-driven model [33]. The main benefit of using the event-driven model is its
high speed, which allows more UIs to be processed compared to the conventional simulation
technique. The transmitter sends PRBS-7 or PRBS-31 data at a rate of 5Gb/s. This
data is then fed through the channel model, which is a time-domain model of the step
response. The channel model step response also takes into account the aperture window
of our sampling function. The ISI depth is 90 UIs including precursors and postcursors.
The output of the channel model is then sampled by 8 comparator pairs, each triggered
by one phase of an 8-phase 2.5GHz clock. Finally, this sampled data is DEMUXed and given
to the model of the proposed CDR for processing. Verification of the CDR's detected bits is
done by a model of a BERT with a 32-bit-long FIFO. The simulation is done for 10^6
bits, where the first 2.5 × 10^4 bits are not counted towards the bit error count. Those
initial bits are ignored in order to give the loop filter time to achieve lock.
Jitter is added on the transmitter clock; both random jitter (RJ) and sinusoidal jitter
(SJ) are included in simulations. The frequency offset between the transmitter and receiver,
ΔfTX−RX, is distributed symmetrically between the transmitter and receiver clocks.
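The error-counting convention described above, in which an initial block of bits is excluded while the loop filter locks, can be sketched as follows. This is an illustrative helper rather than the Simulink BERT model; the function name and the alignment of the two bit sequences are assumptions.

```python
def bit_error_rate(sent, received, skip=25_000):
    """BER over aligned bit sequences, ignoring the first `skip` bits
    (2.5 x 10^4 in the simulations) to allow the loop to lock."""
    counted = list(zip(sent, received))[skip:]
    if not counted:
        raise ValueError("no bits left to count after the skipped interval")
    errors = sum(s != r for s, r in counted)
    return errors / len(counted)

# Tiny example: the error in the skipped region is not counted.
sent     = [0, 1, 1, 0, 1, 0]
received = [1, 1, 1, 0, 0, 0]
print(bit_error_rate(sent, received, skip=2))
```

In the example, only the last four bits are counted and one of them is wrong, so the reported BER is 0.25.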
[Figure 4.1 block diagram: PRBS Generator (5 GHz TX clock with sinusoidal jitter source), Channel Model, 8x Comparator Pairs, 8:32 DeMUX, Digital CDR (7..9 bits per cycle), BERT (Error Count); receiver clocks derived from the 5 GHz RX clock through ÷2 and ÷8.]
Figure 4.1: Simulink Model of the entire system.
The channel models used for the Simulink behavioural simulations are shown in Table
4.1, below. Channel 1 and 2 were obtained from the channel measurements that were
done by Shayan Shahramian, a former member of the group. These two channels are the
measurements from two daughtercards on a backplane.
Name        Description          Loss at 2.5 GHz (dB)
Channel 1   8" FR4 Tyco          6.9
Channel 2   24" FR4 Tyco         10.7
Channel 3   Simulated Channel    18
Table 4.1: Channel models used for simulation.
The S-parameters were obtained; then, a transfer function was fitted to the S21 parameter,
and a step response was correspondingly generated to be used for the system's channel
model. These two channels exhibit approximately 7dB and 11dB loss at the Nyquist frequency.
Additionally, higher-attenuation channels were measured. However, obtaining a fitted transfer
function proved to be difficult due to several notches being present in the frequency
response. Thus, Channel 3 was generated manually to simulate higher-attenuation channels
that require the use of the CTLE.
4.1.1. DFE Performance
The main metric in assessing the performance of the DFE is the CDR's tolerance to
input jitter. Channel 1 served as an initial stepping-stone to verify the functionality of
the CDR as well as its sensitivity to the DFE coefficient, which in the proposed system
corresponds to the comparator offset, α. Figure 4.2 shows the eye from Channel 1 going
into the receiver. As can be seen, the eye has significant horizontal and vertical
openings of 0.74UI and 360mV, respectively.

As would be expected for such a case, the DFE weight does not play a significant role
in increasing the eye opening of an already open eye. This can be observed in the
reconstructed eye, after the DFE, corresponding to an optimal offset of α = 50mV. The
0.1UI increase in the horizontal eye opening can directly be observed in the jitter
tolerance simulations. The optimal offset was found by sweeping α in 25 mV increments
in order to obtain the optimal jitter tolerance curve, as can be seen in Figure 4.4. It
can be seen that the high-frequency jitter tolerance increases as α is increased,
until the optimal point of α = 50mV is reached. Beyond this point, any further increase in α
causes a decrease in the high-frequency jitter tolerance, as the data is being over-equalized.
One important observation from the eye diagrams and the jitter tolerance curve is that
the jitter tolerance is mostly dependent on the horizontal eye opening. This is the case
because the data decision just takes the sign of the sample that is nearest to the estimate
of the UI centre, φPICK. The key determinant of how well the CDR performs in the
presence of jitter is the phase detector. The vertical information comes into effect only
when the phase estimate is near the edge of the eye, where the likelihood of the
speculative samples being opposites is high. In this case, the DFE opens the eye if we select
the correct speculative bit.

Figure 4.2: Simulated eye diagrams for Channel 1, before and after DFE.

The effect of the DFE is more visible for a channel with more loss, as is the case with
Channel 2. In the case of Channel 2, the increased attenuation is clearly visible by
inspection of the eye opening, as seen in Figure 4.3. With no DFE, α = 0mV, the eye is
almost completely closed. With the DFE enabled, the horizontal eye opening increases
much more significantly than in the case of Channel 1.

Figure 4.3: Simulated eye diagrams for Channel 2, before and after DFE.

In Figure 4.5, the jitter tolerance curves obtained by sweeping the comparator offset show
that the optimal offset is somewhere between 50mV and 75mV. With the DFE disabled (when
α = 0mV), the CDR is on the brink of breaking when tracking jitter.

Though the CTLE opens the eyes of both Channel 1 and Channel 2 even further, the
benefits of having a CTLE enabled are best observed for a channel with even more
attenuation. Channel 3 has 18dB loss at the Nyquist frequency, and from Figure 4.6 it can be
seen that the eye prior to the CTLE is completely closed.
[Figure 4.4 plot: Jitter Tolerance (UI pp) vs. Frequency (Hz), 1E+05 to 1E+10 Hz; curves: Offset = 0, 25mV, 50mV, 75mV, 100mV; Data Type: PRBS-7, Target BER: 10^-6, ΔfTX-RX = 200ppm.]
Figure 4.4: Jitter tolerance curves for Channel 1 vs. α.

[Figure 4.5 plot: Jitter Tolerance (UI pp) vs. Frequency (Hz), 1E+05 to 1E+10 Hz; curves: Offset = 0, 25mV, 50mV, 75mV, 100mV; Data Type: PRBS-7, Target BER: 10^-6, ΔfTX-RX = 200ppm.]
Figure 4.5: Jitter tolerance curves for Channel 2 vs. α.

Figure 4.6: Simulated eye diagrams of Channel 3, before CTLE, after CTLE, and after DFE.
The CTLE opens up the eye a little; however, more equalization is needed. The
CTLE frequency response was obtained from Cadence circuit simulations and exported
to MATLAB. Then, to get the equivalent channel-plus-CTLE model, the two impulse
responses were convolved. The effect of the DFE is well illustrated on the input data,
which is already equalized by the CTLE. The eye is opened by almost 0.45UIPP for
α = 100mV. Figure 4.7 shows the CDR's jitter tolerance for Channel 3 with the CTLE
enabled. The figure confirms that with the DFE disabled, or even with α < 50mV, the
CDR cannot obtain error-free data. However, once the DFE coefficient is increased to
α = 100mV, the CDR is able to recover error-free data for a high-frequency jitter of
0.1UIPP.
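The cascading step mentioned in this paragraph, convolving the channel and CTLE impulse responses, can be illustrated with a plain discrete convolution. This is a generic sketch, not the MATLAB script used in the work; it assumes both responses are sampled on the same time step, and the sample values below are made up for the example.

```python
def convolve(h_channel, h_ctle):
    """Full discrete convolution of two impulse responses sampled on the
    same time grid, giving the impulse response of the cascade."""
    out = [0.0] * (len(h_channel) + len(h_ctle) - 1)
    for i, a in enumerate(h_channel):
        for j, b in enumerate(h_ctle):
            out[i + j] += a * b
    return out

# Hypothetical two-tap responses: a lossy channel and a high-pass-like equalizer.
print(convolve([1, 2], [1, -1]))
```

For the hypothetical taps above the cascade is [1.0, 1.0, -2.0]; with measured responses the same operation yields the equivalent channel-plus-CTLE model.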
[Figure 4.7 plot: Jitter Tolerance (UI pp) vs. Frequency (Hz), 1E+05 to 1E+10 Hz; curves: Offset = 50mV, 100mV, 125mV, 150mV; Data Type: PRBS-7, Target BER: 10^-6, ΔfTX-RX = 200ppm, CTLE Enabled.]
Figure 4.7: Jitter tolerance curves for Channel 3 vs. α.
4.1.2. Effect of Comparator Offset on Jitter Tolerance
To simulate the effect of non-ideal comparator offsets due to mismatch, each of the
comparators was assigned a random offset from a Gaussian distribution with a mean of
0mV and a specified standard deviation. This was done initially to give insight into the
required precision of the offset calibration blocks. Figure 4.8 shows the CDR's jitter
tolerance for Channel 1 for a sweep of standard deviation values, with an optimal α for
Channel 1 of 50mV. It can be seen that the CDR is able to recover error-free data for
a standard deviation of 40mV but cannot operate at all for a standard deviation of 60mV.
[Figure 4.8 plot: Jitter Tolerance (UI pp) vs. Frequency (Hz), 1E+05 to 1E+10 Hz; curves: Sigma = 0, 20mV, 40mV, 60mV; Data Type: PRBS-7, Target BER: 10^-6, ΔfTX-RX = 200ppm, Offset = 50mV.]
Figure 4.8: Jitter tolerance curves for Channel 1 vs. standard deviation of comparator offset.
The effect of non-ideal comparator offsets was also simulated for Channel 2, to observe
this impact for lossier channels. Figure 4.9 shows the CDR's jitter tolerance for Channel
2 for a sweep of standard deviation values, with an optimal α for Channel 2 of 60 mV. As
can be seen, the CDR is able to tolerate even less comparator offset for a lossier channel.
In fact, for the case of Channel 2, the CDR is able to recover error-free data in the
presence of high-frequency jitter only for an offset standard deviation less than or equal
to 20 mV. Unlike in the case of Channel 1, it cannot operate for a standard deviation of
40 mV. These simulations provided an initial estimate for the required precision of the
offset calibration of less than 20 mV.
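The mismatch model used in these simulations, a zero-mean Gaussian offset added to each comparator, can be sketched as follows. This is an illustration under stated assumptions: the function name, the fixed seed, and the representation of thresholds in millivolts are choices made here, not details taken from the Simulink model.

```python
import random

def comparator_thresholds(alpha_mv, sigma_mv, pairs=8, seed=0):
    """Thresholds (in mV) of the POS and NEG comparators of each pair, with an
    independent zero-mean Gaussian mismatch offset added to every comparator."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    pos = [alpha_mv + rng.gauss(0.0, sigma_mv) for _ in range(pairs)]
    neg = [-alpha_mv + rng.gauss(0.0, sigma_mv) for _ in range(pairs)]
    return pos, neg

# Nominal alpha of 50 mV with a 20 mV standard deviation of mismatch.
pos, neg = comparator_thresholds(50.0, 20.0)
print([round(t, 1) for t in pos])
print([round(t, 1) for t in neg])
```

Sweeping sigma_mv over 0, 20, 40, and 60 mV and rerunning the jitter tolerance simulation for each value reproduces the kind of sweep shown in Figures 4.8 and 4.9.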
[Figure 4.9 plot: Jitter Tolerance (UI pp) vs. Frequency (Hz), 1E+05 to 1E+10 Hz; curves: Sigma = 0, 20mV, 40mV, 60mV; Data Type: PRBS-7, Target BER: 10^-6, ΔfTX-RX = 200ppm, Offset = 50mV.]
Figure 4.9: Jitter tolerance curves for Channel 2 vs. standard deviation of comparator offset.
4.2. Experimental Results
This section will present the measured results of the fabricated test chip. First the re-
ceiver layout and testing environment will be shown followed by the measured results
of CDR operation. Finally, the power dissipation of the chip will be assessed against
previous works.
4.2.1. Receiver Layout and equipment setup
The chip was fabricated in Fujitsu's 65nm technology. The test chip consists of the digital
CDR, a clock divider, test registers, and an on-chip BERT. The test setup overview is
shown in Figure 4.10. The 5Gb/s data is generated with a PRBS-7 generator and passed
through FR4 channels of various lengths (16", 32", 48"). The chip clock is derived from
a 10 GHz source. An FPGA is used to control the test registers, which contain the CDR's
settings. An on-chip BERT is used to verify the error count of the system. The error
count is output from the chip and observed on the logic analyzer. Figure 4.11 shows the
die photo along with the pin names.

[Figure 4.10 block diagram: PRBS-7 Generator (5 GHz TX CLK), FR4 Channel, DUT containing CTLE, Comp. Core, Calib., CLK Divider (10 GHz RX CLK to 2.5GHz CLKs, ÷4), Digital CDR, Test Registers, and PRBS Comparator (Error Count); FPGA connected to the test registers.]
Figure 4.10: Measurement setup.
Table 4.2 describes the purpose of each of the pins. The data pins are outputs from the
FPGA that write to the on-chip test registers, in order to control both the CDR and
CTLE settings. These same Data[0:5] pins are used to manually write to each of the
test registers of the calibration block. The pin Prog_Calib controls both the input to
the comparators during calibration and the routing of the data pins from the FPGA, enabling
writing to the calibration blocks rather than the digital CDR.
The measurement setup is shown in Figure 4.12.
• Signal Generator 1: Agilent 83712B 10MHz-20GHz Synthesized CW Generator
• Signal Generator 2: Centellax TG1C1-A Clock Synthesizer
• PRBS Generator: Centellax TG1B1-A 10G PRBS
• Logic Analyzer: Tektronix TLA714 Logic Analyzer
• Narda 4346 180° Hybrid (2-18GHz)

[Figure 4.11 die photo: 2mm x 2mm die with labelled pads (see Table 4.2) and blocks A to D; legend: CLK Divider, ADC & DMX & Offset Calib., Digital CDR, CTLE; listed block areas: 65 x 100 μm², 300 x 240 μm², 240 x 165 μm², 180 x 180 μm²; Process: 65nm CMOS, Data Rate: 5Gb/s, Supply: 1.2V.]
Figure 4.11: Chip die photo.

Pin Name                            Description
RX_in/RX_inx                        Differential Input Data
VrefP/VrefN                         Differential Comparator Offset, α
CLKp/CLKn                           Differential CLK Input
Vbias_Clkgen                        CLK Divider Bias
Vcm_Clkgen                          CLK Divider Common Mode
Vcm_CTLE                            CTLE Common Mode
Vbias_Eq, Vbias_Eq2a, Vbias_Eq2b    CTLE/VGA biases
VB                                  Offset Cancellation Transistor Bias
Rst_Cal                             Reset Calibration
Prog_Calib                          Enable Calibration
Rst_Dig                             Reset Digital
Data_Stb                            Data strobe for test register
Adr_Stb                             Address strobe for test register
Reg_CLK                             Clock for test registers
Data[0...5]                         Address/Data bits for test register
Dout[0...7]                         Output Data bits to Logic Analyzer
AVD_ADC/AVS_ADC                     ADC Core + DEMUX Power Supply
AVD_EQ/AVS_EQ                       Equalizer Power Supply
VDD_DIG/VSS_DIG                     Digital CDR Power Supply
VDD_IO/VSS_IO                       Input/Output Buffer Power Supply
Table 4.2: Pin description.
The chip was measured with a probe card. An Agilent 83712B clock generator was used
to generate a 10GHz single-ended clock, which was then converted to differential and
passed to the chip as CLKp and CLKn. A Centellax TG1C1-A Clock Synthesizer was
used to generate a single-ended 5GHz clock with 10kHz to 100MHz of sinusoidal jitter.
The clock was then used to generate 5Gb/s PRBS data with jitter. This data was then
sent through an FR4 channel to the chip as RX_in and RX_inx. An FPGA was used
to program the test registers of the CDR and the Offset Calibration blocks. Finally, a
Tektronix TLA714 Logic Analyzer was used to observe the error count from the on-chip
BERT.
[Figure 4.12 diagram: Signal Gen. (1) supplies the single-ended 10 GHz CLK, converted by the 180° hybrid to the differential CLKP/CLKN inputs; Signal Gen. (2) supplies the 5 GHz CLK with 1kHz-100MHz SJ to the PRBS generator, whose 5 Gb/s PRBS output passes through the FR4 channel to RX_IN/RX_INX on the probe card (DUT); the FPGA drives Data_Stb, Adr_Stb, Prog_calib, Rst_cal, Rst_dig, Reg_CLK, and Data[0...5]; the logic analyzer reads DOUT[0...7]; DC power supplies feed AVD_ADC, AVD_EQ, VDD_DIG, VDD_IO, Vcm_CTLE, Vcm_Clkgen, Vb, Vbias_Clkgen, Vbias_Eq, Vrefp, and Vrefn.]
Figure 4.12: Detailed equipment setup.
4.2.2. Offset Cancellation
The first step in the measurement process was to calibrate out the offset due to mismatch.
Since multiple dies were available, each die had to be calibrated to cancel its particular
offset. It was recognized early in the testing period that the automatic calibration was
not working. This was traced to an unintentional floating node in the digital
calibration blocks. However, manual calibration was still possible. The floating node did
not affect the functionality of the system but, as will be presented later, it increased the
total power consumption. When calibrating for the offset, the logic analyzer was vital in
determining the correct offset code for each comparator. During calibration, the differential
voltage applied to the comparator is set to zero by setting VrefP and VrefN to
the common-mode voltage. If there was a significant amount of offset, the output would
be a constant high or low. In the case of a small offset, the comparator would sit on
the edge between high and low, and its output would be seen toggling on the logic
analyzer. This observation proved to be the key indicator for determining when the offset
was minimized. Thus, by increasing or decreasing the current on either side of the input
pair through the offset cancellation codes, the output would begin toggling
when the optimal code was set. Upon completion of the offset calibration, the
CDR's performance was evaluated.
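The manual procedure described above can be illustrated with a small behavioural model. The following Python sketch is only an illustration under stated assumptions: the code range, the correction step per code, the input-referred noise, and the function names are all hypothetical and are not taken from the chip. It sweeps the offset cancellation code with zero differential input and picks the code whose output toggles most evenly between high and low.

```python
import random


def comparator_outputs(residual_offset_mv, noise_rms_mv, n_samples, rng):
    """Decisions of a comparator with zero differential input applied.

    The only inputs are the residual offset and Gaussian noise (both assumed).
    """
    return [1 if residual_offset_mv + rng.gauss(0.0, noise_rms_mv) > 0 else 0
            for _ in range(n_samples)]


def find_offset_code(offset_mv, step_mv=2.0, n_codes=32, noise_rms_mv=1.0,
                     n_samples=2000, seed=0):
    """Sweep all codes and return (code, score) for the most evenly toggling output.

    score = |fraction of ones - 0.5|; 0 means the output toggles evenly,
    0.5 means it is stuck high or stuck low.
    """
    rng = random.Random(seed)
    best_code, best_score = None, None
    for code in range(n_codes):
        # Hypothetical mapping: mid-scale code applies no correction.
        correction_mv = (code - n_codes // 2) * step_mv
        outs = comparator_outputs(offset_mv - correction_mv,
                                  noise_rms_mv, n_samples, rng)
        score = abs(sum(outs) / n_samples - 0.5)
        if best_score is None or score < best_score:
            best_code, best_score = code, score
    return best_code, best_score


if __name__ == "__main__":
    code, score = find_offset_code(offset_mv=12.0)
    print("selected code:", code, "toggle score:", round(score, 3))
```

With a large residual offset every sample resolves the same way and the score stays near 0.5, which corresponds to the constant high or low output seen on the logic analyzer. Near the optimal code the noise makes the output toggle, and the score approaches zero.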
4.2.3. Jitter Tolerance
The channels used for the CDR were FR4 traces of several lengths. Table 4.3 describes
the channels that were used for measurements. The corresponding eye diagrams for 5Gb/s
PRBS-7 data over the three channels are shown in Figure 4.13. It can be seen that for
Channel C, the eye is completely closed coming into the CDR.
Name        Description     Loss at 2.5 GHz (dB)
Channel A   16” FR4 Trace   5.9
Channel B   32” FR4 Trace   9.3
Channel C   48” FR4 Trace   12.9
Table 4.3: Channels used for measurements.
The effect of α on the jitter tolerance is very noticeable for Channel B, whose attenuation
is 9.3 dB at 2.5GHz. Figure 4.14 shows the effect of various α values on the jitter tolerance
of the CDR for 5Gb/s PRBS-7 data, targeting a BER of 10^-12. The CDR operates
optimally for this channel when α = 60mV, achieving a high frequency jitter tolerance of
0.39 UIpp. If the DFE is disabled, corresponding to α = 0mV, the performance with
respect to jitter tolerance worsens, as seen by the bumps in the corresponding curve. On
the other hand, if the data is over-equalized, the performance drops as well, as seen from
the curve corresponding to an α of 100mV.
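The trend with α can be illustrated with an idealized model of a 1-tap speculative DFE. The sketch below is not the measured system: it assumes a noiseless channel with a main cursor h0 and a single post-cursor h1, and the values h0 = 100mV and h1 = 60mV as well as the function names are hypothetical, chosen only for illustration. Two comparators slice each sample at +α and -α, and the previous decision selects which result is kept. The worst-case distance between the sample and the selected threshold then serves as a rough proxy for the vertical margin.

```python
import random


def channel(bits, h0_mv, h1_mv, first_prev_bit=-1):
    """Ideal channel with one post-cursor: y[k] = h0*d[k] + h1*d[k-1]."""
    prev, out = first_prev_bit, []
    for b in bits:
        out.append(h0_mv * b + h1_mv * prev)
        prev = b
    return out


def speculative_dfe(samples_mv, alpha_mv, first_prev_bit=-1):
    """1-tap speculative DFE using a comparator pair with thresholds +/- alpha.

    Returns the decided bits and the smallest distance between a sample and
    the threshold selected for it.
    """
    prev, bits, margins = first_prev_bit, [], []
    for y in samples_mv:
        hi = 1 if y > +alpha_mv else -1   # correct if the previous bit was +1
        lo = 1 if y > -alpha_mv else -1   # correct if the previous bit was -1
        bit = hi if prev > 0 else lo      # previous decision selects the result
        margins.append(abs(y - alpha_mv * prev))
        bits.append(bit)
        prev = bit
    return bits, min(margins)


if __name__ == "__main__":
    rng = random.Random(1)
    tx = [rng.choice((-1, 1)) for _ in range(1000)]
    rx = channel(tx, h0_mv=100, h1_mv=60)   # hypothetical tap values
    for alpha in (0, 60, 100):
        decided, margin = speculative_dfe(rx, alpha)
        print("alpha =", alpha, "mV, errors =",
              sum(d != t for d, t in zip(decided, tx)),
              ", worst-case margin =", margin, "mV")
```

In this model the margin equals h0 - |h1 - α|, so it is largest when α matches the post-cursor and shrinks for both under-equalization (α = 0mV) and over-equalization (α = 100mV).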
Figure 4.13: Measured eye diagrams of the channels.
[Plot: Jitter Tolerance (UIpp, log scale, 0.01 to 10) vs. Frequency (Hz, 10^4 to 10^8). Data Type: PRBS-7. Target BER: 10^-12. Curves: α = 0mV, α = 60mV, α = 100mV.]
Figure 4.14: Measured jitter tolerance vs. α for Channel B.
In the case of Channel B, the CTLE did not need to be enabled. However,
as the channel length is increased to 48” and the loss consequently increases to 12.9dB
at the Nyquist frequency, the need for a CTLE becomes apparent. Figure 4.15 shows the
jitter tolerance curves for different channel lengths. For Channel A, the CDR
operates to the limit of our jitter source, tolerating 0.63 UIpp of high frequency jitter.
For Channel C, the CDR cannot operate with the CTLE disabled. With it enabled, and
with α = 60mV, the CDR tolerates up to 0.31 UIpp of high frequency jitter.
The channels used for simulation were not available for measurement, so a one-to-one
comparison could not be made directly. Comparing the simulated performance for
Channel 2 (10.7dB loss at 2.5GHz) with the measured performance for Channel B (9.3dB
loss at 2.5GHz), the measured high frequency jitter tolerance is higher than the simulated,
0.39 UIpp versus 0.21 UIpp. This higher jitter tolerance in the measurement results can be
attributed to the smaller channel attenuation.
[Plot: Jitter Tolerance (UIpp, log scale, 0.01 to 10) vs. Frequency (Hz, 10^4 to 10^8). Data Type: PRBS-7. Target BER: 10^-12. Curves: 16-inch FR4, CTLE Off; 48-inch FR4, CTLE Off; 48-inch FR4, CTLE On.]
Figure 4.15: Measured jitter tolerance vs. channel length.
4.2.4. Power Comparison
Reducing the power consumption of ADC-based CDRs was one of the main goals in this
work. The chip consisted of the following three power grids:
1. ADC core, DEMUX, CLK Divider, Offset Calibration Blocks
2. Equalizer
3. Digital CDR
It was observed that the digital power consumed by the CDR was 18 mW. Unexpectedly,
the power for the ADC grid was 45.6 mW. When comparing this with the ADC and CLK
Divider power for the 3-bit ADC, 3x oversampling case [8], whose power was 52.8 mW,
it was clear that the measured value was too high, because the 3x case had 21
comparators per UI versus 8 per UI in this work.
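The comparator counts quoted above follow from a simple relation, sketched below in Python. The sketch assumes a flash-style quantizer in which an ADC with N output levels uses N - 1 comparators per sampling phase; this is consistent with the two comparators of the 3-level ADC in this work, but it is an assumption for the 3-bit design in [8]. The function name is hypothetical.

```python
def comparators_per_ui(adc_levels, oversampling_ratio):
    """Comparators active per unit interval for a flash-style quantizer.

    Assumes one comparator per threshold (adc_levels - 1 thresholds) and one
    set of comparators per sampling phase within the UI.
    """
    return (adc_levels - 1) * oversampling_ratio


if __name__ == "__main__":
    this_work = comparators_per_ui(adc_levels=3, oversampling_ratio=4)
    ref_3x = comparators_per_ui(adc_levels=2 ** 3, oversampling_ratio=3)
    print("this work:", this_work, "comparators per UI")
    print("3-bit, 3x case:", ref_3x, "comparators per UI")
```

Under this assumption the 3x design uses more than 2.5 times as many comparators per UI, which is why a similar ADC grid power was not expected.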
By re-simulating the design, it became clear that there was a bug in the offset calibration
blocks. A node had been left floating unintentionally, leading to a voltage of around 0.6V
appearing on the input of several gates. This led to each calibration block drawing
a constant current, with all 8 blocks together burning 10 mW of power. The simulated ADC
power with this bug included was 43 mW. This value was used to obtain a realistic
estimate of the measured power with the bug fixed. The bug-free simulated ADC power
was 33 mW, and by scaling this by the ratio of the measured to simulated powers of
the design with the bug, the estimated ADC and CLK divider power was determined to
be 35.5 mW. The comparison of this work with blind ADC-based CDRs is summarized
in Table 4.4.
CDR                    Data Rate  Tech.  Channel Loss  ADC Power  Dig. Power  CLK Power  Total Power
                       (Gb/s)     (nm)   (dB)          (mW)       (mW)        (mW)       (mW)
[5]                    5          65     10            110        68.4        NA         178.4
[4]                    5          65     15            NA         NA          NA         280
[6]                    5          65     13.3          NA         57.6        NA         211.2
[7]                    10         65     10            109        111.6       NA         306
[8]                    5          65     5.8           38.4       42          14.4       94.8
This Work (CTLE Off)   5          65     9.7           *20.5      18          *15        *53.5
This Work (CTLE On)    5          65     12.9          *20.5      18          *15        *76.3
*The result reported is the estimate of the measured power as described in the text.
Table 4.4: Comparison of this work with previous blind ADC-based CDRs.
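The power figures for this work in Table 4.4 can be cross-checked with a few lines of arithmetic. The Python sketch below uses only values quoted in this chapter. The variable names are hypothetical, and the equalizer figure is an inference from the difference between the two totals, not a value reported in the text.

```python
# Estimated and measured powers for this work, in mW, as listed in Table 4.4.
adc_mw, clk_mw, dig_mw = 20.5, 15.0, 18.0
total_ctle_on_mw = 76.3
ref8_total_mw = 94.8          # 3x blind CDR [8]
data_rate_gbps = 5.0

adc_clk_mw = adc_mw + clk_mw                 # ADC and CLK divider estimate
total_ctle_off_mw = adc_clk_mw + dig_mw      # total with the CTLE off

# Inference only: the remaining power with the CTLE on is presumably drawn
# by the equalizer power grid.
implied_eq_mw = round(total_ctle_on_mw - total_ctle_off_mw, 1)

ratio_to_ref8 = total_ctle_off_mw / ref8_total_mw
# 1 mW per Gb/s is the same as 1 pJ per bit.
energy_pj_per_bit = total_ctle_off_mw / data_rate_gbps

if __name__ == "__main__":
    print("ADC + CLK divider:", adc_clk_mw, "mW")
    print("Total, CTLE off:", total_ctle_off_mw, "mW")
    print("Implied equalizer power:", implied_eq_mw, "mW")
    print("Fraction of [8] total:", round(ratio_to_ref8, 3))
    print("Energy per bit:", round(energy_pj_per_bit, 2), "pJ/bit")
```

The sums reproduce the 35.5 mW and 53.5 mW figures quoted in the text, and the total with the CTLE off comes to roughly 56% of the 94.8 mW reported for [8].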
Compared to previous blind ADC-based CDRs, it is clear that this work consumes almost
half the power of the most efficient alternative. Importantly, this reduction in power does
not come with a sacrifice in the ability to tolerate high channel loss. The power reduction
of this design is best illustrated graphically in Figure 4.16. This figure compares the
estimated measured power of this design with that of the most efficient alternative, the 3x
case [8]. Both the digital and ADC powers are roughly halved.
[Bar chart: Power (mW, 0 to 45) for ADC Power, Digital Power, and CLK Power. Series: 3x Blind CDR, This Work.]
Figure 4.16: Comparison of power between this work and blind ADC-based CDR with 3x oversampling [8].
5 Conclusion
This thesis presented a 4x oversampling, 3-level blind ADC-based CDR whose input is sampled
by comparator pairs with symmetric offsets used for equalization. The goal of this
work was to reduce the overall power consumption in ADC-based CDRs without sacrificing
performance for channels with high loss.
This work incorporates a non-uniformly distributed 3-level ADC. This ADC is composed
of two comparators which have off-chip tunable thresholds set to the first post-cursor
ISI. This is equivalent to the implementation of a speculative DFE architecture. A 2-bit
4-level ADC with a linear distribution would be too coarse for operation as a DFE.
Furthermore, this architecture differs from conventional DFEs in that the clock
phase is obtained without recovering the clock. Instead, the clock phase is estimated in
a digital filter, as is done in blind ADC-based CDR architectures.
In measurements with 5Gb/s PRBS input data, the CDR was able to operate for
channel lengths up to 48 inches, corresponding to an attenuation of 12.9 dB at Nyquist.
For a 32 inch channel, the effect of the DFE was evident on the jitter tolerance curve. For
α = 60mV, the CDR was able to attain a high frequency jitter tolerance of 0.39 UIpp.
Meanwhile, with the DFE disabled (α = 0mV), the CDR was on the brink of operation,
as evidenced by the bumps in the jitter tolerance curve.
A mistake during the RTL design of the digital offset calibration blocks
left a node floating. This led to the digital calibration blocks consuming
a constant 10mW of power. After re-simulation of the ADC power grid, in an attempt
to get a realistic estimate of the power for an error-free circuit, it was estimated that
the ADC and clock divider consume 35.5 mW of power. Meanwhile, the digital CDR
consumed only 18 mW (measured), giving a total estimated measured power of
53.5 mW. This power is nearly half of the power used in the most efficient alternative
(3x) [8], which consumed 94.8 mW. Without a CTLE, the CDR was able to tolerate up
to 9.7dB of loss, comparable to the 10dB, 15dB, 13.3dB, and 5.8dB channel losses
reported in previous works. With the CTLE enabled, the CDR was able to tolerate up to
12.9 dB of channel loss while consuming only 76.3 mW, less than all previous designs.
5.1. Thesis Contributions
The contributions of this thesis are summarized below:
1. Design, simulation, and measurement of a 4x oversampling, 3-level blind ADC-
based CDR.
2. A paper entitled "A 4x, 3-Level Blind ADC-Based Receiver", submitted to the
Symposium on VLSI Circuits.
5.2. Future Work
Two possible avenues for future research in this area are:
1. Further reduction in power consumption. The work presented in this thesis consumes
53.5mW for a 5Gb/s data rate. This translates to 10.7mW/Gb/s, which is
at least a factor of two higher than the industry standard.
2. Clock phase distribution to time-interleaved ADCs. It is well known that any skew
between the clock phases of the time-interleaved ADCs will result in a loss of jitter
tolerance. Techniques need to be developed to either precisely distribute the phases
or to calibrate them afterwards.
References
[1] Timothy O. Dickson, John F. Bulzacchelli, and Daniel J. Friedman. A 12-Gb/s
11-mW Half-Rate Sampled 5-Tap Decision Feedback Equalizer With Current-
Integrating Summers in 45-nm SOI CMOS Technology. IEEE Journal of Solid-State
Circuits, 44(4):1298–1305, April 2009.
[2] Yasuo Hidaka. A 4-Channel 10.3Gb/s Backplane Transceiver Macro with 35dB
Equalizer and Sign-Based Zero-Forcing Adaptive Control. ISSCC Dig. Tech. Papers,
pages 188–190, 2009.
[3] Jri Lee, Ming-Shuan Chen, Yu-Nan Shih, Chen-Lun Lin, and Hao-Wei Hung. A 40Gb/s
TX and RX Chip Set in 65nm CMOS. ISSCC Dig. Tech. Papers, 11(22):541–542,
2011.
[4] Hisakatsu Yamaguchi, Hirotaka Tamura, Yoshiyasu Doi, Yasumoto Tomita,
Takayuki Hamada, Masaya Kibune, Shuhei Ohmoto, Keita Tateishi, Oleksiy
Tyshchenko, Ali Sheikholeslami, Tomokazu Higuchi, Junji Ogawa, Tamio Saito,
Hideki Ishida, and Kohtaroh Gotoh. CDR and CMA Adaptive Equalizer in 65nm
CMOS. ISSCC Dig. Tech. Papers, 44(4):168–170, 2010.
[5] Oleksiy Tyshchenko, Ali Sheikholeslami, Hirotaka Tamura, Masaya
Kibune, Hisakatsu Yamaguchi, and Junji Ogawa. A 5-Gb/s ADC-Based Feed-
Forward CDR. IEEE Journal of Solid-State Circuits, 45(6):1091–1098, 2010.
[6] Siamak Sarvari, Tina Tahmoureszadeh, Ali Sheikholeslami, Hirotaka Tamura, and
Masaya Kibune. A 5Gb/s speculative DFE for 2x blind ADC-based receivers in
65-nm CMOS. Symposium on VLSI Circuits, 2:69–70, June 2010.
[7] Clifford Ting, Joshua Liang, Ali Sheikholeslami, Masaya Kibune, and Hirotaka
Tamura. 7.4 A Blind Baud-Rate ADC-Based CDR. ISSCC Dig. Tech. Papers,
pages 122–124, 2013.
[8] M Sadegh Jalali, Clifford Ting, Behrooz Abiri, Ali Sheikholeslami, Masaya Kibune,
and Hirotaka Tamura. A 3x Blind ADC-based CDR. IEEE Asian Solid-State Cir-
cuits Conference (A-SSCC), 4:349–352, 2013.
[9] Afshin Momtaz and Michael M Green. An 80mW 40 Gb/s 7-Tap T/2-Spaced Feed-
Forward Equalizer in 65 nm CMOS. IEEE Journal of Solid-State Circuits, 45(3):629–
639, 2010.
[10] Jian-hao Lu and Shen-iuan Liu. A Merged CMOS Digital Near-End Crosstalk Can-
celler and Analog Equalizer for Multi-Lane Serial-Link Receivers. IEEE Journal of
Solid-State Circuits, 45(2):433–446, 2010.
[11] Horace Cheng, Faisal A Musa, and Anthony Chan Carusone. A 32/16-Gb/s
Dual-Mode Pulsewidth Modulation
30-dB Loss Compensation Using a High-Speed CML Design Methodology. IEEE
Transactions on Circuits and Systems, 56(8):1794–1806, 2009.
[12] Yong Liu, Byungsub Kim, Timothy O Dickson, John F Bulzacchelli, and Daniel J
Friedman. A 10Gb/s Compact Low-Power Serial I/O with DFE-IIR Equalization
in 65nm CMOS. ISSCC Dig. Tech. Papers, pages 182–184, 2009.
[13] Massimo Pozzoni, Simone Erba, Davide Sanzogni, Marcello Ganzerli, Paolo Viola,
Daniele Baldi, Matteo Repossi, Giorgio Spelgatti, and Francesco Svelto. 8.5 A
12Gb/s 39dB Loss-Recovery Unclocked-DFE Receiver with Bi-dimensional Equal-
ization. ISSCC Dig. Tech. Papers, 11:541–542, 2010.
[14] Dong Hun Shin, Ji Eun Jang, Frank O’Mahony, and C. Patrick Yue. A 1-mW 12-
Gb/s continuous-time adaptive passive equalizer in 90-nm CMOS. IEEE Custom
Integrated Circuits Conference, (Cicc):117–120, September 2009.
[15] Ricky Yuen, Marcus van Ierssel, Ali Sheikholeslami, William Walker, and Hiro-
taka Tamura. A 5Gb/s Transmitter with Reflection Cancellation for Backplane
Transceivers. IEEE Custom Integrated Circuits Conference, (Cicc):413–416, Septem-
ber 2006.
[16] Jonathan Sewter and Anthony Chan Carusone. A 3-Tap FIR Filter With Cascaded
Distributed Tap Amplifiers for Equalization Up to 40 Gb/s in 0.18-μm CMOS.
IEEE Journal of Solid-State Circuits, 41(8):1919–1929, 2006.
[17] Altan Hazneci and Sorin P Voinigescu. A 49-Gb/s, 7-Tap Transversal Filter in
0.18um SiGe BiCMOS for Backplane Equalization. IEEE CSIC Digest, pages 4–7,
2004.
[18] Behrooz Abiri, Ali Sheikholeslami, and Hirotaka Tamura. An Adaptation
Engine for a 2x Blind ADC-Based CDR in 65 nm CMOS. IEEE Journal of
Solid-State Circuits, 46(12):3140–3149, 2011.
[19] John Proakis. Digital Communications. McGraw-Hill Science/Engineering/Math, 4
edition, 2000.
[20] Yue Lu and Elad Alon. 2.2 A 66Gb/s 46mW 3-Tap Decision-Feedback Equalizer in
65nm CMOS. ISSCC Dig. Tech. Papers, pages 30–32, 2013.
[21] Hideyuki Sugita, Kazuhisa Sunaga, Koichi Yamaguchi, and Masayuki Mizuno. 8.4
A 16Gb/s 1st-Tap FFE and 3-Tap DFE in 90nm CMOS. ISSCC Dig. Tech. Papers,
pages 368–369, 2010.
[22] Mike Harwood. A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC
with Digital Receiver Equalization and Clock Recovery. ISSCC Dig. Tech. Papers,
pages 436–438, 2007.
[23] Aida Varzaghani and Chih-kong Ken Yang. A 4.8 GS/s 5-bit ADC-Based
Receiver With Embedded DFE for Signal Equalization.
ISSCC Dig. Tech. Papers, 44(3):901–915, 2009.
[24] J.S. Bal. Circuit Blocks for an Analog CMOS Decision Feedback Equalizer. 1992.
[25] S. Kasturia and J.H. Winters. Techniques for high-speed implementation of nonlinear
cancellation. IEEE Journal on Selected Areas in Communications, 9(5):711–717,
June 1991.
[26] Adesh Garg, Anthony Chan Carusone, and Sorin P Voinigescu. A
1-Tap 40-Gb/s Look-Ahead Decision Feedback Equalizer in 0.18-μm SiGe BiCMOS
Technology. IEEE Journal of Solid-State Circuits, 41(10):2224–2232, 2006.
[27] Shunichi Kaeriyama. A 10Gb/s/ch 50mW 120x130um2 Clock and Data Recovery
Circuit. ISSCC Dig. Tech. Papers, pages 264–265, 2005.
[28] Declan Dalton, Kwet Chai, Eric Evans, Mark Ferriss, Dave Hitchcox,
Paul Murray, Sivanendra Selvanayagam, Paul Shepherd, and Lawrence Devito.
A 12.5-Mb/s to 2.7-Gb/s Continuous-Rate CDR With Automatic Frequency Acquisition
and Data-Rate Readback. IEEE Journal of Solid-State Circuits, 40(12):2713–
2725, 2005.
[29] Ke-Chung Wu and Jri Lee. A 20Gb/s Full-Rate Linear CDR Circuit with Automatic
Frequency Acquisition. ISSCC Dig. Tech. Papers, 2:366–368, 2009.
[30] Bo Zhang, Ali Nazemi, Adesh Garg, Namik Kocaman, Mahmoud Reza Ahmadi,
Mehdi Khanpour, Heng Zhang, Jun Cao, and Afshin Momtaz. A 195mW/55mW
Dual-Path Receiver AFE for Multistandard 8.5-to-11.5 Gb/s Serial Links in 40nm
CMOS. pages 34–36, 2013.
[31] Jun Cao and Bo Zhang. A 500 mW ADC-Based CMOS AFE With Digital Calibra-
tion for 10 Gb/s Serial Links Over Multimode Fiber. IEEE Journal of Solid-State
Circuits, 45(6):1172–1185, 2010.
[32] Clifford Ting. A Blind Baud-Rate CDR and Zero Forcing Adaptive DFE For an
ADC-Based Receiver. MASc Thesis, (University of Toronto), 2013.
[33] Marcus Van Ierssel, Hisakatsu Yamaguchi, Ali Sheikholeslami, Hirotaka
Tamura, and William W Walker. Event-Driven Modeling of CDR Jitter
Induced by Power-Supply Noise, Finite Decision-Circuit Bandwidth, and Channel
ISI. IEEE Journal of Solid-State Circuits, 55(5):1306–1315, 2008.