Ibm July 2009 Jssc

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 44, NO. 7, JULY 2009 1927

A 5.75 to 44 Gb/s Quarter Rate CDR With Data Rate Selection in 90 nm Bulk CMOS

Lucio Rodoni, Student Member, IEEE, George von Büren, Student Member, IEEE, Alex Huber, Member, IEEE, Martin Schmatz, Member, IEEE, and Heinz Jäckel, Member, IEEE

Abstract—This paper presents a quarter-rate clock and data recovery (CDR) circuit for plesiochronous serial I/O-links. The 2×-oversampling phase-tracking CDR, implemented in 90 nm bulk CMOS technology, covers the whole range of data rates from 5.75 to 44 Gb/s in a single IC by means of a novel data rate selection logic. Input data are sampled with eight parallel differential master-slave flip-flops, for which bandwidth enhancement techniques were necessary in 90 nm CMOS. Precise, low-jitter local clock phases are generated by an analog delay-locked loop. These clock phases are aligned to the incoming data by four parallel phase rotators. The phase-tracking loop of the CDR is realized as a digital delay-locked loop and is therefore immune to process tolerances. The CDR is able to track a maximum frequency deviation of 615 ppm between incoming data and a local reference clock and fulfills the extended XAUI jitter tolerance mask. The target bit error rate was verified up to 38 Gb/s using a PRBS pattern. With a low power consumption per data rate of only 5.74 mW/(Gb/s), the CDR meets the specifications of the International Technology Roadmap for Semiconductors for 90 nm CMOS serial I/O-links at the maximal data rate of 44 Gb/s. The CDR occupies a chip area of 0.2 mm².

Index Terms—Clock and data recovery (CDR), CMOS analog integrated circuits, current-mode logic (CML), delay-locked loop (DLL), high-speed serial link, jitter tolerance.

I. INTRODUCTION

THE aggregate data communication bandwidth of key components in telecommunication equipment and computer servers has experienced a continuous increase. This progress has been achieved by increasing the serial data rate and by integrating more power- and area-efficient transceivers on a single CMOS IC. Key trends in CMOS technology, power consumption, and aggregate data rate are summarized in Table I according to the forecast of the International Technology Roadmap for Semiconductors (ITRS) published in 2004 [1]. In Table I, the transceivers are categorized into high-integration-level serial transceivers (e.g., 200 × 8 Gb/s) and high-performance serial transceivers (e.g., 40 × 40 Gb/s). High-integration-level [2], [3], high-performance [4], [5], and electrical/optical [6], [7] chip-to-chip transceivers, representing the state of the art, are summarized in Table II.

TABLE I
SERIAL TRANSCEIVER ROADMAP OF ITRS [1]

Manuscript received November 07, 2008; revised March 06, 2009. Current version published June 24, 2009. This work was supported by the Swiss Federal Office for Professional Education and Technology under Contract/Grant number KTI 7995.1.

L. Rodoni, G. von Büren, and H. Jäckel are with the Swiss Federal Institute of Technology (ETH) Zurich, Electronics Laboratory, 8092 Zurich, Switzerland.

A. Huber is with the Institute of Microelectronics, University of Applied Sciences Northwestern Switzerland, 5210 Windisch, Switzerland.

M. Schmatz is with the Zurich Research Laboratory, IBM Research, 8803 Rüschlikon, Switzerland.

Digital Object Identifier 10.1109/JSSC.2009.2021913

One of the critical and speed-limiting circuit blocks in a serial I/O link macro-cell is the clock and data recovery (CDR) circuit in the receiver. The first 40 Gb/s CMOS CDR was presented in 2003 and was realized in a 0.18 µm process [8]. This 40 Gb/s CDR employs a quarter-rate architecture with a multi-phase LC oscillator and a passive loop filter. In 2007, a quarter-rate 3×-oversampling 40–44 Gb/s CDR with 1:16 DEMUX implemented in 90 nm CMOS was presented [9]. This CDR fulfills the ITU-T G.8251 jitter tolerance mask, and its power consumption is less than 1/3 of that of a comparable commercial SiGe CDR with 1:16 DEMUX.

In multi-channel applications, where every participant has nominally the same local reference frequency, the CDR of each receiver aligns the phase of its plesiochronous sampling clock to the incoming data by using phase interpolation techniques [10]. Since no VCO is needed in each CDR, coupling between channels is reduced. The control of the sampling position is realized by analog [11] or digital [10], [12] phase-tracking loops. Using an analog phase interpolator, a 10.8 Gb/s half-rate CDR implemented in 0.11 µm CMOS fulfills the SDH/SONET jitter tolerance at a low BER, consuming a power of 220 mW and an area of 0.35 mm² [11]. A half-rate 25 Gb/s CDR implemented in 90 nm CMOS, likewise achieving a low BER, incorporates a digital first-order loop filter, consumes 98 mA from a 1.1 V supply, and occupies a die area of only 0.064 mm²; it is therefore suited for high-density integration [12]. It has been shown with a 65 nm SOI CMOS technology that the area and the power consumption per data rate of a quarter-rate 40 Gb/s CDR can be as low as 0.03 mm² and 1.8 mW/(Gb/s), respectively [13]. Performance figures (e.g., input data range, reference clock frequency range) of these CDRs are listed in Table II. None of them is able to handle the complete input data range from 10 to 40 Gb/s. Either the input data rate range is limited by the VCO of the CDR [8], [9], [13], or the plesiochronous CDR requires too large a reference frequency range [11], [12]. Therefore, we propose and have successfully implemented a data rate selection logic that covers the whole range of data rates from 5.75 to 44 Gb/s while the reference frequency range is only 5.75 to 11 GHz [14]. This feature makes the circuit especially suitable for multi-standard applications, enabling new link rates while supporting compatibility with legacy rates.

0018-9200/$25.00 © 2009 IEEE

TABLE II
PUBLISHED SERIAL TRANSCEIVERS AND CDRS

Fig. 1. Architecture of the phase tracking loop.

Section II gives an overview of the proposed CDR architecture. The building blocks of the CDR, such as the samplers, 8:32 demultiplexer, digital control loop with data rate selection, phase rotator, delay-locked loop (DLL), and clock buffer, are described in detail in Section III. Finally, measurement results are presented in Section IV and a summary is given in Section V.

II. CDR ARCHITECTURE

In high-density serial I/O links, the transmitter (TX) and receiver (RX) are clocked by two independent reference clocks having the same nominal frequency. The CDR of the receiver has to track a slowly drifting phase difference between the incoming data and the RX clock, caused by a bounded frequency difference in the range of 10 to 100 ppm between the quartz-based plesiochronous TX and RX clocks. Hence, a phase-tracking loop in the CDR is sufficient for this purpose.

Since the sampler is the speed-limiting circuit block of the CDR, parallel architectures, e.g., half-rate [12], [15] and quarter-rate CDRs [8], [13], are employed to demultiplex the data at the input. A higher demultiplexing factor increases the number of samplers and the number of clock edges, but has the following two advantages: 1) the regeneration phase of the comparator at the input is lengthened and 2) the sampling clock frequency is lowered, simplifying the on-chip clock distribution.

The block diagram of the 2×-oversampling quarter-rate CDR is shown in Fig. 1. We chose a dual-loop architecture [10]: 1) the phase-tracking loop is realized as a digital DLL and 2) the reference clock phases of the phase rotators are generated by an analog DLL. Eight parallel samplers, clocked by the sampling clock phases, acquire the four data bits and the four edges needed to evaluate the sampling position [16]. Eight parallel 1:4 demultiplexers reduce the data rate from 10 to 2.5 Gb/s and align the sampled bits, which are separated by 1/8th of the period of the reference clock signal, to a single 2.5 GHz clock phase. At this point, the transition from differential signaling to full-swing CMOS signal levels is performed, since the data are aligned and the clock period is long enough to design the digital blocks with a standard design flow [17]. The aligned 16 data bits and 16 edge bits are compared in the edge detector, which solves the 16 Alexander equations [16] and outputs an early/late signal after majority voting. The digital loop filter then evaluates the sequence of early and late bits and, if needed, asserts an up/down signal for the phase rotator. The up/down counter translates the up/down signal into a thermometer-coded word controlling the four phase rotators. These four phase rotators shift the local reference phases so that the sampling phases are aligned to the incoming data. The local reference clock phases of the phase rotators are generated with a DLL from a single reference clock.

All circuit blocks operating at a clock frequency above 2.5 GHz are implemented in current-mode logic (CML) to meet the speed requirements. In addition, CML circuits have a higher immunity to supply noise and generate less switching noise on the power supply. The proposed CDR macro-cell requires only a single reference clock phase, reducing the complexity and power consumption of the reference clock distribution network. The four 10 Gb/s data bits are buffered and fed to output pins for testing and measurement purposes.

III. CDR BUILDING BLOCKS

A. Samplers

In this 2×-oversampling quarter-rate CDR, the front-end sampling latch that is present in each sampler is the most speed-critical building block. The front-end sampling latch has to be able to track the incoming 40 Gb/s signal, sample the data with a 10 GHz clock signal, and then decide whether the voltage at its input is below or above a threshold voltage within half of a 10 GHz clock period. The latch following the sampling latch has relaxed speed constraints because it operates at a reduced data rate of 10 Gb/s. Together with the sampling latch, it forms a master–slave flip-flop (MS-FF) and provides a stable output that is valid during a full 10 GHz clock period.

At the input of the eight samplers, the data signal should have rise and fall times $t_r$, $t_f$ that are shorter than one half of the bit time $T_{bit}$. Based on first-order RC-circuit analysis ($t_r \approx 2.2\,RC$ for a node with resistance $R$ and capacitance $C$), the total input capacitance $C_{in}$ of the eight samplers, including wiring and pad capacitance, has to be kept under

$C_{in} < \frac{T_{bit}}{2 \cdot 2.2 \cdot R}$    (1)

allowing an input capacitance of 10 fF per sampler, excluding wire and pad capacitances of 30 fF and 70 fF, respectively.
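The capacitance budget around (1) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 25 Ω node resistance (a 50 Ω source in parallel with a 50 Ω termination) is an assumption, since the resistance used by the authors is not recoverable from this text. Under that assumption the bound is about 227 fF, and the stated budget of eight 10 fF samplers plus 30 fF wire and 70 fF pad sums to 180 fF.

```python
def max_input_capacitance(bit_time_s, resistance_ohm):
    """First-order bound: the 10-90% rise time 2.2*R*C must stay below T_bit/2."""
    return bit_time_s / (2 * 2.2 * resistance_ohm)


def total_input_capacitance(n_samplers=8, c_sampler=10e-15,
                            c_wire=30e-15, c_pad=70e-15):
    """Sum of the sampler, wire, and pad capacitances quoted in the text."""
    return n_samplers * c_sampler + c_wire + c_pad


bit_time = 1 / 40e9      # 25 ps bit time at the nominal 40 Gb/s
r_assumed = 25.0         # ASSUMPTION: 50 ohm source parallel to 50 ohm termination
c_max = max_input_capacitance(bit_time, r_assumed)
c_tot = total_input_capacitance()

print(f"bound : {c_max * 1e15:.0f} fF")
print(f"budget: {c_tot * 1e15:.0f} fF")
```

With a 50 Ω node resistance instead, the same bound would drop to roughly 114 fF, so the 180 fF budget only fits the bound if the effective resistance is close to the assumed value.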

A sampling latch consisting of a track-and-hold stage, implemented as NMOS pass transistors, followed by a latch [13], [18] has not been chosen because of the stringent requirements it places on the clock signal. These requirements are a short clock fall time to achieve a high time resolution [19], a high common-mode voltage of the sampling clock, and a large clock swing to fully switch the pass gate on and off. Since all other circuits use CML signaling, differential CML latches are preferred. Samplers composed of CML latches implemented in 90 nm CMOS are able to regenerate a 40 Gb/s data signal [9], [20].

Fig. 2(a) shows the block diagram of our sampler, which consists of a front-end sampling latch [Fig. 2(b)], a slave latch [Fig. 2(c)], and a CML buffer. The CML latches and the CML buffer are fully differential circuits to achieve a higher immunity to power supply variations than pseudo-differential circuits [8], [21], [22]. The sample transistors of the front-end sampling latch are limited in size, since the input capacitance of each sampler has to be lower than 10 fF. In order to reduce rise and fall times at the output, the load resistors have to be decreased and the tail current increased. Therefore, the widest transistor that keeps the input capacitance below 10 fF has been chosen. The tail current source has to provide 1.54 mA to bias the sample transistors at a current density of 0.11 mA/µm, which simulation identified as the optimum bias point. In order to provide enough regenerative gain to fully switch the following differential pair and to guarantee enough noise margin, a voltage swing of 600 mV is required, which determines the load resistance. The latch transistors and the sample transistors have equal dimensions, so that both operate at the optimum current density and present the same load to the clock transistors. The transistor pair of the differential clocking stage has to steer the current at a lower frequency than the sample and latch transistor pairs. To guarantee full current switching with a typical CML clock signal, the clock transistors are wider than the sample and latch transistors by a factor of 1.5, which reduces the signal swing required for proper switching of the clock differential pair.

With a fan-out of 1, this sampling latch is able to regenerate the input data up to 32 Gb/s. This configuration is defined as case I. Because the second CML latch in the MS-FF configuration of Fig. 2(a) has to process a four times lower data rate, the dimensions of its devices have been scaled by a factor of 1/3 in order to reduce the capacitive load of the first latch. In this configuration (case II), a maximal data rate of 37 Gb/s is achieved, because the tracking bandwidth of the sampling latch has been increased and its regeneration time [23, eq. (1)] has been reduced. To further increase the bandwidth of the sampling latch, shunt-peaking inductors [24] are introduced (case III), as shown in Fig. 2(b). With integrated inductors, each occupying an area of 20 µm × 20 µm in the two topmost metal layers [25], [26], the tracking bandwidth of the first CML latch is extended by a factor of 1.2, enabling the sampling of input data up to 44 Gb/s. A CML buffer after the second latch is needed to drive not only the demultiplexer but also the 10 Gb/s output driver. Even though the first latch of the sampler incorporates two shunt-peaking inductors, the layout of one sampler is still very compact and occupies an area of only 50 µm × 35 µm, of which the two inductors occupy 46%.

To quantify and compare the sensitivity, timing resolution, and bandwidth of these three sampling front-ends (I, II, III), each of them has been characterized with the procedure described in [27]. The idea is that the latch can be separated into a linear portion, described by an integration window, and an ideal sampler plus decision-dependent feedback [27]. In this way, the bandwidth limitations of the sampling stage and the impact of the finite slew rate of the clock (0.4 V/32 ps in this case) can be included in the transfer function of the overall data path. The integration window is derived by measuring the sensitivity for a short voltage pulse as a function of the sampling time relative to the clock edge. Fig. 3(a) illustrates the pulse amplitude that is just sufficient to flip the latch for a given time offset with respect to the latch clock. The sensitivity function, or more precisely the reciprocal of the sensitivity per ps, is depicted for the three cases in Fig. 3(b). The sensitivity window of the CML latch with peaking inductors is slightly smaller than the others, indicating superior time-resolution capability. Moreover, the CML latch with peaking inductors (III) has the best DC input sensitivity voltage [27, eq. 29]. The transfer function is derived by taking the Fourier transform of the sensitivity function. The transfer function normalized to the target sensitivity of 5 mV is shown in Fig. 3(c). The sampler (III) shown in Fig. 2(a) has the highest equivalent 3 dB bandwidth.

Fig. 2. (a) Sampler. (b) Inductive shunt-peaked first CML latch. (c) Second CML latch.

Fig. 3. (a) Required voltage, (b) sensitivity function, and (c) normalized transfer function for the three cases: (I) first CML latch with fan-out of 1, no inductor; (II) Fig. 2(a), no inductor; (III) Fig. 2(a).
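The link between a narrower sensitivity window and a higher equivalent bandwidth can be illustrated numerically. The following sketch is not the characterization procedure of [27]: it assumes a Gaussian-shaped sensitivity window (the measured shapes of Fig. 3(b) are not available here), and the function names and the 4 ps and 5 ps widths are hypothetical. It evaluates the magnitude of the Fourier transform by a direct sum and searches for the 3 dB point.

```python
import cmath
import math


def transfer_magnitude(window, dt, freq):
    """|H(f)| of a sampled sensitivity window via a direct Fourier sum."""
    acc = sum(w * cmath.exp(-2j * math.pi * freq * i * dt)
              for i, w in enumerate(window))
    return abs(acc) * dt


def bandwidth_3db(window, dt, f_step=0.1e9, f_max=200e9):
    """First frequency on the search grid where |H| falls to 1/sqrt(2) of its DC value."""
    h0 = transfer_magnitude(window, dt, 0.0)
    f = 0.0
    while f <= f_max:
        if transfer_magnitude(window, dt, f) <= h0 / math.sqrt(2):
            return f
        f += f_step
    return None


def gaussian_window(sigma, dt=0.1e-12, span=6.0):
    """ASSUMPTION: Gaussian window of standard deviation sigma, truncated at +/- span*sigma."""
    n = int(span * sigma / dt)
    samples = [math.exp(-0.5 * ((i * dt) / sigma) ** 2) for i in range(-n, n + 1)]
    return samples, dt


bw_wide = bandwidth_3db(*gaussian_window(5e-12))    # hypothetical 5 ps window
bw_narrow = bandwidth_3db(*gaussian_window(4e-12))  # hypothetical 4 ps window
print(f"{bw_wide / 1e9:.1f} GHz vs {bw_narrow / 1e9:.1f} GHz")
```

For a Gaussian window the 3 dB frequency has the closed form sqrt(ln 2)/(2πσ), so shrinking the window width raises the bandwidth in inverse proportion, which is the qualitative trend described for case III.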

B. Data Alignment and 8:32 Demultiplexer

The 8:32 demultiplexer block following the samplers consists of eight parallel paths, each built as a cascade of one 1:2 demultiplexer and one 2:4 demultiplexer. Eight 10 Gb/s input signals at CML levels, which are separated by 1/8th of the period of the reference clock signal, are converted to 32 output signals at full-swing CMOS levels at a data rate of 2.5 Gb/s. These output signals are aligned to a single 2.5 GHz CMOS clock signal. The 2.5 GHz clock is derived from the 10 GHz sampling clock and serves as the clock for the digital logic. The design goal for this alignment is to balance the loading of all sampling clock phases without inserting any dummy elements. Different capacitive loads connected to the clock signals would potentially lead to phase shifts, which result in inaccuracies of the sampling points. Simply resampling the input signals of the demultiplexer with one of the clock phases is not possible, since the clock phase used for sampling would be too heavily loaded and, furthermore, the timing margins in the latches of the first demultiplexer stage would be too small. In order to increase the timing margins, the first four samples are delayed by one half of a clock period. Although an additional 50 ps of timing margin is obtained, correct operation is still not guaranteed for all process corners when all eight signals are sampled with one clock phase. We therefore used two phases of the divided clock at 5 GHz, each of which samples four input signals. This adds another 25 ps of timing margin. The frequency divider can be designed to present a small capacitive load to one of the sampling clock phases. This minimum load is the only loading imbalance of the sampling clock phases. Moreover, using the two divided clock phases leads to symmetrical loads connected to the frequency divider, which is favorable in terms of speed. However, at the output of the first demultiplexer stage, the data signals are not aligned yet. The data alignment can easily be achieved by sampling the signals with a single 2.5 GHz clock signal at the input of the second demultiplexer stage. At a nominal data rate of 2.5 Gb/s, the timing margin is large enough (150 ps). The presented alignment procedure results in a minimally imbalanced loading of the sampling clock signals, and it is robust to process, voltage, and temperature (PVT) variations because of the large timing margins.

Fig. 4. Block diagram of the digital control circuit.

C. Digital Control Loop With Rate Selection

Fig. 4 illustrates the block diagram of the digital control logic, which offers the option to select between three different input data rates. All these circuit blocks, which run at 2.5 and 1.25 GHz, are synthesized circuits and are placed and routed with a digital design tool.

The edge detector solves the Alexander equations [16]

(2)

for 16, 8, or 4 data/edge pairs, depending on the selected data rate. The detector outputs a single early or late signal after majority voting. In order to relax the speed requirements for the digital CMOS loop filter, the output signals of the edge detector are demultiplexed by a factor of two. The loop filter, running at 1.25 GHz, is realized as a finite state machine (FSM), which accumulates the EARLY[1:0] and LATE[1:0] bits. The state machine can make zero, one, or two steps, depending on the difference between the numbers of EARLY[1:0] and LATE[1:0] bits. The state machine consists of twelve states, arranged as two circles. After running through one circle of six states, the FSM generates an up or a down signal, respectively. These up and down signals increment or decrement the thermometer-coded up/down counter value, which controls the phase rotator. Since double steps are possible, the state machine needs at least three clock cycles between two consecutive up or down impulses. Hence, the maximum update rate of the phase rotator for the nominal data rate is

$f_{update,max} = \frac{1.25\ \mathrm{GHz}}{3} \approx 417\ \mathrm{MHz}$    (3)

A higher update rate, which would increase the jitter tolerance, could be reached by reducing the number of states in the FSM. However, the number of states cannot be reduced arbitrarily: a minimum number is required so that the overall delay of the CDR loop does not induce false phase steps.

Fig. 5. Principle of the rate selection: quarter-rate (QR), half-rate (HR), and full-rate (FR) mode. Markers distinguish sample points from discarded samples.
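The early/late decision with majority voting can be sketched in a few lines. The code below uses the textbook form of the Alexander (bang-bang) phase detector and is an illustration, not the synthesized logic of this CDR: the sample ordering (each edge sample lies between two consecutive data samples), the sign convention (an edge sample equal to the preceding data bit means the clock is early), and the function names are all assumptions.

```python
def alexander(d_prev, edge, d_next):
    """Return +1 (clock early), -1 (clock late), or 0 (no data transition).

    ASSUMPTION: `edge` is sampled between `d_prev` and `d_next`.
    """
    if d_prev == d_next:
        return 0                      # no transition, no phase information
    return 1 if edge == d_prev else -1


def majority_vote(data, edges):
    """Combine all data/edge pairs of one digital clock cycle into one decision.

    `data` holds one more bit than `edges`; edges[i] lies between
    data[i] and data[i + 1].
    """
    score = sum(alexander(data[i], edges[i], data[i + 1])
                for i in range(len(edges)))
    if score > 0:
        return "EARLY"
    if score < 0:
        return "LATE"
    return "HOLD"


# Edge samples still equal to the previous bit: the clock samples too early.
print(majority_vote([0, 1, 1, 0, 1], [0, 1, 1, 0]))
# Edge samples already equal to the next bit: the clock samples too late.
print(majority_vote([0, 1, 1, 0, 1], [1, 1, 0, 1]))
```

The "HOLD" outcome for a tie or for a run without transitions is a choice made for this sketch; how the actual edge detector resolves ties is not stated in the text.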

A special feature of our digital control logic is the capability to support different data rates. The logic responsible for the data rate selection is implemented in the edge detector, as shown in Fig. 4, and operates as depicted in Fig. 5. Quarter-rate (QR) operation is used for input data rates from 23 to 44 Gb/s. The early/late generation logic generates an early/late signal for each of the 16 data/edge bit pairs by solving the Alexander equations [16]. When the data rate is lower and the bit length larger, between 11.5 and 23 Gb/s, the CDR operates in half-rate (HR) mode. The edge samples used in the quarter-rate mode are omitted and only the data samples are evaluated. In this mode, the even data samples take the role of the edge bits and the odd data samples are still data bits. From these eight data/edge pairs, the early/late information is generated. For a still lower input data rate ranging from 5.75 to 11.5 Gb/s, the full-rate (FR) mode is possible. Here, the odd data samples are used alternately as data and edge bits. Hence, our receiver can cover the full range of data rates from 5.75 to 44 Gb/s, even though the multi-phase DLL, which generates the reference clock phases, is band limited. The DLL operates from 5.75 to 11.5 GHz and thus sets the lowest data rate of the CDR.
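The rate selection described above amounts to choosing a mode from the data rate and then reassigning which samples act as data and which as edges. The sketch below illustrates that bookkeeping under stated assumptions: the function names are hypothetical, samples are indexed from zero (so "odd" and "even" refer to list indices, which may differ from the numbering in Fig. 5), the handling of rates exactly on a mode boundary is arbitrary, and the reference clock per mode is inferred from the number of samples per bit rather than quoted from the paper.

```python
def mode_for_rate(rate_gbps):
    """Pick the operating mode and the implied reference clock in GHz."""
    if 23.0 <= rate_gbps <= 44.0:
        return "QR", rate_gbps / 4    # four bits per reference clock period
    if 11.5 <= rate_gbps < 23.0:
        return "HR", rate_gbps / 2    # two bits per reference clock period
    if 5.75 <= rate_gbps < 11.5:
        return "FR", rate_gbps / 1    # one bit per reference clock period
    raise ValueError("data rate outside the 5.75 to 44 Gb/s range")


def select_samples(data, edge, mode):
    """Return (data_bits, edge_bits) for one digital cycle of 16 + 16 samples."""
    if mode == "QR":
        return list(data), list(edge)        # 16 data/edge pairs
    odd, even = data[1::2], data[0::2]       # QR edge samples are discarded
    if mode == "HR":
        return odd, even                     # 8 data/edge pairs
    if mode == "FR":
        return odd[0::2], odd[1::2]          # 4 data/edge pairs
    raise ValueError(f"unknown mode {mode!r}")


data = list(range(16))            # placeholder sample labels, not real bits
edge = list(range(100, 116))
for rate in (40.0, 20.0, 10.0):
    mode, f_ref = mode_for_rate(rate)
    d, e = select_samples(data, edge, mode)
    print(rate, mode, f_ref, len(d), len(e))
```

In all three modes the implied reference clock stays between 5.75 and 11.5 GHz, which is the operating range given for the DLL.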

D. Phase Rotator

In order to update the sampling position, we use four parallel phase rotators, which are controlled by the thermometer-coded up/down counter. Using a full thermometer code, discontinuities in the phase rotator transfer characteristics can be avoided. The reference clock phases generated in the DLL are fed to the four phase rotators. One phase rotator, as shown in Fig. 6, consists of a phase selection stage followed by a phase interpolation stage [10]. The first stage, consisting of two 4:1 multiplexers [Fig. 7(a)], selects two clock phases from two adjacent phase octants. The interpolation process with eight clock phases results in a better phase linearity compared to interpolation schemes using six or I/Q phases [12]. The phase interpolator [Fig. 7(b)] is a dual-input differential amplifier and blends the two selected phases according to the 8-bit thermometer-coded value. Retiming flip-flops between the up/down counter and the phase rotator guarantee that all control signals change their states at the same time, thus avoiding phase glitches. The common-mode outputs of the selector and the interpolator are regulated by a replica bias, as in all CML circuits of this CDR. An important practical requirement is that the sampling clock always has the correct amplitude and common-mode voltage, even after start-up, to assure the presence of the CDR system clock. This implies that the control signals are initialized correctly.

Fig. 6. Block diagram of one phase rotator.
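One way to see how a thermometer code and two 4:1 multiplexers can step through the phase circle without discontinuities is sketched below. This is an assumed control scheme, not the circuit of this paper: it supposes that one multiplexer selects among the even-numbered phases and the other among the odd-numbered phases, that the thermometer word gives the weight of the odd-phase input, and that the word counts up in even octants and down in odd octants. With eight phases and eight interpolation steps this gives 64 positions. The function name and the position encoding are hypothetical.

```python
def rotator_control(position, n_phases=8, n_steps=8):
    """Map a rotator position to (even_phase, odd_phase, thermometer word).

    ASSUMPTION: the number of ones in the word is the weight of the
    odd-phase input; the remaining weight goes to the even-phase input.
    """
    position %= n_phases * n_steps
    octant, k = divmod(position, n_steps)
    if octant % 2 == 0:
        even_phase, odd_phase, ones = octant, octant + 1, k
    else:
        even_phase, odd_phase, ones = (octant + 1) % n_phases, octant, n_steps - k
    thermo = [1] * ones + [0] * (n_steps - ones)
    return even_phase, odd_phase, thermo


for p in (6, 7, 8, 9):
    print(p, rotator_control(p))
```

In this scheme every step changes exactly one thermometer bit, and a multiplexer only switches to a new phase while its input carries zero weight, which is one plausible reading of how a full thermometer code avoids jumps at the octant boundaries.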

The eight interpolation steps together with the eight input clock phases result in a total of 64 phase steps. Hence, one phase step amounts to

$\Delta\varphi = \frac{360^\circ}{64} = 5.625^\circ$    (4)

and the ideal phasor of the output signal of the phase rotator having equal phase steps is

$\underline{V}_{ideal}(k) = A \, e^{j k \Delta\varphi}$    (5)

where the parameter $A$ denotes the amplitude. The integer $k$ counts the interpolation steps within one octant, and $N$ is the number of interpolation steps. Since the phase interpolation is achieved by adding two signals with different phases and not by a real rotation of the phase of a single signal, the interpolated output signal is not equal to the ideal output signal and is

$\underline{V}_{int}(k) = A \left[ \left(1 - \frac{k}{N}\right) + \frac{k}{N} \, e^{j N \Delta\varphi} \right]$    (6)

The interpolated phase steps in (6) and the equal phase steps in (5) are calculated and displayed for one octant in Fig. 8(a). The maximum deterministic interpolation phase error is 0.5° and is smaller than one phase step by a factor of ten. Furthermore, the phase steps vary because inputs and output are not fully isolated due to capacitive feedback. The simulated phase steps (DNL) and absolute phase error (INL) for a clock frequency of 10 GHz are shown for our implementation (a type-I phase interpolator) and for a type-II phase interpolator in Fig. 8(b) and (c), respectively. A type-I phase interpolator has a common-source stage, as shown in Fig. 7(b). A type-II phase interpolator incorporates a cascode stage [10, Fig. 10]. The maximum phase step of a type-I phase interpolator occurs due to the parasitic effect of capacitive coupling between gate and drain when the interpolation boundary is reached and the output clock of the 4:1 multiplexer is switched from one phase to the next. Although the alternative design (type-II) has a better isolation property, it was not used, since its unity-gain frequency is too low under worst-case process conditions. Furthermore, it has been reported that a type-II phase interpolator has a more nonlinear transfer characteristic at lower clock frequencies [10, Fig. 11]. The simulated transfer function of the four phase rotators with a clock frequency of 10 GHz and an update rate of 1.25 GHz is depicted in Fig. 9 and reveals no deterministic phase offset between the four clock phases. At lower clock frequencies, the transfer characteristic becomes slightly more nonlinear due to sharper clock edges and wider spacing between the interpolating edges.

Fig. 7. (a) Schematic of the 4:1 phase selector. (b) Schematic of the phase interpolator (type-I).

Fig. 8. (a) Equal versus interpolated phase steps of one octant of the 360° circle and phase error. (b) Simulated phase step (DNL) and (c) absolute phase error (INL) for type-I and type-II phase interpolators with a 10 GHz clock.

Fig. 9. Simulated transfer characteristic of the four phase rotators.
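The deterministic interpolation error within one octant can be reproduced with a short calculation. The sketch below assumes a linear weighting of two unit phasors that are 45° apart, with weights (1 − k/N) and k/N for N = 8 steps; this weighting is an assumption about the interpolator, and the function name is hypothetical. The ideal phase is taken as k times 45°/8.

```python
import cmath
import math


def phase_error_deg(k, n_steps=8, octant_deg=45.0):
    """Phase of the weighted phasor sum minus the ideal equally spaced phase."""
    a = k / n_steps
    # ASSUMPTION: linear blend of two unit phasors at 0 deg and octant_deg
    blended = (1 - a) + a * cmath.exp(1j * math.radians(octant_deg))
    ideal = k * octant_deg / n_steps
    return math.degrees(cmath.phase(blended)) - ideal


errors = [phase_error_deg(k) for k in range(9)]
max_error = max(abs(e) for e in errors)
print([round(e, 3) for e in errors])
print(f"max |error| = {max_error:.2f} deg")
```

Under this assumption the largest error is about 0.45°, close to the 0.5° quoted in the text and roughly a tenth of the 5.625° phase step. The error vanishes at the octant boundaries and at mid-octant, where the two weights are equal.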

A total of 64 phase steps for one 100 ps reference clock period, or 16 steps for one data unit interval (UI) of 25 ps, are provided, resulting in a nominal timing resolution of 1.56 ps. When the phase rotator is updated at the maximum update rate evaluated in (3), the maximal frequency offset between the TX and RX clocks that can be tracked correctly is

(7)

which can also be expressed in parts per million (ppm).
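A first-order estimate of the trackable frequency offset follows from the numbers given in the text: the rotator can move by at most one timing step per update, so the largest relative offset it can follow is the step size multiplied by the update rate. The sketch below is an estimate under that assumption, not a reconstruction of (7); the variable names are hypothetical.

```python
ref_period = 100e-12          # reference clock period at 10 GHz
n_phase_steps = 64            # phase steps per reference clock period
loop_filter_clock = 1.25e9    # loop filter clock frequency
cycles_per_update = 3         # minimum clock cycles between two phase updates

step_time = ref_period / n_phase_steps              # timing resolution in seconds
f_update = loop_filter_clock / cycles_per_update    # maximum update rate in Hz

# ASSUMPTION: at most one step per update, so the trackable relative
# frequency offset is (time shift per update) * (updates per second).
max_offset_ppm = step_time * f_update * 1e6

print(f"step = {step_time * 1e12:.2f} ps, update rate = {f_update / 1e6:.0f} MHz")
print(f"trackable offset of about {max_offset_ppm:.0f} ppm")
```

This estimate gives roughly 651 ppm, which is of the same order as the 615 ppm stated in the abstract; the sketch does not explain the difference between the two figures.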

Besides the frequency offset that can be tracked, the jitter tolerance is the second key parameter for CDRs employed in chip-to-chip communication. For a first-order phase-tracking CDR, the maximum jitter amplitude that the CDR can tolerate is limited by

(8)

and is inversely proportional to the jitter frequency. Jitter tolerance can be increased by a higher update rate or a larger phase step, where the latter increases the dither amplitude of the loop.
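The inverse dependence on jitter frequency follows from a standard slew-rate argument, sketched below. This is an assumption-based illustration and not necessarily the exact form of (8): sinusoidal jitter with peak amplitude A and frequency f_j has a peak slope of 2πf_jA, while a first-order loop can slew by at most one phase step per update. The update rate used here is hypothetical.

```python
import math


def max_tolerable_jitter_ui(step_ui: float, update_rate_hz: float,
                            jitter_freq_hz: float) -> float:
    """Slew-rate-limited peak jitter amplitude (in UI) for a first-order loop.

    Assumption: sinusoidal jitter A*sin(2*pi*f_j*t) has a peak slope of
    2*pi*f_j*A, and the loop can slew by at most step_ui * update_rate_hz
    UI per second. Equating the two gives the amplitude limit.
    """
    return step_ui * update_rate_hz / (2.0 * math.pi * jitter_freq_hz)


STEP_UI = 1.0 / 16.0        # 16 phase steps per UI, from the text
UPDATE_RATE_HZ = 400e6      # hypothetical update rate

for f_j in (1e6, 10e6, 100e6):
    a_max = max_tolerable_jitter_ui(STEP_UI, UPDATE_RATE_HZ, f_j)
    print(f"f_jitter = {f_j / 1e6:6.0f} MHz -> A_max = {a_max:.3f} UI")
```

Doubling either the update rate or the phase step doubles the tolerable amplitude in this model, which matches the two options named in the text.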

E. Delay-Locked Loop

A DLL operating between 5.75 and 11.5 GHz generates the eight clock phases for the four phase rotators. Compared to a PLL implementation, a DLL solution is preferred due to better immunity to on-chip noise because a voltage-controlled delay line (VCDL) does not suffer from cycle-to-cycle accumulated jitter as a voltage-controlled oscillator does. Any accumulated jitter created by supply or substrate noise is corrected when a clean reference clock edge arrives at the input of the VCDL.

Differential CML delay elements have been used in the VCDL to achieve short gate delays with low supply and substrate noise sensitivity. Four delay elements are sufficient to generate the eight clock phases. If we assume a constant phase error for each delay element, the phase errors sum up after each delay element, and the maximum phase error is the accumulated sum. In order to sample the data bits correctly, the phase difference between two sampling clock phases has to stay within a bounded range. Fig. 10 illustrates how the phase tracking loop of the CDR aligns the edges of the sampling clocks to the incoming data stream for the two extreme cases. For a positive and a negative phase error, the absolute phase error has to be lower than one fourth of the clock phase margin (CPM) of a sampler. At 11 GHz the maximal tolerable phase error per delay element has to be lower than 4.2°.

In a DLL, a single loop integrator suffices to drive the steady-state phase error to zero. A typical DLL consists of a VCDL, phase detector, charge pump, and loop filter [10], [28]. In reality, charge pump DLLs have a static phase offset between the two clock signals at the phase detector input mainly due to the mismatch between the charge pump’s up and down currents. Thus, the steady-state phase error is not zero and depends on the non-idealities of the charge pump.

The block diagram of the implemented DLL is shown in Fig. 11(a). Its structure is similar to the typical DLL but the charge pump is replaced by a differential operational transconductance amplifier (OTA). The OTA is a voltage-to-current converter that pumps current in and out of its load capacitance and is therefore equivalent to a charge pump that steers a current in and out of the loop filter capacitance. Fig. 12(a) illustrates a possible linear time-invariant phase domain model of the DLL [29], with a combined gain for the phase detector and the OTA [30]. Since the OTA has a finite output resistance, the steady-state error of the DLL is non-zero and is inversely proportional to the loop gain. In order to determine the steady-state phase error, the phase domain model shown in Fig. 12(a) cannot be applied since it neglects the output resistance of the OTA. The phase domain model illustrated in Fig. 12(b) includes all gain stages and all poles, the dominant pole and higher order poles. The dominant pole is formed by the output impedance of the OTA and the input capacitance of the control element (CE). Hence, the OTA limits the bandwidth of the loop and determines the loop gain.

Fig. 10. Maximum allowed phase error for the quarter-rate 2×-oversampling CDR.

Dummy delay elements are added in front of the first delay element for signal conditioning and behind the last delay element to provide the same load capacitance for all delay elements in the delay line. The tunable delay elements are implemented using a phase interpolation technique [31]–[33] as shown in Fig. 11(b) and (c). The gate delay of the delay element can be tuned between the delay of the interpolator alone and the sum of the interpolator delay and the delay of the buffer inserted in the non-direct signal path. This results in a maximal tuning factor of

(9)

The tuning factor has been reduced to 1.9 to prevent erroneous phase locking over all process corners while using a 90° XOR phase detector [34]. This tuning factor limits the range for exactly equidistant clock phases to frequencies from 6 to 11.5 GHz. At 5.75 GHz a systematic phase error of 2.88° per delay element is introduced, but this does not compromise the operation of the CDR because the CPM for sampling is larger at the lower data rates.
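The stated frequency range follows from the tuning factor with simple arithmetic, sketched here under a stated assumption: the per-element target delay scales with the clock period, the shortest achievable delay just fits the 11.5 GHz case, and the longest achievable delay is the tuning factor times the shortest. The function name is illustrative only.

```python
def min_lock_frequency_hz(f_max_hz: float, tuning_factor: float) -> float:
    """Lowest clock frequency at which each delay element still reaches its target delay.

    Assumption: target delay is proportional to the clock period, and the
    delay element covers a delay range of 1 : tuning_factor.
    """
    return f_max_hz / tuning_factor


f_min = min_lock_frequency_hz(11.5e9, 1.9)
print(f"lower limit for equidistant phases: {f_min / 1e9:.2f} GHz")
print(f"5.75 GHz inside range: {5.75e9 >= f_min}")
```

The result is close to the 6 GHz lower limit quoted in the text, and 5.75 GHz falls just outside it, which is consistent with a systematic phase error appearing at that frequency.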

The 90° XOR phase detector based on the Gilbert cell multiplier, shown in Fig. 13(a), is sufficient to perform a direct phase detection at 11 GHz. The circuit is simpler than edge-triggered phase detectors and consequently consumes less power as well. The differential output signal PX is the result of the multiplication of the two differential input phase signals PH1 and PH2. The DC component of the signal PX is then proportional to the phase difference between the input signals PH1 and PH2. Different propagation times from the input ports PH1 and PH2 to the output node of the Gilbert cell multiplier cause a systematic offset in the transfer function of the phase detector as shown in Fig. 13(b) and (c). This offset generates an intolerable phase error of 9° per delay element, which exceeds the 4.2° limit. To compensate this error, we implemented a symmetrical phase detector PD formed by two Gilbert cell multipliers PD1 and PD2 as shown in Fig. 11(a). The input signals of PD2 are swapped with respect to PD1, generating a negative offset and thus compensating the offset of PD1 as shown in Fig. 13(c). This symmetrical phase detector is connected to the VCDL in such a way that the phase detector PD1 compares the phases P0 and P2, while phase detector PD2 compares the phases P1 and P3. This connection scheme leads to equally loaded delay elements and removes the systematic phase errors. The high-frequency components of the output signals PX1 and PX2 of the Gilbert cell multipliers are low-pass filtered before they are summed to generate the control signal PX. The filtering reduces the high-frequency amplitude of PX and prevents potential saturation of the input stage of the differential OTA. The OTA provides sufficient gain (37 dB) to the control loop to keep the steady-state phase error of one delay element below 0.4° (determined with the final value theorem [34]). Subsequently, the control element (CE) converts the differential output voltage of the differential OTA into the differential control voltage of the VCDL. Compared to a charge pump solution, a linear and differential high-gain amplifier in the control loop has been preferred in order to minimize the generated switching noise. Moreover, the whole control loop of the DLL has been implemented differentially to reduce the influence of common mode, substrate, and power supply noise in the DLL and generate clean clock phases with low jitter.

Fig. 11. (a) DLL block diagram. (b) Simplified schematics of one delay element. (c) Schematics of the interpolator.

Fig. 12. Phase domain model: (a) with integrator (OTA) and loop filter, (b) including finite output resistance of OTA and higher order poles.
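The effect of finite DC gain on the residual phase error can be illustrated with the final value theorem for a loop without an ideal integrator. The sketch below is a generic textbook relation, not the authors' loop model: it treats the 37 dB figure as the total DC loop gain (the text gives it as the OTA gain, and the phase detector and VCDL gains are not stated here), and the 20° uncorrected error is purely hypothetical.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear factor."""
    return 10.0 ** (gain_db / 20.0)


def steady_state_error_deg(open_loop_error_deg: float, dc_loop_gain: float) -> float:
    """Residual error of a loop with finite DC gain K.

    Final value theorem for a constant disturbance: e_ss = e_open / (1 + K).
    """
    return open_loop_error_deg / (1.0 + dc_loop_gain)


K = db_to_linear(37.0)      # 37 dB from the text, treated here as the DC loop gain
E_OPEN_DEG = 20.0           # hypothetical uncorrected phase error per delay element

print(f"linear gain: {K:.1f}")
print(f"residual error: {steady_state_error_deg(E_OPEN_DEG, K):.2f} deg")
```

With these assumed numbers the residual error stays below the 0.4° bound quoted in the text; a different uncorrected error or loop gain would change the result proportionally.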

Measurements performed on a separate test chip confirm the low noise and high PSRR of the implemented DLL. At 10 GHz, a peak-to-peak jitter below 2 ps has been measured on the single-ended output p2a as shown in Fig. 14(a). This value is comparable to the jitter of the input reference clock. A supply noise of 100 mV modulated at 5 MHz results in a peak-to-peak jitter of 5.6 ps (0.22 UI at 40 Gb/s) on the single-ended output p2a as reported in Fig. 14(b). However, when considering the differential signal p2a–p2b, the peak-to-peak jitter remains below 2 ps (0.08 UI at 40 Gb/s) as shown in Fig. 14(c).

The phase error measured after the delay element DE1 is below 2°. Thus, the accumulated phase error along the delay line amounts to 6°. Mismatches between the devices (polysilicon resistors and NMOS transistors) in the differential stages are the cause of DC offset at the output of the delay elements as illustrated in Fig. 15(a). This DC offset propagates through the cascaded differential stages in the DLL, where it gets amplified, and causes a phase error at the output. A maximum DC offset of up to 50 mV can occur after the last delay element of the VCDL, producing an intolerable phase error of 25° (0.28 UI). To solve this problem, clock buffers are introduced after the DLL to reduce the accumulated DC offset, thereby restoring the required phase precision.
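The conversions between clock phase, time, and unit intervals used in this section can be checked with a few lines of arithmetic. The helper names are illustrative; the 10 GHz clock and 40 Gb/s data rate are the operating point used in the text.

```python
def phase_to_time_s(phase_deg: float, clock_hz: float) -> float:
    """Time shift corresponding to a phase angle of a clock at clock_hz."""
    return phase_deg / 360.0 / clock_hz


def time_to_ui(time_s: float, data_rate_bps: float) -> float:
    """Express a time interval as a fraction of one unit interval (bit period)."""
    return time_s * data_rate_bps


CLOCK_HZ = 10e9          # quarter-rate clock
DATA_RATE_BPS = 40e9     # one UI is 25 ps

t_25deg = phase_to_time_s(25.0, CLOCK_HZ)
print(f"25 deg at 10 GHz = {t_25deg * 1e12:.2f} ps = {time_to_ui(t_25deg, DATA_RATE_BPS):.2f} UI")
print(f"5.6 ps = {time_to_ui(5.6e-12, DATA_RATE_BPS):.2f} UI")
print(f"2.0 ps = {time_to_ui(2.0e-12, DATA_RATE_BPS):.2f} UI")
```

These values agree with the 0.28 UI, 0.22 UI, and 0.08 UI figures quoted for the phase error and the measured jitter.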

F. Clock Buffer

Clock buffers are placed between the DLL and the phase rotator and between the phase rotator and the samplers in order to drive the relatively large capacitive loads. Shunt-peaking inductors of 1 nH are used to compensate the large load capacitance at the output, thus also reducing the power consumption of the clock buffer by 30%. With the nominal load capacitance of 100 fF, a gain of 4.5 dB at 10 GHz is achieved. The power consumption of the buffer is 5 mW.

Fig. 13. (a) Schematic of the Gilbert cell multiplier. (b) Simplified block diagram of the Gilbert cell multiplier. (c) Ideal transfer characteristic of the Gilbert cell multiplier.

Fig. 14. Measured peak-to-peak jitter of the DLL. (a) Single-ended signal p2a without power supply noise: 1.8 ps. (b) Single-ended signal p2a with 100 mV modulated power supply at 5 MHz: 5.6 ps. (c) Differential signal p2a–p2b with power supply noise: 2 ps.

The clock buffers were designed to reduce DC offsets generated in the DLL. These DC offsets cause duty cycle distortion on differential signals, compromising the phase precision and reducing the clock phase margin (CPM) of the system as shown in Fig. 15(b). Two samplers (sampler0 and sampler4) are clocked with two complementary phases. The DC offset between oa and ob causes a phase error between these two phases, which reduces the CPM of the system.

To reduce the phase error caused by DC offsets, a clock buffer with regulated output DC levels is implemented. Fig. 15(c) shows the schematic of the implemented clock buffer. Capacitive degeneration is used to reduce the gain at low frequencies without sacrificing the gain at high frequencies. The DC levels of the outputs oa and ob are regulated to the same DC level, set in the bias circuit, reducing the influence of the input DC offset. For input DC offsets up to 200 mV the phase accuracy of the output is improved by a factor of 25 with respect to the input signal, thus reducing the maximal phase error caused by input DC offset to 0.02 UI. The offset introduced by mismatches between the devices in the clock buffer is 15 mV, corresponding to a maximal phase error of 2.15°. This phase error is much smaller than the error of the DLL.
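How a DC offset turns into a phase error can be illustrated with an idealized sinusoidal clock. This is a simplified model and not taken from the paper: the differential clock is assumed to be a pure sine of amplitude A, so an offset V_os moves each zero crossing by asin(V_os/A), with rising and falling edges shifting in opposite directions (duty cycle distortion). The 400 mV amplitude is a hypothetical value, since the clock swing is not stated in this section.

```python
import math


def offset_phase_error_deg(dc_offset_v: float, amplitude_v: float) -> float:
    """Zero-crossing shift of A*sin(phi) + V_os, in degrees of the clock period.

    Assumption: ideal sinusoidal differential clock; real CML edges are not
    pure sines, so this only indicates the order of magnitude.
    """
    if abs(dc_offset_v) > amplitude_v:
        raise ValueError("offset exceeds amplitude: the signal never crosses zero")
    return math.degrees(math.asin(dc_offset_v / amplitude_v))


AMPLITUDE_V = 0.4   # hypothetical differential clock amplitude

for v_os in (5e-3, 15e-3, 50e-3):
    err = offset_phase_error_deg(v_os, AMPLITUDE_V)
    print(f"V_os = {v_os * 1e3:4.0f} mV -> phase error = {err:.2f} deg")
```

With the assumed 400 mV amplitude, a 15 mV offset gives a shift close to the 2.15° quoted for the clock buffer; this agreement depends entirely on the assumed amplitude and should not be read as a confirmed design value.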

IV. MEASUREMENTS

Our CDR circuit has been fabricated in a 90 nm bulk CMOS technology and occupies 570 × 350 µm² ≈ 0.2 mm². The layout and the die micrograph of the CDR circuit are shown in Fig. 16. All inputs and outputs are ESD protected except the differential 40 Gb/s data inputs. An ESD protection circuit similar to [35] cannot be placed at the 40 Gb/s input port since any additional capacitance at the input lowers the input pole frequency. The CDR is able to lock to a PRBS data stream up to 44 Gb/s if the input signal is applied to the chip using on-wafer probes. The 40 Gb/s input eye diagram with a 10 GHz sinusoidal clock signal is illustrated in Fig. 17(a). The recovered 10 Gb/s data measured on-wafer without ESD protection and together with the packaged module including ESD protection are illustrated in Fig. 17(b) and (c), respectively. Since the recovered 10 Gb/s data signal is the buffered output signal of the front-end MS-FF (Figs. 1 and 2), the eye diagrams [Fig. 17(b), (c)] for full-, half-, and quarter-rate modes all look alike. The operating ranges for full-, half-, and quarter-rate modes cover the data ranges from 5.75 to 11.5 Gb/s, 11.5 to 23 Gb/s, and 23 to 44 Gb/s, respectively. For all data rates, the circuit consumes 230 mA from a 1 V power supply voltage (analog part: 215 mA, digital section: 15 mA). This results in an overhead of power consumption of a factor of two and four for half rate and quarter rate mode, respectively, assuming that the power consumption ideally scales with the data rate. As a future feature, this overhead could be reduced to 30% and 50%, respectively, by turning off the unused circuit blocks, e.g., part of samplers, clock buffers, and demultiplexers.

Fig. 15. (a) Generation of DC offsets in CML stages. (b) Clock buffer with regulated DC levels at the output. (c) Phase error generated by DC offset in differential signals and reduction of the system CPM.

Fig. 16. Chip photo and layout of the CDR.
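Because the supply current is constant across all modes, the power per data rate can be worked out directly from the figures above. The short sketch below uses only the stated 230 mA and 1 V; the function name and the choice of evaluation points are illustrative.

```python
SUPPLY_V = 1.0       # supply voltage from the text
CURRENT_A = 0.230    # total supply current from the text


def power_per_rate_mw_per_gbps(data_rate_gbps: float) -> float:
    """Power efficiency in mW/(Gb/s) for a constant total power draw."""
    return SUPPLY_V * CURRENT_A * 1e3 / data_rate_gbps


best = power_per_rate_mw_per_gbps(44.0)
for rate in (11.5, 23.0, 40.0, 44.0):
    eff = power_per_rate_mw_per_gbps(rate)
    print(f"{rate:5.1f} Gb/s: {eff:5.2f} mW/(Gb/s), {eff / best:.1f}x the 44 Gb/s value")
```

Since the power does not scale down with the data rate, the efficiency at the top of the lower-rate modes is roughly two and four times worse than at 44 Gb/s, which is the overhead that switching off unused blocks is meant to reduce.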

In all operating modes, the maximum frequency offset that can be tracked at the target BER is 615 ppm up to 38 Gb/s. The limit was set by the measurement setup because the input pattern from the pattern generator was not error free above 38 Gb/s. The value of 615 ppm is sufficient to compensate for differences between the clock frequencies of two chips clocked from different crystal oscillators. Besides the frequency offset that can be tracked, the jitter tolerance is the second key parameter for CDRs employed in chip-to-chip communication. Since our jitter tolerance measurement setup was limited to 27 Gb/s, the jitter tolerance measurements have been performed only with the IC mounted on a substrate (Fig. 18) at its maximum data rate of 24 Gb/s. This maximum is limited by losses and mismatches of the 1.6 cm input line on the substrate. To illustrate these effects, measured eye diagrams of the 24 Gb/s data stream at the input and output of this line are depicted in Fig. 18. Two different measurement setups are needed for the jitter tolerance test. Setup I has been used for jitter frequencies between 10 kHz and 1 MHz, where the incoming data signal is directly modulated. At jitter frequencies above 1 MHz, it was no longer possible to modulate the data, and setup II had to be used. In setup II the system clock of the CDR has been modulated relative to the incoming data. The measured jitter tolerance plot at 24 Gb/s of the packaged CDR and the extended jitter tolerance mask for XAUI [36] are illustrated in Fig. 19. For all jitter frequencies and all jitter amplitudes, the XAUI mask can be fulfilled by our circuit. The dip around the jitter frequency of 20 MHz, where the maximum jitter amplitude that the CDR can tolerate is lower than the clock phase margin of the sampler, is due to the loop delay mainly caused by pipelining stages in the digital part.

Fig. 17. (a) 40 Gb/s input data, 10 GHz sinusoidal clock signal. (b) Recovered 10 Gb/s data measured on-wafer without ESD protection. (c) Recovered 10 Gb/s data measured with the packaged module including ESD protection.

Fig. 18. Eye diagram of a 24 Gb/s data stream at the input of the package (left eye diagram) and at the pad of the circuit (right eye diagram).

Fig. 19. Jitter tolerance of the packaged CDR at 24 Gb/s for the target BER.

Finally, Table III shows a comparison with previously published 40 Gb/s CMOS CDRs with analog [8], [15] or digital loop filters [9], [13], [14]. Fully analog CDRs are area consuming; they dissipate less power but have a larger BER compared to [9] and [14]. Among the three CDRs with a digital loop filter, our CDR covers the largest range of data rates. Furthermore, it consumes less power (30%) and has a smaller chip area than the 3×-oversampling CDR with an integrated 1:16 DEMUX [9]. Only the circuit in [13] reaches superior performance with respect to power and area, but it uses a more advanced and expensive SOI CMOS technology that also allows the speed-critical circuit blocks to be implemented in CMOS logic instead of the more power- and area-consuming CML logic.

TABLE III. 40 Gb/s CMOS CDRs

V. SUMMARY

A semi-digital clock-data-recovery circuit implemented in 90 nm bulk CMOS for 40 Gb/s chip-to-chip communication is presented. Thanks to the novel rate selection feature in the fully digital loop filter, a very large data rate range from 5.75 to 44 Gb/s can be covered. From 5.75 to 38 Gb/s the target BER is achieved even for a frequency offset of 615 ppm and data jitter amplitudes above the XAUI mask. Measurement results of the DLL circuit showed that differential signaling in the clock path keeps jitter generation caused by power supply noise low. By inductive shunt-peaking in the speed-critical blocks, like the samplers and the clock buffers, the required high bandwidth is reached at a low power consumption. The power consumption per data rate of 5.3 mW/(Gb/s) of the proposed CDR is below the ITRS power budget requirement (Table I) for high-speed transceivers implemented in 90 nm CMOS technology.

ACKNOWLEDGMENT

The authors thank R. Brun and D. Holzer for the design of the digital logic, T. Toifl, H. Schmid, D. Müller, S. Schmid, P. Looser, D. Barras, C. Kromer, C. Menolfi, T. Morf, M. Kossel, and J. Weiss for fruitful discussions, and M. Lanz and M. Witzig for bonding.

REFERENCES

[1] The International Technology Roadmap for Semiconductors (2004 Update): Test and Test Equipment. ITRS, 2004 [Online]. Available: http://www.itrs.net/Links/2004Update/2004_02_Test.pdf

[2] K. L. J. Wong, H. Hatamkhani, M. Mansuri, and C. K. K. Yang, “A 27-mW 3.6-Gb/s I/O transceiver,” IEEE J. Solid-State Circuits, vol. 39, no. 4, pp. 602–612, Apr. 2004.

[3] J. Poulton, R. Palmer, A. M. Fuller, T. Greer, J. Eyles, W. J. Dally, and M. Horowitz, “A 14-mW 6.25-Gb/s transceiver in 90-nm CMOS,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2745–2757, Dec. 2007.

[4] J. F. Bulzacchelli, M. Meghelli, S. V. Rylov, W. Rhee, A. V. Rylyakov, H. A. Ainspan, B. D. Parker, M. P. Beakes, A. Chung, T. J. Beukema, P. K. Pepeljugoski, L. Shan, Y. H. Kwark, S. Gowda, and D. J. Friedman, “A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.

[5] K. Fukuda, H. Yamashita, F. Yuki, M. Yagyu, R. Nemoto, T. Takemoto, T. Saito, N. Chujo, K. Yamamoto, H. Kanai, and A. Hayashi, “An 8 Gb/s transceiver with 3x-oversampling 2-threshold eye-tracking CDR circuit for −36.8 dB-loss backplane,” in IEEE ISSCC Dig. Tech. Papers, 2008, pp. 98–598.

[6] C. Kromer, G. Sialm, C. Berger, T. Morf, M. L. Schmatz, F. Ellinger, D. Erni, G.-L. Bona, and H. Jäckel, “A 100 mW 4 × 10 Gb/s transceiver in 80-nm CMOS for high-density optical interconnects,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2667–2679, Dec. 2005.

[7] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, “A 90 nm CMOS 16 Gb/s transceiver for optical interconnects,” in IEEE ISSCC Dig. Tech. Papers, 2007, pp. 44, 586.

[8] J. Lee and B. Razavi, “A 40-Gb/s clock and data recovery circuit in 0.18-µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2181–2190, Dec. 2003.

[9] N. Nedovic, N. Tzartzanis, H. Tamura, F. M. Rotella, M. Wiklund, Y. Mizutani, Y. Okaniwa, T. Kuroda, J. Ogawa, and W. W. Walker, “A 40–44 Gb/s 3x oversampling CMOS CDR/1:16 DEMUX,” IEEE J. Solid-State Circuits, vol. 42, no. 12, pp. 2726–2735, Dec. 2007.

[10] S. Sidiropoulos and M. A. Horowitz, “A semidigital dual delay-locked loop,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1683–1692, Nov. 1997.

[11] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, “A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator,” IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 736–743, Mar. 2005.

[12] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jackel, “A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects,” IEEE J. Solid-State Circuits, vol. 41, no. 12, pp. 2921–2929, Dec. 2006.

[13] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf, J. Weiss, and M. Schmatz, “A 72 mW 0.03 mm² inductorless 40 Gb/s CDR in 65 nm SOI CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2007, pp. 226–598.

[14] G. v. Büren, L. Rodoni, A. Huber, R. Brun, D. Holzer, M. Schmatz, and H. Jäckel, “5.75 to 44 Gb/s quarter rate CDR with data rate selection in 90 nm bulk CMOS,” in Proc. ESSCIRC, 2008, pp. 166–169.

[15] C. F. Liao and S. I. Liu, “40 Gb/s transimpedance-AGC amplifier and CDR circuit for broadband data receivers in 90 nm CMOS,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 642–655, Mar. 2008.

[16] J. D. H. Alexander, “Clock recovery from random binary data,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.

[17] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, “High-speed electrical signaling: Overview and limitations,” IEEE Micro, vol. 18, pp. 12–24, 1998.

[18] C.-K. K. Yang and M. A. Horowitz, “A 0.8 µm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links,” IEEE J. Solid-State Circuits, vol. 31, no. 12, pp. 2015–2023, Dec. 1996.

[19] H. O. Johansson and C. Svensson, “Time resolution of NMOS sampling switches used on low-swing signals,” IEEE J. Solid-State Circuits, vol. 33, no. 2, pp. 237–245, Feb. 1998.

[20] Y. Okaniwa, H. Tamura, M. Kibune, D. Yamazaki, C. Tsz-Shing, J. Ogawa, N. Tzartzanis, W. W. Walker, and T. Kuroda, “A 40-Gb/s CMOS clocked comparator with bandwidth modulation technique,” IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1680–1687, Aug. 2005.

[21] K. Kanda, D. Yamazaki, T. Yamamoto, M. Horinaka, J. Ogawa, H. Tamura, and H. Onodera, “40 Gb/s 4:1 MUX/1:4 DEMUX in 90 nm standard CMOS,” in IEEE ISSCC Dig. Tech. Papers, 2005, pp. 152–590.

[22] T. Chalvatzis, K. H. K. Yau, P. Schvan, M. T. Yang, and S. P. Voinigescu, “A 40-Gb/s decision circuit in 90 nm CMOS,” in Proc. ESSCIRC, 2006, pp. 512–515.

[23] G. v. Buren, L. Rodoni, C. Kromer, H. Jackel, A. Huber, and T. Morf, “Low power sampling latch for up to 25 Gb/s 2x oversampling CDR in 90 nm CMOS,” in Proc. ESSCIRC, 2006, pp. 106–109.

[24] S. S. Mohan, M. d. M. Hershenson, S. P. Boyd, and T. H. Lee, “Bandwidth extension in CMOS with optimized on-chip inductors,” IEEE J. Solid-State Circuits, vol. 35, no. 3, pp. 346–355, Mar. 2000.

[25] F. Ellinger, M. Kossel, M. Huber, M. Schmatz, C. Kromer, G. Sialm, D. Barras, L. Rodoni, G. v. Buren, and H. Jackel, “High-Q inductors on digital VLSI CMOS substrate for analog RF applications,” in Proc. IEEE Int. Microwave and Optoelectronics Conf. (IMOC), 2003, vol. 2, pp. 869–872.

[26] C. Kromer, “10 Gb/s to 40 Gb/s receiver for high-density optical interconnects in 80 nm CMOS,” Ph.D. dissertation, Swiss Federal Inst. Technol. (ETH), Zurich, Switzerland, 2005, ETH No. 16347.

[27] T. Toifl, C. Menolfi, M. Ruegg, R. Reutemann, P. Buchmann, M. Kossel, T. Morf, J. Weiss, and M. L. Schmatz, “A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 954–965, Apr. 2006.

[28] J. G. Maneatis, “Low-jitter process-independent DLL and PLL based on self-biased techniques,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1723–1732, Nov. 1996.

[29] J. R. Burnham, G.-Y. Wei, C.-K. K. Yang, and H. Hindi, “A comprehensive phase-transfer model for delay-locked loops,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 2007, pp. 627–630.

[30] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, R. Reutemann, M. Ruegg, M. L. Schmatz, and J. Weiss, “A 0.94-ps-RMS-jitter 0.016-mm² 2.5-GHz multiphase generator PLL with 360° digitally programmable phase shift for 10-Gb/s serial links,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2700–2712, Dec. 2005.

[31] B. Lai and R. C. Walker, “A monolithic 622 Mb/s clock extraction data retiming circuit,” in IEEE ISSCC Dig. Tech. Papers, 1991, pp. 144–306.

[32] M. Soyuer, J. F. Ewen, and H. L. Chuang, “A fully monolithic 1.25 GHz CMOS frequency synthesizer,” in Symp. VLSI Circuits Dig., 1994, pp. 127–128.

[33] J. Savoj and B. Razavi, “A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector,” IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 761–768, May 2001.

[34] R. E. Best, Phase-Locked Loops: Design, Simulation and Applications, 4th ed. New York: McGraw-Hill, 2003.

[35] M. Kossel, C. Menolfi, J. Weiss, P. Buchmann, G. von Bueren, L. Rodoni, T. Morf, T. Toifl, and M. Schmatz, “A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with <−16 dB return loss over 10 GHz bandwidth,” IEEE J. Solid-State Circuits, vol. 43, no. 12, pp. 2905–2920, Dec. 2008.

[36] IEEE Standard for Information Technology: Media Access Control (MAC) Parameters, Physical Layers, and Management Parameters for 10 Gb/s Operation, IEEE Std. 802.3ae-2002, 2002, pp. 0_1-516.

Lucio Carlo Rodoni (S’03) was born in Biasca, Switzerland, in 1971. He received the Dipl. Ing. (M.S.) degree in electrical engineering from the Swiss Institute of Technology (ETH) Zürich, Switzerland, in 1998.

From 1998 to 2000, he was with Mandozzi Electronics Inc., where he was involved in the development of digital audio mixers and 2 Mb/s transmission systems for audio and data. From 2000 to 2002, he was a Research Engineer with TChip Inc. developing global positioning system (GPS) RF front-end chips. Since 2002, he has been a member of the RF Integrated Circuit (RFIC) Group, Electronics Laboratory, ETH Zürich, Switzerland. Between October 2006 and March 2007, he was with IBM Zurich Research Laboratory, involved in a series-source terminated transmitter project. His main interests are integrated circuits for high-speed interconnect applications.


George von Büren (S’03) was born in Zürich, Switzerland, in 1974. He received the Dipl. Ing. (M.S.) degree in electrical engineering from the Swiss Federal Institute of Technology (ETH) Zurich, Switzerland, in 1999.

From 1999 to 2002, he was with u-blox Inc., where he was involved in the development of embedded computers and GPS receivers. In 2002, he joined the Electronics Laboratory, ETH Zurich, as a Research Assistant to pursue his Ph.D. degree in collaboration with the IBM Zurich Research Laboratory in Rüschlikon. From October 2006 to March 2007, he was with IBM Zurich Research Laboratory working on a series-source terminated transmitter in 65 nm CMOS. His research interests are in the field of analog and mixed-signal design, with current focus on PLLs and clock and data recovery circuits for serial I/O-links.

Alex Huber (S’93–M’00) was born in Zürich, Switzerland, in 1967. He received the Dipl. Ing. degree and the Ph.D. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1993 and 2000, respectively.

From 1993 to 2000, he was with the Electronics Laboratory, ETH Zürich, as a Research Assistant, where he worked on RF circuit design and modeling of InP/InGaAs HBT devices. Since October 1999, he has been with the Institute of Microelectronics of the University of Applied Sciences Northwestern Switzerland, Windisch, Switzerland. His main research interests include low-power and high-speed integrated circuits in CMOS technologies for sensor and communication applications.

Martin L. Schmatz (S’94–M’97) was born on May 8, 1967, in St. Gallen, Switzerland. He received the degree in electrical engineering in 1993 and the Ph.D. degree in 1998, both from the Swiss Federal Institute of Technology (ETH), Zürich, for his work on low-power wireless receiver designs and on noise-parameter measurement systems. He received the ETH medal for his diploma work and the ETH-SEU award for outstanding research activities in 1995.

In 1999, he joined the IBM Zürich Research Laboratory, where he established and managed a research team working on high-speed and high-density CMOS serial-link systems. In mid-2008, he took over management responsibilities for the Systems Department at the IBM Zürich Research Laboratory with focused research on a wide range of server systems building blocks. He is a member of the IBM Academy of Technology and also manages the IBM-ETH Center for Advanced Silicon Electronics (CASE).

Heinz Jäckel (M’82) received the Ph.D. degree in electrical engineering at the ETH Zurich in 1979.

In 1980, he joined the IBM Research Division where he held scientific and management positions for 13 years in IBM Rüschlikon, Switzerland, and IBM Yorktown Heights, USA. During this time he carried out research in superconducting Josephson junction computers, GaAs-MESFET ICs, and optoelectronics. He has been a full Professor at the Electronics Laboratory of the Swiss Federal Institute of Technology, ETH Zurich, since 1993, heading the High Speed Electronics and Photonics group (http://www.ife.ee.ethz.ch/, http://www.photonics.ee.ethz.ch/). In electronics the research activities of his group concentrate on development of III/V technology, design and characterization of ultrafast InP-HBT transistors for >100 Gb/s electronics, and multi-10 GHz RF and digital 10–40 Gb/s CMOS IC design. In the area of ultra-dense and Tb/s lightwave communication research, topics are integrated InP-based mode-locked diode lasers, all-optical switches for all-optical signal processing at Tb/s data rates, and planar InP-based photonic crystals. Prof. Jäckel has authored or coauthored over 100 publications, and holds around 20 patents.