LOW POWER DIGITAL EQUALIZATION
FOR HIGH SPEED SERDES
Masum Hossain
University of Alberta
Outline
• Why ADC-Based receiver?
• Challenges in ADC-based receiver
• ADC-DSP based Receiver
– Reducing impact of Quantization Noise
– Variable Resolution ADC
– Low-latency, high-resolution TDC-based timing recovery
– Implemented Prototype and Measured Results
Tx FIR Filter:
• Peak power constrained
• Limited by supply voltage
Peaking equalizer:
• Analog - does not scale well
• Limited by supply voltage
• PVT variation
Decision Feedback Eq.:
• Latency constrained
• Difficult for multilevel signaling
Existing equalization strategies do not scale well with technology, channel loss and data rate.
[Figure: conventional mixed-signal link vs. ADC-based high speed link; legend: analog, mixed-signal, digital]
Benefits of DSP-based equalization:
• Scales well with technology
• Frequency response can be well controlled
• Can equalize both pre and post cursors
Challenges of DSP-based equalization:
• ADC-DSP is power hungry.
• Higher loop latency makes timing recovery difficult.
Mixed-signal vs ADC-based Link
• Variable Resolution
• Predictive ADC
• Timing Recovery
• 3-bit TDC
• 8-tap Digital FFE
• 3-tap in Look-up table
• 5-tap in conventional way
PAM-4 Digital Receiver Architecture
[Figure: normalized step response and comparator references vs. time (bit period) around a transient data edge, with 4 fixed references]
• Between two consecutive samples signal changes a lot
• Need to cover entire dynamic range - 4 Fixed References.
Variable Resolution ADC – 12 dB loss
[Figure: two panels, normalized step response and comparator references vs. time (bit period) around a transient data edge]
• Between two consecutive samples signal changes around 20% - 30%
• Need to cover a portion of entire dynamic range – Reference Switching
Variable Resolution ADC – 25 dB loss
[Figure: two panels, normalized step response and comparator references vs. time (bit period) around a transient data edge]
• Edge comparator output defines the next probable location of references
Variable Resolution ADC – 25 dB loss
[Figure: two panels, normalized step response and comparator references vs. time (bit period) around a transient data edge]
• Fine references are carried over to the midpoint between two coarse references
[Figure labels: fine reference; 2 edge references]
Variable Resolution ADC – 25 dB loss
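The reference-switching idea on these slides can be illustrated with a small sketch: a coarse decision on the edge sample selects a window, and the fine references are centred inside that window before the data sample is resolved. The reference levels, the number of fine references and the function names below are hypothetical and chosen only for illustration; this is not the circuit implementation.

```python
# Hypothetical coarse reference levels on a 0..1 normalized amplitude scale.
COARSE_REFS = [0.25, 0.5, 0.75]


def coarse_region(sample, refs=COARSE_REFS):
    """Thermometer-style count of coarse references that lie below the sample."""
    return sum(sample > r for r in refs)


def fine_refs_for_region(region, refs=COARSE_REFS, n_fine=3):
    """Spread n_fine references evenly inside the window picked by the edge decision."""
    bounds = [0.0] + list(refs) + [1.0]
    lo, hi = bounds[region], bounds[region + 1]
    step = (hi - lo) / (n_fine + 1)
    return [lo + step * (k + 1) for k in range(n_fine)]


def predictive_convert(edge_sample, data_sample):
    """Use the edge sample to predict where the data sample will land, then resolve it finely."""
    region = coarse_region(edge_sample)
    fine = fine_refs_for_region(region)
    fine_code = sum(data_sample > r for r in fine)
    return region, fine_code


if __name__ == "__main__":
    print(predictive_convert(0.6, 0.65))
```

The sketch only works when the signal moves little between the edge and data samples, which is the high-loss (25 dB) case described above.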
[Figure: sample-and-hold clocking from a 3.5 GHz input through PGEN and /2 PGEN with matched delay; clock labels: Quad (Edge, Coarse), Octal (EVEN, ODD), Fine (EVEN, ODD), Edge, Coarse]
• The quad and octal clocks are retimed with the original quad clock
Variable Resolution ADC – Sample and Hold
• Unbalance the capacitive load attached to the input of the strong-ARM latch
• Store the bit-decisions into a 6T SRAM to reduce the area.
Ref: [2]
ADC Offset Correction
Measured ADC Performance
• Timing Recovery
• 3-bit TDC
• 8-tap Digital FFE
• 3-tap in Look-up table
• 5-tap in conventional way
• Variable Resolution
• Predictive ADC
PAM-4 Digital Receiver Architecture
[Figure: timing recovery loop with digital FFE, MM phase detector and digital filter; phase noise labels ΦN and ΦQ]
• Mueller-Muller (MM) baud-rate phase detection is not as robust as a 2x (i.e. data and edge) sampled CDR
• Bang-bang or 1-bit phase quantization at the phase detector increases in-band jitter
• Lowering the loop bandwidth increases the VCO phase noise contribution
• Loop latency makes it difficult to achieve a wider loop bandwidth
Timing Recovery Challenge for ADC-based Receiver
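For readers unfamiliar with the MM phase detector mentioned above, the textbook Mueller-Muller timing error uses one sample per symbol and the corresponding decisions. The sketch below is a generic illustration; the PAM-4 slicer levels and the sign convention are assumptions, and it is not taken from the prototype.

```python
def slice_pam4(y, levels=(-3, -1, 1, 3)):
    """Nearest-level decision for a normalized PAM-4 sample (levels are an assumption)."""
    return min(levels, key=lambda level: abs(y - level))


def mm_timing_error(y_prev, y_curr, d_prev, d_curr):
    """Textbook Mueller-Muller timing error from two baud-rate samples and their decisions."""
    return y_curr * d_prev - y_prev * d_curr


if __name__ == "__main__":
    # Symmetric samples give zero error; a skewed pair gives a non-zero error.
    print(mm_timing_error(1.0, -1.0, 1, -1))
    print(mm_timing_error(1.0, -0.8, 1, -1))
```

Because the error is built from decisions that themselves depend on equalized samples, the detector sits behind the ADC and FFE, which is where the loop-latency concern on this slide comes from.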
Effect of Timing Noise on SNR
The effect of timing noise on SNR is smaller once channel loss is taken into account.
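A rough way to see this is the standard jitter-limited SNR of a sampled sinusoid, SNR = -20 log10(2π f σt). The sketch below evaluates it for a 7 GHz tone with 0.5 ps rms jitter (both numbers appear elsewhere in this deck) and then shows how the jitter-induced noise, referred to full scale, drops when the tone is attenuated by the channel. Treating the channel as a single attenuated tone is a simplifying assumption of this example.

```python
import math


def jitter_limited_snr_db(f_in_hz, sigma_t_s):
    """SNR of a full-scale sinusoid sampled with rms timing jitter sigma_t_s."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)


def jitter_noise_db_full_scale(f_in_hz, sigma_t_s, loss_db=0.0):
    """Jitter-induced noise relative to full scale when the tone is attenuated by loss_db."""
    return 20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s) - loss_db


if __name__ == "__main__":
    print(round(jitter_limited_snr_db(7e9, 0.5e-12), 2))
    print(round(jitter_noise_db_full_scale(7e9, 0.5e-12, loss_db=30.0), 2))
```

The jitter noise scales with the slope of the waveform at the sampling instant, so attenuating the high-frequency content lowers it by the same number of dB.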
Phase Tracking vs Blind ADC based [Clifford et al., JSSC, 2013]
Phase tracking:
• Simple but latency sensitive
• ADC benefits from jitter tracking
Blind:
• Less latency sensitive
• ADC does not benefit from jitter tracking
[Figure labels: Region 0, Region 1, Region 2, Region 3]
Low-latency Timing Recovery
Proposed CDR Advantages:
• ADC bypass significantly reduces latency
• 3b SAR TDC reduces bang-bang dithering by 4x.
• Wider loop BW effectively filters VCO phase noise
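The link between latency and achievable loop bandwidth can be made concrete with the usual delay-to-phase relation: a pure delay T adds a phase lag of 360° × f × T at frequency f, which eats directly into phase margin. The latency and frequency values below are hypothetical and only illustrate the trend.

```python
def delay_phase_lag_deg(latency_s, freq_hz):
    """Phase lag (degrees) that a pure loop delay contributes at a given frequency."""
    return 360.0 * freq_hz * latency_s


if __name__ == "__main__":
    # Hypothetical numbers: compare a long and a short loop latency at 10 MHz.
    for latency_ns in (20.0, 5.0):
        lag = delay_phase_lag_deg(latency_ns * 1e-9, 10e6)
        print(f"{latency_ns} ns latency -> {lag:.1f} degrees of lag at 10 MHz")
```

Cutting the latency by 4x cuts the lag at a given frequency by 4x, which is why bypassing the ADC path allows a wider loop bandwidth for the same margin.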
Low-latency Timing Recovery – SAR TDC operation
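As a generic illustration of successive-approximation quantization applied to a phase error, the sketch below resolves one bit per step by binary search. The full-scale range of ±0.5 UI and the function name are assumptions made for the example; the slide's actual TDC circuit is not reproduced here.

```python
def sar_tdc(phase_error_ui, bits=3, full_scale_ui=0.5):
    """Quantize a phase error in [-full_scale_ui, +full_scale_ui) with a SAR-style binary search."""
    lo, hi = -full_scale_ui, full_scale_ui
    code = 0
    for _ in range(bits):
        mid = (lo + hi) / 2.0
        bit = 1 if phase_error_ui >= mid else 0  # one comparison per resolved bit
        code = (code << 1) | bit
        if bit:
            lo = mid
        else:
            hi = mid
    return code


if __name__ == "__main__":
    for err in (-0.4, -0.1, 0.05, 0.3):
        print(err, format(sar_tdc(err), "03b"))
```

With bits=1 the function degenerates to a bang-bang (early/late) detector, while three bits report the magnitude of the error as well as its sign.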
[Figure: phase noise, free-running vs. locked]
• Integrated jitter = 0.5 ps
• In-band phase noise = -90 dBc/Hz
[Figure: jitter tolerance (UIpp) vs. frequency (MHz) with a 2^7-1 pattern; equipment limit marked]
CDR Performance
• 8-tap Digital FFE
• 3-tap in Look-up table
• 5-tap in conventional way
• Timing Recovery
• 3-bit TDC
• Variable Resolution
• Predictive ADC
PAM-4 Digital Receiver Architecture
[Figure: power (mW) vs. ADC resolution (no. of bits) for a flash ADC, Fs = 14 GS/s]
Noise Source   Constraint            Transfer Gain
NLEQ           Power/Gain/BW         LEQ + FFE
ΦN             Power and latency     FFE
NADC           Power/Settling time   FFE
NQZ            ADC Resolution        FFE
[Figure: receiver noise model with noise sources NLEQ, NADC, ΦN and NQZ, digital FFE and timing recovery]
Noise Sources in ADC-based Receiver
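The steep rise of flash ADC power with resolution follows from the comparator count, which is 2^N - 1 for an N-bit flash converter, while each extra bit buys only about 6 dB of ideal quantization SNR. The short sketch below tabulates both; it uses the standard ideal-quantizer formula 6.02 N + 1.76 dB and makes no claim about the measured power numbers in the plot.

```python
def flash_comparators(bits):
    """Number of comparators in an N-bit flash ADC."""
    return 2 ** bits - 1


def ideal_sqnr_db(bits):
    """Ideal quantization SNR of an N-bit quantizer for a full-scale sinusoid."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, flash_comparators(n), round(ideal_sqnr_db(n), 2))
```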
Quantization Noise Impact
[Figure: amplitude (dB) vs. analog input frequency (GHz). Traces: FFT at the ADC output (simulated), FFT at the FFE output (simulated), ADC quantization noise floor (theoretical), quantization noise floor at the FFE output (theoretical)]
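$$N_{QZ,out}^{2} = N_{QZ}^{2}\, h_{FFE} = N_{QPre}^{2} W_{Pre}^{2} + N_{QMain}^{2} W_{Main}^{2} + N_{QPost}^{2} W_{Post}^{2}$$
$$W_{X}^{2} = \left( \frac{h_{X}}{h_{pre} + h_{main} + h_{post}} \right)^{2}, \quad X = Pre, Main, Post$$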
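A quick numerical check of how an FFE amplifies quantization noise: for white, uncorrelated noise at the input of an FIR filter, the output noise power is the input noise power times the sum of the squared tap weights. The tap values below are hypothetical, and the taps are assumed to be scaled so that the main cursor is 1; other normalizations change the absolute number but not the trend.

```python
import math


def ffe_noise_power_gain(taps):
    """Power gain seen by white, uncorrelated input noise through an FIR filter."""
    return sum(h * h for h in taps)


def ffe_noise_gain_db(taps):
    """Same gain expressed in dB."""
    return 10.0 * math.log10(ffe_noise_power_gain(taps))


if __name__ == "__main__":
    taps = [-0.1, 1.0, -0.3]  # hypothetical pre, main and post cursor taps
    print(round(ffe_noise_power_gain(taps), 3), round(ffe_noise_gain_db(taps), 2))
```

Every extra tap adds a non-negative term to the sum, so stronger equalization always raises the quantization noise floor at the FFE output.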
[Figure labels: NQZ, Z^-1, NQMain, NQPost, hmain, hpost, N bit, 2N bit]
Although the digital FFE output can be 2N bits wide, we are still limited by the ADC's N-bit resolution.
[Figure labels: Z^-1, hmain, hpost, NQZ, 2N bit]
If the FFE could be moved ahead of the ADC, then we could minimize the ADC's quantization noise penalty.
How can we build a digital FFE with resolution better than the ADC?
[Figure labels: N bit, N bit]
How to reduce ADC quantization noise impact?
• Implementing the first three taps as a LUT reduces the quantization noise impact
• Going from 3 to 8 taps does not significantly amplify quantization noise
[Figure: LUT FFE followed by conventional FFE; labels: 5-bit samples, address decoder, bus widths 5 and 9]
Reducing Quantization Noise Impact
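A minimal sketch of a look-up-table FFE, assuming the general idea is that the outputs for every combination of three input codes are computed offline and stored, so no per-tap multiplier rounding occurs at run time. The code width (3 bits per sample, giving a 9-bit address), the tap values and the function names are hypothetical choices to keep the table small; the slide's actual addressing scheme is not specified here.

```python
from itertools import product

BITS = 3                      # hypothetical code width per sample
TAPS = (-0.12, 1.0, -0.31)    # hypothetical pre, main and post coefficients


def make_address(codes, bits=BITS):
    """Concatenate the sample codes into one table address."""
    address = 0
    for c in codes:
        address = (address << bits) | c
    return address


def build_lut(taps=TAPS, bits=BITS):
    """Precompute the full-precision FFE output for every combination of input codes."""
    lut = {}
    for codes in product(range(2 ** bits), repeat=len(taps)):
        lut[make_address(codes, bits)] = sum(h * c for h, c in zip(taps, codes))
    return lut


def lut_ffe(lut, codes, bits=BITS):
    """Run-time filtering is a single table read."""
    return lut[make_address(codes, bits)]


if __name__ == "__main__":
    table = build_lut()
    print(len(table), lut_ffe(table, (1, 5, 2)))
```

The table grows exponentially with the number of taps it covers, which is consistent with limiting the LUT to the first three taps and handling the remaining five conventionally.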
[Figure: power vs. no. of taps and tap resolution, two panels: 3-tap LUT + 5-tap Conventional, and 8-tap Conventional]
The proposed approach consumes 30% less power than the conventional FIR implementation.
Reducing Quantization Noise Impact
[Figure: layout comparison of 3-tap LUT + 5-tap Conventional vs. 8-tap Conventional; dimension labels 500 µm x 500 µm and 1300 µm x 1000 µm]
• Area increases by 4x, but:
– A standard-cell SRAM will reduce it by 25%
– Area will scale significantly with technology
Area Impact of the proposed solution
[Figure: prototype block diagram. Labels: passive equalizer, high BW amplifier, reference generator, 3.5 GHz clock gen (phases P0 to P315), TDC, sampling channels CH0, CH90, CH180 and CH270, coarse S/H, edge S/H, fine S/H (P0 HR), even and odd paths, T-to-B converters, mode selection, digital interface, DSP. Region labels: Implemented in TSMC 65nm; FPGA]
[Figure: power breakdown by block (Analog, Clk. Gen + Buffer, TDC, DSP, Digital) for Long Reach and Medium Reach. Values in slide order: 40, 29, 28, 33, 30 mW and 35, 26, 23, 24, 26 mW]
Digital:
• T-to-B, Mode selection
• Retimer
Implemented Prototype in 65nm CMOS
[Figure label: To FPGA]
• Heavily digital solution
• Input needs only 7 GHz bandwidth
Implemented Prototype in 65nm CMOS
[Figure: PCB for testing. Labels: Cyclone V FPGA, matched SMA cables, input clock, FPGA interface]
• Varying channel loss by cascading SMA cables.
Experimental Setup
[Figure: linear equalizer output EYE and reconstructed digital EYE from the ADC output, ADC code (0 to 31) vs. time (UI); frequency responses of the LR, MR and SR channels]
• Tx has 6 dB equalization
• Linear equalizer boost: 6 to 14 dB
Input EYE in Digital Domain
[Figure: occurrence and BER vs. equalized output code (-3, -1, 1, 3), two panels: 3-tap LUT + 5-tap Conventional, and 8-tap Conventional]
• FPGA gives the distribution of the bins
• The distribution is converted into log-scale
• Gaussian fit to extract the BER.
Link Margin at 28Gb/s 30 dB Channel
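The BER extraction steps listed above (histogram, log scale, Gaussian fit) can be approximated with a short sketch. Instead of fitting a parabola to the log-scale histogram, this example estimates the mean and sigma of two adjacent levels from their bins and evaluates the Gaussian tail with the usual Q-factor expression. That substitution, the neglect of the PAM-4 multi-level factor and all numbers used are assumptions of the example.

```python
import math


def gaussian_stats(bins, counts):
    """Mean and standard deviation of a histogram (moment-based stand-in for a Gaussian fit)."""
    total = sum(counts)
    mean = sum(b * c for b, c in zip(bins, counts)) / total
    var = sum(c * (b - mean) ** 2 for b, c in zip(bins, counts)) / total
    return mean, math.sqrt(var)


def tail_ber(mean_lo, sigma_lo, mean_hi, sigma_hi):
    """Error probability between two adjacent Gaussian levels with a mid-way threshold."""
    q = (mean_hi - mean_lo) / (sigma_lo + sigma_hi)
    return 0.5 * math.erfc(q / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical levels at -1 and +1 with sigma = 1/6 give Q = 6.
    print(tail_ber(-1.0, 1.0 / 6.0, 1.0, 1.0 / 6.0))
```

A Q factor of about 6 corresponds to a BER of roughly 1e-9, which is why extrapolating the fitted tails is used when the error count itself is too small to measure directly.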
[Figure: power (mW) at 28 Gb/s vs. channel loss (dB), split into ADC, TDC and FFE; energy labels 5.7, 4.6, 3.25, 2.1 and 2.1 pJ/bit. Second panel: BER]
• Receiver can achieve a BER as low as 10^-9
Data rate: 28 Gb/s PAM-4
Link Margin Test and Energy Efficiency
                           Shafik                Frans                 Cui                   Rylov                 This Work
                           ISSCC 2015 [4]        VLSI 2016 [5]         ISSCC 2016 [3]        ISSCC 2016 [6]
Technology                 65 nm CMOS            16 nm FinFET          28 nm CMOS            32 nm CMOS            65 nm CMOS
Data Rate (Gb/s)           10, NRZ               56, PAM-4             32, PAM-4             25, NRZ               28, PAM-4
ADC Architecture           32x TI SAR ADC        32x TI SAR ADC        32x TI SAR ADC        4x Flash ADC          4x Flash ADC
ENOB @ Nyquist             4.74                  4.9                   5.85                  4                     4.1
Timing Recovery            N/A                   Baud-rate             Baud-rate             Baud-rate             Edge & Data Sampled
Tracking BW                ---                   ---                   ---                   ---                   10+ MHz
Jitter Tolerance           ---                   ---                   ---                   ---                   0.2 UIpp @ 50 MHz
Channel Loss Equalization  36.4 dB @ 5 GHz       25 dB @ 14 GHz        32 dB @ 8 GHz         40 dB @ 12 GHz        30 dB @ 7 GHz
Power (mW)                 79 w/o DSP, 87 w DSP  410 (w/o DSP)         320                   453                   130 @ 30 dB, 45 @ 15 dB (w/o DSP); 160 @ 30 dB, 60 @ 15 dB (with DSP)
FOM (pJ/bit)               8.7                   7.32                  10                    18.12                 5.71 @ 30 dB, 2.14 @ 15 dB (with DSP)
Comparison with state-of-art
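The FOM row in the comparison is energy per bit, i.e. power divided by data rate; with power in mW and data rate in Gb/s the result is directly in pJ/bit. The helper name below is hypothetical, and the inputs are the power and data-rate figures quoted in the comparison.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: mW divided by Gb/s gives pJ/bit."""
    return power_mw / data_rate_gbps


if __name__ == "__main__":
    # This work, with DSP: 160 mW at 30 dB loss and 60 mW at 15 dB loss, both at 28 Gb/s.
    print(round(energy_per_bit_pj(160, 28), 2))
    print(round(energy_per_bit_pj(60, 28), 2))
```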
• ADC-DSP based receivers are the future for multilevel signaling in advanced CMOS, but their power has to be reduced.
• DSP needs to be more information efficient: non-uniform quantization is a simple way to improve effective resolution.
• An ADC for wireline is different from a general-purpose ADC. A general-purpose ADC treats each sample as ‘uncorrelated’, but in reality channel ISI makes them ‘correlated’; a predictive ADC is a simple way to take advantage of that.
• Timing recovery is as important as data recovery: a multibit TDC and lower latency are an effective way to improve the timing recovery loop and meet the jitter requirement of the ADC.
Summary of ADC Based Receiver