1000BASE-T Gigabit Ethernet Baseband DSP IC...

4
1000BASE-T Gigabit Ethernet Baseband DSP IC Design Hsiu-ping Lin, Nancy F. Chen, Jyh-Ting Lai, and An-Yeu Wu Graduate Institute of Electronics Engineering, and Department of Electrical Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C. Abstract - The design of Gigabit Ethernet baseband DSP IC is based on IEEE standard 802.3ab and focuses on signal processing in 1000BASE-T PHY layer. To achieve the target bit-error rate (BER) of less than 10 -10 , the receiver must conquer channel impairments including Inter-Symbol Interference (ISI), Echo, Near-End Cross Talk (NEXT), and Far-End Cross Talk (FEXT). We present low-power, low complexity, and high performance architectures of Echo Canceller, joint Decision Feedback Equalizer, and Trellis Decoder of the baseband DSP receiver in Gigabit Ethernet. The proposed baseband architectures are integrated, co- simulated with the analog front end, and implemented in 2.5- V 0.25μm CMOS standard cell design flow. I. INTRODUCTION Based on the IEEE standard 802.3ab, the data rate of 1000BASE-T Gigabit Ethernet is 1Gb/s for copper wiring over distances of 100m [1]. 1-Gb/s is achieved by employing full duplex baseband transmission over four pairs of Category 5 cabling, indicating 250 Mb/s over each wire pair. Pulse Amplitude Modulation with the five levels {-2, -1, 0, 1, 2} (PAM5) is used as the transmission scheme of each wire pair. By grouping the four symbols transmitted in four channels, a four-dimensional (4-D) symbol, which carries eight information bits is formed. The transmitted symbol rate for each wire is thus 125 M baud/s, corresponding to a symbol period of 8 ns. To achieve the target bit error rate (BER) requirement of less than , the receiver has to cope with channel impairments such as ISI, Echo, NEXT, and FEXT. The overall block diagram of 1000BASE-T transceiver for Gigabit Ethernet is shown in Fig. 1. 10 10 After being processed by the analog front end, the channel outputs are digitized through the A/D converter. An adaptive Feedforward Equalizer (FFE) removes the precursor ISI to make the channel minimum phase and whitens the noise, while echo is cancelled by the adaptive Echo Cancellers. The DFE/TCM decoding block is then employed to equalize the postcursor ISI and trellis decoding. Afterwards, the decoded symbols are mapped into the information bits and sent to the MAC layer. This paper focuses on cost-effectively eliminating echo and ISI , which are the major channel impairments, by Variable Step-Sized Partial-Update Echo Canceller (VSSPU-EC) and Joint Decision- Feedback Prefilter and 1-tap Look-ahead Parallel Decision Feedback Decoder (DFP+PDFD) respectively. The Baseband system is integrated, co-simulated with the analog front end, and implemented in 2.5-V 0.25μm CMOS standard cell design flow. MAC PCS DFE & TCM Decoder + Hybrid AFE AFE AFE AFE FFE FFE FFE FFE NEXT Cancellers NEXT Cancellers NEXT Cancellers NEXT Cancellers Echo Cancellers Echo Cancellers Echo Cancellers Echo Cancellers Pulse Shaping Pulse Shaping Pulse Shaping Pulse Shaping 125MHz DAC 125MHz DAC 125MHz DAC 125MHz DAC Timing Recovery PC UTP CAT5 Cable ADC ADC ADC 125MHz ADC Fig.1 Overall block diagram of 1000BASE-T transceiver II. VARIABLE STEP-SIZED PARTIAL UPDATE ECHO CANCELLER Due to long impulse response of echo, a traditional echo canceller requires tens to hundreds of taps in a direct-form adaptive filter. Since echo is the most dominant channel impairment in 1000BASE-T Gigabit Ethernet, reducing computational complexity of echo cancellers is a crucial task. To conquer this, [2] proposed partial-update least-mean-square (LMS) algorithm, where only a portion of weights are updated each time. However, the convergence rate of the partial-update echo canceller (PU-EC) is seriously retarded. Therefore, we propose VSSPU-EC to reduce computational complexity without suffering from slow convergence rate. The mathematical description of our VSSPU-EC is demonstrated in the following paragraph. The weight updating equation in sequential partial update LMS algorithm [3] is given by + = + + k i i k k k i k i w , x , e w w , 1 , 1 , µ if (k–i+1) mod N = 0 otherwise (1) where i denotes the tap number of the weight, k denotes the time index, w the weight, μ the step size, e is the error, and x is the input. For N=1, the algorithm reduces to the traditional LMS

Transcript of 1000BASE-T Gigabit Ethernet Baseband DSP IC...

Page 1: 1000BASE-T Gigabit Ethernet Baseband DSP IC Designaccess.ee.ntu.edu.tw/Publications/Conference/CA33_2004.pdf · 2010-07-15 · 1000BASE-T Gigabit Ethernet Baseband DSP IC Design Hsiu-ping

1000BASE-T Gigabit Ethernet Baseband DSP IC Design

Hsiu-ping Lin, Nancy F. Chen, Jyh-Ting Lai, and An-Yeu Wu

Graduate Institute of Electronics Engineering, and Department of Electrical Engineering,

National Taiwan University, Taipei, 106, Taiwan, R.O.C. Abstract - The design of Gigabit Ethernet baseband DSP IC is based on IEEE standard 802.3ab and focuses on signal processing in 1000BASE-T PHY layer. To achieve the target bit-error rate (BER) of less than 10-10, the receiver must conquer channel impairments including Inter-Symbol Interference (ISI), Echo, Near-End Cross Talk (NEXT), and Far-End Cross Talk (FEXT). We present low-power, low complexity, and high performance architectures of Echo Canceller, joint Decision Feedback Equalizer, and Trellis Decoder of the baseband DSP receiver in Gigabit Ethernet. The proposed baseband architectures are integrated, co-simulated with the analog front end, and implemented in 2.5-V 0.25μm CMOS standard cell design flow.

I. INTRODUCTION

Based on the IEEE standard 802.3ab, the data rate of 1000BASE-T Gigabit Ethernet is 1Gb/s for copper wiring over distances of 100m [1]. 1-Gb/s is achieved by employing full duplex baseband transmission over four pairs of Category 5 cabling, indicating 250 Mb/s over each wire pair. Pulse Amplitude Modulation with the five levels {-2, -1, 0, 1, 2} (PAM5) is used as the transmission scheme of each wire pair. By grouping the four symbols transmitted in four channels, a four-dimensional (4-D) symbol, which carries eight information bits is formed. The transmitted symbol rate for each wire is thus 125 M baud/s, corresponding to a symbol period of 8 ns.

To achieve the target bit error rate (BER) requirement of less than , the receiver has to cope with channel impairments such as ISI, Echo, NEXT, and FEXT. The overall block diagram of 1000BASE-T transceiver for Gigabit Ethernet is shown in Fig. 1.

1010−

After being processed by the analog front end, the channel outputs are digitized through the A/D converter. An adaptive Feedforward Equalizer (FFE) removes the precursor ISI to make the channel minimum phase and whitens the noise, while echo is cancelled by the adaptive Echo Cancellers. The DFE/TCM decoding block is then employed to equalize the postcursor ISI and trellis decoding. Afterwards, the decoded symbols are mapped into the information bits and sent to the MAC layer.

This paper focuses on cost-effectively eliminating echo and ISI , which are the major channel impairments, by Variable Step-Sized Partial-Update Echo Canceller (VSSPU-EC) and Joint Decision-Feedback Prefilter and 1-tap Look-ahead Parallel Decision Feedback Decoder (DFP+PDFD) respectively. The Baseband system is integrated, co-simulated with the analog front end, and implemented in 2.5-V 0.25μm CMOS standard cell design flow.

MAC PCS

DFE & TCM Decoder +

Hybrid

AFEAFEAFEAFEFFEFFEFFEFFE

NEXT Cancellers

NEXT

Cancellers

NEXT Cancellers

NEXT Cancellers Echo

Cancellers

Echo Cancellers

Echo

Cancellers

Echo Cancellers

Pulse Shaping

Pulse Shaping

Pulse Shaping

Pulse Shaping 125MHz

DAC

125MHzDAC

125MHz

DAC

125MHz

DAC

Timing Recovery

PC

UTPCAT5Cable

ADCADCADC125MHz

ADC

Fig.1 Overall block diagram of 1000BASE-T transceiver

II. VARIABLE STEP-SIZED PARTIAL UPDATE ECHO CANCELLER

Due to long impulse response of echo, a traditional echo

canceller requires tens to hundreds of taps in a direct-form adaptive filter. Since echo is the most dominant channel impairment in 1000BASE-T Gigabit Ethernet, reducing computational complexity of echo cancellers is a crucial task. To conquer this, [2] proposed partial-update least-mean-square (LMS) algorithm, where only a portion of weights are updated each time.

However, the convergence rate of the partial-update echo canceller (PU-EC) is seriously retarded. Therefore, we propose VSSPU-EC to reduce computational complexity without suffering from slow convergence rate. The mathematical description of our VSSPU-EC is demonstrated in the following paragraph.

The weight updating equation in sequential partial update LMS algorithm [3] is given by

+

=+−

+

ki

ikkki

ki

w ,

x ,eww

,

1,

1,

µ if (k–i+1) mod N = 0 otherwise

(1)

where i denotes the tap number of the weight, k denotes the time index, w the weight, µ the step size, e is the error, and x is the input. For N=1, the algorithm reduces to the traditional LMS

Page 2: 1000BASE-T Gigabit Ethernet Baseband DSP IC Designaccess.ee.ntu.edu.tw/Publications/Conference/CA33_2004.pdf · 2010-07-15 · 1000BASE-T Gigabit Ethernet Baseband DSP IC Design Hsiu-ping

algorithm. The step size of our VSSPU-EC is recursively varied with the following equation:

µ(n+1) = αµ(n) + γe2(n), (2)

where 0 < α < 1 and γ > 0, and n is the time index. One should note that µ(n+1) is bounded by µmax and µmin . Typically, µmax is selected to provide the maximum possible rate of convergence, while µmin is chosen based on the design requirement of steady-state misadjustment.

Fig.2 shows the block diagram of VSSPU-EC with 160 taps. We divide the 160 tap weights into five updating blocks, each containing 32 tap weights. An LMS Engine is applied to each updating block to handle the weight updating tasks. The architecture of LMS Engine is shown in Fig. 3.

…Tap32…Tap32

Fig.2 Variable Step-Sized Partial Update Echo Canceller

———

WeightRegister

File

ff/32

u

To BoothMultiplier

f

Wn+1

Wn

x e Fig.3 Architecture of the LMS Engine

Am

plitu

de

0 20 40 60 80 100 120 140 160 180

Echo— W

time

0.08

0.06

0.04

0.02

0

-0.02

-0.04

-0.06

-0.08

-0.1

-0.12

Am

plitu

de

0 20 40 60 80 100 120 140 160 180

Echo— W

time

0.08

0.06

0.04

0.02

0

-0.02

-0.04

-0.06

-0.08

-0.1

-0.12

Am

plitu

de

0 20 40 60 80 100 120 140 160 180

Echo— W

time

0.08

0.06

0.04

0.02

0

-0.02

-0.04

-0.06

-0.08

-0.1

-0.12

Fig. 4 Comparison between echo response and the estimated tap weights

Iteration number

MSE

10-1

10-2

10-3

10-4

0

10-1

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5x104

— VSSPU-EC— PU-EC

Boundary for SNR=23.5dB

Iteration number

MSE

10-1

10-2

10-3

10-4

0

10-1

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5x104Iteration number

MSE

10-1

10-2

10-3

10-4

0

10-1

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5x104Iteration number

MSE

10-1

10-2

10-3

10-4

0

10-1

0.5 1 1.5 2 2.5 3 3.5 4 4.5 5x104

— VSSPU-EC— PU-EC

Boundary for SNR=23.5dB

Fig. 5 Convergence rate of VSSPU-EC and PU-EC

Fig. 4 shows that our echo response and estimated tap weights

are almost identical, indicating our VSSPU-EC eliminates echo effectively. Fig. 5 shows that VSSPU-EC speeds up the convergence rate by 8000 to 10000 iterations from PU-EC with minor hardware increment when SNR=23.5dB.

III. JOINT EQUALIZATION AND DECODING

To eliminate ISI and meet the 10 BER requirement in Gigabit Ethernet transmission, we apply joint DFP+1-Tap LA-PDFD [4]. The block diagram of DFP+1-Tap LA-PDFD is shown in Fig.6.

10−

DFP

DFP

DFP

DFP

DFU

1-TapLA-PDFD

bits

yn,4

yn,3

yn,2

yn,1Zn,1

Zn,4

Zn,3

Zn,2

ChannelCoefficient

Fig.6 Block diagram of joint DFP and 1-tap LA-PDFD [4]

Fig.7 shows our proposed DFP architecture. The FFE shapes

the channel to make the postcursor channel impulse response seen by the DFE/TCM block minimum-phase. Most of the postcursor channel energy is then concentrated in the beginning of the impulse response, thus shortening the effective channel impulse response seen by 1-Tap LA-PDFD. By adopting this architecture in our PDFD, branch metrics in parallel decision feedback decoders can be computed in a look-ahead fashion, and the critical path of 1-Tap LA-PDFD is shortened.

Fig.8 and Fig.9 show the detailed architecture of our Decision Feedback Prefilter design [5]. Fig.10 shows the block diagram of 1-tap LA-PDFD [4].

Administrator
Page 3: 1000BASE-T Gigabit Ethernet Baseband DSP IC Designaccess.ee.ntu.edu.tw/Publications/Conference/CA33_2004.pdf · 2010-07-15 · 1000BASE-T Gigabit Ethernet Baseband DSP IC Design Hsiu-ping

5D FFE

WUC8D

D FBE

WUB

5D D

8D

D

5D

2D 2D

Log

M

WUG

D

d(n)

d(n-3)

Fig.7 The proposed Decision Feedback Prefilter Architecture

D D

4 - 2

D D

4 - 2

D D

3 - 2

D

4 - 2

D

4 - 2

D D

3 - 2

D D

4 - 2

4 - 2

D D

4 - 2

D D

4 - 2

4|2

shifter

D

truncate

D

Zero-force LMS

d(n-2)

9(compensation)

rnd. rnd.

Bw_t Bw_t Bw_t Bw_t Bw_t Bw_t Bw_t

B_in

B_in

ADCInput

(Booth) (Booth)

truncateD D D D D D D

D DD D

D D D DD

3 - 2

D

Fig.8 Architecture of FFE

D D D D

3 - 2

D

B_b_tDecision

3 - 2

D

3 - 2

D D

D

B_b_t

3|2

FFE output

B_b_tB_b_t B_b_t B_b_t B_b_t B_b_t B_b_t B_b_t B_b_t B_b_t B_b_t

To slicer

3

3|2

DD

4|2

DD

4|2

DD

4|2

3 - 2

B_b_t

D

DD

Fig.9 The architecture of FBE Fig. 11 shows the error-rate performance of 1-tap LA-PDFD

and conventional Decision Feedback Equalizer (DFE) using uncoded PAM5. The coding gain of adopting 1-Tap LA-PDFD is 4dB, while the maximum achievable coding gain by adopting traditional Viterbi Decoder is 5.3dB. The performance loss is tolerable since the target BER of 10 is still met under worst-case 1000BASE-T channel condition [6]. This performance loss is due to error propagation of the DFP. Because the DFP only cancels the ISI caused by the less significant tail of the postcursor ISI, the effect of the error propagations in DFP is less than that of the conventional DFE.

10−

1D-LA-BMUMUXU

4D-BMU ACSU SMUbits

yn,1

yn,4

yn,3

yn,240 1D-BMs

32 1D-symbols

64 1D-BMs 32 4D-BMs

Fig.10 1-Tap LA-PDFD architecture [4]

SNR (dB)

100

10-1

10-3

10-2

10-4

10-5

10-6

BER

0 5 10 15 16 17 18 19 20 21 22 23 24 25SNR (dB)

100

10-1

10-3

10-2

10-4

10-5

10-6

BER

SNR (dB)

100

10-1

10-3

10-2

10-4

10-5

10-6

BER

SNR (dB)

100

10-1

10-3

10-2

10-4

10-5

10-6

BER

0 5 10 15 16 17 18 19 20 21 22 23 24 25

Fig.11 Comparison of error performance between 1-tap LA-PDFD and conventional DFE using uncoded PAM5.

IV. SYSTEM SIMULATION AND IMPLEMENTATION RESULTS

Maximum Likelihood Timing Recovery (MLTR) [7] is applied for timing synchronization. Fig. 12 shows the eye diagram and phase variation of MLTR in our design. We co-simulated our digital circuit with the analog front end over distances of 87-100m. Fig. 13 shows the digital eye diagram and phase variation of joint analog equalizer, digital equalization, and timing recovery during start-up with output SNR of 23.95dB.

time

Am

plitu

de

0 0.5 1 1.5 2 2.5Time X104

500

400

300

200

100

0

-100

-200

-300

time

Am

plitu

de

0 0.5 1 1.5 2 2.5Time X104

500

400

300

200

100

0

-100

-200

-300

time

Am

plitu

de

0 0.5 1 1.5 2 2.5Time X104

500

400

300

200

100

0

-100

-200

-300

(a)

Page 4: 1000BASE-T Gigabit Ethernet Baseband DSP IC Designaccess.ee.ntu.edu.tw/Publications/Conference/CA33_2004.pdf · 2010-07-15 · 1000BASE-T Gigabit Ethernet Baseband DSP IC Design Hsiu-ping

time

Phas

e

5

0 0.5 1 1.5 2 2.5Time X104

35

30

25

20

15

10

time

Phas

e

5

0 0.5 1 1.5 2 2.5Time X104

35

30

25

20

15

10

(b)

Fig. 12 MLTR in 1000BASE-T (a) Eye diagram, (b) Phase variation

0 5000 10,000 15,000

2.5

2

1.5

1

0

-0.5

-1

-1.5

-2

-2.5

Am

plitu

de

time0 5000 10,000 15,000

2.5

2

1.5

1

0

-0.5

-1

-1.5

-2

-2.5

Am

plitu

de

time (a)

40

35

30

25

20

15

10

5

00 5000 10,000 15,000time

Phas

e

40

35

30

25

20

15

10

5

00 5000 10,000 15,000time

Phas

e

(b)

Fig. 13 Joint Simulation of AFE, Equalization, and Timing Recovery when Start-Up.87 to 100m with output SNR = 23.95dB (a) Digital eye diagram (b) Phase variation We applied worst-case environment (WCOM) of Artisan

mµ25.0

mm54.4

standard cell library to implement our 1000BASE-T Gigabit Ethernet Baseband DSP circuit. The critical path of our design is 7.48ns, which meets the timing requirement 8ns in 1000BASE-T. The core size of our design is 3 and the CHIP size of our design including PADs occupies

. The chip layout and chip summary are shown in Fig. 14(a) and (b).

mmmm 99.399. ×

mm54.4×

V. CONCLUSION

With low-power and low-cost DSP techniques, we eliminated

echo and ISI, two of the most dominant channel impairments in 1000BASE-T Gigabit Ethernet, while still meeting the BER specification of 10-10. MLTR is applied for timing synchronization, and the complete baseband system is co-

simulated with the analog front end. Our proposed system architecture is implemented in 2.5V, 0.25μm CMOS standard cell design.

(a)

Cell Library Artisan 0.25μm 1P5M

Voltage 2.5V Operating Freq. 125MHz

Die Size 3.99mm × 3.99mm Core Size 4.54mm × 4.54mm I/O Pad 61

(b) Fig.14 Implementation Results (a) Chip layout (b) Chip summary

REFERENCES

[1] Physical Layer Parameters and Specifications for 1000 Mb/s Operation Over 4-Pair of Category-5 balanced Copper Cabling, Type 1000-BASE-T, IEEE Standard 802.3ab-1999, 1999.

[2] Runsheng He, Nersi Nazari and Sehat Sutardja, “A DSP Based Receiver for 1000BASE-T PHY,” IEEE Int. Solid State Circuit Conf., San Francisco, pp. 310-311, Feb. 2001.

[3] Scott C. Douglas, “Adaptive filters employing partial updates,” IEEE Trans. On Circuit and Systems—II: Analog and Digital Signal Processing, vol. 44, No. 3, Mar. 1997.

[4] Erich F. Haratsch, and Kamram Azadet, “A 1-Gb/s Joint Equalizer and Trellis Decoder for 1000BASE-T Gigabit Ethernet,” IEEE Journal of Solid-State Circuit, vol. 36, No. 3, pp.374-384, March 2001.

[5] Meng-Da Yang and A.-Y. Wu, “A New Pipelined Adaptive DFE Architecture with Improved Convergence Rate,” in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS-2002), Phoenix, vol. 4, pp. 213-216, May 2002.

[6] G. Fettweis and H. Meyr, “A 100-Mbits/s Viterbi decoder chip: Novel architecture and its realization,” In Proc. IEEE Int. Conf. Commun. (ICC), vol. 2, Apr. 1990, pp. 463-467.

[7] J.W.M. Bergmans and HoWai Wong-Lam, “A Classic of Data-Aided Timing Recovery Schemes,” in IEEE Trans. On Commun., vol. 43, No.2/3/4, Feb/Mar/Apr. 1995.