Srikrishna Bhashyam, Joseph R. Cavallaro, and …sridhar/research/asap-ppt.pdfArea-Time efficient :...

Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF.


Page 1:

Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang

This work is supported by Nokia, TI, TATP and NSF

Page 2:

Motivation

Computationally complex algorithms for base-stations

– multiple users, high data rates

– matrix inversions, floating point accuracy needed

– DSP solutions infeasible for real-time [S.Das’99]

Real-time implementations for baseband receiver?

– multiuser channel estimation

*S.Das et al., “Arithmetic Acceleration Techniques for Wireless Base-station Receivers”, Asilomar 1999

Page 3:

Contributions

New estimation scheme

– designed from an implementation perspective

– bit-streaming, fixed-point architecture

– reduced complexity, same error rate performance

Real-time architecture design

– exploit bit-level parallelism

– area-constrained, time-constrained

– real-time with minimum area

Page 4:

Baseband signal processing

[Figure: block diagram of the Base-Station Receiver. Labels: Multiple Users, Antenna, Multiuser Channel Estimation (Training, Tracking), Multiuser Detection, Decoding, Information Bits.]

Page 5:

Channel estimation

[Figure: multipath channel between User 1, User 2 and the Base Station, showing a Direct Path, a Reflected Path, and Noise + MAI.]

Estimates unknown fading amplitudes and asynchronous delays.

Page 6:

Need for multiuser channel estimation

Detector performance depends on estimation accuracy

Best estimator : Maximum Likelihood

=> jointly estimate parameters for all users

=> Multiuser channel estimation

Single-user sliding correlator used for implementation

Page 7:

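Multiuser channel estimation algorithm

$R_{br} = \sum_{i=1}^{L} b_i r_i^H$

$R_{bb} = \sum_{i=1}^{L} b_i b_i^T$

$R_{bb} A = R_{br}$

$b_i \in \{-1, +1\}^{2K}$ - Training/Tracking bits

$r_i \in \mathbb{C}^{N}$ - Received signal

N - Spreading gain (typically fixed, e.g. 32)

K - Number of users (variable, <= N)

$A \in \mathbb{C}^{2K \times N}$ - Maximum Likelihood channel estimate

$R_{bb} \in \mathbb{R}^{2K \times 2K}$, $R_{br} \in \mathbb{C}^{2K \times N}$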
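The batch estimate above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function name `ml_channel_estimate`, the noiseless signal model $r_i = A^H b_i$, and the small problem size are assumptions chosen so that $R_{bb} A = R_{br}$ holds exactly and the result can be checked.

```python
import numpy as np

def ml_channel_estimate(bits, received):
    """Solve R_bb A = R_br for the channel estimate A.

    bits:     (L, 2K) array of +/-1 training bits, one row per b_i
    received: (L, N) complex array, one row per r_i
    """
    r_bb = bits.T @ bits               # sum_i b_i b_i^T, shape (2K, 2K)
    r_br = bits.T @ received.conj()    # sum_i b_i r_i^H, shape (2K, N)
    return np.linalg.solve(r_bb, r_br)  # A, shape (2K, N)

# Synthetic check (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
two_k, n, window = 8, 16, 150
a_true = rng.standard_normal((two_k, n)) + 1j * rng.standard_normal((two_k, n))
bits = rng.choice([-1.0, 1.0], size=(window, two_k))
received = np.conj(bits @ a_true)      # assumed noiseless model r_i = A^H b_i
a_hat = ml_channel_estimate(bits, received)
```

The `np.linalg.solve` call is the step whose cost the iterative scheme later in the talk is designed to avoid.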

Page 8:

Outline

Background

Channel Estimation - An implementation perspective

VLSI architectures

– Area-constrained, Time-constrained, Area-Time efficient

DSP Comparisons and Conclusions

Page 9:

Iterative scheme for channel estimation

Bit-streaming, method of gradient descent

Stable convergence behavior with µ

Simple fixed-point architecture

$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$

$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$
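The three updates can be sketched as one sliding-window step per incoming bit. This is a floating-point illustration under stated assumptions, not the fixed-point design from the talk: the function name `iterative_update`, the noiseless model $r_i = A^H b_i$, the problem sizes, and the step size $\mu = 1/(2K \cdot L)$ (chosen so that $\mu$ times the largest eigenvalue of $R_{bb}$ cannot exceed 1) are all hypothetical.

```python
import numpy as np

def iterative_update(r_bb, r_br, a_prev, b_new, r_new, b_old, r_old, mu):
    """One bit-streaming step: slide the window, then one gradient-descent step."""
    # Add the newest bit vector b_L, drop the oldest b_0.
    r_bb = r_bb + np.outer(b_new, b_new) - np.outer(b_old, b_old)
    r_br = r_br + np.outer(b_new, r_new.conj()) - np.outer(b_old, r_old.conj())
    # A(i) = A(i-1) - mu * (R_bb(i) A(i-1) - R_br(i))
    a_new = a_prev - mu * (r_bb @ a_prev - r_br)
    return r_bb, r_br, a_new

# Synthetic static channel (sizes are illustrative assumptions).
rng = np.random.default_rng(1)
two_k, n, window, steps = 4, 8, 64, 400
a_true = rng.standard_normal((two_k, n)) + 1j * rng.standard_normal((two_k, n))
bits = rng.choice([-1.0, 1.0], size=(window + steps, two_k))
rx = np.conj(bits @ a_true)            # assumed noiseless model r_i = A^H b_i

# Initialise the correlation matrices over the first window.
r_bb = bits[:window].T @ bits[:window]
r_br = bits[:window].T @ rx[:window].conj()
a_est = np.zeros((two_k, n), dtype=complex)
mu = 1.0 / (two_k * window)            # trace(R_bb) = 2K * L bounds its eigenvalues

for t in range(steps):
    r_bb, r_br, a_est = iterative_update(
        r_bb, r_br, a_est,
        bits[window + t], rx[window + t],   # newest bit and sample
        bits[t], rx[t],                     # oldest bit and sample
        mu,
    )
```

No matrix inversion appears in the loop; each step uses only additions, subtractions, and one matrix product.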

Page 10:

Simulations - Static multipath channel

[Figure: Comparison of Bit Error Rates (BER). BER (log scale, $10^{-3}$ to $10^{-1}$) versus Signal to Noise Ratio (SNR, 4 to 12). Curves: Iterative Channel Est., $O(K^2 N)$, and Original Channel Est., $O(K^3 + K^2 N)$.]

SINR = 0 dB

Paths = 3

Training = 150 bits

Spreading N = 31

Users K = 15

Page 11:

Outline

Background

Channel Estimation - An implementation perspective

VLSI architectures

– Area-constrained, Time-constrained, Area-Time efficient

DSP Comparisons and Conclusions

Page 12:

Design specifications

32 Users (K)

32 spreading code length (N)

Target = 128 Kbps

– 4000 cycles available at 500 MHz

Single cycle addition/multiplication
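The cycle budget follows from dividing the clock rate by the target bit rate. A short check, assuming 1 Kbps = 1000 bits per second (the helper name `cycles_per_bit` is illustrative):

```python
def cycles_per_bit(clock_hz, bit_rate_bps):
    """Clock cycles available to process one bit in real time."""
    return clock_hz / bit_rate_bps

# 500 MHz clock, 128 Kbps target, taken from the design specifications.
budget = cycles_per_bit(500e6, 128e3)
print(budget)  # 3906.25, which the slide rounds to 4000
```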

Page 13:

Task decomposition

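[Figure: task decomposition along the TIME axis. A Tracking Window of length L holds bits $b_0$ (2K,1) through $b_L$ (2K,1) and received vectors $r_0$ (N,8) through $r_L$ (N,8). Correlation Matrices (Per Bit): $R_{bb}$, $O(2K^2, 8)$, and $R_{br}$, $O(2KN, 8)$. Iterate: $A$, $O(4K^2 N, 8)$. Output: Channel Estimate to Detector.]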

Page 14:

Architecture design

XNOR gates, UP/DOWN counters:

$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$

8-bit adders:

$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$

8-bit multipliers [Schulte’93]:

$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$

* Schulte, Swartzlander “Truncated Multiplication with Correction Constant”, Workshop on VLSI Signal Processing,1993
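The mapping from the update equations to XNOR gates, up/down counters, and adders can be illustrated in software. The sketch below assumes bits are stored as 0/1 with 1 standing for +1 and 0 for -1; the helper names are hypothetical, and plain Python integers stand in for the 8-bit hardware words. The truncated multiplier of [Schulte’93] is not modelled.

```python
def xnor(x, y):
    """XNOR of two 0/1 bits: 1 when the bits are equal."""
    return 1 - (x ^ y)

def to_sign(bit):
    """Map a stored bit (0/1) to its signed value (-1/+1)."""
    return 2 * bit - 1

def update_rbb_entry(count, new_j, new_k, old_j, old_k):
    """Up/down-counter update of one R_bb entry.

    The product of two +/-1 bits is +1 exactly when the stored bits are equal,
    so each product is an XNOR and each accumulation is a count up or down.
    """
    count += 1 if xnor(new_j, new_k) else -1   # + b_L[j] * b_L[k]
    count -= 1 if xnor(old_j, old_k) else -1   # - b_0[j] * b_0[k]
    return count

def update_rbr_entry(acc, new_bit, r_new, old_bit, r_old):
    """Add/subtract update of one R_br entry.

    Multiplying a sample by a +/-1 bit only selects between add and subtract.
    r_new and r_old are assumed to be already-conjugated samples.
    """
    acc = acc + r_new if new_bit else acc - r_new   # + b_L[j] * r_L
    acc = acc - r_old if old_bit else acc + r_old   # - b_0[j] * r_0
    return acc

print(update_rbr_entry(10, 1, 3, 0, 4))  # 10 + (+1)(3) - (-1)(4) = 17
```

Only the update for $A$ needs true multiplications, which is why the multipliers dominate the area figures in the tables that follow.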

Page 15:

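Area-constrained : Min. area, not real-time

[Figure: area-constrained datapath. Inputs: $b_0$, $b_L$, $r_0$, $r_L$. Blocks: MUX, DEMUX, U/D Counter, MAC, Add/Sub (x2), Subtract (x2), >>8 shift, Load/Store. Stored quantities: $R_{bb}$, $R_{br}$, $A^{(i-1)}$, $A^{(i)}$. Data paths are 1, 8, and 16 bits wide. Output: Channel Estimate.]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$

$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$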

Page 16:

Area-constrained : Hardware used

| Blocks | Quantity | Full Adder Cells | Complex | Total |
|---|---|---|---|---|
| Counter | 1*8 | 8 | - | 8 |
| Multiplier | 1*8 | 64 | *2 | 128 |
| Adders | 3*8 + 2*16 | 56 | *2 | 112 |

Total Area: 248 FA cells

Total Time (N=K=32): $4K^2 N$ = 128,000 cycles

Page 17:

Time-constrained : Real time, large area

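[Figure: time-constrained datapath. Inputs: $b_0$, $b_L$ (2K*1 each), $r_0$, $r_L$ (N*8 each), with the products $b_L b_L^T$ and $b_0 b_0^T$ (K(2K-1)*1 each). Blocks: MUX (x3), Mult, Subtract (x2), >> shift. Stored quantities: $R_{bb}$ ($2K^2$*8), $R_{br}$ (2KN*8), $A$ (2KN*8), with 2KN*16 intermediate paths. Output: Channel Estimate.]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$

$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$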

Page 18:

Time-constrained : Hardware used

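| Blocks | Quantity | Full Adder Cells | Complex | Total |
|---|---|---|---|---|
| Counter | $2K^2$*8 | $16K^2$ | - | $16K^2$ |
| Multiplier | $4K^2 N$*8 | $256K^2 N$ | *2 | $512K^2 N$ |
| Adders | 2KN*16 + 2KN*8 + $4K^2 N$*16 | $48KN + 64K^2 N$ | *2 | $96KN + 128K^2 N$ |

Total Area (N=K=32): 20,000,000 FA cells

Total Time: $\log_2(2K)$ = 6 cycles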
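One plausible reading of the $\log_2(2K)$ cycle count, offered here as an assumption rather than something the slides state, is the depth of a balanced adder tree that sums 2K partial products once all multiplications run in parallel. The function `tree_reduce` below is a hypothetical software model of such a tree.

```python
def tree_reduce(values):
    """Sum a list pairwise, level by level, and report the number of levels."""
    levels = 0
    while len(values) > 1:
        # Each level adds neighbouring pairs in parallel.
        pairs = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            pairs.append(values[-1])   # an odd element passes through unchanged
        values = pairs
        levels += 1
    return values[0], levels

K = 32
total, depth = tree_reduce(list(range(2 * K)))
print(total, depth)  # 2016 6
```

With K = 32 the tree has 64 inputs and 6 levels, matching the 6 cycles in the table.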

Page 19:

Area-Time efficient architecture design

Area-constrained

– single 8-bit multiplier

– $4K^2 N$ cycles (128,000) [3.81 Kbps, 248 FA Cells]

Time-constrained

– $4K^2 N$ 8-bit multipliers

– $\log_2(2K)$ cycles (6) [83.33 Mbps, 20,000,000 FA Cells]

Goal : real-time with minimum area

Different parallelism levels for multipliers
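The search for a parallelism level can be sketched with an idealized model in which P multipliers divide the $4K^2 N$ multiplications evenly. The linear speed-up, the restriction to power-of-two multiplier counts, and the function names are assumptions made for illustration; the 4000-cycle budget is the figure from the design specifications.

```python
import math

def cycles_with_multipliers(k, n, p):
    """Idealized cycle count when p multipliers share 4*K^2*N multiplications."""
    return math.ceil(4 * k * k * n / p)

def smallest_power_of_two_meeting(k, n, budget):
    """Smallest power-of-two multiplier count whose cycle count fits the budget."""
    p = 1
    while cycles_with_multipliers(k, n, p) > budget:
        p *= 2
    return p

K = N = 32
single = cycles_with_multipliers(K, N, 1)          # 131072 cycles with one multiplier
p = smallest_power_of_two_meeting(K, N, 4000)      # 64, which equals 2K
cycles = cycles_with_multipliers(K, N, p)          # 2048, which equals 2KN
print(single, p, cycles)
print(round(500e6 / single / 1e3, 2), "Kbps with a single multiplier")  # 3.81
```

Under this model the single-multiplier design reproduces the 3.81 Kbps figure at 500 MHz, and 2K multipliers are the first power-of-two level that fits the real-time budget.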

Page 20:

Area-Time efficient : Real-time, min. area

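[Figure: area-time efficient datapath. Inputs: $b_0$, $b_L$ (2K*1 each), $r_0$, $r_L$ (N*8 each), with the products $b_L b_L^T$ and $b_0 b_0^T$. Blocks: MUX (x3), DEMUX, Counters, Mult, Adder, Subtract (x2), >> shift, Load/Store. Stored quantities: $R_{bb}$, $R_{br}$, $A^{(i-1)}$, $A^{(i)}$. Data paths: 2K*1, 2K*8, 1*1, 1*8, 1*16. Output: Channel Estimate.]

$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$

$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$

$A^{(i)} = A^{(i-1)} - \mu \left( R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)} \right)$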

Page 21:

Area-Time efficient : Hardware used

| Blocks | Quantity | Full Adder Cells | Complex | Total |
|---|---|---|---|---|
| Counter | 2K*8 | 16K | - | 16K |
| Multiplier | 2K*8 | 128K | *2 | 256K |
| Adders | 2K*16 + 2*8 + 1*16 | 32K + 32 | *2 | 64K + 64 |

Total Area (N=K=32): 10,000 FA cells

Total Time: 2KN = 2,000 cycles
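The "Total" columns of the three hardware tables can be evaluated directly for N = K = 32. The function names below are illustrative; the formulas are copied from the tables, including the factor of 2 for complex arithmetic and 64 full adder cells per 8-bit multiplier.

```python
def area_constrained_cells():
    """FA cells: one counter, one complex multiplier, 3*8 + 2*16 adder bits."""
    counter = 1 * 8
    multiplier = 1 * 64 * 2
    adders = (3 * 8 + 2 * 16) * 2
    return counter + multiplier + adders

def time_constrained_cells(k, n):
    """FA cells: 16K^2 + 512K^2N + 96KN + 128K^2N."""
    return 16 * k * k + 512 * k * k * n + 96 * k * n + 128 * k * k * n

def area_time_cells(k):
    """FA cells: 16K + 256K + 64K + 64."""
    return 16 * k + 256 * k + 64 * k + 64

K = N = 32
print(area_constrained_cells())       # 248
print(time_constrained_cells(K, N))   # 21086208
print(area_time_cells(K))             # 10816
```

The exact totals are 21,086,208 and 10,816 cells, which the slides round to 20,000,000 and 10,000.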

Page 22:

Outline

Background

Channel Estimation - An implementation perspective

VLSI architectures

– Area-constrained, Time-constrained, Area-Time efficient

DSP Comparisons and Conclusions

Page 23:

DSP comparisons

| Implementation | Clock Rate | Full Adder Cells | Data Rates |
|---|---|---|---|
| C67 DSP | 166 MHz | - | 1.02 Kbps |
| Area | 500 MHz | 248 | 3.81 Kbps |
| : | : | : | : |
| Area-Time | 500 MHz | $10^4$ | 256 Kbps |
| : | : | : | : |
| Time | 500 MHz | $2 \times 10^7$ | 83.33 Mbps |

DSPs unable to exploit bit-level parallelism

– Inefficient storage of bits

– Unable to replace bit-multiplications by add/sub.

Page 24:

Scalability of architectures

Design for maximum number of users in the system

Fewer users

– turn off functional units to reduce power

– reconfigure hardware for higher data rates (FPGA)

Investigating K-user design using K/2-user designs.

Investigating DSP extensions

Page 25:

Conclusions

New estimation scheme

– designed from an implementation perspective

– bit-streaming, fixed-point architecture

– reduced complexity, same error rate performance

Real-time architecture designs

– exploit bit-level parallelism

– area-constrained, time-constrained

– real-time with minimum area

=> Real-time architectures for base-band signal processing