RICE UNIVERSITY
“Joint” architecture & algorithm designs for baseband signal processing
Sridhar Rajagopal and Joseph R. Cavallaro
Rice Center for Multimedia Communications
This work has been supported by Nokia, TI, TATP and NSF
Single-slide version of my talk
Algorithms
DSP
VLSI
FPGA
IMAGINE
Multiuser channel estimation, multiuser detection
Task partitioning, parallelism, pipelining
Conventional arithmetic, on-line arithmetic
Instruction set extensions, co-processor support
Functional unit design and usage
Distant past
Recent past
Recent and near future
Contents
Algorithms for channel estimation and detection
Conventional and on-line arithmetic designs
Programmable architecture design using the IMAGINE simulator
Estimation - detection algorithms?
Sophisticated, computationally complex algorithms proposed for 3G - 4G standards
Typically need complex operations, huge matrix sizes, matrix inversions
Difficult for hardware implementation and for real-time performance
Multiuser channel estimation algorithm
$R_{br} = \sum_i b_i r_i^H$
$R_{bb} = \sum_i b_i b_i^T$
$R_{bb} \, A_i = R_{br}$
$b_i \in \{+1, -1\}^{2K}$ : training/tracking bits
$r_i \in \mathbb{C}^N$ : received signal, 8-bit integer (complex)
N = spreading gain (typically fixed, e.g. 32)
K = number of users (variable, <=N)
$A_i$ = maximum likelihood channel estimate
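A minimal sketch of how the two correlation matrices behind the channel estimate could be accumulated over a window of training bits. It assumes the sums run over the window and that R_br uses the conjugate of the received vector; the function name `correlations` and the toy data are hypothetical, and plain Python lists stand in for fixed-point hardware.

```python
def correlations(bits, received):
    """Accumulate R_bb = sum(b b^T) and R_br = sum(b r^H) over a window.

    bits:     list of length-2K lists with entries +1 or -1
    received: list of length-N lists of complex samples
    """
    two_k, n = len(bits[0]), len(received[0])
    r_bb = [[0] * two_k for _ in range(two_k)]
    r_br = [[0j] * n for _ in range(two_k)]
    for b, r in zip(bits, received):
        for p in range(two_k):
            for q in range(two_k):
                r_bb[p][q] += b[p] * b[q]              # outer product b b^T
            for q in range(n):
                r_br[p][q] += b[p] * r[q].conjugate()  # outer product b r^H
    return r_bb, r_br


# Toy window: 2K = 2, N = 2, two received vectors (made-up values).
bits = [[1, 1], [1, -1]]
received = [[1 + 1j, 2 + 0j], [0j, 1j]]
r_bb, r_br = correlations(bits, received)
```

The channel estimate is then whatever A satisfies R_bb A = R_br; the sketch stops before that solve.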
Iterative scheme for channel estimation
Bit-streaming: suitable for tracking (window length L)
Method of gradient descent
Stable convergence behavior
Simple fixed-point VLSI architecture [ASAP 2000]
$R_{bb}^{(i)} = R_{bb}^{(i-1)} + b_L b_L^T - b_0 b_0^T$
$R_{br}^{(i)} = R_{br}^{(i-1)} + b_L r_L^H - b_0 r_0^H$
$A^{(i)} = A^{(i-1)} - \mu \, (R_{bb}^{(i)} A^{(i-1)} - R_{br}^{(i)})$
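The iterative scheme can be sketched as a sliding-window update of the correlation matrices followed by one gradient-descent step toward the solution of R_bb A = R_br, so no matrix inversion is needed. This is an illustration under assumptions, not the ASAP 2000 architecture: the step size `mu`, the helper names, and the toy matrices are all hypothetical.

```python
def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[p][k] * y[k][q] for k in range(len(y)))
             for q in range(len(y[0]))] for p in range(len(x))]


def slide_window(r_bb, r_br, b_new, r_new, b_old, r_old):
    """Add the newest (b, r) pair and drop the oldest one, in place."""
    for p in range(len(b_new)):
        for q in range(len(b_new)):
            r_bb[p][q] += b_new[p] * b_new[q] - b_old[p] * b_old[q]
        for q in range(len(r_new)):
            r_br[p][q] += (b_new[p] * r_new[q].conjugate()
                           - b_old[p] * r_old[q].conjugate())


def gradient_step(a, r_bb, r_br, mu):
    """One descent step: A <- A - mu * (R_bb A - R_br)."""
    prod = matmul(r_bb, a)
    return [[a[p][q] - mu * (prod[p][q] - r_br[p][q])
             for q in range(len(a[0]))] for p in range(len(a))]


# Toy run: 2K = 2, N = 2, fixed correlations, assumed step size 0.25.
r_bb = [[2, 0], [0, 2]]
r_br = [[1 - 1j, 2 - 1j], [1 - 1j, 2 + 1j]]
a = [[0j, 0j], [0j, 0j]]
for _ in range(60):
    a = gradient_step(a, r_bb, r_br, mu=0.25)
```

With these toy values each step halves the remaining error, so `a` settles at R_br / 2. In a tracking setting one would call `slide_window` once per new bit and take one or a few gradient steps per update.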
Comparisons
Implementation   Clock rate   Full adder cells   Data rate
C67 DSP          166 MHz      -                  1.02 Kbps
Area             500 MHz      248                3.81 Kbps
:                :            :                  :
Area-Time        500 MHz      10^4               256 Kbps
:                :            :                  :
Time             500 MHz      2 x 10^7           83.33 Mbps
DSPs are unable to exploit bit-level parallelism
Inefficient storage of bits
Replacing multiplications by additions/subtractions
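To illustrate the last point: because the training bits are +1 or -1, every product with a bit collapses to adding or subtracting the other operand. The helper below is a hypothetical illustration of that idea, not the authors' VLSI design.

```python
def accumulate_signed(bits, samples):
    """Return sum(b * s) for b in {+1, -1} using only adds and subtracts."""
    total = 0
    for b, s in zip(bits, samples):
        if b > 0:
            total += s   # multiplying by +1 is an addition
        else:
            total -= s   # multiplying by -1 is a subtraction
    return total
```

The same function works for complex samples, since only addition and subtraction are applied to them.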
Multiuser detection innovations
Developed a simple architecture for asynchronous multiuser detection for CDMA [ + , x ]
Bit-streaming: reduced latency, eliminates window edge computations, lower memory requirements
Pipelined stages: higher throughput (with more hardware)
Block Pipelined Detector
Variable latency [worst case (1st bit): D * latency per bit]
2 extra edge bit computations per stage.
[Figure: timing diagram of the block detector. Bits 2-11 and then bits 12-21 each pass through a matched filter (MF) block followed by three PIC stages; the blocks span bits 1-12 and 11-22 because of the extra edge bits.]
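Bit-streaming multiuser detection
$\hat{d} = [\hat{d}_1, \ldots, \hat{d}_{i-1}, \hat{d}_i, \hat{d}_{i+1}, \ldots, \hat{d}_D]^T$
$y_i^{(m)} = y_i - L \hat{d}_{i-1}^{(m)} - C \hat{d}_i^{(m)} - R \hat{d}_{i+1}^{(m)}$
L, C, R: K x K matrices; y: length-K vector
Savings in memory by D^2
[Equation: block-tridiagonal correlation matrix of the block detector, with blocks built from $A_0^H A_0$, $A_1^H A_1$, $A_0^H A_1$ and $A_1^H A_0$, and zero blocks elsewhere.]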
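A minimal sketch of one interference-cancellation stage for a single bit time, using only the three K x K matrices and the bit estimates for times i-1, i and i+1. Several things here are assumptions for illustration: real-valued data, a hard sign decision after cancellation, a C matrix with zero diagonal (cross-user terms only), and the names `pic_stage`, `l_mat`, `c_mat`, `r_mat`.

```python
def sign(x):
    """Hard decision; ties are resolved to +1."""
    return 1 if x >= 0 else -1


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def pic_stage(y, d_prev, d_cur, d_next, l_mat, c_mat, r_mat):
    """Subtract estimated interference from bits i-1, i, i+1; re-decide bit i.

    Returns (new_decisions, cleaned_outputs), one entry per user.
    """
    l_term = matvec(l_mat, d_prev)
    c_term = matvec(c_mat, d_cur)
    r_term = matvec(r_mat, d_next)
    cleaned = [y[k] - l_term[k] - c_term[k] - r_term[k] for k in range(len(y))]
    return [sign(v) for v in cleaned], cleaned


# Toy example: K = 2 users, interference only between users at the same bit time.
zeros = [[0, 0], [0, 0]]
c_mat = [[0, 0.5], [0.5, 0]]
y = [1.2, -0.3]
d0 = [sign(v) for v in y]          # matched-filter decisions as the starting point
d1, cleaned = pic_stage(y, d0, d0, d0, zeros, c_mat, zeros)
```

Because each call touches only three bit vectors and three K x K matrices, nothing of size proportional to the block length D has to be stored.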
Pipelining the multiuser detector
[Figure: timing diagram. Bits 1-12 stream through four pipelined rows (matched filter (causal), PIC stage 1, PIC stage 2, PIC stage 3) as time advances.]
Latency = 2 * latency per bit (D/2 speedup over block)
Eliminated edge bit computations. [ISCAS 2001]
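As a rough worked example of the two latency figures, assume a block of D = 10 bits (the size suggested by the "Bits 2-11" label on the block detector slide) and write t_b for the latency per bit; the symbol t_b is introduced here only for illustration.

```latex
\begin{align*}
T_{\text{block}}  &= D \, t_b = 10 \, t_b \quad \text{(worst case, first bit)} \\
T_{\text{stream}} &= 2 \, t_b \\
\frac{T_{\text{block}}}{T_{\text{stream}}} &= \frac{D}{2} = 5
\end{align*}
```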
Contents
Algorithms for channel estimation and detection
Conventional and on-line arithmetic designs
Programmable architecture design using the IMAGINE simulator
Matched filter with conventional arithmetic
d_p = sign(A^H r)
[Figure: the products of A^H_{p,1}, A^H_{p,2}, ..., A^H_{p,N-1}, A^H_{p,N} with r_0, r_1, ..., r_{N-1}, r_N feed a tree of adders that produces A^H r.]
T ~ log(N) * log(d)
N: dot product size; d: precision
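A small sketch of the matched-filter decision d_p = sign(A^H r) for each user p. It assumes the sign is taken on the real part of the complex statistic; the function name and the toy channel columns are hypothetical.

```python
def matched_filter(a_cols, r):
    """Hard bit decisions from the matched filter.

    a_cols[p] is the length-N channel column for user p; r is the received vector.
    """
    decisions = []
    for col in a_cols:
        # Entry p of A^H r: conjugate the channel column, then dot with r.
        stat = sum(c.conjugate() * x for c, x in zip(col, r))
        decisions.append(1 if stat.real >= 0 else -1)
    return decisions


# Two users with orthogonal columns; user 0 sent +1, user 1 sent -1, plus small noise.
a_cols = [[1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]
r = [0.1 + 0.05j, 1.9 - 0.1j]
bits = matched_filter(a_cols, r)
```

Each decision is an N-term dot product followed by a sign test, which is the structure the adder tree implements.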
Conventional MF using CSAs
d_p = sign(A^H r)
[Figure: the products of A^H_{p,1}, ..., A^H_{p,N} with r_0, ..., r_N, each in sum/carry (s, C) form, feed a 2*N:2 compressor followed by a carry-propagate adder (CPA) that produces A^H r.]
T ~ a + log(d+c)
a, c: constants
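The carry-save idea can be sketched on integers: a 3:2 compressor turns three operands into a sum word and a carry word without propagating carries, and only the final addition is a carry-propagate add. This is a software illustration restricted to non-negative integers; the function names are hypothetical, and the reduction order here is sequential whereas hardware would arrange the compressors as a tree.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: returns (sum_word, carry_word) with a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                              # bitwise sum without carries
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted one place left
    return sum_word, carry_word


def csa_sum(values):
    """Reduce operands with 3:2 compressors, then do one carry-propagate add."""
    operands = list(values)
    while len(operands) > 2:
        a, b, c = operands.pop(), operands.pop(), operands.pop()
        operands.extend(carry_save_add(a, b, c))
    return sum(operands)  # the single carry-propagate addition (CPA)
```

For example, `carry_save_add(5, 3, 6)` yields the pair `(0, 14)`, whose ordinary sum is 14.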
Key concept in on-line arithmetic
Conventional detection: high-precision operations (8-32 bits) followed by a test for sign.
Actual detection depends only on the most significant digits (1-3 bits).
Use most-significant-digit-first (MSDF) computation to find the sign and avoid computing the successive digits. [Arith-15]
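A minimal sketch of early sign detection on a most-significant-digit-first stream. It assumes a finite radix-2 signed-digit number with digits in {-1, 0, 1}; in that representation the first non-zero digit outweighs all the digits that follow, so it fixes the sign. The function name is hypothetical.

```python
def msdf_sign(digits):
    """Return (sign, digits_examined) for a radix-2 signed-digit stream.

    digits: iterable of -1, 0 or 1, most significant digit first.
    The scan stops at the first non-zero digit.
    """
    count = 0
    for digit in digits:
        count += 1
        if digit != 0:
            return (1 if digit > 0 else -1), count
    return 0, count  # every digit was zero
```

For the stream `[0, -1, 1, 1, 1]` (value -1/4 + 1/8 + 1/16 + 1/32 = -1/32) the sign is known after two digits, and the remaining three never need to be computed.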
Comparisons of arithmetic schemes
[Figure: timing of the products a_i * b_i, tree addition level 1, and the tree addition result under three schemes.]
(A) On-line arithmetic with full precision: t_OL ~ d
(B) Conventional arithmetic with full precision: t_CONV ~ log(d)
(C) On-line arithmetic with truncated precision: t_OL = constant. Sign determined at this point. Stop!
Using on-line arithmetic for detection
[Figure: bits in {-1, +1} pass through the channel; plot of time taken for addition (normalized, 0 to 5) versus received signal amplitude (normalized, -1 to 1).]
Equations
Probability of error for optimal BPSK detection:
$P_e^{OPT} = Q\left(\sqrt{2 E_b / N_0}\right)$
Probability of error for on-line BPSK detection:
[Equation: $P_e^{OL} = 0.5 \, (Q(\cdot) + Q(\cdot))$, with arguments in $E_b / N_0$, r and p.]
r: radix of the number system
p: number of digits
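For reference, the optimal BPSK error probability Q(sqrt(2 E_b / N_0)) can be evaluated with the standard identity Q(x) = 0.5 * erfc(x / sqrt(2)). The function names and the choice to pass E_b/N_0 in dB are assumptions made for this sketch; the on-line expression is not reproduced.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpsk_error_probability(eb_n0_db):
    """Optimal BPSK bit error probability for E_b/N_0 given in dB."""
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)   # dB to linear ratio
    return q_function(math.sqrt(2.0 * eb_n0))
```

At 0 dB this gives roughly 0.0786, and the value falls steeply as E_b/N_0 grows.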
Probability of error using on-line
On-line MF implementation
d_p = sign(A^H r)
[Figure: the products of A^H_{p,1}, A^H_{p,2}, ..., A^H_{p,N-1}, A^H_{p,N} with r_0, r_1, ..., r_{N-1}, r_N feed a tree of adders.]
T ~ c
c: constant
Throughput comparisons
Area comparisons
Implementing higher modulation schemes
16-QAM (see the sign and magnitude)
[Figure: real and imaginary axes labeled with the 3-bit codes 000, 001, 010, 011, 111, 101, 110; the detector waits until the next non-zero digit of each component.]
BPSK (just see the sign)
[Figure: digit streams such as 1xx and 0000xx; wait until the next non-zero digit.]
Conclusions on arithmetic schemes
CSAs better than straightforward implementation
1.35 - 1.6X speedup for 8-32 bit precision
1.64 - 1.14X less area
If reduced-precision computations suffice, on-line is better still
1.67 - 2.12X speedup over CSA
0.64 - 12.73X less area over CSA
Contents
Algorithms for channel estimation and detection
Conventional and on-line arithmetic designs
Programmable architecture design using the IMAGINE simulator
A programmable architecture?
Flexibility in the algorithm requirements
channel-dependent computations
changing algorithms on the fly
seamless switching between wireless LAN and wideband CDMA -- RENE.
Simulator needed to test
performance of algorithms
extensions/modifications for critical operations
Algorithms needed for 3G base-band base-station implementation
Wireless LAN: Equalization, FFT, Viterbi decoding
W-CDMA: Channel estimation, Multiuser detection, Viterbi/Turbo decoding
If you felt that life was too easy: Multiple antennas, Long spreading codes, Space-Time codes
The IMAGINE architecture and simulator
IMAGINE is a media signal processor, built at Stanford.
Many common workload features
Good starting point to explore.
Local expertise - Dr. Scott Rixner
IMAGINE architecture
Great for media processing algorithms
1024-pt FFT in 7.4 µs on a 500 MHz processor with 8 clusters (48 units), 3.8 W of power
Great for parallel, vector and streaming computations
Performance of, and extensions for, sequential computation kernels such as Viterbi traceback need to be investigated.
Conclusions
Algorithm steps for designing communication systems
Design hardware-efficient versions
Fixed-point implementation
DSP implementation - bottlenecks
Task partitioning, pipelining, parallelism
Computer arithmetic ideas -- VLSI
Integration into a programmable processor