A Comparative Study of MIMO Detection Algorithms for Wideband Spatial Multiplexing Systems

Jingming Wang, Babak Daneshrad

Electrical Engineering Department, University of California, Los Angeles, CA 90095

Email: {wangj, babak}@ee.ucla.edu; Telephone: (310) 825-7792

IEEE Communications Society / WCNC 2005 408 0-7803-8966-2/05/$20.00 © 2005 IEEE

Abstract— The implementation of wideband MIMO systems poses a major challenge to hardware designers because MIMO detection requires a huge amount of processing power. To achieve a complete VLSI solution, channel coding and MIMO detection are preferably separated so that each of them can fit into a single chip. This paper presents a comparative study of various uncoded adaptive and non-adaptive MIMO detection algorithms. The comparison is intended to serve as a reference for system designers and is performed from several perspectives, including theoretical formulation, simulated BER/PER performance, and hardware complexity. All simulations are conducted within a MIMO-OFDM framework, using a packet structure similar to that of the IEEE 802.11a/g standard. The comparison results show that the RLS algorithm appears to be an affordable solution for wideband MIMO systems targeting gigabit wireless transmission. As a direct result of this work, an ASIC for a 25MHz wideband 8 × 8 MIMO-OFDM system using RLS has been designed and fabricated.

Index Terms— MIMO systems, signal detection, packet radio, wireless LAN

I. INTRODUCTION

In recent years, multiple-input multiple-output (MIMO) wireless communication has received widespread attention in the communication community. To date, most of the work in this area has been theoretical in nature [1], [2], [3], and little attention has been paid to the implementation requirements of MIMO systems. The UCLA Wireless Integrated Research (WISR) group recently embarked on a project to develop a wideband (25MHz) real-time MIMO-OFDM testbed at 5.2GHz RF. The ultimate objective is to develop both a system solution and a novel VLSI architecture that enable real-time Giga-bps indoor wireless communications.

One of the challenges in building a wideband MIMO system is the tremendous processing power required at the receiver. Coded MIMO schemes outperform schemes that separate channel coding from modulation, because they fully exploit the tradeoff between multiplexing and diversity [4]. However, their hardware complexity can be formidable in practice, especially for wideband systems with more than 4 antennas at both the transmitter and the receiver. In contrast, a VLSI solution at data rates of hundreds of Mbps is much easier to find when traditional channel coding schemes, such as convolutional and Turbo codes, are used. We therefore start by considering uncoded MIMO schemes, also called spatial multiplexing (Fig. 1). We then carry out a side-by-side comparative study that evaluates a number of uncoded MIMO detection algorithms from both the performance and the implementation points of view.

[Fig. 1. Uncoded spatial multiplexing system: input bits → channel coding → MIMO Tx → rich-scattering fading channel → MIMO Rx → MIMO detect → channel decoding → output bits]

This paper is organized as follows. Section II describes the channel model and reviews a number of well-established adaptive and non-adaptive algorithms. The adaptive algorithms were originally targeted at equalization and beamforming applications; here they have been extended specifically to the detection of MIMO signals. Section III simulates and compares the performance of these algorithms. To make the results comparable, the simulation uses a packet structure similar to that of the IEEE 802.11a/g standard. Section IV compares the hardware implementation complexity of these algorithms. Section V concludes the paper by identifying RLS as the preferred solution for Giga-bps wireless MIMO systems.

II. MIMO DETECTION FOR FLAT-FADING CHANNEL

A. MIMO Channel Model

Consider a MIMO system in which $N_t$ different signals are transmitted and arrive at an array of $N_r$ ($N_t \le N_r$) receive antennas through a rich-scattering flat-fading environment. If all the transmitted and received signals are grouped into vectors, the system can be viewed as transmitting an $N_t \times 1$ vector signal $x$ through an $N_r \times N_t$ matrix channel $H$. An $N_r \times 1$ Gaussian noise vector $v$ is added at the input of the receiver:

$$y = Hx + v, \tag{1}$$

where $y$ is the received $N_r \times 1$ vector. The $(n_r, n_t)$-th element of $H$, $h_{n_r n_t}$, is the complex channel response from the $n_t$-th transmit antenna to the $n_r$-th receive antenna. $x$ is zero-mean with covariance matrix $R_x = E\{xx^*\} = \sigma_x^2 I$. $v$ is also zero-mean, with $R_v = E\{vv^*\} = \sigma_v^2 I$.

In frequency-selective fading channels, the channel frequency response $h_{n_r n_t}$ is no longer a constant but a function of frequency:

$$y(f) = H(f)x(f) + v(f). \tag{2}$$

When OFDM modulation is used, the entire channel is divided into a number of subchannels. These subchannels are spaced so that they are mutually orthogonal. Under perfect sampling and carrier synchronization, no inter-carrier interference (ICI) is therefore present at the subcarrier frequencies. When sampled at the subcarrier frequency $f_{n_c}$, the channel model becomes

$$y(n_c) = H(n_c)x(n_c) + v(n_c), \quad n_c = -N_c/2, \ldots, N_c/2 - 1. \tag{3}$$

With $N_c$ sufficiently large, the subchannel at each subcarrier can be regarded as flat-fading. OFDM therefore transforms MIMO detection over a frequency-selective channel into MIMO detection over $N_c$ narrowband flat-fading channels. For this reason, the rest of the paper focuses only on MIMO detection algorithms for flat-fading channels.
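The decomposition in (3) can be checked numerically. The following NumPy sketch is an illustration under assumed parameters and is not the simulator used in this paper. The FFT size of 64, the 16-sample cyclic prefix, the 9-tap random channel, and the function name `ofdm_mimo_demo` are all hypothetical choices. The sketch passes 4-QAM symbols through a multipath MIMO channel in the time domain. It then confirms that, after cyclic-prefix removal and an FFT, each subcarrier sees the flat MIMO channel $H(n_c)$ given by the DFT of the channel taps. Subcarriers are indexed 0 to $N_c - 1$ in FFT order rather than from $-N_c/2$.

```python
import numpy as np

def ofdm_mimo_demo(n_t=2, n_r=2, n_c=64, cp=16, n_taps=9, seed=0):
    """Pass 4-QAM OFDM symbols through a multipath MIMO channel in the time
    domain and return (Y, Y_model): the received subcarrier values and the
    flat per-subcarrier model H(n_c) X(n_c) of eq. (3)."""
    rng = np.random.default_rng(seed)
    # Hypothetical random channel taps h[r, t, k], k = 0 .. n_taps - 1.
    h = (rng.standard_normal((n_r, n_t, n_taps))
         + 1j * rng.standard_normal((n_r, n_t, n_taps))) / np.sqrt(2 * n_taps)
    # Frequency-domain 4-QAM symbols X[t, subcarrier] and their IFFT.
    b = rng.integers(0, 2, size=(2, n_t, n_c))
    X = ((2 * b[0] - 1) + 1j * (2 * b[1] - 1)) / np.sqrt(2)
    x_time = np.fft.ifft(X, axis=1)
    # Prepend the cyclic prefix (the last cp samples of each block).
    x_cp = np.concatenate([x_time[:, -cp:], x_time], axis=1)
    # Noise-free linear convolution of every transmit stream with every path.
    y_cp = np.zeros((n_r, n_c + cp + n_taps - 1), dtype=complex)
    for r in range(n_r):
        for t in range(n_t):
            y_cp[r] += np.convolve(x_cp[t], h[r, t])
    # Drop the prefix and the convolution tail, then return to frequency.
    Y = np.fft.fft(y_cp[:, cp:cp + n_c], axis=1)
    # Per-subcarrier channel matrices: n_c-point DFT of each tap vector.
    H_f = np.fft.fft(h, n=n_c, axis=2)              # shape (n_r, n_t, n_c)
    Y_model = np.einsum("rtk,tk->rk", H_f, X)       # H(k) X(k) for every k
    return Y, Y_model
```

With the default 16-sample prefix, `np.allclose(Y, Y_model)` holds. If `cp` is made shorter than `n_taps - 1`, the equality fails, because the received block is no longer a circular convolution of the transmitted block with the channel.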

B. Linear MIMO Detection

A straightforward way to recover $x$ from $y$ is to linearly combine the elements of $y$ with an $N_t \times N_r$ weight matrix $W$ to form an estimate of $x$, i.e., $\hat{x} = Wy$.

1) Zero-Forcing (ZF): The ZF algorithm attempts to null out the interference introduced by the matrix channel. It does so by directly inverting the channel with the weight matrix

$$W_{ZF} = H^{\dagger} = (H^*H)^{-1}H^*. \tag{4}$$
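Below is a minimal NumPy sketch of ZF detection following (4). It assumes unit-energy 4-QAM symbols, and the helper names `zf_weights` and `qpsk_slice` are hypothetical. The code solves the normal equations instead of forming the inverse explicitly. This is numerically preferable but mathematically identical to (4).

```python
import numpy as np

def zf_weights(H):
    """Zero-forcing weights W_ZF = (H^* H)^{-1} H^*, eq. (4)."""
    Hh = H.conj().T
    # Solve (H^* H) W = H^* instead of inverting H^* H explicitly.
    return np.linalg.solve(Hh @ H, Hh)

def qpsk_slice(z):
    """Hard decision for unit-energy 4-QAM symbols (+-1 +-j)/sqrt(2)."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

# Usage: 4 transmit and 6 receive antennas, noise-free for clarity.
rng = np.random.default_rng(0)
H = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
x = qpsk_slice(rng.standard_normal(4) + 1j * rng.standard_normal(4))
W = zf_weights(H)
x_hat = qpsk_slice(W @ (H @ x))   # W H = I, so the symbols are recovered
```

For a tall channel with full column rank, `zf_weights(H)` coincides with the Moore-Penrose pseudo-inverse `np.linalg.pinv(H)`.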

2) Minimum Mean-Squared Error (MMSE): A drawback of ZF is that nulling out the interference without considering the noise can boost the noise power significantly, which in turn degrades performance. MMSE addresses this by minimizing the mean-squared error $J(W) = E\{(x - \hat{x})^*(x - \hat{x})\}$ with respect to $W$. The optimum solution is [5], [6]

$$W_o = R_{xy}R_y^{-1} = \left(H^*H + \frac{\sigma_v^2}{\sigma_x^2} I\right)^{-1} H^*. \tag{5}$$
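The two forms of (5) can be compared numerically. The NumPy sketch below uses hypothetical function names. It computes $W_o$ both from the closed form and as $R_{xy}R_y^{-1}$, with $R_{xy} = \sigma_x^2 H^*$ and $R_y = \sigma_x^2 HH^* + \sigma_v^2 I$. Both covariance expressions follow from (1) and the stated statistics of $x$ and $v$. As $\sigma_v^2 \to 0$, the MMSE weights approach the ZF weights of (4).

```python
import numpy as np

def mmse_weights(H, sigma_x2, sigma_v2):
    """Closed form of eq. (5): (H^* H + (sigma_v^2/sigma_x^2) I)^{-1} H^*."""
    n_t = H.shape[1]
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H + (sigma_v2 / sigma_x2) * np.eye(n_t), Hh)

def mmse_weights_from_statistics(H, sigma_x2, sigma_v2):
    """The same weights written as R_xy R_y^{-1}."""
    n_r = H.shape[0]
    R_xy = sigma_x2 * H.conj().T                               # E{x y^*}
    R_y = sigma_x2 * H @ H.conj().T + sigma_v2 * np.eye(n_r)   # E{y y^*}
    # W R_y = R_xy  <=>  R_y W^* = R_xy^*, because R_y is Hermitian.
    return np.linalg.solve(R_y, R_xy.conj().T).conj().T
```

Solving in the $N_t$-dimensional closed form is cheaper when $N_t < N_r$. The statistics-based form mirrors the definition $W_o = R_{xy}R_y^{-1}$ more directly.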

C. Nonlinear MIMO Detection

1) VBLAST: A popular nonlinear approach is the vertical Bell Labs layered space-time algorithm (VBLAST) [3]. It uses a detect-and-cancel strategy similar to that of a decision-feedback equalizer. ZF or MMSE nulling is used to detect the strongest remaining signal component. That component's contribution is then cancelled from the received signal before the next component is detected. As Section III will show, this procedure generally performs better than ZF and MMSE.
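The detect-and-cancel loop can be sketched as follows. This is a simplified NumPy illustration, not the exact procedure of [3]. At each stage the sketch:

- recomputes the ZF or MMSE nulling matrix for the undetected streams;
- picks the stream with the smallest diagonal entry of $(H_s^*H_s + \rho I)^{-1}$, i.e., the largest post-detection SNR;
- slices that stream to 4-QAM and subtracts its contribution from the received vector.

The parameter `noise_ratio` $= \rho = \sigma_v^2/\sigma_x^2$ (0 selects ZF nulling) and the function names are hypothetical.

```python
import numpy as np

def qpsk_slice(z):
    """Hard decision for unit-energy 4-QAM symbols (+-1 +-j)/sqrt(2)."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def vblast_detect(H, y, noise_ratio=0.0):
    """Ordered successive interference cancellation in the spirit of V-BLAST.

    noise_ratio = sigma_v^2 / sigma_x^2: 0.0 gives ZF nulling, a positive
    value gives MMSE nulling."""
    H = np.asarray(H, dtype=complex)
    y = np.array(y, dtype=complex)             # copy; it is modified below
    remaining = list(range(H.shape[1]))        # undetected stream indices
    x_hat = np.zeros(H.shape[1], dtype=complex)
    while remaining:
        Hs = H[:, remaining]
        # For ZF, diag(P) holds the squared norms of the nulling vectors; for
        # MMSE it is proportional to the per-stream error. Smallest is best.
        P = np.linalg.inv(Hs.conj().T @ Hs + noise_ratio * np.eye(len(remaining)))
        G = P @ Hs.conj().T                    # nulling matrix for remaining streams
        k = int(np.argmin(np.real(np.diag(P))))
        s = remaining.pop(k)                   # index of the stream detected now
        x_hat[s] = qpsk_slice(G[k] @ y)
        y -= H[:, s] * x_hat[s]                # cancel the detected stream
    return x_hat
```

Recomputing the nulling matrix at every stage is what makes VBLAST considerably more expensive than a single ZF or MMSE solve. Section IV quantifies this cost.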

2) Maximum Likelihood (ML): ML detection has the best performance among all the MIMO detection algorithms. It finds the $x$ that minimizes

$$\|y - Hx\|^2 = e_y^* e_y = (y - Hx)^*(y - Hx), \tag{6}$$

where $e_y = y - Hx$. In other words, ML selects the most likely transmitted signal, i.e., the one that yields the smallest squared error with respect to the received signal. The problem can be solved by enumerating all possible $x$ and selecting the one with the smallest $e_y^* e_y$. A constellation of size $M$ yields $M^{N_t}$ candidates. The computational complexity therefore grows rapidly with $M$ and exponentially with $N_t$, and can become prohibitively high for practical applications.
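The exhaustive search described above can be written compactly by stacking all $M^{N_t}$ candidate vectors. The NumPy sketch below assumes unit-energy 4-QAM, and the names `QPSK`, `ml_candidates`, and `ml_detect` are hypothetical. Its memory use and run time grow as $M^{N_t}$, which is exactly why ML is impractical for large arrays.

```python
import itertools
import numpy as np

# Unit-energy 4-QAM alphabet (M = 4).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def ml_candidates(n_t, constellation=QPSK):
    """All M^Nt candidate transmit vectors, one per row."""
    return np.array(list(itertools.product(constellation, repeat=n_t)))

def ml_detect(H, y, constellation=QPSK):
    """Return the candidate x that minimises ||y - Hx||^2, eq. (6)."""
    cands = ml_candidates(H.shape[1], constellation)   # (M^Nt, Nt)
    residual = y[:, None] - H @ cands.T                # (Nr, M^Nt)
    metric = np.sum(np.abs(residual) ** 2, axis=0)     # e_y^* e_y per candidate
    return cands[np.argmin(metric)]
```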

D. Linear Adaptive MIMO Detection

Non-adaptive detection assumes a known channel matrix $H$, which usually requires channel probing before each transmission followed by a burst computation of $W$. Adaptive algorithms instead estimate $W$ directly by iteration, using a known training sequence at the beginning of each transmission.

1) Least Mean-Square (LMS): LMS is a stochastic approximation of the steepest-descent algorithm [5]. It updates $W$ according to

$$W_i = W_{i-1} + \mu\,[x_i - W_{i-1}y_i]\,y_i^*, \tag{7}$$

where $\mu$ is the update step size. For LMS to converge in the mean-squared sense, i.e., $J_i \to J_\infty$, $\mu$ needs to satisfy $0 < \mu < 2/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $R_y = \sigma_x^2 HH^* + \sigma_v^2 I$. The convergence of LMS therefore depends on both the channel condition and the signal-to-noise ratio at the input of the receiver. The final residual error also depends on the value of $\mu$.
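Below is a minimal NumPy sketch of the LMS recursion (7) trained on known symbols. It also computes the step-size limit $2/\lambda_{max}$ from $R_y$. The function names, the all-zero initial weights, and the example channel are assumptions for illustration.

```python
import numpy as np

def lms_train(X, Y, mu):
    """LMS per eq. (7): W_i = W_{i-1} + mu (x_i - W_{i-1} y_i) y_i^*.

    X: (n_t, n_iter) known training symbols; Y: (n_r, n_iter) received
    vectors. Starts from all-zero weights (an assumption)."""
    n_t, n_iter = X.shape
    W = np.zeros((n_t, Y.shape[0]), dtype=complex)
    for i in range(n_iter):
        y = Y[:, i]
        e = X[:, i] - W @ y                      # a priori error vector
        W = W + mu * np.outer(e, y.conj())
    return W

def lms_step_bound(H, sigma_x2, sigma_v2):
    """Upper limit 2 / lambda_max on mu, with R_y = sigma_x^2 H H^* + sigma_v^2 I."""
    R_y = sigma_x2 * H @ H.conj().T + sigma_v2 * np.eye(H.shape[0])
    return 2.0 / np.max(np.linalg.eigvalsh(R_y))

# Usage: noise-free training on a fixed, well-conditioned 2 x 2 channel.
rng = np.random.default_rng(1)
H = np.array([[1.0, 0.3j], [0.2, 0.9]])
b = rng.integers(0, 2, size=(2, 2, 2000))
X = ((2 * b[0] - 1) + 1j * (2 * b[1] - 1)) / np.sqrt(2)   # 4-QAM training
W = lms_train(X, H @ X, mu=0.05)
```

Without noise, the weights converge toward $H^{-1}$, so $WH \approx I$ after enough iterations. The number of iterations needed grows as $\mu$ or the smallest eigenvalue of $R_y$ shrinks.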

2) Recursive Least-Squares (RLS): RLS is the recursive solution to the exponentially weighted least-squares (LS) problem [5]. The recursive optimal solution at time instant $i$ is

$$W_i = W_{i-1} + (x_i - W_{i-1}y_i)\,y_i^* P_i, \tag{8}$$

$$P_i = \lambda^{-1}\left[P_{i-1} - \frac{\lambda^{-1}P_{i-1}y_i y_i^* P_{i-1}}{1 + \lambda^{-1}y_i^* P_{i-1} y_i}\right], \tag{9}$$

where $0 \ll \lambda < 1$ is the exponential forgetting factor, and

$$P_i = \left(\sum_{k=0}^{i} \lambda^{i-k} y_k y_k^*\right)^{-1}$$

is the inverse of the weighted correlation matrix of $y_i$, with initial condition $P_{-1} = \pi_0 I$. The scalar $\pi_0$ is usually a large positive number, and $\lambda$ is very close to 1. The stochastic estimation problems discussed previously require signal statistics such as the correlation matrix, whereas the LS problem is deterministic [5], [6]. RLS can therefore find the LS solution for a non-stationary process; put simply, RLS can track a non-stationary process in the LS sense. When $x_i$, $H$, and $v_i$ are all stationary, RLS provides a weighted time-average estimate of the MMSE solution as $i \to \infty$. This estimate is obtained by replacing $R_{xy}$ and $R_y$ in (5) with $\sum_{k=0}^{i} \lambda^{i-k} x_k y_k^*$ and $\sum_{k=0}^{i} \lambda^{i-k} y_k y_k^*$, respectively.
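The recursion (8)-(9) translates directly into code. In the NumPy sketch below, $P_i$ is updated first and then used in (8), as the equations require. The function name and the default values $\lambda = 0.99$ and $\pi_0 = 10^3$ are assumptions. With zero initial weights, the result should coincide with the exponentially weighted LS solution $W_i = \big(\sum_{k=0}^{i} \lambda^{i-k}x_k y_k^*\big)\big(\lambda^{i+1}\pi_0^{-1}I + \sum_{k=0}^{i} \lambda^{i-k}y_k y_k^*\big)^{-1}$, which can be used to check an implementation. The $\lambda^{i+1}\pi_0^{-1}I$ term is the small regularization introduced by the initial condition $P_{-1} = \pi_0 I$.

```python
import numpy as np

def rls_train(X, Y, lam=0.99, pi0=1e3):
    """Exponentially weighted RLS for the Nt x Nr detector W, eqs. (8)-(9).

    X: (n_t, n_iter) training symbols; Y: (n_r, n_iter) received vectors.
    P starts at pi0 * I, i.e. P_{-1} = pi0 I."""
    n_t, n_iter = X.shape
    n_r = Y.shape[0]
    W = np.zeros((n_t, n_r), dtype=complex)
    P = pi0 * np.eye(n_r, dtype=complex)
    for i in range(n_iter):
        y = Y[:, i]
        Py = P @ y
        # Eq. (9), rewritten as P_i = lam^{-1} [P - P y y^* P / (lam + y^* P y)].
        P = (P - np.outer(Py, Py.conj()) / (lam + np.vdot(y, Py))) / lam
        # Eq. (8): W_i = W_{i-1} + (x_i - W_{i-1} y_i) y_i^* P_i.
        e = X[:, i] - W @ y
        W = W + np.outer(e, y.conj() @ P)
    return W

# Usage: compare with the batch weighted-LS solution on noisy 2 x 2 data.
rng = np.random.default_rng(2)
H = np.array([[1.0, 0.3j], [0.2, 0.9]])
n, lam, pi0 = 30, 0.99, 1e3
b = rng.integers(0, 2, size=(2, 2, n))
X = ((2 * b[0] - 1) + 1j * (2 * b[1] - 1)) / np.sqrt(2)
V = 0.05 * (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n)))
Y = H @ X + V
W = rls_train(X, Y, lam, pi0)
w = lam ** np.arange(n - 1, -1, -1)          # weights lam^(i-k) with i = n - 1
Phi = (lam ** n / pi0) * np.eye(2) + (Y * w) @ Y.conj().T
W_ref = (X * w) @ Y.conj().T @ np.linalg.inv(Phi)
```

The per-iteration cost is dominated by the $N_r \times N_r$ update of $P_i$, which does not depend on $N_t$. Section IV returns to this point.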

[Fig. 2. BER performance of ZF and ZF-VBLAST: BER vs. SNR (0 to 40 dB) for α = 1 (1x1, 2x2, 4x4, 8x8), α = 2 (1x2, 2x4, 4x8), and α = 4 (1x4, 2x8)]

III. COMPARISON OF PERFORMANCE

A. Simulation Setup

Performance simulations are conducted within a MIMO-OFDM framework using 25MHz bandwidth. The packet structure used in the simulation can be found in [7]; in the frequency domain, it is similar to the IEEE 802.11a/g standard. In the time domain, each packet has 400 OFDM blocks and a duration of 1.28ms, excluding the training blocks used by the adaptive algorithms. 4-QAM is assumed unless otherwise noted. Each simulation point ends by calculating the uncoded bit/packet error rate (BER/PER) once 400 packets with errors have been collected. However, the total number of packets simulated is no less than 1,000 and no more than 40,000. Perfect sampling and carrier frequency offset synchronization are assumed throughout the simulation.
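The timing figures above are mutually consistent. At 25MHz (40ns per sample), with the 16-sample cyclic prefix stated in Section III-C, a 1.28ms packet of 400 blocks implies 80 samples per block, i.e., a 64-point FFT. The short Python sketch below encodes that arithmetic together with the stopping rule. The 64-point FFT is our inference, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 25e6      # 25MHz bandwidth, 40ns per sample
N_FFT = 64                 # inferred: 1.28ms / 400 blocks = 80 samples = 64 + 16
N_CP = 16                  # 16 x 40ns = 640ns cyclic prefix
BLOCKS_PER_PACKET = 400

def packet_duration_s(n_blocks=BLOCKS_PER_PACKET, n_fft=N_FFT, n_cp=N_CP,
                      fs=SAMPLE_RATE_HZ):
    """Duration of the data part of one packet (training blocks excluded)."""
    return n_blocks * (n_fft + n_cp) / fs

def keep_simulating(n_packets, n_error_packets, target_errors=400,
                    min_packets=1_000, max_packets=40_000):
    """Stopping rule: stop once 400 packets with errors have been collected,
    but simulate at least 1,000 and at most 40,000 packets."""
    if n_packets < min_packets:
        return True
    if n_packets >= max_packets:
        return False
    return n_error_packets < target_errors
```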

The channel is assumed to be quasi-static: it is constant over an entire packet but independent across packets. Each channel path is generated independently using the exponentially decaying Rayleigh fading channel model [8]. The impulse response of the channel consists of equally spaced, independent, zero-mean complex Gaussian taps with the power-delay profile

$$P(k) = P_0 e^{-kT_s/\tau_{rms}}, \tag{10}$$

where $T_s$ is the sampling period. The last tap is determined by a 30dB power degradation from the first tap. $P_0 = 1 - e^{-T_s/\tau_{rms}}$ ensures that the channel has unity average energy, so that the SNR is uniquely determined by

$$\mathrm{SNR} = N_t \cdot \sigma_x^2/\sigma_v^2. \tag{11}$$

$\tau_{rms}$ is the RMS delay spread of the channel; it is 50ns in our simulations. For the non-adaptive algorithms, perfect channel knowledge is assumed at the receiver. For the LMS/RLS simulations, a different pseudo-random training sequence is transmitted before each packet.
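The channel model and SNR definition can be sketched in a few lines of Python. Two readings here are our own assumptions:

- $T_s$ = 40ns, the 25MHz sampling period;
- "the last tap is determined by 30dB power degradation" means keeping every tap within 30dB of the first.

With $\tau_{rms}$ = 50ns, this gives 9 taps spanning 320ns, well inside the 640ns cyclic prefix. Because of the truncation, the total tap power is $1 - e^{-9T_s/\tau_{rms}} \approx 0.9993$ rather than exactly one. The function names are hypothetical.

```python
import math
import random

def exp_pdp(tau_rms_s=50e-9, ts_s=40e-9, floor_db=30.0):
    """Tap powers P(k) = P0 exp(-k Ts / tau_rms), eq. (10), keeping every tap
    within floor_db of the first one."""
    ratio = ts_s / tau_rms_s
    p0 = 1.0 - math.exp(-ratio)
    # Largest k such that exp(-k * ratio) >= 10^(-floor_db / 10).
    k_max = math.floor(floor_db / 10.0 * math.log(10.0) / ratio)
    return [p0 * math.exp(-k * ratio) for k in range(k_max + 1)]

def draw_taps(pdp, rng=random):
    """One realisation: independent zero-mean complex Gaussian taps with
    E|h_k|^2 = P(k) (variance split equally between I and Q)."""
    return [complex(rng.gauss(0.0, math.sqrt(p / 2)),
                    rng.gauss(0.0, math.sqrt(p / 2))) for p in pdp]

def noise_variance(snr_db, n_t, sigma_x2=1.0):
    """Invert eq. (11): sigma_v^2 = Nt * sigma_x^2 / SNR."""
    return n_t * sigma_x2 / 10 ** (snr_db / 10)
```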

[Fig. 3. BER performance of MMSE and MMSE-VBLAST: BER vs. SNR (0 to 40 dB) for α = 1 (1x1, 2x2, 4x4, 8x8, 16x16), α = 2 (1x2, 2x4, 4x8), and α = 4 (1x4, 2x8)]

B. BER Performance

The BER results are shown in Fig. 2 and Fig. 3. In general, the BER vs. SNR curves are clustered according to $\alpha = N_r/N_t$. Different algorithms with the same $\alpha$ tend to yield similar BER performance. The higher $\alpha$ is, the better the performance, because of the diversity gain. As expected, the general order of the detection algorithms from best to worst is MMSE-VBLAST, ZF-VBLAST, MMSE, ZF. At low SNR, VBLAST (especially ZF-VBLAST) may underperform ZF/MMSE. This happens because the gain from successive interference cancellation is largely offset by error propagation from the substantial number of decision errors. As the SNR increases, the number of decision errors decreases, and the gain from interference cancellation relative to ZF/MMSE becomes evident. In the ZF-VBLAST case, however, this gain is not significant. For instance, at BER = $10^{-3}$, ZF-VBLAST is only 2dB better than MMSE for 2 × 2, and only 2.5dB better for 4 × 4. MMSE-VBLAST, in contrast, offers a much higher gain. If VBLAST is to be implemented, MMSE-VBLAST is therefore preferred over ZF-VBLAST. When $N_t = 1$, the simulated BER of MMSE matches the theoretical performance of maximum ratio combining (MRC) in a Rayleigh flat-fading channel. Moreover, for $N_t = 1$ with 4-QAM, ZF, MMSE, and VBLAST are effectively the same. Their $W$'s differ only by a positive scalar factor, which does not change 4-QAM decisions.
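For the $N_t = 1$ reference curves, the standard closed-form BER of BPSK with $L$-branch MRC over i.i.d. Rayleigh fading can be evaluated directly. The Python sketch below applies that expression to Gray-coded 4-QAM by treating each quadrature as BPSK whose per-bit SNR is half the per-branch $E_s/N_0$. We also assume that the paper's SNR in (11) with $N_t = 1$ equals that per-branch $E_s/N_0$, which relies on the unit average path gain stated above. The function name is hypothetical.

```python
import math

def mrc_qpsk_ber(snr_db, n_r):
    """Closed-form BER of Gray-coded 4-QAM with n_r-branch MRC over i.i.d.
    Rayleigh flat fading. snr_db is Es/N0 per receive branch, i.e. eq. (11)
    with Nt = 1 and unit average path gain (an assumption)."""
    gamma_b = 10 ** (snr_db / 10) / 2          # per-bit SNR on each branch
    mu = math.sqrt(gamma_b / (1 + gamma_b))
    p = (1 - mu) / 2
    return p ** n_r * sum(math.comb(n_r - 1 + k, k) * ((1 + mu) / 2) ** k
                          for k in range(n_r))

# Usage: 1x1, 1x2 and 1x4 reference points at 10 dB.
reference = {n_r: mrc_qpsk_ber(10, n_r) for n_r in (1, 2, 4)}
```

The BER falls steeply as branches are added, which is the diversity gain responsible for the clustering by $\alpha$ seen in Fig. 2 and Fig. 3.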

The difference between ZF and MMSE diminishes as the SNR increases, owing to the similarity between (4) and (5) and to the relationship between $\sigma_x^2/\sigma_v^2$ and SNR in (11). This effect is more pronounced when $\alpha > 1$ or $N_t$ is large. When $\alpha > 1$, the diversity gain lets ZF and MMSE become close to each other at a lower SNR. For a given SNR, $\sigma_x^2/\sigma_v^2 = \mathrm{SNR}/N_t$ by (11), so this ratio is smaller when $N_t$ is large, and therefore ZF and MMSE are closer. For the same reason, MMSE-VBLAST performs significantly better than ZF-VBLAST at $\alpha = 1$. For $\alpha > 1$, however, this difference becomes much smaller at relatively low SNR, which is where $\alpha > 1$ is really needed.

More antennas can generate more diversity gain, even when $\alpha$ is kept constant. Taking MMSE as an example, when $\alpha = 2$, the BER from best to worst is 4 × 8, 2 × 4, 1 × 2. A similar argument holds for $\alpha = 4$. For $\alpha = 1$, this trend is less pronounced. For VBLAST-based solutions, the trend holds in all cases. The only exception is ZF with $\alpha = 1$, because of its noise enhancement effect. As the SNR increases beyond the range shown, however, ZF converges to MMSE and yields similar results.

[Fig. 4. BER performance of MMSE 4-QAM vs. 16-QAM: BER vs. SNR (0 to 40 dB) for α = 1 and α = 2 with 4-QAM and 16-QAM]

Fig. 4 further illustrates the benefits of using multiple antennas. For the same throughput, using more antennas with a smaller QAM constellation is more efficient than using a larger constellation with fewer antennas. Fig. 4 clearly shows that increasing power is not the most efficient way to increase throughput. At BER = $10^{-3}$, an $N_t \times N_r$ 16-QAM system needs 5 to 10dB more SNR to reach the same BER performance as a $2N_t \times 2N_r$ 4-QAM system.

C. PER Performance

The PER performance curves are shown in Fig. 5. ZF and MMSE yield very close PER performance, as do ZF-VBLAST and MMSE-VBLAST (except for $N_t = N_r > 1$). For this reason, Fig. 5 shows only MMSE and MMSE-VBLAST. The PER of VBLAST is consistently better than that of ZF/MMSE. By the time the PER falls below 100%, the SNR is already high enough that error propagation is infrequent. At $\alpha = 1$, the PER of MMSE increases with the number of antennas, in contrast to the roughly overlapping curves in the BER plot.

For $\tau_{rms}$ = 50ns, the channel selectivity degrades PER compared with a flat-fading channel. This is because each packet is more likely to contain bit errors caused by a deep null in the channel frequency response (corresponding to a lower SNR) than in channels with a smaller $\tau_{rms}$. Interleaved channel coding techniques readily mitigate this effect. The BER, on the other hand, stays the same for different $\tau_{rms}$ as long as the cyclic prefix is sufficiently long compared to $\tau_{rms}$. In our simulations, we have used a cyclic prefix length of 16 × 40ns = 640ns, which is long enough for $\tau_{rms}$ = 50ns.

[Fig. 5. PER performance of MMSE and MMSE-VBLAST: PER vs. SNR (0 to 40 dB) for α = 1 (1x1, 2x2, 4x4, 8x8), α = 2 (1x2, 2x4, 4x8), and α = 4 (1x4, 2x8)]

[Fig. 6. The learning curves of LMS (µ = 0.02) and RLS (λ = 0.99): BER vs. number of iterations (0 to 500) for LMS at SNR = 30dB and RLS at SNR = 20dB and 30dB, with 1x1, 2x2, 4x4, and 8x8 antenna configurations]

D. Convergence of LMS/RLS

LMS and RLS are recursive alternatives to the matrix-inversion-based solutions. In the simulated environment, the BER/PER performance of RLS ultimately converges to the MMSE results shown before, given sufficient training. The performance of LMS gets very close to these results when $\mu$ is very small. The key issue is convergence speed, which Fig. 6 shows in the form of ensemble-average BER learning curves. As expected, at the same SNR, LMS converges much more slowly than RLS, except in the 1 × 1 case, where the learning curves overlap. The required training length depends on several factors, including the number of antennas, the SNR, and the update factor $\mu$ or $\lambda$. For RLS, the training length is roughly on the order of $N_t \cdot N_r$; for LMS, it is about 10 times longer. RLS takes longer to converge at higher SNR, because the learning curve there has a deeper BER floor to reach, while the initial (and fastest) learning slope is similar for different SNRs. A larger $\mu$ or a smaller $\lambda$ increases the convergence speed of LMS or RLS, respectively, but also tends to degrade the final converged performance. The convergence of LMS also depends on the channel condition number (eigenvalue spread). RLS guarantees convergence, at the price of higher hardware complexity and less robustness to quantization effects [5], [6]. As with the BER performance, channel delay spread has a negligible effect on the convergence of LMS/RLS when the cyclic prefix is long enough.
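The qualitative behavior in Fig. 6 can be reproduced with a small NumPy experiment. Instead of ensemble-average BER, the sketch below tracks the Frobenius-norm error $\|W_iH - I\|$ of a single training run, a simpler proxy chosen for illustration. The 2 × 2 channel, the 200-iteration run, and the function name are assumptions; $\mu = 0.02$ and $\lambda = 0.99$ match the values quoted for Fig. 6.

```python
import numpy as np

def learning_curves(H, n_iter=200, mu=0.02, lam=0.99, pi0=1e3,
                    noise_std=0.0, seed=0):
    """Track ||W_i H - I||_F for LMS (eq. 7) and RLS (eqs. 8-9) trained on the
    same 4-QAM sequence. noise_std is the per-real-dimension noise std."""
    rng = np.random.default_rng(seed)
    n_r, n_t = H.shape
    W_lms = np.zeros((n_t, n_r), dtype=complex)
    W_rls = np.zeros((n_t, n_r), dtype=complex)
    P = pi0 * np.eye(n_r, dtype=complex)
    I = np.eye(n_t)
    lms_err, rls_err = [], []
    for _ in range(n_iter):
        b = rng.integers(0, 2, size=(2, n_t))
        x = ((2 * b[0] - 1) + 1j * (2 * b[1] - 1)) / np.sqrt(2)
        v = noise_std * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
        y = H @ x + v
        # LMS update, eq. (7).
        W_lms = W_lms + mu * np.outer(x - W_lms @ y, y.conj())
        # RLS update: eq. (9) first, then eq. (8).
        Py = P @ y
        P = (P - np.outer(Py, Py.conj()) / (lam + np.vdot(y, Py))) / lam
        W_rls = W_rls + np.outer(x - W_rls @ y, y.conj() @ P)
        lms_err.append(np.linalg.norm(W_lms @ H - I))
        rls_err.append(np.linalg.norm(W_rls @ H - I))
    return np.array(lms_err), np.array(rls_err)
```

On a noise-free, well-conditioned channel, the RLS error drops to near zero within a few tens of iterations. Over the same span, LMS is still far from converged, consistent with the gap seen in Fig. 6.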

IV. COMPUTATIONAL COMPLEXITY

A comparison of MIMO detection algorithms is not complete unless implementation complexity is also factored in. Because the hardware cost of each algorithm is highly implementation-specific, we give only a rough estimate of the number of multiplications each algorithm requires. The estimate is based on the following assumptions:

• A complex matrix multiplication $A_{L\times M} \times B_{M\times N}$ requires $L \cdot M \cdot N$ complex scalar multiplications;

• Inverting a matrix $A_{N\times N}$ takes $O(N^3)$ multiplications [9];

• Each complex multiplication consists of 4 real ones, and each real multiplication takes one operation to finish;

• Only the highest- and second-highest-order components are counted in the required multiplications, e.g. $N^3 + 2N^2$ in $N^3 + 2N^2 + N$;

• For simplicity, $O(N^3)$ is counted as $N^3$ when calculating the number of giga-operations per second (GOPS).

Table I gives the results based on these assumptions. The GOPS figures are calculated for 25MHz bandwidth and 4-QAM modulation. Because the assumptions are oversimplified, the estimates in Table I are meaningful only in an order-of-magnitude sense. The estimates for LMS and RLS are per iteration. Those for the non-adaptive algorithms cover computing $W$ and do not include the cost of estimating the channel matrix $H$. All algorithms except ML incur an additional cost for computing $\hat{x} = Wy$, shown in the bottom row of the table. This overhead is significant only for LMS and RLS.

The comparison easily rules out ML for practical purposes. All the non-adaptive algorithms demand more processing power than the adaptive algorithms. As a reference, the fastest commercially available DSP from TI runs at 5.7 GIPS, or about 1.2 GOPS. For a single-ASIC solution, LMS and RLS appear to be the only realistic candidates for wideband applications with 4 to 8 antennas. A closer look reveals that, contrary to conventional wisdom, RLS is no longer much more expensive than LMS. This is mainly because, in the MIMO scenario, the cost of LMS is multiplied by $N_t$, since there are $N_t$ signals to estimate. In RLS, by contrast, updating the $N_r \times N_r$ inverse correlation matrix $P_i$ costs no more for $1 < N_t \le N_r$ than for $N_t = 1$. Furthermore, RLS offers superior convergence speed and is insensitive to the eigenvalue spread of the channel [5]. The above implementation complexity is estimated on a subcarrier-by-subcarrier basis for a MIMO-OFDM system. The hardware cost can be reduced further by exploiting the frequency-domain correlation between adjacent subcarriers: $W$ for the non-pilot subcarriers can be linearly interpolated from the estimates on the pilot subcarriers [7].

TABLE I
COMPARISON OF COMPUTATIONAL COMPLEXITY

| Algorithm | Real multiplications | Real GOPS, 4 × 4 | Real GOPS, 8 × 8 | Real GOPS, 16 × 16 |
|---|---|---|---|---|
| LMS | $8N_tN_r + 2N_t$ | 3 | 13 | 52 |
| RLS | $14N_r^2 + 8N_tN_r + 6N_r$ | 9 | 36 | 143 |
| ZF/MMSE | $4N_t^3 + 8N_t^2N_r$ | 19 | 154 | 1,229 |
| ZF/MMSE-VBLAST | $N_t^4 + \frac{8}{3}N_t^3N_r + 2N_t^3 + 4N_t^2N_r$ | 33 | 452 | 6,622 |
| ML | $4N_tN_rM^{N_t}$ | 410 | $4 \times 10^5$ | $10^{11}$ |
| $W \times y$ | $4N_tN_r$ | 2 | 6 | 26 |
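The GOPS columns of Table I can be reproduced from the multiplication counts. Scaling each count by $25 \times 10^6$ updates per second and rounding matches every entry in the table. This rate corresponds to one update per subcarrier sample at the 25MHz bandwidth, with cyclic-prefix overhead ignored. The scaling rule is our inference from the numbers, not something stated in the paper. The Python sketch below consists of hypothetical helper functions and uses $M = 4$ for 4-QAM.

```python
def real_mults(alg, n_t, n_r, m=4):
    """Real multiplications per update for the formulas in Table I."""
    if alg == "LMS":
        return 8 * n_t * n_r + 2 * n_t
    if alg == "RLS":
        return 14 * n_r**2 + 8 * n_t * n_r + 6 * n_r
    if alg == "ZF/MMSE":
        return 4 * n_t**3 + 8 * n_t**2 * n_r
    if alg == "ZF/MMSE-VBLAST":
        return n_t**4 + 8 / 3 * n_t**3 * n_r + 2 * n_t**3 + 4 * n_t**2 * n_r
    if alg == "ML":
        return 4 * n_t * n_r * m**n_t
    if alg == "W*y":
        return 4 * n_t * n_r
    raise ValueError(f"unknown algorithm: {alg}")

def gops(alg, n, rate_hz=25e6, m=4):
    """Giga-operations per second for an n x n system, assuming rate_hz
    updates per second (one per subcarrier sample at 25MHz)."""
    return real_mults(alg, n, n, m) * rate_hz / 1e9
```

For example, RLS at 8 × 8 needs $14 \cdot 64 + 8 \cdot 64 + 6 \cdot 8 = 1456$ multiplications per update. At $25 \times 10^6$ updates per second, this gives about 36 GOPS.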

V. CONCLUSIONS

In this paper, we have provided an overview of various MIMO detection algorithms for spatial multiplexing systems. We compared the simulated performance of these algorithms and extended the comparison to a first-order estimate of their hardware costs. These comparisons show that VBLAST generally outperforms ZF/MMSE, at the cost of significantly higher implementation complexity. In fact, ZF-VBLAST is only slightly better than MMSE (2 ∼ 3dB at an uncoded BER of $10^{-3}$). The advantages of VBLAST over ZF/MMSE become much less significant when $\alpha > 1$, because of the antenna diversity gain. MMSE-VBLAST performs much better than ZF-VBLAST. In practice, however, inaccurate estimates of $\sigma_x^2/\sigma_v^2$ in (5), and of the channel matrix itself, tend to reduce this gain.

The study shows that RLS has a much lower computational intensity than ZF/MMSE/VBLAST and, with sufficient training, achieves the performance of MMSE. It does so by spreading the computation over multiple iterations. RLS converges faster than LMS, and its hardware cost is of the same order. Unlike ZF/MMSE/VBLAST, RLS requires neither explicit channel information nor a subsequent matrix inversion. It can also be implemented using the QR-decomposition-based systolic array architecture [5], [6]. We therefore conclude that, among the surveyed algorithms, RLS offers the best performance-to-complexity ratio for Giga-bps MIMO wireless systems. Based on these findings, RLS has been chosen for MIMO detection in the UCLA MIMO-OFDM testbed [10], [11]. In parallel, a MIMO detection chip has been developed in TSMC 0.18µm technology, using a QR-decomposition-based RLS systolic array structure and frequency-domain linear interpolation [7]. A single chip accommodates up to an 8 × 8 antenna configuration with 512 OFDM subcarriers in 12.5 MHz bandwidth. The architecture is further scalable in the frequency domain: n identical chips support 8 × 8 transmissions with n × 512 subcarriers in n × 12.5 MHz bandwidth.

REFERENCES

[1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, pp. 311–335, March 1998.

[2] G. J. Foschini, “Layered space-time architecture for wireless communications in a fading environment using multi-element antennas,” Bell Labs Tech. J., vol. 1, no. 2, pp. 41–59, Autumn 1996.

[3] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proc. ISSSE-98, Italy, Sep. 1998.

[4] L. Zheng and D. Tse, “Diversity and multiplexing: a fundamental tradeoff in multiple antenna channels,” IEEE Trans. Inform. Theory, vol. 49, pp. 1073–1096, May 2003.

[5] S. Haykin, Adaptive Filter Theory, 3rd ed. Prentice-Hall, 1996.

[6] A. Sayed, Fundamentals of Adaptive Filtering. Wiley-IEEE Press, 2003.

[7] J. Wang and B. Daneshrad, “Performance of linear interpolation-based MIMO detection for MIMO-OFDM systems,” in Proc. IEEE WCNC’04, vol. 2, Atlanta, GA, USA, 2004, pp. 981–986.

[8] N. Chayat, “Tentative criteria for comparison of modulation methods,” IEEE P802.11-97-96, Sep. 1997.

[9] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Johns Hopkins University Press, 1996.

[10] R. Rao et al., “Multi-antenna testbeds for research and education in wireless communications,” IEEE Commun. Mag., vol. 42, no. 12, pp. S41–50, December 2004.

[11] S. Lang, R. Rao, and B. Daneshrad, “Design and development of a 5.25 GHz software defined wireless OFDM communication platform,” IEEE Commun. Mag., vol. 42, no. 6, pp. S6–12, June 2004.
