IEEE ICC 2012 - 2012 IEEE International Conference on Communications - Ottawa, ON, Canada...

MU-MIMO in LTE: Performance and the challenges for future enhancement

Sean A. Ramprashad∗, Anass Benjebbour†

∗ DOCOMO Innovations, Inc., Palo Alto, CA 94304

† Radio Access Network Development Department, NTT DOCOMO R&D, Yokohama, Japan

Abstract—We consider the performance and improvement of Multi-User MIMO (MU-MIMO) in 3GPP LTE. In particular, we build on academic guidance, which demonstrates the link between channel state information (CSI) accuracy and MU-MIMO performance, by considering MU-MIMO with practical CSI estimation and feedback within an actual system design under practical channels. Link-Level simulations are provided to model the interplay of such factors in the LTE design. We also propose a practical codebook structure to support increased CSI feedback.

The results show that for some scenarios, in particular those with sufficient channel coherence and low antenna correlation, MU-MIMO can be significantly improved by increased CSI feedback. Such conditions may be indicative of “small-cell” scenarios, an important scenario for future network capacity enhancement. The results also point to other issues that merit further consideration, including CSI feedback granularity in frequency, CSI estimation, and improved MU-MIMO processing.

I. INTRODUCTION

Multi-input multi-output (MIMO) has been a key technology used to improve spectral efficiency in the 3GPP Long-Term Evolution (LTE) system. The LTE design uses a “pre-coding matrix” (PM) codebook to extend Single Input Single Output (SISO) transmission to MIMO transmission. In particular, for Single User-MIMO (SU-MIMO) a 4 transmit antenna codebook was introduced in LTE Release 8 [1]. Each PM is a unitary matrix specifying 4 orthogonal beamforming directions. The design allows a user to select one of 16 PM choices using a 4-bit PM Index (PMI), and to flexibly request rank 1 to rank 4 transmission by a 2-bit rank index (RI).

An SU-MIMO system, however, with $N_t$ transmit antennas (TXAs) at the base station and $N_r$ receive antennas (RXAs) at the mobile, has a limit on the number of degrees of freedom (DoFs) of $\min(N_t, N_r)$. Since many user terminals are now practically limited to $N_r = 2$ antennas, the maximum transmission rank of deployed SU-MIMO is often limited to 2. Further increases in spectral efficiency (exploiting DoFs up to $N_t$ when $N_r < N_t$) can be obtained through Multi-User MIMO (MU-MIMO). Here multiple users are served simultaneously on the same time-frequency resources, each user receiving its own coded stream(s). Such streams are called “layers” in LTE terminology, and we will use this terminology in the rest of the paper. However, in contrast to SU-MIMO, MU-MIMO decoding is done at individual users, without joint decoding across the multiple RXAs spread across multiple users. MU-MIMO therefore places more stringent requirements on pre-coding and Channel State Information (CSI) accuracy than SU-MIMO.

In particular, to support efficient MU-MIMO it is necessary to consider non-orthogonal beamforming, using beamforming vectors (beamvectors) outside the LTE-defined unitary pre-coding matrices. Such so-called “non-codebook” based MIMO approaches, e.g., Linear Zero-Forcing Beamforming (LZFB) [2, Chapt. 11], are important because one goal of the pre-coding in MU-MIMO is to transmit to each user in a direction close to (ideally in) the null-space of the channels of other users. Such null-spaces are not necessarily orthogonal. Transmitting in this way controls the interference between the coded layers intended for different users (inter-layer interference). Such “non-codebook” based MIMO is supported by a new pilot structure in LTE Release 10.

However, even with the support for LZFB, MU-MIMO scheduling and processing of such non-codebook based MU-MIMO approaches are still presently driven by information provided through the 4-bit PMI and the 2-bit RI. They are also driven by Channel Quality Indicator (CQI) feedback, which assumes SU-MIMO transmission using orthogonal beamvectors. This common PMI/RI/CQI framework does have a key benefit: it enables a system to consider dynamic switching between various SU-MIMO and MU-MIMO modes. However, it has a fundamental drawback for downlink MU-MIMO in some scenarios, because the limited PMI/RI/CQI feedback implies an implicit quantization of the channel and an implicit assumption on the receiver processing at the mobile. This inherently limits the effective accuracy of the CSI known to the base station. It thereby impairs the ability of the MU-MIMO processing to characterize the necessary null-spaces and to control inter-layer interference.

The effect of imperfect CSI knowledge at the transmitter on MU-MIMO is a well-studied subject, e.g. [3]–[6]. Results in these, and other, articles show that for MU-MIMO based on LZFB the mean level of inter-layer interference scales proportionally with the mean square error (MSE) between the CSI assumed by the transmitter for LZFB and the CSI experienced by a given data symbol at transmission. For some scenarios, e.g. when the CSI between different pairs of TXAs and RXAs is relatively uncorrelated, a 4-bit PMI implies a large CSI MSE and severely inter-layer-interference-limited MU-MIMO [5], [6].

Effective CSI accuracy, however, is not determined only by the PMI feedback rate. Some users have 2 RXAs and the ability to adjust receive filters, so CSI mismatch has to be considered relative to such adaptation. CSI is also a “moving target”, changing across time due to user mobility and across frequency due to multi-path. Furthermore, the CSI for each pair of TXA and RXA has to be estimated through the use of pilots. CSI accuracy is therefore limited by practical limits on CSI estimation and by channel dynamics. All such issues can be very challenging for many of the high-Doppler, highly frequency-selective macro-cellular scenarios previously studied by 3GPP. For these scenarios efficient MU-MIMO is inherently difficult to operate. Recently, however, attention has been shifting to smaller cell scenarios. Such scenarios are expected to have lower delay-spread and lower mobility, and could therefore be better suited to MU-MIMO.

Workshop on Telecommunications: From Research to Standards

978-1-4577-2053-6/12/$31.00 ©2012 IEEE

The analytic framework provided by [3]–[6] is a good foundation from which to start exploring the improvement of MU-MIMO for such scenarios. However, to impact standardization, Link Level Simulations are required to validate ideas within the context of the existing LTE system elements. These include reference signal (pilot) design, channel estimation and interpolation, link adaptation, etc. Such simulations are needed to demonstrate how both impairments and enhancements interact in the context of a complete system design.

In this paper we hope to begin to bridge the gap between theoretical guidance, e.g. in [3]–[6], and standardization. We show that MU-MIMO can be significantly improved by increased PMI feedback, and can outperform SU-MIMO under channel assumptions relevant to small-cell scenarios. We also show that feedback delay and granularity in frequency are important factors in other scenarios. We demonstrate these principles using link-level (bit-level) simulations conforming to the LTE Release 10 design. This provides a foundation on which to understand the many factors affecting the CSI mismatch, and ultimately its impact on MU-MIMO performance.

II. OFDM, CHANNEL ESTIMATION AND FEEDBACK

Before considering MIMO, we review some important elements of the LTE design, in particular those that impact CSI accuracy, e.g. CSI estimation and feedback. To support non-orthogonal beamforming, LTE Release 10 adopted a new downlink reference signal structure. This structure applies beamforming vectors to DeModulation Reference Signal (DMRS) pilots. This allows “dedicated” composite (net channel × beamvector) channel estimation during data transmission, without users having to know the identity of the beamvectors being used.

We consider LTE downlink transmission based on this reference signal structure. A 10 MHz system is considered, consisting of 600 sub-carriers. Sub-carriers are grouped in 50 non-overlapping subsets of 12 sub-carriers each, called Resource Blocks (RBs). The unit in time is a 1 msec unit consisting of 14 OFDM symbols. OFDM sequences are created using a 1024-point FFT over the sub-carriers. A cyclic prefix is approximately 5 μsec.
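As a quick arithmetic check of the numerology above, the sketch below derives the RB width, occupied bandwidth, FFT sampling rate, REs per RB and an average cyclic-prefix length. It assumes a 15 kHz sub-carrier spacing. That value is inferred from 12 sub-carriers per 180 kHz RB (Figure 1) rather than stated in this paragraph. The sketch also spreads the 1 msec unit evenly over the 14 symbols, so the cyclic prefix it reports is only an average.

```python
# Minimal sketch of the 10 MHz numerology described above.
# Assumption: 15 kHz sub-carrier spacing (12 sub-carriers per 180 kHz RB).
NUM_RB = 50
SUBCARRIERS_PER_RB = 12
SUBCARRIER_SPACING_HZ = 15e3
FFT_SIZE = 1024
SYMBOLS_PER_SUBFRAME = 14
SUBFRAME_S = 1e-3


def numerology():
    """Return quantities derived for one 1 msec unit of the 10 MHz system."""
    symbol_period_s = SUBFRAME_S / SYMBOLS_PER_SUBFRAME  # CP + useful part, averaged
    useful_symbol_s = 1.0 / SUBCARRIER_SPACING_HZ        # FFT (useful) duration
    return {
        "subcarriers": NUM_RB * SUBCARRIERS_PER_RB,
        "rb_width_hz": SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_HZ,
        "occupied_bw_hz": NUM_RB * SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_HZ,
        "sample_rate_hz": FFT_SIZE * SUBCARRIER_SPACING_HZ,
        "res_per_rb": SYMBOLS_PER_SUBFRAME * SUBCARRIERS_PER_RB,
        "avg_cp_s": symbol_period_s - useful_symbol_s,
    }


if __name__ == "__main__":
    for name, value in numerology().items():
        print(f"{name}: {value:g}")
```

The average cyclic prefix comes out at about 4.8 μsec, consistent with the “approximately 5 μsec” above. The occupied bandwidth is 9 MHz of the 10 MHz channel.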

The structure of a single RB over a 1 msec unit is illustrated in Figure 1. Here each RB consists of 14 × 12 Resource Elements (REs), with RE “re” centered at a given point $(t_{re}, f_{re})$ in time and frequency. REs are assigned complex values, e.g. linear combinations of QAM values, as a function of the beamvectors and coded streams.

[Figure 1: one RB over one subframe (1 msec, OFDM symbols 0 to 13) by 12 sub-carriers (180 kHz), showing the REs used for CRS (Rel. 8), CSI-RS (Rel. 10; c1 to c4), PDCCH etc., DMRS (Rel. 9/10; d1 to d4), and Data.]

Fig. 1. Mapping of resource elements for rank-4 transmission in Release 10.

In Link-Level simulations we simulate the turbo encoding and decoding process at the bit level, using randomly generated information (bit) sequences. For channel coding and modulation we consider a limited set of 15 Modulation and Coding Scheme (MCS) combinations as given in [2, Table 10.1]. We now discuss the CSI-specific elements of the RB design.

A. Downlink CSI Estimation between TXAs and RXAs

For downlink transmission in a Frequency Division Duplex (FDD) system, channel state estimation (for each RXA with respect to each TXA) is supported by CSI-Reference Signal (CSI-RS) REs. These REs are not influenced by beamvectors. Referring to Figure 1, the values assigned to (c1, c2) are transmitted only on TXAs 1 and 2, using known pilot symbols and a known orthogonal cover code [7]. This facilitates CSI estimation at each of the RXAs at the mobile with respect to TXAs 1 and 2. Such an estimate between RXA $p$ and TXA $q$ at the pilot location $(t, f)$ in time and frequency is a complex value $\hat{h}_{p,q}(t, f)$. Similarly, complex values (c3, c4) facilitate CSI estimation for antennas 3 and 4.

The CSI values estimated by such pairs of REs provide estimates with respect to different antennas at two different time-frequency points in each RB. To form an $N_t \times N_r$ channel estimate $\hat{H}(t, f)$ for a common location $(t, f)$, the CSI-RS estimates need to be interpolated. This is often done via 2-dimensional Minimum Mean Square Error (2D-MMSE) interpolation, using a general model for the power-delay profile of channels that may be encountered, e.g. [8, Appendix A]. However, any given model assumption may not match the specific scenario a user encounters in practice. This is another source of CSI mismatch.

To support PMI/RI/CQI determination, we make an $N_t \times N_r$ estimate

$$H_i = \hat{H}(t^*_i, f^*_i) = [\hat{h}_1(t^*_i, f^*_i),\ \hat{h}_2(t^*_i, f^*_i)]$$

for (each) RB $i$ at a reference location $(t^*_i, f^*_i)$ near the center of the RB. Here the vector

$$\hat{h}_1(t^*_i, f^*_i) = [\hat{h}_{1,1}(t^*_i, f^*_i),\ \hat{h}_{1,2}(t^*_i, f^*_i),\ \ldots,\ \hat{h}_{1,N_t}(t^*_i, f^*_i)]^T$$

where $\hat{h}_{p,q}(t^*_i, f^*_i)$ is the interpolated CSI estimate between RXA $p$ and TXA $q$.

The accuracy of the estimate $\hat{H}(t^*_i, f^*_i)$ with respect to the true channel $H(t^*_i, f^*_i) = [h_1(t^*_i, f^*_i), h_2(t^*_i, f^*_i)]$ at location $(t^*_i, f^*_i)$ depends on many factors, including the signal-to-noise ratio (SNR) experienced at the CSI-RS REs¹. In general, the CSI itself also varies due to frequency selectivity and mobility in the environment (Doppler). As a result, the CSI $H(t_{re}, f_{re})$ experienced at a data RE “re” will in general differ from both $\hat{H}(t^*_i, f^*_i)$ and $H(t^*_i, f^*_i)$, which are defined at the reference location(s). CSI accuracy is therefore limited by the estimation and 2D interpolation of the CSI-RS. It is also limited by the delay between the time when CSI is estimated and the time when it is used for sending MIMO transmissions.

B. PMI, RI, and CQI Feedback based on Release 10

For a given user, the set of reference channel estimates $\{H_i\}_{i=1}^{50}$ is used to select the PMI, RI and CQI feedback. As noted, LTE Release 8 defines a set of 16 unitary 4×4 Precoding Matrices (PMs) for $N_t = 4$. Define PM $m$, $1 \le m \le 16$, by

$$M_m = [m_{m,1},\ m_{m,2},\ m_{m,3},\ m_{m,4}] \tag{1}$$

where $M_m \in \mathbb{C}^{N_t \times N_t}$ and $m_{m,k} \in \mathbb{C}^{N_t \times 1}$. For each PM and each possible rank choice $n$, $1 \le n \le 4$, Release 8 specifies a (single) pre-defined subset $M_{m,n}$ of $n$ columns of $M_m$ for rank-$n$ transmission. The subset is defined by

$$M_{m,n} = [m_{m,a_{1,n}},\ m_{m,a_{2,n}},\ \ldots,\ m_{m,a_{n,n}}] \tag{2}$$

where the column indices $a_{j,n}$, which may also depend on $m$, are pre-defined [1, Chapt. 7].

The PMI, RI and CQI feedback in Release 10, using this design, is based on the assumption of SU-MIMO transmission using orthogonal beamvectors from an assumed $M_{m,n}$. To select the PMI/RI, a rate estimate vector $r_{i,m,n}$ is made for each RB $i$ and for each possible $(m, n)$ PMI/RI combination. Assume, without loss of generality, a total transmission power $P$ and additive unit-variance white Gaussian noise on each RXA. Take the case of $n = N_r = 2$. If the power is split equally between the two beamvectors in $M_{m,2}$, the rate estimate vector for RB $i$ for $n = 2$ is given by

$$r_{i,m,n} = [r_{i,m,n}(1),\ r_{i,m,n}(2)] \tag{3}$$

where $r_{i,m,n}(b) = \log_2\left(1 + \mathrm{sinr}_{i,m,n}(b)\right)$ and

$$\mathrm{sinr}_{i,m,n}(b) = \frac{s_{i,m,n,b}(b)}{1 + \sum_{b' \ne b} s_{i,m,n,b}(b')} \tag{4}$$

$$s_{i,m,n,b} = [s_{i,m,n,b}(1),\ s_{i,m,n,b}(2)] = \frac{P}{n}\left|\bar{w}_{i,m,n,b}^H H_i^H M_{m,n}\right|^2$$

with $|\cdot|^2$ taken element-wise.

¹ Estimation errors do not have to lead to a loss in the DoFs supported by MU-MIMO, provided that both estimation and MU-MIMO processing are done correctly [3]. However, such processing is challenging within the context of the LTE design, where knowledge to define the 2D-MMSE interpolation is limited.

The estimate above assumes the use of an MMSE receive filter $\bar{w}_{i,m,n,b} = w_{i,m,n,b}/|w_{i,m,n,b}|$ for decoding stream $b$. Here $w_{i,m,n,b}$ is the $b$th column of $W_{i,m,n} \in \mathbb{C}^{2\times 2}$, where

$$W_{i,m,n} = \sqrt{\frac{P}{2}}\left(\frac{P}{2} H_i^H M_{m,n} M_{m,n}^H H_i + I_{2\times 2}\right)^{-1} H_i^H M_{m,n}$$

and $I_{2\times 2}$ is the 2 × 2 identity matrix. For $n = 1$ the equations above can be modified appropriately, and the receive filter defaults to Maximal Ratio Combining (MRC) between RXAs.

We assume subband-based feedback. For this, RBs are divided into non-overlapping subbands of $F$ contiguous RBs, and RBs in a subband share a common PMI, RI, and CQI. The selected PMI, RI and CQI, $\mathrm{PMI}^*_{i_1}$, $\mathrm{RI}^*_{i_1}$, and $\mathrm{CQI}^*_{i_1}$, for a block of RBs $i_1, i_1 + 1, \ldots, i_1 + F - 1$ are given by

$$(\mathrm{PMI}^*_{i_1}, \mathrm{RI}^*_{i_1}) = (m^*, n^*) = \arg\max_{1 \le m \le 16,\ 1 \le n \le 2}\ \sum_{i=i_1}^{i_1+F-1} \sum_{b=1}^{n} r_{i,m,n}(b)$$

$$\mathrm{CQI}^*_{i_1}(b) = 2^{\frac{1}{F}\sum_{i=i_1}^{i_1+F-1} r_{i,m^*,n^*}(b)}, \quad 1 \le b \le n^* \tag{5}$$

As noted, the PMI selected from the Release 8 PM codebook is communicated with 4 bits, and the RI with 2 bits (since, in general, ranks up to 4 can be tested). The CQI feedback is also quantized. However, in the simulations that follow we consider unquantized CQI feedback based on (5).
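Given per-RB rate estimates such as those in (3)–(4), the subband selection in (5) is a joint arg-max over $(m, n)$ followed by an average over the $F$ RBs. The sketch below assumes the rates are supplied as one dictionary per RB, keyed by hypothetical `(m, n)` tuples. It leaves the CQI unquantized, as assumed in the simulations.

```python
def select_subband(rates, F):
    """Subband PMI/RI/CQI selection following (5).

    rates: list of F dicts (one per RB in the subband) mapping each (m, n)
           candidate to its per-stream rates [r_{i,m,n}(1), ..., r_{i,m,n}(n)].
    Returns (m*, n*, [CQI*(1), ..., CQI*(n*)]).
    """
    if len(rates) != F:
        raise ValueError("expected one rate table per RB in the subband")
    # Joint arg-max over (m, n) of the rate summed over the RBs and streams.
    best = max(rates[0], key=lambda mn: sum(sum(rb[mn]) for rb in rates))
    m_star, n_star = best
    # CQI per stream: 2 raised to the rate averaged over the F RBs.
    cqi = [2.0 ** (sum(rb[best][b] for rb in rates) / F) for b in range(n_star)]
    return m_star, n_star, cqi


if __name__ == "__main__":
    toy = [{(1, 1): [2.0], (2, 2): [1.5, 1.0]},
           {(1, 1): [2.0], (2, 2): [1.5, 1.0]}]
    print(select_subband(toy, F=2))
```

In the toy input, rank 2 wins because its total rate (5 bits/s/Hz over the subband) exceeds that of rank 1 (4 bits/s/Hz). Note that the rank and PMI are chosen on total rate, not per stream.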

C. Composite Channel Estimation

The training of composite channels, i.e. the net effect of a beamvector as seen through a channel, is supported by DMRS REs. These REs are precoded by the beamvectors. Assume two QAM values $\eta_1, \eta_2$ known to both the transmitter and receiver. Also assume that the system is transmitting two streams on two unit-norm beamvectors $v_1$ and $v_2$, $v_k = [v_{k,1}, \ldots, v_{k,N_t}]^T \in \mathbb{C}^{N_t \times 1}$, and that the pilot for $v_k$ is given power $P_k$. Referring to Figure 1, the values (d1, d2) transmitted on antenna $a$ are

$$\begin{bmatrix} d_1(a) \\ d_2(a) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \sqrt{P_1}\, v_{1,a}\, \eta_1 \\ \sqrt{P_2}\, v_{2,a}\, \eta_2 \end{bmatrix} \tag{6}$$

Such DMRS references can be used, via estimation and 2D interpolation, to estimate the composite channel $\sqrt{P_k}\, h_q^H(t_{re}, f_{re})\, v_k$, $k = 1, 2$, as seen on RXA $q$ with respect to beamvector $v_k$ at any RE “re”. Similarly, for rank-3 and rank-4 transmission we assume that values (d3, d4) support composite channel estimation for the 3rd and 4th beamvectors². However, for rank-1 and rank-2 transmission, (d3, d4) are used as data REs.
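The cover code in (6) is what lets a mobile separate the two composite channels without knowing $v_1$ or $v_2$. The sketch below shows this in the simplest case. It assumes a noiseless channel that is identical on the two DMRS REs carrying d1 and d2. The channel, beamvector, power and pilot values are hypothetical, and interpolation to other REs is not modeled.

```python
import math


def dmrs_values(v1, v2, P1, P2, eta1, eta2):
    """Per-antenna DMRS values (d1(a), d2(a)) from (6): cover code [[1, 1], [1, -1]]."""
    x1 = [math.sqrt(P1) * v * eta1 for v in v1]
    x2 = [math.sqrt(P2) * v * eta2 for v in v2]
    return [a + b for a, b in zip(x1, x2)], [a - b for a, b in zip(x1, x2)]


def receive(h_q, d):
    """Noiseless sample on RXA q: h_q^H d (channel assumed identical on both DMRS REs)."""
    return sum(h.conjugate() * x for h, x in zip(h_q, d))


def despread(r1, r2, eta1, eta2):
    """Undo the cover code, recovering sqrt(P_k) h_q^H v_k for k = 1, 2."""
    return (r1 + r2) / (2 * eta1), (r1 - r2) / (2 * eta2)


if __name__ == "__main__":
    v1 = [0.5, 0.5, 0.5, 0.5]                 # hypothetical unit-norm beamvectors
    v2 = [0.5, -0.5, 0.5j, -0.5j]
    h_q = [1.0, 1j, -1.0, 0.5]                # hypothetical channel to RXA q
    eta1, eta2 = (1 + 1j) / math.sqrt(2), (1 - 1j) / math.sqrt(2)
    d1, d2 = dmrs_values(v1, v2, 0.5, 0.5, eta1, eta2)
    print(despread(receive(h_q, d1), receive(h_q, d2), eta1, eta2))
```

Adding and subtracting the two received samples cancels the other layer exactly only if the channel is the same on both REs. Any change across those REs leaks one composite channel into the other's estimate.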

III. CHANNEL MODELS AND CORRELATIONS IN CSI

For each pair of TXA and RXA we consider an $L$-tap model, in which the time-varying channel response at time $t$ is

$$\omega(t;\tau) = \sum_{p=1}^{L} a_p(t)\,\delta(\tau - \tau_p) \tag{7}$$

Here $t$ represents absolute time and $\tau$ the “auxiliary” time index. The values $a_1(t), \ldots, a_L(t)$ are independent and vary (continuously) in time according to a complex Gaussian random process defined by Jakes’ model. The $p$-th tap has average power $\alpha_p^2$, and we normalize such that $\sum_{p=1}^{L} \alpha_p^2 = 1$.

Fig. 2. $|E[H(t;f)H^*(t+\Delta t;f+\Delta f)]|^2$ of a 1 × 1 TU channel.

² This is different from MU-MIMO in Release 10, where d1 and d2 are used for all transmission ranks, with scrambling for pseudo-orthogonalization among the references for beamvectors 1, 2, 3 and 4.

The Fourier transform of $\omega(t;\tau)$ with respect to $\tau$ defines the frequency response $H(t, f)$. Under the above model, the samples $H(t, f)$ are jointly Gaussian. The correlation for a shift of $\Delta f$ Hz and $\Delta t$ seconds with a Doppler frequency $f_D$ (in Hz) is given by

$$E[H(t;f)\,H^*(t+\Delta t;f+\Delta f)] = J_0(2\pi f_D \Delta t)\sum_{p=1}^{L}\alpha_p^2\, e^{j2\pi\Delta f\,\tau_p} \tag{8}$$

where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind. The values of $L$, $\{\tau_p\}_{p=1}^{L}$, and $\{\alpha_p\}_{p=1}^{L}$ are defined for the Pedestrian A (PedA) and Typical Urban (TU) models in [2, Tables 20.1 and 20.3]. The PedA and TU models have low and high frequency selectivity, respectively. To understand the behavior of the correlation values (8) in practice, consider the TU model and the 1 × 1 channel between a single TXA and a single RXA. Assume a Doppler of $f_D = 5.55$ Hz (∼3 km/h at a carrier frequency of 2 GHz). Figure 2 illustrates $|E[H(t;f)H^*(t+\Delta t;f+\Delta f)]|^2$ for the TU model.

One can immediately see that for the TU model the channel de-correlates very quickly between RBs (an RB is approximately 180 kHz wide). This implies that subband-based feedback with $F > 1$ can be a major source of CSI error for scenarios similar to the one modeled by the TU parameters. In contrast, for the PedA model $|E[H(t;f)H^*(t+\Delta t;f+\Delta f)]|^2 \ge 0.9$ for $|\Delta f| \le 480$ kHz and $|\Delta t| \le 10$ msec. This allows effective use of higher $F$ values, e.g. $F = 6$, as we will explore.

IV. MIMO SCHEDULING AND LINK ADAPTATION

The (PMI, RI) feedback $(m^*, n^*)$ available to the base station for a user in a given subband of $F$ RBs defines $n^*$ beamvectors as the columns of $M_{m^*,n^*}$. The CQI feedback provides an estimate of the rate that may be supported by each of these vectors. In 3GPP, an outer-loop adaptation process is usually assumed to compensate for a slowly varying mismatch in such CQI feedback [9]. Here each CQI value $\mathrm{CQI}^*(b)$ is adjusted by an outer-loop-adapted, user-specific offset value $\Delta_u$ to give $\mathrm{CQI}^*(b) - \Delta_u$. The value $\Delta_u$ adapts based on prior transmissions, increasing (decreasing) when there is a decoding failure (success) on the first transmission of a HARQ process [9].

For SU-MIMO, the beamvectors used in transmission are the columns of $M_{m^*,n^*}$. In the simulations $N_r = 2$, and users can flexibly select rank $n^* = 1$ or 2. To consider scheduling, we simulate a system in which all users have the same average SNR. Thus, for any given scheduling instance, a fair criterion is to select the user with the maximum supported estimated rate. This rate is computed from the outer-loop adjusted CQI values, summed over the beamvectors used. The adjusted CQI values for this user are used to select the MCS values for each of the beamvectors. Here each MCS value corresponds to a preset range of CQI values.

For MU-MIMO we assume a scenario where each user sends only rank-1 feedback, i.e. $n^* = 1$ for all users. Feedback from a user $k$ therefore defines a single precoding vector $q_k \in \mathbb{C}^{N_t \times 1}$. Scheduling exhaustively considers all combinations of $n$ users, $1 \le n \le N_t$. For a given combination of $n$ users with user indices $k_1, k_2, \ldots, k_n$, the vectors $Q = [q_{k_1}, q_{k_2}, \ldots, q_{k_n}]$ are handled as quantized preferred beamforming directions. By applying (non-regularized) LZFB to such directions, new beamvectors can be defined by

$$V = [v_1, \ldots, v_n] = \left[\frac{g_1}{|g_1|}, \ldots, \frac{g_n}{|g_n|}\right] \tag{9}$$

where

$$[g_1, \ldots, g_n] = Q(Q^H Q)^{-1} \tag{10}$$

Both $V$ and $Q$ depend on $k_1, k_2, \ldots, k_n$, but we drop this dependence in the notation for simplicity.

The rank-1 feedback of each user does not give the base station enough information to consider how users’ receive filters (which do MRC for rank-1 reception) may adjust to the new beamvectors. To model such effects, assume the precoder in (9) and users $k_1, k_2, \ldots, k_n$. The outer-loop adjusted CQI of user $k_j$ is then adjusted by a multiplicative factor $|v_j^H q_{k_j}|^2$, which accounts for the alignment of the new precoding direction relative to $q_{k_j}$. However, this CQI prediction is imperfect. It implicitly assumes that the MRC does not change and that the zero-forcing is perfect, which is not the case given the imperfect knowledge and variability of the CSI. As a result, the scheduling process has an inefficiency due to the imperfect CQI prediction.

As with SU-MIMO, in a given simulation condition all users have the same average SNR. Thus, for any given scheduling instance, a fair scheduling criterion is to select the rank $n$ and the combination $\{k_1, \ldots, k_n\}$ of users with the maximum estimated sum rate. MCS values for users are driven by the predicted CQIs, i.e. by the outer-loop adjusted CQIs scaled by the corresponding $|v_j^H q_{k_j}|^2$ factors.

V. SUPPORTING HIGHER PMI FEEDBACK

There are clearly many imperfections in MU-MIMO processing and scheduling, e.g. the inability to accurately determine post-beamforming CQIs with the limited feedback. However, we focus on a dominant source of inefficiency for some scenarios: the limited PMI feedback rate. As noted, LTE presently supports a 4-bit PMI feedback, which inherently represents a very coarse quantization of the CSI. Increasing the PMI feedback rate is expected to improve CSI accuracy and MU-MIMO efficiency.

We now describe a structured PMI design for a $B$-bit PMI codebook. The design is outlined in more detail in [10].

A. General codebook structure

A rank-1 element $m \in \mathbb{C}^{4 \times 1}$ of a $B$-bit codebook $\mathcal{C}_B$ is represented by a separate quantization of gain and phase information. To support low-complexity storage and low-complexity PMI searches [10], the 4 × 1 vector is represented by separate information using two 2-dimensional sub-vectors. Each rank-1 element $m$ has the following structure:

$$m = [w(i_1),\ w(i_2),\ w(i_3),\ w(i_4)]^T \tag{11}$$

where $\Pi = [i_1, i_2, i_3, i_4]$ is a permutation of $\{1, 2, 3, 4\}$ and

$$w = [w(1),\ w(2),\ w(3),\ w(4)]^T \tag{12}$$

with subvectors

$$[w(1), w(2)] = [g(1),\ e^{j\theta_1} g(2)], \qquad [w(3), w(4)] = e^{j\theta_3}\,[g(3),\ e^{j\theta_2} g(4)]$$

and

$$g = [g(1), g(2), g(3), g(4)], \quad g(k) \in \mathbb{R},\ g(k) \ge 0$$

For an efficient representation at low PMI rates, the gain vector $g$ and the permutation $\Pi$ are jointly coded using $\beta_g$ bits. The two intra-subvector relative phases $\theta_1$ and $\theta_2$ are coded using $\beta_{p1}$ and $\beta_{p2}$ bits respectively, and the inter-subvector phase $\theta_3$ is coded using $\beta_{p3}$ bits. For a $B$-bit codebook, $\beta_g + \beta_{p1} + \beta_{p2} + \beta_{p3} = B$. Furthermore, a simple extension of each rank-1 element to an orthogonal rank-2 element can be given by

$$m = [w(i_1),\ w(i_2),\ w(i_3),\ w(i_4)]^T \tag{13}$$

$$w = \left[e^{j(\nu+\theta_3)}\left[-e^{-j\theta_1} g(2),\ g(1)\right],\ \left[-e^{-j\theta_2} g(4),\ g(3)\right]\right]^T$$

where the rank-2 element uses the same values $\Pi$, $g$, $\theta_1$, $\theta_2$ and $\theta_3$ as the corresponding rank-1 element. Here $\nu$ is a phase value that can be flexibly set to a known value in the range $[0, 2\pi]$. Orthogonal rank-3 and rank-4 elements can also be defined [10], thus defining a PM structure with $B$ PMI bits.

Note that for any target channel direction $q \in \mathbb{C}^{4 \times 1}$, the rank-1 elements of the codebook are designed to quantize the channel direction, not the precise phases of the vector. Specifically, for a target $q$ the optimal rank-1 element in the codebook is selected by $m_{\mathrm{opt}} = \arg\max_{m \in \mathcal{C}_B} |m^H q|$. Thus the optimal selection is not affected by an overall phase scaling of either the target vector or the codebook elements. This is reflected in $w(1)$ in (12) being real and non-negative.

TABLE I
AN EXAMPLE OF A 3-BIT JOINT (g, Π) CODEBOOK

Index   Π               g = [g(1), g(2), g(3), g(4)]
0       [1, 2, 3, 4]    [1/2, 1/2, 1/2, 1/2]
1       [1, 2, 3, 4]    [λ, λ, δ, δ]
2       [1, 3, 2, 4]    [λ, λ, δ, δ]
3       [1, 4, 2, 3]    [λ, λ, δ, δ]
4       [2, 3, 1, 4]    [λ, λ, δ, δ]
5       [2, 4, 1, 3]    [λ, λ, δ, δ]
6       [3, 4, 1, 2]    [λ, λ, δ, δ]

where λ = √(ρ/(2 + 2ρ)) and δ = √(1/(2 + 2ρ)).

B. Joint gain-permutation and phase quantization

Table I describes a joint gain vector $g$ and permutation $\Pi$ codebook design for $\beta_g = 3$. We use a gain parameter $\rho = 4$ (6 dB) in our simulations. The permutations for index values 1, ..., 6 represent the full set of 6 possible choices of the two antennas that receive the larger gain values $[g(1), g(2)]$. Note that there is one free index value (value 7). In our simulations this value is used to indicate that the (PMI, RI) values refer to the Release 8 codebook, thus embedding the Release 8 codebook into the design.
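Table I can be written down directly for $\rho = 4$. The sketch below checks that every gain vector has unit norm, and that the $\lambda^2/\delta^2$ power ratio equals $\rho$ (about 6 dB). It also checks that indices 1 to 6 select six distinct antenna pairs for the larger gains. For that last check, the sketch assumes $\Pi$ lists the antenna positions that receive $w(1), \ldots, w(4)$, so that the first two entries of $\Pi$ name the strong pair. This is the reading under which the six listed permutations give six different pairs. It is the inverse of the index mapping as literally written in (11), and the names used are hypothetical.

```python
import math

RHO = 4.0                                 # gain parameter rho = 4 (about 6 dB)
LAM = math.sqrt(RHO / (2 + 2 * RHO))      # larger gain, lambda
DEL = math.sqrt(1 / (2 + 2 * RHO))        # smaller gain, delta

# Index -> (Pi, g) as listed in Table I; index 7 is left free.
TABLE_I = {
    0: ([1, 2, 3, 4], [0.5, 0.5, 0.5, 0.5]),
    1: ([1, 2, 3, 4], [LAM, LAM, DEL, DEL]),
    2: ([1, 3, 2, 4], [LAM, LAM, DEL, DEL]),
    3: ([1, 4, 2, 3], [LAM, LAM, DEL, DEL]),
    4: ([2, 3, 1, 4], [LAM, LAM, DEL, DEL]),
    5: ([2, 4, 1, 3], [LAM, LAM, DEL, DEL]),
    6: ([3, 4, 1, 2], [LAM, LAM, DEL, DEL]),
}


def strong_pair(perm):
    """Antennas receiving g(1) and g(2), assuming Pi lists the positions of w(1)..w(4)."""
    return frozenset(perm[:2])


if __name__ == "__main__":
    print("lambda = %.4f, delta = %.4f" % (LAM, DEL))
    print("gain ratio = %.2f dB" % (10 * math.log10(LAM ** 2 / DEL ** 2)))
    for idx in range(1, 7):
        print(idx, sorted(strong_pair(TABLE_I[idx][0])))
```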

For a gain-permutation codebook design with $\beta_g = 0$, there is a single $(g, \Pi)$ combination, with $\Pi = (1, 2, 3, 4)$ and $g = [1/2, 1/2, 1/2, 1/2]$. This design shares a property with the Release 8 codebook: its elements have “constant modulus”, with $|w(i)| = 1/2$ for all $i$ and for all vectors. However, such a design becomes inefficient for larger $B$ and in cases where the CSI between antennas is less correlated.

The phase codebooks each represent a uniform sampling of phases from $[0, 2\pi)$, with a sampling granularity defined by the number of bits. For example, for the coding of $\theta_1$ the $\beta_{p1}$-bit codebook consists of $2^{\beta_{p1}}$ possible values $\{\phi : \phi = e^{j2\pi k/2^{\beta_{p1}}},\ k = 0, \ldots, 2^{\beta_{p1}} - 1\}$. Combinations of all possible parameter values $\Pi$, $g$, $\theta_1$, $\theta_2$ and $\theta_3$ in (11) and (13), together with the embedded Release 8 elements³, define all possible rank-1 and rank-2 matrix values $M_{k,1}$ and $M_{k,2}$. These can be used for the case $N_r = 2$ in the PMI/RI/CQI search in Section II-B.
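For small codebooks, the rank-1 selection $m_{\mathrm{opt}} = \arg\max_{m} |m^H q|$ can simply be done by exhaustive search. The sketch below enumerates the 8-bit configuration of Section VI ($\beta_g = 0$, $\beta_{p1} = \beta_{p3} = 3$, $\beta_{p2} = 2$), which gives $2^8 = 256$ rank-1 elements. It then shows that the selected element is unchanged when the target is multiplied by a common phase. The sketch does not model the replacement of 16 elements by the Release 8 PMs, and the function names are hypothetical.

```python
import cmath
import itertools
import math


def phase_codebook(beta):
    """Uniform phases 2*pi*k / 2**beta, k = 0, ..., 2**beta - 1."""
    return [2 * math.pi * k / 2 ** beta for k in range(2 ** beta)]


def rank1_codebook(gain_perm, b1, b2, b3):
    """All rank-1 elements for the given (Pi, g) entries and phase bit allocations."""
    book = []
    for (perm, g), t1, t2, t3 in itertools.product(
            gain_perm, phase_codebook(b1), phase_codebook(b2), phase_codebook(b3)):
        w = [g[0], cmath.exp(1j * t1) * g[1],
             cmath.exp(1j * t3) * g[2], cmath.exp(1j * (t3 + t2)) * g[3]]
        book.append([w[i - 1] for i in perm])     # m = [w(i1), ..., w(i4)]^T as in (11)
    return book


def select_rank1(book, q):
    """Index of m_opt = argmax_m |m^H q| (exhaustive search)."""
    return max(range(len(book)),
               key=lambda k: abs(sum(m.conjugate() * x for m, x in zip(book[k], q))))


# beta_g = 0: a single constant-modulus (Pi, g) entry.
CONST_MODULUS = [([1, 2, 3, 4], [0.5, 0.5, 0.5, 0.5])]

if __name__ == "__main__":
    book = rank1_codebook(CONST_MODULUS, 3, 2, 3)
    q = [0.6, 0.2 + 0.3j, -0.4j, 0.5]             # arbitrary target direction
    print(len(book), select_rank1(book, q))
```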

C. Singular Value Decomposition (SVD)-based PMI selection

Using the PMI/RI/CQI search described in Section II-B with large values of $B$, e.g. $B = 12$, can clearly be prohibitively complex. An alternative search, relevant to channels with sufficient coherence over frequency, is an SVD-based search. Here a covariance estimate

$$R_{i_1} = \frac{1}{F}\sum_{i=i_1}^{i_1+F-1} H_i H_i^H$$

is made for a block of RBs $i_1, i_1 + 1, \ldots, i_1 + F - 1$, and an SVD is performed on $R_{i_1}$. For rank-1 feedback, the dominant right principal eigenvector is quantized directly by the rank-1 elements $M_{k,1} = [m_{k,a_{1,1}}] \in \mathbb{C}^{N_t \times 1}$ in the PMI codebook. If the principal eigenvector is $q$, the optimal rank-1 element is $m_{\mathrm{opt}} = \arg\max_{m_{k,a_{1,1}} \in \mathcal{C}_B} |m_{k,a_{1,1}}^H q|$. The PMI bits are formed accordingly, and the CQI values are determined based on $m_{\mathrm{opt}}$. For rank-2 feedback, the two dominant right principal eigenvectors are quantized using the rank-2 extension mentioned previously. In the SVD search (both rank-1 and rank-2), the optimal element can be determined with a low-complexity search as described in [10]. Here the searches are made on individual parameters in sequence, and the search complexity scales with the size of the individual parameter codebooks, e.g. of order $2^{\beta_g} + 2^{\beta_{p1}} + 2^{\beta_{p2}} + 2^{\beta_{p3}}$.

³ To embed the Release 8 4-bit PMI codebook when using $\beta_g = 0$, 16 elements of the larger codebook are replaced with the Release 8 PMs.
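The SVD-based search can be sketched in the same style. It forms $R_{i_1}$, extracts its principal eigenvector, and quantizes that vector, e.g. with the arg-max rule above. Here the eigenvector is found by power iteration rather than a full SVD. For a Hermitian positive semi-definite matrix this gives the same vector up to a phase, provided the two largest eigenvalues differ. The last helper compares the candidate count of a per-parameter sequential search with that of exhaustive search for the bit allocations of Section VI. The exact sequential procedure of [10] is not reproduced, and all names are hypothetical.

```python
import math


def covariance(H_list):
    """R = (1/F) * sum_i H_i H_i^H for N_t x N_r estimates H_i stored as lists of rows."""
    F, nt, nr = len(H_list), len(H_list[0]), len(H_list[0][0])
    R = [[0j] * nt for _ in range(nt)]
    for H in H_list:
        for a in range(nt):
            for b in range(nt):
                R[a][b] += sum(H[a][r] * H[b][r].conjugate() for r in range(nr)) / F
    return R


def principal_eigvec(R, iters=100):
    """Dominant eigenvector of a Hermitian PSD matrix by power iteration.

    Starts from the column of R with the largest diagonal entry.
    """
    nt = len(R)
    start = max(range(nt), key=lambda a: R[a][a].real)
    v = [R[a][start] for a in range(nt)]
    for _ in range(iters):
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        v = [x / norm for x in v]
        v = [sum(R[a][b] * v[b] for b in range(nt)) for a in range(nt)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def search_cost(betas):
    """(sequential per-parameter candidates, exhaustive candidates) for a bit allocation."""
    return sum(2 ** b for b in betas), 2 ** sum(betas)


if __name__ == "__main__":
    print("12-bit:", search_cost([3, 3, 3, 3]))   # beta_g, beta_p1, beta_p2, beta_p3
    print("8-bit: ", search_cost([0, 3, 2, 3]))
```

For the 12-bit allocation, the sequential search examines 32 candidates instead of 4096. This is what makes large $B$ practical in the SVD-based search.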

VI. SIMULATION RESULTS

We performed Link Level Simulations using the transmission parameters described in Section II with $N_t = 4$ and $N_r = 2$, and an assumed carrier frequency of 2 GHz. Channel states on different antennas are assumed to be independent. Channel coding is performed on Bernoulli equiprobable binary random bit sequences, using the turbo coding and the MCS set in [2, Table 10.1]. HARQ uses Incremental Redundancy and the circular buffer design described in [2]. Outer-loop CQI control ensures a block error rate of ≤ 10% on the first transmission. Scheduled streams are assigned equal transmission power.

Feedback uses a subband size of $F = 6$, and the CQI/PMI/RI reporting delay is 5 msec. The scheduling delay is 4 msec. There are 8 users reporting PMI/CQI/RI. Users in SU-MIMO simulations can select rank 1 or 2. For MU-MIMO, users send only rank-1 feedback, and a maximum of 4 users can be scheduled. The CSI-RS power boost (relative to data REs) is 3 dB. The power ratio of a DMRS RE to a data RE is 0 dB. Data streams are given equal power. Decoding for SU-MIMO uses MMSE decoding. Decoding for MU-MIMO uses MRC.

For feedback we consider the 4-bit PMI Release 8 codebook and larger 8- and 12-bit PMI codebooks. For the 8-bit PMI codebook, $\beta_g = 0$, $\beta_{p2} = 2$, and $\beta_{p1} = \beta_{p3} = 3$. For the 12-bit PMI codebook, $\beta_g = \beta_{p1} = \beta_{p2} = \beta_{p3} = 3$. For the SVD search, ideal rank-1 or rank-2 feedback is also considered (such cases are labeled “Ideal SVD”). Here the base station has ideal knowledge, for each user, of the principal eigenvector(s) and the corresponding CQI value(s). Throughput is the average throughput over the 10 MHz as supported by data REs (accounting for overheads in other types of REs), as described in [11]. The throughputs shown are average values from 1000 random initializations of channel states.

A. Ideal static channels: Flat fading and zero Doppler

Figure 3 shows the results for the ideal static channel case of Rayleigh flat fading and zero Doppler. Here issues of feedback/scheduling delay and of CSI-RS and DMRS interpolation are either not present or can be eliminated under appropriate assumptions. Specifically, for the “Ideal SVD” cases, results are also provided assuming ideal CSI-RS and ideal DMRS CSI knowledge at all REs, in order to remove CSI estimation and interpolation mismatch. These cases are additionally labeled “Ideal CSI-RS/DMRS”. Finite PMI feedback cases for $B = 4, 8, 12$ use the search in Section II-B.

A first comparison between SU-MIMO and MU-MIMO is for the cases of ideal feedback and ideal CSI-RS/DMRS, labeled “Ideal SVD, Ideal CSI-RS/DMRS”. Here one can clearly see the multiplexing gain that MU-MIMO provides over SU-MIMO, given its ability to schedule up to rank 4. However, for the MU-MIMO case labeled “Ideal SVD”, which corresponds to ideal feedback but practical CSI-RS and DMRS estimation, MU-MIMO loses performance. In fact, unlike SU-MIMO even under finite PMI feedback, MU-MIMO saturates in performance as it becomes inter-layer self-interference limited. In principle this should not happen [3], though the effect can be much less apparent at lower PMI feedback rates, where CSI errors are dominated by the limited PMI feedback. For higher (and perfect) feedback, this behavior reflects in part the mismatch in the CSI-RS interpolation and DMRS channel estimation, both of which are tuned assuming highly frequency-selective channels [8, Appendix A].

[Figure 3: total throughput [Mbps] (0 to 10) versus average received SNR per RX antenna [dB] (0 to 20). Curves: SU-MIMO: Rel. 8 4 bit PMI; SU-MIMO: 8 bit PMI; SU-MIMO: Ideal SVD, Ideal CSI-RS/DMRS; MU-MIMO: Rel. 8 4 bit PMI; MU-MIMO: 8 bit PMI; MU-MIMO: 12 bit PMI; MU-MIMO: Ideal SVD; MU-MIMO: Ideal SVD, Ideal CSI-RS/DMRS.]

Fig. 3. Average user throughput for Rayleigh flat fading and zero Doppler.

For 8- and 12-bit PMI feedback, MU-MIMO eventually becomes self-interference limited with increased SNR. For SNRs in the range 2 to 8 dB, MU-MIMO requires about 8 bits of PMI feedback to match the performance of SU-MIMO with 4-bit PMI feedback. Over the same SNR range, MU-MIMO requires about 12 bits of PMI feedback to match the performance of SU-MIMO with ideal SVD feedback. Several measures could further improve MU-MIMO performance for a given PMI feedback rate [11]. These include adjusting the channel estimation/interpolation, improving CQI prediction for MU-MIMO, and accounting for CSI error in LZFB via regularization.

B. PedA and TU Channels

Figure 4 shows the results for the Pedestrian A model witha Doppler 5.55 Hz. All cases shown consider practical CSI-RS and DMRS estimation. For MU-MIMO with 12 bit PMIfeedback both the PMI/RI/CQI search in Section II-B and theSVD search in Section V-C are considered. Here the benefitof increased PMI feedback for MU-MIMO is quite apparent,with MU-MIMO providing gains over SU-MIMO. Note, MU-MIMO is now also interference limited due to changes in

6964

[Fig. 4 plot: total throughput [Mbps], 0 to 5. Legend: SU-MIMO with Rel. 8 4-bit PMI and 8-bit PMI (PMI/CQI search); MU-MIMO with Rel. 8 4-bit PMI, 8-bit PMI, and 12-bit PMI (PMI/CQI search), 12-bit PMI (SVD search), and ideal SVD feedback.]

Fig. 4. Average user throughput for Pedestrian A and Doppler 5.55 Hz.

the channel caused by Doppler over the feedback/scheduling delay (even for “Ideal CSI” cases, not shown). However, at this low Doppler such effects can be secondary to those created by PMI quantization.
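The claim that Doppler-induced CSI aging is secondary at 5.55 Hz can be checked with a rough calculation. This is a minimal sketch assuming Clarke/Jakes fading, for which the time correlation of a fading coefficient after a delay τ is J0(2π f_d τ). The 2 GHz carrier and the 5 ms feedback/scheduling delay are hypothetical values chosen for illustration; neither is stated in this section.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def speed_from_doppler(f_d_hz, carrier_hz):
    """Terminal speed (m/s) implied by a maximum Doppler shift f_d = v * f_c / c."""
    return f_d_hz * SPEED_OF_LIGHT / carrier_hz


def bessel_j0(x, terms=30):
    """Power series J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2; accurate for small x."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * (x / 2) ** (2 * k) / math.factorial(k) ** 2
    return total


def csi_aging_correlation(f_d_hz, delay_s):
    """Clarke/Jakes correlation between a fading coefficient now and after delay_s."""
    return bessel_j0(2 * math.pi * f_d_hz * delay_s)


f_d = 5.55
v = speed_from_doppler(f_d, 2.0e9)          # hypothetical 2 GHz carrier
print(f"speed ~ {v * 3.6:.2f} km/h")

rho = csi_aging_correlation(f_d, 0.005)     # hypothetical 5 ms delay
# For jointly Gaussian unit-variance coefficients, 1 - rho^2 is the MMSE of
# predicting the delayed coefficient from the reported one.
print(f"correlation {rho:.4f}, normalized aging error {1 - rho ** 2:.4f}")
```

Under these assumptions the 5.55 Hz Doppler corresponds to roughly 3 km/h, and the correlation over the delay stays above 0.99, so the aging error is small compared with the quantization error of a coarse PMI codebook.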

Finally, Figure 5 shows the results for the Typical Urban (TU) model with a Doppler of 5.55 Hz. Here, as illustrated in Figure 2, channels are not coherent over the F = 6 RB subband, so F = 6 subband-based feedback creates additional CSI mismatch. The effect is more severe for MU-MIMO given inter-layer interference. In fact, for such channels the SVD search is of little use, though we present it here for completeness and for comparison with PedA. As shown in [5], channels with high frequency selectivity can benefit from a smaller F.
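The trade-off behind choosing F can be sketched with two rules of thumb: a subband is roughly coherent if its bandwidth (F RBs of 180 kHz each) does not exceed the coherence bandwidth B_c ≈ 1/(5σ_τ), and a smaller F multiplies the number of PMIs per report. The rms delay spreads (about 46 ns for PedA and 1 µs for TU), the 50-RB carrier, and the 12-bit PMI size are illustrative assumptions, not parameters taken from this paper's simulations.

```python
import math

RB_BANDWIDTH_HZ = 180e3  # one LTE resource block: 12 subcarriers x 15 kHz


def coherence_bandwidth(rms_delay_spread_s):
    """Rule-of-thumb coherence bandwidth (about 50% correlation): B_c ~ 1 / (5 sigma_tau)."""
    return 1.0 / (5.0 * rms_delay_spread_s)


def subband_is_coherent(f_rbs, rms_delay_spread_s):
    """True if a subband of f_rbs RBs fits within the coherence bandwidth."""
    return f_rbs * RB_BANDWIDTH_HZ <= coherence_bandwidth(rms_delay_spread_s)


def feedback_bits(n_rbs, f_rbs, pmi_bits):
    """PMI bits per report when one PMI is sent per subband of f_rbs RBs."""
    return math.ceil(n_rbs / f_rbs) * pmi_bits


# Hypothetical illustrative rms delay spreads.
for name, sigma in [("PedA", 46e-9), ("TU", 1.0e-6)]:
    print(name, "F=6 coherent:", subband_is_coherent(6, sigma),
          "F=1 coherent:", subband_is_coherent(1, sigma))

# Hypothetical 50-RB carrier with 12-bit PMI: halving F roughly doubles the load.
print(feedback_bits(50, 6, 12), feedback_bits(50, 3, 12))
```

Under these assumptions an F = 6 subband is well inside the PedA coherence bandwidth but several times wider than that of TU, which is consistent with the additional CSI mismatch seen for TU; the price of a smaller F is proportionally more PMI feedback.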

VII. CONCLUSION

The paper discusses elements of the 3GPP LTE design relevant to understanding and improving LTE MU-MIMO performance. In particular, it outlines the issue of CSI mismatch and its dependence on CSI estimation and interpolation (for both CSI-RS and DMRS), PMI feedback, and the time-frequency coherence of the channel. Link-level simulations are provided that demonstrate the joint effect of all such factors on MU-MIMO performance in the context of the LTE Release 10 design.

The results show that MU-MIMO can be significantly improved by increased PMI feedback for channels with higher coherence in time and frequency. Such channel conditions may be representative of small cells, an important scenario in enhancing future network capacity. A structured codebook is proposed that incorporates both gain and phase information to support increased PMI feedback for deployments using less-correlated antennas.

There are still a number of other issues that merit further study. These include the issue of CSI-RS/DMRS estimation

[Fig. 5 plot: total throughput [Mbps], 0 to 5. Legend: SU-MIMO with Rel. 8 4-bit PMI and 8-bit PMI (PMI/CQI search); MU-MIMO with Rel. 8 4-bit PMI, 8-bit PMI, and 12-bit PMI (PMI/CQI search), 12-bit PMI (SVD search), and ideal SVD feedback.]

Fig. 5. Average user throughput for Typical Urban and Doppler 5.55 Hz.

(as shown by Fig. 3) and changing the feedback granularity in frequency (as suggested by Figs. 2 and 5 and [5]). In fact, for slowly varying channels the DMRS mapping (including density and power boost) could be revised to better balance transmission resources. In addition, improved post-beamforming CQI prediction for MU-MIMO and improved MU-MIMO processing through regularized LZFB [11] could further increase MU-MIMO performance, providing more evidence that MU-MIMO gains are possible in the context of an LTE design.

REFERENCES

[1] F. Khan, LTE for 4G Mobile Broadband. Cambridge Univ. Press, 2009.

[2] S. Sesia, I. Toufik, and M. Baker, LTE: The UMTS Long Term Evolution, from Theory to Practice, 2nd ed. United Kingdom: Wiley, 2011.

[3] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “Multiuser MIMO achievable rates with downlink training and channel state feedback,” IEEE Trans. on Inform. Theory, vol. 56, Jun. 2010.

[4] N. Ravindran and N. Jindal, “Multi-user diversity vs. accurate channel state information in MIMO downlink channels,” arXiv:0907.1099.

[5] N. Jindal and S. A. Ramprashad, “Optimizing CSI feedback for MU-MIMO: Tradeoffs in channel correlation, user diversity and MU-MIMO efficiency,” VTC’2011 Spring, Budapest, Hungary, July 2011.

[6] S. A. Ramprashad, H. C. Papadopoulos, A. Benjebbour, Y. Kishiyama, N. Jindal, and G. Caire, “Cooperative cellular networks using Multi-User MIMO: Trade-offs, overheads, and interference control across architectures,” IEEE Comm. Mag., pp. 70–77, May 2011.

[7] 3GPP TS 36.211: Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation.

[8] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P.-O. Börjesson, “OFDM channel estimation by singular value decomposition,” IEEE Trans. on Comm., vol. 46, no. 7, July 1998.

[9] M. Nakamura, Y. Awad, and S. Vadgama, “Adaptive control of link adaptation for High Speed Downlink Packet Access (HSDPA) in W-CDMA,” Wireless Personal Multimedia Comm., Oct 2002.

[10] NTT DOCOMO, “Structured codebook design for 8 to 12 bit PMI feedback for DL MU-MIMO enhancement,” 3GPP TSG RAN WG1 #67, San Francisco, Nov 2011.

[11] 3GPP, R1-112434, NTT DOCOMO, “Capacity enhancement of DL MU-MIMO with increased PMI feedback bits for small-cells scenario,” 3GPP TSG RAN WG1 #66, Athens, Aug 2011.
