DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN, [email protected].
-
Upload
pierce-hardy -
Category
Documents
-
view
214 -
download
0
Transcript of DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN, [email protected].
DSP Design in Wireless CommunicationLIANG LIU AND FREDRIK EDMAN, [email protected]
The Evolving Wireless Scene
Range
Data
Rat
e
1m 10m 100m 1km 10km
1Kb
10Kb
100Kb
1Mb
10Mb
100Mb
Cellular (WAN)
3G Cellular
2.5 G Cellular
802.11 (LAN)
802.1a
Bluetooth (PAN)
Sensor networks
More bit/sec
More bit/($·nJ)
Courtesy: Prof. Jan Rabaey, BWRC
Evolution of High-Speed Wireless
1995 2000 2005 2010 2015
10-2
100
102
104
Mb
it/s
2Mb/s802.11
11Mb/s802.11b
54Mb/s802.11ag
600Mb/s802.11n
3.39Gb/s802.11ac 7Gb/s
802.11ad
9.6Kb/sGSM
72Kb/sGPRS
474Kb/sEDGE
2Mb/sHSDPA
84Mb/sHSPA+
150Mb/sLTE
1Gb/sLTE-A
WLANCellular
5G?
CDMA OFDM MIMO MassiveMIMO
BandwidthModulation
MIMO
Same in other communication systems
Algorithms beats Moore beats Chemists
1
10
100
1000
10000
100000
1000000
10000000Algorithmic Complexity
Processor Performance (~Moore’s Law)
Battery Capacity1G
2G
3G
Courtesy: Ravi Subramanian (Morphics)
Design Considerations
Flexible
High speed
Reliable
DESIGN TRADE-OFFS!
Small area
Low power
Low price
Time to market
Digital Hardware Platforms
Efficiency
Fle
xibi
lity
ASIC
GPPDSP
GPU
FPGA
ASIP
Optimizing DSP Implementation
Functionality
HardwarePlatform
DesignSpecification
?
Optimization
Algorithm
Architecture
Module
Circuit
CDMA
CDMA: Code Division Multiple Access
“A UMTS Baseband Receiver Chip for Infrastructure Applications”, Texas Instruments, S. Sriram et al.
Accumulator
UMTS filter in receiver systeman
t
RF ADC
Inband Outband
Reconfig. UMTS filter
De-scramble&
despreaddemod
data
desired signal“Architectural Optimization for Low Power in a Reconfigurable UMTS Filter”, in Proceedings of Wireless Personal Multimedia Communications Symposium (WPMC), Deepak Dasalukunte et al. San Diego, USA
33dB
Adaptive UMTS Filter• minimum 33dB stop band attenuation (3GPP
specification)
• required filter length of 65 taps
In a Bad ChannelD D D D
...but ONLY 5 is needed in a Good one!
OFDM
OFDM: Orthogonal Frequency Division Multiplexing
N-p
oin
t ID
FT
Pa
ralle
l to
se
rial
kx ,0
kx ,1
kNx ,1
CP
ks ,0
ks ,1
kNs ,1
OFDM
FFT/IFFT in OFDM Systems
Large number of subcarriers a large FFT, O(Nlog2N)OFDM:
• DVB-2/4/8k FFT• WLAN IEEE802.11a/g-64 FFT (48+4 subcarriers)• LTE – Long Term Evolution: 2k FFT
X1(k)
X2(k) WNk
X1(k)+WNkX2(k)
-1X1(k)-WN
kX2(k)
• Folding• Pipeline• Parallel
+
FFT: VLSI
Buttfly FIFO Mult
Multi-path delay commutator
50% 50% 50%
Signle-path delay feedback
50% 100% 50%
FFT: VLSI
High-throughput
Feature number
Technology 0.13-μm
Area 1.44mm2
Throughput 1GS/s
Power 39.6mW@409MS/s
“A 1-GS/s FFT/IFFT processor for UWB applications”, IEEE Journal of Solid-State Circuits. 2005, 40(8): 1726 - 1735
FFT: Multi-path delay feedback
• Symmetry of twiddle factors
– Reduce number of twiddle factors by 87.5%
– Only 8 possibilities for 64-point FFT
FFT: Twiddle Factors
1
2
3
4
5
6
7
8
• Shift-add for constant multiplier
ROM
ControlMUX
In Out
FFT: Twiddle Factors
• Shift-add for constant multiplier (64-P FFT)
FFT: Twiddle Factors
“A 1-GS/s FFT/IFFT processor for UWB applications”, IEEE Journal of Solid-State Circuits. 2005, 40(8): 1726 - 1735
MIMO
TransmitAntennas
ReceiveAntennas
SISO
The Radio Channel
MISO
Single Input Single Output
Multiple Input Single Output(Transmit diversity)
ReceiveAntennas
TransmitAntennas
MIMO
The Radio Channel
SIMO
Single Input Multiple Output(Receive diversity)
Multiple Input Multiple Output(Multiple data streams)
Multiple Antenna System, MIMO
• At least as many receivers as transmitted streams
• Spatial separation at both transmit and receive antennas
• Improve transmission throughput or reliability
SISO
A
MISO
AA
Interference
RL
MIMO!
RL
R
Interference!
L
Understanding MIMO via Audio
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
r = Hs + nS/P
H =
h11 h12 …….. h1N
h21 h22 …….. h2N
hM1 hM2 …….. hMN
. . …….. .
N
M
hij models fading gain between the jth
transmit
and ith
receive antenna
s rTransmitted vector Received vector
MIMO System Model
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
high-complexitySignal processing
r = Hs + n
sS/P
• Recover the transmitted signal s from the received signal r, which contains interference and noise.
Interference
Cancellation!
RL
MIMO Signal Processing - Receiver
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
Receiver Signal Processing
Symbol Detection
Channel Estimation
Matrix Manipulation
r = Hs + n
HH-1^
s = H-1r^ ^S/P
MIMO Signal Processing - Receiver
• Channel estimation: obtain the channel status by training signals
• Matrix manipulation: matrix inversion or decomposition depending on detection algorithm
• Symbol detection: estimate transmitted signal s given channel matrix H and received signal r
• Linear Detection: zero-forcing detection
• Maximum Likelihood (ML) Detection: exhaustive search
• 1.6×107 points per vector detection for 64-QAM, 4×4 MIMO
𝒓=𝑯𝒔+𝒏
��𝒛𝒇=𝑯−𝟏𝒓=𝒔+𝑯−𝟏𝒏
��𝒎𝒍=𝒂𝒓𝒈 max𝒔∈¿𝑸∨¿𝑵𝒑 (𝒔∨𝒓 ,𝑯)¿
¿𝒂𝒓𝒈 min𝒔∈¿𝑸∨¿𝑵 ¿𝒓 −𝑯𝒔∨¿𝟐 ¿¿
MIMO Signal Detection
WLAN 802.11n Example
Modulation 256QAM; 4 Tx antennas; 108 sub-channels, 4ms per OFDM symbol
ML detection 1.159 x 1017 points/sec
Current DSP technology is 1G inst/sec 108 processors!
OR (“Moores Law” ..... processor capability doubles every 18 months)
MUST WAIT 40years!
Mike Faulkner 2005, Victoria Univ.
Today…
Intel i7 CPU: 1011 inst/sec
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
Receiver Signal Processing
ML Detection
Channel Estimation
Matrix Manipulation
H
s S/P
High Complexity ML Detection
ML Detection Sphere Detection
Simplified 2D-case
Limited search space a reduced complexity
Sphere Decoding: Algorithm level optimization
Sphere Decoding: complexity reduced
Chia-Hsiang Yang, University of California, Los Angeles, 2007
Detection Performance Computational Complexity
• Near optimal ML performance with significantly reduced computational complexity (# search points)
5 10 15 20 25 3010
-5
10-4
10-3
10-2
10-1
100
SNR (dB)
BE
R
ML
Tree-PruningTree-Pruning+Reording
5 10 15 20 25 3010
0
101
102
103
104
105
SNR (dB)
Av
era
ge
# o
f S
ea
rch
Po
ints
ML
Tree-PruningTree-Pruning+Reording
44 array16-QAM
Total # search points=65536
Near ML detection with 0.1% computational complexity
• QR-Decomposition:• R upper triangular matrix
𝑯=𝑸𝑹
-1 1
-1 1 -1 1
-1 1 -1 1 -1 1 -1 1
-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1
Pi
inci
Root
4th layer
3rd layer
2nd layer
leaf
11 12 13 141 1
2 22 23 24 2
3 333 34
4 444
0
0 0
0 0 0
R R R Ry s
y R R R s
y sR R
y sR
rQyRsys H with||||minarg 2ML
ˆ
-1 1
-1 1 -1 1
-1 1 -1 1 -1 1 -1 1
-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1
inci
Root
4th layer
3rd layer
2nd layer
leaf
K-Best Detection, e.g., K=2
Sphere Decoding: tree-search
Design [1] Design [2] Design [3]
Gate Counter 50K 91K 491K
Throuput 136Mbps 269Mbps 1100Mbps
Sphere Decoding: VLSI
• QR-Decomposition:
– R upper triangular matrix, Q unitary matrix Q => QHQ = I
– Complexity in the order of N3 multiplications
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
Receiver Signal Processing
ML Detection
Channel Estimation
Matrix Manipulation
H
s S/P
QRD (TCAS1-11) MIMOD (ISSCC-09)
Gate Count 111K 114K
Throughput (SC/s) 12.5M 28.125M
Energy (nJ/SC/s) 12.76 5.37
QR Decomposition
Q4Q3Q2Q1H H Q1H Q2Q1H Q3Q2Q1H
Rotate by multiplying with Q1: Move all energy of first column into first element
Repeat until H becomes upper triangular.
QRD Methods
• Given’s Rotation– Works with small 2x2 rotations matrices
– Suitable for small MIMO systems
• Gram Schmidt– Operates on whole columns
– Stability problems in correlated channels
• Householder Transform– Column-wise operation
– Higher stability in correlated channels
• Householder transformation matrix
• H is an orthogonal matrix• Given vector we chose v such that
Householder Transform
Householder Transform: Example
Q4Q3Q2Q1H H Q1H Q2Q1H Q3Q2Q1H
Col 1op
Col2 op
Col3op
Col4op
H4 ... H3 ... H2 .... H1
Input matricesCol 1
opCol2 op
Col3op
Col4op
H4 ... H3 ... H2
Input matricesCol 1
opCol2 op
Col3op
Col4op
H4 ... H3
Input matricesCol 1
opCol2 op
Col3op
Col4op
H4
Input matricesCol 1
opCol2 op
Col3op
Col4op
Input matrices
Householder Transform: VLSI
• Systolic array– Suitable for pipelined data flow for example column wise matrix
operations
Anything we can do at the transmitter?
• If the receiver is not positioned directly between the speakers the received streams will be at different levels
• Equalizing at the receiver side?• Like ZF, noise enhancement
• Balance at the transmitter side• Require accurate channel information at the transmitter!
RL
L + NL, 0.5 R + NR
RL
L + NL, 2(0.5 R + NR)
2RL
L + NL, R + NR
Pre-code
MIMO Precoder: understanding via audio
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
r = Hs + n
S/P
• Zero-forcing pre-coder
𝒓=𝑯𝒙+𝒏=𝑯𝑯−𝟏 𝒔+𝒏
Tx
Tx
Tx
Tx
Data
Rx
Rx
Rx
Rx
r = Hx + n
S/P
pre
-co
din
g
xs
Low-ComplexityReceiver
[𝒓𝟏
𝒓𝟐
𝒓𝟑]=[𝟏 𝟎 𝟎
𝟎 𝟏 𝟎𝟎 𝟎 𝟏] [𝒔𝟏𝒔𝟐𝒔𝟑]+[𝒏𝟏
𝒏𝟐
𝒏𝟑]❑
Is this the best solution?
MIMO Precoder
The corresponding processor for MIMO?
Digital Signal Processors for MIMO
• Flynn’s Taxonomy– Single instruction stream, single data stream (SISD)– Single instruction stream, multiple data streams (SIMD)– Multiple instruction streams, single data stream (MISD)– Multiple instruction streams, multiple data streams (MIMD)
VLIW+SIMD
• VLIW (very long instruction word)– Instruction-level parallelism, e.g., streaming of data
• SIMD (single instruction multiple data)– Data-level parallelism, e.g., vectors
MIMO Processor Examples: Phillips EVP Vector Processor for LTE (2007)
MIMO Processor Examples: EIT Vector Processor for LTE-A (2013)
What is Next?
Goes to ‘LARGE’ Dimension?
Goes to ‘LARGE’ Dimension
• Further Scaling Up?• Size limitation in terminals• Power consumption in portable devices
Cellular Antenna WLAN Antenna
HSPA+ 2×2 802.11n 4×4
LTE 4×4 802.11ac 8×4
LTE-A 8×8 802.11? 16×16
How many antennas can we have?
2 in phone
16 in laptop [1]
24 in 80mm×80mm×80mm [2]
[1] J. B. Andersen, J. O. Nielsen, G. F. Pedersen, and G. Bauch, Channel characteristics of an indoor multiuser environment, COST 2100 TD(07) 009, 2007/Febr/26-28.[2] C-Y. Chiu, J-B. Yan, and R. D. Murch, 24-port and 36-port antenna cubes suitable for MIMO wireless communications, IEEE Trans. on Antennas and Propagation, vol. 56, no. 4, pp. 1170-1176, April 2008.
Massive MIMO or Very-Large MIMO
BS
...
A1
AM
UE1
UE2
UEK
• We think of very-large MIMO (multi-user) system• We mean M>>K>>1• We are looking for M>100 antennas!• We serve 10-20 users concurrently
F. Rusek, D. Persson, B. K. Lau, E.G. Larsson, T.L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays”, IEEE Signal Processing Magazine, Jan. 2013
Several orders of magnitudemore transmitted-energy efficient and spectrum efficient!
Very-Large Antenna Array
Figure from Erik G. Larsson “Massive MIMO: Fundamentals, Opportunities and Challenges”
128 antenna array in Lund
Simplified processing block diagram1st ver.: LTE-like OFDM-based TRX
System architecture
System components - SDR
USRP RIO 2953R (Universal Software Radio Peripheral
• 2 RF chains
– Center frequency from 1.2 to 6 GHz
– 40MHz BW
• Xilinx Kintex-7 FPGA
– 410K logic cell
– 5Mb RAM
• ~800 MBps bidirectional data streaming
BS hardware setup: side-view
Up to 800 MB/s/direction
2.8 GB/sbidirectional
Time and Freq.synchonizationnetwork
PX
Ie-8
389
PX
Ie-8
374
PX
Ie-8
374
PX
Ie-8
374
PX
Ie-8
374
…
PX
Ie-8
374
PX
Ie-8
389
PX
Ie-8
374
PX
Ie-8
374
PX
Ie-8
374
PX
Ie-8
374
…
PX
Ie-8
374
PX
Ie-8
389
PX
Ie-8
374
PX
Ie-8
374
PX
Ie-8
374
PX
Ie-8
374
…
PX
Ie-8
374
PX
Ie-8
389
PX
Ie-667
4T
PX
Ie-8
384
PX
Ie-8
384
PX
Ie-8
384
PX
Ie-7
976
PX
Ie-8
153
LuMaMi – the Lund University Massive MIMO testbed
• World first
• 100 coherent RF chains
• Flexible architecture based on NI platform and Xilinx FPGA
• Supports 10 simultaneous single antenna users in the same time-frequency resource block
• Real time operation in the 3.7 GHz band, 20 MHz bandwidth
• 300kg and 2.5kW...
Benefits by Scaling up M• “Simplified” signal processing
– Linear processing is powerful enough when M>>K» MMSE, Zero-Forcing and Matching Filter
104 multiplications for M=100 and K=10!𝐺𝑍𝐹 𝐺𝐻 (𝑮𝑮𝐻 )−1
Low-compleixty linear pre-coding
• Neumann series based ZF pre-coding– is diagonally dominante
– NS-approximation:
Low-compleixty linear pre-coding
Possible Hardware Platform• Design Challenge
– Massive parallel processing; High data traffic; High memory bandwidth
• Interconnection: Bottleneck for real massive parallelism
– Global interconnects can be millimeters long
– Routing becomes challenging
• 3D Integration
– Increase the number of “nearest neighbors” seen by each processor
– Power and delay reduction by short vertical interconnect
Conclusions• Digital signal processing is evolving at fast pace with new
wireless technologies and applications
• “Optimal” DSP implementation is crucial to bring new DSP algorithm into practice
• “Best” design achieves balanced trade-offs depending on application requirements
• Optimization at earlier stages and do cross-layer (or co-) optimization
• Keep tracking new technologies for both algorithm and implementation
Thanks