Flexible wireless communication architectures
![Page 1: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/1.jpg)
RICE UNIVERSITY
Flexible wireless communication architectures
Sridhar Rajagopal
Department of Electrical and Computer Engineering
Rice University, Houston TX
Faculty Candidate Seminar – Southern Methodist University
April 23, 2003
This work has been supported in part by NSF, Nokia and Texas Instruments
![Page 2: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/2.jpg)
Future wireless devices demand flexibility
Multiple algorithms and environments supported in same device
High data rate mobile devices with multimedia
Flexible algorithms: Multiple antennas, complex signal processing
Flexible architectures: High performance (Mbps), low power (mW)
Fast design with structured exploration
Bluetooth/Home Networks
Wireless Cellular
Wireless LAN
![Page 3: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/3.jpg)
Flexibility needed in different layers
Physical Layer
MAC Layer
Network Layer
Application Layer: Puppeteer project at Rice, http://www.cs.rice.edu/CS/Systems/Puppeteer/
Analog RF
Flexible Algorithms
Mapping
Flexible Architectures
![Page 4: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/4.jpg)
Research vision: Attain flexibility
Algorithms: flexibility to support a variety of sophisticated algorithms
Architectures: flexibility to adapt hardware to algorithms
Fast, structured design exploration
Design me
![Page 5: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/5.jpg)
Contributions: Algorithms
Multi-user channel estimation [Jnl. of VLSI Sig. Proc. ’02, ASAP ’00]
  Matrix inversions replaced by numerical techniques
  Conjugate-gradient descent for complexity reduction
Multi-user detection [ISCAS ’01]
  Block-based computation converted to streaming computation
  Pipelining, lower memory requirements
Parallel, fixed-point, streaming VLSI implementations [IEEE Trans. Wireless Comm. ’02]
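The idea of replacing an explicit matrix inversion with an iterative solver can be sketched as below. This is a minimal, real-valued conjugate-gradient solver for A x = b with A symmetric positive definite; it is an illustration only, not the published estimator. The names `cg_solve`, `matvec`, `dot` and the bound `CG_MAX_N` are assumptions of this sketch, and a real channel estimator would work on complex, fixed-point data.

```c
#include <assert.h>
#include <stddef.h>

#define CG_MAX_N 16  /* hypothetical upper bound on the system size */

/* y = A x for a dense, row-major n x n matrix. */
static void matvec(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* Inner product of two length-n vectors. */
static double dot(size_t n, const double *u, const double *v)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += u[i] * v[i];
    return acc;
}

/*
 * Solve A x = b without forming the inverse of A.
 * A must be symmetric positive definite. Starts from x = 0 and stops when
 * the residual norm drops to tol or after max_iter iterations.
 * Returns the number of iterations performed.
 */
static int cg_solve(size_t n, const double *A, const double *b,
                    double *x, int max_iter, double tol)
{
    double r[CG_MAX_N], p[CG_MAX_N], Ap[CG_MAX_N];
    assert(n > 0 && n <= CG_MAX_N);

    for (size_t i = 0; i < n; ++i) {
        x[i] = 0.0;
        r[i] = b[i];   /* residual b - A x with x = 0 */
        p[i] = b[i];   /* first search direction */
    }
    double rr = dot(n, r, r);
    int k = 0;
    while (k < max_iter && rr > tol * tol) {
        matvec(n, A, p, Ap);
        double alpha = rr / dot(n, p, Ap);       /* step length */
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;               /* direction update weight */
        for (size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++k;
    }
    return k;
}
```

Each iteration costs one matrix-vector product plus a few vector operations, all of which are regular and data parallel, which is the kind of structure the later slides map onto clusters.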
![Page 6: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/6.jpg)
Contributions: Architectures
Heterogeneous DSP-FPGA system designs: [ICSPAT’00]
Computer arithmetic [Symp. on Comp. Arith. ’01]: dynamic truncation in ASICs using on-line arithmetic with Most Significant Digit First computation
[Ph.D. Thesis]
Scalable Wireless Application-specific Processors (SWAPs)
Rapid, structured architectures with flexibility-performance tradeoffs
![Page 7: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/7.jpg)
Scalable Wireless Application-specific Processors
Family of flexible programmable processors
Clusters of ALUs
High performance by supporting 100’s of ALUs
Can provide customization for various algorithms
Adapts (“swaps”) architecture dynamically for power
[Figure: clusters of adders and multipliers with two scaling directions: scale clusters, scale ALUs]
![Page 8: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/8.jpg)
Rapid, structured design for SWAPs
Low “complexity”, parallel, fixed point algorithms
[Figure: architecture exploration flow; ASIC design and DSP design are both applied to SWAPs]
![Page 9: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/9.jpg)
Research vision summary
Provide a structured framework to rapidly explore flexible, high performance, low power architectures (SWAPs)
Efficient algorithm design for mapping to SWAPs
Understanding of algorithms, DSPs and ASICs used
Flexibility-performance trade-offs
Inter-disciplinary research: wireless communications, VLSI signal processing, computer architecture, computer arithmetic, circuits, CAD, compilers
![Page 10: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/10.jpg)
Talk Outline
Research vision
SWAPs - Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
![Page 11: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/11.jpg)
SWAPs borrow from DSPs
DSPs use:
  Instruction Level Parallelism (ILP)
  Subword Parallelism (MMX)
Not enough ALUs for GOPs of computation
  Need 100’s; TI C6x has 8 ALUs
Why not more ALUs?
  Cannot support more registers (area, ports)
  Difficult to find ILP as ALUs increase
[Figure: register file (RF) versus number of ALUs; labels 1, 4, 16, 32]
![Page 12: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/12.jpg)
SWAPs borrow from ASICs
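Exploit data parallelism (DP)
  Available in many wireless algorithms
  This is what ASICs do!

    #define N 1024
    int a[N], b[N], sum[N];           // 32 bits
    short int c[N], d[N], diff[N];    // 16 bits packed

    void add_sub(void)
    {
        int i;
        for (i = 0; i < N; ++i)
        {
            sum[i] = a[i] + b[i];     // independent of the next line
            diff[i] = c[i] - d[i];    // 16-bit operands
        }
    }

ILP: the two statements in the loop body are independent of each other
DP: the loop iterations are independent of each other
Subword: the 16-bit operands can be packed into 32-bit words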
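Subword parallelism can be sketched in portable C: two 16-bit lanes share one 32-bit word and are subtracted together without a borrow crossing between lanes. Real DSPs do this with a single packed instruction; the helper names `pack16`, `packed_sub16` and `packed_sub_array` are assumptions of this sketch and do not come from any DSP library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two 16-bit lanes into one word: hi in bits 31..16, lo in bits 15..0. */
static uint32_t pack16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Lane-wise 16-bit subtraction; each lane wraps modulo 2^16 on its own. */
static uint32_t packed_sub16(uint32_t x, uint32_t y)
{
    uint32_t lo = (x - y) & 0x0000FFFFu;           /* low lane; borrow is masked off */
    uint32_t hi = ((x >> 16) - (y >> 16)) << 16;   /* high lane computed separately */
    return hi | lo;
}

/* diff = c - d over n_words packed words, i.e. 2 * n_words 16-bit elements. */
static void packed_sub_array(size_t n_words, const uint32_t *c,
                             const uint32_t *d, uint32_t *diff)
{
    assert(c && d && diff);
    for (size_t i = 0; i < n_words; ++i)   /* iterations are independent: DP */
        diff[i] = packed_sub16(c[i], d[i]);
}
```

With this packing, the `diff` loop of the slide needs half as many iterations as the 32-bit `sum` loop.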
![Page 13: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/13.jpg)
SWAPs borrow from stream processors
[Figure: stream program. Streams (received signal, input data, output data, decoded bits) connect kernels (correlator, channel estimation, matched filter, interference cancellation, Viterbi decoding)]
Kernels (computation) and streams (communication)
Use local data in clusters providing GOPs support
Imagine stream processor at Stanford [Rixner’01]
Scott Rixner. Stream Processor Architecture, Kluwer Academic Publishers: Boston, MA, 2001.
![Page 14: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/14.jpg)
SWAPs are multi-cluster DSPs
[Figure: a DSP is one cluster of adders and multipliers (ILP) with internal memory; SWAPs use several such clusters (ILP within a cluster, DP across clusters) with a Stream Register File (SRF) as memory]
SWAPs adapt clusters to DP
Identical clusters, same operations
Power-down unused FUs, clusters
![Page 15: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/15.jpg)
Arithmetic clusters in SWAPs
[Figure: arithmetic cluster. Adders, multipliers and dividers with distributed register files (supports more ALUs), joined by a cross point to a scratchpad (indexed accesses), a comm. unit on the intercluster network, and the SRF (from/to SRF)]
![Page 16: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/16.jpg)
Talk Outline
Research vision
SWAPs Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
![Page 17: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/17.jpg)
SWAPs: Physical layer algorithms
[Figure: receiver chain. Antenna, RF front-end, baseband processing (channel estimation, detection, decoding), higher (MAC/Network/OS) layers]
Complex signal processing algorithms with GOPs of computation
![Page 18: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/18.jpg)
SWAP mapping example: Viterbi decoding
Multiple antenna systems (MIMO systems)
  Complexity exponential with transmit x receive antennas
Estimation: Linear MMSE, blind, conjugate gradient….
Detection: FFT, (blind) interference cancellation….
Decoding: Viterbi, Turbo, LDPC…. & joint schemes
SWAP flexibility lets you use the best algorithms for the situation
Example for concept demonstration: Viterbi decoding
![Page 19: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/19.jpg)
Parallel Viterbi Decoding for SWAPs
Add-Compare-Select (ACS): trellis interconnect, computations
  Parallelism depends on constraint length (#states)
Traceback: searching
  Conventional: sequential (no DP) with dynamic branching; difficult to implement in a parallel architecture
  Use Register Exchange (RE): parallel solution
[Figure: detected bits enter the ACS unit, followed by the traceback unit, which outputs decoded bits]
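One trellis stage of ACS with register exchange can be sketched as follows. The state numbering, the flat branch-metric layout and the tiny 4-state trellis are assumptions of this sketch, chosen only to keep it short; the slides do not specify them. Unlike traceback, every state updates its survivor register in the same loop, so there is no sequential search with data-dependent branching.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STATES 4u  /* hypothetical: K = 3 gives 2^(K-1) = 4 states */

/*
 * One trellis stage of add-compare-select with register exchange.
 * State convention (an assumption of this sketch): taking input bit b from
 * state s leads to ((s << 1) | b) mod NUM_STATES, so next state ns has
 * predecessors ns >> 1 and (ns >> 1) | NUM_STATES/2, both with b = ns & 1.
 * bm[2 * s + b] is the branch metric for leaving state s with bit b.
 * The output arrays must not alias the input arrays.
 */
static void acs_re_step(const uint32_t *pm, const uint32_t *bm,
                        const uint32_t *path,
                        uint32_t *pm_next, uint32_t *path_next)
{
    assert(pm && bm && path && pm_next && path_next);
    for (uint32_t ns = 0; ns < NUM_STATES; ++ns) {  /* independent iterations: DP */
        uint32_t bit = ns & 1u;
        uint32_t p0 = ns >> 1;
        uint32_t p1 = p0 | (NUM_STATES >> 1);
        uint32_t m0 = pm[p0] + bm[2u * p0 + bit];   /* add */
        uint32_t m1 = pm[p1] + bm[2u * p1 + bit];
        uint32_t win = (m1 < m0) ? p1 : p0;         /* compare */
        pm_next[ns] = (m1 < m0) ? m1 : m0;          /* select */
        /* Register exchange: copy the winner's decoded bits and append this bit. */
        path_next[ns] = (path[win] << 1) | bit;
    }
}
```

After the last stage, the decoded bits are simply the survivor register of the state with the smallest path metric.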
![Page 20: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/20.jpg)
Parallel Viterbi needs re-ordering for SWAPs
Exploiting Viterbi DP in SWAPs:
  Use RE instead of regular traceback
  Re-order ACS, RE
[Figure: trellis state ordering for 16 states X(0) to X(15). Regular ACS keeps natural order X(0), X(1), ..., X(15) on both sides. ACS in SWAPs takes the even states X(0), X(2), ..., X(14) followed by the odd states X(1), X(3), ..., X(15) as DP vectors and produces X(0), X(1), ..., X(15)]
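The re-ordering in the figure can be sketched as a de-interleave followed by an element-wise ACS over the two halves. The butterfly convention used here (each new state combines old states 2j and 2j+1) is an assumption of this sketch; the slide only shows the even/odd ordering. The function names `even_odd_reorder` and `acs_half` are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Split x[0..n-1] into its even-indexed entries followed by its odd-indexed entries. */
static void even_odd_reorder(size_t n, const unsigned *x, unsigned *out)
{
    assert(n % 2 == 0);
    for (size_t j = 0; j < n / 2; ++j) {
        out[j] = x[2 * j];              /* X(0), X(2), X(4), ... */
        out[n / 2 + j] = x[2 * j + 1];  /* X(1), X(3), X(5), ... */
    }
}

/*
 * Element-wise ACS over the two contiguous halves:
 * next[j] = min(even[j] + bm_even[j], odd[j] + bm_odd[j]).
 * Every j is independent, so the loop maps directly onto clusters.
 */
static void acs_half(size_t half, const unsigned *even, const unsigned *odd,
                     const unsigned *bm_even, const unsigned *bm_odd,
                     unsigned *next)
{
    for (size_t j = 0; j < half; ++j) {
        unsigned a = even[j] + bm_even[j];
        unsigned b = odd[j] + bm_odd[j];
        next[j] = (b < a) ? b : a;
    }
}
```

Calling `acs_half` twice with different branch metrics would produce the lower and upper halves of the next state vector, each as one vector operation with unit-stride accesses.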
![Page 21: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/21.jpg)
Talk Outline
Research vision
SWAP Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
![Page 22: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/22.jpg)
SWAP architecture design
More clusters are better than more ALUs per cluster (if #clusters > 2)
1. Decide how many clusters
   Exploit DP
2. Decide what to put within each cluster
   Maximize ILP with high functional unit efficiency
   Search design space with “explore” tool
Time-power-area characterization
[Figure: ILP sets the mix of adders and multipliers within a cluster; DP sets the number of clusters]
![Page 23: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/23.jpg)
Design a SWAP cluster: “Explore”
Auto-exploration of adders and multipliers for “ACS"
[Figure: 3-D plot of instruction count (40 to 160) versus #adders (1 to 5) and #multipliers (1 to 5). Each point is labeled (adder util%, multiplier util%). Labels in extraction order: (43,58), (54,59), (39,41), (62,62), (47,43), (40,32), (70,59), (65,45), (49,33), (39,27), (80,34), (73,41), (61,33), (48,26), (39,22), (50,22), (85,24), (76,33), (60,26), (61,22), (85,17), (72,22), (72,19), (85,13), (85,11)]
![Page 24: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/24.jpg)
“Explore” tool benefits
Instruction count vs. ALU efficiency
  What goes inside each cluster
Design customized application-specific units
  Better performance with increased ALU utilization
Explore multiple algorithms
  Turn off functional units not in use for given kernel
  Vdd-gating, clock gating techniques
![Page 25: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/25.jpg)
Example for SWAP architecture design
Explore Algorithm 1 : 3 adders, 3 multipliers, 32 clusters
Explore Algorithm 2 : 4 adders, 1 multiplier, 64 clusters
Explore Algorithm 3 : 2 adders, 2 multipliers, 64 clusters
Explore Algorithm 4 : 2 adders, 2 multipliers, 16 clusters
Chosen Architecture: 4 adders, 3 multipliers, 64 clusters
ILP
DP
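The chosen architecture in this example is the element-wise maximum of the four per-algorithm results, so every algorithm fits and unused resources can be switched off. A minimal sketch of that selection rule, with `struct swap_config` and `cover_all` as hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Resources one algorithm asks for, as reported by the exploration step. */
struct swap_config {
    unsigned adders;       /* adders per cluster (ILP) */
    unsigned multipliers;  /* multipliers per cluster (ILP) */
    unsigned clusters;     /* number of clusters (DP) */
};

/* Smallest configuration that covers every requirement: element-wise maximum. */
static struct swap_config cover_all(const struct swap_config *req, size_t n)
{
    struct swap_config out = {0u, 0u, 0u};
    assert(req != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (req[i].adders > out.adders)
            out.adders = req[i].adders;
        if (req[i].multipliers > out.multipliers)
            out.multipliers = req[i].multipliers;
        if (req[i].clusters > out.clusters)
            out.clusters = req[i].clusters;
    }
    return out;
}
```

Feeding it the four rows of the slide (3/3/32, 4/1/64, 2/2/64, 2/2/16) yields 4 adders, 3 multipliers and 64 clusters, matching the chosen architecture.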
![Page 26: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/26.jpg)
SWAP flexibility provides power savings
Multiple algorithms
  Different ALU, cluster requirements
Turning off ALUs (–add –mul compiler options)
  Use the right #ALUs from “explore” tool
Turning off clusters
  Data across SRF of all clusters
  Cluster only has access to its own SRF
  Next kernel may need data from SRF of other clusters
  Reconfiguration support needs to be provided
![Page 27: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/27.jpg)
SWAPs provide cluster reconfiguration
[Figure: the SRF reaches the clusters through a mux-demux network with stream buffers, built from MDX 1 and MDX 2 stages and latches]
Additional latency (few cycles) due to microcontroller stalls
- Minimal loss in performance
![Page 28: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/28.jpg)
Cluster reconfiguration for Viterbi
Packet 1: constraint length 7 (16 clusters)
Packet 2: constraint length 9 (64 clusters)
Packet 3: constraint length 5 (4 clusters)
[Figure: clusters beyond the DP of the current packet can be turned OFF]
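The cluster counts above are consistent with a fixed number of trellis states per cluster: a constraint length K code has 2^(K-1) states, and 64, 256 and 16 states map to 16, 64 and 4 clusters. The sketch below assumes 4 states per cluster, a ratio inferred from those numbers and not stated in the slides; the function names are hypothetical.

```c
#include <assert.h>

/* Number of trellis states for constraint length K: 2^(K-1). */
static unsigned viterbi_states(unsigned K)
{
    assert(K >= 1 && K <= 16);
    return 1u << (K - 1);
}

/*
 * Clusters to keep powered for one packet, assuming a fixed number of
 * states per cluster (rounded up) and at most max_clusters available.
 */
static unsigned active_clusters(unsigned K, unsigned states_per_cluster,
                                unsigned max_clusters)
{
    assert(states_per_cluster > 0);
    unsigned states = viterbi_states(K);
    unsigned need = (states + states_per_cluster - 1) / states_per_cluster;
    return need < max_clusters ? need : max_clusters;
}
```

On a 64-cluster machine this gives 16, 64 and 4 active clusters for K = 7, 9 and 5, so 48 clusters could be off for Packet 1 and 60 for Packet 3.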
![Page 29: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/29.jpg)
[Figure: execution time (cycles) for 64-bit, rate 1/2 packets: Packet 1 (K = 7), Packet 2 (K = 9), Packet 3 (K = 5). Legend: kernels (computation), memory accesses, no data; clusters and memory activity shown per packet]
SWAPs provide flexibility at negligible overhead
![Page 30: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/30.jpg)
SWAP exploration for Viterbi decoding
[Figure: log-log plot of frequency needed to attain real-time (in MHz, 1 to 1000) versus number of clusters (1 to 100) for K = 9, K = 7 and K = 5. Marked: different SWAPs (without reconfiguration), same SWAP (with reconfiguration), the DSP point, and max DP]
Ideal C64x (w/o co-proc) needs ~200 MHz for real-time
![Page 31: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/31.jpg)
SWAPs : Salient features
1-2 orders of magnitude better than a DSP
Any constraint length: 10 MHz at 128 Kbps
Same code for all constraint lengths
  No need to re-compile or load another code as long as the parallelism/cluster ratio is constant
Power savings due to dynamic cluster scaling
![Page 32: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/32.jpg)
Expected SWAP power consumption
Power model based on [Khailany’03]
64 clusters and 1 multiplier per cluster:
  0.13 micron, 1.2 V
  Peak Active Power: ~9 mW at 1 MHz (DSP ~1 mW)
  Area: ~53.7 mm2
10 MHz, 128 Kbps with reconfiguration
Brucek Khailany et al. Exploring the VLSI Scalability of Stream Processors, Proceedings of the Ninth Symposium on High Performance Computer Architecture, February 8-12, 2003.
[Figure: power (in mW, 0 to 90) versus active clusters (0 to 70, max 64)]

| Viterbi    | Clusters Used | Peak Power |
|------------|---------------|------------|
| K = 9      | 64            | ~90 mW     |
| K = 7      | 16            | ~28.57 mW  |
| K = 5      | 4             | ~13.8 mW   |
| overhead   | 0             | ~8.1 mW    |
| DSP, K = 9 | 1             | ~200 mW    |
![Page 33: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/33.jpg)
Multiuser Estimation-Detection+Decoding
Real-time target : 128 Kbps per user
[Figure: log-log plot of frequency needed to attain real-time (in MHz, 10 to 100000) versus number of clusters (1 to 100) for FAST, MEDIUM and SLOW fading scenarios. Marked: 32-user base-station, mobile, and DSP points]
Ideal C64x (w/o co-proc) needs ~15 GHz for real-time
![Page 34: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/34.jpg)
Expected SWAP power : base-station
32 user base-station with 3 multipliers (X’s) per cluster and 64 clusters:
  0.13 micron, 1.2 V
  Peak Active Power: ~18.19 mW for 1 MHz (increased X)
  Area: ~93.4 mm2
Total peak base-station power consumption: ~18.19 W at 1 GHz for 32 users at 128 Kbps/user
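The power figures on this slide and on the Viterbi power slide follow from scaling the peak active power per MHz linearly with clock frequency. A minimal sketch of that arithmetic, assuming purely linear scaling and no change in supply voltage (the function name `peak_power_mw` is hypothetical):

```c
#include <assert.h>

/* Peak active power in mW, assuming it scales linearly with clock frequency. */
static double peak_power_mw(double mw_per_mhz, double freq_mhz)
{
    assert(mw_per_mhz >= 0.0 && freq_mhz >= 0.0);
    return mw_per_mhz * freq_mhz;
}
```

With ~18.19 mW per MHz at 1000 MHz this gives about 18190 mW, or ~18.19 W, for the base-station; with ~9 mW per MHz at 10 MHz it gives the ~90 mW quoted for Viterbi decoding at K = 9.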
![Page 35: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/35.jpg)
Talk Outline
Research vision
SWAP Background
Algorithm design for SWAPs
Architecture design for SWAPs
Current and Future Research Goals
![Page 36: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/36.jpg)
Current research: Flexibility vs. performance
SWAPs: 128 Kbps at ~10-100 mW for Viterbi
  Borrow DP from ASICs!
  Suitable for base-stations
    Flexibility more important than power
  Suitable for mobile devices
    Power constraints tighter
    Can be customized for further power savings
Handset SWAPs (H-SWAPs)
  Borrow task pipelining from ASICs!
  Application-specific units and specialized comm. network
![Page 37: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/37.jpg)
Handset SWAPs: H-SWAPs
Trade Data Parallelism for Task Pipelining
[Figure: SWAPs (max. clusters and reconfigure) run many identical clusters on one SRF for full DP. A SWAPlet limits the number of clusters (limited DP). H-SWAPs are a collection of customized SWAPlets, each with its own mix of adders and multipliers and limited DP]
![Page 38: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/38.jpg)
Sample points in architecture exploration
DSPs (1 cluster): ILP, Subword
SWAPs (multiple clusters): ILP, Subword, DP
H-SWAPs (optimized for handsets): ILP, Subword, DP, Task Pipelining, Custom ALUs
Programmable solutions with increased customization
Performance, power benefits (with decreasing flexibility)
![Page 39: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/39.jpg)
Future: Efficient algorithms and mapping
[Figure: receiver block diagram with multipath channel, channel estimator, equalizer, MRC, detector, demodulator, decoder, turbo equalizer, beam-forming, coherent STC and non-coherent STC blocks]
Multiple antenna systems with 1-2 orders-of-magnitude higher complexity
![Page 40: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/40.jpg)
Future research: Architectures
Generalized and structured framework and tools
  Joint algorithm-architecture exploration
  Area-time-power-flexibility tradeoffs
Potential applications: embedded systems
  Image and video processing: cameras, with a variety of compression algorithms
  Biomedical applications: hearing aids, DSP running on body heat*
  Sensor networks: compression of data before transmission
*Quote: Gene Frantz, TI Fellow
![Page 41: Flexible wireless communication architectures](https://reader036.fdocuments.us/reader036/viewer/2022081519/56813b04550346895da3a4c4/html5/thumbnails/41.jpg)
SWAPs: Flexibility, Performance, Power
Need flexibility in future wireless devices
  Algorithms and architectures
Rapid exploration for Scalable, Wireless Application-specific Processors
  Structured approach with flexibility-performance trade-offs
SWAPs: flexibility, high performance and low power
  Exploit data parallelism like ASICs
  1-2 orders better performance than DSPs
  Turn off unused clusters and unused ALUs for low power