Handset architectures
RICE UNIVERSITY
Handset architectures
Sridhar Rajagopal
[email protected]
http://www.ece.rice.edu/~sridhar
[Figure labels: ASICs, Programmable]
This work was supported in part by Nokia, TI and NSF; their support is gratefully acknowledged.
2G handsets
ASIC for compute-intensive operations (spreading, etc.)
DSP for most of the baseband
Microcontroller for higher layers
Evolving Cellular Handset Architectures but a Continuing, Insatiable Desire for DSP MIPS, M. L. McMahan, TI Report SPRA650, March 2000
Proposed 3G handsets
Increased number of co-processors, as DSPs are unable to handle most of the baseband
[Figure: proposed 3G handset architectures from TI and VIA]
DSP for the third generation wireless communications, U. Ko, M. McMahan and E. Auslander, International Conference on Computer Design, 1999, pp. 516-520
Introduction to W-CDMA SoC design approach, H. Chen, VIA Technologies, August 2002, www.itpilot.org.tw/provisional/910802/INTRODUCTION%20TO%20WCDMA%20SOC%20.PDF
Motivation
How does this scale?
Do we need a DSP or should we build ASICs?
  If ASICs, how to build better ASICs?
  If programmable, how to build better DSPs?
  If both, how do we mix them better?
Answers depend on:
  Level of programmability needed
  Area-time-power architecture trade-offs
Rice innovations for ASICs and DSPs
ASICs: On-line arithmetic for dynamic truncation
Programmable: Scalable Wireless Application-specific Processors (SWAPs)
Mix and match: Hybrid SWAPs (H-SWAPs)
[Figure labels: ASICs, Programmable]
Outline
On-line arithmetic for dynamic truncation
SWAPs
H-SWAPs
ASIC designs
Finite precision arithmetic
  Faster
  Low power
  Low area
How to keep finite precision bounded:
  Saturation
  Truncation
Keeping precision bounded
Example of truncation (MSBs important)
  Multiplication by the step size in gradient descent
  Sign detection
Example of saturation (LSBs important)
  Avoiding overflows
  When the probability of useful MSBs is low
[Figure: bit patterns (x) illustrating truncation and saturation]
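As a rough illustration of the two ways of bounding precision, the Python sketch below reduces a signed integer to a smaller word. The function names and the 8-bit widths are assumptions chosen for the example; they do not come from the slides.

```python
def truncate_lsbs(x: int, drop_bits: int) -> int:
    """Truncation: discard the drop_bits least significant bits (MSBs are kept)."""
    # Arithmetic right shift; for negative values this rounds toward minus infinity.
    return x >> drop_bits


def saturate(x: int, out_bits: int) -> int:
    """Saturation: clamp x to the range of a signed out_bits-bit word (LSBs are kept)."""
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, x))


if __name__ == "__main__":
    product = 12345                      # hypothetical wide intermediate result
    print(truncate_lsbs(product, 8))     # keeps only the upper bits
    print(saturate(product, 8))          # clamps to the largest 8-bit value
    print(saturate(100, 8))              # small values pass through unchanged
```

Truncation loses low-order detail on every value, whereas saturation is exact unless the value overflows the output word.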
Dynamic precision requirements
Precision needs change with algorithms and SNR
Adapt hardware dynamically to save power
  25-35% power reduction possible
Dynamic saturation vs. dynamic truncation
  Easy (as LSBs first) vs. difficult
  No error vs. significant error
  Throughput benefits vs. no benefits
On-line arithmetic for dynamic truncation
Works most significant digit (MSD) first
Natural way of truncation
Digit-serial dynamic truncation
Redundant number system: error only in the least significant digit (LSD)
Throughput benefits, as it is digit-serial
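To make the MSD-first idea concrete, here is a minimal Python sketch of early-terminating sign detection on a stream of radix-2 signed digits (each digit in {-1, 0, 1}). It models only the consumer of an on-line result, not an on-line multiplier or adder, and the function names are hypothetical. It relies on the fact that, for a finite signed-digit fraction, the first non-zero digit fixes the sign, because the remaining digits together weigh less than that digit.

```python
def sd_value(digits):
    """Value of a radix-2 signed-digit fraction, most significant digit first."""
    return sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))


def sign_msd_first(digits):
    """Consume digits MSD first and stop as soon as the sign is known.

    Returns (sign, digits_consumed), with sign in {-1, 0, 1}.
    """
    consumed = 0
    for d in digits:
        consumed += 1
        if d != 0:
            # Later digits cannot outweigh this one, so the sign is decided.
            return (1 if d > 0 else -1), consumed
    return 0, consumed


if __name__ == "__main__":
    stream = [0, 1, -1, -1, 0, 1]
    sign, used = sign_msd_first(stream)
    print(sign, used, sd_value(stream))
```

The remaining digits never have to be produced, which is where the dynamic truncation saves work.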
Example for sign detection
[Figure: timing diagrams for sign detection through three stages (ai * bi, tree addition level 1, tree addition result), in four panels:
  (a) Truncated conventional arithmetic
  (b) On-line arithmetic with full precision
  (c) Dynamically truncated on-line arithmetic (without truncation error)
  (d) Dynamically truncated on-line arithmetic (2 MSDs)
Timing labels: tCONV-MF, tOL-MF, tOL, d*tOL, deff*tOL, log(d). Annotations: "Sign determined at this point. Stop!", "Idle (pipeline bubbles)"]
Throughput comparisons
[Figure: time required (gate delays, log scale, 10^1 to 10^3) vs. input precision (in bits, 0 to 35). Curves: Truncated conventional; Truncated conventional with CSA; On-line (full precision); Truncated on-line (2 MSDs); Truncated on-line (MSD)]
Area comparisons
[Figure: area required (gates, log scale, 10^2 to 10^6) vs. input precision (in bits, 0 to 35). Curves: Truncated conventional; Truncated conventional with CSA; On-line (full precision); Truncated on-line (2 MSDs); Truncated on-line (MSD)]
ASIC design conclusion
Details: Predrag
Using on-line arithmetic for dynamic truncation and conventional arithmetic for dynamic saturation, one can design efficient ASICs for handsets.
Outline
On-line arithmetic for dynamic truncation
SWAPs
H-SWAPs
Programmable architectures
Current DSPs
  Not enough functional units (FUs)
Cannot extend to more FUs
  Limited Instruction Level Parallelism (ILP)
  Cannot support more registers (register area increases quadratically with FUs)
  Compilers: difficult to find ILP as FUs increase
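Solution
Exploit data parallelism (DP)
  Lots available in wireless algorithms
Example:
/* DP: the 1024 loop iterations are independent of each other */
for (i = 0; i < 1024; i++)
{
    /* ILP: the add and the multiply do not depend on each other */
    a[i] = b[i] + c[i];
    d[i] = b[i] * c[i];
}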
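The loop above can be spread over clusters because its iterations are independent. The Python sketch below shows one possible partitioning, in which each cluster takes every n_clusters-th iteration. The interleaved assignment and the name run_on_clusters are assumptions made for illustration, and the clusters are simulated one after another rather than in parallel.

```python
def run_on_clusters(b, c, n_clusters):
    """Compute a[i] = b[i] + c[i] and d[i] = b[i] * c[i], split across clusters."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for cluster in range(n_clusters):            # conceptually, all clusters run at once (DP)
        for i in range(cluster, n, n_clusters):  # this cluster's share of the iterations
            a[i] = b[i] + c[i]                   # the add and the multiply are independent (ILP)
            d[i] = b[i] * c[i]
    return a, d


if __name__ == "__main__":
    b = list(range(1024))
    c = [2 * x + 1 for x in b]
    a, d = run_on_clusters(b, c, n_clusters=8)
    print(a[:4], d[:4])
```

The result does not depend on the number of clusters, so the cluster count can be chosen to match the data parallelism available.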
DSP vs. SWAPs
[Figure: a DSP (1 cluster) has a single cluster of functional units (+++***) attached to internal memory and exploits ILP; SWAPs (max. clusters) replicate that cluster many times on the internal memory, exploiting ILP within a cluster and DP across clusters]
SWAPs trade-offs
Same internal memory size as DSPs
  Dependent on application, not architecture
Needs more area to support more functional units
  Area is not a constraint (power is)
Varying levels of DP in applications
  Needs reconfiguration!
  Need to turn off unused clusters
More parallelism -> lower clock frequency -> lower voltage -> low power (CV^2f + leakage) in spite of larger area
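The power argument can be checked with a back-of-the-envelope model, P = C V^2 f + leakage, scaled by the number of clusters. Every number in the Python sketch below (capacitance, voltages, frequencies, leakage) is a hypothetical placeholder, not a measurement from this work; the sketch only shows the shape of the trade-off.

```python
def total_power(n_clusters, cap_per_cluster, voltage, freq_hz, leak_per_cluster):
    """Total power in watts: switching power C*V^2*f plus leakage, summed over clusters."""
    dynamic = n_clusters * cap_per_cluster * voltage ** 2 * freq_hz
    leakage = n_clusters * leak_per_cluster
    return dynamic + leakage


if __name__ == "__main__":
    # Hypothetical single-cluster design: high clock, high supply voltage.
    single = total_power(1, 1e-9, 1.2, 640e6, 0.001)
    # Hypothetical 64-cluster design: 64x lower clock, which permits a lower voltage.
    wide = total_power(64, 1e-9, 0.8, 10e6, 0.001)
    print(single, wide)
```

With these placeholder values the wide design wins because the V^2 term shrinks, while leakage grows with area; a higher leakage per cluster would reverse the outcome.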
Example: Viterbi Decoding
Add-Compare-Select (ACS): trellis interconnect
  Re-order for exploiting DP
Traceback: sequential
  Use Register Exchange (RE)
Exploiting DP in a programmable architecture implies:
  Re-order ACS
  Re-order RE
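A minimal Python sketch of ACS with register exchange follows, for a small rate-1/2 code. The constraint length 3, the generators (7, 5), the hard-decision Hamming metric, and all function names are assumptions picked to keep the example short; this is not the SWAP implementation. The point is that every new state in acs_step is computed independently (the DP), and that register exchange carries the decoded bits along with the metrics, so no sequential traceback is needed.

```python
K = 3                       # constraint length (small, for illustration only)
GENS = (0b111, 0b101)       # assumed rate-1/2 generator polynomials
N_STATES = 1 << (K - 1)
INF = 10 ** 9


def branch_output(state, bit):
    """Coded bit pair emitted when `bit` enters the encoder in `state`."""
    reg = (bit << (K - 1)) | state
    return tuple(bin(reg & g).count("1") & 1 for g in GENS)


def encode(bits):
    """Encode bits, appending K-1 zero tail bits to return to state 0."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        out.extend(branch_output(state, b))
        state = (b << (K - 2)) | (state >> 1)
    return out


def acs_step(metrics, survivors, received_pair):
    """One trellis stage: add-compare-select plus register exchange."""
    new_metrics, new_survivors = [INF] * N_STATES, [None] * N_STATES
    for n in range(N_STATES):                     # independent per state: data parallel
        bit = n >> (K - 2)                        # input bit that leads into state n
        p0 = (n & ((1 << (K - 2)) - 1)) << 1      # the two predecessors are p0 and p0 | 1
        best_cost, best_prev = None, None
        for p in (p0, p0 | 1):
            expected = branch_output(p, bit)
            cost = metrics[p] + sum(e != r for e, r in zip(expected, received_pair))  # add
            if best_cost is None or cost < best_cost:                                 # compare
                best_cost, best_prev = cost, p                                        # select
        new_metrics[n] = best_cost
        new_survivors[n] = survivors[best_prev] + [bit]   # register exchange
    return new_metrics, new_survivors


def decode(received):
    """Decode a terminated hard-decision sequence; returns the message bits."""
    metrics = [0] + [INF] * (N_STATES - 1)
    survivors = [[] for _ in range(N_STATES)]
    for t in range(0, len(received), 2):
        metrics, survivors = acs_step(metrics, survivors, received[t:t + 2])
    path = survivors[0]                           # the tail bits force the final state to 0
    return path[:len(path) - (K - 1)]
```

For example, decode(encode([1, 0, 1, 1])) recovers the original four bits.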
Re-ordering for parallel Viterbi
[Figure: 16-state trellis drawn two ways. (a) Trellis: states X(0) to X(15) in natural order in both columns. (b) Shuffled trellis: one column re-ordered as X(0), X(2), ..., X(14), X(1), X(3), ..., X(15), the other in natural order]
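One plausible reading of the shuffled trellis is sketched below in Python. It assumes a shift-register trellis in which new states j and j + N/2 both read old states 2j and 2j + 1; that assumption and the function name are not taken from the slides. Storing the old states as evens followed by odds puts both predecessors of new state j at offset j within their half, which gives every cluster the same regular access pattern.

```python
def shuffled_order(n_states):
    """Even-indexed states first, then odd-indexed states."""
    return list(range(0, n_states, 2)) + list(range(1, n_states, 2))


if __name__ == "__main__":
    n = 16
    order = shuffled_order(n)
    half = n // 2
    for j in range(half):
        # Offset j in each half holds the two predecessors of new states j and j + half.
        print(j, j + half, "<-", order[j], order[j + half])
```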
Viterbi reconfiguration
Packet 1: constraint length 7 (16 clusters)
Packet 2: constraint length 9 (64 clusters)
Packet 3: constraint length 5 (4 clusters)
[Figure labels: DP; unused clusters can be turned OFF]
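The cluster counts above follow from the trellis size: a binary code of constraint length K has 2^(K-1) states, and the three packets all work out to four states per cluster. The Python sketch below encodes that relation. The ratio of four is inferred from the slide's numbers, and the function names and the 64-cluster machine size are assumptions.

```python
STATES_PER_CLUSTER = 4   # inferred: 64 states / 16 clusters, 256 / 64, 16 / 4


def trellis_states(constraint_length):
    """Number of trellis states for a binary convolutional code."""
    return 1 << (constraint_length - 1)


def clusters_needed(constraint_length):
    """Clusters kept active for a packet with this constraint length."""
    return trellis_states(constraint_length) // STATES_PER_CLUSTER


def clusters_turned_off(constraint_length, total_clusters=64):
    """Clusters that can be powered down on a machine with total_clusters clusters."""
    return total_clusters - clusters_needed(constraint_length)


if __name__ == "__main__":
    for k in (7, 9, 5):
        print(k, clusters_needed(k), clusters_turned_off(k))
```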
[Figure: execution timeline for three 64-bit, rate ½ packets (Packet 1: constraint length 7; Packet 2: constraint length 9; Packet 3: constraint length 5), showing kernels (computation) and memory accesses]
Viterbi decoding: rate 1/2 at 128 Kbps = 10 MHz
[Figure: frequency needed to attain real-time (in MHz, log scale, 10^0 to 10^3) vs. number of clusters (log scale, 10^0 to 10^2). Legend: Actual K = 9; Actual K = 7; Actual K = 5; Regular code; Reconfigurable code]
Viterbi decoding: Comparisons
[Figure: log-scale comparison (axis ticks 10^0 to 10^3) for actual K = 9, K = 7 and K = 5. Labels: DSP C64x (w/o co-proc); DSP (RE); SWAP; FPGA; Virtex II FPGA*; DP; Task pipelining; Dedicated interconnect; 128 KHz (1 bit/cycle)]
*VITURBO: A reconfigurable architecture for Viterbi and Turbo decoding, M. Vaya, J. R. Cavallaro, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2003, Hong Kong
Salient features of this solution
Any constraint length: 10 MHz at 128 Kbps
Same code for all constraint lengths
  No need to re-compile or load another code, as long as the parallelism/cluster ratio is constant
Exploiting parallelism at 3 levels for real-time:
  Instruction Level Parallelism (DSP)
  Subword Parallelism (DSP)
  Data Parallelism (SWAP)
Problems
Suitable for handsets? Not yet!
Still too general
Not low power enough!
No special customization for the application, except for a fixed-point architecture
  Generic instruction set
  Generic ALUs (though they can be powered down)
  Generic inter-cluster communication network
Outline
On-line arithmetic for dynamic truncation
SWAPs
Hybrid SWAPs (H-SWAPs)
H-SWAPs
Trade Data Parallelism for Task Pipelining
Customize each mini-SWAP
[Figure: three designs side by side. SWAPs (max. clusters and reconfigure): internal memory feeding many identical clusters (+++***), exploiting DP. Mini-SWAP (limit clusters): a few clusters (+++*) with limited DP. H-SWAPs (collection of customized mini-SWAPs): several mini-SWAPs with different functional-unit mixes (+++*, ++*, ++++), each with limited DP]
Work in progress
How to trade off task vs. data parallelism?
Power estimation for SWAPs (actual numbers)
Comparisons with ASIC solutions in terms of area-time-power
Evaluation of specialized inter-cluster communication
Specialized instructions (ACS) and arithmetic units (on-line)
I am looking for jobs!!!