LPC Speech Coder on the TI C6x DSP
Mark Anderson, Jeff Burke
EE213A / EE298-2
Prof. Ingrid Verbauwhede

Summary
- Implementation platform
  - Texas Instruments TMS320C6000
  - Low-quantity cost: US $35 (‘C6211)
- Architecture clock frequency: 150 MHz (‘C6211)
- Throughput: 75-80 channels @ 8000 samples/sec

Summary
- Total energy per sample: 1.8 µJ/sample
- ‘Area’
  - 1.2% of cycle budget per channel per frame
  - 8.5% of unified memory per channel
  - 25% of unified memory for algorithm

Summary
- Flexibility of implementation
  - High; programmable processor with C compiler, GUI debugger & simulator
- SegSNR_A: ?
- SegSNR_Q: 26 dB (voiced segments)

Architecture overview
- 256-bit VLIW
- Two “clustered” data paths
- Four functional units in each data path
  - 16x16 multiply
  - Two ALUs
  - Data addressing unit
- 32-bit instruction for each functional unit (256-bit “instruction” for 8 functional units)
Data path diagram

Architecture overview
- Split register file
  - Only two cross-paths exist
  - Each cluster is limited to one source read from the opposite register file per cycle
- Data types
  - 8, 16, 32-bit with 40-bit accumulate
  - 40-bit = register pair

Memory architecture
- ‘C6211 (US$35) has a cache!
  - 4kB L1 instruction cache (L1P)
  - 4kB L1 data cache (L1D)
  - 64kB L2 unified memory and/or cache
- Extra DMA channels
Memory architecture

Design Tools
- Command-line
  - Compiler, debugger, simulator
- Code Composer Studio
  - Same tools
  - Windows NT GUI
  - 30-day “evaluation” license
  - Draconian copy protection, pulls out the rug from under you

Design Flow
- Consolidate Matlab reference into a single function
- Matlab rewritten C-style
- Verified C-style Matlab
- C prototype created
- Imported into Code Composer, optimized & simulated

Fixed-point quantization
- Input samples
  - 16-bit, normalized to [-1,1)
  - <1.15> format used
- Coefficient quantization
  - Hamming window, pre-emphasis, FIR
  - <1.15> format used
  - No noticeable change in characteristics
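
The <1.15> format above can be illustrated with a minimal C sketch. The type and function names (`q15_t`, `q15_from_double`, `q15_mul`) are hypothetical and are not taken from the authors' prototype; the conversion truncates toward zero and saturates at the ends of the [-1,1) range.

```c
#include <stdint.h>

/* <1.15> (Q15): 1 sign bit, 15 fractional bits, range [-1, 1). */
typedef int16_t q15_t;

/* Convert a real value to Q15, truncating toward zero and saturating
 * at the ends of the representable range. */
static q15_t q15_from_double(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (q15_t)scaled;   /* the C cast discards the fractional part */
}

/* Multiply two Q15 values: the 16x16 product has 30 fractional bits,
 * so shift right by 15 to return to Q15. Assumes >> on a negative
 * value is an arithmetic shift, as on mainstream compilers. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p >>= 15;
    if (p > INT16_MAX) p = INT16_MAX;   /* only (-1) * (-1) overflows */
    return (q15_t)p;
}
```

For instance, 0.5 maps to 16384, and `q15_mul(16384, 16384)` yields 8192, which is 0.25 in <1.15>.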

Fixed-point quantization
- Most values 16 bit
  - Take advantage of 16x16 fast multipliers
  - Remain close to other class implementations
- Add metric for overpowered LPC engine
  - Use # of channels as performance metric

Fixed-point quantization
- Energy stored in <5.27>
  - Prevent overflow, provide precision for low energy segments
- Temporary values stored in <10.30>
  - Take advantage of extended precision
- Modified autocorrelation used <16.0>
  - All whole numbers
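
One way these formats could fit together is sketched below. It assumes the energy is a plain sum of squares of <1.15> samples. The slides do not say whether the authors normalize by frame length, so the final saturation step is an assumption. The function name `frame_energy_q27` is hypothetical, and `int64_t` stands in for the 40-bit register-pair accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squares of Q15 samples, returned in <5.27>.
 * Each Q15 * Q15 product has 30 fractional bits, so the running sum
 * matches a <10.30> temporary held in a wide accumulator. */
static int32_t frame_energy_q27(const int16_t *x, size_t n)
{
    int64_t acc = 0;                         /* stands in for 40 bits */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)x[i];
    }
    acc >>= 3;                               /* 30 -> 27 fractional bits */
    if (acc > INT32_MAX) acc = INT32_MAX;    /* saturate to 32 bits */
    return (int32_t)acc;
}
```

Two samples of 0.5 give an energy of 0.5, which is 2^26 in <5.27>. A full 160-sample frame at full scale exceeds the <5.27> range and saturates in this sketch.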

Fixed-Point SNR
- Matlab simulation of magnitude truncation
  - Tools again.
- SegSNR_A = ?
- SegSNR_Q = 26 dB
  - Voiced segments only
  - Sent_female test data
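
For reference, segmental SNR is conventionally the average of per-frame SNRs in dB. The slides do not give the exact formula the authors used, so the following is only the usual definition, assuming $x(n)$ is the floating-point reference, $\hat{x}(n)$ the fixed-point result, and $M$ frames of $N$ samples each (here, voiced frames only):

```latex
\mathrm{SegSNR} = \frac{10}{M} \sum_{m=0}^{M-1} \log_{10}
  \frac{\sum_{n=0}^{N-1} x^{2}(mN+n)}
       {\sum_{n=0}^{N-1} \left[ x(mN+n) - \hat{x}(mN+n) \right]^{2}}
  \quad \text{dB}
```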

Performance results
- Initial version: 80,000 CPU cycles/frame
- Optimization
  - Take advantage of VLIW, pipelining: observe assembly, modify C loops
  - Use TI’s DSP Library: assembly advantage without assembly
- Optimized version: 30,182 cycles/frame
  - Had to stop early, still at least 5K cycles wasted
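
As an illustration of the kind of C loop the optimization step targets, here is a portable autocorrelation sketch. It is neither the authors' code nor the DSPLIB routine, and the name `autocorr` and its signature are hypothetical. The `restrict` qualifiers and the simple counted inner loop are the sort of hints that typically help a VLIW compiler software-pipeline the multiply-accumulate.

```c
#include <stddef.h>
#include <stdint.h>

/* r[k] = sum over i of x[i] * x[i + k], for k = 0 .. nlags-1.
 * restrict promises that r and x do not alias, so loads and
 * multiplies from successive iterations can be overlapped.
 * Lags k >= n are left untouched. */
static void autocorr(int64_t *restrict r, const int16_t *restrict x,
                     size_t n, size_t nlags)
{
    for (size_t k = 0; k < nlags && k < n; k++) {
        int64_t acc = 0;   /* wide accumulator; 40 bits on the C6x */
        for (size_t i = 0; i + k < n; i++) {
            acc += (int32_t)x[i] * (int32_t)x[i + k];
        }
        r[k] = acc;
    }
}
```

For the input {1, 2, 3, 4}, the first three lags are 30, 20 and 11.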

Performance
- Then, the tool license expired.
- The tool would not install on other machines.
- TI responded, but wasn’t too helpful.
- Moral #1: Avoid the evaluation version.
- Moral #2: Give tools away to sell hardware.

Cycle count details

| Routine | % of total | Cycles/frame |
|---|---|---|
| Windowing, pre-emphasis | 4.3 | 1285 |
| Energy calc | 0.8 | 254 |
| Autocorrelation in Levinson-Durbin | 8.0 | 2421 |
| Autocorrelation in pitch detection | 51 | 15334 |
| Algorithm total | 95 | 28561 |
| Total w/ housekeeping | | 30182 |

Additional optimizations
- Use more DSPLIB routines
  - Autocorrelation
- Assembly-level optimization
- Code size reduction?
- Reduce number of buffers to reduce L1D usage per frame

Energy per sample
- ‘C6211 consumes 1.24 W
  - 75% high activity / 25% low activity
- 1.24 W / 80 channels = 15.5 mW/channel
- 15.5 mJ/sec/channel * 1/8000 = 1.8 µJ/sample
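
The arithmetic can be reproduced with a small helper. The function name `energy_per_sample_uj` is hypothetical. With the slide's inputs (1.24 W, 8000 samples/sec) it evaluates to about 1.94 µJ/sample at 80 channels and about 1.80 µJ/sample at 86 channels, so the quoted 1.8 µJ appears to correspond to the 86-channel estimate rather than to 80.

```c
/* Energy per sample in microjoules, given total chip power in watts,
 * the number of channels sharing the chip, and the per-channel
 * sample rate in Hz. */
static double energy_per_sample_uj(double chip_power_w, int channels,
                                   double sample_rate_hz)
{
    double power_per_channel_w = chip_power_w / channels;
    return power_per_channel_w / sample_rate_hz * 1e6;
}
```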

Number of channels
- 150 x 10^6 cycles/sec x 0.02 sec/frame = 3.0 x 10^6 cycles/frame
- 3.0 x 10^6 cycles/frame / 30,182 cycles = 99 channels
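
The same calculation as a short C sketch; the function name `channels_per_dsp` is hypothetical, and the frame length is passed in milliseconds to keep the arithmetic in integers.

```c
/* Number of whole channels that fit in one frame's cycle budget. */
static long channels_per_dsp(long clock_hz, long frame_ms,
                             long cycles_per_channel_frame)
{
    /* 150,000,000 Hz and a 20 ms frame give 3,000,000 cycles/frame. */
    long cycles_per_frame = clock_hz / 1000 * frame_ms;
    /* Integer division rounds down to whole channels. */
    return cycles_per_frame / cycles_per_channel_frame;
}
```

Calling `channels_per_dsp(150000000, 20, 30182)` returns 99.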

Memory
- ‘C6211 cache complicates estimates
- Performance is 85-99% of optimal for typical applications
- 30,182 cycles becomes 35,508 cycles/frame for 85% efficiency
  => now support only 86 channels

Memory
- Try to account for off-chip memory transfers
  - ~220,000 cycles for 150 ns fetches for 80 channels
  => support 75-80 channels
- Unable to verify/simulate because of unexpected tool expiration
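
The cache and off-chip adjustments can be chained in a short sketch. Both function names are hypothetical, and the model simply reserves the off-chip cycles out of the 3,000,000-cycle frame budget, which is an assumption about how the estimate was formed. A 150 ns fetch is 22.5 cycles at 150 MHz. Under these assumptions, plain division gives about 84 channels at 85% cache efficiency (slightly below the 86 quoted) and 78 channels once 220,000 cycles are set aside, which falls inside the 75-80 range.

```c
/* Cycles per frame once cache efficiency (in percent) is applied. */
static long derated_cycles(long ideal_cycles, long efficiency_pct)
{
    return ideal_cycles * 100 / efficiency_pct;
}

/* Whole channels that fit after reserving cycles for off-chip
 * memory transfers out of the frame's cycle budget. */
static long channels_after_overhead(long frame_cycles, long offchip_cycles,
                                    long cycles_per_channel)
{
    return (frame_cycles - offchip_cycles) / cycles_per_channel;
}
```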

Memory
- L2 usage
  - ~16kB code size thanks to VLIW
    - 512 32-byte instruction clusters
    - More suited for ‘C6201 & larger processors
  - Remaining used by data for channels
    - 480 bytes each (8.5% of remaining memory)
- L1 usage
  - L1P: Can’t tell because of cache
  - L1D: 2.2kB (~56%)
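
A rough check of the L2 budget, assuming the whole 64kB L2 is configured as addressable memory and taking 1kB as 1024 bytes. The function name `channels_in_l2` is hypothetical. The code occupies 512 x 32 = 16,384 bytes, and 80 channels at 480 bytes each need 38,400 bytes of the remainder.

```c
/* How many channels' state fits in L2 after the code is placed there. */
static long channels_in_l2(long l2_bytes, long code_bytes,
                           long bytes_per_channel)
{
    return (l2_bytes - code_bytes) / bytes_per_channel;
}
```

Under these assumptions, `channels_in_l2(65536, 16384, 480)` returns 102, so data memory would not be the limiting factor for 75-80 channels.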

Tool comments
- Powerful, easy to use IDE… when it worked.
- Licensing problems for eval version
- Debugging support a bit odd
  - puts/printf

C6x Conclusions
- Easily support 75-80 channels of coding
- 26 dB fixed-point SNR, 16-bit types
- VLIW = large code size
- Cache on a low-end DSP!
- Good tools, but draconian copy protection