Low power and cost effective VLSI design for an MP3 audio decoder using an optimized synthesis-subband approach
T.-H. Tsai and Y.-C. Yang
Department of Electrical Engineering, National Central University, Taiwan, ROC
IEE Proceedings on Computers and Digital Techniques
Abstract
An optimized approach to MPEG layer-3 (MP3) audio decoding is presented, with the main focus on the synthesis subband. Since the synthesis subband is the most power-consuming component in decoding, a cost-effective architecture is proposed based on system-design considerations. By means of a combined algorithm and architecture design, the synthesis subband achieves a high throughput with reduced memory requirements and hardware complexity. With a two-stage pipeline architecture, it allows 100% hardware utilization and is suitable for low-power implementation. In addition, a chip design in a 0.35 µm process has been completed. It occupies a die area of about 2.7 × 3.2 mm² with a transistor count of 157,469 and a low power dissipation of only 2.92 mW.
What's the problem?
  MPEG layer-3 (MP3) coding has been widely applied to current digital audio broadcasting and multimedia applications
  A cost-effective and low-power implementation will largely reduce the hardware and computation complexity
  From the MP3 decoder's point of view, the computational load depends on the realization of the synthesis subband
Outline
  Introduction of synthesis subband
  Implementation considerations and analysis
  Proposed method and architecture
  Results and comparison
  Conclusion
Introduction (1)
Elementary concept of MP3
  Multirate subband-based coding techniques
  The encoder performs analysis subband filtering with 32 equally spaced filterbanks, based on a psychoacoustic model
  The decoder performs synthesis subband filtering
  Most fast-algorithm techniques interpret synthesis subband filtering as a modified discrete cosine transform (MDCT) with some additional windowing operations
Introduction (2)
One of the popular methods
  Translate the DCT into an FFT kernel
Advantage
  Because of the specific symmetric and recursive properties of the FFT equations, the number of multiplications and additions can be reduced
Disadvantage
  These methods have complex control and irregular data flow, which introduce a high hardware cost
The proposed design
  Reduced memory requirements and hardware complexity
  High efficiency with 100% hardware utilization using a two-stage pipeline architecture
Introduction (3)
MP3 decoding flow
  The hybrid filter bank is divided into the inverse modified discrete cosine transform with dynamic windowing and overlap (DWIMDCT), and the synthesis subband filterbank
Start
Get bit stream, find header
Decode Side information
Decode Scale factors
Decode Huffman data
Requantize Spectrum
Reorder Spectrum
Joint Stereo Processing(if necessary)
Alias reduction
IMDCT and windowing
Sub-band synthesis
Output PCM samples
Introduction (4)
Synthesis-subband decoding flow
Implementation analysis
Design target
  Delivering the required high performance at the minimum cost and the smallest silicon area
  The performance is determined by real-time constraints
Implementation analysis (cont.)
MOPS = Fs × C × N
  Fs: sample frequency
  C: total number of numerical calculations per sample
  N: number of audio channels
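The real-time requirement above can be checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: the function name `required_mops`, the division by 10^6 to express the product in millions of operations per second, and the workload of 100 calculations per sample are all assumptions chosen for illustration.

```python
def required_mops(sample_rate_hz: float, ops_per_sample: float, channels: int) -> float:
    """Evaluate Fs x C x N and scale it to millions of operations per second.

    The 1e6 scaling is an assumption about how the slide's MOPS figure is
    normalised; the slide itself only gives the product Fs x C x N.
    """
    return sample_rate_hz * ops_per_sample * channels / 1e6


# Hypothetical workload: 44.1 kHz sampling, 100 calculations per sample, stereo.
mops = required_mops(44_100, 100, 2)
print(f"{mops:.2f} MOPS")  # prints "8.82 MOPS"
```

Halving C through a cheaper algorithm halves the required MOPS at a fixed sample rate and channel count, which is why the per-sample calculation count is the main lever in the slides that follow.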
Implementation consideration
In the synthesis subband, the IMDCT can be broken into an FFT, a data shift, preprocessing and post-processing
Three considerations
  In the initial transform, the real-number computation is translated into complex-number computation
  The data shift, preprocessing and post-processing still contain complex multiplications
  FFT algorithms always need many multipliers, and the recursive butterfly process leads to complex interconnection and routing
Proposed method
Normal IMDCT
Proposed IMDCT
Requires a reduced amount of multiply-accumulate computations
The required size of the RAM buffer can be reduced to only 512 words per channel (a fraction of the original)
Architecture
  IMDCT
  IPQMF
Architecture (cont.)
  Pipeline architecture
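The two-stage pipeline idea can be illustrated with a small scheduling sketch. This is not the paper's hardware description: it assumes, purely for illustration, that the IMDCT and IPQMF stages take the same time per block and that a block passes from one stage to the next through a buffer. The names `pipeline_schedule` and `utilization` are hypothetical.

```python
def pipeline_schedule(num_blocks):
    """Return one (imdct_block, ipqmf_block) pair per time slot.

    None marks an idle stage. Stage 2 (IPQMF) always works on the block
    that stage 1 (IMDCT) finished in the previous slot.
    """
    schedule = []
    for slot in range(num_blocks + 1):
        imdct_block = slot if slot < num_blocks else None
        ipqmf_block = slot - 1 if slot >= 1 else None
        schedule.append((imdct_block, ipqmf_block))
    return schedule


def utilization(schedule):
    """Fraction of stage-slots that are busy across the whole schedule."""
    busy = sum(stage is not None for slot in schedule for stage in slot)
    return busy / (2 * len(schedule))


for slot, (imdct_block, ipqmf_block) in enumerate(pipeline_schedule(4)):
    print(slot, imdct_block, ipqmf_block)
print(utilization(pipeline_schedule(4)))
```

Under these assumptions only the first and last slots have an idle stage, so utilization approaches 100% as the number of blocks grows, which is the steady-state behaviour the abstract refers to.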
Memory configuration (1)
Memory configuration (2)
  Data conflicts in IMDCT and IPQMF
Memory configuration (3)
  Memory data access with pipeline operation
Results and comparison (1)
Results and comparison (2)
Results and comparison (3)
Conclusion
  By means of a novel algorithm and architecture, the synthesis subband achieves better performance
  It also achieves a high throughput, with a low-cost memory requirement and low hardware complexity
[Figure: synthesis-subband decoding flow. Sub-band samples (32 subbands × 18 samples, subbands 0 to 31, samples 0 to 17) → IMDCT producing values 0 to 63 → 16 × 64-bit FIFO = 1024 samples (segments 0-31 through 992-1023) → U vector × D window, taken in 32-value segments from 0-31 through 480-511 → partial vectors w0 to w15 → output = Sum(w0 ~ w15)]
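The decoding flow in this figure (matrixing into a 1024-sample FIFO, building the 512-value U vector, windowing with D, and summing 16 partial vectors) corresponds to the reference synthesis procedure of the MPEG-1 audio standard. The sketch below follows that reference flow, not the optimized architecture proposed in the paper. The function name `synthesize` is hypothetical, and the window is passed in as a parameter because the standard's 512 D coefficients are not reproduced here; the usage example substitutes a flat placeholder window.

```python
import math

# Matrixing coefficients N[i][k] = cos((16 + i) * (2k + 1) * pi / 64).
MATRIX = [[math.cos((16 + i) * (2 * k + 1) * math.pi / 64) for k in range(32)]
          for i in range(64)]


def synthesize(subband_samples, fifo, window):
    """One synthesis step: 32 subband samples in, 32 PCM samples out.

    fifo   -- list of 1024 floats holding past matrixed values (updated in place)
    window -- 512 window coefficients (the D table in the standard)
    """
    # Shift the FIFO by 64 positions and write 64 new matrixed values at the front.
    fifo[64:] = fifo[:-64]
    for i in range(64):
        fifo[i] = sum(MATRIX[i][k] * subband_samples[k] for k in range(32))

    # Build the 512-value U vector from alternating 32-value FIFO segments.
    u = [0.0] * 512
    for i in range(8):
        for j in range(32):
            u[i * 64 + j] = fifo[i * 128 + j]
            u[i * 64 + 32 + j] = fifo[i * 128 + 96 + j]

    # Window U with D, then sum the 16 partial vectors w0..w15 element-wise.
    return [sum(u[j + 32 * i] * window[j + 32 * i] for i in range(16))
            for j in range(32)]


# Usage with a flat placeholder window (NOT the real D table).
fifo = [0.0] * 1024
flat_window = [1.0] * 512
pcm = synthesize([1.0] + [0.0] * 31, fifo, flat_window)
print(len(pcm))  # prints 32
```

Each call costs 64 × 32 multiply-accumulates for the matrixing plus 512 for the windowing, which is the workload the proposed design aims to reduce.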