mpeg1 audio
Multimedia Signals and Systems
MP3 - MPEG-1, 2 Layer 1, 2, 3
Polyphase Filterbank
Kunio Takaya
Electrical and Computer Engineering
University of Saskatchewan
March 31, 2008
“A review of algorithms for perceptual coding of digital audio signals,”
T. Painter and A. Spanias, Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ;
http://ieeexplore.ieee.org/iel3/4961/13644/00628010.pdf?arnumber=628010
MP3’ Tech - Encoding engines source codes:
http://www.mp3-tech.org/programmer/encoding.html
“ECE-700 Filterbank Notes: Why Filterbanks? Sub-band Processing,”
Phil Schniter, Ohio State Univ., March 10, 2008;
http://www.ece.osu.edu/~schniter/ee700/handouts/filterbanks.pdf
1 Polyphase Filter Bank
References
1. Phil Schniter, ECE-700 Filterbank Notes
2. Davis Yen Pan, Digital Audio Compression
3. Davis Pan, A Tutorial on MPEG/Audio Compression
4. CD 11172-3, Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio
5. Jong-Hwa Kim, Lossless Wideband Audio Compression: Prediction and Transform, Ph.D. Thesis
• In MPEG audio coding, a psychoacoustic model is used to
decide how much quantization error can be tolerated in each
sub-band, while signals below the hearing threshold of a
human listener are discarded.
• In the sub-bands that can tolerate more error, fewer bits are
used for coding. The quantized sub-band signals can then be
decoded and recombined to reconstruct (an approximate
version of) the input signal.
• Such processing allows, on average, a 12-to-1 reduction in bit
rate while still maintaining CD-quality audio.
• The psychoacoustic model takes into account the spectral
masking phenomenon of the human ear, which says that high
energy in one spectral region will limit the ear’s ability to hear
details in nearby spectral regions. Therefore, when the energy
in one sub-band is high, nearby sub-bands can be coded with
fewer bits without degrading the perceived quality of the audio
signal.
• The MPEG standard specifies a 32-channel sub-band
filterbank.
1.1 Uniform Modulated Filterbank
Polyphase Filterbank
Uniform Modulated Filterbank
• A modulated filterbank is composed of analysis branches which
1. modulate the input to center the desired sub-band at DC,
2. lowpass filter the modulated signal to isolate the desired
sub-band, and
3. downsample the lowpass signal.
• The synthesis branches interpolate the sub-band signals by
upsampling and lowpass filtering, then modulate each
sub-band back to its original spectral location.
• In an $M$-branch critically sampled, uniformly modulated
filterbank, the $k$th analysis branch extracts the sub-band signal
with center frequency $\omega_k = \frac{2\pi}{M}k$ via modulation and lowpass
filtering with a (one-sided) bandwidth of $\frac{\pi}{M}$ radians, and then
downsamples the result by a factor of $M$.
• The output of each analysis branch is the time-domain signal
of one sub-band.
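The three analysis steps above can be sketched directly in Python (the slides' own code is MATLAB; Python is used here only so the sketch runs stand-alone). This is an illustration under stated assumptions, not code from Matlab_MPEG_1_2_4.zip: the function name `direct_analysis`, the causal zero-initial-state convolution, and the modulation sign $e^{+j\frac{2\pi}{M}kn}$ (chosen to match Eq. (1) of the derivation in Section 1.2) are choices made for this example.

```python
import cmath


def direct_analysis(x, h, M):
    """Standard uniform modulated analysis bank (illustrative sketch).

    For each branch k = 0..M-1:
      1. modulate:   x[n] * exp(+j*2*pi*k*n/M)  (sign as in Eq. (1))
      2. lowpass:    convolve with the prototype h (causal, zero initial state)
      3. downsample: keep every M-th sample, y_k[m] = v_k[mM]
    Returns M lists of complex sub-band samples.
    """
    branches = []
    for k in range(M):
        modulated = [x[n] * cmath.exp(2j * cmath.pi * k * n / M) for n in range(len(x))]
        filtered = [
            sum(h[i] * modulated[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))
        ]
        branches.append(filtered[::M])
    return branches


if __name__ == "__main__":
    # DC input, length-M boxcar prototype, M = 4 branches
    for k, y_k in enumerate(direct_analysis([1.0] * 8, [1.0] * 4, 4)):
        print(k, [round(abs(v), 6) for v in y_k])
```

With a DC input and a length-$M$ boxcar prototype, branch 0 settles to the filter's DC gain of 4, while branches 1 to 3 settle to zero once the filter has filled; the first output of every branch is 1 because only $h[0]$ has seen the input at $n = 0$.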
1.2 Polyphase/DFT Implementation of Uniform
Modulated Filterbank
Uniform Modulated Filterbank
The uniform modulated filterbank can be implemented using
polyphase filterbanks and DFTs, resulting in large computational
savings. The figure illustrates the equivalent polyphase/DFT structures
for analysis and synthesis.
• The impulse responses of the polyphase filters $P_\ell(z)$ and
$\bar{P}_\ell(z)$ can be defined in the time domain as
$p_\ell[m] = \bar{p}_\ell[m] = h[mM + \ell]$, where $h[n]$ denotes the impulse
response of the lowpass (prototype) filter.
• Recall that the standard implementation performs modulation,
filtering, and downsampling, in that order.
• The polyphase/DFT implementation reverses the order of
these operations; it performs downsampling, then filtering,
then modulation (if we interpret the DFT as a two-dimensional
bank of “modulators”).
We derive the polyphase/DFT implementation by exchanging the
order of modulation, filtering, and downsampling.
Reversing the modulation and filtering
We start by analyzing the $k$th filterbank branch.
The first step is to reverse the modulation and filtering operations.
To do this, we define a “modulated filter” $H_k(z)$:
$$v_k[n] = \sum_i h[i]\,x[n-i]\,e^{j\frac{2\pi}{M}k(n-i)} \tag{1}$$
$$= \Big(\sum_i h[i]\,e^{-j\frac{2\pi}{M}ki}\,x[n-i]\Big)\,e^{j\frac{2\pi}{M}kn} \tag{2}$$
$$= \Big(\sum_i h_k[i]\,x[n-i]\Big)\,e^{j\frac{2\pi}{M}kn} \tag{3}$$
• Here $h_k[i] = h[i]\,e^{-j\frac{2\pi}{M}ki}$ is the impulse response of the
modulated filter. The equation above indicates that $x[n]$ is
convolved with the modulated filter and that the filter output
is then modulated.
• Now consider the downsampler. The only modulator outputs
not discarded by the downsampler are those with time index
$n = mM$. For those outputs, the modulator has the value
$e^{j\frac{2\pi}{M}kmM} = e^{j2\pi km} = 1$, and thus it can be ignored. The resulting system
is shown in the bottom block diagram.
Reversing the order of filtering and downsampling.
To apply the Noble identity, we must decompose $H_k(z)$ into a bank
of upsampled polyphase filters. The polyphase decomposition is
derived as follows:
$$H_k(z) = \sum_{n=-\infty}^{\infty} h_k[n]\,z^{-n} = \sum_{\ell=0}^{M-1}\sum_{m=-\infty}^{\infty} h_k[mM+\ell]\,z^{-mM-\ell}$$
Note that the $\ell$th polyphase component of $h_k$ has impulse response
$$h_k[mM+\ell] = h[mM+\ell]\,e^{-j\frac{2\pi}{M}k(mM+\ell)} = h[mM+\ell]\,e^{-j\frac{2\pi}{M}k\ell} = p_\ell[m]\,e^{-j\frac{2\pi}{M}k\ell},$$
where $p_\ell[m] = h[mM+\ell]$ is the $\ell$th polyphase filter obtained from the original
(unmodulated) lowpass filter $H(z)$ by $M{:}1$ downsampling.
We now obtain
$$\begin{aligned} H_k(z) &= \sum_{\ell=0}^{M-1}\sum_{m=-\infty}^{\infty} p_\ell[m]\,e^{-j\frac{2\pi}{M}k\ell}\,z^{-mM-\ell} \\ &= \sum_{\ell=0}^{M-1} e^{-j\frac{2\pi}{M}k\ell}\,z^{-\ell}\sum_{m=-\infty}^{\infty} p_\ell[m]\,(z^M)^{-m} \\ &= \sum_{\ell=0}^{M-1} e^{-j\frac{2\pi}{M}k\ell}\,z^{-\ell}\,P_\ell(z^M). \end{aligned} \tag{4}$$
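The polyphase identity behind Eq. (4) can be checked numerically. The Python sketch below uses hypothetical helper names; it splits a prototype into $p_\ell[m] = h[mM+\ell]$ (zero-padding $h$ to a multiple of $M$, an assumption made for convenience) and rebuilds the taps of $\sum_\ell e^{-j\frac{2\pi}{M}k\ell}z^{-\ell}P_\ell(z^M)$, which should match the modulated filter $h_k[n] = h[n]e^{-j\frac{2\pi}{M}kn}$ to rounding error.

```python
import cmath


def polyphase_components(h, M):
    """p_l[m] = h[mM + l] for l = 0..M-1 (h zero-padded to a multiple of M)."""
    taps = -(-len(h) // M)  # ceil(len(h) / M) taps per polyphase filter
    padded = list(h) + [0.0] * (taps * M - len(h))
    return [[padded[m * M + l] for m in range(taps)] for l in range(M)]


def modulated_filter(h, M, k):
    """h_k[n] = h[n] * exp(-j*2*pi*k*n/M)."""
    return [h[n] * cmath.exp(-2j * cmath.pi * k * n / M) for n in range(len(h))]


def from_polyphase(p, M, k):
    """Taps of sum_l exp(-j*2*pi*k*l/M) z^-l P_l(z^M), i.e. Eq. (4):
    the coefficient of z^-(mM + l) is p_l[m] * exp(-j*2*pi*k*l/M)."""
    taps = len(p[0])
    out = [0j] * (taps * M)
    for l in range(M):
        for m in range(taps):
            out[m * M + l] = p[l][m] * cmath.exp(-2j * cmath.pi * k * l / M)
    return out


if __name__ == "__main__":
    M = 4
    h = [0.1 * (n + 1) for n in range(12)]  # any prototype whose length is a multiple of M
    p = polyphase_components(h, M)
    for k in range(M):
        err = max(abs(a - b) for a, b in zip(modulated_filter(h, M, k), from_polyphase(p, M, k)))
        print(f"branch {k}: max |h_k[n] - Eq.(4) tap| = {err:.2e}")
```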
Derived filterbank structure - downsampler after the polyphase
branches
Derived filterbank structure - downsampler before the polyphase
branches
• The $k$th filterbank branch (now containing $M$ polyphase
branches) is illustrated. Because it is a linear operator, the
downsampler can be moved through the adders and the
(time-invariant) scalings $e^{-j\frac{2\pi}{M}k\ell}$. Finally, the Noble identity is
employed to exchange the filtering and downsampling.
• Observe that the polyphase outputs $\{v_\ell[m],\ \ell = 0, \dots, M-1\}$ are identical for each filterbank branch, while the scalings
$\{e^{-j\frac{2\pi}{M}k\ell},\ \ell = 0, \dots, M-1\}$ are different for each filterbank
branch since they depend on the filterbank branch index $k$.
• Thus, we only need to calculate the polyphase outputs
$\{v_\ell[m],\ \ell = 0, \dots, M-1\}$ once. Using these outputs we can
compute the branch outputs via
$$y_k[m] = \sum_{\ell=0}^{M-1} v_\ell[m]\,e^{-j\frac{2\pi}{M}k\ell} \tag{5}$$
• From the previous equation it is clear that $y_k[m]$ corresponds
to the $k$th DFT output given the $M$-point input sequence
$\{v_\ell[m],\ \ell = 0, \dots, M-1\}$. Thus the $M$ filterbank branches can
be computed in parallel by taking an $M$-point DFT of the $M$
polyphase outputs as shown.
Derived filterbank structure that incorporates the DFT block
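To confirm that the polyphase/DFT structure produces exactly the same sub-band samples as the standard structure, the following Python sketch implements both and compares them. The function names are hypothetical, the DFT is evaluated directly from Eq. (5) rather than with an FFT, and the input is assumed to be zero before $n = 0$, matching the causal convolution in the reference implementation.

```python
import cmath


def direct_analysis(x, h, M):
    """Standard structure: modulate, lowpass filter, downsample by M."""
    out = []
    for k in range(M):
        mod = [x[n] * cmath.exp(2j * cmath.pi * k * n / M) for n in range(len(x))]
        v = [sum(h[i] * mod[n - i] for i in range(len(h)) if n - i >= 0) for n in range(len(x))]
        out.append(v[::M])
    return out


def polyphase_dft_analysis(x, h, M):
    """Polyphase/DFT structure: delay chain + downsample, polyphase filters, M-point DFT."""
    taps = -(-len(h) // M)
    padded = list(h) + [0.0] * (taps * M - len(h))
    p = [[padded[q * M + l] for q in range(taps)] for l in range(M)]  # p_l[q] = h[qM + l]

    def x_at(n):
        return x[n] if 0 <= n < len(x) else 0.0  # zero before the first sample

    n_out = -(-len(x) // M)  # same number of outputs as x[::M]
    y = [[0j] * n_out for _ in range(M)]
    for m in range(n_out):
        # Polyphase outputs v_l[m] = sum_q p_l[q] x[(m - q)M - l], shared by all branches
        v = [sum(p[l][q] * x_at((m - q) * M - l) for q in range(taps)) for l in range(M)]
        # Eq. (5): y_k[m] = sum_l v_l[m] exp(-j*2*pi*k*l/M), an M-point DFT
        for k in range(M):
            y[k][m] = sum(v[l] * cmath.exp(-2j * cmath.pi * k * l / M) for l in range(M))
    return y


if __name__ == "__main__":
    M = 4
    x = [((7 * n) % 5) - 2.0 for n in range(40)]
    h = [1.0 / (n + 1) for n in range(10)]  # length not a multiple of M: zero-padded
    ref, fast = direct_analysis(x, h, M), polyphase_dft_analysis(x, h, M)
    err = max(abs(a - b) for rk, fk in zip(ref, fast) for a, b in zip(rk, fk))
    print(f"max difference between structures: {err:.2e}")
```

Each polyphase filter runs only once per output vector, on $N/M$ taps, which is where the savings counted in the next subsection come from.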
1.3 Computational Savings of the Polyphase/DFT
Modulated Filterbank Implementation
Here we consider the analysis bank only; the synthesis bank can be
treated similarly.
Standard structure. Assume that the lowpass filter $H(z)$ has
impulse response length $N$. To calculate the sub-band output
vector $\{y_k[m],\ k = 0, \dots, M-1\}$ using the standard structure, we
need
1. $N$ multiplications for the lowpass filter $H(z)$ plus one multiplication for the
modulator, per input sample,
2. for each of the $M$ branches of the filterbank,
3. for each of the $M$ input samples consumed per output vector (the filter runs at the input rate, before the $M{:}1$ downsampler).
Thus, the total number of multiplications is $M^2(N + 1)$.
Lowpass/downsampler. If we implement the lowpass/downsampler
in each filterbank branch with a polyphase decimator, the
number of multiplications is
1. $N/M$ multiplications for each polyphase filter $P_\ell(z)$ (since $p_\ell[m] = h[mM+\ell]$ keeps every $M$th tap of $h[n]$), i.e., $M \times N/M = N$ for all $M$ polyphase filters,
2. $M \times M$ multiplications for the $M$-point DFT.
Thus, the total is $N + M^2$.
FFT. If a radix-2 FFT algorithm is used to implement the DFT, we
have approximately
1. $\frac{M}{2}\log_2 M$ multiplications for the $M$-point radix-2 FFT,
2. $N$ multiplications for the $M$ polyphase filters $P_\ell(z)$.
Thus, the total number of multiplications is $N + \frac{M}{2}\log_2 M$.
When $M = 32$ and $N = 10M = 320$, the standard filterbank structure
requires 328704 multiplications, the polyphase/DFT structure
requires 1344 multiplications, and the polyphase/FFT
implementation requires only 400 multiplications.
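The multiplication counts are easy to tabulate. In this Python sketch each polyphase filter is taken to have $N/M$ taps (because $p_\ell[m] = h[mM+\ell]$ keeps every $M$th tap of $h[n]$), and $M$ is assumed to be a power of two for the radix-2 FFT count; the function names are illustrative.

```python
import math


def standard_mults(M, N):
    """(N + 1) multiplies per input sample, M branches, M input samples per output vector."""
    return M * M * (N + 1)


def polyphase_dft_mults(M, N):
    """M polyphase filters of N/M taps (N in total) plus an M x M DFT."""
    return N + M * M


def polyphase_fft_mults(M, N):
    """N polyphase multiplies plus (M/2) log2(M) for a radix-2 FFT (M a power of two)."""
    if M < 2 or M & (M - 1):
        raise ValueError("M must be a power of two for the radix-2 count")
    return N + (M // 2) * int(math.log2(M))


if __name__ == "__main__":
    M, N = 32, 320  # N = 10M
    print("standard:     ", standard_mults(M, N))
    print("polyphase/DFT:", polyphase_dft_mults(M, N))
    print("polyphase/FFT:", polyphase_fft_mults(M, N))
```

For $M = 32$ and $N = 10M = 320$ the three functions return 328704, 1344, and 400.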
2 The Analysis Subband Filter used by
MPEG-1 Layer-I and II
In the MPEG-1 audio encoder, there are two main processing branches
in the block diagram. One branch is the psychoacoustic analyzer,
and the other is the subband analysis filter bank,
which produces the output of each subband,
frequency-shifted to baseband. The detailed processing steps
of the subband analysis filter bank are shown in the figure
below. The corresponding code from the MATLAB program package
Matlab_MPEG_1_2_4.zip is listed in the following: a few lines
from the main program and all of the subroutine
Analysis_subband_filter.m.
Block diagram of MPEG1 Layer-II
In the flow diagram shown in Fig. 2, the first block shows that a
block of 512 data points is taken into a FIFO (First In, First Out)
buffer. The data in the FIFO are processed by a polyphase
filterbank. This FIFO buffer is updated every time the subband
analysis is completed by shifting in a set of 32 new samples, as
illustrated by the figure “Input data for the subband filterbank”.
The second block of Fig. 2 applies the low-pass filter function shown
in the figure “Window function applied to a frame of 512 point data” to a frame of 512 data points to be sent to the subband
analysis by a polyphase filter bank. This low-pass filter is a band-limiting
filter that suppresses frequency aliasing. The passband edge within
a subband (cut-off frequency) is set to $\frac{f_s}{64} \times 0.5824$. This filter
function can be designed by the window method of FIR filter
design, briefly explained in a section to follow. The designed filter
function is then multiplied by the Blackman window (not the
Hanning window). The total length of 512 data points is then divided into
8 segments of 64 samples, and the alternating signs
$\{-,+,-,+,-,+,-,+\}$ are attached to the segments. This shifts
the passband to the center of a subband.
To understand these processing details, we will
review the concepts of the polyphase filter bank and the DCT in the
following sections.
Flow Diagram of the MPEG-1 Audio Encoder Layer-I and Layer II
Input data for the subband filterbank
Window function applied to a frame of 512 point data
% Load tables.
[TH, Map, LTq] = Table_absolute_threshold(1, fs, 128); % Threshold in quiet
CB = Table_critical_band_boundaries(1, fs);
C = Table_analysis_window;
% Analysis subband filtering [1, pp. 67].
S = [];  % accumulates one row of 32 subband samples per call
for i = 0:11,
S = [S; Analysis_subband_filter(x, OFFSET + 32 * i, C)];
end
% -----------------------------------------------
function S = Analysis_subband_filter(Input, n, C)
Common;
nmax = length(Input);
% Check input parameters
if (n + 31 > nmax || n < 1)
error('Unexpected analysis index.');
end
% Build an input vector X of 512 elements. The most recent sample
% is at position 512 while the oldest element is at position 1.
% Pad with zeros if the input signal does not exist.
% ...........................................................
% | 480 samples | 32 samples |
% n-480 n n+31
X = Input(max(1, n - 480):n + 31); % / 32768
X = X(:);
X = [zeros(512 - length(X), 1); X];
% Window vector X by vector C. This produces the Z buffer.
Z = X .* C;
% Partial calculation: 64 Yi coefficients
Y = zeros(1, 64);
for i = 1 : 64,
for j = 0 : 7,
Y(i) = Y(i) + Z(i + 64 * j);
end
end
% Calculate the analysis filter bank coefficients
M = zeros(32, 64);  % preallocate the 32 x 64 matrixing table
for i = 0 : 31,
for k = 0 : 63,
M(i + 1, k + 1) = cos((2 * i + 1) * (k - 16) * pi / 64);
end
end
% Calculate the 32 subband samples Si
S = zeros(1, 32);
for i = 1 : 32,
for k = 1 : 64,
S(i) = S(i) + M(i, k) * Y(k);
end
end
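For readers without MATLAB, here is a Python transcription of the windowing, partial-sum, and matrixing steps of Analysis_subband_filter.m. It keeps the MATLAB buffer ordering and assumes the 512 window coefficients C are supplied by the caller (for example, exported from the table returned by Table_analysis_window, which is not reproduced here); the function name is hypothetical.

```python
import math


def analysis_subband_matrixing(X, C):
    """Transcription of the windowing/matrixing steps of Analysis_subband_filter.m.

    X: 512 input samples in the same order as the MATLAB buffer (oldest first).
    C: 512 analysis-window coefficients, supplied by the caller.
    Returns the 32 subband samples S[0..31].
    """
    if len(X) != 512 or len(C) != 512:
        raise ValueError("X and C must each have 512 elements")
    Z = [x * c for x, c in zip(X, C)]                               # Z = X .* C
    Y = [sum(Z[i + 64 * j] for j in range(8)) for i in range(64)]  # 64 partial sums
    # S(i) = sum_k cos((2i + 1)(k - 16) pi / 64) * Y(k), with 0-based i and k
    return [
        sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * Y[k] for k in range(64))
        for i in range(32)
    ]


if __name__ == "__main__":
    X = [0.0] * 512
    X[16] = 1.0  # puts a single 1 into Y[16]
    S = analysis_subband_matrixing(X, [1.0] * 512)
    print([round(s, 6) for s in S[:4]])
```

Since the matrixing coefficient is $\cos((2i+1)(k-16)\pi/64)$, a single unit value in Y at k = 16 gives S(i) = 1 in every subband, whereas one at k = 48 gives $\cos((2i+1)\pi/2) = 0$ everywhere.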
3 Application of Psychoacoustic Principles:
ISO 11172-3 (MPEG-1)
PSYCHOACOUSTIC MODEL 1
• It is useful to consider an example of how the psychoacoustic
principles described thus far are applied in actual coding
algorithms. The ISO/IEC 11172-3 (MPEG-1, layer 1)
psychoacoustic model 1 determines the maximum allowable
quantization noise energy in each critical band such that
quantization noise remains inaudible.
• In one of its modes, the model uses a 512-point DFT for high
resolution spectral analysis (86.13 Hz), then estimates for each
input frame individual simultaneous masking thresholds due to
the presence of tone-like and noise-like maskers in the signal
spectrum. A global masking threshold is then estimated for a
subset of the original 256 frequency bins by (power) additive
combination of the tonal and non-tonal individual masking
thresholds.
• This section describes the step-by-step model operations. The
five steps leading to computation of global masking thresholds
are as follows:
1. Spectral Analysis and SPL (Sound Pressure Level)
Normalization
2. Identification of Tonal and Noise Maskers
3. Decimation and Reorganization of Maskers
4. Calculation of Individual Masking Thresholds
5. Calculation of Global Masking Thresholds
3.1 Spectral Analysis and SPL Normalization
First, incoming audio samples $s(n)$, which are $b$-bit signed integers, are normalized
according to the FFT length, $N$, and the number of bits per
sample, $b$, using the relation
$$x(n) = \frac{s(n)}{N\,2^{b-1}}.$$
Normalization references the power spectrum to a 0-dB maximum.
The normalized input, $x(n)$, is then segmented into 12 ms frames
(512 samples) using a 1/16th-overlapped Hann window such that
each frame contains 10.9 ms of new data. A power spectral density
(PSD) estimate, $P(k)$, is then obtained using a 512-point FFT. Without a window, the DFT of a frame is
$$X(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j\frac{2\pi nk}{N}},$$
and with the analysis window $w(n)$ applied it becomes
$$X(k) = \sum_{n=0}^{N-1} x(n)\,w(n)\,e^{-j\frac{2\pi nk}{N}}.$$
The Hanning window (Hann window), defined by
$$w(n) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi n}{N}\right)\right],$$
is used to reduce spectral leakage from other frequencies into
the analysed frequency.
Spectrum of
Rectangular (time) Window
Spectrum of the Hanning Window
A power spectral density (PSD) estimate, $P(k)$, is then obtained
from $X(k)$ computed by a 512-point FFT (Fast Fourier
Transform), a fast algorithm for computing the DFT (Discrete Fourier
Transform). The PSD resulting from a 512-point FFT has 256 spectral
components (harmonics):
$$P(k) = PN + 10\log_{10}|X(k)|^2, \qquad 0 \le k \le \frac{N}{2},$$
where the power normalization term, $PN$, is the reference sound
pressure level of 96 dB.
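The normalization, windowing, and PSD steps can be sketched in Python as follows. The function name, the default $b = 16$, and the tiny floor inside the logarithm (to avoid $\log_{10} 0$) are assumptions; the DFT is written out directly for clarity, which gives the same values as an FFT, only more slowly. This follows the Painter and Spanias formulas above, not the per-frame renormalization to a 96 dB maximum used in FFT_Analysis.m.

```python
import cmath
import math


def normalized_psd(s, b=16, PN=96.0):
    """PSD of one frame, following the normalization and Hann-window formulas above.

    s: one frame of N samples (N = 512 in the model) on a b-bit signed-integer scale.
    b: bits per sample; PN: reference SPL in dB.
    Returns P[k] in dB for k = 0..N/2.
    """
    N = len(s)
    x = [v / (N * 2 ** (b - 1)) for v in s]                              # x(n) = s(n) / (N 2^(b-1))
    w = [0.5 * (1.0 - math.cos(2 * math.pi * n / N)) for n in range(N)]  # Hann window
    P = []
    for k in range(N // 2 + 1):
        # Direct DFT of the windowed frame (an FFT would give the same values)
        X = sum(x[n] * w[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
        # The 1e-30 floor only avoids log10(0); it is not part of the model
        P.append(PN + 10 * math.log10(max(abs(X) ** 2, 1e-30)))
    return P


if __name__ == "__main__":
    N, k0 = 512, 32
    tone = [2 ** 15 * math.cos(2 * math.pi * k0 * n / N) for n in range(N)]
    P = normalized_psd(tone)
    peak = max(range(len(P)), key=P.__getitem__)
    print(peak, round(P[peak], 2))
```

For a full-scale sinusoid centered on bin $k_0$, the Hann window gives $|X(k_0)| = 1/4$, so $P(k_0) = 96 + 10\log_{10}(1/16) \approx 83.96$ dB.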
Problem
Matlab_MPEG_1_2_4.zip contains a MATLAB program that simulates all of the MP3 psychoacoustic masking threshold calculations.
The subroutine FFT_Analysis.m calculates the Power Spectral Density
(PSD). The main program is Test_MPEG.m. Apply this program to
a music piece in *.wav format of your choice to see its PSD. Slide the
time window of 512 samples to find the first block for which no
zero padding is applied in the analysis. The PSD of “Eine Kleine
Nachtmusik” by Mozart is shown below. The key part of the processing
in FFT_Analysis.m is shown below.
% Compute the auditory spectrum using the Fast Fourier Transform.
% The spectrum X is expressed in dB. The size of the transform is 512 and
% is centered on the 384 samples (12 samples per subband) used for the
% subband analysis. The first of the 384 samples is indexed by n:
% ................................................
% | | 384 samples | |
% n-64 n n+383 n+447
% A Hanning window is applied before computing the FFT.
%
% Prepare the Hanning window
h = sqrt(8/3) * hanning(FFT_SIZE);
% Power density spectrum
X = max(20 * log10(abs(fft(s .* h)) / FFT_SIZE), MIN_POWER);
% Normalization to the reference sound pressure level of 96 dB
Delta = 96 - max(X);
X = X + Delta;
PSD of “Eine Kleine Nachtmusik” by Mozart
3.2 Identification of Tonal and Noise Maskers
After PSD estimation and SPL normalization, tonal and non-tonal
masking components are identified.
Tonal maskers
Local maxima in the sample PSD which exceed neighboring
components within a certain Bark distance by at least 7 dB are
classified as tonal. Specifically, the tonal set, $S_T$, is defined as
$$S_T = \left\{ P(k) \;:\; P(k) > P(k \pm 1),\ \ P(k) > P(k \pm \Delta_k) + 7\ \text{dB} \right\}$$
where
$$\Delta_k \in \begin{cases} \{2\}, & 2 < k < 63 \quad (0.17\text{-}5.5\ \text{kHz}) \\ \{2, 3\}, & 63 \le k < 127 \quad (5.5\text{-}11\ \text{kHz}) \\ \{2, \dots, 6\}, & 127 \le k < 256 \quad (11\text{-}20\ \text{kHz}) \end{cases}$$
Tonal maskers, $P_{TM}(k)$, are computed from the spectral peaks
listed in $S_T$ as follows:
$$P_{TM}(k) = 10\log_{10}\sum_{j=-1}^{1} 10^{0.1P(k+j)}\ \text{dB}$$
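A Python sketch of the tonal test and of $P_{TM}(k)$. The function names are hypothetical; P is assumed to hold $P(k)$ for $k = 0, \dots, 256$, and peaks whose $\pm\Delta_k$ neighbors would fall outside that range are skipped, since the model description does not say how edges are handled.

```python
import math


def delta_k(k):
    """Neighbor offsets Delta_k for bin k of a 512-point FFT (see the cases above)."""
    if 2 < k < 63:
        return [2]
    if 63 <= k < 127:
        return [2, 3]
    if 127 <= k < 256:
        return [2, 3, 4, 5, 6]
    return []


def tonal_maskers(P):
    """Return {k: P_TM(k)} for every bin k in the tonal set S_T.

    P: SPL values P(k) in dB, indexed by bin (k = 0..256 for N = 512).
    Bins whose +/-Delta_k neighbors fall outside P are skipped (assumption).
    """
    tonal = {}
    for k in range(1, len(P) - 1):
        offsets = delta_k(k)
        if not offsets or k + max(offsets) >= len(P):
            continue
        local_max = P[k] > P[k - 1] and P[k] > P[k + 1]
        exceeds = all(P[k] > P[k + d] + 7 and P[k] > P[k - d] + 7 for d in offsets)
        if local_max and exceeds:
            # Power-add the peak and its two immediate neighbors
            tonal[k] = 10 * math.log10(sum(10 ** (0.1 * P[k + j]) for j in (-1, 0, 1)))
    return tonal


if __name__ == "__main__":
    P = [20.0] * 257
    P[100] = 60.0  # 40 dB above its neighbors: tonal
    P[30] = 25.0   # a local maximum, but only 5 dB up: not tonal
    print(tonal_maskers(P))
```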
Noise maskers
A single noise masker for each critical band, $P_{NM}(\bar{k})$, is then
computed from the (remaining) spectral lines not within the $\pm\Delta_k$
neighborhood of a tonal masker using the sum
$$P_{NM}(\bar{k}) = 10\log_{10}\sum_j 10^{0.1P(j)}\ \text{dB} \qquad \forall\, P(j) \notin \{P_{TM}(k, k\pm1, k\pm\Delta_k)\},$$
where
$$\bar{k} = \left(\prod_{j=l}^{u} j\right)^{\frac{1}{u-l+1}}$$
is the geometric mean spectral line of the critical band, and $l$ and $u$ are the lower and upper
spectral line boundaries of the critical band, respectively.
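The noise-masker sum and the geometric-mean bin $\bar{k}$ can be sketched in Python as below. The function names are hypothetical; the caller is assumed to supply the critical-band limits $l, u$ and the set of bins excluded around tonal maskers, and $\bar{k}$ is returned as a real number (rounding it to a bin is left to the caller).

```python
import math


def geometric_mean_bin(l, u):
    """k_bar = (prod_{j=l}^{u} j) ** (1 / (u - l + 1)), computed with logs (requires l >= 1)."""
    return math.exp(sum(math.log(j) for j in range(l, u + 1)) / (u - l + 1))


def noise_masker(P, l, u, excluded):
    """Noise masker of the critical band with spectral lines l..u.

    P: SPL values P(j) in dB indexed by bin.
    excluded: set of bins inside the k, k+/-1, k+/-Delta_k neighborhood of a tonal masker.
    Returns (k_bar, P_NM) or None when every line of the band is excluded (assumption).
    """
    powers = [10 ** (0.1 * P[j]) for j in range(l, u + 1) if j not in excluded]
    if not powers:
        return None
    return geometric_mean_bin(l, u), 10 * math.log10(sum(powers))


if __name__ == "__main__":
    P = [30.0] * 257
    print(noise_masker(P, 10, 13, excluded=set()))     # four equal lines: about 30 + 6.02 dB
    print(noise_masker(P, 10, 13, excluded={11, 12}))  # two lines left: about 30 + 3.01 dB
```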
(1) local maxima
(2) tonal components
(3) tonal and non-tonal components of Eine Kleine Nachtmusik
Problem
The subroutine Find_tonal_components.m contained in the
MP3 psychoacoustic masking simulation program Matlab_MPEG_1_2_4.zip first calculates the local maxima of the
Power Spectral Density (PSD). From the obtained local maxima
of the PSD, the tonal components are calculated based on the equations
described above. Then, the non-tonal components and the frequencies of the critical bands are calculated. The main program
is Test_MPEG.m. Apply this program to the music piece in
*.wav format chosen in the previous Problem to show the 3 figures
generated by Find_tonal_components.m: (1) local maxima, (2)
tonal components, and (3) tonal and non-tonal components.
3.3 Decimation and Reorganization of Maskers
In this step, the number of maskers is reduced using two criteria.
First, any tonal or noise maskers below the absolute threshold are
discarded, i.e., only maskers which satisfy
$$P_{TM,NM}(k) \ge T_q(k)$$
are retained, where $T_q(k)$ is the SPL of the threshold in quiet at
spectral line k. Next, a sliding 0.5 Bark-wide window is used to
replace any pair of maskers occurring within a distance of 0.5 Bark
by the stronger of the two.
After the sliding window procedure, masker frequency bins are
reorganized according to the subsampling scheme
$$P_{TM,NM}(i) = P_{TM,NM}(k), \qquad P_{TM,NM}(k) = 0 \ \ \text{for } i \ne k,$$
where the subsampled bin index $i$ assigned to bin $k$ is
$$i = \begin{cases} k, & 1 \le k \le 48 \\ k + (k \bmod 2), & 49 \le k \le 96 \\ k + 3 - \left((k-1) \bmod 4\right), & 97 \le k \le 232. \end{cases}$$
The net effect is 2:1 decimation of masker bins in critical bands
18-22 and 4:1 decimation of masker bins in critical bands 22-25,
with no loss of masking components. This procedure reduces the
total number of tone and noise masker frequency bins under
consideration from 256 to 106. An example of decimation for
equal SPLs is shown in the table below.
k (original bin)   i (subsampled bin)   action
50                 50                   keep
51                 52                   zero
52                 52                   keep
100                100                  keep
101                104                  zero
102                104                  zero
103                104                  zero
104                104                  keep
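A Python sketch of the three reductions in this step. The piecewise index map used for $i$ follows Painter and Spanias and reproduces both the table above and the 256-to-106 reduction; the left-to-right handling of the 0.5 Bark window, the collision rule in step (3), and the linear Bark scale in the usage example are assumptions for illustration only.

```python
def decimated_bin(k):
    """Subsampled index i for masker bin k (1 <= k <= 232)."""
    if 1 <= k <= 48:
        return k
    if 49 <= k <= 96:
        return k + (k % 2)               # 2:1 decimation
    if 97 <= k <= 232:
        return k + 3 - ((k - 1) % 4)     # 4:1 decimation
    raise ValueError("bin index outside 1..232")


def decimate_maskers(maskers, Tq, z):
    """Apply the three reductions of this step to one set of maskers.

    maskers: {k: SPL in dB}; Tq[k]: threshold in quiet (dB); z[k]: Bark value of bin k.
    Returns {i: SPL in dB} on the subsampled grid.
    """
    # (1) drop maskers below the threshold in quiet
    kept = {k: p for k, p in maskers.items() if p >= Tq[k]}
    # (2) 0.5-Bark sliding window: of two maskers closer than 0.5 Bark keep the stronger
    survivors = []
    for k in sorted(kept):
        if survivors and z[k] - z[survivors[-1]] < 0.5:
            if kept[k] > kept[survivors[-1]]:
                survivors[-1] = k
        else:
            survivors.append(k)
    # (3) move each masker from bin k to bin i (bin k is zeroed when i != k)
    out = {}
    for k in survivors:
        i = decimated_bin(k)
        if i not in out or kept[k] > out[i]:  # collision rule is an assumption
            out[i] = kept[k]
    return out


if __name__ == "__main__":
    print(len({decimated_bin(k) for k in range(1, 233)}))  # 106
    Tq = [10.0] * 257
    z = [0.1 * k for k in range(257)]  # hypothetical Bark scale, 0.1 Bark per bin
    print(decimate_maskers({10: 50.0, 11: 40.0, 51: 45.0, 150: 5.0}, Tq, z))
```

In the usage example, masker 11 is absorbed by the stronger masker 10 only 0.1 Bark away, masker 150 lies below the threshold in quiet, and masker 51 is moved to the subsampled bin 52.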
Problem
The subroutine Decimation.m contained in the MP3 psychoacoustic masking simulation program Matlab_MPEG_1_2_4.zip performs all
of the decimation processes described in this subsection. Apply
this program to the music piece in *.wav format chosen in the previous
Problem to see whether any SPLs are eliminated because (1) a tonal
or noise masker is below the absolute threshold, (2) a pair of
maskers occurring within a distance of 0.5 Bark is replaced by the
stronger of the two, or (3) of the 2:1 decimation of masker bins in critical
bands 18-22 and the 4:1 decimation of masker bins in critical bands
22-25.
Tonal and non-tonal maskers after decimation. Only one non-tonal
masker SPL under the absolute threshold was eliminated.
3.4 Calculation of Individual Masking Thresholds
Having obtained a decimated set of tonal and noise maskers,
individual tone and noise masking thresholds are computed next.
Each individual threshold represents a masking contribution at
frequency bin i due to the tone or noise masker located at bin j
(reorganized during step 3). Tonal masker thresholds, $T_{TM}(i, j)$,
are given by
$$T_{TM}(i, j) = P_{TM}(j) - 0.275\,z(j) + SF(i, j) - 6.025\ \text{dB}$$
where $P_{TM}(j)$ denotes the SPL of the tonal masker in frequency
bin $j$, $z(j)$ denotes the Bark frequency of bin $j$,
and the spread of masking from masker bin $j$ to maskee bin $i$,
$SF(i, j)$, is modeled by the expression
$$SF(i, j) = \begin{cases} 17\Delta z - 0.4P_{TM}(j) + 11, & -3 \le \Delta z < -1 \\ (0.4P_{TM}(j) + 6)\,\Delta z, & -1 \le \Delta z < 0 \\ -17\Delta z, & 0 \le \Delta z < 1 \\ (0.15P_{TM}(j) - 17)\,\Delta z - 0.15P_{TM}(j), & 1 \le \Delta z < 8 \end{cases}\ \text{dB}$$
Prototype spreading functions at z=10 as a function of masker level
$SF(i, j)$ is a piecewise linear function of the masker level, $P_{TM}(j)$, and the
Bark maskee-masker separation, $\Delta z = z(i) - z(j)$. $SF(i, j)$
approximates the basilar spreading (excitation pattern). As
shown in the figure, the slope of $T_{TM}(i, j)$ decreases with
increasing masker level. This reflects psychophysical test
results, which have demonstrated that the ear's frequency
selectivity decreases as stimulus levels increase. Note also
that the spread of masking in this particular model is
constrained to a 10-Bark neighborhood for computational
efficiency. This simplifying assumption is reasonable given the very
low masking levels that occur in the tails of the basilar excitation
patterns modeled by $SF(i, j)$.
Individual noise masker thresholds, $T_{NM}(i, j)$, are given by
$$T_{NM}(i, j) = P_{NM}(j) - 0.175\,z(j) + SF(i, j) - 2.025\ \text{dB}$$
where $P_{NM}(j)$ denotes the SPL of the noise masker in frequency
bin $j$, $z(j)$ denotes the Bark frequency of bin $j$, and $SF(i, j)$ is
obtained by replacing $P_{TM}(j)$ with $P_{NM}(j)$.
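The spreading function and the two individual-threshold formulas translate directly into Python. In this sketch (function names hypothetical) separations outside $-3 \le \Delta z < 8$ return $-\infty$ dB, i.e., no masking contribution, which is how the limited neighborhood is modeled here.

```python
def spreading_function(dz, P):
    """SF(i, j) in dB for Bark separation dz = z(i) - z(j) and masker SPL P (dB)."""
    if -3 <= dz < -1:
        return 17 * dz - 0.4 * P + 11
    if -1 <= dz < 0:
        return (0.4 * P + 6) * dz
    if 0 <= dz < 1:
        return -17 * dz
    if 1 <= dz < 8:
        return (0.15 * P - 17) * dz - 0.15 * P
    return float("-inf")  # outside -3 <= dz < 8: treated as no masking (assumption)


def tonal_threshold(P_tm, z_i, z_j):
    """T_TM(i, j) = P_TM(j) - 0.275 z(j) + SF(i, j) - 6.025 dB."""
    return P_tm - 0.275 * z_j + spreading_function(z_i - z_j, P_tm) - 6.025


def noise_threshold(P_nm, z_i, z_j):
    """T_NM(i, j) = P_NM(j) - 0.175 z(j) + SF(i, j) - 2.025 dB (SF evaluated with P_NM)."""
    return P_nm - 0.175 * z_j + spreading_function(z_i - z_j, P_nm) - 2.025


if __name__ == "__main__":
    # Spreading function of a 60 dB masker at z = 10 Bark, sampled every Bark
    for z_i in range(7, 18):
        print(z_i, round(spreading_function(z_i - 10, 60.0), 2))
```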
Problem
The subroutine Individual_masking_thresholds.m contained in
the MP3 psychoacoustic masking simulation program Matlab_MPEG_1_2_4.zip calculates the individual masking thresholds of
tonal maskers, $T_{TM}(i, j)$, and non-tonal maskers, $T_{NM}(i, j)$, using
the spreading function $SF(i, j)$. Apply this program to the music
piece in *.wav format chosen in the previous Problem to plot the individual masking thresholds of a frame.
3.5 Calculation of Global Masking Thresholds
In this step, individual masking thresholds are combined to
estimate a global masking threshold for each frequency bin in the
decimated subset of bins obtained in Section 3.3. The model assumes that masking effects
are additive. The global masking threshold, $T_g(i)$, is therefore
obtained by computing the sum
$$T_g(i) = 10\log_{10}\left(10^{0.1T_q(i)} + \sum_{l=1}^{L} 10^{0.1T_{TM}(i,l)} + \sum_{m=1}^{M} 10^{0.1T_{NM}(i,m)}\right)\ \text{dB}$$
where $T_q(i)$ is the absolute hearing threshold for frequency bin $i$,
$T_{TM}(i, l)$ and $T_{NM}(i, m)$ are the individual masking thresholds,
and $L$ and $M$ are the numbers of tonal and noise maskers,
respectively, identified previously.
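A Python sketch of the power-additive combination for one bin $i$. The function name is hypothetical; the individual thresholds are passed as lists of dB values, and $-\infty$ dB (no contribution, as a limited spreading function may return) is accepted.

```python
import math


def global_threshold(Tq_i, tonal_thresholds, noise_thresholds):
    """T_g(i) in dB: power sum of the threshold in quiet and every individual threshold at bin i."""
    power = 10 ** (0.1 * Tq_i)
    power += sum(10 ** (0.1 * t) for t in tonal_thresholds)
    power += sum(10 ** (0.1 * t) for t in noise_thresholds)
    return 10 * math.log10(power)


if __name__ == "__main__":
    # Three equal contributions of 0 dB power-add to 10*log10(3), about 4.77 dB
    print(round(global_threshold(0.0, [0.0], [0.0]), 2))
```

Because the sum is taken in the power domain, two equal contributions raise the level by about 3 dB, and contributions far below the largest term barely change $T_g(i)$.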
In other words, the global threshold for each frequency bin
represents a signal dependent, power additive modification of the
absolute threshold due to the basilar spread of all tonal and noise
maskers in the signal power spectrum. The next figure shows the global
masking threshold obtained by adding the power of the individual
tonal and noise maskers to the absolute threshold in quiet.
Individual masking thresholds for both tonal and non-tonal
maskers. The global masking threshold is the power sum of all individual
masking thresholds and the threshold in quiet.
4 End