Sound Processing CSC361/661 Digital Media Spring 2002.
Sound Processing
CSC361/661
Digital Media
Spring 2002
How Sound Is Produced
Air vibration
– Molecules in the air are disturbed, one bumping against another.
– An area of high pressure moves through the air in a wave.
Thus a wave representing the changing air pressure can be used to represent sound.
How Sound Is Perceived
The cochlea, an organ in our inner ears, detects sound.
The cochlea is joined to the eardrum by three tiny bones.
It consists of a spiral of tissue filled with liquid and thousands of tiny hairs.
The hairs get smaller as you move down into the cochlea.
Each hair is connected to a nerve which feeds into the auditory nerve bundle going to the brain.
The longer hairs resonate with lower frequency sounds, and the shorter hairs with higher frequencies.
Thus the cochlea serves to transform the air pressure signal experienced by the eardrum into frequency information which can be interpreted by the brain as sound.
Pulse Code Modulation
PCM is the most common type of digital audio recording.
A microphone converts a varying air pressure (sound waves) into a varying voltage.
Then an analog-to-digital converter samples the voltage at regular intervals.
Each sampled voltage gets converted into an integer of a fixed number of bits.
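The steps above can be sketched in a few lines of Python. This is only an illustration, not a model of real converter hardware: the function name `pcm_encode`, the 440 Hz test tone, and the normalized input range of -1.0 to +1.0 are assumptions made for the example.

```python
import math

def pcm_encode(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) at regular intervals and return signed integer codes.

    Assumes signal(t) returns a voltage normalized to [-1.0, 1.0].
    """
    max_code = 2 ** (bits - 1) - 1           # e.g. 32767 for 16 bits
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz               # regular sampling instants
        v = max(-1.0, min(1.0, signal(t)))   # clip to the converter's range
        codes.append(round(v * max_code))    # fixed-width integer sample
    return codes

def tone(t):
    """Hypothetical input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

codes = pcm_encode(tone, 0.001, 44100, 16)   # one millisecond of audio
```

Each entry of `codes` is one sample; a longer duration or a higher sampling rate simply produces more of them.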
Digitization of Sound
Sampling
– Most humans can’t hear anything over 20 kHz.
– The sampling rate must be more than twice the highest frequency component of the sound (Nyquist Theorem).
– CD quality is sampled at 44.1 kHz.
– Frequencies over 22.05 kHz are filtered out before sampling is done.
Quantization
– Telephone quality sound uses 8-bit samples.
– CD quality sound uses 16-bit samples (65,536 quantization levels) on two channels for stereo.
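As a quick check of the figures above, the arithmetic can be written out as follows. The function names are made up for this sketch; the numbers come from the slide.

```python
def quantization_levels(bits_per_sample):
    """Number of distinct values an n-bit sample can take."""
    return 2 ** bits_per_sample

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

telephone_levels = quantization_levels(8)    # 8-bit samples
cd_levels = quantization_levels(16)          # 16-bit samples
cd_rate = pcm_bit_rate(44_100, 16, 2)        # stereo CD audio, bits/s
nyquist_limit_hz = 44_100 / 2                # highest representable frequency
```

Under these definitions CD audio works out to 1,411,200 bits per second before any compression.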
Encoder Design
A – B. Apply a bandlimiting filter to remove high-frequency components.
C. Sample at regular time intervals.
D. Quantize each sample.
Sampling Error (Undersampling)
If you undersample, one frequency will alias as another.
For CD quality, frequencies above 22.05 kHz are filtered out, and then the sound is sampled at 44.1 kHz.
This is depicted on the next slide. Figure from Multimedia Communications
by Fred Halsall, Addison-Wesley, 2001.
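To make aliasing concrete, the sketch below computes the frequency at which a pure tone appears after sampling when no bandlimiting filter is applied. The helper name `alias_frequency` is an assumption for illustration, and the folding rule applies to a single sinusoid.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone at f_hz sampled at sample_rate_hz."""
    f = f_hz % sample_rate_hz              # fold into [0, sample rate)
    return min(f, sample_rate_hz - f)      # reflect into [0, sample rate / 2]

below_limit = alias_frequency(1_000, 44_100)    # under 22.05 kHz: unchanged
above_limit = alias_frequency(30_000, 44_100)   # over 22.05 kHz: aliases
```

A 30 kHz tone sampled at 44.1 kHz yields exactly the same sample values as a 14.1 kHz tone, which is why such frequencies must be filtered out before sampling.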
Quantization Interval
If Vmax is the maximum positive and negative signal amplitude and n is the number of binary bits used, then the magnitude of the quantization interval, q, is defined as follows:
\[ q = \frac{2 V_{max}}{2^n} \]
For example, what if we have 8 bits and the values range from –1000 to +1000?
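The example can be worked through directly, taking q as twice Vmax divided by 2 to the power n. The function name is chosen only for this sketch.

```python
def quantization_interval(v_max, n_bits):
    """Size of one quantization interval: q = 2 * Vmax / 2**n."""
    return 2 * v_max / 2 ** n_bits

# 8 bits, values ranging from -1000 to +1000
q = quantization_interval(1000, 8)   # 2000 / 256 = 7.8125
```

So with 8 bits the range of 2000 units is split into 256 intervals of 7.8125 units each.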
Quantization Error (Noise)
Any values within a quantization interval will be represented by the same binary value.
Each code word corresponds to a nominal amplitude value that is at the center of the corresponding quantization interval.
The actual signal may differ from the code word by up to plus or minus q/2, where q is the size of the quantization interval.
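A small sketch can confirm the q/2 bound. It assumes linear quantization with q = 2 * Vmax / 2**n and code words numbered from 0 at the most negative interval; both the numbering and the function name `quantize` are choices made for this example.

```python
def quantize(value, v_max, n_bits):
    """Return (code word index, nominal amplitude) for a value in [-v_max, v_max]."""
    q = 2 * v_max / 2 ** n_bits
    # Index of the interval containing the value; the top edge (+v_max)
    # is clamped into the last interval.
    index = min(int((value + v_max) // q), 2 ** n_bits - 1)
    nominal = -v_max + (index + 0.5) * q    # center of that interval
    return index, nominal

index, nominal = quantize(3.0, 1000, 8)
error = abs(3.0 - nominal)                  # never more than q / 2
```

Every input that falls in the same interval gets the same `index`, so the difference between the input and `nominal` is the quantization error.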
Quantization Intervals and Resulting Error
Results of Insufficient Quantization Levels
Insufficient quantization levels result from not using enough bits to represent each sample.
Insufficient quantization levels force you to represent many different amplitude values with the same code word. This introduces quantization noise.
Dithering can improve the quality of a digital file with a small sample size (relatively few quantization levels).
Linear Vs. Non-Linear Quantization
In linear quantization, each code word represents a quantization interval of equal length.
In non-linear quantization, the intervals are not of equal length: more code words (finer intervals) are used for samples at some amplitude levels, and fewer for samples at other levels.
For sound, it is more important to have a finer-grained representation (i.e., more bits) for low amplitude signals than for high because low amplitude signals are more sensitive to noise. Thus, non-linear quantization is used.
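One well-known non-linear scheme is mu-law companding. The slides do not name a specific method, so the choice of mu-law here, the constant MU = 255, and the function names are all assumptions made for illustration. The input is compressed with a logarithmic curve and then quantized uniformly, which gives low amplitudes finer intervals.

```python
import math

MU = 255  # assumption: the value commonly used with 8-bit mu-law

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], stretching low amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def linear_code(x, n_bits=8):
    """Uniform quantization of x in [-1, 1] to an n-bit code word."""
    levels = 2 ** n_bits
    return min(int((x + 1) / 2 * levels), levels - 1)

def nonlinear_code(x, n_bits=8):
    """Compress first, then quantize uniformly."""
    return linear_code(mu_law_compress(x), n_bits)
```

With 8 bits, two quiet samples such as 0.001 and 0.002 share a single code word under `linear_code` but receive different code words under `nonlinear_code`.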
Sound Editing
See Tutorial for
– Choosing sampling rate and bit depth
– Recording sound
See Studio Plugin Overview for information about multi-track recording
See Noise Reduction Overview for information about noise reduction
Fourier Analysis
Fourier Transform
It is possible to take any periodic function of time x(t) and resolve it into an equivalent infinite summation of sine waves and cosine waves with frequencies that start at 0 and increase in integer multiples of a base frequency = 1/T, where T is the period of x(t).
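Mathematically, we can say the same thing with this equation:
\[ x(t) = a_0 + \sum_{k=1}^{\infty} \left( a_k \cos(2\pi k f_0 t) + b_k \sin(2\pi k f_0 t) \right) \]
where f_0 = 1/T is the base frequency.
This equation does NOT tell how to compute the Fourier transform, that is, how we get the coefficients a_1, a_2, … and b_1, b_2, ….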
Discrete Fourier Transform
We can’t do an infinite summation on a computer.
For digitally sampled input we can do the summation using the same number of frequency samples as there are time input samples.
We can pretend that x(t) is periodic and that the period is the same length as the recording (or sound segment).
The base frequency will be 1/length of recording (or sound segment).
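A direct, unoptimized version of this finite summation might look like the sketch below. It uses the standard complex-exponential form of the DFT, which packs the cosine and sine terms into one complex number per frequency; the function names and the 8-sample test signal are assumptions for the example.

```python
import cmath
import math

def dft(samples):
    """Naive DFT: n frequency values from n time samples, O(n^2) operations."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def bin_frequency_hz(k, n, sample_rate_hz):
    """Frequency of bin k: k times the base frequency 1 / (length of recording)."""
    recording_length_s = n / sample_rate_hz
    return k / recording_length_s

# Eight samples containing exactly one cycle of a cosine
segment = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(segment)
```

For this segment nearly all of the energy lands in bin 1 and its mirror image, bin 7, because the segment holds one full period of the wave.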
Difference Between Discrete Fourier Transform and Discrete Cosine Transform
The discrete cosine transform uses real numbers. This is all you need for image representation.
The Fourier Transform uses complex numbers, which have a real and an imaginary part.
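Recall the definition of the Discrete Cosine Transform.
For an N × N pixel image
\[ p_{xy}, \quad 0 \le x < N, \; 0 \le y < N \]
the DCT is an array of coefficients
\[ DCT_{uv}, \quad 0 \le u < N, \; 0 \le v < N \]
where
\[ DCT_{uv} = \frac{1}{\sqrt{2N}} \, C_u C_v \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p_{xy} \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} \]
where
\[ C_u, C_v = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases} \]
This tells how to compute the Discrete Cosine Transform.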
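The definition translates almost directly into code. The sketch below assumes the normalization 1/sqrt(2N) with C equal to 1/sqrt(2) at index 0 and 1 otherwise, as read from the slide's formula; the function name `dct2d` is made up, and the four nested loops are written for clarity, not speed.

```python
import math

def dct2d(p):
    """2-D DCT of an N x N image p (a list of N rows of N numbers)."""
    n = len(p)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (p[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * total / math.sqrt(2 * n)
    return out

# A flat 8 x 8 image: only the (0, 0) coefficient should be non-zero
flat = [[1.0] * 8 for _ in range(8)]
coefficients = dct2d(flat)
```

Only real arithmetic is involved, which is the practical difference from the Fourier transform noted above.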
Versions of the Fourier Transform
Fourier Transform -- infinite summation
Discrete Fourier Transform -- a sum of n waves derived from n samples; O(n^2) complexity
Fast Fourier Transform -- a fast version of the Discrete Fourier Transform, O(n log2 n) complexity; a disadvantage is that it requires a windowing function
See http://www.dataq.com/applicat/articles/an11.htm and http://www.chipcenter.com/eexpert/bmasta/bmasta001.html
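To show where the speedup comes from, here is a minimal recursive radix-2 FFT. It is one common way to implement a fast transform, not necessarily the one described in the linked articles, and it assumes the number of samples is a power of two.

```python
import cmath

def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n <= 1:
        return list(samples)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])       # transform of even-indexed samples
    odd = fft(samples[1::2])        # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

impulse_spectrum = fft([1, 0, 0, 0])
```

Each level of recursion does O(n) work and there are log2 n levels, which gives the O(n log2 n) total.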
Windowing Functions
A windowing function minimizes the effect of phase discontinuities at the borders of segments.
Hanning, Hamming, Blackman, and Blackman-Harris are often used.
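Two of these windows are easy to write out. The sketch uses the usual symmetric textbook definitions of the Hanning (Hann) and Hamming windows; the function names are assumptions, and n must be at least 2.

```python
import math

def hann(n):
    """Symmetric Hann (Hanning) window of n points; the endpoints are 0."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def hamming(n):
    """Symmetric Hamming window of n points; the endpoints are 0.08."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(segment, window):
    """Multiply a segment by a window, sample by sample, before the transform."""
    return [s * w for s, w in zip(segment, window)]

tapered = apply_window([1.0] * 5, hann(5))
```

Because the window falls toward zero at both ends, the segment's edges no longer jump abruptly when the transform treats the segment as one period of a repeating signal.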
Fourier Analysis in CoolEdit
Can be used to filter certain frequencies.
The window size and function are adjustable.
Go to Transform/Filters/FFT to filter frequencies.
Go to Analyze/Frequency Analysis to see an analysis of the frequency.