1 Audio Compression Techniques MUMT 611, January 2005 Assignment 2 Paul Kolesnik.
1
Audio Compression Techniques
MUMT 611, January 2005
Assignment 2
Paul Kolesnik
2
Introduction
- Digital Audio Compression
  - Removal of redundant or otherwise irrelevant information from an audio signal
  - Audio compression algorithms are often referred to as “audio encoders”
- Applications
  - Reduces required storage space
  - Reduces required transmission bandwidth
3
Audio Compression
- Audio signal overview
  - Sampling rate (number of samples per second)
  - Bit rate (number of bits per second). An uncompressed 16-bit, 44.1 kHz stereo signal has a bit rate of about 1.4 Mbps (megabits per second)
  - Number of channels (mono / stereo / multichannel)
- Size is reduced by lowering those values or by data compression / encoding
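The bit rate quoted for CD-quality audio follows directly from the three parameters listed on this slide. A minimal sketch of the arithmetic in Python (the function name `pcm_bit_rate` is made up for this illustration):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)                 # 1411200 bits per second, about 1.4 Mbit/s
print(cd_rate / 8 * 60 / 1e6)  # about 10.6 megabytes per minute
```

Halving any one of the three values halves the bit rate, which is the "lowering those values" option; everything else in these slides is about the second option, encoding.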
4
Audio Data Compression
- Redundant information
  - Implicit in the remaining information
  - Ex.: oversampled audio signal
- Irrelevant information
  - Perceptually insignificant
  - Cannot be recovered from the remaining information
5
Audio Data Compression
- Lossless Audio Compression
  - Removes redundant data
  - Resulting signal is the same as the original (perfect reconstruction)
- Lossy Audio Encoding
  - Removes irrelevant data
  - Resulting signal is similar to the original
6
Audio Data Compression
- Audio vs. Speech Compression Techniques
  - Speech compression uses a model of the human vocal tract to compress signals
  - Audio compression does not use this technique because the variety of possible signals is much larger
7
Generic Audio Encoder
[Figure: generic audio encoder]
8
Generic Audio Encoder
- Psychoacoustic Model
  - Psychoacoustics: the study of how sounds are perceived by humans
  - Uses perceptual coding: eliminates information from the audio signal that is inaudible to the ear
  - Detects conditions under which different audio signal components mask each other
9
Psychoacoustic Model
- Signal Masking
  - Threshold cut-off
  - Spectral (Frequency / Simultaneous) Masking
  - Temporal Masking
- Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain
10
Signal Masking
- Threshold cut-off
  - The hearing threshold level is a function of frequency
  - Any frequency component below the threshold will not be perceived by the human ear
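The slide does not give a formula for the hearing threshold. One commonly quoted closed-form approximation of the threshold in quiet (attributed to Terhardt, and brought in here as an assumption rather than taken from these slides) can be sketched as follows; the function names are illustrative only.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in quiet, in dB SPL.

    Assumed formula (Terhardt's approximation), not from the slides.
    """
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(freq_hz, level_db_spl):
    """True if a lone tone at this level lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

for f in (100, 1000, 3300, 16000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

Under this approximation the ear is most sensitive around 3 to 4 kHz, so a 10 dB SPL tone is discarded at 100 Hz but kept at 3.3 kHz.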
11
Signal Masking
- Spectral Masking
  - A frequency component can be partly or fully masked by another component that is close to it in frequency
  - This shifts the hearing threshold
12
Signal Masking
- Temporal Masking
  - A quieter sound can be masked by a louder sound if they are temporally close
  - Sounds that occur shortly before or shortly after the louder sound can both be masked
13
Spectral Analysis
- Tasks of Spectral Analysis
  - To derive masking thresholds that determine which signal components can be eliminated
  - To generate a representation of the signal to which masking thresholds can be applied
- Spectral analysis is done through transforms or filter banks
14
Spectral Analysis
- Transforms
  - Fast Fourier Transform (FFT)
  - Discrete Cosine Transform (DCT): similar to FFT but uses cosine values only
  - Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer-III, MPEG-2 AAC, Dolby AC-3]: overlapped and windowed version of DCT
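To make "overlapped and windowed" concrete, here is a minimal pure-Python sketch of an MDCT analysis/synthesis pair. It assumes a sine window and 50% overlap (a common textbook choice, not necessarily what any particular codec uses), and all function names are illustrative. Each block of 2N samples yields only N coefficients, and overlap-adding the inverse transforms cancels the aliasing.

```python
import math

def _window(N):
    # Sine window; satisfies w[n]^2 + w[n+N]^2 = 1, needed for reconstruction.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def _cos(N, n, k):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct_block(block):
    """Windowed MDCT: 2N time samples -> N coefficients."""
    N = len(block) // 2
    w = _window(N)
    return [sum(w[n] * block[n] * _cos(N, n, k) for n in range(2 * N))
            for k in range(N)]

def imdct_block(coeffs):
    """Windowed inverse MDCT: N coefficients -> 2N aliased time samples."""
    N = len(coeffs)
    w = _window(N)
    return [w[n] * (2.0 / N) * sum(coeffs[k] * _cos(N, n, k) for k in range(N))
            for n in range(2 * N)]

def analyze(x, N):
    """Split x into blocks of 2N samples with hop size N and transform each."""
    return [mdct_block(x[s:s + 2 * N]) for s in range(0, len(x) - 2 * N + 1, N)]

def synthesize(frames, N):
    """Overlap-add the inverse transforms; the aliasing cancels between blocks."""
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, v in enumerate(imdct_block(coeffs)):
            out[i * N + n] += v
    return out

N = 8
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
frames = analyze(x, N)
y = synthesize(frames, N)
# Samples covered by two overlapping blocks are reconstructed exactly
# (up to rounding); the first and last N samples are covered only once.
print(max(abs(a - b) for a, b in zip(x[N:-N], y[N:-N])))
```

In a real encoder the coefficients would be quantized between `analyze` and `synthesize`; this sketch only shows that the transform itself loses nothing.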
15
Spectral Analysis
- Filter Banks
  - Blocks of time samples are passed through a set of bandpass filters
  - Masking thresholds are applied to the resulting frequency subband signals
  - Polyphase and wavelet banks are the most popular filter structures
16
Filter Bank Structures
- Polyphase Filter Bank [used in all of the MPEG-1 encoders]
  - Signal is separated into subbands of equal width over the entire frequency range
  - The resulting subband signals are downsampled to create shorter signals (which are reconstructed later, during decoding)
17
Filter Bank Structures
- Wavelet Filter Bank [used by Enhanced Perceptual Audio Coder (EPAC) by Lucent]
  - Unlike in the polyphase filter bank, the subbands are not of equal width (narrower at low frequencies, wider at high frequencies)
  - The wide high-frequency bands give better time resolution (ex. short attacks), but at the expense of frequency resolution
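The uneven subband widths of a wavelet bank can be seen in a small sketch. It uses the Haar wavelet, chosen here only because it is the simplest case (it is an assumption for illustration, not the filter EPAC uses). Each level splits the current low band in half, so bands get narrower toward low frequencies while the high band keeps the most samples, and therefore the finest time resolution.

```python
def haar_step(x):
    """Split x into a low-pass half and a high-pass half, each downsampled by 2."""
    s = 2 ** -0.5
    low = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return low, high

def wavelet_bank(x, levels):
    """Octave-band decomposition: returns bands ordered from highest to lowest."""
    bands = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_step(low)
        bands.append(high)
    bands.append(low)
    return bands

x = [float(n % 5) for n in range(16)]
bands = wavelet_bank(x, 3)
print([len(b) for b in bands])  # [8, 4, 2, 2]
```

The top band (upper half of the spectrum) has 8 samples for the 16-sample block, the next has 4, and so on; the total stays at 16, so nothing is added or lost by the split.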
18
Noise Allocation
- System task: derive the shifted hearing threshold and apply it to the input signal
  - Anything below the threshold doesn’t need to be transmitted
  - Any noise below the threshold is irrelevant
- Frequency component quantization
  - Tradeoff between space and noise
  - Encoder saves space by using just enough bits for each frequency component to keep quantization noise under the threshold; this is known as noise allocation
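The "just enough bits" idea can be sketched with the usual rule of thumb that each bit of a uniform quantizer buys roughly 6 dB of signal-to-noise ratio. The rule, the function name, and the example levels below are assumptions for illustration; real encoders use more elaborate iterative allocation loops.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer

def allocate_bits(signal_db, threshold_db):
    """Bits per band so that quantization noise stays under the masked threshold.

    signal_db[i] is the level of band i, threshold_db[i] its masked threshold.
    """
    bits = []
    for level, mask in zip(signal_db, threshold_db):
        if level <= mask:
            bits.append(0)  # band is inaudible, nothing to transmit
        else:
            bits.append(math.ceil((level - mask) / DB_PER_BIT))
    return bits

print(allocate_bits([60, 30, 10], [20, 35, 0]))  # [7, 0, 2]
```

The second band lies below its threshold and gets no bits at all, which is the "anything below the threshold doesn't need to be transmitted" case.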
19
Noise Allocation
- Pre-echo
  - If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding there will be audible noise in the silent part of the block
  - This is avoided by monitoring the audio data ahead of time at the encoding stage and splitting the audio into shorter blocks wherever pre-echo could occur
  - This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
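One simple way to spot a potential pre-echo case is to look for a sudden jump in energy inside a block. The sketch below is a hypothetical detector: the sub-block count, the energy ratio, and the function name are all made-up illustration values, not taken from any standard.

```python
import math

def choose_block_size(block, n_sub=8, ratio=10.0, floor=1e-12):
    """Return "short" if the block contains a sharp energy rise, else "long"."""
    size = len(block) // n_sub
    energies = [sum(v * v for v in block[i * size:(i + 1) * size])
                for i in range(n_sub)]
    for prev, cur in zip(energies, energies[1:]):
        # A sub-block much louder than its predecessor suggests an attack.
        if cur > ratio * max(prev, floor):
            return "short"
    return "long"

attack = [0.0] * 512 + [math.sin(0.2 * n) for n in range(512)]
steady = [math.sin(0.2 * n) for n in range(1024)]
print(choose_block_size(attack))  # short
print(choose_block_size(steady))  # long
```

With short blocks, the quantization noise that spreads across a block is confined to a few milliseconds before the attack, where temporal masking can hide it.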
20
Pre-echo Effect
[Figure: pre-echo effect]
21
Additional Encoding Techniques
- Other encoding techniques are available (as alternatives or in combination)
  - Predictive Coding
  - Coupling / Delta Encoding
  - Huffman Encoding
22
Additional Encoding Techniques
- Predictive Coding
  - Often used in speech and image compression
  - Estimates the expected value of each sample from previous sample values
  - Transmits/stores the difference between the expected and the actual value
  - The decoder generates the same estimate for each sample and then adjusts it by the difference stored for that sample
  - Used for additional compression in MPEG-2 AAC
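The simplest possible predictor takes the previous sample as the estimate for the next one. The sketch below uses that first-order predictor purely for illustration (it is an assumption, not the predictor used in MPEG-2 AAC); the function names are made up.

```python
def dpcm_encode(samples):
    """Store the difference between each sample and its predicted value."""
    residuals, prediction = [], 0
    for s in samples:
        residuals.append(s - prediction)
        prediction = s  # first-order predictor: next sample ~ current sample
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by adding each residual to the same prediction."""
    samples, prediction = [], 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples

signal = [100, 102, 105, 107, 108, 108, 106]
residuals = dpcm_encode(signal)
print(residuals)  # [100, 2, 3, 2, 1, 0, -2]
```

After the first value, the residuals are small numbers that need fewer bits than the samples themselves, and decoding them reproduces the input exactly.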
23
Additional Encoding Techniques
- Coupling / Delta Encoding
  - Used when the audio signal consists of two or more channels (stereo or surround sound)
  - Similarities between channels are used for compression
  - The sum and the difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode
  - This is a lossless encoding process
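A minimal sketch of the sum/difference step for integer stereo samples (function names and sample values are illustrative). Because the sum and the difference of two integers always have the same parity, the division by two on the way back is exact and the round trip is lossless.

```python
def to_sum_diff(left, right):
    """Convert left/right channels to sum and difference channels."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff

def from_sum_diff(total, diff):
    """Recover left/right exactly: (s + d) = 2 * left, (s - d) = 2 * right."""
    left = [(s + d) // 2 for s, d in zip(total, diff)]
    right = [(s - d) // 2 for s, d in zip(total, diff)]
    return left, right

left = [1000, 1010, 990, -500]
right = [998, 1011, 990, -503]
total, diff = to_sum_diff(left, right)
print(diff)  # [2, -1, 0, 3]
```

When the two channels are similar, the difference channel stays near zero and is cheap to encode.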
24
Additional Encoding Techniques
- Huffman Coding
  - Information-theory-based technique
  - An element that recurs often in the signal is represented by a shorter code, and its value is stored in a look-up table
  - Implemented using look-up tables in the encoder and in the decoder
  - Provides substantial lossless compression, at the cost of additional computation
  - Used by MPEG-1 Layer III and MPEG-2 AAC
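A small sketch of how such a look-up table can be built from symbol counts, using Python's standard `heapq` and `collections.Counter`. This builds the table from the data itself for illustration; it is an assumption that differs from practical audio codecs, which typically ship predefined tables. The function names are made up.

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Map each symbol to a bit string; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial table for that subtree).
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # codes are prefix-free, so first match is right
            out.append(inverse[current])
            current = ""
    return out

data = [0] * 10 + [1] * 3 + [-1] * 3 + [2]
table = huffman_table(data)
bits = encode(data, table)
print(len(bits), "bits instead of", 2 * len(data))
```

Quantized spectral values cluster around zero, which is why the most common value ends up with the shortest code.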
25
Encoding - Final Stages
- Audio data is packed into frames
- Frames are stored or transmitted
26
Conclusion
- HTML Bibliography: http://www.music.mcgill.ca/~pkoles
- Questions