A Tutorial on MPEG/Audio Compression Davis Pan, IEEE Multimedia Journal, Summer 1995 Presented by: Randeep Singh Gakhal CMPT 820, Spring 2004

A Tutorial on MPEG/Audio Compression

Davis Pan, IEEE Multimedia Journal, Summer 1995

Presented by: Randeep Singh Gakhal, CMPT 820, Spring 2004

Outline

- Introduction
- Technical Overview
- Polyphase Filter Bank
- Psychoacoustic Model
- Coding and Bit Allocation
- Conclusions and Future Work

Introduction

What does MPEG-1 Audio provide?

- A perceptually transparent lossy audio compression system that exploits the limitations of human hearing.
- Compression by a factor of about 6 while retaining perceived sound quality.
- One part of a three-part standard that covers audio, video, and audio/video synchronization.

Technical Overview

MPEG-I Audio Features

- PCM sampling rate of 32, 44.1, or 48 kHz
- Four channel modes:
  - Monophonic and dual-monophonic
  - Stereo and joint-stereo
- Three modes (layers in MPEG-I speak):
  - Layer I: computationally cheapest, bit rates > 128 kbps
  - Layer II: bit rate ~ 128 kbps, used in VCD
  - Layer III: most complicated encoding/decoding, bit rates ~ 64 kbps, originally intended for streaming audio
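To put these bit rates in context, the short sketch below computes the uncompressed PCM bit rate and the resulting compression ratio. It assumes 16-bit samples and treats the quoted rates as per-channel figures; both are assumptions, since the slide states neither. The function names are illustrative only.

```python
# Assumption: 16-bit PCM samples, and the quoted bit rates are per channel.
BITS_PER_SAMPLE = 16


def pcm_kbps(sample_rate_hz, bits_per_sample=BITS_PER_SAMPLE):
    """Bit rate of one uncompressed PCM channel in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000.0


def compression_ratio(sample_rate_hz, coded_kbps):
    """Ratio of the uncompressed to the coded bit rate for one channel."""
    return pcm_kbps(sample_rate_hz) / coded_kbps


if __name__ == "__main__":
    for rate in (32000, 44100, 48000):
        for coded in (128, 64):
            ratio = compression_ratio(rate, coded)
            print(f"{rate} Hz at {coded} kbps: ratio {ratio:.2f}")
```

Under these assumptions, 48 kHz audio coded at 128 kbps corresponds to the factor of 6 quoted in the introduction.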

Human Audio System (ear + brain)

- Human sensitivity to sound is non-linear across the audible range (20 Hz to 20 kHz).
- The audible range is divided into critical bands: frequency regions within which humans cannot perceive small differences in frequency.
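One common way to make the critical-band idea concrete is the Bark scale, on which each critical band is roughly one unit wide. The sketch below uses the Zwicker-Terhardt closed-form approximation, which is not part of the slides or of the MPEG standard (the standard works from tables); it is shown only to illustrate how unevenly critical bands cover the frequency axis.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Zwicker-Terhardt approximation; an illustrative stand-in, not the
    tables used by the MPEG psychoacoustic models.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    # Equal 750 Hz steps span very different numbers of critical bands.
    for lo in (0, 750, 3000, 15000):
        hi = lo + 750
        print(f"{lo}-{hi} Hz spans {hz_to_bark(hi) - hz_to_bark(lo):.2f} Bark")
```

A fixed-width slice of spectrum covers several critical bands at low frequencies and only a fraction of one at high frequencies.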

MPEG-I Encoder Architecture[1]

MPEG-I Encoder Architecture

- Polyphase Filter Bank: transforms PCM samples into frequency-domain signals in 32 subbands.
- Psychoacoustic Model: determines which parts of the signal are perceptually irrelevant.
- Bit Allocator: allots bits to subbands according to input from the psychoacoustic calculation.
- Frame Creation: generates an MPEG-I compliant bit stream.

The Polyphase Filter Bank

Polyphase Filter Bank

- Divides the audio signal into 32 equal-width subband streams in the frequency domain.
- The inverse filter at the decoder cannot recover the signal exactly; the loss is small and inaudible.
- Based on work by Rothweiler [2].
- The standard specifies a 512-coefficient analysis window, C[n].

Polyphase Filter Bank

- A buffer X[n] of 512 PCM samples, with 32 new samples shifted in every computation cycle.
- Calculate window samples for i = 0…511:

  $Z[i] = C[i] \cdot X[i]$

- Partial calculation for i = 0…63:

  $Y[i] = \sum_{j=0}^{7} Z[i + 64j]$

- Calculate 32 subband samples for i = 0…31:

  $S[i] = \sum_{k=0}^{63} M[i][k] \cdot Y[k]$
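The three steps on this slide (window, partial sums, matrixing) can be sketched in plain Python as follows. The real analysis window C[n] is a table in the standard and is not reproduced here; `placeholder_window` is a hypothetical stand-in used only so the arithmetic can run, and it does not give the real filter response. The helper `analyze_direct` evaluates the same result as a single double sum, as a consistency check.

```python
import math


def analysis_matrix():
    """M[i][k] = cos((2i + 1)(k - 16) * pi / 64), 32 rows by 64 columns."""
    return [[math.cos((2 * i + 1) * (k - 16) * math.pi / 64) for k in range(64)]
            for i in range(32)]


def placeholder_window():
    """Hypothetical stand-in for C[n]; NOT the table from the standard."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / 512) for n in range(512)]


def analyze(X, C, M):
    """One computation cycle: 512 buffered samples in, 32 subband samples out."""
    Z = [C[i] * X[i] for i in range(512)]                        # windowing
    Y = [sum(Z[k + 64 * j] for j in range(8)) for k in range(64)]  # partial sums
    return [sum(M[i][k] * Y[k] for k in range(64)) for i in range(32)]


def analyze_direct(X, C, M):
    """Same output written as one double sum over k and j."""
    return [sum(M[i][k] * C[k + 64 * j] * X[k + 64 * j]
                for k in range(64) for j in range(8))
            for i in range(32)]


if __name__ == "__main__":
    X = [math.sin(0.1 * n) for n in range(512)]
    S = analyze(X, placeholder_window(), analysis_matrix())
    print(len(S), "subband samples")
```

Shifting 32 new samples into the buffer between cycles is left out to keep the sketch short.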

Polyphase Filter Bank

[Figure: visualization of the filter, from [1]]

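Polyphase Filter Bank

- The net effect:

  $S[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k] \cdot C[k + 64j] \cdot X[k + 64j]$

- Analysis matrix:

  $M[i][k] = \cos\left[ \frac{(2i + 1)(k - 16)\pi}{64} \right]$

- Requires 512 + 32 × 64 = 2560 multiplies.
- Each subband has bandwidth π/32T and is centered at odd multiples of π/64T, where T is the sampling period.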
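Converting the bandwidth and center-frequency figures from radians per second to Hz gives a subband width of fs/64 and centers at (2i + 1)·fs/128, where fs = 1/T is the sampling rate. The sketch below tabulates these for a given sampling rate and repeats the multiply count from the slide; the function name is illustrative.

```python
def subband_layout_hz(sample_rate_hz, n_subbands=32):
    """Return (lower edge, center, upper edge) in Hz for each subband."""
    width = sample_rate_hz / (2.0 * n_subbands)   # pi/32T rad/s == fs/64 Hz
    return [(i * width, (i + 0.5) * width, (i + 1) * width)
            for i in range(n_subbands)]


# 512 window multiplies plus a 32 x 64 matrixing step per computation cycle.
MULTIPLIES_PER_CYCLE = 512 + 32 * 64

if __name__ == "__main__":
    for lo, center, hi in subband_layout_hz(48000)[:3]:
        print(f"{lo:.0f}-{hi:.0f} Hz, centered at {center:.0f} Hz")
    print(MULTIPLIES_PER_CYCLE, "multiplies per cycle,",
          MULTIPLIES_PER_CYCLE // 32, "per subband output sample")
```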

Polyphase Filter Bank

Shortcomings:

- Equal-width filters do not correspond with the critical band model of the auditory system.
- The filter bank and its inverse are NOT lossless.
- Frequency overlap between subbands.

Polyphase Filter Bank

[Figure: comparison of filter banks and critical bands, from [1]]

Polyphase Filter Bank

[Figure: frequency response of one subband, from [1]]

Psychoacoustic Model

The Weakness of the Human Ear

- Frequency-dependent resolution: we cannot discern minute differences in frequency within a critical band.
- Auditory masking: when two signals of very close frequency are both present, the louder will mask the softer.
- A masked signal must be louder than some threshold to be heard, which gives us room to introduce inaudible quantization noise.

MPEG-I Psychoacoustic Models

The MPEG-I standard defines two models:

- Psychoacoustic Model 1:
  - Less computationally expensive
  - Makes some serious compromises in what it assumes a listener cannot hear
- Psychoacoustic Model 2:
  - Provides more features suited for Layer III coding, at the cost of more processing power

Psychoacoustic Model

Convert samples to the frequency domain:

- Apply a Hann window and then a DFT.
- The window suppresses the edge artifacts that a finite window size would otherwise introduce into the frequency-domain representation.
- Model 1 uses a 512-sample (Layer I) or 1024-sample (Layers II and III) window.
- Model 2 uses a 1024-sample window and two calculations per frame.
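A minimal sketch of this step, using only the standard library: apply a Hann window and evaluate the DFT directly, returning the power spectrum in dB. A real encoder would use an FFT and would normalize levels to a sound-pressure reference; both are omitted here, and the small constant added before the logarithm is only an assumption to avoid taking the log of zero.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def power_spectrum_db(samples):
    """Hann-windowed DFT power spectrum (bins 0..N/2) in dB, unnormalized."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hann(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(10.0 * math.log10(abs(acc) ** 2 + 1e-12))
    return spectrum


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 40 * t / 512) for t in range(512)]
    spec = power_spectrum_db(tone)
    print("strongest bin:", max(range(len(spec)), key=lambda k: spec[k]))
```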

Psychoacoustic Model

Need to separate sound into “tone” and “noise” components:

- Model 1: local peaks are tones; the remaining spectrum in each critical band is lumped into a single noise component at a representative frequency.
- Model 2: calculates a “tonality” index for each spectral point, estimating how likely it is to be a tone based on the previous two analysis windows.
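The Model 1 idea of treating local peaks as tones can be sketched as below. The 7 dB margin and the fixed ±2 bin neighbourhood are assumptions chosen for illustration; the standard defines its own frequency-dependent neighbourhoods, which are not reproduced here.

```python
def find_tonal_peaks(spectrum_db, margin_db=7.0):
    """Return bin indices that look like tones in a dB power spectrum.

    A bin counts as tonal when it is a local maximum and exceeds the bins
    two positions away on either side by at least margin_db (assumed values).
    """
    peaks = []
    for k in range(2, len(spectrum_db) - 2):
        p = spectrum_db[k]
        is_local_max = p > spectrum_db[k - 1] and p >= spectrum_db[k + 1]
        stands_out = (p - spectrum_db[k - 2] >= margin_db
                      and p - spectrum_db[k + 2] >= margin_db)
        if is_local_max and stands_out:
            peaks.append(k)
    return peaks


if __name__ == "__main__":
    spectrum = [0.0] * 20
    spectrum[9], spectrum[10], spectrum[11] = 24.0, 30.0, 24.0
    print(find_tonal_peaks(spectrum))
```

Everything not picked up as a peak would then be summed per critical band into a noise component.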

Psychoacoustic Model

- “Smear” each signal within its critical band, using either a masking function (Model 1) or a spreading function (Model 2).
- Adjust the calculated threshold by incorporating a “quiet” mask: the masking threshold for each frequency when no other frequencies are present.
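As an illustration of folding the “quiet” mask into the threshold, the sketch below uses Terhardt's closed-form approximation of the absolute threshold of hearing and adds it to the individual masking thresholds in the power domain. Both choices are assumptions for illustration: the standard tabulates the threshold in quiet rather than using this formula, and the exact combination rule should be checked against the standard.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt formula).

    Valid for f_hz > 0; a stand-in for the tabulated values in the standard.
    """
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def combine_thresholds_db(masker_thresholds_db, quiet_db):
    """Add masking thresholds and the quiet threshold as powers; return dB."""
    power = 10.0 ** (quiet_db / 10.0)
    power += sum(10.0 ** (t / 10.0) for t in masker_thresholds_db)
    return 10.0 * math.log10(power)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000):
        print(f, "Hz:", round(threshold_in_quiet_db(f), 2), "dB SPL")
```

With no maskers present, the combined threshold reduces to the threshold in quiet, matching the description on the slide.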

Psychoacoustic Model

Calculate a masking threshold for each subband in the polyphase filter bank:

- Model 1:
  - Selects the minimum of the masking threshold values in the range of each subband.
  - Inaccurate at higher frequencies: recall that subbands are linearly distributed, critical bands are NOT!
- Model 2:
  - If the subband is wider than the critical band: use the minimum masking threshold in the subband.
  - If the critical band is wider than the subband: use the average masking threshold in the subband.

Psychoacoustic Model

- The hard work is done; now we just calculate the signal-to-mask ratio (SMR) per subband:
  SMR = signal energy / masking threshold
- We pass the result on to the coding unit, which can now produce a compressed bitstream.
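The last two steps of the psychoacoustic model, picking one masking threshold per subband and forming the SMR, can be sketched as follows. The function names are illustrative, the thresholds are assumed to be linear energies for the spectral lines falling in one subband, and expressing the ratio in dB is a convenience for the bit allocation stage.

```python
import math


def subband_threshold(line_thresholds, subband_wider_than_critical_band):
    """Reduce per-line masking thresholds (linear energy) to one per subband.

    Minimum when the subband is wide relative to the critical band
    (also what Model 1 always does), average otherwise (Model 2).
    """
    if subband_wider_than_critical_band:
        return min(line_thresholds)
    return sum(line_thresholds) / len(line_thresholds)


def smr_db(signal_energy, masking_threshold):
    """Signal-to-mask ratio, signal energy / masking threshold, in dB."""
    return 10.0 * math.log10(signal_energy / masking_threshold)


if __name__ == "__main__":
    thresholds = [2.0, 4.0, 6.0]
    for wide in (True, False):
        t = subband_threshold(thresholds, wide)
        print("threshold", t, "SMR", round(smr_db(100.0, t), 2), "dB")
```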

Psychoacoustic Model (example)

[Figure: input, from [1]]

Psychoacoustic Model (example)

[Figure: transformation to perceptual domain, from [1]]

Psychoacoustic Model (example)

[Figure: calculation of masking thresholds, from [1]]

Psychoacoustic Model (example)

[Figure: signal-to-mask ratios, from [1]]

Psychoacoustic Model (example)

[Figure: what we actually send, from [1]]

Coding and Bit Allocation

Layer Specific Coding

[Figure: layer-specific frame formats, from [1]]

Layer Specific Coding

[Figure: the stream of samples is processed in groups, from [1]]

Layer I Coding

- Group 12 samples from each of the 32 subbands and encode them in each frame (= 384 samples).
- Each group is encoded with 0-15 bits/sample.
- Each group has a 6-bit scale factor.
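A simplified sketch of coding one group: the 12 samples of a subband share a scale factor and are quantized uniformly with the allocated number of bits. Several things here are assumptions for illustration. The scale factor is taken as the raw peak magnitude rather than a 6-bit index into the standard's scale factor table, the quantizer uses 2^bits - 1 symmetric levels, and an allocation of exactly 1 bit is rejected because it would leave only one level.

```python
def quantize_group(samples, bits):
    """Quantize one subband's group of samples with a shared scale factor."""
    if bits == 0:
        return 0.0, [0] * len(samples)          # subband not transmitted
    if not 2 <= bits <= 15:
        raise ValueError("this sketch supports 0 or 2..15 bits per sample")
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero
    half = (2 ** bits - 2) // 2                  # e.g. 7 for 4 bits: codes -7..7
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def dequantize_group(scale, codes, bits):
    """Invert quantize_group up to the quantization error."""
    if bits == 0:
        return [0.0] * len(codes)
    half = (2 ** bits - 2) // 2
    return [c * scale / half for c in codes]


if __name__ == "__main__":
    group = [0.5, -0.25, 0.1, 0.0, 0.33, -0.5, 0.2, -0.1, 0.05, 0.4, -0.45, 0.12]
    scale, codes = quantize_group(group, 4)
    print(scale, codes)
    print(dequantize_group(scale, codes, 4))
```

Fewer bits mean a coarser grid and more quantization noise, which is what the bit allocation stage trades against the masking threshold.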

Layer II Coding

Similar to Layer I except:

- Each subband now contributes 3 groups of 12 samples per frame (3 × 12 × 32 = 1152 samples per frame).
- Can have up to 3 scale factors per subband to avoid audible distortion in special cases.
- How the scale factors are shared among the 3 groups is signaled by the scale factor selection information (SCFSI).

Layer III Coding

- Further subdivides subbands using the Modified Discrete Cosine Transform (MDCT), a transform that is lossless in the absence of quantization.
- Larger frequency resolution => smaller time resolution => possibility of pre-echo.
- The Layer III encoder can detect and reduce pre-echo by “borrowing bits” from the bit reservoir.
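The claim that the MDCT itself loses nothing (before quantization) can be checked numerically. The sketch below is a direct, unoptimized MDCT and inverse with a sine window and 50% overlap-add. The block size of 18 coefficients and the sine window are assumptions made for the demonstration, not a reproduction of the Layer III hybrid filter bank.

```python
import math


def sine_window(n_half):
    """Sine window of length 2 * n_half (satisfies the Princen-Bradley condition)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]


def _basis(n, k, n_half):
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block, window):
    """2 * n_half windowed input samples -> n_half coefficients."""
    n_half = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, n_half)
                for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs, window):
    """n_half coefficients -> 2 * n_half windowed, time-aliased samples."""
    n_half = len(coeffs)
    return [2.0 / n_half * window[n]
            * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]


def roundtrip(signal, n_half):
    """Transform overlapping blocks and overlap-add the inverse transforms."""
    window = sine_window(n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        rec = imdct(mdct(signal[start:start + 2 * n_half], window), window)
        for n, v in enumerate(rec):
            out[start + n] += v
    return out


if __name__ == "__main__":
    sig = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(108)]
    rec = roundtrip(sig, 18)
    # Only samples covered by two overlapping blocks are fully reconstructed.
    print(max(abs(a - b) for a, b in zip(sig[18:90], rec[18:90])))
```

The aliasing introduced by each block cancels against its neighbours in the overlap-add, so the error is at the level of floating-point rounding.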

Bit Allocation

Determine the number of bits to allot to each subband given the SMR from the psychoacoustic model.

Layers I and II:

- Calculate the mask-to-noise ratio: MNR = SNR - SMR (in dB).
- SNR is given by the MPEG-I standard (as a function of quantization levels).
- Now iterate until no bits are left to allocate:
  - Allocate bits to the subband with the lowest MNR.
  - Re-calculate the MNR for the subband allocated more bits.
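The iteration described above is a greedy loop, sketched below. The SNR values are an assumption: the standard tabulates SNR per quantizer, whereas this sketch uses a rule-of-thumb 6.02 dB per bit. Scale factor and header overhead are ignored, and each extra bit of resolution is assumed to cost one bit for each of the 12 samples in the group.

```python
def allocate_bits(smr_db, bit_pool, samples_per_subband=12,
                  max_bits=15, snr_per_bit_db=6.02):
    """Greedy bit allocation: keep improving the subband with the lowest MNR.

    smr_db: signal-to-mask ratio per subband, in dB.
    bit_pool: bits available for sample data in this frame.
    Returns the number of bits per sample assigned to each subband.
    """
    bits = [0] * len(smr_db)

    def mnr(sb):
        # MNR = SNR - SMR, with SNR approximated as snr_per_bit_db * bits.
        return bits[sb] * snr_per_bit_db - smr_db[sb]

    remaining = bit_pool
    while remaining >= samples_per_subband:
        open_subbands = [sb for sb in range(len(bits)) if bits[sb] < max_bits]
        if not open_subbands:
            break
        worst = min(open_subbands, key=mnr)
        bits[worst] += 1
        remaining -= samples_per_subband
    return bits


if __name__ == "__main__":
    print(allocate_bits([20.0, 5.0, -10.0], bit_pool=60))
```

Subbands whose signal is already below the masking threshold (negative SMR) tend to receive no bits at all.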

Bit Allocation

Layer III:

- Employs “noise allocation”.
- Quantizes each spectral value and employs Huffman coding.
- If quantization results in noise in excess of the allowed distortion for a subband, the encoder increases resolution on that subband.
- The whole process repeats until one of three specified stop conditions is met.
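A heavily simplified sketch of the noise allocation loop follows. Huffman bit counting is left out entirely, the quantizer is a plain uniform one, and the gain step of a factor of sqrt(2) per amplification is an assumption. The three stop conditions are not listed on the slide; the ones used here (no band over its allowed distortion, an amplification limit would be exceeded, every band would need amplification) follow the description in [1] as best understood and should be checked against it.

```python
def noise_allocation(values, bands, allowed_noise, step=0.25, max_amp_steps=6):
    """Amplify bands whose quantization noise exceeds the allowed distortion.

    values: spectral values; bands: list of (start, end) index pairs;
    allowed_noise: allowed mean squared error per band.
    Returns (amplification steps per band, reason the loop stopped).
    """
    amp_steps = [0] * len(bands)
    while True:
        over = []
        for b, (lo, hi) in enumerate(bands):
            gain = 2.0 ** (amp_steps[b] / 2.0)
            err = 0.0
            for v in values[lo:hi]:
                q = round(v * gain / step)           # quantize amplified value
                err += (v - q * step / gain) ** 2    # error after rescaling
            if err / (hi - lo) > allowed_noise[b]:
                over.append(b)
        if not over:
            return amp_steps, "within allowed distortion"
        if len(over) == len(bands):
            return amp_steps, "all bands would need amplification"
        if any(amp_steps[b] + 1 > max_amp_steps for b in over):
            return amp_steps, "amplification limit reached"
        for b in over:
            amp_steps[b] += 1


if __name__ == "__main__":
    spectrum = [0.1, 0.12, 0.9, 0.8]
    print(noise_allocation(spectrum, [(0, 2), (2, 4)], [1e-3, 1.0]))
```

Amplifying a band before quantization is equivalent to using a finer quantizer step for it, which is how resolution is increased on that band.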

Conclusions and Future Work

Conclusions

- MPEG-I provides tremendous compression for relatively cheap computation.
- Not suitable for archival or audiophile-grade music, as very seasoned listeners can discern distortion.
- Modifying or searching MPEG-I content requires decompression and is not cheap!

Future Work

- MPEG-1 audio lays the foundation for all modern audio compression techniques.
- Lots of progress since then (1994!).
- MPEG-2 (1996) extends MPEG audio compression to support 5.1 channel audio.
- MPEG-4 (1998) attempts to code based on perceived audio objects in the stream.
- Finally, MPEG-7 (2001) operates at an even higher level of abstraction, focusing on meta-data coding to make content searchable and retrievable.

References

[1] D. Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia Journal, 1995.

[2] J. H. Rothweiler, “Polyphase Quadrature Filters – a New Subband Coding Technique”, Proc. of the Int. Conf. IEEE ASSP, 27.2, pp. 1280-1283, Boston, 1983.