Image Processing Architecture, © 2001-2004 Oleh Tretiak Page 1Lecture 10
ECEC 453 Image Processing Architecture
Lecture 10, 2/17/2004
MPEG-2, Industrial-Strength Video Compression, and Friends
Oleh Tretiak
Drexel University
Lecture Outline
- Basic video coding
- Features of MPEG-1
- Features of H.261
- MPEG-2
- Introduction to MPEG-4
Picture of Layers (figure): sequence layer (GOP-1, GOP-2, ..., GOP-N) → GOP layer (pictures I B B P B B ... P) → picture layer (Slice-1, Slice-2, ..., Slice-N) → slice layer (mb-1, mb-2, ..., mb-n) → macroblock layer (Y blocks 0-3, Cb, Cr) → block layer
Video Compression: Picture Types
A group of pictures (GOP) uses three picture types:
- I: intraframe coding only
- P: predictive coding
- B: bidirectional coding
(Figure: a sequence of I, P and B pictures, numbered 1 to 8.)
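Because a B picture is predicted from a later anchor (I or P) picture as well as an earlier one, the encoder has to transmit that later anchor first. The short sketch below, built around a hypothetical helper `coding_order`, reorders a GOP given in display order into one possible transmission order. It assumes pictures are numbered from 1 in display order and does not model references that cross into the next GOP.

```python
def coding_order(display: str) -> list[tuple[str, int]]:
    """Reorder a GOP from display order to a transmission (coding) order.

    Each B picture needs the *next* I or P picture as its backward
    reference, so that anchor is emitted before the Bs that precede it.
    Pictures are numbered 1..N in display order.
    """
    out: list[tuple[str, int]] = []
    pending_b: list[tuple[str, int]] = []
    for n, kind in enumerate(display, start=1):
        if kind == "B":
            pending_b.append((kind, n))  # hold until the next anchor is sent
        else:                            # I or P: an anchor picture
            out.append((kind, n))
            out.extend(pending_b)
            pending_b.clear()
    # Trailing Bs would really wait for the next GOP's I picture;
    # this sketch simply appends them at the end.
    out.extend(pending_b)
    return out


print(coding_order("IBBPBBP"))
# [('I', 1), ('P', 4), ('B', 2), ('B', 3), ('P', 7), ('B', 5), ('B', 6)]
```

The decoder undoes this reordering, which is one source of the extra delay B pictures introduce.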
Typical MPEG coding parameters
Typical sequence: IPBBPBBPBBPBBPBB (16 frames)

Picture   Average size (bits)   Compression
I         156000                 6.5
P          62000                16.4
B          15000                67.6
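$$\text{Compression(GOP)} = \frac{\text{BitsPerFrame}_U \times N_{\text{FramesPerGOP}}}{\text{BitsPerCodedGOP}}$$

$$\text{BitsPerCodedGOP} = N_{I} \times (\text{Bits/Iframe}) + N_{P} \times (\text{Bits/Pframe}) + N_{B} \times (\text{Bits/Bframe})$$

$$\text{Bits/Iframe} = \frac{\text{BitsPerFrame}_U}{C_I}, \quad \text{Bits/Pframe} = \frac{\text{BitsPerFrame}_U}{C_P}, \quad \text{Bits/Bframe} = \frac{\text{BitsPerFrame}_U}{C_B}$$

where BitsPerFrame_U is the size of an uncompressed frame, N_I, N_P, N_B are the numbers of I, P and B frames in the GOP, and C_I, C_P, C_B are their compression ratios. Substituting, BitsPerFrame_U cancels:

$$\text{Compression(GOP)} = \frac{N_{\text{FramesPerGOP}}}{N_I/C_I + N_P/C_P + N_B/C_B} = \frac{16}{1/6.5 + 5/16.4 + 10/67.6} \approx 26.4$$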
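The formula above is a weighted harmonic mean of the per-type compression ratios. A minimal Python check of the arithmetic, using the ratios from the table and a hypothetical helper `gop_compression`:

```python
def gop_compression(pattern: str, ratios: dict[str, float]) -> float:
    """Overall compression of a GOP from per-picture-type compression ratios.

    Each frame of type t costs (uncompressed bits) / C_t, so
    compression = N / sum_t (N_t / C_t); the uncompressed size cancels.
    """
    n = len(pattern)
    coded_frames = sum(pattern.count(t) / c for t, c in ratios.items())
    return n / coded_frames


ratios = {"I": 6.5, "P": 16.4, "B": 67.6}  # values from the table above
print(round(gop_compression("IPBBPBBPBBPBBPBB", ratios), 1))  # 26.4
```

Note how the single I frame (1/6.5 ≈ 0.154) costs about as much as all ten B frames together (10/67.6 ≈ 0.148).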
Block Diagram of MPEG Decoder
(Figure: decoder block diagram with the decoding paths for I frames, P frames and B frames.)
Macroblock Coding: I & P
I pictures (almost like JPEG)
- Divided into slices and macroblocks
- No motion compensation
- Each macroblock can have a different quantization step
- DC and AC coefficients coded differently, as in JPEG
- Coding tables differ from JPEG's
P pictures
- Divided into slices and macroblocks
- Option: no motion compensation
- Option: a block can be coded inter or intra (like an I picture)
- A macroblock can be skipped (replaced with the co-located macroblock of the previous picture): great compression
Coding Image Blocks: B pictures
Decisions for each macroblock:
- Inter or intra?
- Forward, backward, or interpolated prediction?
- Code the block or skip it?
- Quantization step?
Statistics for an image sequence (macroblock type by picture type)

Picture    Macroblock type
type       I       P       B       Zero MV   Skipped   Total
I          3300    -       -       -         -          3300
P           897    8587    -       5128      568       15180
B            60    7356    22845   -         429       30690
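The standard defines how each macroblock type is decoded, not how the encoder should pick one. One simple heuristic for the forward/backward/interpolated choice is to take the prediction with the smallest sum of absolute differences (SAD). The sketch below assumes the motion-compensated blocks have already been fetched from the two reference pictures, and it ignores the intra and skip decisions and the bit cost of motion vectors; `sad`, `average` and `choose_b_mode` are hypothetical helpers.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks (lists of rows)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def average(a, b):
    """Interpolated (bidirectional) prediction: rounded average of two blocks."""
    return [[(x + y + 1) // 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def choose_b_mode(target, fwd, bwd):
    """Return (mode, cost) for the prediction with minimum SAD.

    fwd: block predicted from the past reference picture
    bwd: block predicted from the future reference picture
    """
    candidates = {
        "forward": fwd,
        "backward": bwd,
        "interpolated": average(fwd, bwd),
    }
    costs = {mode: sad(target, pred) for mode, pred in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# A block halfway between a darker past and a brighter future reference
target = [[10, 10], [10, 10]]
fwd = [[8, 8], [8, 8]]
bwd = [[12, 12], [12, 12]]
print(choose_b_mode(target, fwd, bwd))  # ('interpolated', 0)
```

Averaging two predictions also averages their noise, which is one reason interpolated macroblocks dominate the B row of the table.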
MPEG-1: ‘1.5’ Mbps
- Sample rate reduction in spatial and temporal domains
- Spatial: block-based DCT; Huffman coding (no arithmetic coding) of motion vectors and quantized DCT coefficients
- Temporal: block-based motion compensation; interframe coding (two kinds: P and B)
- 352 x 240 pixels, 12 bits per pixel, picture rate 30 pictures per second → 30.4 Mbps
- Coded bit stream 1.15 Mbps (must leave bandwidth for audio); compression 26:1
- Quality better than VHS!
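The 30.4 Mbps and 26:1 figures follow directly from the sampling parameters. The 12 bits per pixel come from 8 bits of luminance plus two chrominance samples at a quarter of the resolution each (2 × 8 / 4 = 4 bits). A quick check with a hypothetical helper `raw_bitrate`:

```python
def raw_bitrate(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


raw = raw_bitrate(352, 240, 12, 30)  # SIF picture, 4:2:0 sampling
coded = 1.15e6                       # MPEG-1 video bit rate from the slide
print(raw / 1e6)                     # 30.4128 (Mbps)
print(round(raw / coded))            # 26
```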
Video Teleconferencing Comprehensive Standard: H.320
Components of H.320:
- H.261: video coding, 64 to 1920 kbits/sec
- G.722, G.726, G.728: audio coding from 16 kbits/sec to 64 kbits/sec
- H.221: multiplexing of audio and video (frame based rather than packet based)
- H.230 and H.242: handshaking and control
- H.233: encryption
Generic Video Telephone System
H.261 Features
Common Intermediate Format (CIF)
- Interoperability between 25 fps and 30 fps countries
- 352 pixels/line, 288 lines, 30 fps, non-interlaced
- Terminal equipment converts frame rate and number of lines
- Y Cb Cr components, color subsampled by a factor of 2 in both directions
Coding
- 8x8 DCT; 4 Y and 2 chrominance blocks per macroblock
- I and P frames only; P blocks can be skipped
- Motion compensation optional, integer-pixel motion vectors only
- (Optional) forward error correction coding
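H.261, like the other standards here, specifies only how motion vectors are used by the decoder; how the encoder finds them is left open. The simplest method consistent with integer-pixel vectors is exhaustive (full-search) block matching. In the sketch below, `full_search` is a hypothetical helper, and the block size and search range are assumed parameters rather than values taken from the standard.

```python
def full_search(cur, ref, top, left, size=16, rng=7):
    """Integer-pel full-search block matching.

    cur, ref: 2-D lists (rows of luma samples) of the same size.
    (top, left): upper-left corner of the target block in cur.
    Returns the (dy, dx) displacement into ref with minimum SAD,
    trying every integer offset within +/- rng that stays inside ref.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > h or x + size > w:
                continue  # candidate block falls outside the reference picture
            cost = sum(
                abs(cur[top + i][left + j] - ref[y + i][x + j])
                for i in range(size)
                for j in range(size)
            )
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# Example: a 32x32 reference with all-distinct samples, and a current picture
# that is the reference shifted so the true displacement is (2, -3).
ref = [[r * 32 + c for c in range(32)] for r in range(32)]
cur = [[ref[r + 2][c - 3] if r + 2 < 32 and c >= 3 else 0 for c in range(32)]
       for r in range(32)]
print(full_search(cur, ref, top=8, left=8, size=8))  # (2, -3)
```

Full search costs (2·rng + 1)² block comparisons per macroblock, which is why practical encoders often use faster, approximate searches.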
H.261 vs MPEG-1
Similarities
- CIF, SIF, non-interlaced
- DCT technology
Differences
- H.261 uses mostly P frames, no B frames
- H.261 typical bit rates much lower (down to 64 kbits/sec)
- Low bit rates achieved by reducing frame rate
- Simpler motion compensation
- End-to-end coding delay must be low
Conclusion: same technology, different design to meet different needs
MPEG 2^i, i = 0, 1
- History & goals
- Expanding universe of video coding
- What are MPEG-2 profiles?
- Features of MPEG-2
MPEG Home
- Official web site: http://mpeg.telecomitalialab.com/ (http://www.cselt.it/mpeg/ still works)
- Information site: http://www.mpeg.org/MPEG/ (unchanged)
History
- MPEG-1, the standard for storage and retrieval of moving pictures and audio on storage media (approved Nov. 92)
- MPEG-2, the standard for digital television (approved Nov. 94)
- MPEG-4 version 1, the standard for multimedia applications (approved Oct. 98); version 2 (approved Dec. 99)
- Under development: MPEG-4 versions 3 & 4; MPEG-7, the content representation standard for multimedia information search, filtering, management and processing
- Started: MPEG-21, the multimedia framework
MPEG Example
- Film on DVD: 8 Gbytes
- Playing time: 2 hours
- Bit rate: 8e9 bytes x 8 bits/byte / 7200 seconds ≈ 9 Mbits/sec
Information? On the web:
http://www.microsoft.com/windowsxp/moviemaker/expert/digitalvideo.asp
‘Bit Rate Explained Bit rate describes how much information there is per second in a stream of data. You might have seen audio files described as “128–Kbps MP3” or “64–Kbps WMA.” Kbps stands for “kilobytes per second,” ....’
The site claims that 64 Kbps WMA is as good as 128 Kbps MP3. Ignorance about bits and bytes does not encourage credibility.
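The DVD estimate and the bits-versus-bytes point can both be checked in a few lines. This sketch uses a hypothetical helper `average_bitrate_bps` and takes "Kbps" in the usual sense of kilobits per second:

```python
def average_bitrate_bps(size_bytes: float, seconds: float) -> float:
    """Average bit rate (bits/s) of a file of size_bytes played over seconds."""
    return size_bytes * 8 / seconds  # 8 bits per byte


dvd = average_bitrate_bps(8e9, 2 * 3600)
print(round(dvd / 1e6, 2))  # 8.89 (Mbit/s)

# "128 Kbps" means 128 kilobits per second, i.e. only 16 kilobytes per second
print(128e3 / 8 / 1e3)      # 16.0
```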
MPEG-2 Goals
- Compatibility with MPEG-1
- Good picture quality
- Flexibility in input format
- Random access capability (I pictures)
- Capability for fast forward, fast reverse play, stop frame
- Bit stream scalability
- Low delay for 2-way communications (videoconferencing)
- Resilience to bit errors
MPEG-2 Implications
- No reason to restrict to CCIR 601
- High resolution can be included (HDTV)
- No single standard can satisfy all requirements
  - Family of standards
  - Most applications use a small set of the features
  - Toolkit approach
MPEG-2 profiles
A profile is a subset of the entire MPEG-2 bit-stream syntax:
- Simple, Main, 4:2:2, SNR, Spatial, High, Multiview
Each profile has several levels (resolution, quality):
- Low: MPEG-1
- Main: CCIR 601
- High-1440: video editing
- High: HDTV
Features of MPEG-2
- Support of both non-interlaced and interlaced pictures
- Color handling
  - Y Cb Cr color space
  - Several subsampling schemes are used: 4:2:0, 4:2:2, 4:4:4
- An MPEG-2 sequence can consist of either frames or fields
- Both frame prediction and field prediction are supported
  - There can be motion between the two fields of a frame, which makes frame prediction trickier
  - In frame prediction, both fields constitute one picture
  - In field prediction, either field of the previous frame or the previous field of the current frame can be used as reference
- More robust coding of motion vectors to protect against bit errors
- Special prediction modes: 16x8, dual-prime
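An interlaced frame interleaves two fields captured at different instants: the top field holds the even-numbered lines and the bottom field the odd-numbered ones. When something moves between the two instants, the interleaved frame shows "combing", and predicting each field separately can work better. A minimal sketch of the split and its inverse, using hypothetical helpers `split_fields` and `merge_fields` and assuming an even number of lines:

```python
def split_fields(frame):
    """Split an interlaced frame (list of rows) into (top, bottom) fields.

    Top field: lines 0, 2, 4, ...; bottom field: lines 1, 3, 5, ...
    """
    return frame[0::2], frame[1::2]


def merge_fields(top, bottom):
    """Re-interleave two fields into a frame (inverse of split_fields).

    Assumes both fields have the same number of lines.
    """
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame


frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
top, bottom = split_fields(frame)
print(top, bottom)  # [[0, 0], [2, 2]] [[1, 1], [3, 3]]
```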
MPEG-2: DCT and Quantization
- Two quantizers: one for intra blocks and one for non-intra blocks
- Supports different quantization matrices for luminance and chrominance
- Scalable bit streams: data partitioning, SNR scalability, temporal scalability, spatial scalability
  - Data partitioning: the bit stream is split in two; headers and motion vectors (plus the first DCT coefficients) go in the higher-priority stream
  - SNR scalability: the lower layer provides basic video, other layers provide enhancements. The basic layer is sent with robust modulation
  - Spatial scalability: the lower layer provides basic resolution (e.g., MPEG-1), the upper layer provides detail
  - Temporal scalability: the lower layer provides a basic (low) frame rate, the upper layer adds pictures to raise it
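The idea behind SNR scalability can be shown on a list of DCT coefficients: the base layer quantizes coarsely, and the enhancement layer quantizes the base layer's reconstruction error with a finer step. This is a simplified illustration that assumes plain uniform quantizers and leaves out the DCT, weighting matrices and variable-length coding; it does not follow the MPEG-2 bitstream syntax. All function names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantization to integer levels (Python round: nearest, ties to even)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [q * step for q in levels]


def snr_scalable_encode(coeffs, base_step, enh_step):
    """Two-layer coding: coarse base layer plus a finer layer for its error."""
    base = quantize(coeffs, base_step)
    residual = [c - r for c, r in zip(coeffs, dequantize(base, base_step))]
    enh = quantize(residual, enh_step)
    return base, enh


def decode(base, base_step, enh=None, enh_step=None):
    """Reconstruct from the base layer alone, or from base + enhancement."""
    rec = dequantize(base, base_step)
    if enh is not None:
        rec = [r + e for r, e in zip(rec, dequantize(enh, enh_step))]
    return rec


coeffs = [103, -47, 22, 9, -5, 3]
base, enh = snr_scalable_encode(coeffs, base_step=16, enh_step=4)
print(decode(base, 16))          # [96, -48, 16, 16, 0, 0]
print(decode(base, 16, enh, 4))  # [104, -48, 24, 8, -4, 4]
```

A receiver that gets only the (robustly transmitted) base layer still decodes a picture, with errors up to half the base step; the enhancement layer cuts the error to at most half the finer step.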
MPEG-2: Profiles
4:2:2 profile at Main level
- Two Y blocks for each pair of Cb, Cr blocks
- Distribution format for video production
- Robust to several compressions and decompressions
- 720x608, 30 fps, 50 Mbit/sec
- Luminance full raster; chrominance at full line rate (every line)
- DC precision of intra blocks can be up to 11 bits
Main (4:2:0) profile at Main level
- Four Y blocks for each pair of Cb, Cr blocks
- Intended for broadcast quality (actually, is better)
- 15 Mbit/sec
Main profile at Low level
- Like MPEG-1
MPEG-2 features
Schemes for frame and field coding. There are two fields in a frame, T (top) and B (bottom); either can be first.
- Frame prediction for frame pictures: what's there to say?
- Field prediction for field pictures
  - Target macroblock is in one field
  - Prediction pixels come from one field
  - Can be the same or different parity as the target field
- Field prediction for frame pictures
- Dual prime for P-pictures
- 16x8 motion compensation for field pictures
- Motion vectors coded at half-pel resolution
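With half-pel motion vectors, a prediction sample may fall between reference samples and has to be interpolated from its integer-position neighbours. The sketch below assumes the usual MPEG bilinear averaging with upward rounding, (a+b+1)//2 for a half-sample offset in one direction and (a+b+c+d+2)//4 in both. `half_pel_predict` is a hypothetical helper, and the caller is assumed to keep the block inside the reference picture.

```python
def half_pel_predict(ref, y2, x2, h, w):
    """Fetch an h x w prediction block at a half-pel position.

    (y2, x2) is the block's upper-left corner in half-pel units:
    y2 = 2*y + fy, where fy = 1 marks a half-sample vertical offset
    (likewise for x2 and fx).
    """
    y0, fy = divmod(y2, 2)
    x0, fx = divmod(x2, 2)
    block = []
    for i in range(h):
        row = []
        for j in range(w):
            a = ref[y0 + i][x0 + j]
            b = ref[y0 + i][x0 + j + fx]       # right neighbour if fx, else a
            c = ref[y0 + i + fy][x0 + j]       # lower neighbour if fy, else a
            d = ref[y0 + i + fy][x0 + j + fx]  # diagonal neighbour
            if fx and fy:
                row.append((a + b + c + d + 2) // 4)
            elif fx:
                row.append((a + b + 1) // 2)
            elif fy:
                row.append((a + c + 1) // 2)
            else:
                row.append(a)
        block.append(row)
    return block


ref = [[10, 20], [30, 40]]
print(half_pel_predict(ref, 0, 1, 1, 1))  # [[15]]  halfway between 10 and 20
print(half_pel_predict(ref, 1, 1, 1, 1))  # [[25]]  centre of the 2x2 square
```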
MPEG-2: Alternate Scan
(Figure: zig-zag scan and alternate scan of an 8x8 coefficient block.)
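The zig-zag order visits the 8x8 coefficients along anti-diagonals, alternating direction, so that low frequencies come first. A short generator for it is sketched below (`zigzag_order` is a hypothetical helper). The alternate scan, which favours vertical frequencies in interlaced material, is a fixed table in the standard and is not reproduced here.

```python
def zigzag_order(n=8):
    """Raster indices (row * n + col) of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right, odd ones the other way.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend(r * n + (s - r) for r in rows)
    return order


print(zigzag_order()[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

To scan a block stored as rows, use `[block[k // 8][k % 8] for k in zigzag_order()]`.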
MPEG-2: Subsampling
Suppose the picture is 720x480:
- 4:4:4: luminance and chrominance at 720x480
- 4:2:2: luminance at 720x480, chrominance 360x480
- 4:2:0: luminance at 720x480, chrominance 360x240
Weird terminology
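Behind the "weird terminology" are just horizontal and vertical subsampling factors for the chroma planes. The sketch below, with hypothetical helpers `chroma_size` and `blocks_per_macroblock`, computes the chroma plane size and the number of 8x8 blocks in a 16x16 macroblock, which gives the four Y blocks per Cb/Cr pair of 4:2:0 and the two per pair of 4:2:2.

```python
# Horizontal and vertical chroma subsampling factors for each format
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def chroma_size(width, height, fmt):
    """Size (width, height) of each of the Cb and Cr planes."""
    sx, sy = SUBSAMPLING[fmt]
    return width // sx, height // sy


def blocks_per_macroblock(fmt):
    """8x8 blocks in a 16x16 macroblock: 4 luminance blocks plus Cb and Cr."""
    cw, ch = chroma_size(16, 16, fmt)
    chroma_blocks = (cw // 8) * (ch // 8)  # per chroma component
    return 4 + 2 * chroma_blocks


print(chroma_size(720, 480, "4:2:0"))  # (360, 240)
print([blocks_per_macroblock(f) for f in ("4:2:0", "4:2:2", "4:4:4")])  # [6, 8, 12]
```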
Low level
- Y ~ 352x240; Cb, Cr ~ 176x120
- 30 pictures per second
- +/-64 pixel displacement, half-pixel resolution
Main level (4:2:0)
- Y ~ 720x480; Cb, Cr ~ 360x240
- 30 frames per second
- 4:3, 16:9 aspect ratio
- Bit rate 15 Mbps (some applications as low as 5 Mbps)
- Digital television
High level
- Y 1920x1152; Cb, Cr 960x576
- 60 frames per second
- 80 Mbps
- HDTV
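The level parameters imply how hard each level compresses. The sketch below computes the uncompressed rate from the luminance and chrominance plane sizes listed above and divides by the coded rate; it assumes 8 bits per sample, and `raw_rate` is a hypothetical helper. The Low level is omitted because no coded bit rate is given for it.

```python
def raw_rate(luma, chroma, fps, bits=8):
    """Uncompressed bit rate (bits/s) for a Y plane plus two chroma planes."""
    (yw, yh), (cw, ch) = luma, chroma
    samples = yw * yh + 2 * cw * ch
    return samples * bits * fps


levels = {
    # name: (luma size, chroma size, frames/s, coded Mbit/s from the slides)
    "Main": ((720, 480), (360, 240), 30, 15),
    "High": ((1920, 1152), (960, 576), 60, 80),
}
for name, (luma, chroma, fps, mbps) in levels.items():
    raw = raw_rate(luma, chroma, fps)
    print(f"{name}: {raw / 1e6:.1f} Mbit/s raw, {raw / (mbps * 1e6):.1f}:1 compression")
# Main: 124.4 Mbit/s raw, 8.3:1 compression
# High: 1592.5 Mbit/s raw, 19.9:1 compression
```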
Low rate Where is it needed? How is it done?
MPEG-4 Multimedia Standard
Thumbnail Description
What Is Left for MPEG-4?
Initial goals
- Coding standards for lower-than-MPEG-1 rates
- Hidden agenda: incorporate new coding methods (wavelet, fractal)
- Revised agenda: object-based coding
MPEG-4 architecture
- Input to the coder consists of audio, video, and stored objects
- The decoder combines encoded objects with local objects
- Example: send text by sending character codes; the receiver uses a character generator
Schematic Overview of MPEG-4
(Figure: encoder with audio-video objects and stored objects → mux and demux → decoder with its own stored objects → compositor.)
MPEG-4 Ideas
Video Object Plane (VOP)
- A VOP can be a natural image from a video camera or from a graphics database
- A VOP can consist of several visual objects. Visual objects do not have to have a rectangular outline (arbitrary shape)
- A scene consists of several VOs and VOPs with appropriate compositing
- Different VOPs can have their own motion
- In principle, a visual scene can be decomposed into video objects by segmentation
- Color and texture can be attributes of visual objects
- A viewer can manipulate VOs
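A compositor places each decoded object into the scene according to its shape. The hypothetical `composite` function below pastes one arbitrarily shaped object onto a background through a binary shape mask. It is a sketch that handles only opaque/transparent masks; giving an object its own motion amounts to changing `top` and `left` from frame to frame.

```python
def composite(background, obj, mask, top, left):
    """Paste an arbitrarily shaped object onto a copy of a background picture.

    background: 2-D list of samples (not modified; a new picture is returned)
    obj, mask:  same-size 2-D lists; mask[i][j] is 1 inside the object's
                shape and 0 outside it
    (top, left): where the object's bounding box lands in the background
    """
    out = [row[:] for row in background]
    for i, (orow, mrow) in enumerate(zip(obj, mask)):
        for j, (v, m) in enumerate(zip(orow, mrow)):
            y, x = top + i, left + j
            if m and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = v  # object sample replaces the background
    return out


bg = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
obj = [[9, 9], [9, 9]]
mask = [[1, 0], [1, 1]]  # an L-shaped, non-rectangular object
print(composite(bg, obj, mask, top=1, left=1))  # [[0, 0, 0], [0, 9, 0], [0, 9, 9]]
```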