-1-

2004. 10. 20.

Overview of H.264 / MPEG-4 Part 10

Soon-kak Kwon, A. Tamhankar, K. R. Rao

Dongeui University, T-Mobile, University of Texas at Arlington

-2-

Contents

1. Introduction

2. Layered Structure

3. Video Coding Algorithm

4. Error Resilience

5. Comparison of Coding Efficiency

6. Conclusions

-3-

Introduction

Scope of Image and Video Coding Standards

Only the syntax and the decoder are standardized:
- Allows optimization beyond the obvious
- Allows complexity reduction for implementation
- Provides no guarantees of quality

[Figure: processing chain Input (image, video), Pre-Processing, Encoding, Decoding, Post-Processing & Error Recovery, Output (image, video), with the "Scope of Standard" marked on the chain.]

-4-

Introduction: Video Coding Standards

Standard | Main Applications | Year
JPEG, JPEG2000 | Image | 1992-1999, 2000
JBIG | Fax | 1995-2000
H.261 | Video conferencing | 1990
H.262, H.262+ | DTV, SDTV | 1995, 2000
H.263, H.263++ | Videophone | 1998, 2000
MPEG-1 | Video CD | 1992
MPEG-2 | DTV, SDTV, HDTV, DVD | 1995
MPEG-4 | Interactive video | 2000
MPEG-7 | Multimedia content description interface | 2001
MPEG-21 | Multimedia framework | 2002
H.264 / MPEG-4 Part 10 | Advanced video coding | 2003
Fidelity Range Extensions (High profile) | Studio editing, post processing, digital cinema | August 2004

-5-

Introduction

MPEG-1: formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC1 SC29 WG11 (MPEG)
- Use is fairly widespread, but mostly overtaken by MPEG-2
- Superior quality compared to H.261 when operated at higher bit rates (1 Mbps for CIF 352x288 resolution)
- Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution
- Additional technical features:
  • Bi-directional motion prediction (B-pictures)
  • Half-pel motion vector resolution
  • Slice-structured coding
  • DC-only "D" pictures

-6-

Introduction

Predictive Coding with B Pictures

I B P B P

-7-

Introduction

MPEG-2 / H.262: formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG)
- Now in wide use for DVD and standard & high-definition DTV (the most commonly used video coding standard)
- Primary new technical feature:
  • Support for interlaced-scan pictures
- Also:
  • Various forms of scalability (SNR, spatial, temporal and hybrid)
  • I-picture concealment motion vectors
- Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
- Not especially useful below 2-3 Mbps (range: ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping is not easy

-8-

Introduction

H.263: The Next Generation

ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
- Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
- Wins by a factor of two at very low rates
- Version 2 (late 1997 / early 1998) & version 3 (2000) later developed with a large number of new features
- Profiles defined early 2001
- H.263+ & H.263++ (extensions to H.263)

-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1: early 1999) contains the H.263 baseline design and adds essentially all prior features and many creative new extras:
- Segmented coding of shapes
- Scalable wavelet coding of still textures
- Mesh coding
- Face animation coding
- Coding of synthetic and semi-synthetic content
- 10 & 12-bit sampling
- More ...
- v2 (early 2000) & v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG.

In ITU-T VCEG this is a new & separate standard:
- ITU-T Recommendation H.264
- ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite:
- Separate coded design from prior MPEG-4 Visual (Part 2)
- New Part 10 called "Advanced Video Coding" (AVC; similar to "AAC" in MPEG-2 as a separate audio codec)
- Not backward or forward compatible with prior standards
- MPEG-4 Systems / File Format is being modified to support it

H.222.0 | MPEG-2 Systems is also being modified to support it.

IETF is working on RTP payload packetization.

-11-

Introduction

History of H.264 / MPEG-4 Part 10
- ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
- July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
- December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
- May 2003: final approval from ISO/IEC and ITU-T
- The standard is named H.264 by ITU-T and MPEG-4 Part 10 by ISO/IEC
- Fidelity Range Extensions (August 2004): Amendment 1
- Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 Part 10
- Higher coding efficiency than previous standards: MPEG-1, 2, 4 Part 2, H.261, H.263
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction: H.264 / MPEG-4 Part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 Part 10

A broad range of applications for video content, including but not limited to the following:
- Video streaming over the Internet
- CATV: cable TV on optical networks, copper, etc.
- DBS: direct broadcast satellite video services
- DSL: digital subscriber line video services
- DTTB: digital terrestrial television broadcasting, cable modem, DSL
- ISM: interactive storage media (optical disks, etc.)
- MMM: multimedia mailing
- MSPN: multimedia services over packet networks
- RTC: real-time conversational services (videoconferencing, videophone, etc.)
- RVS: remote video surveillance
- SSM: serial storage media (digital VTR, etc.)
- D-Cinema: content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bitstream syntax; different decoder designs are based on the profile.
- Four profiles: Baseline, Main, Extended and High

Applications | Profile
Video conferencing, videophone | Baseline
Digital storage media, television broadcasting | Main
Streaming video | Extended
Content contribution, content distribution, studio editing, post processing | High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (flexible macroblock order): macroblocks are not necessarily in raster scan order; a map assigns macroblocks to a slice group
- ASO (arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (redundant slice): a slice that belongs to redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Feature | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | - | X | -
SP/SI slices | - | - | X | -
B slice | - | X | X | X
Interlaced coding | - | X | X | X
CABAC | - | X | - | X
Data partitioning | - | - | X | -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction: Levels

A level corresponds to the processing power and memory capability of a codec.

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
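As a rough illustration of how the first two columns constrain a stream, the sketch below picks the lowest level whose frame-size and macroblock-rate limits admit a given picture size and frame rate. It is only a sketch: the function name `min_level` is hypothetical, and the bit-rate, buffer and motion-vector limits in the other columns are deliberately ignored.

```python
import math

# (level, max MB processing rate in MB/s, max frame size in MBs), copied from the table above.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396), ("1.3", 11880, 396),
    ("2", 11880, 396), ("2.1", 19800, 792), ("2.2", 20250, 1620), ("3", 40500, 1620),
    ("3.1", 108000, 3600), ("3.2", 216000, 5120), ("4", 245760, 8192),
    ("4.1", 245760, 8192), ("4.2", 491520, 8192), ("5", 589824, 22080),
    ("5.1", 983040, 36864),
]

def min_level(width, height, fps):
    """Return the lowest level whose frame-size and MB-rate limits fit the video.

    Hypothetical helper: it checks only two of the limits, so a real stream
    may still need a higher level because of bit rate or buffer size.
    """
    # A macroblock covers 16x16 luma samples; partial macroblocks count as whole ones.
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)
    for level, max_mb_rate, max_frame_mbs in LEVEL_LIMITS:
        if frame_mbs <= max_frame_mbs and frame_mbs * fps <= max_mb_rate:
            return level
    return None  # exceeds every level in the table

print(min_level(176, 144, 15))    # QCIF at 15 fps
print(min_level(352, 288, 30))    # CIF at 30 fps
print(min_level(1280, 720, 30))   # 720p at 30 fps
```

The results for QCIF and CIF agree with the picture types listed per level on the previous slide.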

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data (hence the name Network 'Abstraction' Layer)
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: supported picture format, 4:2:0 chroma sampling

[Figure: CIF format: Y 352 (360) pels x 288 lines; Cb and Cr 176 (180) pels x 144 lines each. QCIF format: Y 176 (180) pels x 144 lines; Cb and Cr 88 (90) pels x 72 lines each.]

-28-

Video Coding Algorithm: block diagram of the H.264 encoder

[Figure: encoder block diagram. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision.]

-29-

Video Coding Algorithm: block diagram of the H.264 decoder

[Figure: decoder block diagram. Blocks: Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering, Video Output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame.

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction):
vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8

[Figure: 4x4 block of samples a to p with neighbouring samples A to H above, I to L to the left and M at the top-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively, for mode 4.

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively, for mode 8.

[Figure: the 4x4 block a to p with neighbours A to M, shown once with the mode 4 prediction direction and once with the mode 8 prediction direction.]
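The two example predictions can be checked numerically. The following is a minimal sketch, assuming the usual integer form of the rounding (add half the divisor, then shift right); the neighbour values are made up for illustration and the function names are hypothetical.

```python
def round_div2(x):
    # round(x / 2) in integer arithmetic
    return (x + 1) >> 1

def round_div4(x):
    # round(x / 4) in integer arithmetic
    return (x + 2) >> 2

def predict_mode4_a_d(A, B, C, D, I, M):
    """Mode 4 (diagonal down-right): predicted samples a and d."""
    a = round_div4(I + 2 * M + A)   # round(I/4 + M/2 + A/4)
    d = round_div4(B + 2 * C + D)   # round(B/4 + C/2 + D/4)
    return a, d

def predict_mode8_a_d(I, J, K, L):
    """Mode 8 (horizontal-up): predicted samples a and d."""
    a = round_div2(I + J)           # round(I/2 + J/2)
    d = round_div4(J + 2 * K + L)   # round(J/4 + K/2 + L/4)
    return a, d

# Hypothetical previously reconstructed neighbours (8-bit luma values).
A, B, C, D = 100, 104, 108, 112
I, J, K, L, M = 90, 92, 96, 98, 94

mode4 = predict_mode4_a_d(A, B, C, D, I, M)
mode8 = predict_mode8_a_d(I, J, K, L)
print(mode4, mode8)
```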

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

8 x 8 luma (FRExt only): similar to 4 x 4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: 8x8 for the 4:2:0 format, 8x16 for 4:2:2, 16x16 for 4:4:4 (similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e. a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram with motion estimation and motion compensation highlighted; motion vector accuracy 1/4 (6-tap filter).]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4.]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels.]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32.

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples.

[Figure: integer-position samples (upper-case letters A to U) and fractional-position samples (lower-case letters a to s and aa to hh).]

$$b = \mathrm{round}\big((E - 5F + 20G + 20H - 5I + J)/32\big), \qquad a = \mathrm{round}\big((G + b)/2\big)$$
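The two formulas can be sketched in a few lines of integer arithmetic. This is an illustration under stated assumptions: rounding is done by adding half the divisor before shifting, the result is clipped to the 8-bit range (the clipping step is an assumption not shown on the slide), and the sample values are invented.

```python
def clip255(x):
    # Assumed 8-bit sample range; not part of the slide's formula.
    return max(0, min(255, x))

def half_pel(E, F, G, H, I, J):
    """6-tap FIR (1, -5, 20, 20, -5, 1)/32 across six integer-pel samples."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)

def quarter_pel(p, q):
    """Bilinear average of two neighbouring integer- or half-pel samples."""
    return (p + q + 1) >> 1

# Hypothetical row of integer-position samples E F G H I J.
E, F, G, H, I, J = 10, 20, 30, 40, 50, 60
b = half_pel(E, F, G, H, I, J)   # half-pel sample between G and H
a = quarter_pel(G, b)            # quarter-pel sample between G and b
print(b, a)
```

On a flat area (all six samples equal) the filter returns the same value, because the weights sum to 32.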

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16 x 16 macroblock.]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with picture buffering highlighted; management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: assignment of the indices of DC (dark samples) to luma 4 x 4 blocks. The numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform:

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

(0,0), (0,1), (0,2), ..., (3,3) are the DC coefficients of each 4x4 block.]

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

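VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $\otimes$ implies element-by-element multiplication and

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$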

-43-

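4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i)\, C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix},
\qquad
Y \otimes E_i = [\,Y\,] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.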
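The forward and inverse relations can be exercised together to see that, before quantization, the pair is exactly invertible. This is a minimal sketch in plain Python: the inverse core matrix `CI` is not shown on the slides and is assumed here from the commonly published form of the H.264 transform, the input block is arbitrary, and the scaling is done in floating point rather than being folded into quantization as a real codec would do.

```python
import math

a = 0.5
b = math.sqrt(2 / 5)

# Forward core matrix C_f (from the slide) and inverse core matrix C_i (assumed).
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def elementwise(p, q):
    # Element-by-element multiplication (the circled-times operator).
    return [[p[i][j] * q[i][j] for j in range(4)] for i in range(4)]

def outer(v):
    return [[v[i] * v[j] for j in range(4)] for i in range(4)]

EF = outer([a, b / 2, a, b / 2])   # a^2, ab/2, b^2/4 pattern
EI = outer([a, b, a, b])           # a^2, ab, b^2 pattern

def forward(x):
    return elementwise(matmul(matmul(CF, x), transpose(CF)), EF)

def inverse(y):
    return matmul(matmul(transpose(CI), elementwise(y, EI)), CI)

X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]  # arbitrary samples
Y = forward(X)
X_rec = inverse(Y)
max_error = max(abs(X_rec[i][j] - X[i][j]) for i in range(4) for j in range(4))
print(Y[0][0], max_error)
```

The DC coefficient `Y[0][0]` equals the sum of the 16 input samples divided by 4, and the reconstruction error is at floating-point noise level.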

-44-

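VC Algorithm: Transform

Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)$$

where round(·) denotes rounding to the nearest integer.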

-45-

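VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma block numbering for U and V: blocks 16 and 17 are the 2x2 DC blocks; blocks 18 to 21 and 22 to 25 are the AC blocks.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.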

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with the Transform & Quantization block highlighted.]

- 4 x 4 integer DCT transform:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij}\,\mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values (for 8 bits per decoded sample), doubling in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
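The doubling rule can be made concrete with a small lookup. This is a sketch, assuming the six base step sizes commonly quoted in the H.264 literature for QP 0 to 5 (they are not given on the slide); the function name `qstep` is hypothetical and only the 8-bit range QP 0..51 is covered.

```python
# Step sizes for QP 0..5 as commonly quoted for H.264 (assumption, not from the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size for an 8-bit QP in 0..51; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

With these base values the 52 step sizes span 0.625 (QP 0) to 224 (QP 51).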

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform; the DC transform stages apply to chroma or Intra (16x16) prediction mode luma only

[Figure: encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
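The two scan tables give, for each position of the 4x4 block, its index in the one-dimensional coefficient list. A minimal sketch of applying them follows; the coefficient block is invented and the function name `scan` is hypothetical.

```python
# Scan position of each (row, column) entry, copied from the two tables above.
ZIGZAG_POS = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE_POS = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]

def scan(block, positions):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[positions[r][c]] = block[r][c]
    return out

# Hypothetical quantized block with energy concentrated at low frequencies.
block = [[7, 6, 0, 0],
         [-2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]

zigzag = scan(block, ZIGZAG_POS)
alternate = scan(block, ALTERNATE_POS)
print(zigzag)
print(alternate)
```

Both scans push the zero coefficients toward the end of the list, which is what the run-based entropy coding steps rely on.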

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M+1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
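The construction and the three decoding steps translate almost directly into code. The sketch below works on bit strings for readability; the function names are hypothetical, and only the unsigned code_num mapping described above is covered.

```python
def exp_golomb_encode(code_num):
    """Build [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                        # step 1: count leading zeroes up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: read the M-bit INFO field
    return (1 << m) + info - 1                   # step 3: code_num = 2^M + INFO - 1

for n in range(7):
    print(n, exp_golomb_encode(n))
```

The printed codewords show the pattern stated above: codeword 0 is a single `1`, codewords 1 and 2 carry one INFO bit, and codewords 3 to 6 carry two.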

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded. For the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

- Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table. Number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
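The values produced by the five steps can be derived mechanically from the scanned coefficient list. The sketch below computes only the values of the syntax elements, not the variable-length codewords themselves; the function name is hypothetical and c0, c1, c2 are given made-up values (5, -3, 2).

```python
def cavlc_elements(coeffs):
    """Return the values that the five CAVLC steps would encode for one block."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # scan positions of nonzero coefficients
    total_coeff = len(nz)

    # Step 1: trailing ones are up to three +/-1 values at the end of the nonzero list.
    trailing_ones = 0
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    split = total_coeff - trailing_ones
    # Step 2: signs of the trailing ones, last one first.
    signs = ["+" if coeffs[i] > 0 else "-" for i in reversed(nz[split:])]
    # Step 3: levels of the remaining nonzero coefficients, last one first.
    levels = [coeffs[i] for i in reversed(nz[:split])]
    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nz[-1] + 1 - total_coeff if nz else 0
    # Step 5: run of zeros before each coefficient, last one first, until no zeros remain.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return total_coeff, trailing_ones, signs, levels, total_zeros, runs

# The slide's example with hypothetical values c0 = 5, c1 = -3, c2 = 2.
example = [5, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
result = cavlc_elements(example)
print(result)
```

For this input the function reproduces the numbers in the example: 6 coefficients, 3 trailing ones with signs -, +, +, two zeros before the last coefficient, and runs 1, 0, 1.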

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, in order to achieve good compression, the probability model for each symbol element is updated. Both MV and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, with three prediction cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice

Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference; temporal distances tb and td; motion vectors mvCol, mvL0 and mvL1.]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is a MV used in the co-located MB of the subsequent picture
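The scaling of the co-located motion vector is simple enough to sketch directly. This illustration uses floating-point division for clarity, whereas a real decoder would use fixed-point integer arithmetic; the function name and the numeric values are hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located vector mvCol.

    tb: temporal distance from the List 0 reference to the current picture
    td: temporal distance from the List 0 reference to the List 1 reference
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1

# Hypothetical case: the B picture sits halfway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs(mv_col=(8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Because tb/td + (td - tb)/td = 1, the difference mvL0 - mvL1 always equals mvCol, so the two derived vectors lie along the trajectory of the co-located vector.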

-60-

VC Algorithm: B Slice

Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
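A small sketch of the weighted combination follows. The implicit-weight rule shown here is a simplified assumption (weights proportional to temporal proximity, in floating point); the standard derives them in fixed-point form. Function names and sample values are hypothetical.

```python
def weighted_bipred(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2, applied sample by sample."""
    return [w1 * x + w2 * y for x, y in zip(r1, r2)]

def implicit_weights(tb, td):
    """Simplified implicit weights (assumption): the nearer reference gets the larger weight.

    tb: distance from reference 1 to the current picture
    td: distance from reference 1 to reference 2
    """
    w2 = tb / td
    return 1 - w2, w2

# Hypothetical samples; the current picture is much closer to reference 1.
r1 = [100, 100]
r2 = [50, 60]
w1, w2 = implicit_weights(tb=1, td=4)
p = weighted_bipred(r1, r2, w1, w2)
print(w1, w2, p)
```

In the explicit type the same `weighted_bipred` step would be used, with w1 and w2 read from the slice header instead of being derived from picture distances.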

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5); the switching slice S(3) connects the two bitstreams.]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices SP/SI (only in Extended Profile)
- Data partitioning (only in Extended Profile)
- Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MBs
5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange. Example Parameter Set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11. VCL data is transferred with a reference to PS 3.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.
- Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data.
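The two-slice-group FMO scenario described above can be illustrated with a checkerboard map, which is one possible dispersed pattern chosen here purely for illustration. The sketch counts, for every macroblock of slice group 1, how many of its four direct neighbours belong to slice group 0 and would therefore still be available for concealment; function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]

def neighbours_in_other_group(group_map, x, y):
    """Count the 4-connected neighbours of MB (x, y) that lie in the other slice group."""
    height, width = len(group_map), len(group_map[0])
    own = group_map[y][x]
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and group_map[ny][nx] != own:
            count += 1
    return count

# QCIF picture: 11 x 9 macroblocks. Suppose the packet carrying slice group 1 is lost.
qcif = checkerboard_slice_groups(11, 9)
lost = [(x, y) for y in range(9) for x in range(11) if qcif[y][x] == 1]
worst_case = min(neighbours_in_other_group(qcif, x, y) for x, y in lost)
print(len(lost), worst_case)
```

With this pattern every lost macroblock keeps at least three correctly received neighbours, which is the property the concealment argument above relies on.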

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T

Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: >2x, 2x, 2x, T, T

Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x

Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T

Husky: 2x, 2x, >1x, 2x, 2x, 2x

Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T

Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence    Bitrate [Mbps] for MPEG-2 HiQ    Bitrate [Mbps] for MPEG-2 TM5

6 10 20    6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan

1x 2x

New Mobile & Calendar

T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15Hz sequence for the video streaming application

HLP - High Latency Profile; ASP - Advanced Simple Profile; H.26L - H.264 Main Profile
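For readers unfamiliar with the metric, PSNR between an original and a reconstructed picture is conventionally derived from the mean squared error. The sketch below assumes 8-bit samples (peak value 255) passed as flat sequences; the function name `psnr` is illustrative and does not come from the test report.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(max_value ** 2 / mse)
```

In practice the value is usually computed per frame on the luma component and averaged over the sequence, which is an assumption about common practice rather than something stated on the slide.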

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15Hz sequence for the video conferencing application

CHC - Conversational High Compression; SP - Simple Profile; ASP - Advanced Simple Profile; H.26L - H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions: Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies

- UBVideo website: http://www.ubvideo.com

- LSI Logic website: http://www.lsilogic.com

- Microsoft website: http://www.microsoft.com

- Envivio website: http://www.envivio.com

- Broadcom website: http://www.broadcom.com

- Nagravision website: http://www.nagravision.com

- Philips website: http://www.philips.com

- Polycom website: http://www.polycom.com

- PixelTools Corporation website: http://www.pixeltools.com

- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com

- LifeSize website: http://www.lifesize.com

- Netvideo website: http://www.netvideo.com

- Motorola website: http://www.motorola.com

- Vanguard Software Solutions website: http://www.vsofts.com

- STMicroelectronics website: http://us.st.com

- MainConcept website: http://www.mainconcept.com

- Impact Labs Inc website: http://www.impactlabs.com

- Sorenson Media: AVC Pro codec (H.264)

- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org

- JVT website: ftp://standards.polycom.com

- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download

- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html

- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm

- http://www.its.bldrdoc.gov/vqeg

- ftp.tnt.uni-hannover.de/pub/jvt/sequences

- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s).

MPEG LA website: http://www.mpegla.com

Via Licensing: http://www.vialicensing.com

FRExt

Extensions to 4:2:2 and 4:4:4 chroma formats

12 bit resolution for medical imaging

Scalable coding

Lossless coding for digital cinema application

High fidelity coding for the next generation optical discs

Extension for various applications

H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004

FINAL STAGES OF APPROVAL

Standard systems and file format support specifications

Standardizing reference software implementation

Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair: Gary Sullivan (garysull@microsoft.com); Co-chair: Ajay Luthra (aluthra@motorola.com); Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu

Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr

Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org

Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.

[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.

[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.

[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.

[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.

[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.

[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.

[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.

[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.

[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.

[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.

[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.

[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.

[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.

[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.

[19] MPEG website: http://www.mpeg.org

[20] JVT website: ftp://standards.polycom.com

[21] MPEG LA website: http://www.mpegla.com

[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

[23] UBVideo website: http://www.ubvideo.com

[24] LSI Logic website: http://www.lsilogic.com

[25] Microsoft website: http://www.microsoft.com

[26] Envivio website: http://www.envivio.com

[27] PixelTools Corporation website: http://www.pixeltools.com

[28] Nagravision website: http://www.nagravision.com

[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com

[31] MainConcept website: http://www.mainconcept.com

[32] Amphion website: http://www.amphion.com

[33] Ligos Corporation website: http://www.ligos.com

[34] LifeSize website: http://www.lifesize.com

[35] Broadcom website: http://www.broadcom.com

[36] Netvideo website: http://www.netvideo.com

[37] Motorola website: http://www.motorola.com

[38] http://www.mediaware.com

[39] Impact Labs Inc website: http://www.impactlabs.com

[40] Vanguard Software Solutions website: http://www.vsofts.com

[41] STMicroelectronics website: http://us.st.com, www.thomson.net

[42] www.conexant.com (H.264 decoder ICs - HDTV & SDTV)

[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com

[45] DemoGaFrX: www.dolby.com

[46] Equator: http://www.equator.com

[47] Moonlight: www.elecard.com

[48] Sand Video: www.broadcom.com

[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html

[50] W&W Communications (and DSP Research): http://www.wwcoms.com

[51] Cisco Systems: www.cisco.com

[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com

[54] Glance Networks: http://www.glance.net

[55] RADVISION: www.radvision.com

[56] Sun Microsystems: http://www.sun.com

[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp. , Baltimore, MD, July 2003.

[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004.

[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.

[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol. , pp. 7-34, I quarter 2004.

[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.

[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3

[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip

[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol. , pp. , June-Aug. 2005.

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml

b. Mpegable: http://www.mpegable.com/show/home.html

c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, the published MPEG-4 part 10 standard, costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction

o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2, 4:4:4 formats

> 8 bit depths

(8x8) integer DCT

HVS weighting matrices

Transform bypass lossless mode: uses prediction and entropy coding of prediction errors

Residual color transform

Source editing such as alpha blending

High bit rates

[use RGB color format] YCgCo

High resolution

-92-

♦ Inter temporal prediction - block based motion estimation and compensation

o Multiple reference pictures

o Reference B pictures

o Arbitrary referencing order

o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)

o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)

o Weighted prediction

o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features

o Frame-field adaptation

Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level

MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks

o Field scan

♦ Lossless representation capability

o Intra PCM raw sample-value macroblocks

o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
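The logarithmic relationship between the quantization parameter (QP) and the step size can be illustrated with a short sketch. It assumes the commonly tabulated base step sizes for QP 0 to 5 and the 0 to 51 QP range used for 8-bit video; neither the table values nor the function name `qstep` appear on the slides, so treat them as assumptions.

```python
# Commonly tabulated step sizes for QP = 0..5 (assumed, not from the slides).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantization step size for a given QP; it doubles for every increase of 6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Because the step size doubles every 6 QP values, a single QP increment raises it by roughly 12% (a factor of about 2^(1/6)).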

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning

o Zig-Zag (Frame)

o Field (alternate scan)

♦ Lossless entropy coding

o Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools

o Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
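As an illustration of the zigzag scan mentioned above, the sketch below reorders a 4x4 block of quantized coefficients into a 1-D list. The scan table is the conventional 4x4 frame (zigzag) order written as raster indices; the names `ZIGZAG_4X4` and `zigzag_scan` are illustrative, and the field scan is not shown.

```python
# Conventional 4x4 zigzag order as raster indices (row * 4 + column).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Flatten a 4x4 block (list of 4 rows) into zigzag order."""
    flat = [value for row in block for value in row]
    if len(flat) != 16:
        raise ValueError("expected a 4x4 block")
    return [flat[index] for index in ZIGZAG_4X4]
```

The point of the ordering is that low-frequency coefficients come first, so the trailing zeros of a typical quantized block cluster at the end of the list where the entropy coder can represent them cheaply.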

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
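Three of the four modes above are simple enough to sketch directly. The code assumes that both the row of 16 pels above the macroblock and the column of 16 pels to its left are available, and it omits the planar mode; the function name and the string mode labels are illustrative choices, not syntax from the standard.

```python
def intra16x16_predict(mode, above, left):
    """Build a 16x16 luma prediction from neighboring pels.

    above: 16 reconstructed pels in the row just above the macroblock
    left:  16 reconstructed pels in the column just left of the macroblock
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("need 16 neighboring pels on each side")
    if mode == "vertical":
        # Every row repeats the pels above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + 16) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + str(mode))
```

An encoder would typically evaluate each candidate mode against the source block and keep the one with the lowest cost; that selection step is outside this sketch.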

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)

• 4:2:2 (8x16)

• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

Matrix can be transmitted in SPS and PPS

Separate matrix for 4x4 and 8x8 transforms

Separate matrix for Inter and Intra

Encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

Frame (zig-zag) scan; Field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits.

M = floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1

2. Read the M-bit INFO field

3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
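The Exp-Golomb construction and the three decoding steps above can be sketched as follows. Bitstreams are represented as Python strings of '0' and '1' purely for readability; the function names and this string representation are assumptions made for the example, not how a real bitstream reader works.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the codeword [M zeros][1][INFO] for an unsigned code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: read INFO
    return (1 << m) + info - 1, start + m    # step 3: 2^M + INFO - 1
```

For example, `exp_golomb_encode(3)` gives `00100`: two leading zeros, the marker bit, and a two-bit INFO field of 0, which is 2M + 1 = 5 bits in total.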

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution, content-distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16]

-119-

RGB to YCbCr

Y = KR R + (1 - KR - KB) G + KB B

Cb = (B - Y) / (2 (1 - KB))

Cr = (R - Y) / (2 (1 - KR))

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
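A direct transcription of the conversion into code makes the role of KR and KB explicit. The sketch assumes real-valued R, G, B in the range 0 to 1, uses the KR = 0.2126 and KB = 0.0722 constants quoted above as defaults, and omits the offsets, scaling and rounding a real integer pipeline would add; the function name is illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using the luma weights KR and KB."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))   # scaled so Cb stays within [-0.5, 0.5]
    cr = (r - y) / (2 * (1 - kr))   # scaled so Cr stays within [-0.5, 0.5]
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned above. With integer sample values the divisions introduce rounding, which is the error the YCgCo transform is designed to avoid.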

-120-

Rounding error in RGB to YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples

Y = 1/2 [G + (R + B)/2]

Cg = 1/2 [G - (R + B)/2]

Co = (R - B)/2

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
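The forward and inverse conversions can be checked for exact reversibility with a few lines of code. The sketch relies on Python's `>>` being an arithmetic (sign-preserving) right shift for negative integers; the function names are illustrative.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward lifting-based conversion from integer RGB to Y, Cg, Co."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity the corresponding forward step added, so the round trip is lossless for any integer input. The price is that Cg and Co need one more bit of range than the input samples.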

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter

• Only the control of the filter is adjusted (do not filter 4x4 blocks)

• No change to the filter operation itself

♦ CABAC

• 61 new contexts and corresponding initialization values

• No change to the CABAC engine

♦ Signaling

• 8x8 transform on/off flag at PPS level

• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:

Main profile

Adaptive MB-level switching between 8x8 and 4x4 transform block sizes

Encoder-specified perceptual-based quantization scaling matrices

Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):

HD Film: 12%

HD Video (progressive): 12%

HD Video (interlace): 4% (only 2 test clips)

SD Video (interlace): 6%

Complexity impact:

Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding

No increase in computational requirements

Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:

1. MPEG LA: www.mpegla.com

2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video

Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus)

ATSC has selected AC-3 plus from Dolby

MPEG calls it HE-AAC (HE - High Efficiency)

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3

Page 2: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-2-

Contents

1 Introduction

2 Layered Structure

3 Video Coding Algorithm

4 Error Resilience

5 Comparison of Coding Efficiency

6 Conclusions

-3-

Introduction

Scope of Image and Video Coding Standards Only the Syntax and Decoder are standardized

ndash Optimization beyond the obviousndash Complexity reduction for implementation ndash Provides no guarantees of quality

Pre-Processing Encoding

Post-Processingamp Error Recovery

Decoding

Input (image video)

Output (image video)

Scope of Standard

-4-

Introduction Video Coding Standards

2003Advanced Video Coding

2002Multimedia FrameworkMPEG-21

2001Multimedia Content description Interface

MPEG-7

2000Interactive videoMPEG-4

1995DTV SDTV HDTV DVDMPEG-2

1992Video CDMPEG-1

1998 2000VideophoneH263 H263++

1995 2000DTV SDTVH262 H262+

1990Video ConferencingH261

1995-2000FaxJBIG

1992-1999 2000ImageJPEG JPEG2000

YearMain ApplicationsStandard

2004 AugustFidelity Range Extensions

(High profile) Studio editing Post processing Digital cinema

H264MPEG-4 part 10

-5-

Introduction

MPEG-1 Formally ISOIEC 11172-2 (rsquo93) developed by ISOIEC JTC1 SC29

WG11 (MPEG) ndash use is fairly widespread but mostly overtaken by MPEG-2

ndash Superior quality compared to H261 when operated at higher bit rates ( 1Mbps for CIF 352x288 resolution)

ndash Provides approximately VHS quality between 1-2Mbps using SIF 352x240288 resolution

ndash Additional technical features bull Bi-directional motion prediction (B-pictures)bull Half-pel motion vector resolutionbull Slice-structured codingbull DC-only ldquoDrdquo pictures

-6-

Introduction

Predictive Coding with B Pictures

I B P B P

-7-

Introduction

MPEG-2 H262 Formally ISOIEC 13818-2 amp ITU-T H262 developed

(1994) jointly by ITU-T and ISOIEC SC29 WG11 (MPEG) ndash Now in wide use for DVD and standard amp high-definition DTV (the most commonly used video coding standard)

ndash Primary new technical featuresbull Support for interlaced-scan pictures

ndash Alsobull Various forms of scalability (SNR Spatial Temporal and hybrid)bull I-picture concealment motion vectors

ndash Essentially same as MPEG-1 for progressive-scan pictures and MPEG-1 forward compatibility is required

ndash Not especially useful below 2-3Mbps (range ~2-5Mbps SDTV broadcast 6-8Mbps DVD 18Mbps HDTV) picture skipping not easy

-8-

Introduction

H263 The Next Generation ITU-T Rec H263 (v1 1995) The next generation of video

coding performance developed by ITU-T ndash the current premier ITU-T video standard (has overtaken H261 as dominant videoconferencing codec)

ndash Superior quality to prior standards at all bit rates (except perhaps for interlaced video)

ndash Wins by a factor of two at very low ratesndash Version 2 (late 1997 early 1998) amp version 3 (2000) later

developed with a large number of new featuresndash Profiles defined early 2001ndash H263+ amp H263++ (Extensions to H263)

-9-

Introduction

MPEG-4 Visual Baseline H263 and Many Creative Extras MPEG-4 Visual (formally 14496-2 v1 early 1999)

Contains the H263 baseline design and adds essentially all prior features and many creative new extras

ndash Segmented coding of shapesndash Scalable wavelet coding of still texturesndash Mesh codingndash Face animation codingndash Coding of synthetic and semi-synthetic contentndash 10 amp 12-bit samplingndash More hellipndash v2 (early 2000) amp v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG

In ITU-T VCEG this is a new & separate standard
- ITU-T Recommendation H.264
- ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite
- Separate coded design from prior MPEG-4 Visual (Part 2)
- New part 10 called "Advanced Video Coding" (AVC; similar to "AAC" in MPEG-2 as a separate audio codec)
- Not backward or forward compatible with prior standards
- MPEG-4 Systems / File Format is being modified to support it

H.222.0 | MPEG-2 Systems is also being modified to support it

IETF is working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 part 10
- ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
- July 2001: H.26L demonstrated at MPEG (Moving Picture Experts Group) call for technology
- December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
- May 2003: final approval from ISO/IEC and ITU-T
- The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
- Fidelity Range Extensions (August 2004): Amendment 1
- Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 part 10
- Higher coding efficiency than previous standards (MPEG-1, MPEG-2, MPEG-4 part 2, H.261, H.263)
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications such as video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:
- Video streaming over the internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit stream syntax; different decoder designs based on the Profile
- Four profiles: Baseline, Main, Extended and High

Applications | Profile
Video conferencing, videophone | Baseline
Digital storage media, television broadcasting | Main
Streaming video | Extended
Content contribution, content distribution, studio editing, post processing | High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks are not necessarily in raster scan order; a map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice, similar to the coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Feature | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | - | X | -
SP/SI slices | - | - | X | -
B slice | - | X | X | X
Interlaced coding | - | X | X | X
CABAC | - | X | - | X
Data partitioning | - | - | X | -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction: Levels

Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
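
As a rough illustration of how the level limits on this slide might be used, the sketch below picks the lowest level whose frame-size and macroblock-rate limits admit a given picture format. It is only a sketch: the names `LEVEL_LIMITS`, `macroblocks` and `lowest_level` are hypothetical, and the check deliberately ignores the bit-rate, buffer and motion-vector limits that a real conformance check would also apply.

```python
# (level, max MB processing rate in MB/s, max frame size in MBs),
# taken from the first two limit columns of the level table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width, height, fps):
    """Lowest level satisfying the frame-size and MB-rate limits, else None."""
    frame_mbs = macroblocks(width, height)
    mb_rate = frame_mbs * fps
    for level, max_rate, max_frame in LEVEL_LIMITS:
        if frame_mbs <= max_frame and mb_rate <= max_rate:
            return level
    return None
```

For example, QCIF (176x144) is 11 x 9 = 99 macroblocks, so 15 fps gives exactly 1 485 MB/s, which is the Level 1 limit.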

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supporting picture format (4:2:0 chroma sampling)

[Figure: CIF format: Y 352 (360) pels x 288 lines, Cb and Cr each 176 (180) pels x 144 lines. QCIF format: Y 176 (180) pels x 144 lines, Cb and Cr each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Video input; intra prediction; motion estimation; motion compensation; intra/inter mode decision; transform & quantization; entropy coding; inverse quantization & inverse transform; deblocking filter; picture buffering; bitstream output.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Bitstream input; entropy decoding; inverse quantization & inverse transform; intra prediction; motion compensation; intra/inter mode selection; deblocking filter; picture buffering; video output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a-p with neighboring samples A-H above, I-L to the left and M at the upper-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.]

- Samples a, b, ..., p: the predicted samples of the current block
- Samples A, B, ..., M: the above and left samples, previously reconstructed

-31-

VC Algorithm: Intra Prediction

Example of a 4 x 4 luma block
- Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 block a-p with neighbors A-H, I-L and M, shown once with the mode 4 prediction direction and once with the mode 8 prediction direction.]
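
The prediction rules for the 4 x 4 luma block can be sketched as follows. This is a minimal illustration, not a full intra predictor: only modes 0, 1 and 2 are implemented for the whole block, plus the two sample formulas quoted for modes 4 and 8. The function names are hypothetical, and the integer "add half, then shift" form is an assumed equivalent of the round(...) expressions on the slide.

```python
def intra4x4_predict(mode, above, left):
    """Predict a 4x4 block from above = [A, B, C, D] and left = [I, J, K, L].

    Returns a list of four rows. Only modes 0 (vertical), 1 (horizontal)
    and 2 (DC) are covered in this sketch.
    """
    if mode == 0:  # vertical: each column copies the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row copies the sample to its left
        return [[v] * 4 for v in left]
    if mode == 2:  # DC: rounded mean of the eight neighboring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("mode not covered by this sketch")


def mode4_sample_a(I, M, A):
    """round(I/4 + M/2 + A/4) in integer arithmetic (diagonal down-right)."""
    return (I + 2 * M + A + 2) >> 2


def mode4_sample_d(B, C, D):
    """round(B/4 + C/2 + D/4) in integer arithmetic (diagonal down-right)."""
    return (B + 2 * C + D + 2) >> 2


def mode8_sample_a(I, J):
    """round(I/2 + J/2) in integer arithmetic (horizontal-up)."""
    return (I + J + 1) >> 1


def mode8_sample_d(J, K, L):
    """round(J/4 + K/2 + L/4) in integer arithmetic (horizontal-up)."""
    return (J + 2 * K + L + 2) >> 2
```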

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition types cannot be mixed, i.e. an MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: H.264 encoder block diagram with the motion estimation/compensation path highlighted; motion vector accuracy 1/4 (6-tap filter).]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the partitions of each case numbered 0 to 3.]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels.]

-38-

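VC Algorithm: Inter Prediction
- Half-pel samples: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) with the half- and quarter-pel samples between them (lower-case letters a to s and aa to hh); b lies halfway between G and H, and a lies between G and b.]

\[ b = \mathrm{round}\big((E - 5F + 20G + 20H - 5I + J)/32\big), \qquad a = \mathrm{round}\big((G + b)/2\big) \]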
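
The two interpolation formulas can be sketched in integer arithmetic as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, rounding is done by adding half the divisor before shifting, and the clipping to the 8-bit range [0, 255] is an assumption added here because the filter can overshoot (the slide does not mention it).

```python
def clip255(value):
    """Clamp a value to the 8-bit sample range (assumed, not on the slide)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap filter (1,-5,20,20,-5,1)/32."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


# Sample a lies between integer sample G and half-pel sample b.
E, F, G, H, I, J = 100, 100, 100, 120, 120, 120
b = half_pel(E, F, G, H, I, J)
a = quarter_pel(G, b)
```

A flat row of samples is left unchanged by the filter because the weights sum to 32, which is a quick sanity check on the coefficients.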

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block edge, sample)

[Figure: two 16 x 16 macroblocks showing the vertical edges (luma), vertical edges (chroma), horizontal edges (luma) and horizontal edges (chroma) that are filtered.]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram with picture buffering highlighted; management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free; additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: a 16 x 16 luma macroblock divided into sixteen 4 x 4 blocks numbered in coding order (rows: 0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15), and the 4 x 4 array of their DC coefficients indexed (00) to (33).]

Assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

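VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f \, X \, C_f^T) \otimes E_f \]

\[
Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

\[ a = \tfrac{1}{2}, \qquad b = \sqrt{\tfrac{2}{5}}, \qquad d = \tfrac{1}{2} \]

Here \(\otimes\) denotes element-by-element multiplication.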

-43-

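4x4 Inverse IntDCT

\[
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices \(E_f\) and \(E_i\).

\[
Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
\]

Here

\[ X = C_i^T \, (Y \otimes E_i) \, C_i \]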
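
A small numerical sketch can confirm that the forward and inverse formulas are consistent with each other. The code below evaluates Y = (Cf X Cf^T) ⊗ Ef and X = Ci^T (Y ⊗ Ei) Ci in floating point, without any quantization. The matrix Ci is not legible on the slide; the version used here (Cf with its second and fourth rows halved) is an assumption taken from the commonly published form of the transform, and all function names are hypothetical.

```python
import math

A = 0.5
B = math.sqrt(2.0 / 5.0)

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Assumed inverse core matrix: rows 2 and 4 of CF scaled by 1/2.
CI = [[1, 1, 1, 1],
      [1, 0.5, -0.5, -1],
      [1, -1, -1, 1],
      [0.5, -1, 1, -0.5]]

# Scaling matrices Ef (forward) and Ei (inverse); rows repeat with period 2.
EF = [[A * A, A * B / 2, A * A, A * B / 2],
      [A * B / 2, B * B / 4, A * B / 2, B * B / 4]] * 2
EI = [[A * A, A * B, A * A, A * B],
      [A * B, B * B, A * B, B * B]] * 2


def matmul(p, q):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def elementwise(p, q):
    """Element-by-element product (the circled-times operator)."""
    return [[p[i][j] * q[i][j] for j in range(4)] for i in range(4)]


def forward(x):
    """Y = (Cf X Cf^T) (x) Ef"""
    return elementwise(matmul(matmul(CF, x), transpose(CF)), EF)


def inverse(y):
    """X = Ci^T (Y (x) Ei) Ci"""
    return matmul(matmul(transpose(CI), elementwise(y, EI)), CI)
```

For a flat block in which every sample equals 8, only the DC coefficient is non-zero and equals 16 x 8 x a^2 = 32.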

-44-

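VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[
Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
\]

where round(.) denotes rounding to the nearest integer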

-45-

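VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: coding order of the chroma blocks in the 4:2:0 format: 2x2 DC blocks 16 (U) and 17 (V), followed by the AC blocks 18-21 (U) and 22-25 (V).]

For the 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased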

-46-

VC Algorithm: Transform

Block diagram emphasizing the transform

[Figure: H.264 encoder block diagram with the transform & quantization block highlighted.]

- 4 x 4 integer DCT transform

\[
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\]

- Hadamard transform of the DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

\[ Y_{ij} = X_{ij} \, \mathrm{round}\!\left( \frac{SF_{ij}}{Qstep} \right) \]

\[ X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, set by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
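
The "doubles for every increment of 6 in QP" rule can be illustrated with a short sketch. The six base step sizes for QP 0 to 5 are not given on the slide; the values below are the commonly published ones for 8-bit video and should be read as an assumption, as should the hypothetical names `QSTEP_BASE` and `qstep`.

```python
# Assumed step sizes for QP = 0..5 (8 bits per sample); not from the slide.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51: doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

With these base values each single QP step raises Qstep by roughly 12%, and `qstep(qp + 6)` is exactly twice `qstep(qp)`.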

-48-

VC Algorithm: Transform & Quantization

[Figure: processing chain. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block. For luma, the DC transform applies to the Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
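
The two scan tables give, for each position of the 4 x 4 block, the index at which its coefficient is read out. A minimal sketch of that reordering is shown below; the names `ZIGZAG_INDEX`, `ALTERNATE_INDEX` and `scan` are hypothetical, and the block is assumed to be stored as a list of four rows.

```python
# Scan index of each (row, column) position, as in tables (a) and (b).
ZIGZAG_INDEX = [[0, 1, 5, 6],
                [2, 4, 7, 12],
                [3, 8, 11, 13],
                [9, 10, 14, 15]]

ALTERNATE_INDEX = [[0, 2, 8, 12],
                   [1, 5, 9, 13],
                   [3, 6, 10, 14],
                   [4, 7, 11, 15]]


def scan(block, index_table):
    """Reorder a 4x4 block into a 16-entry list following the scan table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[index_table[r][c]] = block[r][c]
    return out


block = [[9, 8, 0, 0],
         [7, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
zigzag = scan(block, ZIGZAG_INDEX)
alternate = scan(block, ALTERNATE_INDEX)
```

Both scans move the low-frequency coefficients to the front of the list, so the trailing entries of a typical quantized block are zeros.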

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M+1) bits, where

\[ M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor \]

\[ \mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M \]

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
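
The Exp-Golomb construction and the three decoding steps can be sketched as follows. Bit strings are represented as Python strings of '0' and '1' purely for readability; the function names are hypothetical and a real codec would operate on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":                  # step 1: count leading zeros
        m += 1
    start = pos + m + 1                          # skip the '1' marker
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m        # step 3: 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

The first seven codewords produced this way are 1, 010, 011, 00100, 00101, 00110 and 00111, matching the single-bit INFO field for codewords 1-2 and the two-bit INFO field for codewords 3-6.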

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of non-zero coefficients and of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 ... 16
coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0

- Step 1: encode the number of non-zero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
  - number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  - number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
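
The symbols that the five steps produce for this example can be derived with a short sketch. It only extracts the symbol values; it does not perform the table look-ups that turn them into bits. The function name `cavlc_symbols` is hypothetical, and the values 3, -2 and 2 used for c0, c1 and c2 are placeholders chosen for illustration, not values from the slide.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of a scanned coefficient list."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nz)

    # Trailing ones: up to three consecutive +/-1 values at the end of the scan.
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break

    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]
    levels = [coeffs[i] for i in reversed(nz[:total_coeff - len(trailing)])]
    total_zeros = (nz[-1] + 1 - total_coeff) if nz else 0

    # Runs of zeros preceding each non-zero coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs = []
    zeros_left = total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeff": total_coeff, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


# c0, c1, c2 replaced by placeholder values 3, -2, 2.
example = cavlc_symbols([3, -2, 2, 0, 1, 1, 0, -1] + [0] * 8)
```

For the example sequence the sketch yields 6 non-zero coefficients, 3 trailing ones with signs -, +, +, 2 zeros before the last coefficient, and runs 1, 0, 1, in agreement with steps 1 to 5.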

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1: context modeling. Choose a suitable model
- Step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing the three prediction pairs: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

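VC Algorithm: B Slice

Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference on a time axis; tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference; mvCol belongs to the co-located partition, mvL0 and mvL1 to the direct-mode partition.]

\[ mv_{L0} = \frac{t_b}{t_d}\, mv_{Col}, \qquad mv_{L1} = -\frac{t_d - t_b}{t_d}\, mv_{Col} \]

where mvCol is an MV used in the co-located MB of the subsequent picture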
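
The direct-mode scaling can be tried out with a few lines of code. This is a floating-point sketch of the two formulas only; the function name is hypothetical, and the fixed-point scaling and rounding that an actual codec would use are deliberately left out.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol.

    mv_col is an (x, y) pair; tb and td are the temporal distances
    described for the direct-mode figure (td must be non-zero).
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture midway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

In this sketch mvL1 always equals mvL0 - mvCol, so the two derived vectors lie on the same motion trajectory as mvCol.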

-60-

VC Algorithm: B Slice

Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

\[ p = w_1 r_1 + w_2 r_2 \]

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
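
A minimal sketch of the weighted combination p = w1 r1 + w2 r2 is given below, applied sample by sample to two reference blocks stored as flat lists. The function name, the rounding, and the clipping to the 8-bit range are assumptions made for illustration; the weights are passed in explicitly, so the sketch corresponds to the explicit type rather than the implicit derivation.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Weighted bi-prediction of two equally sized sample lists."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


r1 = [100, 200]
r2 = [50, 100]
average = weighted_bipred(r1, r2, 0.5, 0.5)     # ordinary bi-prediction
fading = weighted_bipred(r1, r2, 0.25, 0.75)    # leaning towards the darker r2
```

Equal weights reproduce the plain average of the two references, while unequal weights let the prediction follow a fade in which one reference is darker than the other.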

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), connected by the switching slice S(3).]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged reliably in advance (Parameter Set #3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11); VCL data is then transferred with a reference to PS #3.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
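
The two-slice-group scenario can be illustrated with a checkerboard allocation, in which horizontally and vertically adjacent macroblocks always fall into different slice groups. The sketch below is an assumed example of such a dispersed map, not the map syntax of the standard, and both function names are hypothetical.

```python
def dispersed_slice_group_map(width_mbs, height_mbs):
    """Checkerboard map: slice group 0 or 1 for every macroblock position."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def neighbors_in_other_group(mb_map, x, y):
    """Count the left/right/up/down neighbors lying in the other slice group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            if mb_map[ny][nx] != mb_map[y][x]:
                count += 1
    return count


mb_map = dispersed_slice_group_map(4, 3)
```

With this map, losing the packet for slice group 1 still leaves every lost macroblock surrounded by received macroblocks of slice group 0, which is what the concealment step relies on.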

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Columns: Sequence; bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Columns: Sequence; bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Columns: Sequence; bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Columns: Sequence; bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

[Figure legend: HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile.]

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

[Figure legend: CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile.]

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264 / MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264 / MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products

Related companies
- UBVideo: http://www.ubvideo.com
- LSI Logic: http://www.lsilogic.com
- Microsoft: http://www.microsoft.com
- Envivio: http://www.envivio.com
- Broadcom: http://www.broadcom.com
- Nagravision: http://www.nagravision.com
- Philips: http://www.philips.com
- Polycom: http://www.polycom.com
- PixelTools Corporation: http://www.pixeltools.com
- Amphion: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation: http://www.ligos.com
- LifeSize: http://www.lifesize.com
- Netvideo: http://www.netvideo.com
- Motorola: http://www.motorola.com
- Vanguard Software Solutions: http://www.vsofts.com
- STMicroelectronics: http://us.st.com
- MainConcept: http://www.mainconcept.com
- Impact Labs Inc.: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download

- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing:

- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions:

- FRExt: extensions to 4:2:2 and 4:4:4 chroma formats, 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for next-generation optical discs
- Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL:

- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Authors and software contact:
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496-2:2000: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual", ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP", IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)", MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding", Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com; www.thomson.net
[42] www.conexant.com (H.264 decoder ICs - HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications", Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard", short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards", IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions", SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity", IEEE CAS Magazine, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard", NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups (JVT-EXPERTS LIST, FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard", JVCIR Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

FRExt features:
- 4:2:2 and 4:4:4 formats
- Bit depths greater than 8
- (8x8) integer DCT
- HVS weighting matrices
- Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- Use of RGB color format (YCgCo)
- High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
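
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is a minimal illustration for 8-bit video, assuming the commonly tabulated base step sizes for QP 0 to 5 and a doubling of the step size for every increase of 6 in QP; the names `BASE_QSTEP` and `qstep` are hypothetical and are not taken from the standard or the reference software.

```python
# Base step sizes for QP = 0..5 (assumed here from the commonly published table).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Return the quantization step size for a QP in the 8-bit range 0..51.

    The step size doubles each time QP increases by 6, so QP acts as a
    logarithmic control of the step size.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for this sketch")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 4, 10, 16, 28, 51):
        print(qp, qstep(qp))
```

With this form, an increase of 1 in QP raises the step size by roughly 12%, which gives the encoder fine-grained rate control over a wide range of bit rates.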

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
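
The first three (16x16) luma modes can be sketched in a few lines. This is an illustrative sketch only: it assumes both the row of 16 pels above the MB and the column of 16 pels to its left are available, uses a rounded integer mean for DC, and omits the planar mode; the function name `predict_16x16` is hypothetical.

```python
def predict_16x16(mode, above, left):
    """Build a 16x16 luma predictor from neighboring reconstructed pels.

    above: the 16 pels in the row just above the macroblock.
    left:  the 16 pels in the column just left of the macroblock.
    Returns a list of 16 rows, each a list of 16 predicted pel values.
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("expected 16 neighboring pels on each side")
    if mode == "vertical":
        # Every row repeats the row above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded mean of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("planar and other modes are not covered by this sketch")


if __name__ == "__main__":
    above = list(range(100, 116))
    left = [50] * 16
    print(predict_16x16("vertical", above, left)[5][:4])
    print(predict_16x16("dc", above, left)[0][0])
```

An encoder would typically evaluate each available mode, compare the prediction error against the actual macroblock, and signal the chosen mode to the decoder.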

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0: (8x8)
• 4:2:2: (8x16)
• 4:4:4: (16x16)

Similar to (16x16) luma MB prediction: vertical, horizontal, DC, planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- The matrix can be transmitted in the SPS and PPS
- Separate matrices for the 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

- 4x4 Intra Y; 4x4 Intra Cb, Cr
- 4x4 Inter Y; 4x4 Inter Cb, Cr
- 8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning is based on the decreasing variances and is chosen to maximize the number of zero-valued coefficients along the scan.

[Figure: 8x8 coefficient scans - Frame (zig-zag) and Field]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
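
The Exp-Golomb construction and the three decoding steps can be sketched as below. This is a minimal illustration that represents bits as a Python string of '0' and '1' characters for readability; the function names are hypothetical, and a real codec would operate on a packed bitstream and also apply the mapping from each syntax element to code_num, which is not shown here.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Encode code_num as [M zeros][1][INFO], returned as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":               # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                       # skip the separating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m     # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    stream = "".join(exp_golomb_encode(n) for n in (0, 1, 6))
    pos, decoded = 0, []
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        decoded.append(value)
    print(stream, decoded)
```

Because the prefix of M zeros announces the length of the INFO field, codewords can be concatenated without separators and still be parsed unambiguously.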

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB - Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB - Y, Cb and Cr each 16x16]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}$

$C_r = \frac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
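
The conversion from RGB to YCbCr with weights K_R and K_B, where Cb = (B - Y) / (2(1 - K_B)) and Cr = (R - Y) / (2(1 - K_R)), can be checked numerically with a short sketch. It assumes normalized floating-point RGB in the range 0 to 1 and ignores offsets, clipping and integer rounding; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights kr and kb.

    The green weight is implied by kr + kb + kg = 1.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                     # white
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                     # pure blue
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)) # red, BT.601 weights
```

The scale factors 1 / (2(1 - K_B)) and 1 / (2(1 - K_R)) keep both chroma components within -0.5 to 0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5, whichever set of weights is used.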

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
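
The lifting-style forward and inverse conversions (Co = R - B, t = B + (Co >> 1), Cg = G - t, Y = t + (Cg >> 1), and the reverse steps) are exactly invertible in integer arithmetic. The sketch below checks this on a grid of 8-bit RGB values. Function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the shift assumed in the equations.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion: integer RGB to (Y, Cg, Co) without rounding loss."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undo each lifting step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycgco(255, 0, 0))
    # Round-trip check over a coarse grid of 8-bit RGB triples.
    values = range(0, 256, 15)
    ok = all(
        ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
        for r in values for g in values for b in values
    )
    print("lossless round trip:", ok)
```

Note that Cg and Co each need one more bit than the RGB inputs (for 8-bit input they range over -255 to 255), which is the extra bit of chroma precision this form of the transform requires.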

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16
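
The second note, the sqrt(8 x pixels per frame) bound on either picture dimension, can be illustrated with a short sketch. The frame-size budget passed in below is a hypothetical example value, not a figure quoted from the level table in these slides, and the function names are illustrative only.

```python
import math


def max_picture_dimension(pixels_per_frame: int) -> int:
    """Largest width or height allowed by the sqrt(8 * pixels/frame) rule."""
    return math.isqrt(8 * pixels_per_frame)


def picture_fits(width: int, height: int, pixels_per_frame: int) -> bool:
    """Check a picture against both the area budget and the dimension bound."""
    limit = max_picture_dimension(pixels_per_frame)
    return (
        width * height <= pixels_per_frame
        and width <= limit
        and height <= limit
    )


if __name__ == "__main__":
    budget = 2_097_152  # hypothetical frame-size budget in pixels
    print(max_picture_dimension(budget))
    print(picture_fits(1920, 1088, budget))
    print(picture_fits(8192, 256, budget))  # same area, but far too wide
```

The bound limits the aspect ratio to at most 8:1 in either direction, so a decoder sized for a given level does not need to handle extremely wide or extremely tall pictures of the same total area.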

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3

-5-

Introduction

MPEG-1 Formally ISOIEC 11172-2 (rsquo93) developed by ISOIEC JTC1 SC29

WG11 (MPEG) ndash use is fairly widespread but mostly overtaken by MPEG-2

ndash Superior quality compared to H261 when operated at higher bit rates ( 1Mbps for CIF 352x288 resolution)

ndash Provides approximately VHS quality between 1-2Mbps using SIF 352x240288 resolution

ndash Additional technical features bull Bi-directional motion prediction (B-pictures)bull Half-pel motion vector resolutionbull Slice-structured codingbull DC-only ldquoDrdquo pictures

-6-

Introduction

Predictive Coding with B Pictures

I B P B P

-7-

Introduction

MPEG-2 H262 Formally ISOIEC 13818-2 amp ITU-T H262 developed

(1994) jointly by ITU-T and ISOIEC SC29 WG11 (MPEG) ndash Now in wide use for DVD and standard amp high-definition DTV (the most commonly used video coding standard)

ndash Primary new technical featuresbull Support for interlaced-scan pictures

ndash Alsobull Various forms of scalability (SNR Spatial Temporal and hybrid)bull I-picture concealment motion vectors

ndash Essentially same as MPEG-1 for progressive-scan pictures and MPEG-1 forward compatibility is required

ndash Not especially useful below 2-3Mbps (range ~2-5Mbps SDTV broadcast 6-8Mbps DVD 18Mbps HDTV) picture skipping not easy

-8-

Introduction

H263 The Next Generation ITU-T Rec H263 (v1 1995) The next generation of video

coding performance developed by ITU-T ndash the current premier ITU-T video standard (has overtaken H261 as dominant videoconferencing codec)

ndash Superior quality to prior standards at all bit rates (except perhaps for interlaced video)

ndash Wins by a factor of two at very low ratesndash Version 2 (late 1997 early 1998) amp version 3 (2000) later

developed with a large number of new featuresndash Profiles defined early 2001ndash H263+ amp H263++ (Extensions to H263)

-9-

Introduction

MPEG-4 Visual Baseline H263 and Many Creative Extras MPEG-4 Visual (formally 14496-2 v1 early 1999)

Contains the H263 baseline design and adds essentially all prior features and many creative new extras

ndash Segmented coding of shapesndash Scalable wavelet coding of still texturesndash Mesh codingndash Face animation codingndash Coding of synthetic and semi-synthetic contentndash 10 amp 12-bit samplingndash More hellipndash v2 (early 2000) amp v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards Same design to be approved in both ITU-T VCEG and ISOIEC

MPEG In ITU-T VCEG this is a new amp separate standard

ndash ITU-T Recommendation H264ndash ITU-T Systems (H32x) is modified to support it

In ISOIEC MPEG this is a new ldquopartrdquo in the MPEG-4 suitendash Separate coded design from prior MPEG-4 visual (Part 2)ndash New part 10 called ldquoAdvanced Video Codingrdquo (AVC ndash similar to ldquoAACrdquo

MPEG-2 as separate audio codec)ndash Not backward or forward compatible with prior standardsndash MPEG-4 Systems File Format modifying to support it

H2220 | MPEG-2 Systems are also be modified to support it IETF working on RTP payload packetization

-11-

Introduction

History of H264 MPEG-4 part 10 ITU-T Q6SG16 started work on H26L (L Long Range) July 2001 H26L demonstrated at MPEG (Moving Picture Expert

s Group) call for technology December 2001 ITU-T VCEG (Video Coding Experts Group) and I

SOIEC MPEG started a joint project ndash Joint Video Team (JVT) May 2003 Final approval from ISOIEC and ITU-T The standard is named H264 by ITU-T and MPEG-4 part 10 by I

SOIEC Fidelity Range Extensions (August 2004) Amendment 1 Transport of MPEG-4 AVC on MPEG-2 TS Ammendment 3

-12-

Introduction

Purpose of H264 MPEG-4 part 10 Higher coding efficiency than previous standards MPEG-124 p

art 2 H261 H263 Simple syntax specifications Seamless integration of video coding into all current protocols More error robustness Various applications like video broadcasting video streaming vi

deo conferencing D-Cinema HDTV Network friendliness Balance between coding efficiency implementation complexity a

nd cost - based on state-of the-art in VLSI design technolgy

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction Applications of H264 MPEG-4 part 10 A Broad range of applications

for video content including but not limited to the followingVideo Streaming over the internet

CATV Cable TV on optical networks copper etc DBS Direct broadcast satellite video services DSL Digital subscriber line video services DTTB Digital terrestrial television broadcasting cable

modem DSL ISM Interactive storage media (optical disks etc) MMM Multimedia mailing MSPN Multimedia services over packet networks RTC Real-time conversational services (videoconferencing

videophone etc) RVS Remote video surveillance SSM Serial storage media (digital VTR etc) D Cinema Content contribution content distribution studio editin

g post processing

-15-

Introduction

Profiles and Levels for particular applications Profile a subset of entire bit stream of syntax different decoder design based on the Profile

ndash Four profiles Baseline Main Extended and High

Streaming Video Extended

Digital Storage MediaTelevision Broadcasting

Main

Video Conferencing Videophone

Baseline

Applications Profile

Content contribution

Content distribution

Studio editing

Post processing

High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profilesndash I slice (Intra-coded slice) the coded slice by using

prediction only from decoded samples within the same slice

ndash P slice (Predictive-coded slice) the coded slice by using inter prediction from previously-decoded reference pictures using at most one motion vector and reference index to predict the sample values of each block

ndash CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profilendash Common parts I slice P slice CAVLCndash FMO Flexible macroblock order macroblocks may not

necessarily be in the raster scan order The map assigns macroblocks to a slice group

ndash ASO Arbitrary slice order the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture

ndash RS Redundant slice This slice belongs to the redundant coded data obtained by same or different coding rate in comparison with previous coded data of same slice

-19-

Introduction

Coding parts for Main Profile

– Common parts: I slice, P slice, CAVLC

– B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most two motion vectors and reference indices to predict the sample values of each block

– Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile

– Common parts: I slice, P slice, CAVLC

– SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice

– SI slice: a switching slice, coded similarly to an I slice

– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit

– Flexible macroblock order (FMO)

– Arbitrary slice order (ASO)

– Redundant slice (RS)

– B slice

– Weighted prediction

-21-

Introduction

Profile specifications

Feature                                                         Baseline  Main  Extended  High
I & P Slices                                                       X       X       X       X
Deblocking Filter                                                  X       X       X       X
¼ Pel Motion Compensation                                          X       X       X       X
Variable Block Size (16x16 to 4x4)                                 X       X       X       X
CAVLC/UVLC                                                         X       X       X       X
Error Resilience Tools – Flexible MB Order, ASO, Red. Slices       X               X
SP/SI Slices                                                                       X
B Slice                                                                    X       X       X
Interlaced Coding                                                          X       X       X
CABAC                                                                      X               X
Data Partitioning                                                                  X

-22-

Introduction

Application requirements

Broadcast television
– Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
– H.264 Profile: Main
– MPEG-4 Profile: ASP (Advanced Simple)

Streaming video
– Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
– H.264 Profile: Extended
– MPEG-4 Profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback
– Requirements: coding efficiency, interlace, low-complexity encoder and decoder
– H.264 Profile: Main
– MPEG-4 Profile: ASP

Videoconferencing
– Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
– H.264 Profile: Baseline
– MPEG-4 Profile: SP (Simple)

Mobile video
– Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
– H.264 Profile: Baseline
– MPEG-4 Profile: SP

Studio distribution
– Requirements: lossless or near-lossless, interlace, efficient transcoding
– H.264 Profile: Main, High
– MPEG-4 Profile: Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number   Picture type & frame rate
1              QCIF, 15 fps
1.1            QCIF, 30 fps
1.2            CIF, 15 fps
1.3            CIF, 30 fps
2              CIF, 30 fps
2.1            HHR, 15 or 30 fps
2.2            SDTV, 15 fps
3              SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction

Parameter set limits for each Level

Columns:
– Level: Level number
– MB rate: Max macroblock processing rate (MB/s)
– FS: Max frame size (MBs)
– DPB: Max decoded picture buffer size (1024 bytes)
– BR: Max video bit rate (1000 bits/s or 1200 bits/s)
– CPB: Max CPB size (1000 bits or 1200 bits)
– Vertical MV: Vertical MV component range (luma frame samples)
– MinCR: Min compression ratio
– MVs: Max number of MVs per two consecutive MBs

Level   MB rate   FS      DPB       BR        CPB       Vertical MV        MinCR   MVs
1       1485      99      148.5     64        175       [-64, +63.75]      2       -
1.1     3000      396     337.5     192       500       [-128, +127.75]    2       -
1.2     6000      396     891.0     384       1000      [-128, +127.75]    2       -
1.3     11880     396     891.0     768       2000      [-128, +127.75]    2       -
2       11880     396     891.0     2000      2000      [-128, +127.75]    2       -
2.1     19800     792     1782.0    4000      4000      [-256, +255.75]    2       -
2.2     20250     1620    3037.5    4000      4000      [-256, +255.75]    2       -
3       40500     1620    3037.5    10000     10000     [-256, +255.75]    2       32
3.1     108000    3600    6750.0    14000     14000     [-512, +511.75]    4       16
3.2     216000    5120    7680.0    20000     20000     [-512, +511.75]    4       16
4       245760    8192    12288.0   20000     25000     [-512, +511.75]    4       16
4.1     245760    8192    12288.0   50000     62500     [-512, +511.75]    2       16
4.2     491520    8192    12288.0   50000     62500     [-512, +511.75]    2       16
5       589824    22080   41310.0   135000    135000    [-512, +511.75]    2       16
5.1     983040    36864   69120.0   240000    240000    [-512, +511.75]    2       16
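
As a rough illustration of how the first two limits in the table constrain a stream, the sketch below picks the lowest Level whose maximum frame size and macroblock rate admit a given picture size and frame rate. The function names are hypothetical, and the check deliberately ignores the bit rate, buffer, and motion vector limits, so it is only a partial conformance test.

```python
# (level, max macroblocks per second, max frame size in macroblocks),
# taken from the MB rate and frame size columns of the Level limits table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width, height, fps):
    """Return the lowest Level admitting this size and rate, or None.

    Hypothetical helper: only frame size and macroblock rate are checked.
    """
    mbs = macroblocks_per_frame(width, height)
    for level, max_mb_rate, max_frame_size in LEVEL_LIMITS:
        if mbs <= max_frame_size and mbs * fps <= max_mb_rate:
            return level
    return None


if __name__ == "__main__":
    for name, w, h, fps in [("QCIF", 176, 144, 15), ("CIF", 352, 288, 30),
                            ("720p", 1280, 720, 30)]:
        print(name, fps, "fps ->", lowest_level(w, h, fps))
```

For QCIF at 15 fps this gives 99 macroblocks per frame and 1485 macroblocks per second, which is exactly the Level 1 limit.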

-25-

Layered Structure

Two Layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL

– Abstracts the VCL data; hence the name Network ‘Abstraction’ Layer

– Header information about the VCL format

– Appropriate for conveyance by the transport layers or storage media

– NAL unit (NALU): defines a generic format for use in both packet-based and bit-streaming systems

VCL

– Core coding layer

– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure

Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format: Y 352 pels (360 pels) x 288 lines; Cb and Cr each 176 pels (180 pels) x 144 lines. QCIF format: Y 176 pels (180 pels) x 144 lines; Cb and Cr each 88 pels (90 pels) x 72 lines]

-28-

Video Coding Algorithm

Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Input: Video Input. Output: Bitstream Output]

-29-

Video Coding Algorithm

Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: Bitstream Input. Output: Video Output]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes, 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions for modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, …, p: the predicted samples of the current block. Samples A, B, …, M (above and left): previously reconstructed samples

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block

Mode 4: samples a and d are predicted by a = round(I/4 + M/2 + A/4) and d = round(B/4 + C/2 + D/4)

Mode 8: samples a and d are predicted by a = round(I/2 + J/2) and d = round(J/4 + K/2 + L/4)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions for mode 4 and mode 8]
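
The two example formulas can be checked numerically with a small sketch. It computes only samples a and d for modes 4 and 8, using integer arithmetic with rounding to nearest; the function names and the sample values in the usage lines are made up for illustration.

```python
def round_div(numerator, denominator):
    """Divide non-negative integers, rounding to the nearest integer."""
    return (numerator + denominator // 2) // denominator


def mode4_samples_a_d(A, B, C, D, I, M):
    """Mode 4 (diagonal down-right): a = round(I/4 + M/2 + A/4),
    d = round(B/4 + C/2 + D/4)."""
    a = round_div(I + 2 * M + A, 4)
    d = round_div(B + 2 * C + D, 4)
    return a, d


def mode8_samples_a_d(I, J, K, L):
    """Mode 8 (horizontal-up): a = round(I/2 + J/2),
    d = round(J/4 + K/2 + L/4)."""
    a = round_div(I + J, 2)
    d = round_div(J + 2 * K + L, 4)
    return a, d


if __name__ == "__main__":
    # Hypothetical reconstructed neighbour samples.
    print(mode4_samples_a_d(A=100, B=104, C=108, D=112, I=96, M=98))
    print(mode8_samples_a_d(I=96, J=92, K=88, L=84))
```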

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane mode works well in areas of smoothly varying luminance

A linear ‘plane’ function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma: always operates using full-MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy

– Prediction of variable block sizes

– Sub-pel motion compensation

– Deblocking filter

– Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size

– An MB can be partitioned into smaller block sizes

– 4 cases for the 16 x 16 MB, 4 cases for the 8 x 8 sub-MB

– Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition types cannot be mixed, i.e. an MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation

– Better compression performance than integer-pel MC

– Comes at the expense of increased complexity

– Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram with Motion Estimation and Motion Compensation emphasized; motion vector accuracy 1/4 (6-tap filter)]

MB partitions: 16x16 (partition 0), 16x8 (partitions 0, 1), 8x16 (partitions 0, 1), 8x8 (partitions 0, 1, 2, 3)

Sub-MB partitions: 8x8 (partition 0), 8x4 (partitions 0, 1), 4x8 (partitions 0, 1), 4x4 (partitions 0, 1, 2, 3)

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy

– A distinct MV can be sent for each sub-MB partition

– ME can be based on multiple pictures that lie in the past or in the future in display order

– The reference picture for ME is selected at the MB partition level

– Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: sample grid with integer-pel samples A to U (E, F, G, H, I, J lie on one row), half-pel samples aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s, and quarter-pel samples a, c, d, e, f, g, i, k, n, p, q, r]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)

a = round((G + b) / 2)
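
The two interpolation formulas translate directly into integer arithmetic. The following is a minimal sketch for one half-pel sample b and one quarter-pel sample a; the clipping to the 8-bit range and the function names are assumptions added for the example rather than something stated on the slide.

```python
def clip_8bit(value):
    """Clamp a value to the 8-bit sample range (assumed for this sketch)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32) via add-and-shift."""
    return clip_8bit((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """a = round((p + q) / 2): bilinear average of two neighbouring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [10, 12, 20, 40, 44, 46]   # hypothetical samples E, F, G, H, I, J
    b = half_pel(*row)
    a = quarter_pel(row[2], b)       # between integer sample G and half-pel b
    print("b =", b, "a =", a)
```

On a flat region (all six samples equal) the filter returns the same value, because the weights sum to 32.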

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered for luma and for chroma in a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures

Takes care of marking some stored pictures as ‘unused’ and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with Picture Buffering emphasized; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform

– Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic

– Hierarchical structure: 4 x 4 Integer DCT + Hadamard transform

Coding order of the sixteen 4 x 4 luma blocks in a macroblock:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients of the 4 x 4 blocks:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), …, (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

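VC Algorithm: Transform

4 x 4 integer DCT. X: input pixels, Y: output coefficients

\[
Y = \left( C_f \, X \, C_f^{T} \right) \otimes E_f
\]

\[
Y =
\left(
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} \\
x_{10} & x_{11} & x_{12} & x_{13} \\
x_{20} & x_{21} & x_{22} & x_{23} \\
x_{30} & x_{31} & x_{32} & x_{33}
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 1 & 1 \\
1 & 1 & -1 & -2 \\
1 & -1 & -1 & 2 \\
1 & -2 & 1 & -1
\end{bmatrix}
\right)
\otimes
\begin{bmatrix}
a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\
\frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\
a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\
\frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4}
\end{bmatrix}
\]

\[
a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}
\]

\(\otimes\) implies element-by-element multiplication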
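
The forward transform can be sketched in a few lines of plain Python. The core stage uses only the integer matrix C_f; the element-by-element scaling by E_f is applied afterwards in floating point with a = 1/2 and b = sqrt(2/5). The helper names are hypothetical, and a real codec would fold the scaling into quantization rather than apply it separately.

```python
import math

# Integer core matrix of the 4x4 forward transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
A = 0.5
B = math.sqrt(2.0 / 5.0)


def matmul(p, q):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """W = Cf X Cf^T, integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_factor(i, j):
    """Element (i, j) of the post-scaling matrix Ef."""
    if i % 2 == 0 and j % 2 == 0:
        return A * A
    if i % 2 == 1 and j % 2 == 1:
        return B * B / 4.0
    return A * B / 2.0


def post_scale(w):
    """Y = W (element-by-element) Ef."""
    return [[w[i][j] * scaling_factor(i, j) for j in range(4)]
            for i in range(4)]


if __name__ == "__main__":
    flat_block = [[5] * 4 for _ in range(4)]   # hypothetical flat input
    w = core_transform(flat_block)
    print(w[0], post_scale(w)[0][0])
```

A flat block produces a single non-zero coefficient at position (0, 0), which is the DC term.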

-43-

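4x4 Inverse IntDCT

\[
X = C_i^{T} \left( Y \otimes E_i \right) C_i
\]

\[
C_i =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & \frac{1}{2} & -\frac{1}{2} & -1 \\
1 & -1 & -1 & 1 \\
\frac{1}{2} & -1 & 1 & -\frac{1}{2}
\end{bmatrix},
\qquad
E_f =
\begin{bmatrix}
a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\
\frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\
a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\
\frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4}
\end{bmatrix},
\qquad
E_i =
\begin{bmatrix}
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2 \\
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2
\end{bmatrix}
\]

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices \(E_f\) and \(E_i\)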

-44-

VC Algorithm Transform

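Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[
Y_D = \operatorname{round}\!\left( \frac{1}{2}
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1
\end{bmatrix}
\begin{bmatrix}
x_{D00} & x_{D01} & x_{D02} & x_{D03} \\
x_{D10} & x_{D11} & x_{D12} & x_{D13} \\
x_{D20} & x_{D21} & x_{D22} & x_{D23} \\
x_{D30} & x_{D31} & x_{D32} & x_{D33}
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1
\end{bmatrix}
\right)
\]

where round( ) denotes rounding to the nearest integer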

-45-

VC Algorithm Transform

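Chroma DC coefficients: for any intra prediction mode with the (4x4) IntDCT, the Walsh-Hadamard transform is applied to the 2 x 2 DC coefficients

\[
Y_D =
\begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix}
\begin{bmatrix}
x_{DC00} & x_{DC01} \\
x_{DC10} & x_{DC11}
\end{bmatrix}
\begin{bmatrix}
1 & 1 \\
1 & -1
\end{bmatrix}
\]

[Figure: chroma block numbering for 4:2:0. Blocks 16 and 17: 2x2 DC coefficients of U and V. Blocks 18 to 21: AC coefficients of U. Blocks 22 to 25: AC coefficients of V]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased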

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with Transform & Quantization emphasized]

– 4 x 4 integer DCT transform

\[
H =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\]

– Hadamard transform of the DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm Quantization

The multiplication operation of the exact transform is combined with the multiplication of the scalar quantization

Encoder: post-scaling and quantization

\[
Y_{ij} = \operatorname{round}\!\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right)
\]

Decoder: inverse quantization and pre-scaling

\[
X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}
\]

– X: quantizer input

– Y: quantizer output

– Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample

– SF: scaling term
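
The doubling of Qstep for every increment of 6 in QP can be sketched as follows. The six base step sizes for QP 0 to 5 are the values commonly tabulated for H.264 (they are not given on the slide and are stated here as an assumption), and `quantize` is a hypothetical helper implementing the encoder formula above with rounding to nearest.

```python
import math

# Commonly tabulated step sizes for QP = 0..5 (assumption, not from the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, sf, qp):
    """Y = round(X * SF / Qstep), rounding half away from zero."""
    value = x * sf / qstep(qp)
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print("QP", qp, "-> Qstep", qstep(qp))
    print(quantize(100, 0.25, 28))
```

Successive step sizes within a group of six grow by roughly 12.5%, which is what makes the step size double every six QP values.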

-48-

VC Algorithm Transform Quantization

[Figure: transform and quantization data flow. Encoder part: Input block → Forward transform → (2x2 or 4x4 DC transform, chroma or Intra-16 luma only) → Post-scaling and quantization → Encoder output. Decoder part: Decoder input → Inverse quantization and pre-scaling (rescale) → (2x2 or 4x4 DC inverse transform, chroma or Intra-16 luma only; Intra (16x16) prediction mode only) → Inverse transform → Output block]

-49-

VC Algorithm: Entropy Coding

– All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)

– Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate

– Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles

– Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15
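
The two scan tables can be applied mechanically: each table entry gives the position in the output sequence of the coefficient at that row and column. The sketch below does this for a 4x4 block; the function name and the test block are illustrative only.

```python
# Scan position of each coefficient, indexed as ORDER[row][column].
ZIGZAG_ORDER = [[0, 1, 5, 6],
                [2, 4, 7, 12],
                [3, 8, 11, 13],
                [9, 10, 14, 15]]

ALTERNATE_ORDER = [[0, 2, 8, 12],
                   [1, 5, 9, 13],
                   [3, 6, 10, 14],
                   [4, 7, 11, 15]]


def scan_block(block, order):
    """Reorder a 4x4 block into a 16-element list following a scan table."""
    sequence = [0] * 16
    for row in range(4):
        for col in range(4):
            sequence[order[row][col]] = block[row][col]
    return sequence


if __name__ == "__main__":
    # Label each coefficient with its raster index to make the order visible.
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    print(scan_block(block, ZIGZAG_ORDER))
    print(scan_block(block, ALTERNATE_ORDER))
```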

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1

2. Read in the M-bit INFO field

3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
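
The construction and the decoding steps for Exp-Golomb codewords can be sketched as below. Codewords are handled as strings of '0' and '1' characters purely for readability; the function names are hypothetical, and the mapping from syntax element values to code_num is outside the scope of the sketch.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                      # step 1: count leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # step 2: INFO
    return (1 << m) + info - 1                 # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        word = exp_golomb_encode(n)
        print(n, word, exp_golomb_decode(word))
```

Each codeword is 2M + 1 bits long, so code_num 0 takes one bit, code_num 1 and 2 take three bits, and code_num 3 to 6 take five bits.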

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 coefficients are coded. For the other coefficients, their levels are coded

Encoding steps

– step 1: encode the total number of nonzero coefficients and of ±1 (trailing ones) values

– step 2: encode the sign of each trailing one in reverse order

– step 3: encode the levels of the remaining nonzero coefficients in reverse order

– step 4: encode the total number of zeros before the last coefficient

– step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (hence CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff:  c0  c1  c2  0  1  1  0  –1  0  0  …  0
order:   0   1   2  3  4  5  6   7  8  9  …  16

Step 1: encode the number of nonzero total coefficients and of 1 or –1 (trailing ones) from a look-up table. Number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: – (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
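
The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below computes only the symbol values, not the variable-length codewords or the context-dependent table selection; the function name is hypothetical, and the values 7, -3, 2 stand in for c0, c1, c2, which the example leaves unspecified.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbol values for one scanned coefficient list."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    # Step 1: trailing ones are the last (at most three) consecutive +/-1 values.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    # Step 2: signs of the trailing ones, in reverse scan order.
    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]
    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [coeffs[i] for i in reversed(nonzero) if i not in trailing]
    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = (nonzero[-1] + 1 - len(nonzero)) if nonzero else 0
    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(nonzero) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return {"total_coeffs": len(nonzero), "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 assumed
    print(cavlc_symbols(example))
```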

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC

Encoding steps

– step 1: context modeling. Choose a suitable model

– step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions, called bins

– step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode

– Two forward references: suited to a region just before a scene change

– Two backward references: suited to a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing prediction with 2 forward MVs, with 2 backward MVs, and with 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode

– Uses a forward/backward pair of bi-directional prediction

– The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 Reference, Current Picture and List 1 Reference; the co-located partition in the List 1 reference has motion vector mvCol; the direct-mode partition in the current picture has motion vectors mvL0 and mvL1; tb and td are temporal distances]

mvL0 = tb × mvCol / td

mvL1 = –(td – tb) × mvCol / td

where mvCol is an MV used in the co-located MB of the subsequent picture
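
A small numeric sketch of the two scaling formulas may help. It uses plain floating-point division for readability, which is an assumption of this example; an actual codec would use fixed-point integer arithmetic. The function name and the sample values are hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances, with 0 < tb <= td
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two reference pictures.
    mv_l0, mv_l1 = temporal_direct_mvs(mv_col=(8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
```

By construction mvL1 equals mvL0 minus mvCol, so the two vectors lie along the motion trajectory described by the co-located vector.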

-60-

VC Algorithm: B Slice

Weighted prediction

– Different weights of the reference signals for gradual transitions from scene to scene, e.g. ‘fade to black’ (the luma samples of the scene gradually approach zero) and ‘fade from black’

– Different weighted prediction methods for a macroblock of a P slice or a B slice

– A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 × r1 + w2 × r2

where w1 and w2 are weighting factors

– Implicit type: the factors are calculated based on the temporal distance between the pictures

– Explicit type: the factors are transmitted in the slice header
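
The weighted sum p = w1 × r1 + w2 × r2 can be illustrated on a few samples. In the sketch below the implicit weights are taken as simple linear functions of the temporal distances, which is a simplifying assumption for illustration and not the exact fixed-point derivation used by a codec; the function names and the 8-bit clipping are likewise hypothetical.

```python
def weighted_bipred(r1, r2, w1, w2):
    """p = w1*r1 + w2*r2 per sample, rounded and clipped to 0..255."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption).

    tb: distance from the first reference to the current picture
    td: distance between the two references
    The nearer reference receives the larger weight.
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    r1, r2 = [100, 200], [50, 100]            # hypothetical reference samples
    print(weighted_bipred(r1, r2, 0.5, 0.5))  # plain average
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_bipred([80], [40], w1, w2))
```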

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice

– SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice

– SI slice: a switching slice, coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1), P(1,2), P(1,3), P(1,4), P(1,5); Bitstream B with pictures P(2,1), P(2,2), P(2,3), P(2,4), P(2,5); switching slice S(3) connecting the two bitstreams]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience

– Parameter setting

– Flexible macroblock ordering (FMO)

– Redundant slice methods

– Switched slice: SP/SI (only in Extended Profile)

– Data partitioning (only in Extended Profile)

– Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter coded MBs

5. Each partition (A, B & C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

– The sequence parameter set contains all information related to a sequence of pictures

– A picture parameter set contains all information related to all the slices belonging to a single picture

– The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: H.264 Encoder and H.264 Decoder each hold parameter sets 1, 2, 3, synchronized by a reliable parameter set exchange; VCL data is transferred with reference to PS #3. Example Parameter Set #3: video format NTSC, motion resolution ¼, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data
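
The dispersed two-group allocation described above can be illustrated with a checkerboard map. This is only one possible dispersed pattern, chosen here as an assumption for the example; the function names are hypothetical. The second helper counts how many of a macroblock's four direct neighbors belong to the other slice group and would therefore remain available for concealment if the macroblock's own group were lost.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def neighbours_in_other_group(group_map, x, y):
    """Count the 4-connected neighbours of (x, y) in the other slice group."""
    rows, cols = len(group_map), len(group_map[0])
    own_group = group_map[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and group_map[ny][nx] != own_group:
            count += 1
    return count


if __name__ == "__main__":
    qcif_map = checkerboard_slice_groups(11, 9)   # QCIF is 11 x 9 macroblocks
    for row in qcif_map[:3]:
        print(row)
    print(neighbours_in_other_group(qcif_map, 5, 4))
```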

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter ‘T’ indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence; Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence; Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the ‘Tempete’ CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the ‘Paris’ CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

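Conclusions

H.264 outperforms the previous standards

Comparison of standards

Macroblock size
– MPEG-1: 16x16
– MPEG-2: 16x16 (frame mode), 16x8 (field mode)
– MPEG-4 Part 2 (Visual): 16x16
– H.264/MPEG-4 Part 10: 16x16

Block size
– MPEG-1: 8x8
– MPEG-2: 8x8
– MPEG-4 Part 2 (Visual): 16x16, 16x8, 8x8
– H.264/MPEG-4 Part 10: 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform
– MPEG-1: 8x8 DCT
– MPEG-2: 8x8 DCT
– MPEG-4 Part 2 (Visual): 8x8 DCT / Wavelet
– H.264/MPEG-4 Part 10: 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization
– MPEG-1: scalar quantization with step size of constant increment
– MPEG-2: scalar quantization with step size of constant increment
– MPEG-4 Part 2 (Visual): vector quantization
– H.264/MPEG-4 Part 10: scalar quantization with step size increasing at the rate of 12.5%

Entropy coding
– MPEG-1: VLC
– MPEG-2: VLC
– MPEG-4 Part 2 (Visual): VLC
– H.264/MPEG-4 Part 10: VLC, CAVLC, CABAC

Motion Estimation & Compensation
– MPEG-1: Yes
– MPEG-2: Yes
– MPEG-4 Part 2 (Visual): Yes
– H.264/MPEG-4 Part 10: Yes, more flexible; up to 16 MVs per MB

Playback & Random Access
– MPEG-1: Yes
– MPEG-2: Yes
– MPEG-4 Part 2 (Visual): Yes
– H.264/MPEG-4 Part 10: Yes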

-74-

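Conclusions

Comparison of standards (continued)

Pel accuracy
– MPEG-1: integer, ½-pel
– MPEG-2: integer, ½-pel
– MPEG-4 Part 2 (Visual): integer, ½-pel, ¼-pel
– H.264/MPEG-4 Part 10: integer, ½-pel, ¼-pel

Profiles
– MPEG-1: No
– MPEG-2: 5
– MPEG-4 Part 2 (Visual): 8
– H.264/MPEG-4 Part 10: 4

Reference picture
– MPEG-1: one
– MPEG-2: one
– MPEG-4 Part 2 (Visual): one
– H.264/MPEG-4 Part 10: multiple

Bidirectional prediction mode
– MPEG-1: forward/backward
– MPEG-2: forward/backward
– MPEG-4 Part 2 (Visual): forward/backward
– H.264/MPEG-4 Part 10: forward/forward, forward/backward, backward/backward

Picture types
– MPEG-1: I, P, B, D
– MPEG-2: I, P, B
– MPEG-4 Part 2 (Visual): I, P, B
– H.264/MPEG-4 Part 10: I, P, B, SP, SI

Error robustness
– MPEG-1: synchronization & concealment
– MPEG-2: data partitioning, FEC for important packet transmission
– MPEG-4 Part 2 (Visual): synchronization, data partitioning, header extension, reversible VLCs
– H.264/MPEG-4 Part 10: data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate
– MPEG-1: up to 1.5 Mbps
– MPEG-2: 2-15 Mbps
– MPEG-4 Part 2 (Visual): 64 kbps - 2 Mbps
– H.264/MPEG-4 Part 10: 64 kbps - 240 Mbps

Compatibility with previous standards
– MPEG-1: n/a
– MPEG-2: Yes
– MPEG-4 Part 2 (Visual): Yes
– H.264/MPEG-4 Part 10: No

Encoder complexity
– MPEG-1: Low
– MPEG-2: Medium
– MPEG-4 Part 2 (Visual): Medium
– H.264/MPEG-4 Part 10: High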

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products

Related companies

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft’s VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download

- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing

– MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)

– MPEG LA website: http://www.mpegla.com

– Via Licensing: http://www.vialicensing.com

FRExt

– Extensions to 4:2:2 and 4:4:4 chroma formats

– 12-bit resolution for medical imaging

– Scalable coding

– Lossless coding for digital cinema applications

– High fidelity coding for the next generation of optical discs

– Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, “SNR-scalable extension of H.264/AVC,” ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL

– Standard systems and file format support specifications

– Standardizing the reference software implementation

– Standardizing conformance bitstreams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, “ISO/IEC 13818-2: Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video,” ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, “ISO/IEC 14 496:2000-2: Information Technology-Coding of Audio-Visual Objects-Part 2: Visual,” ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, “Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication,” ITU-T, 1998.
[4] H.264: International Telecommunication Union, “Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services,” ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, “H.26L/JVT Coding Network Abstraction Layer and IP-based Transport,” IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, “Adaptive Deblocking Filter,” IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-Complexity Transform and Quantization in H.264/AVC,” IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, “Run-Length Encoding,” IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, “Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard,” IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, “Generalized B Picture and the Draft H.264/AVC Video-Compression Standard,” IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, “The SP- and SI-Frames Design for H.264/AVC,” IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, “H.264/AVC Over IP,” IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, “Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264),” MPEG2003/N6231, December 2003.
[16] M. Ghanbari, “Standard Codecs: Image Compression to Advanced Video Coding,” Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, “Performance Comparison of Video Coding Standards using Lagrangian Coder Control,” IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software, JM 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol., pp., June-Aug. 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All of these codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom, and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time, single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6, and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm (High Profile is now officially mandatory for HD DVD Video, DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode (uses prediction and entropy coding of prediction errors)
Residual color transform
Source editing such as alpha blending
High bit rates
[Use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation
    Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4
  o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
  o Weighted prediction
  o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
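
The logarithmic control in the last bullet can be illustrated with a small Python sketch. It assumes the step-size values commonly tabulated for H.264 with 8-bit video (QP from 0 to 51, step size doubling for every increase of 6 in QP); the base values and the name `qstep` are illustrative and do not come from these slides.

```python
# Step sizes for QP 0..5 as commonly tabulated for H.264 (assumed here, not
# taken from the slides); all other values follow by doubling every 6 QP.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantization step size for a quantization parameter 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Under these assumptions, raising QP by 6 doubles the step size, so a change of 1 in QP changes the step size by roughly 12%.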

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
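
As an illustration of the frame (zig-zag) scan mentioned on this slide, the following Python sketch generates a zig-zag visiting order for an N x N block by walking the anti-diagonals in alternating directions. The function names are hypothetical, and the field (alternate) scan is not covered here.

```python
def zigzag_positions(n: int = 4):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diagonal = [(row, s - row) for row in range(n) if 0 <= s - row < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals are walked bottom-left to top-right
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block (list of rows) into a list in zig-zag order."""
    return [block[row][col] for row, col in zigzag_positions(len(block))]


if __name__ == "__main__":
    # Label each coefficient with its raster index to make the order visible.
    raster = [[4 * r + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(raster))
```

Because quantization tends to zero out the high-frequency coefficients, this ordering groups the nonzero values at the start of the list and the zeros at the end, which suits the entropy coder.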

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
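
The first three modes can be sketched in a few lines of Python. This is a simplified illustration that assumes both the row above and the column to the left of the macroblock are available and that the DC value is the rounded integer mean of those 32 neighbors; planar prediction is omitted, and the function name is hypothetical.

```python
def predict_full_mb(mode, above, left):
    """Predict an n x n luma block from its reconstructed neighbors.

    above: the n samples in the row just above the block
    left:  the n samples in the column just left of the block
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        # Every row repeats the row above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every column repeats the column left of the block.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded mean of all 2n neighboring samples.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    above = list(range(16))
    left = [100] * 16
    print(predict_full_mb("dc", above, left)[0][:4])
```

The encoder would subtract the chosen prediction from the original block and transform only the residual.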

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar
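
The chroma block sizes listed on this slide follow from the subsampling factors of each format. A minimal Python sketch, with hypothetical names, assuming a 16x16 luma macroblock and sizes written as width x height:

```python
# (horizontal, vertical) chroma subsampling factors for each format.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),  # half width, half height
    "4:2:2": (2, 1),  # half width, full height
    "4:4:4": (1, 1),  # full width, full height
}


def chroma_block_size(chroma_format, mb_size=16):
    """Return (width, height) of each chroma block in one macroblock."""
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    return mb_size // sub_w, mb_size // sub_h


def samples_per_macroblock(chroma_format, mb_size=16):
    """Total luma plus Cb plus Cr samples carried by one macroblock."""
    width, height = chroma_block_size(chroma_format, mb_size)
    return mb_size * mb_size + 2 * width * height


if __name__ == "__main__":
    for fmt in CHROMA_SUBSAMPLING:
        print(fmt, chroma_block_size(fmt), samples_per_macroblock(fmt))
```

A 4:4:4 macroblock therefore carries twice as many samples as a 4:2:0 macroblock, which is one reason the FRExt profiles allow higher bit rates.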

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding (FRExt only)

Coefficient scanning follows decreasing coefficient variance, to maximize the number of consecutive zero-valued coefficients along the scan

[Figure: 8x8 scan patterns: frame (zig-zag) and field]

-109-

Examples of parameters to be encoded

Parameters                                             Description
Sequence-, picture-, and slice-layer syntax elements   Headers and parameters
Macroblock type (mb_type)                              Prediction method for each coded macroblock
Coded block pattern                                    Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter                                    Transmitted as a delta value from the previous value of QP
Reference frame index                                  Identifies reference frame(s) for inter prediction
Motion vector                                          Transmitted as a difference (mvd) from the predicted motion vector
Residual data                                          Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M
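
The construction above can be sketched in Python as follows. The codeword is returned as a string of '0' and '1' characters purely for readability; the function name and that representation are illustrative choices, not part of the standard.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for code_num in range(8):
        print(code_num, exp_golomb_encode(code_num))
```

Each codeword produced this way has length 2M + 1 bits, so small (frequent) values of code_num get the shortest codes.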

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC codes transform coefficients
CABAC codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
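
The three decoding steps can be sketched in Python as follows, reading from a string of '0' and '1' characters. The function name, the bit-string representation, and the returned position are illustrative assumptions.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos), where next_pos is the index of the first
    bit after the decoded codeword.
    """
    # Step 1: count M leading zeros, then skip the terminating 1.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m > 0 else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m


if __name__ == "__main__":
    stream = "1" + "010" + "00111"  # three codewords back to back
    pos = 0
    while pos < len(stream):
        code_num, pos = exp_golomb_decode(stream, pos)
        print(code_num)
```

Because the leading zeros announce the codeword length, consecutive codewords can be parsed from a stream without any separators.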

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, including various designs for satellite and cable television in the US.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure (FRExt only): MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y, 16x16 Cb, and 16x16 Cr.]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B, \qquad K_R + K_B + K_G = 1$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
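
A small floating-point sketch of this conversion in Python, with KR and KB as parameters so that the values on this slide (the defaults) or the BT.601 values can be used. The function name is hypothetical, the inputs are assumed to be normalized to the range 0..1, and no offsets or integer rounding are applied.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum
    # The scale factors that multiply (B - Y) and (R - Y):
    print(round(1 / (2 * (1 - 0.0722)), 4), round(1 / (2 * (1 - 0.2126)), 4))
```

The divisors 2(1 - KB) and 2(1 - KR) normalize Cb and Cr to the range -0.5..0.5; with the default weights they correspond to scale factors of about 0.5389 and 0.6350.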

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo (Cg = green chroma, Co = orange chroma)

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
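
The forward and inverse conversions on this slide can be checked with a short Python sketch. Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation described here; the function names are illustrative.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion; undoes the forward lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 0)
    coded = rgb_to_ycgco(*pixel)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly what the matching forward step added, so the round trip is exact for all integer inputs; the cost is that Cg and Co need one more bit than the RGB samples.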

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and can also decode all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size for a level, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be larger, up to 16
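
Rule 2 can be illustrated with a small Python check. The sketch assumes that the frame-size limit is expressed in macroblocks, as in the level table earlier in the deck, and that the same square-root bound applies to the picture width and height in macroblocks; the function names are hypothetical.

```python
import math

MB_SIZE = 16  # luma macroblock width and height in samples


def size_in_mbs(width_px: int, height_px: int):
    """Picture width and height in macroblocks, rounded up."""
    return math.ceil(width_px / MB_SIZE), math.ceil(height_px / MB_SIZE)


def fits_frame_size_limits(width_px: int, height_px: int, max_fs_mbs: int) -> bool:
    """Check the total frame size and the per-dimension sqrt(8 * MaxFS) bound."""
    w_mbs, h_mbs = size_in_mbs(width_px, height_px)
    max_side = math.isqrt(8 * max_fs_mbs)  # largest allowed side, in macroblocks
    return w_mbs * h_mbs <= max_fs_mbs and w_mbs <= max_side and h_mbs <= max_side


if __name__ == "__main__":
    print(fits_frame_size_limits(1920, 1080, 8192))  # typical HD picture
    print(fits_frame_size_limits(4112, 16, 8192))    # too wide despite few MBs
```

The square-root bound rules out extremely elongated pictures that would otherwise pass the total frame-size limit.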

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
  • Only the control of the filter is adjusted: do not filter 4x4 blocks
  • No change to the filter operation itself

♦ CABAC
  • 61 new contexts and corresponding initialization values
  • No change to the CABAC engine

♦ Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High Profile contains:
  Main profile
  Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
  Encoder-specified perceptual-based quantization scaling matrices
  Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
  HD film: 12%
  HD video (progressive): 12%
  HD video (interlace): 4% (only 2 test clips)
  SD video (interlace): 6%

Complexity impact:
  Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
  No increase in computational requirements
  Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-4-

Introduction: Video Coding Standards

Standard                 Main applications                                 Year
JPEG, JPEG2000           Image                                             1992-1999, 2000
JBIG                     Fax                                               1995-2000
H.261                    Video conferencing                                1990
H.262, H.262+            DTV, SDTV                                         1995, 2000
H.263, H.263++           Videophone                                        1998, 2000
MPEG-1                   Video CD                                          1992
MPEG-2                   DTV, SDTV, HDTV, DVD                              1995
MPEG-4                   Interactive video                                 2000
MPEG-7                   Multimedia content description interface          2001
MPEG-21                  Multimedia framework                              2002
H.264 / MPEG-4 part 10   Advanced Video Coding                             2003
H.264 / MPEG-4 part 10   Fidelity Range Extensions (High profile):         2004 August
                         studio editing, post processing, digital cinema

-5-

Introduction

MPEG-1

Formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC1 SC29 WG11 (MPEG); use is fairly widespread, but mostly overtaken by MPEG-2

- Superior quality compared to H.261 when operated at higher bit rates (1 Mbps for CIF 352x288 resolution)
- Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution
- Additional technical features:
  • Bi-directional motion prediction (B-pictures)
  • Half-pel motion vector resolution
  • Slice-structured coding
  • DC-only "D" pictures

-6-

Introduction

Predictive Coding with B Pictures

I B P B P

-7-

Introduction

MPEG-2 / H.262

Formally ISO/IEC 13818-2 and ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG)

- Now in wide use for DVD and standard- and high-definition DTV (the most commonly used video coding standard)
- Primary new technical feature:
  • Support for interlaced-scan pictures
- Also:
  • Various forms of scalability (SNR, spatial, temporal, and hybrid)
  • I-picture concealment motion vectors
- Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
- Not especially useful below 2-3 Mbps (range: ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping is not easy

-8-

Introduction

H.263: The Next Generation

ITU-T Rec. H.263 (v1 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)

- Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
- Wins by a factor of two at very low rates
- Version 2 (late 1997 / early 1998) and version 3 (2000) were developed later, with a large number of new features
- Profiles defined early 2001
- H.263+ and H.263++ (extensions to H.263)

-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1 early 1999)

Contains the H.263 baseline design and adds essentially all prior features and many creative new extras

- Segmented coding of shapes
- Scalable wavelet coding of still textures
- Mesh coding
- Face animation coding
- Coding of synthetic and semi-synthetic content
- 10- and 12-bit sampling
- More ...
- v2 (early 2000) and v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG

In ITU-T VCEG this is a new and separate standard
- ITU-T Recommendation H.264
- ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite
- Separate codec design from the prior MPEG-4 Visual (Part 2)
- New part 10 called "Advanced Video Coding" (AVC; similar to "AAC" in MPEG-2 as a separate audio codec)
- Not backward or forward compatible with prior standards
- MPEG-4 Systems / File Format is being modified to support it

H.222.0 | MPEG-2 Systems is also being modified to support it
IETF is working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 part 10

ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
May 2003: final approval from ISO/IEC and ITU-T
The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
Fidelity Range Extensions (August 2004): Amendment 1
Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 part 10

Higher coding efficiency than previous standards: MPEG-1/2/4 part 2, H.261, H.263
Simple syntax specifications
Seamless integration of video coding into all current protocols
More error robustness
Various applications such as video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
Network friendliness
Balance between coding efficiency, implementation complexity, and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:

Video streaming over the Internet
CATV: cable TV on optical networks, copper, etc.
DBS: direct broadcast satellite video services
DSL: digital subscriber line video services
DTTB: digital terrestrial television broadcasting, cable modem, DSL
ISM: interactive storage media (optical disks, etc.)
MMM: multimedia mailing
MSPN: multimedia services over packet networks
RTC: real-time conversational services (videoconferencing, videophone, etc.)
RVS: remote video surveillance
SSM: serial storage media (digital VTR, etc.)
D-Cinema: content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit-stream syntax; decoder designs differ by profile
- Four profiles: Baseline, Main, Extended, and High

Applications                                                                Profile
Video conferencing, videophone                                              Baseline
Digital storage media, television broadcasting                              Main
Streaming video                                                             Extended
Content contribution, content distribution, studio editing, post processing High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (flexible macroblock order): macroblocks need not be in raster-scan order; a map assigns macroblocks to slice groups
- ASO (arbitrary slice order): the macroblock address of the first macroblock of a slice may be smaller than that of the first macroblock of some preceding slice of the same coded picture
- RS (redundant slice): a slice carrying redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switching slice, coded similarly to an I slice
- Data partitioning: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Feature                                                   Baseline  Main  Extended  High
I & P slices                                                 X       X       X       X
Deblocking filter                                            X       X       X       X
1/4-pel motion compensation                                  X       X       X       X
Variable block size (16x16 to 4x4)                           X       X       X       X
CAVLC/UVLC                                                   X       X       X       X
Error resilience tools (flexible MB order, ASO, red. slices) X       -       X       -
SP/SI slices                                                 -       -       X       -
B slice                                                      -       X       X       X
Interlaced coding                                            -       X       X       X
CABAC                                                        -       X       -       X
Data partitioning                                            -       -       X       -

-22-

Introduction

Application requirements

Broadcast television
  Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
  H.264 profile: Main
  MPEG-4 profile: ASP (Advanced Simple)

Streaming video
  Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
  H.264 profile: Extended
  MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback
  Requirements: coding efficiency, interlace, low-complexity encoder and decoder
  H.264 profile: Main
  MPEG-4 profile: ASP

Videoconferencing
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
  H.264 profile: Baseline
  MPEG-4 profile: SP (Simple)

Mobile video
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
  H.264 profile: Baseline
  MPEG-4 profile: SP

Studio distribution
  Requirements: lossless or near-lossless, interlace, efficient transcoding
  H.264 profile: Main, High
  MPEG-4 profile: Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number   Picture type and frame rate
1              QCIF, 15 fps
1.1            QCIF, 30 fps
1.2            CIF, 15 fps
1.3            CIF, 30 fps
2              CIF, 30 fps
2.1            HHR, 15 or 30 fps
2.2            SDTV, 15 fps
3              SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns:
  MaxMBPS: max macroblock processing rate (MB/s)
  MaxFS: max frame size (MBs)
  MaxDPB: max decoded picture buffer size (1024 bytes)
  MaxBR: max video bit rate (1000 bits/s or 1200 bits/s)
  MaxCPB: max CPB size (1000 bits or 1200 bits)
  Vertical MV range: vertical MV component range (luma frame samples)
  MinCR: min compression ratio
  MaxMVs: max number of MVs per two consecutive MBs

Level  MaxMBPS  MaxFS  MaxDPB   MaxBR   MaxCPB  Vertical MV range  MinCR  MaxMVs
1      1485     99     148.5    64      175     [-64, +63.75]      2      -
1.1    3000     396    337.5    192     500     [-128, +127.75]    2      -
1.2    6000     396    891.0    384     1000    [-128, +127.75]    2      -
1.3    11880    396    891.0    768     2000    [-128, +127.75]    2      -
2      11880    396    891.0    2000    2000    [-128, +127.75]    2      -
2.1    19800    792    1782.0   4000    4000    [-256, +255.75]    2      -
2.2    20250    1620   3037.5   4000    4000    [-256, +255.75]    2      -
3      40500    1620   3037.5   10000   10000   [-256, +255.75]    2      32
3.1    108000   3600   6750.0   14000   14000   [-512, +511.75]    4      16
3.2    216000   5120   7680.0   20000   20000   [-512, +511.75]    4      16
4      245760   8192   12288.0  20000   25000   [-512, +511.75]    4      16
4.1    245760   8192   12288.0  50000   62500   [-512, +511.75]    2      16
4.2    491520   8192   12288.0  50000   62500   [-512, +511.75]    2      16
5      589824   22080  41310.0  135000  135000  [-512, +511.75]    2      16
5.1    983040   36864  69120.0  240000  240000  [-512, +511.75]    2      16
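
The first two columns of this table determine the highest frame rate a level allows for a given picture size: the macroblock processing rate divided by the number of macroblocks per frame. The Python sketch below copies those two columns, writes the level names with their decimal points (1.1, 1.2, ...), and applies the 172 frames/sec ceiling mentioned elsewhere in these slides; the dictionary and function names are hypothetical.

```python
import math

# level: (max macroblock processing rate in MB/s, max frame size in MBs),
# copied from the first two columns of the level table.
LEVEL_LIMITS = {
    "1": (1485, 99), "1.1": (3000, 396), "1.2": (6000, 396),
    "1.3": (11880, 396), "2": (11880, 396), "2.1": (19800, 792),
    "2.2": (20250, 1620), "3": (40500, 1620), "3.1": (108000, 3600),
    "3.2": (216000, 5120), "4": (245760, 8192), "4.1": (245760, 8192),
    "4.2": (491520, 8192), "5": (589824, 22080), "5.1": (983040, 36864),
}
MAX_FRAME_RATE = 172.0  # ceiling on the frame rate, in frames/sec


def max_frame_rate(level: str, width_px: int, height_px: int) -> float:
    """Highest frame rate the level permits for a picture of the given size."""
    mbs_per_frame = math.ceil(width_px / 16) * math.ceil(height_px / 16)
    max_mbps, max_fs = LEVEL_LIMITS[level]
    if mbs_per_frame > max_fs:
        raise ValueError("picture is too large for level " + level)
    return min(max_mbps / mbs_per_frame, MAX_FRAME_RATE)


if __name__ == "__main__":
    print(max_frame_rate("1", 176, 144))    # QCIF at level 1
    print(max_frame_rate("3", 720, 576))    # SDTV at level 3
    print(max_frame_rate("2.1", 176, 144))  # small picture, capped
```

For QCIF (99 macroblocks) at level 1 this gives 1485 / 99 = 15 frames/sec, which agrees with the picture type and frame rate listed for that level.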

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: supported picture format, 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 pels (360 pels) x 288 lines; Cb and Cr are each 176 pels (180 pels) x 144 lines. QCIF format: Y is 176 pels (180 pels) x 144 lines; Cb and Cr are each 88 pels (90 pels) x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Entropy Coding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision. Input: video; output: bitstream.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream; output: video.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame.

4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction
- vertical (0), horizontal (1), DC (2), diagonal down-left (3), diagonal down-right (4), vertical-right (5), horizontal-down (6), vertical-left (7), horizontal-up (8)

[Figure: the 4 x 4 block and its neighboring samples, with arrows showing the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8:

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction, example of 4 x 4 luma block

For mode 4, samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4).

For mode 8, samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4).

[Figure: the 4 x 4 block a to p with neighbors A to H (above), I to L (left) and M (above-left), shown once with the mode 4 prediction direction and once with the mode 8 prediction direction.]
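The prediction rules above can be sketched in a few lines. The block below implements the three simplest 4 x 4 modes (vertical, horizontal, DC) and the two sample formulas quoted for modes 4 and 8. The function names are hypothetical, and writing round() as an integer add-and-shift (for example `(I + 2*M + A + 2) >> 2`) is an assumption about how the rounding is realized; the DC rule assumes all eight neighbors are available.

```python
def predict_4x4(mode, above, left):
    """Predict a 4x4 block from neighbors above=[A,B,C,D] and left=[I,J,K,L].

    Only modes 0 (vertical), 1 (horizontal) and 2 (DC) are covered.
    Returns four rows of four predicted samples.
    """
    if mode == 0:  # vertical: each column copies the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row copies the sample to its left
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def mode4_samples_a_d(A, B, C, D, I, M):
    """Samples a and d for mode 4 (diagonal down-right)."""
    a = (I + 2 * M + A + 2) >> 2   # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2   # round(B/4 + C/2 + D/4)
    return a, d


def mode8_samples_a_d(I, J, K, L):
    """Samples a and d for mode 8 (horizontal-up)."""
    a = (I + J + 1) >> 1           # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2   # round(J/4 + K/2 + L/4)
    return a, d


if __name__ == "__main__":
    print(predict_4x4(2, [8, 8, 8, 8], [8, 8, 8, 8]))
    print(mode4_samples_a_d(A=100, B=100, C=104, D=108, I=96, M=100))
    print(mode8_samples_a_d(I=10, J=20, K=30, L=40))
```

An encoder would typically evaluate all nine modes and keep the one with the lowest prediction cost; that search is not shown here.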

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes: vertical (0), horizontal (1), DC (2), plane (3)
- Plane mode works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

8 x 8 luma (FRExt only): similar to 4 x 4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full-MB prediction: 8 x 8 for the 4:2:0 format, 8 x 16 for 4:2:2 and 16 x 16 for 4:4:4 (similar to the 16 x 16 luma block but with a different mode order).

4 prediction modes: DC (0), horizontal (1), vertical (2), plane (3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction, prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for the 16 x 16 MB, 4 cases for the 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e. a MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction, sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram highlighting Motion Estimation and Motion Compensation; motion vector accuracy 1/4 (6-tap filter). MB partitions: 16x16, 16x8, 8x16, 8x8. Sub-MB partitions: 8x8, 8x4, 4x8, 4x4.]

-37-

VC Algorithm: Inter Prediction, sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

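VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32.

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples.

[Figure: grid of integer-pel samples (upper-case letters A to U) with the half-pel and quarter-pel samples between them (lower-case letters a to s and aa to hh); E, F, G, H, I, J lie on one row, half-pel sample b lies between G and H, and quarter-pel sample a lies between G and b.]

$$b = \mathrm{round}\big((E - 5F + 20G + 20H - 5I + J)/32\big), \qquad a = \mathrm{round}\big((G + b)/2\big)$$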
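The two interpolation formulas can be tried out with a small sketch. The function names are hypothetical; the rounding is written as an integer add-and-shift and the half-pel result is clipped to the 8-bit range 0 to 255, both of which are assumptions about the implementation rather than something stated on the slide.

```python
def clip255(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap filter (1, -5, 20, 20, -5, 1)/32."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]       # E, F, G, H, I, J across a sharp edge
    b = half_pel(*row)                   # half-pel sample between G and H
    a = quarter_pel(row[2], b)           # quarter-pel sample between G and b
    print("b =", b, "a =", a)
```

On a flat area the filter returns the input value unchanged, since the six weights sum to 32.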

-39-

VC Algorithm: Inter Prediction, deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges (luma and chroma) filtered in a 16 x 16 macroblock]

-40-

VC Algorithm: Inter Prediction, management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer.

[Figure: encoder block diagram highlighting Picture Buffering: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks. The numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33]

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

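VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f X C_f^{T}\right) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication.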

-43-

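4x4 Inverse IntDCT

$$X = C_i^{T}\,(Y \otimes E_i)\,C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \qquad [Y \otimes E_i] = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.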
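A small floating-point sketch can confirm that the forward and inverse transforms above are consistent with each other. The forward matrix is the one given for the 4 x 4 integer DCT; the inverse core matrix `CI`, with rows (1, 1, 1, 1), (1, 1/2, -1/2, -1), (1, -1, -1, 1), (1/2, -1, 1, -1/2), is the form commonly given for H.264 and is an assumption here, since the slide does not spell it out. Quantization is left out, so the round trip should reproduce the input up to rounding error.

```python
import math

A = 0.5
B = math.sqrt(2.0 / 5.0)

# Forward core transform C_f, as given for the 4 x 4 integer DCT.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Inverse core transform C_i (assumed, see the lead-in above).
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]


def outer(scales):
    """Scaling matrix whose (i, j) entry is scales[i] * scales[j]."""
    return [[si * sj for sj in scales] for si in scales]


EF = outer([A, B / 2, A, B / 2])   # entries a^2, ab/2, b^2/4
EI = outer([A, B, A, B])           # entries a^2, ab, b^2


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def elementwise(p, q):
    return [[p[i][j] * q[i][j] for j in range(4)] for i in range(4)]


def forward(x):
    """Y = (C_f X C_f^T) (x) E_f"""
    return elementwise(matmul(matmul(CF, x), transpose(CF)), EF)


def inverse(y):
    """X = C_i^T (Y (x) E_i) C_i"""
    return matmul(matmul(transpose(CI), elementwise(y, EI)), CI)


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    coeffs = forward(block)
    restored = inverse(coeffs)
    print("DC coefficient:", coeffs[0][0])
    print("max round-trip error:",
          max(abs(restored[i][j] - block[i][j]) for i in range(4) for j in range(4)))
```

In an actual codec the scaling matrices are folded into the quantizer, so only the integer core transforms are computed on the samples.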

-44-

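VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) = rounding to the nearest integer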

-45-

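VC Algorithm: Transform

Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: chroma block numbering for the 4:2:0 format: 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18 to 21 (U) and 22 to 25 (V)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.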

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the Transform & Quantization block]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization
- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij}\,\mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right), \qquad X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization parameter; a total of 52 values (for 8 bits per decoded sample); doubles in size for every increment of 6 in QP
- FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
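The doubling rule for Qstep can be made concrete with a short sketch. The six base values for QP 0 to 5 (0.625, 0.6875, 0.8125, 0.875, 1, 1.125) are the ones commonly tabulated for H.264 and are an assumption here, since the slide only states that there are 52 values and that Qstep doubles for every increment of 6 in QP. The function name is hypothetical.

```python
# Qstep for QP = 0..5 (assumed from the commonly published H.264 table).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    # Same base value every 6 steps, doubled once per full group of 6.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 4, 10, 28, 51):
        print("QP", qp, "-> Qstep", qstep(qp))
```

Because six steps double the step size, one QP step changes Qstep by a factor of roughly 2^(1/6), i.e. about 12%.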

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization chain.
Encoder part: input block; forward transform; 2x2 or 4x4 DC transform (chroma or Intra-16 luma only); post-scaling and quantization; encoder output.
Decoder part: decoder input; 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only, Intra (16x16) prediction mode only); inverse quantization and pre-scaling (rescale); inverse transform; output block.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

a) Zig-zag scan      b) Alternate scan

 0  1  5  6           0  2  8 12
 2  4  7 12           1  5  9 13
 3  8 11 13           3  6 10 14
 9 10 14 15           4  7 11 15
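The two scan tables above give, for each position in the 4 x 4 block, its index in the one-dimensional coefficient sequence. A minimal sketch of applying them follows; the function name `scan` and the sample block are illustrative only.

```python
# Scan index of each (row, column) position, copied from the two tables above.
ZIGZAG = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan(block, table):
    """Reorder a 4x4 block into a list of 16 coefficients following `table`."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[table[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    # Quantized block with energy concentrated in the low frequencies.
    quantized = [[7, 3, 0, 0], [-2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(scan(quantized, ZIGZAG))     # nonzero values first, zeros at the end
    print(scan(quantized, ALTERNATE))
```

Grouping the nonzero coefficients at the start of the sequence is what makes the run-and-level coding of the following slides effective.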

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M+1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor, \qquad \mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}$$

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
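The construction [M zeroes][1][INFO] and the three decoding steps can be sketched directly. Bit strings are represented as Python strings of '0' and '1' purely for readability; the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":               # step 1: count M leading zeroes
        m += 1
    start = pos + m + 1                       # skip the separating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m     # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
    stream = "".join(exp_golomb_encode(n) for n in (3, 0, 6))
    pos, decoded = 0, []
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        decoded.append(value)
    print(decoded)
```

Each codeword has length 2M+1, so small code numbers, which are assigned to the most frequent values, get the shortest codes.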

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded; for the other coefficients, their levels are coded.

Encoding steps
- step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0  1  2  3 4 5 6  7 8 9 ... 16

Step 1: encode the number of nonzero total coefficients and of 1 or -1 (trailing ones) from a look-up table
  number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
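The five steps of the example can be reproduced with a sketch that extracts the CAVLC symbols from a scanned coefficient list. It stops at the symbol level and does not produce the actual variable-length codewords, which come from look-up tables not shown in the slides. The function name and the concrete values chosen for c0, c1, c2 (3, -2, 4) are assumptions for illustration.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (not the bit codes) from scanned coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Step 1: trailing ones are the last consecutive +/-1 values, at most three.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    remaining = nonzero[:total_coeffs - len(trailing)]
    levels = [coeffs[i] for i in reversed(remaining)]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # until every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    # c0, c1, c2 = 3, -2, 4 are hypothetical values for the example above.
    example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(example))
```

With these inputs the symbols match the slide: 6 coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.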

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; also, in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- step 1, context modeling: choose a suitable model
- step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice
- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing prediction with 2 forward MVs, with 2 backward MVs, and with 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, with temporal distances tb and td; the co-located partition has motion vector mvCol, and the direct-mode partition in the current picture has motion vectors mvL0 and mvL1]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture
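A worked instance of the two direct-mode formulas may help. The sketch below uses real-valued arithmetic, whereas an actual codec would use fixed-point scaling; it also assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance between the two references, which the figure labels suggest but the slide text does not state. The function name is hypothetical.

```python
def direct_mode_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mv_col = (x, y).

    mvL0 =  tb * mvCol / td
    mvL1 = -(td - tb) * mvCol / td
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture one third of the way between the two references.
    mv_l0, mv_l1 = direct_mode_mvs(mv_col=(9, 3), tb=1, td=3)
    print("mvL0 =", mv_l0, "mvL1 =", mv_l1)
```

Note that mvL0 - mvL1 always equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector.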

-60-

VC Algorithm: B Slice, weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
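The weighted sum p = w1 r1 + w2 r2 can be illustrated as follows. The sketch works on short lists of 8-bit samples and clips the result to 0..255. The implicit-weight rule shown (weights linear in the temporal distance, with tb the distance from the first reference to the current picture and td the distance between the two references) is a simplified assumption; the standard uses a fixed-point formulation that is not reproduced here. Function names are hypothetical.

```python
def weighted_biprediction(r1, r2, w1, w2):
    """p = w1*r1 + w2*r2 per sample, rounded and clipped to the 8-bit range."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights: the temporally closer reference weighs more."""
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    ref_before = [200, 180, 160, 140]   # brighter reference before a fade
    ref_after = [100, 90, 80, 70]       # darker reference after the fade
    w1, w2 = implicit_weights(tb=1, td=4)
    print("weights:", w1, w2)
    print("prediction:", weighted_biprediction(ref_before, ref_after, w1, w2))
```

With equal weights of 0.5 this reduces to the plain average of the two references used by conventional bidirectional prediction.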

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), connected through the switching slice S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2, 3, exchanged through a reliable parameter set exchange. Example, Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11. VCL data is transferred with a reference to PS 3.]

-65-

Error Resilience: FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
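The two-slice-group scenario described above can be visualized with a checkerboard allocation, which is one way of dispersing macroblocks through the picture. The checkerboard pattern, the function names and the 4-neighbor count are assumptions chosen for illustration; the slides do not prescribe a particular map.

```python
def dispersed_slice_groups(mb_cols, mb_rows):
    """Checkerboard map: each macroblock is assigned to slice group 0 or 1."""
    return [[(row + col) % 2 for col in range(mb_cols)] for row in range(mb_rows)]


def surviving_neighbours(group_map, row, col, lost_group):
    """Count the 4-connected neighbors of (row, col) that are not in lost_group."""
    count = 0
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        inside = 0 <= r < len(group_map) and 0 <= c < len(group_map[0])
        if inside and group_map[r][c] != lost_group:
            count += 1
    return count


if __name__ == "__main__":
    cif_map = dispersed_slice_groups(22, 18)   # CIF: 22 x 18 macroblocks
    for line in cif_map[:4]:
        print("".join(str(g) for g in line))
    # A lost interior macroblock of slice group 1 keeps all four neighbors.
    print(surviving_neighbours(cif_map, 1, 2, lost_group=1))
```

With this map, losing the packet of one slice group leaves every lost macroblock surrounded by received neighbors, which is the situation the concealment argument above relies on.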

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence   Bitrate [kbps] for QCIF   Bitrate [kbps] for CIF

           24 48 96 192              96 192 384 768

Foreman    > 1x 2x 2x T 2x > 2x T T

Paris      > 1x 2x 2x 2x 2x T 2x T

Head       > 2x 2x 2x T T

Zoom       > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence   Bitrate [kbps] for QCIF   Bitrate [kbps] for CIF

           24 48 96 192              96 192 384 768

Football   2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile     2x 1x 2x 2x > 2x 4x > 2x T

Husky      2x 2x > 1x 2x 2x 2x

Tempete    2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence   Bitrate [Mbps] for MPEG-2 HiQ   Bitrate [Mbps] for MPEG-2 TM5

           1.5 2.25 3 4 6                  1.5 2.25 3 4 6

Football   > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile     4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky      > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete    T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence                  Bitrate [Mbps] for MPEG-2 HiQ   Bitrate [Mbps] for MPEG-2 TM5

                          6 10 20                         6 10 20

720 (60p)

Crew                      1.7x 2x T 1.7x 2x T

Harbour                   T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan             1x 2x

New Mobile & Calendar     T 2x T T 2x T

1080 (25p)

River Bed                 > 1.7x > 1x T > 1.7x > 1x T

Vintage Car               1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion Estimation & Compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & Random Access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Pel accuracy: Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, Data partitioning, Header extension, Reversible VLCs | Data partitioning, Parameter setting, Flexible macroblock ordering, Redundant slice, Switched slice

Transmission rate: Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExtensions
- 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
- JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
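The logarithmic relation between the quantization parameter and the step size can be illustrated with a minimal sketch. It assumes the commonly tabulated behavior for 8-bit H.264 video: QP runs from 0 to 51 and the step size doubles for every increase of 6 in QP. The base values and the name `qstep` are illustrative assumptions, not text from the slides.

```python
# Assumed base step sizes for QP = 0..5; each further +6 in QP doubles the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantization step size for a given QP (assumed range 0..51)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for this sketch")
    # Doubling every 6 QP values gives the logarithmic control of step size.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Under these assumptions a change of 1 in QP changes the step size by roughly 12%, which is what makes the control logarithmic rather than linear.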

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools it uses, a slice can be of type I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I – Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr
4x4 Inter Y; 4x4 Inter Cb, Cr
8x8 Intra Y; 8x8 Inter Y

-108-

Two scans (FRExt only), similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning follows the order of decreasing variance, to maximize the number of consecutive zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 – 2^M
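The construction above can be sketched in a few lines. This is only an illustration of the [M zeros][1][INFO] rule for unsigned code numbers; the function name `exp_golomb_encode` and the use of a bit string (rather than packed bytes) are choices made for readability, not part of the standard.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed without floating point.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field (empty when M = 0).
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword produced this way has 2M + 1 bits, so small code numbers (the most frequent values) get the shortest codes.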

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO – 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
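The three decoding steps listed on this slide can be sketched as follows. The bit stream is modeled as a string of '0'/'1' characters purely for illustration; the names `exp_golomb_decode` and `decode_all` are hypothetical helpers, not functions from any reference decoder.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    # Step 1: count M leading zeros, then consume the terminating '1'.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (absent when M = 0).
    info = int(bits[pos:pos + m], 2) if m else 0
    pos += m
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos


def decode_all(bits: str):
    """Decode a concatenation of complete codewords into a list of code numbers."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    print(decode_all("1" + "010" + "0001000"))
```

Because the number of leading zeros announces the length of the INFO field, codewords can be concatenated without separators and still be parsed unambiguously.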

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height). 4:4:4 MB: Y, Cb and Cr each 16x16]

-119-

RGB → YCbCr

Y = KR·R + (1 – KR – KB)·G + KB·B

\[
C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}
\]

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B – Y), Cr = 0.6350 (R – Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
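A minimal numerical sketch of the conversion defined on this slide, assuming R, G and B are normalized to the range 0..1. The function name `rgb_to_ycbcr` and the default arguments are illustrative; the defaults are the KR and KB values quoted above, and the BT.601 constants can be passed instead.

```python
def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the KR/KB formulation."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))  # 1 / (2 (1 - KB)) = 0.5389 for KB = 0.0722
    cr = (r - y) / (2.0 * (1.0 - kr))  # 1 / (2 (1 - KR)) = 0.6350 for KR = 0.2126
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: Y close to 1, chroma close to 0
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # pure blue: Cb reaches its maximum of 0.5
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # BT.601 constants
```

The scale factors are chosen so that Cb and Cr stay within ±0.5 for any valid RGB input. Because the coefficients are not exactly representable in fixed-point arithmetic, a round trip through this transform generally introduces rounding error.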

-120-

Rounding error in RGB → YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[
Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad
C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad
C_o = \frac{R - B}{2}
\]

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for the input, the output and the stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R – B

t = B + (Co >> 1)

Cg = G – t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y – (Cg >> 1)

G = t + Cg

B = t – (Co >> 1)

R = B + Co
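The forward and inverse conversions on this slide can be checked for exact reversibility with a short sketch. Python's `>>` on integers is an arithmetic (flooring) shift, which matches the slide's definition; the function names are illustrative, and the inverse assigns G = t + Cg, the step needed to undo Cg = G – t.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward lifting-based conversion using integer additions and shifts only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    # Every sampled 8-bit RGB triple survives the round trip without error.
    step = 15
    ok = all(ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
             for r in range(0, 256, step)
             for g in range(0, 256, step)
             for b in range(0, 256, step))
    print("lossless round trip:", ok)
```

Note that Cg and Co can be negative and span about one more bit than the input samples, which is the extra bit of chroma precision mentioned for this transform.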

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b was added in FRExt, for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16
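Notes 1 and 2 can be made concrete with a small calculation. This sketch assumes the Level 3 limits listed in the parameter table of this deck (40 500 macroblocks/s and 1 620 macroblocks per frame) and expresses the size rule in macroblock units, which is equivalent to the pixel form because a macroblock is 16x16 pixels. The function names are illustrative.

```python
import math

MAX_FPS = 172  # upper bound on frame rate quoted in note 1


def frame_size_in_mbs(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_frame_rate(max_mb_per_sec: int, width: int, height: int) -> float:
    """Highest frame rate allowed by the macroblock processing rate of a level."""
    return min(max_mb_per_sec / frame_size_in_mbs(width, height), MAX_FPS)


def max_dimension_pixels(max_frame_mbs: int) -> int:
    """Largest width or height in pixels: 16 * floor(sqrt(8 * max frame size in MBs))."""
    return 16 * math.isqrt(8 * max_frame_mbs)


if __name__ == "__main__":
    level3_mbps, level3_fs = 40_500, 1_620
    print(max_frame_rate(level3_mbps, 720, 576))  # full SD picture
    print(max_frame_rate(level3_mbps, 176, 144))  # QCIF, capped at 172
    print(max_dimension_pixels(level3_fs))
```

A smaller picture leaves macroblock throughput unused, which is why the allowed frame rate rises until it reaches the 172 frames/sec cap.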

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
– Main profile
– Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
– Encoder-specified perceptual-based quantization scaling matrices
– Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
– HD film: 12%
– HD video (progressive): 12%
– HD video (interlace): 4% (only 2 test clips)
– SD video (interlace): 6%

Complexity impact:
– Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
– No increase in computational requirements
– Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-5-

Introduction

MPEG-1

Formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC1 SC29 WG11 (MPEG)
– Use is fairly widespread, but mostly overtaken by MPEG-2
– Superior quality compared to H.261 when operated at higher bit rates (1 Mbps for CIF 352x288 resolution)
– Provides approximately VHS quality between 1-2 Mbps using SIF 352x240/288 resolution
– Additional technical features:
• Bi-directional motion prediction (B-pictures)
• Half-pel motion vector resolution
• Slice-structured coding
• DC-only "D" pictures

-6-

Introduction

Predictive Coding with B Pictures

I B P B P

-7-

Introduction

MPEG-2 / H.262

Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG)
– Now in wide use for DVD and standard & high-definition DTV (the most commonly used video coding standard)
– Primary new technical feature:
• Support for interlaced-scan pictures
– Also:
• Various forms of scalability (SNR, spatial, temporal and hybrid)
• I-picture concealment motion vectors
– Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
– Not especially useful below 2-3 Mbps (range ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping is not easy

-8-

Introduction

H.263: The Next Generation

ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
– Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
– Wins by a factor of two at very low rates
– Version 2 (late 1997 / early 1998) & version 3 (2000) later developed with a large number of new features
– Profiles defined early 2001
– H.263+ & H.263++ (extensions to H.263)

-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1: early 1999)

Contains the H.263 baseline design and adds essentially all prior features and many creative new extras:
– Segmented coding of shapes
– Scalable wavelet coding of still textures
– Mesh coding
– Face animation coding
– Coding of synthetic and semi-synthetic content
– 10 & 12-bit sampling
– More …
– v2 (early 2000) & v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG

In ITU-T VCEG this is a new & separate standard
– ITU-T Recommendation H.264
– ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite
– Separate codec design from prior MPEG-4 Visual (Part 2)
– New part 10 called "Advanced Video Coding" (AVC – similar to "AAC" in MPEG-2 as a separate audio codec)
– Not backward or forward compatible with prior standards
– MPEG-4 Systems / File Format is being modified to support it

H.222.0 | MPEG-2 Systems are also being modified to support it

IETF is working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 part 10

ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project – the Joint Video Team (JVT)
May 2003: Final approval from ISO/IEC and ITU-T
The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
Fidelity Range Extensions (August 2004): Amendment 1
Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 part 10

Higher coding efficiency than previous standards: MPEG-1, 2, 4 part 2, H.261, H.263
Simple syntax specifications
Seamless integration of video coding into all current protocols
More error robustness
Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
Network friendliness
Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction

Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:

Video streaming over the internet
CATV: Cable TV on optical networks, copper, etc.
DBS: Direct broadcast satellite video services
DSL: Digital subscriber line video services
DTTB: Digital terrestrial television broadcasting, cable modem, DSL
ISM: Interactive storage media (optical disks, etc.)
MMM: Multimedia mailing
MSPN: Multimedia services over packet networks
RTC: Real-time conversational services (videoconferencing, videophone, etc.)
RVS: Remote video surveillance
SSM: Serial storage media (digital VTR, etc.)
D-Cinema, content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit stream syntax; decoder designs differ based on the Profile
– Four profiles: Baseline, Main, Extended and High

Applications | Profile
Video conferencing, videophone | Baseline
Digital storage media, television broadcasting | Main
Streaming video | Extended
Content contribution, content distribution, studio editing, post processing | High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
– I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
– P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
– Common parts: I slice, P slice, CAVLC
– FMO (Flexible macroblock order): macroblocks need not be in raster scan order; a map assigns macroblocks to a slice group
– ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice may be smaller than the macroblock address of the first macroblock of some preceding slice of the same coded picture
– RS (Redundant slice): a slice carrying redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
– Common parts: I slice, P slice, CAVLC
– B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
– Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
– Common parts: I slice, P slice, CAVLC
– SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
– SI slice: a switching slice, coded similarly to an I slice
– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
– Flexible macroblock order (FMO)
– Arbitrary slice order (ASO)
– Redundant slice (RS)
– B slice
– Weighted prediction

-21-

Introduction: Profile specifications

Feature | Baseline | Main | Extended | High
I & P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
1/4 Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools – Flexible MB Order, ASO, Red. Slices | X | - | X | -
SP/SI Slices | - | - | X | -
B Slice | - | X | X | X
Interlaced Coding | - | X | X | X
CABAC | - | X | - | X
Data Partitioning | - | - | X | -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF 15fps
1.1 | QCIF 30fps
1.2 | CIF 15fps
1.3 | CIF 30fps
2 | CIF 30fps
2.1 | HHR 15 or 30fps
2.2 | SDTV 15fps
3 | SDTV 720x480x30i, 720x576x25i, 10Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20Mbps (max)
4.1 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50Mbps (max)
4.2 | HDTV 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
– Abstracts the VCL data – hence the name Network 'Abstraction' Layer
– Header information about the VCL format
– Appropriate for conveyance by the transport layers or storage media
– The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
– Core coding layer
– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supporting picture format (4:2:0 chroma sampling)

[Figure: CIF format: Y 352 (360) pels x 288 lines; Cb and Cr each 176 (180) pels x 144 lines. QCIF format: Y 176 (180) pels x 144 lines; Cb and Cr each 88 (90) pels x 72 lines]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram with Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction and Intra/Inter Mode Decision]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram with Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Deblocking Filter, Video Output, Picture Buffering, Motion Compensation, Intra Prediction and Intra/Inter Mode Selection]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block

9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4x4 block with samples a-p, above neighbors A-H, left neighbors I-L and corner neighbor M, showing the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, …, p: the predicted ones for the current block; above and left samples A, B, …, M: previously reconstructed ones

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block

Samples a, d are predicted by round(I/4 + M/2 + A/4), round(B/4 + C/2 + D/4) for mode 4

Samples a, d are predicted by round(I/2 + J/2), round(J/4 + K/2 + L/4) for mode 8

[Figure: 4x4 block with samples a-p and neighbors A-H, I-L and M, showing the prediction directions of mode 4 and mode 8]
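A short sketch of 4x4 intra prediction, covering the three simplest modes (vertical, horizontal, DC) and the two sample formulas quoted in the example for modes 4 and 8. The rounding is written as integer "add half, then shift", which is an assumption about how round() is realized; function names and test values are illustrative.

```python
def predict_vertical(above):
    """Mode 0: every row repeats the four samples A..D above the block."""
    return [list(above[:4]) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: every column repeats the four samples I..L left of the block."""
    return [[left[row]] * 4 for row in range(4)]


def predict_dc(above, left):
    """Mode 2: all 16 samples take the rounded mean of A..D and I..L."""
    dc = (sum(above[:4]) + sum(left[:4]) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


def three_tap(p, q, r):
    """round(p/4 + q/2 + r/4) in integer arithmetic (assumed rounding rule)."""
    return (p + 2 * q + r + 2) >> 2


def two_tap(p, q):
    """round(p/2 + q/2) in integer arithmetic (assumed rounding rule)."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    A, B, C, D = 10, 20, 30, 40   # samples above the block
    I, J, K, L = 50, 60, 70, 80   # samples left of the block
    M = 30                        # corner sample
    print(predict_dc([A, B, C, D], [I, J, K, L])[0])
    print("mode 4: a =", three_tap(I, M, A), "d =", three_tap(B, C, D))
    print("mode 8: a =", two_tap(I, J), "d =", three_tap(J, K, L))
```

An encoder would typically build the prediction for each candidate mode, compare it with the actual block, and signal the mode that gives the smallest residual cost.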

-32-

VC Algorithm: Intra Prediction

16 x 16 luma

4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
– A MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small ones suit detailed areas

The two kinds of partition cannot be mixed, i.e. a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
Better compression performance than integer-pel MC
At the expense of increased complexity
Outperforms at high bit rates and high resolutions

[Figure: H.264 encoder block diagram, annotated: motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
A distinct MV can be sent for each sub-MB partition
ME can be based on multiple pictures that lie in the past or in the future in display order
The reference picture for ME is selected at the MB partition level
Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-position samples (upper-case letters A-U) and fractional-position samples (lower-case letters a-s and aa-hh)]

b = round((E - 5F + 20G + 20H - 5I + J)/32)

a = round((G + b)/2)
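The two interpolation formulas can be sketched directly. Rounding is implemented as "add half, then shift", and the result is clipped to the 8-bit range; the clipping and the function names are assumptions added for this illustration, since the slide only gives the rounded expressions.

```python
def clip8(value: int) -> int:
    """Clip to the 8-bit sample range 0..255 (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-sample value b between G and H using the (1,-5,20,20,-5,1)/32 filter."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """Quarter-sample value as the rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    # A flat area stays flat; a sharp edge gets an intermediate value.
    print(half_pel(100, 100, 100, 100, 100, 100))
    b = half_pel(0, 0, 0, 255, 255, 255)
    print(b, quarter_pel(0, b))
```

The filter taps sum to 32, so a constant region is reproduced exactly, while the negative taps sharpen the interpolated value near edges compared with simple bilinear averaging.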

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

[Figure: two 16x16 macroblocks showing vertical edges (luma), vertical edges (chroma), horizontal edges (luma) and horizontal edges (chroma)]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram, annotated: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier free (additions and shifts only) in 16-bit arithmetic.
Hierarchical structure: 4 x 4 integer DCT + Hadamard transform.

Coding order of the 4 x 4 luma blocks:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients:

(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform; (0,0), (0,1), (0,2), ..., (3,3) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Chroma is handled similarly; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

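VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f\, X\, C_f^{T}\right) \otimes E_f$$

$$Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$, and $\otimes$ implies element-by-element multiplication.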
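The forward transform can be sketched in plain Python. The following is an illustration only, assuming the core matrix C_f with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1) and the scaling constants a = 1/2 and b = sqrt(2/5); the helper names (`matmul`, `core_transform`, `post_scale`) are hypothetical. In a real codec the scaling by E_f is folded into quantization rather than applied in floating point.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def core_transform(X):
    """Integer part of the transform: W = Cf * X * Cf^T (exact integer arithmetic)."""
    return matmul(matmul(CF, X), transpose(CF))

def post_scale(W):
    """Element-by-element multiplication by Ef (floating point, illustration only)."""
    a, b = 0.5, (2 / 5) ** 0.5
    def ef(i, j):
        if i % 2 == 0 and j % 2 == 0:
            return a * a
        if i % 2 == 1 and j % 2 == 1:
            return b * b / 4
        return a * b / 2
    return [[W[i][j] * ef(i, j) for j in range(4)] for i in range(4)]

flat = [[1] * 4 for _ in range(4)]   # a constant block has only a DC coefficient
W = core_transform(flat)
Y = post_scale(W)
print(W[0][0], Y[0][0])
```

For the constant block all the energy ends up in the DC position, which is what makes the later Hadamard transform of the DC coefficients worthwhile.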

-43-

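4x4 Inverse IntDCT

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.

$$X = C_i^{T}\,(Y \otimes E_i)\,C_i$$

Here

$$Y \otimes E_i = [\,Y\,] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$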

-44-

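VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform.

$$Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)$$

where round( ) denotes rounding to the nearest integer.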

-45-

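VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): the 2 x 2 DC coefficients are transformed using the Walsh-Hadamard transform.

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks for U and V: 2x2 DC blocks numbered 16 and 17, AC blocks numbered 18 to 25.]

For the 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased.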

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the transform.]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.
Encoder: post-scaling and quantization.
Decoder: inverse quantization and pre-scaling.

$$Y_{ij} = X_{ij}\,\mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

$$X_{ij} = Y_{ij}\,Qstep\,SF_{ij}$$

X: quantizer input.
Y: quantizer output.
Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values (for 8 bits per decoded sample), and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample.
SF: scaling term.
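The doubling rule for Qstep can be made concrete with a short sketch. It assumes the commonly tabulated base step sizes for QP 0 to 5 (0.625, 0.6875, 0.8125, 0.875, 1, 1.125) and 8-bit video (QP 0 to 51); the `quantize` and `rescale` helpers are a simplification that ignores the scaling term SF, so they only illustrate the effect of the step size.

```python
# Assumed base step sizes for QP = 0..5; each further group of 6 doubles them.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size for QP in 0..51 (8 bits per decoded sample)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

def quantize(x, qp):
    """Plain scalar quantization with rounding (scaling term SF ignored)."""
    sign = -1 if x < 0 else 1
    return sign * int(abs(x) / qstep(qp) + 0.5)

def rescale(level, qp):
    """Inverse quantization (scaling term SF ignored)."""
    return level * qstep(qp)

for qp in (4, 10, 16, 28, 51):
    print(qp, qstep(qp))
```

Because the step size grows exponentially with QP, a change of 1 in QP alters the step by roughly 12%, which gives fine rate control over a wide range of bit rates.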

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform (Intra (16x16) prediction mode only)

[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization. Decoder part: inverse quantization and pre-scaling, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse transform, output block. The encoder output is the decoder input.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC).
Scan order to read the residual data (quantized transform coefficients): zig-zag or alternate.
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles.
Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main Profile.

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15
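The two scan tables can be applied directly: each entry gives the position in the output sequence of the coefficient at that place in the 4x4 block. A minimal Python sketch follows, with the tables copied from the slide and a hypothetical helper name `scan`.

```python
# Scan position of each coefficient in the 4x4 block (tables from the slide).
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]

def scan(block, order):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out

block = [[9, 3, 0, 0],
         [2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan(block, ZIGZAG))
print(scan(block, ALTERNATE))
```

Both scans push the zero-valued high-frequency coefficients to the end of the list, which is what the run-length style coding in CAVLC relies on.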

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}$$

-52-

Decoding

1. Read in M leading zeros followed by 1.
2. Read in the M-bit INFO field.
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero.)

CAVLC: codes transform coefficients.
CABAC: codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.
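The Exp-Golomb construction and the three decoding steps translate almost line by line into code. The sketch below works on strings of '0' and '1' characters for readability, which is an assumption made for illustration; the function names `ue_encode` and `ue_decode` are hypothetical.

```python
def ue_encode(code_num):
    """Exp-Golomb codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits

def ue_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":                  # step 1: count leading zeros
        m += 1
    start = pos + m + 1                          # skip the marker bit '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m        # step 3: 2^M + INFO - 1

for n in range(8):
    print(n, ue_encode(n))
```

Because the number of leading zeros announces the codeword length, several codewords can be concatenated and decoded one after another without separators.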

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0  1  2  3 4 5 6  7 8 9 ... 16

Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table.
  number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order.
  - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order.
  c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient.
  2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order.
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
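The five steps of the example can be reproduced with a small routine that derives the symbols to be coded (it does not perform the table look-ups that produce the actual bits). This is a sketch under stated assumptions: c0, c1, c2 are given the made-up values 3, 2, -2, and the function name `cavlc_symbols` is hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols (steps 1-5) for one scanned coefficient list."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    if not nonzero:
        return {"total_coeff": 0, "trailing_ones": 0, "signs": [],
                "levels": [], "total_zeros": 0, "runs": []}
    rev = nonzero[::-1]                       # highest-frequency coefficient first

    signs = []                                # step 2: up to three trailing +/-1
    for _, c in rev:
        if abs(c) == 1 and len(signs) < 3:
            signs.append("+" if c > 0 else "-")
        else:
            break

    levels = [c for _, c in rev[len(signs):]]  # step 3: remaining levels, reversed

    last_pos = nonzero[-1][0]
    total_zeros = last_pos + 1 - len(nonzero)  # step 4: zeros before last coefficient

    runs, zeros_left = [], total_zeros         # step 5: run of zeros before each one
    for (pos, _), (prev_pos, _) in zip(rev, rev[1:]):
        if zeros_left == 0:
            break                              # remaining runs are known to be zero
        run = pos - prev_pos - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeff": len(nonzero), "trailing_ones": len(signs),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}

# c0, c1, c2 replaced by illustrative values 3, 2, -2
example = [3, 2, -2, 0, 1, 1, 0, -1] + [0] * 8
result = cavlc_symbols(example)
print(result)
```

Run coding stops as soon as all zeros have been accounted for, which is why only three runs are produced for six nonzero coefficients.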

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding. In order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but it is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs.

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture.

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change.
Two backward references: proper for a region just after a scene change.

[Figure: previous pictures, current picture and next pictures, showing the three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction.
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td.]

$$mvL0 = \frac{tb \cdot mvCol}{td}, \qquad mvL1 = -\frac{(td - tb) \cdot mvCol}{td}$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
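The direct-mode scaling of mvCol can be checked numerically. The sketch below applies the two formulas component by component in floating point, which is a simplification for illustration (the function name `temporal_direct` and the sample numbers are assumptions; an actual decoder would use integer arithmetic).

```python
def temporal_direct(mv_col, tb, td):
    """Scale the co-located motion vector into a forward/backward pair.

    mv_col: (x, y) motion vector of the co-located partition
    tb: temporal distance from the List 0 reference to the current picture
    td: temporal distance from the List 0 reference to the List 1 reference
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1

# Current picture halfway between its two references
mv_l0, mv_l1 = temporal_direct((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Since no motion vector is transmitted for a direct-mode partition, the derived pair costs no bits; note that mvL0 minus mvL1 always equals mvCol.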

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights are applied to the reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

The weighted prediction method differs for a macroblock of a P slice and of a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors.
Implicit type: the factors are calculated based on the temporal distance between the pictures.
Explicit type: the factors are transmitted in the slice header.
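A small numeric sketch shows how p = w1 r1 + w2 r2 behaves. It assumes 8-bit samples, floating-point weights, and a simplified implicit rule in which the temporally nearer reference receives the larger weight; the names `weighted_pred` and `implicit_weights` are illustrative and the standard's integer weight derivation is not reproduced here.

```python
def weighted_pred(r1, r2, w1, w2):
    """Weighted bi-prediction p = w1*r1 + w2*r2, rounded and clipped to 8 bits."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]

def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption).

    tb: distance from the first reference to the current picture
    td: distance between the two references
    """
    w2 = tb / td
    return 1.0 - w2, w2

r1 = [100, 200]            # samples from the first reference block
r2 = [20, 40]              # samples from the second reference block (darker: a fade)
w1, w2 = implicit_weights(tb=1, td=4)
print(w1, w2, weighted_pred(r1, r2, w1, w2))
```

With equal weights of 0.5 this reduces to ordinary bidirectional averaging; unequal weights let the prediction track a fade instead of lagging behind it.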

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice.
SI slice: a switched slice coded similarly to an I slice.

[Figure: Bitstream A with pictures P(1,1) ... P(1,5), Bitstream B with pictures P(2,1) ... P(2,5), and the switching slice S(3) connecting the two bitstreams.]

Switched slices allow bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C.

2. A has the slice header and header data for each MB in the slice.

3. B has coded residual data for intra and SI slice MBs.

4. C has coded residual data for inter coded MBs.

5. Each partition A, B & C is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures.
A picture parameter set contains all information related to all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

[Figure: H.264 encoder and H.264 decoder each holding parameter sets 1, 2, 3, kept in step by a reliable parameter set exchange; VCL data is transferred with reference to parameter set 3 (video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11, ...).]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO. It randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
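The two-slice-group case described above corresponds to a checkerboard allocation, which can be sketched as follows. The map builder and the neighbor check are illustrative helpers with hypothetical names; the picture size of 4 x 3 macroblocks is an arbitrary choice.

```python
def dispersed_map(width_mbs, height_mbs):
    """Checkerboard macroblock-to-slice-group map with two slice groups."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]

def neighbours_in_other_group(sg_map, x, y):
    """4-connected neighbours of macroblock (x, y) that lie in the other slice group."""
    h, w = len(sg_map), len(sg_map[0])
    found = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and sg_map[ny][nx] != sg_map[y][x]:
            found.append((nx, ny))
    return found

sg_map = dispersed_map(4, 3)
for row in sg_map:
    print(row)
print(neighbours_in_other_group(sg_map, 1, 1))
```

If slice group 1 is lost, every missing macroblock still has all of its horizontal and vertical neighbors available from slice group 0, which gives the concealment step something to interpolate from.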

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768 (values listed in table order; empty cells omitted)

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T
Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: >2x, 2x, 2x, T, T
Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768 (values listed in table order; empty cells omitted)

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x
Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T
Husky: 2x, 2x, >1x, 2x, 2x, 2x
Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6 (values listed in table order; empty cells omitted)

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T
Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20 (values listed in table order; empty cells omitted)

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: >1.7x, >1x, T, >1.7x, >1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards (values given in the order MPEG-1 / MPEG-2 / MPEG-4 part 2 (visual) / H.264/MPEG-4 part 10)

Macroblock size
  MPEG-1: 16x16
  MPEG-2: 16x16 (frame mode), 16x8 (field mode)
  MPEG-4 part 2: 16x16
  H.264: 16x16

Block size
  MPEG-1: 8x8
  MPEG-2: 8x8
  MPEG-4 part 2: 16x16, 16x8, 8x8
  H.264: 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform
  MPEG-1: 8x8 DCT
  MPEG-2: 8x8 DCT
  MPEG-4 part 2: 8x8 DCT / Wavelet
  H.264: 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization
  MPEG-1: scalar quantization with step size of constant increment
  MPEG-2: scalar quantization with step size of constant increment
  MPEG-4 part 2: vector quantization
  H.264: scalar quantization with step size increasing at the rate of 12.5%

Entropy coding
  MPEG-1: VLC
  MPEG-2: VLC
  MPEG-4 part 2: VLC
  H.264: VLC, CAVLC, CABAC

Motion estimation & compensation
  MPEG-1: yes
  MPEG-2: yes
  MPEG-4 part 2: yes
  H.264: yes, more flexible; up to 16 MVs per MB

Playback & random access
  MPEG-1: yes
  MPEG-2: yes
  MPEG-4 part 2: yes
  H.264: yes

-74-

Conclusions

Comparison of standards (continued)

Pel accuracy
  MPEG-1: integer, 1/2-pel
  MPEG-2: integer, 1/2-pel
  MPEG-4 part 2: integer, 1/2-pel, 1/4-pel
  H.264: integer, 1/2-pel, 1/4-pel

Profiles
  MPEG-1: no
  MPEG-2: 5
  MPEG-4 part 2: 8
  H.264: 4

Reference picture
  MPEG-1: one
  MPEG-2: one
  MPEG-4 part 2: one
  H.264: multiple

Bidirectional prediction mode
  MPEG-1: forward/backward
  MPEG-2: forward/backward
  MPEG-4 part 2: forward/backward
  H.264: forward/forward, forward/backward, backward/backward

Picture types
  MPEG-1: I, P, B, D
  MPEG-2: I, P, B
  MPEG-4 part 2: I, P, B
  H.264: I, P, B, SP, SI

Error robustness
  MPEG-1: synchronization & concealment
  MPEG-2: data partitioning, FEC for important packet transmission
  MPEG-4 part 2: synchronization, data partitioning, header extension, reversible VLCs
  H.264: data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate
  MPEG-1: up to 1.5 Mbps
  MPEG-2: 2-15 Mbps
  MPEG-4 part 2: 64 kbps - 2 Mbps
  H.264: 64 kbps - 240 Mbps

Compatibility with previous standards
  MPEG-1: n/a
  MPEG-2: yes
  MPEG-4 part 2: yes
  H.264: no

Encoder complexity
  MPEG-1: low
  MPEG-2: medium
  MPEG-4 part 2: medium
  H.264: high

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions
Extension to 4:2:2 and 4:4:4 chroma formats
12 bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. , pp. , Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00 (Swiss francs))

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF)
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

In MBAFF, the choice of compression (frame or field) is selected at the level of a vertical pair of MBs.

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless Entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
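Three of the four full-MB modes are simple enough to sketch directly. The code below is an illustration under stated assumptions: it works for any square block size n (16 for a luma MB), takes the reconstructed row above and column to the left as lists, rounds the DC average, and omits the planar mode; the function name `predict_block` is hypothetical.

```python
def predict_block(mode, above, left):
    """Form an n x n intra prediction from the row above and the column to the left."""
    n = len(above)
    if mode == "vertical":      # every row repeats the pels just above the block
        return [list(above) for _ in range(n)]
    if mode == "horizontal":    # every column repeats the pels just left of the block
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":            # flat block at the rounded mean of the neighbours
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("mode not covered by this sketch")

above = [1, 2, 3, 4]            # small 4 x 4 example for readability
left = [5, 6, 7, 8]
for mode in ("vertical", "horizontal", "dc"):
    print(mode, predict_block(mode, above, left))
```

An encoder would typically try each mode, compute the prediction error against the actual block, and keep the mode with the lowest cost.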

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction (Vertical, Horizontal, DC, Planar)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

The matrix can be transmitted in the SPS and PPS.
Separate matrices for the 4x4 and 8x8 transforms.
Separate matrices for Inter and Intra.
The encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level.

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y; 4x4 Intra Cb, Cr
- 4x4 Inter Y; 4x4 Inter Cb, Cr
- 8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on the decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture-, and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
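The construction and the decoding steps can be checked with a small sketch. It works on strings of '0'/'1' characters rather than a packed bitstream, and the function names are hypothetical; it covers only the unsigned code_num mapping described on these slides.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                     # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1     # step 3: 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

Because each codeword announces its own length through the leading zeros, consecutive codewords can be concatenated and decoded one after another without separators.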

-113-

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC (in the US) has preliminarily selected the High profile. Several other environments, including various designs for satellite and cable television, may soon embrace it as well

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

High Profiles

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

[Figure: MB structure in 4:2:2 and 4:4:4 formats]

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
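The lifting steps on this slide are exactly invertible in integer arithmetic, which a short sketch can confirm. The function names are hypothetical; the sketch relies on the fact that Python's `>>` on negative integers is an arithmetic (floor) shift, which matches the shift the slide describes.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion (integer in, integer out)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


pure_red = rgb_to_ycgco(255, 0, 0)
```

Each inverse step subtracts exactly the quantity the matching forward step added, so no information is lost even though the shifts discard a bit. Note that Cg and Co need one more bit of range than the inputs.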

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

New Profiles in the H.264/AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, then the # of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels, and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-6-

Introduction

Predictive Coding with B Pictures

I B P B P

-7-

Introduction

MPEG-2 / H.262

Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG)
- Now in wide use for DVD and standard & high-definition DTV (the most commonly used video coding standard)
- Primary new technical features
  • Support for interlaced-scan pictures
- Also
  • Various forms of scalability (SNR, spatial, temporal, and hybrid)
  • I-picture concealment motion vectors
- Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
- Not especially useful below 2-3 Mbps (range: ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping is not easy

-8-

Introduction

H.263: The Next Generation

ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
- Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
- Wins by a factor of two at very low rates
- Version 2 (late 1997 / early 1998) & version 3 (2000) later developed with a large number of new features
- Profiles defined early 2001
- H.263+ & H.263++ (extensions to H.263)

-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1: early 1999) contains the H.263 baseline design and adds essentially all prior features and many creative new extras
- Segmented coding of shapes
- Scalable wavelet coding of still textures
- Mesh coding
- Face animation coding
- Coding of synthetic and semi-synthetic content
- 10 & 12-bit sampling
- More …
- v2 (early 2000) & v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG

In ITU-T VCEG this is a new & separate standard
- ITU-T Recommendation H.264
- ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite
- Separate codec design from prior MPEG-4 Visual (Part 2)
- New Part 10 called "Advanced Video Coding" (AVC; similar to "AAC" in MPEG-2 as a separate audio codec)
- Not backward or forward compatible with prior standards
- MPEG-4 Systems / File Format being modified to support it

H.222.0 | MPEG-2 Systems is also being modified to support it

IETF is working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 Part 10
- ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
- July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
- December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
- May 2003: Final approval from ISO/IEC and ITU-T
- The standard is named H.264 by ITU-T and MPEG-4 Part 10 by ISO/IEC
- Fidelity Range Extensions (August 2004): Amendment 1
- Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 Part 10
- Higher coding efficiency than previous standards: MPEG-1/2/4 Part 2, H.261, H.263
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity, and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction

Applications of H.264 / MPEG-4 Part 10: a broad range of applications for video content, including but not limited to the following
- Video streaming over the internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit stream syntax; different decoder designs are based on the Profile
- Four profiles: Baseline, Main, Extended, and High

Applications: Profile
Video conferencing, videophone: Baseline
Digital storage media, television broadcasting: Main
Streaming video: Extended
Content contribution, content distribution, studio editing, post processing: High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks may not necessarily be in raster scan order; a map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate in comparison with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switching slice, similar to the coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

(X = supported, - = not supported)

Feature                                              Baseline  Main  Extended  High
I & P Slices                                            X       X       X       X
Deblocking Filter                                       X       X       X       X
1/4 Pel Motion Compensation                             X       X       X       X
Variable Block Size (16x16 to 4x4)                      X       X       X       X
CAVLC/UVLC                                              X       X       X       X
Error Resilience Tools (FMO, ASO, Redundant Slices)     X       -       X       -
SP/SI Slices                                            -       -       X       -
B Slice                                                 -       X       X       X
Interlaced Coding                                       -       X       X       X
CABAC                                                   -       X       -       X
Data Partitioning                                       -       -       X       -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles

Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)

Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP

Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)

Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP

Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number: Picture type & frame rate

1: QCIF, 15 fps
1.1: QCIF, 30 fps
1.2: CIF, 15 fps
1.3: CIF, 30 fps
2: CIF, 30 fps
2.1: HHR, 15 or 30 fps
2.2: SDTV, 15 fps
3: SDTV, 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1: 1280x720x30p
3.2: 1280x720x60p
4: HDTV, 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1: HDTV, 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2: HDTV, 1920x1080x60i, 2Kx1Kx60p
5: SHDTV/D-Cinema, 2.5Kx2Kx30p
5.1: SHDTV/D-Cinema, 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns
(a) Max macroblock processing rate (MB/s)
(b) Max frame size (MBs)
(c) Max decoded picture buffer size (1024 bytes)
(d) Max video bit rate (1000 bits/s or 1200 bits/s)
(e) Max CPB size (1000 bits or 1200 bits)
(f) Vertical MV component range (luma frame samples)
(g) Min compression ratio
(h) Max number of MVs per two consecutive MBs

Level  (a)      (b)     (c)       (d)      (e)      (f)                (g)  (h)
1      1485     99      148.5     64       175      [-64, +63.75]      2    -
1.1    3000     396     337.5     192      500      [-128, +127.75]    2    -
1.2    6000     396     891.0     384      1000     [-128, +127.75]    2    -
1.3    11880    396     891.0     768      2000     [-128, +127.75]    2    -
2      11880    396     891.0     2000     2000     [-128, +127.75]    2    -
2.1    19800    792     1782.0    4000     4000     [-256, +255.75]    2    -
2.2    20250    1620    3037.5    4000     4000     [-256, +255.75]    2    -
3      40500    1620    3037.5    10000    10000    [-256, +255.75]    2    32
3.1    108000   3600    6750.0    14000    14000    [-512, +511.75]    4    16
3.2    216000   5120    7680.0    20000    20000    [-512, +511.75]    4    16
4      245760   8192    12288.0   20000    25000    [-512, +511.75]    4    16
4.1    245760   8192    12288.0   50000    62500    [-512, +511.75]    2    16
4.2    491520   8192    12288.0   50000    62500    [-512, +511.75]    2    16
5      589824   22080   41310.0   135000   135000   [-512, +511.75]    2    16
5.1    983040   36864   69120.0   240000   240000   [-512, +511.75]    2    16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure

Supported picture format: 4:2:0 chroma sampling

[Figure: CIF and QCIF picture formats]

CIF format: Y 352 (360) pels x 288 lines; Cb and Cr 176 (180) pels x 144 lines each

QCIF format: Y 176 (180) pels x 144 lines; Cb and Cr 88 (90) pels x 72 lines each

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Inverse Quantization & Inverse Transform, Entropy Coding, Intra Prediction, Motion Estimation, Motion Compensation, Intra/Inter Mode Decision, Deblocking Filter, Picture Buffering. Input: Video Input. Output: Bitstream Output.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: Bitstream Input. Output: Video Output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction)
(vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block with its neighboring samples and the prediction directions of modes 0, 1, 3, 4, 5, 6, 7, and 8]

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

Samples a, b, …, p: the predicted samples of the current block
Samples A, B, …, M: previously reconstructed samples above and to the left

-31-

VC Algorithm: Intra Prediction

Example of a 4 x 4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively, for mode 4

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively, for mode 8

[Figure: the 4 x 4 block and neighboring samples A to M, shown for mode 4 and for mode 8]
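The two example predictions on this slide can be written with integer arithmetic. The sketch below assumes that round() is realized by adding half of the divisor before a right shift, which is a common integer formulation; the function names are hypothetical and only samples a and d are covered.

```python
def mode4_samples(A, B, C, D, I, M):
    """Mode 4 (diagonal down-right): predicted samples a and d."""
    a = (I + 2 * M + A + 2) >> 2      # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2      # round(B/4 + C/2 + D/4)
    return a, d


def mode8_samples(I, J, K, L):
    """Mode 8 (horizontal-up): predicted samples a and d."""
    a = (I + J + 1) >> 1              # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2      # round(J/4 + K/2 + L/4)
    return a, d


m4 = mode4_samples(A=100, B=104, C=108, D=112, I=96, M=98)
m8 = mode8_samples(I=96, J=92, K=88, L=84)
```

The neighbor values used here are made up purely to exercise the arithmetic.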

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction
- (8x8): 4:2:0 format
- (8x16): 4:2:2
- (16x16): 4:4:4

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3), similar to the 16x16 luma block but with a different mode order

-34-

VC Algorithm: Inter Prediction
- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small sizes suit detailed areas

The two partition levels cannot be mixed, i.e., an MB cannot have 16x8 and 4x8 partitions

When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: encoder block diagram with the motion estimation and motion compensation blocks highlighted; motion vector accuracy 1/4 (6-tap filter)]

[Figure: partition indices. MB partitions: 16x16, 16x8, 8x16, 8x8. Sub-MB partitions: 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) and the half- and quarter-pel positions (lower-case letters) between them]

b = round((E - 5F + 20G + 20H - 5I + J)/32)

a = round((G + b)/2)
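The two interpolation formulas on this slide can be sketched as follows. The sketch assumes 8-bit samples, clips the filtered value to [0, 255], and realizes round() by adding half of the divisor before shifting; these details and the function names are assumptions made for illustration.

```python
def clip255(x):
    """Clamp a value to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, x))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H, using the 6-tap filter."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: bilinear average of two neighboring samples."""
    return (p + q + 1) >> 1


b = half_pel(10, 20, 30, 40, 50, 60)
a = quarter_pel(30, b)   # between integer sample G = 30 and half-pel b
```

On a flat region the filter returns the input value unchanged, since the taps sum to 32.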

-39-

VC Algorithm: Inter Prediction

Adaptive deblocking filter
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures (short term, long term)
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with the picture buffering blocks highlighted]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks of a luma MB for the (4x4) integer DCT (the numbers 0, 1, …, 15):

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Assignment of the indices of the DC coefficients (dark samples) of each luma 4 x 4 block, (00), (01), (02), …, (33):

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2, and 4:4:4 formats

-42-

VC Algorithm Transform

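4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$Y = (C_f X C_f^T) \otimes E_f$

$$
Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$

where $a = \frac{1}{2}$, $b = \sqrt{\frac{2}{5}}$, $d = \frac{1}{2}$, and $\otimes$ implies element-by-element multiplication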

-43-

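4x4 Inverse IntDCT

$X = C_i^T (Y \otimes E_i) C_i$

Here

$$
Y \otimes E_i = [Y] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
$$

and, for comparison, the forward scaling matrix is

$$
E_f =
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$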

-44-

VC Algorithm Transform

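Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$
Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
$$

where round( ) denotes rounding to the nearest integer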

-45-

VC Algorithm Transform

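Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
$$

[Figure: 4:2:0 chroma blocks of a macroblock. Blocks 16 (U) and 17 (V) hold the 2x2 DC coefficients; blocks 18 to 21 (U) and 22 to 25 (V) hold the AC coefficients]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased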

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with the Transform & Quantization block highlighted]

- 4 x 4 integer DCT transform

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
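The core (unscaled) part of the 4 x 4 integer transform is just two small integer matrix products. The sketch below computes C X C^T with the integer matrix whose rows are (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), and (1, -2, 2, -1); it deliberately omits the scaling step, which the slides fold into quantization. Function and variable names are hypothetical.

```python
# Integer core transform matrix (no multiplications beyond x2 are needed).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(x):
    """Return CF * x * CF^T for a 4x4 block of residual samples."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
w_flat = forward_core_transform(flat)
```

A flat block puts all of its energy into the single DC coefficient, which is why the DC terms are gathered for a second (Hadamard) stage in Intra 16x16 macroblocks.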

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

$Y_{ij} = \mathrm{round}\left( X_{ij} \dfrac{SF_{ij}}{Qstep} \right)$

$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
- FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
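The doubling rule means the whole Qstep table follows from its first six entries. The base values used below (0.625 for QP 0 up to 1.125 for QP 5) are not given on the slide; they are the commonly tabulated values for 8-bit video and are stated here as an assumption, as is the function name.

```python
# Assumed Qstep values for QP = 0..5 (8-bit video); not taken from the slide.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51: doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


table = [qstep(qp) for qp in range(52)]
```

Under these assumed base values the step size spans 0.625 at QP 0 to 224 at QP 51, so each QP increment raises the step by roughly 12%.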

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization flow]

Encoder part: input block -> forward transform -> 2x2 or 4x4 DC transform (chroma or Intra-16 luma only) -> post-scaling and quantization -> encoder output

Decoder part: decoder input -> 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only) -> inverse quantization and pre-scaling (rescale) -> inverse transform -> output block

For luma, the DC transform path is used for the Intra (16x16) prediction mode only

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
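Each scan table gives, for every position in the 4 x 4 block, the index at which that coefficient appears in the one-dimensional sequence. A minimal sketch of applying the two tables from this slide (the function name and the sample block are made up for illustration):

```python
# Scan position of each block entry, copied from the two tables on this slide.
ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan(block, order):
    """Reorder a 4x4 block of coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
zz = scan(block, ZIGZAG)
alt = scan(block, ALTERNATE)
```

The alternate scan walks down the columns sooner, which suits field-coded blocks where vertical detail is stronger.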

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff:  c0  c1  c2  0  1  1  0  -1  0  0  ...  0
order:   0   1   2  3  4  5  6   7  8  9  ...  16

Step 1: encode the number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
  number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
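The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below only computes the symbols that CAVLC would go on to code; it does not reproduce the standard's VLC tables or bit patterns. The function name `cavlc_symbols` and the sample values 7, -3, 2 standing in for c0, c1, c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbol groups (steps 1-5) for one scanned block."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]    # scan positions of nonzero coefficients
    total_coeff = len(nz)

    # Trailing ones: up to three consecutive +/-1 values at the high-frequency end.
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]       # reverse scan order

    # Levels of the remaining nonzero coefficients, in reverse scan order.
    levels = [coeffs[i] for i in reversed(nz[:total_coeff - len(trailing)])]

    # Zeros located before the last nonzero coefficient.
    total_zeros = (nz[-1] + 1 - total_coeff) if nz else 0

    # Run of zeros preceding each coefficient, in reverse order, until no zeros remain.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeff": total_coeff, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    block = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8       # c0, c1, c2 replaced by sample levels
    print(cavlc_symbols(block))
```

For the sample block this yields 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels 2, -3, 7, a total of 2 zeros, and runs 1, 0, 1, matching the worked example.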

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using the probability estimates provided by context modeling.
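The interplay of binarization, context selection, and adaptive probability estimates can be illustrated with a toy model. This is not the CABAC engine of the standard, which uses a table-driven probability state machine and a range coder. Unary binarization, choosing the context by bin position, frequency-count estimates, and the ideal cost of -log2(p) bits per bin are all simplifying assumptions, and every name below is hypothetical.

```python
import math


def unary_binarize(value):
    """Map a non-negative integer to bins: `value` ones followed by a terminating zero."""
    return [1] * value + [0]


class AdaptiveBinModel:
    """Frequency-count estimate of the bin probabilities, updated after every coded bin."""

    def __init__(self):
        self.counts = [1, 1]                   # smoothed counts of zeros and ones

    def probability(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        self.counts[bin_value] += 1


def estimated_bits(symbols, num_contexts=3):
    """Ideal arithmetic-coding cost, in bits, of a symbol sequence under adaptive models."""
    models = [AdaptiveBinModel() for _ in range(num_contexts)]
    total = 0.0
    for symbol in symbols:
        for index, bin_value in enumerate(unary_binarize(symbol)):
            model = models[min(index, num_contexts - 1)]   # context chosen by bin position
            total += -math.log2(model.probability(bin_value))
            model.update(bin_value)                        # adapt to the statistics seen so far
    return total


if __name__ == "__main__":
    skewed = [0] * 50                          # highly predictable source
    print(round(estimated_bits(skewed), 2), "bits versus", len(skewed), "bits at a fixed p = 0.5")
```

With a fixed probability of 0.5 each bin would cost one bit; as the model adapts to a skewed source the cost per bin drops well below one bit, which a variable length code with whole-bit codewords cannot achieve.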

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs.

Direct mode: derives reference picture, block size, and motion vector data from the subsequent inter picture.

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction, multiple reference pictures mode:
Two forward references: suited to a region just before a scene change.
Two backward references: suited to a region just after a scene change.

[Figure: the current picture predicted from previous and next pictures using 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode

Forward/backward pair of bi-directional prediction. The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture, and List 1 reference; the direct-mode partition in the current picture uses mvL0 and mvL1, derived from mvCol of the co-located partition, with temporal distances tb and td]

$$\mathrm{mvL0} = \frac{t_b}{t_d}\,\mathrm{mvCol}, \qquad \mathrm{mvL1} = -\frac{t_d - t_b}{t_d}\,\mathrm{mvCol}$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
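The direct-mode scaling of mvCol can be written out as a short sketch. It assumes that tb is the temporal distance from the current picture to the List 0 reference and td the distance between the two reference pictures, which is one reading of the figure labels. Exact fractions are used for clarity, whereas a real codec uses fixed-point integer arithmetic; the function name is hypothetical.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into the direct-mode pair (mvL0, mvL1).

    mvL0 = (tb / td) * mvCol
    mvL1 = -((td - tb) / td) * mvCol, which equals mvL0 - mvCol
    """
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)
    mv_l1 = tuple(m0 - c for m0, c in zip(mv_l0, mv_col))
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between its two references: the vector is split evenly.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

No motion vector is transmitted for a direct-mode partition; both vectors follow from mvCol and the picture distances.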

-60-

VC Algorithm: B Slice

Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'.

Different weighted prediction methods for a macroblock of a P slice or B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors.
Implicit type: the factors are calculated based on the temporal distance between the pictures.
Explicit type: the factors are transmitted in the slice header.
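The weighted combination p = w1 r1 + w2 r2 can be sketched for a row of samples. The rounding rule, the clipping to the sample range, the default weights, and the function name are assumptions made for illustration; the standard specifies integer weights, offsets, and shifts that are not reproduced here.

```python
import math


def weighted_bipred(r1, r2, w1=0.5, w2=0.5, bit_depth=8):
    """Combine two motion-compensated reference signals sample by sample."""
    max_val = (1 << bit_depth) - 1
    out = []
    for a, b in zip(r1, r2):
        p = math.floor(w1 * a + w2 * b + 0.5)      # round half up
        out.append(min(max_val, max(0, p)))        # clip to the valid sample range
    return out


if __name__ == "__main__":
    bright = [200, 180, 160]
    black = [0, 0, 0]
    # Fade to black: weight the black reference more heavily as the fade progresses.
    print(weighted_bipred(bright, black, w1=0.25, w2=0.75))
```

With equal weights this reduces to ordinary bidirectional averaging; unequal weights let the prediction follow a fade that a plain average would miss.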

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice.
SI slice: a switched slice coded similarly to an I slice.

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), with switching slice S(3) between the two streams]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing.

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning
Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C.

2. A has the slice header and header data for each MB in the slice.

3. B has coded residual data for intra and SI slice MBs.

4. C has coded residual data for inter coded MBs.

5. Each partition A, B & C is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures. A picture parameter set contains all information related to all the slices belonging to a single picture.

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, kept in step by a reliable parameter set exchange. Example Parameter Set 3: video format NTSC, motion resolution 1/4, Enc: CABAC, frame width 11. VCL data is transferred with a reference to PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
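The two-slice-group scenario can be sketched with a checkerboard assignment and a simple concealment rule. The checkerboard pattern, the use of one representative value per macroblock, and concealment by averaging the received 4-neighbours are illustrative assumptions; the function names are hypothetical and real concealment operates on samples and motion data.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a dispersed (checkerboard) pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def conceal_lost_group(mb_values, groups, lost_group):
    """Replace every macroblock of the lost slice group by the mean of its received neighbours.

    Assumes the picture is larger than one macroblock, so each lost macroblock
    has at least one neighbour in the other slice group.
    """
    rows, cols = len(groups), len(groups[0])
    out = [row[:] for row in mb_values]
    for y in range(rows):
        for x in range(cols):
            if groups[y][x] != lost_group:
                continue
            neighbours = [mb_values[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < rows and 0 <= nx < cols
                          and groups[ny][nx] != lost_group]
            out[y][x] = sum(neighbours) / len(neighbours)
    return out


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    picture = [[10 * x for x in range(4)] for _ in range(3)]   # horizontal luma ramp
    print(conceal_lost_group(picture, groups, lost_group=1))
```

With scan-order slices a lost packet removes a contiguous band of macroblocks; with dispersed slice groups each lost macroblock keeps received neighbours on all sides to interpolate from.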

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment | scalar quantization with step size of constant increment | vector quantization | scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy: integer, 1/2-pel | integer, 1/2-pel | integer, 1/2-pel, 1/4-pel | integer, 1/2-pel, 1/4-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: synchronization & concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats, 12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High-fidelity coding for the next-generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
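The logarithmic relation between the quantization parameter and the step size can be sketched as below. The six base step sizes are the values commonly tabulated in the H.264 literature and are an assumption here, since the slides do not list them; the function name `qstep` is hypothetical.

```python
# Step sizes for QP 0..5 as commonly tabulated for H.264 (assumed, not taken from the slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for a QP in 0..51; the step doubles for every increase of 6."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 4, 10, 28, 51):
        print(qp, qstep(qp))
```

Each unit increase of QP raises the step size by roughly 12%, so the 52 QP values cover a wide range of step sizes with uniform perceptual spacing.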

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
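Three of the four full-macroblock prediction modes are simple enough to sketch directly (planar prediction is omitted). The block is represented as a list of rows, the neighbouring pels as plain lists, and the DC value uses a rounded integer mean; the function names and the small block size in the demo are illustrative assumptions.

```python
def predict_vertical(top, size=16):
    """Every row repeats the row of pels just above the block."""
    return [list(top[:size]) for _ in range(size)]


def predict_horizontal(left, size=16):
    """Every column repeats the column of pels just left of the block."""
    return [[left[y]] * size for y in range(size)]


def predict_dc(top, left, size=16):
    """Every pel takes the rounded mean of the neighbouring top and left pels."""
    dc = (sum(top[:size]) + sum(left[:size]) + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


if __name__ == "__main__":
    top, left = [10, 20, 30, 40], [1, 2, 3, 4]     # 4x4 demo instead of 16x16 for brevity
    print(predict_vertical(top, 4)[0])
    print(predict_horizontal(left, 4)[2])
    print(predict_dc(top, left, 4)[0])
```

An encoder would typically form each candidate prediction, measure the residual against the actual block, and signal the mode that costs the least to code.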

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2(code_num + 1)).

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1.
2. Read the M-bit INFO field.
3. code_num = 2^M + INFO - 1

CAVLC codes transform coefficients. CABAC codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

[Figure (FRExt only): MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y, 16x16 Cb, and 16x16 Cr]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
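The conversion can be checked numerically with a short sketch using Y = KR R + (1 - KR - KB) G + KB B, Cb = (B - Y) / (2 (1 - KB)) and Cr = (R - Y) / (2 (1 - KR)). Samples are taken as floating-point values in the range 0 to 1 and no offset or integer scaling is applied; both are simplifying assumptions, and the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using the luma coefficients KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))      # equals about 0.5389 * (B - Y) for KB = 0.0722
    cr = (r - y) / (2.0 * (1.0 - kr))      # equals about 0.6350 * (R - Y) for KR = 0.2126
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                         # white: no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                         # pure blue: Cb at its maximum
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))     # pure red with BT.601 coefficients
```

The divisors 2(1 - KB) and 2(1 - KR) normalize both chroma components to the range -0.5 to 0.5 whichever coefficient set is chosen.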

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
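That this pair of integer conversions is exactly reversible can be verified with a few lines. The function names are hypothetical, and Python's `>>` on integers is an arithmetic shift, which matches the definition used for these equations.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based color conversion on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    samples = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (12, 200, 97), (1, 2, 3)]
    for rgb in samples:
        assert ycgco_to_rgb(*rgb_to_ycgco(*rgb)) == rgb
    print("round trip is lossless for all samples")
```

Each lifting step adds or subtracts a value that the inverse can recompute exactly, so no information is lost; the price is one extra bit of range for Cg and Co, since both can be negative.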

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI).

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process.

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures.

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye.

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
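Note 2 can be turned into a small check. The sketch below is only an illustration under stated assumptions: the helper names are invented here, a macroblock is taken as 16x16 luma samples, and the maximum frame sizes used in the comments (99 MBs for Level 1, 8 192 MBs for Level 4) are the values listed in the level limits table of this presentation.

```python
import math

MB_SIZE = 16  # luma samples per macroblock side


def max_side_in_pixels(max_frame_size_mbs):
    """Largest allowed width or height: floor(sqrt(8 x pixels per frame))."""
    pixels_per_frame = max_frame_size_mbs * MB_SIZE * MB_SIZE
    return math.isqrt(8 * pixels_per_frame)


def frame_fits(width, height, max_frame_size_mbs):
    """True if the frame respects both the MB count and the side limit."""
    mbs = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    side = max_side_in_pixels(max_frame_size_mbs)
    return mbs <= max_frame_size_mbs and width <= side and height <= side
```

With 8 192 MBs per frame the side limit is 4096 samples, so a 1920x1080 frame (8 160 MBs) fits; a CIF frame (396 MBs) does not fit a 99-MB limit, while QCIF (99 MBs) does.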

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

- The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

- The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

- Deblocking filter
  - Only control of the filter is adjusted: do not filter 4x4 blocks
  - No change to the filter operation itself
- CABAC
  - 61 new contexts and corresponding initialization values
  - No change to the CABAC engine
- Signaling
  - 8x8 transform on/off flag at PPS level
  - 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

- H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-7-

Introduction

MPEG-2 / H.262: Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG)
- Now in wide use for DVD and standard & high-definition DTV (the most commonly used video coding standard)
- Primary new technical features:
  - Support for interlaced-scan pictures
- Also:
  - Various forms of scalability (SNR, spatial, temporal and hybrid)
  - I-picture concealment motion vectors
- Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required
- Not especially useful below 2-3 Mbps (range: ~2-5 Mbps SDTV broadcast, 6-8 Mbps DVD, 18 Mbps HDTV); picture skipping not easy

-8-

Introduction

H.263: The Next Generation

ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
- Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
- Wins by a factor of two at very low rates
- Version 2 (late 1997 / early 1998) & version 3 (2000) later developed with a large number of new features
- Profiles defined early 2001
- H.263+ & H.263++ (extensions to H.263)

-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1: early 1999) contains the H.263 baseline design and adds essentially all prior features and many creative new extras:
- Segmented coding of shapes
- Scalable wavelet coding of still textures
- Mesh coding
- Face animation coding
- Coding of synthetic and semi-synthetic content
- 10 & 12-bit sampling
- More ...
- v2 (early 2000) & v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

- Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG
- In ITU-T VCEG this is a new & separate standard
  - ITU-T Recommendation H.264
  - ITU-T Systems (H.32x) is modified to support it
- In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite
  - Separate codec design from prior MPEG-4 Visual (Part 2)
  - New part 10 called "Advanced Video Coding" (AVC, similar to "AAC" in MPEG-2 as a separate audio codec)
  - Not backward or forward compatible with prior standards
  - MPEG-4 Systems / File Format being modified to support it
- H.222.0 | MPEG-2 Systems is also being modified to support it
- IETF is working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 part 10

- ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
- July 2001: H.26L demonstrated at MPEG (Moving Picture Experts Group) call for technology
- December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
- May 2003: Final approval from ISO/IEC and ITU-T
- The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
- Fidelity Range Extensions (August 2004): Amendment 1
- Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 part 10

- Higher coding efficiency than previous standards: MPEG-1/2/4 part 2, H.261, H.263
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:
- Video streaming over the internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

- Profile: a subset of the entire bit stream syntax; different decoder designs are based on the Profile
- Four profiles: Baseline, Main, Extended and High

Applications                                                                  Profile
Video conferencing, videophone                                                Baseline
Digital storage media, television broadcasting                                Main
Streaming video                                                               Extended
Content contribution, content distribution, studio editing, post processing   High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): the slice coded by using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): the slice coded by using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks may not necessarily be in the raster scan order. The map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): this slice belongs to the redundant coded data obtained at the same or a different coding rate, in comparison with previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): the slice coded by using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Feature                                                 Baseline  Main  Extended  High
I & P slices                                               X       X       X       X
Deblocking filter                                          X       X       X       X
1/4 pel motion compensation                                X       X       X       X
Variable block size (16x16 to 4x4)                         X       X       X       X
CAVLC/UVLC                                                 X       X       X       X
Error resilience tools (flexible MB order, ASO,            X               X
  redundant slices)
SP/SI slices                                                               X
B slice                                                            X       X       X
Interlaced coding                                                  X       X       X
CABAC                                                              X               X
Data partitioning                                                          X

-22-

Introduction

Application requirements

Broadcast television
- Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
- H.264 profile: Main
- MPEG-4 profile: ASP (Advanced Simple)

Streaming video
- Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
- H.264 profile: Extended
- MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback
- Requirements: coding efficiency, interlace, low-complexity encoder and decoder
- H.264 profile: Main
- MPEG-4 profile: ASP

Videoconferencing
- Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
- H.264 profile: Baseline
- MPEG-4 profile: SP (Simple)

Mobile video
- Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
- H.264 profile: Baseline
- MPEG-4 profile: SP

Studio distribution
- Requirements: lossless or near-lossless, interlace, efficient transcoding
- H.264 profile: Main, High
- MPEG-4 profile: Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number   Picture type & frame rate
1              QCIF, 15 fps
1.1            QCIF, 30 fps
1.2            CIF, 15 fps
1.3            CIF, 30 fps
2              CIF, 30 fps
2.1            HHR, 15 or 30 fps
2.2            SDTV, 15 fps
3              SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns:
- MB rate: max macroblock processing rate (MB/s)
- Frame size: max frame size (MBs)
- DPB: max decoded picture buffer size (1024 bytes)
- Bit rate: max video bit rate (1000 bits/s or 1200 bits/s)
- CPB: max CPB size (1000 bits or 1200 bits)
- Vertical MV: vertical MV component range (luma frame samples)
- Min CR: min compression ratio
- Max MVs: max number of MVs per two consecutive MBs

Level  MB rate   Frame size  DPB       Bit rate  CPB      Vertical MV        Min CR  Max MVs
1      1 485     99          148.5     64        175      [-64, +63.75]      2       -
1.1    3 000     396         337.5     192       500      [-128, +127.75]    2       -
1.2    6 000     396         891.0     384       1 000    [-128, +127.75]    2       -
1.3    11 880    396         891.0     768       2 000    [-128, +127.75]    2       -
2      11 880    396         891.0     2 000     2 000    [-128, +127.75]    2       -
2.1    19 800    792         1 782.0   4 000     4 000    [-256, +255.75]    2       -
2.2    20 250    1 620       3 037.5   4 000     4 000    [-256, +255.75]    2       -
3      40 500    1 620       3 037.5   10 000    10 000   [-256, +255.75]    2       32
3.1    108 000   3 600       6 750.0   14 000    14 000   [-512, +511.75]    4       16
3.2    216 000   5 120       7 680.0   20 000    20 000   [-512, +511.75]    4       16
4      245 760   8 192       12 288.0  20 000    25 000   [-512, +511.75]    4       16
4.1    245 760   8 192       12 288.0  50 000    62 500   [-512, +511.75]    2       16
4.2    491 520   8 192       12 288.0  50 000    62 500   [-512, +511.75]    2       16
5      589 824   22 080      41 310.0  135 000   135 000  [-512, +511.75]    2       16
5.1    983 040   36 864      69 120.0  240 000   240 000  [-512, +511.75]    2       16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data; hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format, 4:2:0 chroma sampling

[Figure: picture formats with 4:2:0 chroma sampling.
CIF format: Y is 352 (360) pels x 288 lines; Cb and Cr are each 176 (180) pels x 144 lines.
QCIF format: Y is 176 (180) pels x 144 lines; Cb and Cr are each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Input: video. Output: bitstream.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream. Output: video.]

-30-

VC Algorithm: Intra Prediction

- Exploits spatial redundancy between adjacent macroblocks in a frame
- 4x4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction): vertical (0), horizontal (1), DC (2), diagonal down-left (3), diagonal down-right (4), vertical-right (5), horizontal-down (6), vertical-left (7), horizontal-up (8)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8 drawn over the sample layout above]

- Samples a, b, ..., p: the predicted ones for the current block
- Above and left samples A, B, ..., M: previously reconstructed ones

-31-

VC Algorithm: Intra Prediction, example of a 4x4 luma block

- Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: the sample layout above shown twice, once with the mode 4 prediction direction and once with the mode 8 prediction direction]
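The two examples for samples a and d can be written as integer arithmetic. This is a sketch only: the function names are invented for the illustration, and the "+2 then shift by 2" and "+1 then shift by 1" forms are one common way to implement round(x/4) and round(x/2) for non-negative sample values, assumed here rather than taken from the slide.

```python
def mode4_a_d(A, B, C, D, I, M):
    """Mode 4 (diagonal down-right): predictions for samples a and d."""
    a = (I + 2 * M + A + 2) >> 2   # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2   # round(B/4 + C/2 + D/4)
    return a, d


def mode8_a_d(I, J, K, L):
    """Mode 8 (horizontal-up): predictions for samples a and d."""
    a = (I + J + 1) >> 1           # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2   # round(J/4 + K/2 + L/4)
    return a, d
```

For instance, with A, B, C, D = 100, 104, 108, 112, I = 96 and M = 98, mode 4 gives a = 98 and d = 108.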

-32-

VC Algorithm: Intra Prediction, 16x16 luma

- 4 prediction modes: vertical (0), horizontal (1), DC (2), plane (3)
- Plane works well in smoothly varying luminance: a linear 'plane' function is fitted to the upper (H) and left-side (V) samples
- 8x8 luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma: always operates using full MB prediction: 8x8 for 4:2:0 format, 8x16 for 4:2:2, 16x16 for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes: DC (0), horizontal (1), vertical (2), plane (3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy:
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction, prediction of variable block size

- A MB can be partitioned into smaller block sizes
- 4 cases for a 16x16 MB, 4 cases for an 8x8 sub-MB
- Large partition size: homogeneous areas; small: detailed areas

The two partition levels cannot be mixed, i.e. a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each 8x8 block can be further partitioned.

-36-

VC Algorithm: Inter Prediction, sub-pel motion compensation

- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram with motion estimation and motion compensation highlighted; motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction, sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer position pixels, 1/2 and 1/4 pixels, 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

- Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) and the half- and quarter-pel samples (lower-case letters a to s, aa to hh) between them; b is the half-pel sample between G and H, a is the quarter-pel sample between G and b]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)

a = round((G + b) / 2)
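A minimal Python sketch of the two formulas above, for 8-bit samples. The function names, the integer rounding offsets (+16 before a shift by 5, +1 before a shift by 1) and the clipping to 0..255 are assumptions of this illustration; the slide gives only the round(...) expressions.

```python
def clip255(value):
    """Clip to the 8-bit sample range (assumed bit depth for this sketch)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap filter (1,-5,20,20,-5,1)/32."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1
```

In a flat area (all six samples equal to 100) both functions return 100. Across a sharp edge the filter can overshoot the sample range, which is why the clip is included.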

-39-

VC Algorithm: Inter Prediction, adaptive deblocking filter

- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4x4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 macroblock, for luma and for chroma]

-40-

VC Algorithm: Inter Prediction, management of multiple reference pictures

- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with picture buffering highlighted: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform, multiplier-free: additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4x4 integer DCT + Hadamard transform

Coding order of the 4x4 blocks for the (4x4) integer DCT:

0   1   4   5
2   3   6   7
8   9   12  13
10  11  14  15

DC coefficients of each 4x4 block:

00  01  02  03
10  11  12  13
20  21  22  23
30  31  32  33

Assignment of the indices of DC (dark samples) to the luma 4x4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma: the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

VC Algorithm Transform

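4x4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f X C_f^T) \otimes E_f \]

\[
Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

\[ a = \tfrac{1}{2}, \quad b = \sqrt{\tfrac{2}{5}}, \quad d = \tfrac{1}{2} \]

\(\otimes\) implies element-by-element multiplication.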

-43-

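4x4 Inverse IntDCT

\[ X = C_i^T (Y \otimes E_i) C_i \]

Here

\[
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix},
\qquad
E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
\]

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices E_f and E_i.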

-44-

VC Algorithm Transform

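Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[
Y_D = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right) / 2
\]

where the result is rounded to the nearest integer.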

-45-

VC Algorithm Transform

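Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, then a Walsh-Hadamard transform of the 2x2 DC coefficients

\[
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: chroma block numbering for 4:2:0. 2x2 DC blocks: 16 (U) and 17 (V). AC blocks: 18 to 21 (U) and 22 to 25 (V).]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.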

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output) with the transform block highlighted]

- Hadamard transform of DC coefficients for 16x16 Intra luma and 8x8 chroma blocks
- 4x4 integer DCT transform:

\[
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\]
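The core of the 4x4 integer transform (the matrix product, before the element-by-element scaling that is folded into quantization) needs only integer arithmetic. The sketch below uses plain nested lists so that it has no dependencies; the function names are illustrative, and a practical implementation would replace the generic matrix products with a butterfly of additions and shifts.

```python
# Forward 4x4 integer transform matrix shown above.
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4x4 integer matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Unscaled transform H * block * H^T of a 4x4 block of samples."""
    return matmul(matmul(H, block), transpose(H))
```

A flat block (all samples 10) produces a single DC value of 160 and zeros elsewhere, since every row of H except the first sums to zero.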

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

- Encoder: post-scaling and quantization

  Y_ij = X_ij · round(SF_ij / Qstep)

- Decoder: inverse quantization and pre-scaling

  X_ij = Y_ij · Qstep · SF_ij

where
- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; it doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample precision
- SF: scaling term
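The "doubles for every increment of 6 in QP" rule means that only six base step sizes are needed. The sketch below assumes the commonly tabulated base values for QP 0 to 5 (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125); these numbers and the function name are not given on the slide and are stated here as an assumption, for 8-bit video with QP in 0..51.

```python
# Assumed base step sizes for QP = 0..5 (not listed on the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

Under these assumptions qstep(4) is 1.0, qstep(10) is 2.0 and qstep(51) is 224.0, so the 52 QP values span a ratio of more than 350 between the finest and coarsest step.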

-48-

VC Algorithm Transform Quantization

Rescale and inverse transform (DC transform: Intra (16x16) prediction mode only)

[Figure: transform and quantization flow.
Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization.
Encoder output = decoder input.
Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0  1  5  6
2  4  7  12
3  8  11 13
9  10 14 15

(b) Alternate scan

0  2  8  12
1  5  9  13
3  6  10 14
4  7  11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits.

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
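The construction and the three decoding steps fit in a few lines of Python. This sketch represents codewords as strings of '0' and '1' characters purely for readability; the function names are chosen for the example, and a real bitstream reader would work on packed bits.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of a bit string."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1               # step 3: 2^M + INFO - 1
```

The first few codewords are "1", "010", "011", "00100", which matches the rule that codewords 1 and 2 carry a one-bit INFO field and codewords 3 to 6 a two-bit field.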

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1s are coded. For the other coefficients, their levels are coded.

Encoding steps:
- Step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff:  c0  c1  c2  0   1   1   0   -1  0   0   ...  0
order:  0   1   2   3   4   5   6   7   8   9   ...  15

- Step 1: encode the number of nonzero total coefficients and of 1 or -1 (trailing ones) values from a look-up table
  - number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  - number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
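The five quantities of the example can be derived mechanically from the scanned coefficient list. The Python sketch below only extracts the symbols; it does not perform the table look-ups that turn them into bits. The function name is invented for this illustration, and the concrete values 5, -2 and 3 used for c0, c1 and c2 in the usage note are hypothetical placeholders.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of steps 1-5 from a scanned coefficient list."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    positions = [i for i, _ in nonzero]

    # Step 1: total nonzero coefficients and trailing +/-1 values (at most 3).
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break

    # Step 2: signs of the trailing ones, in reverse scan order.
    signs = ["+" if c > 0 else "-" for c in trailing]

    # Step 3: levels of the remaining coefficients, in reverse scan order.
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = positions[-1] + 1 - len(positions) if positions else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping as soon as all zeros are accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return len(nonzero), len(trailing), signs, levels, total_zeros, runs
```

For the coefficient list [5, -2, 3, 0, 1, 1, 0, -1] followed by eight zeros, the function returns 6 coefficients, 3 trailing ones with signs -, +, +, levels [3, -2, 5], 2 zeros in total, and runs [1, 0, 1], in line with the worked example.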

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; also, in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- Step 1: context modeling. Choose a suitable model
- Step 2: binarization. If a symbol is non-binary valued, it will be mapped into a sequence of binary decisions called bins
- Step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

- Generalized bidirectional prediction: supports not only the forward/backward prediction pair, but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture
- Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm B Slice Generalized Bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing the three cases: 2 forward MVs, 2 backward MVs, 1 forward MV + 1 backward MV]

-59-

VC Algorithm B Slice Direct mode

- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference; motion vectors mvCol, mvL0 and mvL1; temporal distances tb and td]

mvL0 = tb · mvCol / td

mvL1 = -(td - tb) · mvCol / td

where mvCol is a MV used in the co-located MB of the subsequent picture
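The scaling of mvCol can be illustrated with exact rational arithmetic. This is a sketch of the formulas on the slide only: the function name is invented, motion vectors are plain (x, y) tuples, and the fixed-point scaling and rounding that a standard-conforming decoder would use are deliberately left out.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol.

    tb: temporal distance from the List 0 reference to the current picture.
    td: temporal distance from the List 0 reference to the List 1 reference.
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1
```

With mvCol = (8, -4), tb = 1 and td = 2 (current picture midway between the two references) the result is mvL0 = (4, -2) and mvL1 = (-4, 2). In general mvL1 = mvL0 - mvCol, so the two vectors lie on the same motion trajectory.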

-60-

VC Algorithm B Slice Weighted prediction

- Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

  p = w1 · r1 + w2 · r2

  where w1 and w2 are weighting factors
  - Implicit type: the factors are calculated based on the temporal distance between the pictures
  - Explicit type: the factors are transmitted in the slice header
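A floating-point sketch of p = w1 · r1 + w2 · r2 is given below. Everything beyond that formula is an assumption of this illustration: the function names, the 8-bit clipping, and in particular the simple linear rule used for the implicit weights (the reference closer in time gets the larger weight). The standard works with integer weights, offsets and shifts, which are not modelled here.

```python
def clip_sample(value, bit_depth=8):
    """Clip to the valid sample range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def weighted_bipred(r1, r2, w1, w2):
    """Sample-wise p = w1*r1 + w2*r2 with rounding (non-negative weights)."""
    return [clip_sample(int(w1 * a + w2 * b + 0.5)) for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed implicit rule: weights follow the temporal distances.

    tb: distance from reference r1 to the current picture.
    td: distance from reference r1 to reference r2.
    """
    w2 = tb / td
    return 1.0 - w2, w2
```

For example, with the current picture one quarter of the way from r1 to r2, implicit_weights(1, 4) gives (0.75, 0.25), and blending the samples [100, 200] and [50, 100] with those weights gives [88, 175].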

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice

[Figure: bitstream A with pictures P(1,1) to P(1,5), bitstream B with pictures P(2,1) to P(2,5), and the switching slice S(3) connecting the two bitstreams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MBs
5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data.
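The two-slice-group example above can be checked with a short sketch. It assumes a checkerboard allocation as one possible "dispersed" pattern and a QCIF picture of 11 x 9 macroblocks; the function names are hypothetical and not part of any codec API.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def neighbours_in_other_group(groups, x, y):
    """Count the 4-connected neighbours of (x, y) in the other slice group."""
    height, width = len(groups), len(groups[0])
    own = groups[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != own:
            count += 1
    return count


# QCIF (176x144 luma samples) is 11 x 9 macroblocks.
groups = checkerboard_slice_groups(11, 9)

# Suppose the packet carrying slice group 1 is lost.
lost = [(x, y) for y in range(9) for x in range(11) if groups[y][x] == 1]
fewest = min(neighbours_in_other_group(groups, x, y) for x, y in lost)
print(f"every lost macroblock keeps at least {fewest} received neighbours")
```

With this pattern every lost macroblock still has received neighbours on most sides, which is what the concealment step relies on.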

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
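The decoder behaviour described above amounts to a simple selection rule. The sketch below assumes a lost slice is represented by `None`; the `CodedSlice` type and the QP values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodedSlice:
    """Hypothetical stand-in for a received slice (illustration only)."""
    qp: int            # quantization parameter used by the encoder
    redundant: bool    # True for a redundant representation


def select_slice(primary: Optional[CodedSlice],
                 redundant: Optional[CodedSlice]) -> Optional[CodedSlice]:
    """Pick the representation to reconstruct; None means both were lost."""
    if primary is not None:
        return primary      # the redundant copy is simply discarded
    return redundant        # coarser fallback, or None for concealment


primary = CodedSlice(qp=24, redundant=False)    # good quality
fallback = CodedSlice(qp=40, redundant=True)    # coarse, fewer bits

print(select_slice(primary, fallback).qp)   # primary arrived
print(select_slice(None, fallback).qp)      # primary lost
```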

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T

Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: >2x, 2x, 2x, T, T

Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x

Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T

Husky: 2x, 2x, >1x, 2x, 2x, 2x

Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T

Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: >1.7x, >1x, T, >1.7x, >1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High-fidelity coding for next-generation optical discs
Extensions for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bitstreams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] YCgCo
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation
    Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
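The logarithmic step-size control can be illustrated numerically. The sketch below assumes the step-size values commonly quoted in the H.264 literature for 8-bit video (six base values, doubling every 6 QP steps, QP from 0 to 51); these numbers are not given on the slides, and the function name `qstep` is hypothetical.

```python
# Assumed base step sizes for QP 0..5; each further group of 6 QP values
# doubles the step size, so one QP step is roughly a 12% increase.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))

# Average growth per QP step is the sixth root of 2.
print(round(2 ** (1 / 6), 3))
```

The doubling every six steps is what makes the control logarithmic: a fixed change in QP scales the step size by a fixed factor rather than adding a fixed increment.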

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
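Three of the four modes listed above are simple enough to sketch directly. The following is a simplified illustration, assuming both the row above and the column to the left are available; the plane mode and the handling of missing neighbours are left out, and the function name and mode strings are hypothetical.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 luma prediction as a list of rows.

    above: the 16 reconstructed pels directly above the macroblock
    left:  the 16 reconstructed pels directly left of the macroblock
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("need 16 neighbouring pels on each side")
    if mode == "vertical":
        # Every row repeats the row of pels above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighbouring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unsupported mode: {mode}")


above = [100] * 16
left = [50] * 16
print(predict_16x16("dc", above, left)[0][0])
```

An encoder would try each mode, subtract the prediction from the actual block, and keep the mode whose residual is cheapest to code.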

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, reflecting visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding
Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$
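The construction above can be tried out with a few lines of code. This is a minimal sketch that returns the codeword as a string of '0' and '1' characters for readability; a real bitstream writer would pack bits instead, and the function name is hypothetical.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written with exactly M bits.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Small values of code_num get short codewords, so the mapping from syntax element values to code_num is chosen so that frequent values are small.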

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$
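The three decoding steps can be sketched as follows. As an assumption for readability, the bitstream is a string of '0' and '1' characters and the function returns the position after the codeword so that several codewords can be read in sequence; the function name is hypothetical and truncated input is not handled.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the next unread bit).
    """
    # Step 1: count M leading zeros, then skip the terminating '1'.
    m = 0
    while bits[pos + m] == "0":
        m += 1
    start = pos + m + 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[start:start + m], 2) if m > 0 else 0
    # Step 3: code_num = 2^M + INFO - 1.
    code_num = (1 << m) + info - 1
    return code_num, start + m


stream = "1" + "010" + "00100" + "00111"
pos = 0
decoded = []
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    decoded.append(value)
print(decoded)
```

Because each codeword announces its own length through the leading zeros, no separator is needed between consecutive codewords.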

CAVLC codes transform coefficients
CABAC codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ Blu-ray Disc Association: FRExt is mandatory in the BD-ROM Video specification

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB has a 16x16 Y block with 8x16 Cb and Cr blocks; a 4:4:4 MB has 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$

$Y = 0.2126\, R + 0.7152\, G + 0.0722\, B$

$C_b = 0.5389\, (B - Y), \qquad C_r = 0.6350\, (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
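The equations above can be evaluated directly to check the numeric scale factors. The sketch assumes R, G and B normalised to the range 0..1, with no offset or integer rounding, and uses the $K_R$ and $K_B$ values from the slide as defaults (these are the ITU-R BT.709 values); the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalised RGB to (Y, Cb, Cr) using the K_R / K_B weights."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


# Scale factors implied by the default weights.
print(round(1.0 / (2.0 * (1.0 - 0.0722)), 4))   # factor on (B - Y)
print(round(1.0 / (2.0 * (1.0 - 0.2126)), 4))   # factor on (R - Y)

# Pure blue and pure red reach the chroma extremes of +0.5.
print(rgb_to_ycbcr(0.0, 0.0, 1.0))
print(rgb_to_ycbcr(1.0, 0.0, 0.0))
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide. Because the weights are not exactly representable in integer arithmetic, a fixed-point version of this conversion introduces rounding error.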

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \dfrac{1}{2}\left[ G + \dfrac{(R + B)}{2} \right]$

$C_g = \dfrac{1}{2}\left[ G - \dfrac{(R + B)}{2} \right]$

$C_o = \dfrac{(R - B)}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
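The forward and inverse steps above are exactly invertible in integer arithmetic, which can be verified with a short sketch. Python's `>>` on integers is an arithmetic (floor) shift, matching the definition given; the function names are hypothetical, and the transform is applied here to plain integer triples rather than to residual data inside a codec.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting transform on integers (Cg and Co carry one extra bit)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting transform; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Exhaustive check on a coarse grid of 8-bit RGB values.
samples = range(0, 256, 15)
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
print(lossless)
```

Each inverse step subtracts exactly the quantity the matching forward step added, so the truncation in the shifts cancels out and no rounding error accumulates.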

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bitstreams encoded for the lower nested profiles.
All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High Profile contains:
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact:
Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus)
ATSC has selected AC-3 plus from Dolby
MPEG calls AAC with SBR HE-AAC (HE: High Efficiency)
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-8-

Introduction

H.263: The Next Generation

ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (has overtaken H.261 as the dominant videoconferencing codec)
– Superior quality to prior standards at all bit rates (except perhaps for interlaced video)
– Wins by a factor of two at very low rates
– Version 2 (late 1997 / early 1998) & version 3 (2000) later developed with a large number of new features
– Profiles defined early 2001
– H.263+ & H.263++ (extensions to H.263)

-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1: early 1999)
Contains the H.263 baseline design and adds essentially all prior features and many creative new extras
– Segmented coding of shapes
– Scalable wavelet coding of still textures
– Mesh coding
– Face animation coding
– Coding of synthetic and semi-synthetic content
– 10 & 12-bit sampling
– More …
– v2 (early 2000) & v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG.

In ITU-T VCEG this is a new and separate standard:
- ITU-T Recommendation H.264
- ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite:
- Separate codec design from prior MPEG-4 Visual (Part 2)
- New Part 10 called "Advanced Video Coding" (AVC, similar to "AAC" in MPEG-2 as a separate audio codec)
- Not backward or forward compatible with prior standards
- MPEG-4 Systems / File Format being modified to support it

H.222.0 | MPEG-2 Systems is also being modified to support it. IETF is working on RTP payload packetization.

-11-

Introduction

History of H.264 / MPEG-4 Part 10
- ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
- July 2001: H.26L demonstrated at MPEG (Moving Picture Experts Group) call for technology
- December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
- May 2003: final approval from ISO/IEC and ITU-T
- The standard is named H.264 by ITU-T and MPEG-4 Part 10 by ISO/IEC
- Fidelity Range Extensions (August 2004): Amendment 1
- Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 Part 10
- Higher coding efficiency than previous standards (MPEG-1, 2, 4 Part 2, H.261, H.263)
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity, and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 Part 10

A broad range of applications for video content, including but not limited to the following:
- Video streaming over the internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bitstream syntax; decoder designs differ based on the Profile
- Four profiles: Baseline, Main, Extended, and High

Applications and their Profiles:
- Video conferencing, videophone: Baseline
- Digital storage media, television broadcasting: Main
- Streaming video: Extended
- Content contribution, content distribution, studio editing, post processing: High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks need not be in raster scan order; a map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switching slice, coded similarly to an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

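Introduction: Profile specifications

Coding tool: profiles that support it
- I & P slices: Baseline, Main, Extended, High
- Deblocking filter: Baseline, Main, Extended, High
- 1/4-pel motion compensation: Baseline, Main, Extended, High
- Variable block size (16x16 to 4x4): Baseline, Main, Extended, High
- CAVLC/UVLC: Baseline, Main, Extended, High
- Error resilience tools (flexible MB order, ASO, redundant slices): Baseline, Extended
- SP/SI slices: Extended
- B slice: Main, Extended, High
- Interlaced coding: Main, Extended, High
- CABAC: Main, High
- Data partitioning: Extended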

-22-

Introduction

Application requirements

Each entry lists the application, its requirements, the H.264 profile, and the MPEG-4 profile.

- Broadcast television. Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder. H.264 profile: Main. MPEG-4 profile: ASP (Advanced Simple)
- Streaming video. Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability. H.264 profile: Extended. MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
- Video storage and playback. Requirements: coding efficiency, interlace, low-complexity encoder and decoder. H.264 profile: Main. MPEG-4 profile: ASP
- Videoconferencing. Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder. H.264 profile: Baseline. MPEG-4 profile: SP (Simple)
- Mobile video. Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption. H.264 profile: Baseline. MPEG-4 profile: SP
- Studio distribution. Requirements: lossless or near-lossless, interlace, efficient transcoding. H.264 profile: Main/High. MPEG-4 profile: Studio Profile

-23-

Introduction: Levels

Level: corresponds to the processing power and memory capability of a codec

Level number: picture type & frame rate
- 1: QCIF @ 15 fps
- 1.1: QCIF @ 30 fps
- 1.2: CIF @ 15 fps
- 1.3: CIF @ 30 fps
- 2: CIF @ 30 fps
- 2.1: HHR @ 15 or 30 fps
- 2.2: SDTV @ 15 fps
- 3: SDTV 720x480x30i, 720x576x25i, 10 Mbps (max)
- 3.1: 1280x720x30p
- 3.2: 1280x720x60p
- 4: HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
- 4.1: HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
- 4.2: HDTV 1920x1080x60i, 2Kx1Kx60p
- 5: SHDTV/D-Cinema 2.5Kx2Kx30p
- 5.1: SHDTV/D-Cinema 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns: Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs

1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
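To illustrate how the level limits relate to picture size and frame rate, the following is a minimal sketch that picks the lowest level whose macroblock limits accommodate a given format. It checks only two of the limits in the table (max frame size and max macroblock processing rate); the function names and the idea of a "lowest fitting level" lookup are illustrative assumptions, and a real conformance check would also cover bit rate, buffer sizes, and the other constraints.

```python
# (level, max MB processing rate in MB/s, max frame size in MBs), taken from the table above
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396), ("1.3", 11880, 396),
    ("2", 11880, 396), ("2.1", 19800, 792), ("2.2", 20250, 1620), ("3", 40500, 1620),
    ("3.1", 108000, 3600), ("3.2", 216000, 5120), ("4", 245760, 8192),
    ("4.1", 245760, 8192), ("4.2", 491520, 8192), ("5", 589824, 22080),
    ("5.1", 983040, 36864),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height luma picture."""
    return -(-width // 16) * -(-height // 16)  # ceiling division on both axes


def lowest_level(width, height, fps):
    """Hypothetical helper: first level whose frame-size and MB-rate limits fit."""
    frame_size = macroblocks(width, height)
    for name, max_mbps, max_fs in LEVEL_LIMITS:
        if frame_size <= max_fs and frame_size * fps <= max_mbps:
            return name
    return None  # exceeds every level in the table


print(lowest_level(352, 288, 30))   # CIF at 30 fps
print(lowest_level(1280, 720, 30))  # 720p at 30 fps
```

For example, CIF is 22 x 18 = 396 macroblocks, so 30 fps needs 11 880 MB/s, which is the Level 1.3 row of the table.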

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture formats (4:2:0 chroma sampling)

[Figure: CIF format: Y is 352 (360 pels) x 288 lines; Cb and Cr are each 176 (180 pels) x 144 lines. QCIF format: Y is 176 (180 pels) x 144 lines; Cb and Cr are each 88 (90 pels) x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: video input, transform & quantization, entropy coding, bitstream output, inverse quantization & inverse transform, deblocking filter, picture buffering, motion estimation, motion compensation, intra prediction, intra/inter mode decision.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: bitstream input, entropy decoding, inverse quantization & inverse transform, intra prediction, motion compensation, intra/inter mode selection, deblocking filter, picture buffering, video output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame.

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction)
- Mode 0: vertical
- Mode 1: horizontal
- Mode 2: DC
- Mode 3: diagonal down left
- Mode 4: diagonal down right
- Mode 5: vertical right
- Mode 6: horizontal down
- Mode 7: vertical left
- Mode 8: horizontal up

[Figure: the 4 x 4 block of samples a..p with its neighbouring samples A..H above, I..L to the left, and M at the corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7, and 8.]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block
- Mode 4: sample a is predicted by round(I/4 + M/2 + A/4), sample d by round(B/4 + C/2 + D/4)
- Mode 8: sample a is predicted by round(I/2 + J/2), sample d by round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 block a..p and its neighbours A..M, shown once with the mode 4 direction and once with the mode 8 direction.]
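The four example predictions can be checked numerically. This is a minimal sketch in integer arithmetic, assuming that round() on the slide means rounding half up; the neighbour sample values are hypothetical and chosen only for illustration.

```python
def avg2(x, y):
    """round(x/2 + y/2) using integer arithmetic (round half up)."""
    return (x + y + 1) >> 1


def avg3(x, y, z):
    """round(x/4 + y/2 + z/4) using integer arithmetic (round half up)."""
    return (x + 2 * y + z + 2) >> 2


# Hypothetical reconstructed neighbours, labelled as in the figure
M, A, B, C, D = 100, 104, 108, 112, 116  # corner and row above
I, J, K, L = 96, 92, 88, 84              # column to the left

mode4_a = avg3(I, M, A)  # diagonal down right, sample a
mode4_d = avg3(B, C, D)  # diagonal down right, sample d
mode8_a = avg2(I, J)     # horizontal up, sample a
mode8_d = avg3(J, K, L)  # horizontal up, sample d

print(mode4_a, mode4_d, mode8_a, mode8_d)
```

The 1-2-1 weighting in avg3 acts as a short smoothing filter along the prediction direction, which is why three neighbours contribute to a single predicted sample.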

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small ones suit detailed areas

The two partition types cannot be mixed, i.e. a MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: H.264 encoder block diagram with motion estimation and motion compensation highlighted; motion vector accuracy 1/4 pel (6-tap filter).]

[Figure: macroblock partitions 16x16, 16x8, 8x16, 8x8; sub-macroblock partitions 8x8, 8x4, 4x8, 4x4.]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels.]

-38-

VC Algorithm: Inter Prediction
- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case letters, with E F G H I J in one row) and the half- and quarter-pel positions between them (lower-case letters).]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
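The two interpolation formulas can be sketched as follows. This is a simplified illustration for one row of samples, assuming 8-bit samples (hence the clipping to 0..255, which the slide does not mention) and round-half-up integer arithmetic; the sample values are hypothetical.

```python
def clip255(value):
    """Clip to the 8-bit sample range (an assumption for this sketch)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32), half-way between G and H."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(G, b):
    """a = round((G + b) / 2), bilinear average of an integer and a half-pel sample."""
    return (G + b + 1) >> 1


row = [10, 20, 30, 40, 50, 60]  # hypothetical integer-pel samples E..J
b = half_pel(*row)
a = quarter_pel(row[2], b)
print(b, a)
```

On a linear ramp like this one, the half-pel sample lands midway between G and H, which shows that the filter weights sum to 32 and preserve smooth gradients.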

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block edge, sample)

[Figure: a 16x16 macroblock showing the vertical and horizontal edges filtered for luma, and the vertical and horizontal edges filtered for chroma.]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram with picture buffering highlighted; management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: a 16 x 16 luma macroblock split into sixteen 4 x 4 blocks. Coding order of the blocks, row by row: 0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15. DC coefficient indices, row by row: 00 01 02 03 / 10 11 12 13 / 20 21 22 23 / 30 31 32 33.]

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same applies to chroma; the MB size for chroma depends on the 4:2:0, 4:2:2, and 4:4:4 formats.

-42-

VC Algorithm Transform

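4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f X C_f^T) \otimes E_f \]

\[
C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix},
\qquad
E_f = \begin{bmatrix}
a^2 & ab/2 & a^2 & ab/2 \\
ab/2 & b^2/4 & ab/2 & b^2/4 \\
a^2 & ab/2 & a^2 & ab/2 \\
ab/2 & b^2/4 & ab/2 & b^2/4
\end{bmatrix}
\]

where \( a = 1/2 \), \( b = \sqrt{2/5} \), \( d = 1/2 \), and \( \otimes \) implies element-by-element multiplication. X is the 4 x 4 block of input samples \( x_{00}, x_{01}, \ldots, x_{33} \).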

-43-

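4x4 Inverse IntDCT

\[ X = C_i^T (Y \otimes E_i) C_i \]

Here

\[
E_f = \begin{bmatrix}
a^2 & ab/2 & a^2 & ab/2 \\
ab/2 & b^2/4 & ab/2 & b^2/4 \\
a^2 & ab/2 & a^2 & ab/2 \\
ab/2 & b^2/4 & ab/2 & b^2/4
\end{bmatrix},
\qquad
E_i = \begin{bmatrix}
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2 \\
a^2 & ab & a^2 & ab \\
ab & b^2 & ab & b^2
\end{bmatrix}
\]

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices \( E_f \) and \( E_i \).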

-44-

VC Algorithm Transform

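Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[
Y_D = \operatorname{round}\!\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} D_{00} & D_{01} & D_{02} & D_{03} \\ D_{10} & D_{11} & D_{12} & D_{13} \\ D_{20} & D_{21} & D_{22} & D_{23} \\ D_{30} & D_{31} & D_{32} & D_{33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
\]

where round( ) denotes rounding to the nearest integer.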

-45-

VC Algorithm Transform

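Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[
Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: 4:2:0 chroma blocks of a macroblock. The 2x2 DC blocks are numbered 16 (U) and 17 (V); the AC blocks are numbered 18-21 (U) and 22-25 (V).]

For the 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased.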

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: H.264 encoder block diagram with the Transform & Quantization block highlighted.]

- 4 x 4 integer DCT transform:

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
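As a worked illustration of the integer transform matrix H on this slide, the sketch below computes the core 2-D transform H X H^T for one 4 x 4 block. It deliberately omits the scaling matrix and quantization, which the deck says are combined in a later step; the helper names and test blocks are hypothetical.

```python
# 4 x 4 integer transform matrix H as given on the slide
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain 4 x 4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core(x):
    """Core transform H * X * H^T; only integer additions and multiplications by 1 or 2."""
    return matmul(matmul(H, x), transpose(H))


flat = [[10] * 4 for _ in range(4)]  # hypothetical flat residual block
coeffs = forward_core(flat)
print(coeffs[0])  # a flat block has energy only in the DC coefficient
```

Because every entry of H is 1 or 2 in magnitude, a hardware or fixed-point implementation can replace the multiplications with additions and one-bit shifts, which is the "multiplier free" property noted earlier in the deck.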

-47-

VC Algorithm Quantization

The multiplication operation of the exact transform is combined with the multiplication of scalar quantization.
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

\[ Y_{ij} = X_{ij} \, \operatorname{round}\!\left( \frac{SF_{ij}}{Qstep} \right) \]

\[ X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. There are 52 QP values in total for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample.
- SF: scaling term
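The doubling rule for the quantization step can be sketched as a small lookup. The six base step sizes below are the values commonly tabulated for QP 0 to 5 in descriptions of H.264; they are not given on the slide and are included here as an assumption, as is the function name.

```python
# Step sizes for QP = 0..5 (assumed base values, not taken from the slide)
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step for an 8-bit QP in 0..51: doubles for every increment of 6."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

With six steps per doubling, each QP increment changes the step size by roughly 12%, which gives the encoder fine-grained control over the trade-off between bit rate and quality.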

-48-

VC Algorithm: Transform & Quantization

[Figure: processing chain. Encoder part: input block, forward transform, post-scaling and quantization, then a 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), giving the encoder output / decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block. The DC transforms are used for the Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
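The two scan tables can be applied directly: each table entry gives the position in the output sequence of the coefficient at that place in the 4 x 4 block. The following is a minimal sketch; the function name and the sample block are hypothetical.

```python
# Scan position of each coefficient, indexed [row][column], as in the tables above
ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan(block, order):
    """Reorder a 4 x 4 block of quantized coefficients into a 16-entry sequence."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [  # hypothetical quantized coefficients, energy near the top-left corner
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(scan(block, ZIGZAG))
print(scan(block, ALTERNATE))
```

Both scans push the nonzero low-frequency coefficients to the front and leave a long tail of zeros, which is what the run-based entropy coding stage exploits.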

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
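The construction and the decoding steps can be sketched as a pair of functions. This is a minimal illustration that represents codewords as strings of '0' and '1' characters for readability; a real bitstream reader would work on packed bits, and the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of a '0'/'1' string."""
    m = 0
    while bits[m] == "0":                 # step 1: count leading zeroes up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0  # step 2: M-bit INFO field
    return (1 << m) + info - 1            # step 3: code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Shorter codewords go to smaller code_num values, so the scheme is efficient when the encoder maps the most frequent syntax element values to the smallest numbers.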

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1s are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients). These are adapted to the current stream or context (hence CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

- Step 1: encode the total number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table. Number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
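The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below computes the symbols only and does not perform the table look-ups that turn them into bits; the function name is hypothetical, and c0, c1, c2 are given made-up values with magnitude greater than 1.

```python
def cavlc_symbols(coeffs):
    """Derive the symbols coded in CAVLC steps 1 to 5 for one scanned block."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three +/-1 values at the end of the nonzero coefficients
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]

    # Levels of the remaining nonzero coefficients, in reverse order
    levels = [coeffs[i] for i in reversed(nonzero[:total_coeffs - len(trailing)])]

    # Zeros located before the last nonzero coefficient
    total_zeros = nonzero[-1] + 1 - total_coeffs if nonzero else 0

    # Runs of zeros before each coefficient, in reverse order, until all zeros are used
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return total_coeffs, len(trailing), signs, levels, total_zeros, runs


# The slide's sequence with hypothetical values c0 = 7, c1 = 6, c2 = -2
coeffs = [7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(coeffs))
```

The run list stops as soon as every zero has been accounted for, which is why only three runs appear for six nonzero coefficients in the example.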

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is also updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size, and motion vector data from the subsequent inter picture
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: suitable for a region just before a scene change
- Two backward references: suitable for a region just after a scene change

[Figure: previous pictures, current picture, and next pictures, showing the three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice

Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference. The co-located partition in the List 1 reference has motion vector mvCol; the direct-mode partition in the current picture has motion vectors mvL0 and mvL1; tb and td mark the temporal distances.]

mvL0 = tb * mvCol / td
mvL1 = -(td - tb) * mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture.
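The two direct-mode formulas can be evaluated with a few lines of code. This sketch uses floating-point division for clarity, whereas an actual codec uses fixed-point scaling; it also assumes tb is the distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference. The function name and sample vectors are hypothetical.

```python
def temporal_direct(mv_col, tb, td):
    """Scale the co-located motion vector into a forward and a backward vector."""
    mv_l0 = tuple(tb * c / td for c in mv_col)          # mvL0 = tb * mvCol / td
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)  # mvL1 = -(td - tb) * mvCol / td
    return mv_l0, mv_l1


# Current picture halfway between its two references
print(temporal_direct((8, -4), tb=1, td=2))
# Current picture one quarter of the way from the List 0 reference
print(temporal_direct((8, 4), tb=1, td=4))
```

Note that mvL0 - mvL1 always equals mvCol, so the two derived vectors together span the same motion as the co-located vector and no motion data needs to be transmitted for the direct-mode partition.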

-60-

VC Algorithm: B Slice

Weighted prediction
- Different weights of the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 * r1 + w2 * r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
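A minimal sketch of the weighted sum p = w1 * r1 + w2 * r2 follows. The implicit weights here are a simplified assumption: the reference that is temporally closer to the current picture gets the larger weight, with tb the distance from r1 to the current picture and td the distance from r1 to r2. The standard's fixed-point derivation and offsets are not reproduced, and the function names and 8-bit clipping are illustrative choices.

```python
def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption, see lead-in)."""
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped to 8 bits."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]


r1 = [100, 200]  # hypothetical samples from the first reference
r2 = [60, 40]    # hypothetical samples from the second reference

w1, w2 = implicit_weights(tb=1, td=4)
print(weighted_prediction(r1, r2, w1, w2))

# Explicit weights, e.g. a fade where only the first reference contributes at half strength
print(weighted_prediction(r1, r2, 0.5, 0.0))
```

In a fade, plain averaging of the two references would predict the wrong brightness; letting the weights follow the fade keeps the residual small.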

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switching slice, coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and a switching slice S(3) connecting the two streams.]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing.

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI): only in Extended Profile
- Data partitioning: only in Extended Profile
- Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B, and C
2. A holds the slice header and the header data for each MB in the slice
3. B holds the coded residual data for intra and SI slice MBs
4. C holds the coded residual data for inter-coded MBs
5. Each partition A, B, and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder, each holding parameter sets 1, 2, and 3 after a reliable parameter set exchange. Example Parameter Set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11. VCL data is then transferred with a reference to PS 3.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.
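The dispersed two-group allocation described above can be sketched as a checkerboard map. This is an illustrative assumption about the pattern; the map construction, function names, and the QCIF-sized example are not taken from the slide.

```python
def dispersed_map(width_mbs, height_mbs):
    """Checkerboard assignment of macroblocks to slice group 0 or 1."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(sg_map, x, y):
    """Count the left/right/top/bottom neighbours that belong to the other slice group."""
    height, width = len(sg_map), len(sg_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and sg_map[ny][nx] != sg_map[y][x]:
            count += 1
    return count


sg_map = dispersed_map(11, 9)  # QCIF: 11 x 9 macroblocks
for row in sg_map[:3]:
    print("".join(str(g) for g in row))
print(neighbours_in_other_group(sg_map, 5, 4))  # interior macroblock
```

With this pattern, every macroblock of a lost slice group is surrounded on all available sides by macroblocks of the group that did arrive, which gives the concealment stage correctly decoded neighbours to interpolate from.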

ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Columns: bit rate [kbps] for QCIF (24, 48, 96, 192), then bit rate [kbps] for CIF (96, 192, 384, 768)

- Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T
- Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T
- Head: > 2x, 2x, 2x, T, T
- Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Columns: bit rate [kbps] for QCIF (24, 48, 96, 192), then bit rate [kbps] for CIF (96, 192, 384, 768)

- Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x
- Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T
- Husky: 2x, 2x, > 1x, 2x, 2x, 2x
- Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Columns: bit rate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6), then bit rate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6)

- Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
- Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T
- Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x
- Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Columns: bit rate [Mbps] for MPEG-2 HiQ (6, 10, 20), then bit rate [Mbps] for MPEG-2 TM5 (6, 10, 20)

720 (60p)
- Crew: 1.7x, 2x, T, 1.7x, 2x, T
- Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
- Stockholm Pan: 1x, 2x
- New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
- River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
- Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bit-rate saving results for the 'Tempete' CIF 15 Hz sequence for the video streaming application

- HLP: High Latency Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bit-rate saving results for the 'Paris' CIF 15 Hz sequence for the video conferencing application

- CHC: Conversational High Compression
- SP: Simple Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards (each feature is listed for MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10)

- Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
- Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
- Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
- Quantization: scalar quantization with step size of constant increment | scalar quantization with step size of constant increment | vector quantization | scalar quantization with step size increasing at the rate of 12.5%
- Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC
- Motion estimation & compensation: yes | yes | yes | yes, more flexible, up to 16 MVs per MB
- Playback & random access: yes | yes | yes | yes

-74-

Conclusions

Comparison of standards (continued; each feature is listed for MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10)

- Pel accuracy: integer, 1/2-pel | integer, 1/2-pel | integer, 1/2-pel, 1/4-pel | integer, 1/2-pel, 1/4-pel
- Profiles: no | 5 | 8 | 4
- Reference picture: one | one | one | multiple
- Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
- Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
- Error robustness: synchronization & concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
- Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
- Compatibility with previous standards: n/a | yes | yes | no
- Encoder complexity: low | medium | medium | high

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft’s VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High-fidelity coding for the next-generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, “SNR–scalable extension of H.264/AVC,” ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, “ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video,” ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, “ISO/IEC 14496-2:2000: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual,” ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, “Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication,” ITU-T, 1998.
[4] H.264: International Telecommunication Union, “Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services,” ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, “H.26L/JVT Coding Network Abstraction Layer and IP-based Transport,” IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, “Adaptive Deblocking Filter,” IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-Complexity Transform and Quantization in H.264/AVC,” IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, “Run-Length Encoding,” IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, “Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard,” IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, “Generalized B Picture and the Draft H.264/AVC Video-Compression Standard,” IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, “The SP- and SI-Frames Design for H.264/AVC,” IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, “H.264/AVC Over IP,” IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, “Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264),” MPEG2003/N6231, December 2003.
[16] M. Ghanbari, “Standard Codecs: Image Compression to Advanced Video Coding,” Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, “Performance Comparison of Video Coding Standards using Lagrangian Coder Control,” IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., “Windows Media Video 9: Overview and applications,” Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, “Video compression – from concepts to H.264/AVC standard,” Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, “The H.264/MPEG-4 AVC video coding standard,” short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, “Performance comparison of the emerging H.264 video coding standard with the existing standards,” IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, “SNR–scalable extension of H.264/AVC,” ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions,” SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., “Video coding with H.264/AVC: Tools, performance and complexity,” IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., “AVS – The Chinese next-generation video coding standard,” NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups, JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., “Overview of error resiliency schemes in H.264/AVC standard,” JVCIR Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight’s codec is one of the popular ones in the industry, and it supports AAC. All of these codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] YCgCo
High resolution

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
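
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is a minimal illustration, not text from the slides: it assumes 8-bit video with QP in the range 0 to 51 and uses the six base step sizes commonly tabulated for H.264, so that the step size doubles for every increase of 6 in QP. The names `QSTEP_BASE` and `qstep` are hypothetical.

```python
# Base step sizes for QP = 0..5 (assumed from commonly published H.264 tables).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantizer step size for a QP in 0..51 (8-bit video assumed)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # The base value repeats every 6 QP steps, scaled by a power of two.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the table repeats with a factor of two every six entries, each single QP increment raises the step size by roughly 12% on average.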

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I – Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)
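
As an illustration of the scanning step, the sketch below reorders a 4x4 block of quantized coefficients with the frame (zigzag) scan, so that low-frequency coefficients come first and trailing zeros cluster at the end. The scan table is the commonly published 4x4 zigzag order expressed as raster indices; the function name and the sample block are hypothetical.

```python
# Zigzag (frame) scan order for a 4x4 block, as raster indices (row * 4 + column).
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan_4x4(block):
    """Flatten a 4x4 block (list of 4 rows) into zigzag order."""
    flat = [value for row in block for value in row]
    return [flat[index] for index in ZIGZAG_4X4]


if __name__ == "__main__":
    # Typical quantized block: energy concentrated near the top-left corner.
    block = [
        [7, 3, 0, 0],
        [2, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(zigzag_scan_4x4(block))
```

After scanning, the nonzero values sit at the front of the list, which is what the subsequent entropy coder exploits.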

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
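
The first three of these modes can be sketched in a few lines. This is a simplified illustration under stated assumptions: both the row above (`top`) and the column to the left (`left`) are available, each holding 16 reconstructed samples, and the planar mode is omitted. Function names are hypothetical.

```python
N = 16  # luma macroblock width and height in samples


def predict_vertical(top):
    """Every row of the prediction copies the 16 samples above the MB."""
    return [list(top) for _ in range(N)]


def predict_horizontal(left):
    """Every row is filled with the sample immediately to its left."""
    return [[left[y]] * N for y in range(N)]


def predict_dc(top, left):
    """All samples take the rounded mean of the 32 neighboring samples."""
    dc = (sum(top) + sum(left) + N) >> 5  # add 16, then divide by 32
    return [[dc] * N for _ in range(N)]


if __name__ == "__main__":
    top = [100] * N
    left = [60] * N
    print(predict_dc(top, left)[0][0])
```

An encoder would typically form each candidate prediction, measure the residual, and keep the mode with the lowest cost.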

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction:
Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

The matrix can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

Frame: Zig-Zag; Field

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
\[ M = \lfloor \log_2(\text{code\_num} + 1) \rfloor \]
\[ \text{INFO} = \text{code\_num} + 1 - 2^M \]
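
The construction above can be written out directly. The sketch below is an illustration only; it returns the codeword as a string of '0' and '1' characters for readability (a real bitstream writer would pack bits), and the function name is hypothetical.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written with exactly M bits.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for code_num in range(8):
        print(code_num, exp_golomb_encode(code_num))
```

Each codeword produced this way has length 2M + 1, matching the formula given for the codeword length.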

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO – 1
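
The three decoding steps map onto a short routine. This is a sketch that assumes the input is a string of '0' and '1' characters and that it contains a complete codeword starting at `pos`; the function name and return convention are hypothetical.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a bit string.

    Returns (code_num, next_pos), where next_pos is the index just past the codeword.
    """
    # Step 1: count M leading zeros, then consume the terminating 1.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m > 0 else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m


if __name__ == "__main__":
    stream = "1" + "010" + "00100"  # code_num values 0, 1, 3 back to back
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the leading zeros announce the codeword length, consecutive codewords can be parsed without any separators.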

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only: MB structure in 4:2:2 and 4:4:4 formats

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height)

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB → YCbCr

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

\[ K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1 \]

\[ Y = 0.2126 R + 0.7152 G + 0.0722 B \]

\[ C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y) \]

(ITU-R Rec. BT.601 defines \(K_B = 0.114\), \(K_R = 0.299\))
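
The conversion can be checked numerically with a small floating-point sketch. It assumes normalized R, G, B values in the range 0 to 1 and defaults to the K_R and K_B constants quoted on this slide; offsets, scaling to integer code values and rounding are deliberately left out. The function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y, Cb, Cr using luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))  # ranges over -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))  # ranges over -0.5..0.5
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: full luma, no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # pure blue: Cb reaches its maximum
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # BT.601 weights
```

The divisors 2(1 - K_B) and 2(1 - K_R) are what keep both chroma components within plus or minus one half for any input in range.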

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right] \]

\[ C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right] \]

\[ C_o = \frac{R - B}{2} \]

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and “>>” denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
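
The forward and inverse conversions are exact inverses in integer arithmetic, which a short sketch can confirm. It assumes integer sample values and relies on the fact that `>>` on Python integers is an arithmetic (floor) shift, including for negative values; the function names are hypothetical.

```python
def rgb_to_ycgco_lossless(r, g, b):
    """Forward conversion using only additions, subtractions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb_lossless(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (12, 200, 97), (0, 0, 0), (255, 255, 255)]:
        ycgco = rgb_to_ycgco_lossless(*rgb)
        print(rgb, "->", ycgco, "->", ycgco_to_rgb_lossless(*ycgco))
```

Note that Cg and Co can be negative and span one more bit than the inputs, which is consistent with adding one bit of precision to the chroma samples.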

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A ‘higher’ profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
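
Notes 1 and 2 are simple arithmetic on the level limits, sketched below. The numbers used in the example (40 500 MB/s and 1 620 MBs for Level 3) come from the parameter table elsewhere in this deck. As an assumption of this sketch, the size bound is evaluated in macroblock units and then converted to samples; function names are hypothetical.

```python
import math


def max_dimension_samples(max_frame_size_mbs):
    """Largest width or height allowed, as floor(sqrt(MaxFS * 8)) macroblocks of 16 samples."""
    return math.isqrt(max_frame_size_mbs * 8) * 16


def max_frame_rate(max_mb_rate, width, height, cap=172.0):
    """Frame rate permitted by the macroblock processing rate, capped at 172 frames/sec."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return min(max_mb_rate / mbs_per_frame, cap)


if __name__ == "__main__":
    # Level 3 limits: 40 500 MB/s and 1 620 MBs per frame.
    print(max_frame_rate(40500, 720, 576))  # full-size 625-line SD
    print(max_frame_rate(40500, 720, 480))  # smaller picture, higher rate
    print(max_frame_rate(40500, 176, 144))  # QCIF hits the 172 frames/sec cap
    print(max_dimension_samples(1620))
```

The same two limits explain why a level can trade picture size against frame rate: the product of the two is bounded by the macroblock rate.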

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content,” JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

▪ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

▪ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

▪ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-9-

Introduction

MPEG-4 Visual: Baseline H.263 and Many Creative Extras

MPEG-4 Visual (formally 14496-2, v1: early 1999)

Contains the H.263 baseline design and adds essentially all prior features and many creative new extras
– Segmented coding of shapes
– Scalable wavelet coding of still textures
– Mesh coding
– Face animation coding
– Coding of synthetic and semi-synthetic content
– 10 & 12-bit sampling
– More …
– v2 (early 2000) & v3 (early 2001) added later

-10-

Introduction

Relationship to Other Standards

Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG

In ITU-T VCEG this is a new & separate standard
– ITU-T Recommendation H.264
– ITU-T Systems (H.32x) is modified to support it

In ISO/IEC MPEG this is a new “part” in the MPEG-4 suite
– Separate codec design from prior MPEG-4 visual (Part 2)
– New part 10 called “Advanced Video Coding” (AVC – similar to “AAC” in MPEG-2 as a separate audio codec)
– Not backward or forward compatible with prior standards
– MPEG-4 Systems / File Format being modified to support it

H.222.0 | MPEG-2 Systems are also being modified to support it

IETF working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 part 10

ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
July 2001: H.26L demonstrated at MPEG (Moving Picture Experts Group) call for technology
December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project – Joint Video Team (JVT)
May 2003: Final approval from ISO/IEC and ITU-T
The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC
Fidelity Range Extensions (August 2004): Amendment 1
Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 part 10

Higher coding efficiency than previous standards: MPEG-1/2/4 part 2, H.261, H.263
Simple syntax specifications
Seamless integration of video coding into all current protocols
More error robustness
Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
Network friendliness
Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:

Video streaming over the internet
CATV: Cable TV on optical networks, copper, etc.
DBS: Direct broadcast satellite video services
DSL: Digital subscriber line video services
DTTB: Digital terrestrial television broadcasting, cable modem, DSL
ISM: Interactive storage media (optical disks, etc.)
MMM: Multimedia mailing
MSPN: Multimedia services over packet networks
RTC: Real-time conversational services (videoconferencing, videophone, etc.)
RVS: Remote video surveillance
SSM: Serial storage media (digital VTR, etc.)
D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit stream syntax; decoder design differs according to the Profile
– Four profiles: Baseline, Main, Extended and High

Applications | Profile
Video conferencing, videophone | Baseline
Digital storage media, television broadcasting | Main
Streaming video | Extended
Content contribution, content distribution, studio editing, post processing | High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles

– I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice

– P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block

– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile

– Common parts: I slice, P slice, CAVLC

– FMO (Flexible macroblock order): macroblocks need not be in raster scan order; a map assigns each macroblock to a slice group

– ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice may be smaller than the macroblock address of the first macroblock of some preceding slice of the same coded picture

– RS (Redundant slice): a slice carrying redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice
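
To make the FMO idea concrete, the sketch below builds one possible macroblock-to-slice-group map: a checkerboard pattern with two slice groups. This is an illustrative pattern only, not the full set of map types defined by the standard, and the function name is hypothetical. With such a map, if one slice group is lost, every missing macroblock still has neighbors from the other group available for concealment.

```python
def checkerboard_slice_groups(width_mbs, height_mbs, num_groups=2):
    """Return a 2-D map giving the slice group of each macroblock."""
    return [
        [(x + y) % num_groups for x in range(width_mbs)]
        for y in range(height_mbs)
    ]


if __name__ == "__main__":
    # QCIF is 11 x 9 macroblocks (99 in total).
    groups = checkerboard_slice_groups(11, 9)
    for row in groups:
        print("".join(str(g) for g in row))
    print("MBs in group 0:", sum(row.count(0) for row in groups))
```

In raster-scan coding, by contrast, a lost slice removes a contiguous run of macroblocks, which is much harder to conceal.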

-19-

Introduction

Coding parts for Main Profile

– Common parts: I slice, P slice, CAVLC

– B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block

– Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding
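
The weighted prediction operation can be sketched for a single motion-compensated sample from one reference picture. This is a simplified illustration in the spirit of explicit weighting: the parameter names (`weight`, `offset`, `log_wd`) and the 8-bit clipping range are assumptions of this sketch, and the two-reference (bi-predictive) case is not shown.

```python
def weighted_sample(pred, weight, offset, log_wd, bit_depth=8):
    """Scale one predicted sample by weight / 2**log_wd, add an offset and clip."""
    max_value = (1 << bit_depth) - 1
    if log_wd >= 1:
        # Add half of the divisor before shifting so the result is rounded.
        value = ((pred * weight + (1 << (log_wd - 1))) >> log_wd) + offset
    else:
        value = pred * weight + offset
    return min(max(value, 0), max_value)


if __name__ == "__main__":
    # With log_wd = 5, a weight of 32 means a gain of 1.0.
    print(weighted_sample(100, 32, 0, 5))   # unchanged
    print(weighted_sample(100, 16, 0, 5))   # fade towards black: gain 0.5
    print(weighted_sample(100, 48, 10, 5))  # gain 1.5 plus an offset
    print(weighted_sample(250, 64, 0, 5))   # gain 2.0, clipped to 255
```

This kind of scaling is what lets the encoder predict fades efficiently, since the reference picture differs from the current one mainly by a global gain and offset.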

-20-

Introduction

Coding parts for Extended Profile

– Common parts: I slice, P slice, CAVLC
– SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
– SI slice: a switching slice, similar to the coding of an I slice
– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
– Flexible macroblock order (FMO)
– Arbitrary slice order (ASO)
– Redundant slice (RS)
– B slice
– Weighted prediction

-21-

Introduction: Profile specifications

Feature | Baseline | Main | Extended | High
I & P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
¼ Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools – Flexible MB Order, ASO, Red. Slices | X | - | X | -
SP/SI Slices | - | - | X | -
B Slice | - | X | X | X
Interlaced Coding | - | X | X | X
CABAC | - | X | - | X
Data Partitioning | - | - | X | -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction
Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
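As a rough illustration of how the limits above relate to the picture formats listed per Level, the sketch below picks the lowest Level whose maximum frame size and macroblock processing rate cover a given resolution and frame rate. It is a simplified check under stated assumptions: only those two columns of the table are used (bit rate, buffer sizes and MV limits are ignored), and the names `LEVEL_LIMITS`, `macroblocks` and `lowest_level` are hypothetical helpers, not part of the standard.

```python
import math

# (level, max MB processing rate in MB/s, max frame size in MBs),
# copied from the parameter set limits table for each Level.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def lowest_level(width, height, fps):
    """Lowest Level whose frame-size and MB-rate limits both hold, else None.

    Assumption: bit rate, DPB/CPB size and MV limits are not checked here.
    """
    frame_size = macroblocks(width, height)
    mb_rate = frame_size * fps
    for level, max_rate, max_frame_size in LEVEL_LIMITS:
        if frame_size <= max_frame_size and mb_rate <= max_rate:
            return level
    return None


if __name__ == "__main__":
    for name, (w, h, fps) in {
        "QCIF 15 fps": (176, 144, 15),
        "CIF 30 fps": (352, 288, 30),
        "720p 30 fps": (1280, 720, 30),
    }.items():
        print(name, "->", lowest_level(w, h, fps))
```

For example, CIF (352x288) is 22 x 18 = 396 macroblocks, so 30 fps needs 11 880 MB/s, which is exactly the Level 1.3 limit.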

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)
NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems
VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure
Supported picture format: 4:2:0 chroma sampling
[Figure: CIF format: Y 352 x 288 lines (360 pels), Cb and Cr 176 x 144 lines (180 pels) each. QCIF format: Y 176 x 144 lines (180 pels), Cb and Cr 88 x 72 lines (90 pels) each]

-28-

Video Coding Algorithm
Block diagram for H.264 encoder
[Figure: encoder block diagram with the blocks Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering and Entropy Coding; video input, bitstream output]

-29-

Video Coding Algorithm
Block diagram for H.264 decoder
[Figure: decoder block diagram with the blocks Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter and Picture Buffering; bitstream input, video output]

-30-

VC Algorithm Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame
4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical 0, horizontal 1, DC 2, diagonal down-left 3, diagonal down-right 4, vertical-right 5, horizontal-down 6, vertical-left 7, horizontal-up 8)
[Figure: 4 x 4 block of samples a..p (rows a b c d / e f g h / i j k l / m n o p) with neighboring samples M A B C D E F G H above and I J K L to the left; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]
- Samples a, b, ..., p: the predicted samples of the current block
- Samples A, B, ..., M (above and left): previously reconstructed samples

-31-

VC Algorithm Intra Prediction
Example of 4 x 4 luma block
- Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)
[Figure: the 4 x 4 block a..p with neighboring samples A..M, showing the prediction directions of mode 4 and mode 8]
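The 4 x 4 prediction rules can be sketched in a few lines. The code below is an illustrative sketch, not the reference decoder: it covers only the vertical, horizontal and DC modes for a full block, plus the two example samples (a and d) of modes 4 and 8 from the slide. The function names are hypothetical, the rounding is done with integer add-and-shift, and the DC formula assumes that both the upper and left neighbors are available.

```python
def predict_4x4(above, left, mode):
    """Predict a 4x4 luma block from reconstructed neighbors.

    above = [A, B, C, D], left = [I, J, K, L]; returns 4 rows of 4 samples.
    Only modes 0 (vertical), 1 (horizontal) and 2 (DC) are sketched here.
    """
    if mode == 0:  # vertical: copy the row above into every row
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy the left column into every column
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched")


def mode4_a_d(A, B, C, D, I, M):
    """Samples a and d for mode 4 (diagonal down-right)."""
    a = (I + 2 * M + A + 2) >> 2   # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2   # round(B/4 + C/2 + D/4)
    return a, d


def mode8_a_d(I, J, K, L):
    """Samples a and d for mode 8 (horizontal-up)."""
    a = (I + J + 1) >> 1           # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2   # round(J/4 + K/2 + L/4)
    return a, d


if __name__ == "__main__":
    above, left = [100, 104, 108, 112], [96, 92, 88, 84]
    print(predict_4x4(above, left, 2)[0])
    print(mode4_a_d(100, 104, 108, 112, I=96, M=98))
    print(mode8_a_d(96, 92, 88, 84))
```

An encoder would typically evaluate every available mode and keep the one whose prediction leaves the smallest residual; that decision logic is outside this sketch.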

-32-

VC Algorithm Intra Prediction
16 x 16 luma
- 4 prediction modes (vertical 0, horizontal 1, DC 2, plane 3)
- Plane mode works well in areas of smoothly varying luminance: a linear 'plane' function is fitted to the upper (H) and left-side (V) samples
8 x 8 luma (FRExt only)
- Similar to 4 x 4 luma, with low-pass filtering of the predictor to improve prediction performance
[Figure: plane prediction mode]

-33-

VC Algorithm Intra Prediction
Chroma
- Always operates using full-MB prediction: 8 x 8 for the 4:2:0 format, 8 x 16 for 4:2:2, 16 x 16 for 4:4:4 (similar to the 16 x 16 luma block, but with a different mode order)
- 4 prediction modes (DC 0, horizontal 1, vertical 2, plane 3)

-34-

VC Algorithm Inter Prediction
- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm Inter Prediction
Prediction of variable block size
- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas
- The two kinds of partition cannot be mixed, i.e. an MB cannot have both 16x8 and 4x8 partitions
- When the sub-MB partition (8x8) is selected, each 8x8 block can be further partitioned

-36-

VC Algorithm Inter Prediction
Sub-pel motion compensation
- Better compression performance than integer-pel MC, at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions
[Figure: encoder block diagram emphasizing motion estimation and compensation; motion vector accuracy 1/4 (6-tap filter)]
[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8; sub-MB partitions 8x8, 8x4, 4x8 and 4x4]

-37-

VC Algorithm Inter Prediction
Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture
[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

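VC Algorithm Inter Prediction
- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples
[Figure: grid of integer-pel samples (upper-case letters A..U) and interpolated sub-pel samples (lower-case letters); E F G H I J lie on one row, b is the half-pel sample between G and H, and a is the quarter-pel sample between G and b]

$b = \mathrm{round}((E - 5F + 20G + 20H - 5I + J)/32)$
$a = \mathrm{round}((G + b)/2)$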
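The two interpolation formulas translate directly into integer arithmetic. The following is a minimal sketch under stated assumptions: samples are 8-bit, rounding is implemented as add-then-shift, and the half-pel result is clipped to [0, 255]. The function names `half_pel` and `quarter_pel` are illustrative, not taken from any reference software.

```python
def clip_8bit(value):
    """Clamp a value to the 8-bit sample range (assumption: 8-bit video)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H using the 6-tap filter
    (1, -5, 20, 20, -5, 1)/32, with +16 for rounding before the shift."""
    return clip_8bit((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample as the rounded average of two neighboring
    integer- or half-pel samples (bilinear interpolation)."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = (10, 20, 30, 40, 50, 60)   # E, F, G, H, I, J on one row
    b = half_pel(*row)               # half-pel between G=30 and H=40
    a = quarter_pel(row[2], b)       # quarter-pel between G and b
    print(b, a)
```

On a linear ramp like the one above, the filter returns the midpoint of G and H, as expected of an interpolator; the negative taps matter on edges and textures, where they preserve more detail than plain averaging.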

-39-

VC Algorithm Inter Prediction
Deblocking filter (adaptive)
- Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)
[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16 x 16 macroblock]

-40-

VC Algorithm Inter Prediction
Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer
[Figure: encoder block diagram emphasizing picture buffering; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm Transform & Quantization
Transform
- Integer transform: multiplier-free, using only additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform
[Figure: a 16 x 16 luma MB split into sixteen 4 x 4 blocks numbered 0, 1, ..., 15 in coding order for the 4 x 4 integer DCT (rows: 0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15); the DC coefficients of the blocks (dark samples) are indexed (00), (01), (02), ..., (33)]
- The Hadamard transform is applied only when the 16 x 16 intra prediction mode is used with the 4 x 4 IntDCT
- Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

VC Algorithm Transform

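4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$, and $\otimes$ implies element-by-element multiplication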
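To make the structure of the forward transform concrete, here is a small sketch that computes the integer core $C_f X C_f^T$ and then applies the element-by-element scaling by $E_f$ in floating point. It is an illustration only: in a real codec the scaling is folded into quantization rather than applied separately, and the helper names (`core_transform`, `scaling_matrix`, `forward_transform`) are hypothetical.

```python
# Forward core transform matrix C_f of the 4x4 integer DCT.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(p, q):
    """Plain 4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Integer part W = C_f X C_f^T (additions, subtractions and doublings)."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_matrix():
    """Post-scaling matrix E_f with a = 1/2 and b = sqrt(2/5)."""
    a, b = 0.5, (2.0 / 5.0) ** 0.5
    aa, ab2, bb4 = a * a, a * b / 2, b * b / 4
    return [[aa, ab2, aa, ab2],
            [ab2, bb4, ab2, bb4],
            [aa, ab2, aa, ab2],
            [ab2, bb4, ab2, bb4]]


def forward_transform(x):
    """Y = (C_f X C_f^T) scaled element by element with E_f."""
    w, e = core_transform(x), scaling_matrix()
    return [[w[i][j] * e[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(core_transform(flat)[0][0])     # only the DC term is non-zero
    print(forward_transform(flat)[0][0])
```

A flat block produces a single DC coefficient, and the scaled output preserves the energy of the input, which is what the scaling by $E_f$ is for.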

-43-

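4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \qquad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, the quantization step (QP) is embedded in the matrices $E_f$ and $E_i$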

-44-

VC Algorithm Transform

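Luma DC coefficients for Intra 16x16 MB
- The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer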

-45-

VC Algorithm Transform

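Chroma DC coefficients
- Intra prediction mode, (4x4) IntDCT: Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: block numbering for the 4:2:0 format: the 2x2 DC blocks of U and V are numbered 16 and 17, and the chroma AC blocks are numbered 18 to 25]
- For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased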

-46-

VC Algorithm Transform

Block diagram emphasizing transform
[Figure: encoder block diagram emphasizing the Transform & Quantization block]
- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

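VC Algorithm Quantization
- The multiplication required by the exact transform is combined with the multiplication of the scalar quantization
- Encoder: post-scaling and quantization

$$Y_{ij} = \mathrm{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right)$$

- Decoder: inverse quantization and pre-scaling

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

where
- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles in size for every increment of 6 in QP
- FRExt expands the QP range beyond 52 by 6 for each additional bit per decoded sample
- SF: scaling term (the post-scaling factor at the encoder, the pre-scaling factor at the decoder)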
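The "doubles every 6" rule means Qstep is fully described by six base values and a power of two. The sketch below illustrates this; note that the six base step sizes for QP 0 to 5 are the commonly tabulated values and are not given on the slide, so treat them as an assumption here. The scaling term SF is left as a plain parameter defaulting to 1, and the function names are illustrative.

```python
# Base step sizes for QP = 0..5 (assumed: commonly tabulated values, not
# stated on the slide). Every further increment of 6 in QP doubles Qstep.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8 bits per decoded sample)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, qp, sf=1.0):
    """Encoder side: Y = round(X * SF / Qstep), rounding half away from zero."""
    value = x * sf / qstep(qp)
    return int(value + 0.5) if value >= 0 else -int(-value + 0.5)


def rescale(y, qp, sf=1.0):
    """Decoder side: X' = Y * Qstep * SF."""
    return y * qstep(qp) * sf


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    level = quantize(100, 28)
    print(level, rescale(level, 28))   # reconstruction differs from 100
```

The gap between the input and the rescaled value in the last line is the quantization error; a larger QP widens it while shrinking the levels that must be entropy coded.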

-48-

VC Algorithm Transform & Quantization
[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block. Rescaling and inverse transform of the DC coefficients apply to the Intra (16x16) prediction mode only]

-49-

VC Algorithm Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag or alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main Profile

(a) Zig-zag scan      (b) Alternate scan
 0  1  5  6            0  2  8 12
 2  4  7 12            1  5  9 13
 3  8 11 13            3  6 10 14
 9 10 14 15            4  7 11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]
- INFO is an M-bit field carrying information
- The first codeword has no leading zero or trailing INFO
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on
- The length of each Exp-Golomb codeword is (2M + 1) bits

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$
$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding
1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1
(For codeword 0, INFO and M are zero)

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs
- All other syntax elements are coded with the Exp-Golomb codes
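The Exp-Golomb construction and the three decoding steps can be sketched as follows. This is a minimal illustration that works on strings of '0'/'1' characters for readability rather than on a packed bitstream; the function names `ue_encode` and `ue_decode` are hypothetical.

```python
def ue_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits):
    """Decode one codeword from the start of a '0'/'1' string.

    Returns (code_num, number of bits consumed).
    """
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        print(n, ue_encode(n))
```

Because each codeword announces its own length through the leading zeroes, codewords can be concatenated and parsed back without separators, which is why `ue_decode` also reports how many bits it consumed.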

-53-

VC Algorithm Entropy Coding
- CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and of +/-1 values are coded, whereas for the other coefficients their levels are coded
Encoding steps
- Step 1: encode the total number of non-zero coefficients and of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros
H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)
These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm Entropy Coding
Example of CAVLC

order:  0   1   2   3   4   5   6   7   8   9  ...  16
coeff:  c0  c1  c2  0   1   1   0  -1   0   0  ...  0

- Step 1: encode the number of non-zero coefficients and of trailing ones (1 or -1) from a look-up table
  number of non-zero coefficients = 6 (orders 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (orders 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (orders 3, 6)
- Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
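The five steps of the example can be reproduced by a short routine that extracts the values CAVLC would then map to codewords. This is a sketch of the symbol extraction only: the variable-length code tables themselves are not modeled, the concrete values chosen for c0, c1 and c2 are hypothetical, and `cavlc_symbols` is an illustrative name.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC syntax values from coefficients in scan order."""
    nonzero = [(pos, c) for pos, c in enumerate(coeffs) if c != 0]
    reverse_values = [c for _, c in reversed(nonzero)]

    # Step 1: total non-zero coefficients and trailing +/-1 values (max. 3),
    # counted from the high-frequency end.
    trailing_ones = 0
    for c in reverse_values:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if c < 0 else "+" for c in reverse_values[:trailing_ones]]
    # Step 3: levels of the remaining non-zero coefficients, in reverse order.
    levels = reverse_values[trailing_ones:]

    # Step 4: zeros located before the last non-zero coefficient.
    total_zeros = (nonzero[-1][0] + 1 - len(nonzero)) if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping as soon as all zeros are accounted for.
    runs, zeros_left = [], total_zeros
    positions = [pos for pos, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeff": len(nonzero),
        "trailing_ones": trailing_ones,
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    # c0, c1, c2 set to hypothetical values 3, -2, 4; the rest as in the example.
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

With these inputs the routine reports 6 non-zero coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, a total of 2 zeros, and runs 1, 0, 1, matching the steps of the example.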

-55-

VC Algorithm Entropy Coding
- CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is updated
- Both MVs and residual transform coefficients are coded by CABAC
Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm B Slice
Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
Direct mode
- Derives the reference picture, block size and motion vector data from the subsequent inter picture
Weighted prediction
- A scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm B Slice
Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change
[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

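VC Algorithm B Slice
Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures
[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1; tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference]

$mvL0 = tb \cdot mvCol / td$
$mvL1 = -(td - tb) \cdot mvCol / td$

where mvCol is an MV used in the co-located MB of the subsequent picture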
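A small worked example of the direct-mode scaling may help. The sketch below applies the two formulas with exact fractions so that no rounding convention has to be assumed; a real codec uses fixed-point scaling instead. The function name `temporal_direct` and the sample values are illustrative.

```python
from fractions import Fraction


def temporal_direct(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol.

    mv_col is an (x, y) pair; tb and td are temporal distances with td != 0.
    mvL0 = tb * mvCol / td, mvL1 = -(td - tb) * mvCol / td
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between its two references (tb = 1, td = 2).
    mv_l0, mv_l1 = temporal_direct((8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
```

The two derived vectors always differ by exactly mvCol (mvL1 = mvL0 - mvCol), so both point along the same motion trajectory, split in proportion to the temporal distances.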

-60-

VC Algorithm B Slice
Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
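As an illustration of the explicit type, the sketch below forms p = w1 r1 + w2 r2 sample by sample. It is a simplification under stated assumptions: the weights are given as plain floating-point numbers rather than the fixed-point weights and offsets carried in a real slice header, samples are 8-bit, and `weighted_bipred` is a hypothetical name.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Weighted bi-prediction p = w1*r1 + w2*r2 for two lists of samples.

    Results are rounded half up and clipped to the 8-bit range
    (assumption: 8-bit video, non-negative weights).
    """
    prediction = []
    for a, b in zip(r1, r2):
        value = int(w1 * a + w2 * b + 0.5)
        prediction.append(max(0, min(255, value)))
    return prediction


if __name__ == "__main__":
    ref1 = [100, 200, 40, 0]
    ref2 = [50, 100, 80, 0]
    print(weighted_bipred(ref1, ref2, 0.5, 0.5))     # plain average
    print(weighted_bipred(ref1, ref2, 0.25, 0.75))   # leaning on ref2
```

During a fade, weights that sum to less than 1 let the prediction follow the falling brightness, which a plain average of the two references cannot do.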

-61-

VC Algorithm SP and SI Slices (Extended profile only)
Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice, coded similarly to an I slice
[Figure: switching from Bitstream A (P(1,1) P(1,2) P(1,3) P(1,4) P(1,5)) to Bitstream B (P(2,1) P(2,2) P(2,3) P(2,4) P(2,5)) through the switching slice S(3)]
- Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C
2. Partition A holds the slice header and the header data for each MB in the slice
3. Partition B holds the coded residual data for intra and SI slice MBs
4. Partition C holds the coded residual data for inter-coded MBs
5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice
[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3 after a reliable parameter set exchange; example Parameter Set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11; VCL data is transferred with a reference to PS 3]

-65-

Error Resilience FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
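The two-slice-group scenario described above can be illustrated with a checkerboard assignment, in which every macroblock of one group is surrounded by macroblocks of the other. This is a minimal sketch of the idea: a checkerboard is only one possible dispersed pattern, and the function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(groups, x, y):
    """Count the left/right/up/down neighbors that belong to the other group.

    These are the neighbors still available for concealment if the whole
    slice group of macroblock (x, y) is lost.
    """
    height, width = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            if groups[ny][nx] != groups[y][x]:
                count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(11, 9)   # QCIF is 11 x 9 macroblocks
    for row in groups:
        print("".join(str(g) for g in row))
    print(neighbours_in_other_group(groups, 5, 4))
```

With this pattern, losing one slice group still leaves every missing macroblock with at least two received neighbors (four away from the picture border), which is what makes spatial concealment workable.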

-66-

Error Resilience Redundant Slice
- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the same bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T
Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: > 2x, 2x, 2x, T, T
Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x
Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T
Husky: 2x, 2x, > 1x, 2x, 2x, 2x
Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6
Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T
Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20
720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T
1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T
1080 (25p)
River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions
- H.264 outperforms the previous standards
Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products.
Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to the 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
- JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004
[64] http://www.imtc.org/activity_groups (JVT-EXPERTS list, FAQ)

-88-

[65] H.264/AVC reference software, JM 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All of these codecs have a trial version for download, and sample video clips are also available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
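
The logarithmic relation between the quantization parameter and the step size can be illustrated with a short sketch. It assumes the commonly cited base step sizes for QP 0 to 5 and the 8-bit QP range 0 to 51; neither is stated on these slides, and the function name `qstep` is only illustrative.

```python
# Base step sizes for QP = 0..5 (assumed values, not given on the slides).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantization step size for a given QP; the step doubles every 6 QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51 (8-bit video)")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the step size doubles for every increase of 6 in QP, a change of one QP unit corresponds to a step-size change of roughly 12%.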

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in MB predicted from pels just above the MB

• Horizontal: pels in MB predicted from pels just left of the MB

• DC: pels in MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values. The predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level
Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr
4x4 Inter Y, 4x4 Inter Cb/Cr
8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding
Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) and field scan patterns]

FRExt Only

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes, UVLC)

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
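
The Exp-Golomb construction and decoding steps can be sketched as follows. This is a minimal illustration that represents the bitstream as a string of '0' and '1' characters; the function names are hypothetical and not taken from any reference software.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":                   # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                                  # skip the terminating '1'
    info = int(bits[pos:pos + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, pos + m       # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword produced this way has length 2M + 1 bits, so codewords can be concatenated and decoded one after another without separators.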

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional. For HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_G + K_B = 1$:

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
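
As a numerical check of the chroma scale factors, the short sketch below derives them from K_R and K_B using the general formulas C_b = (B - Y) / (2(1 - K_B)) and C_r = (R - Y) / (2(1 - K_R)). The function names are illustrative, and the inputs are assumed to be normalized to the range 0 to 1.

```python
def chroma_scales(kr: float, kb: float):
    """Return (K_G, Cb scale, Cr scale) for given luma weights K_R and K_B."""
    kg = 1.0 - kr - kb
    cb_scale = 1.0 / (2.0 * (1.0 - kb))
    cr_scale = 1.0 / (2.0 * (1.0 - kr))
    return kg, cb_scale, cr_scale


def rgb_to_ycbcr(r: float, g: float, b: float, kr: float = 0.2126, kb: float = 0.0722):
    """Convert normalized RGB to (Y, Cb, Cr); no offsets or quantization applied."""
    kg, cb_scale, cr_scale = chroma_scales(kr, kb)
    y = kr * r + kg * g + kb * b
    return y, cb_scale * (b - y), cr_scale * (r - y)


if __name__ == "__main__":
    print(chroma_scales(0.2126, 0.0722))   # K_R, K_B values used on this slide
    print(chroma_scales(0.299, 0.114))     # BT.601 values
```

With these scale factors, Cb and Cr each span -0.5 to +0.5 for normalized inputs, which is why the factor 2(1 - K) appears in the denominators.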

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
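
The reversibility of this lifting-based conversion can be checked with a small sketch. It relies on Python's `>>` being an arithmetic (floor) shift for negative integers, matching the definition on the slide; the function names are illustrative only.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion; Cg and Co need one extra bit of range."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    samples = [(255, 0, 0), (0, 255, 0), (12, 200, 77), (255, 255, 255)]
    for rgb in samples:
        ycgco = rgb_to_ycgco(*rgb)
        print(rgb, "->", ycgco, "->", ycgco_to_rgb(*ycgco))
```

Each inverse step subtracts exactly the quantity that the corresponding forward step added, so the round trip is exact for any integer input regardless of how the shift rounds.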

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bitstreams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/s

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact
Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-10-

Introduction

Relationship to Other Standards
Same design to be approved in both ITU-T VCEG and ISO/IEC MPEG
In ITU-T VCEG this is a new & separate standard
– ITU-T Recommendation H.264
– ITU-T Systems (H.32x) is modified to support it
In ISO/IEC MPEG this is a new "part" in the MPEG-4 suite
– Separate codec design from prior MPEG-4 Visual (Part 2)
– New Part 10 called "Advanced Video Coding" (AVC; similar to "AAC" in MPEG-2 as a separate audio codec)
– Not backward or forward compatible with prior standards
– MPEG-4 Systems / File Format being modified to support it
H.222.0 | MPEG-2 Systems is also being modified to support it
IETF is working on RTP payload packetization

-11-

Introduction

History of H.264 / MPEG-4 Part 10
ITU-T Q.6/SG16 started work on H.26L (L: Long Range)
July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology
December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)
May 2003: Final approval from ISO/IEC and ITU-T
The standard is named H.264 by ITU-T and MPEG-4 Part 10 by ISO/IEC
Fidelity Range Extensions (August 2004): Amendment 1
Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 Part 10
Higher coding efficiency than previous standards: MPEG-1/2/4 Part 2, H.261, H.263
Simple syntax specifications
Seamless integration of video coding into all current protocols
More error robustness
Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
Network friendliness
Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction

Applications of H.264 / MPEG-4 Part 10
A broad range of applications for video content, including but not limited to the following:
Video streaming over the internet
CATV: Cable TV on optical networks, copper, etc.
DBS: Direct broadcast satellite video services
DSL: Digital subscriber line video services
DTTB: Digital terrestrial television broadcasting, cable modem, DSL
ISM: Interactive storage media (optical disks, etc.)
MMM: Multimedia mailing
MSPN: Multimedia services over packet networks
RTC: Real-time conversational services (videoconferencing, videophone, etc.)
RVS: Remote video surveillance
SSM: Serial storage media (digital VTR, etc.)
D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications
Profile: a subset of the entire bitstream syntax; decoder designs differ based on the profile
– Four profiles: Baseline, Main, Extended and High

Applications | Profile
Video conferencing, videophone | Baseline
Digital storage media, television broadcasting | Main
Streaming video | Extended
Content contribution, content distribution, studio editing, post processing | High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
– I slice (Intra-coded slice): the slice is coded using prediction only from decoded samples within the same slice
– P slice (Predictive-coded slice): the slice is coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
– Common parts: I slice, P slice, CAVLC
– FMO (Flexible macroblock order): macroblocks may not necessarily be in raster scan order. The map assigns macroblocks to a slice group
– ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
– RS (Redundant slice): this slice belongs to the redundant coded data, obtained at the same or a different coding rate in comparison with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
– Common parts: I slice, P slice, CAVLC
– B slice (Bi-directionally predictive-coded slice): the slice is coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
– Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
– Common parts: I slice, P slice, CAVLC
– SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
– SI slice: the switched slice, similar to coding of an I slice
– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
– Flexible macroblock order (FMO)
– Arbitrary slice order (ASO)
– Redundant slice (RS)
– B slice
– Weighted prediction

-21-

Introduction

Profile specifications

Feature | Baseline | Main | Extended | High
I & P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
1/4 Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools (Flexible MB Order, ASO, Redundant Slices) | X | | X |
SP/SI Slices | | | X |
B Slice | | X | X | X
Interlaced Coding | | X | X | X
CABAC | | X | | X
Data Partitioning | | | X |

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction

Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
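
As a worked example of how the first two columns of this table constrain a picture format, the sketch below counts 16x16 macroblocks per frame and looks for the lowest level whose frame-size and macroblock-rate limits are both met. It is a simplified illustration: it ignores the bit rate, buffer and motion-vector limits, and the function names are hypothetical.

```python
# (level, max macroblock processing rate in MB/s, max frame size in MBs),
# copied from the first three columns of the level table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks, rounding each dimension up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width: int, height: int, fps: float):
    """Lowest level meeting the frame-size and MB-rate limits, or None."""
    mbs = macroblocks_per_frame(width, height)
    rate = mbs * fps
    for level, max_rate, max_frame_size in LEVEL_LIMITS:
        if mbs <= max_frame_size and rate <= max_rate:
            return level
    return None


if __name__ == "__main__":
    for fmt in [(176, 144, 15), (352, 288, 30), (720, 576, 25), (1920, 1080, 30)]:
        print(fmt, "->", lowest_level(*fmt))
```

For example, CIF (352x288) contains 22 x 18 = 396 macroblocks, so 30 frames/s requires 11 880 MB/s, which is exactly the Level 1.3 limit.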

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
– Abstracts the VCL data, hence the name Network 'Abstraction' Layer
– Header information about the VCL format
– Appropriate for conveyance by the transport layers or storage media
– NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
– Core coding layer
– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure

Supported picture format: 4:2:0 chroma sampling

[Figure: CIF and QCIF picture formats. CIF: Y is 352 pels (360) by 288 lines; Cb and Cr are each 176 pels (180) by 144 lines. QCIF: Y is 176 pels (180) by 144 lines; Cb and Cr are each 88 pels (90) by 72 lines]

-28-

Video Coding Algorithm

Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Video Input, Intra/Inter Mode Decision, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Intra Prediction, Motion Estimation, Motion Compensation]

-29-

Video Coding Algorithm

Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra/Inter Mode Selection, Intra Prediction, Motion Compensation, Deblocking Filter, Picture Buffering, Video Output]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction)
(vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4x4 block of samples a to p with neighboring samples A to H above, I to L to the left and M at the top-left corner; arrows mark the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, ..., p: the predicted ones for the current block
Samples A, B, ..., M (above and left): previously reconstructed ones

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block

Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)

Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: 4x4 block of samples a to p with neighbors A to H, I to L and M, shown for mode 4 and for mode 8]
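
The two example predictors can be written with integer arithmetic as in the sketch below. It assumes that round() means rounding half up, implemented by adding half the divisor before a right shift; the helper names and the neighbor dictionary are illustrative, and only samples a and d are computed.

```python
def filt3(p: int, q: int, r: int) -> int:
    """round(p/4 + q/2 + r/4) using integer arithmetic."""
    return (p + 2 * q + r + 2) >> 2


def avg2(p: int, q: int) -> int:
    """round(p/2 + q/2) using integer arithmetic."""
    return (p + q + 1) >> 1


def predict_a_d(mode: int, nb: dict):
    """Predicted values of samples a and d for modes 4 and 8 only."""
    if mode == 4:    # diagonal down-right
        return filt3(nb["I"], nb["M"], nb["A"]), filt3(nb["B"], nb["C"], nb["D"])
    if mode == 8:    # horizontal-up
        return avg2(nb["I"], nb["J"]), filt3(nb["J"], nb["K"], nb["L"])
    raise ValueError("only modes 4 and 8 are sketched here")


if __name__ == "__main__":
    # Arbitrary example values for the reconstructed neighbors.
    neighbors = {"A": 100, "B": 104, "C": 108, "D": 112,
                 "I": 96, "J": 92, "K": 88, "L": 84, "M": 98}
    print("mode 4:", predict_a_d(4, neighbors))
    print("mode 8:", predict_a_d(8, neighbors))
```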

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

-33-

VC Algorithm Intra Prediction

Chroma always operates using full MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
– An MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas, small ones suit detailed areas

The two partition levels cannot be mixed, i.e. an MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
Better compression performance than integer-pel MC
At the expense of increased complexity
Outperforms at high bit rates and high resolutions

Motion vector accuracy: 1/4 (6-tap filter)

[Figure: encoder block diagram, repeated]

[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8, and sub-MB partitions 8x8, 8x4, 4x8 and 4x4, with partition indices]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
A distinct MV can be sent for each sub-MB partition
ME can be based on multiple pictures that lie in the past or in the future in display order
The reference picture for ME is selected at the MB partition level
Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case labels) with the half-pel and quarter-pel positions (lower-case labels) between them; b lies between G and H, and a lies between G and b]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
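
The two interpolation formulas can be sketched in integer arithmetic as follows. The sketch assumes 8-bit samples, rounding half up, and clipping of the filtered value to 0..255; the clipping step and the function names are assumptions not shown on the slide.

```python
def clip255(v: int) -> int:
    """Clip to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32), clipped to 8 bits."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """a = round((G + b) / 2): bilinear average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]        # E F G H I J across a sharp edge
    b = half_pel(*row)
    a = quarter_pel(row[2], b)            # between G and b
    print("b =", b, "a =", a)
```

The negative taps of the 6-tap filter can overshoot the input range near sharp edges, which is why a clipping step is needed in practice.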

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures (short term, long term)

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, repeated]

-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier-free, using only additions and shifts in 16-bit arithmetic
Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks:
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT.

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

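VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$, and $\otimes$ implies element-by-element multiplication.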

-43-

4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.
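The forward and inverse relations can be checked numerically with a short sketch. It leaves quantization out and uses exact fractions, and it relies on two assumptions that are not spelled out on the slides: the inverse core matrix `CI` is the one usually given in the H.264 literature, and the scaling is applied as the combined element-wise product of E_f and E_i, which is rational because a = 1/2 and b^2 = 2/5. All names are hypothetical.

```python
from fractions import Fraction as F

# Forward core matrix Cf (the matrix H of the 4 x 4 integer DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Inverse core matrix Ci: assumed from the literature, not shown on the slide.
CI = [[F(1), F(1), F(1), F(1)],
      [F(1), F(1, 2), F(-1, 2), F(-1)],
      [F(1), F(-1), F(-1), F(1)],
      [F(1, 2), F(-1), F(1), F(-1, 2)]]

# Element-wise product Ef * Ei: entries a^4 = 1/16, a^2 b^2 / 2 = 1/20, b^4 / 4 = 1/25.
P = (F(1, 4), F(1, 5), F(1, 4), F(1, 5))
SCALE = [[P[i] * P[j] for j in range(4)] for i in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core(x):
    """W = Cf X Cf^T, computed with integer additions and multiplications."""
    return matmul(matmul(CF, x), transpose(CF))


def inverse(w):
    """X = Ci^T (W * Ef * Ei) Ci with element-wise scaling."""
    scaled = [[w[i][j] * SCALE[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CI), scaled), CI)


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(inverse(forward_core(block)) == block)
```

Without quantization the round trip is exact, which illustrates why the scaling matrices can be folded into the quantizer instead of being applied inside the transform.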

-44-

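VC Algorithm: Transform

Luma DC coefficients for Intra 16x16 MB
The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform:

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) = rounding to the nearest integer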

-45-

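VC Algorithm: Transform

Chroma DC coefficients
Intra prediction mode, (4x4) IntDCT, Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: block numbering for the 4:2:0 chroma components U and V: 2x2 DC blocks 16 and 17, AC blocks 18 to 25]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.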

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the transform]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

Encoder: post-scaling and quantization

$$Y_{ij} = \mathrm{round}\left( X_{ij} \frac{SF_{ij}}{Qstep} \right)$$

Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
FRExt expands the QP range beyond 52 values by 6 for each additional bit of decoded sample
SF: scaling term
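The doubling of Qstep for every increment of 6 in QP can be illustrated with a small sketch. The six base values for QP = 0 to 5 are the ones commonly tabulated for H.264 and are an assumption here, since the slide does not list them. The scaling term SF is taken as 1 for simplicity, and the function names are hypothetical.

```python
import math

# Step sizes for QP = 0..5 (assumed from the usual H.264 tables).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # Six values per octave; every increment of 6 in QP doubles the step.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(x, qp):
    """Y = round(X / Qstep), rounding half away from zero, with SF = 1."""
    v = x / qstep(qp)
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def rescale(y, qp):
    """X' = Y * Qstep, with SF = 1."""
    return y * qstep(qp)


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    print(quantize(100, 28), rescale(quantize(100, 28), 28))
```

Six step sizes per doubling give an increase of roughly 12.5% per QP step, the logarithmic control that the comparison table at the end of the deck refers to.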

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block. The 4x4 luma DC stage is used in the Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding
All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
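The two scan tables can be read as "position in the output sequence for each coefficient of the 4 x 4 block". A minimal sketch of that reading follows. The helper name `scan` and the sample block are hypothetical.

```python
# Scan position of each coefficient, indexed as [row][column].
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4 x 4 block into a 16-entry list following a scan table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0],
             [2, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print(scan(block, ZIGZAG))
    print(scan(block, ALTERNATE))
```

The alternate scan walks down the columns first, so it reaches vertical frequencies earlier than the zig-zag scan does.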

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits
$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
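The construction [M zeros][1][INFO] and the three decoding steps can be sketched as follows. Bit strings are represented as Python strings of '0' and '1' for readability, and the function names are hypothetical; a real codec would work on a packed bitstream.

```python
def ue_encode(code_num):
    """Exp-Golomb codeword for a non-negative code_num, as a bit string."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count leading zeros
        m += 1
    start = pos + m + 1                        # skip the marker bit '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m      # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        print(n, ue_encode(n))
```

Each codeword is 2M + 1 bits long, so small code_num values, which are assumed to be the most frequent, get the shortest codes.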

-53-

VC Algorithm: Entropy Coding
CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1s are coded. For the other coefficients, their levels are coded.

Encoding steps
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining nonzero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 ... 15
coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0

Step 1: encode the number of nonzero coefficients and the number of 1 or -1 (trailing ones) values, from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
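The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not perform the table look-ups that produce the actual bits. The function name is hypothetical, and the values 3, -2 and 4 stand in for c0, c1 and c2, which the slide leaves symbolic.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (steps 1 to 5) from scanned coefficients."""
    nz = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nz)

    # Step 1: trailing ones are at most three +/-1 values at the end.
    trailing_ones = 0
    for _, c in reversed(nz):
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    split = total_coeffs - trailing_ones
    # Step 2: signs of the trailing ones, highest order first.
    signs = ["-" if c < 0 else "+" for _, c in reversed(nz[split:])]
    # Step 3: remaining levels, highest order first.
    levels = [c for _, c in reversed(nz[:split])]
    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nz[-1][0] + 1 - total_coeffs if nz else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once all zeros are accounted for.
    runs = []
    zeros_left = total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k][0] - nz[k - 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": trailing_ones,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    # c0, c1, c2 replaced by the placeholder values 3, -2, 4.
    example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(example))
```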

-55-

VC Algorithm: Entropy Coding
CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated.
Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
step 1: context modeling. Choose a suitable model.

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.

step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but it is computationally more intensive.

-57-

VC Algorithm: B Slice
Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode

Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture and List 1 reference, with the direct-mode partition, the co-located partition, the motion vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$mvL0 = tb \cdot mvCol / td$
$mvL1 = -(td - tb) \cdot mvCol / td$

where mvCol is the MV used in the co-located MB of the subsequent picture
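A numeric illustration of the two scaling formulas is given below. It uses exact fractions for clarity; the fixed-point arithmetic of an actual decoder is not reproduced, and the function name is hypothetical.

```python
from fractions import Fraction


def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into mvL0 and mvL1.

    mv_col: (x, y) motion vector of the co-located partition
    tb:     temporal distance, current picture to List 0 reference
    td:     temporal distance, List 1 reference to List 0 reference
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(Fraction(-(td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    mv_l0, mv_l1 = direct_mode_mvs((8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
```

By construction mvL1 = mvL0 - mvCol, so both vectors lie on the motion trajectory described by mvCol.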

-60-

VC Algorithm: B Slice
Weighted prediction

Different weights of the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
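The weighted sum p = w1 r1 + w2 r2 can be sketched for a row of samples as follows. The weights are passed in directly, as in the explicit type. Rounding to the nearest integer and clipping to 8 bits are assumptions of this sketch, and the function name is hypothetical.

```python
def weighted_prediction(r1, r2, w1, w2):
    """Weighted bi-prediction p = w1*r1 + w2*r2 for two sample lists."""
    out = []
    for s1, s2 in zip(r1, r2):
        value = int(w1 * s1 + w2 * s2 + 0.5)   # round to nearest integer
        out.append(max(0, min(255, value)))    # clip to the 8-bit range
    return out


if __name__ == "__main__":
    ref1 = [100, 200, 50]
    ref2 = [50, 100, 150]
    print(weighted_prediction(ref1, ref2, 0.5, 0.5))    # plain average
    print(weighted_prediction(ref1, ref2, 0.25, 0.0))   # fade towards black
```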

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice: a switched slice coded similarly to an I slice

[Figure: bitstream A with pictures P(1,1) to P(1,5), bitstream B with pictures P(2,1) to P(2,5), and the switching picture S(3) between them]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Each partition A, B & C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice.

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3; reliable parameter set exchange; VCL data transfer with PS 3. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO
Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO. It randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data.
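The dispersed two-group allocation described for FMO can be illustrated with a checkerboard map. The checkerboard is only one possible dispersed pattern and is chosen here as an assumption; the function names are hypothetical.

```python
def checkerboard_map(width, height):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width)] for y in range(height)]


def neighbours_in_other_group(mb_map, x, y):
    """Count the 4-connected neighbours of MB (x, y) in the other group."""
    own = mb_map[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        inside = 0 <= ny < len(mb_map) and 0 <= nx < len(mb_map[0])
        if inside and mb_map[ny][nx] != own:
            count += 1
    return count


if __name__ == "__main__":
    mb_map = checkerboard_map(4, 3)
    for row in mb_map:
        print(row)
    print(neighbours_in_other_group(mb_map, 1, 1))
```

With this pattern every existing 4-connected neighbour of a lost macroblock belongs to the other slice group, so concealment still has data to interpolate from when one group is lost.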

-66-

Error Resilience: Redundant Slice
Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman | >1x, 2x, 2x, T | 2x, >2x, T, T

Paris | >1x, 2x, 2x, 2x | 2x, T, 2x, T

Head | >2x, 2x, 2x, T, T

Zoom | >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football | 2x, 1x, 2x, 2x | >1x, >1x, 1x, >1x

Mobile | 2x, 1x, 2x, 2x | >2x, 4x, >2x, T

Husky | 2x, 2x, >1x, 2x, 2x, 2x

Tempete | 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football | >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile | 4x, 2.7x, 2x, T, T | >4x, >2.7x, >2x, T, T

Husky | >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete | T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew | 1.7x, 2x, T | 1.7x, 2x, T
Harbour | T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan | 1x, 2x
New Mobile & Calendar | T, 2x, T | T, 2x, T

1080 (25p)
River Bed | >1.7x, >1x, T | >1.7x, >1x, T
Vintage Car | 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile
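For reference, the PSNR used as the objective measure can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) and flat lists of sample values; the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized lists of sample values."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    print(psnr([100, 100, 100, 100], [101, 99, 100, 100]))
```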

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions
H.264 outperforms the previous standards
Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products.
Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr K R Rao, UTA: rao@uta.edu
Dr S K Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms A Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: published standard costs CHF 26000 Swiss francs)

-91-

Fidelity Range Extensions
Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless Entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction
9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT for FRExt only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
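The first three of these modes are simple enough to sketch directly; planar prediction is left out. The block size `n` is a parameter so that a small 4-sample example can be printed. The rounding used in the DC mode and the function names are assumptions of this sketch, and the case of unavailable neighbours is not handled.

```python
def predict_vertical(top, n):
    """Every row of the block repeats the row of pels just above it."""
    return [list(top[:n]) for _ in range(n)]


def predict_horizontal(left, n):
    """Every column of the block repeats the column of pels to its left."""
    return [[left[r]] * n for r in range(n)]


def predict_dc(top, left, n):
    """All pels take the rounded average of the 2n neighbouring pels."""
    dc = (sum(top[:n]) + sum(left[:n]) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


if __name__ == "__main__":
    top = [10, 20, 30, 40]     # reconstructed pels above the block
    left = [50, 50, 50, 50]    # reconstructed pels left of the block
    print(predict_vertical(top, 4))
    print(predict_horizontal(left, 4))
    print(predict_dc(top, left, 4))
```

An encoder would typically compute all candidate predictions and keep the mode with the smallest prediction error.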

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction
Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

The matrices can be transmitted in the SPS and PPS
Separate matrices for the 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices
These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on the decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

Frame: zig-zag scan; Field: field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

M = floor(log2 [code_num + 1])

INFO = code_num + 1 – 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1

2. Read the M-bit INFO field

3. code_num = 2^M + INFO – 1

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes
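The Exp-Golomb construction and decoding steps described on these slides can be sketched in a few lines of Python. This is a minimal illustration working on strings of '0'/'1' characters rather than a real bitstream; the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a '0'/'1' string."""
    m = 0
    while bits[m] == "0":                    # step 1: count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1               # step 3: code_num = 2^M + INFO - 1


for n in range(8):
    print(n, exp_golomb_encode(n))
```

Each codeword is 2M + 1 bits long, so small values of code_num, which are assumed to be the most frequent, get the shortest codes.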

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to Y Cb Cr

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

With K_R = 0.2126 and K_B = 0.0722 (K_R + K_B + K_G = 1):

\[ Y = 0.2126 R + 0.7152 G + 0.0722 B \]

\[ C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y) \]

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)

-120-

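Rounding error in RGB to Y Cb Cr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right], \qquad C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right], \qquad C_o = \frac{R - B}{2} \]

To further avoid any rounding error, add only one bit of precision to the chroma samples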

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R – B

t = B + (Co >> 1)

Cg = G – t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y – (Cg >> 1)

G = t + Cg

B = t – (Co >> 1)

R = B + Co
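The forward and inverse color space conversions on this slide are exactly reversible in integer arithmetic, which a short Python sketch can demonstrate. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, matching the operation the slide describes.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only additions and arithmetic shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


sample = (255, 0, 0)                       # pure red, 8-bit RGB
print(rgb_to_ycgco(*sample))               # (63, -127, 255)
print(ycgco_to_rgb(*rgb_to_ycgco(*sample)))  # (255, 0, 0)
```

Note that Cg and Co are signed and span roughly twice the range of the inputs, which is consistent with the one extra bit of chroma precision mentioned earlier.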

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:

1. MPEG LA: www.mpegla.com

2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-11-

Introduction

History of H.264 / MPEG-4 part 10

ITU-T Q.6/SG16 started work on H.26L (L: Long Range)

July 2001: H.26L demonstrated at MPEG (Moving Picture Experts Group) call for technology

December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT)

May 2003: Final approval from ISO/IEC and ITU-T

The standard is named H.264 by ITU-T and MPEG-4 part 10 by ISO/IEC

Fidelity Range Extensions (August 2004): Amendment 1

Transport of MPEG-4 AVC on MPEG-2 TS: Amendment 3

-12-

Introduction

Purpose of H.264 / MPEG-4 part 10

Higher coding efficiency than previous standards: MPEG-1, 2, 4 part 2, H.261, H.263

Simple syntax specifications

Seamless integration of video coding into all current protocols

More error robustness

Various applications like video broadcasting, video streaming, video conferencing, D-Cinema, HDTV

Network friendliness

Balance between coding efficiency, implementation complexity and cost, based on the state of the art in VLSI design technology

-13-

Introduction: H.264 / MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:

Video streaming over the internet

CATV: Cable TV on optical networks, copper, etc.

DBS: Direct broadcast satellite video services

DSL: Digital subscriber line video services

DTTB: Digital terrestrial television broadcasting, cable modem, DSL

ISM: Interactive storage media (optical disks, etc.)

MMM: Multimedia mailing

MSPN: Multimedia services over packet networks

RTC: Real-time conversational services (videoconferencing, videophone, etc.)

RVS: Remote video surveillance

SSM: Serial storage media (digital VTR, etc.)

D-Cinema, content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit stream syntax; different decoder designs based on the profile

– Four profiles: Baseline, Main, Extended and High

Profile: Applications

Baseline: Video conferencing, videophone

Main: Digital storage media, television broadcasting

Extended: Streaming video

High: Content contribution, content distribution, studio editing, post processing

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles

– I slice (Intra-coded slice): the slice coded by using prediction only from decoded samples within the same slice

– P slice (Predictive-coded slice): the slice coded by using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block

– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile

– Common parts: I slice, P slice, CAVLC

– FMO (Flexible macroblock order): macroblocks may not necessarily be in the raster scan order. The map assigns macroblocks to a slice group

– ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture

– RS (Redundant slice): this slice belongs to the redundant coded data obtained at the same or a different coding rate in comparison with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile

– Common parts: I slice, P slice, CAVLC

– B slice (Bi-directionally predictive-coded slice): the slice coded by using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block

– Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile

– Common parts: I slice, P slice, CAVLC

– SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice

– SI slice: the switched slice, similar to coding of an I slice

– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit

– Flexible macroblock order (FMO)

– Arbitrary slice order (ASO)

– Redundant slice (RS)

– B slice

– Weighted prediction

-21-

Introduction: Profile specifications

Feature                                               Baseline  Main  Extended  High
I & P Slices                                             X       X       X       X
Deblocking Filter                                        X       X       X       X
1/4 Pel Motion Compensation                              X       X       X       X
Variable Block Size (16x16 to 4x4)                       X       X       X       X
CAVLC/UVLC                                               X       X       X       X
Error Resilience Tools (Flexible MB Order, ASO,
  Redundant Slices)                                      X               X
SP/SI Slices                                                             X
B Slice                                                          X       X       X
Interlaced Coding                                                X       X       X
CABAC                                                            X               X
Data Partitioning                                                        X

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles

Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)

Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP

Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)

Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP

Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number: Picture type & frame rate

1: QCIF, 15 fps

1.1: QCIF, 30 fps

1.2: CIF, 15 fps

1.3: CIF, 30 fps

2: CIF, 30 fps

2.1: HHR, 15 or 30 fps

2.2: SDTV, 15 fps

3: SDTV 720x480x30i, 720x576x25i, 10 Mbps (max)

3.1: 1280x720x30p

3.2: 1280x720x60p

4: HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)

4.1: HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)

4.2: HDTV 1920x1080x60i, 2Kx1Kx60p

5: SHDTV/D-Cinema 2.5Kx2Kx30p

5.1: SHDTV/D-Cinema 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns:
(a) Level number
(b) Max macroblock processing rate (MB/s)
(c) Max frame size (MBs)
(d) Max decoded picture buffer size (1024 bytes)
(e) Max video bit rate (1000 bits/s or 1200 bits/s)
(f) Max CPB size (1000 bits or 1200 bits)
(g) Vertical MV component range (luma frame samples)
(h) Min compression ratio
(i) Max number of MVs per two consecutive MBs

(a)    (b)        (c)       (d)        (e)        (f)        (g)                 (h)  (i)
1      1 485      99        148.5      64         175        [-64, +63.75]       2    -
1.1    3 000      396       337.5      192        500        [-128, +127.75]     2    -
1.2    6 000      396       891.0      384        1 000      [-128, +127.75]     2    -
1.3    11 880     396       891.0      768        2 000      [-128, +127.75]     2    -
2      11 880     396       891.0      2 000      2 000      [-128, +127.75]     2    -
2.1    19 800     792       1 782.0    4 000      4 000      [-256, +255.75]     2    -
2.2    20 250     1 620     3 037.5    4 000      4 000      [-256, +255.75]     2    -
3      40 500     1 620     3 037.5    10 000     10 000     [-256, +255.75]     2    32
3.1    108 000    3 600     6 750.0    14 000     14 000     [-512, +511.75]     4    16
3.2    216 000    5 120     7 680.0    20 000     20 000     [-512, +511.75]     4    16
4      245 760    8 192     12 288.0   20 000     25 000     [-512, +511.75]     4    16
4.1    245 760    8 192     12 288.0   50 000     62 500     [-512, +511.75]     2    16
4.2    491 520    8 192     12 288.0   50 000     62 500     [-512, +511.75]     2    16
5      589 824    22 080    41 310.0   135 000    135 000    [-512, +511.75]     2    16
5.1    983 040    36 864    69 120.0   240 000    240 000    [-512, +511.75]     2    16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL

– Abstracts the VCL data (hence the name Network 'Abstraction' Layer)

– Header information about the VCL format

– Appropriate for conveyance by the transport layers or storage media

– NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL

– Core coding layer

– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supporting picture format, 4:2:0 chroma sampling

[Figure: CIF and QCIF picture formats with 4:2:0 chroma sampling]

CIF format: Y 352 (360) pels x 288 lines; Cb and Cr 176 (180) pels x 144 lines each

QCIF format: Y 176 (180) pels x 144 lines; Cb and Cr 88 (90) pels x 72 lines each

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Entropy Coding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision. Input: video; output: bitstream]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Compensation, Intra Prediction, Intra/Inter Mode Selection. Input: bitstream; output: video]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction)

(vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: the sample layout above, shown twice with arrows for the prediction directions of modes 0, 1, 4, 5, 6 and of modes 3, 7, 8]

Samples a, b, …, p: the predicted samples of the current block

Samples A, B, …, M: previously reconstructed samples above and to the left

-31-

VC Algorithm: Intra Prediction

Example of a 4 x 4 luma block

Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively

Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: the sample layout above, shown with the prediction directions of mode 4 and mode 8]

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy

Prediction of variable block sizes

Sub-pel motion compensation

Deblocking filter

Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size

– An MB can be partitioned into smaller block sizes

– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB

– Large partition size: homogeneous areas; small: detailed areas

Cannot mix the two partitions, i.e., cannot have 16x8 and 4x8 partitions

When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation

Better compression performance than integer-pel MC, at the expense of increased complexity

Outperforms at high bit rates and high resolutions

[Figure: H.264 encoder block diagram, highlighting motion estimation and motion compensation; motion vector accuracy 1/4 (6-tap filter)]

MB partitions: 16x16, 16x8, 8x16, 8x8

Sub-MB partitions: 8x8, 8x4, 4x8, 4x4

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy

A distinct MV can be sent for each sub-MB partition

ME can be based on multiple pictures that lie in the past or in the future in display order

The reference picture for ME is selected at the MB partition level

Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) with the interpolated half-pel and quarter-pel samples (lower-case letters); b is the half-pel sample between G and H on the row E F G H I J, and a is the quarter-pel sample between G and b]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)

a = round((G + b) / 2)
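The two interpolation formulas above can be written in integer arithmetic as a minimal Python sketch. The function names are hypothetical, and the clipping to the 8-bit sample range is an assumption added here so that filter overshoot stays within valid pel values.

```python
def clip_pel(value, bit_depth=8):
    """Clamp a value to the valid sample range (assumed 8-bit by default)."""
    return max(0, min((1 << bit_depth) - 1, value))


def half_pel(e, f, g, h, i, j):
    """6-tap FIR (1, -5, 20, 20, -5, 1)/32 between integer pels g and h."""
    # adding 16 before the shift by 5 rounds the division by 32
    return clip_pel((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighboring half- or integer-pel samples."""
    return (p + q + 1) >> 1


row = [0, 0, 0, 255, 255, 255]        # E, F, G, H, I, J across a sharp edge
b = half_pel(*row)                    # half-pel sample between G and H
a = quarter_pel(row[2], b)            # quarter-pel sample between G and b
print(b, a)
```

The negative taps sharpen the interpolated signal compared with plain bilinear averaging, which is why overshoot (and therefore clipping) can occur near strong edges.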

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

To reduce the blocking artifacts at block boundaries and prevent the propagation of accumulated coding noise

Filtering is applied adaptively to horizontal or vertical edges of 4 x 4 blocks in a macroblock, on several levels (slice, block-edge, sample)

[Figure: two 16x16 macroblocks showing the vertical and horizontal edges filtered for luma and for chroma]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures

To take care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram, highlighting picture buffering; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform

Integer transform: multiplier-free (additions and shifts in 16-bit arithmetic)

Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), …, (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

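VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f X C_f^T) \otimes E_f \]

\[ Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix} \]

\[ a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2} \]

\(\otimes\) implies element-by-element multiplication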

-43-

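4x4 Inverse IntDCT

\[ X = C_i^T (Y \otimes E_i) C_i \]

Here

\[ Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \]

and, for the forward transform,

\[ E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix} \]

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices E_f and E_i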

-44-

VC Algorithm Transform

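Luma DC coefficients for Intra 16x16 MB

The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[ Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) \]

where round( ) = rounding to the nearest integer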

-45-

VC Algorithm Transform

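Chroma DC coefficients (intra prediction mode, (4x4) IntDCT)

Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[ Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]

[Figure: 4:2:0 chroma block numbering. Blocks 16 and 17: 2x2 DC of U and V; blocks 18 to 21: U AC; blocks 22 to 25: V AC]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased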

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: H.264 encoder block diagram, highlighting the Transform & Quantization block]

- 4 x 4 integer DCT transform

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
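The 4 x 4 integer transform matrix with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1) can be applied as a plain integer matrix product. The following Python sketch computes only the core part, matrix times block times transposed matrix; the post-scaling that the slides fold into quantization is deliberately left out, and the helper names are hypothetical.

```python
# Core transform matrix as given on the slide (integer entries only)
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Return CF * block * CF^T for a 4x4 block of residual samples."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[3] * 4 for _ in range(4)]          # a constant residual block
coeffs = forward_core_transform(flat)
print(coeffs[0])                            # only the DC term is non-zero
```

Because every entry is 1 or 2 in magnitude, a hardware or optimized software version can replace the multiplications with additions and shifts, which is the multiplier-free property mentioned earlier in the slides.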

-47-

VC Algorithm Quantization

Multiplication operation for the exact transform is combined with the multiplication of scalar quantization

Encoder: post-scaling and quantization

\[ Y_{ij} = X_{ij} \, \mathrm{round}\left( \frac{SF_{ij}}{Qstep} \right) \]

Decoder: inverse quantization and pre-scaling

\[ X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

X: quantizer input

Y: quantizer output

Qstep: quantization parameter; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample

SF: scaling term
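The statement that Qstep doubles for every increment of 6 in QP, over 52 values, can be made concrete with a short Python sketch. The six base step sizes (0.625 to 1.125 for QP 0 to 5) are the values commonly tabulated for H.264 and are an assumption here, since they do not appear on the slide; the function name is hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (not given on the slide)
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for 8-bit video, QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8 bits per sample")
    # the step size doubles each time QP grows by 6
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

With these base values the step size ranges from 0.625 at QP 0 to 224 at QP 51, so one QP step changes the step size by roughly 12%.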

-48-

VC Algorithm Transform Quantization

[Figure: transform and quantization flow]

Encoder part: input block → forward transform → post-scaling and quantization → 2x2 or 4x4 DC transform (chroma or Intra-16 luma only) → encoder output

Decoder part: decoder input → 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only) → inverse quantization and pre-scaling (rescale) → inverse transform → output block

The DC transform stages are used for chroma and for the Intra (16x16) prediction mode only

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)

Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate

Context-based Adaptive Variable Length Coding (CAVLC) in all profiles

Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
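The two scan tables on this slide give, for each position in the 4 x 4 block, its index in the one-dimensional scan. A minimal Python sketch of how such a table reorders a block follows; the function name and the sample block values are hypothetical.

```python
# Scan index of each (row, column) position, copied from the slide
ZIGZAG_SCAN = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_SCAN = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, table):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[table[r][c]] = block[r][c]
    return out


block = [                      # hypothetical quantized coefficients
    [7, 6, 1, 0],
    [-2, 1, -1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(scan_block(block, ZIGZAG_SCAN))
print(scan_block(block, ALTERNATE_SCAN))
```

The alternate scan walks down the columns first, which suits field-coded blocks where vertical detail is stronger; both scans aim to push the zero coefficients to the end of the list.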

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information

The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

M = floor(log2 [code_num + 1])

INFO = code_num + 1 – 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1

2. Read in the M-bit INFO field

3. code_num = 2^M + INFO – 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded. For the other coefficients, their levels are coded

Encoding steps

step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values

step 2: encode the sign of each trailing one in reverse order

step 3: encode the levels of the remaining non-zero coefficients in reverse order

step 4: encode the total number of zeros before the last coefficient

step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 –1 0 0 … 0

order: 0 1 2 3 4 5 6 7 8 9 … 15

Step 1: encode the no. of nonzero total coefficients and of 1 or –1 values (trailing ones) from a look-up table

no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: – (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
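The five quantities derived in the CAVLC example above can be computed mechanically from the scanned coefficient list. The Python sketch below only extracts these symbols; it does not perform the table look-ups that map them to codewords. The function name and the concrete values chosen for c0, c1, c2 (7, 6, -2) are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one scanned block of coefficients."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # orders of nonzero coeffs
    total_coeff = len(nz)

    # Step 1: trailing ones = up to three +/-1 values at the end of the scan
    t1 = 0
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and t1 < 3:
            t1 += 1
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order
    signs = ["-" if coeffs[i] < 0 else "+" for i in reversed(nz[total_coeff - t1:])]
    # Step 3: levels of the remaining nonzero coefficients, in reverse order
    levels = [coeffs[i] for i in reversed(nz[:total_coeff - t1])]
    # Step 4: zeros located before the last nonzero coefficient
    total_zeros = nz[-1] + 1 - total_coeff if nz else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once all zeros have been accounted for
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return total_coeff, t1, signs, levels, total_zeros, runs


example = [7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 chosen arbitrarily
print(cavlc_symbols(example))
```

For the example list this reproduces the values worked out on the slide: 6 nonzero coefficients, 3 trailing ones with signs (-, +, +), 2 zeros in total, and runs (1, 0, 1).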

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated

Both MV and residual transform coefficients are coded by CABAC

Encoding steps

step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode

Two forward references: proper for a region just before a scene change

Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing the three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm B Slice Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, showing the direct-mode partition, the co-located partition, the vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$$\mathrm{mvL0} = \frac{tb}{td}\,\mathrm{mvCol}, \qquad \mathrm{mvL1} = -\frac{td - tb}{td}\,\mathrm{mvCol}$$

where mvCol is the MV used in the co-located MB of the subsequent picture
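A minimal sketch of the direct-mode scaling, mvL0 = tb · mvCol / td and mvL1 = -(td - tb) · mvCol / td, is given below. The function name and the use of floating-point arithmetic are assumptions made for readability; a real codec works with fixed-point scaling and rounding.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located vector mvCol into the pair (mvL0, mvL1).

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances as labeled on the slide (td must be non-zero)
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between the two references: the vector is split evenly.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

Note that mvL0 - mvL1 always equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector.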

-60-

VC Algorithm B Slice Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
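The relation p = w1 r1 + w2 r2 can be sketched as follows. The implicit weights here are a simplified assumption (weights proportional to temporal proximity, with `tb` the distance from r1 to the current picture and `td` the distance from r1 to r2); the standard derives them with fixed-point arithmetic, which this sketch does not reproduce.

```python
def implicit_weights(tb, td):
    """Simplified implicit weights: the temporally closer reference gets the larger weight."""
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1, w2):
    """Sample-wise p = w1 * r1 + w2 * r2 for two equally sized reference blocks."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


# The current picture is close to r1 (tb = 1 out of td = 4), so r1 dominates.
w1, w2 = implicit_weights(tb=1, td=4)
p = weighted_prediction([100, 100], [20, 60], w1, w2)
```

In the explicit type, `w1` and `w2` would simply be read from the slice header instead of being computed.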

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices:
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(11) to P(15) and Bitstream B with pictures P(21) to P(25), connected by the switching slice S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter-coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; parameter set 3 reads: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11; the VCL data transfer refers to PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture

If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
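The two-slice-group case can be sketched with a checkerboard allocation. Everything here is illustrative: each macroblock is reduced to one scalar (for instance its mean luma), the map is a plain (x + y) mod 2 pattern, and the concealment is a simple average of the surviving neighbors, not the concealment of any particular decoder.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a dispersed (checkerboard) pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def conceal_lost_group(values, groups, lost_group):
    """Replace every macroblock of the lost slice group by the mean of its surviving neighbors."""
    height, width = len(values), len(values[0])
    out = [row[:] for row in values]
    for y in range(height):
        for x in range(width):
            if groups[y][x] != lost_group:
                continue
            neighbors = [
                values[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width and groups[ny][nx] != lost_group
            ]
            out[y][x] = sum(neighbors) / len(neighbors)
    return out


groups = checkerboard_slice_groups(4, 4)
flat_picture = [[100] * 4 for _ in range(4)]
recovered = conceal_lost_group(flat_picture, groups, lost_group=1)
```

Because the groups are interleaved, every lost macroblock in a picture of at least 2x2 macroblocks keeps at least two neighbors from the group that arrived.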

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate[kbps] for QCIF Bitrate[kbps] for CIF

24 48 96 192 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate[kbps] for QCIF Bitrate[kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264 / MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264 / MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExtensions
- 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode, uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF)
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

In MBAFF, the choice of compression (frame or field) is selected for each vertical pair of macroblocks

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
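The first three modes are simple enough to sketch directly. The function below is an illustration under stated assumptions: `top` and `left` are the reconstructed neighboring pels (16 each for a 16x16 MB), both are assumed to be available, the DC value uses rounded integer averaging, and the planar mode is omitted.

```python
def intra_full_mb_predict(mode, top, left):
    """Build an n x n prediction block from the row above (`top`) and the column to the left (`left`)."""
    n = len(top)
    if mode == "vertical":
        # Every row repeats the pels just above the MB.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Every column repeats the pels just left of the MB.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # All pels take the rounded average of the 2n neighboring pels.
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


top = [10] * 16
left = [30] * 16
dc_block = intra_full_mb_predict("dc", top, left)
```

An encoder would evaluate each mode, subtract the prediction from the original MB, and keep the mode whose residual is cheapest to code.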

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

- The matrix can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- The encoder can design and use customized scaling matrices
- These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
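The construction [M zeros][1][INFO] and the three decoding steps can be sketched as below. Codewords are represented as strings of '0' and '1' purely for readability, and the function names are hypothetical; a real implementation would operate on a bit buffer.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword of a non-negative code_num as a bit string."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of `bits`; return (code_num, bits consumed)."""
    m = 0
    while bits[m] == "0":                      # step 1: count leading zeros up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: read the M-bit INFO field
    return (1 << m) + info - 1, 2 * m + 1      # step 3: code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

Small values of code_num get short codewords, which is why the mapping from syntax element values to code_num is chosen so that frequent values come first.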

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: in a 4:2:2 MB, Y is 16 wide by 16 high and Cb and Cr are each 8 wide by 16 high; in a 4:4:4 MB, Y, Cb and Cr are each 16x16]

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

K_R = 0.2126, K_B = 0.0722, K_R + K_B + K_G = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
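The conversion Y = K_R R + (1 - K_R - K_B) G + K_B B, Cb = (B - Y) / (2(1 - K_B)), Cr = (R - Y) / (2(1 - K_R)) can be written as a short function. The function name, the normalization of the samples to the range 0..1, and the default constants (the K_R and K_B values quoted on the slide) are assumptions of this sketch.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y in 0..1 and Cb, Cr in -0.5..0.5."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)
bt601_green = rgb_to_ycbcr(0.0, 1.0, 0.0, kr=0.299, kb=0.114)
```

The divisors 2(1 - K_B) and 2(1 - K_R) are what keep Cb and Cr within ±0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5. Since the results are real numbers, storing them as integers introduces the rounding error that motivates the YCgCo transform.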

-120-

Rounding error in RGB → YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
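The forward and inverse conversions of this slide (Co = R - B, t = B + (Co >> 1), Cg = G - t, Y = t + (Cg >> 1), and the reverse steps) can be checked for exact reversibility with a few lines of integer code. The function names are hypothetical; Python's `>>` on integers is an arithmetic shift, as the slide requires.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer transform; Cg and Co carry one more bit than the inputs."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip check over a coarse grid of 8-bit RGB values.
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in range(0, 256, 17)
    for g in range(0, 256, 17)
    for b in range(0, 256, 17)
)
```

Each step adds or subtracts a quantity that the inverse can recompute exactly, so no rounding error accumulates even though the shifts discard a bit.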

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
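The second rule can be turned into a quick check. The sketch follows the slide's per-pixel phrasing and uses a hypothetical level whose frame size limit is 1920 x 1080 pixels; the function names and that limit are assumptions, not values taken from the level table.

```python
import math


def max_picture_dimension(pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(8 * total pixels per frame))."""
    return math.isqrt(8 * pixels_per_frame)


def dimensions_allowed(width, height, pixels_per_frame):
    """A picture must respect both the frame size limit and the aspect limit."""
    limit = max_picture_dimension(pixels_per_frame)
    return width * height <= pixels_per_frame and width <= limit and height <= limit


frame_limit = 1920 * 1080          # hypothetical frame size limit of a level
limit = max_picture_dimension(frame_limit)
wide_ok = dimensions_allowed(8192, 253, frame_limit)
```

The factor 8 means that a picture may be at most 8 times wider than it is tall (or the reverse), so an extremely elongated picture is rejected even when its total pixel count fits.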

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
- Only the control of the filter is adjusted: do not filter 4x4 blocks
- No change to the filter operation itself

CABAC
- 61 new contexts and corresponding initialization values
- No change to the CABAC engine

Signaling
- 8x8 transform on/off flag at PPS level
- 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-12-

Introduction

Purpose of H.264 / MPEG-4 part 10
- Higher coding efficiency than previous standards (MPEG-1/2/4 part 2, H.261, H.263)
- Simple syntax specifications
- Seamless integration of video coding into all current protocols
- More error robustness
- Various applications such as video broadcasting, video streaming, video conferencing, D-Cinema, HDTV
- Network friendliness
- Balance between coding efficiency, implementation complexity, and cost, based on the state of the art in VLSI design technology

-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:
- Video streaming over the Internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications
Profile: a subset of the entire bit stream syntax; different decoder designs are based on the Profile
- Four profiles: Baseline, Main, Extended, and High

Profile | Applications
Baseline | Video conferencing, videophone
Main | Digital storage media, television broadcasting
Extended | Streaming video
High | Content contribution, content distribution, studio editing, post processing

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks may not necessarily be in raster scan order; a map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate in comparison with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: scaling operation applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice, coded similarly to an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Coding tool | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | - | X | -
SP/SI slices | - | - | X | -
B slice | - | X | X | X
Interlaced coding | - | X | X | X
CABAC | - | X | - | X
Data partitioning | - | - | X | -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction: Levels
A Level corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF 15fps
1.1 | QCIF 30fps
1.2 | CIF 15fps
1.3 | CIF 30fps
2 | CIF 30fps
2.1 | HHR 15 or 30fps
2.2 | SDTV 15fps
3 | SDTV 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2 | HDTV 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format (4:2:0 chroma sampling)

[Figure: CIF and QCIF picture formats with Y, Cb, and Cr components. CIF: Y is 352 (360) pels x 288 lines; Cb and Cr are each 176 (180) pels x 144 lines. QCIF: Y is 176 (180) pels x 144 lines; Cb and Cr are each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: H.264 encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Input: video; output: bitstream.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: H.264 decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream; output: video.]

-30-

VC Algorithm: Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction
  (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a-p with neighboring samples A-H above, I-L to the left, and M at the top-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7, and 8.]

Samples a, b, ..., p: the predicted ones for the current block. Above and left samples A, B, ..., M: previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction
Example of 4 x 4 luma block

- For mode 4, samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively
- For mode 8, samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively

[Figure: the 4 x 4 block of samples a-p and its neighboring samples A-M, shown for mode 4 and for mode 8.]
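The four example predictors can be written with integer arithmetic. This is a sketch rather than the normative procedure: the function names are hypothetical, and round() is assumed to mean rounding half up, implemented here by adding half the divisor before shifting.

```python
def round_div4(total):
    """Divide by 4, rounding to the nearest integer (half rounds up)."""
    return (total + 2) >> 2


def mode4_sample_a(I, M, A):
    """Mode 4 (diagonal down-right): a = round(I/4 + M/2 + A/4)."""
    return round_div4(I + 2 * M + A)


def mode4_sample_d(B, C, D):
    """Mode 4 (diagonal down-right): d = round(B/4 + C/2 + D/4)."""
    return round_div4(B + 2 * C + D)


def mode8_sample_a(I, J):
    """Mode 8 (horizontal-up): a = round(I/2 + J/2)."""
    return (I + J + 1) >> 1


def mode8_sample_d(J, K, L):
    """Mode 8 (horizontal-up): d = round(J/4 + K/2 + L/4)."""
    return round_div4(J + 2 * K + L)


# Neighboring reconstructed samples chosen arbitrarily for illustration.
print(mode4_sample_a(I=10, M=20, A=30))
print(mode8_sample_a(I=10, J=13))
```

Each predictor is a short low-pass filter over two or three neighbors, which is why flat neighborhoods are reproduced exactly.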

-32-

VC Algorithm: Intra Prediction
16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane mode works well in areas of smoothly varying luminance: a linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4
(similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e., a MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions
- Motion vector accuracy: 1/4 pel (6-tap filter)

[Figure: encoder block diagram]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the partitions of each shape numbered from 0.]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2- and 1/4-pel positions, and 1/8-pel positions.]

-38-

VC Algorithm: Inter Prediction
- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A-U) and the half- and quarter-pel samples between them (lower-case letters).]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
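The two formulas for b and a translate directly into integer arithmetic. The sketch below assumes 8-bit samples, rounding half up, and clipping of the filter output to the range 0..255; the function names are hypothetical.

```python
def clip8(value):
    """Clip to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap filter (1, -5, 20, 20, -5, 1)/32."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(G, b):
    """Quarter-pel sample a: bilinear average of integer sample G and half-pel sample b."""
    return (G + b + 1) >> 1


# A row of integer-pel samples on a linear ramp.
b = half_pel(10, 20, 30, 40, 50, 60)
a = quarter_pel(30, b)
print(b, a)
```

On a linear ramp the half-pel value falls midway between G and H, as expected of an interpolator; the negative taps matter only near edges, where clipping keeps the result a valid sample.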

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)
- Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal and vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in a 16 x 16 macroblock, for luma and for chroma.]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures (short term, long term)
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: the sixteen 4 x 4 luma blocks of a macroblock, numbered in coding order:
 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15
and the 4 x 4 array of their DC coefficients, indexed (00) to (33).]

- Assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT
- (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block
- The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2, and 4:4:4 formats

-42-

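VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \quad X = \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}, \quad E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$, and $\otimes$ implies element-by-element multiplication.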

-43-

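4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

while the forward scaling matrix is

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.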

-44-

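VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer.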

-45-

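VC Algorithm: Transform

Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: chroma blocks of a 4:2:0 macroblock. 2x2 DC blocks: 16 (U) and 17 (V). AC blocks: 18-21 (U) and 22-25 (V).]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased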

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$
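The core of the 4 x 4 integer transform is the matrix product of the block with H and its transpose, which needs only integer additions and small multiplications. A minimal sketch follows, in plain Python with hypothetical function names; the element-by-element scaling is left out because it is folded into quantization.

```python
# Core matrix of the 4 x 4 integer DCT, as given on the slide.
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4 x 4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Unscaled forward transform H * x * H^T of a 4 x 4 block."""
    return matmul(matmul(H, x), transpose(H))


flat_block = [[1] * 4 for _ in range(4)]
for row in core_transform(flat_block):
    print(row)
```

A flat block produces a single non-zero (DC) coefficient, which is the energy compaction the transform is meant to provide.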

-47-

VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

Encoder:
$$Y_{ij} = X_{ij} \cdot \mathrm{round}\left( \frac{SF_{ij}}{Q_{step}} \right)$$

Decoder:
$$X_{ij} = Y_{ij} \cdot Q_{step} \cdot SF_{ij}$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
- FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
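The doubling rule means that only six base step sizes are needed; every other value follows by shifting. In the sketch below, the six base values are an assumption taken from the commonly published H.264 step-size table (they are not listed on the slide), and the function name is hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (from the commonly published table).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video: doubles for every increment of 6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8 bits per decoded sample")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

Each increment of 1 in QP raises the step size by roughly 12%, so the 52 values span a wide range of bit rates with a small lookup table.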

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization data flow.
Encoder part: input block, forward transform, post-scaling and quantization, encoder output; a 2x2 or 4x4 DC transform is applied for chroma or Intra-16 luma only.
Decoder part: decoder input, inverse quantization and pre-scaling, inverse transform, output block; a 2x2 or 4x4 DC inverse transform is applied for chroma or Intra-16 luma only.
Rescale and inverse transform: Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan
 0  1  5  6
 2  4  7 12
 3  8 11 13
 9 10 14 15

(b) Alternate scan
 0  2  8 12
 1  5  9 13
 3  6 10 14
 4  7 11 15
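Each grid gives, for every position of the 4 x 4 block, the index at which that coefficient is read. A small sketch of applying the two scans (the names are hypothetical; the grids are copied from the slide):

```python
# Scan index of each block position, as shown for (a) and (b).
ZIGZAG_GRID = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_GRID = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan(block, grid):
    """Reorder a 4 x 4 block of coefficients into a 16-element list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[grid[r][c]] = block[r][c]
    return out


block = [
    [9, 3, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(scan(block, ZIGZAG_GRID))
print(scan(block, ALTERNATE_GRID))
```

Both scans move the low-frequency coefficients to the front, so the non-zero values cluster at the start of the list and the trailing zeros are cheap to code; the alternate scan favors the vertical direction.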

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
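The construction and the three decoding steps can be sketched as follows. Codewords are represented as strings of '0' and '1' purely for readability, and the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword located at the start of a bit string."""
    m = 0
    while bits[m] == "0":                      # step 1: count the leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1                 # step 3: code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Small values get short codewords (code_num 0 is the single bit 1), which suits syntax elements whose small values are the most frequent.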

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding
CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

order: 0   1   2   3   4   5   6   7   8   9   ...  16
coeff: c0  c1  c2  0   1   1   0   -1  0   0   ...  0

- Step 1: encode the number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
  number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient
  2 (order 3, 6)
- Step 5: encode each run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
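The five quantities of this example can be derived mechanically from the scanned coefficients. The sketch below only extracts the symbols and does not perform the table look-ups; the function name is hypothetical, and the values 7, -3, and 2 standing in for c0, c1, and c2 are arbitrary illustrative levels.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (steps 1-5) from coefficients in scan order."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    values = [coeffs[i] for i in positions]
    reversed_values = values[::-1]

    # Trailing ones: up to three consecutive +/-1 values at the end of the scan.
    trailing_ones = 0
    for v in reversed_values:
        if abs(v) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    signs = ["+" if v > 0 else "-" for v in reversed_values[:trailing_ones]]
    levels = reversed_values[trailing_ones:]

    # Zeros located before the last non-zero coefficient.
    total_zeros = positions[-1] + 1 - len(values) if positions else 0

    # Runs of zeros preceding each coefficient, in reverse order, until none are left.
    runs = []
    zeros_left = total_zeros
    for idx in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[idx] - positions[idx - 1] - 1
        runs.append(run)
        zeros_left -= run

    return len(values), trailing_ones, signs, levels, total_zeros, runs


coeffs = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(coeffs))
```

The result reproduces the values listed in the example: 6 non-zero coefficients, 3 trailing ones with signs -, +, +, the remaining levels in reverse order, 2 zeros in total, and runs 1, 0, 1.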

-55-

VC Algorithm: Entropy Coding
CABAC uses arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size, and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: the current picture between previous pictures and next pictures, showing three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice
Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: the direct-mode partition in the current picture, located between the List 0 reference and the List 1 reference; the co-located partition in the List 1 reference with motion vector mvCol; the derived motion vectors mvL0 and mvL1; and the temporal distances tb and td.]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
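A small numeric sketch of the two scaling formulas follows. It assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference, and it uses plain division; a real codec uses fixed-point arithmetic with rounding. The function name is hypothetical.

```python
def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into forward (L0) and backward (L1) vectors."""
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between its two references.
mv_l0, mv_l1 = direct_mode_mvs(mv_col=(8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

The two derived vectors always differ by exactly mvCol, so no motion vector needs to be transmitted for a direct-mode partition.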

-60-

VC Algorithm: B Slice
Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
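The formula p = w1 x r1 + w2 x r2 and the implicit type can be illustrated as below. The weight rule is an assumption made for the sketch: the nearer reference receives the larger weight, in proportion to temporal distance, with tb the distance from the first reference to the current picture and td the distance between the two references. The function names are hypothetical, and floating-point weights replace the fixed-point weights of a real codec.

```python
def implicit_weights(tb, td):
    """Weights (w1, w2) from temporal distances: the closer reference weighs more."""
    w2 = tb / td
    w1 = 1.0 - w2
    return w1, w2


def weighted_prediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2, applied sample by sample."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


# A fade: the second reference is half as bright as the first.
r1 = [100, 80]
r2 = [50, 40]
w1, w2 = implicit_weights(tb=1, td=2)
print(weighted_prediction(r1, r2, w1, w2))
```

During a fade, such weights track the brightness change that plain averaging or single-reference prediction would leave in the residual.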

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice, coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) ... P(1,5) and Bitstream B with pictures P(2,1) ... P(2,5), connected through the switching picture S(3).]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Each partition (A, B, and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder, each holding parameter sets 1, 2, 3, connected by a reliable parameter set exchange and by VCL data transfer using parameter set 3. Example contents of parameter set 3: video format NTSC, motion resolution 1/4, Enc: CABAC, frame width 11.]

-65-

Error Resilience: FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
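The two-slice-group case can be illustrated with a checkerboard allocation. This is only one possible dispersed pattern, chosen here as an assumption for the sketch; the function names are hypothetical.

```python
def dispersed_map(width_mbs, height_mbs):
    """Checkerboard allocation of macroblocks to slice group 0 or 1."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(mb_map, x, y):
    """Count the left/right/up/down neighbors that belong to the other slice group."""
    height, width = len(mb_map), len(mb_map[0])
    own = mb_map[y][x]
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != own:
            count += 1
    return count


# QCIF picture: 11 x 9 macroblocks.
mb_map = dispersed_map(11, 9)
print(neighbours_in_other_group(mb_map, 5, 4))   # interior macroblock
print(neighbours_in_other_group(mb_map, 0, 0))   # corner macroblock
```

With this pattern, every macroblock of a lost slice group keeps all of its direct neighbors in the surviving group, which is what the concealment step relies on.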

-66-

Error Resilience: Redundant Slice
Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence: bitrate [kbps] for QCIF (24, 48, 96, 192); bitrate [kbps] for CIF (96, 192, 384, 768)

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence: bitrate [kbps] for QCIF (24, 48, 96, 192); bitrate [kbps] for CIF (96, 192, 384, 768)

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence: bitrate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6); bitrate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6)

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence: bitrate [Mbps] for MPEG-2 HiQ (6, 10, 20); bitrate [Mbps] for MPEG-2 TM5 (6, 10, 20)

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions
H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
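
To make "logarithmic control" concrete, the following is a minimal sketch of the commonly cited H.264 relationship in which the quantization step size doubles for every increase of 6 in QP. The six base step sizes and the 0..51 QP range are assumptions taken from the usual textbook tabulation rather than from these slides, and the function name `qstep` is hypothetical.

```python
# Base step sizes for QP = 0..5 (assumed values; not given in the slides).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantization step size for a given QP, doubling every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 51):
        print(qp, qstep(qp))
```

Each single QP step scales the step size by roughly 12%, which is why the control is described as logarithmic rather than linear.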

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction
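
The frame-mode zig-zag scan mentioned above can be sketched as a walk along the anti-diagonals of the block, alternating direction on each diagonal. This is an illustrative sketch only: the function names are hypothetical, and the field (alternate) scan uses a different order that is not reproduced here.

```python
def zigzag_order(n: int) -> list[tuple[int, int]]:
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    positions = [(r, c) for r in range(n) for c in range(n)]

    def key(rc: tuple[int, int]) -> tuple[int, int]:
        r, c = rc
        diagonal = r + c
        # Odd diagonals are walked top-to-bottom, even ones bottom-to-top.
        return (diagonal, r if diagonal % 2 else -r)

    return sorted(positions, key=key)


def zigzag_scan(block: list[list[int]]) -> list[int]:
    """Flatten a square block of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    # Entries are raster indices, so the output shows the scan order directly.
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(block))
```

Low-frequency coefficients come first in this order, so the zero-valued high-frequency coefficients cluster at the end of the scan, which helps the entropy coder.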

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
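
The first three modes can be sketched in a few lines. This is a simplified illustration, not the normative process: the function names are hypothetical, both neighbor rows are assumed to be available, the DC value is a plainly rounded integer mean, and the planar mode is omitted.

```python
def predict_vertical(above: list[int], size: int = 16) -> list[list[int]]:
    """Every row of the block copies the row of pels just above the MB."""
    return [list(above[:size]) for _ in range(size)]


def predict_horizontal(left: list[int], size: int = 16) -> list[list[int]]:
    """Every column of the block copies the column of pels just left of the MB."""
    return [[left[r]] * size for r in range(size)]


def predict_dc(above: list[int], left: list[int], size: int = 16) -> list[list[int]]:
    """Every pel is the rounded mean of the neighboring pels above and to the left."""
    total = sum(above[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)  # integer mean with rounding
    return [[dc] * size for _ in range(size)]


if __name__ == "__main__":
    above = [10, 20, 30, 40]
    left = [1, 2, 3, 4]
    print(predict_vertical(above, 4)[3])
    print(predict_horizontal(left, 4)[2])
    print(predict_dc(above, left, 4)[0])
```

An encoder would form the residual by subtracting the chosen prediction from the original block and then transform and quantize that residual.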

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

- The matrix can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows the decreasing coefficient variances so as to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$
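
The construction above maps directly onto a few lines of code. The sketch below is illustrative only: it represents the codeword as a string of '0'/'1' characters for readability rather than packing bits, and the function name `exp_golomb_encode` is hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed exactly with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    info = code_num + 1 - (1 << m)
    # INFO is written as exactly M bits; it is empty for the first codeword.
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

The loop prints the start of the codeword table: a single `1` for code_num 0, three-bit codewords for 1 and 2, five-bit codewords for 3 to 6, and so on, matching the $(2M + 1)$ length rule.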

-112-

Decoding

1. Read in M leading zeros followed by a 1.
2. Read the M-bit INFO field.
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$
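
The three decoding steps can be sketched as follows. As an assumption for illustration, the bitstream is a string of '0'/'1' characters and the hypothetical function `exp_golomb_decode` returns both the decoded value and the position of the next unread bit, so that consecutive codewords can be parsed from one stream. A truncated stream raises an `IndexError` in this sketch.

```python
def exp_golomb_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the first bit after the codeword).
    """
    # Step 1: count the M leading zeros; the loop stops on the marker '1'.
    m = 0
    while bits[pos + m] == "0":
        m += 1
    start = pos + m + 1  # skip the zeros and the marker bit
    # Step 2: read the M-bit INFO field (empty when M == 0).
    info = int(bits[start:start + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, start + m


if __name__ == "__main__":
    stream = "1" + "011" + "00100"  # codewords for 0, 2 and 3 back to back
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the leading zeros announce the codeword length, no separators are needed between consecutive codewords.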

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only: MB structure in 4:2:2 and 4:4:4 formats

[Figure: macroblock structure. 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16.]

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
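
As a numerical check of the conversion, the sketch below evaluates the three equations directly in floating point. It assumes R, G and B are normalized to the range 0..1 and ignores the offsets, scaling and rounding a real pipeline applies before storing integer samples; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722) -> tuple[float, float, float]:
    """Convert normalized RGB to (Y, Cb, Cr) using the luma weights KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: full luma, no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # pure blue: Cb reaches its maximum
    # The BT.601 weights quoted above can be passed explicitly.
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))
```

The divisors $2(1 - K_B)$ and $2(1 - K_R)$ are what keep Cb and Cr within -0.5..+0.5 for normalized input, which is where the factors 0.5389 and 0.6350 come from.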

-120-

Rounding error in RGB to Y Cb Cr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
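
The lifting steps of this slide translate one-to-one into integer code, which makes it easy to confirm that the transform is exactly reversible. This is a minimal sketch with hypothetical function names; it relies on Python's `>>` being an arithmetic (floor) shift for negative integers, matching the definition on the slide.

```python
def rgb_to_ycgco(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward conversion: integer RGB to (Y, Cg, Co) using lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int) -> tuple[int, int, int]:
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 0)
    coded = rgb_to_ycgco(*pixel)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step recomputes exactly the quantity that the matching forward step added or subtracted, so the rounding in `>> 1` cancels and no information is lost. The price is that Cg and Co need one more bit than the input samples.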

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than $\sqrt{(\text{total number of pixels per frame}) \times 8}$.

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
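
Note 1 follows from the macroblock processing-rate limit: the frame rate a level allows is the maximum MB rate divided by the number of macroblocks per frame, capped at 172 frames/sec. The sketch below illustrates this for three levels, using the MB rate and frame size limits from the level limits table in this deck; the dictionary, the function names and the treatment of non-multiple-of-16 sizes are illustrative assumptions.

```python
import math

# (max MB processing rate in MB/s, max frame size in MBs) for a few levels,
# copied from the level limits table in these slides.
LEVEL_LIMITS = {"1": (1485, 99), "3": (40500, 1620), "4": (245760, 8192)}
MAX_FRAME_RATE = 172  # upper bound quoted in note 1


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_frame_rate(level: str, width: int, height: int) -> float:
    """Highest frame rate the level permits for the given picture size."""
    max_mb_rate, max_frame_size = LEVEL_LIMITS[level]
    mbs = macroblocks_per_frame(width, height)
    if mbs > max_frame_size:
        raise ValueError("picture is too large for this level")
    return min(max_mb_rate / mbs, MAX_FRAME_RATE)


if __name__ == "__main__":
    print(max_frame_rate("1", 176, 144))    # QCIF at Level 1
    print(max_frame_rate("3", 720, 576))    # 625-line SDTV at Level 3
    print(max_frame_rate("3", 176, 144))    # small picture: capped at 172
    print(max_frame_rate("4", 1920, 1080))  # HDTV at Level 4
```

QCIF at Level 1 gives 1485 / 99 = 15 frames/sec and 720x576 at Level 3 gives 25 frames/sec, in line with the typical picture formats listed for those levels.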

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus)
- ATSC has selected AC-3 plus from Dolby
- MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-13-

Introduction H264 MPEG-4 part 10 Architecture

-14-

Introduction: Applications of H.264 / MPEG-4 part 10

A broad range of applications for video content, including but not limited to the following:

- Video streaming over the internet
- CATV: Cable TV on optical networks, copper, etc.
- DBS: Direct broadcast satellite video services
- DSL: Digital subscriber line video services
- DTTB: Digital terrestrial television broadcasting, cable modem, DSL
- ISM: Interactive storage media (optical disks, etc.)
- MMM: Multimedia mailing
- MSPN: Multimedia services over packet networks
- RTC: Real-time conversational services (videoconferencing, videophone, etc.)
- RVS: Remote video surveillance
- SSM: Serial storage media (digital VTR, etc.)
- D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit stream syntax; decoder designs differ according to the Profile
- Four profiles: Baseline, Main, Extended and High

Applications | Profile
Video conferencing, videophone | Baseline
Digital storage media, television broadcasting | Main
Streaming video | Extended
Content contribution, content distribution, studio editing, post processing | High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (flexible macroblock order): macroblocks need not be in raster scan order; a map assigns macroblocks to a slice group
- ASO (arbitrary slice order): the macroblock address of the first macroblock of a slice may be smaller than the macroblock address of the first macroblock of some preceding slice of the same coded picture
- RS (redundant slice): a slice carrying redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice, coded similarly to an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Coding tool | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | - | X | -
SP/SI slices | - | - | X | -
B slice | - | X | X | X
Interlaced coding | - | X | X | X
CABAC | - | X | - | X
Data partitioning | - | - | X | -

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction: Level, corresponding to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF 15fps
1.1 | QCIF 30fps
1.2 | CIF 15fps
1.3 | CIF 30fps
2 | CIF 30fps
2.1 | HHR 15 or 30fps
2.2 | SDTV 15fps
3 | SDTV 720x480x30i, 720x576x25i, 10Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20Mbps (max)
4.1 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50Mbps (max)
4.2 | HDTV 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

| Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | - |
| 1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | - |
| 1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | - |
| 1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | - |
| 2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | - |
| 2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | - |
| 2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | - |
| 3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32 |
| 3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16 |
| 3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16 |
| 4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16 |
| 4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16 |
| 4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16 |
| 5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16 |
| 5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16 |
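
To illustrate how the first two limits constrain a stream, the following is a minimal sketch that picks the lowest level whose macroblock rate and frame size limits admit a given picture format. The limits are transcribed from the table on this slide; the function names are hypothetical, and the sketch deliberately ignores the bit rate, buffer and motion vector limits that a real conformance check would also apply.

```python
# (level, max macroblock rate in MB/s, max frame size in MBs), taken from the slide's table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height luma picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width, height, fps):
    """Lowest level satisfying the frame size and MB rate limits, or None."""
    frame_mbs = macroblocks(width, height)
    mb_rate = frame_mbs * fps
    for level, max_rate, max_frame in LEVEL_LIMITS:
        if frame_mbs <= max_frame and mb_rate <= max_rate:
            return level
    return None


if __name__ == "__main__":
    for fmt in [(176, 144, 15), (352, 288, 30), (1280, 720, 60)]:
        print(fmt, "->", lowest_level(*fmt))
```

For the formats named in the level table (QCIF at 15 fps, CIF at 30 fps, 1280x720 at 60 fps) this reproduces levels 1, 1.3 and 3.2.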

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format, 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 (360) pels x 288 lines; Cb and Cr are each 176 (180) pels x 144 lines. QCIF format: Y is 176 (180) pels x 144 lines; Cb and Cr are each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram of the H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Input: video. Output: bitstream.]

-29-

Video Coding Algorithm: Block diagram of the H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream. Output: video.]

-30-

VC Algorithm: Intra Prediction
- Exploits spatial redundancy between adjacent macroblocks in a frame
- 4 x 4 luma block: 9 prediction modes, 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a to p with neighboring samples M, A to H in the row above and I to L in the column to the left; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.]

Samples a, b, ..., p are the predicted samples of the current block; the above and left samples A, B, ..., M are previously reconstructed samples.

-31-

VC Algorithm: Intra Prediction
Example of a 4 x 4 luma block
- Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively
- Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively

[Figure: the 4 x 4 block a to p with neighbors M, A to H (above) and I to L (left), shown once with the mode 4 prediction direction and once with the mode 8 prediction direction.]
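
The sample formulas above can be tried out with a small sketch. It implements only the three simplest 4 x 4 modes (vertical, horizontal, DC) plus the two rounded averages used in the mode 4 and mode 8 examples. The function names are hypothetical, the integer rounding `(x + 2) >> 2` is an assumed realization of `round(.)`, and neighbor availability is not handled.

```python
def round_avg2(x, y):
    """round(x/2 + y/2) in integer arithmetic."""
    return (x + y + 1) >> 1


def round_avg3(x, y, z):
    """round(x/4 + y/2 + z/4) in integer arithmetic."""
    return (x + 2 * y + z + 2) >> 2


def predict_4x4(mode, above, left):
    """Predict a 4x4 block from above = [A, B, C, D] and left = [I, J, K, L]."""
    if mode == 0:  # vertical: each column copies the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row copies the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


if __name__ == "__main__":
    A, B, C, D = 108, 110, 112, 114
    I, J, K, L = 100, 98, 96, 94
    M = 104
    print("mode 4, sample a:", round_avg3(I, M, A))
    print("mode 8, sample a:", round_avg2(I, J))
    print("DC block row:", predict_4x4(2, [A, B, C, D], [I, J, K, L])[0])
```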

-32-

VC Algorithm: Intra Prediction
16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction.]

-33-

VC Algorithm: Intra Prediction
- Chroma always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4
- Similar to the 16x16 luma block, but with a different mode order
- 4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e. a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: H.264 encoder block diagram, annotated: motion vector accuracy 1/4 (6-tap filter).]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4.]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid with legend: integer-position pixels, 1/2 and 1/4 pixels, 1/8 pixels.]

-38-

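VC Algorithm: Inter Prediction
- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case letters A to U, including the row E F G H I J) with half-pel positions (aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s) and quarter-pel positions (a, c, d, e, f, g, i, k, n, p, q, r).]

$$b = \mathrm{round}\left(\frac{E - 5F + 20G + 20H - 5I + J}{32}\right), \qquad a = \mathrm{round}\left(\frac{G + b}{2}\right)$$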
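
The two interpolation formulas can be written in integer arithmetic as in the sketch below. The function names are hypothetical; adding 16 before the shift by 5 is one way to realize `round(x / 32)`, and the clip to the 8-bit range is an assumption of this sketch (needed because the filter has negative taps), not something stated on the slide.

```python
def clip8(x):
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, x))


def half_pel(E, F, G, H, I, J):
    """6-tap FIR with weights (1, -5, 20, 20, -5, 1)/32 between G and H."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighboring integer- or half-pel samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]       # E, F, G, H, I, J across a sharp edge
    b = half_pel(*row)                    # half-pel sample between G and H
    a = quarter_pel(row[2], b)            # quarter-pel sample between G and b
    print("b =", b, "a =", a)
```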

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied adaptively to the horizontal and vertical edges of the 4 x 4 blocks in a macroblock, at several levels (slice, block-edge, sample)

[Figure: 16x16 macroblock with its vertical and horizontal edges marked for luma, and the corresponding vertical and horizontal edges for chroma.]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram, annotated: management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks:

     0  1  4  5
     2  3  6  7
     8  9 12 13
    10 11 14 15

Indices of the DC coefficients:

    00 01 02 03
    10 11 12 13
    20 21 22 23
    30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

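VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$
Y = \left(C_f X C_f^T\right) \otimes E_f =
\left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}
$$

where $a = \frac{1}{2}$, $b = \sqrt{\frac{2}{5}}$, $d = \frac{1}{2}$, and $\otimes$ implies element-by-element multiplication.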
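
As a minimal sketch of the forward transform, the code below computes the integer core C_f X C_f^T and then applies the scaling matrix E_f element by element, with a = 1/2 and b = sqrt(2/5). The helper names are hypothetical; a practical codec would fold E_f into the quantizer rather than multiply by it in floating point as done here.

```python
import math

# Forward core matrix C_f of the 4x4 integer DCT.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def matmul(p, q):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Integer part of the transform: C_f X C_f^T."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_matrix():
    """E_f with entries a^2, ab/2 and b^2/4 in the pattern shown above."""
    a, b = 0.5, math.sqrt(2 / 5)
    even_row = [a * a, a * b / 2]
    odd_row = [a * b / 2, b * b / 4]
    return [[(even_row if r % 2 == 0 else odd_row)[c % 2] for c in range(4)]
            for r in range(4)]


def forward_transform(x):
    """Y = (C_f X C_f^T) (x) E_f, element-by-element scaling."""
    w, e = core_transform(x), scaling_matrix()
    return [[w[i][j] * e[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(core_transform(flat)[0])      # only the DC term is non-zero
    print(forward_transform(flat)[0][0])
```

The core stage uses only integer additions, subtractions and doublings, which is what makes the transform multiplier-free.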

-43-

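4x4 Inverse IntDCT

$$X = C_i^T \left(Y \otimes E_i\right) C_i$$

Here

$$
Y \otimes E_i = [\,Y\,] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
$$

and, for comparison, the forward scaling matrix is

$$
E_f =
\begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}
$$

In both the forward and inverse transforms, QP (the quantization step) is embedded in the matrices $E_f$ and $E_i$.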

-44-

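VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$
Y_D = \mathrm{round}\left(\frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
$$

where round denotes rounding to the nearest integer.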
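
A minimal sketch of this second-stage transform follows. It assumes the standard 4x4 Walsh-Hadamard matrix (the signs are not legible in the slide text), and it realizes the division by 2 with rounding as `(v + 1) >> 1`, which rounds halves upward; both the function names and that rounding convention are assumptions of the sketch.

```python
# 4x4 Walsh-Hadamard matrix (symmetric, so no transpose is needed).
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def matmul(p, q):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def luma_dc_hadamard(dc):
    """Y_D = round((H X_D H) / 2) for the 4x4 array of luma DC coefficients."""
    t = matmul(matmul(H4, dc), H4)
    return [[(v + 1) >> 1 for v in row] for row in t]


if __name__ == "__main__":
    dc = [[4] * 4 for _ in range(4)]   # identical DC value in all 16 blocks
    for row in luma_dc_hadamard(dc):
        print(row)
```

When all sixteen blocks share the same DC value, the energy collapses into a single coefficient, which is the redundancy the hierarchical structure is meant to exploit.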

-45-

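VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
$$

[Figure: chroma blocks for the 4:2:0 format. Blocks 16 and 17 hold the 2x2 DC coefficients of U and V; blocks 18 to 21 hold the AC coefficients of U and blocks 22 to 25 those of V.]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.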

-46-

VC Algorithm: Transform

Block diagram emphasizing the transform

[Figure: H.264 encoder block diagram with the transform stage annotated as follows.]

- 4 x 4 integer DCT transform

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
$$

- Hadamard transform of the DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij} \cdot \mathrm{round}\left(\frac{SF_{ij}}{Qstep}\right)$$

- Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

where
- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
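
The doubling rule can be made concrete with a short sketch. The six base step sizes below are the values commonly tabulated for QP 0 to 5 in the H.264 literature; they do not appear on the slide and are included here as an assumption, as is the function name.

```python
# Step sizes for QP = 0..5 (assumed from commonly published tables, not from the slide).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for 8-bit video: doubles for every increase of 6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8 bits per decoded sample")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

With these base values, successive step sizes grow by roughly 12% per unit of QP, so 52 QP values span a range of more than two orders of magnitude.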

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization. The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only), inverse quantization and pre-scaling (rescale), inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by Exp-Golomb codes (UVLC)
- Scan order used to read the residual data (quantized transform coefficients): zig-zag or alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan:

    0  1  5  6
    2  4  7 12
    3  8 11 13
    9 10 14 15

(b) Alternate scan:

    0  2  8 12
    1  5  9 13
    3  6 10 14
    4  7 11 15
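
The scan tables can be applied as in the following sketch, which reads a 4 x 4 block of quantized coefficients into a one-dimensional list. Each table entry gives the scan position of the coefficient at that location, exactly as printed on the slide; the function and constant names are hypothetical.

```python
ZIGZAG_ORDER = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE_ORDER = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan(block, order=ZIGZAG_ORDER):
    """Place block[r][c] at output index order[r][c]."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0],
             [2, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print(scan(block))                    # low-frequency values come first
    print(scan(block, ALTERNATE_ORDER))
```

Because the non-zero coefficients cluster in the top-left corner, the scanned list ends in a long run of zeros, which the entropy coder can represent compactly.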

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

- INFO is an M-bit field carrying information
- The first codeword has no leading zero or trailing INFO
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on
- The length of each Exp-Golomb codeword is (2M + 1) bits

$$M = \left\lfloor \log_2(\mathrm{code\_num} + 1) \right\rfloor, \qquad \mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$
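
A minimal encoder following these two formulas is sketched below. It returns the codeword as a string of '0' and '1' characters for readability; the function name is hypothetical, and a real bitstream writer would pack bits rather than build strings.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeroes][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Using `bit_length` avoids floating-point logarithms, so the computation of M stays exact for arbitrarily large code numbers.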

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
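
The three decoding steps can be sketched as follows. The function reads from a string of '0' and '1' characters and returns both the decoded value and the position of the next unread bit, so that consecutive codewords can be parsed from one stream; the name and the string-based interface are assumptions made for illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos).
    """
    m = 0
    while bits[pos] == "0":      # step 1: count the M leading zeroes ...
        m += 1
        pos += 1
    pos += 1                     # ... and skip the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    stream = "1" + "010" + "00100"   # codewords for 0, 1 and 3 back to back
    pos, values = 0, []
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        values.append(value)
    print(values)
```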

-53-

VC Algorithm: Entropy Coding
CAVLC handles zero and +/-1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and of +/-1 coefficients are coded, and for the other coefficients their levels are coded.

Encoding steps
- Step 1: encode the total number of non-zero coefficients and of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (hence CAVLC).

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

    coeff:  c0  c1  c2   0   1   1   0  -1   0   0  ...   0
    order:   0   1   2   3   4   5   6   7   8   9  ...  15

- Step 1: encode the number of non-zero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
  - number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  - number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
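
The quantities coded in the five steps can be derived from a scanned coefficient list as in the sketch below. It only extracts the symbols; mapping them to codewords through the context-dependent VLC tables is not attempted. The function name and the placeholder values chosen for c0, c1 and c2 are hypothetical, and the limit of three trailing ones is taken from the standard rather than from the slide.

```python
def cavlc_symbols(coeffs):
    """Derive the symbols of CAVLC steps 1 to 5 from a scanned coefficient list."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    rev = nonzero[::-1]                       # highest-frequency coefficient first

    trailing = []                             # up to three +/-1 values at the end
    for i in rev:
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break

    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]
    levels = [coeffs[i] for i in rev[len(trailing):]]
    total_zeros = (nonzero[-1] + 1 - len(nonzero)) if nonzero else 0

    runs, zeros_left = [], total_zeros        # runs of zeros, in reverse order
    for hi, lo in zip(rev, rev[1:]):
        if zeros_left == 0:                   # nothing left to signal
            break
        runs.append(hi - lo - 1)
        zeros_left -= runs[-1]

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    c0, c1, c2 = 7, -3, 2                     # placeholder values for c0, c1, c2
    coeffs = [c0, c1, c2, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(coeffs))
```

Run on the example above, this yields 6 coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 total zeros and runs 1, 0, 1, matching the five steps.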

-55-

VC Algorithm: Entropy Coding
- CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated
- Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
- Step 1: context modeling. Choose a suitable model
- Step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice
- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change

[Figure: previous pictures, current picture and next pictures; the current picture is predicted with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice
Direct mode
- Forward/backward pair of bidirectional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference along the time axis, with temporal distances tb and td; the co-located partition in the List 1 reference has motion vector mvCol, and the direct-mode partition in the current picture has motion vectors mvL0 and mvL1.]

$$mv_{L0} = \frac{t_b}{t_d}\, mv_{Col}, \qquad mv_{L1} = -\frac{t_d - t_b}{t_d}\, mv_{Col}$$

where mvCol is the MV used in the co-located MB of the subsequent picture
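
The scaling of the co-located motion vector can be illustrated with exact rational arithmetic. In this sketch tb is taken to be the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference, as the figure suggests. The function name is hypothetical, and the standard itself uses fixed-point scaling with rounding rather than the exact fractions used here.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale mvCol into the direct-mode motion vectors mvL0 and mvL1.

    mv_col is an (x, y) pair; tb and td are temporal distances with 0 < tb < td.
    """
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)           # tb * mvCol / td
    mv_l1 = tuple(-(1 - scale) * c for c in mv_col)    # -(td - tb) * mvCol / td
    return mv_l0, mv_l1


if __name__ == "__main__":
    mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
    print("mvL0 =", mv_l0, "mvL1 =", mv_l1)
```

By construction mvL0 - mvL1 = mvCol, i.e. the two derived vectors split the co-located motion in proportion to the temporal distances.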

-60-

VC Algorithm: B Slice
Weighted prediction
- Different weights are applied to the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'
- A different weighted prediction method is used for a macroblock of a P slice and of a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
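
A minimal sketch of the weighted sum follows. The explicit case simply takes w1 and w2 as given. For the implicit case the sketch assumes a linear interpolation in time, so that the temporally closer reference receives the larger weight; this is an illustrative choice, not the exact derivation in the standard. The names tb and td follow the direct-mode notation (current picture at distance tb from r1, with r1 and r2 a distance td apart), and the function names are hypothetical.

```python
def implicit_weights(tb, td):
    """Assumed implicit rule: weights proportional to temporal proximity."""
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped to 8 bits."""
    return [min(255, max(0, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]


if __name__ == "__main__":
    r1 = [80, 40, 120]            # samples from the first reference
    r2 = [0, 0, 0]                # second reference is already black (fade to black)
    print(weighted_prediction(r1, r2, 0.5, 0.5))
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_prediction([100], [50], w1, w2))
```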

-61-

VC Algorithm: SP and SI Slices (Extended profile only)
Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) P(1,2) P(1,3) P(1,4) P(1,5), Bitstream B with pictures P(2,1) P(2,2) P(2,3) P(2,4) P(2,5), and a switching slice S(3) linking the two bitstreams.]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions A, B & C
2. A has the slice header and the header data for each MB in the slice
3. B has the coded residual data for intra and SI slice MBs
4. C has the coded residual data for inter-coded MBs
5. Each partition A, B & C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged through a reliable parameter set exchange. Example, Parameter Set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11. The VCL data is transferred with a reference to PS 3.]

-65-

Error Resilience: FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data
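
The concealment argument can be checked with a small sketch. It assigns the macroblocks of a QCIF picture (11 x 9 macroblocks) to two slice groups in a checkerboard pattern, which is one possible dispersed arrangement chosen here for illustration, and counts how many 4-connected neighbors of each lost macroblock survive when slice group 1 is lost. The function names are hypothetical.

```python
def checkerboard_map(mb_cols, mb_rows):
    """Slice group (0 or 1) of every macroblock, dispersed as a checkerboard."""
    return [[(r + c) % 2 for c in range(mb_cols)] for r in range(mb_rows)]


def surviving_neighbours(groups, r, c, lost_group):
    """Count 4-connected neighbours of MB (r, c) that are not in the lost group."""
    count = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        inside = 0 <= rr < len(groups) and 0 <= cc < len(groups[0])
        if inside and groups[rr][cc] != lost_group:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_map(11, 9)          # QCIF: 176x144 luma samples
    worst = min(surviving_neighbours(groups, r, c, 1)
                for r in range(9) for c in range(11) if groups[r][c] == 1)
    print("lost MBs:", sum(map(sum, groups)), "worst-case surviving neighbours:", worst)
```

With scan-order slices, by contrast, a lost slice removes a contiguous run of macroblocks, and most of them have no received horizontal neighbor to conceal from.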

-66-

Error Resilience: Redundant Slice
- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

    Sequence   Bitrate [kbps] for QCIF: 24, 48, 96, 192   Bitrate [kbps] for CIF: 96, 192, 384, 768
    Foreman    > 1x  2x  2x  T  2x  > 2x  T  T
    Paris      > 1x  2x  2x  2x  2x  T  2x  T
    Head       > 2x  2x  2x  T  T
    Zoom       > 1x  1x  2x  2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

    Sequence   Bitrate [kbps] for QCIF: 24, 48, 96, 192   Bitrate [kbps] for CIF: 96, 192, 384, 768
    Football   2x  1x  2x  2x  > 1x  > 1x  1x  > 1x
    Mobile     2x  1x  2x  2x  > 2x  4x  > 2x  T
    Husky      2x  2x  > 1x  2x  2x  2x
    Tempete    2x  2x  > 2x  T  2x  2x  T2x  T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

    Sequence   Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6   Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6
    Football   > 1.5x  > 1.3x  1.3x  1.5x  2x  1.8x  1.3x  1.5x
    Mobile     4x  2.7x  2x  T  T  > 4x  > 2.7x  > 2x  T  T
    Husky      > 1.5x  1.3x  1x  1.3x  1.5x  2.7x  2x  1.8x  2x  > 1.5x
    Tempete    T  2x  T  T  T  T  T  4x  T  T  T  T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

    Sequence                 Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20   Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20
    720 (60p)
      Crew                   1.7x  2x  T  1.7x  2x  T
      Harbour                T  3.3x  T  T  T  1.7x  T  T
    1080 (30i)
      Stockholm Pan          1x  2x
      New Mobile & Calendar  T  2x  T  T  2x  T
    1080 (25p)
      River Bed              > 1.7x  > 1x  T  > 1.7x  > 1x  T
      Vintage Car            1.7x  T  2x  T  1.7x  T  2x  T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

- HLP: High Latency Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

- CHC: Conversational High Compression
- SP: Simple Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Baseline Profile

-73-

Conclusions
- H.264 outperforms the previous standards
- Comparison of standards

| Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10 |
| --- | --- | --- | --- | --- |
| Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16 |
| Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4 |
| Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard |
| Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5% |
| Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC |
| Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB |
| Playback & random access | Yes | Yes | Yes | Yes |

-74-

Conclusions: Comparison of standards (continued)

| Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10 |
| --- | --- | --- | --- | --- |
| Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel |
| Profiles | No | 5 | 8 | 4 |
| Reference picture | one | one | one | multiple |
| Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward |
| Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI |
| Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice |
| Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps |
| Compatibility with previous standards | n/a | Yes | Yes | No |
| Encoder complexity | Low | Medium | Medium | High |

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to the 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next-generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bitstreams and specifications

-79-

Contacts for Further Information
- JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team
  - Chair: Gary Sullivan (garysull@microsoft.com)
  - Co-chair: Ajay Luthra (aluthra@motorola.com)
  - Co-chair: Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.

[68] www.stmicroelectronics.com -- WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm -- High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

A matrix can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding.
Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters                                             Description
Sequence-, picture- and slice-layer syntax elements    Headers and parameters
Macroblock type (mb_type)                              Prediction method for each coded macroblock
Coded block pattern                                    Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter                                    Transmitted as a delta value from the previous value of QP
Reference frame index                                  Identifies reference frame(s) for inter prediction
Motion vector                                          Transmitted as a difference (mvd) from the predicted motion vector
Residual data                                          Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$
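The construction above can be sketched in a few lines of Python. This is only an illustration of the [M zeros][1][INFO] rule on this slide; the function name `exp_golomb_encode` and the use of a `'0'`/`'1'` string in place of a real bit writer are assumptions made for readability.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'.

    Follows the slide: M = floor(log2(code_num + 1)),
    INFO = code_num + 1 - 2^M, codeword = [M zeros][1][INFO in M bits].
    """
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # M-bit INFO field
    info_bits = format(info, "0{}b".format(m)) if m > 0 else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword produced this way is 2M + 1 bits long, so small code numbers get short codewords.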

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes
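The three decoding steps can be sketched as follows. This is a minimal illustration, assuming the bitstream is available as a string of `'0'`/`'1'` characters; the helper name `exp_golomb_decode` and its `(value, next_position)` return convention are hypothetical, not part of the standard.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a '0'/'1' string starting at pos.

    Returns (code_num, position of the first bit after the codeword).
    """
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    code_num = (1 << m) + info - 1           # step 3: 2^M + INFO - 1
    return code_num, start + m


if __name__ == "__main__":
    stream = "1" + "010" + "00111"           # three codewords back to back
    pos, values = 0, []
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        values.append(value)
    print(values)
```

Because every codeword announces its own length through the leading zeros, codewords can be concatenated without separators.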

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

[Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y is 16x16, Cb and Cr are 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

$$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$$

$$Y = 0.2126\, R + 0.7152\, G + 0.0722\, B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
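As a quick numerical check of these formulas, the following sketch evaluates them in floating point. It assumes R, G and B are normalized to the range 0..1 and ignores the offsets and integer scaling used in practical systems; the function name `rgb_to_ycbcr` is illustrative only.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y, Cb, Cr using the K_R/K_B form above.

    The default constants are the ones quoted on the slide; passing
    kr=0.299, kb=0.114 gives the BT.601 variant mentioned there.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: full luma, no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum
    print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches its maximum
```

The divisors 2(1 - K_B) and 2(1 - K_R) are what keep Cb and Cr within -0.5..+0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5.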

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
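The forward and inverse conversions on this slide can be checked for exact reversibility with a short sketch. The function names are hypothetical; the only property relied upon is that Python's `>>` on integers is an arithmetic (sign-preserving) right shift, which matches the definition given on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion from the slide (integer inputs)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion from the slide; undoes rgb_to_ycgco exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 0)
    coded = rgb_to_ycgco(*pixel)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly the quantity the matching forward step added, so the shifts never lose information. The price is that Cg and Co need one more bit than the inputs, which is the extra bit of chroma precision mentioned on slide 120.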

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.
All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/s

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

▪ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

▪ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

▪ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-14-

Introduction: Applications of H.264 / MPEG-4 Part 10

A broad range of applications for video content, including but not limited to the following:

Video streaming over the internet
CATV: Cable TV on optical networks, copper, etc.
DBS: Direct broadcast satellite video services
DSL: Digital subscriber line video services
DTTB: Digital terrestrial television broadcasting, cable modem, DSL
ISM: Interactive storage media (optical disks, etc.)
MMM: Multimedia mailing
MSPN: Multimedia services over packet networks
RTC: Real-time conversational services (videoconferencing, videophone, etc.)
RVS: Remote video surveillance
SSM: Serial storage media (digital VTR, etc.)
D-Cinema: Content contribution, content distribution, studio editing, post processing

-15-

Introduction

Profiles and Levels for particular applications

Profile: a subset of the entire bit-stream syntax; different decoder designs are based on the Profile

- Four profiles: Baseline, Main, Extended and High

Applications                                                                   Profile
Video conferencing, videophone                                                 Baseline
Digital storage media, television broadcasting                                 Main
Streaming video                                                                Extended
Content contribution, content distribution, studio editing, post processing    High

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles

- I slice (intra-coded slice): a slice coded using prediction only from decoded samples within the same slice

- P slice (predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block

- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile

- Common parts: I slice, P slice, CAVLC

- FMO (flexible macroblock order): macroblocks need not be in raster scan order; a map assigns macroblocks to a slice group

- ASO (arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture

- RS (redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile

- Common parts: I slice, P slice, CAVLC

- B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block

- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile

- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switching slice, coded similarly to an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Feature                                                         Baseline  Main  Extended  High
I & P Slices                                                       X       X       X       X
Deblocking Filter                                                  X       X       X       X
1/4-Pel Motion Compensation                                        X       X       X       X
Variable Block Size (16x16 to 4x4)                                 X       X       X       X
CAVLC/UVLC                                                         X       X       X       X
Error Resilience Tools - Flexible MB Order, ASO, Red. Slices       X               X
SP/SI Slices                                                                       X
B Slice                                                                    X       X       X
Interlaced Coding                                                          X       X       X
CABAC                                                                      X               X
Data Partitioning                                                                  X

-22-

Introduction

Application requirements

Broadcast television
  Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
  H.264 profile: Main
  MPEG-4 profile: ASP (Advanced Simple)

Streaming video
  Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
  H.264 profile: Extended
  MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback
  Requirements: coding efficiency, interlace, low-complexity encoder and decoder
  H.264 profile: Main
  MPEG-4 profile: ASP

Videoconferencing
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
  H.264 profile: Baseline
  MPEG-4 profile: SP (Simple)

Mobile video
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
  H.264 profile: Baseline
  MPEG-4 profile: SP

Studio distribution
  Requirements: lossless or near-lossless, interlace, efficient transcoding
  H.264 profile: Main, High
  MPEG-4 profile: Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number   Picture type & frame rate
1              QCIF, 15 fps
1.1            QCIF, 30 fps
1.2            CIF, 15 fps
1.3            CIF, 30 fps
2              CIF, 30 fps
2.1            HHR, 15 or 30 fps
2.2            SDTV, 15 fps
3              SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns: (1) Level number; (2) Max macroblock processing rate (MB/s); (3) Max frame size (MBs); (4) Max decoded picture buffer size (1024 bytes); (5) Max video bit rate (1000 bits/s or 1200 bits/s); (6) Max CPB size (1000 bits or 1200 bits); (7) Vertical MV component range (luma frame samples); (8) Min compression ratio; (9) Max number of MVs per two consecutive MBs

(1)    (2)        (3)       (4)         (5)        (6)        (7)                 (8)   (9)
1      1 485      99        148.5       64         175        [-64, +63.75]       2     -
1.1    3 000      396       337.5       192        500        [-128, +127.75]     2     -
1.2    6 000      396       891.0       384        1 000      [-128, +127.75]     2     -
1.3    11 880     396       891.0       768        2 000      [-128, +127.75]     2     -
2      11 880     396       891.0       2 000      2 000      [-128, +127.75]     2     -
2.1    19 800     792       1 782.0     4 000      4 000      [-256, +255.75]     2     -
2.2    20 250     1 620     3 037.5     4 000      4 000      [-256, +255.75]     2     -
3      40 500     1 620     3 037.5     10 000     10 000     [-256, +255.75]     2     32
3.1    108 000    3 600     6 750.0     14 000     14 000     [-512, +511.75]     4     16
3.2    216 000    5 120     7 680.0     20 000     20 000     [-512, +511.75]     4     16
4      245 760    8 192     12 288.0    20 000     25 000     [-512, +511.75]     4     16
4.1    245 760    8 192     12 288.0    50 000     62 500     [-512, +511.75]     2     16
4.2    491 520    8 192     12 288.0    50 000     62 500     [-512, +511.75]     2     16
5      589 824    22 080    41 310.0    135 000    135 000    [-512, +511.75]     2     16
5.1    983 040    36 864    69 120.0    240 000    240 000    [-512, +511.75]     2     16
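The first two limits in the table explain the frame rates quoted for each level: the highest frame rate a level allows is its macroblock processing rate divided by the number of macroblocks in the picture. The sketch below illustrates this for three levels, copying their limits from the table. The dictionary `LEVEL_LIMITS`, the function name and the 172 frames/s ceiling (taken from the "Levels in H.264/AVC" notes in this deck) are illustrative assumptions, not a complete conformance check.

```python
import math

# (max macroblock processing rate in MB/s, max frame size in MBs),
# copied from the table above for three example levels.
LEVEL_LIMITS = {
    "1": (1485, 99),
    "3": (40500, 1620),
    "4": (245760, 8192),
}

MAX_FRAME_RATE = 172.0  # ceiling quoted in the deck's notes on levels


def max_frame_rate(level, width, height):
    """Highest frame rate (frames/s) the level permits for a width x height picture."""
    max_mbps, max_fs = LEVEL_LIMITS[level]
    # A macroblock covers 16x16 luma samples; partial macroblocks count as whole.
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)
    if frame_mbs > max_fs:
        raise ValueError("picture exceeds the level's maximum frame size")
    return min(max_mbps / frame_mbs, MAX_FRAME_RATE)


if __name__ == "__main__":
    print(max_frame_rate("1", 176, 144))   # QCIF at Level 1
    print(max_frame_rate("3", 720, 576))   # 625-line SDTV at Level 3
    print(max_frame_rate("3", 720, 480))   # 525-line SDTV at Level 3
```

For QCIF (99 macroblocks) at Level 1 this gives 15 frames/s, and for 720x576 and 720x480 at Level 3 it gives 25 and 30 frames/s, matching the picture types listed for those levels.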

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure

Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 pels (360 pels total) x 288 lines; Cb and Cr are each 176 pels (180 pels total) x 144 lines. QCIF format: Y is 176 pels (180 pels total) x 144 lines; Cb and Cr are each 88 pels (90 pels total) x 72 lines.]

-28-

Video Coding Algorithm

Block diagram for H.264 encoder

[Figure: encoder block diagram with blocks Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction and Intra/Inter Mode Decision]

-29-

Video Coding Algorithm

Block diagram for H.264 decoder

[Figure: decoder block diagram with blocks Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering and Video Output]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4x4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction):
vertical (0), horizontal (1), DC (2), diagonal down-left (3), diagonal down-right (4), vertical-right (5), horizontal-down (6), vertical-left (7), horizontal-up (8)

[Figure: 4x4 block of samples a-p with neighboring samples A-H above, I-L to the left and M above-left; arrows mark the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, ..., p: the predicted ones for the current block; above and left samples A, B, ..., M: previously reconstructed ones

-31-

VC Algorithm: Intra Prediction

Example of 4x4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) for mode 4

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4) for mode 8

[Figure: 4x4 block of samples a-p with neighbors A-H, I-L and M, showing the prediction directions of mode 4 and mode 8]
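The three simplest 4x4 modes and the two sample formulas quoted above can be sketched as follows. The function names are hypothetical, the neighbors are assumed to be available (the handling of picture or slice borders is omitted), and round(x/4) and round(x/2) are written as integer add-and-shift.

```python
def predict_vertical(above):
    """Mode 0: every row repeats the four samples A..D above the block."""
    return [list(above[:4]) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: every column repeats the four samples I..L left of the block."""
    return [[left[row]] * 4 for row in range(4)]


def predict_dc(above, left):
    """Mode 2: all 16 samples take the rounded mean of A..D and I..L."""
    dc = (sum(above[:4]) + sum(left[:4]) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


def sample_a_mode4(i, m, a):
    """Sample a in mode 4: round(I/4 + M/2 + A/4)."""
    return (i + 2 * m + a + 2) >> 2


def sample_a_mode8(i, j):
    """Sample a in mode 8: round(I/2 + J/2)."""
    return (i + j + 1) >> 1


if __name__ == "__main__":
    above = [10, 20, 30, 40]   # A, B, C, D
    left = [50, 60, 70, 80]    # I, J, K, L
    print(predict_vertical(above)[3])
    print(predict_horizontal(left)[1])
    print(predict_dc(above, left)[0])
```

An encoder would typically build each candidate prediction, subtract it from the source block and keep the mode with the lowest cost; that decision logic is outside the scope of this sketch.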

-32-

VC Algorithm: Intra Prediction

16x16 luma: 4 prediction modes (vertical 0, horizontal 1, DC 2, plane 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC 0, horizontal 1, vertical 2, plane 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size

- A MB can be partitioned into smaller block sizes
- 4 cases for the 16x16 MB, 4 cases for the 8x8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition types cannot be mixed, i.e., one cannot have 16x8 and 4x8 partitions
When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation

- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms at high bit rates and high resolutions

Motion vector accuracy: 1/4 (6-tap filter)

[Figure: encoder block diagram, repeated from the "Block diagram for H.264 encoder" slide]

[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8; sub-MB partitions 8x8, 8x4, 4x8 and 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case letters A-U) with the interpolated half- and quarter-pel positions between them (lower-case letters a-s and aa-hh)]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
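The two interpolation formulas can be tried out on sample values with the sketch below. It assumes 8-bit samples, so the filtered value is clipped to 0..255 (the filter's negative taps can overshoot), and it writes the rounding as integer add-and-shift; the function names are illustrative.

```python
def clip_8bit(value):
    """Clamp to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample b between G and H: 6-tap filter (1, -5, 20, 20, -5, 1) / 32."""
    total = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip_8bit((total + 16) >> 5)      # +16 then >>5 rounds the division by 32


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]           # E, F, G, H, I, J across a sharp edge
    b = half_pel(*row)
    a = quarter_pel(row[2], b)               # a lies between G and b
    print(b, a)
```

On a flat area the filter returns the input value unchanged, since the weights sum to 32. Across an edge, the negative taps are what preserve sharpness compared with plain bilinear averaging.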

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of 4x4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures (short term, long term)

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, repeated from the "Block diagram for H.264 encoder" slide]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4x4 integer DCT + Hadamard transform

Coding order of the (4x4) luma blocks for the (4x4) integer DCT (numbers 0, 1, ..., 15):

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients (dark samples) of each 4x4 block, (00), (01), (02), ..., (33):

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

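VC Algorithm: Transform

4x4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f\, X\, C_f^T\right) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication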
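The structure of this transform, an integer "core" followed by element-wise scaling, can be illustrated with plain Python lists. This is a sketch under stated assumptions: the helper names are hypothetical, and the scaling by E_f is done here in floating point for clarity, whereas a real codec folds it into the quantizer.

```python
import math

# Core forward matrix C_f from the slide.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

A = 0.5
B = math.sqrt(2.0 / 5.0)
# Post-scaling matrix E_f: a^2, ab/2 and b^2/4 in the pattern shown above.
EF = [
    [A * A, A * B / 2, A * A, A * B / 2],
    [A * B / 2, B * B / 4, A * B / 2, B * B / 4],
    [A * A, A * B / 2, A * A, A * B / 2],
    [A * B / 2, B * B / 4, A * B / 2, B * B / 4],
]


def matmul(p, q):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core(x):
    """Integer part of the transform: C_f X C_f^T (additions and shifts only)."""
    return matmul(matmul(CF, x), transpose(CF))


def forward_scaled(x):
    """Full transform: element-by-element product of the core output with E_f."""
    w = forward_core(x)
    return [[w[i][j] * EF[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(forward_core(flat)[0])      # only the DC position is non-zero
    print(forward_scaled(flat)[0][0])
```

For a flat block of ones the core output has 16 at the DC position and zeros elsewhere; after scaling by a^2 = 1/4 the DC coefficient is 4.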

-43-

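4x4 Inverse IntDCT

$$X = C_i^T \left(Y \otimes E_i\right) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$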

-44-

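VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) denotes rounding to the nearest integer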
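A small sketch of this second-stage transform follows. The matrix H is the 4x4 Walsh-Hadamard matrix; the function names are hypothetical, and ties in the final division by 2 are rounded upward here, which is an assumption since the slide only says "nearest integer".

```python
# 4x4 Walsh-Hadamard matrix used for the luma DC coefficients.
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(p, q):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def luma_dc_hadamard(dc):
    """Y_D = round((H * X_D * H) / 2) for a 4x4 array of DC coefficients.

    (v + 1) >> 1 divides by two and rounds halves upward (assumed tie rule).
    """
    t = matmul(matmul(H, dc), H)
    return [[(v + 1) >> 1 for v in row] for row in t]


if __name__ == "__main__":
    dc = [[16] * 4 for _ in range(4)]        # 16 blocks with identical DC values
    for row in luma_dc_hadamard(dc):
        print(row)
```

When all sixteen 4x4 blocks share the same DC value, as in a flat macroblock, the result has a single non-zero entry. That compaction is why the extra transform stage pays off for smooth Intra 16x16 macroblocks.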

-45-

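VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2x2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma block numbering: blocks 16 and 17 hold the 2x2 DC coefficients of U and V; blocks 18-21 (U) and 22-25 (V) hold the AC coefficients]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased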

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram, repeated from the "Block diagram for H.264 encoder" slide]

- 4x4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16x16 Intra luma and 8x8 chroma blocks

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization

Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij} \cdot \mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP
- a total of 52 values for 8 bits per decoded sample
- doubles in size for every increment of 6 in QP
- FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
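The doubling rule for Qstep can be sketched in a few lines. The six base values for QP 0 to 5 are not listed on the slide; they are the commonly published values for 8-bit video and should be read as an assumption here, as should the function name `qstep`.

```python
# Quantization step size as a function of QP (8-bit video, QP in 0..51).
# BASE_QSTEP holds the step sizes for QP = 0..5 (assumed, commonly published values).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Return Qstep for a given QP; the value doubles for every increment of 6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)

if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the table repeats with a factor of two every six entries, only six values need to be stored for the whole QP range.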

-48-

VC Algorithm Transform Quantization

[Figure: transform and quantization data flow
Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output
Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block
The DC transforms apply to chroma blocks and to luma in the Intra (16x16) prediction mode only]

-49-

VC Algorithm Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
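The two scan tables above give, for each position of the 4x4 block, its index in the one-dimensional coefficient sequence. A minimal sketch of how such a table can be applied follows; the function name `scan` and the sample block are illustrative assumptions.

```python
# Scan-order tables: entry [r][c] is the position of coefficient (r, c) in the scanned sequence.
ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]

def scan(block, order):
    """Reorder a 4x4 block of quantized coefficients into a 16-element list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out

if __name__ == "__main__":
    block = [[9, 3, 0, 0], [-2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(scan(block, ZIGZAG))      # frame (zig-zag) scan
    print(scan(block, ALTERNATE))   # field (alternate) scan
```

Both scans group the low-frequency coefficients at the start of the list so that the trailing run of zeros is as long as possible.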

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

$$M = \left\lfloor \log_2(\mathrm{code\_num} + 1) \right\rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}$$

-52-

Decoding

1 Read in M leading zeros followed by 1
2 Read in M-bit INFO field
3 code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
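The construction and decoding rules above translate directly into code. The following is a minimal sketch for unsigned code numbers, with bit strings represented as Python text for readability; the function names are illustrative assumptions.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string and return code_num."""
    m = 0
    while bits[m] == "0":                      # step 1: count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # step 2: read M-bit INFO
    return (1 << m) + info - 1                 # step 3: code_num = 2^M + INFO - 1

if __name__ == "__main__":
    for n in range(7):
        word = exp_golomb_encode(n)
        print(n, word, exp_golomb_decode(word))
```

Every codeword has length 2M + 1, so small code numbers, which are the most frequent, receive the shortest codes.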

-53-

VC Algorithm Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients
- The total numbers of zeros and +/-1 values are coded
- For the other coefficients, their levels are coded

Encoding steps
- step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm Entropy Coding: Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 15

Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table
- number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
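The five steps of the example can be reproduced by a short routine that extracts the symbols CAVLC would code; the look-up tables that map these symbols to bits are not modelled. The function name `cavlc_symbols` and the concrete values chosen for c0, c1, c2 (3, -2, 4) are assumptions made for illustration only.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of a scanned coefficient list (no bit-level coding)."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    positions = [i for i, _ in nonzero]
    reversed_levels = [c for _, c in reversed(nonzero)]

    # Steps 1 and 2: up to three consecutive +/-1 values at the high-frequency end.
    trailing_signs = []
    for c in reversed_levels:
        if abs(c) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("+" if c > 0 else "-")
        else:
            break

    # Step 3: remaining levels, still in reverse order.
    levels = reversed_levels[len(trailing_signs):]

    # Step 4: zeros located before the last non-zero coefficient.
    total_zeros = (positions[-1] + 1 - len(positions)) if positions else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping as soon as all zeros are accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeff": len(positions),
        "trailing_ones": len(trailing_signs),
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }

if __name__ == "__main__":
    example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 are assumed values
    print(cavlc_symbols(example))
```

For the example sequence this yields 6 non-zero coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, a total of 2 zeros, and the runs 1, 0, 1, matching the steps listed above.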

-55-

VC Algorithm Entropy Coding

CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated
Both MV and residual transform coefficients are coded by CABAC

Encoding steps
- step 1: context modeling. Choose a suitable model
- step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive
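The interplay of binarization and an adaptive probability model can be illustrated with a toy model. This is not the table-driven state machine of the standard: the unary binarization, the count-based estimator and the names `unary_bins`, `AdaptiveBinModel` and `ideal_cost_bits` are all simplifying assumptions, and the arithmetic coder itself is replaced by its ideal cost of -log2(p) bits per bin.

```python
import math

def unary_bins(value):
    """Binarize a non-negative integer as `value` ones followed by a terminating zero."""
    return [1] * value + [0]

class AdaptiveBinModel:
    """Toy context model: estimates P(bin) from running counts (not the CABAC state tables)."""
    def __init__(self):
        self.counts = [1, 1]

    def prob(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1

def ideal_cost_bits(bins):
    """Ideal arithmetic-coding cost of a bin sequence under the adaptive model."""
    model = AdaptiveBinModel()
    total = 0.0
    for b in bins:
        total += -math.log2(model.prob(b))   # cost of coding this bin
        model.update(b)                      # adapt the probability estimate
    return total

if __name__ == "__main__":
    symbols = [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]
    bins = [b for s in symbols for b in unary_bins(s)]
    print(len(bins), "bins cost about", round(ideal_cost_bits(bins), 2), "bits")
```

As the model adapts to a skewed source, the cost per bin drops well below one bit, which is the gain a fixed variable-length code cannot provide.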

-57-

VC Algorithm B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm B Slice: Generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: the current picture between previous pictures and next pictures, predicted with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm B Slice: Direct mode

Forward/backward pair of bi-directional prediction
Prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: the direct-mode partition of the current picture lies between the List 0 reference and the List 1 reference; the co-located partition has motion vector mvCol; tb and td are temporal distances; mvL0 and mvL1 are the derived motion vectors]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture
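The two scaling equations can be checked numerically. In the sketch below tb is taken as the temporal distance from the List 0 reference to the current picture and td as the distance between the two references; this reading of the figure, the floating-point arithmetic and the function name are assumptions, and a real decoder works with fixed-point scaling.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol.

    mv_col: (x, y) motion vector of the co-located partition
    tb:     assumed distance between the List 0 reference and the current picture
    td:     assumed distance between the List 0 and List 1 references
    """
    mvx, mvy = mv_col
    mv_l0 = (tb * mvx / td, tb * mvy / td)
    mv_l1 = (-(td - tb) * mvx / td, -(td - tb) * mvy / td)
    return mv_l0, mv_l1

if __name__ == "__main__":
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
    print(temporal_direct_mvs((9, 3), tb=1, td=3))
```

Note that mvL1 equals mvL0 minus mvCol, so no motion vector data has to be transmitted for a direct-mode partition.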

-60-

VC Algorithm B Slice: Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by different weights from two reference signals r1 and r2

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
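A small numerical sketch of p = w1 r1 + w2 r2 follows. The implicit weights are modelled here as a plain linear interpolation over temporal distance, which is a simplification of the fixed-point rule used by the standard; `implicit_weights` and `weighted_prediction` are illustrative names.

```python
def implicit_weights(tb, td):
    """Assumed simplification: the temporally closer reference gets the larger weight.

    tb: distance from reference r1 to the current picture
    td: distance from reference r1 to reference r2
    """
    w2 = tb / td
    return 1.0 - w2, w2

def weighted_prediction(r1, r2, w1, w2):
    """Combine two motion-compensated sample lists: p = w1 * r1 + w2 * r2."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]

if __name__ == "__main__":
    # Fade to black: r2 is already dark, the current picture lies halfway between.
    bright, dark = [100, 200, 160], [50, 100, 80]
    w1, w2 = implicit_weights(tb=1, td=2)
    print(weighted_prediction(bright, dark, w1, w2))
```

With explicit weighting the same `weighted_prediction` step applies, but w1 and w2 are read from the slice header instead of being derived.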

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice

[Figure: switching from Bitstream A, P(1,1) P(1,2) P(1,3) P(1,4) P(1,5), to Bitstream B, P(2,1) P(2,2) P(2,3) P(2,4) P(2,5), through the switching slice S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1 Coded data of a slice is placed in three separate data partitions A, B & C

2 A has the slice header and header data for each MB in the slice

3 B has coded residual data for intra and SI slice MBs

4 C has coded residual data for inter coded MBs

5 Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: parameter set exchange between the H.264 encoder and the H.264 decoder. Parameter sets 1, 2, 3 are stored at both ends through a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Parameter Set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture
If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission. Errors are distributed more randomly over the video frames rather than in a single block of data
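The dispersed two-group allocation described above can be pictured as a checkerboard. The sketch below builds such a map and lists, for a lost macroblock, the neighbors that survive in the other slice group; the checkerboard rule and both function names are assumptions chosen for illustration, since the slide does not fix a particular map.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a dispersed (checkerboard) pattern."""
    return [[(r + c) % 2 for c in range(width_mbs)] for r in range(height_mbs)]

def surviving_neighbours(groups, r, c, lost_group):
    """Return the 4-connected neighbors of MB (r, c) that are not in the lost slice group."""
    height, width = len(groups), len(groups[0])
    found = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < height and 0 <= cc < width and groups[rr][cc] != lost_group:
            found.append((rr, cc))
    return found

if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    # Slice group 1 is lost: MB (1, 2) can be concealed from these neighbors.
    print(surviving_neighbours(groups, 1, 2, lost_group=1))
```

With this pattern every macroblock of the lost group keeps all of its available horizontal and vertical neighbors, which is what makes spatial concealment effective.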

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T

Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: >2x, 2x, 2x, T, T

Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x

Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T

Husky: 2x, 2x, >1x, 2x, 2x, 2x

Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T

Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: >1.7x, >1x, T, >1.7x, >1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies for replacing or complementing existing products

Related companies
- UBVideo website http://www.ubvideo.com
- LSI Logic website http://www.lsilogic.com
- Microsoft website http://www.microsoft.com
- Envivio website http://www.envivio.com
- Broadcom website http://www.broadcom.com
- Nagravision website http://www.nagravision.com
- Philips website http://www.philips.com
- Polycom website http://www.polycom.com
- PixelTools Corporation website http://www.pixeltools.com
- Amphion website http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website http://www.ligos.com
- LifeSize website http://www.lifesize.com
- Netvideo website http://www.netvideo.com
- Motorola website http://www.motorola.com
- Vanguard Software Solutions website http://www.vsofts.com
- STMicroelectronics website http://us.st.com
- MainConcept website http://www.mainconcept.com
- Impact Labs Inc website http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group
- MPEG website http://www.mpeg.org
- JVT website ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website http://www.mpegla.com
- Via Licensing http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss Francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows
♦ Intra spatial (block based) prediction
   o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
   o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- [use RGB color format] Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
   o Multiple reference pictures
   o Reference B pictures
   o Arbitrary referencing order
   o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
   o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
   o Weighted prediction
   o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
   o Frame-field adaptation
      Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
      MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
   o Field scan
♦ Lossless representation capability
   o Intra PCM raw sample-value macroblocks
   o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
   o Zig-Zag (Frame)
   o Field (alternate scan)

♦ Lossless entropy coding
   o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
   o Context Adaptive VLC (CAVLC)
   o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
   o Flexible Macroblock Ordering (FMO)
   o Arbitrary Slice Order (ASO)
   o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT for FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes

• Vertical: pels in MB predicted from pels just above the MB

• Horizontal: pels in MB predicted from pels just left of the MB

• DC: pels in MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values. The predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt Only)

- Matrix can be transmitted in SPS and PPS
- Separate matrix for 4x4 and 8x8 transforms
- Separate matrix for Inter and Intra
- Encoder can design and use customized scaling matrices
- These are to be sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for
- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding
Coefficient scanning is based on the decreasing variances and is chosen to maximize the number of zero-valued coefficients along the scan

[Figure: 8x8 scan patterns, Frame (Zig-Zag) and Field]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type mb_type | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identify reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block


-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution, content-distribution, and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

High Profiles

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide x 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B, \qquad K_R + K_G + K_B = 1$$

With $K_R = 0.2126$ and $K_B = 0.0722$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = \frac{B - Y}{2(1 - K_B)} = 0.5389 (B - Y), \qquad C_r = \frac{R - Y}{2(1 - K_R)} = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
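
As a quick numerical check of the conversion formulas on this slide, the sketch below evaluates them in floating point. It assumes R, G and B are normalized to the range 0 to 1, which the slide does not state; the function name rgb_to_ycbcr is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Floating-point RGB -> YCbCr using the luma weights KR and KB.

    Assumes r, g, b are normalized to 0..1 (an assumption of this sketch).
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scales B - Y into -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scales R - Y into -0.5..0.5
    return y, cb, cr


# The chroma gains implied by the default KR and KB:
cb_gain = 1.0 / (2.0 * (1.0 - 0.0722))
cr_gain = 1.0 / (2.0 * (1.0 - 0.2126))
```

Passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned on the slide. Because the results are real-valued, storing them as integers introduces the rounding error that the following slides address.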

-120-

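Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

$$Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right], \qquad C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right], \qquad C_o = \frac{R - B}{2}$$

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples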

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
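
The forward and inverse conversions on this slide use only additions, subtractions and shifts, which is what makes them exactly reversible. The sketch below transcribes them into Python to demonstrate the round trip; the function names are hypothetical, and Python's ">>" on integers is an arithmetic shift, matching the slide's definition.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion from the slide (integer lifting steps)."""
    co = r - b
    t = b + (co >> 1)        # t is the intermediate temporary variable
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity the matching forward step added, so no information is lost even though the shifts discard a bit. Note that Cg and Co can be negative and need one more bit than the RGB inputs, in line with the remark about adding one bit of precision to the chroma samples.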

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

New Profiles in the H264AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
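
Note 1 can be illustrated with a short calculation: the frame rate a level allows follows from dividing its maximum macroblock processing rate by the number of macroblocks in the picture, capped at 172 frames/sec. The sketch below uses a few rows of the parameter table given elsewhere in this deck (max MB/s and max frame size in MBs); the names LEVEL_LIMITS and max_frame_rate are hypothetical, and other constraints of a level (bit rate, buffer sizes) are ignored.

```python
# (max macroblock processing rate in MB/s, max frame size in MBs),
# copied from the "Parameter set limits for each Level" table of this deck.
LEVEL_LIMITS = {
    "1": (1485, 99),
    "1.1": (3000, 396),
    "2": (11880, 396),
    "3": (40500, 1620),
    "3.1": (108000, 3600),
    "4": (245760, 8192),
}
MAX_FRAME_RATE = 172  # upper bound quoted in note 1


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks covering a width x height picture."""
    return -(-width // 16) * -(-height // 16)   # ceiling division


def max_frame_rate(level: str, width: int, height: int) -> float:
    """Highest frame rate the level's macroblock rate allows for this size."""
    max_mbps, max_fs = LEVEL_LIMITS[level]
    mbs = macroblocks_per_frame(width, height)
    if mbs > max_fs:
        raise ValueError("picture exceeds the level's maximum frame size")
    return min(max_mbps / mbs, MAX_FRAME_RATE)
```

For example, QCIF (176x144, 99 macroblocks) at Level 1 gives 1485 / 99 = 15 frames/sec, matching the "QCIF 15fps" entry of the level list, while the same picture at Level 3 hits the 172 frames/sec cap.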

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)–(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-15-

Introduction

Profiles and Levels for particular applications
Profile: a subset of the entire bitstream syntax; decoder designs differ according to the Profile

– Four profiles: Baseline, Main, Extended and High

Profile     Applications
Baseline    Video conferencing, videophone
Main        Digital storage media, television broadcasting
Extended    Streaming video
High        Content contribution, content distribution, studio editing, post processing

-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles
– I slice (Intra-coded slice): a slice coded using prediction only from decoded samples within the same slice
– P slice (Predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
– Common parts: I slice, P slice, CAVLC
– FMO (Flexible macroblock order): macroblocks need not be in raster scan order; a map assigns macroblocks to a slice group
– ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
– RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
– Common parts: I slice, P slice, CAVLC
– B slice (Bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
– Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
– Common parts: I slice, P slice, CAVLC
– SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
– SI slice: a switched slice, similar to the coding of an I slice
– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
– Flexible macroblock order (FMO)
– Arbitrary slice order (ASO)
– Redundant slice (RS)
– B slice
– Weighted prediction

-21-

Introduction
Profile specifications

Feature                                                       Baseline  Main  Extended  High
I & P Slices                                                     X       X       X       X
Deblocking Filter                                                X       X       X       X
¼ Pel Motion Compensation                                        X       X       X       X
Variable Block Size (16x16 to 4x4)                               X       X       X       X
CAVLC/UVLC                                                       X       X       X       X
Error Resilience Tools (Flexible MB Order, ASO, Red. Slices)     X       -       X       -
SP/SI Slices                                                     -       -       X       -
B Slice                                                          -       X       X       X
Interlaced Coding                                                -       X       X       X
CABAC                                                            -       X       -       X
Data Partitioning                                                -       -       X       -

-22-

Introduction

Application requirements

Application: Broadcast television
  Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
  H.264 profile: Main
  MPEG-4 profile: ASP (Advanced Simple)

Application: Streaming video
  Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
  H.264 profile: Extended
  MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Application: Video storage and playback
  Requirements: coding efficiency, interlace, low-complexity encoder and decoder
  H.264 profile: Main
  MPEG-4 profile: ASP

Application: Videoconferencing
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
  H.264 profile: Baseline
  MPEG-4 profile: SP (Simple)

Application: Mobile video
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
  H.264 profile: Baseline
  MPEG-4 profile: SP

Application: Studio distribution
  Requirements: lossless or near-lossless, interlace, efficient transcoding
  H.264 profile: Main, High
  MPEG-4 profile: Studio Profile

-23-

Introduction
Level: corresponds to the processing power and memory capability of a codec

Level number   Picture type & frame rate
1              QCIF, 15 fps
1.1            QCIF, 30 fps
1.2            CIF, 15 fps
1.3            CIF, 30 fps
2              CIF, 30 fps
2.1            HHR, 15 or 30 fps
2.2            SDTV, 15 fps
3              SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction
Parameter set limits for each Level

Columns:
(1) Level number
(2) Max macroblock processing rate (MB/s)
(3) Max frame size (MBs)
(4) Max decoded picture buffer size (1024 bytes)
(5) Max video bit rate (1000 bits/s or 1200 bits/s)
(6) Max CPB size (1000 bits or 1200 bits)
(7) Vertical MV component range (luma frame samples)
(8) Min compression ratio
(9) Max number of MVs per two consecutive MBs

(1)    (2)        (3)      (4)        (5)       (6)       (7)                 (8)  (9)
1      1 485      99       148.5      64        175       [-64, +63.75]       2    -
1.1    3 000      396      337.5      192       500       [-128, +127.75]     2    -
1.2    6 000      396      891.0      384       1 000     [-128, +127.75]     2    -
1.3    11 880     396      891.0      768       2 000     [-128, +127.75]     2    -
2      11 880     396      891.0      2 000     2 000     [-128, +127.75]     2    -
2.1    19 800     792      1 782.0    4 000     4 000     [-256, +255.75]     2    -
2.2    20 250     1 620    3 037.5    4 000     4 000     [-256, +255.75]     2    -
3      40 500     1 620    3 037.5    10 000    10 000    [-256, +255.75]     2    32
3.1    108 000    3 600    6 750.0    14 000    14 000    [-512, +511.75]     4    16
3.2    216 000    5 120    7 680.0    20 000    20 000    [-512, +511.75]     4    16
4      245 760    8 192    12 288.0   20 000    25 000    [-512, +511.75]     4    16
4.1    245 760    8 192    12 288.0   50 000    62 500    [-512, +511.75]     2    16
4.2    491 520    8 192    12 288.0   50 000    62 500    [-512, +511.75]     2    16
5      589 824    22 080   41 310.0   135 000   135 000   [-512, +511.75]     2    16
5.1    983 040    36 864   69 120.0   240 000   240 000   [-512, +511.75]     2    16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
– Abstracts the VCL data, hence the name Network 'Abstraction' Layer
– Header information about the VCL format
– Appropriate for conveyance by the transport layers or storage media
– The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
– Core coding layer
– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure
Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 (360 pels) x 288 lines; Cb and Cr are each 176 (180 pels) x 144 lines. QCIF format: Y is 176 (180 pels) x 144 lines; Cb and Cr are each 88 (90 pels) x 72 lines.]

-28-

Video Coding Algorithm
Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Inverse Quantization & Inverse Transform, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision, Deblocking Filter, Picture Buffering, Entropy Coding. Input: video. Output: bitstream.]

-29-

Video Coding Algorithm
Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream. Output: video.]

-30-

VC Algorithm: Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

Sample layout:

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

Samples a, b, …, p: the predicted samples of the current block
Samples A, B, …, M: previously reconstructed samples above and to the left

[Figure: prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8 drawn over the 4 x 4 block]

-31-

VC Algorithm: Intra Prediction
Example of 4 x 4 luma block

Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively

Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively

[Figure: the 4 x 4 block a to p with neighboring samples A to M, showing the prediction directions of mode 4 and mode 8]

-32-

VC Algorithm Intra Prediction 16 x 16 luma

4 prediction modes(vertical 0 horizontal 1 DC 2 plane 3)

Plane works well in smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left side (V) samples

(8x8) luma (FRExt only) similar to 4x4 luma with low pass filtering of the predictor to improve prediction performance

Plane

-33-

VC Algorithm Intra Prediction

Chroma always operates using full MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to 16x16 luma block but different mode order)

4 Prediction modes

(DC 0 Horizontal 1 Vertical 2 Plane 3)

-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size

– A MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition types cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions
When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation

Better compression performance than integer-pel MC, at the expense of increased complexity
Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: encoder block diagram annotated with: motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions: 16x16, 16x8, 8x16, 8x8. Sub-MB partitions: 8x8, 8x4, 4x8, 4x4.]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy

A distinct MV can be sent for each sub-MB partition
ME can be based on multiple pictures that lie in the past or in the future in display order
The reference picture for ME is selected at the MB partition level
Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm: Inter Prediction
Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) with the half-pel and quarter-pel positions between them (lower-case letters a to s and aa to hh)]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
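
The two interpolation formulas above can be tried out with integer arithmetic. In this sketch the rounded divisions by 32 and by 2 are done by adding half the divisor and shifting, and the result is clipped to the 8-bit range; the clipping and the function names half_pel and quarter_pel are assumptions of the sketch rather than details given on the slide.

```python
def clip8(v: int) -> int:
    """Clamp to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, v))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-pel sample b between G and H from six integer-pel neighbours."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j   # 6-tap FIR numerator
    return clip8((acc + 16) >> 5)                    # rounded division by 32


def quarter_pel(p: int, q: int) -> int:
    """Quarter-pel sample: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1
```

On a flat area the filter returns the input value unchanged because the weights sum to 32, and across a sharp edge the negative taps can overshoot the sample range, which is why the clip is needed.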

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered within a 16x16 macroblock, shown for luma and for chroma]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram annotated with: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks for the (4x4) integer DCT (numbers 0, 1, …, 15):

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

DC coefficients of each 4x4 block, indexed (00), (01), (02), …, (33):

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

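VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

where $\otimes$ implies element-by-element multiplication:

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$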
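
The split between an integer core transform and a separate scaling matrix can be checked numerically. The sketch below computes Cf X Cf^T with integer arithmetic only and then applies the element-by-element scaling Ef in floating point; in a real codec the scaling is folded into quantization rather than applied on its own. The helper names (matmul, core_transform, forward_transform) are hypothetical.

```python
import math

# Forward core transform matrix Cf from the slide.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(p[r][k] * q[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """W = Cf X Cf^T, using integer additions and multiplications only."""
    return matmul(matmul(CF, x), transpose(CF))


A = 0.5
B = math.sqrt(2.0 / 5.0)
ROW_SCALE = [A, B / 2, A, B / 2]
# Ef[r][c] is a^2, ab/2 or b^2/4 depending on the row and column parity.
EF = [[ROW_SCALE[r] * ROW_SCALE[c] for c in range(4)] for r in range(4)]


def forward_transform(x):
    """Y = (Cf X Cf^T) scaled element by element with Ef."""
    w = core_transform(x)
    return [[w[r][c] * EF[r][c] for c in range(4)] for r in range(4)]
```

With the scaling applied, the transform is orthonormal, so the sum of squares of the 16 outputs equals that of the 16 inputs; without it, the core output of a flat block of value v is a single DC term of 16v.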

-43-

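4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, the quantization step (QP) is embedded in the matrices $E_f$ and $E_i$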

-44-

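VC Algorithm: Transform

Luma DC coefficients for Intra 16x16 MB
The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \operatorname{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) denotes rounding to the nearest integer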

-45-

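VC Algorithm: Transform

Chroma DC coefficients
Intra prediction mode with (4x4) IntDCT: Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks of a macroblock, U and V. Blocks 16 and 17: 2x2 DC. Blocks 18 to 21 and 22 to 25: AC.]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased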

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with the Transform & Quantization block emphasized]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

$$Y_{ij} = \operatorname{round}\left( X_{ij} \frac{SF_{ij}}{Qstep} \right)$$

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values; doubles in size for every increment of 6 in QP (for 8 bits per decoded sample)
FRExt expands the QP range beyond 52 values, by 6 for each additional bit of decoded sample
SF: scaling term
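
The doubling rule for Qstep can be made concrete with a small sketch. The six base step sizes for QP 0 to 5 used here (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125) are the values commonly tabulated for H.264 and are not given on the slide; the scaling term SF is treated as a plain scalar argument, and all function names are hypothetical.

```python
import math

# Step sizes for QP = 0..5; every further 6 QP steps doubles them.
# These base values are an assumption of the sketch (not on the slide).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for QP in 0..51 (52 values in total)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def round_half_away(v: float) -> int:
    """Round to nearest, ties away from zero (avoids banker's rounding)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize(x: float, qp: int, sf: float = 1.0) -> int:
    """Encoder side: Y = round(X * SF / Qstep)."""
    return round_half_away(x * sf / qstep(qp))


def rescale(y: int, qp: int, sf: float = 1.0) -> float:
    """Decoder side: X = Y * Qstep * SF."""
    return y * qstep(qp) * sf
```

For instance, QP 28 gives a step of 16, so an input of 100 quantizes to 6 and is rescaled to 96; raising QP by 6 halves the number of distinguishable levels.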

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization data flow.
Encoder part: input block, forward transform, post-scaling and quantization, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only).
Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block.
The encoder output is the decoder input. The DC transform stages are used for luma in Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding
All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding
CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and of +/-1 values are coded, while for the other coefficients their levels are coded

Encoding steps
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

order:  0  1  2  3  4  5  6  7  8  9 … 15
coeff: c0 c1 c2  0  1  1  0 -1  0  0 …  0

Step 1: encode the total number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
  total number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
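
The five steps of this example can be reproduced by a short routine that extracts the symbols CAVLC would code from a scanned coefficient list. It is only a sketch of the symbol extraction: the look-up tables that turn these symbols into bits are omitted, the name cavlc_symbols is hypothetical, and the values 7, 6 and -2 stand in for the unspecified c0, c1 and c2.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of steps 1-5 from scanned coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]

    # Step 1: total nonzero coefficients and trailing +/-1 values (max 3),
    # counted from the highest-order coefficient backwards.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["+" if c > 0 else "-" for c in trailing]

    # Step 3: levels of the remaining nonzero coefficients, reverse order.
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Step 4: zeros located before the last nonzero coefficient.
    last_pos = nonzero[-1][0] if nonzero else -1
    total_zeros = (last_pos + 1) - len(nonzero)

    # Step 5: run of zeros before each coefficient, reverse order,
    # stopping once all zeros have been accounted for.
    runs, zeros_left = [], total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": len(nonzero), "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


# The slide's example with placeholder values for c0, c1, c2.
example = cavlc_symbols([7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8)
```

For the example list, the routine reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1, in agreement with the steps listed above.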

-55-

VC Algorithm: Entropy Coding
CABAC utilizes arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated
Both MV and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice
Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode

Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the co-located partition with mvCol, the direct-mode partition with mvL0 and mvL1, and the temporal distances tb and td]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is an MV used in the co-located MB of the subsequent picture
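
The scaling of mvCol by the temporal distances tb and td can be tried out with the sketch below. It uses plain floating-point division for clarity, which is a simplification: a real decoder works in fixed-point arithmetic. The function name temporal_direct_mvs and the representation of a motion vector as an (x, y) tuple are assumptions of the sketch.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) for a direct-mode partition.

    mv_col: (x, y) motion vector of the co-located partition
    tb:     temporal distance from the List 0 reference to the current picture
    td:     temporal distance from the List 0 reference to the List 1 reference
    """
    mvx, mvy = mv_col
    mv_l0 = (tb * mvx / td, tb * mvy / td)                      # tb*mvCol/td
    mv_l1 = (-(td - tb) * mvx / td, -(td - tb) * mvy / td)      # -(td-tb)*mvCol/td
    return mv_l0, mv_l1
```

With the current picture halfway between the two references (tb = 1, td = 2) and mvCol = (8, -4), this gives mvL0 = (4, -2) and mvL1 = (-4, 2); in general mvL0 - mvL1 equals mvCol, so the two vectors lie on the same motion trajectory.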

-60-

VC Algorithm: B Slice
Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
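
A small numerical sketch of p = w1 x r1 + w2 x r2 follows. For the implicit type it assumes a simple linear interpolation in time, so that the reference closer to the current picture receives the larger weight; the standard derives its implicit weights in fixed-point arithmetic, so this is an illustration of the idea rather than the normative rule. The names implicit_weights and weighted_prediction, and the meaning given to tb and td, are assumptions.

```python
def implicit_weights(tb, td):
    """Weights (w1, w2) from temporal distances (assumed linear model).

    tb: distance from reference r1 to the current picture
    td: distance from reference r1 to reference r2
    """
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1, w2):
    """p = w1*r1 + w2*r2 per sample, rounded and clipped to 8 bits."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]
```

In a fade to black, for example, explicit weights smaller than 1 let the encoder predict a darker picture from a brighter reference instead of spending bits on a large residual.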

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(11) to P(15), Bitstream B with pictures P(21) to P(25), and a switching slice S(3) between them]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice: SP/SI (only in Extended Profile)
Data partitioning (only in Extended Profile)
Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience
Parameter setting

The sequence parameter set contains all information related to a sequence of pictures
A picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3, linked by a reliable parameter set exchange; VCL data is transferred with reference to PS 3. Parameter Set 3: Video format NTSC, Motion resolution ¼, Enc CABAC, Frame width 11]

-65-

Error Resilience
FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data

-66-

Error Resilience
Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency
Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence; bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate[kbps] for QCIF Bitrate[kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate[Mbps] for MPEG-2 HiQ Bitrate[Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence Bitrate[Mbps] for MPEG-2 HiQ Bitrate[Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP – High Latency Profile; ASP – Advanced Simple Profile; H.26L – H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC – Conversational High Compression; SP – Simple Profile; ASP – Advanced Simple Profile; H.26L – H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website httpwwwubvideocom
- LSI Logic website httpwwwlsilogiccom
- Microsoft website httpwwwmicrosoftcom
- Envivio website httpwwwenviviocom
- Broadcom website httpwwwbroadcomcom
- Nagravision website httpwwwnagravisioncom
- Philips website httpwwwphilipscom
- Polycom website httpwwwpolycomcom
- PixelTools Corporation website httpwwwpixeltoolscom
- Amphion website httpwwwamphioncom

-76-

Conclusions

Related companies (continued)- Ligos Corporation website httpwwwligoscom- LifeSize website httpwwwlifesizecom- Netvideo website httpwwwnetvideocom- Motorola website httpwwwmotorolacom- Vanguard Software Solutions website httpwwwvsoftscom- STMicroelectronics website httpusstcom- MainConcept website httpwwwmainconceptcom- Impact Labs Inc website httpwwwimpactlabscom- Sorenson media AVC Pro codec (H264)- Blu-Ray Disc Association (BDA) MPEG-4 AVC High Profile and Microsoftrsquos VC-1 vid

eo codec (based on Windows Media Video 9 codec) mandatory (blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group - MPEG website httpwwwmpegorg- JVT website ftpstandardspolycomcom- wwwmpegiforg

Test software httpiphomehhidesuehringtmldownload

- H264AVC JM Software httpbshhide~suehringtmldownload Test sequences

- httpisestanfordeduvideohtml- httpkbscstu-berlinde~stewevcegsequenceshtm- httpwwwitsbldrdocgovvqeg- ftptntuni-hannoverdepubjvtsequences- httptraceeasasueduyuvyuvhtml

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website httpwwwmpeglacom
- Via Licensing httpwwwvialicensingcom

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR–scalable extension of H.264/AVC", ICIP 2004, vol, pp, Singapore, Oct 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information JVT documents and software on open ftp website ftpstandardspolyc

omcomhttpiphomehhidesuehring

JVT reflector subscription httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail jvt-expertsmailimtcorg

JVT management team Chair Gary Sullivan (garysullmicrosoftcom) Co-chair Ajay Luthra (aluthramotorolacom) Co-chair Thomas Wiegand (wiegandhhide)

Dr K R Rao UTA raoutaedu Dr S K Kwon Dongeui University skkwondongeuiackr Ms A Tamhankar T-Mobile arundhatiieeeorg Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

FRExt features:
4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
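As a hedged illustration of the logarithmic step-size control, the sketch below assumes the commonly published H.264 mapping: the step size doubles for every increase of 6 in the quantization parameter (roughly 12.5% per step), with QP ranging from 0 to 51 for 8-bit video. The base values and the name `qstep` are assumptions for illustration and are not given in these slides.

```python
# Assumed base step sizes for QP = 0..5 (values as commonly tabulated for H.264).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# Step sizes for a few QP values.
samples = {qp: qstep(qp) for qp in (0, 6, 12, 28, 51)}
```

Because the mapping is exponential, a fixed change in QP corresponds to a fixed ratio of step sizes, which is convenient for rate control.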

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I – Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
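The first three modes above are simple enough to sketch directly. The function below is an illustrative sketch, not the normative process: it assumes both the row above and the column to the left are available, rounds the DC average to the nearest integer, and omits planar prediction. The name `predict_16x16` is hypothetical.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 predicted block (list of rows).

    above: the 16 reconstructed pels directly above the macroblock.
    left:  the 16 reconstructed pels directly to the left of the macroblock.
    """
    n = 16
    if mode == "vertical":
        # Every row repeats the row of pels above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels (assumed rounding rule).
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


above = list(range(16))
left = [100] * 16
vertical = predict_16x16("vertical", above, left)
horizontal = predict_16x16("horizontal", above, left)
dc_block = predict_16x16("dc", above, left)
```

An encoder would typically form the residual for each candidate mode and keep the mode with the lowest cost.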

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

The matrix can be transmitted in the SPS and PPS.
Separate matrices for 4x4 and 8x8 transforms.
Separate matrices for Inter and Intra.
The encoder can design and use customized scaling matrices. These are sent to the decoder at the sequence or picture level.

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding.

Coefficient scanning is ordered by decreasing variance, to maximize the number of zero-valued coefficients along the scan.

[Figure: 8x8 frame (zig-zag) scan and field scan]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type mb_type | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
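The encoding and decoding rules above can be checked with a short sketch. It represents codewords as strings of '0' and '1' characters purely for readability. A real codec would operate on a bit stream, and the function names here are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of `bits`.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                    # 1. count leading zeros up to the '1'
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # 2. read M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # 3. code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

The first codewords come out as 1, 010, 011, 00100, 00101, 00110, 00111, matching the pattern of one-bit and two-bit INFO fields described above.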

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

[Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y, 16x16 Cb and 16x16 Cr.]

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, with $K_R + K_G + K_B = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
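A floating-point sketch of the conversion above, assuming R, G and B are normalized to the range 0 to 1 and using the K_R and K_B values from the slide as defaults. The function name is hypothetical. Integer implementations add offsets and rounding, which is the source of the rounding error discussed on the next slide.

```python
KR, KB = 0.2126, 0.0722  # values from the slide above


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized RGB to (Y, Cb, Cr) with Cb and Cr in [-0.5, 0.5]."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)   # luma 1, both chroma components 0
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)    # Cb reaches its maximum of 0.5
red = rgb_to_ycbcr(1.0, 0.0, 0.0)     # Cr reaches its maximum of 0.5
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide.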

-120-

Rounding error in RGB to Y Cb Cr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
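The forward and inverse conversions can be checked for exact reversibility with a short sketch. It relies on Python's `>>` being an arithmetic (floor) shift for negative integers. The function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip over a coarse grid of 8-bit RGB values.
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in range(0, 256, 15)
    for g in range(0, 256, 15)
    for b in range(0, 256, 15)
)
```

Cg and Co can be negative and need one extra bit of range compared with the inputs, which matches the remark about adding one bit of precision to the chroma samples.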

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16.
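Note 2 can be turned into a quick check. The sketch below assumes the limit is expressed in luma pixels per frame, as on the slide. The example figure of 8192 macroblocks (8192 x 256 pixels) is an illustrative assumption and is not read from the level table.

```python
import math


def max_picture_dimension(pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(pixels_per_frame * 8))."""
    return math.isqrt(pixels_per_frame * 8)


def picture_fits(width, height, pixels_per_frame):
    """Check both the total pixel budget and the per-dimension limit."""
    limit = max_picture_dimension(pixels_per_frame)
    return width * height <= pixels_per_frame and width <= limit and height <= limit


budget = 8192 * 256                         # assumed frame-size limit in pixels
limit = max_picture_dimension(budget)       # 4096
ok_hd = picture_fits(1920, 1088, budget)    # True
too_wide = picture_fits(8192, 16, budget)   # False: within budget but too wide
```

The per-dimension limit prevents extreme aspect ratios that would fit the pixel budget but stress line buffers in a decoder.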

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA wwwmpeglacom
2. Via Licensing wwwvialicensingcom

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus). ATSC has selected AC-3 plus from Dolby. MPEG calls AAC with SBR HE-AAC (HE – High Efficiency). ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-16-

Introduction Specific coding parts for the Profiles

-17-

Introduction

Common coding parts for the Profiles

– I slice (Intra-coded slice): the slice is coded using prediction only from decoded samples within the same slice

– P slice (Predictive-coded slice): the slice is coded using inter prediction from previously-decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block

– CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile

– Common parts: I slice, P slice, CAVLC

– FMO (Flexible macroblock order): macroblocks need not be in raster scan order. A map assigns macroblocks to slice groups

– ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture

– RS (Redundant slice): this slice carries redundant coded data, obtained at the same or a different coding rate compared with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile

– Common parts: I slice, P slice, CAVLC

– B slice (Bi-directionally predictive-coded slice): the slice is coded using inter prediction from previously-decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block

– Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile

– Common parts: I slice, P slice, CAVLC

– SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice

– SI slice: a switching slice, similar to the coding of an I slice

– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit

– Flexible macroblock order (FMO)

– Arbitrary slice order (ASO)

– Redundant slice (RS)

– B slice

– Weighted prediction

-21-

Introduction

Profile specifications

Feature | Baseline | Main | Extended | High
CABAC | - | X | - | X
Interlaced Coding | - | X | X | X
B Slice | - | X | X | X
SP/SI Slices | - | - | X | -
Error Resilience Tools – Flexible MB Order, ASO, Red. Slices | X | - | X | -
CAVLC/UVLC | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
¼ Pel Motion Compensation | X | X | X | X
Deblocking Filter | X | X | X | X
I & P Slices | X | X | X | X
Data Partitioning | - | - | X | -

-22-

Introduction

Application requirements

Each entry lists the application, its requirements, the matching H.264 profile(s), and the matching MPEG-4 profile(s).

Broadcast television
– Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
– H.264 profile: Main
– MPEG-4 profile: ASP (Advanced Simple)

Streaming video
– Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
– H.264 profile: Extended
– MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Video storage and playback
– Requirements: coding efficiency, interlace, low-complexity encoder and decoder
– H.264 profile: Main
– MPEG-4 profile: ASP

Videoconferencing
– Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
– H.264 profile: Baseline
– MPEG-4 profile: SP (Simple)

Mobile video
– Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
– H.264 profile: Baseline
– MPEG-4 profile: SP

Studio distribution
– Requirements: lossless or near-lossless, interlace, efficient transcoding
– H.264 profiles: Main, High
– MPEG-4 profile: Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number   Picture type & frame rate
1              QCIF @ 15 fps
1.1            QCIF @ 30 fps
1.2            CIF @ 15 fps
1.3            CIF @ 30 fps
2              CIF @ 30 fps
2.1            HHR @ 15 or 30 fps
2.2            SDTV @ 15 fps
3              SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction

Parameter set limits for each Level

Columns:
– Level: level number
– MBPS: max macroblock processing rate (MB/s)
– FS: max frame size (MBs)
– DPB: max decoded picture buffer size (1024 bytes)
– BR: max video bit rate (1000 bits/s or 1200 bits/s)
– CPB: max CPB size (1000 bits or 1200 bits)
– MV range: vertical MV component range (luma frame samples)
– CR: min compression ratio
– MVs: max number of MVs per two consecutive MBs

Level   MBPS      FS      DPB       BR       CPB      MV range           CR   MVs
1       1485      99      148.5     64       175      [-64, +63.75]      2    -
1.1     3000      396     337.5     192      500      [-128, +127.75]    2    -
1.2     6000      396     891.0     384      1000     [-128, +127.75]    2    -
1.3     11880     396     891.0     768      2000     [-128, +127.75]    2    -
2       11880     396     891.0     2000     2000     [-128, +127.75]    2    -
2.1     19800     792     1782.0    4000     4000     [-256, +255.75]    2    -
2.2     20250     1620    3037.5    4000     4000     [-256, +255.75]    2    -
3       40500     1620    3037.5    10000    10000    [-256, +255.75]    2    32
3.1     108000    3600    6750.0    14000    14000    [-512, +511.75]    4    16
3.2     216000    5120    7680.0    20000    20000    [-512, +511.75]    4    16
4       245760    8192    12288.0   20000    25000    [-512, +511.75]    4    16
4.1     245760    8192    12288.0   50000    62500    [-512, +511.75]    2    16
4.2     491520    8192    12288.0   50000    62500    [-512, +511.75]    2    16
5       589824    22080   41310.0   135000   135000   [-512, +511.75]    2    16
5.1     983040    36864   69120.0   240000   240000   [-512, +511.75]    2    16
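The way the level limits are used can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `min_level` and the tuple layout are assumptions made here, and only two of the limits from the table (max macroblock processing rate and max frame size) are checked; a real conformance check would also cover bit rate, buffer sizes, and the remaining constraints.

```python
# (level, max macroblock processing rate in MB/s, max frame size in MBs),
# copied from the parameter set limits table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def min_level(width, height, fps):
    """Return the lowest level whose frame-size and MB-rate limits fit.

    Hypothetical helper: checks only two of the table's constraints.
    """
    # A macroblock covers 16x16 luma samples; round partial MBs up.
    mbs_per_frame = ((width + 15) // 16) * ((height + 15) // 16)
    mbs_per_second = mbs_per_frame * fps
    for level, max_mbps, max_fs in LEVEL_LIMITS:
        if mbs_per_frame <= max_fs and mbs_per_second <= max_mbps:
            return level
    return None  # exceeds Level 5.1


print(min_level(176, 144, 15))    # QCIF at 15 fps
print(min_level(1920, 1080, 30))  # HDTV at 30 frames/s
```

For QCIF at 15 fps the frame holds 99 macroblocks and needs 1485 MB/s, which is exactly the Level 1 limit.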

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL

– Abstracts the VCL data, hence the name Network 'Abstraction' Layer

– Carries header information about the VCL format

– Appropriate for conveyance by the transport layers or storage media

– The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL

– Core coding layer

– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure

Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 (360) pels x 288 lines; Cb and Cr are each 176 (180) pels x 144 lines. QCIF format: Y is 176 (180) pels x 144 lines; Cb and Cr are each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm

Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Signals: Video Input, Bitstream Output.]

-29-

Video Coding Algorithm

Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Signals: Bitstream Input, Video Output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction): vertical (0), horizontal (1), DC (2), diagonal down-left (3), diagonal down-right (4), vertical-right (5), horizontal-down (6), vertical-left (7), horizontal-up (8)

Sample layout of the current block and its neighbors:

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions of modes 0, 1, 3, 4, 5, 6, 7, and 8 drawn over the sample layout]

Samples a, b, ..., p are the predicted samples of the current block; the above and left samples A, B, ..., M are previously reconstructed.

-31-

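VC Algorithm: Intra Prediction

Example for a 4 x 4 luma block

For mode 4 (diagonal down-right), samples a and d are predicted by

$$a = \mathrm{round}(I/4 + M/2 + A/4), \qquad d = \mathrm{round}(B/4 + C/2 + D/4)$$

For mode 8 (horizontal-up), samples a and d are predicted by

$$a = \mathrm{round}(I/2 + J/2), \qquad d = \mathrm{round}(J/4 + K/2 + L/4)$$

[Figure: the 4 x 4 sample layout (a to p, with neighbors A to H above, I to L to the left, and M at the corner) shown twice, with the prediction direction of mode 4 and of mode 8]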
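The two example predictions can be written out as a small Python sketch. The function names are hypothetical, and the integer form `(x + 2) >> 2` is used here as an assumption for how the rounding is carried out: it rounds the weighted sum to the nearest integer with halves rounded up.

```python
def predict_mode4_a_d(A, B, C, D, I, M):
    """Mode 4 (diagonal down-right): predicted samples a and d."""
    a = (I + 2 * M + A + 2) >> 2  # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2  # round(B/4 + C/2 + D/4)
    return a, d


def predict_mode8_a_d(I, J, K, L):
    """Mode 8 (horizontal-up): predicted samples a and d."""
    a = (I + J + 1) >> 1          # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2  # round(J/4 + K/2 + L/4)
    return a, d


# Made-up neighboring samples, for illustration only.
print(predict_mode4_a_d(A=100, B=104, C=108, D=112, I=96, M=98))
print(predict_mode8_a_d(I=96, J=100, K=104, L=108))
```

Each predicted sample is a short weighted average of two or three already-reconstructed neighbors, which is why the prediction costs only additions and shifts.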

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes: vertical (0), horizontal (1), DC (2), plane (3)

Plane prediction works well in areas of smoothly varying luminance: a linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes: DC (0), horizontal (1), vertical (2), plane (3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy

– Prediction of variable block sizes
– Sub-pel motion compensation
– Deblocking filter
– Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size

– An MB can be partitioned into smaller block sizes

– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB

– Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e. an MB cannot have both 16x8 and 4x8 partitions. Only when the sub-MB partition (8x8) is selected can each (8x8) block be partitioned further.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation

– Better compression performance than integer-pel MC
– Comes at the expense of increased complexity
– Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: H.264 encoder block diagram, annotated: motion vector accuracy 1/4 pel (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the partitions of each case numbered 0 to 3]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy

– A distinct MV can be sent for each sub-MB partition
– ME can be based on multiple pictures that lie in the past or in the future in display order
– The reference picture for ME is selected at the MB partition level
– Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2- and 1/4-pixel positions, and 1/8-pixel positions]

-38-

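VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced by bilinear interpolation between neighboring half- or integer-pel samples

[Figure: interpolation grid with integer-pel samples (upper-case letters A to U) and fractional-pel samples (lower-case letters a to s and aa to hh); E, F, G, H, I, J lie on one row, with b midway between G and H and a midway between G and b]

$$b = \mathrm{round}\left(\frac{E - 5F + 20G + 20H - 5I + J}{32}\right)$$

$$a = \mathrm{round}\left(\frac{G + b}{2}\right)$$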
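The interpolation formulas for b and a translate directly into integer arithmetic. The following Python sketch uses hypothetical function names; clipping the half-pel result to the 8-bit range [0, 255] is an added assumption that the formulas on this slide do not show.

```python
def clip255(value):
    """Clamp to the 8-bit sample range (assumed here, not on the slide)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap filter (1,-5,20,20,-5,1)/32."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: bilinear average of two neighboring samples."""
    return (p + q + 1) >> 1


row = [0, 0, 0, 255, 255, 255]  # a sharp edge between G and H
b = half_pel(*row)              # half-pel sample on the edge
a = quarter_pel(row[2], b)      # quarter-pel sample between G and b
print(b, a)
```

Adding 16 before the shift by 5 implements rounding of the division by 32; adding 1 before the shift by 1 does the same for the division by 2.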

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal and vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 luma macroblock and in the corresponding chroma block]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram, annotated: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform

– Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
– Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks within a 16 x 16 luma macroblock for the (4x4) integer DCT (the numbers 0, 1, ..., 15):

0  1  4  5
2  3  6  7
8  9  12 13
10 11 14 15

Indices of the DC coefficients (dark samples in the figure) of each 4x4 luma block, (00), (01), (02), ..., (33):

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The chroma case is similar; the MB size for chroma depends on the 4:2:0, 4:2:2, and 4:4:4 formats.

-42-

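VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

Here $\otimes$ denotes element-by-element multiplication.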

-43-

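4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \qquad Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, the quantization step (QP) is embedded in the matrices $E_f$ and $E_i$.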
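The forward and inverse relations can be checked numerically with a short Python sketch. The forward core matrix `CF` is the 4 x 4 integer DCT matrix of these slides; the inverse core matrix `CI` is not written out on the slide and is filled in here from the standard literature, so treat it, and all function names, as assumptions of this sketch. Floating point is used for clarity; a real codec folds the scaling matrices into quantization and works in integers.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Assumed inverse core matrix (not shown on the slide).
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]

a, b = 0.5, math.sqrt(2 / 5)
ef = [a, b / 2, a, b / 2]                  # E_f is the outer product ef * ef^T
ei = [a, b, a, b]                          # E_i is the outer product ei * ei^T
EF = [[p * q for q in ef] for p in ef]
EI = [[p * q for q in ei] for p in ei]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(P):
    return [list(row) for row in zip(*P)]


def elementwise(P, Q):
    return [[P[i][j] * Q[i][j] for j in range(4)] for i in range(4)]


def forward(X):
    """Y = (Cf X Cf^T) (x) Ef"""
    return elementwise(matmul(matmul(CF, X), transpose(CF)), EF)


def inverse(Y):
    """X = Ci^T (Y (x) Ei) Ci"""
    return matmul(matmul(transpose(CI), elementwise(Y, EI)), CI)


X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
X_back = inverse(forward(X))
print([[round(v, 6) for v in row] for row in X_back])
```

Without quantization the round trip returns the input block up to floating-point error, which confirms that the scaling matrices absorb exactly the normalization that the integer core matrices leave out.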

-44-

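VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer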

-45-

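VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: chroma blocks for the 4:2:0 format: 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18 to 21 (U) and 22 to 25 (V)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased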

-46-

VC Algorithm: Transform

Block diagram emphasizing the transform

[Figure: H.264 encoder block diagram with the Transform & Quantization block emphasized]

– 4 x 4 integer DCT transform:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

– Hadamard transform of the DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation of the exact transform is combined with the multiplication of the scalar quantization

– Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij}\,\mathrm{round}\left(\frac{SF_{ij}}{Qstep}\right)$$

– Decoder: inverse quantization and pre-scaling

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
SF: scaling term
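The relation between QP and Qstep (52 values, doubling for every increment of 6) can be sketched in Python as follows. The six base step sizes for QP 0 to 5 are not given on the slide; they are taken here from the standard literature and should be read as an assumption of this sketch, as should the function name.

```python
# Step sizes for QP = 0..5 (assumed values, not listed on the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8 bits per decoded sample)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    # Every increment of 6 in QP doubles the step size.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (4, 10, 16, 28, 51):
    print(qp, qstep(qp))
```

Because only six base values are needed, an implementation can keep small per-remainder tables and handle the `qp // 6` part with a shift.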

-48-

VC Algorithm: Transform & Quantization

Rescaling and inverse transform

[Figure: processing chain. Encoder part: input block, forward transform, post-scaling and quantization, then a 2x2 or 4x4 DC transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only), giving the encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

– All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
– Scan order to read the residual data (quantized transform coefficients): zig-zag or alternate
– Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
– Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0  1  5  6
2  4  7  12
3  8  11 13
9  10 14 15

(b) Alternate scan

0  2  8  12
1  5  9  13
3  6  10 14
4  7  11 15
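The two scan tables give, for each position of the 4 x 4 block, its index in the one-dimensional coefficient sequence. A minimal Python sketch of the reordering follows; the function name `scan` and the test block are illustrative assumptions.

```python
# Scan index of each block position, copied from tables (a) and (b).
ZIGZAG = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block into a 16-entry list following a scan table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


# Label each position with its raster index to make the path visible.
raster = [[4 * r + c for c in range(4)] for r in range(4)]
print(scan(raster, ZIGZAG))
print(scan(raster, ALTERNATE))
```

The zig-zag path starts 0, 1, 4, 8, 5, 2, ... in raster indices, while the alternate scan runs down the columns first.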

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3 to 6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeroes followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO – 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
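The construction [M zeroes] [1] [INFO] and the three decoding steps can be sketched in Python. Bit strings are used instead of a real bit writer to keep the example short; the function names `ue_encode` and `ue_decode` are hypothetical.

```python
def ue_encode(code_num):
    """Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                 # 1. count the M leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # 2. M-bit INFO
    return (1 << m) + info - 1            # 3. code_num = 2^M + INFO - 1


for n in range(7):
    print(n, ue_encode(n))
```

The printed codewords are 1, 010, 011, 00100, 00101, 00110, 00111, each of length 2M + 1.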

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and of +/-1 values are coded, whereas for the other coefficients their levels are coded

Encoding steps
– step 1: encode the total number of non-zero coefficients and of +/-1 values (trailing ones)
– step 2: encode the sign of each trailing one in reverse order
– step 3: encode the levels of the remaining non-zero coefficients in reverse order
– step 4: encode the total number of zeros before the last coefficient
– step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (hence CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff:  c0  c1  c2  0   1   1   0   –1  0   0   ...  0
order:  0   1   2   3   4   5   6   7   8   9   ...  16

Step 1: encode the number of non-zero coefficients and the number of 1 or –1 values (trailing ones) from a look-up table
– number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
– number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: – (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
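The five steps of the example can be reproduced with a short Python sketch that extracts the symbols CAVLC would go on to code; the table look-ups that turn these symbols into bits are omitted. The function name, the dictionary keys, and the concrete values chosen for c0, c1, c2 (7, -3, 2) are assumptions made for illustration.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols from coefficients in scan order."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    reversed_values = [c for _, c in reversed(nonzero)]

    # Steps 1-2: up to three trailing +/-1 values, signs in reverse order.
    signs = []
    for c in reversed_values:
        if abs(c) == 1 and len(signs) < 3:
            signs.append("+" if c > 0 else "-")
        else:
            break

    # Step 3: levels of the remaining non-zero coefficients, reverse order.
    levels = reversed_values[len(signs):]

    # Step 4: zeros located before the last non-zero coefficient.
    last = nonzero[-1][0] if nonzero else -1
    total_zeros = last + 1 - len(nonzero)

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": len(nonzero), "trailing_ones": len(signs),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 = 7, -3, 2
result = cavlc_symbols(example)
print(result)
```

With these inputs the sketch reports 6 non-zero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1, matching the slide.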

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC

Encoding steps

– step 1, context modeling: choose a suitable model

– step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

– step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size, and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode

– Two forward references: suitable for a region just before a scene change
– Two backward references: suitable for a region just after a scene change

[Figure: previous pictures, current picture, and next pictures, showing prediction with 2 forward MVs, with 2 backward MVs, and with 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode

– Uses a forward/backward pair of bi-directional prediction
– The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference; motion vectors mvCol, mvL0, mvL1; temporal distances tb and td]

$$mvL0 = \frac{tb \cdot mvCol}{td}, \qquad mvL1 = -\frac{(td - tb) \cdot mvCol}{td}$$

where mvCol is an MV used in the co-located MB of the subsequent picture
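The direct-mode scaling of mvCol can be sketched in Python. Exact fractions are used so that the two formulas are reproduced without any rounding; the integer scaling and rounding that a real decoder applies are deliberately left out, and the function name is hypothetical.

```python
from fractions import Fraction


def temporal_direct(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located MV and temporal distances.

    mv_col is an (x, y) pair; tb and td are the temporal distances
    labeled in the direct-mode figure.
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between its two references (tb = 1, td = 2).
mv_l0, mv_l1 = temporal_direct((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Since mvL1 equals mvL0 minus mvCol, the two derived vectors always differ by exactly the co-located vector, so no motion data has to be transmitted for a direct-mode partition.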

-60-

VC Algorithm: B Slice

Weighted prediction

Uses different weights for the reference signals to handle gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

The weighted prediction method differs between a macroblock of a P slice and one of a B slice

A prediction signal p for a B slice is obtained from two reference signals r1 and r2 with different weights:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors

– Implicit type: the factors are calculated based on the temporal distance between the pictures
– Explicit type: the factors are transmitted in the slice header
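A minimal Python sketch of the weighted sum p = w1 r1 + w2 r2, applied sample by sample. The weights are simply passed in by the caller, as in the explicit type; the derivation of implicit weights is not modeled. Rounding to the nearest integer and clipping to [0, 255] are assumptions of this sketch, as is the function name.

```python
import math


def weighted_bipred(r1, r2, w1, w2):
    """Weighted bi-prediction of two equally long lists of samples."""
    prediction = []
    for s1, s2 in zip(r1, r2):
        value = math.floor(w1 * s1 + w2 * s2 + 0.5)   # round to nearest
        prediction.append(max(0, min(255, value)))    # keep 8-bit range
    return prediction


bright = [100, 200]   # samples from the first reference
dark = [50, 100]      # samples from a darker, faded reference
print(weighted_bipred(bright, dark, 0.5, 0.5))
print(weighted_bipred(bright, dark, 0.25, 0.75))
```

Shifting weight toward the darker reference moves the prediction closer to it, which is how a fade can be tracked without spending bits on a large residual.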

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices

– SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
– SI slice: a switching slice, coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and the switching slice S(3) connecting the two bitstreams]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience

– Parameter setting
– Flexible macroblock ordering (FMO)
– Redundant slice methods
– Switched slice (SP/SI)
– Data partitioning
– Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Each partition A, B, and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

– The sequence parameter set contains all information related to a sequence of pictures
– A picture parameter set contains all information related to all the slices belonging to a single picture
– The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2, 3, exchanged over a reliable parameter set exchange; VCL data is transferred with a reference to parameter set 3. Example contents of parameter set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1 and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data
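The two-slice-group scenario can be illustrated with a small Python sketch. The checkerboard pattern is one possible dispersed map, chosen here as an assumption, and each macroblock is reduced to a single number (for example its mean luma) so that concealment can be shown as a plain average of the surviving neighbors. Both function names are hypothetical.

```python
def dispersed_map(width_mbs, height_mbs):
    """Checkerboard macroblock-to-slice-group map (groups 0 and 1)."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def conceal(frame, mb_map, lost_group):
    """Replace each lost MB by the mean of its received 4-neighbors."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            if mb_map[y][x] != lost_group:
                continue
            neighbors = [frame[ny][nx]
                         for ny, nx in ((y - 1, x), (y + 1, x),
                                        (y, x - 1), (y, x + 1))
                         if 0 <= ny < height and 0 <= nx < width
                         and mb_map[ny][nx] != lost_group]
            out[y][x] = sum(neighbors) // len(neighbors)
    return out


mb_map = dispersed_map(11, 9)            # QCIF: 11 x 9 macroblocks
frame = [[100 + x for x in range(11)] for _ in range(9)]
received = [[v if mb_map[y][x] == 0 else 0 for x, v in enumerate(row)]
            for y, row in enumerate(frame)]   # slice group 1 lost
repaired = conceal(received, mb_map, lost_group=1)
print(repaired[4])
```

In a checkerboard map every neighbor of a lost macroblock belongs to the other slice group, so the averaging always has received data to work with.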

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Columns: bit rate [kbps] for QCIF: 24, 48, 96, 192; bit rate [kbps] for CIF: 96, 192, 384, 768

Sequence   Results
Foreman    > 1x   2x   2x   T   2x   > 2x   T   T
Paris      > 1x   2x   2x   2x   2x   T   2x   T
Head       > 2x   2x   2x   T   T
Zoom       > 1x   1x   2x   2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Columns: bit rate [kbps] for QCIF: 24, 48, 96, 192; bit rate [kbps] for CIF: 96, 192, 384, 768

Sequence   Results
Football   2x   1x   2x   2x   > 1x   > 1x   1x   > 1x
Mobile     2x   1x   2x   2x   > 2x   4x   > 2x   T
Husky      2x   2x   > 1x   2x   2x   2x
Tempete    2x   2x   > 2x   T   2x   2x   T2x   T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Columns: bit rate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bit rate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Sequence   Results
Football   > 1.5x   > 1.3x   1.3x   1.5x   2x   1.8x   1.3x   1.5x
Mobile     4x   2.7x   2x   T   T   > 4x   > 2.7x   > 2x   T   T
Husky      > 1.5x   1.3x   1x   1.3x   1.5x   2.7x   2x   1.8x   2x   > 1.5x
Tempete    T   2x   T   T   T   T   T   4x   T   T   T   T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Columns: bit rate [Mbps] for MPEG-2 HiQ: 6, 10, 20; bit rate [Mbps] for MPEG-2 TM5: 6, 10, 20

Sequence                  Results
720 (60p)
Crew                      1.7x   2x   T   1.7x   2x   T
Harbour                   T   3.3x   T   T   T   1.7x   T   T
1080 (30i)
Stockholm Pan             1x   2x
New Mobile & Calendar     T   2x   T   T   2x   T
1080 (25p)
River Bed                 > 1.7x   > 1x   T   > 1.7x   > 1x   T
Vintage Car               1.7x   T   2x   T   1.7x   T   2x   T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bit-rate saving results for the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bit-rate saving results for the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards (each feature is listed for MPEG-1, MPEG-2, MPEG-4 Part 2 (Visual), and H.264/MPEG-4 Part 10)

Macroblock size
– MPEG-1: 16x16
– MPEG-2: 16x16 (frame mode), 16x8 (field mode)
– MPEG-4 Part 2: 16x16
– H.264: 16x16

Block size
– MPEG-1: 8x8
– MPEG-2: 8x8
– MPEG-4 Part 2: 16x16, 16x8, 8x8
– H.264: 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform
– MPEG-1: 8x8 DCT
– MPEG-2: 8x8 DCT
– MPEG-4 Part 2: 8x8 DCT/Wavelet
– H.264: 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization
– MPEG-1: scalar quantization with step size of constant increment
– MPEG-2: scalar quantization with step size of constant increment
– MPEG-4 Part 2: vector quantization
– H.264: scalar quantization with step size increasing at the rate of 12.5%

Entropy coding
– MPEG-1: VLC
– MPEG-2: VLC
– MPEG-4 Part 2: VLC
– H.264: VLC, CAVLC, CABAC

Motion estimation & compensation
– MPEG-1: yes
– MPEG-2: yes
– MPEG-4 Part 2: yes
– H.264: yes, more flexible; up to 16 MVs per MB

Playback & random access
– MPEG-1: yes
– MPEG-2: yes
– MPEG-4 Part 2: yes
– H.264: yes

-74-

Conclusions

Comparison of standards (continued)

Pel accuracy
– MPEG-1: integer, 1/2-pel
– MPEG-2: integer, 1/2-pel
– MPEG-4 Part 2: integer, 1/2-pel, 1/4-pel
– H.264: integer, 1/2-pel, 1/4-pel

Profiles
– MPEG-1: no
– MPEG-2: 5
– MPEG-4 Part 2: 8
– H.264: 4

Reference picture
– MPEG-1: one
– MPEG-2: one
– MPEG-4 Part 2: one
– H.264: multiple

Bidirectional prediction mode
– MPEG-1: forward/backward
– MPEG-2: forward/backward
– MPEG-4 Part 2: forward/backward
– H.264: forward/forward, forward/backward, backward/backward

Picture types
– MPEG-1: I, P, B, D
– MPEG-2: I, P, B
– MPEG-4 Part 2: I, P, B
– H.264: I, P, B, SP, SI

Error robustness
– MPEG-1: synchronization & concealment
– MPEG-2: data partitioning, FEC for important packet transmission
– MPEG-4 Part 2: synchronization, data partitioning, header extension, reversible VLCs
– H.264: data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate
– MPEG-1: up to 1.5 Mbps
– MPEG-2: 2-15 Mbps
– MPEG-4 Part 2: 64 kbps - 2 Mbps
– H.264: 64 kbps - 240 Mbps

Compatibility with previous standards
– MPEG-1: n/a
– MPEG-2: yes
– MPEG-4 Part 2: yes
– H.264: no

Encoder complexity
– MPEG-1: low
– MPEG-2: medium
– MPEG-4 Part 2: medium
– H.264: high

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download

- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing

– MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
– MPEG LA website: http://www.mpegla.com
– Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats

– 12-bit resolution for medical imaging
– Scalable coding
– Lossless coding for digital cinema applications
– High-fidelity coding for the next-generation optical discs
– Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004

FINAL STAGES OF APPROVAL

– Standard systems and file format support specifications
– Standardizing the reference software implementation
– Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
– Chair: Gary Sullivan (garysull@microsoft.com)
– Co-chair: Ajay Luthra (aluthra@motorola.com)
– Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs: HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software JM 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
High resolution
[Use RGB color format] YCgCo

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
      Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
      MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MB in I slice, and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction (vertical, horizontal, DC, planar)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrix can be transmitted in SPS and PPS
- Separate matrix for 4x4 and 8x8 transforms
- Separate matrix for Inter and Intra
- Encoder can design and use customized scaling matrices
- These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr
4x4 Inter Y; 4x4 Inter Cb, Cr
8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding. Coefficient scanning follows decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
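The construction and the three decoding steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the reference software: codewords are represented as strings of '0'/'1' characters, and the function names `exp_golomb_encode` and `exp_golomb_decode` are hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Each codeword produced this way is 2M + 1 bits long, so small values of code_num get the shortest codes.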

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are 8 wide by 16 high each. 4:4:4 MB: Y, Cb and Cr are 16x16 each.]

-119-

RGB to YCbCr

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2 (1 - K_B)}, \qquad C_r = \frac{R - Y}{2 (1 - K_R)} \]

With K_R = 0.2126, K_B = 0.0722 and K_R + K_B + K_G = 1 (so K_G = 0.7152):

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
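As a quick numerical check of the chroma scale factors 1/(2(1 - K_B)) and 1/(2(1 - K_R)), here is a small Python sketch. The function names are hypothetical and the inputs are assumed to be floating-point R, G, B values; it is an illustration of the equations on this slide, not code from any codec.

```python
def chroma_scale_factors(kr: float, kb: float):
    """Return (Cb scale, Cr scale) = (1 / (2(1 - KB)), 1 / (2(1 - KR)))."""
    return 1.0 / (2.0 * (1.0 - kb)), 1.0 / (2.0 * (1.0 - kr))


def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722):
    """Convert one RGB triple to (Y, Cb, Cr) using the KR/KB luma weights."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b    # KG = 1 - KR - KB
    cb_scale, cr_scale = chroma_scale_factors(kr, kb)
    return y, cb_scale * (b - y), cr_scale * (r - y)


if __name__ == "__main__":
    print(chroma_scale_factors(0.2126, 0.0722))  # weights used on this slide
    print(chroma_scale_factors(0.299, 0.114))    # BT.601 weights
```

With K_R = 0.2126 and K_B = 0.0722 the two factors come out at roughly 0.5389 and 0.6350. Because the products are real-valued, an integer implementation has to round them, which is the rounding error the YCgCo transform is designed to avoid.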

-120-

Rounding error in RGB to YCbCr

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

\[ Y = \frac{1}{2} \left[ G + \frac{R + B}{2} \right] \]

\[ C_g = \frac{1}{2} \left[ G - \frac{R + B}{2} \right] \]

\[ C_o = \frac{R - B}{2} \]

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
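The forward and inverse conversions on this slide form a lifting pair, so the inverse undoes the forward steps exactly in integer arithmetic. The Python sketch below illustrates that; the function names are hypothetical, and it relies on Python's `>>` being an arithmetic (floor) shift for negative integers, which matches the slide's definition.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion: integer R, G, B -> (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: (Y, Cg, Co) -> integer R, G, B."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 17, 93)
    coded = rgb_to_ycgco(*pixel)
    print(pixel, "->", coded, "->", ycgco_to_rgb(*coded))
```

For 8-bit inputs, Cg and Co range over about -255..255, which is the one extra bit of chroma precision mentioned for YCgCo.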

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/s

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
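Note 1 can be made concrete with a short calculation: the achievable frame rate at a level is bounded by the level's maximum macroblock processing rate divided by the number of macroblocks per frame, capped at 172 frames/s. The Python sketch below assumes the MB/s and frame-size limits quoted in the per-level parameter table of this presentation for three levels; the dictionary and function name are hypothetical.

```python
import math

# (max macroblock processing rate in MB/s, max frame size in MBs),
# copied from the level limits table for three sample levels.
LEVEL_LIMITS = {
    "1": (1485, 99),
    "3": (40500, 1620),
    "4": (245760, 8192),
}

MAX_FRAME_RATE = 172.0  # upper bound stated in note 1


def max_frame_rate(level: str, width: int, height: int) -> float:
    """Upper bound on frames/s for a width x height picture at the given level."""
    max_mbps, max_fs = LEVEL_LIMITS[level]
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    if mbs_per_frame > max_fs:
        raise ValueError("picture exceeds the maximum frame size of this level")
    return min(max_mbps / mbs_per_frame, MAX_FRAME_RATE)


if __name__ == "__main__":
    print(max_frame_rate("3", 720, 576))   # SDTV at Level 3
    print(max_frame_rate("4", 176, 144))   # small picture at Level 4, capped
```

A 720x576 picture has 45 x 36 = 1620 macroblocks, so Level 3 allows 40500 / 1620 = 25 frames/s, while a QCIF picture at Level 4 hits the 172 frames/s cap.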

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

diams The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

diams The High profile of FRExt produced nominally transparent (ie difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T Wedi Y Kashiwagi ldquoSubjective quality evaluation of H264AVC FRExt for HD movie contentrdquo JVT document JVT-L033 July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile DetailsDeblocking Filter CABAC Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

- H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-17-

Introduction

Common coding parts for the Profiles
- I slice (Intra-coded slice): the slice coded using prediction only from decoded samples within the same slice
- P slice (Predictive-coded slice): the slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding

-18-

Introduction

Coding parts for Baseline Profile
- Common parts: I slice, P slice, CAVLC
- FMO (Flexible macroblock order): macroblocks may not necessarily be in raster scan order; a map assigns macroblocks to a slice group
- ASO (Arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
- RS (Redundant slice): a slice belonging to redundant coded data, obtained at the same or a different coding rate in comparison with the previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (Bi-directionally predictive-coded slice): the slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction

Profile specifications

Feature | Baseline | Main | Extended | High
I & P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
1/4 Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools (Flexible MB Order, ASO, Redundant Slices) | X | | X |
SP/SI Slices | | | X |
B Slice | | X | X | X
Interlaced Coding | | X | X | X
CABAC | | X | | X
Data Partitioning | | | X |

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction

Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction

Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure

Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 (360) pels x 288 lines; Cb and Cr are 176 (180) pels x 144 lines each. QCIF format: Y is 176 (180) pels x 144 lines; Cb and Cr are 88 (90) pels x 72 lines each.]

-28-

Video Coding Algorithm

Block diagram for H.264 encoder

[Figure: encoder block diagram. Video input; Transform & Quantization; Entropy Coding; bitstream output; Inverse Quantization & Inverse Transform; Deblocking Filter; Picture Buffering; Motion Estimation; Motion Compensation; Intra Prediction; Intra/Inter Mode Decision.]

-29-

Video Coding Algorithm

Block diagram for H.264 decoder

[Figure: decoder block diagram. Bitstream input; Entropy Decoding; Inverse Quantization & Inverse Transform; Intra Prediction; Motion Compensation; Intra/Inter Mode Selection; Deblocking Filter; Picture Buffering; video output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction)

vertical (0), horizontal (1), DC (2), diagonal down-left (3), diagonal down-right (4), vertical-right (5), horizontal-down (6), vertical-left (7), horizontal-up (8)

Sample layout:

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions for modes 0, 1, 3, 4, 5, 6, 7 and 8 drawn over this layout]

Samples a, b, ..., p: the predicted ones for the current block. Above and left samples A, B, ..., M: previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block

Mode 4 (diagonal down-right): samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)

Mode 8 (horizontal-up): samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 sample layout (a..p with neighbors A..H above, I..L to the left and M at the corner), shown once with the mode 4 direction and once with the mode 8 direction]
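To make the 4 x 4 prediction rules concrete, here is a hedged Python sketch of the three simplest modes (vertical, horizontal, DC) plus the three-tap rounding used in the mode 4 and mode 8 examples. The function names are hypothetical, all neighbors are assumed to be available, and the integer forms (adding half the divisor, then shifting) are one common way to realize the rounding shown on the slide.

```python
def predict_vertical(above):
    """Mode 0: every row repeats the four samples A..D above the block."""
    return [list(above[:4]) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: every column repeats the four samples I..L left of the block."""
    return [[left[row]] * 4 for row in range(4)]


def predict_dc(above, left):
    """Mode 2: all 16 samples take the rounded mean of A..D and I..L."""
    dc = (sum(above[:4]) + sum(left[:4]) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


def three_tap(p, q, r):
    """round(p/4 + q/2 + r/4), e.g. sample a in mode 4 with (I, M, A)."""
    return (p + 2 * q + r + 2) >> 2


def two_tap(p, q):
    """round(p/2 + q/2), e.g. sample a in mode 8 with (I, J)."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    above = [100, 102, 104, 106]   # A, B, C, D
    left = [98, 96, 94, 92]        # I, J, K, L
    print(predict_dc(above, left))
    print(three_tap(left[0], 99, above[0]))  # sample a in mode 4 with M = 99
```

An encoder would typically compute all candidate predictions and keep the mode with the smallest residual cost; that selection step is outside this sketch.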

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical 0, horizontal 1, DC 2, plane 3)

Plane works well in areas of smoothly varying luminance. A linear 'plane' function is fitted to the upper (H) and left-side (V) samples.

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

-33-

VC Algorithm Intra Prediction

Chroma always operates using full MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC 0, horizontal 1, vertical 2, plane 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small ones suit detailed areas

The two partition types cannot be mixed, i.e. a MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram highlighting motion estimation; motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with partition indices 0 to 3]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A..U) and the half- and quarter-pel positions between them (lower-case letters a..s, aa..hh); b is the half-pel position between G and H on the row E F G H I J, and a is the quarter-pel position between G and b]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)

a = round((G + b) / 2)
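The two interpolation formulas can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names: rounding is done by adding half the divisor before an integer shift, and the half-pel result is clipped to the 8-bit range, which is an assumption added here because the slide does not mention clipping.

```python
def clip_8bit(value: int) -> int:
    """Clamp to the 0..255 sample range (assumed 8-bit video)."""
    return max(0, min(255, value))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32), clipped to 8 bits."""
    return clip_8bit((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """a = round((p + q) / 2) for two neighboring integer- or half-pel samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [10, 20, 100, 120, 30, 15]   # E, F, G, H, I, J
    b = half_pel(*row)
    a = quarter_pel(row[2], b)          # between G and b
    print(b, a)
```

The negative taps let the filter overshoot at sharp edges (for example, the inputs 0, 0, 255, 255, 0, 0 give an unclipped value above 255), which is why some form of clipping is needed in practice.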

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram highlighting picture buffering; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: a 16x16 luma MB divided into sixteen 4 x 4 blocks. The numbers 0, 1, ..., 15 give the coding order of the blocks for the (4x4) integer DCT:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

The DC coefficient of each 4x4 block (dark samples) is assigned index (00), (01), (02), ..., (33) according to the block position:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33]

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

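VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f X C_f^T) \otimes E_f \]

\[ C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad X = \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \]

\[ E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix} \]

\[ a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2} \]

\otimes implies element-by-element multiplication.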
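A small Python sketch may help show how the two stages fit together: the integer core transform Cf X Cf^T uses only small integer weights, and the element-by-element scaling by Ef (with a = 1/2, b = sqrt(2/5)) turns the result into an orthonormal approximation of the DCT. The helper names are hypothetical, and the scaling is applied here in floating point purely for illustration; in the codec it is folded into quantization.

```python
import math

# Forward core transform matrix Cf (integer entries only).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(p, q):
    """Plain 4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(x):
    """W = Cf X Cf^T, computed with integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))


def post_scale_matrix():
    """Ef with entries a^2, ab/2 and b^2/4 (a = 1/2, b = sqrt(2/5))."""
    a, b = 0.5, math.sqrt(2.0 / 5.0)
    s = [a, b / 2.0, a, b / 2.0]      # per-row / per-column scale factors
    return [[s[i] * s[j] for j in range(4)] for i in range(4)]


def forward_transform(x):
    """Y = (Cf X Cf^T) scaled element by element with Ef."""
    w, ef = core_transform(x), post_scale_matrix()
    return [[w[i][j] * ef[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[3] * 4 for _ in range(4)]
    print(core_transform(flat)[0][0], forward_transform(flat)[0][0])
```

For a constant block only the DC coefficient is non-zero, and after scaling the transform preserves the energy of the block, which is what one expects from an orthonormal transform.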

-43-

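4x4 Inverse IntDCT

\[ X = C_i^T (Y \otimes E_i) C_i \]

Forward scaling matrix:

\[ E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix} \]

Inverse scaling matrix:

\[ E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \]

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices E_f and E_i.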

-44-

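VC Algorithm: Transform

Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[ Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) \]

where round() denotes rounding to the nearest integer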

-45-

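VC Algorithm: Transform

Chroma DC coefficients (any intra prediction mode with the (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[ Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]

[Figure: 4:2:0 chroma block numbering. Blocks 16 and 17 are the 2x2 DC blocks of U and V; blocks 18 to 21 are the AC blocks of U and blocks 22 to 25 are the AC blocks of V.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased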

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the transform. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering. Video Input, Bitstream Output]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

Encoder: post-scaling and quantization

$$Y_{ij} = \operatorname{round}\!\left(X_{ij}\,\frac{SF_{ij}}{Qstep}\right)$$

Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input

Y: quantizer output

Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values (for 8 bits per decoded sample), and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample.

SF: scaling term (post-scaling at the encoder, pre-scaling at the decoder)
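
The "doubles for every increment of 6 in QP" rule can be illustrated with a small lookup. The six base step sizes below are an assumption taken from commonly published H.264 tables, not from the slides, and the function name `qstep` is hypothetical.

```python
# Sketch: Qstep as a function of QP for 8-bit video (QP in 0..51).
# BASE_QSTEP holds the step sizes for QP = 0..5; these values are assumed
# from commonly published tables and are not listed on the slide.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Return the quantization step size; it doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit samples")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))

for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

Because consecutive base values differ by roughly 12%, the step size grows smoothly rather than linearly with QP.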

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform; DC transform for Intra (16x16) prediction mode only

[Figure: transform and quantization data flow]

Encoder part: input block -> forward transform -> 2x2 or 4x4 DC transform (chroma or Intra-16 luma only) -> post-scaling and quantization -> encoder output

Decoder part: decoder input -> 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only) -> inverse quantization and pre-scaling -> inverse transform -> output block

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)

Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate

Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles

Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15
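
As a rough illustration of how the two tables are used, the sketch below reads a 4x4 block into a 1-D list, treating each table entry as the scan position of the coefficient at that row and column. The names `ZIGZAG_POS`, `ALTERNATE_POS` and `scan_block` are hypothetical.

```python
# Sketch: reorder a 4x4 block of quantized coefficients into scan order.
# Each table entry is the scan position of the coefficient at that (row, col).
ZIGZAG_POS = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_POS = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]

def scan_block(block, pos_table):
    """Return the 16 coefficients of `block` in the order given by pos_table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[pos_table[r][c]] = block[r][c]
    return out

# Label each coefficient with its raster index (4 * row + col) to see the path.
raster = [[4 * r + c for c in range(4)] for r in range(4)]
print(scan_block(raster, ZIGZAG_POS))
print(scan_block(raster, ALTERNATE_POS))
```

The alternate scan runs down the columns first, which suits field-coded blocks where vertical detail is stronger.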

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits

$$M = \left\lfloor \log_2(\text{code\_num} + 1) \right\rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^{M}$$

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
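
The construction and decoding rules above can be sketched as follows. Codewords are handled as strings of '0'/'1' characters purely for readability, and the function names `ue_encode` and `ue_decode` are hypothetical.

```python
# Sketch of unsigned Exp-Golomb coding: [M zeros][1][INFO].

def ue_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a bit string."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits

def ue_decode(bits):
    """Decode one codeword from the start of `bits`.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                        # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1        # step 3: 2^M + INFO - 1

for n in range(7):
    print(n, ue_encode(n))
```

Every codeword has length 2M + 1, so a decoder only needs to count leading zeros to know how many more bits to read.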

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and of +-1 values are coded. For the other coefficients, their levels are coded.

Encoding steps
step 1: encode the total number of nonzero coefficients and of +-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 ... 16
coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0

Step 1: encode the number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
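
The five steps of the example can be traced with a short sketch that derives the symbols to be coded; it does not perform the actual table look-ups. The function name `cavlc_symbols` is hypothetical, and the values chosen for c0, c1, c2 (7, 6, -2) are placeholders, since the slide leaves them symbolic.

```python
# Sketch: derive the CAVLC symbols (not the codewords) for one scanned block.

def cavlc_symbols(coeffs):
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three +-1 values at the high-frequency end.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["-" if c < 0 else "+" for c in trailing]            # step 2

    # Remaining levels, in reverse scan order.                   # step 3
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Zeros located before the last non-zero coefficient.        # step 4
    last = nonzero[-1][0] if nonzero else -1
    total_zeros = (last + 1) - total_coeffs

    # Runs of zeros before each coefficient, in reverse order;   # step 5
    # coding stops once all zeros are accounted for.
    runs, zeros_left = [], total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": total_coeffs,
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }

# c0, c1, c2 replaced by the placeholder values 7, 6, -2.
example = [7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(example))
```

With these placeholder values the sketch reproduces the counts, signs and runs listed in the five steps of the example.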

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is updated as coding proceeds. Both MV and residual transform coefficients are coded by CABAC.

Encoding steps
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing the three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction

The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference. The direct-mode partition in the current picture has motion vectors mvL0 and mvL1; the co-located partition in the List 1 reference has motion vector mvCol; tb and td are the temporal distances marked in the figure]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is a MV used in the co-located MB of the subsequent picture
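
A small numeric sketch of the two direct-mode formulas follows. It uses exact fractions rather than the fixed-point scaling a real decoder would use, and it assumes that tb is the distance from the List 0 reference to the current picture and td the distance between the two references; the function name is hypothetical.

```python
# Sketch: temporal direct-mode motion vectors derived from mvCol.
from fractions import Fraction

def direct_mode_mvs(mv_col, tb, td):
    """Return (mvL0, mvL1) for a co-located vector mv_col = (x, y)."""
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)            # tb * mvCol / td
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)    # -(td - tb) * mvCol / td
    return mv_l0, mv_l1

mv_l0, mv_l1 = direct_mode_mvs((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

By construction mvL0 - mvL1 equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector.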

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction method for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by different weights from two reference signals r1 and r2

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors

Implicit type: the factors are calculated based on the temporal distance between the pictures

Explicit type: the factors are transmitted in the slice header
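
The weighted sum p = w1 r1 + w2 r2 can be sketched for 8-bit samples as below. The implicit weights here are a simplified assumption (weights proportional to temporal closeness, with r1 at distance tb from the current picture and td the distance between the two references); the standard derives them in fixed point, and the function names are hypothetical.

```python
# Sketch: weighted bi-prediction of 8-bit samples, p = w1*r1 + w2*r2.

def implicit_weights(tb, td):
    """Assumed simplification: the closer reference gets the larger weight."""
    w2 = tb / td
    return 1.0 - w2, w2

def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference blocks (flat lists of samples) sample by sample."""
    out = []
    for a, b in zip(r1, r2):
        p = int(w1 * a + w2 * b + 0.5)      # round to nearest
        out.append(max(0, min(255, p)))     # clip to the 8-bit range
    return out

w1, w2 = implicit_weights(tb=1, td=4)
print(w1, w2)
print(weighted_prediction([100, 100], [50, 50], w1, w2))
```

In the explicit type the same combination is used, but w1 and w2 come from the slice header instead of being computed.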

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice
SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
SI slice: the switched slice, similar to coding of an I slice

[Figure: switching between two bitstreams. Bitstream A: P(1,1) P(1,2) P(1,3) P(1,4) P(1,5); Bitstream B: P(2,1) P(2,2) P(2,3) P(2,4) P(2,5); switching slice S(3) between them]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2, 3. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11. Reliable parameter set exchange; VCL data transfer with PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
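
The dispersed two-group allocation described for FMO can be pictured as a checkerboard. The sketch below builds such a map and counts, for each macroblock, how many of its horizontal and vertical neighbors lie in the other slice group; the checkerboard layout and the function names are illustrative assumptions, not the only dispersed pattern FMO permits.

```python
# Sketch: checkerboard macroblock-to-slice-group map for two slice groups.

def checkerboard_map(width_mbs, height_mbs):
    """Return a 2-D list where entry [y][x] is the slice group (0 or 1)."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]

def neighbours_in_other_group(group_map, x, y):
    """Count 4-connected neighbours of MB (x, y) that are in the other group."""
    h, w = len(group_map), len(group_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and group_map[ny][nx] != group_map[y][x]:
            count += 1
    return count

# QCIF is 11 x 9 macroblocks.
groups = checkerboard_map(11, 9)
for row in groups[:3]:
    print(row)
print(neighbours_in_other_group(groups, 5, 4))
```

If the packet for one group is lost, every missing macroblock still has received neighbors on all available sides, which is what concealment relies on.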

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF; Bitrate [kbps] for CIF

24 48 96 192; 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF; Bitrate [kbps] for CIF

24 48 96 192; 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ; Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6; 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ; Bitrate [Mbps] for MPEG-2 TM5

6 10 20; 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products

Related companies

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software

- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12 bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation

o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
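
Three of the four full-MB modes are simple enough to sketch directly; the planar mode is omitted. The block size n is a parameter so the example can be checked by hand on a 4x4 block, the DC rounding shown is an assumption for illustration, and the function names are hypothetical.

```python
# Sketch: vertical, horizontal and DC intra prediction for an n x n block.
# `top` holds the n reconstructed pels just above the block,
# `left` holds the n reconstructed pels just left of it.

def predict_vertical(top, n):
    """Every row repeats the row of pels above the block."""
    return [list(top) for _ in range(n)]

def predict_horizontal(left, n):
    """Every column repeats the column of pels left of the block."""
    return [[left[r]] * n for r in range(n)]

def predict_dc(top, left, n):
    """All pels take the rounded average of the 2n neighbouring pels."""
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]

top, left = [1, 2, 3, 4], [5, 6, 7, 8]
print(predict_vertical(top, 4))
print(predict_horizontal(left, 4))
print(predict_dc(top, left, 4))
```

An encoder would form the residual against each candidate and keep the mode that costs the fewest bits for acceptable distortion.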

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction (vertical, horizontal, DC, planar)

• 4:2:2 (8x16)

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y | 4x4 Intra Cb, Cr

4x4 Inter Y | 4x4 Inter Cb, Cr

8x8 Intra Y | 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

[Figure: frame zig-zag scan and field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block


-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, studio editing and post-processing, which need to:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr; 4:4:4 MB: 16x16 Y, 16x16 Cb and 16x16 Cr]

-119-

RGB → Y Cb Cr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

K_R = 0.2126, K_B = 0.0722, K_R + K_B + K_G = 1

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)

-120-

Rounding error in RGB → Y Cb Cr

FRExt only: YCgCo (Cg = green chroma, Co = orange chroma)

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to chroma samples

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
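The forward and inverse conversions on this slide can be checked with a short Python sketch. The function names are hypothetical; Python's ">>" on integers is an arithmetic (floor) shift, which matches the shift the slide describes. The point of the example is that the round trip is exact for integer inputs, even though Cg and Co can be negative.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward lifting-based conversion, as listed on the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion; undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 0)                      # pure red, 8-bit samples
    coded = rgb_to_ycgco(*pixel)
    print(coded, ycgco_to_rgb(*coded))
```

For 8-bit R, G and B inputs, Cg and Co span roughly -255 to 255, which is why one extra bit of precision is needed for the chroma samples.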

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/s

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects makes the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only control of filter is adjusted: do not filter 4x4 blocks
• No change to filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-18-

Introduction

Coding parts for Baseline Profile
– Common parts: I slice, P slice, CAVLC
– FMO (flexible macroblock order): macroblocks may not necessarily be in the raster scan order. The map assigns macroblocks to a slice group
– ASO (arbitrary slice order): the macroblock address of the first macroblock of a slice of a picture may be smaller than the macroblock address of the first macroblock of some other preceding slice of the same coded picture
– RS (redundant slice): this slice belongs to the redundant coded data obtained at the same or a different coding rate in comparison with previously coded data of the same slice

-19-

Introduction

Coding parts for Main Profile
– Common parts: I slice, P slice, CAVLC
– B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
– Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
– CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
– Common parts: I slice, P slice, CAVLC
– SP slice: a specially coded slice for efficient switching between video streams, similar to coding of a P slice
– SI slice: the switched slice, similar to coding of an I slice
– Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
– Flexible macroblock order (FMO)
– Arbitrary slice order (ASO)
– Redundant slice (RS)
– B slice
– Weighted prediction

-21-

Introduction
Profile specifications

Feature | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | | X |
SP/SI slices | | | X |
Data partitioning | | | X |
B slice | | X | X | X
Interlaced coding | | X | X | X
CABAC | | X | | X

-22-

Introduction

Application requirements

Application | Requirements | H.264 profiles | MPEG-4 profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction
Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction
Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
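As a rough illustration of how the first two columns of this table interact, the following Python sketch derives the highest frame rate a level allows for a given picture size. It assumes the frame rate is bounded by (max macroblock processing rate) / (macroblocks per frame) and by the 172 frames/s ceiling quoted in the deck's notes on levels; the dictionary holds only three rows copied from the table, and all names are hypothetical.

```python
import math

# (max MB processing rate in MB/s, max frame size in MBs) for a few levels,
# copied from the table above.
LEVEL_LIMITS = {
    "1": (1485, 99),
    "3": (40500, 1620),
    "4": (245760, 8192),
}


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover a width x height picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_frame_rate(level: str, width: int, height: int, cap: float = 172.0) -> float:
    """Highest frame rate the level permits, or 0.0 if the picture is too large."""
    max_mb_rate, max_frame_size = LEVEL_LIMITS[level]
    mbs = macroblocks_per_frame(width, height)
    if mbs > max_frame_size:
        return 0.0
    return min(max_mb_rate / mbs, cap)


if __name__ == "__main__":
    print(max_frame_rate("1", 176, 144))    # QCIF at level 1
    print(max_frame_rate("3", 720, 576))    # 625-line SDTV at level 3
    print(max_frame_rate("3", 720, 480))    # 525-line SDTV at level 3
```

The results for QCIF at level 1 and for the two SDTV formats at level 3 agree with the frame rates listed per level on the previous slide.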

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
– Abstracts the VCL data, hence the name Network 'Abstraction' Layer
– Header information about the VCL format
– Appropriate for conveyance by the transport layers or storage media
– NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
– Core coding layer
– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure
Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 (360 pels) x 288 lines, Cb and Cr are each 176 (180 pels) x 144 lines. QCIF format: Y is 176 (180 pels) x 144 lines, Cb and Cr are each 88 (90 pels) x 72 lines]

-28-

Video Coding Algorithm
Block diagram for H.264 encoder

[Figure: encoder block diagram. Video input → intra/inter mode decision (intra prediction; motion estimation and motion compensation from picture buffering) → transform & quantization → entropy coding → bitstream output, with a reconstruction loop through inverse quantization & inverse transform and the deblocking filter]

-29-

Video Coding Algorithm
Block diagram for H.264 decoder

[Figure: decoder block diagram. Bitstream input → entropy decoding → inverse quantization & inverse transform → addition of the intra/inter prediction (intra prediction; motion compensation from picture buffering) → deblocking filter → video output]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a to p, with neighboring samples A to H above, I to L to the left and M at the top-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, …, p: the predicted ones for the current block. Above and left samples A, B, …, M: previously reconstructed ones

-31-

VC Algorithm: Intra Prediction
Example of 4 x 4 luma block

Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)

Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: 4 x 4 block of samples a to p with neighbors A to H, I to L and M, showing the prediction directions of mode 4 and mode 8]
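The two example predictions can be written with integer arithmetic. The sketch below assumes round() means rounding half up, implemented with the usual add-then-shift idiom; the function names and sample values are hypothetical and only samples a and d of each mode are computed.

```python
def mode4_samples_a_d(A: int, B: int, C: int, D: int, I: int, M: int):
    """Diagonal down-right (mode 4): predictions for samples a and d."""
    a = (I + 2 * M + A + 2) >> 2     # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2     # round(B/4 + C/2 + D/4)
    return a, d


def mode8_samples_a_d(I: int, J: int, K: int, L: int):
    """Horizontal-up (mode 8): predictions for samples a and d."""
    a = (I + J + 1) >> 1             # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2     # round(J/4 + K/2 + L/4)
    return a, d


if __name__ == "__main__":
    # Hypothetical reconstructed neighbor values.
    print(mode4_samples_a_d(A=100, B=104, C=108, D=112, I=96, M=98))
    print(mode8_samples_a_d(I=96, J=100, K=110, L=120))
```

Each prediction is a short low-pass filter over two or three neighboring reconstructed samples, taken along the direction of the chosen mode.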

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to 16x16 luma block but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
– Prediction of variable block sizes
– Sub-pel motion compensation
– Deblocking filter
– Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size
– A MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small ones suit detailed areas

The two partition levels cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions

Only when the sub-MB partition (8x8) is selected can each (8x8) block be further partitioned

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation
– Better compression performance than integer-pel MC
– At the expense of increased complexity
– Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram emphasizing motion estimation and motion compensation; motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy
– A distinct MV can be sent for each sub-MB partition
– ME can be based on multiple pictures that lie in the past or in the future in display order
– The reference picture for ME is selected at the MB partition level
– Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2- and 1/4-pixel positions, and 1/8-pixel positions]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) and fractional-pel samples (lower-case letters a to s, aa to hh) around sample G]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
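The two interpolation formulas can be sketched in Python with integer arithmetic. The sketch assumes 8-bit samples (so results are clipped to 0..255) and rounding half up via add-then-shift; the function names are hypothetical.

```python
def clip8(value: int) -> int:
    """Clip to the 8-bit sample range (an assumption of this sketch)."""
    return max(0, min(255, value))


def half_pel(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Half-pel sample b between G and H from the 6-tap filter (1,-5,20,20,-5,1)/32."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """Quarter-pel sample as the rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    b = half_pel(0, 0, 0, 255, 255, 255)    # step edge between G and H
    a = quarter_pel(0, b)                   # quarter-pel between G and b
    print(b, a)
```

The negative filter taps can overshoot on sharp edges, for example `half_pel(0, 0, 255, 255, 0, 0)` exceeds 255 before clipping, which is why the clip is included.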

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)

Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in a 16 x 16 luma macroblock and in the corresponding chroma blocks]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram emphasizing picture buffering: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
– Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
– Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: 16 x 16 luma macroblock as sixteen 4 x 4 blocks numbered 0, 1, …, 15 in coding order (0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15), and the 4 x 4 array of their DC coefficients indexed (00) to (33)]

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), …, (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

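VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f X C_f^T\right) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication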
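The forward transform can be tried out with a small pure-Python sketch. It computes the integer core Cf X Cf^T and then applies the element-by-element scaling Ef in floating point; in a real codec that scaling is folded into quantization, so the separate `scaled_transform` step exists here only to show that core plus scaling behaves like an orthonormal transform. All function names are hypothetical.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

A = 0.5
B = (2.0 / 5.0) ** 0.5
ROW_SCALE = [A, B / 2, A, B / 2]
# Ef[i][j] = ROW_SCALE[i] * ROW_SCALE[j] gives a^2, ab/2 and b^2/4 entries.
EF = [[ri * rj for rj in ROW_SCALE] for ri in ROW_SCALE]


def matmul(p, q):
    """Plain 4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Integer core transform W = Cf X Cf^T (additions and small multiplies only)."""
    return matmul(matmul(CF, x), transpose(CF))


def scaled_transform(x):
    """Y = W (element-by-element) Ef."""
    w = core_transform(x)
    return [[w[i][j] * EF[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(core_transform(flat)[0][0], scaled_transform(flat)[0][0])
```

For a flat block only the DC coefficient is non-zero, and after scaling the total energy of the coefficients equals that of the input samples.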

-43-

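4x4 Inverse IntDCT

$$X = C_i^T \left(Y \otimes E_i\right) C_i$$

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$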

-44-

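VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \operatorname{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer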

-45-

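VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): the 2 x 2 DC coefficients are transformed using the Walsh-Hadamard transform

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma (U, V) block numbering: 2x2 DC blocks 16 and 17, AC blocks 18 to 25]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased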

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram emphasizing the transform & quantization block]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
– Encoder: post-scaling and quantization
– Decoder: inverse quantization and pre-scaling

$$Y_{ij} = \operatorname{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right)$$

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
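The doubling rule for Qstep can be illustrated with a short Python sketch. The six base step sizes for QP 0 to 5 are the values commonly tabulated for H.264 and do not appear on the slide, so treat them as an assumption; the scaling term SF is taken as 1 here to keep the quantizer itself in focus, and all names are hypothetical.

```python
import math

# Step sizes for QP = 0..5 (assumed from commonly published H.264 tables).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantization step size for 8-bit video: doubles for every 6 steps of QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(coefficient: float, qp: int) -> int:
    """Scalar quantization with rounding half away from zero (SF taken as 1)."""
    scaled = abs(coefficient) / qstep(qp)
    level = math.floor(scaled + 0.5)
    return level if coefficient >= 0 else -level


def rescale(level: int, qp: int) -> float:
    """Inverse quantization (SF taken as 1)."""
    return level * qstep(qp)


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    print(quantize(100.0, 28), rescale(quantize(100.0, 28), 28))
```

Under these assumed base values the step size runs from 0.625 at QP 0 up to 224 at QP 51, covering the 52 values mentioned above.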

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform

[Figure: encoder part: input block → forward transform → (2x2 or 4x4 DC transform, chroma or Intra-16 luma only) → post-scaling and quantization → encoder output. Decoder part: decoder input → (2x2 or 4x4 DC inverse transform, chroma or Intra-16 luma only) → inverse quantization and pre-scaling → inverse transform → output block. The DC transforms apply to Intra (16x16) prediction mode only]

-49-

VC Algorithm: Entropy Coding
– All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
– Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
– Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
– Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 values are coded; for the other coefficients, their levels are coded

Encoding steps
step 1: encode the total number of non-zero coefficients and ±1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 … 0
order: 0 1 2 3 4 5 6 7 8 9 … 15

Step 1: encode the number of non-zero total coefficients and of 1 or -1 (trailing ones) from a look-up table
number of non-zero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
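The five quantities in this example can be derived mechanically from the scanned coefficient list. The Python sketch below only extracts the symbols that CAVLC would then map to codewords; it does not contain any of the VLC look-up tables. The function name and the concrete values chosen for c0, c1 and c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one block of scanned coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three consecutive +-1 values at the high-frequency end.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]

    # Remaining levels, highest frequency first.
    remaining = nonzero[:total_coeffs - len(trailing)]
    levels = [coeffs[i] for i in reversed(remaining)]

    # Zeros located before the last non-zero coefficient.
    total_zeros = (nonzero[-1] + 1 - total_coeffs) if nonzero else 0

    # Run of zeros preceding each coefficient, in reverse order, until none remain.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    # c0, c1, c2 replaced by arbitrary values 7, -3, 2.
    block = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

With this input the sketch reports 6 coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, a total of 2 zeros and the runs 1, 0, 1, as in the slide.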

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated

Both MV and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction

Multiple reference pictures mode
– Two forward references: suited to a region just before a scene change
– Two backward references: suited to a region just after a scene change

[Figure: current picture between previous pictures and next pictures, predicted with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode

Forward/backward pair of bi-directional prediction

The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference along the time axis, with temporal distances tb and td; the direct-mode partition in the current picture uses mvL0 and mvL1 derived from mvCol of the co-located partition in the List 1 reference]

mvL0 = tb × mvCol / td
mvL1 = -(td - tb) × mvCol / td

where mvCol is a MV used in the co-located MB of the subsequent picture
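The scaling of the co-located motion vector can be sketched as follows. This Python version uses plain floating-point division for clarity, which is an assumption of the sketch rather than how an actual decoder performs the scaling; the function name and the sample values are hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col: (x, y) motion vector of the co-located partition
    tb:     temporal distance from the List 0 reference to the current picture
    td:     temporal distance from the List 0 reference to the List 1 reference
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture midway between its two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

By construction mvL0 - mvL1 equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector.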

-60-

VC Algorithm: B Slice
Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 × r1 + w2 × r2

where w1 and w2 are weighting factors
– Implicit type: the factors are calculated based on the temporal distance between the pictures
– Explicit type: the factors are transmitted in the slice header
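A minimal Python sketch of the weighted combination is given below. The implicit weights are modeled here as a linear interpolation over temporal distance (w2 = tb/td, w1 = 1 - w2), which is a floating-point simplification assumed for illustration; the slide only states that the factors depend on temporal distance. Function and parameter names are hypothetical.

```python
def implicit_weights(tb: float, td: float):
    """Weights from temporal distances (assumed linear interpolation).

    tb: distance from reference r1 to the current picture
    td: distance from reference r1 to reference r2
    """
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1: float, w2: float):
    """p = w1 * r1 + w2 * r2, applied sample by sample."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


if __name__ == "__main__":
    # Fade to black: r2 is darker than r1; current picture is 1/4 of the way.
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_prediction([100, 200], [20, 40], w1, w2))
```

In the explicit type the same `weighted_prediction` step would simply use factors read from the slice header instead of derived ones.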

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
– SP slice: a specially coded slice for efficient switching between video streams, similar to coding of a P slice
– SI slice: the switched slice, similar to coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and switching slice S(3) linking the two streams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
– Parameter setting
– Flexible macroblock ordering (FMO)
– Redundant slice methods
– Switched slice (SP/SI)
– Data partitioning
– Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

– The sequence parameter set contains all information related to a sequence of pictures
– A picture parameter set contains all information related to all the slices belonging to a single picture
– The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each holding parameter sets 1, 2 and 3, exchanged through a reliable parameter set exchange (Parameter Set 3: video format NTSC, motion resolution 1/4, Enc: CABAC, frame width 11); VCL data is then transferred with reference to PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football | 2x, 1x, 2x, 2x | >1x, >1x, 1x, >1x

Mobile | 2x, 1x, 2x, 2x | >2x, 4x, >2x, T

Husky | 2x, 2x, >1x, 2x, 2x, 2x

Tempete | 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football | >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile | 4x, 2.7x, 2x, T, T | >4x, >2.7x, >2x, T, T

Husky | >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete | T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew | 1.7x, 2x, T | 1.7x, 2x, T

Harbour | T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan | 1x 2x

New Mobile & Calendar | T, 2x, T | T, 2x, T

1080 (25p)

River Bed | >1.7x, >1x, T | >1.7x, >1x, T

Vintage Car | 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, covering decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications

H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
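The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. The sketch assumes the commonly tabulated H.264 values for 8-bit video, in which the step size doubles for every increase of 6 in QP (roughly 12% per QP step); the table BASE_QSTEP and the function name qstep are illustrative and do not appear in the slides.

```python
# Step sizes for QP 0..5; every further group of six QP values doubles them.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Return the quantization step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Under these assumptions, QP 4 gives a step size of 1.0 and QP 51 gives 224, so a small integer range of QP values covers a wide range of step sizes.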

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

Matrix can be transmitted in the SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding. Coefficient scanning follows decreasing variances, to maximize the number of zero-valued coefficients along the scan.

Frame (zig-zag) scan and field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies the reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$
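The construction above can be sketched in a few lines. The function name exp_golomb_encode and the use of a string of '0'/'1' characters to stand for the bitstream are assumptions made for readability, not part of any reference software.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits         # [M zeros][1][INFO]
```

For example, code_num 0 maps to "1", code_num 1 and 2 map to "010" and "011", and code_num 3 maps to "00100", each with a length of 2M + 1 bits.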

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$
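The three decoding steps can be sketched as follows. As an assumption for illustration, the bitstream is a string of '0'/'1' characters and the function returns the position of the next unread bit so that several codewords can be decoded in sequence; the name exp_golomb_decode is hypothetical.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos).
    """
    m = 0
    while bits[pos] == "0":                  # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                                 # ... and skip the terminating '1'
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: read the M-bit INFO
    return (1 << m) + info - 1, pos + m      # step 3: code_num = 2^M + INFO - 1


# Three concatenated codewords: "1", "010" and "00111".
stream = "101000111"
first, p = exp_golomb_decode(stream)
second, p = exp_golomb_decode(stream, p)
third, p = exp_golomb_decode(stream, p)
```

Because the number of leading zeros announces the codeword length, no separator is needed between consecutive codewords in the stream.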

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
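The RGB to YCbCr conversion can be sketched directly from the general equations with the constants K_R and K_B as parameters. The function name rgb_to_ycbcr and the choice of normalized inputs in the range 0..1 are assumptions for illustration; offsets and integer scaling used in practical systems are deliberately left out.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr.

    The defaults are the K_R and K_B values quoted on the slide;
    pass kr=0.299, kb=0.114 for the BT.601 constants.
    """
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr
```

With these definitions Cb and Cr stay within -0.5..+0.5: pure blue gives Cb = 0.5, pure red gives Cr = 0.5, and any gray input gives Cb = Cr = 0. The floating-point multiplications are where the rounding error addressed by the YCgCo transform comes from.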

-120-

Rounding error in RGB to YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
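The forward and inverse color space conversions on this slide can be checked with a short round-trip sketch. The function names are hypothetical; the sketch relies on the fact that Python's >> on integers is an arithmetic (floor) shift, which matches the shift the slide describes.

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward conversion using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Inverse conversion: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Example: pure 8-bit red survives the round trip unchanged.
restored = ycgco_r_to_rgb(*rgb_to_ycgco_r(255, 0, 0))
```

Each inverse step subtracts exactly what the corresponding forward step added, so the conversion is lossless for integer inputs. For 8-bit input, Cg and Co range over -255..255, which is why one extra bit of precision is needed for the chroma samples.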

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3

-19-

Introduction

Coding parts for Main Profile
- Common parts: I slice, P slice, CAVLC
- B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding

-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switching slice, coded similarly to an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction

Profile specifications

Feature | Baseline | Main | Extended | High
I & P slices | X | X | X | X
Deblocking filter | X | X | X | X
1/4-pel motion compensation | X | X | X | X
Variable block size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error resilience tools (flexible MB order, ASO, redundant slices) | X | | X |
SP/SI slices | | | X |
B slice | | X | X | X
Interlaced coding | | X | X | X
CABAC | | X | | X
Data partitioning | | | X |

-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction
Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
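As a rough illustration of how the first two limits interact, the sketch below picks the lowest level whose frame-size and macroblock-rate limits admit a given picture size and frame rate. The function names are hypothetical, only those two columns are checked (bit rate, buffer sizes and MV limits are ignored), and the limit values are copied from the table above.

```python
# (level, max MB/s, max frame size in MBs), taken from the slide's limits table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks covering a width x height luma picture."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def lowest_level(width, height, fps):
    """Lowest level satisfying the frame-size and MB-rate limits, else None."""
    frame_mbs = macroblocks(width, height)
    mb_rate = frame_mbs * fps
    for level, max_rate, max_frame in LEVEL_LIMITS:
        if frame_mbs <= max_frame and mb_rate <= max_rate:
            return level
    return None


print(lowest_level(352, 288, 30))   # CIF at 30 fps: 396 MBs, 11 880 MB/s
```

For CIF at 30 fps the picture holds 396 macroblocks and needs 11 880 MB/s, which is exactly the Level 1.3 limit, in agreement with the level table on the previous slide.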

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure
Supported picture format: 4:2:0 chroma sampling

[Figure: CIF and QCIF formats with 4:2:0 sampling. CIF format: Y is 352 (360 pels) x 288 lines; Cb and Cr are each 176 (180 pels) x 144 lines. QCIF format: Y is 176 (180 pels) x 144 lines; Cb and Cr are each 88 (90 pels) x 72 lines.]

-28-

Video Coding Algorithm
Block diagram for H.264 encoder

[Figure: H.264 encoder block diagram. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision.]

-29-

Video Coding Algorithm
Block diagram for H.264 decoder

[Figure: H.264 decoder block diagram. Blocks: Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering, Video Output.]

-30-

VC Algorithm: Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a to p (rows a b c d, e f g h, i j k l, m n o p) with neighbouring samples A to H above, I to L to the left and M at the top-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.]

Samples a, b, ..., p are the predicted samples of the current block; samples A, B, ..., M above and to the left are previously reconstructed samples.

-31-

VC Algorithm: Intra Prediction
Example of 4 x 4 luma block
- Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 sample grid (a to p, with neighbours A to M) showing the prediction directions of mode 4 and mode 8.]
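The 4 x 4 prediction rules can be sketched in a few lines. This is an illustrative sketch, not the reference decoder: the function names are hypothetical, the rounding is written in the integer form (x + 2) >> 2 that corresponds to round(x/4) for non-negative samples, the DC rule assumes that both the upper and the left neighbours are available, and only samples a and d are computed for the two directional modes from the example.

```python
def predict_4x4(mode, above, left):
    """Modes 0-2 of the 4x4 luma predictor.

    above = [A, B, C, D], left = [I, J, K, L]; returns 4 rows of 4 samples.
    """
    if mode == 0:    # vertical: copy the row above into every row
        return [list(above) for _ in range(4)]
    if mode == 1:    # horizontal: copy the left column into every column
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:    # DC: rounded mean of the 8 neighbours (assumed available)
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def mode4_samples_a_d(A, B, C, D, I, M):
    """Diagonal down-right: a = round(I/4 + M/2 + A/4), d = round(B/4 + C/2 + D/4)."""
    return (I + 2 * M + A + 2) >> 2, (B + 2 * C + D + 2) >> 2


def mode8_samples_a_d(I, J, K, L):
    """Horizontal-up: a = round(I/2 + J/2), d = round(J/4 + K/2 + L/4)."""
    return (I + J + 1) >> 1, (J + 2 * K + L + 2) >> 2


print(predict_4x4(2, [10, 20, 30, 40], [1, 2, 3, 4])[0])
print(mode4_samples_a_d(10, 20, 30, 40, 1, 5))
```

An encoder would typically evaluate every mode, compare each prediction with the actual block, and signal the mode with the lowest cost.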

-32-

VC Algorithm: Intra Prediction
16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane mode works well in areas of smoothly varying luminance: a linear 'plane' function is fitted to the upper (H) and left-side (V) samples

8 x 8 luma (FRExt only)
- Similar to 4 x 4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction.]

-33-

VC Algorithm: Intra Prediction
Chroma
- Always operates using full-MB prediction: 8 x 8 for the 4:2:0 format, 8 x 16 for 4:2:2, 16 x 16 for 4:4:4
- Similar to the 16 x 16 luma block, but with a different mode order
- 4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size
- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition types cannot be mixed, i.e. an MB cannot have both 16x8 and 4x8 partitions. When the 8x8 sub-MB partition is selected, each 8x8 block can be further partitioned.

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: encoder block diagram highlighting Motion Estimation and Motion Compensation; motion vector accuracy 1/4 (6-tap filter).]

[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8, and sub-MB partitions 8x8, 8x4, 4x8 and 4x4, with the partitions numbered 0 to 3.]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels.]

-38-

VC Algorithm: Inter Prediction
- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (A to U), half-pel samples (aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s) and quarter-pel samples (a, c, d, e, f, g, i, k, n, p, q, r).]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
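The two interpolation formulas translate directly into integer arithmetic. The sketch below is illustrative only: the function names are hypothetical, rounding is written as "add half, then shift", and clipping to the 8-bit range is an added assumption that the slide does not state.

```python
def clip255(value):
    """Clamp to the 8-bit sample range (assumed here, not stated on the slide)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32), between samples G and H."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighbouring samples, e.g. a = round((G + b) / 2)."""
    return (p + q + 1) >> 1


row = [0, 0, 0, 255, 255, 255]      # a sharp edge between G and H
b = half_pel(*row)                  # half-pel sample on the edge
a = quarter_pel(row[2], b)          # quarter-pel sample between G and b
print(b, a)
```

On a flat region the filter returns the input value unchanged, because the six weights sum to 32.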

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal and vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16 x 16 macroblock.]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram highlighting Picture Buffering: management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using only additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks:
 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 give the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same applies to chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 or 4:4:4 format.

-42-

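VC Algorithm: Transform
4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$
Y = \left( C_f X C_f^T \right) \otimes E_f
= \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$

where $a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$, and $\otimes$ implies element-by-element multiplication.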
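The core part of the forward transform, C_f X C_f^T, needs only integer arithmetic. The sketch below is a plain-Python illustration with hypothetical helper names; it uses general matrix products rather than the add-and-shift butterflies of a real codec, and it leaves out the scaling by E_f, which is merged with quantization.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Unscaled forward transform Cf * X * Cf^T of a 4x4 block."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[3] * 4 for _ in range(4)]
print(core_transform(flat))          # only the DC term is non-zero
print(matmul(CF, transpose(CF)))     # diagonal: the rows of Cf are orthogonal
```

The product Cf * Cf^T is diagonal with entries 4, 10, 4, 10, so the rows are orthogonal but not of equal norm; the element-by-element scaling by E_f compensates for that.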

-43-

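4x4 Inverse IntDCT

$$
X = C_i^T \left( Y \otimes E_i \right) C_i
$$

Here

$$
[Y] \otimes [E_i] = [Y] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix},
\qquad
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.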

-44-

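VC Algorithm: Transform
Luma DC coefficients for an Intra 16x16 MB
- The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$
Y_D = \operatorname{round}\!\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
$$

where round() denotes rounding to the nearest integer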

-45-

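VC Algorithm: Transform
Chroma DC coefficients
- Intra prediction mode, (4x4) IntDCT
- Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
$$

[Figure: chroma block numbering for the 4:2:0 format: blocks 16 and 17 are the 2x2 DC blocks of U and V; blocks 18 to 21 (U) and 22 to 25 (V) carry the AC coefficients.]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.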

-46-

VC Algorithm: Transform
Block diagram emphasizing transform

[Figure: encoder block diagram highlighting Transform & Quantization.]

- 4 x 4 integer DCT transform

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization
- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

Encoder: Y_ij = X_ij × round(SF_ij / Qstep)
Decoder: X_ij = Y_ij × Qstep × SF_ij

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
- SF: scaling term
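The "doubles for every increment of 6 in QP" rule can be made concrete with a short sketch. The six base step sizes for QP 0 to 5 are not given on the slide; the values below are the ones commonly tabulated in the H.264 literature and should be treated as an assumption here, as should the function name.

```python
# Step sizes for QP = 0..5 (assumed: commonly tabulated values, not on the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 4, 10, 28, 51):
    print(qp, qstep(qp))
```

With these base values the step size is 1.0 at QP 4 and reaches 224 at QP 51, which gives a wide range of rate and quality operating points from only 52 parameter values.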

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization data flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block. Rescale and inverse transform of the DC coefficients: Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag or alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main Profile

(a) Zig-zag scan
0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan
0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15
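The two scan tables give, for each position in the 4 x 4 block, the index at which that coefficient is read out. A minimal sketch of applying them follows; the function name and the sample block are hypothetical.

```python
# Entry [r][c] is the position of coefficient (r, c) in the scanned sequence.
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block of coefficients into a 16-element list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [[9, 5, 0, 0],
         [3, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan(block, ZIGZAG))
print(scan(block, ALTERNATE))
```

Both scans push the high-frequency zeros to the end of the sequence; the alternate scan visits the first column earlier than the zig-zag scan does.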

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M

-52-

Decoding
1. Read in M leading zeroes followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1
(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs
All other syntax elements are coded with the Exp-Golomb codes
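The Exp-Golomb construction and the three decoding steps can be sketched as follows. Bit strings are represented as Python strings of "0" and "1" for readability, and the function names are hypothetical; a real codec would operate on a packed bit buffer.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count the leading zeroes
        m += 1
    start = pos + m + 1                        # skip the marker bit '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m      # step 3: 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Because every codeword carries its own length in the run of leading zeroes, codewords can be concatenated and decoded one after another without separators.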

-53-

VC Algorithm: Entropy Coding
CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and ±1 values are coded, and for the other coefficients their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of ±1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients). These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order:  0  1  2 3 4 5 6  7 8 9 ... 15

- Step 1: encode the number of total nonzero coefficients and of 1 or -1 values (trailing ones) from a look-up table: number of total nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
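The five quantities coded in this example can be extracted from a scanned coefficient list with a short routine. The sketch below only derives the symbols; it does not perform the table look-ups that turn them into bits. The function name and the values chosen for c0, c1, c2 (7, -3, 2) are hypothetical, and the cap of three trailing ones follows the usual description of CAVLC.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols of steps 1-5 from a scanned coefficient list."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # nonzero positions
    trailing = []                 # positions of trailing +/-1 values (max 3)
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]
    levels = [coeffs[i] for i in reversed(nz) if i not in trailing]
    total_zeros = nz[-1] + 1 - len(nz) if nz else 0
    runs, zeros_left = [], total_zeros
    for k in range(len(nz) - 1, 0, -1):   # runs of zeros, in reverse order
        if zeros_left == 0:
            break                          # all zeros accounted for
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return {"total_coeffs": len(nz), "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


coeffs = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 chosen arbitrarily
print(cavlc_symbols(coeffs))
```

For this input the routine reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, a total of 2 zeros and runs 1, 0, 1, matching the five steps listed above.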

-55-

VC Algorithm: Entropy Coding
CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1, context modeling: choose a suitable probability model
- Step 2, binarization: a non-binary-valued symbol is mapped into a sequence of binary decisions, called bins
- Step 3, binary arithmetic coding: the bins are coded using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: suitable for a region just before a scene change
- Two backward references: suitable for a region just after a scene change

[Figure: the current picture between previous pictures and next pictures, predicted with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice
Direct mode
- Uses a forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture with mvL0 and mvL1, the co-located partition with mvCol, and the temporal distances tb and td.]

mvL0 = tb × mvCol / td
mvL1 = -(td - tb) × mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
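The two direct-mode formulas can be checked numerically with a few lines. In the sketch below, tb is taken to be the temporal distance from the List 0 reference to the current picture and td the distance between the two references, as the figure suggests; the function name is hypothetical, and ordinary division is used instead of the fixed-point scaling a real codec would apply.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located MV into a forward (L0) and a backward (L1) MV."""
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

The difference mvL0 - mvL1 always equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector and no MV needs to be transmitted for the partition.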

-60-

VC Algorithm: B Slice
Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 × r1 + w2 × r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
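A small numeric sketch of p = w1 × r1 + w2 × r2 follows. The function names are hypothetical; the implicit weights are modelled as a simple linear function of the temporal distances tb and td, which is a simplifying assumption and not the integer formula of the standard, and clipping to the 8-bit range is also assumed.

```python
def weighted_bipred(r1, r2, w1, w2):
    """p = w1*r1 + w2*r2 per sample, rounded and clipped to 0..255."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed linear model: the temporally nearer reference gets more weight."""
    w2 = tb / td
    return 1.0 - w2, w2


# Fade to black: the first reference is still bright, the second already black.
w1, w2 = implicit_weights(tb=1, td=2)
print(weighted_bipred([100, 200], [0, 0], w1, w2))
```

In a fade, plain averaging or single-reference prediction leaves a large residual in every block, whereas suitable weights let the prediction follow the brightness change.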

-61-

VC Algorithm: SP and SI Slices (Extended Profile only)
Switched slices
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice

[Figure: bitstream A with pictures P(1,1) to P(1,5) and bitstream B with pictures P(2,1) to P(2,5); the switching slice S(3) connects the two bitstreams.]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended Profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter-coded MBs
5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3; reliable parameter set exchange between them; VCL data transfer with PS 3. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc: CABAC, frame width: 11.]

-65-

Error Resilience: FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO
- Randomizes data prior to transmission
- Errors are distributed more randomly over the video frames rather than in a single block of data
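The two-slice-group FMO scenario can be illustrated with a checkerboard map. This is a hypothetical sketch: the function names are invented, and the checkerboard is only one possible dispersed pattern, chosen because it makes the concealment argument easy to verify.

```python
def dispersed_slice_groups(width_mbs, height_mbs):
    """Checkerboard map: the slice group (0 or 1) of each macroblock."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def neighbors_in_other_group(groups, x, y):
    """Count the 4-connected neighbours of MB (x, y) in the other slice group."""
    height, width = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != groups[y][x]:
            count += 1
    return count


groups = dispersed_slice_groups(11, 9)          # QCIF: 11 x 9 macroblocks
print(groups[0])
print(min(neighbors_in_other_group(groups, x, y)
          for y in range(9) for x in range(11)))
```

With this map every macroblock, including those in the picture corners, has at least two direct neighbours in the other slice group, so a concealment algorithm still has correctly received data next to each lost macroblock if one group is lost.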

-66-

Error Resilience: Redundant Slice
- Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed instead

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence    Bitrate [kbps] for QCIF    Bitrate [kbps] for CIF
            24 48 96 192               96 192 384 768
Foreman     >1x 2x 2x T 2x >2x T T
Paris       >1x 2x 2x 2x 2x T 2x T
Head        >2x 2x 2x T T
Zoom        >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence    Bitrate [kbps] for QCIF    Bitrate [kbps] for CIF
            24 48 96 192               96 192 384 768
Football    2x 1x 2x 2x >1x >1x 1x >1x
Mobile      2x 1x 2x 2x >2x 4x >2x T
Husky       2x 2x >1x 2x 2x 2x
Tempete     2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence    Bitrate [Mbps] for MPEG-2 HiQ    Bitrate [Mbps] for MPEG-2 TM5
            1.5 2.25 3 4 6                   1.5 2.25 3 4 6
Football    >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x
Mobile      4x 2.7x 2x T T >4x >2.7x >2x T T
Husky       >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x
Tempete     T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence    Bitrate [Mbps] for MPEG-2 HiQ    Bitrate [Mbps] for MPEG-2 TM5
            6 10 20                          6 10 20
720 (60p)
Crew                      1.7x 2x T 1.7x 2x T
Harbour                   T 3.3x T T T 1.7x T T
1080 (30i)
Stockholm Pan             1x 2x
New Mobile & Calendar     T 2x T T 2x T
1080 (25p)
River Bed                 >1.7x >1x T >1.7x >1x T
Vintage Car               1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

- HLP: High Latency Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

- CHC: Conversational High Compression
- SP: Simple Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Baseline Profile

-73-

Conclusions
H.264 outperforms the previous standards

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions
- FRExt: extensions to the 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp. Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html
Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm
The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode, which uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation
    Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in MB predicted from pels just above the MB

• Horizontal: pels in MB predicted from pels just left of the MB

• DC: pels in MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for inter and intra
Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level
Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

Two scans (FRExt only), similar to those of the 4x4 transform, switched for frame/field coding: frame (zig-zag) and field

Coefficient scanning follows decreasing coefficient variances, so as to maximize the number of consecutive zero-valued coefficients along the scan

-109-

Examples of parameters to be encoded

Parameters                                              Description
Sequence-, picture- and slice-layer syntax elements     Headers and parameters
Macroblock type (mb_type)                               Prediction method for each coded macroblock
Coded block pattern                                     Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter                                     Transmitted as a delta value from the previous value of QP
Reference frame index                                   Identifies reference frame(s) for inter prediction
Motion vector                                           Transmitted as a difference (mvd) from the predicted motion vector
Residual data                                           Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, including various designs for satellite and cable television in the US.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y, Cb and Cr each 16x16.]

-119-

RGB to YCbCr

Y = KR R + (1 - KR - KB) G + KB B

Cb = (B - Y) / (2 (1 - KB))

Cr = (R - Y) / (2 (1 - KR))

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
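The chroma scale factors follow directly from KR and KB, which the sketch below checks numerically. It is a minimal illustration, not part of the standard text: the function names are hypothetical, and R, G, B are assumed to be normalized floating-point values.

```python
def ycbcr_scales(kr, kb):
    """Return (KG, Cb scale, Cr scale) derived from the luma weights."""
    kg = 1.0 - kr - kb
    cb_scale = 1.0 / (2.0 * (1.0 - kb))
    cr_scale = 1.0 / (2.0 * (1.0 - kr))
    return kg, cb_scale, cr_scale


def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert one RGB sample to (Y, Cb, Cr) using the equations above."""
    kg, cb_scale, cr_scale = ycbcr_scales(kr, kb)
    y = kr * r + kg * g + kb * b
    return y, cb_scale * (b - y), cr_scale * (r - y)


if __name__ == "__main__":
    print(ycbcr_scales(0.2126, 0.0722))   # weights quoted on this slide
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))    # pure blue
```

With these definitions, a pure blue or pure red input gives a Cb or Cr magnitude of 0.5, which is the purpose of the 2(1 - K) normalization.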

-120-

Rounding error in RGB to YCbCr

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma

Y = (1/2) [G + (R + B)/2]

Cg = (1/2) [G - (R + B)/2]

Co = (R - B)/2

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
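The forward and inverse conversions are exact inverses in integer arithmetic, which can be confirmed by a round trip. The sketch below is illustrative only; the function names are hypothetical, and Python's `>>` on integers is an arithmetic (floor) shift, matching the definition above.

```python
def rgb_to_ycgco(r, g, b):
    """Forward reversible transform, following the four steps above."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (200, 17, 93)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))  # second tuple equals the input
```

For 8-bit inputs, Y stays within 0 to 255 while Cg and Co span -255 to 255, i.e. one extra bit for the chroma samples.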

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
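Notes 1 and 2 can be checked with simple arithmetic. The sketch below is a hedged illustration: the function names are hypothetical, and the example limits (1 485 MB/s and 99 MBs for Level 1, 40 500 MB/s for Level 3) are taken from the level table elsewhere in this presentation.

```python
import math

MAX_FPS = 172  # upper bound on frame rate quoted in note 1


def max_frame_rate(max_mb_rate, width, height):
    """Frame rate allowed by a level's MB processing rate for one picture size."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return min(max_mb_rate / mbs_per_frame, MAX_FPS)


def max_dimension(max_frame_size_mbs):
    """Largest width or height in pixels per note 2: sqrt(pixels/frame x 8)."""
    pixels_per_frame = max_frame_size_mbs * 16 * 16
    return math.sqrt(pixels_per_frame * 8)


if __name__ == "__main__":
    print(max_frame_rate(1485, 176, 144))   # QCIF at Level 1
    print(max_frame_rate(40500, 176, 144))  # QCIF at Level 3, capped
    print(max_dimension(99))                # Level 1 frame size limit
```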

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
  Main profile
  Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
  Encoder-specified perceptual-based quantization scaling matrices
  Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
  HD Film: 12%
  HD Video (progressive): 12%
  HD Video (interlace): 4% (only 2 test clips)
  SD Video (interlace): 6%

Complexity impact:
  Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
  No increase in computational requirements
  Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video
Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
DVB is considering AAC with SBR (AAC plus)
ATSC has selected AC-3 plus from Dolby
MPEG calls it HE-AAC (HE: High Efficiency)
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-20-

Introduction

Coding parts for Extended Profile
- Common parts: I slice, P slice, CAVLC
- SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
- SI slice: the switched slice, similar to coding of an I slice
- Data partition: the coded data is placed in separate data partitions; each partition can be placed in a different layer unit
- Flexible macroblock order (FMO)
- Arbitrary slice order (ASO)
- Redundant slice (RS)
- B slice
- Weighted prediction

-21-

Introduction: Profile specifications

Feature                                             Baseline  Main  Extended  High
I & P Slices                                           X       X       X       X
Deblocking Filter                                      X       X       X       X
1/4 Pel Motion Compensation                            X       X       X       X
Variable Block Size (16x16 to 4x4)                     X       X       X       X
CAVLC/UVLC                                             X       X       X       X
Error Resilience Tools (FMO, ASO, Redundant Slices)    X               X
SP/SI Slices                                                           X
B Slice                                                        X       X       X
Interlaced Coding                                              X       X       X
CABAC                                                          X               X
Data Partitioning                                                      X

-22-

Introduction

Application requirements

Application: Broadcast television
  Requirements: coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder
  H.264 profile: Main
  MPEG-4 profile: ASP (Advanced Simple)

Application: Streaming video
  Requirements: coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability
  H.264 profile: Extended
  MPEG-4 profile: ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)

Application: Video storage and playback
  Requirements: coding efficiency, interlace, low-complexity encoder and decoder
  H.264 profile: Main
  MPEG-4 profile: ASP

Application: Videoconferencing
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder
  H.264 profile: Baseline
  MPEG-4 profile: SP (Simple)

Application: Mobile video
  Requirements: coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption
  H.264 profile: Baseline
  MPEG-4 profile: SP

Application: Studio distribution
  Requirements: lossless or near-lossless, interlace, efficient transcoding
  H.264 profile: Main/High
  MPEG-4 profile: Studio Profile

-23-

Introduction: Level

A level corresponds to the processing power and memory capability of a codec

Level number   Picture type & frame rate
1              QCIF, 15 fps
1.1            QCIF, 30 fps
1.2            CIF, 15 fps
1.3            CIF, 30 fps
2              CIF, 30 fps
2.1            HHR, 15 or 30 fps
2.2            SDTV, 15 fps
3              SDTV: 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1            1280x720x30p
3.2            1280x720x60p
4              HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1            HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2            HDTV: 1920x1080x60i, 2Kx1Kx60p
5              SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1            SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns:
(a) Level number
(b) Max macroblock processing rate (MB/s)
(c) Max frame size (MBs)
(d) Max decoded picture buffer size (1024 bytes)
(e) Max video bit rate (1000 bits/s or 1200 bits/s)
(f) Max CPB size (1000 bits or 1200 bits)
(g) Vertical MV component range (luma frame samples)
(h) Min compression ratio
(i) Max number of MVs per two consecutive MBs

(a)    (b)       (c)      (d)        (e)       (f)       (g)                 (h)   (i)
1      1 485     99       148.5      64        175       [-64, +63.75]       2     -
1.1    3 000     396      337.5      192       500       [-128, +127.75]     2     -
1.2    6 000     396      891.0      384       1 000     [-128, +127.75]     2     -
1.3    11 880    396      891.0      768       2 000     [-128, +127.75]     2     -
2      11 880    396      891.0      2 000     2 000     [-128, +127.75]     2     -
2.1    19 800    792      1 782.0    4 000     4 000     [-256, +255.75]     2     -
2.2    20 250    1 620    3 037.5    4 000     4 000     [-256, +255.75]     2     -
3      40 500    1 620    3 037.5    10 000    10 000    [-256, +255.75]     2     32
3.1    108 000   3 600    6 750.0    14 000    14 000    [-512, +511.75]     4     16
3.2    216 000   5 120    7 680.0    20 000    20 000    [-512, +511.75]     4     16
4      245 760   8 192    12 288.0   20 000    25 000    [-512, +511.75]     4     16
4.1    245 760   8 192    12 288.0   50 000    62 500    [-512, +511.75]     2     16
4.2    491 520   8 192    12 288.0   50 000    62 500    [-512, +511.75]     2     16
5      589 824   22 080   41 310.0   135 000   135 000   [-512, +511.75]     2     16
5.1    983 040   36 864   69 120.0   240 000   240 000   [-512, +511.75]     2     16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format, 4:2:0 chroma sampling

[Figure: CIF format: Y 352 (360) pels x 288 lines; Cb and Cr each 176 (180) pels x 144 lines. QCIF format: Y 176 (180) pels x 144 lines; Cb and Cr each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Video input; Transform & Quantization; Entropy Coding; bitstream output; Inverse Quantization & Inverse Transform; Deblocking Filter; Picture Buffering; Motion Estimation; Motion Compensation; Intra Prediction; Intra/Inter Mode Decision]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Bitstream input; Entropy Decoding; Inverse Quantization & Inverse Transform; Intra Prediction; Motion Compensation; Intra/Inter Mode Selection; Deblocking Filter; Picture Buffering; video output]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes (8 directional predictions and 1 DC prediction)
(vertical: 0, horizontal: 1, DC: 2, diagonal down left: 3, diagonal down right: 4, vertical right: 5, horizontal down: 6, vertical left: 7, horizontal up: 8)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions for modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, ..., p: the predicted ones for the current block
Samples A, B, ..., M: the above and left samples, previously reconstructed

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block

Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)

Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: 4 x 4 block (samples a to p) with neighboring samples A to M, showing the prediction directions of mode 4 and mode 8]
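The two example predictors can be written in integer arithmetic. The sketch below is a minimal illustration covering only samples a and d from the example; the function names are hypothetical, and round() is assumed to be realized as "add half the divisor, then shift right", a common integer implementation of rounding.

```python
def mode4_a_d(I, M, A, B, C, D):
    """Diagonal down-right (mode 4): predictions for samples a and d."""
    a = (I + 2 * M + A + 2) >> 2   # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2   # round(B/4 + C/2 + D/4)
    return a, d


def mode8_a_d(I, J, K, L):
    """Horizontal up (mode 8): predictions for samples a and d."""
    a = (I + J + 1) >> 1           # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2   # round(J/4 + K/2 + L/4)
    return a, d


if __name__ == "__main__":
    print(mode4_a_d(I=10, M=20, A=30, B=40, C=50, D=60))
    print(mode8_a_d(I=10, J=21, K=30, L=41))
```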

-32-

VC Algorithm: Intra Prediction

16 x 16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition size for homogeneous areas, small for detailed areas

The two partitions cannot be mixed, i.e. a MB cannot have 16x8 and 4x8 partitions
When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: H.264 encoder block diagram emphasizing motion estimation and compensation; motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with partition indices 0 to 3]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer position pixels, 1/2 and 1/4 pixels, 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-position samples (upper-case letters A to U) and fractional-position samples (lower-case letters a to s and aa to hh)]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
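The two interpolation formulas can be sketched as follows. This is an illustrative example only: the function names are hypothetical, 8-bit samples are assumed, and rounding is implemented as "add half, then shift", with a clip to the 8-bit range because the negative filter taps can overshoot.

```python
def clip8(value):
    """Clip to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H using taps (1, -5, 20, 20, -5, 1)/32."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    b = half_pel(10, 20, 30, 40, 50, 60)
    a = quarter_pel(30, b)   # sample a lies between G (=30) and b
    print(b, a)
```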

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 luma macroblock and in the corresponding chroma blocks]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram emphasizing picture buffering; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks for the integer DCT (numbers 0, 1, ..., 15):

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

DC coefficients of each 4x4 block, (00), (01), (02), ..., (33):

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to luma 4 x 4 blocks

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

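VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[
Y = \left( C_f X C_f^T \right) \otimes E_f
  = \left(
    \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
    \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
    \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
  \right) \otimes
  \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

where \( a = 1/2 \), \( b = \sqrt{2/5} \), \( d = 1/2 \), and \( \otimes \) implies element-by-element multiplication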

-43-

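4x4 Inverse IntDCT

\[
X = C_i^T \left( Y \otimes E_i \right) C_i
\]

Here

\[
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix},
\qquad
E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
\]

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices \( E_f \) and \( E_i \)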

-44-

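VC Algorithm: Transform

Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[
Y_D = \mathrm{round}\left( \frac{1}{2}
  \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
  \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
  \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
\]

where round(.) denotes rounding to the nearest integer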

-45-

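VC Algorithm: Transform

Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[
Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
      \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
      \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: 4:2:0 chroma block numbering. Blocks 16 and 17: 2x2 DC for U and V. Blocks 18 to 25: AC for U and V.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased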

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: H.264 encoder block diagram emphasizing Transform & Quantization and Inverse Quantization & Inverse Transform]

- 4 x 4 integer DCT transform

      |  1   1   1   1 |
  H = |  2   1  -1  -2 |
      |  1  -1  -1   1 |
      |  1  -2   2  -1 |

- Hadamard transform of DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

Y_ij = X_ij round(SF_ij / Qstep)

X_ij = Y_ij Qstep SF_ij

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles in size for every increment of 6 in QP
FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
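The doubling rule means only six base step sizes are needed; every other value is a power-of-two multiple. The sketch below assumes the six base values commonly tabulated for QP 0 to 5 (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125); these values and the function name are assumptions for illustration and are not stated on the slide.

```python
# Assumed base step sizes for QP = 0..5 (not given in the slide itself)
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video: doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit samples")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```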

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform

[Figure: Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

 0  1  5  6
 2  4  7 12
 3  8 11 13
 9 10 14 15

(b) Alternate scan

 0  2  8 12
 1  5  9 13
 3  6 10 14
 4  7 11 15
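Each grid above gives, for every position in the 4 x 4 block, its index in the scanned sequence. A minimal sketch of applying such a grid follows; the names `ZIGZAG`, `ALTERNATE` and `scan_block` are hypothetical.

```python
# Scan position of each (row, column) location, copied from the two grids
ZIGZAG = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan_block(block, positions):
    """Reorder a 4 x 4 block into a 16-element list following a scan grid."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[positions[r][c]] = block[r][c]
    return out


if __name__ == "__main__":
    # Label each location with its raster index to make the order visible
    raster = [[4 * r + c for c in range(4)] for r in range(4)]
    print(scan_block(raster, ZIGZAG))
    print(scan_block(raster, ALTERNATE))
```

The alternate scan visits the first column early, which suits field-coded blocks where vertical frequencies carry more energy.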

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1 Read in M leading zeroes followed by 12 Read in M-bit INFO field3 Code_num = 2M + INFO ndash 1

(For codeword 0 INFO and M are zero)

CAVLC Codes transform coefficientsCABAC Codes transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes
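The construction and decoding rules for Exp-Golomb codes can be sketched as follows. Codewords are represented as strings of '0' and '1' for readability; the function names are hypothetical, and a real codec would work on a bit buffer instead.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits):
    """Decode the codeword at the start of a bit string back to code_num."""
    m = 0
    while bits[m] == "0":                      # count the M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1                 # code_num = 2^M + INFO - 1

codewords = [exp_golomb_encode(n) for n in range(7)]
```

Small code_num values get the shortest codewords, so the mapping from syntax element values to code_num decides how efficient the code is in practice.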

-53-

VC Algorithm: Entropy Coding (CAVLC)

CAVLC handles zeros and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 values are coded. For the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of ±1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (hence CAVLC).

-54-

VC Algorithm: Entropy Coding, Example of CAVLC

order:  0   1   2   3   4   5   6   7   8   9   …   16
coeff:  c0  c1  c2  0   1   1   0   -1  0   0   …   0

Step 1: encode the number of nonzero coefficients and the number of trailing ones (1 or -1) from a look-up table
  number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining nonzero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
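The five steps of the example can be reproduced with a short sketch that derives the symbols CAVLC would then look up in its code tables. The table look-ups themselves are omitted, the function name is hypothetical, and the values 3, -2, 4 chosen for c0, c1, c2 are illustrative assumptions.

```python
def cavlc_symbols(coeffs):
    """Derive the five groups of CAVLC symbols from coefficients in scan order."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)
    reversed_levels = [c for _, c in reversed(nonzero)]

    # Step 1: trailing ones are up to three consecutive +/-1 values at the end.
    trailing_ones = 0
    for c in reversed_levels:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Steps 2 and 3: signs of trailing ones, then remaining levels, in reverse order.
    signs = ["-" if c < 0 else "+" for c in reversed_levels[:trailing_ones]]
    levels = reversed_levels[trailing_ones:]

    # Step 4: zeros located before the last nonzero coefficient.
    last_pos = nonzero[-1][0] if nonzero else -1
    total_zeros = last_pos + 1 - total_coeffs

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return total_coeffs, trailing_ones, signs, levels, total_zeros, runs

# c0, c1, c2 replaced by illustrative values 3, -2, 4.
example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
result = cavlc_symbols(example)
```

Stopping the run list as soon as all zeros are accounted for mirrors the example, which lists only three runs.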

-55-

VC Algorithm: Entropy Coding (CABAC)

CABAC uses arithmetic coding. To achieve good compression, the probability model for each symbol element is updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1: context modeling. Choose a suitable model
- Step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive
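The interplay of binarization and adaptive probability estimation can be illustrated with a toy model. This is not the CABAC state machine: it assumes unary binarization, a single context shared by all bins, and a count-based probability estimate, and it reports the ideal arithmetic-coding cost rather than producing a bitstream. All names are hypothetical.

```python
import math

def unary_binarize(value):
    """Map a non-negative integer to bins: `value` ones followed by a zero."""
    return [1] * value + [0]

class AdaptiveBinModel:
    """Toy context model that estimates P(bin = 1) from running counts."""
    def __init__(self):
        self.counts = [1, 1]          # smoothed counts of zeros and ones

    def probability(self, b):
        return self.counts[b] / (self.counts[0] + self.counts[1])

    def update(self, b):
        self.counts[b] += 1

def ideal_code_length(symbols):
    """Sum -log2(p) over all bins, updating the model after each bin."""
    model = AdaptiveBinModel()
    bits = 0.0
    for s in symbols:
        for b in unary_binarize(s):
            bits += -math.log2(model.probability(b))
            model.update(b)
    return bits

# Twenty zero-valued symbols: the model quickly learns that bin 0 is likely.
cost = ideal_code_length([0] * 20)
```

A fixed code would spend one bit per bin here, while the adaptive model spends far less once the statistics are learned.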

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size, and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, Generalized Bidirectional Prediction

Multiple reference pictures mode
- Two forward references: suitable for a region just before a scene change
- Two backward references: suitable for a region just after a scene change

[Figure: previous pictures, current picture, next pictures; prediction with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference along the time axis, with temporal distances tb and td; direct-mode partition in the current picture with mvL0 and mvL1; co-located partition with mvCol]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture
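The scaling of mvCol can be sketched directly from the two formulas. The sketch assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance between the two references, and it uses plain division where the standard uses fixed-point arithmetic. The function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into a forward and a backward MV."""
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1

# Current picture halfway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

Note that mvL0 - mvL1 equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector.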

-60-

VC Algorithm: B Slice, Weighted Prediction

- Different weights of the reference signals handle gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- The weighted prediction method differs for a macroblock of a P slice and of a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
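A minimal sketch of p = w1·r1 + w2·r2 with implicit weights. The weight rule used here (w2 = tb/td, w1 = 1 - w2, so the temporally closer reference gets the larger weight) is a simplified assumption; the standard derives the factors in fixed-point form. Function names are hypothetical.

```python
def implicit_weights(tb, td):
    """Derive (w1, w2) from temporal distances; the closer reference weighs more."""
    w2 = tb / td
    return 1.0 - w2, w2

def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference sample lists into one prediction signal."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]

# Current picture one interval after r1 and three intervals before r2.
w1, w2 = implicit_weights(tb=1, td=4)
p = weighted_prediction([100, 120], [200, 40], w1, w2)
```

For the explicit type, w1 and w2 would instead be read from the slice header and passed to `weighted_prediction` unchanged.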

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), linked by the switching picture S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices SP/SI (only in Extended profile)
- Data partitioning (only in Extended profile)
- Arbitrary Slice Order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Each partition (A, B, and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter Setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, synchronized by reliable parameter set exchange; VCL data is transferred with a reference to parameter set 3. Parameter set 3: video format NTSC; motion resolution 1/4; Enc: CABAC; frame width 11]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO
- Randomizes data prior to transmission
- Errors are distributed more randomly over the video frames rather than concentrated in a single block of data
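The benefit of dispersing macroblocks over two slice groups can be sketched with a checkerboard map. The checkerboard pattern and the function names are assumptions chosen for illustration; the slides do not fix a particular macroblock-to-slice-group map.

```python
def checkerboard_groups(width, height):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width)] for y in range(height)]

def surviving_neighbors(groups, x, y, lost_group):
    """Count 4-connected neighbors of MB (x, y) that are not in the lost group."""
    height, width = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != lost_group:
            count += 1
    return count

# A picture of 4 x 3 macroblocks; suppose slice group 1 is lost.
groups = checkerboard_groups(4, 3)
interior = surviving_neighbors(groups, 2, 1, lost_group=1)
border = surviving_neighbors(groups, 1, 0, lost_group=1)
```

Every lost macroblock keeps all of its available horizontal and vertical neighbors, which is what lets the concealment step interpolate the missing area.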

-66-

Error Resilience: Redundant Slice

- Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Bitrate [kbps] for QCIF: 24, 48, 96, 192
Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T
Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: > 2x, 2x, 2x, T, T
Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Bitrate [kbps] for QCIF: 24, 48, 96, 192
Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x
Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T
Husky: 2x, 2x, > 1x, 2x, 2x, 2x
Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6
Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T
Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20
Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

- JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team
  Chair: Gary Sullivan (garysull@microsoft.com)
  Co-chair: Ajay Luthra (aluthra@motorola.com)
  Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2 and 4:4:4 formats
- Bit depths greater than 8
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], YCgCo
- High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values. The predictor is formed based upon the planar equation
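The first three full-MB modes can be sketched on a small block. The block size is a parameter so that the example stays readable (a real luma MB uses size 16), the rounding in the DC mode is an assumption, and the planar mode is omitted. Function names are hypothetical.

```python
def predict_vertical(above, size):
    """Copy the row of pels just above the block into every row."""
    return [list(above[:size]) for _ in range(size)]

def predict_horizontal(left, size):
    """Copy the column of pels just left of the block into every column."""
    return [[left[y]] * size for y in range(size)]

def predict_dc(above, left, size):
    """Fill the block with the rounded average of the neighboring pels."""
    total = sum(above[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)
    return [[dc] * size for _ in range(size)]

above = [10, 20, 30, 40]
left = [50, 50, 50, 50]
vertical = predict_vertical(above, 4)
horizontal = predict_horizontal(left, 4)
dc = predict_dc(above, left, 4)
```

An encoder would compute each candidate, compare it with the actual block, and signal the mode that gives the cheapest residual.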

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: vertical, horizontal, DC, planar

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- The matrices can be transmitted in the SPS and PPS
- Separate matrices for the 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding (FRExt only). Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High Profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: Y is 16 wide, Cb and Cr are 8 wide each, all 16 high. 4:4:4 MB: Y, Cb, and Cr are 16x16 each]

-119-

RGB to YCbCr conversion

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

\[ K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1 \]

\[ Y = 0.2126 R + 0.7152 G + 0.0722 B \]

\[ C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y) \]

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
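The conversion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, inputs are normalized to the range 0.0 to 1.0, and no offsets or integer scaling (as used in real 8-bit video) are applied.

```python
# Luma coefficients quoted on the slide (KR = 0.2126, KB = 0.0722).
KR, KB = 0.2126, 0.0722


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized R, G, B to Y, Cb, Cr (no offsets, no quantization)."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr, kr=KR, kb=KB):
    """Invert rgb_to_ycbcr by solving the same three equations for R, G, B."""
    b = y + 2.0 * (1.0 - kb) * cb
    r = y + 2.0 * (1.0 - kr) * cr
    g = (y - kr * r - kb * b) / (1.0 - kr - kb)
    return r, g, b
```

With these constants the chroma scale factors evaluate to 1 / (2 x 0.9278), about 0.5389, and 1 / (2 x 0.7874), about 0.6350. In floating point the round trip is exact up to rounding noise; the rounding error discussed on the next slide appears once the results are stored as integers.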

-120-

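Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right], \qquad C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right], \qquad C_o = \frac{R - B}{2} \]

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.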

-121-

For 4:4:4 video, FRExt has a residual color transform

- Keep the RGB domain (same depth) for the input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

- Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
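The lifting steps above can be checked with a short Python sketch. The function names are hypothetical; the sketch relies on the fact that Python's `>>` on integers is an arithmetic (floor) shift, which matches the shift the slide describes.

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward conversion from the slide, integer in and integer out."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_r_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because each inverse step subtracts exactly what the matching forward step added, the round trip is lossless for any integers. For example, `rgb_to_ycgco_r(255, 0, 0)` gives `(63, -127, 255)`, and converting back returns `(255, 0, 0)`. Note that Cg and Co need one more bit than the input samples.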

-123-

- Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

- Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

- Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

- Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower, nested profiles. All High profiles support all features of the Main profile.

New Profiles in the H.264/AVC FRExt Amendment

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

- The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

- The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

- Deblocking filter
  - Only the control of the filter is adjusted: do not filter 4x4 blocks
  - No change to the filter operation itself

- CABAC
  - 61 new contexts and corresponding initialization values
  - No change to the CABAC engine

- Signaling
  - 8x8 transform on/off flag at PPS level
  - 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

- H.264 is limited to video
- Audio coder, bit rates, quality levels and number of channels: left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-21-

Introduction: Profile specifications

Feature | Baseline | Extended | Main | High
I & P Slices | X | X | X | X
Deblocking Filter | X | X | X | X
1/4-Pel Motion Compensation | X | X | X | X
Variable Block Size (16x16 to 4x4) | X | X | X | X
CAVLC/UVLC | X | X | X | X
Error Resilience Tools (Flexible MB Order, ASO, Red. Slices) | X | X | |
SP/SI Slices | | X | |
B Slice | | X | X | X
Interlaced Coding | | X | X | X
CABAC | | | X | X
Data Partitioning | | X | |

-22-

Introduction: Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction: Levels

A level corresponds to the processing power and memory capability of a codec.

Level number | Picture type & frame rate
1 | QCIF, 15 fps
1.1 | QCIF, 30 fps
1.2 | CIF, 15 fps
1.3 | CIF, 30 fps
2 | CIF, 30 fps
2.1 | HHR, 15 or 30 fps
2.2 | SDTV, 15 fps
3 | SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
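As a rough illustration of how the first two columns of the table above constrain a video format, the following Python sketch finds the lowest level whose macroblock-rate and frame-size limits admit a given resolution and frame rate. The function name and the tuple layout are hypothetical, and the sketch deliberately ignores the other limits (bit rate, buffer sizes, MV range), so it should be read as a partial check only.

```python
import math

# (level, max MB/s, max frame size in MBs), copied from the table above.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def lowest_level(width, height, fps):
    """Return the first level that fits the frame size and MB rate, or None."""
    # A macroblock covers 16x16 luma samples; partial MBs count as whole ones.
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    mbs_per_second = mbs_per_frame * fps
    for level, max_rate, max_frame_size in LEVEL_LIMITS:
        if mbs_per_frame <= max_frame_size and mbs_per_second <= max_rate:
            return level
    return None
```

For instance, QCIF (176x144) at 15 fps has 99 MBs per frame and 1 485 MB/s, which is exactly the Level 1 limit, and CIF (352x288) at 30 fps needs 11 880 MB/s, which first fits at Level 1.3.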

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format, 4:2:0 chroma sampling

[Figure: CIF format: Y is 352 (360) pels x 288 lines; Cb and Cr are each 176 (180) pels x 144 lines. QCIF format: Y is 176 (180) pels x 144 lines; Cb and Cr are each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for the H.264 encoder

[Figure: encoder block diagram with the blocks Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, and Intra/Inter Mode Decision]

-29-

Video Coding Algorithm: Block diagram for the H.264 decoder

[Figure: decoder block diagram with the blocks Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering, and Video Output]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block: 9 prediction modes, 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a to p, with neighboring samples A to H above, I to L to the left, and M at the upper-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction

Example of a 4 x 4 luma block

- For mode 4, samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)

- For mode 8, samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 block a to p with neighbors A to M, shown once with the prediction direction of mode 4 and once with that of mode 8]
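A minimal Python sketch of 4 x 4 intra prediction follows, covering only the three simplest modes (vertical, horizontal, DC) plus the two sample formulas quoted above for modes 4 and 8. The function names are hypothetical, the integer rounding (add half, then shift) is an assumption about how round() is realized, and the handling of unavailable neighbors is left out.

```python
def predict_4x4(mode, above, left):
    """Predict a 4x4 block. above = [A, B, C, D], left = [I, J, K, L]."""
    if mode == 0:  # vertical: each column repeats the sample above it
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: each row repeats the sample to its left
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighboring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def mode4_sample_a(i, m, a):
    """Sample a in mode 4: round(I/4 + M/2 + A/4) in integer arithmetic."""
    return (i + 2 * m + a + 2) >> 2


def mode8_sample_d(j, k, l):
    """Sample d in mode 8: round(J/4 + K/2 + L/4) in integer arithmetic."""
    return (j + 2 * k + l + 2) >> 2
```

An encoder would typically form the prediction for every candidate mode, compare each against the actual block, and signal the mode with the lowest cost; that search is not shown here.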

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions

[Figure: H.264 encoder block diagram highlighting Motion Estimation and Motion Compensation; motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/2 and 1/4 pixels; 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-position samples (A to U), half-pel samples (aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s) and quarter-pel samples (a, c, d, e, f, g, i, k, n, p, q, r)]

b = round((E - 5F + 20G + 20H - 5I + J)/32)

a = round((G + b)/2)
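The two formulas above map directly to integer arithmetic. The Python sketch below assumes 8-bit samples and realizes round() by adding half the divisor before shifting; the clipping to 0..255 and the function names are assumptions made for the illustration.

```python
def clip_8bit(value):
    """Clamp a filtered value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """6-tap filter (1, -5, 20, 20, -5, 1)/32 for the sample between G and H."""
    return clip_8bit((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighboring integer- or half-pel samples."""
    return (p + q + 1) >> 1
```

The clipping matters because the negative taps can overshoot near sharp edges: for the row 0, 0, 255, 255, 0, 0 the unclipped filter output is 319, which is clamped to 255.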

-39-

VC Algorithm: Inter Prediction

Deblocking filter: adaptive
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: two 16 x 16 macroblocks showing the vertical and horizontal edges filtered for luma and for chroma]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram highlighting Picture Buffering; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free (additions and shifts) in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks:

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

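VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f X C_f^T) \otimes E_f \]

\[ Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix} \]

\[ a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2} \]

The symbol \(\otimes\) implies element-by-element multiplication.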

-43-

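4x4 Inverse IntDCT

\[ X = C_i^T (Y \otimes E_i) C_i \]

Here

\[ Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \]

and, for comparison, the forward scaling matrix is

\[ E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix} \]

In both the forward and inverse transforms, QP (the quantization step) is embedded in the matrices Ef and Ei.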
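The forward and inverse relations can be checked numerically with a small pure-Python sketch. The matrix Cf and the constants a and b are taken from the slides; the inverse core matrix Ci is not shown on the slide, so the values used below are an assumption based on the form commonly given in the H.264 literature. Quantization is omitted, so the round trip should reproduce the input up to floating-point noise.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Assumed inverse core matrix (not shown on the slide).
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]
A, B = 0.5, math.sqrt(2 / 5)


def matmul(p, q):
    """Plain 4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def scale(m, s):
    """Element-by-element product with the matrix whose (i, j) entry is s[i]*s[j]."""
    return [[m[i][j] * s[i] * s[j] for j in range(4)] for i in range(4)]


def forward(x):
    """Y = (Cf X Cf^T) (x) Ef, with Ef built from (a, b/2, a, b/2)."""
    core = matmul(matmul(CF, x), transpose(CF))  # integer-only core transform
    return scale(core, [A, B / 2, A, B / 2])


def inverse(y):
    """X = Ci^T (Y (x) Ei) Ci, with Ei built from (a, b, a, b)."""
    return matmul(matmul(transpose(CI), scale(y, [A, B, A, B])), CI)
```

In a real codec the scaling matrices are not applied separately: as the slides note, they are folded into the quantizer, so only the integer core transform is computed per block.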

-44-

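VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[ Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) \]

where round(.) denotes rounding to the nearest integer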

-45-

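VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[ Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]

[Figure: 4:2:0 chroma blocks. 2x2 DC blocks: 16 (U) and 17 (V); AC blocks: 18 to 21 (U) and 22 to 25 (V)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.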

-46-

VC Algorithm: Transform

[Figure: H.264 encoder block diagram emphasizing the Transform & Quantization block]

- 4 x 4 integer DCT transform

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \]

- Hadamard transform of the DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

- Encoder: post-scaling and quantization

\[ Y_{ij} = \mathrm{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right) \]

- Decoder: inverse quantization and pre-scaling

\[ X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
- SF: scaling term
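The relation between QP and Qstep described above (52 values, doubling every 6 steps) can be sketched in Python as follows. The six base step sizes are the values commonly tabulated for QP 0 to 5 in the H.264 literature and do not appear on the slide, so they are an assumption here, as are the function names and the default scaling term of 1.

```python
import math

# Assumed step sizes for QP = 0..5; every further 6 steps doubles them.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, qp, sf=1.0):
    """Y = round(X * SF / Qstep), rounding half away from zero."""
    value = x * sf / qstep(qp)
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def rescale(y, qp, sf=1.0):
    """X' = Y * Qstep * SF (decoder-side inverse quantization)."""
    return y * qstep(qp) * sf
```

For example, `qstep(28)` is 16.0, so a coefficient of 100 quantizes to 6 and is reconstructed as 96.0; the difference of 4 is the quantization error.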

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform

[Figure: Encoder part: input block, forward transform, post-scaling and quantization, with a 2x2 or 4x4 DC transform for chroma or Intra-16 luma only (Intra (16x16) prediction mode only). The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform for chroma or Intra-16 luma only, inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main profile

(a) Zig-zag scan:

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan:

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
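The two scan tables above give, for each position in the 4 x 4 block, its index in the one-dimensional coefficient sequence. A minimal Python sketch of applying them (the function name is hypothetical):

```python
# Scan position of each (row, column) entry, as listed in the two tables.
ZIGZAG = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block into a 16-entry list following the scan table."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order[row][col]] = block[row][col]
    return out
```

Both scans start at the DC coefficient; the zig-zag scan then alternates along anti-diagonals, while the alternate scan moves down the columns faster, which suits the vertical frequency content of field (interlaced) pictures.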

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
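The construction and the decoding steps above can be sketched in Python. Bit strings are represented as text ('0' and '1' characters) purely for readability; the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of a bit string."""
    m = 0
    while bits[m] == "0":                 # step 1: count the leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1            # step 3: code_num = 2^M + INFO - 1
```

The first few codewords are `1`, `010`, `011`, `00100`, `00101`, `00110`, `00111`, which matches the pattern described above: one-bit INFO for code numbers 1 and 2, two-bit INFO for 3 to 6.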

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- step 1: encode the total number of non-zero coefficients and +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 15

Step 1: encode the number of non-zero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
- number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
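The five quantities in the example can be derived mechanically from the scanned coefficient list. The Python sketch below only extracts the symbols; it does not perform the table look-ups that turn them into bits, and the function name and dictionary keys are hypothetical. The values 7, 6 and -2 stand in for c0, c1 and c2, which the slide leaves unspecified.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of steps 1 to 5 from scanned coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    # Trailing ones: up to three +/-1 values at the end of the non-zero list.
    trailing_ones = 0
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    reversed_values = [coeffs[i] for i in reversed(nonzero)]
    signs = ["-" if v < 0 else "+" for v in reversed_values[:trailing_ones]]
    levels = reversed_values[trailing_ones:]
    # Zeros located before the last non-zero coefficient.
    total_zeros = (nonzero[-1] + 1 - len(nonzero)) if nonzero else 0
    # Runs of zeros before each coefficient, in reverse order, until none are left.
    runs, zeros_left = [], total_zeros
    for k in range(len(nonzero) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return {"total_coeffs": len(nonzero), "trailing_ones": trailing_ones,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}
```

Calling it on `[7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8` yields 6 coefficients, 3 trailing ones with signs -, +, +, levels -2, 6, 7 (c2, c1, c0), 2 total zeros and runs 1, 0, 1, in agreement with the steps above.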

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- step 1, context modeling: choose a suitable model
- step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3, binary arithmetic coding: using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference on a time axis; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td]

mvL0 = tb x mvCol / td

mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
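The two scaling formulas can be written out as a short Python sketch. Motion vectors are plain (x, y) tuples and the result is left in floating point; a real decoder uses fixed-point scaling, so this is an idealized illustration with a hypothetical function name.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located MV and the distances tb, td."""
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1
```

For a B picture halfway between its references (tb = 1, td = 2) and mvCol = (8, -4), this gives mvL0 = (4.0, -2.0) and mvL1 = (-4.0, 2.0). In general mvL1 = mvL0 - mvCol, so the two derived vectors lie along the co-located motion trajectory.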

-60-

VC Algorithm: B Slice

Weighted prediction
- Different weights of the reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
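A minimal Python sketch of the weighted sum p = w1 x r1 + w2 x r2 follows. The implicit weighting rule shown, in which the temporally closer reference receives the larger weight, is a simplified assumption for illustration; the standard derives its implicit weights in fixed-point arithmetic, and the function names here are hypothetical.

```python
def weighted_bi_prediction(r1, r2, w1, w2):
    """Combine two reference sample lists: p = w1 * r1 + w2 * r2."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed distance-based weights.

    tb: distance from the current picture to the r1 picture
    td: distance between the r1 and r2 pictures
    """
    w2 = tb / td
    return 1.0 - w2, w2
```

With tb = 1 and td = 4 the current picture is close to r1, so the weights are (0.75, 0.25). In the explicit type, the encoder would instead choose w1 and w2 itself, for example to follow a fade, and send them in the slice header.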

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: the specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: the switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5); the switching slice S(3) connects the two bitstreams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices: SP/SI
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in the Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter-coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged by reliable parameter set exchange; VCL data is transferred with reference to parameter set 3 (example contents: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11)]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

- Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO
- Randomizes data prior to transmission
- Errors are distributed more randomly over the video frames rather than in a single block of data

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream.

- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

- A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bit rate [kbps] for QCIF: 24, 48, 96, 192 | Bit rate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bit rate [kbps] for QCIF: 24, 48, 96, 192 | Bit rate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T 2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bit rate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bit rate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bit rate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bit rate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bit rate saving results for the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bit rate saving results for the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.
Related companies:
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory in the Blu-ray Disc BD-ROM specification

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt:
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for next-generation optical discs
- Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004

Final stages of approval:
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology-Coding of Audio-Visual Objects-Part 2: Visual", ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP", IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)", MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding", Herts, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm : High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions
Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2 and 4:4:4 formats
- Bit depths greater than 8 bits
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- High resolution
- [Use RGB color format] Y Cg Co

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertically adjacent pair of MBs
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
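The logarithmic relation between the quantization parameter and the step size can be sketched as below. The six base step sizes and the QP range 0-51 are not given on the slide; they are the values commonly tabulated for 8-bit H.264 and are an assumption here, as is the function name `qstep`. The step size doubles for every increase of 6 in QP, which is roughly a 12.5% increase per unit of QP.

```python
# Assumed base step sizes for QP = 0..5 (not stated on the slide).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep(qp):
    """Quantizer step size for a given QP; doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

For example, `qstep(4)` and `qstep(10)` differ by exactly a factor of two.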

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
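The zig-zag (frame) scan of quantized coefficients mentioned on this slide can be sketched as follows for a 4x4 block. This is an illustrative sketch, not the reference software: the function names are hypothetical, and only the frame zig-zag order is generated (the field scan uses a different order that is not reproduced here).

```python
def zigzag_order(n=4):
    """Raster indices of an n x n block listed in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + column of an anti-diagonal
        rows = range(max(0, s - n + 1), min(n - 1, s) + 1)
        if s % 2 == 0:
            rows = reversed(rows)           # even diagonals run bottom-left to top-right
        order.extend(r * n + (s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Flatten a square block of coefficients in zig-zag order."""
    n = len(block)
    flat = [v for row in block for v in row]
    return [flat[i] for i in zigzag_order(n)]
```

Because quantized high-frequency coefficients are mostly zero, the scan tends to place the nonzero values first and leave a long run of zeros at the end, which suits the entropy coder.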

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): vertical, horizontal, DC, planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing coefficient variances and aims to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M
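The construction above can be sketched as a small encoder. This is an illustrative sketch with a hypothetical function name; it reads the slide's "2M" in the INFO formula as 2 to the power M, and returns the codeword as a string of '0' and '1' characters for readability rather than packed bits.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits
```

The first few codewords are `1`, `010`, `011`, `00100`, so small values of code_num get the shortest codes.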

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
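The three decoding steps can be sketched as follows. The function name and the bit-string input are assumptions made for illustration (a real decoder reads packed bits), and "2M" in step 3 is read as 2 to the power M.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a '0'/'1' string starting at pos.

    Returns (code_num, position of the next codeword).
    """
    m = 0
    while pos + m < len(bits) and bits[pos + m] == "0":
        m += 1                                # step 1: count M leading zeros
    start = pos + m + 1                       # skip the terminating '1'
    if start + m > len(bits):
        raise ValueError("truncated codeword")
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m     # step 3: 2^M + INFO - 1
```

Because every codeword announces its own length through the leading zeros, codewords can be concatenated without separators and decoded one after another by passing the returned position back in.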

-113-

Adoptions

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications that use high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats (width x height)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \quad C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \quad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
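The conversion on this slide can be sketched numerically. This is a floating-point illustration with a hypothetical function name; it assumes R, G, B are normalized to the range 0..1 and uses the slide's default coefficients KR = 0.2126 and KB = 0.0722, with the chroma terms computed from the general formulas (B - Y) / (2(1 - KB)) and (R - Y) / (2(1 - KR)).

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights kr and kb."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))     # ranges over -0.5..0.5
    cr = (r - y) / (2 * (1 - kr))     # ranges over -0.5..0.5
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned in the note. Rounding these real-valued results to integers is the source of the color-space conversion error that the YCgCo and residual color transform slides address.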

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for the input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = Cg + t

B = t - (Co >> 1)

R = B + Co
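The forward and inverse conversions on this slide form a lifting scheme that is exactly invertible in integer arithmetic. The sketch below illustrates this; the function names are hypothetical, and the second inverse step is written as G = Cg + t, which is the step that undoes Cg = G - t. Python's `>>` on integers is an arithmetic (floor) shift, matching the slide's definition.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based color transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    """Inverse transform; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity the corresponding forward step added, so no rounding error survives the round trip; the price is that Cg and Co need one more bit of range than the input samples.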

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
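Notes 1 and 2 can be sketched as simple arithmetic on the level limits. The sketch below is illustrative only: the function names are hypothetical, and the three sample entries (maximum macroblock rate and maximum frame size for Levels 1, 3 and 4) are copied from the parameter-set-limits table elsewhere in these slides.

```python
import math

# level: (max macroblocks per second, max frame size in macroblocks)
LEVEL_LIMITS = {"1": (1485, 99), "3": (40500, 1620), "4": (245760, 8192)}

def macroblocks(width, height):
    """Number of 16x16 macroblocks covering a width x height picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)

def max_frame_rate(level, width, height):
    """Note 1: smaller pictures allow higher frame rates, capped at 172."""
    max_mbps, max_fs = LEVEL_LIMITS[level]
    mbs = macroblocks(width, height)
    if mbs > max_fs:
        return 0.0                      # picture too large for this level
    return min(max_mbps / mbs, 172.0)

def max_dimension_pixels(level):
    """Note 2: sqrt(8 x total pixels per frame), with 256 pixels per MB."""
    _, max_fs = LEVEL_LIMITS[level]
    return math.sqrt(max_fs * 256 * 8)
```

For example, QCIF (176x144, 99 macroblocks) at Level 1 gives 15 frames/sec, while the same picture at Level 3 hits the 172 frames/sec cap.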

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T Wedi Y Kashiwagi ldquoSubjective quality evaluation of H264AVC FRExt for HD movie contentrdquo JVT document JVT-L033 July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus)
- ATSC has selected AC-3 plus from Dolby
- MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-22-

Introduction

Application requirements

Application | Requirements | H.264 Profiles | MPEG-4 Profiles
Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple)
Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability)
Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP
Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple)
Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP
Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile

-23-

Introduction
Level: corresponds to the processing power and memory capability of a codec

Level number | Picture type and frame rate
1 | QCIF 15 fps
1.1 | QCIF 30 fps
1.2 | CIF 15 fps
1.3 | CIF 30 fps
2 | CIF 30 fps
2.1 | HHR 15 or 30 fps
2.2 | SDTV 15 fps
3 | SDTV 720x480x30i, 720x576x25i, 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 20 Mbps (max)
4.1 | HDTV 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p, 50 Mbps (max)
4.2 | HDTV 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema 4Kx2Kx30p

-24-

Introduction
Parameter set limits for each Level

Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
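The decoded-picture-buffer column limits how many reference frames fit at a given picture size. The sketch below is an illustration under stated assumptions, not text from the slides: it assumes 8-bit 4:2:0 video (384 bytes per macroblock), reads the DPB column in units of 1024 bytes, caps the result at 16 frames, and uses a hypothetical function name with three sample rows from the table.

```python
# level: max decoded picture buffer size in units of 1024 bytes (sample rows)
MAX_DPB_KBYTES = {"1": 148.5, "3": 3037.5, "4": 12288.0}

def max_reference_frames(level, width, height):
    """Frames of the given size that fit in the level's DPB, at most 16."""
    mbs = ((width + 15) // 16) * ((height + 15) // 16)
    frame_bytes = mbs * 384            # 256 luma + 2 x 64 chroma bytes per MB
    dpb_bytes = int(MAX_DPB_KBYTES[level] * 1024)
    return min(dpb_bytes // frame_bytes, 16)
```

Under these assumptions a 720x576 picture at Level 3 leaves room for 5 frames, while a QCIF picture at the same level reaches the cap of 16.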

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure
Supported picture format: 4:2:0 chroma sampling

CIF format: Y 360 pels (352 active) x 288 lines; Cb and Cr 180 pels (176 active) x 144 lines each

QCIF format: Y 180 pels (176 active) x 144 lines; Cb and Cr 90 pels (88 active) x 72 lines each
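The plane sizes in this figure follow directly from the chroma sampling format. The sketch below is a hedged illustration with hypothetical names; it covers the active picture area only and includes 4:2:2 and 4:4:4 for comparison, using the usual subsampling factors (4:2:0 halves chroma in both directions, 4:2:2 halves it horizontally only).

```python
# format: (horizontal, vertical) chroma subsampling factors
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}

def plane_sizes(width, height, fmt="4:2:0"):
    """Return (width, height) of the Y, Cb and Cr planes for a picture."""
    sx, sy = SUBSAMPLING[fmt]
    chroma = (width // sx, height // sy)
    return {"Y": (width, height), "Cb": chroma, "Cr": chroma}
```

For CIF (352x288) in 4:2:0 this gives 176x144 chroma planes, and for a 16x16 macroblock in 4:2:2 it gives 8x16 chroma blocks.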

-28-

Video Coding Algorithm
Block diagram for the H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Video input; bitstream output.]

-29-

Video Coding Algorithm
Block diagram for the H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Bitstream input; video output.]

-30-

VC Algorithm: Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions for modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, ..., p: the predicted samples of the current block. Samples A, B, ..., M: previously reconstructed samples above and to the left.

-31-

VC Algorithm: Intra Prediction
Example of a 4 x 4 luma block

For mode 4, samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)

For mode 8, samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: sample layout (a-p inside the block, A-M as neighbors) with the prediction directions for mode 4 and mode 8]
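The two worked examples on this slide can be sketched in integer arithmetic. The function names are hypothetical, and the rounding is an assumption: round(x) is implemented as round-half-up by adding half the divisor before a right shift, which is the usual integer form of weights such as 1/4, 1/2, 1/4.

```python
def mode4_samples_a_d(I, M, A, B, C, D):
    """Diagonal down-right (mode 4): predictions for samples a and d."""
    a = (I + 2 * M + A + 2) >> 2     # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2     # round(B/4 + C/2 + D/4)
    return a, d

def mode8_samples_a_d(I, J, K, L):
    """Horizontal-up (mode 8): predictions for samples a and d."""
    a = (I + J + 1) >> 1             # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2     # round(J/4 + K/2 + L/4)
    return a, d
```

Each predicted sample is a short weighted average of reconstructed neighbors along the prediction direction, so the decoder can form the same prediction without any side information beyond the mode number.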

-32-

VC Algorithm: Intra Prediction
16 x 16 luma

4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes

(DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size
- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e., an MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm Inter Prediction Sub-pel motion compensation

- Better compression performance than integer-pel MC
- Expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram (transform & quantization, motion estimation, motion compensation, picture buffering, entropy coding, intra prediction, intra/inter mode decision, inverse quantization & inverse transform, deblocking filtering; video input, bitstream output), highlighting motion vector accuracy of 1/4 pel (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm Inter Prediction Sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- Reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm Inter Prediction
- Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

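[Figure: integer-pel samples (upper-case letters A to U) and interpolated half- and quarter-pel positions (lower-case letters a to s, aa to hh); b lies midway between G and H on the row E F G H I J, and a lies midway between G and b]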

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
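The two interpolation formulas can be tried out with a short sketch. The function names are hypothetical, rounding is assumed to round halves upward, and the clipping to the 8-bit range [0, 255] is an assumption added here because the 6-tap filter can overshoot.

```python
def clip8(value):
    # Assumed 8-bit sample range
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    # b = round((E - 5F + 20G + 20H - 5I + J) / 32)
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    # a = round((G + b) / 2): bilinear average of two neighboring samples
    return (p + q + 1) >> 1


b = half_pel(10, 20, 30, 40, 50, 60)  # samples on a linear ramp
a = quarter_pel(30, b)                # G = 30
print(b, a)  # 35 33
```

On a flat area the filter returns the input value unchanged, since the weights sum to 32.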

-39-

VC Algorithm Inter Prediction: Deblocking filter (adaptive)

To reduce the blocking artifacts at block boundaries and prevent the propagation of accumulated coding noise

Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 macroblock, for luma and for chroma]

-40-

VC Algorithm Inter Prediction Management of multiple reference pictures

To take care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, highlighting management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm Transform & Quantization

Transform
- Integer transform: multiplier free; additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 Integer DCT + Hadamard transform

0 1 4 5

2 3 6 7

8 9 12 13

10 11 14 15

00 01 02 03

10 11 12 13

20 21 22 23

30 31 32 33

Assignment of the indices of DC (dark samples) to luma 4 x 4 blocks; the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

Hadamard transform is applied only when (16x16) intra prediction mode is used with (4x4) IntDCT. Similarly for the chroma; MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

VC Algorithm Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

Y = (C_f X C_f^T) \otimes E_f

where \otimes implies element by element multiplication, X = [x_{ij}] for i, j = 0, ..., 3, and

C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \quad
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}

a = 1/2, \quad b = \sqrt{2/5}, \quad d = 1/2
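As a rough illustration of the forward transform, the sketch below computes the integer core C_f X C_f^T and then applies the element-by-element scaling by E_f. In a real codec the scaling is merged into quantization; applying it separately in floating point here is only for clarity, and the helper names are hypothetical.

```python
import math

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

A = 0.5
B = math.sqrt(2 / 5)
EF_EVEN = [A * A, A * B / 2, A * A, A * B / 2]
EF_ODD = [A * B / 2, B * B / 4, A * B / 2, B * B / 4]
EF = [EF_EVEN, EF_ODD, EF_EVEN, EF_ODD]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    # Integer-only part: W = Cf X Cf^T (additions, subtractions, shifts)
    return matmul(matmul(CF, x), transpose(CF))


def forward_transform(x):
    # Y = W (element-by-element product) Ef
    w = core_transform(x)
    return [[w[i][j] * EF[i][j] for j in range(4)] for i in range(4)]


flat = [[3] * 4 for _ in range(4)]
print(core_transform(flat)[0][0])     # 48: only the DC term is nonzero
print(forward_transform(flat)[0][0])  # 12.0
```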

-43-

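4x4 Inverse IntDCT

X = C_i^T (Y \otimes E_i) C_i

Here

E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \quad
E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}

In both forward and inverse transforms, QP (quantization step) is embedded in matrices E_f and E_i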

-44-

VC Algorithm Transform

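Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh Hadamard transform

Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)

where round() = rounding to the nearest integer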

-45-

VC Algorithm Transform

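Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, then Walsh Hadamard transform of the 2 x 2 DC coefficients

Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{DC00} & x_{DC01} \\ x_{DC10} & x_{DC11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}

[Figure: 4:2:0 chroma block numbering: 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18 to 21 (U) and 22 to 25 (V)]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased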

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram, highlighting the transform]

- 4 x 4 integer DCT transform

H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm Quantization

- Multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

Y_{ij} = X_{ij} \, \mathrm{round}\left( \frac{SF_{ij}}{Qstep} \right)

X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
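The "doubles for every increment of 6 in QP" rule can be made concrete with a small sketch. The six base step sizes for QP 0 to 5 are not given on the slide; the values below are the ones commonly tabulated in the H.264 literature and should be read as an assumption of this example, as should the function name.

```python
# Assumed base step sizes for QP = 0..5 (not stated on the slide)
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8 bits per sample)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    # Step size doubles for every increment of 6 in QP
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


print(qstep(4), qstep(10), qstep(16))  # 1.0 2.0 4.0
print(qstep(51))                       # 224.0
```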

-48-

VC Algorithm Transform Quantization

[Figure: transform and quantization path. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block]

-49-

VC Algorithm Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
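Each table entry gives the position in the output sequence of the coefficient at that row and column. A minimal sketch of applying such a table (the function name is hypothetical, and the sample block is made up for illustration):

```python
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]


def scan(block, order=ZIGZAG):
    """Reorder a 4x4 block into a 16-element list using a scan table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [[9, 3, 0, 0],
         [2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan(block))  # [9, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The scan groups the low-frequency (typically nonzero) coefficients at the front, leaving a long run of zeros at the end for the entropy coder.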

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M+1) bits
M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
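The construction and the decoding steps can be checked with a small round-trip sketch. Bit strings are represented as Python strings of '0' and '1' purely for readability; the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                 # step 1: count leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0  # step 2: M-bit INFO
    return (1 << m) + info - 1            # step 3: 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
# 5 00110
# 6 00111
```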

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm Entropy Coding
CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded

Encoding steps
- step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm Entropy Coding: Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the no. of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table
  no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
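The five quantities of this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not perform the table look-ups that produce the actual bits. The function name, the numeric values chosen for c0, c1, c2, and the cap of three trailing ones are assumptions of this illustration.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC syntax values for a scanned coefficient list."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nz)

    # Trailing +/-1 values, scanned from the last nonzero coefficient
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]

    # Remaining levels, in reverse order
    levels = [coeffs[i] for i in reversed(nz[:total_coeffs - len(trailing)])]

    # Zeros located before the last nonzero coefficient
    total_zeros = nz[-1] + 1 - total_coeffs if nz else 0

    # Run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


# c0, c1, c2 = 7, -3, 2 are made-up values standing in for the slide's levels
example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(example))
# {'total_coeffs': 6, 'trailing_ones': 3, 'signs': ['-', '+', '+'],
#  'levels': [2, -3, 7], 'total_zeros': 2, 'runs': [1, 0, 1]}
```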

-55-

VC Algorithm Entropy Coding
- CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated
- Both MV and residual transform coefficients are coded by CABAC

Encoding steps
- step 1: context modeling. Choose a suitable model
- step 2: binarization. If a symbol is non-binary valued, it will be mapped into a sequence of binary decisions called bins
- step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm B Slice: Generalized Bidirectional prediction

Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm B Slice: Generalized Bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before scene change
- Two backward references: proper for a region just after scene change

[Figure: previous pictures, current picture and next pictures, with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm B Slice: Direct mode

Forward/backward pair of bi-directional prediction. Prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: direct-mode partition in the current picture and co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td relative to the List 0 reference]

mvL0 = tb * mvCol / td
mvL1 = -(td - tb) * mvCol / td

where mvCol is a MV used in the co-located MB of the subsequent picture
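A minimal sketch of the scaling, using plain floating-point division; the standard itself works in fixed-point integer arithmetic, which is not reproduced here, and the function name is hypothetical.

```python
def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located MV into the List 0 and List 1 motion vectors.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances as used in the formulas above
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between the two references (tb = 1, td = 2)
print(direct_mode_mvs((8, -4), tb=1, td=2))  # ((4.0, -2.0), (-4.0, 2.0))
```

Note that mvL1 equals mvL0 - mvCol, so the two vectors lie on the same motion trajectory as mvCol.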

-60-

VC Algorithm B Slice: Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero), 'fade from black'

Different weighted prediction method for a macroblock of P slice or B slice

A prediction signal p for B slice is obtained by different weights from two reference signals r1 and r2

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
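The weighted sum p = w1 r1 + w2 r2 can be illustrated sample by sample. This is a floating-point sketch with explicitly supplied weights; the function name and the clipping to an assumed 8-bit range are not from the slides.

```python
def weighted_prediction(r1, r2, w1, w2):
    """p = w1*r1 + w2*r2 per sample, clipped to the assumed 8-bit range."""
    return [max(0, min(255, round(w1 * a + w2 * b))) for a, b in zip(r1, r2)]


r1 = [100, 200]   # samples from the first reference block
r2 = [60, 120]    # samples from the second reference block
print(weighted_prediction(r1, r2, 0.75, 0.25))  # [90, 180]
print(weighted_prediction(r1, r2, 0.5, 0.5))    # [80, 160] (plain average)
```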

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slice
- SP slice: the specially coded slice for efficient switching between video streams; similar to coding of a P slice
- SI slice: the switched slice; similar to coding of an I slice

P(11) P(12) P(13) P(14) P(15)

P(21) P(22) P(23) P(24) P(25)

S(3)

Bitstream A

Bitstream B

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C

2. A has slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, kept in sync by a reliable parameter set exchange; example Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11; VCL data transfer with PS 3]

-65-

Error Resilience: FMO
Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
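One way to picture the two dispersed slice groups is a checkerboard assignment. The checkerboard rule and the function names below are assumptions chosen for illustration; the slide only requires that the groups be dispersed through the picture.

```python
def slice_group(mb_x, mb_y):
    # Hypothetical checkerboard allocation: alternate groups 0 and 1
    return (mb_x + mb_y) % 2


def neighbours(mb_x, mb_y, width, height):
    """Left, right, upper and lower macroblocks inside the picture."""
    candidates = [(mb_x - 1, mb_y), (mb_x + 1, mb_y),
                  (mb_x, mb_y - 1), (mb_x, mb_y + 1)]
    return [(x, y) for x, y in candidates if 0 <= x < width and 0 <= y < height]


WIDTH, HEIGHT = 4, 3  # picture size in macroblocks, made up for the example
for y in range(HEIGHT):
    print(" ".join(str(slice_group(x, y)) for x in range(WIDTH)))
# 0 1 0 1
# 1 0 1 0
# 0 1 0 1

# If slice group 1 is lost, every lost MB still has only group-0 neighbours
lost = [(x, y) for y in range(HEIGHT) for x in range(WIDTH) if slice_group(x, y) == 1]
print(all(slice_group(nx, ny) == 0
          for x, y in lost for nx, ny in neighbours(x, y, WIDTH, HEIGHT)))  # True
```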

ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data

-66-

Error Resilience: Redundant Slice
Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF; Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF; Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ; Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ; Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
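For reference, PSNR between an original and a reconstructed picture is 10 log10(peak^2 / MSE). The sketch below treats a picture as a flat list of samples and assumes an 8-bit peak value of 255; the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


original = [100, 100, 100, 100]
reconstructed = [101, 99, 101, 99]  # every sample off by one, so MSE = 1
print(round(psnr(original, reconstructed), 2))  # 48.13
```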

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions
H.264 outperforms the previous standards

Comparison of standards (values listed in the order MPEG-1; MPEG-2; MPEG-4 part 2 (visual); H.264/MPEG-4 part 10)

Macroblock size: 16x16; 16x16 (frame mode), 16x8 (field mode); 16x16; 16x16

Block size: 8x8; 8x8; 16x16, 16x8, 8x8; 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT; 8x8 DCT; 8x8 DCT/Wavelet; 4x4, 8x8 Int DCT and 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment; scalar quantization with step size of constant increment; vector quantization; scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC; VLC; VLC; VLC, CAVLC, CABAC

Motion estimation & compensation: Yes; Yes; Yes; Yes, more flexible, up to 16 MVs per MB

Playback & random access: Yes; Yes; Yes; Yes

-74-

Conclusions
Comparison of standards (continued; values listed in the order MPEG-1; MPEG-2; MPEG-4 part 2 (visual); H.264/MPEG-4 part 10)

Pel accuracy: integer, 1/2-pel; integer, 1/2-pel; integer, 1/2-pel, 1/4-pel; integer, 1/2-pel, 1/4-pel

Profiles: No; 5; 8; 4

Reference picture: one; one; one; multiple

Bidirectional prediction mode: forward/backward; forward/backward; forward/backward; forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D; I, P, B; I, P, B; I, P, B, SP, SI

Error robustness: synchronization & concealment; data partitioning, FEC for important packet transmission; synchronization, data partitioning, header extension, reversible VLCs; data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps; 2-15 Mbps; 64 kbps - 2 Mbps; 64 kbps - 240 Mbps

Compatibility with previous standards: n/a; Yes; Yes; No

Encoder complexity: Low; Medium; Medium; High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on Windows Media Video 9 codec) mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
- H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
  MPEG LA website: http://www.mpegla.com
  Via Licensing: http://www.vialicensing.com
- FRExt: extensions to 4:2:2 and 4:4:4 chroma formats; 12 bit resolution for medical imaging; scalable coding; lossless coding for digital cinema application; high fidelity coding for the next generation optical discs
- Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004
- FINAL STAGES OF APPROVAL: standard systems and file format support specifications; standardizing reference software implementation; standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

FRExt features
- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): choice of compression (frame or field) is selected at the level of a vertically adjacent MB pair
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless Entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I – Slice (MBs in an I slice, and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction (Vertical, Horizontal, DC, Planar)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

FRExt only

Two scans, similar to those for the 4x4 transform, switched for frame/field coding
Coefficient scanning is based on decreasing variances, and aims to maximize the number of zero-valued coefficients along the scan

[Figure: 8x8 coefficient scans – frame (zig-zag) and field]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients – these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
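
The codeword construction and the three decoding steps can be sketched in a few lines of Python. This is only an illustration of the unsigned mapping described on these slides; the function names are hypothetical, and bits are represented as a text string of '0' and '1' characters for readability rather than as a packed bitstream.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str) -> tuple[int, int]:
    """Decode one codeword from the start of `bits`.

    Returns (code_num, number_of_bits_consumed).
    """
    m = bits.index("1")                        # step 1: count M leading zeros
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1      # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword is self-delimiting: the decoder learns the length (2M + 1) from the run of leading zeros, which is why no code table is needed.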

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB – Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB – Y 16x16, Cb 16x16, Cr 16x16]

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for the input, the output and the stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
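
The reversibility of this conversion is easy to check numerically. The following is a minimal sketch, assuming integer sample values and relying on Python's `>>`, which is an arithmetic (floor) shift for negative integers; the function names are hypothetical.

```python
def rgb_to_ycgco(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward lifting-style conversion from the slide (integer arithmetic only)."""
    co = r - b
    t = b + (co >> 1)      # arithmetic right shift, also for negative co
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int) -> tuple[int, int, int]:
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    # Exhaustive check over a coarse grid of 8-bit RGB triples.
    samples = range(0, 256, 5)
    ok = all(
        ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
        for r in samples for g in samples for b in samples
    )
    print("lossless round trip:", ok)
```

Because every step adds or subtracts a value that the inverse can recompute exactly, no rounding error accumulates; the price is that Cg and Co need one more bit than the RGB inputs.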

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha-blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be higher, up to 16
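
Notes 1 and 2 can be turned into a small calculation. The sketch below assumes that the frame-rate trade-off in note 1 is governed by the level's maximum macroblock processing rate (MB/s), and it uses the Level 3 limits from the parameter table of this presentation (40 500 MB/s, 1 620 MBs per frame) purely as example inputs; the function names are hypothetical.

```python
import math

MAX_FRAME_RATE = 172.0     # frames/s cap quoted in note 1
MB_SIZE = 16               # a macroblock covers 16x16 luma samples


def max_frame_rate(max_mb_per_sec: int, width: int, height: int) -> float:
    """Highest frame rate a level allows for a given picture size (note 1)."""
    mbs_per_frame = math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)
    return min(max_mb_per_sec / mbs_per_frame, MAX_FRAME_RATE)


def max_dimension(max_frame_size_mbs: int) -> float:
    """Upper bound on width or height in pixels (note 2)."""
    pixels_per_frame = max_frame_size_mbs * MB_SIZE * MB_SIZE
    return math.sqrt(pixels_per_frame * 8)


if __name__ == "__main__":
    level3_mbps, level3_fs = 40_500, 1_620
    print("CIF at Level 3 :", round(max_frame_rate(level3_mbps, 352, 288), 2), "fps")
    print("QCIF at Level 3:", max_frame_rate(level3_mbps, 176, 144), "fps (capped)")
    print("max dimension  :", int(max_dimension(level3_fs)), "pixels")
```

The second bound keeps extreme aspect ratios out: a picture may be wide or tall, but neither side may exceed the square root of eight times the pixel budget.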

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-23-

Introduction: Levels

A level corresponds to the processing power and memory capability of a codec

Level number | Picture type & frame rate
1 | QCIF, 15fps
1.1 | QCIF, 30fps
1.2 | CIF, 15fps
1.3 | CIF, 30fps
2 | CIF, 30fps
2.1 | HHR, 15 or 30fps
2.2 | SDTV, 15fps
3 | SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max)
3.1 | 1280x720x30p
3.2 | 1280x720x60p
4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max)
4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max)
4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p
5 | SHDTV/D-Cinema: 2.5Kx2Kx30p
5.1 | SHDTV/D-Cinema: 4Kx2Kx30p

-24-

Introduction: Parameter set limits for each Level

Columns:
(1) Level number
(2) Max macroblock processing rate (MB/s)
(3) Max frame size (MBs)
(4) Max decoded picture buffer size (1024 bytes)
(5) Max video bit rate (1000 bits/s or 1200 bits/s)
(6) Max CPB size (1000 bits or 1200 bits)
(7) Vertical MV component range (luma frame samples)
(8) Min compression ratio
(9) Max number of MVs per two consecutive MBs

(1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9)
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
– Abstracts the VCL data – hence the name Network 'Abstraction' Layer
– Header information about the VCL format
– Appropriate for conveyance by the transport layers or storage media
– The NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
– Core coding layer
– Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure
Supported picture format: 4:2:0 chroma sampling

[Figure: CIF format – Y: 352 (360) pels x 288 lines; Cb, Cr: 176 (180) pels x 144 lines each. QCIF format – Y: 176 (180) pels x 144 lines; Cb, Cr: 88 (90) pels x 72 lines each]

-28-

Video Coding Algorithm
Block diagram for H.264 encoder

[Figure: encoder block diagram – Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision]

-29-

Video Coding Algorithm
Block diagram for H.264 decoder

[Figure: decoder block diagram – Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Compensation, Intra Prediction, Intra/Inter Mode Selection, Video Output]

-30-

VC Algorithm: Intra Prediction
Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4x4 block of samples a–p with its neighboring samples A–H above, I–L to the left and M at the upper-left corner; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, …, p: the predicted samples of the current block; above and left samples A, B, …, M: previously reconstructed samples

-31-

VC Algorithm: Intra Prediction
Example of 4 x 4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) for mode 4

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4) for mode 8

[Figure: the 4x4 block a–p with neighbors A–M, showing the prediction directions of mode 4 and mode 8]
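
The sample formulas above map directly onto integer arithmetic. The sketch below covers only what these slides state: the two example samples (a and d) for modes 4 and 8, plus the vertical, horizontal and DC modes. Rounding is implemented by adding half the divisor before shifting, and the DC rounding offset is an assumption of this sketch. Function and argument names are hypothetical; `above` holds A–D and `left` holds I–L.

```python
def predict_vertical(above):
    """Mode 0: every row repeats the four samples A..D above the block."""
    return [list(above[:4]) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: every column repeats the four samples I..L left of the block."""
    return [[left[row]] * 4 for row in range(4)]


def predict_dc(above, left):
    """Mode 2: all 16 samples take the rounded mean of A..D and I..L."""
    dc = (sum(above[:4]) + sum(left[:4]) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


def mode4_sample_a(i, m, a):
    """Diagonal down-right, sample a: round(I/4 + M/2 + A/4)."""
    return (i + 2 * m + a + 2) >> 2


def mode4_sample_d(b, c, d):
    """Diagonal down-right, sample d: round(B/4 + C/2 + D/4)."""
    return (b + 2 * c + d + 2) >> 2


def mode8_sample_a(i, j):
    """Horizontal-up, sample a: round(I/2 + J/2)."""
    return (i + j + 1) >> 1


def mode8_sample_d(j, k, l):
    """Horizontal-up, sample d: round(J/4 + K/2 + L/4)."""
    return (j + 2 * k + l + 2) >> 2


if __name__ == "__main__":
    above, left = [100, 102, 104, 106], [98, 96, 94, 92]
    print(predict_dc(above, left)[0])
    print(mode4_sample_a(i=98, m=99, a=100), mode8_sample_a(i=98, j=96))
```

An encoder would build all candidate predictions this way, compare each with the actual block, and signal the mode with the smallest cost.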

-32-

VC Algorithm: Intra Prediction
16 x 16 luma

4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane mode works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes

(DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
– Prediction of variable block sizes
– Sub-pel motion compensation
– Deblocking filter
– Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size

– An MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition levels cannot be mixed freely, i.e., an MB cannot have 16x8 and 4x8 partitions at the same time. Only when the (8x8) sub-MB partition is selected can each (8x8) block be further partitioned.

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation

– Better compression performance than integer-pel MC
– At the expense of increased complexity
– Outperforms integer-pel MC at high bit rates and high resolutions

Motion vector accuracy: 1/4 (6-tap filter)

[Figure: encoder block diagram]

[Figure: MB partitions – 16x16, 16x8, 8x16, 8x8; sub-MB partitions – 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy

– A distinct MV can be sent for each sub-MB partition
– ME can be based on multiple pictures that lie in the past or in the future in display order
– The reference picture for ME is selected at the MB partition level
– Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A–U) and the half- and quarter-pel positions between them (lower-case letters a–s and aa–hh)]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
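
The two interpolation formulas can be written with integer shifts. This is a small sketch for a single row of six integer-pel samples E, F, G, H, I, J; clipping the result to the 8-bit range is an assumption of the sketch (the slide shows only the rounding), and the function names are hypothetical.

```python
def clip8(value: int) -> int:
    """Clamp to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-pel sample b between G and H: round((E - 5F + 20G + 20H - 5I + J) / 32)."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """Quarter-pel sample: rounded average of two neighbors, e.g. a = round((G + b) / 2)."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = (90, 100, 110, 120, 130, 140)    # E, F, G, H, I, J on a linear ramp
    b = half_pel(*row)
    a = quarter_pel(row[2], b)
    print("half-pel b =", b, " quarter-pel a =", a)
```

The six filter taps sum to 32, so flat areas pass through unchanged, while the negative taps sharpen edges compared with plain bilinear averaging.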

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, annotated "management of multiple reference pictures (short term, long term)"]

-41-

VC Algorithm: Transform & Quantization

Transform
– Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
– Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) luma blocks:

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), …, (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

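VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication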

-43-

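4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \qquad Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$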

-44-

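VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \left[ \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right]$$

where $[\,\cdot\,]$ denotes rounding to the nearest integer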

-45-

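VC Algorithm: Transform

Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, then Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks of a macroblock – 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18–21 (U) and 22–25 (V)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased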

-46-

VC Algorithm: Transform

Block diagram emphasizing the transform

[Figure: encoder block diagram]

– 4 x 4 integer DCT transform:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

– Hadamard transform of the DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
– Encoder: post-scaling and quantization
– Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij} \, \mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
FRExt expands the QP range beyond 52 values, by 6 for each additional bit of decoded sample
SF: scaling term
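
The doubling rule makes the whole Qstep table derivable from six base values. The sketch below assumes the commonly tabulated base step sizes for QP 0 to 5 (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125); these numbers are not given on the slide and are an assumption of this example, as is the function name.

```python
# Assumed base step sizes for QP = 0..5; every +6 in QP doubles the step size.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for 8-bit video (QP in 0..51, 52 values in total)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 28, 51):
        print(f"QP {qp:2d} -> Qstep {qstep(qp)}")
```

Under this rule a QP change of +1 raises the step size by roughly 12%, and a change of +6 exactly doubles it, which gives the logarithmic control mentioned in the feature list.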

-48-

VC Algorithm: Transform & Quantization

Rescaling and inverse transform; the DC transform applies to the Intra (16x16) prediction mode only

[Figure: Encoder part – input block → forward transform → 2x2 or 4x4 DC transform (chroma or Intra-16 luma only) → post-scaling and quantization → encoder output / decoder input. Decoder part – 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only) → inverse quantization and pre-scaling → inverse transform → output block]

-49-

VC Algorithm: Entropy Coding

– All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
– Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
– Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
– Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
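
Each table gives, for every position of the 4x4 block, its place in the one-dimensional coefficient list. A minimal sketch of applying the two scans follows; the table contents are taken from the slide, while the function and variable names are hypothetical.

```python
# Entry [row][col] is the position of that coefficient in the scanned list.
ZIGZAG_SCAN = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_SCAN = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, scan_table):
    """Reorder a 4x4 block of coefficients into a 16-entry list."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[scan_table[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    # Label each coefficient with its raster index (row * 4 + col).
    raster = [[row * 4 + col for col in range(4)] for row in range(4)]
    print("zig-zag  :", scan_block(raster, ZIGZAG_SCAN))
    print("alternate:", scan_block(raster, ALTERNATE_SCAN))
```

The alternate scan walks down the columns earlier than the zig-zag scan does, which suits field pictures, where vertical detail is stronger.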

-50-

Exponential Golomb codes (for data elements other than transform coefficients – these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles zeros and +/-1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and of +/-1 coefficients are coded, and for the other coefficients their levels are coded

Encoding steps
– step 1: encode the total number of non-zero coefficients and the number of +/-1 (trailing ones) values
– step 2: encode the sign of each trailing one in reverse order
– step 3: encode the levels of the remaining non-zero coefficients in reverse order
– step 4: encode the total number of zeros before the last coefficient
– step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 … 0
order: 0 1 2 3 4 5 6 7 8 9 … 15

Step 1: encode the total no. of non-zero coefficients and the no. of 1 or -1 values (trailing ones) from a look-up table
  no. of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7); no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
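
The five quantities of this example can be derived mechanically from the scanned coefficient list. The sketch below computes only the symbol values that CAVLC would then look up in its code tables; the tables themselves are not reproduced. The values chosen for c0, c1 and c2 (7, -3, 2) are hypothetical, as are the function and key names.

```python
def cavlc_symbols(coeffs):
    """Derive the values coded in CAVLC steps 1-5 from scanned coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]

    # Steps 1-2: up to three +/-1 coefficients at the high-frequency end.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["+" if coeffs[i] > 0 else "-" for i in trailing]

    # Step 3: remaining levels, highest frequency first.
    levels = [coeffs[i] for i in reversed(nonzero) if i not in trailing]

    # Step 4: zeros located before the last non-zero coefficient.
    total_zeros = (nonzero[-1] + 1 - len(nonzero)) if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping as soon as all zeros are accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(nonzero) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing),
        "trailing_signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs_before": runs,
    }


if __name__ == "__main__":
    example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 are made-up values
    for name, value in cavlc_symbols(example).items():
        print(f"{name:15s} {value}")
```

Working backwards from the highest-frequency coefficient pays off because that end of the list is dominated by small values and short zero runs, which the adaptive tables code cheaply.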

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
– step 1: context modeling. Choose a suitable model
– step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
– step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
– Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
– Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
– Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction

Multiple reference pictures mode
– Two forward references: suitable for a region just before a scene change
– Two backward references: suitable for a region just after a scene change

[Figure: current picture between previous pictures and next pictures, showing three cases – 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode

– Forward/backward pair of bi-directional prediction
– The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td]

$$mv_{L0} = \frac{t_b}{t_d}\, mv_{Col}, \qquad mv_{L1} = -\frac{t_d - t_b}{t_d}\, mv_{Col}$$

where mvCol is the MV used in the co-located MB of the subsequent picture
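
The scaling of the co-located motion vector can be illustrated in a few lines. This sketch uses floating-point division for clarity, whereas a real codec works with fixed-point scaling; tb is taken as the temporal distance from the List 0 reference to the current picture and td as the distance between the two references. The function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located vector mvCol.

    mvL0 = tb * mvCol / td
    mvL1 = -(td - tb) * mvCol / td
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    mv_l0 = tuple(tb * component / td for component in mv_col)
    mv_l1 = tuple(-(td - tb) * component / td for component in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between its two references.
    mv_l0, mv_l1 = temporal_direct_mvs(mv_col=(8, -4), tb=1, td=2)
    print("mvL0 =", mv_l0, " mvL1 =", mv_l1)
```

The two derived vectors always differ by exactly mvCol, so the direct-mode block follows the same straight-line motion as the co-located block without any motion vector being transmitted.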

-60-

VC Algorithm: B Slice
Weighted prediction

Different weights for the reference signals handle gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

The weighted prediction method differs for a macroblock of a P slice and of a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
– Implicit type: the factors are calculated based on the temporal distance between the pictures
– Explicit type: the factors are transmitted in the slice header
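
A toy version of p = w1 r1 + w2 r2 shows both types at work. The implicit weights below are a simplified floating-point reading of "based on the temporal distance" (the nearer reference gets the larger weight) and are an assumption of this sketch, not the normative integer derivation; clipping to 8 bits and all names are likewise assumptions.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Sample-wise p = w1*r1 + w2*r2, rounded and clipped to the 8-bit range."""
    return [max(0, min(255, int(round(w1 * a + w2 * b)))) for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances.

    tb: distance from reference 1 to the current picture
    td: distance from reference 1 to reference 2
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    ref1 = [100, 200, 50]
    ref2 = [50, 100, 150]

    # Implicit type: current picture midway between the references.
    w1, w2 = implicit_weights(tb=1, td=2)
    print("implicit:", weighted_bipred(ref1, ref2, w1, w2))

    # Explicit type: weights chosen by the encoder, e.g. to model a fade to black.
    print("explicit:", weighted_bipred(ref1, ref2, 0.5, 0.0))
```

With plain averaging a fading scene is always predicted too bright or too dark; letting the weights track the fade removes most of that residual energy.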

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
– SP slice: a specially coded slice for efficient switching between video streams; similar to the coding of a P slice
– SI slice: a switched slice similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) … P(1,5), Bitstream B with pictures P(2,1) … P(2,5), and a switching picture S(3) connecting the two streams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
– Parameter setting
– Flexible macroblock ordering (FMO)
– Redundant slice methods
– Switched slices (SP/SI)
– Data partitioning
– Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter-coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

– The sequence parameter set contains all information related to a sequence of pictures
– A picture parameter set contains all information related to all the slices belonging to a single picture
– The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example parameter set 3 – video format: NTSC, motion resolution: 1/4, encoding: CABAC, frame width: 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
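
The two-slice-group scenario can be made concrete with a checkerboard map, one possible dispersed allocation. The sketch below builds such a map for a picture measured in macroblocks and counts, for a lost macroblock, how many of its direct neighbors belong to the other slice group; the function names and the choice of a checkerboard pattern are assumptions of this example.

```python
def checkerboard_slice_groups(width_mbs: int, height_mbs: int):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(groups, x: int, y: int) -> int:
    """Count the left/right/top/bottom neighbors of MB (x, y) in the other group."""
    height, width = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != groups[y][x]:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(11, 9)        # QCIF is 11 x 9 macroblocks
    for row in groups[:3]:
        print("".join(str(g) for g in row))
    print("usable neighbors of MB (5, 4):", neighbors_in_other_group(groups, 5, 4))
```

If slice group 1 is lost, every missing macroblock in the interior still has four correctly received neighbors to interpolate from, which is what makes concealment effective.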

-66-

Error Resilience: Redundant Slice
- Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream.
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
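The decoder behavior described above amounts to a simple fallback rule. The sketch below is illustrative only; `pick_slice` and the dictionary layout are hypothetical names, and `None` stands for a slice that was lost in transmission.

```python
def pick_slice(primary, redundant):
    """Choose which representation of a slice to reconstruct.

    The primary slice is used whenever it arrived; the redundant
    (coarser) slice is only a fallback and is otherwise discarded.
    """
    if primary is not None:
        return primary
    return redundant


primary = {"qp": 24, "kind": "primary"}      # low QP: good quality
redundant = {"qp": 40, "kind": "redundant"}  # high QP: fewer bits

print(pick_slice(primary, redundant)["kind"])
print(pick_slice(None, redundant)["kind"])
```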

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for multimedia definition (MD). The numbers in the table give the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence: bitrate [kbps] for QCIF (24, 48, 96, 192), then bitrate [kbps] for CIF (96, 192, 384, 768)
Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T
Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: > 2x, 2x, 2x, T, T
Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence: bitrate [kbps] for QCIF (24, 48, 96, 192), then bitrate [kbps] for CIF (96, 192, 384, 768)
Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x
Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T
Husky: 2x, 2x, > 1x, 2x, 2x, 2x
Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence: bitrate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6), then bitrate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6)
Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T
Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence: bitrate [Mbps] for MPEG-2 HiQ (6, 10, 20), then bitrate [Mbps] for MPEG-2 TM5 (6, 10, 20)
720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T
1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T
1080 (25p)
River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence in the video streaming application

HLP - High Latency Profile; ASP - Advanced Simple Profile; H.26L - H.264 Main Profile
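For reference, the PSNR between an original and a reconstructed picture is 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak value 255) passed as flat sequences of equal length; the function name is illustrative and is not taken from the test report.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


# Every sample is off by one, so the mean squared error is 1.
print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))
```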

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence in the video conferencing application

CHC - Conversational High Compression; SP - Simple Profile; ASP - Advanced Simple Profile; H.26L - H.264 Baseline Profile

-73-

Conclusions
H.264 outperforms the previous standards.
Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.
Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
- JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496-2:2000: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual", ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP", IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)", MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding", Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions
Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction - block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)
♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)
♦ Scalar quantization
♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)
♦ Logarithmic control of quantization step size as a function of the quantization control parameter
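The logarithmic control mentioned in the last bullet can be sketched as follows. The six base step sizes are an assumption taken from commonly cited descriptions of H.264 (for example Richardson's book, reference [8]) and do not appear on the slides; the QP range 0 to 51 likewise assumes 8-bit video. The step size doubles for every increase of 6 in QP, which is roughly a 12% increase per QP step.

```python
# Assumed base step sizes for QP 0..5 (not given on the slides).
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in the range 0..51")
    # Doubling every 6 QP values gives the logarithmic relationship.
    return _BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```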

-95-

♦ Deblocking filter (within the motion compensation loop)
♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)
♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices
♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)
♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats
♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction
• (8x8) luma prediction (FRExt-only)
• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:
• Vertical: pels in the MB are predicted from the pels just above the MB
• Horizontal: pels in the MB are predicted from the pels just to the left of the MB
• DC: pels in the MB are predicted as the average value of the neighboring pels
• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
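The first three modes can be sketched in a few lines. This is an illustration under simplifying assumptions: both the row above and the column to the left are available, the DC value is a rounded integer average, and the planar mode is omitted. The function names are hypothetical.

```python
def predict_vertical(above, size=16):
    """Every row of the block repeats the row of pels just above it."""
    return [list(above[:size]) for _ in range(size)]


def predict_horizontal(left, size=16):
    """Every column repeats the column of pels just to the left."""
    return [[left[y]] * size for y in range(size)]


def predict_dc(above, left, size=16):
    """All pels take the rounded average of the neighboring pels."""
    total = sum(above[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


# Small 4x4 demonstration so the result is easy to inspect.
above = [10, 20, 30, 40]
left = [1, 2, 3, 4]
print(predict_vertical(above, 4)[3])
print(predict_horizontal(left, 4)[2])
print(predict_dc(above, left, 4)[0])
```

An encoder would form the residual by subtracting the chosen prediction from the actual block and would typically pick the mode with the smallest cost.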

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): vertical, horizontal, DC, planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies the reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2[code_num + 1])
INFO = code_num + 1 - 2^M
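The construction above translates directly into code. The sketch below returns the codeword as a string of '0' and '1' characters for readability (a real encoder would write bits into a buffer); the function name is illustrative.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits         # [M zeros][1][INFO]


for n in range(8):
    print(n, exp_golomb_encode(n))
```

Small values of code_num get short codewords, so the mapping from syntax element values to code_num is chosen so that frequent values come first.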

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
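The three decoding steps can be sketched as below. The bitstream is modeled as a string of '0' and '1' characters and the function name is illustrative; a truncated stream is not handled and would raise an error.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at index pos.

    Returns (code_num, next_pos) so that codewords can be read
    back to back from the same bit string.
    """
    m = 0
    while bits[pos] == "0":                  # step 1: count leading zeros
        m += 1
        pos += 1
    pos += 1                                 # skip the terminating '1'
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, pos + m      # step 3: 2^M + INFO - 1


stream = "1" + "010" + "00111"               # three codewords in a row
pos = 0
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    print(value)
```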

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players
♦ BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory
♦ DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory
♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy
♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample
♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure (FRExt only): MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr conversion

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
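As a numerical check of the conversion on this slide, the sketch below evaluates Y, Cb and Cr from the K_R and K_B constants. It assumes R, G and B are normalized to the range 0 to 1, so that Cb and Cr fall in the range -0.5 to 0.5; the function name and the default arguments are illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for given luma weights."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))                       # white
print(rgb_to_ycbcr(0.0, 0.0, 1.0))                       # pure blue
print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))   # red, BT.601 weights
```

The scale factors 1/(2(1 - K_B)) and 1/(2(1 - K_R)) are what keep the chroma components within half of the luma range, which is why pure blue and pure red map to a chroma value of 0.5.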

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:
Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)
where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:
t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
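The forward and inverse conversions can be checked for exact reversibility with a short script. The function names are illustrative; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation described on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only integer adds and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip a coarse grid of 8-bit RGB triples.
samples = range(0, 256, 17)
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
               for r in samples for g in samples for b in samples)
print(lossless)
```

Each inverse step subtracts exactly the quantity the forward step added, so no information is lost to rounding; the price is that Cg and Co need one extra bit of range compared with the input samples.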

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size for a level, the frame rate can be higher, up to a maximum of 172 frames/s.

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
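Note 1 can be illustrated with a rough calculation: the achievable frame rate is the level's maximum macroblock processing rate divided by the frame size in macroblocks, capped at 172 frames/s. This is a sketch that ignores the other level constraints (maximum frame size, buffer sizes, bit rate); the function names are hypothetical, and the rates 1 485 MB/s and 40 500 MB/s are the Level 1 and Level 3 entries of the level limits table in this deck.

```python
import math

MAX_FRAME_RATE = 172  # frames/s cap stated in note 1


def frame_size_mbs(width, height):
    """Number of 16x16 macroblocks needed to cover a picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def max_frame_rate(max_mb_rate, width, height):
    """Upper bound on frames/s from the MB processing rate alone."""
    return min(max_mb_rate / frame_size_mbs(width, height), MAX_FRAME_RATE)


print(max_frame_rate(1485, 176, 144))    # Level 1, QCIF
print(max_frame_rate(40500, 352, 288))   # Level 3, CIF
print(max_frame_rate(40500, 176, 144))   # Level 3, QCIF: hits the cap
```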

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile DetailsDeblocking Filter CABAC Signaling

♦ Deblocking filter
  • Only the control of the filter is adjusted: do not filter 4x4 blocks
  • No change to the filter operation itself

♦ CABAC
  • 61 new contexts and corresponding initialization values
  • No change to the CABAC engine

♦ Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile: Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

- H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-24-

Introduction: Parameter set limits for each Level

Level | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs
1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | -
1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | -
1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | -
1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | -
2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | -
2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | -
3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32
3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16
3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16
4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16
4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16
5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16
5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16
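As a rough illustration of how the level limits are used, the sketch below picks the lowest level whose macroblock rate and frame size limits admit a given picture size and frame rate. The function name `lowest_level` is hypothetical, and the check is deliberately simplified: it looks only at the first two columns of the table and ignores the bit rate, CPB, and decoded picture buffer limits.

```python
import math

# (level, max macroblock processing rate [MB/s], max frame size [MBs]),
# copied from the first two limit columns of the level table.
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396),
    ("1.3", 11880, 396), ("2", 11880, 396), ("2.1", 19800, 792),
    ("2.2", 20250, 1620), ("3", 40500, 1620), ("3.1", 108000, 3600),
    ("3.2", 216000, 5120), ("4", 245760, 8192), ("4.1", 245760, 8192),
    ("4.2", 491520, 8192), ("5", 589824, 22080), ("5.1", 983040, 36864),
]


def lowest_level(width, height, fps):
    """Return the lowest level that fits the frame size and MB rate, or None.

    Simplification (assumption): only the frame size and macroblock rate
    limits are checked, not bit rate or buffer sizes.
    """
    # A macroblock covers 16x16 luma samples; partial macroblocks count fully.
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)
    mb_rate = frame_mbs * fps
    for level, max_rate, max_frame_mbs in LEVEL_LIMITS:
        if frame_mbs <= max_frame_mbs and mb_rate <= max_rate:
            return level
    return None


if __name__ == "__main__":
    for size in [(176, 144, 15), (352, 288, 30), (1280, 720, 60)]:
        print(size, lowest_level(*size))
```

For example, QCIF at 15 frames/s needs 99 MBs per frame and 1 485 MB/s, which is exactly the Level 1 limit.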

-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format (4:2:0 chroma sampling)

[Figure: CIF and QCIF formats. CIF: Y has 288 lines of 360 pels (352 active); Cb and Cr each have 144 lines of 180 pels (176 active). QCIF: Y has 144 lines of 180 pels (176 active); Cb and Cr each have 72 lines of 90 pels (88 active).]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Input: video. Output: bitstream.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream. Output: video.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction
- Modes: vertical 0, horizontal 1, DC 2, diagonal down-left 3, diagonal down-right 4, vertical-right 5, horizontal-down 6, vertical-left 7, horizontal-up 8

M A B C D E F G H
I a b c d
J e f g h
K i j k l
L m n o p

[Figure: prediction directions of modes 0, 1, 3, 4, 5, 6, 7, and 8 drawn over this sample layout.]

Samples a, b, ..., p are the predicted samples of the current block; samples A, B, ..., M above and to the left are previously reconstructed.

-31-

VC Algorithm: Intra Prediction, example of 4 x 4 luma block

- Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 sample layout (a to p, with neighbours A to M) shown once for mode 4 and once for mode 8.]
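The two example predictions for samples a and d can be written with integer arithmetic as below. This is only a sketch of the four formulas quoted on the slide, not a full mode 4 or mode 8 predictor; the helper names and the dictionary of neighbour samples are hypothetical, and rounding is assumed to be round-half-up implemented as add-then-shift.

```python
def rounded_shift(total, shift):
    """Divide by 2**shift, rounding to the nearest integer (ties round up)."""
    return (total + (1 << (shift - 1))) >> shift


def predict_a_and_d(mode, n):
    """Predict samples a and d of a 4x4 luma block from neighbours n.

    n maps the neighbour labels used on the slide (A..D, I..L, M) to
    reconstructed sample values. Only modes 4 and 8 are covered.
    """
    if mode == 4:  # diagonal down-right
        a = rounded_shift(n["I"] + 2 * n["M"] + n["A"], 2)  # I/4 + M/2 + A/4
        d = rounded_shift(n["B"] + 2 * n["C"] + n["D"], 2)  # B/4 + C/2 + D/4
    elif mode == 8:  # horizontal-up
        a = rounded_shift(n["I"] + n["J"], 1)               # I/2 + J/2
        d = rounded_shift(n["J"] + 2 * n["K"] + n["L"], 2)  # J/4 + K/2 + L/4
    else:
        raise ValueError("only modes 4 and 8 are sketched here")
    return a, d


if __name__ == "__main__":
    neighbours = {"M": 104, "A": 108, "B": 110, "C": 112, "D": 114,
                  "I": 100, "J": 90, "K": 80, "L": 70}
    print(predict_a_and_d(4, neighbours))
    print(predict_a_and_d(8, neighbours))
```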

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical 0, horizontal 1, DC 2, plane 3)
- Plane mode works well in areas of smoothly varying luminance: a linear 'plane' function is fitted to the upper (H) and left-side (V) samples

8 x 8 luma (FRExt only)
- Similar to 4 x 4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction.]

-33-

VC Algorithm: Intra Prediction

Chroma
- Always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4
- Similar to the 16x16 luma block, but with a different mode order
- 4 prediction modes (DC 0, horizontal 1, vertical 2, plane 3)

-34-

VC Algorithm: Inter Prediction

- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction, prediction of variable block size

- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas
- The two partition types cannot be mixed, i.e. a MB cannot have both 16x8 and 4x8 partitions
- When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction, sub-pel motion compensation

- Better compression performance than integer-pel MC, at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions
- Motion vector accuracy: 1/4 pel (6-tap filter)

[Figure: the H.264 encoder block diagram, annotated with "motion vector accuracy 1/4 (6-tap filter)".]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the partitions in each case numbered from 0.]

-37-

VC Algorithm: Inter Prediction, sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels.]

-38-

VC Algorithm: Inter Prediction

- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: sample grid with integer-pel samples (A to U), half-pel samples (aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s), and quarter-pel samples (a, c, d, e, f, g, i, k, n, p, q, r).]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
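The two interpolation formulas above can be sketched in integer arithmetic as follows. The function names are hypothetical, rounding is assumed to be add-half-then-shift, and the clipping of the half-pel result to the 8-bit range is an added assumption that the slide does not state.

```python
def clip_8bit(value):
    """Clamp a sample to the 8-bit range 0..255 (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample b between G and H, from six integer-pel samples.

    Implements b = round((E - 5F + 20G + 20H - 5I + J) / 32).
    """
    total = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip_8bit((total + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample as the rounded average of two neighbours,
    e.g. a = round((G + b) / 2)."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = (10, 20, 30, 40, 50, 60)   # E, F, G, H, I, J
    b = half_pel(*row)
    a = quarter_pel(row[2], b)       # between G and b
    print(b, a)
```

On a flat area the filter returns the input value unchanged, because the six weights sum to 32.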

-39-

VC Algorithm: Inter Prediction, adaptive deblocking filter

- Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

[Figure: 16x16 macroblock showing the vertical edges (luma), horizontal edges (luma), vertical edges (chroma), and horizontal edges (chroma) that are filtered.]

-40-

VC Algorithm: Inter Prediction, management of multiple reference pictures

- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: the H.264 encoder block diagram, annotated with "management of multiple reference pictures (short term, long term)".]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks for the (4x4) integer DCT in a luma macroblock:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients (dark samples in the figure) of each 4 x 4 luma block:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

- (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block
- The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT
- The same applies to chroma; the chroma MB size depends on the 4:2:0, 4:2:2, or 4:4:4 format

-42-

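VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$
Y = \left(C_f\,X\,C_f^{T}\right)\otimes E_f
= \left(
\begin{bmatrix} 1 & 1 & 1 & 1\\ 2 & 1 & -1 & -2\\ 1 & -1 & -1 & 1\\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03}\\ x_{10} & x_{11} & x_{12} & x_{13}\\ x_{20} & x_{21} & x_{22} & x_{23}\\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1\\ 1 & 1 & -1 & -2\\ 1 & -1 & -1 & 2\\ 1 & -2 & 1 & -1 \end{bmatrix}
\right)\otimes
\begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2}\\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4}\\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2}\\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}
$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

where $\otimes$ implies element-by-element multiplication.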

-43-

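4x4 Inverse IntDCT

$$X = C_i^{T}\,\left(Y \otimes E_i\right)\,C_i$$

Here

$$
E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2}\\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4}\\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2}\\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix},
\qquad
E_i = \begin{bmatrix} a^2 & ab & a^2 & ab\\ ab & b^2 & ab & b^2\\ a^2 & ab & a^2 & ab\\ ab & b^2 & ab & b^2 \end{bmatrix}
$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.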
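A small numerical check can make the forward and inverse formulas concrete. The sketch below evaluates Y = (Cf X CfT) ⊗ Ef and X = CiT (Y ⊗ Ei) Ci in floating point and confirms that the round trip returns the input block. The entries of `CI` are not listed on the slides; they are filled in here from the standard definition of the inverse core transform, and all function names are hypothetical. A real codec folds the scaling matrices into quantization and stays in integer arithmetic, which this sketch does not attempt.

```python
import math

A = 0.5
B = math.sqrt(2.0 / 5.0)

# Forward and inverse core transform matrices.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def elementwise(p, q):
    """Element-by-element product (the circled-times operator on the slide)."""
    return [[p[i][j] * q[i][j] for j in range(4)] for i in range(4)]


def outer(v):
    return [[vi * vj for vj in v] for vi in v]


EF = outer([A, B / 2, A, B / 2])   # entries a^2, ab/2, b^2/4
EI = outer([A, B, A, B])           # entries a^2, ab, b^2


def forward(x):
    return elementwise(matmul(matmul(CF, x), transpose(CF)), EF)


def inverse(y):
    return matmul(matmul(transpose(CI), elementwise(y, EI)), CI)


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    restored = inverse(forward(block))
    print([[round(v, 6) for v in row] for row in restored])
```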

-44-

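VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$
Y_D = \mathrm{round}\!\left(\frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03}\\ x_{D10} & x_{D11} & x_{D12} & x_{D13}\\ x_{D20} & x_{D21} & x_{D22} & x_{D23}\\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
$$

where round( ) denotes rounding to the nearest integer.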

-45-

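VC Algorithm: Transform

Chroma DC coefficients (Intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$
Y_D =
\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01}\\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}
$$

[Figure: 4:2:0 chroma block numbering for U and V. Blocks 16 and 17 hold the 2x2 DC coefficients; blocks 18 to 21 and 22 to 25 hold the AC coefficients.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.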

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: the H.264 encoder block diagram with the transform annotated as follows.]

- 4 x 4 integer DCT transform:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1\\ 2 & 1 & -1 & -2\\ 1 & -1 & -1 & 1\\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij}\,\mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

- Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij}\cdot Qstep\cdot SF_{ij}$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
- FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
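The relation between QP and Qstep (52 values, doubling for every increment of 6 in QP) can be sketched as a lookup of six base values followed by a power-of-two scale. The six base step sizes are not given on the slide; they are the values commonly tabulated for QP 0 to 5 in H.264, and the function name `qstep` is hypothetical.

```python
# Step sizes for QP 0..5 (assumed from the commonly tabulated H.264 values);
# every further block of six QP values doubles them.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8 bits per sample")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the mapping is exponential in QP, a fixed change in QP corresponds to a fixed ratio in step size across the whole range.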

-48-

VC Algorithm: Transform, Quantization

[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization. Encoder output / decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block. Caption: rescale and inverse transform; Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag or alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15
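The two scan tables give, for each position of the 4 x 4 block, the index at which that coefficient is read out. A minimal sketch of applying them is shown below; the function name `scan` and the sample block are hypothetical.

```python
# Scan position of each block entry, copied from the two tables on the slide.
ZIGZAG_SCAN = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE_SCAN = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan(block, order):
    """Serialize a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0], [2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(scan(block, ZIGZAG_SCAN))
    print(scan(block, ALTERNATE_SCAN))
```

Both scans place the low-frequency coefficients first, so the nonzero values cluster at the start of the list and the zeros form long runs at the end.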

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeroes] [1] [INFO]

- INFO is an M-bit field carrying information
- The first codeword has no leading zero or trailing INFO
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on
- The length of each Exp-Golomb codeword is (2M + 1) bits
- M = floor(log2(code_num + 1))
- INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
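The construction and the three decoding steps can be sketched as follows, using strings of '0' and '1' characters for readability rather than a packed bitstream. The function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string.

    Returns (code_num, number of bits consumed).
    """
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeroes
        m += 1
    info_bits = bits[m + 1:2 * m + 1]        # step 2: read the M-bit INFO field
    info = int(info_bits, 2) if m else 0
    return (1 << m) + info - 1, 2 * m + 1    # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        word = exp_golomb_encode(n)
        print(n, word, exp_golomb_decode(word))
```

Codeword 0 is the single bit 1, codewords 1 and 2 are 010 and 011, and codewords 3 to 6 are five bits long, matching the INFO field sizes listed on the slide.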

- CAVLC codes transform coefficients
- CABAC codes transform coefficients and MVs
- All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 coefficients are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and of ±1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients). These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order:  0  1  2 3 4 5 6  7 8 9 ... 15

- Step 1: encode the number of nonzero total coefficients and of 1 or -1 (trailing ones) values from a look-up table. Number of nonzero total coefficients = 6 (orders 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (orders 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (orders 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6), 0 (between orders 5 and 4), 1 (order 3)
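The symbols of the five steps can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols for the example; it does not perform the table look-ups that turn them into bits. The function name is hypothetical, and the numeric values 7, 6, -2 standing in for c0, c1, c2 are made-up levels with magnitude greater than 1.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (before VLC table look-up) from a scanned block."""
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    reverse_positions = positions[::-1]

    # Step 1: total nonzero coefficients and trailing +/-1 values (at most 3).
    trailing_signs = []
    for pos in reverse_positions:
        if abs(coeffs[pos]) == 1 and len(trailing_signs) < 3:
            # Step 2: signs of the trailing ones, in reverse order.
            trailing_signs.append("+" if coeffs[pos] > 0 else "-")
        else:
            break

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [coeffs[p] for p in reverse_positions[len(trailing_signs):]]

    # Step 4: zeros located before the last nonzero coefficient.
    last = positions[-1] if positions else -1
    total_zeros = sum(1 for c in coeffs[:last + 1] if c == 0)

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping as soon as all counted zeros are accounted for.
    runs, zeros_left = [], total_zeros
    for cur, prev in zip(reverse_positions, reverse_positions[1:]):
        if zeros_left == 0:
            break
        run = cur - prev - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": len(positions), "trailing_ones": len(trailing_signs),
            "signs": trailing_signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    example = [7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 are assumed values
    print(cavlc_symbols(example))
```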

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size, and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change

[Figure: previous pictures, current picture, and next pictures, showing three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, direct mode

- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference, showing the direct-mode partition, the co-located partition, the motion vectors mvL0, mvL1, and mvCol, and the temporal distances tb and td.]

$$mvL0 = \frac{t_b}{t_d}\,mvCol, \qquad mvL1 = -\frac{t_d - t_b}{t_d}\,mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
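The scaling of the co-located motion vector can be sketched directly from the two formulas. The function name is hypothetical, and the sketch uses plain floating-point division; it assumes tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference, as the figure labels suggest.

```python
def direct_mode_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) for a direct-mode partition.

    mv_col is the (x, y) motion vector of the co-located partition.
    mvL0 = tb * mvCol / td and mvL1 = -(td - tb) * mvCol / td.
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references.
    print(direct_mode_mvs((8, -4), tb=1, td=2))
```

Note that the two results always satisfy mvL1 = mvL0 - mvCol, so the pair of vectors lies on the same motion trajectory as mvCol.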

-60-

VC Algorithm: B Slice, weighted prediction

- Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- For a B slice, a prediction signal p is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
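A minimal sketch of p = w1 r1 + w2 r2 on lists of 8-bit samples is given below. The implicit weight rule used here, a linear interpolation by temporal distance, is an assumption made for illustration; the slide only says the factors depend on temporal distance, and the standard itself uses fixed-point weights. All names are hypothetical.

```python
import math


def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference signals sample by sample: p = w1*r1 + w2*r2."""
    out = []
    for a, b in zip(r1, r2):
        value = math.floor(w1 * a + w2 * b + 0.5)   # round to nearest
        out.append(max(0, min(255, value)))         # keep within 8-bit range
    return out


def implicit_weights(tb, td):
    """Assumed implicit rule: the nearer reference gets the larger weight.

    tb: distance from reference 1 to the current picture,
    td: distance from reference 1 to reference 2.
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    r1, r2 = [100, 200], [50, 100]
    print(weighted_prediction(r1, r2, 0.5, 0.5))
    print(weighted_prediction(r1, r2, *implicit_weights(1, 4)))
```

In the explicit type the pair (w1, w2) would simply be read from the slice header instead of being computed.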

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), linked by the switching slice S(3).]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter-coded MBs
5. Each partition A, B & C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder, each holding parameter sets 1, 2, and 3, connected by a reliable parameter set exchange and by VCL data transfer with PS 3. Parameter set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11.]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
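The two-slice-group scenario can be illustrated with a checkerboard assignment, which is one simple way to disperse macroblocks through the picture (chosen here as an assumption; the slide does not name a specific pattern). The sketch counts how many of a macroblock's four direct neighbours belong to the other slice group and would therefore survive the loss of its own group. Function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(groups, x, y):
    """Count the left/right/top/bottom neighbours that are in the other group."""
    height, width = len(groups), len(groups[0])
    own = groups[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != own:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    print(neighbours_in_other_group(groups, 1, 1))
```

With this pattern every interior macroblock has all four direct neighbours in the other slice group, which is what makes spatial concealment of a lost group feasible.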

-66-

Error Resilience: Redundant Slice

- Redundant slices allow the encoder to send one or more redundant representations of the same macroblocks
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. If the primary slice is missing, the redundant slice can be reconstructed instead

-67-

Comparison of Coding Efficiency: Subjective verification test

- Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency
- H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T
Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: >2x, 2x, 2x, T, T
Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

- Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD
- H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x
Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T
Husky: 2x, 2x, >1x, 2x, 2x, 2x
Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

- Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)
- When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases
- When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6
Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T
Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

- Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)
- When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases
- When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: >1.7x, >1x, T, >1.7x, >1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
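For reference, PSNR between an original and a reconstructed picture is computed from the mean squared error of the samples. The sketch below assumes 8-bit samples (peak value 255) passed as flat lists; the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf   # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    print(round(psnr([100, 100, 100, 100], [101, 99, 100, 102]), 2))
```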

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next-generation optical discs
- Extension for various applications: H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

- JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., “Overview of error resiliency schemes in H.264/AVC standard,” JVCIR Special Issue on H.264/AVC, June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions
- 4:2:2 and 4:4:4 formats
- > 8-bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- High resolution
- [Use RGB color format] YCgCo

Slices in a picture are compressed as follows:
♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
  o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
  o Weighted prediction
  o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be of type I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for
- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding. Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan

Frame: zig-zag scan; Field: field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description
Sequence-, picture- and slice-layer syntax elements: Headers and parameters
Macroblock type (mb_type): Prediction method for each coded macroblock
Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter: Transmitted as a delta value from the previous value of QP
Reference frame index: Identifies reference frame(s) for inter prediction
Motion vector: Transmitted as a difference (mvd) from the predicted motion vector
Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
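
The construction and decoding rules above can be sketched in a few lines of Python. This is only an illustration of the unsigned code_num mapping described on these slides; the function names are made up for the example, and bits are kept as a text string for readability rather than packed into bytes.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str):
    """Decode one codeword from the front of `bits`.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                      # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1 : 2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1      # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Codeword 0 is the single bit "1"; codewords 1 and 2 are three bits long and codewords 3-6 are five bits long, as stated above.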

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
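
As a quick numerical check of the RGB to YCbCr equations on this slide, the conversion can be written as a short Python function. This is a minimal sketch: it assumes R, G and B are normalized to the range 0..1, uses the KR and KB values quoted on the slide as defaults, and the function name is chosen for the example only.

```python
KR, KB = 0.2126, 0.0722   # luma weights quoted on the slide


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that Cb stays within -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that Cr stays within -0.5..0.5
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: full luma, no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb at its maximum
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))   # BT.601 weights
```

The divisors 2(1 - KB) and 2(1 - KR) are what produce the factors 0.5389 and 0.6350 for the quoted weights, and they make pure blue and pure red map to a chroma value of exactly 0.5.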

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
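
The reversibility of these shift-based equations is easy to confirm with a small Python sketch. It assumes integer sample values and relies on Python's ">>" being an arithmetic (floor) shift for negative integers; the function names are illustrative only.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion, following the four equations on the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 17, 93)
    coded = rgb_to_ycgco(*pixel)
    print(coded, ycgco_to_rgb(*coded) == pixel)
```

Because each inverse step subtracts exactly the quantity the forward step added, the round trip is exact for every integer input, while Cg and Co need one extra bit of range compared with the RGB samples.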

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be larger, up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content,” JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects makes the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact
- Implementation beyond Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio Coding & Systems

- H.264 is limited to video
- Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-25-

Layered Structure

Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL)

NAL
- Abstracts the VCL data, hence the name Network 'Abstraction' Layer
- Header information about the VCL format
- Appropriate for conveyance by the transport layers or storage media
- NAL unit (NALU) defines a generic format for use in both packet-based and bit-streaming systems

VCL
- Core coding layer
- Concentrates on attaining maximum coding efficiency

-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: supported picture format, 4:2:0 chroma sampling

[Figure: CIF format: Y has 352 pels (of 360) x 288 lines; Cb and Cr each have 176 pels (of 180) x 144 lines. QCIF format: Y has 176 pels (of 180) x 144 lines; Cb and Cr each have 88 pels (of 90) x 72 lines.]

-28-

Video Coding Algorithm: block diagram for the H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Entropy Coding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision. Input: video; output: bitstream.]

-29-

Video Coding Algorithm: block diagram for the H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream; output: video.]

-30-

VC Algorithm: Intra Prediction

- Exploits spatial redundancy between adjacent macroblocks in a frame
- 4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4x4 block of samples a-p and its neighbors, with arrows showing the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

    M A B C D E F G H
    I a b c d
    J e f g h
    K i j k l
    L m n o p

Samples a, b, …, p: the predicted ones for the current block; above and left samples A, B, …, M: previously reconstructed ones

-31-

VC Algorithm: Intra Prediction

Example of a 4 x 4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively, for mode 4

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively, for mode 8

[Figure: prediction directions of mode 4 and mode 8 on the 4x4 block a-p, with neighbors A-H above, I-L to the left and M at the top-left corner]
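
The two examples for samples a and d can be reproduced with integer arithmetic. The sketch below covers only those two samples in modes 4 and 8; it assumes that round() means rounding half up, implemented with the usual add-and-shift idiom, and the helper names and neighbor values are invented for the illustration.

```python
def filt3(p, q, r):
    """round(p/4 + q/2 + r/4), rounding half up, in integer arithmetic."""
    return (p + 2 * q + r + 2) >> 2


def avg2(p, q):
    """round(p/2 + q/2), rounding half up, in integer arithmetic."""
    return (p + q + 1) >> 1


def predict_a_and_d(mode, n):
    """Predict samples a and d of the 4x4 block from neighbors n['A']..n['M']."""
    if mode == 4:   # diagonal down-right
        return filt3(n["I"], n["M"], n["A"]), filt3(n["B"], n["C"], n["D"])
    if mode == 8:   # horizontal-up
        return avg2(n["I"], n["J"]), filt3(n["J"], n["K"], n["L"])
    raise ValueError("only modes 4 and 8 are covered by this sketch")


if __name__ == "__main__":
    # Hypothetical reconstructed neighbor samples.
    neighbors = {"M": 104, "A": 108, "B": 110, "C": 112, "D": 120,
                 "I": 100, "J": 90, "K": 80, "L": 60}
    print(predict_a_and_d(4, neighbors))
    print(predict_a_and_d(8, neighbors))
```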

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full-MB prediction
- (8x8) for the 4:2:0 format
- (8x16) for 4:2:2
- (16x16) for 4:4:4

(Similar to the 16x16 luma block but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction
- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small sizes suit detailed areas

The two kinds of partition cannot be mixed, i.e., an MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram with the motion estimation and motion compensation blocks highlighted; motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8; sub-MB partitions 8x8, 8x4, 4x8 and 4x4, with the partitions in each case numbered from 0]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

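VC Algorithm: Inter Prediction

- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-position samples (A-U, uppercase), half-pel samples (aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s) and quarter-pel samples (a, c, d, e, f, g, i, k, n, p, q, r); E, F, G, H, I, J lie on one row, with b halfway between G and H]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)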

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied adaptively to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, at several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 macroblock, shown for luma and for chroma]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with the picture buffering blocks highlighted; management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks within a luma MB:

    0  1  4  5
    2  3  6  7
    8  9 12 13
   10 11 14 15

Indices of the DC coefficients of the (4x4) blocks:

   00 01 02 03
   10 11 12 13
   20 21 22 23
   30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), …, (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

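VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f\, X\, C_f^{T}\right) \otimes E_f$$

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad X = \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}$$

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad a = \frac{1}{2},\quad b = \sqrt{\frac{2}{5}},\quad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication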

-43-

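4x4 Inverse IntDCT

$$X = C_i^{T}\,\left(Y \otimes E_i\right)\,C_i$$

Here

$$\left[\,Y \otimes E_i\,\right] = \left[\,Y\,\right] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

while the forward transform uses

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

In both forward and inverse transforms, the quantization step (QP) is embedded in the matrices $E_f$ and $E_i$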

-44-

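VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \left[\; \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} D_{00} & D_{01} & D_{02} & D_{03} \\ D_{10} & D_{11} & D_{12} & D_{13} \\ D_{20} & D_{21} & D_{22} & D_{23} \\ D_{30} & D_{31} & D_{32} & D_{33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \;\right]$$

where [ ] denotes rounding to the nearest integer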

-45-

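VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma block numbering: blocks 16 and 17 carry the 2x2 DC coefficients of U and V; blocks 18-21 (U) and 22-25 (V) carry the AC coefficients]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased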

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with the Transform & Quantization block highlighted]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
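
To make the integer transform matrix concrete, the following Python sketch applies it to a 4x4 block as W = H X H^T using plain lists. It covers only the unscaled core transform; the scaling that the slides fold into quantization is left out, and the helper names are chosen for the example.

```python
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def matmul(a, b):
    """Product of two 4x4 integer matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Unscaled forward transform W = H X H^T (integer arithmetic only)."""
    return matmul(matmul(H, x), transpose(H))


if __name__ == "__main__":
    flat = [[3] * 4 for _ in range(4)]       # constant block: only a DC term survives
    print(core_transform(flat))
    print(matmul(H, transpose(H)))           # diagonal, so the rows are orthogonal
```

The product H H^T is diagonal with entries 4, 10, 4, 10 rather than the identity, which is why a separate scaling step is needed to obtain an orthonormal transform.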

-47-

VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij}\,\operatorname{round}\!\left(\frac{SF_{ij}}{Q_{step}}\right)$$

$$X'_{ij} = Y_{ij}\, Q_{step}\, SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
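
The statement that Qstep doubles for every increment of 6 in QP can be illustrated with a small lookup. The six base values used below (for QP 0 to 5) are not given on the slide; they are the values commonly tabulated for H.264 and should be treated as an assumption of this sketch, as should the function name.

```python
# Assumed base step sizes for QP = 0..5 (commonly tabulated values, not from the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))   # doubles every 6 QP steps


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

With this rule the 52 QP values span step sizes from 0.625 to 224, which is the logarithmic control of step size mentioned earlier in the deck.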

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform

[Figure: transform and quantization flow. Encoder part: input block, forward transform, post-scaling and quantization, plus a 2x2 or 4x4 DC transform (chroma or Intra-16 luma only, i.e., Intra (16x16) prediction mode only for luma). Encoder output = decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main profile

(a) Zig-zag scan

    0  1  5  6
    2  4  7 12
    3  8 11 13
    9 10 14 15

(b) Alternate scan

    0  2  8 12
    1  5  9 13
    3  6 10 14
    4  7 11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles zeros and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

    coeff: c0 c1 c2  0  1  1  0 -1  0  0 …  0
    order:  0  1  2  3  4  5  6  7  8  9 … 15

Step 1: encode the no. of nonzero total coefficients and of 1 or -1 (trailing ones) values from a look-up table
  no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
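
The five steps of the example can be traced with a short Python sketch that derives the symbols to be coded, without the actual VLC table look-ups. The function name and the values chosen for c0, c1 and c2 are hypothetical; the sketch assumes at most three trailing ones, taken from the high-frequency end of the scan.

```python
def cavlc_symbols(coeffs):
    """Derive the symbols of CAVLC steps 1-5 for one scanned block."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # positions of non-zero coefficients
    total_coeffs = len(nz)

    # Trailing ones: up to three consecutive +/-1 values at the high-frequency end.
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) != 1 or len(trailing) == 3:
            break
        trailing.append(i)

    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]            # step 2
    levels = [coeffs[i] for i in reversed(nz[: total_coeffs - len(trailing)])]  # step 3
    total_zeros = (nz[-1] + 1 - total_coeffs) if nz else 0               # step 4

    runs, zeros_left = [], total_zeros                                   # step 5
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break                       # no zeros remain, nothing more to signal
        run = nz[k] - nz[k - 1] - 1     # zeros immediately before coefficient k
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    # c0, c1, c2 replaced by hypothetical values 7, -3, 2.
    block = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

For this block the sketch reports 6 coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, a total of 2 zeros and the runs 1, 0, 1, matching the steps on the slide.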

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- step 1: context modeling. Choose a suitable model.
- step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
- step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only a forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
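
The scaling of mvCol by the temporal distances can be tried out numerically. The sketch below uses ordinary floating-point division for clarity, which is an assumption of this example and not how a real decoder computes it (fixed-point arithmetic with rounding would be used there); the function name and sample values are illustrative.

```python
def temporal_direct(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col: (x, y) motion vector of the co-located partition
    tb: temporal distance from the List 0 reference to the current picture
    td: temporal distance from the List 0 reference to the List 1 reference
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture midway between its two references.
    print(temporal_direct((8, -4), tb=1, td=2))
    # Current picture closer to the List 0 reference.
    print(temporal_direct((9, 3), tb=1, td=3))
```

Whatever the distances, mvL0 - mvL1 equals mvCol, so both derived vectors lie on the motion trajectory of the co-located block.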

-60-

VC Algorithm: B Slice

Weighted prediction
- Different weights of the reference signals for gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to the two reference signals r1 and r2:

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
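
A minimal Python sketch of p = w1 x r1 + w2 x r2 is given below. It assumes 8-bit samples, floating-point weights and rounding half up; the implicit-weight helper is a simplification in which the temporally nearer reference simply receives the larger weight, and is not the exact derivation of the standard. All names are illustrative.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Combine two reference sample rows: p = w1*r1 + w2*r2, clipped to 0..255."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption).

    tb: distance from reference 1 to the current picture
    td: distance from reference 1 to reference 2
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    r1, r2 = [100, 200], [50, 100]
    print(weighted_bipred(r1, r2, 0.5, 0.5))          # plain average
    print(weighted_bipred(r1, r2, 0.5, 0.0))          # explicit weights for a fade to black
    print(weighted_bipred(r1, r2, *implicit_weights(1, 4)))
```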

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: bitstream A with pictures P(1,1) … P(1,5) and bitstream B with pictures P(2,1) … P(2,5); the switching slice S(3) links the two bitstreams]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2 and 3 after a reliable parameter set exchange; VCL data is then transferred with reference to PS 3. Example Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data.
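The two-slice-group argument can be checked with a small sketch. It assumes a checkerboard pattern as one possible dispersed allocation (the slide does not name a specific pattern) and uses hypothetical function names; the picture size of 11 x 9 macroblocks corresponds to QCIF.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def neighbours_in_other_group(groups, x, y):
    """Count the 4-connected neighbours of MB (x, y) in the other group."""
    height, width = len(groups), len(groups[0])
    own = groups[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != own:
            count += 1
    return count


groups = checkerboard_slice_groups(11, 9)  # QCIF: 11 x 9 macroblocks
worst = min(neighbours_in_other_group(groups, x, y)
            for y in range(9) for x in range(11))
print(worst)
```

With this pattern even a corner macroblock keeps two neighbours from the surviving slice group, which is what gives the concealment step something to interpolate from.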

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed
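The decoder behaviour described above amounts to a simple preference rule. The sketch below is illustrative only: the dictionary keys (`region`, `redundant`, `qp`) are hypothetical stand-ins for real slice data.

```python
def choose_slices(received):
    """For each picture region keep the primary slice if it arrived,
    otherwise fall back to a redundant slice."""
    chosen = {}
    for s in received:
        current = chosen.get(s["region"])
        # Take the first slice seen, and let a primary replace a redundant one
        if current is None or (current["redundant"] and not s["redundant"]):
            chosen[s["region"]] = s
    return chosen


received = [
    {"region": 0, "redundant": True, "qp": 40},
    {"region": 0, "redundant": False, "qp": 26},  # primary arrived
    {"region": 1, "redundant": True, "qp": 40},   # primary was lost
]
result = choose_slices(received)
print(result[0]["qp"], result[1]["qp"])
```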

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
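For reference, the PSNR between original and reconstructed pictures is conventionally computed from the mean squared error. The sketch below assumes 8-bit samples (peak value 255) given as flat lists; the function name is illustrative.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


print(psnr([100, 100, 100, 100], [101, 99, 100, 100]))
```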

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards. Comparison of standards:

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions: Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt:
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004

Final stages of approval:
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Authors and software contact:
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm (High Profile is now officially mandatory for HD DVD Video, DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2 and 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- [use RGB color format] Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
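The logarithmic relation between the quantization parameter (QP) and the step size can be illustrated as follows. The six base values are the commonly cited Qstep values for QP 0 to 5 and are an assumption here, as is the 0 to 51 QP range for 8-bit video; neither appears on the slides. The step size doubles for every increase of 6 in QP, which is an average growth of roughly 12% per QP step.

```python
# Assumed base step sizes for QP = 0..5 (commonly cited values, not from the slides)
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP: doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```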

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
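Three of the four full-MB modes are simple enough to sketch directly. The function below is an illustrative sketch with hypothetical names: `above` and `left` are assumed to hold the 16 reconstructed neighbouring pels above and to the left of the MB, both assumed available, and the planar mode is omitted for brevity.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 prediction block (list of rows) for one luma MB."""
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("expected 16 neighbouring pels on each side")
    if mode == "vertical":
        # every row repeats the row of pels just above the MB
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # every row is filled with the pel just left of that row
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # rounded average of the 32 neighbouring pels
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


block = predict_16x16("dc", [100] * 16, [50] * 16)
print(block[0][0])
```

An encoder would typically form each candidate prediction, subtract it from the source MB, and keep the mode with the cheapest residual.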

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)

• 4:2:2 (8x16)

• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction (FRExt only):

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrix can be transmitted in SPS and PPS
- Separate matrix for 4x4 and 8x8 transforms
- Separate matrix for inter and intra
- Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning is ordered by decreasing coefficient variance so as to maximize the number of zero-valued coefficients along the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes
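The Exp-Golomb construction and the three decoding steps can be sketched as follows. Codewords are represented as strings of '0' and '1' characters purely for readability, and the function names are illustrative.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    m = value.bit_length() - 1          # M = floor(log2(code_num + 1))
    info = value - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword located at the start of a bit string."""
    m = 0
    while bits[m] == "0":               # 1. count M leading zeros up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # 2. M-bit INFO field
    return (1 << m) + info - 1          # 3. code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Every codeword has length 2M + 1, so small code numbers, which are assumed to be the most frequent, get the shortest codes.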

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

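FRExt only: MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB consists of a 16x16 Y block with Cb and Cr blocks that are 8 samples wide and 16 lines high; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks]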

-119-

RGB to YCbCr conversion

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
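The conversion defined by the coefficients KR and KB can be written as a short function. This is a minimal sketch with normalized floating-point inputs in the range 0 to 1; the function name is illustrative and no offset or integer quantization is applied.

```python
KR, KB = 0.2126, 0.0722  # coefficients quoted on the slide


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized RGB to (Y, Cb, Cr) for the given KR and KB."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: chroma close to zero
print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # pure blue: Cb reaches its maximum
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned in the note. Because Y, Cb and Cr are real-valued, rounding them to integers is what introduces the conversion error discussed next.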

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
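The forward and inverse integer conversions can be checked for exact reversibility with a short sketch. The function names are illustrative; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operator assumed in the equations.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer conversion using only additions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


sample = (200, 17, 90)
print(rgb_to_ycgco(*sample))
print(ycgco_to_rgb(*rgb_to_ycgco(*sample)) == sample)
```

Each inverse step subtracts exactly the shifted value that the forward step added, so the round trip is lossless even though the shifts discard a bit. The price is that Cg and Co need one extra bit of range compared with the inputs.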

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB level switching between 8x8 and 4x4 transform block sizes
- Encoder specified perceptual based quantization scaling matrices
- Encoder specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

- H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-26-

Layered Structure

Elements of VCL

-27-

Layered Structure: Supported picture format (4:2:0 chroma sampling)

[Figure: CIF format: Y is 352 (360) pels x 288 lines; Cb and Cr are each 176 (180) pels x 144 lines. QCIF format: Y is 176 (180) pels x 144 lines; Cb and Cr are each 88 (90) pels x 72 lines.]

-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Input: video; output: bitstream.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Input: bitstream; output: video.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
- 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4x4 block of samples a to p (rows a b c d / e f g h / i j k l / m n o p) with neighbouring samples M A B C D E F G H along the row above and I J K L down the left side; arrows mark the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.]

Samples a, b, ..., p are the predicted samples of the current block; the above and left samples A, B, ..., M are previously reconstructed

-31-

VC Algorithm: Intra Prediction, example of 4 x 4 luma block

- Mode 4: samples a and d are predicted by a = round(I/4 + M/2 + A/4) and d = round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by a = round(I/2 + J/2) and d = round(J/4 + K/2 + L/4)

[Figure: the 4x4 block a to p with neighbours M A B C D E F G H (above) and I J K L (left), shown once with the mode 4 prediction direction and once with the mode 8 prediction direction.]
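The two example predictions can be written in integer arithmetic. This is a minimal sketch, assuming the usual add-half-then-shift form of rounding; the function names are illustrative and not taken from the slides.

```python
def predict_mode4(A, B, C, D, I, M):
    """Mode 4 (diagonal down-right): predictions for samples a and d."""
    a = (I + 2 * M + A + 2) >> 2   # round(I/4 + M/2 + A/4)
    d = (B + 2 * C + D + 2) >> 2   # round(B/4 + C/2 + D/4)
    return a, d


def predict_mode8(I, J, K, L):
    """Mode 8 (horizontal-up): predictions for samples a and d."""
    a = (I + J + 1) >> 1           # round(I/2 + J/2)
    d = (J + 2 * K + L + 2) >> 2   # round(J/4 + K/2 + L/4)
    return a, d
```

Only samples a and d are covered here, matching the slide; the remaining samples of each mode follow the same pattern with other neighbours.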

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction, prediction of variable block size

- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition levels cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction, sub-pel motion compensation

- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram with the Motion Estimation and Motion Compensation blocks annotated: motion vector accuracy 1/4 (6-tap filter).]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the partitions in each case numbered from 0.]

-37-

VC Algorithm: Inter Prediction, sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

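VC Algorithm: Inter Prediction

- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-position samples (upper-case letters A to U) with the half-pel and quarter-pel positions between them (lower-case letters a to s and aa to hh); b is the half-pel sample between G and H on the row E F G H I J, and a is the quarter-pel sample between G and b.]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)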
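The two interpolation formulas can be sketched as follows. This is a minimal illustration, assuming 8-bit samples, add-half-then-shift rounding and clipping of the result to 0..255; the function names are hypothetical.

```python
def clip8(value):
    """Clip to the 8-bit sample range (an assumption: 8 bits per sample)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap FIR (1, -5, 20, 20, -5, 1)/32."""
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1


# Sample a lies between integer sample G and half-pel sample b.
b = half_pel(10, 20, 30, 40, 50, 60)
a = quarter_pel(30, b)
```

The negative taps let the filter overshoot near sharp edges, which is why the clip is needed.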

-39-

VC Algorithm: Inter Prediction, deblocking filter (adaptive)

- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: two 16x16 macroblocks showing the vertical edges (luma), horizontal edges (luma), vertical edges (chroma) and horizontal edges (chroma) that are filtered.]

-40-

VC Algorithm: Inter Prediction, management of multiple reference pictures

- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with the Picture Buffering blocks annotated: management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks in a luma macroblock:

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients of the (4x4) blocks:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform, and (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same applies to chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

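VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left( C_f X C_f^T \right) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$, and $\otimes$ implies element-by-element multiplication.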
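As a check on the forward transform, the sketch below computes the integer core C_f X C_f^T and then applies the scaling matrix E_f. It assumes the standard signed form of C_f (the signs are lost in this transcript) and a = 1/2, b = sqrt(2/5); the helper names are illustrative.

```python
import math

# Forward core transform matrix C_f (integer entries only).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    """4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Integer part of the transform: W = C_f X C_f^T."""
    return matmul(matmul(CF, x), transpose(CF))


def scaled_transform(x):
    """Y = W (element-wise) E_f, with a = 1/2 and b = sqrt(2/5)."""
    a, b = 0.5, math.sqrt(2.0 / 5.0)
    even = [a * a, a * b / 2, a * a, a * b / 2]
    odd = [a * b / 2, b * b / 4, a * b / 2, b * b / 4]
    ef = [even, odd, even, odd]
    w = core_transform(x)
    return [[w[i][j] * ef[i][j] for j in range(4)] for i in range(4)]


flat = [[10] * 4 for _ in range(4)]          # constant block
ramp = [[4 * i + j for j in range(4)] for i in range(4)]
```

For the constant block only the DC term survives, and for any block the scaled output keeps the same energy as the input, which is what makes the core-plus-scaling pair behave like an orthonormal transform.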

-43-

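4x4 Inverse IntDCT

$$X = C_i^T \left( Y \otimes E_i \right) C_i$$

Here

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

with $a$ and $b$ as defined for the forward transform, whose scaling matrix is

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.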

-44-

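VC Algorithm: Transform

Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round(.) denotes rounding to the nearest integer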

-45-

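VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: chroma blocks for the 4:2:0 format. Blocks 16 and 17 hold the 2x2 DC coefficients of U and V; blocks 18 to 21 (U) and 22 to 25 (V) hold the AC coefficients.]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased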

-46-

VC Algorithm: Transform

Block diagram emphasizing the transform

[Figure: encoder block diagram with the Transform & Quantization block annotated as follows.]

- 4 x 4 integer DCT transform:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

Encoder: $Y_{ij} = \mathrm{round}\left( X_{ij} \cdot \dfrac{SF_{ij}}{Qstep} \right)$

Decoder: $X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values (for 8 bits per decoded sample), and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
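The doubling rule can be made concrete with a small lookup. This is a sketch that assumes the commonly tabulated step sizes for QP 0 to 5 (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125), which are not listed on the slide; the function names are illustrative.

```python
# Assumed base step sizes for QP = 0..5 (not given in the slides).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51; doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(x, sf, qp):
    """Encoder side: Y = round(X * SF / Qstep), rounding half away from zero."""
    value = x * sf / qstep(qp)
    return int(value + 0.5) if value >= 0 else -int(-value + 0.5)


def rescale(y, sf, qp):
    """Decoder side: X = Y * Qstep * SF (SF here is the decoder pre-scaling term)."""
    return y * qstep(qp) * sf
```

With six steps per doubling, each QP increment raises Qstep by roughly 12 percent on average, so QP behaves like a logarithmic control of the bit rate.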

-48-

VC Algorithm: Transform, Quantization

Rescale and inverse transform

[Figure: transform and quantization flow. Encoder part: input block; forward transform; post-scaling and quantization; 2x2 or 4x4 DC transform (chroma or Intra-16 luma only, i.e., luma in Intra (16x16) prediction mode only). The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only); inverse quantization and pre-scaling; inverse transform; output block.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
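The construction and the three decoding steps can be sketched as follows, using bit strings for readability. The function names are illustrative; a real codec would work on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] for code_num >= 0 as a bit string."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of bits; return (code_num, bits consumed)."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeroes up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, 2 * m + 1    # step 3: code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

Because every codeword announces its own length through the run of leading zeroes, the decoder needs no table lookup.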

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 15

Step 1: encode the number of nonzero total coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
- number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
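The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols and does not perform the table-based code assignment; the function name and the values chosen for c0, c1, c2 (7, -3, 2) are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols for one block of coefficients in scan order."""
    nonzero = [(pos, c) for pos, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three consecutive +/-1 values at the high-frequency end.
    trailing = []
    for pos, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["-" if c < 0 else "+" for c in trailing]

    # Remaining levels, highest frequency first.
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Zeros located before the last non-zero coefficient.
    total_zeros = nonzero[-1][0] + 1 - total_coeffs if nonzero else 0

    # Runs of zeros before each coefficient, in reverse order, until none remain.
    runs, zeros_left = [], total_zeros
    positions = [pos for pos, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return total_coeffs, len(trailing), signs, levels, total_zeros, runs


block = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 chosen for illustration
symbols = cavlc_symbols(block)
```

For this block the result reproduces the slide: 6 coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.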

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps
- Step 1: context modeling. Choose a suitable model
- Step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: suitable for a region just before a scene change
- Two backward references: suitable for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, with three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, direct mode

- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: the direct-mode partition in the current picture lies between the List 0 reference and the List 1 reference; the co-located partition in the List 1 reference has motion vector mvCol; the temporal distances tb and td and the derived vectors mvL0 and mvL1 are marked.]

mvL0 = tb * mvCol / td
mvL1 = -(td - tb) * mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
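The scaling of mvCol can be sketched as below. This is a simplified illustration using plain division; it assumes tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference, and it ignores the fixed-point arithmetic a real decoder would use. The function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol = (x, y)."""
    if td == 0:
        raise ValueError("td must be non-zero")
    mv_l0 = tuple(tb * c / td for c in mv_col)            # mvL0 = tb * mvCol / td
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)    # mvL1 = -(td - tb) * mvCol / td
    return mv_l0, mv_l1


# Current picture halfway between the two references.
mv_l0, mv_l1 = temporal_direct_mvs(mv_col=(8, -4), tb=1, td=2)
```

Note that mvL0 - mvL1 always equals mvCol, so the two derived vectors lie along the trajectory of the co-located block and no motion vector needs to be transmitted.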

-60-

VC Algorithm: B Slice, weighted prediction

- Different weights are applied to the reference signals for gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods are used for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 * r1 + w2 * r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
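A small sketch of the weighted sum follows. The implicit weights here are a simplified assumption (each reference weighted by its temporal closeness, with tb the distance from the first reference to the current picture and td the distance between the two references); the standard's exact fixed-point derivation is not reproduced, and the function names are illustrative.

```python
def weighted_biprediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped to 8 bits."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (an assumption)."""
    w2 = tb / td
    return 1.0 - w2, w2


# Fade towards black: the second reference is already darker than the first.
r1 = [100, 200]
r2 = [50, 100]
w1, w2 = implicit_weights(tb=1, td=4)
p = weighted_biprediction(r1, r2, w1, w2)
```

In the explicit type the same sum would be used with w1 and w2 read from the slice header instead of being derived.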

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5); the switching picture S(3) connects the two bitstreams.]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

(Switched slice and data partitioning: only in Extended Profile)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter-coded MBs
5. Place each partition A, B and C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example contents of parameter set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11.]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data
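The dispersed two-group allocation can be illustrated with a checkerboard map. This is a sketch only: the checkerboard is one simple dispersed pattern chosen for illustration, not the standard's slice group map syntax, and the function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbours(group_map, x, y):
    """Spatial neighbours of MB (x, y) that belong to the other slice group."""
    height, width = len(group_map), len(group_map[0])
    own = group_map[y][x]
    found = []
    for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and group_map[ny][nx] != own:
            found.append((nx, ny))
    return found


# QCIF picture: 11 x 9 macroblocks.
groups = checkerboard_slice_groups(11, 9)
neighbours = surviving_neighbours(groups, 5, 4)
```

If the packet for one group is lost, every lost macroblock in the picture interior still has four correctly received neighbours available for concealment.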

-66-

Error Resilience: Redundant Slice

- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20 | Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

- HLP: High Latency Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

- CHC: Conversational High Compression
- SP: Simple Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards (feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10)

- Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
- Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
- Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
- Quantization: scalar quantization with step size of constant increment | scalar quantization with step size of constant increment | vector quantization | scalar quantization with step size increasing at the rate of 12.5%
- Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC
- Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
- Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards, continued (feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10)

- Pel accuracy: integer, 1/2-pel | integer, 1/2-pel | integer, 1/2-pel, 1/4-pel | integer, 1/2-pel, 1/4-pel
- Profiles: No | 5 | 8 | 4
- Reference picture: one | one | one | multiple
- Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
- Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
- Error robustness: synchronization & concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
- Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
- Compatibility with previous standards: n/a | Yes | Yes | No
- Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, i.e., decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications

H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

- JVT documents and software on open ftp website: ftp://standards.polycom.com
- http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs: HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol. pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm : High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], Y Cg Co
High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
  Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
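The three simplest of these modes can be sketched in a few lines. This is only an illustration under stated assumptions, not the reference decoder: the function name `intra16x16_predict` and the list-based sample layout are hypothetical, the planar mode is left out, and the DC rounding offset is the usual integer-average convention rather than something taken from the slide.

```python
def intra16x16_predict(above, left, mode):
    """Predict a 16x16 luma MB from its reconstructed neighbors.

    above: the 16 pels in the row just above the MB
    left:  the 16 pels in the column just left of the MB
    mode:  "vertical", "horizontal" or "dc" (planar mode not sketched)
    """
    n = 16
    if mode == "vertical":      # every row repeats the pels above the MB
        return [list(above) for _ in range(n)]
    if mode == "horizontal":    # every row is filled with its left neighbor
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":            # rounded mean of the 32 neighboring pels
        dc = (sum(above) + sum(left) + n) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)
```

An encoder would typically try each mode and keep the one with the smallest prediction error.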

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only
HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices
These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding
Coefficient scanning follows decreasing variance, to maximize the number of zero-valued coefficients along the scan

Frame: zig-zag scan; Field: field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
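The construction and the three decoding steps can be checked with a short sketch. The function names and the use of a '0'/'1' string as the bitstream are assumptions made purely for illustration; a real codec works on packed bits.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a '0'/'1' string."""
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits          # [M zeros][1][INFO]


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                     # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1     # step 3: 2^M + INFO - 1
```

For instance, code_num 0 maps to the single bit `1`, code_num 1 and 2 map to `010` and `011`, and code_num 3 maps to `00100`, which matches the INFO field widths listed above.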

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
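As a quick numerical check of the conversion, the following sketch evaluates the general formulas with the constants quoted on this slide as defaults. The function name `rgb_to_ycbcr` and the normalization of R, G, B to the range 0..1 are assumptions made for illustration.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr.

    kr and kb default to the constants quoted on the slide; passing
    kr=0.299, kb=0.114 gives the BT.601 variant instead.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # equals 0.5389 * (B - Y) by default
    cr = (r - y) / (2.0 * (1.0 - kr))   # equals 0.6350 * (R - Y) by default
    return y, cb, cr
```

With this scaling, white maps to (1, 0, 0), and pure blue and pure red reach the extreme chroma value 0.5 in Cb and Cr respectively.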

-120-

Rounding error in RGB to Y Cb Cr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
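The reversibility of this lifting-style conversion can be verified directly. The sketch below transcribes the four forward and four inverse steps; the function names are hypothetical, and Python's `>>` on integers is an arithmetic (floor) shift, which matches the definition given on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion on integer samples (residuals may be negative)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because each step adds or subtracts a value that the inverse can recompute exactly, the round trip is lossless even though the shifts discard a bit.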

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than $\sqrt{8 \times (\text{total number of pixels per frame})}$

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

▪ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

▪ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

▪ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High profile contains:
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD film: 12%
HD video (progressive): 12%
HD video (interlace): 4% (only 2 test clips)
SD video (interlace): 6%

Complexity impact:
Implementation beyond Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus)
ATSC has selected AC-3 plus from Dolby
MPEG calls it HE-AAC (HE: High Efficiency)
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-27-

Layered Structure: supported picture format, 4:2:0 chroma sampling

[Figure: CIF format - Y: 352 pels (360 pels) x 288 lines; Cb and Cr: 176 pels (180 pels) x 144 lines each. QCIF format - Y: 176 pels (180 pels) x 144 lines; Cb and Cr: 88 pels (90 pels) x 72 lines each.]

-28-

Video Coding Algorithm: block diagram for H.264 encoder

[Figure: H.264 encoder block diagram. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filter. Signals: Video Input, Bitstream Output.]

-29-

Video Coding Algorithm: block diagram for H.264 decoder

[Figure: H.264 decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, Deblocking Filter, Picture Buffering. Signals: Bitstream Input, Video Output.]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4 x 4 luma block
9 prediction modes: 8 directional predictions and 1 DC prediction
(vertical: 0, horizontal: 1, DC: 2, diagonal down left: 3, diagonal down right: 4, vertical right: 5, horizontal down: 6, vertical left: 7, horizontal up: 8)

[Figure: 4 x 4 block of samples a-p (rows "a b c d", "e f g h", "i j k l", "m n o p") with neighbors M, A-H above and I-L to the left; arrows show the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8.]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones

-31-

VC Algorithm: Intra Prediction, example of 4 x 4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively, for mode 4

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively, for mode 8

[Figure: the 4 x 4 block a-p with neighbors A-H, I-L and M, shown once with the mode 4 direction and once with the mode 8 direction.]
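The two worked examples reduce to rounded two- and three-tap averages of the neighboring samples. The sketch below evaluates exactly those four expressions in integer arithmetic; the helper names and the dictionary of neighbors keyed by letter are assumptions for illustration, and only samples a and d are covered, not the full 16-sample predictor.

```python
def round_avg2(p, q):
    """round(p/2 + q/2) in integer arithmetic."""
    return (p + q + 1) >> 1


def round_avg3(p, q, r):
    """round(p/4 + q/2 + r/4) in integer arithmetic."""
    return (p + 2 * q + r + 2) >> 2


def predict_a_and_d(nb, mode):
    """Predict samples a and d of a 4x4 block for mode 4 or mode 8.

    nb maps neighbor letters ('A'..'D', 'I'..'M') to reconstructed values.
    """
    if mode == 4:   # diagonal down right
        return (round_avg3(nb["I"], nb["M"], nb["A"]),
                round_avg3(nb["B"], nb["C"], nb["D"]))
    if mode == 8:   # horizontal up
        return (round_avg2(nb["I"], nb["J"]),
                round_avg3(nb["J"], nb["K"], nb["L"]))
    raise ValueError("only modes 4 and 8 are sketched here")
```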

-32-

VC Algorithm: Intra Prediction, 16 x 16 luma

4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

Plane

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction:
(8x8) for the 4:2:0 format
(8x16) for 4:2:2
(16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes

(DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction, prediction of variable block size

- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small ones suit detailed areas

The two partition levels cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions
When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction, sub-pel motion compensation

Better compression performance than integer-pel MC
Comes at the expense of increased complexity
Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram with motion estimation and motion compensation highlighted; motion vector accuracy 1/4 (6-tap filter).]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4.]

-37-

VC Algorithm: Inter Prediction, sub-pel accuracy

A distinct MV can be sent for each sub-MB partition
ME can be based on multiple pictures that lie in the past or in the future in display order
The reference picture for ME is selected at the MB partition level
Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, 1/8 pixels.]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A-U) and the half- and quarter-pel positions between them (lower-case letters a-s and aa-hh); E, F, G, H, I, J lie on one row, with b halfway between G and H and a between G and b.]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
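The two formulas for b and a translate into integer arithmetic as follows. This is a minimal sketch: the function names are hypothetical, and the clipping of the filter output to the 8-bit range 0..255 is an assumption added here, since the 6-tap filter can overshoot.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample between g and h from six integer-pel neighbors."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip255((acc + 16) >> 5)       # +16 then >>5 is round(acc / 32)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1
```

For the row E..J = 10, 20, 30, 40, 50, 60, the half-pel sample b between G and H evaluates to 35, and the quarter-pel sample a between G and b evaluates to 33.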

-39-

VC Algorithm: Inter Prediction, adaptive deblocking filter

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16x16 macroblock.]

-40-

VC Algorithm: Inter Prediction, management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with picture buffering highlighted; management of multiple reference pictures (short term, long term).]

-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the sixteen 4x4 luma blocks:
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

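VC Algorithm: Transform

4 x 4 integer DCT; X: input pixels, Y: output coefficients

$$Y = \left(C_f X C_f^T\right) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}, \quad d = \frac{1}{2}$

$\otimes$ implies element-by-element multiplication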
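A small numerical sketch can confirm that the integer core transform followed by the element-wise scaling by E_f behaves like an orthonormal transform. The function names are hypothetical and the scaling is done in floating point purely for illustration; in the codec the scaling is folded into quantization rather than applied separately.

```python
import math

# Forward core matrix C_f (integer entries only).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def core_transform(x):
    """W = C_f X C_f^T, computed in pure integer arithmetic."""
    cf_t = [list(col) for col in zip(*CF)]
    return matmul(matmul(CF, x), cf_t)


def post_scale(w):
    """Y = W (x) E_f with a = 1/2 and b = sqrt(2/5), element by element."""
    a, b = 0.5, math.sqrt(2.0 / 5.0)
    even = [a * a, a * b / 2, a * a, a * b / 2]
    odd = [a * b / 2, b * b / 4, a * b / 2, b * b / 4]
    ef = [even, odd, even, odd]
    return [[w[i][j] * ef[i][j] for j in range(4)] for i in range(4)]
```

A flat block of ones yields a single nonzero core coefficient, W[0][0] = 16, which scales to Y[0][0] = 4, and for any input block the sum of squares is preserved by the scaled transform.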

-43-

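4x4 Inverse IntDCT

$$X = C_i^T \left(Y \otimes E_i\right) C_i$$

Here

$$E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}, \qquad E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in matrices $E_f$ and $E_i$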

-44-

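VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer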

-45-

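VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks in coding order. 2x2 DC blocks: 16 (U) and 17 (V); AC blocks: 18-21 (U) and 22-25 (V).]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased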

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with the Transform & Quantization block highlighted.]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

$Y_{ij} = X_{ij}\,\mathrm{round}\left(\frac{SF_{ij}}{Qstep}\right)$

$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, set by the quantization parameter QP. QP takes a total of 52 values, and Qstep doubles in size for every increment of 6 in QP (for 8 bits per decoded sample). FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
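The "doubles for every increment of 6" rule can be made concrete with a short sketch. The six base step sizes for QP 0..5 are not given on the slide; the values below are the ones commonly tabulated for H.264 and are stated here as an assumption. The function names are hypothetical, and the quantizer shown is a plain uniform quantizer for illustration, not the integer-scaled form used in real encoders.

```python
# Assumed base step sizes for QP = 0..5; every further +6 in QP doubles Qstep.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8 bits per sample)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, qp, sf=1.0):
    """Illustrative uniform quantizer: scale, divide by Qstep, round."""
    return int(round(x * sf / qstep(qp)))


def rescale(y, qp, sf=1.0):
    """Illustrative inverse quantization: multiply back by Qstep and SF."""
    return y * qstep(qp) * sf
```

With these base values, QP = 4 gives a step of 1, QP = 28 gives 16, and QP = 51 gives 224, so the 52 QP values span a range of roughly 358:1 in step size.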

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform

[Figure: transform and quantization path. Encoder part: input block, forward transform, post-scaling and quantization, plus a 2x2 or 4x4 DC transform (chroma or Intra-16 luma only). The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block. The luma DC transform is used in Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
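Each table gives, for every position of the 4x4 block, the index at which that coefficient appears in the one-dimensional scan. A minimal sketch of applying either table follows; the names `ZIGZAG`, `ALTERNATE` and `scan` are chosen here for illustration.

```python
# Scan index of each (row, column) position, copied from the two tables.
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block of coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out
```

The alternate scan runs down the first column early, which suits field-coded blocks whose energy tends to be spread vertically.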

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded. For the other coefficients, their levels are coded

Encoding steps:
step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the number of nonzero total coefficients and of 1 or -1 (trailing ones) from a look-up table
number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
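The five quantities in this example can be extracted mechanically from the scanned coefficient list. The sketch below derives only the symbols, not the variable-length codewords, which come from look-up tables not reproduced on the slide. The function name and the returned dictionary keys are hypothetical, and the concrete values 7, 5, 3 used in the check stand in for c0, c1, c2.

```python
def cavlc_symbols(coeffs):
    """Derive the symbols coded by the five CAVLC steps for one block."""
    nz = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    rev = nz[::-1]                         # highest-frequency coefficient first
    total_coeffs = len(nz)

    # Step 1: trailing ones are the last +/-1 values in scan order, at most 3.
    t1 = 0
    for _, c in rev:
        if abs(c) == 1 and t1 < 3:
            t1 += 1
        else:
            break

    signs = ["-" if c < 0 else "+" for _, c in rev[:t1]]    # step 2
    levels = [c for _, c in rev[t1:]]                       # step 3
    total_zeros = nz[-1][0] + 1 - total_coeffs if nz else 0  # step 4

    # Step 5: zeros directly before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(rev) - 1):
        if zeros_left == 0:
            break
        run = rev[k][0] - rev[k + 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": t1,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}
```

For the list 7, 5, 3, 0, 1, 1, 0, -1 followed by zeros, this gives 6 coefficients, 3 trailing ones with signs -, +, +, levels 3, 5, 7, two zeros in total and runs 1, 0, 1, in agreement with the five steps above.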

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, in order to achieve good compression, the probability model for each symbol element is updated
Both MVs and residual transform coefficients are coded by CABAC

Encoding steps:
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: current picture with previous and next pictures, showing three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, direct mode

Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the co-located partition in the List 1 reference has motion vector mvCol, the direct-mode partition in the current picture has motion vectors mvL0 and mvL1; tb and td mark temporal distances.]

$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$

where mvCol is the MV used in the co-located MB of the subsequent picture
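The scaling of mvCol can be illustrated numerically. This sketch assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference, which is how the figure labels are usually read. It uses floating-point division for clarity, whereas a real decoder uses fixed-point scaling; the function name is hypothetical.

```python
def direct_mode_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) for a direct-mode partition from mvCol.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances as labeled in the figure (assumed meaning:
            List 0 ref -> current picture, List 0 ref -> List 1 ref)
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1
```

With the current picture halfway between the two references (tb = 1, td = 2) and mvCol = (8, -4), the derived vectors are mvL0 = (4, -2) and mvL1 = (-4, 2), so no motion vector needs to be transmitted for the partition.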

-60-

VC Algorithm: B Slice, weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

A different weighted prediction method is used for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by different weights from two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
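A minimal sketch of the weighted sum follows, with an illustrative version of the implicit weights. The function names, the clipping to the 8-bit range and the specific implicit rule (weights proportional to temporal distance, with tb and td as in direct mode) are assumptions made for illustration; the standard defines the exact fixed-point derivation.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Weighted prediction p = w1*r1 + w2*r2 over two lists of samples."""
    out = []
    for s1, s2 in zip(r1, r2):
        p = w1 * s1 + w2 * s2
        p = min(255.0, max(0.0, p))       # clip to the assumed 8-bit range
        out.append(int(p + 0.5))          # round to the nearest integer
    return out


def implicit_weights(tb, td):
    """Illustrative implicit weights from temporal distances.

    The reference nearer to the current picture gets the larger weight.
    """
    w2 = tb / td
    return 1.0 - w2, w2
```

Equal weights give the ordinary bi-predictive average, while a fade to black can be modeled by a single reference with a weight below 1, for example `weighted_bipred([200], [0], 0.5, 0.0)`, which gives `[100]`.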

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) ... P(1,5), Bitstream B with pictures P(2,1) ... P(2,5), and a switching picture S(3) between the two streams.]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures

A picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

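[Figure: reliable parameter set exchange between the H.264 Encoder and the H.264 Decoder, each holding parameter sets 1, 2 and 3. Example: Parameter Set 3 holds Video format: NTSC; Motion resolution: 1/4; Enc: CABAC; Frame width: 11. VCL data is then transferred with a reference to PS 3]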

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
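
The two-slice-group example can be sketched in Python as below. This is only an illustration under stated assumptions: a checkerboard slice group map, one number per macroblock (for instance its mean luma), and concealment by averaging the received 4-neighbours. The standard does not prescribe a concealment method, and all function names are hypothetical.

```python
def checkerboard_slice_groups(width, height):
    """Slice group map: MB (x, y) belongs to group (x + y) % 2."""
    return [[(x + y) % 2 for x in range(width)] for y in range(height)]


def conceal_lost_group(mbs, groups, lost_group):
    """Replace each lost MB value by the mean of its received 4-neighbours."""
    height, width = len(mbs), len(mbs[0])
    out = [row[:] for row in mbs]
    for y in range(height):
        for x in range(width):
            if groups[y][x] != lost_group:
                continue  # this MB was received
            neighbours = [
                mbs[ny][nx]
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                if 0 <= nx < width and 0 <= ny < height
                and groups[ny][nx] != lost_group
            ]
            out[y][x] = sum(neighbours) / len(neighbours)
    return out


groups = checkerboard_slice_groups(4, 4)
picture = [[10 * x for x in range(4)] for _ in range(4)]  # horizontal ramp
recovered = conceal_lost_group(picture, groups, lost_group=1)
print(recovered[2][1])  # interior MB rebuilt from four received neighbours
```

With a checkerboard map every 4-neighbour of a lost macroblock lies in the other slice group, which is what makes the concealment possible.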

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed
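
The decoder behaviour described above amounts to a simple selection rule, sketched here in Python. The representation of a lost slice as `None` and the function name are assumptions made for illustration.

```python
def select_slice(primary, redundant_slices):
    """Return the slice to reconstruct, or None if nothing was received.

    A slice lost in transmission is represented by None.
    """
    if primary is not None:
        return primary  # redundant copies are discarded
    for candidate in redundant_slices:
        if candidate is not None:
            return candidate  # fall back to the first redundant copy received
    return None


print(select_slice("primary (low QP)", ["redundant (high QP)"]))
print(select_slice(None, ["redundant (high QP)"]))
```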

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192, 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192, 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6, 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20, 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | One | One | One | Multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol pp, Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows media video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol pp, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on applications of digital image processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol pp, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes, 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
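
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. The six base step sizes and the QP range 0..51 are values commonly tabulated for H.264 and are assumptions here, since this slide does not list them; the function name is hypothetical. The step size doubles for every increase of 6 in QP, i.e. it grows by roughly 12% per QP step.

```python
# Step sizes for QP 0..5 (assumed values, not given on the slide).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a given QP: doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```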

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)
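
As an illustration of the zig-zag (frame) scan, the Python sketch below generates the scan order for a 4x4 block by walking the anti-diagonals in alternating directions and then reads a block in that order. The field (alternate) scan is not shown, and the function names are hypothetical.

```python
def zigzag_order(n=4):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Read a square block of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Typical quantized block: energy concentrated at low frequencies (top-left).
block = [
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(block))
```

The scan groups the non-zero low-frequency coefficients at the front and leaves a long run of trailing zeros, which is what the entropy coder exploits.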

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
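
The first three modes can be sketched in Python as below. This is an illustration only: it assumes both the above and left neighbors are available, uses a rounded mean for DC, and omits the planar mode. The function name and the mode strings are hypothetical; the block size is taken from the length of the neighbor arrays (16 for a full MB).

```python
def predict_intra(mode, above, left):
    """Return an n x n predicted block from neighbouring samples.

    above: the n samples just above the block
    left:  the n samples just left of the block
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        return [list(above) for _ in range(n)]  # copy the row above downwards
    if mode == "horizontal":
        return [[left[y]] * n for y in range(n)]  # copy the left column rightwards
    if mode == "dc":
        dc = (sum(above) + sum(left) + n) // (2 * n)  # rounded mean of neighbours
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


above = [10, 20, 30, 40]
left = [1, 2, 3, 4]
for mode in ("vertical", "horizontal", "dc"):
    print(mode, predict_intra(mode, above, left))
```

An encoder would typically try each mode, compare the prediction with the actual block, and keep the mode with the smallest residual cost.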

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra
• Encoder can design and use customized scaling matrices
• These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$
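
The construction can be sketched in Python as follows. The function name `exp_golomb_encode` and the representation of the codeword as a string of '0'/'1' characters are illustrative choices, not part of the standard.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


for code_num in range(8):
    print(code_num, exp_golomb_encode(code_num))
```

Each codeword produced this way is 2M + 1 bits long, matching the length stated above.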

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$
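
The three decoding steps can be sketched in Python as below, operating on a string of '0'/'1' characters. The function names and the bit-string representation are assumptions made for illustration; a real decoder reads from a packed bitstream.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the next codeword).
    """
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: code_num = 2^M + INFO - 1


def decode_all(bits):
    """Decode a concatenation of codewords into a list of code_num values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


print(decode_all("1" + "010" + "00111"))
```

Because the number of leading zeros announces the codeword length, codewords can be concatenated without separators.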

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only: MB structure in 4:2:2 and 4:4:4 formats

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height)

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB → YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
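
The RGB to YCbCr conversion on this slide can be sketched in Python as below, with KR and KB as parameters so that both sets of constants quoted on the slide can be tried. The function name and the assumption that R, G and B are normalized to the range 0..1 are illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y, Cb, Cr using the constants KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))                      # white
print(rgb_to_ycbcr(1.0, 0.0, 0.0))                      # pure red
print(rgb_to_ycbcr(0.0, 1.0, 0.0, kr=0.299, kb=0.114))  # green with BT.601 constants
```

The factors 2(1 - KB) and 2(1 - KR) normalize Cb and Cr to the range -0.5..0.5. Because the results are real-valued, storing them as integers introduces the rounding error that motivates the YCgCo alternative.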

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \dfrac{1}{2}\left[G + \dfrac{(R + B)}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{(R + B)}{2}\right]$

$C_o = \dfrac{(R - B)}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
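
The forward and inverse conversions can be checked with a short Python sketch. Python's `>>` on integers is an arithmetic right shift, which matches the definition on the slide; the function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer color transform (lifting steps from the slide)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


pixel = (255, 0, 0)
coded = rgb_to_ycgco(*pixel)
print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly what the corresponding forward step added, so the round trip is lossless for any integer input, at the cost of one extra bit of range for Cg and Co.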

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

New Profiles in the H.264/AVC FRExt Amendment

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If at a given level the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO)

Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

- H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).
- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency).
- ATSC has selected AC-3 plus from Dolby.
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.


-28-

Video Coding Algorithm: Block diagram for H.264 encoder

[Figure: H.264 encoder block diagram. Blocks: Transform & Quantization, Entropy Coding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision. Input: video. Output: bitstream.]

-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: H.264 decoder block diagram. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Compensation, Intra Prediction, Intra/Inter Mode Selection. Input: bitstream. Output: video.]

-30-

VC Algorithm: Intra Prediction

- Exploits spatial redundancy between adjacent macroblocks in a frame
- 4 x 4 luma block: 9 prediction modes, i.e., 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4 x 4 block of samples a ... p (rows "a b c d", "e f g h", "i j k l", "m n o p") with neighbors A B C D E F G H above, I J K L to the left, and M at the top-left corner. Arrows show the prediction directions of modes 0, 1, 4, 5, 6 (first panel) and modes 3, 7, 8 (second panel).]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones.

-31-

VC Algorithm: Intra Prediction, example of a 4 x 4 luma block

- Mode 4 (diagonal down-right): samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively
- Mode 8 (horizontal-up): samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively

[Figure: two copies of the 4 x 4 block a ... p with neighbors A ... M, showing the prediction direction for mode 4 and for mode 8.]
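The two example formulas above can be written with integer adds and shifts. The following is a minimal sketch, not the reference implementation: the helper names and the sample values in the usage note are hypothetical, and only samples a and d of modes 4 and 8 are covered.

```python
def round_avg2(x, y):
    """round(x/2 + y/2) using integer arithmetic."""
    return (x + y + 1) >> 1


def round_avg3(x, y, z):
    """round(x/4 + y/2 + z/4) using integer arithmetic."""
    return (x + 2 * y + z + 2) >> 2


def predict_a_d(mode, nb):
    """Return predicted samples (a, d) for mode 4 or mode 8.

    nb maps neighbor names ("A".."D", "I".."L", "M") to reconstructed values.
    """
    if mode == 4:  # diagonal down-right
        return (round_avg3(nb["I"], nb["M"], nb["A"]),
                round_avg3(nb["B"], nb["C"], nb["D"]))
    if mode == 8:  # horizontal-up
        return (round_avg2(nb["I"], nb["J"]),
                round_avg3(nb["J"], nb["K"], nb["L"]))
    raise ValueError("only modes 4 and 8 are sketched here")


# Hypothetical neighbor values, chosen only for illustration.
neighbors = {"M": 100, "A": 104, "B": 108, "C": 112, "D": 116,
             "I": 96, "J": 92, "K": 88, "L": 84}
```

With these sample neighbors, `predict_a_d(4, neighbors)` gives (100, 112) and `predict_a_d(8, neighbors)` gives (94, 88).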

-32-

VC Algorithm: Intra Prediction, 16 x 16 luma

- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance: a linear "plane" function is fitted to the upper (H) and left-side (V) samples
- (8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction.]

-33-

VC Algorithm: Intra Prediction, chroma

- Chroma always operates using full-MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4
- Similar to the 16x16 luma block, but with a different mode order
- 4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction, prediction of variable block size

- An MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small sizes suit detailed areas
- The two partition levels cannot be mixed, i.e., an MB cannot have 16x8 and 4x8 partitions
- When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction, sub-pel motion compensation

- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms integer-pel MC at high bit rates and high resolutions
- Motion vector accuracy: 1/4 pel (6-tap filter)

[Figure: encoder block diagram with the Motion Estimation and Motion Compensation blocks emphasized.]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with partition indices 0 to 3.]

-37-

VC Algorithm: Inter Prediction, sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels.]

-38-

VC Algorithm: Inter Prediction, sample interpolation

- Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: interpolation grid. Integer-pel samples are labeled with capital letters (A ... U; the row through the current position is E F G H I J); half- and quarter-pel positions are labeled with lowercase letters (a ... s, aa ... hh).]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
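The two interpolation formulas above translate directly into integer arithmetic. The sketch below is illustrative only: the function names are hypothetical, 8-bit samples are assumed, and clipping to the 0..255 range is added as an assumption because the filter has negative taps.

```python
def clip8(value):
    """Clamp to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample b between G and H: 6-tap filter (1, -5, 20, 20, -5, 1)/32."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1
```

For a flat row of samples at value 100, `half_pel` returns 100; across a sharp edge (0, 0, 0, 255, 255, 255) it returns 128, and `quarter_pel(0, 128)` then gives 64.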

-39-

VC Algorithm: Inter Prediction, adaptive deblocking filter

- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)

[Figure: two 16x16 macroblocks showing the vertical and horizontal edges that are filtered, for luma and for chroma.]

-40-

VC Algorithm: Inter Prediction, management of multiple reference pictures

- Takes care of marking some stored pictures as "unused" and of deciding which pictures to delete from the buffer
- Reference pictures are managed as short-term and long-term pictures

[Figure: encoder block diagram with the Picture Buffering blocks emphasized.]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using only additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks in a macroblock:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients of the 4 x 4 luma blocks:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT, and (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2, and 4:4:4 formats.

-42-

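VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$
Y = \left(C_f\,X\,C_f^{T}\right)\otimes E_f
  = \left(
    \begin{bmatrix} 1 & 1 & 1 & 1\\ 2 & 1 & -1 & -2\\ 1 & -1 & -1 & 1\\ 1 & -2 & 2 & -1 \end{bmatrix}
    \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03}\\ x_{10} & x_{11} & x_{12} & x_{13}\\ x_{20} & x_{21} & x_{22} & x_{23}\\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
    \begin{bmatrix} 1 & 2 & 1 & 1\\ 1 & 1 & -1 & -2\\ 1 & -1 & -1 & 2\\ 1 & -2 & 1 & -1 \end{bmatrix}
    \right)\otimes
    \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2\\ ab/2 & b^2/4 & ab/2 & b^2/4\\ a^2 & ab/2 & a^2 & ab/2\\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$

$$
a = \tfrac{1}{2},\qquad b = \sqrt{\tfrac{2}{5}},\qquad d = \tfrac{1}{2}
$$

Here $\otimes$ implies element-by-element multiplication.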
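The core part of the forward transform, W = Cf X CfT, needs only integer arithmetic; the element-by-element scaling by Ef is left out here because the slides state that it is merged with quantization. A minimal pure-Python sketch follows (the helper names are hypothetical, and a real codec would use a butterfly of adds and shifts rather than general matrix products):

```python
# Core forward transform matrix Cf of the 4x4 integer DCT.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transpose(m):
    """Transpose of a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core(x):
    """Unscaled transform W = Cf * X * Cf^T of a 4x4 residual block."""
    return matmul(matmul(CF, x), transpose(CF))


# A flat block: all energy should end up in the DC coefficient W[0][0].
flat = [[10] * 4 for _ in range(4)]
w = forward_core(flat)
```

For the flat block, `w[0][0]` is 160 (16 samples of value 10) and every other coefficient is 0.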

-43-

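4x4 Inverse IntDCT

$$
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2\\ ab/2 & b^2/4 & ab/2 & b^2/4\\ a^2 & ab/2 & a^2 & ab/2\\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.

$$
X = C_i^{T}\,(Y \otimes E_i)\,C_i
$$

Here

$$
Y \otimes E_i = [Y] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab\\ ab & b^2 & ab & b^2\\ a^2 & ab & a^2 & ab\\ ab & b^2 & ab & b^2 \end{bmatrix}
$$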

-44-

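VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform.

$$
Y_D = \left[\,\frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03}\\ x_{D10} & x_{D11} & x_{D12} & x_{D13}\\ x_{D20} & x_{D21} & x_{D22} & x_{D23}\\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & -1 & 1\\ 1 & -1 & 1 & -1 \end{bmatrix}
\right]
$$

where $[\,\cdot\,]$ denotes rounding to the nearest integer.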

-45-

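VC Algorithm: Transform

Chroma DC coefficients (with any intra prediction mode and the (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients.

$$
Y_D =
\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01}\\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}
$$

[Figure: chroma block numbering for the 4:2:0 format. Blocks 16 and 17 are the 2x2 DC blocks of U and V; blocks 18 to 21 and 22 to 25 are the AC blocks of U and V.]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.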

-46-

VC Algorithm: Transform

Block diagram emphasizing the transform

[Figure: encoder block diagram with the Transform & Quantization block emphasized.]

- 4 x 4 integer DCT transform

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1\\ 2 & 1 & -1 & -2\\ 1 & -1 & -1 & 1\\ 1 & -2 & 2 & -1 \end{bmatrix}
$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

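VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

$$
Y_{ij} = \operatorname{round}\!\left(X_{ij}\,\frac{SF_{ij}}{Q_{step}}\right)
$$

$$
X'_{ij} = Y_{ij}\cdot Q_{step}\cdot SF_{ij}
$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 values, by 6 for each additional bit of decoded sample.
- SF: scaling term (post-scaling at the encoder, pre-scaling at the decoder)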
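The doubling rule above can be made concrete with a small sketch. The six base step sizes for QP 0 to 5 (0.625 up to 1.125) are the values commonly tabulated for H.264 and are quoted here from memory, so treat them as an assumption; the `quantize` and `rescale` helpers are hypothetical and use a scaling term of 1 for simplicity.

```python
import math

# Assumed Qstep values for QP = 0..5; every further +6 in QP doubles Qstep.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, qp, sf=1.0):
    """Y = round(X * SF / Qstep), rounding half away from zero."""
    v = abs(x) * sf / qstep(qp)
    return int(math.copysign(math.floor(v + 0.5), x))


def rescale(y, qp, sf=1.0):
    """X' = Y * Qstep * SF (decoder-side reconstruction)."""
    return y * qstep(qp) * sf
```

For example, `qstep(0)` is 0.625, `qstep(6)` is 1.25, and `qstep(51)` is 224.0; a coefficient of 100 quantized at QP 28 (Qstep 16) becomes level 6 and is reconstructed as 96.0.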

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization. The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only; the Intra (16x16) prediction mode only), inverse quantization and pre-scaling (rescale), inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0 2  8 12
1 5  9 13
3 6 10 14
4 7 11 15
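The two scan tables above give, for each position of the 4 x 4 block, its index in the one-dimensional coefficient list. A short sketch of applying them (the function name and the sample block are hypothetical):

```python
# Scan-order tables copied from the slide: entry [r][c] is the position of
# coefficient (r, c) in the scanned one-dimensional list.
ZIGZAG_SCAN = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_SCAN = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, order):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


# Hypothetical quantized block with energy in the low frequencies.
block = [
    [9, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
```

The zig-zag scan of this block starts 9, 3, 2, 1, 1 and the alternate scan starts 9, 2, 3, 1, 0, 1; all remaining entries are zero in both cases.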

-50-

Exponential Golomb codes: used for data elements other than transform coefficients. These codes are fixed and are also called Universal Variable Length Codes (UVLC).

-51-

These are variable-length codes with a regular construction:

[M zeroes] [1] [INFO]

- INFO is an M-bit field carrying information
- The first codeword has no leading zero or trailing INFO
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on
- The length of each Exp-Golomb codeword is (2M + 1) bits

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero.)
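The construction and decoding rules above fit in a few lines. This is a minimal sketch that represents codewords as strings of "0" and "1" for readability; the function names are hypothetical, and a real bitstream writer would pack bits instead.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1               # step 3: 2^M + INFO - 1
```

The first codewords produced are "1", "010", "011", "00100", ..., so codewords 1 and 2 carry one INFO bit and codewords 3 to 6 carry two, as stated above.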

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.

-53-

VC Algorithm: Entropy Coding, CAVLC

- CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order:  0  1  2 3 4 5 6  7 8 9 ... 16

- Step 1: encode the number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table. Number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
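The five steps of the example can be reproduced by extracting the values that CAVLC would then map to codewords. The sketch below stops short of the actual code tables, which are not given in these slides; the function name is hypothetical, and the magnitudes chosen for c0, c1, c2 (5, 3, 2) are assumptions made only so that the example is concrete.

```python
def cavlc_elements(coeffs):
    """Return the values coded in CAVLC steps 1-5 for a scanned coefficient list."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    reverse = nonzero[::-1]

    # Step 1: total nonzero coefficients and trailing +/-1 values (at most 3).
    trailing_ones = 0
    for _, c in reverse:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if c < 0 else "+" for _, c in reverse[:trailing_ones]]
    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [c for _, c in reverse[trailing_ones:]]
    # Step 4: zeros located before the last nonzero coefficient.
    last = nonzero[-1][0] if nonzero else -1
    total_zeros = sum(1 for c in coeffs[:last + 1] if c == 0)

    # Step 5: run of zeros before each coefficient, in reverse order, until
    # all zeros are accounted for (the final run is implied, so not listed).
    runs, zeros_left = [], total_zeros
    for k in range(len(reverse) - 1):
        if zeros_left == 0:
            break
        run = reverse[k][0] - reverse[k + 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": len(nonzero), "trailing_ones": trailing_ones,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


# The slide's sequence with assumed magnitudes c0 = 5, c1 = 3, c2 = 2.
example = [5, 3, 2, 0, 1, 1, 0, -1] + [0] * 8
result = cavlc_elements(example)
```

For this input the sketch reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros before the last coefficient, and runs 1, 0, 1, matching the steps listed above.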

-55-

VC Algorithm: Entropy Coding, CABAC

- CABAC utilizes arithmetic coding; in addition, in order to achieve good compression, the probability model for each symbol element is updated
- Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive.

-57-

VC Algorithm: B Slice

- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size, and motion vector data from the subsequent inter picture
- Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: current picture between previous and next pictures, showing three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, direct mode

- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference along the time axis, with temporal distances tb and td, the direct-mode partition in the current picture, the co-located partition in the List 1 reference, and the vectors mvCol, mvL0, and mvL1.]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture.
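The two scaling formulas above can be checked numerically. This is a simplified floating-point sketch with a hypothetical function name; an actual codec performs the scaling in fixed-point arithmetic, which is not reproduced here.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located vector mvCol into (mvL0, mvL1).

    mv_col is an (x, y) pair; tb and td are the temporal distances
    shown in the figure (td must be non-zero).
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between the two references (tb = 1, td = 2).
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

With mvCol = (8, -4), tb = 1, and td = 2 this yields mvL0 = (4.0, -2.0) and mvL1 = (-4.0, 2.0); note that mvL0 - mvL1 equals mvCol.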

-60-

VC Algorithm: B Slice, weighted prediction

- Different weights of the reference signals handle gradual transitions from scene to scene, e.g., "fade to black" (the luma samples of the scene gradually approach zero) and "fade from black"
- Different weighted prediction methods are used for a macroblock of a P slice and of a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors.
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
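The weighted combination p = w1 x r1 + w2 x r2 can be sketched on a short list of samples. Everything below is illustrative: the function names are hypothetical, 8-bit samples are assumed, and the implicit weights are a simplified linear interpolation over temporal distance rather than the fixed-point derivation a real codec uses.

```python
import math


def weighted_bipred(r1, r2, w1, w2):
    """Per-sample p = w1*r1 + w2*r2, rounded and clipped to 0..255."""
    return [max(0, min(255, math.floor(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights: the temporally closer reference weighs more.

    tb: distance from the first reference to the current picture,
    td: distance between the two references (assumed convention).
    """
    w2 = tb / td
    return 1.0 - w2, w2


r1 = [100, 200]   # hypothetical samples from the first reference
r2 = [50, 100]    # hypothetical samples from the second reference
```

Equal weights of 0.5 give the plain average [75, 150]; weights of 0.25 each, as might be signalled explicitly during a fade to black, give [38, 75]; and `implicit_weights(1, 4)` returns (0.75, 0.25).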

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) ... P(1,5) and Bitstream B with pictures P(2,1) ... P(2,5), connected by the switching slice S(3).]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing.

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices SP/SI (only in Extended Profile)
- Data partitioning (only in Extended Profile)
- Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C
2. A has the slice header and the header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter-coded MBs
5. Each partition A, B, and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, exchanged through a reliable parameter set exchange; VCL data is transferred with reference to parameter set 3. Example contents of parameter set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11.]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
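A dispersed allocation with two slice groups can be pictured as a checkerboard. The sketch below assumes that pattern purely for illustration (the standard defines several slice-group map types, which are not covered in these slides); the function names are hypothetical.

```python
def checkerboard_slice_groups(mb_width, mb_height):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]


def neighbors_in_other_group(group_map, x, y):
    """Count the left/right/top/bottom neighbors that belong to the other group."""
    height, width = len(group_map), len(group_map[0])
    own = group_map[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and group_map[ny][nx] != own:
            count += 1
    return count


# A small picture of 4 x 3 macroblocks.
group_map = checkerboard_slice_groups(4, 3)
```

In this map an interior macroblock such as (1, 1) has all four of its neighbors in the other slice group, so if its own group is lost the concealment can interpolate from four correctly received neighbors.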

-66-

Error Resilience: Redundant Slice

- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. If the primary slice is missing, however, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter "T" indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T 2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20; Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the "Tempete" CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the "Paris" CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards (values listed as MPEG-1; MPEG-2; MPEG-4 Part 2 (Visual); H.264/MPEG-4 Part 10)

- Macroblock size: 16x16; 16x16 (frame mode), 16x8 (field mode); 16x16; 16x16
- Block size: 8x8; 8x8; 16x16, 16x8, 8x8; 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
- Transform: 8x8 DCT; 8x8 DCT; 8x8 DCT/Wavelet; 4x4, 8x8 Int DCT and 4x4, 2x2 Hadamard
- Quantization: scalar quantization with step size of constant increment; scalar quantization with step size of constant increment; vector quantization; scalar quantization with step size increasing at the rate of 12.5%
- Entropy coding: VLC; VLC; VLC; VLC, CAVLC, CABAC
- Motion estimation & compensation: yes; yes; yes; yes, more flexible, up to 16 MVs per MB
- Playback & random access: yes; yes; yes; yes

-74-

Conclusions

Comparison of standards (continued; values listed as MPEG-1; MPEG-2; MPEG-4 Part 2 (Visual); H.264/MPEG-4 Part 10)

- Pel accuracy: integer, 1/2-pel; integer, 1/2-pel; integer, 1/2-pel, 1/4-pel; integer, 1/2-pel, 1/4-pel
- Profiles: no; 5; 8; 4
- Reference picture: one; one; one; multiple
- Bidirectional prediction mode: forward/backward; forward/backward; forward/backward; forward/forward, forward/backward, backward/backward
- Picture types: I, P, B, D; I, P, B; I, P, B; I, P, B, SP, SI
- Error robustness: synchronization & concealment; data partitioning, FEC for important packet transmission; synchronization, data partitioning, header extension, reversible VLCs; data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
- Transmission rate: up to 1.5 Mbps; 2-15 Mbps; 64 kbps - 2 Mbps; 64 kbps - 240 Mbps
- Compatibility with previous standards: n/a; yes; yes; no
- Encoder complexity: low; medium; medium; high

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products. Related companies:

- UBVideo: http://www.ubvideo.com
- LSI Logic: http://www.lsilogic.com
- Microsoft: http://www.microsoft.com
- Envivio: http://www.envivio.com
- Broadcom: http://www.broadcom.com
- Nagravision: http://www.nagravision.com
- Philips: http://www.philips.com
- Polycom: http://www.polycom.com
- PixelTools Corporation: http://www.pixeltools.com
- Amphion: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation: http://www.ligos.com
- LifeSize: http://www.lifesize.com
- Netvideo: http://www.netvideo.com
- Motorola: http://www.motorola.com
- Vanguard Software Solutions: http://www.vsofts.com
- STMicroelectronics: http://us.st.com
- MainConcept: http://www.mainconcept.com
- Impact Labs Inc.: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG: http://www.mpeg.org
- JVT: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications: H. Schwartz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bitstreams and specifications

-79-

Contacts for Further Information

- JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com; www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwartz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups, JVT-EXPERTS list (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).
httpwwwatemecomproductsh264php

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at
httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtml
Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004:
httpwwwdoom9orgcodecs-104-1htm
The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] YCgCo
High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
   Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
   MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: vertical, horizontal, DC, planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for the 4x4 and 8x8 transforms
• Separate matrices for inter and intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding (FRExt only)

Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

[Figure: 8x8 frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies the reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword (code_num = 0) has no leading zeros and no trailing INFO field.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
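The codeword construction and the three decoding steps can be sketched in a few lines of Python. This is only an illustration of the unsigned mapping from code_num to a bit string; the function names are made up for this example, and the bitstream is modeled as a string of '0' and '1' characters rather than packed bits.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)); bit_length() avoids floating point.
    m = (code_num + 1).bit_length() - 1
    info = code_num + 1 - (1 << m)
    # [M zeros][1][INFO], with INFO written using exactly M bits.
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of bits.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":          # step 1: count leading zeros up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # step 2
    return (1 << m) + info - 1, 2 * m + 1                  # step 3


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Running the loop lists the first seven codewords, which makes the pattern of one-bit and two-bit INFO fields easy to check against the description.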

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

[Figure (FRExt only): MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y is 16 wide, Cb and Cr are each 8 wide, all 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
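As a quick numerical check of the luma and chroma definitions on this slide, the following Python sketch converts normalized RGB to Y, Cb, Cr and back. It assumes floating-point samples in the range 0 to 1 and ignores the offsets and integer scaling used for real 8-bit video; the function names are illustrative only.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr, kr=0.2126, kb=0.0722):
    """Invert rgb_to_ycbcr for the same kr and kb."""
    b = y + 2.0 * (1.0 - kb) * cb
    r = y + 2.0 * (1.0 - kr) * cr
    g = (y - kr * r - kb * b) / (1.0 - kr - kb)
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                  # pure blue: Cb is 0.5
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, 0.299, 0.114))    # BT.601 weights
```

Passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned on the slide. Because the result must be rounded to integers in a real system, the round trip is generally not exact, which is the rounding problem the YCgCo transform addresses.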

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
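The lifting steps of this integer transform (Co from R and B, then t, Cg and Y, and the mirrored inverse) can be checked for exact reversibility with a short Python sketch. Python's >> on integers is an arithmetic shift, matching the definition on the slide; the function names are chosen for this example only.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer color transform using lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes each lifting step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (200, 17, 93)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded) == sample)
```

Since every step adds or subtracts a value that the inverse can recompute exactly, no rounding error accumulates. The price is the extra bit of dynamic range in Cg and Co, which can be negative.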

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond the Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: wwwmpeglacom
2. Via Licensing: wwwvialicensingcom

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-29-

Video Coding Algorithm: Block diagram for H.264 decoder

[Figure: H.264 decoder block diagram with bitstream input and video output. Blocks: Entropy Decoding, Inverse Quantization & Inverse Transform, Intra Prediction, Motion Compensation, Intra/Inter Mode Selection, adder (+), Deblocking Filter, Picture Buffering]

-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame

4x4 luma block: 9 prediction modes, 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

[Figure: 4x4 block of samples a-p with the neighboring samples M, A-H above and I-L to the left, and arrows showing the prediction directions of modes 0, 1, 3, 4, 5, 6, 7 and 8]

Samples a, b, ..., p are the predicted ones for the current block; the above and left samples A, B, ..., M are previously reconstructed ones

-31-

VC Algorithm: Intra Prediction

Example of a 4x4 luma block

Samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) for mode 4

Samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4) for mode 8

[Figure: the 4x4 block a-p with its neighbors A-M, shown for mode 4 (diagonal down-right) and mode 8 (horizontal-up)]
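To make the sample formulas concrete, here is a small Python sketch of three of the nine 4x4 modes (vertical, horizontal, DC) together with the two example samples a and d for modes 4 and 8. It assumes integer samples, that all neighbors are available, and that round() in the formulas means adding half the divisor before an integer shift; the function names are invented for this illustration.

```python
def predict_4x4(mode, above, left):
    """Predict a 4x4 block from above = [A, B, C, D] and left = [I, J, K, L].

    Only modes 0 (vertical), 1 (horizontal) and 2 (DC) are sketched.
    Returns four rows of four samples.
    """
    if mode == 0:                       # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == 1:                       # copy the left column rightwards
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:                       # rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


def mode4_samples_a_d(m, a, b, c, d, i):
    """Samples a and d for mode 4 (diagonal down-right)."""
    sample_a = (i + 2 * m + a + 2) >> 2     # round(I/4 + M/2 + A/4)
    sample_d = (b + 2 * c + d + 2) >> 2     # round(B/4 + C/2 + D/4)
    return sample_a, sample_d


def mode8_samples_a_d(i, j, k, l):
    """Samples a and d for mode 8 (horizontal-up)."""
    sample_a = (i + j + 1) >> 1             # round(I/2 + J/2)
    sample_d = (j + 2 * k + l + 2) >> 2     # round(J/4 + K/2 + L/4)
    return sample_a, sample_d


if __name__ == "__main__":
    print(predict_4x4(2, [10, 10, 10, 10], [20, 20, 20, 20]))
    print(mode4_samples_a_d(8, 4, 1, 2, 3, 4), mode8_samples_a_d(3, 4, 8, 4))
```

An encoder would form all candidate predictions this way and keep the mode whose residual is cheapest to code.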

-32-

VC Algorithm: Intra Prediction

16x16 luma: 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: (8x8) for the 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3), similar to the 16x16 luma block but with a different mode order

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
• Prediction of variable block sizes
• Sub-pel motion compensation
• Deblocking filter
• Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16x16 MB, 4 cases for an 8x8 sub-MB
- Large partition sizes suit homogeneous areas, small partition sizes suit detailed areas

The two partition levels cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
• Better compression performance than integer-pel MC
• At the expense of increased complexity
• Outperforms at high bit rates and high resolutions

[Figure: H.264 encoder block diagram with video input and bitstream output. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering. Highlighted: motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the partitions in each case numbered from 0]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
• A distinct MV can be sent for each sub-MB partition
• ME can be based on multiple pictures that lie in the past or in the future in display order
• The reference picture for ME is selected at the MB partition level
• Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-position samples (upper-case letters A-U) with the half- and quarter-pel positions between them (lower-case letters a-s and aa-hh)]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
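The two interpolation formulas translate directly into integer arithmetic. The Python sketch below computes a half-sample value such as b from six integer samples in a row, and a quarter-sample value such as a as a rounded average. It assumes 8-bit samples, so the filtered result is clipped to 0..255; the clipping and the function names are assumptions of this example rather than something stated on the slide.

```python
def clip255(value):
    """Clip to the 8-bit sample range (assumed bit depth for this sketch)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-sample between g and h using the (1, -5, 20, 20, -5, 1)/32 filter."""
    # Adding 16 before the shift by 5 rounds to the nearest integer.
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]          # a sharp vertical edge
    b = half_pel(*row)
    a = quarter_pel(row[2], b)              # between G and b
    print(b, a)
```

The negative taps let the filter overshoot near sharp edges, which is why the clip is needed.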

-39-

VC Algorithm: Inter Prediction

Adaptive deblocking filter
• Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
• Filtering is applied to the horizontal or vertical edges of the 4x4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges to be filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
• Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram, highlighting the management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
• Integer transform: multiplier free, using additions and shifts in 16-bit arithmetic
• Hierarchical structure: 4x4 integer DCT + Hadamard transform

Coding order of the 4x4 luma blocks:

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4x4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT transform

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

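VC Algorithm: Transform

4x4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

where $a = \frac{1}{2}$, $b = \sqrt{\frac{2}{5}}$, $d = \frac{1}{2}$, and $\otimes$ implies element-by-element multiplication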
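A minimal Python sketch of the forward 4x4 transform follows: an integer core Cf X Cf^T and a separate element-by-element scaling by Ef with a = 1/2 and b = sqrt(2/5). Plain nested lists are used instead of a matrix library, and the scaling is applied in floating point purely for illustration; in a real codec it is folded into quantization. All function names are made up for this example.

```python
import math

# Forward core matrix Cf of the 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(p[r][k] * q[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(x):
    """Integer part of the transform: W = Cf X Cf^T."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_matrix():
    """Ef with entries a^2, ab/2 and b^2/4."""
    a, b = 0.5, math.sqrt(2.0 / 5.0)
    even = [a * a, a * b / 2, a * a, a * b / 2]
    odd = [a * b / 2, b * b / 4, a * b / 2, b * b / 4]
    return [even, odd, even, odd]


def forward_transform(x):
    """Y = (Cf X Cf^T) scaled element by element with Ef."""
    w = core_transform(x)
    ef = scaling_matrix()
    return [[w[r][c] * ef[r][c] for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(core_transform(flat)[0][0], forward_transform(flat)[0][0])
```

With the scaling applied, the transform is orthonormal, so the sum of squared coefficients equals the sum of squared input samples. That property is a convenient sanity check for the matrices.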

-43-

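4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, the quantization step (QP) is embedded in the matrices $E_f$ and $E_i$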

-44-

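VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \operatorname{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer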
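The second-level DC transform can be sketched in Python as two matrix products with a Hadamard matrix. The example covers the 4x4 luma DC case with the division by 2 and, as an aside, the 2x2 chroma DC case used for 4:2:0. The rounding of exact halves (upwards here) and the function names are assumptions of this illustration.

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

H2 = [[1, 1],
      [1, -1]]


def _matmul(p, q):
    """Multiply two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[r][k] * q[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def hadamard_luma_dc(dc):
    """round((H4 * dc * H4) / 2) for the 4x4 array of luma DC terms."""
    w = _matmul(_matmul(H4, dc), H4)
    # (v + 1) >> 1 divides by 2 and rounds halves upwards (an assumption).
    return [[(v + 1) >> 1 for v in row] for row in w]


def hadamard_chroma_dc(dc):
    """H2 * dc * H2 for the 2x2 array of chroma DC terms (4:2:0)."""
    return _matmul(_matmul(H2, dc), H2)


if __name__ == "__main__":
    print(hadamard_luma_dc([[4] * 4 for _ in range(4)]))
    print(hadamard_chroma_dc([[1, 2], [3, 4]]))
```

A macroblock with identical DC terms collapses to a single non-zero value in the top-left corner, which is the extra compaction this stage buys for smooth Intra 16x16 blocks.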

-45-

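VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2x2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks for U and V: the 2x2 DC blocks (16, 17) and the AC blocks (18-21 and 22-25)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased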

-46-

VC Algorithm: Transform

[Figure: H.264 encoder block diagram, emphasizing the transform]

- 4x4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of the DC coefficients for 16x16 intra luma and 8x8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
• Encoder: post-scaling and quantization
• Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij} \operatorname{round}\left( \frac{SF_{ij}}{Qstep} \right)$$

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, set by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
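The statement that Qstep doubles for every increment of 6 in QP can be illustrated with a short Python sketch. The six base step sizes for QP 0 to 5 are not given on the slide; the values below are the ones commonly quoted for H.264 and should be treated as an assumption here. The scaling term SF is left out, so this is plain scalar quantization and not the combined operation of the standard.

```python
# Step sizes for QP = 0..5 (commonly quoted values; an assumption here).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    # Every increment of 6 in QP doubles the step size.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(coefficient, qp):
    """Scalar quantization with rounding to nearest (scaling term omitted)."""
    sign = -1 if coefficient < 0 else 1
    return sign * int(abs(coefficient) / qstep(qp) + 0.5)


def rescale(level, qp):
    """Inverse quantization: multiply the level by the step size."""
    return level * qstep(qp)


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    print(quantize(100, 28), rescale(quantize(100, 28), 28))
```

The logarithmic relation means an increase of 1 in QP raises the step size by roughly 12%, which gives rate control a fine and perceptually even adjustment range.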

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization flow.
Encoder part: input block, forward transform, post-scaling and quantization, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), encoder output.
Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.
Note on the figure: rescale and inverse transform, Intra (16x16) prediction mode only]

-49-

VC Algorithm: Entropy Coding

• All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
• Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
• Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
• Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main profile

(a) Zig-zag scan:

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan:

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
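The two 4x4 tables give, for each block position, its index in the scanned sequence. A Python sketch that applies either table to a block of quantized coefficients is shown below; the table values are copied from the slide, while the function and variable names are made up for this example.

```python
# Scan index of each (row, column) position, as in the two tables.
ZIGZAG_INDEX = [[0, 1, 5, 6],
                [2, 4, 7, 12],
                [3, 8, 11, 13],
                [9, 10, 14, 15]]

ALTERNATE_INDEX = [[0, 2, 8, 12],
                   [1, 5, 9, 13],
                   [3, 6, 10, 14],
                   [4, 7, 11, 15]]


def scan(block, index_table):
    """Reorder a 4x4 block into a list of 16 values in scan order."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[index_table[r][c]] = block[r][c]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0],
             [2, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
    print(scan(block, ZIGZAG_INDEX))
    print(scan(block, ALTERNATE_INDEX))
```

Both orders push the zero-valued high-frequency coefficients to the end of the list. The alternate scan visits positions down the first column earlier, which suits field-coded blocks.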

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining nonzero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the number of total nonzero coefficients and of 1 or -1 values (trailing ones) from a look-up table
number of total nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
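The five quantities of this example can be derived mechanically from the scanned coefficient list. The Python sketch below extracts them but does not perform the actual table-based bit coding; the dictionary keys and function name are illustrative, and the concrete values used for c0, c1 and c2 in the demo are hypothetical stand-ins for the symbolic coefficients of the slide.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols from scanned coefficients (low to high frequency)."""
    nonzero_pos = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero_pos)

    # Step 1/2: up to three +/-1 values at the high-frequency end.
    trailing_signs = []
    for i in reversed(nonzero_pos):
        if abs(coeffs[i]) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("+" if coeffs[i] > 0 else "-")
        else:
            break

    # Step 3: remaining levels, in reverse order.
    remaining = nonzero_pos[:total_coeff - len(trailing_signs)]
    levels = [coeffs[i] for i in reversed(remaining)]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nonzero_pos[-1] + 1 - total_coeff if nonzero_pos else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once all zeros are accounted for.
    runs = []
    zeros_left = total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero_pos[k] - nonzero_pos[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeff": total_coeff,
        "trailing_ones": len(trailing_signs),
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    # c0 = 7, c1 = -4, c2 = 3 are hypothetical values for the example.
    example = [7, -4, 3, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(example))
```

For the example sequence this reproduces the figures on the slide: 6 nonzero coefficients, 3 trailing ones with signs -, +, +, a total of 2 zeros, and runs 1, 0, 1.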

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
• Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
• Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
• Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
• Two forward references: proper for a region just before a scene change
• Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
• Forward/backward pair of bi-directional prediction
• The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the co-located partition with motion vector mvCol, the direct-mode partition with mvL0 and mvL1, and the temporal distances tb and td]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture
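The temporal scaling of the co-located motion vector can be tried out with a few lines of Python. This sketch uses plain floating-point division for clarity; the standard specifies a fixed-point version of the same scaling, which is not reproduced here. Motion vectors are modeled as (x, y) tuples and the function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located vector mv_col.

    tb: temporal distance from the List 0 reference to the current picture.
    td: temporal distance from the List 0 reference to the List 1 reference.
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    mvx, mvy = mv_col
    mv_l0 = (tb * mvx / td, tb * mvy / td)
    mv_l1 = (-(td - tb) * mvx / td, -(td - tb) * mvy / td)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

By construction mvL0 minus mvL1 equals mvCol, so the two derived vectors describe the same linear motion as the co-located vector, split at the position of the current picture. No motion vector needs to be transmitted for a direct-mode partition.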

-60-

VC Algorithm: B Slice

Weighted prediction
• Different weights of reference signals for gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
• Different weighted prediction methods for a macroblock of a P slice or B slice
• A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
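The weighted sum p = w1 r1 + w2 r2 is easy to experiment with in Python. In the sketch below the explicit case simply takes the two weights as arguments, while the implicit case uses a simplified linear rule in which the temporally closer reference gets the larger weight. That rule, the 8-bit clipping and the function names are assumptions of this example; the standard defines the implicit weights in fixed-point arithmetic with additional conditions.

```python
def implicit_weights(tb, td):
    """Simplified implicit weights (w1, w2) from temporal distances.

    tb: distance from reference 1 to the current picture.
    td: distance from reference 1 to reference 2.
    """
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_bipred(r1, r2, w1, w2, max_value=255):
    """Per-sample p = w1 * r1 + w2 * r2, rounded and clipped."""
    out = []
    for s1, s2 in zip(r1, r2):
        p = int(w1 * s1 + w2 * s2 + 0.5)
        out.append(max(0, min(max_value, p)))
    return out


if __name__ == "__main__":
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2)
    print(weighted_bipred([100, 200], [50, 0], w1, w2))
```

During a fade to black the second reference is darker than the first, and weights that follow the temporal position track the fade far better than a fixed average of the two references.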

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
• SI slice: a switched slice, coded similarly to an I slice

[Figure: bitstream A with pictures P(1,1) ... P(1,5), bitstream B with pictures P(2,1) ... P(2,5), and the switching picture S(3) between them]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slices (SP/SI)
• Data partitioning
• Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting
• The sequence parameter set contains all information related to a sequence of pictures
• A picture parameter set contains all information related to all the slices belonging to a single picture
• The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder, each holding parameter sets 1, 2 and 3, linked by a reliable parameter set exchange and by VCL data transfer that refers to PS 3. Example contents of parameter set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
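The two-slice-group example can be illustrated with a toy model. This is a minimal sketch under stated assumptions: each macroblock is reduced to a single number (say, its mean luma), the dispersed allocation is taken to be a checkerboard, and concealment is a plain average of the received neighbors. The function names are hypothetical, and real decoders use more elaborate concealment.

```python
def dispersed_slice_groups(width_mbs, height_mbs):
    """Checkerboard map: slice group id (0 or 1) for every macroblock."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def conceal_lost_group(mb_values, group_map, lost_group):
    """Replace each macroblock of the lost slice group by the mean of its
    up/down/left/right neighbors that belong to a received slice group."""
    h, w = len(group_map), len(group_map[0])
    out = [row[:] for row in mb_values]
    for y in range(h):
        for x in range(w):
            if group_map[y][x] != lost_group:
                continue
            neighbors = [mb_values[ny][nx]
                         for ny, nx in ((y - 1, x), (y + 1, x),
                                        (y, x - 1), (y, x + 1))
                         if 0 <= ny < h and 0 <= nx < w
                         and group_map[ny][nx] != lost_group]
            out[y][x] = sum(neighbors) / len(neighbors) if neighbors else 0
    return out


group_map = dispersed_slice_groups(3, 3)
# 99 marks macroblocks of slice group 1, whose packet was lost.
received = [[10, 99, 30],
            [99, 50, 99],
            [70, 99, 90]]
concealed = conceal_lost_group(received, group_map, lost_group=1)
```

Because the groups interleave, every lost macroblock in this layout has at least two received neighbors to interpolate from, which is the property the slide relies on.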

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
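The decoder behavior described above amounts to a simple preference rule. The following sketch assumes a hypothetical representation in which each received slice is a tuple `(slice_id, is_redundant, payload)`; it is not the syntax of the standard.

```python
def slices_to_decode(received):
    """Pick one representation per slice id, preferring the primary slice
    and falling back to a redundant one only if no primary arrived."""
    chosen = {}
    for slice_id, is_redundant, payload in received:
        current = chosen.get(slice_id)
        # Take the slice if nothing is held yet, or if a primary slice
        # can replace a redundant one that arrived earlier.
        if current is None or (current[0] and not is_redundant):
            chosen[slice_id] = (is_redundant, payload)
    return {sid: payload for sid, (_, payload) in chosen.items()}


# Slice 0 arrived in both forms; the primary of slice 1 was lost.
packets = [(0, False, "primary-0"), (0, True, "redundant-0"),
           (1, True, "redundant-1")]
decoded = slices_to_decode(packets)
```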

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD).

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards (values are listed in the order MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10)

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued; values are listed in the order MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10)

Pel accuracy: Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt
Extensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High-fidelity coding for the next generation of optical discs
Extension for various applications: H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com; www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. Concept Main: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format]; Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
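The logarithmic relation between the quantization parameter and the step size can be illustrated as follows. This is a sketch for 8-bit video (QP from 0 to 51): the step size doubles for every increase of 6 in QP, which is the roughly 12.5% growth per QP step quoted in the comparison table. The six base values are the commonly tabulated step sizes for QP 0 to 5, and the function name is an assumption made here.

```python
# Step sizes for QP = 0..5; every further 6 QP steps doubles them.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a given QP (8-bit video, 0 <= qp <= 51)."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# The step size doubles for every increase of 6 in QP.
ratios = [qstep(qp + 6) / qstep(qp) for qp in range(0, 46)]
```

For instance, `qstep(4)` is 1.0 and `qstep(28)` is 16.0, since 28 is four doublings above QP 4.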

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): vertical, horizontal, DC, planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

The matrix can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

FRExt Only

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning is ordered by decreasing variance so as to maximize the number of zero-valued coefficients along the scan.

[Figure: Frame (zig-zag) scan and Field scan]

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits.

M = floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M
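The construction above translates directly into a small encoder. This is a minimal sketch that returns the codeword as a string of '0'/'1' characters for readability; the function name is an assumption made here, and a real bitstream writer would pack bits instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


codewords = [exp_golomb_encode(n) for n in range(7)]
```

The first codewords come out as 1, 010, 011, 00100, 00101, 00110 and 00111, each of length 2M + 1.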

-112-

Decoding

1. Read in M leading zeros followed by a 1.
2. Read the M-bit INFO field.
3. code_num = 2^M + INFO - 1
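The three decoding steps can be sketched as follows, again on a string of '0'/'1' characters. The function name and the returned position (so that several codewords can be read back to back) are choices made for this illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].
    Returns (code_num, position of the next unread bit)."""
    m = 0
    while bits[pos] == "0":                  # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                                 # skip the terminating '1'
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, pos + m      # step 3: 2^M + INFO - 1


# Three codewords concatenated: "1", "010", "011" -> code_num 0, 1, 2.
stream = "1010011"
values, position = [], 0
while position < len(stream):
    value, position = exp_golomb_decode(stream, position)
    values.append(value)
```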

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players.

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory.

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

[Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to Y Cb Cr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\, R + 0.7152\, G + 0.0722\, B$$

$$C_b = 0.5389\, (B - Y), \qquad C_r = 0.6350\, (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
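The conversion can be checked numerically with a short sketch. It assumes normalized floating-point R, G, B values in the range 0 to 1 and uses the K_R and K_B values quoted on the slide as defaults; the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) using the weights K_R, K_B.
    Cb and Cr are scaled so that they lie in the range -0.5..0.5."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)   # full luma, no chroma
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)    # Cb reaches its maximum
red = rgb_to_ycbcr(1.0, 0.0, 0.0)     # Cr reaches its maximum
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned above. Because the results are generally not integers, storing them at the original bit depth introduces the rounding error that motivates YCgCo.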

-120-

Rounding error in RGB to Y Cb Cr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
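The forward and inverse steps can be exercised directly to confirm that the round trip is exact for integer samples. This is a small sketch with hypothetical function names; Python's `>>` on integers is an arithmetic (sign-preserving) shift, which matches the operation required here.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only additions and arithmetic shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Exhaustive round-trip check on a coarse grid of 8-bit RGB triples.
samples = [(r, g, b) for r in range(0, 256, 17)
           for g in range(0, 256, 17) for b in range(0, 256, 17)]
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(*s)) == s for s in samples)
```

Each inverse step recomputes exactly the shifted value that the forward step added or subtracted, so no information is lost; the price is that Cg and Co need one more bit than the input samples.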

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact:
Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency).
ATSC has selected AC-3 plus from Dolby.
ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-30-

VC Algorithm: Intra Prediction

Exploits spatial redundancy between adjacent macroblocks in a frame.

4 x 4 luma block: 9 prediction modes, i.e. 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8)

a b c d

e f g h

i j k l

m n o p

A B C D

I

J

K

L

M E F G H

mode 1

mode 6

mode 0 mode 5 mode 4

a b c d

e f g h

i j k l

m n o p

A B C D

I

J

K

L

M E F G H

mode 8

mode 3 mode 7

samples a b hellip p the predicted ones for the current block above and left samples A B hellip M previously reconstructed ones

-31-

VC Algorithm: Intra Prediction

Example of 4 x 4 luma block
- Mode 4: samples a and d are predicted by a = round(I/4 + M/2 + A/4) and d = round(B/4 + C/2 + D/4)
- Mode 8: samples a and d are predicted by a = round(I/2 + J/2) and d = round(J/4 + K/2 + L/4)

[Figure: the 4 x 4 block a-p with neighboring samples A-M, showing the prediction direction of mode 4 (one panel) and mode 8 (other panel)]
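The mode 4 and mode 8 formulas on this slide can be sketched in integer arithmetic. This is a minimal illustration, not the reference decoder: the function name `predict_a_d`, the dictionary layout and the sample values are assumptions made for the example, and round() is implemented as the usual add-half-then-shift.

```python
def predict_a_d(mode, n):
    """Predict samples a and d of a 4x4 luma block.

    n maps neighbour labels ('A'..'D', 'I'..'L', 'M') to reconstructed
    sample values. Only the two modes shown on the slide are covered.
    """
    if mode == 4:    # diagonal down-right
        a = (n['I'] + 2 * n['M'] + n['A'] + 2) >> 2   # round(I/4 + M/2 + A/4)
        d = (n['B'] + 2 * n['C'] + n['D'] + 2) >> 2   # round(B/4 + C/2 + D/4)
    elif mode == 8:  # horizontal-up
        a = (n['I'] + n['J'] + 1) >> 1                # round(I/2 + J/2)
        d = (n['J'] + 2 * n['K'] + n['L'] + 2) >> 2   # round(J/4 + K/2 + L/4)
    else:
        raise ValueError("only modes 4 and 8 are sketched here")
    return a, d


# Hypothetical neighbour values, chosen only for illustration.
neighbours = {'M': 120, 'A': 140, 'B': 144, 'C': 150, 'D': 152,
              'I': 100, 'J': 90, 'K': 84, 'L': 80}
print(predict_a_d(4, neighbours))
print(predict_a_d(8, neighbours))
```

The remaining samples of each mode follow the same pattern with other neighbour triples; they are omitted because the slide lists only a and d.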

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
- 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
- Plane works well in areas of smoothly varying luminance
- A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

8 x 8 luma (FRExt only)
- Similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma
- Always operates using full-MB prediction: 8x8 for the 4:2:0 format, 8x16 for 4:2:2, 16x16 for 4:4:4
- Similar to the 16x16 luma block, but with a different mode order
- 4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

- Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas; small ones suit detailed areas
- The two partition levels cannot be mixed, i.e. a MB cannot have 16x8 and 4x8 partitions
- When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
- Better compression performance than integer-pel MC
- Comes at the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram (Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision), annotated: motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with the numbering of the partitions inside each shape]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

- Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
- Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case letters A-U) and interpolated sub-pel samples (lower-case letters a-s and aa-hh); E, F, G, H, I, J lie on one row, with half-pel sample b and quarter-pel sample a next to G]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
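The two interpolation formulas above can be tried out with a few lines of integer arithmetic. This is a sketch under stated assumptions: the function names are illustrative, samples are assumed to be 8-bit (hence the clip to 0..255), and round() is realised as add-half-then-shift.

```python
def clip255(v):
    """Clip an interpolated value to the 8-bit sample range (assumed here)."""
    return max(0, min(255, v))


def half_pel(E, F, G, H, I, J):
    """6-tap FIR half-pel sample: round((E - 5F + 20G + 20H - 5I + J) / 32)."""
    return clip255((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear quarter-pel sample: round((p + q) / 2)."""
    return (p + q + 1) >> 1


# A linear ramp of integer-pel samples E..J (illustrative values).
E, F, G, H, I, J = 10, 20, 30, 40, 50, 60
b = half_pel(E, F, G, H, I, J)   # half-pel sample between G and H
a = quarter_pel(G, b)            # quarter-pel sample between G and b
print(b, a)
```

On a linear ramp the filter returns the midpoint of G and H, which is a quick sanity check that the tap weights sum to 32.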

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: 16x16 macroblock showing the vertical and horizontal edges that are filtered, for luma and for chroma]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, annotated at Picture Buffering: management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, using only additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

[Figure: luma macroblock divided into sixteen 4 x 4 blocks numbered in coding order
   0  1  4  5
   2  3  6  7
   8  9 12 13
  10 11 14 15
 and the 4 x 4 array of their DC coefficients, indexed (00) (01) (02) (03) / (10) ... / (20) ... / (30) ... (33)]

- Assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT
- (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block
- The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT
- Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

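VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[
Y = \left( C_f X C_f^{T} \right) \otimes E_f
  = \left(
    \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
    \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
    \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
  \right) \otimes
  \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

\[ a = \tfrac{1}{2}, \qquad b = \sqrt{\tfrac{2}{5}}, \qquad d = \tfrac{1}{2} \]

\(\otimes\) implies element-by-element multiplication.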
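The forward transform Y = (Cf X CfT) ⊗ Ef can be checked numerically. The sketch below is an illustration in plain Python, not an encoder implementation: the helper names are made up for the example, the core matrix is the integer matrix H shown on the block-diagram slide, and the scaling uses a = 1/2, b = sqrt(2/5) in floating point, whereas a real encoder folds the scaling into quantization.

```python
import math

# Core integer transform matrix Cf (the matrix H on the block-diagram slide).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    """4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Multiplier-free part: Cf * X * Cf^T (integers only)."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_matrix():
    """Post-scaling matrix Ef with a = 1/2 and b = sqrt(2/5)."""
    a, b = 0.5, math.sqrt(2 / 5)
    even = [a * a, a * b / 2, a * a, a * b / 2]
    odd = [a * b / 2, b * b / 4, a * b / 2, b * b / 4]
    return [even, odd, even, odd]


def forward_transform(x):
    """Y = (Cf X Cf^T) scaled element by element with Ef."""
    w, e = core_transform(x), scaling_matrix()
    return [[w[i][j] * e[i][j] for j in range(4)] for i in range(4)]


flat = [[10] * 4 for _ in range(4)]   # a flat block has only a DC term
print(core_transform(flat)[0][0], forward_transform(flat)[0][0])
```

For a flat block all energy lands in the DC coefficient, which makes it easy to see the split between the integer core and the scaling step.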

-43-

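4x4 Inverse IntDCT

Forward scaling matrix:

\[
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

In both the forward and inverse transforms, the quantization step (set by QP) is embedded in the matrices E_f and E_i.

Inverse scaling:

\[
Y \otimes E_i = \left[ Y \right] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
\]

Here

\[ X = C_i^{T} \left( Y \otimes E_i \right) C_i \]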

-44-

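VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB
- The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[
Y_D = \operatorname{round}\left( \frac{1}{2}
  \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
  \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
  \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
\]

where round( ) denotes rounding to the nearest integer.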

-45-

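VC Algorithm: Transform

Chroma DC coefficients
- Intra prediction mode, (4x4) IntDCT
- Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[
Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
      \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
      \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: 4:2:0 chroma blocks of a macroblock for the U and V components: 2x2 DC blocks numbered 16 and 17, AC blocks numbered 18-25]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.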

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram (Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision), annotated at Transform & Quantization]

- 4 x 4 integer DCT transform

\[
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

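VC Algorithm: Quantization

- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization

\[ Y_{ij} = \operatorname{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right) \]

- Decoder: inverse quantization and pre-scaling

\[ X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

- X: quantizer input
- Y: quantizer output
- X': rescaled coefficient at the decoder
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; Qstep doubles in size for every increment of 6 in QP
- FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term (post-scaling factor at the encoder, pre-scaling factor at the decoder)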
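The statement that Qstep doubles for every increment of 6 in QP can be made concrete with a small table lookup. This is a sketch under assumptions: the six base step sizes below are the values commonly tabulated for H.264 (QP 0 to 5) and do not appear on the slide, and the function names and the scalar `sf` argument are illustrative.

```python
import math

# Step sizes for QP = 0..5 as commonly tabulated for H.264 (not on the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8 bits per sample")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))   # doubles every 6 QP steps


def quantize(x, sf, qp):
    """Y = round(X * SF / Qstep), rounding half away from zero."""
    v = x * sf / qstep(qp)
    return int(math.floor(abs(v) + 0.5)) * (1 if v >= 0 else -1)


print(qstep(0), qstep(6), qstep(12), qstep(51))
print(quantize(160, 0.25, 4), quantize(160, 0.25, 10))
```

Raising QP by 6 halves the quantized level for the same input, which is the logarithmic step-size control mentioned elsewhere in the deck.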

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization chain
 Encoder part: Input block -> Forward transform -> (2x2 or 4x4 DC transform; chroma or Intra-16 luma only) -> Post-scaling and quantization -> Encoder output
 Decoder part: Decoder input -> (2x2 or 4x4 DC inverse transform; chroma or Intra-16 luma only) -> Inverse quantization and pre-scaling -> Inverse transform -> Output block]

Rescale and inverse transform; Intra (16x16) prediction mode only

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan      (b) Alternate scan
 0  1  5  6            0  2  8 12
 2  4  7 12            1  5  9 13
 3  8 11 13            3  6 10 14
 9 10 14 15            4  7 11 15
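The two scan tables give, for each position of the 4 x 4 block, the index at which its coefficient is read out. A minimal sketch of applying them follows; the function name `scan` and the sample block are illustrative assumptions.

```python
# Scan tables from the slide: entry [r][c] is the read-out index of position (r, c).
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan(block, table):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [None] * 16
    for r in range(4):
        for c in range(4):
            out[table[r][c]] = block[r][c]
    return out


# Illustrative quantized block with energy in the top-left corner.
block = [[9, 3, 0, 0],
         [2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan(block, ZIGZAG))
print(scan(block, ALTERNATE))
```

Both scans push the zeros toward the end of the list, which is what the run-length style coding of CAVLC relies on; the alternate scan reads down the columns sooner.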

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

- INFO is an M-bit field carrying information
- The first codeword has no leading zeros and no trailing INFO
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on
- The length of each Exp-Golomb codeword is (2M + 1) bits
- M = floor(log2(code_num + 1))
- INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs
- All other syntax elements are coded with the Exp-Golomb codes
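The construction and decoding rules for Exp-Golomb codewords translate almost line by line into code. The sketch below works on strings of '0'/'1' characters for readability; the function names are illustrative, and a real bitstream parser would read bits from a buffer instead.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, 'b').zfill(m) if m else ''
    return '0' * m + '1' + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword given as a string of '0'/'1' characters."""
    m = 0
    while bits[m] == '0':                        # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1                   # step 3: 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

The printed codewords have lengths 1, 3, 3, 5, 5, 5, 5, matching the (2M + 1) rule above.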

-53-

VC Algorithm: Entropy Coding

- CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients: the total numbers of zeros and +/-1 values are coded, and for the other coefficients their levels are coded

Encoding steps
- Step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

- H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)
- These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order:  0   1   2   3   4   5   6   7   8   9  ...  15
coeff:  c0  c1  c2  0   1   1   0  -1   0   0  ...  0

- Step 1: encode the number of nonzero coefficients and of 1 or -1 (trailing ones) values from a look-up table: number of nonzero coefficients = 6 (orders 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (orders 4, 5, 7)
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0
- Step 4: encode the total number of zeros before the last coefficient: 2 (orders 3, 6)
- Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
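The five quantities of the CAVLC example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols that would be coded; it does not perform the table look-ups. The function name, the dictionary keys and the concrete values chosen for c0, c1, c2 (7, -3, 2) are assumptions made for illustration.

```python
def cavlc_symbols(coeffs):
    """Derive the symbols of CAVLC steps 1-5 from scanned coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    rev = nonzero[::-1]                       # CAVLC works in reverse order

    trailing_ones = 0                         # at most three trailing +/-1
    for _, c in rev:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    signs = ['-' if c < 0 else '+' for _, c in rev[:trailing_ones]]
    levels = [c for _, c in rev[trailing_ones:]]

    last = nonzero[-1][0] if nonzero else -1
    total_zeros = sum(1 for c in coeffs[:last + 1] if c == 0)

    runs, zeros_left = [], total_zeros        # stop once all zeros are placed
    for (i, _), (prev, _) in zip(rev, rev[1:]):
        if zeros_left == 0:
            break
        run = i - prev - 1
        runs.append(run)
        zeros_left -= run

    return {'total_coeffs': len(nonzero), 'trailing_ones': trailing_ones,
            'signs': signs, 'levels': levels,
            'total_zeros': total_zeros, 'runs': runs}


# c0, c1, c2 replaced by hypothetical levels 7, -3, 2.
example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(example))
```

With these stand-in levels the result reproduces the counts, signs, zero total and runs listed in the example above.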

-55-

VC Algorithm: Entropy Coding

- CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each syntax element is updated as coding proceeds
- Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
- Step 1: context modeling. Choose a suitable model
- Step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
- Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice
- Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: the current picture between previous pictures and next pictures, predicted with 2 forward MVs, with 2 backward MVs, or with 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
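The direct-mode scaling of mvCol can be illustrated with a few lines of arithmetic. This sketch assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference (the figure labels suggest this, but the slide does not spell it out); it also uses plain division, whereas the standard specifies a fixed-point approximation. The function name is illustrative.

```python
def temporal_direct(mv_col, tb, td):
    """Scale the co-located MV into a forward/backward pair.

    mv_col: (x, y) motion vector of the co-located partition.
    Returns (mvL0, mvL1) following
        mvL0 = tb * mvCol / td
        mvL1 = -(td - tb) * mvCol / td
    """
    mv_l0 = tuple(tb * v / td for v in mv_col)
    mv_l1 = tuple(-(td - tb) * v / td for v in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between its two references (illustrative values).
mv_l0, mv_l1 = temporal_direct((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Note that mvL0 - mvL1 equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located one and no MV needs to be transmitted for the partition.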

-60-

VC Algorithm: B Slice

Weighted prediction
- Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
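A small numeric sketch of p = w1 x r1 + w2 x r2 follows. It is a simplification under stated assumptions: samples are 8-bit and clipped to 0..255, the weights are floating-point numbers, and the implicit weights are taken as linear in the temporal distances tb (List 0 reference to current picture) and td (List 0 reference to List 1 reference); the standard uses fixed-point weights with offsets. All names are illustrative.

```python
def weighted_prediction(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped to 8 bits."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed implicit rule: the temporally closer reference weighs more."""
    w2 = tb / td
    return 1.0 - w2, w2


# Fade to black: r2 is a darker version of r1 (illustrative samples).
r1 = [100, 200]
r2 = [50, 100]
print(weighted_prediction(r1, r2, 0.5, 0.5))
w1, w2 = implicit_weights(tb=1, td=4)
print(w1, w2, weighted_prediction([200], [0], w1, w2))
```

In the explicit type the same `weighted_prediction` step would be used, with w1 and w2 read from the slice header instead of being derived from distances.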

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice, coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) ... P(1,5) and Bitstream B with pictures P(2,1) ... P(2,5), linked by the switching slice S(3)]

- Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices: SP/SI
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter-coded MBs
5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter Setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, kept in step by a reliable parameter set exchange; Parameter Set 3 reads: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11; the VCL data transfer refers to PS 3]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
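The two-slice-group scenario described above can be illustrated with a checkerboard assignment, one common way to disperse macroblocks. The sketch is an assumption-laden illustration: the checkerboard rule `(x + y) % 2`, the picture size and the function names are chosen for the example and are not taken from the slide.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbour_groups(mb_map, x, y):
    """Slice groups of the left, right, upper and lower neighbours that exist."""
    h, w = len(mb_map), len(mb_map[0])
    positions = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [mb_map[ny][nx] for nx, ny in positions if 0 <= nx < w and 0 <= ny < h]


mb_map = checkerboard_map(4, 3)          # tiny 4 x 3 macroblock picture
for row in mb_map:
    print(row)

# If slice group 1 is lost, every lost MB still has only group-0 neighbours.
lost = [(x, y) for y in range(3) for x in range(4) if mb_map[y][x] == 1]
print(all(set(neighbour_groups(mb_map, x, y)) == {0} for x, y in lost))
```

Because all four direct neighbours of a lost macroblock survive, a concealment step can interpolate the missing area from them, which is the recovery argument made in the text.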

-66-

Error Resilience: Redundant Slice

- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Columns: bit rate [kbps] for QCIF (24, 48, 96, 192), then bit rate [kbps] for CIF (96, 192, 384, 768)

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T
Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: > 2x, 2x, 2x, T, T
Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Columns: bit rate [kbps] for QCIF (24, 48, 96, 192), then bit rate [kbps] for CIF (96, 192, 384, 768)

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x
Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T
Husky: 2x, 2x, > 1x, 2x, 2x, 2x
Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Columns: bit rate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6), then bit rate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6)

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T
Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Columns: bit rate [Mbps] for MPEG-2 HiQ (6, 10, 20), then bit rate [Mbps] for MPEG-2 TM5 (6, 10, 20)

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bit-rate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

- HLP: High Latency Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bit-rate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

- CHC: Conversational High Compression
- SP: Simple Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards. Comparison of standards:

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next-generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

- JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team
  Chair: Gary Sullivan (garysull@microsoft.com)
  Co-chair: Ajay Luthra (aluthra@motorola.com)
  Co-chair: Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
- Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values. The predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• The matrix can be transmitted in the SPS and PPS
• Separate matrices for the 4x4 and 8x8 transforms
• Separate matrices for Inter and Intra
• The encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr
4x4 Inter Y; 4x4 Inter Cb, Cr
8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning is based on the decreasing variances, so as to maximize the number of zero-valued coefficients along the scan

[Figure: 8x8 scan patterns, frame (zig-zag) and field]

FRExt only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
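
The codeword construction and the three decoding steps can be sketched in Python as follows. This is a minimal illustration that handles a codeword as a string of '0' and '1' characters; the function names `exp_golomb_encode` and `exp_golomb_decode` are hypothetical and are not taken from the standard or from any reference software.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str) -> tuple[int, int]:
    """Decode one codeword from the start of bits.

    Returns (code_num, number of bits consumed).
    """
    m = 0
    while bits[m] == "0":                    # step 1: count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # step 3: code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

Here `codewords` holds the first seven codewords: a single bit for code_num 0, three bits for code_num 1 and 2, and five bits for code_num 3 to 6, matching the (2M + 1)-bit length rule.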

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB with a 16x16 Y block and 8x16 Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

$$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$$

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
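
As a numerical check of the general formulas, the conversion can be sketched in Python. This is an illustration only, assuming R, G and B are floating-point values normalized to the range 0 to 1; the function name `rgb_to_ycbcr` and the default arguments are hypothetical choices, with the defaults set to the $K_R$ and $K_B$ values quoted on the slide.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the K_R / K_B formulation."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that pure blue gives Cb = 0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that pure red gives Cr = 0.5
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)
```

Passing `kr=0.299, kb=0.114` instead gives the BT.601 variant mentioned in the parenthetical note.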

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples

$$Y = \frac{1}{2}\left[G + \frac{(R + B)}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{(R + B)}{2}\right], \qquad C_o = \frac{(R - B)}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for the input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
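
The forward and inverse conversions can be checked for exact reversibility with a short Python sketch. The function names `rgb_to_ycgco` and `ycgco_to_rgb` are hypothetical; the sketch relies on the fact that Python's `>>` on integers is an arithmetic (sign-preserving) right shift, which is what the slide assumes.

```python
def rgb_to_ycgco(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward lifting-based conversion; returns (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int) -> tuple[int, int, int]:
    """Inverse conversion; undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip over a coarse grid of 8-bit RGB triples.
samples = [(r, g, b) for r in range(0, 256, 17)
           for g in range(0, 256, 17) for b in range(0, 256, 17)]
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(*s)) == s for s in samples)
```

Because each step adds or subtracts a value that the inverse can recompute exactly, no rounding error accumulates, at the cost of one extra bit of range for Cg and Co.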

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bitstreams encoded for the lower nested profiles. All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-31-

VC Algorithm: Intra Prediction

Example of a 4 x 4 luma block

Mode 4: samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4), respectively

Mode 8: samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4), respectively

[Figure: 4x4 block of samples a-p (rows a b c d, e f g h, i j k l, m n o p) with neighboring samples A-H above, I-L to the left and M at the upper-left corner, shown once for mode 4 and once for mode 8]
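
The four predictor expressions can be evaluated with integer arithmetic, as in the Python sketch below. It is an illustration only: the helper names `avg2` and `avg3` and the neighbor sample values are hypothetical, and the rounding is implemented as "add half, then shift", which is one common way of realizing round() on non-negative integers.

```python
def avg2(p: int, q: int) -> int:
    """round(p/2 + q/2) in integer arithmetic."""
    return (p + q + 1) >> 1


def avg3(p: int, q: int, r: int) -> int:
    """round(p/4 + q/2 + r/4) in integer arithmetic."""
    return (p + 2 * q + r + 2) >> 2


# Hypothetical neighbor samples: A-D above, I-L to the left, M at the corner.
A, B, C, D = 100, 104, 108, 112
I, J, K, L = 96, 92, 88, 84
M = 98

mode4_a = avg3(I, M, A)   # round(I/4 + M/2 + A/4)
mode4_d = avg3(B, C, D)   # round(B/4 + C/2 + D/4)
mode8_a = avg2(I, J)      # round(I/2 + J/2)
mode8_d = avg3(J, K, L)   # round(J/4 + K/2 + L/4)
```

With these sample values the mode 4 predictors follow the samples above and to the upper left, while the mode 8 predictors depend only on the left column.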

-32-

VC Algorithm: Intra Prediction

16 x 16 luma
• 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)
• Plane works well in smoothly varying luminance
• A linear 'plane' function is fitted to the upper (H) and left-side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full-MB prediction
• (8x8): 4:2:0 format
• (8x16): 4:2:2
• (16x16): 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

• Exploits temporal redundancy
• Prediction of variable block sizes
• Sub-pel motion compensation
• Deblocking filter
• Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
– An MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small ones suit detailed areas

The two partitionings cannot be mixed, i.e., an MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
– Better compression performance than integer-pel MC
– At the expense of increased complexity
– Outperforms at high bit rates and high resolutions

Motion vector accuracy: 1/4 (6-tap filter)

[Figure: encoder block diagram with Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform and Deblocking Filtering blocks, from video input to bitstream output]

[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8; sub-MB partitions 8x8, 8x4, 4x8 and 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
• A distinct MV can be sent for each sub-MB partition
• ME can be based on multiple pictures that lie in the past or in the future in display order
• The reference picture for ME is selected at the MB partition level
• Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer-position pixels, 1/2- and 1/4-pixel positions, and 1/8-pixel positions]

-38-

VC Algorithm: Inter Prediction

• Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32
• Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A-U) with the half-pel and quarter-pel positions between them (lower-case letters a-s, aa-hh)]

b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
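
The two interpolation formulas can be written as a short Python sketch. The function names `half_pel` and `quarter_pel` are hypothetical, the rounding is done with "add half, then shift", and the clipping to the 8-bit range is an assumption added here because the 6-tap filter can overshoot; none of these details are stated on the slide.

```python
def clip8(v: int) -> int:
    """Clamp to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """b = round((E - 5F + 20G + 20H - 5I + J) / 32)."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p: int, q: int) -> int:
    """a = round((p + q) / 2), the bilinear average of two neighbors."""
    return (p + q + 1) >> 1


# A step edge between G = 100 and H = 200 along one row of integer samples.
E, F, G, H, I, J = 100, 100, 100, 200, 200, 200
b = half_pel(E, F, G, H, I, J)
a = quarter_pel(G, b)
```

On a flat region the filter weights sum to 32, so the half-pel sample equals the integer-pel value; on the step edge above it lands midway between the two levels.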

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
• Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coded noise
• Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures (short term, long term)
• Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, with the picture buffering blocks emphasized]

-41-

VC Algorithm: Transform & Quantization

Transform
• Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
• Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the (4x4) blocks:
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), …, (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

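VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication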
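
A small pure-Python sketch can make the split between the integer core transform and the scaling by $E_f$ concrete. The function names (`matmul`, `transpose`, `core_transform`, `post_scale`) are hypothetical, and the sketch applies the scaling as a separate floating-point step purely for illustration; in a codec this scaling is folded into quantization.

```python
import math

# Core forward transform matrix C_f (integer entries only).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    """4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """W = C_f X C_f^T, computable with additions and shifts only."""
    return matmul(matmul(CF, x), transpose(CF))


def post_scale(w):
    """Y = W (element-wise) E_f, with E_f built from a = 1/2 and b = sqrt(2/5)."""
    a = 0.5
    b = math.sqrt(2.0 / 5.0)
    s = [a, b / 2.0, a, b / 2.0]          # per-row / per-column scale factors
    return [[w[i][j] * s[i] * s[j] for j in range(4)] for i in range(4)]


flat_block = [[10] * 4 for _ in range(4)]
w = core_transform(flat_block)
y = post_scale(w)
```

For the flat block, all energy ends up in the DC coefficient: `w[0][0]` is 160 and, after scaling by $a^2$, `y[0][0]` is 40, the value an orthonormal 4x4 DCT would give.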

-43-

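4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$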

-44-

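VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer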

-45-

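VC Algorithm: Transform

Chroma DC coefficients: intra prediction mode, (4x4) IntDCT, then Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks for U and V: the 2x2 DC blocks numbered 16 and 17, and the AC blocks numbered 18-21 and 22-25]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased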

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram, with the Transform & Quantization block emphasized]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

• The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
• Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij} \cdot \mathrm{round}\left( \frac{SF_{ij}}{Qstep} \right)$$

• Decoder: inverse quantization and pre-scaling

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP. There are a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 values, by 6 for each additional bit of decoded sample
SF: scaling term
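
The "doubles for every increment of 6 in QP" rule can be illustrated with a few lines of Python. The six base step sizes for QP 0 to 5 used below are the values commonly tabulated for 8-bit H.264; they do not appear on the slide and should be read as an assumption of this sketch, as should the function name `qstep`.

```python
# Step sizes for QP = 0..5 (assumed from commonly published H.264 tables).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for 8-bit video, QP in 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit samples")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


steps = [qstep(qp) for qp in range(52)]
```

Because the mapping is exponential, a change of one QP unit alters the step size by roughly 12%, which gives the logarithmic control of quantization mentioned earlier in the deck.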

-48-

VC Algorithm: Transform, Quantization, Rescale and Inverse Transform

[Figure: encoder part: input block, forward transform, post-scaling and quantization, with a 2x2 or 4x4 DC transform for chroma or Intra-16 luma only (Intra (16x16) prediction mode only). The encoder output is the decoder input. Decoder part: inverse quantization and pre-scaling, inverse transform, output block, with a 2x2 or 4x4 DC inverse transform for chroma or Intra-16 luma only]

-49-

VC Algorithm: Entropy Coding

• All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
• Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
• Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
• Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
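
Each table gives, for every position in the 4x4 block, the index at which that coefficient is read out. A minimal Python sketch of applying the two tables follows; the names `ZIGZAG_POS`, `ALTERNATE_POS` and `scan` are hypothetical.

```python
# Scan index of each (row, column) position, copied from the two tables.
ZIGZAG_POS = [[0, 1, 5, 6],
              [2, 4, 7, 12],
              [3, 8, 11, 13],
              [9, 10, 14, 15]]

ALTERNATE_POS = [[0, 2, 8, 12],
                 [1, 5, 9, 13],
                 [3, 6, 10, 14],
                 [4, 7, 11, 15]]


def scan(block, pos_table):
    """Reorder a 4x4 block into a 16-element list following pos_table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[pos_table[r][c]] = block[r][c]
    return out


# Label each coefficient with its raster index (row * 4 + column).
raster = [[r * 4 + c for c in range(4)] for r in range(4)]
zigzag_order = scan(raster, ZIGZAG_POS)
alternate_order = scan(raster, ALTERNATE_POS)
```

The alternate scan visits the first column early, which favors the vertical detail typical of field-coded blocks.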

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and ±1 values are coded. For the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and of ±1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 … 0
order: 0 1 2 3 4 5 6 7 8 9 … 16

Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table
– number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
– number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
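
The five quantities in this example can be derived mechanically from the scanned coefficient list, as the Python sketch below shows. It only extracts the symbols; it does not perform the table look-ups that turn them into bits. The function name `cavlc_symbols` and the concrete values 3, -2 and 4 standing in for c0, c1 and c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of one block from its scanned coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    positions = [i for i, _ in nonzero]
    reversed_levels = [c for _, c in reversed(nonzero)]

    # Step 1-2: up to three trailing +/-1 values, signs in reverse order.
    t1_signs = []
    for c in reversed_levels:
        if abs(c) != 1 or len(t1_signs) == 3:
            break
        t1_signs.append("+" if c > 0 else "-")

    # Step 3: remaining levels in reverse order.
    levels = reversed_levels[len(t1_signs):]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = (positions[-1] + 1 - len(positions)) if positions else 0

    # Step 5: zero runs in reverse order, until no zeros remain.
    runs, zeros_left = [], total_zeros
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return len(nonzero), t1_signs, levels, total_zeros, runs


# c0, c1, c2 replaced by hypothetical levels 3, -2, 4.
block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
total, signs, levels, total_zeros, runs = cavlc_symbols(block)
```

The result reproduces the slide: 6 nonzero coefficients, trailing-one signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.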

-55-

VC Algorithm: Entropy Coding

• CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated
• Both MVs and residual transform coefficients are coded by CABAC

Encoding steps:
step 1: context modeling. Choose a suitable model
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but it is computationally more intensive

-57-

VC Algorithm: B Slice

• Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs
• Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture
• Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
– Two forward references: proper for a region just before a scene change
– Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
• Forward/backward pair of bi-directional prediction
• The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, with the direct-mode partition, the co-located partition, the motion vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$$mvL0 = \frac{tb}{td} \, mvCol, \qquad mvL1 = -\frac{(td - tb)}{td} \, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture
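
The scaling of mvCol can be sketched in Python as follows. This is an idealized illustration using exact fractions; an actual codec would use fixed-point scaling with rounding, which is not described on the slide. The function name `temporal_direct_mvs` and the example values are hypothetical.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col is an (x, y) pair; tb and td are the temporal distances
    shown in the figure (td must be nonzero).
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between the two references (tb = 1, td = 2).
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

A useful sanity check is that mvL0 - mvL1 always equals mvCol, i.e. the two derived vectors lie along the trajectory of the co-located block.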

-60-

VC Algorithm: B Slice

Weighted prediction
• Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
• Different weighted prediction methods for a macroblock of a P slice or a B slice
• A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where $w_1$ and $w_2$ are weighting factors
– Implicit type: the factors are calculated based on the temporal distance between the pictures
– Explicit type: the factors are transmitted in the slice header
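
The weighted combination can be illustrated on a few samples in Python. This is a sketch under stated assumptions: the weights are passed in directly as non-negative floating-point numbers (as in the explicit type), and the rounding and clipping to an 8-bit range are added for illustration and are not taken from the slide. The function name `weighted_bipred` is hypothetical.

```python
def weighted_bipred(r1, r2, w1, w2, max_val=255):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped to [0, max_val]."""
    out = []
    for s1, s2 in zip(r1, r2):
        p = int(w1 * s1 + w2 * s2 + 0.5)      # round half up (non-negative input)
        out.append(min(max_val, max(0, p)))
    return out


# A fade: the second reference is half as bright as the first.
ref1 = [200, 120, 40]
ref2 = [100, 60, 20]
midpoint = weighted_bipred(ref1, ref2, 0.5, 0.5)
mostly_ref2 = weighted_bipred(ref1, ref2, 0.25, 0.75)
```

With equal weights the result is a plain average; shifting weight toward one reference tracks a fade more closely than an unweighted average could.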

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
– SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
– SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and the switching slice S(3) between them]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slice (SP/SI)
• Data partitioning
• Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter-coded MBs

5. Place each partition (A, B and C) in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter Setting

• The sequence parameter set contains all information related to a sequence of pictures
• A picture parameter set contains all information related to all the slices belonging to a single picture
• The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3, synchronized by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example, Parameter Set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11]

-65-

Error Resilience: FMO

• Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
• Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
• ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
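
The two-slice-group argument can be checked with a small Python sketch. It assumes a checkerboard allocation, which is only one possible way of dispersing macroblocks and is not stated on the slide, and a QCIF-sized picture of 11 x 9 macroblocks. The function names are hypothetical.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def surviving_neighbors(groups, x, y):
    """Count the 4-connected neighbors of MB (x, y) that are in the other group."""
    rows, cols = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != groups[y][x]:
            count += 1
    return count


groups = checkerboard_slice_groups(11, 9)      # QCIF: 176x144 pixels
# Suppose slice group 1 is lost: how many intact neighbors does each lost MB have?
worst_case = min(surviving_neighbors(groups, x, y)
                 for y in range(9) for x in range(11) if groups[y][x] == 1)
```

With this allocation every lost macroblock keeps at least three correctly received neighbors, which is what makes spatial concealment workable.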

-66-

Error Resilience: Redundant Slice

• Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
• For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)
• A decoder reacts to redundant slices by reconstructing only the primary slice if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence; bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence; bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20; Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
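For reference, the PSNR between an original and a reconstructed picture can be computed as below. This is a minimal sketch, assuming 8-bit samples (peak value 255) passed as flat sequences; the function name psnr and its parameters are illustrative and do not come from the test report.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)
```

In rate-distortion plots such as the ones on these slides, the value is typically computed on the luma component of each frame and averaged over the sequence.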

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions: Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt
Extensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr K. R. Rao, UTA: rao@uta.edu
Dr S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
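The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is an illustration under assumptions: the six base values are the ones commonly quoted in the H.264 literature for QP 0 to 5, the 0 to 51 range applies to 8-bit video, and the name qstep is hypothetical. The step size doubles for every increase of 6 in QP, which is roughly a 12.5% increase per QP step.

```python
# Step sizes commonly quoted for QP 0..5 (assumed here, not taken from the slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a given QP: doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

With this mapping, qstep(qp + 6) is always exactly twice qstep(qp), so a small QP range covers a very wide range of bit rates.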

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
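The three simpler modes above can be sketched in a few lines. This is an illustrative sketch, not the normative process: predict_block and its arguments are hypothetical names, both neighbor rows are assumed to be available, the DC rounding is an assumption, and the planar mode is omitted.

```python
def predict_block(above, left, mode):
    """Predict an n x n block from the n pels above and the n pels to the left."""
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        # Every row repeats the row of pels just above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel just to its left.
        return [[left[row]] * n for row in range(n)]
    if mode == "dc":
        # Rounded average of all neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)
```

For a macroblock, n would be 16; the encoder subtracts the chosen prediction from the source block and transforms the residual.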

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only)

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans (FRExt only), similar to the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor

\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}
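The construction above translates directly into code. The following is a minimal sketch that returns the codeword as a string of '0' and '1' characters for readability; the function name exp_golomb_encode is illustrative, and a real bitstream writer would pack bits instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    m = value.bit_length() - 1        # M = floor(log2(code_num + 1))
    info = value - (1 << m)           # INFO = code_num + 1 - 2^M
    # INFO is written with exactly M bits; it is empty when M is 0.
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits
```

The first few codewords are 1, 010, 011, 00100, 00101, which matches the INFO field widths listed above.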

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. \mathrm{code\_num} = 2^{M} + \mathrm{INFO} - 1
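The three decoding steps can be sketched as below, again on a string of '0' and '1' characters. The function name exp_golomb_decode and the (value, next position) return convention are assumptions made for illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    # Step 1: count the M leading zeros, then skip the terminating 1.
    while pos < len(bits) and bits[pos] == "0":
        m += 1
        pos += 1
    if pos + 1 + m > len(bits):
        raise ValueError("truncated codeword")
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m
```

Because the returned position points just past the codeword, the function can be called repeatedly to parse a run of consecutive syntax elements.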

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB: Y, Cb and Cr each 16x16]

-119-

RGB to YCbCr

Y = K_R R + (1 - K_R - K_B) G + K_B B

C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}

K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
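The conversion can be checked numerically with a short sketch. It assumes R, G and B are normalized to the range 0 to 1 and omits the offsets and integer scaling of a real pipeline; the function name rgb_to_ycbcr and its keyword parameters are illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for the given luma weights."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # Cb lies in [-0.5, 0.5]
    cr = (r - y) / (2.0 * (1.0 - kr))   # Cr lies in [-0.5, 0.5]
    return y, cb, cr
```

Passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned on the slide. Because the weights are not exactly representable in integer arithmetic, a round trip through this transform generally introduces rounding error.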

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To avoid any rounding error, only one bit of precision needs to be added to the chroma samples

Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right]

C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right]

C_o = \frac{R - B}{2}

-121-

In 4:4:4 video, FRExt has a residual color transform

It keeps the RGB domain (same depth) for input, output and stored reference pictures, and uses the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
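The reversibility of these lifting steps can be demonstrated with integer arithmetic. This is a minimal sketch; the function names are illustrative, and it relies on Python's ">>" being an arithmetic (floor) shift for negative integers.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting transform; undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

For example, pure red (255, 0, 0) maps to (Y, Cg, Co) = (63, -127, 255) and back to (255, 0, 0). Cg and Co need one more bit than the input samples, which is the extra bit of chroma precision mentioned on the previous slide.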

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact:
Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-32-

VC Algorithm: Intra Prediction, 16 x 16 luma

4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3)

Plane works well in areas of smoothly varying luminance

A linear 'plane' function is fitted to the upper (H) and left side (V) samples

(8x8) luma (FRExt only): similar to 4x4 luma, with low-pass filtering of the predictor to improve prediction performance

[Figure: plane prediction]

-33-

VC Algorithm: Intra Prediction

Chroma always operates using full MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes (DC: 0, horizontal: 1, vertical: 2, plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
Prediction of variable block sizes
Sub-pel motion compensation
Deblocking filter
Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
- A MB can be partitioned into smaller block sizes
- 4 cases for the 16 x 16 MB, 4 cases for the 8 x 8 sub-MB
- Large partition sizes suit homogeneous areas, small ones suit detailed areas

The two partition levels cannot be mixed directly, i.e. a MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, the (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
Better compression performance than integer-pel MC
Comes at the expense of increased complexity
Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram (Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Intra Prediction, Motion Estimation, Motion Compensation, Intra/Inter Mode Decision); annotation: motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction, Sub-pel accuracy

A distinct MV can be sent for each sub-MB partition
ME can be based on multiple pictures that lie in the past or in the future in display order
The reference picture for ME is selected at the MB partition level
Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: interpolation sample grid. Upper-case letters (A to U) mark integer-pel positions and lower-case letters mark fractional positions; b is the half-pel sample between G and H, and a is the quarter-pel sample between G and b]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)

a = round((G + b) / 2)
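The two formulas can be sketched with integer arithmetic as follows. This is an illustration under assumptions: 8-bit samples, rounding implemented by adding half the divisor before the shift, and clipping of the half-pel result to 0..255. The function names are hypothetical, and positions that are filtered in both directions (which use unrounded intermediate values) are not covered.

```python
def clip255(value):
    """Clip to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """6-tap filter (1, -5, 20, 20, -5, 1)/32 for the sample between g and h."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighboring integer- or half-pel samples."""
    return (p + q + 1) >> 1
```

With the letters of the slide, b = half_pel(E, F, G, H, I, J) and a = quarter_pel(G, b). The negative taps let the filter overshoot near sharp edges, which is why the clip is needed.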

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied to horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 macroblock, for luma and for chroma]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures
- Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer.

[Figure: encoder block diagram, highlighting the management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization
Transform
- Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4x4 integer DCT + Hadamard transform

Coding order of the 4x4 luma blocks:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4x4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2, and 4:4:4 formats.

-42-

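VC Algorithm: Transform
4x4 integer DCT (X: input pixels, Y: output coefficients)

\[
Y = (C_f X C_f^T) \otimes E_f
\]

\[
Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

where \(a = 1/2\), \(b = \sqrt{2/5}\), \(d = 1/2\), and \(\otimes\) implies element-by-element multiplication.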
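As a rough illustration of Y = (Cf X CfT) ⊗ Ef, the sketch below computes the integer core transform with plain Python lists and then applies the element-by-element scaling. The function names are hypothetical, and the floating-point scaling is only for illustration; in the codec the scaling is folded into quantization.

```python
# Forward core transform matrix C_f (integer entries only).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(A, B):
    # Plain 4x4 matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def core_transform(X):
    # W = C_f X C_f^T, computed with integer arithmetic only.
    return matmul(matmul(CF, X), transpose(CF))

def scaling_matrix():
    # E_f with a = 1/2 and b = sqrt(2/5).
    a, b = 0.5, (2.0 / 5.0) ** 0.5
    def entry(i, j):
        if i % 2 == 0 and j % 2 == 0:
            return a * a
        if i % 2 == 1 and j % 2 == 1:
            return b * b / 4
        return a * b / 2
    return [[entry(i, j) for j in range(4)] for i in range(4)]

def forward_transform(X):
    # Y = W (element-by-element) E_f.
    W, EF = core_transform(X), scaling_matrix()
    return [[W[i][j] * EF[i][j] for j in range(4)] for i in range(4)]

flat = [[3] * 4 for _ in range(4)]   # constant block: only the DC term survives
W = core_transform(flat)
Y = forward_transform(flat)
print(W[0][0], Y[0][0])
```

For a constant block all AC terms vanish, so the result is easy to check by hand.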

-43-

4x4 Inverse IntDCT

\[
X = C_i^T (Y \otimes E_i) C_i
\]

Here

\[
Y \otimes E_i = [Y] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
\]

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices \(E_f\) and \(E_i\).

-44-

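VC Algorithm: Transform
Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform.

\[
Y_D = \operatorname{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
\]

where round(·) denotes rounding to the nearest integer.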

-45-

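VC Algorithm: Transform
Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): the 2x2 DC coefficients are transformed using the Walsh-Hadamard transform.

\[
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: 4:2:0 chroma (U, V) block numbering: 2x2 DC blocks 16 and 17, AC blocks 18 to 25]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.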

-46-

VC Algorithm: Transform
Block diagram emphasizing the transform

[Figure: encoder block diagram, highlighting the Transform & Quantization block]

- 4x4 integer DCT transform:

\[
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\]

- Hadamard transform of the DC coefficients for 16x16 Intra luma and 8x8 chroma blocks

-47-

VC Algorithm: Quantization
- The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.
- Encoder: post-scaling and quantization

\[
Y_{ij} = X_{ij} \cdot \operatorname{round}\left( \frac{SF_{ij}}{Qstep} \right)
\]

- Decoder: inverse quantization and pre-scaling

\[
X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}
\]

where
- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
- SF: scaling term
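The doubling rule for Qstep can be sketched as a small lookup. The six base step sizes for QP = 0..5 are not given on the slide; they are quoted here from the commonly tabulated H.264 values and should be treated as an assumption, as should the function name.

```python
# Assumed base step sizes for QP = 0..5 (not listed on the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    # 52 values of QP (0..51) for 8-bit video.
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit samples")
    # Qstep doubles for every increment of 6 in QP.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

Each single step of QP raises Qstep by roughly 12%, so six steps compound to a factor of two.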

-48-

VC Algorithm: Transform & Quantization
Rescale and inverse transform (DC transform for Intra (16x16) prediction mode only)

[Figure: transform and quantization flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization; the encoder output is the decoder input. Decoder part: inverse quantization and pre-scaling, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse transform, output block]

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC).
- Scan order to read the residual data (quantized transform coefficients): zig-zag or alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

(a) Zig-zag scan        (b) Alternate scan

 0  1  5  6              0  2  8 12
 2  4  7 12              1  5  9 13
 3  8 11 13              3  6 10 14
 9 10 14 15              4  7 11 15
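A short sketch of how the two scan tables are used: each table entry gives the position in the output sequence of the coefficient at that row and column. The function name and the sample block are illustrative assumptions.

```python
# Scan tables from the slide: entry [r][c] is the scan position of coefficient (r, c).
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]

def scan_block(block, table):
    # Reorder a 4x4 block of quantized coefficients into a 1-D list.
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[table[r][c]] = block[r][c]
    return out

# Hypothetical quantized block with energy in the low-frequency corner.
block = [[9, 3, 0, 0],
         [-2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan_block(block, ZIGZAG))
print(scan_block(block, ALTERNATE))
```

Both scans push the zeros toward the end of the list, which is what the run-based entropy coders rely on.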

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

- INFO is an M-bit field carrying information.
- The first codeword has no leading zero or trailing INFO.
- Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.
- The length of each Exp-Golomb codeword is (2M + 1) bits.

M = floor(log2(code_num + 1))
INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1.
2. Read in the M-bit INFO field.
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero.)

- CAVLC: codes transform coefficients
- CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.
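The Exp-Golomb construction and the three decoding steps can be sketched with bit strings. The function names and the use of Python strings of '0'/'1' characters are illustrative assumptions; a real codec works on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    # M = floor(log2(code_num + 1)); INFO = code_num + 1 - 2^M.
    m = (code_num + 1).bit_length() - 1
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "0{}b".format(m)) if m > 0 else ""
    # Codeword layout: [M zeros][1][INFO], total length 2M + 1 bits.
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits):
    # Step 1: read M leading zeros followed by a 1.
    m = 0
    while bits[m] == "0":
        m += 1
    # Step 2: read the M-bit INFO field.
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1

for n in range(8):
    print(n, exp_golomb_encode(n))
```

Codeword 0 is the single bit `1`, codewords 1 and 2 are three bits long, and codewords 3-6 are five bits long, matching the description above.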

-53-

VC Algorithm: Entropy Coding
CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 coefficients are coded; for the other coefficients, their levels are coded.

Encoding steps:
- Step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values.
- Step 2: encode the sign of each trailing one in reverse order.
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order.
- Step 4: encode the total number of zeros before the last coefficient.
- Step 5: encode each run of zeros.

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (hence CAVLC).

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

coeff: c0 c1 c2  0  1  1  0 -1  0  0 ...  0
order:  0  1  2  3  4  5  6  7  8  9 ... 16

- Step 1: encode the number of total nonzero coefficients and of +1 or -1 (trailing ones) values from a look-up table. Number of total nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7).
- Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4).
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0.
- Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6).
- Step 5: encode each run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2).
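The five quantities of the example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not map them to the actual VLC tables, and the concrete values chosen for c0, c1, c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    # Positions of the nonzero coefficients in scan order.
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    if not nz:
        return {"total_coeffs": 0, "trailing_ones": 0, "signs": [],
                "levels": [], "total_zeros": 0, "runs": []}
    # Step 1: total nonzero coefficients and trailing +/-1 values (at most 3).
    ones = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(ones) < 3:
            ones.append(i)
        else:
            break
    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if coeffs[i] < 0 else "+" for i in ones]
    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [coeffs[i] for i in reversed(nz[:len(nz) - len(ones)])]
    # Step 4: total number of zeros before the last nonzero coefficient.
    total_zeros = nz[-1] + 1 - len(nz)
    # Step 5: runs of zeros in reverse order, stopping once all zeros are used.
    runs, zeros_left = [], total_zeros
    for k in range(len(nz) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return {"total_coeffs": len(nz), "trailing_ones": len(ones), "signs": signs,
            "levels": levels, "total_zeros": total_zeros, "runs": runs}

# The slide's example with hypothetical values c0 = 7, c1 = -3, c2 = 2.
example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
result = cavlc_symbols(example)
print(result)
```

With these inputs the sketch reproduces the counts, signs, and runs listed in the five steps of the example.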

-55-

VC Algorithm: Entropy Coding
CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- Step 1: context modeling. Choose a suitable model.
- Step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
- Step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice
- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs.
- Direct mode: derives the reference picture, block size, and motion vector data from the subsequent inter picture.
- Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.
- Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change

[Figure: previous pictures, current picture, and next pictures, with three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode
- Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture, and List 1 reference; the direct-mode partition in the current picture carries mvL0 and mvL1, the co-located partition carries mvCol; tb and td are the temporal distances]

mvL0 = tb · mvCol / td
mvL1 = -(td - tb) · mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture.
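The direct-mode scaling can be checked numerically with a few lines. This sketch uses plain floating-point division and a hypothetical function name; it is not the fixed-point arithmetic an actual decoder would use, and tb and td are simply taken as given temporal distances.

```python
def direct_mode_mvs(mv_col, tb, td):
    # mv_col is the (x, y) motion vector of the co-located partition.
    mvx, mvy = mv_col
    # mvL0 = tb * mvCol / td
    mv_l0 = (tb * mvx / td, tb * mvy / td)
    # mvL1 = -(td - tb) * mvCol / td
    mv_l1 = (-(td - tb) * mvx / td, -(td - tb) * mvy / td)
    return mv_l0, mv_l1

# Hypothetical case: current picture halfway between the two references.
mv_l0, mv_l1 = direct_mode_mvs((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

A useful property of the two formulas is that mvL0 - mvL1 equals mvCol, so both derived vectors lie along the same motion trajectory as the co-located vector.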

-60-

VC Algorithm: B Slice
Weighted prediction
- Different weights of the reference signals for gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 · r1 + w2 · r2

where w1 and w2 are weighting factors.
- Implicit type: the factors are calculated based on the temporal distance between the pictures.
- Explicit type: the factors are transmitted in the slice header.
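A minimal sketch of p = w1 · r1 + w2 · r2 applied sample by sample. The function name, the explicit floating-point weights, and the clipping to 8 bits are illustrative assumptions; the derivation of implicit weights from temporal distances is not shown because the slide gives no formula for it.

```python
def weighted_prediction(r1, r2, w1, w2):
    # Combine two motion-compensated reference blocks sample by sample,
    # round to the nearest integer, and clip to the assumed 8-bit range.
    out = []
    for s1, s2 in zip(r1, r2):
        value = int(w1 * s1 + w2 * s2 + 0.5)
        out.append(max(0, min(255, value)))
    return out

r1 = [100, 120, 140, 160]   # samples from the first reference (hypothetical)
r2 = [60, 80, 100, 120]     # samples from the second reference (hypothetical)

plain = weighted_prediction(r1, r2, 0.5, 0.5)    # ordinary averaging
faded = weighted_prediction(r1, r2, 0.75, 0.25)  # favors the first reference
print(plain, faded)
```

Equal weights reduce to the usual bi-predictive average; unequal weights let the prediction follow a fade.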

-61-

VC Algorithm: SP and SI Slices (Extended profile only)
Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: bitstream A with pictures P(1,1) to P(1,5) and bitstream B with pictures P(2,1) to P(2,5), connected by the switching picture S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing.

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B & C.
2. A has the slice header and the header data for each MB in the slice.
3. B has the coded residual data for intra and SI slice MBs.
4. C has the coded residual data for inter coded MBs.
5. Each partition A, B & C is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter setting
- The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture.
- The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice.

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2, 3, kept in step by a reliable parameter set exchange; example parameter set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11; VCL data is transferred with a reference to PS 3]

-65-

Error Resilience: FMO
- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
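The dispersed two-group allocation described for FMO can be pictured as a checkerboard. The sketch below builds such a map for a QCIF picture (11 x 9 macroblocks) and counts, for a given macroblock, how many of its four direct neighbors belong to the other slice group. The checkerboard rule and the function names are illustrative assumptions; the slides do not prescribe a particular map.

```python
def checkerboard_map(mb_width, mb_height):
    # Slice group 0 or 1 per macroblock, alternating like a checkerboard.
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]

def neighbors_in_other_group(mb_map, x, y):
    # Count the left/right/top/bottom neighbors assigned to the other slice group.
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != mb_map[y][x]:
            count += 1
    return count

qcif = checkerboard_map(11, 9)   # QCIF: 176x144 pixels = 11x9 macroblocks
for row in qcif:
    print("".join(str(group) for group in row))
print(neighbors_in_other_group(qcif, 5, 4))
```

If the packet carrying one group is lost, every interior macroblock of that group still has all four direct neighbors available for concealment.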

-66-

Error Resilience: Redundant Slice
- Redundant slices allow the encoder to send one or more redundant representations of the same macroblocks.
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF    Bitrate [kbps] for CIF

24 48 96 192    96 192 384 768

Foreman    > 1x  2x  2x  T  2x  > 2x  T  T

Paris    > 1x  2x  2x  2x  2x  T  2x  T

Head    > 2x  2x  2x  T  T

Zoom    > 1x  1x  2x  2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF    Bitrate [kbps] for CIF

24 48 96 192    96 192 384 768

Football    2x  1x  2x  2x  > 1x  > 1x  1x  > 1x

Mobile    2x  1x  2x  2x  > 2x  4x  > 2x  T

Husky    2x  2x  > 1x  2x  2x  2x

Tempete    2x  2x  > 2x  T  2x  2x  T2x  T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ    Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6    1.5 2.25 3 4 6

Football    > 1.5x  > 1.3x  1.3x  1.5x  2x  1.8x  1.3x  1.5x

Mobile    4x  2.7x  2x  T  T  > 4x  > 2.7x  > 2x  T  T

Husky    > 1.5x  1.3x  1x  1.3x  1.5x  2.7x  2x  1.8x  2x  > 1.5x

Tempete    T  2x  T  T  T  T  T  4x  T  T  T  T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence    Bitrate [Mbps] for MPEG-2 HiQ    Bitrate [Mbps] for MPEG-2 TM5

6 10 20    6 10 20

720 (60p)

Crew    1.7x  2x  T  1.7x  2x  T

Harbour    T  3.3x  T  T  T  1.7x  T  T

1080 (30i)

Stockholm Pan    1x  2x

New Mobile & Calendar    T  2x  T  T  2x  T

1080 (25p)

River Bed    > 1.7x  > 1x  T  > 1.7x  > 1x  T

Vintage Car    1.7x  T  2x  T  1.7x  T  2x  T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

- HLP: High Latency Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

- CHC: Conversational High Compression
- SP: Simple Profile
- ASP: Advanced Simple Profile
- H.26L: H.264 Baseline Profile

-73-

Conclusions
- H.264 outperforms the previous standards.
- Comparison of standards:

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion Estimation & Compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & Random Access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExtensions
- To 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next-generation optical discs
- Extension for various applications
- H. Schwartz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
- JVT documents and software on the open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team:
  Chair: Gary Sullivan (garysull@microsoft.com)
  Co-chair: Ajay Luthra (aluthra@motorola.com)
  Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-91-

Fidelity Range Extensions
- 4:2:2 and 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format] Y Cg Co
- High resolution

Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level.
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs.
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in an I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB.

• Horizontal: pels in the MB are predicted from the pels just left of the MB.

• DC: pels in the MB are predicted as the average value of the neighboring pels.

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation.
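The first three full-MB modes can be sketched on a small block. The function name, the block size of 4 used in the example, and the rounded integer average for DC are illustrative assumptions; the planar mode is left out because the slide does not give its equation.

```python
def predict_block(above, left, mode):
    # above: the row of pels just above the block; left: the column just left of it.
    n = len(above)
    if mode == "vertical":
        # Every row repeats the pels above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # All pels take the rounded average of the neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)

above = [10, 20, 30, 40]   # hypothetical neighboring pels
left = [50, 60, 70, 80]
for mode in ("vertical", "horizontal", "dc"):
    print(mode, predict_block(above, left, mode))
```

An encoder would compute the residual against each candidate prediction and keep the mode that costs the fewest bits; that selection step is outside this sketch.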

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): vertical, horizontal, DC, planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for inter and intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y

-108-

Two scans (FRExt only), similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance and aims to maximize the number of consecutive zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes
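The Exp-Golomb construction and the three decoding steps can be sketched as follows. Bit strings are represented as Python `str` values of '0' and '1' purely for readability; the function names are illustrative and not taken from the standard or any reference software.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of a bit string."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1               # step 3: code_num


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Each codeword is 2M + 1 bits long, so code_num 0 costs one bit, code_num 1 and 2 cost three bits, and code_num 3 to 6 cost five bits.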

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC (in the US) has preliminarily selected the High profile. Several other environments, such as various designs for satellite and cable television, may soon embrace it as well

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

Y = KR R + (1 - KR - KB) G + KB B

Cb = (B - Y) / (2 (1 - KB)), Cr = (R - Y) / (2 (1 - KR))

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
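As a quick numerical check, the chroma scale factors follow directly from KR and KB through 1 / (2 (1 - K)). The sketch below evaluates them for the constants on this slide; the function names and the normalized [0, 1] sample range are assumptions made for illustration.

```python
def ycbcr_coefficients(kr, kb):
    """Return (KG, Cb scale, Cr scale) for given luma weights KR and KB."""
    kg = 1.0 - kr - kb
    cb_scale = 1.0 / (2.0 * (1.0 - kb))
    cr_scale = 1.0 / (2.0 * (1.0 - kr))
    return kg, cb_scale, cr_scale


def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B in [0, 1] to Y in [0, 1], Cb/Cr in [-0.5, 0.5]."""
    kg, cb_scale, cr_scale = ycbcr_coefficients(kr, kb)
    y = kr * r + kg * g + kb * b
    return y, cb_scale * (b - y), cr_scale * (r - y)


kg, cb_scale, cr_scale = ycbcr_coefficients(0.2126, 0.0722)
print(round(kg, 4), round(cb_scale, 4), round(cr_scale, 4))
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue reaches the Cb extreme
```

Because the results are real-valued, storing them as integers needs rounding, which is the source of the conversion error discussed on the next slide.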

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples

Y = 1/2 [G + (R + B)/2]

Cg = 1/2 [G - (R + B)/2]

Co = (R - B)/2

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
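The lifting steps of this color transform are exactly invertible in integer arithmetic, which a short Python sketch can confirm. The function names are illustrative; Python's `>>` on integers is an arithmetic (floor) shift, matching the definition on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using the lifting steps (integers only)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycgco(255, 0, 0))
print(ycgco_to_rgb(*rgb_to_ycgco(12, 200, 97)))
```

For 8-bit inputs Y stays within 8 bits, while Cg and Co are signed values that need one extra bit, which is the "one bit of precision" mentioned earlier for the chroma samples.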

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement: a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/s

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

▪ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

▪ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

▪ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

• DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)
• ATSC has selected AC-3 plus from Dolby
• ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-33-

VC Algorithm: Intra Prediction

Chroma always operates using full-MB prediction: (8x8) for 4:2:0 format, (8x16) for 4:2:2, (16x16) for 4:4:4

(Similar to the 16x16 luma block, but with a different mode order)

4 prediction modes

(DC: 0, Horizontal: 1, Vertical: 2, Plane: 3)

-34-

VC Algorithm: Inter Prediction

Exploits temporal redundancy
• Prediction of variable block sizes
• Sub-pel motion compensation
• Deblocking filter
• Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction

Prediction of variable block size

- A MB can be partitioned into smaller block sizes
- 4 cases for a 16x16 MB, 4 cases for an 8x8 sub-MB
- Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas

The two partition levels cannot be mixed, i.e., a MB cannot have both 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
• Better compression performance than integer-pel MC
• At the expense of increased complexity
• Outperforms at high bit rates and high resolutions

[Figure: H.264 encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output), annotated "motion vector accuracy 1/4 (6-tap filter)"]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8; sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
• A distinct MV can be sent for each sub-MB partition
• ME can be based on multiple pictures that lie in the past or in the future in display order
• The reference picture for ME is selected at the MB partition level
• Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

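VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case letters A ... U), half-pel samples (aa, bb, cc, dd, ee, ff, gg, hh, b, h, j, m, s) and quarter-pel samples (a, c, d, e, f, g, i, k, n, p, q, r); E, F, G, H, I, J lie on one row, with b halfway between G and H and a halfway between G and b]

b = round((E - 5F + 20G + 20H - 5I + J)/32)

a = round((G + b)/2)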
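The two interpolation formulas can be sketched with integer arithmetic, where adding 16 before a right shift by 5 implements the rounded division by 32. The function names and the clipping to the 8-bit range [0, 255] are assumptions made for illustration.

```python
def clip255(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """6-tap FIR (1, -5, 20, 20, -5, 1)/32 with rounding: sample b between G and H."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Bilinear average of two neighboring samples with rounding: sample a between G and b."""
    return (p + q + 1) >> 1


# A sharp edge: three dark samples followed by three bright ones.
E, F, G, H, I, J = 0, 0, 0, 255, 255, 255
b = half_pel(E, F, G, H, I, J)
a = quarter_pel(G, b)
print(b, a)
```

The negative taps make the filter overshoot near strong edges, which is why the clipping step matters in practice.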

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied adaptively to the horizontal and vertical edges of the 4x4 blocks in a macroblock, at several levels (slice, block edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: H.264 encoder block diagram (same blocks as before), annotated "management of multiple reference pictures (short term, long term)"]

-41-

VC Algorithm: Transform & Quantization

Transform
• Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
• Hierarchical structure: 4x4 integer DCT + Hadamard transform

0 1 4 5

2 3 6 7

8 9 12 13

10 11 14 15

00 01 02 03

10 11 12 13

20 21 22 23

30 31 32 33

Assignment of the indices of the DC coefficients (dark samples) to the luma 4x4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same applies to chroma, where the MB size depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

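VC Algorithm: Transform

4x4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

$$a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}, \quad d = \frac{1}{2}$$

$\otimes$ denotes element-by-element multiplication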

-43-

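4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

For comparison, the forward scaling matrix is

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$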

-44-

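VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) denotes rounding to the nearest integer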

-45-

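VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2x2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma block numbering: 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18-21 (U) and 22-25 (V)]

For 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased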

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: H.264 encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output)]

- 4x4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16x16 Intra luma and 8x8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication needed for the exact transform is combined with the multiplication of scalar quantization
• Encoder: post-scaling and quantization
• Decoder: inverse quantization and pre-scaling

$$Y_{ij} = \mathrm{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right)$$

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
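The "doubles for every increment of 6 in QP" rule can be illustrated with a six-entry base table. The base values below (0.625 ... 1.125 for QP 0 to 5) are the commonly tabulated ones and are not given on the slide, so treat them, together with the function names and the floating-point `quantize`/`rescale` helpers, as assumptions of this sketch rather than as the normative integer procedure.

```python
# Commonly tabulated Qstep values for QP = 0..5 (assumption, not from the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for QP in 0..51; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, sf, qp):
    """Post-scaling and quantization: Y = round(X * SF / Qstep)."""
    return round(x * sf / qstep(qp))


def rescale(y, sf, qp):
    """Inverse quantization and pre-scaling: X' = Y * Qstep / SF."""
    return y * qstep(qp) / sf


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
print(quantize(100, 1.0, 10), rescale(quantize(100, 1.0, 10), 1.0, 10))
```

Doubling over six steps corresponds to an average step-size growth of about 12% per unit increase of QP.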

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization data flow. Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output. Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block]

-49-

VC Algorithm: Entropy Coding

• All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
• Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
• Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
• Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6

2 4 7 12

3 8 11 13

9 10 14 15

(b) Alternate scan

0 2 8 12

1 5 9 13

3 6 10 14

4 7 11 15
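Each table gives, for every position of the 4x4 block, its index in the one-dimensional scan. A minimal sketch of applying either scan follows; the constant and function names are illustrative.

```python
# Scan index of each (row, column) position, copied from the two tables.
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan_block(block, order):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [[9, 3, 0, 0],
         [2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan_block(block, ZIGZAG))
print(scan_block(block, ALTERNATE))
```

Both scans push the zero-valued high-frequency coefficients to the end of the list, which the entropy coder can then represent compactly.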

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1s are coded; for the other coefficients, their levels are coded

Encoding steps:
• step 1: encode the total number of non-zero coefficients and the number of +/-1 (trailing ones) values
• step 2: encode the sign of each trailing one in reverse order
• step 3: encode the levels of the remaining non-zero coefficients in reverse order
• step 4: encode the total number of zeros before the last coefficient
• step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the number of non-zero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
• number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
• number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
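The five steps of this example can be reproduced with a short sketch that extracts the CAVLC symbols from a scanned coefficient list. It stops at the symbol level and does not perform the table look-ups that produce actual bits; the function name, the returned dictionary keys, and the sample values chosen for c0, c1, c2 (7, -3, 2) are illustrative assumptions.

```python
def cavlc_symbols(scanned):
    """Derive the CAVLC symbols of a scanned coefficient list (no bit coding)."""
    nonzero = [(pos, c) for pos, c in enumerate(scanned) if c != 0]
    reverse = list(reversed(nonzero))          # highest frequency first

    # Step 1: number of non-zero coefficients and of trailing +/-1 (at most 3).
    trailing_ones = 0
    for _, c in reverse:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if c < 0 else "+" for _, c in reverse[:trailing_ones]]

    # Step 3: levels of the remaining coefficients, in reverse order.
    levels = [c for _, c in reverse[trailing_ones:]]

    # Step 4: zeros located before the last non-zero coefficient.
    total_zeros = nonzero[-1][0] + 1 - len(nonzero) if nonzero else 0

    # Step 5: run of zeros before each coefficient, until no zeros are left.
    runs, zeros_left = [], total_zeros
    positions = [pos for pos, _ in reverse]
    for current, previous in zip(positions, positions[1:]):
        if zeros_left == 0:
            break
        run = current - previous - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeff": len(nonzero), "trailing_ones": trailing_ones,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "run_before": runs}


coeffs = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(coeffs))
```

With these sample values the symbols match the slide: 6 non-zero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.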

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each syntax element is updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC

Encoding steps:
• step 1: context modeling. Choose a suitable model
• step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
• step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
• Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
• Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
• Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
• Two forward references: suited to a region just before a scene change
• Two backward references: suited to a region just after a scene change

[Figure: previous pictures, current picture and next pictures, with three prediction pairs: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
• Forward/backward pair of bi-directional prediction
• The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: direct-mode partition in the current picture, located between the List 0 reference and the List 1 reference that contains the co-located partition; temporal distances tb and td; motion vectors mvCol, mvL0 and mvL1]

mvL0 = tb · mvCol / td

mvL1 = -(td - tb) · mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
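The two scaling formulas can be checked with exact rational arithmetic. In this sketch tb is assumed to be the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference; the function name is illustrative, and the fixed-point rounding used by real codecs is deliberately left out.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located MV into the forward and backward direct-mode MVs."""
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)          # tb * mvCol / td
    mv_l1 = tuple(-(1 - scale) * c for c in mv_col)   # -(td - tb) * mvCol / td
    return mv_l0, mv_l1


mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Note that mvL0 - mvL1 always equals mvCol, so the two derived vectors lie on the motion trajectory of the co-located block.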

-60-

VC Algorithm: B Slice

Weighted prediction
• Different weights of the reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
• Different weighted prediction methods for a macroblock of a P slice or a B slice
• A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
• Implicit type: the factors are calculated based on the temporal distance between the pictures
• Explicit type: the factors are transmitted in the slice header
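A minimal sketch of p = w1 r1 + w2 r2 on lists of samples is given below. The function names are illustrative, the implicit weights are a simplified floating-point reading of "based on the temporal distance" (the closer reference gets the larger weight), and the integer weights, offsets and shifts of the actual syntax are not modeled.

```python
def weighted_bipred(r1, r2, w1, w2, max_val=255):
    """Combine two reference blocks sample by sample: p = w1*r1 + w2*r2."""
    return [min(max_val, max(0, round(w1 * a + w2 * b)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption).

    tb: distance from the first reference to the current picture
    td: distance from the first reference to the second reference
    """
    w2 = tb / td
    return 1.0 - w2, w2


r1 = [100, 200]          # samples from the first reference
r2 = [60, 100]           # samples from the second reference
print(weighted_bipred(r1, r2, 0.5, 0.5))            # plain averaging
print(weighted_bipred(r1, r2, *implicit_weights(1, 4)))
```

With explicit weights an encoder can, for example, signal factors below 1 to follow a fade to black without spending bits on a large residual.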

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
• SI slice: a switched slice coded similarly to an I slice

[Figure: bitstream A (P(1,1) ... P(1,5)) and bitstream B (P(2,1) ... P(2,5)) connected through the switching slice S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slices (SP/SI)
• Data partitioning
• Arbitrary Slice Order (ASO)

(Only in Extended Profile)

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

• The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
• The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example parameter set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks of each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data
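The two-slice-group case can be pictured as a checkerboard. The sketch below builds such a macroblock-to-slice-group map and counts, for one macroblock, how many of its four direct neighbors belong to the other group; the function names and the simple (x + y) mod 2 rule are illustrative assumptions rather than the syntax used to signal the map.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Map each macroblock (x, y) to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(mb_map, x, y):
    """Count the 4-connected neighbors of (x, y) that lie in the other group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            count += mb_map[ny][nx] != mb_map[y][x]
    return count


mb_map = checkerboard_slice_groups(11, 9)     # QCIF: 11 x 9 macroblocks
for row in mb_map[:3]:
    print(row)
print(neighbors_in_other_group(mb_map, 5, 4))
```

If the packet carrying one group is lost, every missing macroblock in the picture interior still has all four direct neighbors available for concealment.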

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence; bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory in the Blu-ray Disc BD-ROM specification

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats; 12-bit resolution for medical imaging; scalable coding; lossless coding for digital cinema applications; high-fidelity coding for next-generation optical discs; extensions for various applications.
H. Schwarz, D. Marpe and T. Wiegand, “SNR-scalable extension of H.264/AVC”, ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
More than 8-bit sample depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[Use RGB color format] YCgCo
High resolution

-92-

♦ Inter temporal prediction – block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

FRExt only

Two scans (frame zig-zag and field), similar to those of the 4x4 transform, are switched for frame/field coding.

Coefficient scanning follows decreasing coefficient variances, so as to maximize the number of zero-valued coefficients along the scan.

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 – 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO – 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
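The construction and decoding rules for Exp-Golomb codewords can be sketched in a few lines of Python. This is only an illustration of the [M zeros][1][INFO] layout with M = floor(log2(code_num + 1)); the function names are hypothetical, and codewords are represented as strings of '0' and '1' rather than packed bits.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO]


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                      # step 1: count leading zeros up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, 2 * m + 1      # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Every codeword is 2M + 1 bits long, so the decoder knows where the next codeword starts as soon as it has counted the leading zeros.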

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ Blu-ray Disc Association: FRExt is mandatory in the BD-ROM Video specification

♦ DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height)

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB → Y Cb Cr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

K_R = 0.2126, K_B = 0.0722, K_R + K_B + K_G = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B – Y), Cr = 0.6350 (R – Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
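As a quick numerical check of the conversion, the sketch below evaluates Y = KR·R + (1 – KR – KB)·G + KB·B, Cb = (B – Y) / (2(1 – KB)) and Cr = (R – Y) / (2(1 – KR)) with KR = 0.2126 and KB = 0.0722. It assumes R, G and B are normalized to the range 0 to 1, which the slide does not state; the function name is illustrative only.

```python
KR = 0.2126                 # red weight quoted on the slide
KB = 0.0722                 # blue weight quoted on the slide
KG = 1.0 - KR - KB          # green weight, 0.7152


def rgb_to_ycbcr(r, g, b):
    """Convert normalized R, G, B (0..1, an assumption) to Y, Cb, Cr."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


if __name__ == "__main__":
    # Scale factors applied to (B - Y) and (R - Y)
    print(round(1.0 / (2.0 * (1.0 - KB)), 4), round(1.0 / (2.0 * (1.0 - KR)), 4))
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y close to 1, chroma close to 0
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches 0.5
```

The division by 2(1 – K) is what keeps each chroma component within ±0.5 for normalized input.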

-120-

Rounding error in RGB → Y Cb Cr

FRExt only: YCgCo

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R – B

t = B + (Co >> 1)

Cg = G – t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y – (Cg >> 1)

G = t + Cg

B = t – (Co >> 1)

R = B + Co
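The lifting steps of the forward and inverse conversions can be checked for exact reversibility with a short sketch. It follows the four forward and four inverse steps of this slide (with G = t + Cg in the inverse); the function names are illustrative, and Python's `>>` on integers is an arithmetic shift, matching the slide's definition.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer color transform (lifting form)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes the forward lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Because each lifting step adds or subtracts a quantity the inverse can recompute exactly, no rounding error accumulates; the cost is that Cg and Co need one more bit than the input samples.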

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A ‘higher’ profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content”, JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile DetailsDeblocking Filter CABAC Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-34-

VC Algorithm: Inter Prediction
Exploits temporal redundancy
- Prediction of variable block sizes
- Sub-pel motion compensation
- Deblocking filter
- Management of multiple reference pictures

-35-

VC Algorithm: Inter Prediction
Prediction of variable block size
– A MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas, small partition sizes suit detailed areas

The two partition types cannot be mixed, i.e. an MB cannot have 16x8 and 4x8 partitions. When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned.

-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation
- Better compression performance than integer-pel MC
- At the expense of increased complexity
- Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram (Video Input, Intra/Inter Mode Decision, Motion Estimation, Motion Compensation, Intra Prediction, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Entropy Coding, Bitstream Output), highlighting motion vector accuracy 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy
- A distinct MV can be sent for each sub-MB partition
- ME can be based on multiple pictures that lie in the past or in the future in display order
- The reference picture for ME is selected at the MB partition level
- Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: integer position pixels, 1/2 and 1/4 pixels, 1/8 pixels]

-38-

VC Algorithm: Inter Prediction
Half-pel samples: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: integer-pel samples (upper-case letters A to U) and interpolated sub-pel samples (lower-case letters a to s, aa to hh)]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
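The two interpolation formulas translate directly into integer arithmetic. The sketch below is a minimal illustration, assuming 8-bit samples: rounding is done by adding half the divisor before shifting, and the half-pel result is clipped to 0..255 (the clipping is an assumption added here for 8-bit video; the slide only gives the rounding formulas). Function names are illustrative.

```python
def clip8(value):
    """Clip to the 8-bit sample range (assumed sample depth)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample between G and H from six integer-pel neighbors.

    Implements round((E - 5F + 20G + 20H - 5I + J) / 32).
    """
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip8((acc + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]        # a sharp vertical edge
    b = half_pel(*row)                    # half-pel between 3rd and 4th sample
    a = quarter_pel(row[2], b)            # quarter-pel between G and b
    print(b, a)
```

The filter weights sum to 32, so a flat area is reproduced unchanged, while the negative taps let the half-pel value overshoot near edges, which is why clipping is needed.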

-39-

VC Algorithm: Inter Prediction
Deblocking filter: adaptive
- Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
- Filtering is applied to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 macroblock, for luma and for chroma]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures
- Takes care of marking some stored pictures as ‘unused’ and deciding which pictures to delete from the buffer

[Figure: encoder block diagram (Video Input, Intra/Inter Mode Decision, Motion Estimation, Motion Compensation, Intra Prediction, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Entropy Coding, Bitstream Output), highlighting management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier free, additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4x4 luma blocks in a macroblock:

0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients of the 4x4 blocks:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), …, (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same applies to chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

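VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f X C_f^T\right) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

⊗ implies element-by-element multiplication.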

-43-

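4x4 Inverse IntDCT

$$X = C_i^T \left(Y \otimes E_i\right) C_i$$

Here

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

For comparison, the forward scaling matrix is

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices E_f and E_i.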
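The relation between the forward and inverse transforms can be checked numerically. The sketch below uses plain Python lists and floating point, with a = 1/2 and b = sqrt(2/5). The core matrix C_f and the scaling matrices E_f and E_i follow the slides; the inverse core matrix C_i is not spelled out in this transcript, so the values used here (C_f with its second and fourth rows halved) are an assumption. Quantization is left out entirely, so the round trip should return the input block.

```python
import math

A = 0.5
B = math.sqrt(2.0 / 5.0)

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
# Assumed inverse core matrix: rows 1 and 3 of CF scaled by 1/2.
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]

EF = [[A * A, A * B / 2, A * A, A * B / 2],
      [A * B / 2, B * B / 4, A * B / 2, B * B / 4]] * 2
EI = [[A * A, A * B, A * A, A * B],
      [A * B, B * B, A * B, B * B]] * 2


def matmul(p, q):
    """4x4 matrix product."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def elementwise(p, q):
    """Element-by-element product (the circled-times operator on the slides)."""
    return [[p[i][j] * q[i][j] for j in range(4)] for i in range(4)]


def forward(x):
    """Y = (Cf X Cf^T) (x) Ef"""
    return elementwise(matmul(matmul(CF, x), transpose(CF)), EF)


def inverse(y):
    """X = Ci^T (Y (x) Ei) Ci"""
    return matmul(matmul(transpose(CI), elementwise(y, EI)), CI)


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    restored = inverse(forward(block))
    print([[round(v, 6) for v in row] for row in restored])
```

In a real codec only the integer core products are computed as transforms; the multiplications by E_f and E_i are folded into the quantizer and rescaler, which is why the transform itself needs only additions and shifts.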

-44-

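VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) / 2$$

where the division by 2 is rounded to the nearest integer.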

-45-

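VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma block numbering for U and V: 2x2 DC blocks 16 and 17, AC blocks 18 to 25]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.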

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram (Video Input, Intra/Inter Mode Decision, Motion Estimation, Motion Compensation, Intra Prediction, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Entropy Coding, Bitstream Output)]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij} \cdot \mathrm{round}\left(\frac{SF_{ij}}{Qstep}\right)$$

Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values (for 8 bits per decoded sample), and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 values by 6 for each additional bit of decoded sample.
SF: scaling term (post-scaling in the encoder, pre-scaling in the decoder)
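The rule that Qstep doubles for every increment of 6 in QP can be illustrated with a small lookup. The six base step sizes for QP 0 to 5 used below are the values commonly tabulated for 8-bit H.264; they are not given on this slide, so treat them as an assumption of this sketch, along with the function name.

```python
# Base step sizes for QP = 0..5 (assumed, commonly tabulated values; not from the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    # Each full period of 6 in QP doubles the step size.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Within one period of 6 the step size grows by roughly 12% per QP increment, which gives the logarithmic control of step size mentioned for H.264.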

-48-

VC Algorithm: Transform & Quantization

Rescale and inverse transform; the DC transform stage applies to chroma and to the Intra (16x16) prediction mode of luma only.

Encoder part: input block → forward transform → 2x2 or 4x4 DC transform (chroma or Intra-16 luma only) → post-scaling and quantization → encoder output

Decoder part: decoder input → 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only) → inverse quantization and pre-scaling → inverse transform → output block

-49-

VC Algorithm: Entropy Coding
- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
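The two 4x4 scan tables (zig-zag: rows 0 1 5 6 / 2 4 7 12 / 3 8 11 13 / 9 10 14 15; alternate: rows 0 2 8 12 / 1 5 9 13 / 3 6 10 14 / 4 7 11 15) can be applied as in the following sketch. It assumes that each table entry gives the position in the scanned sequence of the coefficient at that row and column; the function name is illustrative.

```python
ZIGZAG = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block into a 16-element list following a scan table."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    # A typical quantized block: energy concentrated at low frequencies.
    block = [[7, 3, 0, 0], [2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(scan(block, ZIGZAG))
    print(scan(block, ALTERNATE))
```

For frame-coded blocks the zig-zag order tends to push the zeros to the end of the sequence, which is what the run-based entropy coding relies on; the alternate scan favours vertical frequencies, as suits field-coded blocks.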

-50-

Exponential Golomb codes (for data elements other than tansform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

\[ M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor \]

\[ \mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M \]

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO − 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
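The construction and the decoding steps for Exp-Golomb codes can be sketched as follows. Bit strings are represented as Python strings of "0" and "1" purely for readability; the function names are illustrative and only the unsigned code_num mapping is shown.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Encode code_num as [M zeros][1][INFO] and return the bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str):
    """Decode one codeword from the start of bits.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # step 3: code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

Code words 1 and 2 come out three bits long and code words 3 to 6 five bits long, matching the (2M + 1) length rule.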

-53-

VC Algorithm Entropy Coding

CAVLC handles zeros and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 values are coded; for the other coefficients, their levels are coded.

Encoding steps
- step 1: encode the total number of nonzero coefficients and the number of ±1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm Entropy Coding: Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 … 16
coeff: c0 c1 c2 0 1 1 0 −1 0 0 … 0

Step 1: encode the number of total nonzero coefficients and the number of 1 or −1 values (trailing ones) from a look-up table
- number of total nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- − (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
- c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
- 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
- 1 (order 6-5), 0 (order 4), 1 (order 3-2)
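The five CAVLC steps can be traced with a small sketch that extracts the symbols to be coded from a scanned coefficient list. It stops at the symbol level and does not produce the actual variable length codes, since the look-up tables are not reproduced in these slides. The concrete values chosen for c0, c1, c2 (5, −3, 2) are hypothetical, as is the function name.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols (not the bit codes) for scanned coefficients."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nz)

    # Step 1/2: up to three trailing +-1 values, found scanning in reverse.
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [coeffs[i] for i in reversed(nz[:total_coeffs - len(trailing)])]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = (nz[-1] + 1 - total_coeffs) if nz else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once all zeros are accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


example = [5, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0=5, c1=-3, c2=2 (assumed)
symbols = cavlc_symbols(example)
```

For this input the sketch reports 6 nonzero coefficients, 3 trailing ones with signs −, +, +, 2 total zeros and runs 1, 0, 1, in agreement with the slide.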

-55-

VC Algorithm Entropy Coding

CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is also updated. Both MV and residual transform coefficients are coded by CABAC.

Encoding steps
- step 1, context modeling: choose a suitable model
- step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size, and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm B Slice: Generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture, and next pictures, showing three prediction pairs: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm B Slice: Direct mode

Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference; direct-mode partition in the current picture and co-located partition in the List 1 reference; motion vectors mvCol, mvL0, mvL1; temporal distances tb and td]

\[ mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol \]

where mvCol is the MV used in the co-located MB of the subsequent picture
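A minimal sketch of the temporal direct-mode scaling follows. It uses plain floating-point division for clarity, whereas a real codec uses fixed-point scaling with rounding and clipping; the function name is illustrative. Here tb is taken as the temporal distance from the List 0 reference to the current picture and td as the distance from the List 0 reference to the List 1 reference, as the figure labels suggest.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into mvL0 and mvL1.

    mv_col is an (x, y) pair; tb and td are temporal distances with td != 0.
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture midway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

Note that the two derived vectors always satisfy mvL1 = mvL0 − mvCol, so both point along the same motion trajectory as the co-located vector.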

-60-

VC Algorithm B Slice: Weighted prediction

- Different weights of the reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

\[ p = w_1 r_1 + w_2 r_2 \]

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
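The weighted bi-prediction formula p = w1 r1 + w2 r2 can be sketched as below. The implicit weights are modelled here as a simple linear function of temporal distance (w2 = tb/td, w1 = 1 − w2), which is an assumption made for illustration; the standard derives them in fixed-point arithmetic. Function names, and the tb/td naming, are also illustrative.

```python
def implicit_weights(tb, td):
    """Weights from temporal distances: tb = current to first reference,
    td = distance between the two references (assumed linear model)."""
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_bipred(r1, r2, w1, w2):
    """Return p = w1*r1 + w2*r2 per sample, rounded and clipped to 8 bits."""
    return [min(255, max(0, round(w1 * a + w2 * b))) for a, b in zip(r1, r2)]


# A fade: the second reference is darker than the first.
w1, w2 = implicit_weights(tb=1, td=4)
pred = weighted_bipred([80, 160], [40, 80], w1, w2)
```

With the current picture a quarter of the way between the references, the nearer reference receives three quarters of the weight.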

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: switching between two bitstreams through switching slice S(3)
Bitstream A: P(11) P(12) P(13) P(14) P(15)
Bitstream B: P(21) P(22) P(23) P(24) P(25)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MBs
5. Each partition A, B & C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2, 3, synchronized through a reliable parameter set exchange. Example parameter set 3: video format NTSC, motion resolution ¼, Enc CABAC, frame width 11. VCL data is transferred with a reference to PS 3.]
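The parameter set mechanism amounts to a table look-up driven by an identifier carried in each slice header. The sketch below models that idea only; the class names, field names, and the split of the figure's example values between the two kinds of parameter set are hypothetical and do not follow the actual H.264 syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceParameterSet:
    sps_id: int
    video_format: str
    frame_width: int


@dataclass(frozen=True)
class PictureParameterSet:
    pps_id: int
    sps_id: int            # which sequence parameter set this one refers to
    entropy_coding: str


class ParameterSetStore:
    """Decoder-side storage, filled through the reliable exchange channel."""

    def __init__(self):
        self.sps = {}
        self.pps = {}

    def add_sps(self, sps):
        self.sps[sps.sps_id] = sps

    def add_pps(self, pps):
        self.pps[pps.pps_id] = pps

    def activate(self, slice_pps_id):
        """Resolve the id found in a slice header to (SPS, PPS)."""
        pps = self.pps[slice_pps_id]
        return self.sps[pps.sps_id], pps


store = ParameterSetStore()
store.add_sps(SequenceParameterSet(sps_id=0, video_format="NTSC", frame_width=11))
store.add_pps(PictureParameterSet(pps_id=3, sps_id=0, entropy_coding="CABAC"))
active_sps, active_pps = store.activate(3)   # slice header references PS 3
```

Because the slice carries only a small identifier, the parameter sets themselves can be sent ahead of time over a more reliable channel than the video data.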

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1 and that the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO
- Randomizes data prior to transmission
- Errors are distributed more randomly over the video frames rather than in a single block of data
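The dispersed two-slice-group allocation described for FMO can be illustrated with a checkerboard map. This is a sketch of one possible dispersed pattern, chosen because it makes the concealment argument easy to verify; the function names are illustrative and the signalling of the map in the bitstream is not shown.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def other_group_neighbours(mb_map, x, y):
    """Count the 4-connected neighbours of MB (x, y) in the other group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != mb_map[y][x]:
            count += 1
    return count


# QCIF is 11 x 9 macroblocks.
mb_map = checkerboard_slice_groups(11, 9)
interior = other_group_neighbours(mb_map, 5, 4)
corner = other_group_neighbours(mb_map, 0, 0)
```

If slice group 1 is lost, every missing macroblock still has all of its available horizontal and vertical neighbours from slice group 0 to conceal from.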

-66-

Error Resilience: Redundant Slice

- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence; Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T; 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x; 2x, T, 2x, T

Head: > 2x 2x 2x T T

Zoom: > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence; Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x; > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x; > 2x, 4x, > 2x, T

Husky: 2x 2x > 1x 2x 2x 2x

Tempete: 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x, 2.7x, 2x, T, T; > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T; 1.7x, 2x, T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T, 2x, T; T, 2x, T

1080 (25p)

River Bed: > 1.7x, > 1x, T; > 1.7x, > 1x, T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards (values listed in the order MPEG-1; MPEG-2; MPEG-4 part 2 (visual); H.264/MPEG-4 part 10)

Macroblock size: 16x16; 16x16 (frame mode), 16x8 (field mode); 16x16; 16x16

Block size: 8x8; 8x8; 16x16, 16x8, 8x8; 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT; 8x8 DCT; 8x8 DCT/Wavelet; 4x4, 8x8 Int DCT, 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment; scalar quantization with step size of constant increment; vector quantization; scalar quantization with step size increase at the rate of 12.5%

Entropy coding: VLC; VLC; VLC; VLC, CAVLC, CABAC

Motion estimation & compensation: Yes; Yes; Yes; Yes, more flexible, up to 16 MVs per MB

Playback & random access: Yes; Yes; Yes; Yes

-74-

Conclusions

Comparison of standards (continued) (values listed in the order MPEG-1; MPEG-2; MPEG-4 part 2 (visual); H.264/MPEG-4 part 10)

Pel accuracy: integer, ½-pel; integer, ½-pel; integer, ½-pel, ¼-pel; integer, ½-pel, ¼-pel

Profiles: No; 5; 8; 4

Reference picture: one; one; one; multiple

Bidirectional prediction mode: forward/backward; forward/backward; forward/backward; forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D; I, P, B; I, P, B; I, P, B, SP, SI

Error robustness: synchronization & concealment; data partitioning, FEC for important packet transmission; synchronization, data partitioning, header extension, reversible VLCs; data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps; 2-15 Mbps; 64 kbps - 2 Mbps; 64 kbps - 240 Mbps

Compatibility with previous standards: n/a; Yes; Yes; No

Encoder complexity: Low; Medium; Medium; High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExtensions
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr K R Rao, UTA: rao@uta.edu
- Dr S K Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms A Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- [Use RGB color format] Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
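Three of the four full-MB prediction modes are simple enough to sketch directly. The code below works on any square block size so that it can be checked on a small 4x4 example; the planar mode is omitted because its equation is not given in these slides. Function names are illustrative, and the availability checks a real codec performs on the neighboring pels are left out.

```python
def predict_vertical(above):
    """Each column is filled with the pel just above the block."""
    n = len(above)
    return [list(above) for _ in range(n)]


def predict_horizontal(left):
    """Each row is filled with the pel just left of the block."""
    n = len(left)
    return [[left[y]] * n for y in range(n)]


def predict_dc(above, left):
    """Every pel is the rounded average of the neighboring pels."""
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


above = [10, 20, 30, 40]
left = [1, 2, 3, 4]
vertical = predict_vertical(above)
horizontal = predict_horizontal(left)
dc = predict_dc(above, left)
```

An encoder would compute the residual against each candidate prediction and keep the mode that costs the fewest bits for acceptable distortion.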

-103-

Chroma spatial prediction (operates on the entire MB): vertical, horizontal, DC, planar

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16)
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction:
Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- The encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows the order of decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

Frame: zig-zag scan; Field: field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block


-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16.]

-119-

RGB to YCbCr conversion

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
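As a minimal sketch of the RGB to YCbCr conversion on this slide, the snippet below evaluates the general formulas for given $K_R$ and $K_B$. The function name `rgb_to_ycbcr` and the use of normalized floating-point samples in [0, 1] are assumptions made for illustration; a real codec works on integer samples with offsets and clipping.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B samples to Y, Cb, Cr.

    kr and kb default to the values quoted on the slide;
    pass kr=0.299, kb=0.114 for the BT.601 weights.
    """
    kg = 1.0 - kr - kb                    # K_R + K_B + K_G = 1
    y = kr * r + kg * g + kb * b          # luma
    cb = (b - y) / (2.0 * (1.0 - kb))     # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - kr))     # red-difference chroma
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))    # white: chroma is (near) zero
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))    # pure blue: Cb reaches its maximum
```

The divisors 2(1 - K_B) and 2(1 - K_R) normalize both chroma components to the range [-0.5, 0.5].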

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right]$

$C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right]$

$C_o = \frac{R - B}{2}$

-121-

For 4:4:4 video, FRExt provides a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.
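The lifting steps of the forward and inverse color space conversion can be checked with a short sketch. The function names are hypothetical; the arithmetic follows the slide line by line, and Python's `>>` on integers is an arithmetic (floor) right shift, which is what the slide assumes.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion: integer R, G, B to Y, Cg, Co."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (200, 17, 93)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded) == sample)
```

Because every step only adds or subtracts a value that the inverse can recompute exactly, the round trip is lossless, at the cost of one extra bit of range for Cg and Co.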

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

New Profiles in the H.264/AVC FRExt Amendment

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
  • Only the control of the filter is adjusted (do not filter 4x4 blocks)
  • No change to the filter operation itself

♦ CABAC
  • 61 new contexts and corresponding initialization values
  • No change to the CABAC engine

♦ Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
  • Main profile
  • Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
  • Encoder-specified perceptual-based quantization scaling matrices
  • Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
  • HD film: 12%
  • HD video (progressive): 12%
  • HD video (interlace): 4% (only 2 test clips)
  • SD video (interlace): 6%

Complexity impact:
  • Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
  • No increase in computational requirements
  • Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

– H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

– DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)

– ATSC has selected AC-3 plus from Dolby

– ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-35-

VC Algorithm: Inter Prediction

Prediction of variable block size
– A MB can be partitioned into smaller block sizes
– 4 cases for a 16 x 16 MB, 4 cases for an 8 x 8 sub-MB
– Large partition sizes suit homogeneous areas; small partition sizes suit detailed areas
– The two partition types cannot be mixed, i.e., a MB cannot have 16x8 and 4x8 partitions
– When the sub-MB partition (8x8) is selected, each (8x8) block can be further partitioned

-36-

VC Algorithm: Inter Prediction

Sub-pel motion compensation
– Better compression performance than integer-pel MC
– At the expense of increased complexity
– Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram (video input, transform & quantization, entropy coding, bitstream output, inverse quantization & inverse transform, deblocking filtering, picture buffering, motion estimation, motion compensation, intra prediction, intra/inter mode decision), highlighting motion vector accuracy of 1/4 pel (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16 and 8x8; sub-MB partitions 8x8, 8x4, 4x8 and 4x4]

-37-

VC Algorithm: Inter Prediction

Sub-pel accuracy
– A distinct MV can be sent for each sub-MB partition
– ME can be based on multiple pictures that lie in the past or in the future in display order
– The reference picture for ME is selected at the MB partition level
– Sub-MB partitions within the same MB partition must use the same reference picture

[Figure: sample grid showing integer-position pixels, 1/2 and 1/4 pixels, and 1/8 pixels]

-38-

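VC Algorithm: Inter Prediction

– Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

– Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: interpolation grid with integer-pel samples (upper-case letters A to U) and fractional-pel samples (lower-case letters a to s and aa to hh); b lies midway between G and H on the row E F G H I J, and a lies between G and b]

$b = \mathrm{round}((E - 5F + 20G + 20H - 5I + J)/32)$

$a = \mathrm{round}((G + b)/2)$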
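The two interpolation formulas can be sketched in integer arithmetic as follows. The function names are hypothetical, and the clipping to the 8-bit range [0, 255] and the "+16, shift by 5" rounding are assumptions about how round(./32) is realized for 8-bit video; the slide only gives the filter weights.

```python
def clip8(v):
    """Clip a value to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample between g and h using weights (1, -5, 20, 20, -5, 1)/32."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip8((acc + 16) >> 5)        # +16 rounds to nearest before dividing by 32


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = (0, 0, 0, 255, 255, 255)        # E F G H I J across a sharp edge
    b = half_pel(*row)                    # half-pel sample between G and H
    a = quarter_pel(row[2], b)            # quarter-pel sample between G and b
    print(b, a)
```

The negative taps let the half-pel value overshoot near sharp edges, which is why the clip is needed before the result is stored as a sample.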

-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)
– Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise
– Filtering is applied to the horizontal and vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures
– Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, highlighting management of multiple reference pictures (short term, long term) in the picture buffering block]

-41-

VC Algorithm: Transform & Quantization

Transform
– Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
– Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), …, (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same applies to chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

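VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$Y = (C_f X C_f^T) \otimes E_f$

$$
Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}
$$

$a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}, \quad d = \frac{1}{2}$

$\otimes$ implies element-by-element multiplication.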

-43-

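4x4 Inverse IntDCT

$X = C_i^T (Y \otimes E_i) C_i$

Here

$$
E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix},
\qquad
E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.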
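To illustrate how the forward and inverse 4x4 transforms fit together, the sketch below runs a block through the integer core transform and back using exact rational arithmetic. The helper names are hypothetical. The inverse core matrix `CI` is an assumption taken from the usual presentation of the H.264 transform in the literature (it is not legible on the slide), and quantization is left out, so the forward and inverse scaling matrices are merged into one per-index factor: a * a = 1/4 for even indices and (b/2) * b = b^2/2 = 1/5 for odd indices.

```python
from fractions import Fraction as F

# Forward core matrix C_f (integers only: additions and shifts suffice).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Inverse core matrix C_i (assumed, as commonly given in the literature).
CI = [[1, 1, 1, 1],
      [1, F(1, 2), F(-1, 2), -1],
      [1, -1, -1, 1],
      [F(1, 2), -1, 1, F(-1, 2)]]

# Merged scaling E_f * E_i per row/column index (see lead-in).
S = [F(1, 4), F(1, 5), F(1, 4), F(1, 5)]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core(x):
    """W = C_f X C_f^T, the unscaled integer transform."""
    return matmul(matmul(CF, x), transpose(CF))


def inverse_with_scaling(w):
    """X = C_i^T (W (x) E_f (x) E_i) C_i, with the two scalings merged."""
    scaled = [[w[i][j] * S[i] * S[j] for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CI), scaled), CI)


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    w = forward_core(block)
    print(w[0][0])                                 # DC term = sum of all samples
    print(inverse_with_scaling(w) == block)        # exact reconstruction
```

In a real codec the scaling is not applied separately like this; it is folded into the quantizer and rescaler, which is the point of the remark that the scaling matrices are embedded in the quantization.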

-44-

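VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$
Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
$$

where round(·) denotes rounding to the nearest integer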

-45-

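VC Algorithm: Transform

Chroma DC coefficients: Intra prediction mode, (4x4) IntDCT, Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
$$

[Figure: 4:2:0 chroma blocks in coding order: blocks 16 and 17 carry the 2x2 DC coefficients of U and V; blocks 18 to 21 and 22 to 25 carry the AC coefficients of U and V]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased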

-46-

VC Algorithm: Transform

Block diagram emphasizing transform

[Figure: encoder block diagram, highlighting the transform & quantization block]

– 4 x 4 integer DCT transform

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
$$

– Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

– The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
– Encoder: post-scaling and quantization
– Decoder: inverse quantization and pre-scaling

$Y_{ij} = \mathrm{round}\left( X_{ij} \, \frac{SF_{ij}}{Q_{step}} \right)$

$X'_{ij} = Y_{ij} \cdot Q_{step} \cdot SF_{ij}$

– X: quantizer input (X': reconstructed value at the decoder)
– Y: quantizer output
– Qstep: quantization step size, selected by the quantization parameter QP; there are a total of 52 values, and Qstep doubles in size for every increment of 6 in QP (for 8 bits per decoded sample)
– FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
– SF: scaling term
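The "doubles for every increment of 6 in QP" rule can be made concrete with a small sketch. The six base step sizes for QP = 0 to 5 are the values commonly tabulated in the H.264 literature and are an assumption here, since the slide does not list them; `qstep` and `quantize` are hypothetical helper names, and the scaling term SF is omitted for clarity.

```python
# Step sizes for QP = 0..5 (assumed from the commonly published table).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for 8-bit video, QP in 0..51 (52 values)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))   # doubles every 6 QP steps


def quantize(x, qp):
    """Plain scalar quantization Y = round(X / Qstep), scaling term left out."""
    return int(round(x / qstep(qp)))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp), quantize(100.0, qp))
```

Each QP increment of 1 raises the step size by roughly 12.5%, so six increments compound to a factor of two.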

-48-

VC Algorithm: Transform & Quantization

[Figure: transform and quantization pipeline.
Encoder part: input block → forward transform → post-scaling and quantization → 2x2 or 4x4 DC transform (chroma or Intra-16 luma only) → encoder output.
Decoder part: decoder input → 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only) → inverse quantization and pre-scaling (rescale) → inverse transform → output block.
The DC transform stages are used for the Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding

– All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
– Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
– Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
– Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

(b) Alternate scan

0  2  8 12
1  5  9 13
3  6 10 14
4  7 11 15
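The two scan tables can be applied directly: each table entry gives the position in the output sequence of the coefficient at that row and column. The sketch below uses the tables exactly as printed on the slide; the function name `scan` is hypothetical.

```python
# Scan position of each (row, column) coefficient, as given on the slide.
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block into a 16-entry list following a scan table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0],
             [2, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
    print(scan(block, ZIGZAG))
    print(scan(block, ALTERNATE))
```

Both scans move the low-frequency coefficients to the front so that the trailing zeros cluster at the end of the sequence.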

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M+1) bits.

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

(For codeword 0, INFO and M are zero)
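A minimal sketch of the Exp-Golomb construction and the decoding steps, using strings of '0' and '1' characters to stand in for the bitstream. The function names `ue_encode` and `ue_decode` are hypothetical; the arithmetic follows the M and INFO formulas given for these codes.

```python
def ue_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                        # step 1: count leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1                   # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        word = ue_encode(n)
        print(n, word, ue_decode(word))
```

Every codeword is 2M+1 bits long, so small code numbers (the most probable values) get the shortest words.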

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
– step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
– step 2: encode the sign of each trailing one in reverse order
– step 3: encode the levels of the remaining non-zero coefficients in reverse order
– step 4: encode the total number of zeros before the last coefficient
– step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order:  0   1   2   3   4   5   6   7   8   9  …  16
coeff:  c0  c1  c2  0   1   1   0  -1   0   0  …  0

Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table
  number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
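The five steps of the example can be reproduced with a short sketch that derives the values to be coded (not the actual code tables, which are not shown on the slides). The function name `cavlc_elements` and the stand-in levels 7, -3, 2 for c0, c1, c2 are hypothetical; the limit of three trailing ones is an assumption taken from the standard's definition.

```python
def cavlc_elements(coeffs):
    """Derive the values coded in CAVLC steps 1-5 for a scanned block."""
    nz = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nz)

    # Step 1/2: trailing +/-1 values, taken from the end, at most three.
    trailing = []
    for _, c in reversed(nz):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["-" if c < 0 else "+" for c in trailing]

    # Step 3: remaining levels in reverse order.
    levels = [c for _, c in reversed(nz[:total_coeff - len(trailing)])]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nz[-1][0] + 1 - total_coeff if nz else 0

    # Step 5: zero runs in reverse order, until no zeros are left to signal.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k][0] - nz[k - 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return total_coeff, len(trailing), signs, levels, total_zeros, runs


if __name__ == "__main__":
    block = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 replaced by sample levels
    print(cavlc_elements(block))
```

Run on the example sequence, the sketch yields 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, a total of 2 zeros, and the runs 1, 0, 1, matching the steps listed for the example.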

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
– step 1: context modeling. Choose a suitable model.
– step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
– step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
– Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
– Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
– Scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
– Two forward references: proper for a region just before a scene change
– Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing the three prediction cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
– Forward/backward pair of bi-directional prediction
– The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, showing the direct-mode partition, the co-located partition, the motion vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$mvL0 = \frac{tb \cdot mvCol}{td}, \qquad mvL1 = -\frac{(td - tb) \cdot mvCol}{td}$

where mvCol is the MV used in the co-located MB of the subsequent picture
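The direct-mode scaling can be sketched as below. The function name is hypothetical, and reading tb as the temporal distance from the List 0 reference to the current picture and td as the distance between the two reference pictures is an assumption based on the figure labels. Exact fractions are used for clarity; an actual decoder uses fixed-point scaling with rounding instead.

```python
from fractions import Fraction


def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into the two direct-mode vectors.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances (assumed meaning, see lead-in)
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(Fraction(-(td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    mv_l0, mv_l1 = direct_mode_mvs((6, -9), tb=1, td=3)
    print(mv_l0, mv_l1)
```

The two derived vectors always differ by exactly mvCol, i.e., the direct-mode partition is assumed to move along the same straight trajectory as the co-located partition.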

-60-

VC Algorithm: B Slice

Weighted prediction
– Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
– Different weighted prediction methods for a macroblock of a P slice or a B slice
– A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors
– Implicit type: the factors are calculated based on the temporal distance between the pictures
– Explicit type: the factors are transmitted in the slice header

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
– SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
– SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), connected by the switching slice S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

– Parameter setting
– Flexible macroblock ordering (FMO)
– Redundant slice methods
– Switched slice (SP/SI)
– Data partitioning
– Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

– The sequence parameter set contains all information related to a sequence of pictures
– A picture parameter set contains all information related to all the slices belonging to a single picture
– The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each holding parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with reference to PS 3. Example Parameter Set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11]

-65-

Error Resilience: FMO

– Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
– Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
– If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
– ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
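The two-slice-group arrangement described above can be illustrated with a checkerboard macroblock map. This is only one possible dispersed allocation, chosen here as an assumption for illustration; the function names and the QCIF picture size of 11 x 9 macroblocks are likewise hypothetical.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(mb_map, x, y):
    """List the 4-connected neighbors of MB (x, y) that lie in the other slice group."""
    height, width = len(mb_map), len(mb_map[0])
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height
            and mb_map[ny][nx] != mb_map[y][x]]


if __name__ == "__main__":
    mb_map = checkerboard_map(11, 9)               # QCIF: 11 x 9 macroblocks
    for row in mb_map[:3]:
        print("".join(str(g) for g in row))
    # If slice group 1 is lost, each of its MBs still has received neighbors.
    print(neighbors_in_other_group(mb_map, 1, 0))
```

With this map, every macroblock of a lost slice group is surrounded horizontally and vertically by macroblocks of the surviving group, which is what makes spatial error concealment effective.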

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bit stream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF (24, 48, 96, 192), then bitrate [kbps] for CIF (96, 192, 384, 768); rows with fewer entries have empty cells

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF (24, 48, 96, 192), then bitrate [kbps] for CIF (96, 192, 384, 768); rows with fewer entries have empty cells

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6), then bitrate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6); rows with fewer entries have empty cells

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ (6, 10, 20), then bitrate [Mbps] for MPEG-2 TM5 (6, 10, 20); rows with fewer entries have empty cells

720 (60p)
  Crew: 1.7x, 2x, T, 1.7x, 2x, T
  Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
  Stockholm Pan: 1x, 2x
  New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
  River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
  Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
– MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s)
– MPEG LA website: http://www.mpegla.com
– Via Licensing: http://www.vialicensing.com

Extensions
– FRExtensions to 4:2:2 and 4:4:4 chroma formats
– 12 bit resolution for medical imaging
– Scalable coding
– Lossless coding for digital cinema applications
– High fidelity coding for the next generation optical discs
– Extension for various applications
– H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp. Singapore, Oct. 2004

Final stages of approval
– Standard systems and file format support specifications
– Standardizing the reference software implementation
– Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
– Chair: Gary Sullivan (garysull@microsoft.com)
– Co-chair: Ajay Luthra (aluthra@motorola.com)
– Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions
Slices in a picture are compressed as follows
♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], Y Cg Co
High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
  Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices, and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
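The first three full-MB modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`predict_16x16`, `best_mode`), the use of plain lists, the rounded integer mean for DC and the SAD-based mode choice are all hypothetical choices, not taken from the slides; planar mode is omitted.

```python
N = 16  # full-macroblock luma size

def predict_16x16(mode, above, left):
    """Build a 16x16 prediction from the 16 pels above and the 16 pels left of the MB."""
    if mode == "vertical":      # every row repeats the row of pels above the MB
        return [list(above) for _ in range(N)]
    if mode == "horizontal":    # every column repeats the column of pels left of the MB
        return [[left[r]] * N for r in range(N)]
    if mode == "dc":            # rounded mean of the 32 neighbouring pels (assumed rounding)
        dc = (sum(above) + sum(left) + N) // (2 * N)
        return [[dc] * N for _ in range(N)]
    raise ValueError("unsupported mode: " + mode)

def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))

def best_mode(block, above, left):
    """Pick the mode with the smallest SAD (a simple, assumed encoder criterion)."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_16x16(m, above, left)))
```

An encoder would transform and code only the difference between the block and the chosen prediction.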

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction
Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only
HVS Weighting Matrices

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices
These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding
Coefficient scanning follows decreasing coefficient variances, so as to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients - these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits
$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
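The construction and decoding rules for the Exp-Golomb codes can be checked with a short Python sketch. The function names and the use of '0'/'1' character strings for bits are illustrative assumptions; a real codec would work on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # total length is 2M + 1 bits

def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count the M leading zeros
        m += 1
    start = pos + m + 1                        # skip the '1' marker
    info = int(bits[start:start + m], 2) if m else 0   # step 2: read M-bit INFO
    return (1 << m) + info - 1, start + m      # step 3: code_num = 2^M + INFO - 1
```

For instance, `exp_golomb_encode(3)` gives `00100`: M = 2 leading zeros, the marker bit, and INFO = 0 written on two bits.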

-113-

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions (FRExt) are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

High Profiles

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB with a 16x16 Y block and 8-wide by 16-high Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \quad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)

-120-

Rounding error in RGB to Y Cb Cr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \dfrac{1}{2}\left[ G + \dfrac{R + B}{2} \right]$

$C_g = \dfrac{1}{2}\left[ G - \dfrac{R + B}{2} \right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
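The lifting steps of the forward and inverse conversions can be verified to be exactly reversible with a small Python sketch. The function names are illustrative assumptions; Python's `>>` on integers is an arithmetic shift, which matches the definition used on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only additions and arithmetic right shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo each lifting step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Cg and Co can be negative and span one more bit than the input samples, which is the extra bit of chroma precision mentioned earlier; no information is lost in either direction.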

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

'Higher' profile supports all capabilities of the lower ones
Also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, then the # of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

• Deblocking Filter
- Only control of filter is adjusted: do not filter 4x4 blocks
- No change to filter operation itself

• CABAC
- 61 new contexts and corresponding initialization values
- No change to CABAC engine

• Signaling
- 8x8 transform on/off flag at PPS level
- 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-36-

VC Algorithm: Inter Prediction
Sub-pel motion compensation

Better compression performance than integer-pel MC
Expense of increased complexity
Outperforms at high bit rates and high resolutions

[Figure: encoder block diagram (Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Intra Prediction, Motion Estimation, Motion Compensation, Intra/Inter Mode Decision), highlighting motion vector accuracy of 1/4 (6-tap filter)]

[Figure: MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, with partition numbering]

-37-

VC Algorithm: Inter Prediction
Sub-pel accuracy

A distinct MV can be sent for each sub-MB partition
ME can be based on multiple pictures that lie in the past or in the future in display order
Reference picture for ME is selected at the MB partition level
Sub-MB partitions within the same MB partition must use the same reference picture

[Figure legend: integer-position pixels; 1/2 and 1/4 pixels; 1/8 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel: interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel: produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: grid of integer-pel samples (upper-case letters, e.g. E F G H I J) with the interpolated fractional-pel positions between them (lower-case letters, e.g. a, b, c)]

$b = \mathrm{round}((E - 5F + 20G + 20H - 5I + J)/32)$
$a = \mathrm{round}((G + b)/2)$
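The two interpolation formulas can be written in integer arithmetic as in the following Python sketch. The function names, the add-half-then-shift rounding and the clipping to the 8-bit range are assumptions made for illustration; the slide only gives the `round(...)` expressions.

```python
def clip8(v):
    """Clamp to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))

def half_pel(e, f, g, h, i, j):
    """Half-pel sample b between G and H from six integer-pel neighbours E..J."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j   # 6-tap FIR, weights sum to 32
    return clip8((acc + 16) >> 5)                   # round(acc / 32), then clip

def quarter_pel(p, q):
    """Quarter-pel sample as the rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1
```

For example, `a = quarter_pel(G, half_pel(E, F, G, H, I, J))` mirrors the pair of equations above. The negative taps can overshoot near sharp edges, which is why the sketch clips the result.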

-39-

VC Algorithm: Inter Prediction
Deblocking filter: adaptive

To reduce blocking artifacts at block boundaries and prevent the propagation of accumulated coding noise

Filtering is applied to the horizontal or vertical edges of 4x4 blocks in a macroblock, adaptively on several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in a 16x16 macroblock, for luma and for chroma]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures

To take care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram, highlighting management of multiple reference pictures (short term, long term) in the picture buffer]

-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier-free; additions and shifts in 16-bit arithmetic
Hierarchical structure: 4x4 integer DCT + Hadamard transform

Coding order of the 4x4 luma blocks:
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

Indices of the DC coefficients:
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to luma 4x4 blocks; the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

Hadamard transform is applied only when (16x16) intra prediction mode is used with (4x4) IntDCT. Similarly for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

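VC Algorithm: Transform

4x4 integer DCT; X: input pixels, Y: output coefficients

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

$$a = \frac{1}{2}, \quad b = \sqrt{\frac{2}{5}}, \quad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication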
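The forward transform can be tried out with a small pure-Python sketch, assuming the matrix $C_f$ with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1) and the scaling constants a = 1/2, b = sqrt(2/5). Helper names such as `core_transform` and `forward_transform` are hypothetical. The integer core stage and the floating-point scaling stage are kept separate, since in a codec the scaling is folded into quantization.

```python
import math

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
A = 0.5
B = math.sqrt(2 / 5)
ROW = [A * A, A * B / 2, A * A, A * B / 2]           # even rows of E_f
ALT = [A * B / 2, B * B / 4, A * B / 2, B * B / 4]   # odd rows of E_f
EF = [ROW, ALT, ROW, ALT]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def core_transform(x):
    """Integer-only part: Cf * X * Cf^T (no multiplications beyond small constants)."""
    return matmul(matmul(CF, x), transpose(CF))

def forward_transform(x):
    """Scaled result: (Cf * X * Cf^T) multiplied element by element with E_f."""
    w = core_transform(x)
    return [[w[i][j] * EF[i][j] for j in range(4)] for i in range(4)]
```

With this scaling the transform is orthonormal, so the sum of squared coefficients equals the sum of squared input samples; a flat block of value 4 yields a single DC coefficient of 16.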

-43-

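4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \quad Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$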

-44-

VC Algorithm Transform

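Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round() denotes rounding to the nearest integer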

-45-

VC Algorithm Transform

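Chroma DC coefficients: intra prediction mode, (4x4) IntDCT; Walsh-Hadamard transform of the 2x2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma (U, V): 2x2 DC blocks numbered 16 and 17; AC blocks numbered 18-21 and 22-25]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased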

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram (Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Intra Prediction, Motion Estimation, Motion Compensation, Intra/Inter Mode Decision), highlighting the transform]

- 4x4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16x16 Intra luma and 8x8 chroma blocks

-47-

VC Algorithm Quantization

Multiplication operation for the exact transform is combined with the multiplication of scalar quantization

Encoder: post-scaling and quantization

$$Y_{ij} = \mathrm{round}\left( X_{ij} \frac{SF_{ij}}{Q_{step}} \right)$$

Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Q_{step} \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP
FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
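The doubling rule for Qstep can be illustrated with a short Python sketch. The six base step sizes (0.625 up to 1.125 for QP 0 to 5) are the values commonly tabulated for H.264 and are an assumption here, since the slide does not list them; the function names and the scalar `sf` argument are likewise illustrative, and only the 8-bit QP range 0 to 51 is covered.

```python
import math

# Assumed base step sizes for QP = 0..5; each later group of six doubles them.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantizer step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

def quantize(x, qp, sf=1.0):
    """Y = round(X * SF / Qstep), rounding half away from zero."""
    v = x * sf / qstep(qp)
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def rescale(y, qp, sf=1.0):
    """Decoder side: X = Y * Qstep * SF."""
    return y * qstep(qp) * sf
```

Under these assumed values, `qstep(28)` is 16.0, so an input of 100 quantizes to 6 and rescales to 96.0; the difference is the quantization error.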

-48-

VC Algorithm: Transform, Quantization

Rescale and inverse transform

[Figure: transform and quantization flow.
Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only for luma), post-scaling and quantization, encoder output / decoder input.
Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
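The two scan tables give, for each position of the 4x4 block, its index in the one-dimensional coefficient sequence. A minimal Python sketch of applying them follows; the names `ZIGZAG_POS`, `ALTERNATE_POS` and `scan_block` are illustrative assumptions.

```python
# Scan index of each (row, column) position, copied from the two tables.
ZIGZAG_POS = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE_POS = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]

def scan_block(block, pos_table):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[pos_table[r][c]] = block[r][c]
    return out
```

With either table the low-frequency coefficients come first, so the zeros typical of high frequencies cluster at the end of the list; the alternate scan visits the first column earlier, which suits field-coded blocks.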

-50-

Exponential Golomb codes (for data elements other than transform coefficients - these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits
$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 are coded. For the other coefficients, their levels are coded

Encoding steps
step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the no. of nonzero total coefficients and 1 or -1 (trailing ones) from a look-up table
no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the run of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
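The five steps of the example can be reproduced with a Python sketch that extracts the symbols CAVLC would code, without the actual VLC look-up tables. The function name `cavlc_symbols`, the dictionary keys and the sample values c0 = 7, c1 = 6, c2 = -2 are hypothetical; the cap of three trailing ones is an assumption consistent with the example.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols from a scanned list of quantized coefficients."""
    nz = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nz)

    # Steps 1-2: up to three +/-1 values at the high-frequency end, read in reverse.
    trailing = []
    for i, c in reversed(nz):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["-" if c < 0 else "+" for c in trailing]

    # Step 3: remaining non-zero levels, in reverse order.
    levels = [c for _, c in reversed(nz[:total_coeff - len(trailing)])]

    # Step 4: zeros located before the last non-zero coefficient.
    last = nz[-1][0] if nz else -1
    total_zeros = (last + 1) - total_coeff

    # Step 5: run of zeros before each coefficient, in reverse, until none remain.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k][0] - nz[k - 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeff": total_coeff, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}
```

Calling it on `[7, 6, -2, 0, 1, 1, 0, -1]` padded with zeros gives 6 coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total and runs 1, 0, 1, matching the steps listed above.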

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; also, in order to achieve good compression, the probability model for each symbol element is updated
Both MV and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling: choose a suitable model
step 2: binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: current picture with previous and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode

Forward/backward pair of bi-directional prediction
Prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the co-located partition in the List 1 reference has motion vector mvCol; the direct-mode partition in the current picture has mvL0 and mvL1; tb and td are the temporal distances]

$mv_{L0} = \dfrac{t_b}{t_d} \, mv_{Col}, \quad mv_{L1} = -\dfrac{t_d - t_b}{t_d} \, mv_{Col}$

where mvCol is an MV used in the co-located MB of the subsequent picture
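The direct-mode scaling can be sketched in Python using exact fractions. The function name `temporal_direct_mvs` and the tuple representation of motion vectors are illustrative assumptions; a real decoder uses fixed-point scaling rather than exact fractions.

```python
from fractions import Fraction

def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located MV and the temporal distances tb, td."""
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)          # mvL0 = tb * mvCol / td
    mv_l1 = tuple(-(1 - scale) * c for c in mv_col)   # mvL1 = -(td - tb) * mvCol / td
    return mv_l0, mv_l1
```

With the current picture midway between its references (tb = 1, td = 2), a co-located vector (8, -4) gives mvL0 = (4, -2) and mvL1 = (-4, 2): the two vectors point in opposite directions along the same motion trajectory.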

-60-

VC Algorithm: B Slice
Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction method for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by different weights from two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice
SP slice: the specially coded slice for efficient switching between video streams; similar to coding of a P slice
SI slice: the switched slice; similar to coding of an I slice

[Figure: Bitstream A with pictures P(11) P(12) P(13) P(14) P(15) and Bitstream B with pictures P(21) P(22) P(23) P(24) P(25), connected by the switching slice S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs
5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience Parameter setting

The sequence parameter set contains all information related to a sequence of pictures

A picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: parameter set exchange between H.264 encoder and H.264 decoder. Parameter sets 1, 2, 3 are exchanged reliably; example Parameter Set 3: Video format NTSC, Motion resolution ¼, Enc CABAC, Frame width 11. VCL data transfer with PS 3]
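The referencing mechanism can be illustrated with a small sketch: the decoder stores parameter sets under an identifier, and each slice header carries only that identifier. The class, method and field names below are hypothetical and chosen for illustration; they are not syntax element names from the standard.

```python
class ParameterSetStore:
    """Decoder-side table of parameter sets, keyed by their identifier."""

    def __init__(self):
        self._sets = {}

    def receive_parameter_set(self, ps_id, fields):
        # Parameter sets are assumed to arrive over a reliable channel.
        self._sets[ps_id] = dict(fields)

    def activate_for_slice(self, slice_header):
        # The slice header carries only the identifier, not the parameters.
        ps_id = slice_header["ps_id"]
        if ps_id not in self._sets:
            raise KeyError(f"parameter set {ps_id} has not been received")
        return self._sets[ps_id]


store = ParameterSetStore()
store.receive_parameter_set(3, {"video_format": "NTSC", "entropy_coding": "CABAC"})
active = store.activate_for_slice({"ps_id": 3, "first_mb": 0})
print(active["entropy_coding"])  # CABAC
```

Because the parameters travel separately from the slice data, the loss of a slice packet does not destroy the information needed to decode the remaining slices.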

-65-

Error Resilience FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture

If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
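The two-slice-group case can be sketched with a checkerboard allocation map. The checkerboard pattern and the function names are assumptions made for this illustration; the sketch only shows that every macroblock of a lost slice group keeps neighbors from the surviving group for concealment.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Allocate each macroblock to slice group 0 or 1 in a dispersed pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def surviving_neighbors(groups, x, y, lost_group):
    """Return the 4-connected neighbors of MB (x, y) outside the lost group."""
    rows, cols = len(groups), len(groups[0])
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [
        (nx, ny)
        for nx, ny in candidates
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != lost_group
    ]


# QCIF (176x144) is 11x9 macroblocks; suppose slice group 1 is lost.
groups = checkerboard_slice_groups(11, 9)
worst_case = min(
    len(surviving_neighbors(groups, x, y, lost_group=1))
    for y in range(9)
    for x in range(11)
    if groups[y][x] == 1
)
print(worst_case)  # 3
```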

-66-

Error Resilience Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed
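The decoder behaviour described above can be sketched as a selection step over the received slices. The `CodedSlice` record and its `slice_id` key are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSlice:
    slice_id: int    # hypothetical key: which macroblocks the slice covers
    redundant: bool  # False for the primary representation
    qp: int          # quantization parameter used by the encoder


def choose_slices_to_decode(received):
    """Keep the primary slice when present, else fall back to a redundant one."""
    chosen = {}
    for s in received:
        current = chosen.get(s.slice_id)
        if current is None or (current.redundant and not s.redundant):
            chosen[s.slice_id] = s
    return [chosen[k] for k in sorted(chosen)]


received = [
    CodedSlice(0, redundant=False, qp=24),
    CodedSlice(0, redundant=True, qp=40),  # discarded, primary is available
    CodedSlice(1, redundant=True, qp=40),  # used, primary for slice 1 was lost
]
for s in choose_slices_to_decode(received):
    print(s.slice_id, "redundant" if s.redundant else "primary", s.qp)
```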

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence; Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence; Bitrate [kbps] for QCIF: 24, 48, 96, 192; Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20; Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP – High Latency Profile
ASP – Advanced Simple Profile
H.26L – H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC – Conversational High Compression
SP – Simple Profile
ASP – Advanced Simple Profile
H.26L – H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation of optical discs
- Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. , pp. , Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction – block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF): choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in MB predicted from pels just above the MB

• Horizontal: pels in MB predicted from pels just left of the MB

• DC: pels in MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only)

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrix can be transmitted in SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning follows the order of decreasing variances, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
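The construction and decoding steps for Exp-Golomb codewords can be sketched as follows. Codewords are represented as strings of '0' and '1' purely for readability; the function names are chosen for this illustration, and the decoder assumes a well-formed codeword at the start of the string.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                 # 1. count M leading zeros, then the 1
        m += 1
    info = int(bits[m + 1 : 2 * m + 1], 2) if m else 0  # 2. read M-bit INFO
    return (1 << m) + info - 1            # 3. code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
# 5 00110
# 6 00111
```

The printed list matches the description above: code words 1 and 2 carry one INFO bit, code words 3 to 6 carry two.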

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P): supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB with 16x16 Y and 8x16 Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
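As a numeric check of the conversion, the sketch below applies the general formulas with the coefficients K_R = 0.2126 and K_B = 0.0722 quoted on the slide. Floating-point samples in the range 0 to 1 and the function name are assumptions made for this illustration; the scale factors of C_b and C_r follow from 1/(2(1 - K_B)) and 1/(2(1 - K_R)).

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y, Cb, Cr with Cb, Cr in -0.5..0.5."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(round(1 / (2 * (1 - 0.0722)), 4))  # 0.5389, scale factor of (B - Y)
print(round(1 / (2 * (1 - 0.2126)), 4))  # 0.635, scale factor of (R - Y)

# Pure blue and pure red reach the chroma extremes.
print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # Cb is approximately 0.5
print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # Cr is approximately 0.5
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide.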

-120-

Rounding error in RGB to Y Cb Cr conversion

FRExt only: YCgCo

Cg = Green Chroma, Co = Orange Chroma

To further avoid any rounding error, add only one bit of precision to chroma samples

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
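The forward and inverse conversions can be checked for exact reversibility with a short sketch. Python's `>>` on integers is an arithmetic (sign-preserving) shift, which is what the conversion requires; the function names are chosen for this illustration.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only integer additions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Exhaustive check on a coarse grid of 8-bit RGB values.
samples = range(0, 256, 15)
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
print(lossless)  # True
```

For 8-bit input, Co and Cg range over -255 to 255, which is why the chroma samples need one extra bit of precision while Y stays within 0 to 255.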

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
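The size constraint in item 2 can be illustrated with a small calculation. The pixel budget used below (2^21 pixels per frame) is a hypothetical value picked so that the square root is exact; it is not quoted from the level table, and the function names are chosen for this sketch.

```python
import math


def max_picture_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(pixels per frame x 8)."""
    return math.isqrt(max_pixels_per_frame * 8)


def fits_level(width, height, max_pixels_per_frame):
    """Check both the total pixel budget and the per-dimension limit."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (
        width * height <= max_pixels_per_frame
        and width <= limit
        and height <= limit
    )


budget = 2 ** 21  # hypothetical pixels/frame budget for illustration
print(max_picture_dimension(budget))   # 4096
print(fits_level(1920, 1088, budget))  # True
print(fits_level(8192, 16, budget))    # False: too wide despite a small area
```

The rule prevents extreme aspect ratios: a picture may be smaller than the budget in area and still be rejected because one dimension is too large.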

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

- Deblocking Filter
• Only control of filter is adjusted: do not filter 4x4 blocks
• No change to filter operation itself

- CABAC
• 61 new contexts and corresponding initialization values
• No change to CABAC engine

- Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus)
- ATSC has selected AC-3 plus from Dolby
- MPEG calls it HE-AAC (HE – High Efficiency)
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-37-

VC Algorithm: Inter Prediction, Sub-pel accuracy

- A distinct MV can be sent for each sub-MB partition.
- ME can be based on multiple pictures that lie in the past or in the future in display order.
- The reference picture for ME is selected at the MB partition level.
- Sub-MB partitions within the same MB partition must use the same reference picture.

[Figure legend: integer-position pixels; 1/8 pixels; 1/2 and 1/4 pixels]

-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32.

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples.

[Figure: sample grid with integer-pel positions (upper-case letters A to U) and interpolated sub-pel positions (lower-case letters a to s and aa to hh)]

\[ b = \mathrm{round}\big((E - 5F + 20G + 20H - 5I + J)/32\big) \]
\[ a = \mathrm{round}\big((G + b)/2\big) \]
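The two interpolation formulas above can be tried out with a few lines of Python. This is only a sketch: the function names are hypothetical, rounding is done with the usual add-half-then-shift integer trick, and clipping to the 8-bit range 0..255 is an assumption about the sample depth.

```python
def clip8(value):
    """Clamp to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def half_pel(E, F, G, H, I, J):
    """Half-pel sample b between G and H: 6-tap filter (1,-5,20,20,-5,1)/32."""
    # Adding 16 before the shift by 5 rounds to the nearest integer.
    return clip8((E - 5 * F + 20 * G + 20 * H - 5 * I + J + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pel sample: rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


# A row of integer-pel samples with an edge between G and H.
E, F, G, H, I, J = 10, 10, 10, 50, 50, 50
b = half_pel(E, F, G, H, I, J)   # half-pel position between G and H
a = quarter_pel(G, b)            # quarter-pel position between G and b
```

In a flat region the filter returns the input value unchanged, because the weights sum to 32.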

-39-

VC Algorithm: Inter Prediction, Adaptive deblocking filter

The filter reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise.

Filtering is applied adaptively to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, with adaptation at several levels (slice, block edge, sample).

[Figure: vertical and horizontal edges filtered in the luma and chroma components of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction, Management of multiple reference pictures

Reference picture management takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer.

[Figure: encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output), highlighting the management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free, additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 Integer DCT + Hadamard transform

0 1 4 5

2 3 6 7

8 9 12 13

10 11 14 15

00 01 02 03

10 11 12 13

20 21 22 23

30 31 32 33

Assignment of the indices of the DC coefficients (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT.

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

VC Algorithm Transform

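4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = \left( C_f \, X \, C_f^{T} \right) \otimes E_f \]

\[
Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}
\]

\[ a = \tfrac{1}{2}, \qquad b = \sqrt{\tfrac{2}{5}}, \qquad d = \tfrac{1}{2} \]

Here \(\otimes\) implies element-by-element multiplication.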
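As an illustration of the integer core of this transform (the part before the element-by-element scaling, which is folded into quantization), the following sketch computes Cf X Cf^T with plain integer arithmetic. The helper names are hypothetical, and the scaling matrix is deliberately left out.

```python
# Core forward transform matrix Cf of the 4x4 integer DCT.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [[m[c][r] for c in range(4)] for r in range(4)]


def forward_core(x):
    """Unscaled forward transform: Cf * X * Cf^T (integers only)."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
ramp = [[4 * r + c for c in range(4)] for r in range(4)]
```

For a flat block all energy lands in the top-left (DC) term, and that term always equals the sum of the 16 input samples, since the first row of Cf is all ones.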

-43-

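4x4 Inverse IntDCT

\[ X = C_i^{T} \left( Y \otimes E_i \right) C_i \]

\[
E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix},
\qquad
E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}
\]

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices E_f and E_i.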

-44-

VC Algorithm Transform

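Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform.

\[
Y_D = \mathrm{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)
\]

where round() denotes rounding to the nearest integer.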

-45-

VC Algorithm Transform

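Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): the 2 x 2 DC coefficients are transformed using the Walsh-Hadamard transform.

\[
Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]

[Figure: chroma block numbering for the 4:2:0 format: 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18 to 21 (U) and 22 to 25 (V)]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.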

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output), highlighting the transform stage]

- 4 x 4 integer DCT transform

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

- Encoder: post-scaling and quantization

\[ Y_{ij} = X_{ij} \cdot \mathrm{round}\left( \frac{SF_{ij}}{Qstep} \right) \]

- Decoder: inverse quantization and pre-scaling

\[ X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1} \]

where
- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size. It takes a total of 52 values (for 8 bits per decoded sample), indexed by the quantization parameter QP, and doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
- SF: scaling term
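The doubling rule for Qstep can be made concrete with a small sketch. The six base step sizes below are the values commonly tabulated for QP 0 to 5 in the 8-bit case; the function names are hypothetical, and the scaling term SF is deliberately left out (treated as 1), so this shows plain scalar quantization rather than the combined scaling used by a real codec.

```python
# Step sizes for QP = 0..5; every further increment of 6 in QP doubles them.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8 bits per sample)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(x, qp):
    """Scalar quantization with rounding to nearest (SF assumed to be 1)."""
    level = int(abs(x) / qstep(qp) + 0.5)
    return level if x >= 0 else -level


def dequantize(level, qp):
    """Inverse quantization (SF assumed to be 1)."""
    return level * qstep(qp)
```

With these values the step size runs from 0.625 at QP 0 to 224 at QP 51, and each single QP increment raises it by roughly 12%.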

-48-

VC Algorithm Transform Quantization

Rescale and Inverse transformIntra (16x16) prediction mode only

Forwardtransform

Post-scalingand

quantization

2x2 or 4x4DC

transform

Chroma or Intra-16 Luma Only

Encoder part

Inputblock

Inverse quantization and

pre-scaling

Inversetransform

2x2 or 4x4DC inversetransform

Chroma or Intra-16 Luma Only

Decoder part

Encoder output decoder input

Outputblock

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC).
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate.
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles.
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile.

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
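The two scan tables above give, for each position of a 4x4 block, its index in the one-dimensional coefficient list. A minimal sketch of applying them follows; the function name `scan` is hypothetical.

```python
# Scan position of each (row, column) entry, copied from the tables above.
ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan(block, order):
    """Reorder a 4x4 block into a 16-entry list following a scan table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


# Typical quantized block: energy concentrated in the top-left corner.
block = [
    [9, 3, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
coeffs = scan(block, ZIGZAG)
```

The point of the scan is that the nonzero values end up at the start of the list, followed by a long run of zeros that the entropy coder can represent cheaply.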

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M+1) bits, where

\[ M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor \]

\[ \mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M} \]

-52-

Decoding

1. Read in M leading zeroes followed by 1.
2. Read in the M-bit INFO field.
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero.)

CAVLC codes transform coefficients.
CABAC codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.
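The construction and decoding rules for Exp-Golomb codewords translate almost directly into code. The sketch below works on strings of '0' and '1' characters for readability; the function names are hypothetical, and a real bitstream reader would operate on packed bits instead.

```python
def ue_encode(code_num):
    """Return the Exp-Golomb codeword [M zeroes][1][INFO] as a bit string."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                      # count the leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1                 # code_num = 2^M + INFO - 1


codewords = [ue_encode(n) for n in range(7)]
```

The first seven codewords come out as 1, 010, 011, 00100, 00101, 00110, 00111, which matches the INFO field lengths stated above (none for code 0, one bit for codes 1 and 2, two bits for codes 3 to 6).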

-53-

VC Algorithm: Entropy Coding (CAVLC)

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 ... 16
coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0

Step 1: encode the number of nonzero total coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
- number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
- c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
- 2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
- 1 (order 6-5), 0 (order 4), 1 (order 3-2)
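The five quantities of this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not perform the table look-ups that turn them into bits. The function name is hypothetical, and the values 7, 5, 3 are arbitrary stand-ins for c0, c1, c2.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (not the codewords) from scanned coefficients."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]

    # Trailing ones: up to three +/-1 values at the high-frequency end.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break

    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]
    levels = [coeffs[i] for i in reversed(nonzero) if i not in trailing]
    total_zeros = nonzero[-1] + 1 - len(nonzero) if nonzero else 0

    # Runs of zeros in reverse order; stop once every zero is accounted for.
    runs, zeros_left = [], total_zeros
    rev = list(reversed(nonzero))
    for k, i in enumerate(rev[:-1]):
        if zeros_left == 0:
            break
        run = i - rev[k + 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


# c0, c1, c2 replaced by the arbitrary values 7, 5, 3.
example = [7, 5, 3, 0, 1, 1, 0, -1] + [0] * 8
symbols = cavlc_symbols(example)
```

Run on the example sequence, the result reproduces the slide: 6 coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.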

-55-

VC Algorithm: Entropy Coding (CABAC)

CABAC uses arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MV and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice

- Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs.
- Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture.
- Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode:
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing the three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture, the co-located partition in the List 1 reference, the motion vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

\[ mvL0 = \frac{tb}{td} \, mvCol, \qquad mvL1 = -\frac{td - tb}{td} \, mvCol \]

where mvCol is a MV used in the co-located MB of the subsequent picture.
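A short numeric sketch of the two direct-mode formulas follows. It uses ordinary floating-point division for clarity, which is an assumption made for readability; an actual codec performs this scaling in fixed-point integer arithmetic. The function name is hypothetical.

```python
def temporal_direct(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col is an (x, y) pair; tb and td are the temporal distances
    labelled in the figure. Floating-point division is used here for
    clarity only.
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture one interval after the List 0 reference, with the two
# references three intervals apart.
mv_l0, mv_l1 = temporal_direct((6, -3), tb=1, td=3)
```

A useful sanity check is that mvL0 minus mvL1 always equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector.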

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights are applied to the reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

A different weighted prediction method is used for a macroblock of a P slice or a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors.
- Implicit type: the factors are calculated based on the temporal distance between the pictures.
- Explicit type: the factors are transmitted in the slice header.
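The weighted sum p = w1 r1 + w2 r2 can be illustrated as follows. This is a sketch under stated assumptions: weights are plain floating-point numbers, samples are 8-bit, and the implicit weights are taken as a simple linear function of a temporal distance ratio tb/td so that the nearer reference receives the larger weight. The function names are hypothetical, and a real codec uses integer weights with an offset and a shift.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Blend two reference sample rows: p = w1*r1 + w2*r2, clipped to 0..255."""
    return [max(0, min(255, int(w1 * a + w2 * b + 0.5))) for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Distance-based weights (assumed linear model).

    tb: distance from the first reference to the current picture
    td: distance between the two references
    """
    w2 = tb / td
    return 1.0 - w2, w2


# Current picture lies a quarter of the way from reference 1 to reference 2.
w1, w2 = implicit_weights(tb=1, td=4)
p = weighted_bipred([100, 200], [60, 100], w1, w2)

# Explicit weights for a fade to black: the prediction is a dimmed copy.
faded = weighted_bipred([100, 200], [0, 0], 0.5, 0.0)
```

The explicit case shows why the tool helps with fades: a single transmitted weight lets the encoder predict a darker picture from a brighter reference without spending bits on a large residual.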

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice, similar to the coding of an I slice

[Figure: bitstream A with pictures P(11) to P(15), bitstream B with pictures P(21) to P(25), and the switching slice S(3) connecting the two]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C.

2. A has the slice header and header data for each MB in the slice.

3. B has coded residual data for intra and SI slice MBs.

4. C has coded residual data for inter coded MBs.

5. Place each partition A, B and C in a separate NAL unit and transport them separately.

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures.
- A picture parameter set contains all information related to all the slices belonging to a single picture.
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2, 3, connected by a reliable parameter set exchange. Example: Parameter Set 3 (video format: NTSC; motion resolution: 1/4; Enc: CABAC; frame width: 11). VCL data is transferred with a reference to PS 3.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data.
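The two-slice-group scenario described above can be sketched with a checkerboard allocation. The checkerboard pattern is an assumption chosen for illustration (it is one simple way to disperse two groups); the function names are hypothetical.

```python
def dispersed_slice_groups(width_mbs, height_mbs):
    """Checkerboard map: slice group (0 or 1) of every macroblock."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(groups, x, y):
    """Count the 4-connected neighbours of MB (x, y) that are in the other group."""
    height, width = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != groups[y][x]:
            count += 1
    return count


groups = dispersed_slice_groups(4, 3)
```

With this map, every interior macroblock has all four neighbours in the other slice group, so if one group is lost the concealment stage still has correctly received samples on every side.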

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192
Bitrate [kbps] for CIF: 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of H264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate[kbps] for QCIF Bitrate[kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6
Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20
Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group - MPEG website httpwwwmpegorg- JVT website ftpstandardspolycomcom- wwwmpegiforg

Test software httpiphomehhidesuehringtmldownload

- H264AVC JM Software httpbshhide~suehringtmldownload Test sequences

- httpisestanfordeduvideohtml- httpkbscstu-berlinde~stewevcegsequenceshtm- httpwwwitsbldrdocgovvqeg- ftptntuni-hannoverdepubjvtsequences- httptraceeasasueduyuvyuvhtml

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExtensions
- 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information JVT documents and software on open ftp website ftpstandardspolyc

omcomhttpiphomehhidesuehring

JVT reflector subscription httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail jvt-expertsmailimtcorg

JVT management team Chair Gary Sullivan (garysullmicrosoftcom) Co-chair Ajay Luthra (aluthramotorolacom) Co-chair Thomas Wiegand (wiegandhhide)

Dr K R Rao UTA raoutaedu Dr S K Kwon Dongeui University skkwondongeuiackr Ms A Tamhankar T-Mobile arundhatiieeeorg Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2 and 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode (uses prediction and entropy coding of prediction errors)
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format] Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
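The first three (16x16) modes can be illustrated with a minimal Python sketch. The function name and the argument layout are hypothetical, the rounded-mean form of the DC mode is an assumption about how the average is taken over the 32 neighbors, and planar prediction and unavailable-neighbor handling are left out.

```python
def intra16x16_predict(top, left, mode):
    """Return a 16x16 predicted block as a list of rows.

    top  : the 16 reconstructed pels just above the MB
    left : the 16 reconstructed pels just left of the MB
    mode : "vertical", "horizontal" or "dc" (planar is omitted in this sketch)
    """
    if mode == "vertical":
        # every row repeats the pels above the MB
        return [list(top) for _ in range(16)]
    if mode == "horizontal":
        # every row is filled with the pel to its left
        return [[left[y]] * 16 for y in range(16)]
    if mode == "dc":
        # rounded average of the 32 neighboring pels (assumed rounding rule)
        dc = (sum(top) + sum(left) + 16) >> 5
        return [[dc] * 16 for _ in range(16)]
    raise ValueError("unsupported mode in this sketch")
```

An encoder would typically try each mode, compare the prediction error, and signal the chosen mode in the bitstream.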

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for the 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding
Coefficient scanning is ordered by decreasing variance, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description
Sequence-, picture- and slice-layer syntax elements: Headers and parameters
Macroblock type (mb_type): Prediction method for each coded macroblock
Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter: Transmitted as a delta value from the previous value of QP
Reference frame index: Identifies reference frame(s) for inter prediction
Motion vector: Transmitted as a difference (mvd) from the predicted motion vector
Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits
M = floor(log2[code_num + 1])
INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
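The construction and the three decoding steps above can be checked with a short Python sketch. The function names are hypothetical and bit strings are used instead of a real bitstream reader, so this is an illustration of the code structure rather than a decoder implementation.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - 2 ** m             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits         # [M zeros][1][INFO]


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of the bit string."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return 2 ** m + info - 1                 # step 3: code_num


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword is 2M + 1 bits long, so small code_num values, which are assumed to be the most frequent, get the shortest codes.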

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y is 16 wide x 16 high, Cb and Cr are each 8 wide x 16 high. 4:4:4 MB: Y, Cb and Cr are each 16 x 16]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma
To avoid any rounding error, only one bit of precision needs to be added to the chroma samples

$Y = \dfrac{1}{2}\left[ G + \dfrac{R + B}{2} \right]$

$C_g = \dfrac{1}{2}\left[ G - \dfrac{R + B}{2} \right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
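The lossless property of this conversion can be checked with a small Python sketch that applies the forward steps and then the inverse steps. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the shift described in the text.

```python
def rgb_to_ycgco(r, g, b):
    """Forward residual color transform using only additions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Because each inverse step subtracts exactly the quantity the forward step added, the integer round trip is exact even though Cg and Co need one extra bit of range.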

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-38-

VC Algorithm: Inter Prediction

Half-pel samples are interpolated from neighboring integer-pel samples using a 6-tap Finite Impulse Response filter with weights (1, -5, 20, 20, -5, 1)/32

Quarter-pel samples are produced using bilinear interpolation between neighboring half- or integer-pel samples

[Figure: sample grid for interpolation. Upper-case letters (A to U) mark integer-pel samples; lower-case letters (a to s, aa to hh) mark half- and quarter-pel positions]

b = round((E - 5F + 20G + 20H - 5I + J) / 32)
a = round((G + b) / 2)
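The two formulas above translate into integer arithmetic as in the following Python sketch. The function names are hypothetical; adding 16 before the shift by 5 and adding 1 before the shift by 1 are the usual integer forms of the rounding, and the clipping to an 8-bit sample range is an assumption not shown on the slide.

```python
def clip8(value):
    """Clip to the 8-bit sample range (assumed 8-bit video)."""
    return max(0, min(255, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample b between G and H from six integer-pel neighbors."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j   # 6-tap FIR, weights sum to 32
    return clip8((acc + 16) >> 5)                   # rounded division by 32


def quarter_pel(p, q):
    """Quarter-pel sample as the rounded average of two neighboring samples."""
    return (p + q + 1) >> 1


if __name__ == "__main__":
    row = [10, 12, 40, 44, 13, 11]      # E, F, G, H, I, J (made-up values)
    b = half_pel(*row)
    a = quarter_pel(row[2], b)          # a lies between G and b
    print(b, a)
```

On a flat region the filter returns the input value unchanged, since the weights sum to 32.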

-39-

VC Algorithm: Inter Prediction
Deblocking filter (adaptive)

Reduces the blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise

Filtering is applied adaptively to the horizontal or vertical edges of 4 x 4 blocks in a macroblock, at several levels (slice, block-edge, sample)

[Figure: vertical and horizontal edges filtered in the luma and chroma blocks of a 16x16 macroblock]

-40-

VC Algorithm: Inter Prediction
Management of multiple reference pictures

Takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer

[Figure: encoder block diagram with Video Input, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Entropy Coding and Bitstream Output; the highlighted part is the management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the luma 4 x 4 blocks:
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15

DC coefficient indices:
(00) (01) (02) (03)
(10) (11) (12) (13)
(20) (21) (22) (23)
(30) (31) (32) (33)

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

VC Algorithm Transform

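4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

where $a = \frac{1}{2}$, $b = \sqrt{\frac{2}{5}}$, $d = \frac{1}{2}$, and $\otimes$ implies element-by-element multiplication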

-43-

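4x4 Inverse IntDCT

$$X = C_i^T (Y \otimes E_i) C_i$$

$$E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}$$

Here

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$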

-44-

VC Algorithm Transform

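Luma DC coefficients for Intra 16x16 MB
The 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \operatorname{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) denotes rounding to the nearest integer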

-45-

VC Algorithm Transform

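Chroma DC coefficients
Intra prediction mode, (4x4) IntDCT: Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma block numbering. 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18-21 (U) and 22-25 (V)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased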

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with Video Input, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision, Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Entropy Coding and Bitstream Output; the Transform & Quantization block is highlighted]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks
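The core of the 4 x 4 integer transform, that is the matrix product H X H^T before any scaling, can be sketched in plain Python as follows. The helper names are hypothetical and the scaling by the E matrix, which the deck folds into quantization, is deliberately left out.

```python
# Forward core transform matrix H (called Cf elsewhere in the deck)
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Unscaled forward transform W = H X H^T using integer arithmetic only."""
    return matmul(matmul(H, x), transpose(H))


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    for row in core_transform(flat):
        print(row)
```

For a constant block all the energy ends up in the DC position, which is what makes the second-stage Hadamard transform of the DC coefficients worthwhile for smooth intra 16x16 macroblocks.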

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij} \, \operatorname{round}\left( \frac{SF_{ij}}{Qstep} \right)$$

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, controlled by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
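The doubling of the step size for every increment of 6 in QP can be sketched in Python as below. The six base values for QP 0 to 5 are the commonly tabulated ones and are an assumption here, since they do not appear on the slide; the function name is hypothetical and only the 8-bit range of 52 values is covered.

```python
# Assumed base step sizes for QP = 0..5 (commonly tabulated, not from the slide)
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51."""
    if qp not in range(52):
        raise ValueError("QP must be in 0..51 for 8-bit video")
    # each increment of 6 in QP doubles the step size
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

With this structure only six scaling tables are needed, and the rest of the QP range is reached by shifts.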

-48-

VC Algorithm Transform Quantization

[Figure: transform and quantization flow.
Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only), post-scaling and quantization.
Encoder output / decoder input.
Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC): in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC): in Main profile

(a) Zig-zag scan
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan
0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
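The two scan tables can be applied to a 4x4 block with a few lines of Python. The table entries give the scan position of each block location, as in the figure; the function name is hypothetical.

```python
# Scan position of each (row, column) location, copied from the two tables
ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, order):
    """Reorder a 4x4 block of coefficients into a 16-element scanned list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


if __name__ == "__main__":
    raster = [[4 * r + c for c in range(4)] for r in range(4)]
    print(scan_block(raster, ZIGZAG))
    print(scan_block(raster, ALTERNATE))
```

The alternate scan moves down the first column faster, which suits field-coded blocks where vertical frequencies tend to carry more energy.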

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information
The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits
M = floor(log2[code_num + 1])
INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 coefficients are coded. For the other coefficients, their levels are coded.

Encoding steps
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding
Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the number of nonzero coefficients and of 1 or -1 values (trailing ones) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
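The values derived in the five steps of this example can be reproduced with a Python sketch that extracts the CAVLC symbols from a scanned coefficient list. It stops before the actual table look-ups, so no bits are produced; the function name and the placeholder magnitudes 7, 4, 3 for c0, c1, c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive the values that CAVLC codes, from coefficients in scan order."""
    nz = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nz)

    # Steps 1-2: up to three trailing +/-1 values, found from the high end
    trailing = []
    for i, c in reversed(nz):
        if abs(c) != 1 or len(trailing) == 3:
            break
        trailing.append((i, c))
    signs = ["-" if c == -1 else "+" for _, c in trailing]

    # Step 3: remaining levels in reverse order
    levels = [c for _, c in reversed(nz[:total_coeff - len(trailing)])]

    # Step 4: zeros located before the last nonzero coefficient
    total_zeros = (nz[-1][0] + 1 - total_coeff) if nz else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k][0] - nz[k - 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeff": total_coeff,
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    example = [7, 4, 3, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 = 7, 4, 3
    print(cavlc_symbols(example))
```

In a real encoder each of these values would then be mapped to a codeword from one of the context-selected VLC tables.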

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated
Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice
Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice
Direct mode

Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference, with motion vectors mvCol, mvL0, mvL1 and temporal distances tb and td]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
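The scaling of the co-located motion vector can be sketched in Python as follows. The function name is hypothetical and plain rounding is used; a real codec would use a fixed-point scale factor, so this is only an illustration of the two formulas.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located MV by temporal scaling.

    mv_col : (x, y) motion vector of the co-located partition
    tb     : temporal distance from the List 0 reference to the current picture
    td     : temporal distance from the List 0 reference to the List 1 reference
    """
    mv_l0 = tuple(round(tb * c / td) for c in mv_col)
    mv_l1 = tuple(round(-(td - tb) * c / td) for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # current picture halfway between its two references
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

Since mvL1 equals mvL0 minus mvCol, no motion vector data needs to be transmitted for a direct-mode partition.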

-60-

VC Algorithm: B Slice
Weighted prediction

Different weights of the reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction method for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
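A minimal Python sketch of the formula p = w1 r1 + w2 r2 for explicitly given weights is shown below. The function name, the floating-point weights and the clipping to an 8-bit range are assumptions made for illustration; the standard works with integer weights, offsets and shifts.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Combine two motion-compensated reference blocks sample by sample.

    r1, r2 : lists of reference samples of equal length
    w1, w2 : weighting factors (explicit type: as sent in the slice header)
    """
    return [max(0, min(255, round(w1 * a + w2 * b))) for a, b in zip(r1, r2)]


if __name__ == "__main__":
    brighter = [100, 200, 50]
    darker = [50, 0, 30]
    # equal weights reduce to ordinary bi-prediction averaging
    print(weighted_bipred(brighter, darker, 0.5, 0.5))
```

During a fade, unequal weights let the prediction follow the brightness change that plain averaging would miss.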

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), linked by the switching picture S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice: SP/SI
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B & C

2. A has the slice header and the header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: the H.264 encoder and H.264 decoder each hold parameter sets 1, 2 and 3, exchanged through a reliable parameter set exchange; VCL data is transferred with a reference to parameter set 3. Parameter set 3 contains: Video format NTSC, Motion Resolution 1/4, Enc CABAC, Frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Foreman: > 1x 2x 2x T 2x > 2x T T

Paris: > 1x 2x 2x 2x 2x T 2x T

Head: > 2x 2x 2x T T

Zoom: > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Football: 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile: 2x 1x 2x 2x > 2x 4x > 2x T

Husky: 2x 2x > 1x 2x 2x 2x

Tempete: 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 | 1.5 2.25 3 4 6

Football: > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky: > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

6 10 20 | 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: > 1.7x > 1x T > 1.7x > 1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by many companies to replace or complement existing products

Related companies
- UBVideo, website: http://www.ubvideo.com
- LSI Logic, website: http://www.lsilogic.com
- Microsoft, website: http://www.microsoft.com
- Envivio, website: http://www.envivio.com
- Broadcom, website: http://www.broadcom.com
- Nagravision, website: http://www.nagravision.com
- Philips, website: http://www.philips.com
- Polycom, website: http://www.polycom.com
- PixelTools Corporation, website: http://www.pixeltools.com
- Amphion, website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation, website: http://www.ligos.com
- LifeSize, website: http://www.lifesize.com
- Netvideo, website: http://www.netvideo.com
- Motorola, website: http://www.motorola.com
- Vanguard Software Solutions, website: http://www.vsofts.com
- STMicroelectronics, website: http://us.st.com
- MainConcept, website: http://www.mainconcept.com
- Impact Labs Inc., website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, namely decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
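The first three of these modes can be illustrated with a minimal Python sketch. The function name `predict_full_mb`, the argument layout (one list of reconstructed pels above the MB and one to its left) and the rounding of the DC average are assumptions made for illustration; planar prediction and the handling of unavailable neighbors are left out.

```python
def predict_full_mb(above, left, mode):
    """Build an n x n prediction block from neighboring pels (hypothetical helper).

    above: the n pels just above the block, left to right
    left:  the n pels just left of the block, top to bottom
    mode:  "vertical", "horizontal" or "dc"
    """
    n = len(above)
    if mode == "vertical":
        # Every row repeats the row of pels above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every column repeats the column of pels left of the block.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Rounded average of all 2n neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


above = [10] * 16
left = [20] * 16
print(predict_full_mb(above, left, "dc")[0][0])
```

An encoder would typically form the residual as the difference between the actual MB and each candidate prediction and keep the mode with the lowest cost.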

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

- Matrix can be transmitted in SPS and PPS
- Separate matrix for 4x4 and 8x8 transforms
- Separate matrix for inter and intra
- Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

FRExt only

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows the order of decreasing variances, to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

\[ M = \lfloor \log_2(\text{code\_num} + 1) \rfloor \]

\[ \text{INFO} = \text{code\_num} + 1 - 2^M \]

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
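As an illustration of the construction and decoding rules for Exp-Golomb codewords, here is a minimal Python sketch. The function names and the use of bit strings (rather than a packed bitstream) are assumptions made for readability; only the unsigned code_num mapping is covered.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of bits; return (code_num, bits consumed)."""
    m = 0
    while bits[m] == "0":                      # step 1: count the leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: read the M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1      # step 3: code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

The loop prints the first seven codewords: one codeword of length 1, two of length 3 and four of length 5, matching the (2M + 1)-bit length rule.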

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB with a 16x16 Y block and 8x16 Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr conversion

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

\[ Y = 0.2126 R + 0.7152 G + 0.0722 B \]

\[ C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y) \]

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
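The numeric chroma scale factors follow directly from $K_R$ and $K_B$, which a short Python sketch can confirm. The function names are hypothetical, and R, G, B are assumed to be normalized floating-point values in [0, 1] with no offset or quantization applied.

```python
def chroma_scale_factors(kr, kb):
    """Return the factors multiplying (B - Y) and (R - Y) in Cb and Cr."""
    return 1.0 / (2.0 * (1.0 - kb)), 1.0 / (2.0 * (1.0 - kr))


def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for the given luma coefficients."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


cb_scale, cr_scale = chroma_scale_factors(0.2126, 0.0722)
print(round(cb_scale, 4), round(cr_scale, 4))
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red
```

With this scaling, pure blue maps to Cb = 0.5 and pure red maps to Cr = 0.5, so both chroma components stay within [-0.5, 0.5].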

-120-

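Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right] \]

\[ C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right] \]

\[ C_o = \frac{R - B}{2} \]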

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
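The reversibility of this integer conversion can be checked with a minimal Python sketch. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation the slide describes.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer color conversion (lifting steps)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Exhaustive round-trip check over a coarse grid of 8-bit RGB values.
for r in range(0, 256, 5):
    for g in range(0, 256, 5):
        for b in range(0, 256, 5):
            assert ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)

print(rgb_to_ycgco(255, 0, 0))
```

Because each inverse step subtracts exactly what the matching forward step added, the shifts never lose information, which is why the transform is lossless. Cg and Co can be negative and need one more bit than the input samples.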

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than $\sqrt{8 \times (\text{total number of pixels per frame})}$.

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

- DVB is considering AAC with SBR (AAC plus)
- ATSC has selected AC-3 plus from Dolby
- MPEG calls AAC with SBR HE-AAC (HE: High Efficiency)
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-39-

VC Algorithm: Inter Prediction

Deblocking filter (adaptive)

To reduce the blocking artifacts at block boundaries and prevent the propagation of accumulated coding noise.

Filtering is applied adaptively to the horizontal or vertical edges of the 4 x 4 blocks in a macroblock, at several levels (slice, block-edge, sample).

[Figure: vertical and horizontal edges, for luma and chroma, filtered in a 16x16 macroblock]

-40-

VC Algorithm Inter Prediction Management of multiple reference pictures

To take care of marking some stored pictures as lsquounusedrsquo and deciding which pictures to delete from the buffer

[Figure: encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output), emphasizing the management of multiple reference pictures (short term, long term)]

-41-

VC Algorithm Transform amp Quantization

Transform
- Integer transform: multiplier-free, additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, …, 15 are the coding order for the (4x4) integer DCT transform

(0,0), (0,1), (0,2), …, (3,3) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for the chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

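VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

\[ Y = (C_f X C_f^T) \otimes E_f \]

\[ Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix} \]

where $a = \frac{1}{2}$, $b = \sqrt{\frac{2}{5}}$, $d = \frac{1}{2}$, and $\otimes$ implies element-by-element multiplication.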

-43-

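4x4 Inverse IntDCT

\[ X = C_i^T (Y \otimes E_i) C_i \]

Here

\[ Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \]

while the forward transform uses

\[ E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix} \]

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.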
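The forward and inverse 4x4 transforms can be checked numerically with a short Python sketch. It works in floating point with the scaling matrices applied explicitly (no quantization, so $E_f$ and $E_i$ are not merged with QP here). The matrix $C_i^T$ is not spelled out on the slide; the values below are the ones commonly given in the H.264 literature and should be read as an assumption, as should all function names.

```python
from math import sqrt

A = 0.5
B = sqrt(2 / 5)

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Assumed inverse core matrix C_i^T (not listed on the slide).
CI_T = [[1, 1, 1, 0.5],
        [1, 0.5, -1, -1],
        [1, -0.5, -1, 1],
        [1, -1, 1, -0.5]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def scale(m, s):
    """Element-by-element product of m with the matrix whose entries are s[i] * s[j]."""
    return [[m[i][j] * s[i] * s[j] for j in range(4)] for i in range(4)]


def forward_transform(x):
    core = matmul(matmul(CF, x), transpose(CF))      # C_f X C_f^T
    return scale(core, [A, B / 2, A, B / 2])         # element-wise product with E_f


def inverse_transform(y):
    scaled = scale(y, [A, B, A, B])                  # element-wise product with E_i
    return matmul(matmul(CI_T, scaled), transpose(CI_T))   # C_i^T (...) C_i


block = [[5, 11, 8, 10],
         [9, 8, 4, 12],
         [1, 10, 11, 4],
         [19, 6, 15, 7]]
rebuilt = inverse_transform(forward_transform(block))
print(max(abs(rebuilt[i][j] - block[i][j]) for i in range(4) for j in range(4)))
```

The printed reconstruction error is at floating-point rounding level, which shows that the integer core matrices combined with $E_f$ and $E_i$ behave as an orthonormal transform pair.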

-44-

VC Algorithm Transform

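Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

\[ Y_D = \operatorname{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) \]

where round() denotes rounding to the nearest integer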

-45-

VC Algorithm Transform

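Chroma DC coefficients (intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

\[ Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]

[Figure: 4:2:0 chroma (U, V) blocks: 2x2 DC blocks 16 and 17, AC blocks 18-21 and 22-25]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased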

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram (Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; video input, bitstream output)]

- 4 x 4 integer DCT transform:

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

\[ Y_{ij} = \operatorname{round}\left( X_{ij} \frac{SF_{ij}}{Qstep} \right) \]

\[ X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, controlled by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term (post-scaling at the encoder, pre-scaling at the decoder)
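The relation between QP and Qstep can be sketched in a few lines of Python. The six base step sizes for QP 0-5 are not given on the slide; the values used here are the ones commonly tabulated in the H.264 literature and are an assumption, as are the function names. The quantizer shown is the plain scalar form round(X · SF / Qstep), not the integer-arithmetic version used in real codecs.

```python
# Assumed Qstep values for QP = 0..5 (commonly tabulated; not stated on the slide).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video: doubles for every increment of 6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit samples")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(x, sf, qp):
    """Scalar quantization combined with post-scaling: round(x * sf / Qstep)."""
    return round(x * sf / qstep(qp))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

The printed values show the doubling pattern (0.625, 1.25, 2.5, ...), which gives the logarithmic control of step size with 52 QP values.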

-48-

VC Algorithm Transform Quantization

Rescale and inverse transform

[Figure: transform and quantization flow. Encoder part: input block, forward transform, post-scaling and quantization, and a 2x2 or 4x4 DC transform (chroma or Intra-16 luma only, i.e., Intra (16x16) prediction mode only). The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

 0  1  5  6
 2  4  7 12
 3  8 11 13
 9 10 14 15

(b) Alternate scan

 0  2  8 12
 1  5  9 13
 3  6 10 14
 4  7 11 15
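The two scan tables can be read as "position in the scan" for each coefficient of the 4x4 block. A minimal Python sketch, with hypothetical names, applies either table to a block to produce the one-dimensional coefficient sequence:

```python
# Scan position of each coefficient, indexed as [row][column], as in the tables above.
ZIGZAG_SCAN = [[0, 1, 5, 6],
               [2, 4, 7, 12],
               [3, 8, 11, 13],
               [9, 10, 14, 15]]

ALTERNATE_SCAN = [[0, 2, 8, 12],
                  [1, 5, 9, 13],
                  [3, 6, 10, 14],
                  [4, 7, 11, 15]]


def scan_block(block, order):
    """Reorder a 4x4 block into a 16-entry list according to a scan-position table."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(scan_block(block, ZIGZAG_SCAN))
print(scan_block(block, ALTERNATE_SCAN))
```

The alternate scan visits the first column much earlier than the zig-zag scan does, which suits field-coded blocks whose energy is spread more along the vertical frequency axis.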

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

\[ M = \lfloor \log_2(\text{code\_num} + 1) \rfloor \]

\[ \text{INFO} = \text{code\_num} + 1 - 2^M \]

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
- step 1: encode the total number of nonzero coefficients and of ±1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm Entropy Coding Example of CAVLC

order: 0  1  2  3  4  5  6  7  8  9  …  16
coeff: c0 c1 c2 0  1  1  0  -1 0  0  …  0

Step 1: encode the number of nonzero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
- number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
- c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
- 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
- 1 (order 6-5), 0 (order 4), 1 (order 3-2)
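The quantities coded in the five steps of this example can be derived with a small Python sketch. It only extracts the symbols (counts, signs, levels, zero runs) and does not perform the table look-ups that produce the actual bits. The function name, the returned dictionary keys and the concrete values 3, -2, 4 standing in for c0, c1, c2 are assumptions made for illustration.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for a list of coefficients in scan order."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero)

    # Trailing ones: up to three consecutive +/-1 values at the end of the scan.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break

    signs = ["-" if c < 0 else "+" for c in trailing]                           # step 2
    levels = [c for _, c in reversed(nonzero[:total_coeff - len(trailing)])]    # step 3

    last_pos = nonzero[-1][0] if nonzero else -1
    total_zeros = (last_pos + 1) - total_coeff                                   # step 4

    # Step 5: run of zeros before each coefficient, in reverse order, until no zeros remain.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeff - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k][0] - nonzero[k - 1][0] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeff": total_coeff, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


c0, c1, c2 = 3, -2, 4    # hypothetical values for the levels c0, c1, c2
print(cavlc_symbols([c0, c1, c2, 0, 1, 1, 0, -1] + [0] * 8))
```

For this input the sketch reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, 2 zeros before the last coefficient and the runs 1, 0, 1, in line with the worked example.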

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- step 1: context modeling. Choose a suitable model
- step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only forward/backward prediction pairs but also forward/forward and backward/backward pairs.

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture.

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized Bidirectional Prediction

Multiple reference pictures mode
Two forward references: suited to a region just before a scene change
Two backward references: suited to a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing prediction with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction. The prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture carries mvL0 and mvL1, the co-located partition in the List 1 reference carries mvCol; tb and td mark the temporal distances]

$$\mathrm{mvL0} = \frac{tb}{td}\,\mathrm{mvCol}, \qquad \mathrm{mvL1} = -\frac{td - tb}{td}\,\mathrm{mvCol}$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
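The direct-mode scaling can be illustrated with a minimal sketch. It applies the two formulas literally with exact rational arithmetic; the standard itself uses fixed-point scaling with rounding, which is not reproduced here. The function name `temporal_direct_mvs` and the sample numbers are assumptions.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into a forward/backward pair.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances as drawn in the figure
    """
    scale = Fraction(tb, td)
    # mvL0 = (tb / td) * mvCol
    mv_l0 = tuple(scale * c for c in mv_col)
    # mvL1 = -((td - tb) / td) * mvCol, which equals mvL0 - mvCol
    mv_l1 = tuple(-Fraction(td - tb, td) * c for c in mv_col)
    return mv_l0, mv_l1


# Hypothetical case: the current picture lies halfway between the references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

In this halfway case the forward vector is half of mvCol and the backward vector is its negative, so no motion vector needs to be transmitted for the direct-mode partition.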

-60-

VC Algorithm: B Slice, Weighted Prediction

Different weights are applied to the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'.

The weighted prediction method differs for a macroblock of a P slice and of a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors.
Implicit type: the factors are calculated based on the temporal distance between the pictures.
Explicit type: the factors are transmitted in the slice header.
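The formula p = w1 r1 + w2 r2 can be tried on a couple of samples. This is only a sketch of the explicit type with floating-point weights; the standard expresses weights as integers with a shift and an offset, which is not modeled here. The function name, the clipping to 8-bit range and the sample values are assumptions.

```python
def weighted_bipred(r1, r2, w1=0.5, w2=0.5):
    """Blend two motion-compensated reference blocks sample by sample."""
    out = []
    for a, b in zip(r1, r2):
        p = round(w1 * a + w2 * b)        # p = w1*r1 + w2*r2
        out.append(min(255, max(0, p)))   # keep the result in 8-bit range
    return out


# Hypothetical fade: the second reference is darker and is weighted more heavily.
print(weighted_bipred([100, 120], [60, 80], w1=0.25, w2=0.75))
```

With equal weights the function reduces to the plain average used by conventional bidirectional prediction.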

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), connected by the switching slice S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1 Coded data of a slice is placed in three separate data partitions: A, B and C

2 A has the slice header and the header data for each MB in the slice

3 B has coded residual data for intra and SI slice MBs

4 C has coded residual data for inter coded MBs

5 Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter Setting

The sequence parameter set contains all information related to a sequence of pictures.
A picture parameter set contains all information related to all the slices belonging to a single picture.
The encoder selects the appropriate picture parameter set by referencing its storage location in the slice header of each coded slice.

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2, 3, exchanged through a reliable parameter set exchange; parameter set 3 holds video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11; VCL data is transferred with a reference to PS 3]
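The referencing scheme above can be sketched as two look-up tables shared by encoder and decoder. The dictionary layout and all field names (`pps_id`, `sps_id`, `entropy_coding`, and so on) are hypothetical; only the idea that a slice header carries an identifier rather than the parameters themselves comes from the slide.

```python
# Hypothetical tables, assumed to have been exchanged over a reliable channel.
sps_table = {0: {"video_format": "NTSC", "frame_width_mbs": 11}}
pps_table = {3: {"sps_id": 0, "entropy_coding": "CABAC", "mv_resolution": 0.25}}


def resolve_parameters(slice_header, pps_table, sps_table):
    """Follow the slice header's reference to its picture and sequence parameters."""
    pps = pps_table[slice_header["pps_id"]]   # KeyError if the set never arrived
    sps = sps_table[pps["sps_id"]]
    return {**sps, **pps}


# A coded slice only needs to carry the identifier 3.
params = resolve_parameters({"pps_id": 3}, pps_table, sps_table)
print(params)
```

Because the slice carries only a small identifier, losing a slice never loses the parameters, which is the error-resilience benefit this slide describes.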

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO. It randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
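The dispersed two-group allocation can be made concrete with a checkerboard map, one of the simplest dispersed patterns. This is a sketch under that assumption, not the standard's slice-group map syntax; the function names and the 11x9 macroblock picture size (QCIF) are chosen for illustration.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def neighbours_in_other_group(groups, x, y):
    """List the left/right/up/down neighbours that belong to the other slice group."""
    rows, cols = len(groups), len(groups[0])
    found = []
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != groups[y][x]:
            found.append((nx, ny))
    return found


groups = checkerboard_slice_groups(11, 9)   # QCIF is 11 x 9 macroblocks
# If slice group 1 is lost, every lost macroblock still has received neighbours.
worst = min(len(neighbours_in_other_group(groups, x, y))
            for y in range(9) for x in range(11) if groups[y][x] == 1)
print(worst)
```

Every macroblock keeps at least two correctly received neighbours even at the picture border, which is what gives the concealment step something to interpolate from.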

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. If the primary slice is missing, however, the redundant slice can be reconstructed instead.
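The decoder behaviour described above amounts to a simple selection rule, sketched below. The record layout (`region`, `redundant`, `qp`) is hypothetical and stands in for whatever identifies which macroblocks a slice covers.

```python
def choose_slices(received):
    """Pick one slice per picture region: the primary if present, else a redundant copy."""
    chosen = {}
    for s in received:
        region = s["region"]
        if not s["redundant"]:
            chosen[region] = s        # the primary slice always wins
        elif region not in chosen:
            chosen[region] = s        # fall back to the redundant slice
    return chosen


# Hypothetical reception: the primary slice of region 1 was lost in transit.
received = [
    {"region": 0, "redundant": False, "qp": 24},
    {"region": 0, "redundant": True, "qp": 40},
    {"region": 1, "redundant": True, "qp": 40},
]
picked = choose_slices(received)
print({region: s["qp"] for region, s in picked.items()})
```

Region 0 is decoded from the fine primary slice, while region 1 is decoded from the coarser redundant slice rather than being concealed.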

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan

1x 2x

New Mobile & Calendar

T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application.

HLP - High Latency Profile
ASP - Advanced Simple Profile
H.26L - H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application.

CHC - Conversational High Compression
SP - Simple Profile
ASP - Advanced Simple Profile
H.26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, wavelet | 4x4 and 8x8 integer DCT; 4x4 and 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at a rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496-2:2000: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004.

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

A slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
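The first three modes are simple enough to sketch directly. The function below is illustrative only: its name and interface are assumptions, it works on any square block size, it assumes both neighbor rows are available, and it leaves out the planar mode because the slide does not give the planar equation.

```python
def predict_intra(mode, above, left):
    """Build an n x n prediction block from the neighbouring pels.

    above: the n pels just above the block, left to right
    left:  the n pels just left of the block, top to bottom
    """
    n = len(above)
    if mode == "vertical":
        # Every row repeats the pels above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of all neighbouring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("planar mode is not covered by this sketch")


above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
for mode in ("vertical", "horizontal", "dc"):
    print(mode, predict_intra(mode, above, left)[1])
```

An encoder would form all candidate predictions, subtract each from the actual block, and keep the mode whose residual is cheapest to code.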

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): vertical, horizontal, DC, planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows the order of decreasing variance, to maximize the number of zero-valued coefficients along the scan.

[Figure: frame zig-zag scan and field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits.

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1 Read in M leading zeros followed by a 1
2 Read in the M-bit INFO field
3 code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with Exp-Golomb codes.
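The Exp-Golomb construction and the three decoding steps can be checked with a short sketch. Codewords are represented as strings of '0' and '1' for readability; the function names are assumptions, and a real codec would operate on a bit buffer instead.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                     # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1                # step 3: 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

The printed list shows the regular structure: code_num 0 maps to the single bit 1, code_num 1 and 2 use three bits, and code_num 3 to 6 use five bits, so each codeword is 2M + 1 bits long.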

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB has a 16x16 Y block with 8-wide, 16-high Cb and Cr blocks; a 4:4:4 MB has 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$.)
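A small numerical sketch of this conversion, assuming R, G and B are normalized to the range 0 to 1 (the slide does not state the range) and using the general definitions Cb = (B - Y) / (2(1 - KB)) and Cr = (R - Y) / (2(1 - KR)). The function name is an assumption.

```python
KR, KB = 0.2126, 0.0722   # constants quoted on the slide


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized RGB to Y, Cb, Cr using the K_R / K_B definition."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: full luma, no chroma
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches its maximum
```

The divisors 2(1 - KB) and 2(1 - KR) are chosen so that Cb and Cr span -0.5 to +0.5; passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned above. Because the results are real-valued, storing them as integers introduces the rounding error that the following slides address.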

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
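The forward and inverse conversions can be checked for exact reversibility with a short sketch. Python's `>>` on integers is an arithmetic right shift, matching the operator described above; the function names are assumptions.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip a coarse grid of 8-bit RGB triples.
samples = range(0, 256, 17)
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
print(lossless)
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so the shifts discard nothing that cannot be recovered. The price is that Cg and Co can be negative and need one more bit than the input samples.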

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/s.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

- Deblocking filter
  • Only control of the filter is adjusted: do not filter 4x4 blocks
  • No change to the filter operation itself

- CABAC
  • 61 new contexts and corresponding initialization values
  • No change to the CABAC engine

- Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile: Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-40-

VC Algorithm: Inter Prediction

Management of multiple reference pictures (short term, long term): takes care of marking some stored pictures as 'unused' and deciding which pictures to delete from the buffer.

[Figure: encoder block diagram from video input to bitstream output, with blocks for Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform and Deblocking Filtering; the picture buffering block is emphasized]

-41-

VC Algorithm: Transform & Quantization

Transform
- Integer transform: multiplier-free; additions and shifts in 16-bit arithmetic
- Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

Coding order of the 4 x 4 luma blocks:

 0  1  4  5
 2  3  6  7
 8  9 12 13
10 11 14 15

Indices of the DC coefficients:

00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT; (00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block.

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. Similarly for the chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats.

-42-

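VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = \left(C_f X C_f^T\right) \otimes E_f$$

$$Y = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication.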
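A minimal sketch of the forward transform follows, assuming plain Python lists for the 4x4 blocks. The helper names (`matmul`, `core_transform`, `forward_transform`) are illustrative and not taken from the standard or the reference software. The core stage uses only integer arithmetic; the scaling by E_f is shown in floating point for clarity, whereas a real codec folds it into quantization.

```python
import math

# Core transform matrix C_f from the slide.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

A = 0.5
B = math.sqrt(2.0 / 5.0)


def matmul(p, q):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(x):
    """Integer-only stage: W = C_f X C_f^T."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_matrix():
    """E_f: entry (i, j) is s[i] * s[j] with s = (a, b/2, a, b/2)."""
    s = [A, B / 2, A, B / 2]
    return [[s[i] * s[j] for j in range(4)] for i in range(4)]


def forward_transform(x):
    """Y = (C_f X C_f^T) scaled element by element with E_f."""
    w = core_transform(x)
    e = scaling_matrix()
    return [[w[i][j] * e[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(core_transform(flat)[0][0])     # DC term before scaling
    print(forward_transform(flat)[0][0])  # DC term after scaling by a^2
```

For a flat block all energy lands in the DC coefficient, which is a quick sanity check on the signs of C_f.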

-43-

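4x4 Inverse IntDCT

$$X = C_i^T \left(Y \otimes E_i\right) C_i$$

Here

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}, \qquad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.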

-44-

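VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform.

$$Y_D = \operatorname{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) denotes rounding to the nearest integer.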
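As an illustration of this second-level transform, the sketch below applies the 4x4 Walsh-Hadamard matrix to a block of luma DC values. The function names are illustrative, and rounding halves upward is an assumption made here for determinism; the slide only says "nearest integer".

```python
import math

# 4x4 Walsh-Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(p, q):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def luma_dc_hadamard(x_d):
    """Y_D = round((H4 * X_D * H4) / 2) for the 16 luma DC coefficients."""
    t = matmul(matmul(H4, x_d), H4)
    # floor(v / 2 + 0.5) rounds to nearest, with halves rounded upward.
    return [[math.floor(v / 2 + 0.5) for v in row] for row in t]


if __name__ == "__main__":
    dc = [[4] * 4 for _ in range(4)]   # all sixteen 4x4 blocks share one DC value
    for row in luma_dc_hadamard(dc):
        print(row)
```

A macroblock whose sixteen DC values are equal collapses to a single non-zero coefficient, which is why this extra stage pays off in smooth Intra 16x16 regions.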

-45-

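VC Algorithm: Transform

Chroma DC coefficients (intra prediction mode, 4x4 IntDCT): the 2 x 2 DC coefficients are transformed using the Walsh-Hadamard transform.

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: coding order of the chroma blocks for the 4:2:0 format: 2x2 DC blocks 16 (U) and 17 (V), followed by AC blocks 18-21 (U) and 22-25 (V)]

For the 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.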

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the Transform & Quantization block]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of the scalar quantization.

- Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij} \cdot \operatorname{round}\left(\frac{SF_{ij}}{Q_{step}}\right)$$

- Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Q_{step} \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample.
SF: scaling term
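The doubling rule can be made concrete with a small sketch. The six base step sizes for QP 0 to 5 are not given on the slide; the values below are the ones commonly tabulated in the H.264 literature and should be read as an assumption here, as should the function names.

```python
# Assumed base step sizes for QP = 0..5 (commonly tabulated for H.264,
# not stated on the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51.

    The step size doubles for every increment of 6 in QP.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit samples")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


def num_qp_values(bit_depth=8):
    """52 QP values at 8 bits, plus 6 for each additional bit (FRExt)."""
    if bit_depth < 8:
        raise ValueError("bit depth below 8 is not covered by this sketch")
    return 52 + 6 * (bit_depth - 8)


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    print(num_qp_values(10))
```

Between two consecutive QP values the step grows by roughly 12%, so six steps compound to an exact factor of two.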

-48-

VC Algorithm: Transform, Quantization

[Figure: transform and quantization data flow. Encoder part: input block, forward transform, post-scaling and quantization, with a 2x2 or 4x4 DC transform for chroma or Intra-16 luma only. The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block. Rescaling and inverse transform of the luma DC apply to the Intra (16x16) prediction mode only]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

 0  1  5  6
 2  4  7 12
 3  8 11 13
 9 10 14 15

(b) Alternate scan

 0  2  8 12
 1  5  9 13
 3  6 10 14
 4  7 11 15

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero.)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.
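The construction and decoding rules for Exp-Golomb codes translate almost line for line into code. The sketch below represents codewords as strings of '0' and '1' for readability; that representation and the function names are choices made here, not part of the standard.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                    # 1. count M leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # 2. M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # 3. code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(9):
        word = exp_golomb_encode(n)
        assert exp_golomb_decode(word)[0] == n
        print(n, word)
```

Using `bit_length` avoids floating-point `log2`, which can be off by one for large values.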

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients: the total numbers of zeros and +/-1 are coded, and for the other coefficients their levels are coded.

Encoding steps:
- step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining non-zero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0  1  2  3 4 5 6  7 8 9 ... 16

Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table
- number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
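The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols; it does not perform the table look-ups that turn them into bits. The concrete values chosen for c0, c1 and c2 (7, 6, -2) are hypothetical, and the function name is illustrative.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one block of scanned coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three consecutive +/-1 values, scanning backwards.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["-" if c < 0 else "+" for c in trailing]

    # Levels of the remaining non-zero coefficients, in reverse order.
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Zeros located before the last non-zero coefficient.
    total_zeros = nonzero[-1][0] + 1 - total_coeffs if nonzero else 0

    # Runs of zeros before each coefficient, in reverse order; coding
    # stops once every zero has been accounted for.
    runs = []
    zeros_left = total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    # c0, c1, c2 replaced by hypothetical levels 7, 6 and -2.
    block = [7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

With the slide's pattern of zeros and ones, the sketch reports 6 coefficients, 3 trailing ones with signs -, +, +, a total of 2 zeros and the runs 1, 0, 1, matching steps 1 to 5.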

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- step 1: context modeling. Choose a suitable model.
- step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
- step 3: binary arithmetic coding, using probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture (direct-mode partition) and List 1 reference (co-located partition), with temporal distances tb and td and motion vectors mvCol, mvL0 and mvL1]

$$mv_{L0} = \frac{t_b}{t_d}\, mv_{Col}, \qquad mv_{L1} = -\frac{t_d - t_b}{t_d}\, mv_{Col}$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
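A small sketch of the two scaling formulas follows. It uses exact rational arithmetic so the relation mvL1 = mvL0 - mvCol can be checked without rounding concerns; an actual decoder works in fixed point with specified rounding, which this sketch deliberately ignores. The function name is illustrative.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into list-0 and list-1 vectors.

    mv_col: (x, y) motion vector of the co-located partition
    tb:     temporal distance from the list-0 reference to the current picture
    td:     temporal distance from the list-0 reference to the list-1 reference
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    mv_l0 = tuple(Fraction(tb, td) * c for c in mv_col)
    mv_l1 = tuple(-Fraction(td - tb, td) * c for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
    # The two vectors always differ by exactly mvCol.
    assert tuple(a - b for a, b in zip(mv_l0, mv_l1)) == (8, -4)
```

No motion vector needs to be transmitted for a direct-mode partition, since both vectors follow from data the decoder already holds.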

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights of reference signals are used for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

Different weighted prediction methods are used for a macroblock of a P slice or a B slice.

A prediction signal p for a B slice is obtained by different weights from two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors.
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
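The weighted sum can be sketched for a row of samples as follows. Rounding to the nearest integer and clipping to the 8-bit range are assumptions added here, as is the linear rule used in `implicit_weights`: the slide only states that implicit factors depend on temporal distance, so that helper is a plausible illustration rather than the normative derivation.

```python
def weighted_prediction(r1, r2, w1, w2, max_value=255):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped."""
    out = []
    for a, b in zip(r1, r2):
        p = int(w1 * a + w2 * b + 0.5)
        out.append(min(max(p, 0), max_value))
    return out


def implicit_weights(tb, td):
    """Hypothetical implicit weights from temporal distances.

    tb: distance from reference 1 to the current picture
    td: distance from reference 1 to reference 2
    The nearer reference receives the larger weight.
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    ref1 = [100, 50, 200]
    ref2 = [0, 0, 200]
    print(weighted_prediction(ref1, ref2, *implicit_weights(1, 2)))
    # Explicit weights that sum to less than 1 model a fade to black.
    print(weighted_prediction(ref1, ref2, 0.25, 0.25))
```

With plain averaging a fade would leave a systematic brightness error in every block; scaling the references removes most of it before the residual is coded.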

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
- SI slice: a switched slice similar to the coding of an I slice

[Figure: bitstream A with pictures P(1,1) ... P(1,5) and bitstream B with pictures P(2,1) ... P(2,5); the switching picture S(3) links the two streams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MBs
5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2 and 3 after a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Parameter set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data.
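The two-slice-group scenario can be illustrated with a toy model. The sketch assumes a checkerboard allocation as one way of dispersing macroblocks, represents each macroblock by a single number (for example its mean luma), and conceals a lost macroblock by averaging the received 4-neighbours. Both the allocation and the concealment rule are simplifications chosen here; the standard defines its own map types and leaves concealment to the decoder.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Allocate macroblocks to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


def conceal_lost_group(values, groups, lost_group):
    """Replace each lost macroblock by the mean of its received neighbours.

    values: per-macroblock values; entries of the lost group are ignored
    groups: slice group map with the same shape as values
    """
    height, width = len(values), len(values[0])
    out = [row[:] for row in values]
    for y in range(height):
        for x in range(width):
            if groups[y][x] != lost_group:
                continue
            received = [values[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x),
                                       (y, x - 1), (y, x + 1))
                        if 0 <= ny < height and 0 <= nx < width
                        and groups[ny][nx] != lost_group]
            if received:
                out[y][x] = sum(received) / len(received)
    return out


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    picture = [[10 * x for x in range(4)] for _ in range(3)]
    for row in conceal_lost_group(picture, groups, lost_group=1):
        print(row)
```

With raster-scan slices a lost packet removes a contiguous band of macroblocks, leaving no nearby data to interpolate from; dispersing the groups is what makes this kind of concealment possible.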

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20; Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature: MPEG-1; MPEG-2; MPEG-4 Part 2 (Visual); H.264/MPEG-4 Part 10

Macroblock size: 16x16; 16x16 (frame mode), 16x8 (field mode); 16x16; 16x16

Block size: 8x8; 8x8; 16x16, 16x8, 8x8; 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT; 8x8 DCT; 8x8 DCT/Wavelet; 4x4, 8x8 Int DCT and 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment; scalar quantization with step size of constant increment; vector quantization; scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC; VLC; VLC; VLC, CAVLC, CABAC

Motion estimation & compensation: yes; yes; yes; yes, more flexible, up to 16 MVs per MB

Playback & random access: yes; yes; yes; yes

-74-

Conclusions

Comparison of standards (continued)

Feature: MPEG-1; MPEG-2; MPEG-4 Part 2 (Visual); H.264/MPEG-4 Part 10

Pel accuracy: integer, 1/2-pel; integer, 1/2-pel; integer, 1/2-pel, 1/4-pel; integer, 1/2-pel, 1/4-pel

Profiles: no; 5; 8; 4

Reference picture: one; one; one; multiple

Bidirectional prediction mode: forward/backward; forward/backward; forward/backward; forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D; I, P, B; I, P, B; I, P, B, SP, SI

Error robustness: synchronization & concealment; data partitioning, FEC for important packet transmission; synchronization, data partitioning, header extension, reversible VLCs; data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps; 2-15 Mbps; 64 kbps - 2 Mbps; 64 kbps - 240 Mbps

Compatibility with previous standards: n/a; yes; yes; no

Encoder complexity: low; medium; medium; high

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation of optical discs
- Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol pp, Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de


-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format] Y Cg Co
- High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values. The predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0: (8x8), similar to (16x16) luma MB prediction
• 4:2:2: (8x16), Vertical, Horizontal, DC, Planar

• 4:4:4: (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

A matrix can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning is ordered by decreasing variance so as to maximize the number of zero-valued coefficients along the scan

Frame: zig-zag scan; Field: field scan

FRExt only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
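
The construction and decoding rules above can be sketched in a few lines of Python. This is only an illustration of the unsigned code_num mapping described on these slides; the function names and the use of text strings of '0'/'1' characters instead of a real bit buffer are assumptions made for readability.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO]


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                        # skip the '1' marker
    info = int(bits[start:start + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m      # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Each codeword is 2M + 1 bits long, so small values of code_num (the most frequent ones, by design) get the shortest codes.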

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y with 16x16 Cb and 16x16 Cr.]

-119-

RGB to YCbCr

Y = KR R + (1 - KR - KB) G + KB B

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = (B - Y) / (2 (1 - KB)) = 0.5389 (B - Y)

Cr = (R - Y) / (2 (1 - KR)) = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
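
As a small numeric check of the relations between KR, KB and the chroma scale factors, the conversion can be written out in Python. This is a sketch on normalized floating-point samples; the function name and the default arguments are illustrative assumptions, and real systems add offsets, quantization to integer code values and range limits that are not shown here.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr using weights KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b   # luma as a weighted sum
    cb = (b - y) / (2.0 * (1.0 - kb))           # Cb = (B - Y) / (2 (1 - KB))
    cr = (r - y) / (2.0 * (1.0 - kr))           # Cr = (R - Y) / (2 (1 - KR))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                      # white: chroma is zero
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                      # pure blue: Cb reaches 0.5
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # BT.601 weights
```

The division by 2 (1 - KB) and 2 (1 - KR) is what keeps Cb and Cr within -0.5 to +0.5 for any valid RGB input; because the factors are not integers, converting back and forth in integer arithmetic introduces rounding error.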

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples

Y = (1/2) [G + (R + B)/2]

Cg = (1/2) [G - (R + B)/2]

Co = (R - B)/2

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
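
The reason this pair of conversions is lossless is that every step of the inverse undoes one step of the forward conversion exactly, including the discarded bit of each shift. The following Python sketch mirrors the two sets of equations on integer samples; the function names are illustrative assumptions. Python's ">>" on integers is an arithmetic (floor) shift, which matches the operation named on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion on integer samples, following the slide's steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; recovers the original integers exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    ycgco = rgb_to_ycgco(*sample)
    print(sample, "->", ycgco, "->", ycgco_to_rgb(*ycgco))
```

Note that Cg and Co can be negative and need one more bit than the RGB inputs, which is the extra bit of chroma precision mentioned earlier.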

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b was added in FRExt, for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, then the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact:
Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.)

DVB is considering AAC with SBR (AAC plus)
ATSC has selected AC-3 plus from Dolby
MPEG calls AAC plus HE-AAC (HE: High Efficiency)
ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-41-

VC Algorithm: Transform & Quantization

Transform
Integer transform: multiplier-free, using additions and shifts in 16-bit arithmetic
Hierarchical structure: 4 x 4 integer DCT + Hadamard transform

0 1 4 5

2 3 6 7

8 9 12 13

10 11 14 15

00 01 02 03

10 11 12 13

20 21 22 23

30 31 32 33

Assignment of the indices of DC (dark samples) to the luma 4 x 4 blocks: the numbers 0, 1, ..., 15 are the coding order for the (4x4) integer DCT

(00), (01), (02), ..., (33) are the DC coefficients of each 4x4 block

The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT. The same holds for chroma; the MB size for chroma depends on the 4:2:0, 4:2:2 and 4:4:4 formats

-42-

VC Algorithm: Transform

4 x 4 integer DCT (X: input pixels, Y: output coefficients)

Y = (Cf X Cf^T) (x) Ef, where (x) implies element-by-element multiplication

Cf =
[ 1  1  1  1 ]
[ 2  1 -1 -2 ]
[ 1 -1 -1  1 ]
[ 1 -2  2 -1 ]

X =
[ x00 x01 x02 x03 ]
[ x10 x11 x12 x13 ]
[ x20 x21 x22 x23 ]
[ x30 x31 x32 x33 ]

Ef =
[ a^2   ab/2   a^2   ab/2  ]
[ ab/2  b^2/4  ab/2  b^2/4 ]
[ a^2   ab/2   a^2   ab/2  ]
[ ab/2  b^2/4  ab/2  b^2/4 ]

a = 1/2, b = sqrt(2/5), d = 1/2
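
To make the split between the integer core transform and the scaling matrix concrete, here is a minimal Python sketch of Y = (Cf X Cf^T) (x) Ef. The helper names are hypothetical, and the scaling is applied in floating point purely for illustration; as the following slides note, a real codec folds the scaling into the quantizer rather than applying it separately.

```python
import math

# Integer core transform matrix Cf (entries are only +-1 and +-2, so the
# products reduce to additions and shifts in a real implementation).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """W = Cf X Cf^T, integer arithmetic only."""
    return matmul(matmul(CF, x), transpose(CF))


def scaling_matrix():
    """Ef with a = 1/2 and b = sqrt(2/5)."""
    a, b = 0.5, math.sqrt(2.0 / 5.0)
    even = [a * a, a * b / 2, a * a, a * b / 2]
    odd = [a * b / 2, b * b / 4, a * b / 2, b * b / 4]
    return [even, odd, even, odd]


def forward_transform(x):
    """Y = W (x) Ef, element-by-element scaling of the core transform."""
    w, e = core_transform(x), scaling_matrix()
    return [[w[i][j] * e[i][j] for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(core_transform(flat)[0])      # only the DC term is non-zero
    print(forward_transform(flat)[0][0])
```

With this scaling the combined transform is orthonormal, so the sum of squared samples equals the sum of squared coefficients; that property is what lets the decoder invert it with the matching Ei scaling.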

-43-

4x4 Inverse IntDCT

X = Ci^T (Y (x) Ei) Ci

Ei =
[ a^2  ab   a^2  ab  ]
[ ab   b^2  ab   b^2 ]
[ a^2  ab   a^2  ab  ]
[ ab   b^2  ab   b^2 ]

For comparison, the forward scaling matrix is

Ef =
[ a^2   ab/2   a^2   ab/2  ]
[ ab/2  b^2/4  ab/2  b^2/4 ]
[ a^2   ab/2   a^2   ab/2  ]
[ ab/2  b^2/4  ab/2  b^2/4 ]

In both forward and inverse transforms, QP (quantization step) is embedded in matrices Ef and Ei

-44-

VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

YD = round( (H XD H) / 2 )

H =
[ 1  1  1  1 ]
[ 1  1 -1 -1 ]
[ 1 -1 -1  1 ]
[ 1 -1  1 -1 ]

XD =
[ xD00 xD01 xD02 xD03 ]
[ xD10 xD11 xD12 xD13 ]
[ xD20 xD21 xD22 xD23 ]
[ xD30 xD31 xD32 xD33 ]

where round( ) = rounding to the nearest integer

-45-

VC Algorithm: Transform

Chroma DC coefficients (Intra prediction mode, (4x4) IntDCT): Walsh-Hadamard transform of the 2 x 2 DC coefficients

YD = [ 1 1 ; 1 -1 ] [ DC00 DC01 ; DC10 DC11 ] [ 1 1 ; 1 -1 ]

[Figure: 4:2:0 chroma coding order. Blocks 16 and 17: 2x2 DC of U and V. Blocks 18-21: U AC. Blocks 22-25: V AC.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the transform. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Intra Prediction, Intra/Inter Mode Decision, Motion Estimation, Motion Compensation, Deblocking Filtering, Picture Buffering.]

- 4 x 4 integer DCT transform

H =
[ 1  1  1  1 ]
[ 2  1 -1 -2 ]
[ 1 -1 -1  1 ]
[ 1 -2  2 -1 ]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

Y_ij = round( X_ij SF_ij / Qstep )

X'_ij = Y_ij Qstep SF_ij

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP. QP takes a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
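
The "doubles for every increment of 6" rule means Qstep is fully described by six base values plus a power of two. The sketch below illustrates this for the 8-bit range QP = 0..51. The six base values are not given on the slide; they are the ones commonly tabulated in the H.264 literature and should be read as an assumption here, as should the function names and the floating-point quantizer (real codecs use integer multiplier tables and shifts).

```python
# Qstep for QP = 0..5 (assumed values, as commonly tabulated for H.264).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for QP in 0..51; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(x, qp, sf=1.0):
    """Y = round(X * SF / Qstep), rounding half away from zero."""
    v = x * sf / qstep(qp)
    return int(v + 0.5) if v >= 0 else -int(-v + 0.5)


def rescale(y, qp, sf=1.0):
    """X' = Y * Qstep * SF (decoder-side inverse quantization)."""
    return y * qstep(qp) * sf


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
    level = quantize(100, 28)
    print(level, rescale(level, 28))
```

Each single step in QP changes Qstep by roughly 12.5%, which is the logarithmic control of the step size referred to elsewhere in this deck.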

-48-

VC Algorithm: Transform and Quantization

Rescale and inverse transform

[Figure: transform and quantization flow.
Encoder part: input block, forward transform, post-scaling and quantization, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only; Intra (16x16) prediction mode only), encoder output.
Decoder part: decoder input, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC): in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC): in Main profile

(a) Zig-zag scan

0 1 5 6

2 4 7 12

3 8 11 13

9 10 14 15

(b) Alternate scan

0 2 8 12

1 5 9 13

3 6 10 14

4 7 11 15
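
The two grids above give, for each position of the 4x4 block, its index in the one-dimensional scan. A short Python sketch shows how a quantized block is serialized with either table; the tables are copied from the slide, while the function name and the sample block are illustrative assumptions.

```python
# Scan position of each (row, column) entry, as shown in the two grids.
ZIGZAG_SCAN = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_SCAN = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, order):
    """Serialize a 4x4 block into a 16-entry list following the scan table."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    # A block whose energy sits in the first column, as in field-coded material.
    block = [
        [9, 3, 0, 0],
        [2, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(scan_block(block, ZIGZAG_SCAN))
    print(scan_block(block, ALTERNATE_SCAN))
```

The alternate scan walks down the columns earlier than the zig-zag scan does, which suits field pictures where vertical detail is stronger.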

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded

Encoding steps:
step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining nonzero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 15

Step 1: encode the number of nonzero coefficients and the number of +1 or -1 values (trailing ones) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
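
The five quantities in this example can be derived mechanically from the scanned coefficient list. The Python sketch below computes them but does not produce any bits: the actual VLC tables are not reproduced here, and the function name, the dictionary keys and the numeric values chosen for c0, c1 and c2 are illustrative assumptions.

```python
def cavlc_symbols(coeffs):
    """Derive the five CAVLC symbol groups from a scanned coefficient list."""
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Trailing ones: up to three consecutive +-1 values at the end of the scan.
    trailing = []
    for i in reversed(nonzero):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]

    # Remaining levels, highest scan position first.
    remaining = nonzero[:total_coeffs - len(trailing)]
    levels = [coeffs[i] for i in reversed(remaining)]

    # Zeros located before the last nonzero coefficient.
    total_zeros = (nonzero[-1] + 1 - total_coeffs) if nonzero else 0

    # Runs of zeros before each coefficient, in reverse order, stopping
    # once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero[k] - nonzero[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": total_coeffs,
        "trailing_ones": len(trailing),
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    # c0, c1, c2 replaced by the arbitrary values 7, 5, 3.
    example = [7, 5, 3, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(example))
```

For the slide's coefficient pattern this yields 6 coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total and runs 1, 0, 1, in line with the five steps listed above.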

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in addition, in order to achieve good compression, the probability model for each symbol element is updated
Both MVs and residual transform coefficients are coded by CABAC

Encoding steps:
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, with three prediction pairs: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice

Direct mode
Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture (direct-mode partition) and List 1 reference (co-located partition), showing mvCol, mvL0, mvL1 and the temporal distances tb and td.]

mvL0 = (tb / td) mvCol

mvL1 = -((td - tb) / td) mvCol

where mvCol is the MV used in the co-located MB of the subsequent picture
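
The two direct-mode formulas simply split mvCol in proportion to the temporal distances, so that mvL0 - mvL1 = mvCol. A minimal Python sketch, using exact fractions instead of the fixed-point integer arithmetic a real decoder would use (the function name and the tuple representation of a motion vector are assumptions):

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located MV and the distances tb, td."""
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)                # (tb / td) mvCol
    mv_l1 = tuple(m0 - c for m0, c in zip(mv_l0, mv_col))   # -((td - tb) / td) mvCol
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references.
    mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
```

Here tb is the distance from the List 0 reference to the current picture and td the distance between the two references, as in the figure; no motion vector needs to be transmitted for a direct-mode partition.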

-60-

VC Algorithm: B Slice

Weighted prediction
Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
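
A minimal sketch of p = w1 r1 + w2 r2 on lists of samples follows. The implicit-weight rule shown (each reference weighted by its temporal closeness to the current picture) is a simplification assumed for illustration; the standard derives implicit weights with fixed-point arithmetic and adds offsets and rounding shifts that are omitted here. Function and parameter names are hypothetical.

```python
def weighted_bipred(r1, r2, w1, w2, max_val=255):
    """p = w1*r1 + w2*r2 per sample, rounded and clipped to 0..max_val."""
    return [min(max_val, max(0, int(w1 * a + w2 * b + 0.5)))
            for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed simplification: weights proportional to temporal closeness.

    tb: distance from reference r1 to the current picture
    td: distance from reference r1 to reference r2
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    # Fade: the current picture lies a quarter of the way from r1 to r2.
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_bipred([100, 0], [200, 0], w1, w2))
```

With explicit weighting the encoder would instead choose w1 and w2 itself (for example from measured fade levels) and signal them in the slice header.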

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice: a switched slice, coded similarly to an I slice

[Figure: Bitstream A: P(11) P(12) P(13) P(14) P(15); Bitstream B: P(21) P(22) P(23) P(24) P(25); switching slice S(3) between the two bitstreams.]

Allows bit-stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning (only in Extended profile)
Arbitrary Slice Order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc: CABAC, frame width 11.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission. Errors are distributed more randomly over the video frames rather than in a single block of data

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4 and 8x8 Int DCT; 4x4 and 2x2 Hadamard

Quantization: Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Pel accuracy: Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo, website: http://www.ubvideo.com
- LSI Logic, website: http://www.lsilogic.com
- Microsoft, website: http://www.microsoft.com
- Envivio, website: http://www.envivio.com
- Broadcom, website: http://www.broadcom.com
- Nagravision, website: http://www.nagravision.com
- Philips, website: http://www.philips.com
- Polycom, website: http://www.polycom.com
- PixelTools Corporation, website: http://www.pixeltools.com
- Amphion, website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:

- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).

- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions:

- FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for next-generation optical discs
- Extension for various applications: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL:

- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Authors and other contacts:
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.Suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm : High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit sample depths
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode (uses prediction and entropy coding of prediction errors)
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected for each vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice, and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
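The first three full-MB modes can be sketched in a few lines. This is a minimal illustration, assuming the neighboring pels are always available; the function name `intra_predict` and the list-based block layout are hypothetical, and the planar mode is omitted.

```python
def intra_predict(top, left, mode):
    """Predict an N x N block from the row of pels above it (top)
    and the column of pels to its left (left)."""
    n = len(top)
    if mode == "vertical":
        # Every row repeats the pels just above the block.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Every column repeats the pels just left of the block.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Rounded mean of the 2N neighboring pels.
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


top, left = [10, 20, 30, 40], [5, 5, 5, 5]
print(intra_predict(top, left, "dc")[0])  # [15, 15, 15, 15]
```

An encoder would typically form each candidate prediction, subtract it from the source block, and keep the mode with the cheapest residual.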

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrix can be transmitted in SPS and PPS
- Separate matrix for 4x4 and 8x8 transforms
- Separate matrix for Inter and Intra
- Encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients at the end of the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description

- Sequence-, picture- and slice-layer syntax elements: headers and parameters
- Macroblock type (mb_type): prediction method for each coded macroblock
- Coded block pattern: indicates which blocks within a macroblock contain coded coefficients
- Quantiser parameter: transmitted as a delta value from the previous value of QP
- Reference frame index: identifies reference frame(s) for inter prediction
- Motion vector: transmitted as a difference (mvd) from the predicted motion vector
- Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
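The construction and the three decoding steps can be checked with a short sketch. It works on strings of '0' and '1' characters rather than a real bitstream, and the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a bit string."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits         # [M zeros][1][INFO]


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos].
    Returns (code_num, position of the next codeword)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count leading zeros
        m += 1
    start = pos + m + 1                      # skip the '1' marker
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

The loop prints the codewords 1, 010, 011, 00100, 00101, 00110, 00111, which shows the single-bit INFO field for codewords 1-2 and the two-bit field for codewords 3-6.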

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
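A short numerical sketch of the conversion above, with the chroma scale factors computed from KR and KB instead of being hard-coded. The function name and the use of normalized floating-point samples in [0, 1] are assumptions made for illustration.

```python
KR, KB = 0.2126, 0.0722          # luma weights quoted on the slide


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized R, G, B to Y, Cb, Cr.
    Y lies in [0, 1]; Cb and Cr lie in [-0.5, 0.5]."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y close to 1, chroma close to 0
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches +0.5
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide. Because the result is rounded to integers in practice, a round trip back to RGB is generally not exact.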

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
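The forward and inverse conversions can be run back to back to confirm that the round trip is exact. This is a minimal sketch with hypothetical function names; Python's `>>` on integers is an arithmetic shift, matching the definition on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion, integer arithmetic only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycgco(255, 0, 0))                  # (63, -127, 255)
print(ycgco_to_rgb(*rgb_to_ycgco(255, 0, 0)))   # (255, 0, 0)
```

Each inverse step subtracts exactly the quantity the forward step added, so no rounding error accumulates. Cg and Co span one more bit than the 8-bit input, which is the extra bit of chroma precision noted earlier.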

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones and can also decode all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

New Profiles in the H264AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be higher, up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile DetailsDeblocking Filter CABAC Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-42-

VC Algorithm Transform

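4 x 4 integer DCT (X: input pixels, Y: output coefficients)

$$Y = (C_f X C_f^T) \otimes E_f$$

$$Y = \left(
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_{00} & x_{01} & x_{02} & x_{03} \\ x_{10} & x_{11} & x_{12} & x_{13} \\ x_{20} & x_{21} & x_{22} & x_{23} \\ x_{30} & x_{31} & x_{32} & x_{33} \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix}
\right) \otimes
\begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad d = \frac{1}{2}$$

$\otimes$ implies element-by-element multiplication.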
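The forward transform can be tried numerically with plain Python lists. This is a sketch for illustration: the helper names are hypothetical, and the post-scaling by Ef is kept as a separate floating-point step here, whereas a real codec folds it into quantization.

```python
import math

# Integer core matrix Cf of the 4x4 forward transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """W = Cf X Cf^T, using integer additions and shifts only."""
    return matmul(matmul(CF, x), transpose(CF))


def post_scale(w):
    """Y = W (x) Ef, element by element, with a = 1/2 and b = sqrt(2/5)."""
    a, b = 0.5, math.sqrt(2.0 / 5.0)

    def ef(i, j):
        if i % 2 == 0 and j % 2 == 0:
            return a * a
        if i % 2 == 1 and j % 2 == 1:
            return b * b / 4.0
        return a * b / 2.0

    return [[w[i][j] * ef(i, j) for j in range(4)] for i in range(4)]


flat = [[1] * 4 for _ in range(4)]
print(core_transform(flat)[0])              # [16, 0, 0, 0]
print(post_scale(core_transform(flat))[0])  # [4.0, 0.0, 0.0, 0.0]
```

A flat block puts all of its energy into the DC coefficient, as expected of a DCT-like transform.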

-43-

4x4 Inverse IntDCT

In both forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.

$$X = C_i^T (Y \otimes E_i)\, C_i$$

Here

$$Y \otimes E_i = [Y] \otimes
\begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

-44-

VC Algorithm Transform

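Luma DC coefficients for Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \operatorname{round}\left( \frac{1}{2}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\right)$$

where round( ) denotes rounding to the nearest integer.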

-45-

VC Algorithm Transform

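Chroma DC coefficients (Intra prediction mode): after the (4x4) IntDCT, the 2 x 2 DC coefficients are transformed using the Walsh-Hadamard transform

$$Y_D =
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: chroma block numbering for the 4:2:0 format. Blocks 16 and 17 hold the 2x2 DC coefficients of U and V; blocks 18-21 (U) and 22-25 (V) hold the AC coefficients.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.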

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision.]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

Encoder: post-scaling and quantization

$$Y_{ij} = \operatorname{round}\left( X_{ij} \, \frac{SF_{ij}}{Q_{step}} \right)$$

Decoder: inverse quantization and pre-scaling

$$X_{ij} = Y_{ij} \cdot Q_{step} \cdot SF_{ij}$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 values, by 6 for each additional bit of decoded sample
- SF: scaling term
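The relationship between QP and Qstep can be sketched as a small lookup. The six base step sizes below are the values commonly tabulated for QP 0 to 5 in the H.264 literature; they do not appear on the slide and should be read as an assumption of this sketch, as should the function names and the default scaling term of 1.

```python
import math

# Assumed base step sizes for QP = 0..5; each further step of 6 in QP doubles Qstep.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(x, qp, sf=1.0):
    """Encoder side: Y = round(X * SF / Qstep), rounding half away from zero."""
    v = x * sf / qstep(qp)
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def rescale(y, qp, sf=1.0):
    """Decoder side: X' = Y * Qstep * SF."""
    return y * qstep(qp) * sf


print(qstep(22), qstep(28))                   # 8.0 16.0
print(quantize(100, 10), rescale(50, 10))     # 50 100.0
```

Successive entries of the base table differ by roughly 12%, which is what makes the step size double over six QP increments.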

-48-

VC Algorithm Transform Quantization

[Figure: transform and quantization path.
Encoder part: Input block; Forward transform; 2x2 or 4x4 DC transform (chroma or Intra-16 luma only, i.e. Intra (16x16) prediction mode only for luma); Post-scaling and quantization.
Encoder output / decoder input.
Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only); Inverse quantization and pre-scaling (rescale); Inverse transform; Output block.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all Profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile

a. Zig-zag scan        b. Alternate scan

 0  1  5  6             0  2  8 12
 2  4  7 12             1  5  9 13
 3  8 11 13             3  6 10 14
 9 10 14 15             4  7 11 15
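The two scan tables give, for each position of the 4x4 block, its place in the one-dimensional coefficient sequence. A minimal sketch of applying them (the function name `scan` is hypothetical):

```python
# Scan position of each (row, column) entry, copied from the two tables.
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan(block, order):
    """Reorder a 4x4 block into a list of 16 coefficients."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


block = [[9, 3, 0, 0],
         [2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(scan(block, ZIGZAG))     # [9, 3, 2, 1, 0, 0, ...]
print(scan(block, ALTERNATE))  # [9, 2, 3, 1, 0, 0, ...]
```

Both scans push the zero-valued high-frequency coefficients to the end of the list, which is what the run-based entropy coder exploits. The alternate scan walks down the columns first, which suits field-coded blocks.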

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
- step 1: encode the total number of nonzero coefficients and of ±1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining nonzero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 … 0
order:  0  1  2 3 4 5 6  7 8 9 … 15

Step 1: encode the number of nonzero total coefficients and of 1 or -1 values (trailing ones) from a look-up table
  no. of nonzero total coefficients = 6 (orders 0, 1, 2, 4, 5, 7)
  no. of trailing ones = 3 (orders 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient
  2 (orders 3, 6)

Step 5: encode each run of zeros in reverse order
  1 (zero at order 6, before the coefficient at order 7), 0 (before order 5), 1 (zero at order 3, before order 4)
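The five steps can be reproduced with a sketch that extracts the symbols CAVLC would code, without the actual VLC look-up tables. The function name is hypothetical, and the values 7, 5, 3 stand in for c0, c1, c2, which the example leaves unspecified.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for a list of coefficients in scan order."""
    pos = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(pos)

    # Steps 1-2: up to three trailing +/-1 values, read from the end backwards.
    trailing_signs = []
    for i in reversed(pos):
        if abs(coeffs[i]) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("+" if coeffs[i] > 0 else "-")
        else:
            break

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [coeffs[i] for i in reversed(pos)][len(trailing_signs):]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = pos[-1] + 1 - total_coeffs if pos else 0

    # Step 5: run of zeros preceding each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = pos[k] - pos[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return total_coeffs, trailing_signs, levels, total_zeros, runs


coeffs = [7, 5, 3, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(coeffs))
# (6, ['-', '+', '+'], [3, 5, 7], 2, [1, 0, 1])
```

The output matches the example: 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.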

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- step 1: context modeling. Choose a suitable model
- step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change

[Figure: the current picture between previous pictures and next pictures, predicted with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture, and List 1 reference; the direct-mode partition in the current picture uses mvL0 and mvL1, derived from mvCol of the co-located partition, with temporal distances tb and td]

$$mvL0 = \frac{tb}{td}\, mvCol, \qquad mvL1 = -\frac{td - tb}{td}\, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture
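The scaling of mvCol can be checked with a few lines of Python. This is a sketch under stated assumptions: it uses exact fractions rather than the integer arithmetic and rounding of a real decoder, and the function name `temporal_direct_mvs` and the sample values are made up for the example.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located MV (x, y) into the list 0 and list 1 MVs.

    mvL0 = tb * mvCol / td
    mvL1 = -(td - tb) * mvCol / td
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(Fraction(-(td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


# Hypothetical case: the current picture lies halfway between the two references.
mv_l0, mv_l1 = temporal_direct_mvs(mv_col=(8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

Note that mvL0 - mvL1 equals mvCol for any tb and td, which follows directly from the two formulas.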

-60-

VC Algorithm: B Slice, Weighted Prediction

Different weights are applied to the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

A different weighted prediction method is used for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
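A small Python sketch of p = w1 r1 + w2 r2 follows. The implicit weights here assume simple linear interpolation by temporal distance (the closer reference gets the larger weight); the standard's integer derivation is not reproduced, and the names `implicit_weights` and `weighted_prediction` and the sample values are hypothetical.

```python
from fractions import Fraction


def implicit_weights(tb, td):
    """Assumed implicit rule: r1 is tb pictures before the current picture and
    the two references are td pictures apart, so w2 = tb / td and w1 = 1 - w2."""
    w2 = Fraction(tb, td)
    return 1 - w2, w2


def weighted_prediction(r1, r2, w1, w2):
    """Apply p = w1 * r1 + w2 * r2 sample by sample."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


# Fade example: the scene darkens from r1 to r2; the current picture is one
# third of the way from r1 to r2.
w1, w2 = implicit_weights(tb=1, td=3)
p = weighted_prediction([100, 70], [40, 10], w1, w2)
print([float(v) for v in p])
```

In the explicit type the same `weighted_prediction` call would simply be given the factors read from the slice header instead of derived ones.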

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) … P(1,5) and Bitstream B with pictures P(2,1) … P(2,5), linked by the switching picture S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C
2. A has the slice header and the header data for each MB in the slice
3. B has the coded residual data for intra and SI slice MBs
4. C has the coded residual data for inter-coded MBs
5. Each partition (A, B, and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter Setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2, 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example contents of parameter set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
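The two-slice-group scenario can be illustrated with a checkerboard allocation. This is a sketch only: the checkerboard rule is an assumption chosen for simplicity, not the slice-group map formula of the standard, and the function names are hypothetical.

```python
def dispersed_slice_groups(mb_width, mb_height):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]


def surviving_neighbors(groups, x, y, lost_group):
    """Return the 4-connected neighbors of (x, y) that are NOT in the lost group."""
    height, width = len(groups), len(groups[0])
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height
            and groups[ny][nx] != lost_group]


# QCIF picture: 11 x 9 macroblocks; suppose slice group 1 is lost.
groups = dispersed_slice_groups(11, 9)
lost = [(x, y) for y in range(9) for x in range(11) if groups[y][x] == 1]
fewest = min(len(surviving_neighbors(groups, x, y, 1)) for x, y in lost)
print(len(lost), fewest)
```

Every lost macroblock keeps all of its in-picture horizontal and vertical neighbors, which is what gives the concealment step something to interpolate from.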

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Bitrates [kbps]: QCIF 24, 48, 96, 192; CIF 96, 192, 384, 768

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T
Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: >2x, 2x, 2x, T, T
Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Bitrates [kbps]: QCIF 24, 48, 96, 192; CIF 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x
Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T
Husky: 2x, 2x, >1x, 2x, 2x, 2x
Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Bitrates [Mbps]: 1.5, 2.25, 3, 4, 6 for MPEG-2 HiQ; 1.5, 2.25, 3, 4, 6 for MPEG-2 TM5

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T
Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Bitrates [Mbps]: 6, 10, 20 for MPEG-2 HiQ; 6, 10, 20 for MPEG-2 TM5

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: >1.7x, >1x, T, >1.7x, >1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards

Comparison of standards

| Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10 |
|---|---|---|---|---|
| Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16 |
| Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4 |
| Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard |
| Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5% |
| Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC |
| Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB |
| Playback & random access | Yes | Yes | Yes | Yes |

-74-

Conclusions: Comparison of standards (continued)

| Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10 |
|---|---|---|---|---|
| Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel |
| Profiles | No | 5 | 8 | 4 |
| Reference picture | One | One | One | Multiple |
| Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward |
| Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI |
| Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice |
| Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps |
| Compatibility with previous standards | n/a | Yes | Yes | No |
| Encoder complexity | Low | Medium | Medium | High |

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Herts, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp. , Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol. , pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol. , pp. , June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. Main Concept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom, and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6, and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions
- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: the lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], YCgCo
- High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
- Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
- MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of the quantization step size as a function of the quantization control parameter
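The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as below. The slide does not list step-size values; the six base values are the commonly tabulated step sizes for QP 0 to 5 of 8-bit video and are included here as an assumption, as is the function name `qstep`. The step size doubles for every increase of 6 in QP, i.e. it grows by roughly 12% per QP step.

```python
# Commonly tabulated step sizes for QP = 0..5 (assumption: 8-bit video, QP range 0..51).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP; doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


for qp in (0, 4, 10, 28, 51):
    print(qp, qstep(qp))
```

Because the scale is logarithmic, a fixed change in QP corresponds to a fixed ratio of step sizes regardless of the operating point, which simplifies rate control.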

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

A slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and the corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based on the planar equation
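Three of the four full-MB modes are simple enough to sketch directly. The code below is an illustration under stated assumptions: it works on any square block size (the example uses 4 samples per side to keep the output short), the function names are hypothetical, the DC rounding is an assumed round-to-nearest average, and the planar mode is left out because the slide does not give its equation.

```python
def predict_vertical(above):
    """Every row copies the reconstructed pels just above the block."""
    n = len(above)
    return [list(above) for _ in range(n)]


def predict_horizontal(left):
    """Every column copies the reconstructed pels just left of the block."""
    n = len(left)
    return [[left[y]] * n for y in range(n)]


def predict_dc(above, left):
    """All pels take the rounded average of the neighboring pels."""
    n = len(above)
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


above = [10, 20, 30, 40]   # hypothetical reconstructed pels above the block
left = [1, 2, 3, 4]        # hypothetical reconstructed pels left of the block
print(predict_vertical(above)[0], predict_horizontal(left)[2], predict_dc(above, left)[0])
```

An encoder would typically compute the residual for each candidate mode and keep the one that costs the fewest bits for acceptable distortion.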

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0: (8x8)
• 4:2:2: (8x16)
• 4:4:4: (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt Only)

- The matrix can be transmitted in the SPS and PPS
- Separate matrices for the 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:
- 4x4 Intra Y; 4x4 Intra Cb, Cr
- 4x4 Inter Y; 4x4 Inter Cb, Cr
- 8x8 Intra Y; 8x8 Inter Y

-108-

FRExt Only

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding

Coefficient scanning is ordered by decreasing variance, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan patterns]

-109-

Examples of parameters to be encoded

| Parameters | Description |
|---|---|
| Sequence-, picture-, and slice-layer syntax elements | Headers and parameters |
| Macroblock type (mb_type) | Prediction method for each coded macroblock |
| Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients |
| Quantiser parameter | Transmitted as a delta value from the previous value of QP |
| Reference frame index | Identifies the reference frame(s) for inter prediction |
| Motion vector | Transmitted as a difference (mvd) from the predicted motion vector |
| Residual data | Coefficient data for each 4x4 or 2x2 block |

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
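The Exp-Golomb construction and the three decoding steps translate almost line for line into Python. This is a minimal sketch that represents codewords as strings of '0' and '1' for readability; the function names are assumptions, and the mapping from syntax-element values to code_num is not covered.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1          # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (code_num, number_of_bits_consumed)."""
    m = 0
    while bits[m] == "0":                        # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: read M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1        # step 3: code_num = 2^M + INFO - 1


for n in range(8):
    print(n, exp_golomb_encode(n))
```

The printed table shows the pattern described on the slide: codeword 0 is a single bit, codewords 1 and 2 are three bits long, and codewords 3 to 6 are five bits long.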

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC in the US has preliminarily selected High profile. Several other environments may soon embrace it as well, including various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB with a 16x16 Y block and 8x16 Cb and Cr blocks; a 4:4:4 MB with 16x16 Y, Cb, and Cr blocks]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$, and $K_R + K_B + K_G = 1$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
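As a quick numerical check of these equations, the conversion can be written as a short Python function. This is a sketch with assumed conventions: R, G, B are floating-point values in the range 0 to 1, no offset or integer scaling is applied, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert RGB to Y, Cb, Cr using the luma coefficients KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))    # white: Y near 1, chroma near 0
print(rgb_to_ycbcr(0.0, 0.0, 1.0))    # pure blue: Cb reaches its maximum
print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # pure red with the BT.601 constants
```

The divisors 2(1 - KB) and 2(1 - KR) normalize Cb and Cr so that each spans -0.5 to +0.5, whatever values of KR and KB are chosen.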

-120-

Rounding error in RGB to YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
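The forward and inverse conversions can be checked for exact reversibility in Python, whose `>>` on integers is an arithmetic (floor) shift, as the slide requires. The function names are assumptions made for this sketch.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only integer additions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip check over a coarse grid of 8-bit RGB values.
samples = range(0, 256, 17)
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
               for r in samples for g in samples for b in samples)
print(lossless, rgb_to_ycgco(255, 0, 0))
```

For 8-bit input, Co and Cg range over -255 to 255 and therefore need 9 bits, which is the one extra bit of chroma precision, while Y stays within 8 bits.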

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

- A 'higher' profile supports all capabilities of the lower ones
- It is also capable of decoding all bit streams encoded for the lower nested profiles
- All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
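Rule 2 can be illustrated with a short check. This is a sketch under stated assumptions: the pixel budget of 2,097,152 pixels per frame (8192 macroblocks of 16x16) is an illustrative value, not one taken from the slide's table, and the function names are hypothetical.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(total pixels per frame x 8)."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_fits(width, height, max_pixels_per_frame):
    """A picture must respect both the pixel budget and the per-side limit."""
    return (width * height <= max_pixels_per_frame
            and max(width, height) <= max_dimension(max_pixels_per_frame))


budget = 8192 * 16 * 16                      # illustrative budget: 2,097,152 pixels
print(max_dimension(budget))
print(picture_fits(1920, 1088, budget))      # ordinary HD picture
print(picture_fits(8192, 16, budget))        # within the pixel budget, but far too wide
```

The per-side limit rules out extreme aspect ratios that would otherwise pass the pixel-count test alone.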

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects Intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-43-

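4x4 Inverse IntDCT

$$E_f = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

In both the forward and inverse transforms, QP (quantization step) is embedded in the matrices $E_f$ and $E_i$.

$$Y \otimes E_i = [Y] \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

Here

$$X = C_i^T \, (Y \otimes E_i) \, C_i$$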

-44-

VC Algorithm Transform

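Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform.

$$Y_D = \mathrm{round}\left( \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right)$$

where round( ) = rounding to the nearest integer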

-45-

VC Algorithm Transform

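Chroma DC coefficients: Intra prediction mode, (4x4) IntDCT, then Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: chroma block numbering for the 4:2:0 format. Blocks 16 and 17 hold the 2x2 DC coefficients of U and V; blocks 18-21 (U) and 22-25 (V) hold the AC coefficients.]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased.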

-46-

VC Algorithm Transform

Block diagram emphasizing transform

[Figure: encoder block diagram with video input and bitstream output. Blocks: Transform & Quantization, Inverse Quantization & Inverse Transform, Motion Estimation, Motion Compensation, Picture Buffering, Intra Prediction, Intra/Inter Mode Decision, Deblocking Filtering, Entropy Coding.]

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
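As a rough illustration of the 4 x 4 integer transform matrix H on this slide, the sketch below applies the unscaled core transform W = H X H^T to one block using integer arithmetic only. The function names are hypothetical, and the scaling normally folded into quantization is deliberately left out.

```python
# Minimal sketch of the unscaled 4x4 integer core transform, W = H * X * H^T.
# Function names are illustrative; scaling (folded into quantization) is omitted.

H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Apply the integer core transform to a 4x4 block of residual samples."""
    return matmul(matmul(H, block), transpose(H))


# A flat block has energy only in the DC position.
flat_block = [[1] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat_block)

# The rows of H are mutually orthogonal but not of unit norm,
# which is why a separate scaling step is needed.
gram = matmul(H, transpose(H))
```

Only additions, subtractions and multiplications by 2 are required, which is the reason the transform can be computed exactly in integer arithmetic.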

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

$$Y_{ij} = \mathrm{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right)$$

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP. There are a total of 52 values for 8 bits per decoded sample, and Qstep doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample.
SF: scaling term
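The relation between QP and Qstep (52 values, doubling every 6 steps) can be sketched as follows. The six base step sizes are the values commonly tabulated for QP 0 to 5 and are an assumption here, since the slide does not list them; the scaling term SF is treated as a plain parameter, and the function names are hypothetical.

```python
# Sketch of the QP -> Qstep mapping: 52 values, doubling for every increment of 6.
# QSTEP_BASE holds the commonly tabulated step sizes for QP 0..5 (assumed, not from the slide).

QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Return the quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8 bits per sample")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def quantize(x, qp, sf=1.0):
    """Encoder side: post-scaling and quantization, Y = round(X * SF / Qstep)."""
    # Python's round() rounds exact halves to the nearest even integer.
    return round(x * sf / qstep(qp))


def rescale(y, qp, sf=1.0):
    """Decoder side: inverse quantization and pre-scaling, X = Y * Qstep * SF."""
    return y * qstep(qp) * sf
```

For example, `qstep(28)` is 16.0 under these assumptions, so a coefficient of 100 quantizes to 6 and is reconstructed as 96.0.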

-48-

VC Algorithm Transform Quantization

Rescale and inverse transform. The DC transform stage applies to chroma or Intra (16x16) prediction mode luma only.

[Figure: Encoder part: input block, forward transform, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only), post-scaling and quantization, encoder output / decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling, inverse transform, output block.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC).
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate.
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles.
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile.

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
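The two scan tables on this slide give, for each position in the 4x4 block, the index at which that coefficient is read. A minimal sketch of applying them follows; the function name `scan` and the test block are illustrative only.

```python
# Sketch: reorder a 4x4 block of quantized coefficients into a 1-D list
# using the scan tables from the slide (entry = position in the scanned list).

ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]

ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan(block, order):
    """Return the 16 coefficients of a 4x4 block in the given scan order."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out


# A block whose entries are their own raster index makes the order visible.
raster = [[4 * r + c for c in range(4)] for r in range(4)]
zigzag_order = scan(raster, ZIGZAG)
alternate_order = scan(raster, ALTERNATE)
```

The alternate scan walks down columns first, which suits field-coded blocks whose vertical frequencies tend to carry more energy.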

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M+1) bits, where

$$M = \lfloor \log_2 (\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-52-

Decoding

1. Read in M leading zeroes followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded. For the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 15

Step 1: encode the number of nonzero total coefficients and of 1 or -1 (trailing ones) from a look-up table
number of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
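The quantities in steps 1 to 5 can be derived mechanically from the scanned coefficient list. The sketch below only extracts those values and does not perform the table look-ups that produce the actual bits. The values chosen for c0, c1 and c2 are made up, and the function name is hypothetical.

```python
# Sketch: derive the CAVLC syntax element values (not the bit patterns)
# from a list of coefficients already in scan order.


def cavlc_elements(coeffs):
    """Return the values that the five CAVLC encoding steps would code."""
    nonzero_pos = [i for i, c in enumerate(coeffs) if c != 0]
    reversed_pos = list(reversed(nonzero_pos))

    # Steps 1 and 2: up to three trailing +/-1 values, scanned from the end.
    trailing_signs = []
    for i in reversed_pos:
        if abs(coeffs[i]) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("+" if coeffs[i] > 0 else "-")
        else:
            break

    # Step 3: levels of the remaining non-zero coefficients, in reverse order.
    levels = [coeffs[i] for i in reversed_pos][len(trailing_signs):]

    # Step 4: zeros located before the last non-zero coefficient.
    total_zeros = (nonzero_pos[-1] + 1 - len(nonzero_pos)) if nonzero_pos else 0

    # Step 5: zeros immediately preceding each non-zero coefficient, in reverse order.
    runs = []
    for k, i in enumerate(reversed_pos):
        previous = reversed_pos[k + 1] if k + 1 < len(reversed_pos) else -1
        runs.append(i - previous - 1)

    return {
        "total_coeffs": len(nonzero_pos),
        "trailing_ones": len(trailing_signs),
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


# The slide's example with hypothetical values c0 = 3, c1 = -2, c2 = 4.
example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
elements = cavlc_elements(example)
```

A real encoder stops signalling runs once all zeros counted in step 4 have been accounted for; the sketch lists every run for clarity.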

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding. In order to achieve good compression, the probability model for each symbol element is also updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs.

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture.

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized Bidirectional Prediction

Multiple reference pictures mode:
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing three cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction. The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture (direct-mode partition) and List 1 reference (co-located partition), with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td.]

$$mvL0 = \frac{tb}{td} \, mvCol, \qquad mvL1 = -\frac{td - tb}{td} \, mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
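The two direct-mode formulas can be checked numerically with a short sketch. It assumes tb is the temporal distance from the current picture to the List 0 reference and td the distance between the List 1 and List 0 references, as the figure suggests. Plain division is used instead of the fixed-point arithmetic of a real codec, and the function name is hypothetical.

```python
# Sketch of temporal direct-mode motion vector derivation:
#   mvL0 = tb * mvCol / td,   mvL1 = -(td - tb) * mvCol / td
# tb, td: temporal distances (assumed definitions, see lead-in).


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located MV into forward (List 0) and backward (List 1) MVs."""
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


# Current picture midway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

Note that mvL0 minus mvL1 equals mvCol, so the two derived vectors together span the motion of the co-located block.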

-60-

VC Algorithm: B Slice, Weighted Prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

Different weighted prediction methods for a macroblock of a P slice or a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors.
Implicit type: the factors are calculated based on the temporal distance between the pictures.
Explicit type: the factors are transmitted in the slice header.
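A small sketch of the formula p = w1 r1 + w2 r2 follows. The implicit weights shown are a simplified floating-point form in which the temporally closer reference receives the larger weight. The definitions of tb and td (distance from the current picture to r1, and between r1 and r2) and the function names are assumptions for illustration, not the fixed-point derivation used by a real codec.

```python
# Sketch of bi-predictive weighted prediction, p = w1*r1 + w2*r2.
# Implicit weights are a simplified, floating-point approximation (assumption).


def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference sample lists with the given weights."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Derive (w1, w2) from temporal distances.

    tb: distance from the current picture to reference r1 (assumed)
    td: distance between references r1 and r2 (assumed)
    """
    w2 = tb / td
    return 1.0 - w2, w2


# Explicit type: weights chosen by the encoder, e.g. for a fade to black.
faded = weighted_prediction([100, 100], [0, 0], 0.5, 0.5)

# Implicit type: current picture is one quarter of the way from r1 to r2.
w1, w2 = implicit_weights(tb=1, td=4)
```

With explicit weights the encoder can track a fade whose brightness change is not linear in time, at the cost of sending the factors in the slice header.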

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
SP slice: a specially coded slice for efficient switching between video streams, similar to coding of a P slice
SI slice: a switched slice, similar to coding of an I slice

[Figure: switching from Bitstream A (P(1,1) ... P(1,5)) to Bitstream B (P(2,1) ... P(2,5)) through the switching slice S(3).]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter Setting

The sequence parameter set contains all information related to a sequence of pictures.
A picture parameter set contains all information related to all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2 and 3, exchanged through a reliable parameter set exchange. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11. The VCL data transfer refers to PS 3.]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO. It randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data.
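The dispersed two-group allocation described for FMO can be pictured as a checkerboard. The sketch below builds such a map and lists, for a given macroblock, the neighbours that belong to the other slice group and would therefore survive the loss of its own group. The checkerboard pattern and the function names are illustrative assumptions, not the only mapping FMO permits.

```python
# Sketch: a checkerboard (dispersed) assignment of macroblocks to two slice groups.


def dispersed_slice_groups(width_mbs, height_mbs):
    """Return a 2-D map giving the slice group (0 or 1) of every macroblock."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(mb_map, x, y):
    """List the left/right/up/down neighbours of (x, y) that lie in the other slice group."""
    height, width = len(mb_map), len(mb_map[0])
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [
        (nx, ny)
        for nx, ny in candidates
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != mb_map[y][x]
    ]


mb_map = dispersed_slice_groups(4, 3)
inner = neighbours_in_other_group(mb_map, 1, 1)
corner = neighbours_in_other_group(mb_map, 0, 0)
```

With this pattern every in-picture neighbour of a lost macroblock comes from the surviving group, which gives the concealment step the most spatial information to interpolate from.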

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192, 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192, 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6, 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20, 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards (Feature: MPEG-1 / MPEG-2 / MPEG-4 Part 2 (Visual) / H.264, MPEG-4 Part 10)

Macroblock size: 16x16 / 16x16 (frame mode), 16x8 (field mode) / 16x16 / 16x16

Block size: 8x8 / 8x8 / 16x16, 16x8, 8x8 / 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT / 8x8 DCT / 8x8 DCT, Wavelet / 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment / scalar quantization with step size of constant increment / vector quantization / scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC / VLC / VLC / VLC, CAVLC, CABAC

Motion estimation & compensation: Yes / Yes / Yes / Yes, more flexible; up to 16 MVs per MB

Playback & random access: Yes / Yes / Yes / Yes

-74-

Conclusions

Comparison of standards, continued (Feature: MPEG-1 / MPEG-2 / MPEG-4 Part 2 (Visual) / H.264, MPEG-4 Part 10)

Pel accuracy: integer, 1/2-pel / integer, 1/2-pel / integer, 1/2-pel, 1/4-pel / integer, 1/2-pel, 1/4-pel

Profiles: No / 5 / 8 / 4

Reference picture: one / one / one / multiple

Bidirectional prediction mode: forward/backward / forward/backward / forward/backward / forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D / I, P, B / I, P, B / I, P, B, SP, SI

Error robustness: synchronization & concealment / data partitioning, FEC for important packet transmission / synchronization, data partitioning, header extension, reversible VLCs / data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps / 2-15 Mbps / 64 kbps - 2 Mbps / 64 kbps - 240 Mbps

Compatibility with previous standards: n/a / Yes / Yes / No

Encoder complexity: Low / Medium / Medium / High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12 bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF)
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

In MBAFF, the choice of compression (frame or field) is selected at the level of a vertical pair of MBs.

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
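The first three of the four 16x16 luma modes can be sketched in a few lines. This is a minimal illustration, not the normative process: the function name `intra16_predict` is hypothetical, both neighboring rows/columns are assumed to be available, the DC rounding offset is an assumption, and the planar mode is omitted.

```python
def intra16_predict(above, left, mode):
    """Return a 16x16 prediction block (list of rows).

    above: the 16 reconstructed pels just above the MB
    left:  the 16 reconstructed pels just left of the MB
    mode:  "vertical", "horizontal" or "dc"
    """
    n = 16
    if mode == "vertical":
        # Every row repeats the row of pels above the MB.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every pel in row y copies the pel to the left of that row.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Average of the 32 neighboring pels, with an assumed rounding offset.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


above = list(range(16))   # hypothetical neighbor values
left = [100] * 16
pred_v = intra16_predict(above, left, "vertical")
pred_h = intra16_predict(above, left, "horizontal")
pred_dc = intra16_predict(above, left, "dc")
```

An encoder would typically form all candidate predictions and keep the one that gives the smallest residual cost.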

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS.
Separate matrices for 4x4 and 8x8 transforms.
Separate matrices for inter and intra.
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level.

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr
4x4 Inter Y; 4x4 Inter Cb, Cr
8x8 Intra Y; 8x8 Inter Y

-108-

FRExt only: two scans, similar to those for the 4x4 transform, switched for frame/field coding.
Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan.

[Figure: scan patterns: frame (zig-zag) and field]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC codes the transform coefficients. CABAC codes the transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes
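The construction and decoding rules above can be checked with a short sketch. Codewords are represented as strings of '0' and '1' purely for readability; the function names are illustrative and not taken from any reference software.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword that starts at the beginning of a bit string."""
    m = 0
    while bits[m] == "0":                      # 1. count the M leading zeros
        m += 1
    # 2. read the M-bit INFO field that follows the marker '1'
    info = int(bits[m + 1 : 2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1                 # 3. code_num = 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(7)]
```

For code_num 0 to 6 this yields 1, 010, 011, 00100, 00101, 00110, 00111, which matches the statement that codewords 1 and 2 carry a one-bit INFO field and codewords 3-6 a two-bit field.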

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

FRExt only: MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y block 16 wide x 16 high, Cb and Cr blocks each 8 wide x 16 high. 4:4:4 MB: Y, Cb and Cr blocks each 16 x 16]

-119-

RGB → YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for the input, the output and the stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
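The forward and inverse conversions are built from the same shift-and-add steps run in opposite order, which is why the round trip is exact in integer arithmetic. The sketch below follows the equations on this slide; the function names are illustrative, and it relies on Python's `>>` being an arithmetic (floor) shift for negative integers.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based color conversion on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip over a coarse grid of 8-bit RGB triples.
samples = [(r, g, b) for r in range(0, 256, 17)
                     for g in range(0, 256, 17)
                     for b in range(0, 256, 17)]
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(*s)) == s for s in samples)
```

Note that Cg and Co can be negative and span one more bit than the inputs, which corresponds to the extra bit of chroma precision mentioned for YCgCo.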

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha-blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones nested within it, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-44-

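VC Algorithm: Transform

Luma DC coefficients for an Intra 16x16 MB: the 16 DC coefficients of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform

$$Y_D = \mathrm{round}\left(\frac{1}{2}
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{pmatrix}
\begin{pmatrix} x_{D00} & x_{D01} & x_{D02} & x_{D03} \\ x_{D10} & x_{D11} & x_{D12} & x_{D13} \\ x_{D20} & x_{D21} & x_{D22} & x_{D23} \\ x_{D30} & x_{D31} & x_{D32} & x_{D33} \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{pmatrix}\right)$$

where round() denotes rounding to the nearest integer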
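As a rough illustration of the luma DC transform on this slide, the sketch below applies the 4x4 Walsh-Hadamard matrix on both sides of a 4x4 array of DC coefficients and halves the result. Names are hypothetical, and the handling of exact halves (rounded upward here) is an assumption rather than something stated on the slide.

```python
# 4x4 Walsh-Hadamard matrix (symmetric, so it equals its own transpose).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hadamard_luma_dc(x_d):
    """Y_D = round((H4 . X_D . H4) / 2) for a 4x4 array of DC coefficients."""
    t = matmul(matmul(H4, x_d), H4)
    # (v + 1) >> 1 divides by two, rounding exact halves upward (assumption).
    return [[(v + 1) >> 1 for v in row] for row in t]


flat = [[4] * 4 for _ in range(4)]   # all 16 blocks share the same DC value
y_d = hadamard_luma_dc(flat)
```

With identical DC values in all 16 blocks, all the energy collapses into the top-left output coefficient, which is the decorrelation the second-stage transform is meant to exploit.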

-45-

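VC Algorithm: Transform

Chroma DC coefficients, Intra prediction mode: (4x4) IntDCT, then Walsh-Hadamard transform of the 2 x 2 DC coefficients

$$Y_D = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

[Figure: 4:2:0 chroma block numbering: 2x2 DC blocks 16 (U) and 17 (V); AC blocks 18-21 (U) and 22-25 (V)]

For 4:2:2 and 4:4:4 chroma formats, the Hadamard block size is increased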

-46-

VC Algorithm: Transform

[Figure: encoder block diagram emphasizing the transform. Blocks: Transform & Quantization, Inverse Quantization & Inverse Transform, Deblocking Filtering, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision, Entropy Coding; Video Input, Bitstream Output]

- 4 x 4 integer DCT transform

$$H = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{pmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
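A small sketch of how the matrix H is typically applied: the core forward transform of a 4x4 residual block X is taken here as H X Hᵀ, with the scaling that makes it a true DCT approximation deferred to the quantization stage. The two-sided form and the helper names are assumptions for illustration; only H itself comes from the slide.

```python
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(x):
    """Integer-only core transform H . X . H^T of a 4x4 block."""
    return matmul(matmul(H, x), transpose(H))


flat_block = [[1] * 4 for _ in range(4)]
coeffs = core_transform(flat_block)
```

Because H contains only the values ±1 and ±2, the products can be realized with additions and shifts, and a constant block produces a single nonzero (DC) coefficient.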

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

Encoder: post-scaling and quantization

$$Y_{ij} = X_{ij}\,\mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

Decoder: inverse quantization and pre-scaling

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample.
SF: scaling term
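The statement that Qstep doubles for every increment of 6 in QP can be sketched as a six-entry base table plus a power-of-two factor. The six base values below are the ones commonly tabulated for QP 0 to 5 in the H.264 literature; they are not given on this slide, so treat them as an assumption of this sketch. The function name `qstep` is illustrative.

```python
# Commonly cited step sizes for QP = 0..5 (assumption: not from this slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for 8-bit video, QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    # Doubling every 6 steps: pick the base value, then scale by 2^(QP // 6).
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


steps = [qstep(qp) for qp in range(52)]
```

The ratio between consecutive step sizes averages 2^(1/6), roughly 12%, which is the logarithmic QP control mentioned among the FRExt tools.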

-48-

VC Algorithm: Transform, Quantization

[Figure: transform and quantization flow. Encoder part: input block, forward transform, post-scaling and quantization, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only). Encoder output / decoder input. Decoder part: 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse quantization and pre-scaling (rescale), inverse transform, output block. The DC transform path is used for chroma and for the Intra (16x16) prediction mode only]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC).
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate.
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles.
Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main profile.

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
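The two 4x4 tables give, for each block position, its index in the one-dimensional scan. A minimal sketch of applying them is shown below; the function name `scan_block` and the test block are illustrative.

```python
# Scan index of each (row, column) position, copied from the two tables.
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan_block(block, table):
    """Reorder a 4x4 block of coefficients into a 16-entry scan list."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[table[r][c]] = block[r][c]
    return out


# Label each position with its raster index to make the ordering visible.
raster = [[4 * r + c for c in range(4)] for r in range(4)]
zigzag_order = scan_block(raster, ZIGZAG)
alternate_order = scan_block(raster, ALTERNATE)
```

The alternate (field) scan walks down the first column before moving right, reflecting the weaker vertical correlation of field pictures.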

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC codes the transform coefficients. CABAC codes the transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles zeros and ±1 coefficients differently from the levels of the other coefficients: the total numbers of zeros and of ±1 values are coded, while for the other coefficients their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and the number of ±1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining nonzero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (hence the name CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 … 15
coeff: c0 c1 c2 0 1 1 0 -1 0 0 … 0

Step 1: encode the number of nonzero coefficients and the number of +1 or -1 values (trailing ones) from a look-up table
number of nonzero coefficients = 6 (orders 0, 1, 2, 4, 5, 7)
number of trailing ones = 3 (orders 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (orders 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
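The five steps of the example can be reproduced by a short symbol-extraction sketch. It only derives the values that would be coded (counts, signs, levels, runs); the actual VLC table look-ups are not modeled. The function name is hypothetical, and the concrete values 3, -2, 4 stand in for c0, c1, c2.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbol groups from a scanned coefficient list."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    values_rev = [c for _, c in reversed(nonzero)]   # highest order first

    # Step 1: count nonzero coefficients and up to three trailing +/-1 values.
    trailing_ones = 0
    for c in values_rev:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if c < 0 else "+" for c in values_rev[:trailing_ones]]
    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = values_rev[trailing_ones:]
    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nonzero[-1][0] + 1 - len(nonzero) if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": len(nonzero), "trailing_ones": trailing_ones,
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


# c0, c1, c2 replaced by arbitrary example levels 3, -2, 4.
example = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
symbols = cavlc_symbols(example)
```

For this input the sketch returns 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels in the order c2, c1, c0, 2 total zeros and runs 1, 0, 1, in agreement with the worked example.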

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs.

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture.

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode:
Two forward references: suitable for a region just before a scene change.
Two backward references: suitable for a region just after a scene change.

[Figure: previous pictures, current picture and next pictures, showing three prediction cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction.
The prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture and List 1 reference, with the direct-mode partition, the co-located partition, the motion vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$$mvL0 = \frac{tb \cdot mvCol}{td}, \qquad mvL1 = -\frac{(td - tb) \cdot mvCol}{td}$$

where mvCol is the MV used in the co-located MB of the subsequent picture
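The two scaling formulas can be exercised with a small sketch. It uses exact fractions instead of the fixed-point arithmetic a real codec would use, and it assumes tb is the temporal distance from the List 0 reference to the current picture and td the distance between the two references, as the figure labels suggest; the function name is hypothetical.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into the pair (mvL0, mvL1).

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances (assumed meaning: see the lead-in)
    """
    mv_l0 = tuple(Fraction(tb * c, td) for c in mv_col)
    mv_l1 = tuple(-Fraction((td - tb) * c, td) for c in mv_col)
    return mv_l0, mv_l1


# Current picture halfway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
```

In this example mvL0 is (4, -2) and mvL1 is (-4, 2); in general mvL1 equals mvL0 minus mvCol, so no motion vector needs to be transmitted for a direct-mode partition.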

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights for the reference signals handle gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

A different weighted prediction method is used for a macroblock of a P slice or a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors.
Implicit type: the factors are calculated based on the temporal distance between the pictures.
Explicit type: the factors are transmitted in the slice header.
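A minimal sketch of the weighted combination p = w1 r1 + w2 r2 is given below, with floating-point weights and clipping to the 8-bit sample range. Real codecs use integer weights with a shift and an optional offset, so the arithmetic here is a simplifying assumption, as is the function name.

```python
import math


def weighted_bipred(r1, r2, w1, w2):
    """Blend two reference sample rows with weights w1 and w2 (8-bit output)."""
    out = []
    for a, b in zip(r1, r2):
        value = math.floor(w1 * a + w2 * b + 0.5)   # round to nearest
        out.append(min(255, max(0, value)))         # clip to 0..255
    return out


# Equal weights give a plain average of the two references.
average = weighted_bipred([100, 200], [50, 0], 0.5, 0.5)
# A fade toward black: the darker reference dominates the prediction.
fade = weighted_bipred([200, 120], [0, 0], 0.25, 0.75)
```

With explicit weighting the encoder would choose w1 and w2 and send them in the slice header; with implicit weighting both sides would derive them from the picture distances.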

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice.
SI slice: a switched slice coded similarly to an I slice.

[Figure: Bitstream A with pictures P(1,1) ... P(1,5), Bitstream B with pictures P(2,1) ... P(2,5), and a switching slice S(3) between the two bitstreams]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slices (SP/SI)
• Data partitioning
• Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C.

2. Partition A has the slice header and the header data for each MB in the slice.

3. Partition B has the coded residual data for intra and SI slice MBs.

4. Partition C has the coded residual data for inter-coded MBs.

5. Each partition A, B and C is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture.
The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice.

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with reference to parameter set 3. Example contents of parameter set 3: Video format: NTSC; Motion resolution: ¼; Enc: CABAC; Frame width: 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than being concentrated in a single block of data.
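The two-slice-group scenario can be illustrated with a checkerboard allocation, one simple way of dispersing macroblocks (the specific pattern and the function names are assumptions of this sketch, not something the slide prescribes). The helper counts how many of a macroblock's four direct neighbors belong to the other slice group and would therefore survive the loss of its own group.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def neighbours_in_other_group(groups, x, y):
    """Count the 4-connected neighbours of MB (x, y) in the other group."""
    rows, cols = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != groups[y][x]:
            count += 1
    return count


# QCIF picture: 11 x 9 macroblocks.
groups = checkerboard_slice_groups(11, 9)
interior = neighbours_in_other_group(groups, 5, 4)
corner = neighbours_in_other_group(groups, 0, 0)
```

Every interior macroblock has all four direct neighbors in the other group, so a concealment routine can interpolate a lost macroblock from correctly received samples on every side.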

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

The H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

The H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), the H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, the H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20 | Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions for various applications:
- FRExt extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval:
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual", ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10 - MPEG-4 Part 10 published standard costs CHF 26000, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format] YCgCo
High resolution

-92-

♦ Inter temporal prediction - block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

• Matrix can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for Inter and Intra
• Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword (code_num = 0) has no leading zeros and no trailing INFO field

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
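The codeword construction and the three decoding steps can be sketched in a few lines of Python. This is only an illustration of the unsigned code_num mapping described on these slides; the function names are hypothetical, and codewords are handled as strings of '0'/'1' characters rather than packed bits.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    m = value.bit_length() - 1            # M = floor(log2(code_num + 1))
    info = value - (1 << m)               # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of `bits`; return (code_num, bits_consumed)."""
    m = 0
    while bits[m] == "0":                 # step 1: count M leading zeros, stop at the 1
        m += 1
    info_field = bits[m + 1 : 2 * m + 1]  # step 2: read the M-bit INFO field
    info = int(info_field, 2) if m else 0
    return (1 << m) + info - 1, 2 * m + 1  # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Because each codeword carries its own length in the run of leading zeros, a decoder can parse a concatenated stream by repeatedly calling the decode step on the remaining bits.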

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

(Figure: a 4:2:2 MB has a 16x16 Y block and 8x16 Cb and Cr blocks; a 4:4:4 MB has 16x16 Y, Cb and Cr blocks)

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
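As a numerical check of the RGB to YCbCr equations on this slide, the following minimal Python sketch evaluates them for normalized R, G, B values in [0, 1]. The function name and the default arguments are illustrative assumptions; the chroma scale factors are computed from 1 / (2(1 - K)) rather than hard-coded.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the luma weights K_R and K_B."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scales Cb into [-0.5, 0.5]
    cr = (r - y) / (2.0 * (1.0 - kr))   # scales Cr into [-0.5, 0.5]
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                        # white: no chroma
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                        # pure blue: Cb at its maximum
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))    # BT.601 weights
```

The division by 2(1 - K) is what normalizes each color difference to the range [-0.5, 0.5]; the results are real-valued, which is the source of the rounding error discussed on the following slide.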

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
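The lifting steps of this color transform translate directly into integer code. The sketch below (function names are hypothetical) uses Python's `>>`, which is an arithmetic shift for negative integers as well, and checks that the inverse recovers R, G and B exactly.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (200, 17, 255)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

For 8-bit inputs Y stays within 0..255 while Cg and Co span -255..255, which is why the chroma components need one extra bit of precision.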

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-45-

VC Algorithm Transform

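Chroma DC coefficients, Intra prediction mode: (4x4) IntDCT followed by a Walsh-Hadamard transform of the 2x2 DC coefficients

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} DC_{00} & DC_{01} \\ DC_{10} & DC_{11} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

(Figure, 4:2:0 format: blocks 16 and 17 hold the 2x2 DC coefficients of U and V; blocks 18-21 and 22-25 hold the AC coefficients of U and V)

For 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased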

-46-

VC Algorithm Transform

Block diagram emphasizing transform

(Figure: encoder block diagram with video input, transform and quantization, entropy coding, bitstream output, inverse quantization and inverse transform, deblocking filtering, picture buffering, motion estimation, motion compensation, intra prediction and intra/inter mode decision)

- 4 x 4 integer DCT transform

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
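To make the 4 x 4 integer transform concrete, the sketch below applies the matrix H from this slide to a block as Y = H X H^T using only integer arithmetic. The helper names are hypothetical, and the scaling that turns this core transform into an approximation of the DCT is deliberately left out, since the slides fold it into quantization.

```python
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_transform(block):
    """Y = H * X * H^T for a 4x4 block of integer residual samples."""
    return matmul(matmul(H, block), transpose(H))


if __name__ == "__main__":
    flat = [[5] * 4 for _ in range(4)]
    for row in forward_core_transform(flat):
        print(row)
```

A flat block ends up with all of its energy in the top-left (DC) coefficient. The rows of H are orthogonal but have different norms (H H^T has 4, 10, 4, 10 on its diagonal), which is why a position-dependent scaling term is needed afterwards.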

-47-

VC Algorithm Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
Encoder: post-scaling and quantization
Decoder: inverse quantization and pre-scaling

$$Y_{ij} = X_{ij}\,\mathrm{round}\!\left(\frac{SF_{ij}}{Qstep}\right)$$

$$X'_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}^{-1}$$

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
SF: scaling term
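The rule that Qstep doubles for every increment of 6 in QP means the 52 values can be generated from six base values. The sketch below assumes the commonly tabulated base step sizes for QP 0 to 5 with 8-bit video (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125); these numbers are not given on the slide, and the function name is hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (8-bit video); not stated on the slide.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for QP in 0..51: base value doubled once per 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 36, 51):
        print(qp, qstep(qp))
```

Each unit step in QP therefore raises the step size by roughly 12%, giving the logarithmic control of the quantizer mentioned elsewhere in these slides.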

-48-

VC Algorithm Transform Quantization

Rescale and inverse transform

(Figure. Encoder part: input block, forward transform, post-scaling and quantization, plus a 2x2 or 4x4 DC transform for chroma or Intra (16x16) prediction mode luma only. The encoder output is the decoder input. Decoder part: 2x2 or 4x4 DC inverse transform for chroma or Intra (16x16) luma only, inverse quantization and pre-scaling, inverse transform, output block)

-49-

VC Algorithm Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
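The two scan tables give, for each position of the 4x4 block, its index in the one-dimensional coefficient sequence. A minimal Python sketch of applying them follows; the names are illustrative, and the tables are copied from the slide.

```python
# Scan index of each (row, column) position, as shown in the two tables.
ZIGZAG_SCAN = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_SCAN = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, scan_table):
    """Reorder a 4x4 block of quantized coefficients into a list of 16 values."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[scan_table[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0], [2, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(scan_block(block, ZIGZAG_SCAN))
    print(scan_block(block, ALTERNATE_SCAN))
```

Both orders place the low-frequency coefficients first, so the nonzero values cluster at the start of the list and the trailing zeros can be coded cheaply; the alternate scan visits positions down the first column earlier than the zig-zag scan does.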

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO field

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm Entropy Coding

CAVLC handles the zero and +/-1 coefficients in a different manner from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded

Encoding steps
step 1: encode the total number of nonzero coefficients and of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the number of nonzero coefficients and the number of trailing ones (1 or -1) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
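The quantities coded in the five steps of this example can be derived mechanically from the scanned coefficient list. The Python sketch below only extracts those symbols; it does not perform the table look-ups that produce actual CAVLC bits. The function name is hypothetical, and c0, c1, c2 are given arbitrary illustrative values (7, -3, 2) since the slide leaves them symbolic.

```python
def cavlc_symbols(coeffs):
    """Extract the values coded in CAVLC steps 1-5 from scanned coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    positions = [i for i, _ in nonzero]

    # Steps 1-2: up to three trailing +/-1 values, visited in reverse scan order.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break

    # Step 3: remaining levels, also in reverse order.
    levels = [c for _, c in reversed(nonzero[: len(nonzero) - len(trailing)])]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = positions[-1] + 1 - len(nonzero) if nonzero else 0

    # Step 5: runs of zeros in reverse order, until every zero is accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": len(trailing),
        "trailing_signs": ["+" if c > 0 else "-" for c in trailing],
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    example = [7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 chosen arbitrarily
    print(cavlc_symbols(example))
```

For the example sequence this reproduces the counts on the slide: 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.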

-55-

VC Algorithm Entropy Coding

CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated
Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm B Slice

Generalized bidirectional prediction, multiple reference pictures mode
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

(Figure: previous pictures, current picture and next pictures, showing prediction with 2 forward MVs, with 2 backward MVs, and with 1 forward MV + 1 backward MV)

-59-

VC Algorithm B Slice

Direct mode
Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

(Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition in the List 1 reference; motion vectors mvCol, mvL0 and mvL1; temporal distances tb and td)

mvL0 = tb × mvCol / td
mvL1 = -(td - tb) × mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
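The two direct-mode equations amount to splitting the co-located motion vector in proportion to temporal distance. The sketch below evaluates them with floating-point arithmetic for clarity; an actual codec uses fixed-point integer scaling, and the interpretation of tb and td as the distances from the List 0 reference to the current picture and to the List 1 reference, respectively, is an assumption based on the usual form of this figure. The function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located vector mv_col = (x, y)."""
    if td == 0:
        raise ValueError("td must be non-zero")
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture midway between its two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
    # Current picture one quarter of the way along.
    print(temporal_direct_mvs((8, -4), tb=1, td=4))
```

Note that mvL0 - mvL1 always equals mvCol, so the two derived vectors lie on the same motion trajectory as the co-located vector and no motion data needs to be transmitted for the direct-mode partition.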

-60-

VC Algorithm B Slice

Weighted prediction
Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
Different weighted prediction methods for a macroblock of a P slice or a B slice
A prediction signal p for a B slice is obtained by different weights from two reference signals r1 and r2:

p = w1 × r1 + w2 × r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices
SP slice: a specially coded slice for efficient switching between video streams, similar to coding of a P slice
SI slice: a switched slice, similar to coding of an I slice

(Figure: bitstream A with pictures P(1,1) to P(1,5), bitstream B with pictures P(2,1) to P(2,5), and the switching slice S(3) connecting the two)

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning
Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport separately

-64-

Error Resilience

Parameter setting
The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

(Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged reliably; parameter set 3 lists video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11; VCL data is transferred with reference to PS 3)

-65-

Error Resilience

FMO
Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1 and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
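The dispersed allocation described here can be pictured as a checkerboard. The sketch below builds such a two-group map and counts, for a macroblock of the lost group, how many of its four direct neighbors survive. The checkerboard pattern and the function names are illustrative assumptions, not the standard's slice group map syntax.

```python
def dispersed_slice_groups(width_mbs, height_mbs):
    """Checkerboard map: slice group (0 or 1) of every macroblock, row by row."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbours(mb_map, x, y, lost_group):
    """Count the 4-connected neighbours of MB (x, y) that are not in the lost group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != lost_group:
            count += 1
    return count


if __name__ == "__main__":
    qcif = dispersed_slice_groups(11, 9)      # QCIF is 11 x 9 macroblocks
    for row in qcif:
        print("".join(str(g) for g in row))
    print(surviving_neighbours(qcif, 5, 4, lost_group=1))
```

With this pattern every macroblock of a lost slice group keeps all of its in-picture horizontal and vertical neighbors, which is what gives the concealment step enough surrounding data to interpolate from.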

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data

-66-

Error Resilience

Redundant Slice
Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency

Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman | >1x 2x 2x T | 2x >2x T T

Paris | >1x 2x 2x 2x | 2x T 2x T

Head | >2x 2x 2x T T

Zoom | >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency

Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football | 2x 1x 2x 2x | >1x >1x 1x >1x

Mobile | 2x 1x 2x 2x | >2x 4x >2x T

Husky | 2x 2x >1x 2x 2x 2x

Tempete | 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 | 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

6 10 20 | 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP - High Latency Profile
ASP - Advanced Simple Profile
H26L - H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC - Conversational High Compression
SP - Simple Profile
ASP - Advanced Simple Profile
H26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards (each row lists, in order: MPEG-1; MPEG-2; MPEG-4 Part 2 (Visual); H.264/MPEG-4 Part 10)

Macroblock size: 16x16; 16x16 (frame mode), 16x8 (field mode); 16x16; 16x16

Block size: 8x8; 8x8; 16x16, 16x8, 8x8; 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT; 8x8 DCT; 8x8 DCT/Wavelet; 4x4, 8x8 Int DCT, 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment; scalar quantization with step size of constant increment; vector quantization; scalar quantization with step size increase at the rate of 12.5%

Entropy coding: VLC; VLC; VLC; VLC, CAVLC, CABAC

Motion estimation & compensation: Yes; Yes; Yes; Yes, more flexible, up to 16 MVs per MB

Playback & random access: Yes; Yes; Yes; Yes

-74-

Conclusions

Comparison of standards (continued; each row lists, in order: MPEG-1; MPEG-2; MPEG-4 Part 2 (Visual); H.264/MPEG-4 Part 10)

Pel accuracy: integer, 1/2-pel; integer, 1/2-pel; integer, 1/2-pel, 1/4-pel; integer, 1/2-pel, 1/4-pel

Profiles: No; 5; 8; 4

Reference picture: one; one; one; multiple

Bidirectional prediction mode: forward/backward; forward/backward; forward/backward; forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D; I, P, B; I, P, B; I, P, B, SP, SI

Error robustness: synchronization & concealment; data partitioning, FEC for important packet transmission; synchronization, data partitioning, header extension, reversible VLCs; data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps; 2-15 Mbps; 64 kbps - 2 Mbps; 64 kbps - 240 Mbps

Compatibility with previous standards: n/a; Yes; Yes; No

Encoder complexity: Low; Medium; Medium; High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High-fidelity coding for the next-generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair: Gary Sullivan (garysull@microsoft.com); Co-chair: Ajay Luthra (aluthra@motorola.com); Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], Y Cg Co
High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation: Picture Adaptive Frame Field (PicAFF), where the choice of compression (frame or field) is selected at the frame level, and MacroBlock Adaptive Frame Field (MBAFF)
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

In MBAFF, the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks.

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning is based on the decreasing variances and aims to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

\[ M = \lfloor \log_2(\text{code\_num} + 1) \rfloor \]

\[ \text{INFO} = \text{code\_num} + 1 - 2^{M} \]
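The codeword construction above (M zeros, a 1, then an M-bit INFO field) can be sketched in a few lines. This is an illustrative encoder that returns the codeword as a string of '0'/'1' characters; the function name is hypothetical and a real bitstream writer would pack bits instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


codewords = [exp_golomb_encode(n) for n in range(7)]
```

The first seven codewords come out as 1, 010, 011, 00100, 00101, 00110, 00111, matching the pattern of one-bit INFO for codewords 1-2 and two-bit INFO for codewords 3-6.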

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
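The three Exp-Golomb decoding steps listed above can be sketched as follows. The bitstream is modeled as a string of '0'/'1' characters for readability; the function names are hypothetical.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the next unread bit).
    """
    m = 0
    while bits[pos] == "0":       # step 1: count M leading zeros ...
        m += 1
        pos += 1
    pos += 1                      # ... and consume the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: 2^M + INFO - 1


def exp_golomb_decode_all(bits):
    """Decode a concatenation of codewords into a list of code_num values."""
    pos, values = 0, []
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


values = exp_golomb_decode_all("1" + "010" + "00100" + "00111")
```

Because the prefix of zeros announces the length of the INFO field, the codewords can be concatenated without separators and still be parsed unambiguously.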

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

With K_R = 0.2126 and K_B = 0.0722 (K_R + K_B + K_G = 1):

\[ Y = 0.2126 R + 0.7152 G + 0.0722 B \]

\[ C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y) \]

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
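As a numerical check of the RGB to YCbCr equations on this slide, the sketch below evaluates them with K_R = 0.2126 and K_B = 0.0722 on normalized (0 to 1) floating-point samples. The function name is hypothetical, and no integer rounding or offset is modeled.

```python
KR, KB = 0.2126, 0.0722  # luma weights given on the slide


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized R, G, B to Y, Cb, Cr using the K_R / K_B form."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


cb_scale = 1 / (2 * (1 - KB))   # multiplier applied to (B - Y)
cr_scale = 1 / (2 * (1 - KR))   # multiplier applied to (R - Y)
white = rgb_to_ycbcr(1.0, 1.0, 1.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
```

The two scale factors evaluate to about 0.5389 and 0.6350; white maps to zero chroma, and pure blue reaches the Cb limit of 0.5.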

-120-

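Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right], \qquad C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right], \qquad C_o = \frac{R - B}{2} \]

To further avoid any rounding error, add only one bit of precision to the chroma samples.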

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
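The forward and inverse conversions on this slide use only additions, subtractions and shifts, so the round trip is exact on integers. The sketch below checks that on a grid of 8-bit RGB values; the function names are hypothetical, and Python's `>>` on negative integers is an arithmetic (flooring) shift, as the slide requires.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion from the slide (integer inputs)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


samples = [(r, g, b) for r in range(0, 256, 17)
           for g in range(0, 256, 17) for b in range(0, 256, 17)]
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(*s)) == s for s in samples)
```

Note that Cg and Co can be negative and span roughly twice the input range, which corresponds to the extra bit of chroma precision mentioned for YCgCo.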

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus). ATSC has selected AC-3 plus from Dolby. MPEG calls AAC plus HE-AAC (HE: High Efficiency). ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-46-

VC Algorithm: Transform

[Figure: block diagram emphasizing transform. Blocks: Transform & Quantization, Motion Estimation, Motion Compensation, Picture Buffering, Entropy Coding, Intra Prediction, Intra/Inter Mode Decision, Inverse Quantization & Inverse Transform, Deblocking Filtering; Video Input, Bitstream Output]

- 4 x 4 integer DCT transform

\[ H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \]

- Hadamard transform of DC coefficients for 16 x 16 Intra luma and 8 x 8 chroma blocks
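To make the role of the matrix H concrete, the sketch below applies it to a 4x4 block as W = H X H^T using integer arithmetic only. This is an illustrative core transform without the scaling that the standard folds into quantization; the helper names are hypothetical.

```python
H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose4(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute W = H * X * H^T for a 4x4 block of residual samples."""
    return matmul4(matmul4(H, block), transpose4(H))


flat = [[1] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)   # only the DC coefficient is non-zero
gram = matmul4(H, transpose4(H))        # diagonal: the rows of H are orthogonal
```

The rows of H are orthogonal but not of equal norm (squared norms 4, 10, 4, 10), which is why a per-coefficient scaling term is needed afterwards.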

-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization.

Encoder: post-scaling and quantization

\[ Y_{ij} = \mathrm{round}\left( X_{ij} \, \frac{SF_{ij}}{Qstep} \right) \]

Decoder: inverse quantization and pre-scaling

\[ X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij} \]

X: quantizer input
Y: quantizer output
Qstep: quantization step size, selected by the quantization parameter (QP); a total of 52 values for 8 bits per decoded sample; doubles in size for every increment of 6 in QP. FRExt expands QP beyond 52 by 6 for each additional bit of decoded sample.
SF: scaling term
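The statement that the step size doubles for every increment of 6 in QP can be sketched as a small lookup. The six base values used below (for QP 0 to 5) are an assumption taken from commonly cited H.264 tables and do not appear on the slide; the function name is hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (not given on the slide).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step size for an 8-bit QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    # Each block of six QP values repeats the base pattern at twice the scale.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


steps = [qstep(qp) for qp in range(52)]   # the 52 step sizes
```

Under this assumption the step sizes run from 0.625 at QP 0 to 224 at QP 51, so one QP step changes Qstep by roughly 12%.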

-48-

VC Algorithm: Transform, Quantization

[Figure: transform and quantization flow.
Encoder part: Input block; Forward transform; Post-scaling and quantization; 2x2 or 4x4 DC transform (chroma or Intra-16 luma only); Encoder output / decoder input.
Decoder part: Inverse quantization and pre-scaling; Inverse transform; 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only); Output block.
Rescale and inverse transform: Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC).
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate.
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles.
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main Profile.

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
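The two scan tables on this slide give, for each position of the 4x4 block, its index in the one-dimensional coefficient sequence. A minimal sketch of applying them follows; the function name is hypothetical, and the test block simply holds each position's raster index so the visiting order is easy to read off.

```python
ZIGZAG = [[0, 1, 5, 6],
          [2, 4, 7, 12],
          [3, 8, 11, 13],
          [9, 10, 14, 15]]

ALTERNATE = [[0, 2, 8, 12],
             [1, 5, 9, 13],
             [3, 6, 10, 14],
             [4, 7, 11, 15]]


def scan_block(block, order_map):
    """Reorder a 4x4 block into a 16-element list following order_map."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order_map[row][col]] = block[row][col]
    return out


raster = [[4 * row + col for col in range(4)] for row in range(4)]
zigzag_order = scan_block(raster, ZIGZAG)
alternate_order = scan_block(raster, ALTERNATE)
```

The alternate scan walks down the columns much earlier than the zig-zag scan, which suits field-coded blocks where vertical frequencies tend to carry more energy.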

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

\[ M = \lfloor \log_2(\text{code\_num} + 1) \rfloor \]

\[ \text{INFO} = \text{code\_num} + 1 - 2^{M} \]

-52-

Decoding

1. Read in M leading zeros followed by a 1.
2. Read in the M-bit INFO field.
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero.)

CAVLC: codes transform coefficients.
CABAC: codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.
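The Exp-Golomb construction and the three decoding steps can be sketched as below. Bit strings are represented as Python strings of "0" and "1" purely for readability; the function names are illustrative and a real codec would operate on a bit buffer.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of a bit string."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1               # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        word = exp_golomb_encode(n)
        print(n, word, exp_golomb_decode(word))
```

Small code numbers get short codewords, so the mapping from syntax-element values to code numbers is chosen so that frequent values receive small numbers.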

-53-

VC Algorithm: Entropy Coding

CAVLC handles zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
- step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
- step 2: encode the sign of each trailing one in reverse order
- step 3: encode the levels of the remaining nonzero coefficients in reverse order
- step 4: encode the total number of zeros before the last coefficient
- step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (hence the name CAVLC).

-54-

VC Algorithm: Entropy Coding, Example of CAVLC

order:  0   1   2   3   4   5   6   7   8   9   ...  16
coeff:  c0  c1  c2  0   1   1   0   -1  0   0   ...  0

Step 1: encode the number of nonzero coefficients and the number of +1 or -1 values (trailing ones) from a look-up table
- number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
- 2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
- 1 (order 6-5), 0 (order 4), 1 (order 3-2)
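The five steps of the example can be traced with a short script that extracts the symbols CAVLC would code, without performing the table look-ups themselves. This is a sketch under stated assumptions: the numeric values 3, -2 and 4 stand in for c0, c1 and c2 (any levels with magnitude above 1 would do), a 16-coefficient block is assumed, and `cavlc_symbols` is a hypothetical helper name.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (not the codewords) from scanned coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero)

    # Steps 1-2: up to three +/-1 values at the high-frequency end.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["-" if c < 0 else "+" for c in trailing]

    # Step 3: remaining levels, in reverse order.
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Step 4: zeros located before the last nonzero coefficient.
    last = nonzero[-1][0] if nonzero else -1
    total_zeros = sum(1 for c in coeffs[:last + 1] if c == 0)

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return total_coeff, len(trailing), signs, levels, total_zeros, runs


if __name__ == "__main__":
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

For the example block the script reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, the levels c2, c1, c0, a total of 2 zeros, and the runs 1, 0, 1, matching the slide.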

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- step 1: context modeling. Choose a suitable model.
- step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
- step 3: binary arithmetic coding, using the probability estimates provided by context modeling.
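The interplay of binarization and adaptive probability estimation can be illustrated with a deliberately simplified model. This is not the CABAC engine of the standard: it uses unary binarization, a single context with a count-based probability estimate instead of the standard's finite-state tables, and it reports the ideal arithmetic-code length rather than producing a bitstream. All names are hypothetical.

```python
import math


def unary_binarize(value):
    """Map a non-negative integer to bins: `value` ones followed by a zero."""
    return [1] * value + [0]


class AdaptiveBinModel:
    """Count-based probability estimate for one context (simplified)."""

    def __init__(self):
        self.counts = [1, 1]          # start from a uniform estimate

    def probability(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        self.counts[bin_value] += 1   # adapt to the bins seen so far


def ideal_code_length(symbols):
    """Bits an ideal arithmetic coder would spend on the symbols."""
    model = AdaptiveBinModel()
    bits = 0.0
    for symbol in symbols:
        for b in unary_binarize(symbol):
            bits += -math.log2(model.probability(b))
            model.update(b)
    return bits


if __name__ == "__main__":
    skewed = [0] * 20                 # highly predictable source
    print(round(ideal_code_length(skewed), 2), "bits for", len(skewed), "bins")
```

As the model adapts to a skewed source, each bin costs well under one bit, which is the effect that lets arithmetic coding outperform codes restricted to whole-bit codewords.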

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only forward/backward prediction pairs but also forward/forward and backward/backward pairs.

Direct mode
- Derives the reference picture, block size and motion vector data from the subsequent inter picture.

Weighted prediction
- A scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized Bidirectional Prediction

Multiple reference pictures mode
- Two forward references: suited to a region just before a scene change.
- Two backward references: suited to a region just after a scene change.

[Figure: the current picture between previous pictures and next pictures, showing three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV.]

-59-

VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction
- The prediction signal is calculated as a linear combination of two blocks determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 Reference, Current Picture and List 1 Reference along the time axis, with temporal distances tb and td; the direct-mode partition in the current picture carries mvL0 and mvL1, and the co-located partition carries mvCol.]

$$mvL0 = \frac{tb}{td}\,mvCol, \qquad mvL1 = -\frac{td - tb}{td}\,mvCol$$

where mvCol is the MV used in the co-located MB of the subsequent picture.
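The scaling of mvCol into the two direct-mode vectors can be written out directly. The sketch below uses floating-point division for clarity, whereas a real decoder works with integer arithmetic and fixed-point scale factors; it also assumes tb and td are given as positive temporal distances. The function name is illustrative.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into mvL0 and mvL1.

    mv_col is an (x, y) pair; tb and td are temporal distances (assumed > 0).
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between its two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

Note that mvL1 equals mvL0 minus mvCol, so the two vectors together span the full co-located motion.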

-60-

VC Algorithm: B Slice, Weighted Prediction

- Different weights are applied to the reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.
- The weighted prediction method differs for a macroblock of a P slice and of a B slice.
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors.
- Implicit type: the factors are calculated based on the temporal distance between the pictures.
- Explicit type: the factors are transmitted in the slice header.
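A small numerical sketch of p = w1 r1 + w2 r2 follows. The implicit-weight rule shown (the nearer reference gets the larger weight, with the weights summing to one) is an assumption made for illustration, the distances `tb` and `td` are hypothetical parameters, and the standard's integer weights, offsets and clipping are omitted.

```python
def implicit_weights(tb, td):
    """Weights from temporal distance: tb is the distance from the first
    reference to the current picture, td the distance between references."""
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference sample lists into the prediction p."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


if __name__ == "__main__":
    r1 = [100, 200]          # samples from the first reference (brighter)
    r2 = [50, 100]           # samples from the second reference (fading)
    w1, w2 = implicit_weights(tb=1, td=2)
    print(weighted_prediction(r1, r2, w1, w2))
```

During a fade, a plain average of the two references would mispredict the brightness; weighting by temporal position tracks the gradual luminance change.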

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice.
- SI slice: a switched slice coded similarly to an I slice.

[Figure: Bitstream A carries pictures P(1,1) to P(1,5) and Bitstream B carries P(2,1) to P(2,5); the switching slice S(3) sits between the two bitstreams.]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C.
2. A has the slice header and header data for each MB in the slice.
3. B has coded residual data for intra and SI slice MBs.
4. C has coded residual data for inter-coded MBs.
5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter Setting

- The sequence parameter set contains all information related to a sequence of pictures.
- A picture parameter set contains all information related to all the slices belonging to a single picture.
- The encoder chooses the appropriate picture parameter set by referencing its storage location in the slice header of each coded slice.

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, exchanged through a reliable parameter set exchange. Example, Parameter Set 3: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11. VCL data is transferred with a reference to PS 3.]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.
- Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
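The dispersed two-group allocation and the concealment argument can be made concrete with a toy model. Everything here is an illustrative assumption: a checkerboard is used as the dispersed map, each macroblock is reduced to a single number (for example its mean luma), and concealment is a plain average of the surviving 4-neighbors rather than any particular decoder's method.

```python
def checkerboard_map(rows, cols):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(r + c) % 2 for c in range(cols)] for r in range(rows)]


def conceal(values, group_map, lost_group):
    """Replace every macroblock of the lost group by the average of its
    horizontal and vertical neighbors that belong to the other group."""
    rows, cols = len(values), len(values[0])
    out = [row[:] for row in values]
    for r in range(rows):
        for c in range(cols):
            if group_map[r][c] != lost_group:
                continue
            neighbors = [
                values[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
                and group_map[rr][cc] != lost_group
            ]
            out[r][c] = sum(neighbors) / len(neighbors)
    return out


if __name__ == "__main__":
    groups = checkerboard_map(2, 3)
    # 99 marks macroblocks whose packet (slice group 1) was lost.
    received = [[10, 99, 30], [99, 20, 99]]
    print(conceal(received, groups, lost_group=1))
```

With raster-order slices a lost packet removes a contiguous strip with no intact neighbors inside it; the checkerboard guarantees that every lost macroblock is surrounded by received ones.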

-66-

Error Resilience: Redundant Slice

- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).
- A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence    Bitrate [kbps] for QCIF: 24 48 96 192    Bitrate [kbps] for CIF: 96 192 384 768

Foreman     > 1x 2x 2x T 2x > 2x T T

Paris       > 1x 2x 2x 2x 2x T 2x T

Head        > 2x 2x 2x T T

Zoom        > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence    Bitrate [kbps] for QCIF: 24 48 96 192    Bitrate [kbps] for CIF: 96 192 384 768

Football    2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile      2x 1x 2x 2x > 2x 4x > 2x T

Husky       2x 2x > 1x 2x 2x 2x

Tempete     2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence    Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6    Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football    > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile      4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky       > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete     T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence    Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20    Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew                     1.7x 2x T 1.7x 2x T

Harbour                  T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan            1x 2x

New Mobile & Calendar    T 2x T T 2x T

1080 (25p)

River Bed                > 1.7x > 1x T > 1.7x > 1x T

Vintage Car              1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next-generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bitstreams and specifications

-79-

Contacts for Further Information

- JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring
- JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts
- JVT reflector e-mail: jvt-experts@mail.imtc.org
- JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)
- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

- 4:2:2 and 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass: the lossless mode uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], YCgCo
- High resolution

Slices in a picture are compressed as follows:

- Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

- Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

- Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
  o Field scan
- Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

- 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

- Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

- Scalar quantization

- Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

- Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

- Deblocking filter (within the motion compensation loop)

- Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

- Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

- Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

- SP and SI synchronization pictures for streaming and other uses

-97-

- Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

- 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

- Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

- (16x16) luma and corresponding chroma block size for full MB prediction

- (8x8) luma prediction (FRExt-only)

- (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

- Vertical: pels in the MB are predicted from the pels just above the MB

- Horizontal: pels in the MB are predicted from the pels just left of the MB

- DC: pels in the MB are predicted as the average value of the neighboring pels

- Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
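Three of the four full-MB modes are simple enough to sketch directly. The functions below take the row of reconstructed pels above the block (`top`) and the column to its left (`left`); the block size `n` is a parameter so that the demo can use a 4x4 block instead of 16x16. Planar prediction is left out, the rounding used for DC is an assumption of this sketch, and all names are illustrative.

```python
def predict_vertical(top, n):
    """Every row repeats the pels just above the block."""
    return [list(top) for _ in range(n)]


def predict_horizontal(left, n):
    """Every column repeats the pels just left of the block."""
    return [[left[r]] * n for r in range(n)]


def predict_dc(top, left, n):
    """Every pel is the rounded average of the 2n neighboring pels."""
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


if __name__ == "__main__":
    top, left = [1, 2, 3, 4], [5, 6, 7, 8]
    for name, pred in (
        ("vertical", predict_vertical(top, 4)),
        ("horizontal", predict_horizontal(left, 4)),
        ("dc", predict_dc(top, left, 4)),
    ):
        print(name, pred)
```

An encoder would typically form all candidate predictions, compare each with the actual block, and signal the mode with the smallest residual cost.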

-103-

Chroma spatial prediction (operates on the entire MB)

- 4:2:0 (8x8): similar to (16x16) luma MB prediction
- 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
- 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing variances so as to maximize the number of zero-valued coefficients along the scan.

[Figure: the two scan patterns, Frame (zig-zag) and Field.]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}$$

-112-

Decoding

1. Read in M leading zeros followed by a 1.
2. Read the M-bit INFO field.
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients.
CABAC: codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.

-113-

ADOPTIONS

- DVD Forum: High Profile is mandatory for HD DVD players.

- The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory.

- The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

- High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

- High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

- High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

- High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$.)
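The conversion follows directly from the two constants KR and KB. The sketch below works on normalized floating-point samples in the range 0 to 1 and leaves out the offsets, scaling to integer code values and clipping that a real system applies; the function name and default arguments are illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr for the given luma constants."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                      # white: chroma is zero
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                      # blue: Cb reaches 0.5
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # BT.601 constants
```

The divisors 2(1 - KB) and 2(1 - KR) normalize both chroma components to the range -0.5 to 0.5, whichever set of constants is used.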

-120-

Rounding error in RGB to YCbCr conversion

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
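A short sketch of the forward and inverse conversions on this slide shows why the residual color transform loses nothing: every step of the forward path is undone exactly by the inverse path. The function names are illustrative only; Python's `>>` on integers is an arithmetic (flooring) shift, which matches the operation the slide describes.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using integer adds and arithmetic right shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 17)
    coded = rgb_to_ycgco(*sample)
    print(sample, "->", coded, "->", ycgco_to_rgb(*coded))
```

Note that Cg and Co can be negative and need one more bit than the input samples, consistent with the slide's remark about adding one bit of precision to the chroma samples.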

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/s

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If at a given level the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

- The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

- The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
- Only control of the filter is adjusted: do not filter 4x4 blocks
- No change to the filter operation itself

CABAC
- 61 new contexts and corresponding initialization values
- No change to the CABAC engine

Signaling
- 8x8 transform on/off flag at PPS level
- 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: high efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-47-

VC Algorithm: Quantization

The multiplication operation for the exact transform is combined with the multiplication of scalar quantization
- Encoder: post-scaling and quantization
- Decoder: inverse quantization and pre-scaling

$$Y_{ij} = \mathrm{round}\!\left(X_{ij}\,\frac{SF_{ij}}{Qstep}\right)$$

$$X_{ij} = Y_{ij} \cdot Qstep \cdot SF_{ij}$$

- X: quantizer input
- Y: quantizer output
- Qstep: quantization step size, selected by the quantization parameter QP; a total of 52 values (for 8 bits per decoded sample), doubling in size for every increment of 6 in QP. FRExt expands the QP range beyond 52 by 6 for each additional bit of decoded sample
- SF: scaling term
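The statement that the step size doubles for every increment of 6 in QP can be made concrete with a small sketch. The six base step sizes below are the commonly tabulated values for QP 0 to 5 (for example in Richardson's book, reference [8]); they do not appear on the slide, and the function name `qstep` is illustrative.

```python
# Step sizes for QP = 0..5; assumed from the commonly tabulated values,
# not taken from the slide itself.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step size for QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # Every increment of 6 in QP doubles the step size.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 4, 10, 16, 51):
        print(qp, qstep(qp))
```

Six steps per doubling corresponds to a growth of roughly 12% per QP increment, which gives the encoder a wide range of step sizes with only 52 values.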

-48-

VC Algorithm: Transform and Quantization

[Figure: transform and quantization block diagram. Encoder part: input block, forward transform, post-scaling and quantization, with a 2x2 or 4x4 DC transform (chroma or Intra-16 luma only). Decoder part: inverse quantization and pre-scaling, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), inverse transform, output block. The encoder output is the decoder input.]

-49-

VC Algorithm: Entropy Coding

- All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
- Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
- Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
- Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

 0  1  5  6
 2  4  7 12
 3  8 11 13
 9 10 14 15

(b) Alternate scan

 0  2  8 12
 1  5  9 13
 3  6 10 14
 4  7 11 15
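The two scan tables on this slide give, for each position of a 4x4 block, its place in the one-dimensional coefficient sequence. A minimal sketch of applying them follows; the names `ZIGZAG`, `ALTERNATE` and `scan_block` are illustrative.

```python
# Scan position of each (row, column) entry, copied from the slide.
ZIGZAG = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan_block(block, order):
    """Reorder a 4x4 block (list of rows) into a 16-entry list."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    # Label each entry with its raster index so the scan path is visible.
    block = [[4 * row + col for col in range(4)] for row in range(4)]
    print(scan_block(block, ZIGZAG))
    print(scan_block(block, ALTERNATE))
```

The alternate scan walks down the columns sooner than the zig-zag scan does, so it reaches coefficients in lower rows earlier in the sequence.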

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^{M}$$

-52-

Decoding

1. Read in M leading zeroes followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
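The construction and decoding rules for the Exp-Golomb codes can be sketched as follows. Bit strings are represented as Python strings of '0' and '1' purely for readability; the function names are illustrative and a real codec would operate on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeroes][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword found at the start of a '0'/'1' string."""
    m = 0
    while bits[m] == "0":                    # count the M leading zeroes
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1               # code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        word = exp_golomb_encode(n)
        print(n, word, exp_golomb_decode(word))
```

Because the number of leading zeroes announces the length of the INFO field, a decoder needs no table to find where a codeword ends.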

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded

Encoding steps
- Step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
- Step 2: encode the sign of each trailing one in reverse order
- Step 3: encode the levels of the remaining nonzero coefficients in reverse order
- Step 4: encode the total number of zeros before the last coefficient
- Step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding, example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the number of nonzero coefficients and the number of +1 or -1 values (trailing ones) from a look-up table
- number of nonzero coefficients = 6 (orders 0, 1, 2, 4, 5, 7)
- number of trailing ones = 3 (orders 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (orders 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
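The five steps of this example can be reproduced with a short sketch that extracts the symbols CAVLC would code, without the actual look-up tables. The function name `cavlc_symbols` is illustrative, the block is assumed to hold 16 coefficients in scan order, and the values 3, -2 and 4 used for c0, c1 and c2 are hypothetical stand-ins for levels with magnitude greater than 1.

```python
def cavlc_symbols(coeffs):
    """Collect the symbols coded in steps 1 to 5 for one scanned block."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # orders of nonzero coefficients
    total_coeffs = len(nz)

    # Steps 1 and 2: up to three trailing +/-1 values, read in reverse order.
    trailing_signs = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing_signs) < 3:
            trailing_signs.append("+" if coeffs[i] > 0 else "-")
        else:
            break
    trailing_ones = len(trailing_signs)

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [coeffs[i] for i in reversed(nz[:total_coeffs - trailing_ones])]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nz[-1] + 1 - total_coeffs if nz else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": total_coeffs,
        "trailing_ones": trailing_ones,
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    # c0, c1, c2 replaced by hypothetical levels 3, -2, 4.
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

Reading the block from the high-frequency end is what makes the scheme effective: after quantization the last nonzero coefficients are very often +1 or -1, so they can be signalled with a sign bit each.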

-55-

VC Algorithm: Entropy Coding

CABAC utilizes arithmetic coding; in order to achieve good compression, the probability model for each symbol element is updated. Both MV and residual transform coefficients are coded by CABAC

Encoding steps
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
- Supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
- Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
- Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing prediction with 2 forward MVs, with 2 backward MVs, and with 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, direct mode

Forward/backward pair of bi-directional prediction
- The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure labels: List 0 Reference, Current Picture, List 1 Reference, direct-mode partition, co-located partition, mvCol, mvL0, mvL1, tb, td]

$$mvL0 = \frac{tb}{td}\,mvCol, \qquad mvL1 = -\frac{td - tb}{td}\,mvCol$$

where mvCol is a MV used in the co-located MB of the subsequent picture
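The scaling of mvCol into the two direct-mode vectors can be sketched as below. The slide does not define tb and td; this sketch assumes tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference. It uses plain floating-point division rather than the integer arithmetic of a real codec, and the function name is illustrative.

```python
def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into mvL0 and mvL1.

    mv_col is an (x, y) pair; tb and td are temporal distances
    (their exact definition is an assumption, see the text above).
    """
    mvx, mvy = mv_col
    mv_l0 = (tb * mvx / td, tb * mvy / td)
    mv_l1 = (-(td - tb) * mvx / td, -(td - tb) * mvy / td)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references.
    print(direct_mode_mvs((8, -4), tb=1, td=2))
```

The two results always differ by exactly mvCol, so the direct-mode block lies on the motion trajectory of the co-located block and no motion vector needs to be transmitted.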

-60-

VC Algorithm: B Slice, weighted prediction

- Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
- Different weighted prediction methods for a macroblock of a P slice or B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$$p = w_1 r_1 + w_2 r_2$$

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
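A minimal sketch of the weighted sum p = w1 r1 + w2 r2 for explicitly supplied weights follows. The rounding, the clipping to an 8-bit sample range and the function name are assumptions made for illustration; the standard defines its own integer formulation.

```python
def weighted_prediction(r1, r2, w1, w2, max_val=255):
    """Combine two reference sample lists with weights w1 and w2."""
    out = []
    for a, b in zip(r1, r2):
        p = w1 * a + w2 * b
        # Round to the nearest integer and clip to the valid sample range.
        out.append(min(max_val, max(0, int(p + 0.5))))
    return out


if __name__ == "__main__":
    # Equal weights give the ordinary bi-predictive average.
    print(weighted_prediction([100, 200], [50, 100], 0.5, 0.5))
    # Unequal weights can follow a fade between the two references.
    print(weighted_prediction([100, 200], [50, 100], 0.25, 0.75))
```

With equal weights this reduces to conventional bidirectional averaging; a fade is the case where unequal weights track the changing brightness better than an average can.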

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice, coded similarly to an I slice

[Figure: bitstream A with pictures P(1,1) to P(1,5) and bitstream B with pictures P(2,1) to P(2,5), linked by the switching picture S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slices (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

- The sequence parameter set contains all information related to a sequence of pictures
- A picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder, each holding parameter sets 1, 2 and 3, with a reliable parameter set exchange between them and a VCL data transfer with PS 3. Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, Frame width 11]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and the macroblocks in each slice group are dispersed through the picture
- If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO
- Randomizes data prior to transmission
- Errors are distributed more randomly over the video frames rather than in a single block of data
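The two-slice-group argument can be illustrated with a checkerboard assignment, one simple way to disperse macroblocks through the picture. The checkerboard pattern, the function names and the 4-neighbor test are assumptions for illustration; the slides do not prescribe a particular map.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(groups, x, y):
    """Count the left/right/up/down neighbors that belong to the other group."""
    height, width = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != groups[y][x]:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    # An interior macroblock keeps all four neighbors if its own group is lost.
    print(neighbors_in_other_group(groups, 1, 1))
```

With this map, losing the packet for one slice group leaves every missing macroblock surrounded by correctly received neighbors, which is the condition the concealment argument above relies on.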

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate[kbps] for QCIF Bitrate[kbps] for CIF

24 48 96 192 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate[kbps] for QCIF Bitrate[kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x gt 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence Bitrate[Mbps] for MPEG-2 HiQ Bitrate[Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan

1x 2x

New Mobile & Calendar

T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications: H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information JVT documents and software on open ftp website ftpstandardspolyc

omcomhttpiphomehhidesuehring

JVT reflector subscription httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail jvt-expertsmailimtcorg

JVT management team Chair Gary Sullivan (garysullmicrosoftcom) Co-chair Ajay Luthra (aluthramotorolacom) Co-chair Thomas Wiegand (wiegandhhide)

Dr K R Rao UTA raoutaedu Dr S K Kwon Dongeui University skkwondongeuiackr Ms A Tamhankar T-Mobile arundhatiieeeorg Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

- Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

- 4:2:2 and 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], YCgCo
- High resolution

-92-

- Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

- Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected for each vertical pair of MBs
  o Field scan

- Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
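
The logarithmic relation between the quantization parameter (QP) and the step size can be illustrated with a short sketch. It assumes the commonly cited arrangement for 8-bit video in which QP runs from 0 to 51 and the step size doubles for every increase of 6 in QP. The six base values in `BASE_QSTEP` and the function name `qstep` are illustrative and are not taken from the slides.

```python
# Assumed base step sizes for QP 0..5; larger QPs follow by doubling every 6 steps.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # qp % 6 selects the base value, qp // 6 counts the doublings.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Under these assumptions each single QP step changes the step size by roughly 12%, which is what makes the control logarithmic rather than linear.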

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

A slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

Full-MB (16x16) luma prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
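
The first three modes can be sketched in a few lines. This is a simplified illustration, not the normative process: it assumes both the row above (`top`) and the column to the left (`left`) are available, rounds the DC average to the nearest integer, and leaves out the planar mode. The function names and the block-size parameter `n` are hypothetical.

```python
def predict_vertical(top, n):
    """Every row of the n x n block copies the row of pels just above it."""
    return [list(top) for _ in range(n)]


def predict_horizontal(left, n):
    """Every column of the n x n block copies the column of pels just to its left."""
    return [[left[row]] * n for row in range(n)]


def predict_dc(top, left, n):
    """Every pel is the rounded average of the 2n neighboring pels."""
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


if __name__ == "__main__":
    top, left = [10, 20, 30, 40], [1, 2, 3, 4]
    print(predict_vertical(top, 4)[3])     # last row repeats the top neighbors
    print(predict_horizontal(left, 4)[2])  # third row repeats left[2]
    print(predict_dc(top, left, 4)[0])
```

An encoder would typically form all candidate predictions, compare each against the source block, and signal the mode with the lowest cost.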

-103-

Chroma spatial prediction (operates on the entire MB), similar to (16x16) luma MB prediction: vertical, horizontal, DC, planar

• 4:2:0: (8x8)

• 4:2:2: (8x16)

• 4:4:4: (16x16)

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS.
Separate matrices for the 4x4 and 8x8 transforms.
Separate matrices for inter and intra.
The encoder can design and use customized scaling matrices. These are sent to the decoder at the sequence or picture level.

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

FRExt only: two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan.

Scans shown: frame (zig-zag) and field

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients – these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2 [code_num + 1])

INFO = code_num + 1 – 2^M

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO – 1
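
The construction and the three decoding steps translate almost directly into code. The sketch below represents codewords as strings of '0' and '1' for readability; a real codec would work on a packed bit buffer. The function names are illustrative only.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count leading zeros
        m += 1
    start = pos + m + 1                        # skip the terminating 1
    info = int(bits[start:start + m], 2) if m > 0 else 0   # step 2: read INFO
    return (1 << m) + info - 1, start + m      # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Because every codeword announces its own length through the leading zeros, codewords can be concatenated and parsed back one after another without separators.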

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players.

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt High profile is mandatory.

♦ DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: a 4:2:2 MB consists of a 16x16 Y block and 8x16 Cb and Cr blocks; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks]

-119-

RGB → YCbCr

\[ Y = K_R R + (1 - K_R - K_B)\,G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

\[ K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1 \]

\[ Y = 0.2126\,R + 0.7152\,G + 0.0722\,B \]

\[ C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y) \]

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[ Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2} \]

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R – B

t = B + (Co >> 1)

Cg = G – t

Y = t + (Cg >> 1)

Inverse color space conversion:

t = Y – (Cg >> 1)

G = t + Cg

B = t – (Co >> 1)

R = B + Co

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation
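
The forward and inverse conversions on this slide are exactly invertible in integer arithmetic, which is easy to confirm with a small sketch. The function names are illustrative; the code relies on Python's `>>` being an arithmetic (sign-preserving) shift for negative integers, which matches the operation the slide describes.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using only additions, subtractions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    converted = rgb_to_ycgco(*sample)
    print(converted, ycgco_to_rgb(*converted))
```

Each inverse step recomputes the same shifted value that the forward step used, so whatever the shift discarded is restored exactly and no rounding error accumulates.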

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be increased, up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent video quality (i.e., difficult to distinguish from the original video without compression) at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)–(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-48-

VC Algorithm: Transform & Quantization

[Figure: block diagram of the transform and quantization path. Encoder part: input block, forward transform, post-scaling and quantization, 2x2 or 4x4 DC transform (chroma or Intra-16 luma only). Decoder part: inverse quantization and pre-scaling, inverse transform, 2x2 or 4x4 DC inverse transform (chroma or Intra-16 luma only), output block. The encoder output is the decoder input. Note on rescale and inverse transform: Intra (16x16) prediction mode only.]

-49-

VC Algorithm: Entropy Coding

All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
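
The two scan tables give, for each position of the 4x4 block, its index in the one-dimensional coefficient sequence. A minimal sketch of applying them follows; the table values are copied from the slide, while the names `ZIGZAG_SCAN`, `ALTERNATE_SCAN` and `scan_block` are illustrative.

```python
# Scan index of each block position (row-major), as given on the slide.
ZIGZAG_SCAN = [[0, 1, 5, 6], [2, 4, 7, 12], [3, 8, 11, 13], [9, 10, 14, 15]]
ALTERNATE_SCAN = [[0, 2, 8, 12], [1, 5, 9, 13], [3, 6, 10, 14], [4, 7, 11, 15]]


def scan_block(block, order):
    """Reorder a 4x4 block of quantized coefficients into a 16-entry list."""
    out = [0] * 16
    for row in range(4):
        for col in range(4):
            out[order[row][col]] = block[row][col]
    return out


if __name__ == "__main__":
    block = [[9, 3, 0, 0], [2, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(scan_block(block, ZIGZAG_SCAN))
    print(scan_block(block, ALTERNATE_SCAN))
```

The alternate scan visits the first column earlier than the zig-zag scan does, which suits field-coded blocks where vertical detail tends to carry more energy.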

-50-

Exponential Golomb codes (for data elements other than transform coefficients – these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2 [code_num + 1])

INFO = code_num + 1 – 2^M

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO – 1

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and of ±1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and the number of ±1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining nonzero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 –1 0 0 … 0
order: 0 1 2 3 4 5 6 7 8 9 … 16

Step 1: encode the number of nonzero coefficients and the number of trailing ones (1 or –1) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
– (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
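
The five steps of this example can be reproduced with a sketch that derives the symbol values only; it does not perform the table look-ups that turn them into bits. It assumes a 16-entry coefficient list in scan order, at most three trailing ones, and that run coding stops once all zeros are accounted for. The function name and the concrete values chosen for c0, c1 and c2 are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbol values (not the bits) for one scanned block."""
    nonzero = [(pos, c) for pos, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Steps 1-2: up to three +/-1 values at the high-frequency end, in reverse order.
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break
    signs = ["+" if c > 0 else "-" for c in trailing]

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    remaining = nonzero[: total_coeffs - len(trailing)]
    levels = [c for _, c in reversed(remaining)]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nonzero[-1][0] + 1 - total_coeffs if nonzero else 0

    # Step 5: zero runs before each coefficient, in reverse order, until none are left.
    runs, zeros_left = [], total_zeros
    positions = [pos for pos, _ in nonzero]
    for k in range(total_coeffs - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": total_coeffs, "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    # c0 = 7, c1 = -3, c2 = 2 are stand-in values for the slide's c0, c1, c2.
    print(cavlc_symbols([7, -3, 2, 0, 1, 1, 0, -1] + [0] * 8))
```

With these stand-in values the result matches the slide: 6 nonzero coefficients, 3 trailing ones with signs –, +, +, levels c2, c1, c0, 2 zeros in total, and runs 1, 0, 1.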

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding. In order to achieve good compression, the probability model for each symbol element is also updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.

step 3: binary arithmetic coding, using the probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction, multiple reference pictures mode:
Two forward references: proper for a region just before a scene change
Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing three cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode: forward/backward pair of bi-directional prediction. The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: temporal direct mode. Labels: List 0 reference, current picture, List 1 reference, direct-mode partition, co-located partition, mvCol, mvL0, mvL1, tb, td]

mvL0 = tb × mvCol / td
mvL1 = –(td – tb) × mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
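
The two scaling formulas can be tried out numerically. The sketch below takes tb as the temporal distance from the current picture to the List 0 reference and td as the distance between the two references; that reading of the figure is an assumption. It also uses floating-point division, whereas a real codec uses scaled integer arithmetic. The function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into a forward and a backward vector."""
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture midway between the two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

Note that mvL0 – mvL1 always equals mvCol, so the two derived vectors lie along the same motion trajectory as the co-located vector and no motion vector needs to be transmitted for the partition.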

-60-

VC Algorithm: B Slice

Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 × r1 + w2 × r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
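
A small sketch of p = w1 × r1 + w2 × r2 follows. The implicit weights here are simply taken inversely proportional to temporal distance, which is a simplification of the standard's integer derivation; rounding and clipping to the 8-bit range are likewise assumptions, and all names are illustrative.

```python
def implicit_weights(dist1: int, dist2: int):
    """Give the temporally closer reference the larger weight (simplified)."""
    total = dist1 + dist2
    return dist2 / total, dist1 / total


def weighted_prediction(r1, r2, w1, w2, max_value=255):
    """Combine two reference sample lists as p = w1*r1 + w2*r2, rounded and clipped."""
    out = []
    for a, b in zip(r1, r2):
        p = int(w1 * a + w2 * b + 0.5)
        out.append(min(max(p, 0), max_value))
    return out


if __name__ == "__main__":
    w1, w2 = implicit_weights(1, 3)   # r1 is one picture away, r2 is three away
    print(w1, w2, weighted_prediction([100, 200], [0, 40], w1, w2))
```

In the explicit type the same combination is used, but w1 and w2 come from the slice header instead of being derived, which is what lets an encoder track a fade whose brightness change is not linear in time.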

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
SI slice: a switched slice, similar to the coding of an I slice

[Figure: bitstream A and bitstream B, with pictures P(1,1) … P(1,5) and P(2,1) … P(2,5), and a switching picture S(3) between the two streams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures
A picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example contents of parameter set 3: video format NTSC, motion resolution ¼, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
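
The two-slice-group scenario described for FMO can be sketched with a checkerboard allocation, which is one way of dispersing the groups and is chosen here purely for illustration. The function names and the picture size in macroblocks are hypothetical.

```python
def checkerboard_map(mb_cols: int, mb_rows: int):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def surviving_neighbors(slice_map, x: int, y: int, lost_group: int):
    """List the 4-connected neighbors of MB (x, y) that are not in the lost group."""
    rows, cols = len(slice_map), len(slice_map[0])
    found = []
    for dx, dy in ((0, -1), (-1, 0), (1, 0), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and slice_map[ny][nx] != lost_group:
            found.append((nx, ny))
    return found


if __name__ == "__main__":
    slice_map = checkerboard_map(4, 3)
    for row in slice_map:
        print(row)
    # MB (1, 0) belongs to group 1; list what is left around it if group 1 is lost.
    print(surviving_neighbors(slice_map, 1, 0, lost_group=1))
```

With this pattern every horizontal and vertical neighbor of a lost macroblock belongs to the other slice group, so a concealment routine always has correctly received samples to interpolate from.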

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20 | Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application

HLP – High Latency Profile
ASP – Advanced Simple Profile
H.26L – H.264 Main Profile
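
For reference, the PSNR between an original and a reconstructed picture can be computed as sketched below. The sketch assumes 8-bit samples (peak value 255) supplied as flat lists of equal length; the function name is illustrative and nothing here reproduces the plotted results.

```python
import math


def psnr(original, reconstructed, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf            # identical pictures
    return 10 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [11, 19, 30, 42]))
```

R-D comparisons of this kind usually plot luma PSNR against bit rate, so a bitrate saving is read off as the horizontal gap between two curves at equal PSNR.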

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application

CHC – Conversational High Compression
SP – Simple Profile
ASP – Advanced Simple Profile
H.26L – H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature / Standard: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10

Pel accuracy: Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High fidelity coding for the next generation of optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing the reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology – Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology – Coding of Audio-Visual Objects – Part 2: Visual", ISO/IEC, 2000
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows media video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.

[58] httpecsituch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlight's codec is one of the popular ones in the industry and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at

http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html
Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm

Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

• 4:2:2 and 4:4:4 formats
• > 8-bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [use RGB color format] Y Cg Co
• High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
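
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is a minimal illustration, not text from the standard: the six base step sizes are the values commonly tabulated for QP 0 to 5 (an assumption here, since the slides do not list them), and the names `BASE_QSTEP` and `qstep` are hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (commonly tabulated values,
# not given on the slides). Each increase of 6 in QP doubles the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantizer step size for a QP in the 8-bit range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    # qp % 6 selects the base value, qp // 6 the number of doublings.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Under these assumptions the step size doubles for every increase of 6 in QP, so one QP step changes the step size by roughly 12%.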

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
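
The first three full-MB modes can be sketched in a few lines. This is a simplified illustration rather than the normative process: it assumes both the row above and the column to the left are available, it omits the planar mode, and the function name `predict_block` is hypothetical. The block size follows the length of the neighbor arrays, so a 16-sample input gives a 16x16 prediction.

```python
def predict_block(mode, above, left):
    """Build an n x n intra prediction from already-decoded neighbors.

    above: the n samples in the row just above the block
    left:  the n samples in the column just left of the block
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        # Every row repeats the samples above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Rounded average of all 2n neighboring samples.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)
```

An encoder would typically form each candidate prediction, compare it with the source block, and signal the mode with the smallest residual cost.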

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra
• Encoder can design and use customized scaling matrices
• These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding
Coefficient scanning follows the order of decreasing variances, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes
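
The codeword construction and the three decoding steps can be sketched as follows. This is a minimal illustration that represents bits as a text string of '0' and '1' for readability; the function names `ue_encode` and `ue_decode` are hypothetical, and only the unsigned mapping from code_num to codeword is shown.

```python
def ue_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the marker bit '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: 2^M + INFO - 1
```

Because every codeword announces its own length through the leading zeros, codewords can be concatenated and decoded one after another without separators.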

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height)
4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}$, $C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
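
The luma and chroma definitions on this slide can be checked numerically. The sketch below is an illustration in floating point on normalized inputs (R, G, B in 0..1), with chroma scaled by 1 / (2(1 - K)) so that Cb and Cr span -0.5..0.5; the function name `rgb_to_ycbcr` and its defaults are hypothetical, and no integer offsets or range limiting are applied.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for given luma weights KR, KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - kr))   # red-difference chroma
    return y, cb, cr


# Chroma scale factors implied by the default weights.
CB_SCALE = 1.0 / (2.0 * (1.0 - 0.0722))
CR_SCALE = 1.0 / (2.0 * (1.0 - 0.2126))
```

Passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned on the slide. Because the products are generally not integers, a fixed-point implementation of this transform has to round, which is the rounding error discussed on the next slide.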

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
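
The reversibility of these integer conversions can be demonstrated directly. The sketch below transcribes the forward and inverse steps into Python, whose `>>` operator on integers is an arithmetic (floor) shift; the function names are hypothetical. Residual samples may be negative, so the check in the usage note covers negative inputs as well.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward integer conversion: (R, G, B) -> (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)        # t is the intermediate temporary variable
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse integer conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the shifted value that the forward step added, so the round trip is exact for every integer triple, for example for all combinations drawn from range(-255, 256, 51).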

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB etc.)

• DVB is considering AAC with SBR (AAC plus)
• ATSC has selected AC-3 plus from Dolby
• MPEG calls AAC plus HE-AAC (HE: High Efficiency)
• ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-49-

VC Algorithm: Entropy Coding

• All syntax elements other than residual transform coefficients are encoded by the Exp-Golomb codes (UVLC)
• Scan order to read the residual data (quantized transform coefficients): zig-zag, alternate
• Context-based Adaptive Variable Length Coding (CAVLC) in all profiles
• Context-based Adaptive Binary Arithmetic Coding (CABAC) in Main profile

(a) Zig-zag scan

0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15

(b) Alternate scan

0 2 8 12
1 5 9 13
3 6 10 14
4 7 11 15
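
Each entry in the two tables gives the position in the one-dimensional scan at which that coefficient of the 4x4 block is read. A minimal sketch of applying them follows; the names `ZIGZAG_ORDER`, `ALTERNATE_ORDER` and `scan` are hypothetical, and the sample block in the usage note is made up for illustration.

```python
# Scan position of each coefficient, indexed as [row][column].
ZIGZAG_ORDER = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]
ALTERNATE_ORDER = [
    [0, 2, 8, 12],
    [1, 5, 9, 13],
    [3, 6, 10, 14],
    [4, 7, 11, 15],
]


def scan(block, order):
    """Serialize a 4x4 block of quantized coefficients in the given order."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[order[r][c]] = block[r][c]
    return out
```

For a block such as [[9, 3, 0, 0], [2, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]], both scans place the four nonzero values first and leave a single run of twelve zeros at the end, which is what the entropy coder exploits. The alternate scan visits the first column earlier, which suits field-coded blocks.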

-50-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable-length codes with a regular construction:
[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeros followed by a 1
2. Read in the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients
The total numbers of zeros and +/-1 coefficients are coded
For the other coefficients, their levels are coded

Encoding steps
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining nonzero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 ... 16
coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0

Step 1: encode the number of nonzero coefficients and the number of +1 or -1 values (trailing ones) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining nonzero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
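
The values that the five steps feed into the code tables can be derived mechanically from the scanned coefficients. The sketch below computes only those values and does not perform the table look-ups or produce a bit stream; the function name `cavlc_elements` and the dictionary keys are hypothetical, and the numeric values chosen for c0, c1 and c2 in the usage note are made up for illustration.

```python
def cavlc_elements(coeffs):
    """Derive the values coded in CAVLC steps 1-5 from scan-ordered coefficients."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # positions of nonzeros
    # Step 1: trailing ones are up to three consecutive +/-1 values
    # at the high-frequency end of the nonzero coefficients.
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    # Step 2: signs of the trailing ones, highest position first.
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]
    # Step 3: remaining nonzero levels, highest position first.
    levels = [coeffs[i] for i in reversed(nz) if i not in trailing]
    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = nz[-1] + 1 - len(nz) if nz else 0
    # Step 5: run of zeros before each coefficient, highest position first,
    # stopping once every counted zero has been accounted for.
    runs, zeros_left = [], total_zeros
    for k in range(len(nz) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return {"total_coeff": len(nz), "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}
```

With c0 = 7, c1 = -3 and c2 = 2 as stand-in values, the input [7, -3, 2, 0, 1, 1, 0, -1] followed by eight zeros reproduces the numbers of the slide: 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels 2, -3, 7, a total of 2 zeros, and runs 1, 0, 1.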

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated
Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
Supports not only forward/backward prediction pairs but also forward/forward and backward/backward pairs

Direct mode
Derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
• Two forward references: suitable for a region just before a scene change
• Two backward references: suitable for a region just after a scene change

[Figure: the current picture predicted from previous and next pictures, using 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
Forward/backward pair of bidirectional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: the direct-mode partition in the current picture lies between the List 0 reference and the List 1 reference; the co-located partition in the List 1 reference has motion vector mvCol; tb and td mark the temporal distances, and mvL0 and mvL1 are the derived vectors]

$mv_{L0} = \frac{t_b}{t_d}\, mv_{Col}$, $mv_{L1} = -\frac{t_d - t_b}{t_d}\, mv_{Col}$

where mvCol is a MV used in the co-located MB of the subsequent picture
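
The scaling of the co-located motion vector can be sketched as follows. This is a simplified illustration using exact fractions, not the fixed-point arithmetic of the standard; the function name `temporal_direct_mvs` is hypothetical, and tb and td are taken to be temporal distances expressed in the same unit, with td nonzero.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located vector mv_col = (x, y).

    mvL0 = (tb / td) * mvCol
    mvL1 = -((td - tb) / td) * mvCol, which equals mvL0 - mvCol
    """
    if td == 0:
        raise ValueError("td must be nonzero")
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)
    mv_l1 = tuple(l0 - c for l0, c in zip(mv_l0, mv_col))
    return mv_l0, mv_l1
```

For a current picture halfway between its two references (tb = 1, td = 2) and mvCol = (8, -4), this gives mvL0 = (4, -2) and mvL1 = (-4, 2): the two derived vectors point in opposite directions along the trajectory of the co-located block.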

-60-

VC Algorithm: B Slice

Weighted prediction
Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
• Implicit type: the factors are calculated based on the temporal distance between the pictures
• Explicit type: the factors are transmitted in the slice header
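
A small numeric sketch of p = w1 r1 + w2 r2 follows. It is a simplification in floating point: the standard works with integer weights, shifts and offsets, which are not reproduced here. The function names are hypothetical, and the implicit rule shown (w2 = tb / td, w1 = 1 - w2, so that the temporally closer reference gets the larger weight) is stated as an assumption about how temporal distance maps to weights.

```python
import math


def implicit_weights(tb, td):
    """Assumed implicit rule: weights follow the temporal distances tb and td."""
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_bipred(r1, r2, w1, w2, max_val=255):
    """Return p = w1*r1 + w2*r2 per sample, rounded and clipped to 0..max_val."""
    out = []
    for a, b in zip(r1, r2):
        p = math.floor(w1 * a + w2 * b + 0.5)
        out.append(min(max(p, 0), max_val))
    return out
```

With equal weights of 0.5 this reduces to ordinary bidirectional averaging. During a fade, explicit weights let the encoder scale a bright reference toward the darker current picture instead of coding the brightness change as residual.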

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
• SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), connected by the switching picture S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slices (SP/SI): only in Extended profile
• Data partitioning: only in Extended profile
• Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

• The sequence parameter set contains all information related to a sequence of pictures
• A picture parameter set contains all information related to all the slices belonging to a single picture
• The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by reliable parameter set exchange; example Parameter Set 3: video format NTSC, motion resolution 1/4, Enc: CABAC, frame width 11; VCL data is transferred with a reference to PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
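
The dispersed two-group allocation described for FMO can be illustrated with a checkerboard map. This is a sketch of one possible dispersed pattern, chosen here for illustration; the function names are hypothetical, and the 11x9 picture size in the usage note corresponds to QCIF measured in macroblocks.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(mb_map, x, y):
    """Count the 4-connected neighbors of MB (x, y) that lie in the other group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != mb_map[y][x]:
            count += 1
    return count
```

In an 11x9 map every interior macroblock has four neighbors in the other slice group, and even a corner macroblock has two, so losing one group still leaves correctly received samples around each missing macroblock for concealment.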

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD)
The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence; Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence; Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence; Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20; Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP - High Latency Profile; ASP - Advanced Simple Profile; H.26L - H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC - Conversational High Compression; SP - Simple Profile; ASP - Advanced Simple Profile; H.26L - H.264 Baseline Profile

-73-

Conclusions
H.264 outperforms the previous standards
Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions
Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions
H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats; 12 bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwartz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information
JVT documents and software on open ftp/web sites: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496-2:2000 Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com; www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwartz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: httpwwwmpegablecomshowhomehtml
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10 - MPEG-4 part 10 published standard costs CHF 260.00, Swiss Francs)

-91-

Fidelity Range Extensions
Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction - block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation
    Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
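
As a rough illustration of the logarithmic step-size control, the sketch below maps the quantization parameter QP to a step size that doubles every 6 QP units, which works out to roughly 12% per unit. The six base values are the ones commonly quoted in the literature (for example Richardson's book in the reference list), not numbers taken from these slides, and the 0..51 range assumes 8-bit video. The function name `qstep` is only illustrative.

```python
# Base quantizer step sizes for QP = 0..5 (values as commonly tabulated
# in the H.264 literature; an assumption of this sketch, not from the slides).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Return the quantizer step size for a given QP (0..51 for 8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # The step size doubles for every increase of 6 in QP.
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because only six base values are stored and the rest follow by doubling, a decoder can derive its scaling from `qp % 6` and `qp // 6` alone.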

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction
9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
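
A minimal sketch of three of the four full-MB modes listed above, assuming both the row of pels above and the column of pels to the left of the macroblock are available. The function name and argument layout are hypothetical, and the planar mode is left out to keep the example short.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 prediction block as a list of rows.

    above: the 16 reconstructed pels directly above the macroblock
    left:  the 16 reconstructed pels directly to the left of the macroblock
    """
    n = 16
    if mode == "vertical":
        # Every row repeats the row of pels above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Every pel is the rounded mean of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("planar mode is omitted from this sketch")


if __name__ == "__main__":
    above = list(range(16))
    left = [100] * 16
    print(predict_16x16("dc", above, left)[0][:4])
```

An encoder would typically form all candidate predictions, measure the residual cost of each, and signal the cheapest mode to the decoder.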

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only)
Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices
These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding (FRExt only)
Coefficient scanning follows the order of decreasing variance, to maximize the number of consecutive zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients - these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes
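
The Exp-Golomb construction and decoding steps above can be sketched in a few lines. Bit strings are represented as Python text ("0"/"1") purely for readability; the function names are illustrative and a real codec would operate on a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Encode a non-negative code_num as [M zeros][1][INFO]."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string."""
    m = 0
    while bits[m] == "0":                    # count the M leading zeros
        m += 1
    info = int(bits[m + 1 : 2 * m + 1], 2) if m else 0
    return (1 << m) + info - 1               # code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        word = exp_golomb_encode(n)
        print(n, word, exp_golomb_decode(word))
```

Each codeword is 2M + 1 bits long, so small (frequent) values of code_num get the shortest codes.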

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y, 16x16 Cb and 16x16 Cr.]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$

$C_b = \dfrac{B - Y}{2(1 - K_B)} = 0.5389\,(B - Y)$

$C_r = \dfrac{R - Y}{2(1 - K_R)} = 0.6350\,(R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
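
A small sketch of the conversion defined by KR and KB, assuming R, G and B are normalized to the range 0..1 (an assumption of this example; the slide does not state a range). With KR = 0.2126 and KB = 0.0722 the two chroma scale factors 1/(2(1 - KB)) and 1/(2(1 - KR)) evaluate to about 0.5389 and 0.6350, which keeps Cb and Cr within -0.5..0.5.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using luma weights KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that pure blue gives +0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that pure red gives +0.5
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                     # white: chroma near zero
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                     # pure blue
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)) # BT.601 weights
```

Passing `kr=0.299, kb=0.114` switches the same function to the BT.601 weights quoted above.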

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to chroma samples

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
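
The forward and inverse conversions can be checked for exact reversibility with a short sketch. Python's `>>` on integers is an arithmetic (sign-preserving) shift, which matches the shift the slide calls for; the function names are illustrative.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Because each forward step is undone exactly by the matching inverse step, the round trip is lossless even though Cg and Co need one extra bit of range compared with the RGB inputs.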

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of filter is adjusted: do not filter 4x4 blocks
• No change to filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB level switching between 8x8 and 4x4 transform block sizes
- Encoder specified perceptual based quantization scaling matrices
- Encoder specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects Intra prediction, transform, deblocking filter control, CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H.264 is limited to video
Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus)
ATSC has selected AC-3 plus from Dolby
MPEG calls it HE-AAC (HE - High Efficiency)
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-50-

Exponential Golomb codes (for data elements other than transform coefficients - these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-51-

These are variable length codes with a regular construction: [M zeroes] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-52-

Decoding

1. Read in M leading zeroes followed by a 1
2. Read in the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding
CAVLC handles zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and +/-1 values are coded; for the other coefficients, their levels are coded

Encoding steps
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding - Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0
order: 0 1 2 3 4 5 6 7 8 9 ... 16

Step 1: encode the no. of total nonzero coefficients and of trailing ones (1 or -1) from a look-up table
no. of total nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: - (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
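
The five steps of the example can be reproduced with a short sketch that derives the symbols CAVLC would then code; it stops short of the actual variable-length tables. The values 3, -2 and 4 stand in for c0, c1 and c2 and are assumptions of this example, as is the function name.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one block of scan-ordered coefficients."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nonzero)

    # Step 1/2: up to three trailing +/-1 values, taken from the end of the scan.
    trailing = []
    for i, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append((i, c))
        else:
            break
    trailing_signs = ["-" if c < 0 else "+" for _, c in trailing]

    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [c for _, c in reversed(nonzero[: total_coeffs - len(trailing)])]

    # Step 4: zeros located before the last nonzero coefficient.
    last_pos = nonzero[-1][0] if nonzero else -1
    total_zeros = sum(1 for c in coeffs[: last_pos + 1] if c == 0)

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once all zeros have been accounted for.
    runs = []
    zeros_left = total_zeros
    positions = [i for i, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": total_coeffs,
        "trailing_ones": len(trailing),
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

For the block in the example this yields 6 nonzero coefficients, 3 trailing ones with signs -, +, +, levels c2, c1, c0, 2 total zeros and runs 1, 0, 1, matching the steps listed on the slide.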

-55-

VC Algorithm: Entropy Coding
CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice
Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation by applying a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice - Generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change

next pictures

current picture

previous pictures

2 forward MVs

2 backward MVs

1 forward MV +1 backward MV

-59-

VC Algorithm B Slice Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference; the direct-mode partition in the current picture and the co-located partition, with motion vectors mvCol, mvL0 and mvL1 and temporal distances tb and td]

$\mathrm{mvL0} = \frac{tb}{td}\,\mathrm{mvCol}, \qquad \mathrm{mvL1} = -\frac{td - tb}{td}\,\mathrm{mvCol}$

where mvCol is a MV used in the co-located MB of the subsequent picture
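As a worked illustration of the direct-mode scaling, the sketch below derives mvL0 and mvL1 from mvCol. It assumes that tb is the temporal distance from the current picture to the List 0 reference and td the distance between the List 0 and List 1 references, and it uses exact fractions rather than the fixed-point arithmetic a real codec would use; the function name is hypothetical.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into a forward/backward pair.

    mvL0 = (tb / td) * mvCol
    mvL1 = -((td - tb) / td) * mvCol, which equals mvL0 - mvCol
    """
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)
    mv_l1 = tuple(m0 - c for m0, c in zip(mv_l0, mv_col))
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references (tb = 1, td = 2).
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

With the current picture midway between the references, the co-located vector is split into two equal and opposite halves.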

-60-

VC Algorithm B Slice Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction method for a macroblock of P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors

Implicit type: the factors are calculated based on the temporal distance between the pictures

Explicit type: the factors are transmitted in the slice header
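A minimal sketch of the weighted combination follows. The implicit rule used here, giving the temporally closer reference the larger weight by linear interpolation, is an assumption made for illustration and ignores the integer weight precision, offsets and rounding of the standard; both function names are hypothetical.

```python
def implicit_weights(dist_r1, dist_r2):
    """Assumed implicit rule: weight each reference by the distance to the other one."""
    total = dist_r1 + dist_r2
    return dist_r2 / total, dist_r1 / total


def weighted_bipred(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2, applied sample by sample."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


if __name__ == "__main__":
    # Reference r1 is 1 picture away, r2 is 3 pictures away.
    w1, w2 = implicit_weights(1, 3)
    print(w1, w2)
    print(weighted_bipred([100, 200], [60, 80], w1, w2))
```

In the explicit type the same `weighted_bipred` step would be used, with w1 and w2 taken from the slice header instead of being derived from distances.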

-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices

SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice

SI slice: a switched slice coded similarly to an I slice

[Figure: bitstream A with pictures P(1,1) to P(1,5) and bitstream B with pictures P(2,1) to P(2,5), with switching slice S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slices (SP/SI)
Data partitioning
Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Each partition A, B and C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures

A picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each holding parameter sets 1, 2 and 3, kept consistent by reliable parameter set exchange; parameter set 3 shown as video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11; VCL data transfer with PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data
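The dispersed two-group allocation described above can be sketched with a checkerboard map. The checkerboard pattern and the function names are illustrative assumptions; the sketch only shows why a lost macroblock keeps neighbors from the surviving slice group.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(groups, x, y):
    """Count the left/right/top/bottom neighbors that belong to the other slice group."""
    height, width = len(groups), len(groups[0])
    own = groups[y][x]
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != own:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    # An interior macroblock has all four neighbors in the other group.
    print(neighbors_in_other_group(groups, 1, 1))
```

If the packet carrying one group is lost, every missing macroblock in this map still has at least two received neighbors available for concealment.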

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x gt 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264 / MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions: Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264 / MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download

- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats; 12 bit resolution for medical imaging; scalable coding; lossless coding for digital cinema applications; high fidelity coding for the next generation optical discs

Extension for various applications: H Schwarz, D Marpe and T Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct 2004

FINAL STAGES OF APPROVAL: standard systems and file format support specifications; standardizing reference software implementation; standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr K R Rao, UTA: rao@uta.edu
Dr S K Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms A Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

The matrix can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC codes transform coefficients
CABAC codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
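The Exp-Golomb construction and the three decoding steps can be checked with a short sketch. Codewords are represented as strings of '0' and '1' characters for readability, which is an illustrative choice rather than how a bitstream is actually stored; the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build [M zeros][1][INFO] with M = floor(log2(code_num + 1))."""
    m = (code_num + 1).bit_length() - 1      # floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode a single codeword given as a string of '0'/'1' characters."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeros up to the 1
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1               # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

The printed table shows the pattern described above: code_num 0 maps to a single bit, 1 and 2 to three bits, 3 to 6 to five bits, and so on.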

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure (FRExt only): MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y, 16x16 Cb and 16x16 Cr]

-119-

RGB to YCbCr conversion

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
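The conversion can be written out directly from the definitions of Y, Cb and Cr in terms of KR and KB. This is a floating-point sketch with a hypothetical function name; it assumes R, G and B are normalized to the range 0 to 1 and does not model the offsets or integer rounding of a real video pipeline.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Y = KR*R + (1 - KR - KB)*G + KB*B, with Cb and Cr scaled color differences."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: chroma is (close to) zero
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum of 0.5
    print(1.0 / (2.0 * (1.0 - 0.0722)), 1.0 / (2.0 * (1.0 - 0.2126)))  # scale factors
```

The division by 2(1 - K) normalizes each chroma component to the range -0.5 to +0.5, which is where the numeric scale factors come from. Passing kr=0.299 and kb=0.114 gives the BT.601 variant.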

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
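The forward and inverse conversions can be checked for exact reversibility with a few lines of integer arithmetic. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation described.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycgco(255, 0, 0))
    # Exhaustive round-trip check over a coarse grid of 8-bit RGB values.
    grid = range(0, 256, 15)
    print(all(ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
              for r in grid for g in grid for b in grid))
```

Each inverse step subtracts exactly what the corresponding forward step added, so the round trip is lossless even though the shifts discard a bit; Cg and Co need one extra bit of range compared with the RGB inputs.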

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bitstreams encoded for the lower nested profiles

All High profiles support all features of the Main profile

New Profiles in the H264AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
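The size constraint in item 2 can be turned into a small check. The frame-size limit used in the example (2,097,152 pixels) is an illustrative value, not one taken from the level table, and the function names are hypothetical.

```python
import math


def max_picture_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(total pixels per frame x 8), rounded down."""
    return math.isqrt(max_pixels_per_frame * 8)


def picture_fits(width, height, max_pixels_per_frame):
    """True if the picture respects both the area limit and the per-dimension limit."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


if __name__ == "__main__":
    budget = 2_097_152  # illustrative frame-size limit in pixels
    print(max_picture_dimension(budget))
    print(picture_fits(1920, 1088, budget))
    print(picture_fits(8192, 16, budget))  # small area, but far too wide
```

The per-dimension limit rules out extremely elongated pictures that would otherwise satisfy the area limit.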

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

diams The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

diams The High profile of FRExt produced nominally transparent (ie difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T Wedi Y Kashiwagi ldquoSubjective quality evaluation of H264AVC FRExt for HD movie contentrdquo JVT document JVT-L033 July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects makes the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-51-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M
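The construction above can be sketched in a few lines. This is a minimal illustration that builds the codeword as a string of '0' and '1' characters; the function name is hypothetical, and a real encoder would write packed bits instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits

for n in range(7):
    print(n, exp_golomb_encode(n))
```

Every codeword produced this way has length 2M + 1, which matches the formula on the slide.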

-52-

Decoding

1. Read in M leading zeros followed by 1
2. Read in the M-bit INFO field
3. code_num = 2^M + INFO - 1

(For codeword 0, INFO and M are zero)
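The three decoding steps can be sketched as follows, again on a string of '0' and '1' characters. The function names are hypothetical, and the sketch assumes a well-formed bit string with no error handling for truncated input.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos] == "0":          # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                         # skip the separating '1'
    info = int(bits[pos:pos + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, pos + m                # step 3

def exp_golomb_decode_all(bits):
    """Decode a concatenation of codewords into a list of code_num values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values

print(exp_golomb_decode_all("1" + "010" + "00111"))
```

Because the leading zeros announce the length of the INFO field, codewords can be concatenated without separators and still be parsed unambiguously.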

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and of +/-1 values are coded; for the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of non-zero coefficients and of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients).

These are adapted to the current stream or context (thus CAVLC).

-54-

VC Algorithm: Entropy Coding, Example of CAVLC

order:  0   1   2   3   4   5   6   7   8   9   ...  16
coeff:  c0  c1  c2  0   1   1   0   -1  0   0   ...  0

Step 1: encode the number of non-zero coefficients and the number of 1 or -1 values (trailing ones) from a look-up table
  number of non-zero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
  number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
  - (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
  c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
  2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
  1 (order 6-5), 0 (order 4), 1 (order 3-2)
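The five quantities in this example can be derived mechanically from the scanned coefficients. The sketch below only extracts the symbols; it does not perform the table look-ups that turn them into bits. The function name is hypothetical, and the values 7, 6 and -2 stand in for c0, c1 and c2, which the slide leaves unspecified.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols of one block given in scan order."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    # Trailing ones: up to three consecutive +/-1 values at the end of the scan.
    trailing = []
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and len(trailing) < 3:
            trailing.append(i)
        else:
            break
    signs = ["-" if coeffs[i] < 0 else "+" for i in trailing]
    # Remaining levels, in reverse scan order.
    levels = [coeffs[i] for i in reversed(nz) if i not in trailing]
    # Zeros located before the last non-zero coefficient.
    total_zeros = sum(1 for c in coeffs[:nz[-1] + 1] if c == 0) if nz else 0
    # Run of zeros before each coefficient, in reverse order, until no zeros remain.
    runs, zeros_left = [], total_zeros
    for k in range(len(nz) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nz[k] - nz[k - 1] - 1
        runs.append(run)
        zeros_left -= run
    return {
        "total_coeffs": len(nz),
        "trailing_ones": len(trailing),
        "trailing_signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs_before": runs,
    }

# c0, c1, c2 replaced by illustrative values; 16 coefficients in total.
block = [7, 6, -2, 0, 1, 1, 0, -1] + [0] * 8
print(cavlc_symbols(block))
```

Stopping the run coding once all zeros are accounted for is why only three runs appear in step 5 of the example.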

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; to achieve good compression, the probability model for each symbol element is also updated as coding proceeds. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.

step 3: binary arithmetic coding, using the probability estimates provided by context modeling.
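The interplay of the three steps can be illustrated with a toy model. This is not the CABAC engine: the real coder uses table-driven probability states and a range coder, whereas this sketch uses unary binarization, simple counts as the adaptive model, and the ideal code length -log2(p) in place of arithmetic coding. All names are hypothetical.

```python
import math

def unary_binarize(value):
    """Binarization: map a non-negative integer to 'value' ones and a closing zero."""
    return [1] * value + [0]

class AdaptiveBinModel:
    """Toy context model: estimates bin probabilities from running counts."""
    def __init__(self):
        self.counts = [1, 1]

    def probability(self, bin_value):
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value):
        self.counts[bin_value] += 1

def estimated_bits(symbols):
    """Ideal code length when each bin is coded with its context's estimate."""
    models = {}
    total = 0.0
    for symbol in symbols:
        for index, bin_value in enumerate(unary_binarize(symbol)):
            context = min(index, 2)          # context chosen by bin position
            model = models.setdefault(context, AdaptiveBinModel())
            total += -math.log2(model.probability(bin_value))
            model.update(bin_value)          # the model adapts after every bin
    return total

print(estimated_bits([0] * 20))   # skewed source: well under one bit per symbol
```

The point of the sketch is the adaptation: as a context sees the same bin value repeatedly, its cost per bin drops below one bit, which a variable length code cannot achieve.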

-56-

CABAC increases compression efficiency by about 10% over CAVLC but is computationally more intensive.

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only a forward/backward prediction pair but also forward/forward and backward/backward pairs.

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture.

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode:
• Two forward references: suitable for a region just before a scene change
• Two backward references: suitable for a region just after a scene change

next pictures

current picture

previous pictures

2 forward MVs

2 backward MVs

1 forward MV +1 backward MV

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture and List 1 reference, showing the direct-mode partition, the co-located partition, the motion vectors mvCol, mvL0 and mvL1, and the distances tb and td]

mvL0 = tb x mvCol / td
mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture.
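The two scaling formulas can be checked numerically. This sketch uses plain floating-point division for readability; an actual codec would use fixed-point arithmetic with rounding, which is not shown. The function name and the sample values are hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into the two direct-mode vectors.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: the temporal distances named on the slide
    """
    mv_l0 = tuple(tb * c / td for c in mv_col)
    mv_l1 = tuple(-(td - tb) * c / td for c in mv_col)
    return mv_l0, mv_l1

# Current picture halfway between its two references.
mv_l0, mv_l1 = temporal_direct_mvs(mv_col=(8, -4), tb=1, td=2)
print(mv_l0, mv_l1)
```

A useful sanity check is that mvL0 - mvL1 equals mvCol: the two derived vectors together span the same motion as the co-located vector.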

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights are applied to the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

A different weighted prediction method is used for a macroblock of a P slice or a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors.
• Implicit type: the factors are calculated based on the temporal distance between the pictures
• Explicit type: the factors are transmitted in the slice header
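The weighted sum p = w1 x r1 + w2 x r2 and the two ways of obtaining the weights can be sketched as below. The implicit rule shown is a simplified linear interpolation by temporal distance, assumed here for illustration; the fixed-point derivation, offsets and clipping used by a real codec are omitted, and all names and sample values are hypothetical.

```python
def weighted_prediction(r1, r2, w1, w2):
    """Combine two reference sample lists with the given weights."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]

def implicit_weights(tb, td):
    """Simplified implicit weights: the temporally closer reference weighs more.

    tb: distance from the first reference to the current picture
    td: distance between the two references
    """
    w2 = tb / td
    return 1.0 - w2, w2

# Fade: the first reference is brighter than the second.
r1, r2 = [100, 100], [60, 60]
w1, w2 = implicit_weights(tb=1, td=4)          # implicit type
print(weighted_prediction(r1, r2, w1, w2))
print(weighted_prediction(r1, r2, 0.5, 0.5))   # explicit type: weights supplied directly
```

In the explicit case the encoder is free to choose weights that follow the fade, at the cost of transmitting them.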

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
• SI slice: a switched slice coded similarly to an I slice

P(11) P(12) P(13) P(14) P(15)

P(21) P(22) P(23) P(24) P(25)

S(3)

Bitstream A

Bitstream B

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing.

-62-

Error Resilience
• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slices (SP/SI)
• Data partitioning
• Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C.

2. A holds the slice header and the header data for each MB in the slice.

3. B holds the coded residual data for intra and SI slice MBs.

4. C holds the coded residual data for inter coded MBs.

5. Each partition A, B and C is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter setting

• The sequence parameter set contains all information related to a sequence of pictures.
• A picture parameter set contains all information related to all the slices belonging to a single picture.
• The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

[Figure: an H.264 encoder and an H.264 decoder exchange parameter sets 1, 2 and 3 reliably (Parameter Set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11); VCL data is then transferred with reference to PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
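The two-slice-group scenario can be made concrete with a checkerboard assignment, which is one way of dispersing macroblocks; the pattern and the helper names are assumptions chosen for illustration.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]

def neighbours_in_other_group(group_map, x, y):
    """Count the left/right/top/bottom neighbours that belong to the other group."""
    rows, cols = len(group_map), len(group_map[0])
    own = group_map[y][x]
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and group_map[ny][nx] != own:
            count += 1
    return count

# QCIF picture: 11 x 9 macroblocks.
groups = checkerboard_slice_groups(11, 9)
print(neighbours_in_other_group(groups, 5, 4))   # interior macroblock
print(neighbours_in_other_group(groups, 0, 0))   # corner macroblock
```

With this pattern, losing one slice group still leaves every missing macroblock surrounded by received neighbours, which is what the concealment step relies on.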

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bit stream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
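The decoder behaviour described above amounts to a simple selection rule. In this sketch each received slice is a dictionary with hypothetical keys ("region", "redundant", "qp"); real slices carry this information in their headers.

```python
def choose_slices(received_slices):
    """For each picture region keep the primary slice if received, else a redundant one."""
    chosen = {}
    for s in received_slices:
        current = chosen.get(s["region"])
        # A primary slice always replaces a redundant one for the same region.
        if current is None or (current["redundant"] and not s["redundant"]):
            chosen[s["region"]] = s
    return chosen

received = [
    {"region": 0, "redundant": True, "qp": 40},
    {"region": 0, "redundant": False, "qp": 26},
    {"region": 1, "redundant": True, "qp": 40},   # primary for region 1 was lost
]
for region, s in sorted(choose_slices(received).items()):
    print(region, "redundant" if s["redundant"] else "primary", s["qp"])
```

Region 1 is still decodable, only at the coarser quality of its redundant copy.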

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan

1x 2x

New Mobile & Calendar

T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards

Comparison of standards (each row lists MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10)

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment | scalar quantization with step size of constant increment | vector quantization | scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: yes | yes | yes | yes, more flexible, up to 16 MVs per MB

Playback & random access: yes | yes | yes | yes

-74-

Conclusions: Comparison of standards (continued)

(each row lists MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10)

Pel accuracy: integer, 1/2-pel | integer, 1/2-pel | integer, 1/2-pel, 1/4-pel | integer, 1/2-pel, 1/4-pel

Profiles: no | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: synchronization & concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | yes | yes | no

Encoder complexity: low | medium | medium | high

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwartz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

• 4:2:2, 4:4:4 formats
• > 8 bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates [use RGB color format], Y Cg Co
• High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
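The logarithmic relation between the quantization parameter and the step size can be sketched as follows. The six base values are not on the slide; they are the commonly published step sizes for QP 0 to 5 of 8-bit H.264 and are included here as an assumption. The function name is hypothetical.

```python
# Assumed base step sizes for QP 0..5 (8-bit video); not taken from the slide.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep(qp):
    """Quantizer step size for a QP in 0..51: it doubles for every increase of 6."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for this sketch")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

for qp in (0, 4, 10, 28, 51):
    print(qp, qstep(qp))
```

Doubling every 6 steps corresponds to an average growth of roughly 12% per unit of QP, so a fixed QP change produces about the same relative change in step size anywhere in the range.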

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)
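The frame (zig-zag) scan for a 4x4 block can be generated by walking the anti-diagonals in alternating directions. This is a minimal sketch with hypothetical function names; the field (alternate) scan uses a different order and is not shown.

```python
def zigzag_order(n=4):
    """Return raster indices of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                     # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                             # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        for r in rows:
            order.append(r * n + (s - r))
    return order

def zigzag_scan(block):
    """Flatten a square block (list of rows) into a zig-zag ordered list."""
    n = len(block)
    return [block[i // n][i % n] for i in zigzag_order(n)]

print(zigzag_order(4))
quantized = [[9, 3, 1, 0],
             [4, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
print(zigzag_scan(quantized))
```

For a typical quantized block the scan groups the non-zero low-frequency values at the front and leaves a long tail of zeros, which is what the entropy coder exploits.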

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
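The first three modes can be sketched directly from their descriptions. This is an illustration only: the function name is hypothetical, both neighbor rows are assumed to be available, the DC rounding shown is an assumption, and the planar mode is omitted.

```python
def predict_16x16(mode, top, left):
    """Build a 16x16 luma predictor from 16 pels above and 16 pels to the left."""
    n = 16
    if mode == "vertical":        # each column copies the pel above the MB
        return [list(top) for _ in range(n)]
    if mode == "horizontal":      # each row copies the pel left of the MB
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":              # rounded mean of all 32 neighboring pels
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("mode not covered by this sketch")

top = list(range(16))             # illustrative neighbor values
left = [10] * 16
print(predict_16x16("vertical", top, left)[5][3])
print(predict_16x16("horizontal", top, left)[5][3])
print(predict_16x16("dc", top, left)[0][0])
```

An encoder would typically form each candidate predictor, compare it with the actual MB, and signal the mode that leaves the smallest residual.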

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): vertical, horizontal, DC, planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• A matrix can be transmitted in the SPS and PPS
• Separate matrices for the 4x4 and 8x8 transforms
• Separate matrices for inter and intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan.

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where
M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

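MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: a 4:2:2 MB consists of a 16x16 Y block with 8x16 (width x height) Cb and Cr blocks; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks]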

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

ITU-R Rec. BT.709: $K_R = 0.2126$, $K_B = 0.0722$, with $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
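As a quick numerical check of the general formulas Cb = (B - Y) / (2(1 - KB)) and Cr = (R - Y) / (2(1 - KR)), the sketch below evaluates the scale factors for given KR and KB. The function names are illustrative assumptions, and the inputs are treated as floating-point values in the range 0 to 1 rather than as quantized samples; with KR = 0.2126 and KB = 0.0722 the factors come out as about 0.5389 and 0.6350.

```python
def ycbcr_coefficients(kr, kb):
    """Return (KG, Cb scale, Cr scale) for luma weights KR and KB."""
    kg = 1.0 - kr - kb                 # KR + KG + KB = 1
    cb_scale = 0.5 / (1.0 - kb)        # Cb = (B - Y) / (2 (1 - KB))
    cr_scale = 0.5 / (1.0 - kr)        # Cr = (R - Y) / (2 (1 - KR))
    return kg, cb_scale, cr_scale


def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert one floating-point RGB triple (0..1) to (Y, Cb, Cr)."""
    kg, cb_scale, cr_scale = ycbcr_coefficients(kr, kb)
    y = kr * r + kg * g + kb * b
    return y, cb_scale * (b - y), cr_scale * (r - y)


if __name__ == "__main__":
    kg, cb_scale, cr_scale = ycbcr_coefficients(0.2126, 0.0722)
    print(round(kg, 4), round(cb_scale, 4), round(cr_scale, 4))
```

The scale factors are chosen so that Cb and Cr each span -0.5 to +0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5.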

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \dfrac{1}{2}\left[ G + \dfrac{R + B}{2} \right]$

$C_g = \dfrac{1}{2}\left[ G - \dfrac{R + B}{2} \right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
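The reason this integer transform introduces no conversion error is that each inverse step undoes one forward step exactly. A minimal sketch follows, assuming integer samples and relying on the fact that Python's ">>" on negative integers is an arithmetic (floor) shift; the function names are illustrative only.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting steps: integer RGB -> (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)      # arithmetic right shift, also for negative co
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting steps: (Y, Cg, Co) -> integer RGB."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (200, 35, 90)
    print(rgb_to_ycgco(*sample), ycgco_to_rgb(*rgb_to_ycgco(*sample)))
```

Cg and Co are differences of samples, so they can be negative and need one more bit than the input, which matches the "one bit of precision" added to the chroma samples.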

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1 If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2 Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3 If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be larger, up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

Film at 24 frames/sec, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H264AVC Technology

Two patent pools from which to obtain the license:
1 MPEG LA: www.mpegla.com
2 Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-52-

Decoding

1 Read in M leading zeros followed by 1
2 Read in the M-bit INFO field
3 $\text{code\_num} = 2^{M} + \text{INFO} - 1$

(For codeword 0, INFO and M are zero)

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-53-

VC Algorithm: Entropy Coding

CAVLC handles zeros and +/-1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and of +/-1 values are coded; for the other coefficients, their levels are coded

Encoding steps
step 1: encode the total number of nonzero coefficients and the number of +/-1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for the number of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (hence the name Context-Adaptive VLC)

-54-

VC Algorithm: Entropy Coding

Example of CAVLC

order: 0 1 2 3 4 5 6 7 8 9 ... 16
coeff: c0 c1 c2 0 1 1 0 -1 0 0 ... 0

Step 1: encode the number of nonzero coefficients and the number of +1 or -1 values (trailing ones) from a look-up table
number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7)
number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
- (order 7), + (order 5), + (order 4)

Step 3: encode the levels of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode each run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
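The five quantities in this example can be derived mechanically from the scanned coefficient list. The sketch below only extracts the symbols that would then be looked up in the CAVLC tables; it does not produce any bits, and the function name, the dictionary keys and the concrete values chosen for c0, c1, c2 (3, -2, 4) are assumptions made for illustration.

```python
def cavlc_symbols(coeffs):
    """Extract the CAVLC symbols (not the bits) from a scanned coefficient list."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    positions = [i for i, _ in nonzero]

    # Step 1: number of nonzero coefficients and trailing +/-1 values (at most 3)
    trailing = []
    for _, c in reversed(nonzero):
        if abs(c) == 1 and len(trailing) < 3:
            trailing.append(c)
        else:
            break

    # Step 2: signs of the trailing ones, highest frequency first
    signs = ["-" if c < 0 else "+" for c in trailing]

    # Step 3: remaining levels in reverse scan order
    levels = [c for _, c in reversed(nonzero)][len(trailing):]

    # Step 4: zeros located before the last nonzero coefficient
    total_zeros = (positions[-1] + 1 - len(nonzero)) if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for
    runs, zeros_left = [], total_zeros
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"total_coeffs": len(nonzero), "trailing_ones": len(trailing),
            "signs": signs, "levels": levels,
            "total_zeros": total_zeros, "runs": runs}


if __name__ == "__main__":
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8   # c0, c1, c2 chosen arbitrarily
    print(cavlc_symbols(block))
```

With these stand-in values the sketch reports 6 nonzero coefficients, 3 trailing ones with signs -, +, +, remaining levels c2, c1, c0, a total of 2 zeros, and runs 1, 0, 1, in line with the slide.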

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated as coding proceeds
Both MVs and residual transform coefficients are coded by CABAC

Encoding steps
step 1: context modeling. Choose a suitable model

step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

step 3: binary arithmetic coding, using probability estimates provided by context modeling
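To make the three steps concrete, here is a deliberately simplified sketch. It is not the CABAC engine: it assumes unary binarization, picks the context from the bin position only, estimates probabilities from running counts instead of the standard's table-driven state machine, and reports the ideal arithmetic-coding cost (-log2 p per bin) rather than producing a bitstream. All names are hypothetical.

```python
import math


def unary_binarize(value):
    """Binarization: map a non-negative integer to bins, e.g. 3 -> 1, 1, 1, 0."""
    return [1] * value + [0]


class AdaptiveBinModel:
    """Context model: running estimate of P(bin = 0) and P(bin = 1)."""

    def __init__(self):
        self.counts = [1, 1]              # start from a uniform estimate

    def prob(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1             # adapt to the statistics seen so far


def estimate_bits(symbols, num_contexts=3):
    """Ideal cost in bits of coding the symbols with adaptive context models."""
    models = [AdaptiveBinModel() for _ in range(num_contexts)]
    total = 0.0
    for symbol in symbols:
        for index, bit in enumerate(unary_binarize(symbol)):
            model = models[min(index, num_contexts - 1)]   # context modeling
            total += -math.log2(model.prob(bit))           # arithmetic-coding cost
            model.update(bit)
    return total


if __name__ == "__main__":
    skewed = [0] * 50
    print(round(estimate_bits(skewed), 2), "bits for", len(skewed), "symbols")
```

For a strongly skewed source the adaptive model drives the cost per bin well below one bit, which a variable length code with at least one bit per symbol cannot do.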

-56-

CABAC improves compression efficiency by about 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
Supports not only the forward/backward prediction pair, but also forward/forward and backward/backward pairs

Direct mode
Derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction
Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, Generalized bidirectional prediction

Multiple reference pictures mode
Two forward references: suitable for a region just before a scene change
Two backward references: suitable for a region just after a scene change

[Figure: current picture between previous pictures and next pictures, showing the three prediction pairs: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction
The prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference on a time axis; the co-located partition in the List 1 reference has motion vector mvCol spanning temporal distance td; the direct-mode partition in the current picture lies at distance tb from the List 0 reference and uses mvL0 and mvL1]

$mv_{L0} = \dfrac{t_b}{t_d}\, mv_{Col}, \qquad mv_{L1} = -\dfrac{t_d - t_b}{t_d}\, mv_{Col}$

where mvCol is the MV used in the co-located MB of the subsequent picture
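The scaling of mvCol by the temporal distances tb and td can be sketched as follows. This uses exact fractions for clarity and is only an illustration of the relationship between mvL0, mvL1 and mvCol; an actual decoder uses fixed-point scaling with rounding, and the function name and tuple representation of motion vectors are assumptions.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located MV and temporal distances tb, td."""
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)                 # mvL0 = tb / td * mvCol
    mv_l1 = tuple(l0 - c for l0, c in zip(mv_l0, mv_col))    # mvL1 = -(td - tb) / td * mvCol
    return mv_l0, mv_l1


if __name__ == "__main__":
    # B picture halfway between its two references (tb = 1, td = 2)
    mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
```

Since mvL1 = mvL0 - mvCol, the two derived vectors lie on the same motion trajectory as mvCol, one pointing to each reference picture.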

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
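A small sketch of p = w1 r1 + w2 r2 on 8-bit samples follows. The implicit weights here simply give the temporally closer reference the larger weight; the assumption that r1 is the past reference at distance tb and r2 the future reference at distance td - tb, the floating-point arithmetic, and the function names are illustrative choices, not the standard's fixed-point procedure.

```python
def implicit_weights(tb, td):
    """Weights from temporal distances: the closer reference gets more weight."""
    w2 = tb / td          # assumed: r2 lies td - tb away from the current picture
    w1 = 1.0 - w2         # assumed: r1 lies tb away from the current picture
    return w1, w2


def weighted_bipred(r1, r2, w1, w2):
    """p = w1 * r1 + w2 * r2 per sample, rounded and clipped to 8 bits."""
    return [max(0, min(255, round(w1 * a + w2 * b))) for a, b in zip(r1, r2)]


if __name__ == "__main__":
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_bipred([100, 200], [60, 40], w1, w2))
```

In a fade, explicit weights transmitted in the slice header let the encoder match the brightness change directly instead of relying on temporal distance alone.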

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice
SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
SI slice: a switched slice, similar to the coding of an I slice

[Figure: Bitstream A with pictures P(1,1) P(1,2) P(1,3) P(1,4) P(1,5), Bitstream B with pictures P(2,1) P(2,2) P(2,3) P(2,4) P(2,5), and switching picture S(3) connecting the two]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1 Coded data of a slice is placed in three separate data partitions: A, B and C

2 A has the slice header and header data for each MB in the slice

3 B has coded residual data for intra and SI slice MBs

4 C has coded residual data for inter coded MBs

5 Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures
A picture parameter set contains all information related to all the slices belonging to a single picture
The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to parameter set 3. Parameter set 3: video format NTSC, motion resolution ¼, entropy coding CABAC, frame width 11, ...]
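The referencing scheme can be pictured as two small look-up tables held by the decoder. The sketch below is a toy model only: the dictionaries, field names and values are hypothetical stand-ins loosely based on the figure, and the only structural facts assumed are that a slice header names a picture parameter set and that a picture parameter set in turn names its sequence parameter set.

```python
# Parameter sets received out of band (or repeated) over a reliable channel.
sps_store = {
    0: {"video_format": "NTSC", "frame_width_mbs": 11},
}
pps_store = {
    3: {"sps_id": 0, "entropy_coding": "CABAC", "motion_resolution": "1/4"},
}


def activate_parameter_sets(slice_header, pps_table, sps_table):
    """Resolve the parameter sets a slice refers to; raise KeyError if missing."""
    pps = pps_table[slice_header["pps_id"]]     # the slice header carries only an id
    sps = sps_table[pps["sps_id"]]              # the PPS points at its SPS
    return sps, pps


if __name__ == "__main__":
    header = {"pps_id": 3, "slice_type": "P"}
    sps, pps = activate_parameter_sets(header, pps_store, sps_store)
    print(sps["video_format"], pps["entropy_coding"])
```

Because slices carry only a short identifier, losing a VCL packet never loses the header information itself, which is the error resilience benefit of the scheme.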

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data
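The two-slice-group case can be illustrated with a checkerboard map, in which every macroblock's horizontal and vertical neighbors fall in the other slice group. This is a sketch under that assumption only; the function names are made up, and the real standard offers several map types beyond this pattern.

```python
def dispersed_slice_group_map(width_mbs, height_mbs):
    """Checkerboard allocation of macroblocks to slice groups 0 and 1."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(mb_map, x, y):
    """In-picture 4-neighbours of MB (x, y) that belong to the other slice group."""
    height, width = len(mb_map), len(mb_map[0])
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height
            and mb_map[ny][nx] != mb_map[y][x]]


if __name__ == "__main__":
    qcif = dispersed_slice_group_map(11, 9)      # QCIF is 11 x 9 macroblocks
    for row in qcif:
        print("".join(str(g) for g in row))
    print(neighbours_in_other_group(qcif, 5, 4))
```

If one slice group is lost, each missing macroblock can be concealed by interpolating from these surviving neighbors.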

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bit stream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed
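The decoder behaviour described above amounts to a simple selection rule. The sketch below is an assumption-laden illustration: slices are plain dictionaries, "region" is a hypothetical key identifying which macroblocks a slice covers, and a redundancy counter of 0 marks the primary representation while larger values mark redundant ones.

```python
def select_slices(received_slices):
    """Keep, per picture region, the received slice with the lowest redundancy count."""
    chosen = {}
    for s in received_slices:
        best = chosen.get(s["region"])
        if best is None or s["redundant_cnt"] < best["redundant_cnt"]:
            chosen[s["region"]] = s     # primary (0) always wins when it arrived
    return chosen


if __name__ == "__main__":
    received = [
        {"region": "top", "redundant_cnt": 0, "qp": 26},      # primary arrived
        {"region": "top", "redundant_cnt": 1, "qp": 40},      # discarded
        {"region": "bottom", "redundant_cnt": 1, "qp": 40},   # primary was lost
    ]
    for region, s in select_slices(received).items():
        print(region, "decoded from", "primary" if s["redundant_cnt"] == 0 else "redundant", "slice")
```

The coarser redundant slice is only a fallback, so its extra bits buy robustness without affecting quality when nothing is lost.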

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion Estimation & Compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & Random Access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, Data partitioning, Header extension, Reversible VLCs | Data partitioning, Parameter setting, Flexible macroblock ordering, Redundant slice, Switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExt extensions to 4:2:2 and 4:4:4 chroma formats
12 bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology-Coding of Audio-Visual Objects-Part 2: Visual," ISO/IEC, 2000
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows media video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol., pp., June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html
Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm
Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format]: Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
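
The logarithmic control can be illustrated with a short sketch. It assumes the commonly cited H.264 behaviour that the step size doubles for every increase of 6 in the quantization parameter (QP), with base step sizes 0.625, 0.6875, 0.8125, 0.875, 1.0 and 1.125 for QP 0 to 5 and a QP range of 0 to 51 for 8-bit video. The function name `qstep` is illustrative only and does not come from the slides.

```python
# Base quantizer step sizes for QP 0..5 (assumed values, see lead-in).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Return the quantizer step size for a given QP (0..51).

    The step size doubles every 6 QP steps, so each single QP step
    changes it by roughly 12.5% on average.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the mapping is exponential in QP, a fixed change in QP gives the same relative change in step size at low and high bit rates.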

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
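
A minimal sketch of the first three modes follows, assuming the neighboring pels above and to the left of the block are available as plain Python lists. The function names and the rounded-mean DC rule are illustrative choices; planar prediction is omitted because the slide gives only its general idea.

```python
def predict_vertical(above, size=16):
    """Every row of the block repeats the row of pels just above it."""
    return [list(above[:size]) for _ in range(size)]


def predict_horizontal(left, size=16):
    """Every column of the block repeats the column of pels just left of it."""
    return [[left[y]] * size for y in range(size)]


def predict_dc(above, left, size=16):
    """Every pel is the rounded mean of the neighboring pels above and left."""
    total = sum(above[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)  # integer mean with rounding
    return [[dc] * size for _ in range(size)]


if __name__ == "__main__":
    # Small 4x4 example so the output stays readable.
    above = [10, 20, 30, 40]
    left = [1, 2, 3, 4]
    print(predict_vertical(above, 4))
    print(predict_horizontal(left, 4))
    print(predict_dc(above, left, 4))
```

An encoder would typically evaluate each mode, compare the prediction error, and signal the chosen mode to the decoder.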

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for the 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding
Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$
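
The construction above can be sketched in a few lines. This is an illustrative encoder that returns the codeword as a string of '0' and '1' characters; the function name `exp_golomb_encode` is an assumption for this example, not a name from the standard.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written with exactly M bits.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Small values of code_num get the shortest codewords, so the mapping from a syntax element to code_num is chosen so that frequent values map to small numbers.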

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$
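
The three decoding steps can be sketched as follows, assuming the bitstream is available as a string of '0' and '1' characters. The function name and the (value, next position) return convention are illustrative choices.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the first bit after the codeword).
    """
    # Step 1: count M leading zeros, then consume the '1' marker.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m > 0 else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m


if __name__ == "__main__":
    stream = "1" + "010" + "00111"  # three codewords back to back
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the number of leading zeros gives the codeword length, no separate length field or lookup table is needed.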

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments in the US, including various designs for satellite and cable television, may soon embrace it as well.

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

[Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y is 16 wide by 16 high, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16 wide by 16 high.]

-119-

RGB → Y Cb Cr

$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$

$Y = 0.2126\, R + 0.7152\, G + 0.0722\, B$

$C_b = 0.5389\, (B - Y), \qquad C_r = 0.6350\, (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
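
The conversion can be checked numerically with a short sketch. It evaluates the general formulas $C_b = (B - Y)/(2(1 - K_B))$ and $C_r = (R - Y)/(2(1 - K_R))$ for normalized RGB values in the range 0 to 1; the function names and the default constants ($K_R = 0.2126$, $K_B = 0.0722$) are the only inputs, and everything else is illustrative.

```python
def chroma_scale_factors(kr=0.2126, kb=0.0722):
    """Return the multipliers applied to (B - Y) and (R - Y)."""
    return 1.0 / (2.0 * (1.0 - kb)), 1.0 / (2.0 * (1.0 - kr))


def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y (0..1) and Cb, Cr (-0.5..0.5)."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    cb_scale, cr_scale = chroma_scale_factors()
    print(round(cb_scale, 4), round(cr_scale, 4))
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # pure blue: Cb reaches its maximum
    print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: Cr reaches its maximum
```

The factor of 2 in each denominator normalizes both chroma components to the same range, so pure blue gives $C_b = 0.5$ and pure red gives $C_r = 0.5$.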

-120-

Rounding error in RGB → Y Cb Cr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R – B
t = B + (Co >> 1)
Cg = G – t
Y = t + (Cg >> 1)

Inverse color space conversion:

t = Y – (Cg >> 1)
G = t + Cg
B = t – (Co >> 1)
R = B + Co

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation
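
The reversibility of these integer equations can be verified with a short sketch. It transcribes the forward and inverse steps directly; the function names are illustrative, and Python's `>>` on integers is an arithmetic (floor) shift, which matches the stated definition.

```python
def rgb_to_ycgco(r, g, b):
    """Forward transform using only integer additions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (200, 17, 93)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so the bits dropped by the shifts never cause a mismatch. Note that Cg and Co can be negative and need one more bit than the input samples.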

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles
All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO)
Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-53-

VC Algorithm: Entropy Coding

CAVLC handles the zero and ±1 coefficients differently from the levels of the other coefficients. The total numbers of zeros and ±1s are coded. For the other coefficients, their levels are coded.

Encoding steps:
step 1: encode the total number of nonzero coefficients and of ±1 (trailing ones) values
step 2: encode the sign of each trailing one in reverse order
step 3: encode the levels of the remaining non-zero coefficients in reverse order
step 4: encode the total number of zeros before the last coefficient
step 5: encode each run of zeros

H.264 maintains 11 different sets of codes (4 for # of coefficients and 7 for the actual coefficients)

These are adapted to the current stream or context (thus CAVLC)

-54-

VC Algorithm: Entropy Coding – Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 –1 0 0 … 0
order: 0 1 2 3 4 5 6 7 8 9 … 16

Step 1: encode the no. of nonzero total coefficients and of 1 or –1 values (trailing ones) from a look-up table
no. of nonzero total coefficients = 6 (order 0, 1, 2, 4, 5, 7)
no. of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order
– (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining non-zero coefficients in reverse order
c2 (order 2), c1, c0

Step 4: encode the total no. of zeros before the last coefficient
2 (order 3, 6)

Step 5: encode the run of zeros in reverse order
1 (order 6-5), 0 (order 4), 1 (order 3-2)
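
The five steps can be traced with a small sketch that derives the values to be coded for one block of scanned coefficients. It does not apply the actual VLC look-up tables; the function name, the dictionary keys, and the concrete values chosen for c0, c1 and c2 (7, 4 and 2) are assumptions made only for illustration.

```python
def cavlc_symbols(coeffs):
    """Derive the quantities CAVLC codes for one scanned block."""
    nonzero = [(pos, c) for pos, c in enumerate(coeffs) if c != 0]
    reversed_levels = [c for _, c in reversed(nonzero)]

    # Step 1: count nonzero coefficients and up to three trailing +/-1 values.
    trailing_ones = 0
    for level in reversed_levels:
        if abs(level) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Step 2: signs of the trailing ones, highest scan position first.
    signs = ["-" if c < 0 else "+" for c in reversed_levels[:trailing_ones]]
    # Step 3: remaining levels, also in reverse scan order.
    levels = reversed_levels[trailing_ones:]

    # Step 4: zeros located before the last nonzero coefficient.
    last_pos = nonzero[-1][0] if nonzero else -1
    total_zeros = last_pos + 1 - len(nonzero)

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    runs, zeros_left = [], total_zeros
    positions = [pos for pos, _ in nonzero]
    for k in range(len(positions) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = positions[k] - positions[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {
        "total_coeffs": len(nonzero),
        "trailing_ones": trailing_ones,
        "signs": signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


if __name__ == "__main__":
    block = [7, 4, 2, 0, 1, 1, 0, -1] + [0] * 8
    print(cavlc_symbols(block))
```

For the block in the example this yields 6 coefficients, 3 trailing ones with signs –, +, +, the levels c2, c1, c0, a total of 2 zeros, and the runs 1, 0, 1.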

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
step 1: context modeling. Choose a suitable model.
step 2: binarization. If a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins.
step 3: binary arithmetic coding, using probability estimates provided by context modeling.

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice – Generalized bidirectional prediction

Multiple reference pictures mode:
Two forward references are proper for a region just before a scene change
Two backward references are proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing three prediction cases: 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice – Direct mode

Forward/backward pair of bi-directional prediction
The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 Reference, Current Picture and List 1 Reference, showing the direct-mode partition, the co-located partition, the temporal distances tb and td, and the motion vectors mvCol, mvL0 and mvL1]

$mvL0 = \dfrac{tb}{td}\, mvCol, \qquad mvL1 = -\dfrac{td - tb}{td}\, mvCol$

where mvCol is the MV used in the co-located MB of the subsequent picture
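
A small worked sketch of this scaling follows. It assumes tb and td are temporal distances in picture units and uses exact rational arithmetic; the standard itself specifies a fixed-point procedure, which is not reproduced here. The function name is illustrative.

```python
from fractions import Fraction


def direct_mode_mvs(mv_col, tb, td):
    """Scale the co-located MV into a forward (L0) and backward (L1) MV.

    mvL0 = (tb / td) * mvCol
    mvL1 = -((td - tb) / td) * mvCol, which equals mvL0 - mvCol
    """
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)
    mv_l1 = tuple(l0 - c for l0, c in zip(mv_l0, mv_col))
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references.
    print(direct_mode_mvs((8, -4), tb=1, td=2))
```

Since both vectors are derived from mvCol, no motion vector data has to be transmitted for a direct-mode partition.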

-60-

VC Algorithm: B Slice – Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, e.g., 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where $w_1$ and $w_2$ are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
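
The two weighting types can be illustrated with a simplified sketch. The implicit rule used here (linear interpolation by temporal distance, so the nearer reference gets the larger weight) is an assumption standing in for the fixed-point derivation of the standard, and all names are hypothetical. Here tb is taken as the distance from r1 to the current picture and td as the distance from r1 to r2.

```python
def weighted_prediction(r1, r2, w1, w2):
    """Compute p = w1 * r1 + w2 * r2 sample by sample."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumed rule)."""
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    # Fade towards black: the later reference r2 is darker than r1.
    r1 = [100, 80]
    r2 = [60, 40]
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_prediction(r1, r2, w1, w2))
    # Explicit weights would instead be read from the slice header.
    print(weighted_prediction(r1, r2, 0.5, 0.5))
```

With plain averaging the prediction would be too dark for a picture close to r1; distance-based weights follow the fade more closely.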

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
SP slice: a specially coded slice for efficient switching between video streams, similar to the coding of a P slice
SI slice: a switched slice similar to the coding of an I slice

Allow bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and the switching picture S(3) between them]

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and the header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter coded MBs

5. Place each partition (A, B & C) in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, kept in sync by a reliable parameter set exchange, followed by VCL data transfer with PS 3. Parameter Set 3: video format NTSC, motion resolution ¼, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
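
The dispersed allocation can be illustrated with a checkerboard pattern, one possible way of spreading two slice groups over the picture. The sketch below is only a model of the idea: the function names are hypothetical, and it checks which of the four direct neighbors of a lost macroblock survive when one slice group is lost.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def surviving_neighbours(groups, x, y, lost_group):
    """List the 4-connected neighbors of MB (x, y) not in the lost group."""
    rows, cols = len(groups), len(groups[0])
    result = []
    for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != lost_group:
            result.append((nx, ny))
    return result


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    # Slice group 1 is lost: MB (1, 0) still has three received neighbors.
    print(surviving_neighbours(groups, 1, 0, lost_group=1))
```

With this pattern every lost macroblock is surrounded by received macroblocks, which is the situation the concealment step relies on.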

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed
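
The decoder behaviour described above amounts to a simple selection rule, sketched below. The dictionary layout (`region`, `redundant`, `qp`) is a hypothetical model of received slices, not a structure defined by the standard.

```python
def choose_representation(received):
    """Pick one slice per picture region: primary if present, else redundant."""
    chosen = {}
    for s in received:
        current = chosen.get(s["region"])
        # Keep the first slice seen, but let a primary replace a redundant one.
        if current is None or (current["redundant"] and not s["redundant"]):
            chosen[s["region"]] = s
    return chosen


if __name__ == "__main__":
    received = [
        {"region": 0, "redundant": True, "qp": 40},
        {"region": 0, "redundant": False, "qp": 24},
        # The primary slice of region 1 was lost in transmission.
        {"region": 1, "redundant": True, "qp": 40},
    ]
    for region, s in sorted(choose_representation(received).items()):
        print(region, "redundant" if s["redundant"] else "primary", s["qp"])
```

Region 0 is decoded from its primary slice at good quality, while region 1 falls back to the coarser redundant slice instead of being concealed.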

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Foreman | >1x, 2x, 2x, T | 2x, >2x, T, T
Paris | >1x, 2x, 2x, 2x | 2x, T, 2x, T
Head | >2x 2x 2x T T
Zoom | >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Football | 2x, 1x, 2x, 2x | >1x, >1x, 1x, >1x
Mobile | 2x, 1x, 2x, 2x | >2x, 4x, >2x, T
Husky | 2x 2x >1x 2x 2x 2x
Tempete | 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6
Football | >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x
Mobile | 4x, 2.7x, 2x, T, T | >4x, >2.7x, >2x, T, T
Husky | >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x
Tempete | T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew | 1.7x, 2x, T | 1.7x, 2x, T
Harbour | T 3.3x T T T 1.7x T T

1080 (30i)
Stockholm Pan | 1x 2x
New Mobile & Calendar | T 2x T T 2x T

1080 (25p)
River Bed | >1.7x, >1x, T | >1.7x, >1x, T
Vintage Car | 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP – High Latency Profile
ASP – Advanced Simple Profile
H.26L – H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC – Conversational High Compression
SP – Simple Profile
ASP – Advanced Simple Profile
H.26L – H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products

Related companies:
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
Extension for various applications
H. Schwartz, D. Marpe and T. Wiegand, "SNR–scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology – Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology – Coding of Audio-Visual Objects – Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., “Windows Media Video 9: Overview and applications,” Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, “Video compression – from concepts to H.264/AVC standard,” Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, “The H.264/MPEG-4 AVC video coding standard,” short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, “Performance comparison of the emerging H.264 video coding standard with the existing standards,” IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwartz, D. Marpe, and T. Wiegand, “SNR-scalable extension of H.264/AVC,” ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions,” SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., “Video coding with H.264/AVC: Tools, performance and complexity,” IEEE CAS Magazine, pp. 7-34, first quarter 2004.
[63] W. Gao et al., “AVS – The Chinese next-generation video coding standard,” NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., “Overview of error resiliency schemes in H.264/AVC standard,” JVCIR Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight’s codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be of interest to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm (High Profile is now officially mandatory for HD DVD Video, DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

FRExt features:
4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[Use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
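
The logarithmic step-size control can be sketched as follows. This is a minimal illustration, not text from the slides: it assumes the commonly tabulated base step sizes for QP 0 to 5 and the rule that the step size doubles for every increase of 6 in QP (roughly 12% per QP step). The names `QSTEP_BASE` and `qstep` are hypothetical.

```python
# Assumed base quantizer step sizes for QP = 0..5 (not stated in the slides).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantizer step size for a QP in 0..51.

    The step size doubles each time QP increases by 6, which gives the
    logarithmic relation between QP and step size.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 4, 10, 28, 51):
        print(qp, qstep(qp))
```

Because the mapping is exponential, a fixed change in QP scales the step size by a fixed ratio regardless of the starting QP.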

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
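
As a rough illustration of the first three modes, the sketch below builds a 16x16 predictor from the row of pels above the MB and the column of pels to its left. It is a simplified sketch under assumptions: both neighbors are taken to be available, the DC rounding (adding 16 before dividing by 32) is an assumption, and planar prediction is omitted. The function name `predict_16x16` is hypothetical.

```python
def predict_16x16(mode, above, left):
    """Build a 16x16 intra predictor.

    above: the 16 reconstructed pels just above the MB (left to right)
    left:  the 16 reconstructed pels just left of the MB (top to bottom)
    Returns a list of 16 rows, each a list of 16 predicted pels.
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("need 16 neighboring pels on each side")
    if mode == "vertical":
        # Every row repeats the pels above the MB.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every pel in row y repeats the pel to the left of that row.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of all 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    above = list(range(16))
    left = [100] * 16
    print(predict_16x16("dc", above, left)[0][:4])
```

An encoder would typically evaluate each candidate mode, compare the predictor with the actual MB, and signal the mode with the smallest cost.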

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

FRExt Only

Two scans, similar to those of the 4x4 transform, switched for frame/field coding
Coefficient scanning is ordered by decreasing variance, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$
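
The construction above can be sketched in a few lines. This is an illustrative sketch only: it returns the codeword as a string of '0' and '1' characters rather than packed bits, and the function name `exp_golomb_encode` is hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword for code_num as a bit string.

    Layout: M zeros, then a single 1, then the M-bit INFO field.
    """
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Small values of code_num get short codewords, so the mapping from syntax element values to code_num is normally chosen so that frequent values receive small numbers.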

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO – 1
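
The three decoding steps can be sketched as below. This is a minimal sketch that assumes the bitstream is given as a string of '0' and '1' characters and is well formed; the function name `exp_golomb_decode` is hypothetical.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos) so that codewords can be read back to back.
    """
    # Step 1: count the M leading zeros, then skip the 1 that follows them.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m


if __name__ == "__main__":
    stream = "1" + "010" + "00100"  # codewords for 0, 1 and 3
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the number of leading zeros announces the codeword length, no separator is needed between consecutive codewords.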

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt High Profile is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High Profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB with Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB with Y 16x16, Cb 16x16, Cr 16x16]

-119-

RGB to YCbCr conversion

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With K_R = 0.2126 and K_B = 0.0722 (K_R + K_B + K_G = 1):

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B – Y), Cr = 0.6350 (R – Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
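
A small numerical sketch of this conversion follows. It assumes R, G and B are normalized to the range 0 to 1 and uses floating point throughout, so it shows the formulas rather than any fixed-point implementation. The function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) for the given K_R and K_B."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: luma 1, both chroma values 0
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))  # pure blue: Cb reaches its maximum
    # The BT.601 constants can be passed in place of the defaults.
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))
```

The divisors 2(1 – K_B) and 2(1 – K_R) scale each chroma component into the range –0.5 to 0.5; storing the results as integers is what introduces the rounding error discussed for this color space.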

-120-

Rounding error in RGB to YCbCr conversion

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R – B
t = B + (Co >> 1)
Cg = G – t
Y = t + (Cg >> 1)

Inverse color space conversion:

t = Y – (Cg >> 1)
G = t + Cg
B = t – (Co >> 1)
R = B + Co

where t is an intermediate temporary variable and “>>” denotes an arithmetic right shift operation
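
The forward and inverse conversions can be checked for exact reversibility with a short sketch. This assumes integer samples and relies on Python's `>>` being an arithmetic (floor) shift for negative integers; the function names are hypothetical.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using the lifting steps listed on this slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (12, 200, 99), (0, 0, 0), (255, 255, 255)]:
        ycgco = rgb_to_ycgco(*rgb)
        print(rgb, "->", ycgco, "->", ycgco_to_rgb(*ycgco))
```

Each lifting step is undone exactly by its mirror image, so no rounding error accumulates; the cost is that Cg and Co can be negative and need one more bit than the input samples.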

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A ‘higher’ profile supports all capabilities of the lower ones, and is also capable of decoding all bitstreams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H264AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H264AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content,” JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects makes the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile: Summary

High Profile contains:
Main profile
Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
Encoder-specified perceptual-based quantization scaling matrices
Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
HD Film: 12%
HD Video (progressive): 12%
HD Video (interlace): 4% (only 2 test clips)
SD Video (interlace): 6%

Complexity impact:
Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
No increase in computational requirements
Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-54-

VC Algorithm: Entropy Coding – Example of CAVLC

coeff: c0 c1 c2 0 1 1 0 –1 0 0 … 0
order: 0 1 2 3 4 5 6 7 8 9 … 15

Step 1: encode the number of nonzero coefficients and the number of trailing ones (1 or –1) from a look-up table
  number of nonzero coefficients = 6 (order 0, 1, 2, 4, 5, 7); number of trailing ones = 3 (order 4, 5, 7)

Step 2: encode the sign of each trailing one in reverse order: – (order 7), + (order 5), + (order 4)

Step 3: encode the level of the remaining nonzero coefficients in reverse order: c2 (order 2), c1, c0

Step 4: encode the total number of zeros before the last coefficient: 2 (order 3, 6)

Step 5: encode the runs of zeros in reverse order: 1 (order 6-5), 0 (order 4), 1 (order 3-2)
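
The five quantities in this example can be derived mechanically from the scanned coefficients. The sketch below only extracts the syntax element values; it does not perform the table look-ups that turn them into bits. The concrete values chosen for c0, c1 and c2 (3, –2, 4) and the function name `cavlc_elements` are assumptions made for illustration.

```python
def cavlc_elements(coeffs):
    """Derive the CAVLC syntax element values for one scanned block."""
    nonzero = [(i, c) for i, c in enumerate(coeffs) if c != 0]
    rev = nonzero[::-1]  # CAVLC works from the highest frequency backwards

    # Step 1: number of nonzero coefficients and trailing +/-1s (at most 3).
    total_coeff = len(nonzero)
    trailing_ones = 0
    for _, c in rev:
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Step 2: signs of the trailing ones, in reverse order.
    signs = ["-" if c < 0 else "+" for _, c in rev[:trailing_ones]]
    # Step 3: levels of the remaining nonzero coefficients, in reverse order.
    levels = [c for _, c in rev[trailing_ones:]]
    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = (nonzero[-1][0] + 1 - total_coeff) if nonzero else 0

    # Step 5: run of zeros before each coefficient, in reverse order,
    # stopping once every zero has been accounted for.
    run_before = []
    zeros_left = total_zeros
    for k in range(len(rev) - 1):
        if zeros_left == 0:
            break
        run = rev[k][0] - rev[k + 1][0] - 1
        run_before.append(run)
        zeros_left -= run
    return total_coeff, trailing_ones, signs, levels, total_zeros, run_before


if __name__ == "__main__":
    block = [3, -2, 4, 0, 1, 1, 0, -1] + [0] * 8  # c0, c1, c2 are assumed values
    print(cavlc_elements(block))
```

Running it on the block above gives 6 nonzero coefficients, 3 trailing ones with signs –, +, +, levels c2, c1, c0, 2 total zeros and runs 1, 0, 1, matching the steps listed in the example.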

-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in order to achieve good compression, the probability model for each symbol element is also updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:

Step 1, context modeling: choose a suitable model

Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions called bins

Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling

-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice – Generalized bidirectional prediction

Multiple reference pictures mode:
Two forward references are suited to a region just before a scene change
Two backward references are suited to a region just after a scene change

[Figure: previous pictures, current picture and next pictures, with prediction by 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice – Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, showing the direct-mode partition, the co-located partition, the vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$$mv_{L0} = \frac{t_b}{t_d}\, mv_{Col}, \qquad mv_{L1} = -\frac{t_d - t_b}{t_d}\, mv_{Col}$$

where mvCol is the MV used in the co-located MB of the subsequent picture
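
The scaling of mvCol by the temporal distances can be sketched as follows. This is a floating-point illustration of the two formulas only; a real codec would use fixed-point arithmetic with rounding, which is not described on the slide. The function name `temporal_direct_mvs` is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col: (x, y) motion vector of the co-located partition
    tb:     temporal distance from the current picture to the List 0 reference
    td:     temporal distance between the List 1 and List 0 references
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    x, y = mv_col
    mv_l0 = (tb * x / td, tb * y / td)
    mv_l1 = (-(td - tb) * x / td, -(td - tb) * y / td)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture midway between its two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

Note that mvL0 minus mvL1 always equals mvCol, so the two derived vectors lie along the motion trajectory of the co-located partition and no motion vector has to be transmitted.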

-60-

VC Algorithm: B Slice – Weighted prediction

Different weights are applied to the reference signals for gradual transitions from scene to scene, i.e. ‘fade to black’ (the luma samples of the scene gradually approach zero) and ‘fade from black’

The weighted prediction method differs for a macroblock of a P slice and of a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors
Implicit type: the factors are calculated based on the temporal distance between the pictures
Explicit type: the factors are transmitted in the slice header
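
A simplified sketch of the weighted combination is given below. The implicit weights here are a plain ratio of temporal distances, chosen so that the nearer reference gets the larger weight; this is an assumption for illustration and not the exact fixed-point derivation used by the standard. The function names and the 8-bit clipping range are also assumptions.

```python
def implicit_weights(tb, td):
    """Assumed implicit weights from temporal distances.

    tb: distance from the current picture to the first reference (r1)
    td: distance between the two references
    """
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_bipred(r1, r2, w1, w2, max_val=255):
    """Compute p = w1*r1 + w2*r2 sample by sample, rounded and clipped."""
    return [min(max_val, max(0, round(w1 * a + w2 * b))) for a, b in zip(r1, r2)]


if __name__ == "__main__":
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2)
    print(weighted_bipred([100, 200], [60, 40], w1, w2))
```

With explicit weighting, the encoder would instead choose w1 and w2 (for example to track a fade) and signal them, and the same combination step would apply.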

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), connected by the switching picture S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter coded MBs

5. Each partition A, B & C is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, synchronized through a reliable parameter set exchange; VCL data is transferred with a reference to parameter set 3. Example contents of parameter set 3: video format NTSC, motion resolution ¼, entropy coding CABAC, frame width 11]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
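
The dispersed two-group allocation described above can be pictured as a checkerboard. The sketch below assumes that particular pattern (one possible dispersed mapping, chosen here for simplicity) and lists, for a given macroblock, the horizontally and vertically adjacent macroblocks that fall in the other slice group. Function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbours_in_other_group(groups, x, y):
    """Return the 4-connected neighbours of MB (x, y) that belong to the other group."""
    height, width = len(groups), len(groups[0])
    result = []
    for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != groups[y][x]:
            result.append((nx, ny))
    return result


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 3)
    for row in groups:
        print(row)
    # If the group containing MB (1, 1) is lost, these neighbours survive.
    print(neighbours_in_other_group(groups, 1, 1))
```

With this pattern, losing one slice group still leaves every interior macroblock with four received neighbors from which a concealment algorithm can interpolate.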

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter ‘T’ indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Foreman: > 1x 2x 2x T 2x > 2x T T

Paris: > 1x 2x 2x 2x 2x T 2x T

Head: > 2x 2x 2x T T

Zoom: > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF: 24 48 96 192; Bitrate [kbps] for CIF: 96 192 384 768

Football: 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile: 2x 1x 2x 2x > 2x 4x > 2x T

Husky: 2x 2x > 1x 2x 2x 2x

Tempete: 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6; Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6

Football: > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky: > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20; Bitrate [Mbps] for MPEG-2 TM5: 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: > 1.7x > 1x T > 1.7x > 1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the ‘Tempete’ CIF 15 Hz sequence for the video streaming application

HLP – High Latency Profile
ASP – Advanced Simple Profile
H.26L – H.264 Main Profile
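
For reference, PSNR between an original and a reconstructed picture is usually computed from the mean squared error as sketched below. The sketch assumes 8-bit samples (peak value 255) given as flat lists of equal length; the function name `psnr` is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Return the PSNR in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    orig = [100, 100, 100, 100]
    recon = [101, 99, 101, 99]
    print(round(psnr(orig, recon), 2))
```

In rate-distortion comparisons such as this one, PSNR is plotted against bitrate for each codec, and the bitrate saving is read off at equal PSNR.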

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the ‘Paris’ CIF 15 Hz sequence for the video conferencing application

CHC – Conversational High Compression
SP – Simple Profile
ASP – Advanced Simple Profile
H.26L – H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards (Feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10)

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible (up to 16 MVs per MB)

Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued) (Feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10)

Pel accuracy: Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft’s VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing:
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions for various applications:
- FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for next-generation optical discs
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

Final stages of approval:
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open FTP and web sites: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de


-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions
- 4:2:2 and 4:4:4 formats
- Bit depths greater than 8
- (8x8) integer DCT
- HVS weighting matrices
- Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates
- [Use RGB color format] Y Cg Co
- High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
  o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
  o Weighted prediction
  o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
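The logarithmic step-size control can be illustrated with a short sketch. It assumes the commonly cited H.264 relationship for 8-bit video, in which QP runs from 0 to 51 and the step size doubles for every increase of 6 in QP; the base table and the name `qstep` are illustrative choices rather than part of the slides.

```python
# Step sizes for QP = 0..5 (assumed base values); every further +6 in QP doubles them.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Return the quantizer step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    # qp % 6 selects the base value, qp // 6 the number of doublings.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the step size grows exponentially with QP, a fixed change in QP corresponds to a roughly constant relative change in bit rate over the whole range.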

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

A slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
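The first three modes above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the neighboring pels are passed in as plain lists, availability checks for missing neighbors are omitted, and the planar mode is left out because the slides do not give its equation.

```python
def intra16_vertical(top):
    """Vertical mode: every row repeats the row of pels just above the MB."""
    n = len(top)
    return [list(top) for _ in range(n)]


def intra16_horizontal(left):
    """Horizontal mode: every column repeats the column of pels just left of the MB."""
    n = len(left)
    return [[left[y]] * n for y in range(n)]


def intra16_dc(top, left):
    """DC mode: all pels take the rounded average of the top and left neighbors."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)  # adding n rounds to nearest
    return [[dc] * n for _ in range(n)]


if __name__ == "__main__":
    top = list(range(16))
    left = [100] * 16
    print(intra16_vertical(top)[15][:4])
    print(intra16_horizontal(left)[3][:4])
    print(intra16_dc([10] * 16, [20] * 16)[0][0])
```

An encoder would typically evaluate each candidate prediction against the original block and keep the mode with the lowest cost; only the residual and the mode index then need to be coded.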

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrices can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:
- 4x4 Intra Y; 4x4 Intra Cb, Cr
- 4x4 Inter Y; 4x4 Inter Cb, Cr
- 8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on the decreasing variances, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients - these codes are actually fixed, and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$
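The construction and the decoding steps can be checked with a small sketch. It works on bit strings of '0' and '1' characters for readability rather than on packed bytes, and the function names are hypothetical; only the unsigned code_num mapping described on these slides is covered.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Encode a non-negative code_num as [M zeros][1][INFO]."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":               # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                       # skip the terminating 1
    info = int(bits[start:start + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m     # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Small values get short codewords (code_num 0 costs a single bit), which suits syntax elements whose small values are the most probable.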

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}$, $C_r = \frac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
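As a quick numerical check of the formulas on this slide, the sketch below evaluates them for floating-point R, G, B values, assumed here to be normalized to the range 0..1 (an assumption of this example, as is the function name). The default constants are the KR and KB values quoted above.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) using luma weights KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that Cb lies in -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that Cr lies in -0.5..0.5
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y near 1, chroma near 0
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb at its maximum
    print(1.0 / (2.0 * (1.0 - 0.0722)), 1.0 / (2.0 * (1.0 - 0.2126)))
```

The last line prints the two chroma scale factors, about 0.5389 and 0.6350. Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide.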

-120-

Rounding error in RGB to Y Cb Cr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
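The forward and inverse conversions can be exercised directly to confirm that the round trip is exact for integer samples. The sketch below follows the slide's steps; the function names are hypothetical, and Python's `>>` on integers behaves as an arithmetic (sign-preserving) shift, which is what the slide requires.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using only integer adds, subtracts and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so no information is lost even though the shifts discard a bit. Cg and Co take signed values and need one more bit than the input samples.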

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than $\sqrt{8 \times (\text{total number of pixels per frame})}$

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

• Deblocking filter
  - Only the control of the filter is adjusted: do not filter 4x4 blocks
  - No change to the filter operation itself

• CABAC
  - 61 new contexts and corresponding initialization values
  - No change to the CABAC engine

• Signaling
  - 8x8 transform on/off flag at PPS level
  - 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD film: 12%
- HD video (progressive): 12%
- HD video (interlace): 4% (only 2 test clips)
- SD video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-55-

VC Algorithm: Entropy Coding

CABAC uses arithmetic coding; in addition, to achieve good compression, the probability model for each symbol element is updated. Both MVs and residual transform coefficients are coded by CABAC.

Encoding steps:
- Step 1, context modeling: choose a suitable model
- Step 2, binarization: if a symbol is non-binary valued, it is mapped into a sequence of binary decisions, called bins
- Step 3, binary arithmetic coding: uses the probability estimates provided by context modeling
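The interplay of binarization and an adaptive probability model can be illustrated with a toy sketch. It is not the CABAC engine: unary binarization is just one possible scheme, the model is a simple frequency counter rather than the standard's state machine, and the coder is replaced by the ideal code length of -log2(p) bits per bin. All names are hypothetical.

```python
import math


def unary_binarize(value: int):
    """Map a non-negative integer to bins: value ones followed by a zero."""
    return [1] * value + [0]


class AdaptiveBinModel:
    """Toy context model: estimates bin probabilities from running counts."""

    def __init__(self):
        self.counts = [1, 1]            # start with no preference for 0 or 1

    def probability(self, bin_value: int) -> float:
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value: int) -> None:
        self.counts[bin_value] += 1


def ideal_cost_bits(bins, model: AdaptiveBinModel) -> float:
    """Sum the ideal arithmetic-coding cost of the bins, adapting as we go."""
    total = 0.0
    for b in bins:
        total += -math.log2(model.probability(b))
        model.update(b)
    return total


if __name__ == "__main__":
    print(unary_binarize(3))
    print(ideal_cost_bits([0] * 32, AdaptiveBinModel()))
```

A run of 32 identical bins costs about 5 bits here instead of 32, which illustrates why updating the probability model improves compression when bin statistics are skewed.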

-56-

CABAC increases compression efficiency by 10% over CAVLC, but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, Generalized Bidirectional Prediction

Multiple reference pictures mode:
- Two forward references: proper for a region just before a scene change
- Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture and next pictures, showing 2 forward MVs, 2 backward MVs, and 1 forward MV + 1 backward MV]

-59-

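VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, showing the co-located partition with motion vector mvCol, the direct-mode partition with mvL0 and mvL1, and the temporal distances tb and td]

$mvL0 = \frac{tb}{td} \, mvCol$, $mvL1 = -\frac{td - tb}{td} \, mvCol$

where mvCol is the MV used in the co-located MB of the subsequent picture, tb is the temporal distance between the List 0 reference and the current picture, and td is the temporal distance between the List 0 and List 1 references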
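The scaling of mvCol can be tried out numerically. The sketch below applies the two formulas to an (x, y) motion vector using ordinary floating-point division; the standard itself specifies integer arithmetic with rounding, which this illustration does not reproduce, and the function name is hypothetical.

```python
def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol."""
    if td == 0:
        raise ValueError("td must be non-zero")
    mvx, mvy = mv_col
    mv_l0 = (tb * mvx / td, tb * mvy / td)                    # forward part
    mv_l1 = (-(td - tb) * mvx / td, -(td - tb) * mvy / td)    # backward part
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between its two references.
    print(temporal_direct_mvs((8, -4), tb=1, td=2))
    # Current picture one third of the way from List 0 to List 1.
    print(temporal_direct_mvs((9, 3), tb=1, td=3))
```

In both cases mvL0 minus mvL1 equals mvCol, so the two derived vectors lie on the motion trajectory described by the co-located vector; no motion vector data has to be transmitted for the direct-mode partition.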

-60-

VC Algorithm: B Slice, Weighted Prediction

- Different weights are applied to the reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) or 'fade from black'
- A different weighted prediction method is used for a macroblock of a P slice or a B slice
- A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

  $p = w_1 r_1 + w_2 r_2$

  where $w_1$ and $w_2$ are weighting factors
  o Implicit type: the factors are calculated based on the temporal distance between the pictures
  o Explicit type: the factors are transmitted in the slice header
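A small sketch shows how the weighted combination behaves. The explicit case simply takes the two weights as given. For the implicit case, the sketch assumes a plain linear interpolation by temporal distance (the weight of r2 is tb/td); that is a simplification of the standard's fixed-point derivation, and all names here are hypothetical.

```python
import math


def weighted_bipred(r1, r2, w1, w2, max_val=255):
    """Return p = w1*r1 + w2*r2 per sample, rounded and clipped to 0..max_val."""
    out = []
    for a, b in zip(r1, r2):
        p = math.floor(w1 * a + w2 * b + 0.5)     # round to nearest integer
        out.append(min(max(p, 0), max_val))
    return out


def implicit_weights(tb, td):
    """Assumed implicit rule: the reference closer in time gets more weight."""
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    # Plain averaging of two reference blocks.
    print(weighted_bipred([100, 200], [50, 100], 0.5, 0.5))
    # Fade: the current picture is a quarter of the way from r1 to r2.
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_bipred([100], [20], w1, w2))
```

During a fade, unequal weights let the prediction track the brightness change, so the residual stays small even though neither reference matches the current picture on its own.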

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and a switching picture S(3) connecting them]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Place each partition (A, B & C) in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter Setting

- The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture
- The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice

[Figure: H.264 encoder and H.264 decoder each holding parameter sets 1, 2 and 3, linked by a reliable parameter set exchange and by a VCL data transfer that refers to parameter set 3. Parameter Set 3: video format NTSC; motion resolution 1/4; Enc CABAC; frame width 11]

-65-

Error Resilience: FMO

- Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
- Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
- ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
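The two-slice-group example can be made concrete with a checkerboard-style assignment. This is only one illustrative way of dispersing macroblocks (the slides do not specify the exact map), and the function names are hypothetical. The sketch counts, for each macroblock, how many of its four direct neighbors belong to the other slice group and would therefore survive the loss of its own group.

```python
def checkerboard_slice_groups(mb_width: int, mb_height: int):
    """Assign macroblocks alternately to slice group 0 and slice group 1."""
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]


def neighbours_in_other_group(groups, x: int, y: int) -> int:
    """Count left/right/top/bottom neighbors that are in the other slice group."""
    height, width = len(groups), len(groups[0])
    own = groups[y][x]
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != own:
            count += 1
    return count


if __name__ == "__main__":
    groups = checkerboard_slice_groups(11, 9)      # QCIF is 11 x 9 macroblocks
    print(groups[0])
    print(min(neighbours_in_other_group(groups, x, y)
              for y in range(9) for x in range(11)))
```

With this map every macroblock, including those in the corners, has at least two correctly received neighbors when one slice group is lost, which gives spatial error concealment something to interpolate from.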

-66-

Error Resilience: Redundant Slice

- Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream
- For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits)
- A decoder reacts to redundant slices by reconstructing only the primary slice, if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768. Entries are listed in column order; rows with fewer entries have empty cells.

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T
Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: >2x, 2x, 2x, T, T
Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF: 24, 48, 96, 192; bitrate [kbps] for CIF: 96, 192, 384, 768. Entries are listed in source order; some rows have empty or merged cells.

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x
Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T
Husky: 2x, 2x, >1x, 2x, 2x, 2x
Tempete: 2x, 2x, >2x, T, 2x, 2x, T, 2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6. Entries are listed in source order; some rows have empty or merged cells.

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T
Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: >1.7x, >1x, T, >1.7x, >1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP - High Latency Profile; ASP - Advanced Simple Profile; H.26L - H.264 Main Profile
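The objective tests report PSNR between original and reconstructed pictures. As a reminder of what that number measures, here is a minimal Python sketch. It assumes 8-bit samples (peak value 255) held in flat lists; the function name `psnr` and the sample values are illustrative and are not taken from the test report.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Return the PSNR in dB between two equal-length sequences of samples.

    `peak` is the maximum sample value (255 is assumed for 8-bit video).
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two pictures.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # Every sample is off by one, so the MSE is 1.
    print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))
```

A higher PSNR at the same bitrate, or the same PSNR at a lower bitrate, is what the bitrate saving figures summarize.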

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC - Conversational High Compression; SP - Simple Profile; ASP - Advanced Simple Profile; H.26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs - HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format]: YCgCo
High resolution

-92-

♦ Inter temporal prediction - block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
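The first three full-MB modes can be sketched in a few lines of Python. This is an illustrative sketch rather than the normative process: it assumes both the row above and the column to the left are available, it omits the planar mode, and the function name `predict_full_mb` and the string mode names are hypothetical.

```python
def predict_full_mb(mode, above, left):
    """Build a size x size prediction block from neighboring pels.

    above: the `size` reconstructed pels in the row just above the MB
    left:  the `size` reconstructed pels in the column just left of the MB
    Returns a list of rows. Both neighbors are assumed to be available.
    """
    size = len(above)
    if len(left) != size:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        # Every row repeats the row above the MB.
        return [list(above) for _ in range(size)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * size for y in range(size)]
    if mode == "dc":
        # Rounded average of all 2 * size neighboring pels.
        dc = (sum(above) + sum(left) + size) // (2 * size)
        return [[dc] * size for _ in range(size)]
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    above = [10] * 16
    left = [20] * 16
    print(predict_full_mb("dc", above, left)[0][:4])
```

The encoder would subtract the chosen prediction from the actual MB and code only the residual.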

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): vertical, horizontal, DC, planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

The matrix can be transmitted in the SPS and PPS
Separate matrices for the 4x4 and 8x8 transforms
Separate matrices for inter and intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning is ordered by decreasing variance, to maximize the number of zero-valued coefficients along the scan

Figure: frame (zig-zag) scan and field scan

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = Floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M
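The construction can be checked with a short Python sketch. It returns each codeword as a string of '0' and '1' characters for readability; a real encoder would pack bits instead. The function name `exp_golomb_encode` is illustrative.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a bit string.

    Layout: [M zeros][1][INFO], where M = floor(log2(code_num + 1))
    and INFO = code_num + 1 - 2**M is written with exactly M bits.
    """
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword comes out (2M + 1) bits long, and code_num 0 maps to the single bit "1", matching the description above.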

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
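The three decoding steps translate directly into Python. This sketch assumes the bitstream is given as a string of '0' and '1' characters; the names `exp_golomb_decode` and `decode_all` are illustrative.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from `bits` starting at index `pos`.

    Returns (code_num, next_pos).
    """
    # Step 1: count M leading zeros; the loop stops on the '1' marker.
    m = 0
    while pos + m < len(bits) and bits[pos + m] == "0":
        m += 1
    start = pos + m + 1                      # first bit of INFO
    if start + m > len(bits):
        raise ValueError("truncated codeword")
    # Step 2: read the M-bit INFO field (empty when M == 0).
    info = int(bits[start:start + m], 2) if m > 0 else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, start + m


def decode_all(bits):
    """Decode a concatenation of codewords into a list of code_num values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    print(decode_all("1" + "010" + "011" + "00100"))
```

Because the number of leading zeros announces the length of INFO, the decoder needs no lookup table.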

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity - even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y is 16 wide, Cb and Cr are each 8 wide, all 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.

-119-

RGB to YCbCr

Y = KR R + (1 - KR - KB) G + KB B

$C_b = \frac{B - Y}{2(1 - K_B)}$

$C_r = \frac{R - Y}{2(1 - K_R)}$

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
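The conversion can be written out as a small Python sketch. It assumes R, G and B are normalized to the range 0 to 1 and uses the KR and KB values quoted on this slide as defaults; offsets and scaling to integer code values are omitted. The function name `rgb_to_ycbcr` is illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr.

    Y lies in 0..1; Cb and Cr lie in -0.5..0.5.
    Passing kr=0.299, kb=0.114 gives the BT.601 variant instead.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))    # about 0.5389 * (B - Y) by default
    cr = (r - y) / (2.0 * (1.0 - kr))    # about 0.6350 * (R - Y) by default
    return y, cb, cr


if __name__ == "__main__":
    for name, rgb in [("white", (1, 1, 1)), ("red", (1, 0, 0)), ("blue", (0, 0, 1))]:
        print(name, rgb_to_ycbcr(*rgb))
```

The divisors 2(1 - KB) and 2(1 - KR) are what make pure blue map to Cb = 0.5 and pure red map to Cr = 0.5.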

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
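The forward and inverse steps form an exactly reversible integer transform, which can be confirmed with a short Python sketch. Python's `>>` on integers is an arithmetic (floor) shift, matching the definition of the shift used here. The function names are illustrative.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting transform; undoes rgb_to_ycgco exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly what the matching forward step added, so no rounding error accumulates; the cost is that Cg and Co need one more bit of range than the inputs.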

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
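Note 2 is easy to misread, so here is a hedged Python sketch of the check it describes. The pixel budget passed in is an assumed input (the level table itself is not reproduced in this transcript), and the function names are hypothetical.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(pixels per frame x 8)."""
    return math.sqrt(max_pixels_per_frame * 8)


def picture_fits(width, height, max_pixels_per_frame):
    """Check a picture size against the pixel budget and the aspect limit."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit
            and height <= limit)


if __name__ == "__main__":
    budget = 1920 * 1088            # assumed budget, for illustration only
    print(picture_fits(1920, 1080, budget))   # normal HD picture
    print(picture_fits(4096, 16, budget))     # few pixels, but too wide
```

The rule caps how elongated a picture may be: a very wide, very short picture can stay within the pixel budget and still be rejected.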

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent video quality (i.e., difficult to distinguish from the original video without compression) at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile: Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-56-

CABAC increases compression efficiency by 10% over CAVLC but is computationally more intensive

-57-

VC Algorithm: B Slice

Generalized bidirectional prediction: supports not only the forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode: derives the reference picture, block size and motion vector data from the subsequent inter picture

Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order)

-58-

VC Algorithm: B Slice, generalized bidirectional prediction

Multiple reference pictures mode
- Two forward references: suited to a region just before a scene change
- Two backward references: suited to a region just after a scene change

Figure: the current picture predicted from previous and next pictures using 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV

-59-

VC Algorithm: B Slice, direct mode

Forward/backward pair of bi-directional prediction

The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

Figure: List 0 reference, current picture and List 1 reference, showing the temporal distances tb and td, the direct-mode partition with its motion vectors mvL0 and mvL1, and the co-located partition with its motion vector mvCol

mvL0 = tb x mvCol / td

mvL1 = -(td - tb) x mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture
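The two scaling formulas can be tried out with a small Python sketch. It uses exact fractions to show the proportional split; the standard itself specifies a fixed-point integer approximation, which is not reproduced here. The function name `temporal_direct_mvs` and the (x, y) tuple layout are assumptions for illustration.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col: (x, y) motion vector of the co-located partition
    tb: temporal distance from the List 0 reference to the current picture
    td: temporal distance from the List 0 reference to the List 1 reference
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    scale_l0 = Fraction(tb, td)
    scale_l1 = -Fraction(td - tb, td)
    mv_l0 = tuple(scale_l0 * c for c in mv_col)
    mv_l1 = tuple(scale_l1 * c for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
    print([float(c) for c in mv_l0], [float(c) for c in mv_l1])
```

Since tb/td + (td - tb)/td = 1, the difference mvL0 - mvL1 always equals mvCol: the co-located motion is split between the two references in proportion to temporal distance, and no motion vector needs to be transmitted.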

-60-

VC Algorithm: B Slice, weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

p = w1 x r1 + w2 x r2

where w1 and w2 are weighting factors
- Implicit type: the factors are calculated based on the temporal distance between the pictures
- Explicit type: the factors are transmitted in the slice header
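A minimal Python sketch of p = w1 x r1 + w2 x r2 follows. The explicit case just applies the given weights. The `implicit_weights` helper is a simplified stand-in that interpolates linearly by temporal distance; it is an assumption for illustration and not the fixed-point derivation used by the standard. All names are hypothetical.

```python
import math


def weighted_biprediction(r1, r2, w1, w2, max_value=255):
    """Combine two reference sample lists as p = w1*r1 + w2*r2.

    Results are rounded to the nearest integer and clipped to 0..max_value.
    """
    out = []
    for a, b in zip(r1, r2):
        p = math.floor(w1 * a + w2 * b + 0.5)
        out.append(min(max_value, max(0, p)))
    return out


def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption).

    tb: distance from the first reference to the current picture
    td: distance between the two references
    The nearer reference receives the larger weight.
    """
    w2 = tb / td
    return 1.0 - w2, w2


if __name__ == "__main__":
    r1, r2 = [100, 200, 50], [50, 0, 250]
    print(weighted_biprediction(r1, r2, 0.5, 0.5))
    w1, w2 = implicit_weights(tb=1, td=4)
    print(w1, w2, weighted_biprediction(r1, r2, w1, w2))
```

For a fade to black, an explicit encoder could send weights that sum to less than 1, so the prediction darkens along with the scene instead of leaving the whole brightness change in the residual.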

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
- SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
- SI slice: a switched slice coded similarly to an I slice

Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and the switching picture S(3) linking the two streams

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience
- Parameter setting
- Flexible macroblock ordering (FMO)
- Redundant slice methods
- Switched slice (SP/SI)
- Data partitioning
- Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter coded MBs

5. Place each partition (A, B and C) in a separate NAL unit and transport them separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures.

A picture parameter set contains all information related to all the slices belonging to a single picture.

The encoder chooses the appropriate picture parameter set by referencing its storage location in the slice header of each coded slice.

[Figure: an H.264 encoder and an H.264 decoder each store parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange. Parameter set 3 holds: video format NTSC, motion resolution 1/4, encoding CABAC, frame width 11. VCL data is then transferred with a reference to parameter set 3.]
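A minimal sketch of the referencing mechanism follows. `ParameterSetStore` is a hypothetical class and the stored fields simply mirror the figure; only the idea that a slice header carries an identifier into previously exchanged storage comes from the slide.

```python
class ParameterSetStore:
    """Hypothetical decoder-side storage for parameter sets."""

    def __init__(self):
        self._sets = {}

    def store(self, ps_id, params):
        # Parameter sets arrive ahead of the slices, over a reliable channel.
        self._sets[ps_id] = dict(params)

    def lookup(self, ps_id):
        if ps_id not in self._sets:
            raise KeyError(f"parameter set {ps_id} has not been received")
        return self._sets[ps_id]


decoder_store = ParameterSetStore()
decoder_store.store(3, {"video_format": "NTSC",
                        "motion_resolution": 0.25,
                        "entropy_coding": "CABAC"})

# Each slice header only names the set it uses.
slice_header = {"pic_parameter_set_id": 3}
active = decoder_store.lookup(slice_header["pic_parameter_set_id"])
```

Since the slice carries only a small identifier, losing a slice packet never loses the parameters themselves.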

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
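The two-slice-group FMO example can be illustrated with a checkerboard allocation. The checkerboard pattern and both function names are assumptions chosen for the sketch; the slide only states that the groups are dispersed through the picture.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbors(group_map, x, y, lost_group):
    """Return the 4-connected neighbors of (x, y) that were not lost."""
    height, width = len(group_map), len(group_map[0])
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height
            and group_map[ny][nx] != lost_group]


group_map = checkerboard_map(4, 3)
# Macroblock (1, 2) belongs to slice group 1; suppose that group is lost.
neighbors = surviving_neighbors(group_map, 1, 2, lost_group=1)
```

Every neighbor inside the picture belongs to the surviving group, which is what gives the concealment step data to interpolate from.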

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
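The decoder behaviour described above amounts to a simple selection rule. The dictionary layout and the `select_slices` name are hypothetical; the sketch only shows the "primary if available, otherwise redundant" decision.

```python
def select_slices(received):
    """Pick one slice per picture region from the slices that arrived.

    received: list of dicts with keys 'region', 'redundant' and 'payload'.
    """
    chosen = {}
    for s in received:
        region = s["region"]
        if not s["redundant"]:
            chosen[region] = s            # a primary slice always wins
        elif region not in chosen:
            chosen[region] = s            # fallback until a primary arrives
    return chosen


received = [
    {"region": 0, "redundant": True, "payload": "coarse-0"},
    {"region": 0, "redundant": False, "payload": "fine-0"},
    {"region": 1, "redundant": True, "payload": "coarse-1"},  # primary lost
]
decoded = select_slices(received)
```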

-67-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence: Bitrate [kbps] for QCIF (24, 48, 96, 192); Bitrate [kbps] for CIF (96, 192, 384, 768)

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence: Bitrate [kbps] for QCIF (24, 48, 96, 192); Bitrate [kbps] for CIF (96, 192, 384, 768)

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence: Bitrate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6); Bitrate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6)

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence: Bitrate [Mbps] for MPEG-2 HiQ (6, 10, 20); Bitrate [Mbps] for MPEG-2 TM5 (6, 10, 20)

720 (60p)
Crew: 1.7x 2x T 1.7x 2x T
Harbour: T 3.3x T T T 1.7x T T

1080 (30i)
Stockholm Pan: 1x 2x
New Mobile & Calendar: T 2x T T 2x T

1080 (25p)
River Bed: >1.7x >1x T >1.7x >1x T
Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile
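For readers who want to reproduce the objective measure, PSNR between an original and a reconstructed picture can be computed as below. This is a generic sketch over flat lists of 8-bit samples; the `psnr` name and the list-based interface are illustrative choices, not part of the test procedure reported here.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf                      # identical pictures
    return 10 * math.log10(peak * peak / mse)


# An error of one grey level on every sample gives about 48.13 dB.
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```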

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extension for various applications
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology-Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology-Coding of Audio-Visual Objects-Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs: HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level.
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks.
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
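The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. The six base values are the widely quoted step sizes for QP 0 to 5 in 8-bit H.264 (see e.g. Richardson [8]); the `qstep` function name and the restriction to QP 0..51 are assumptions of this sketch.

```python
# Step sizes for QP = 0..5; the pattern repeats, doubling every 6 QP steps.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for an 8-bit H.264 QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


# Each increment of QP raises the step size by roughly 12%,
# so the step size doubles for every increase of 6 in QP.
steps = [qstep(qp) for qp in (4, 10, 16, 51)]
```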

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode.

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
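Three of the four full-MB modes above are simple enough to sketch directly. The function names are hypothetical, a 4x4 block is used only to keep the example short, and the planar mode is left out because its fitting equation is not given on the slide.

```python
def predict_vertical(top, size):
    """Every row repeats the row of pels just above the block."""
    return [list(top) for _ in range(size)]


def predict_horizontal(left, size):
    """Every row is filled with the pel just left of that row."""
    return [[left[row]] * size for row in range(size)]


def predict_dc(top, left, size):
    """All pels take the rounded average of the top and left neighbors."""
    dc = (sum(top) + sum(left) + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


top = [10, 20, 30, 40]     # reconstructed pels above the block
left = [50, 50, 50, 50]    # reconstructed pels left of the block
vertical = predict_vertical(top, 4)
horizontal = predict_horizontal(left, 4)
dc = predict_dc(top, left, 4)
```

An encoder would typically try each mode and keep the one whose prediction error is cheapest to code; that decision logic is outside this sketch.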

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- The matrix can be transmitted in the SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for inter and intra
- The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:
- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding.

Coefficient scanning follows decreasing coefficient variances, to maximize the number of consecutive zero-valued coefficients along the scan.

[Figure: frame zig-zag scan and field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type mb_type | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M
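The construction [M zeros][1][INFO] translates directly into code. The sketch below returns the codeword as a string of '0' and '1' characters for readability; the `exp_golomb_encode` name and the string representation are illustrative choices.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits          # [M zeros][1][INFO]


codewords = [exp_golomb_encode(n) for n in range(8)]
```

Every codeword has length 2M + 1, so small code numbers, which should be assigned to the most frequent values, get the shortest codes.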

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
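The three decoding steps can be sketched as a function that reads one codeword from a string of bits and reports where the next codeword starts. The `exp_golomb_decode` name and the bit-string input are assumptions of this sketch.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a '0'/'1' string.

    Returns (code_num, position of the next codeword).
    """
    m = 0
    while pos + m < len(bits) and bits[pos + m] == "0":
        m += 1                                 # step 1: count leading zeros
    info_start = pos + m + 1                   # skip the terminating '1'
    if pos + m >= len(bits) or info_start + m > len(bits):
        raise ValueError("truncated codeword")
    info = int(bits[info_start:info_start + m], 2) if m else 0   # step 2
    return (1 << m) + info - 1, info_start + m                   # step 3


# Three codewords back to back: "1", "010", "011".
stream = "1010011"
values, pos = [], 0
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    values.append(value)
```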

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players.

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory.

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB consists of a 16x16 Y block with 8x16 Cb and Cr blocks; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks.]

-119-

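RGB to YCbCr

\[ Y = K_R R + (1 - K_R - K_B)\, G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

With \(K_R = 0.2126\) and \(K_B = 0.0722\) (where \(K_R + K_B + K_G = 1\)):

\[ Y = 0.2126\, R + 0.7152\, G + 0.0722\, B \]

\[ C_b = 0.5389\, (B - Y), \qquad C_r = 0.6350\, (R - Y) \]

(ITU-R Rec. BT.601 defines \(K_B = 0.114\), \(K_R = 0.299\))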
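The conversion on this slide, with Cb = (B - Y) / (2(1 - KB)) and Cr = (R - Y) / (2(1 - KR)), can be checked numerically with a short sketch. The `rgb_to_ycbcr` function is a hypothetical helper that works on normalized floating-point values in [0, 1]; offsets and integer scaling used in real systems are deliberately left out.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr for the given luma coefficients."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)                     # no chroma expected
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)                      # Cb reaches +0.5
red_601 = rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)   # BT.601 constants
```

With the default constants the chroma scale factors evaluate to about 0.5389 for Cb and 0.6350 for Cr, so saturated blue and red map to chroma values of exactly 0.5.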

-120-

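Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right], \qquad C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right], \qquad C_o = \frac{R - B}{2} \]

To further avoid any rounding error, add only one bit of precision to the chroma samples.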

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
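The forward and inverse lifting steps on this slide can be checked for exact reversibility with integer arithmetic. The function names are illustrative; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation described.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion (integer in, integer out)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


sample = rgb_to_ycgco(255, 0, 128)
restored = ycgco_to_rgb(*sample)
```

Cg and Co can be negative and span one more bit than the input samples, which is the single extra bit of chroma precision mentioned for YCgCo.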

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-57-

VC Algorithm: B Slice

Generalized bidirectional prediction
• Supports not only a forward/backward prediction pair but also forward/forward and backward/backward pairs

Direct mode
• Derives reference picture, block size, and motion vector data from the subsequent inter picture

Weighted prediction
• Scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice

Pictures coded using B slices can be used as references for decoding of subsequent pictures in decoding order (with an arbitrary relationship to such pictures in display order).

-58-

VC Algorithm: B Slice

Generalized bidirectional prediction: multiple reference pictures mode
• Two forward references: suited to a region just before a scene change
• Two backward references: suited to a region just after a scene change

[Figure: the current picture between previous pictures and next pictures, predicted with 2 forward MVs, 2 backward MVs, or 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice

Direct mode
• Forward/backward pair of bidirectional prediction
• The prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: temporal direct mode. Labels: List 0 Reference, Current Picture, List 1 Reference, direct-mode partition, co-located partition, mvCol, mvL0, mvL1, tb, td]

$mvL0 = \frac{tb}{td}\, mvCol$
$mvL1 = -\frac{td - tb}{td}\, mvCol$

where mvCol is the MV used in the co-located MB of the subsequent picture.
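The scaling in the two direct-mode formulas can be tried out with a small sketch. It assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference (the slide only labels them in the figure). The function name is hypothetical, and exact fractions are used instead of the integer arithmetic a real codec would use.

```python
from fractions import Fraction

def temporal_direct_mvs(mv_col, tb, td):
    """Scale the co-located motion vector into the two direct-mode vectors.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances (assumed meaning, see the text above)
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    scale = Fraction(tb, td)
    # mvL0 = tb * mvCol / td
    mv_l0 = tuple(scale * c for c in mv_col)
    # mvL1 = -(td - tb) * mvCol / td, which is the same as mvL0 - mvCol
    mv_l1 = tuple(a - c for a, c in zip(mv_l0, mv_col))
    return mv_l0, mv_l1

# A B picture halfway between its two references
print(temporal_direct_mvs((8, -4), tb=1, td=2))
```

With tb = 1 and td = 2 the co-located vector is split evenly: half of it points forward and the negated half points backward.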

-60-

VC Algorithm: B Slice

Weighted prediction
• Different weights of reference signals for gradual transitions from scene to scene, e.g. 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
• Different weighted prediction methods for a macroblock of a P slice or a B slice
• A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors.
• Implicit type: the factors are calculated based on the temporal distance between the pictures
• Explicit type: the factors are transmitted in the slice header
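A minimal sketch of the weighted bi-prediction formula p = w1 r1 + w2 r2 follows. The explicit case simply takes the two weights as inputs. The implicit case is illustrated with a simplified rule in which the nearer reference gets the larger weight; the function names, the floating-point weights, and the tb/td parameters are assumptions for illustration, not the integer procedure of the standard.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Return p = w1*r1 + w2*r2 per sample, rounded and clipped to 8 bits."""
    return [min(255, max(0, round(w1 * a + w2 * b))) for a, b in zip(r1, r2)]

def implicit_weights(tb, td):
    """Simplified implicit weights from temporal distances (assumption).

    tb: distance from the first reference to the current picture
    td: distance between the two references
    """
    w2 = tb / td
    return 1.0 - w2, w2

# Explicit weights, as if read from a slice header
print(weighted_bipred([100, 200], [50, 0], 0.5, 0.5))

# Implicit weights: the current picture is close to the first reference
w1, w2 = implicit_weights(tb=1, td=4)
print(w1, w2, weighted_bipred([80], [40], w1, w2))
```

During a fade, weights that differ from 0.5 let the prediction follow the brightness ramp instead of averaging it away.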

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slice
• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
• SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A (P(11) P(12) P(13) P(14) P(15)) and Bitstream B (P(21) P(22) P(23) P(24) P(25)), linked by the switching slice S(3)]

Allows bitstream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing.

-62-

Error Resilience
• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slice (SP/SI)
• Data partitioning
• Arbitrary slice order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C.
2. A has the slice header and header data for each MB in the slice.
3. B has coded residual data for intra and SI slice MBs.
4. C has coded residual data for inter-coded MBs.
5. Each partition (A, B, and C) is placed in a separate NAL unit and transported separately.

-64-

Error Resilience: Parameter Setting
• The sequence parameter set contains all information related to a sequence of pictures
• A picture parameter set contains all information related to all the slices belonging to a single picture
• The encoder chooses the appropriate picture parameter set by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2, 3 after a reliable parameter set exchange; Parameter Set 3 (Video format: NTSC, Motion resolution: 1/4, Enc: CABAC, Frame width: 11); VCL data transfer with PS 3]

-65-

Error Resilience: FMO
• Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
• Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
• If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
• ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data
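The dispersed allocation described for FMO can be illustrated with a checkerboard map of two slice groups. This is only a sketch: the checkerboard pattern and the helper names are assumptions chosen for illustration, and the slide does not say which dispersed pattern is used.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]

def neighbours_in_other_group(groups, x, y):
    """Count the 4-connected neighbours of (x, y) that lie in the other slice group."""
    rows, cols = len(groups), len(groups[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows and groups[ny][nx] != groups[y][x]:
            count += 1
    return count

groups = checkerboard_slice_groups(4, 3)
for row in groups:
    print(row)

# If slice group 1 is lost, every lost macroblock still has received neighbours
lost = [(x, y) for y in range(3) for x in range(4) if groups[y][x] == 1]
print([neighbours_in_other_group(groups, x, y) for x, y in lost])
```

With this pattern every macroblock, including those at the picture corners, keeps at least two neighbours from the other slice group, which is what error concealment relies on.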

-66-

Error Resilience: Redundant Slice
• Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bitstream
• For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but using fewer bits)
• A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. If the primary slice is missing, the redundant slice can be reconstructed instead
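The decoder behaviour for redundant slices amounts to a simple fallback rule, sketched below. The function name and the use of `None` for a lost slice are assumptions made for illustration.

```python
def slice_to_reconstruct(primary, redundant_copies):
    """Pick the slice a decoder would reconstruct.

    primary: the primary coded slice, or None if it was lost
    redundant_copies: redundant representations, each possibly None if lost
    """
    if primary is not None:
        # Primary slice arrived: redundant copies are discarded
        return primary
    for candidate in redundant_copies:
        if candidate is not None:
            # Primary slice missing: fall back to the first redundant copy received
            return candidate
    # Nothing arrived; error concealment would have to take over
    return None

print(slice_to_reconstruct("primary(QP=26)", ["redundant(QP=40)"]))
print(slice_to_reconstruct(None, ["redundant(QP=40)"]))
```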

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768
Foreman: > 1x  2x  2x  T  2x  > 2x  T  T
Paris: > 1x  2x  2x  2x  2x  T  2x  T
Head: > 2x  2x  2x  T  T
Zoom: > 1x  1x  2x  2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24 48 96 192 | Bitrate [kbps] for CIF: 96 192 384 768
Football: 2x  1x  2x  2x  > 1x  > 1x  1x  > 1x
Mobile: 2x  1x  2x  2x  > 2x  4x  > 2x  T
Husky: 2x  2x  > 1x  2x  2x  2x
Tempete: 2x  2x  > 2x  T  2x  2x  T2x  T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD)
• When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases
• When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5 2.25 3 4 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5 2.25 3 4 6
Football: > 1.5x  > 1.3x  1.3x  1.5x  2x  1.8x  1.3x  1.5x
Mobile: 4x  2.7x  2x  T  T  > 4x  > 2.7x  > 2x  T  T
Husky: > 1.5x  1.3x  1x  1.3x  1.5x  2.7x  2x  1.8x  2x  > 1.5x
Tempete: T  2x  T  T  T  T  T  4x  T  T  T  T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)
• When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases
• When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6 10 20 | Bitrate [Mbps] for MPEG-2 TM5: 6 10 20
720 (60p)
Crew: 1.7x  2x  T  1.7x  2x  T
Harbour: T  3.3x  T  T  T  1.7x  T  T
1080 (30i)
Stockholm Pan: 1x  2x
New Mobile & Calendar: T  2x  T  T  2x  T
1080 (25p)
River Bed: > 1.7x  > 1x  T  > 1.7x  > 1x  T
Vintage Car: 1.7x  T  2x  T  1.7x  T  2x  T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
• MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
• MPEG LA website: http://www.mpegla.com
• Via Licensing: http://www.vialicensing.com

FRExtensions
• Extensions to 4:2:2 and 4:4:4 chroma formats
• 12-bit resolution for medical imaging
• Scalable coding
• Lossless coding for digital cinema applications
• High-fidelity coding for next-generation optical discs
• Extension for various applications
• H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

Final stages of approval
• Standard systems and file format support specifications
• Standardizing reference software implementation
• Standardizing conformance bitstreams and specifications

-79-

Contacts for Further Information

JVT documents and software on open FTP website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
• Chair: Gary Sullivan (garysull@microsoft.com)
• Co-chair: Ajay Luthra (aluthra@motorola.com)
• Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology – Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology – Coding of Audio-Visual Objects – Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com; www.thomson.net
[42] www.conexant.com (H.264 decoder ICs for HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression – from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, first quarter 2004.
[63] W. Gao et al., "AVS – The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom, and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6, and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

• 4:2:2, 4:4:4 formats
• > 8-bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates [use RGB color format], Y Cg Co
• High resolution

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
  o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
  o Weighted prediction
  o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
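The logarithmic step-size control can be sketched as follows. The sketch assumes the commonly tabulated H.264 relationship in which the quantization step size doubles for every increase of 6 in the quantization parameter (roughly 12% per step); the base table and the function name are assumptions brought in for illustration and do not appear on the slide.

```python
# Step sizes for QP 0..5 as commonly tabulated for H.264 (assumed here)
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep(qp):
    """Return the quantization step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    # The step size doubles each time QP increases by 6
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

Because the mapping is exponential in QP, a fixed change in QP produces the same relative change in step size at both low and high bit rates.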

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

(16x16) luma full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
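Three of the four full-macroblock modes are simple enough to sketch directly. The block below is an illustrative sketch on an N x N block with the reconstructed row above and column to the left given as lists; the function name and the rounding of the DC average are assumptions, and the planar mode is left out.

```python
def intra_predict(above, left, mode):
    """Build an N x N prediction block from neighbouring pels.

    above: the N pels just above the block
    left:  the N pels just left of the block
    mode:  "vertical", "horizontal", or "dc"
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        # Every row repeats the pels above the block
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of all neighbouring pels
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)

above = [10, 20, 30, 40]
left = [50, 50, 50, 50]
for mode in ("vertical", "horizontal", "dc"):
    print(mode, intra_predict(above, left, mode)[0])
```

An encoder would try each mode, compare the prediction with the actual block, and keep the mode with the smallest residual cost.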

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

(8x8) luma intra prediction (FRExt only)

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for inter and intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
• Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:
• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y

-108-

8x8 coefficient scanning (FRExt only)

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows the decreasing variances of the coefficients, so as to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture-, and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(code\_num + 1) \rfloor$

$INFO = code\_num + 1 - 2^M$
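The codeword construction above can be sketched in a few lines. The function name is hypothetical, and the codeword is returned as a string of '0' and '1' characters purely for readability.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    m = value.bit_length() - 1          # M = floor(log2(code_num + 1))
    info = value - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits

for code_num in range(7):
    print(code_num, exp_golomb_encode(code_num))
```

The printed codewords have lengths 1, 3, 3, 5, 5, 5, 5, matching the (2M + 1) rule.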

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $code\_num = 2^M + INFO - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.
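The three decoding steps for an Exp-Golomb codeword can be sketched as below. The function name and the representation of the bitstream as a string of '0' and '1' characters are assumptions made for illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the next codeword).
    """
    m = 0
    while bits[pos] == "0":             # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                            # skip the '1' that ends the prefix
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: 2^M + INFO - 1

# Three codewords back to back: code_num 0, 1, and 6
stream = "1" + "010" + "00111"
pos = 0
while pos < len(stream):
    code_num, pos = exp_golomb_decode(stream, pos)
    print(code_num)
```

Because the prefix length tells the decoder how many INFO bits follow, codewords can be concatenated without separators.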

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High Profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}$

$C_r = \frac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$, $K_B = 0.0722$, and $K_R + K_B + K_G = 1$:

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
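The general conversion formulas on this slide can be checked numerically with a small sketch. It assumes R, G, and B are normalized to the range 0..1 and uses the slide's K_R and K_B as defaults; the function name is hypothetical, and no offset or quantization to integer code values is applied.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # Cb = (B - Y) / (2 (1 - K_B))
    cr = (r - y) / (2.0 * (1.0 - kr))   # Cr = (R - Y) / (2 (1 - K_R))
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 1.0, 1.0))                          # white
print(rgb_to_ycbcr(0.0, 0.0, 1.0))                          # pure blue
print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))      # pure red, BT.601 weights
```

The normalization by 2(1 - K) keeps both chroma components within -0.5..+0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5 for any choice of weights.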

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.
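The forward and inverse conversions on this slide form an exactly reversible integer transform, which a short sketch can confirm. The function names are hypothetical; Python's `>>` on integers is an arithmetic right shift, which is the behaviour the slide asks for.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion with integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b

sample = (200, 17, 99)
coded = rgb_to_ycgco(*sample)
print(coded, ycgco_to_rgb(*coded))

# Round trip over a coarse grid of 8-bit RGB values
levels = range(0, 256, 15)
print(all(ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
          for r in levels for g in levels for b in levels))
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so no information is lost even though the shifts discard a bit. For 8-bit inputs, Cg and Co need one extra bit of range.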

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bitstreams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

Compressed Bit Rate Multipliers for FRExt Profiles

To meet more demanding high-fidelity applications.

Multipliers for the fourth column of the table on page 125.

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.


-58-

VC Algorithm: B Slice, Generalized Bidirectional Prediction

Multiple reference pictures mode:
• Two forward references: proper for a region just before a scene change
• Two backward references: proper for a region just after a scene change

[Figure: previous pictures, current picture, and next pictures, showing three prediction cases: 2 forward MVs; 2 backward MVs; 1 forward MV + 1 backward MV]

-59-

VC Algorithm: B Slice, Direct Mode

Forward/backward pair of bi-directional prediction. The prediction signal is calculated by a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures.

[Figure: List 0 reference, current picture, and List 1 reference; the co-located partition carries mvCol, the direct-mode partition carries mvL0 and mvL1; tb and td are the temporal distances marked between the pictures]

mvL0 = tb · mvCol / td

mvL1 = -(td - tb) · mvCol / td

where mvCol is the MV used in the co-located MB of the subsequent picture.
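The scaling of mvCol into the two direct-mode vectors can be illustrated with a small sketch. It assumes that tb is the temporal distance from the List 0 reference to the current picture and td the distance from the List 0 reference to the List 1 reference; the function name is hypothetical, and exact fractions are used in place of the fixed-point integer arithmetic a real codec would use.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector.

    mv_col: (x, y) motion vector of the co-located partition
    tb, td: temporal distances as marked on the slide (assumed meaning:
            tb = List 0 reference -> current picture,
            td = List 0 reference -> List 1 reference)
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    mv_l0 = tuple(Fraction(tb, td) * c for c in mv_col)
    mv_l1 = tuple(-Fraction(td - tb, td) * c for c in mv_col)
    return mv_l0, mv_l1


if __name__ == "__main__":
    # Current picture halfway between the two references.
    mv_l0, mv_l1 = temporal_direct_mvs((8, -4), tb=1, td=2)
    print(mv_l0, mv_l1)
```

Note that mvL1 always equals mvL0 minus mvCol, so the two vectors lie on the same motion trajectory as the co-located vector.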

-60-

VC Algorithm: B Slice, Weighted Prediction

Different weights of reference signals are used for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'.

A different weighted prediction method is used for a macroblock of a P slice or a B slice.

A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors.
• Implicit type: the factors are calculated based on the temporal distance between the pictures
• Explicit type: the factors are transmitted in the slice header
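As a rough illustration of p = w1 r1 + w2 r2, the sketch below forms a weighted bi-prediction over two lists of samples. The implicit weight rule shown (the temporally nearer reference gets the larger weight, in floating point) is a simplifying assumption made for this example; the standard derives implicit weights in fixed-point arithmetic, and all names here are hypothetical.

```python
def implicit_weights(tb, td):
    """Assumed implicit rule: weights proportional to temporal proximity.

    tb: distance from reference r1 to the current picture
    td: distance from reference r1 to reference r2
    """
    w2 = tb / td
    return 1.0 - w2, w2


def weighted_prediction(r1, r2, w1, w2, max_val=255):
    """p = w1*r1 + w2*r2 per sample, rounded and clipped to [0, max_val]."""
    return [
        min(max_val, max(0, round(w1 * a + w2 * b)))
        for a, b in zip(r1, r2)
    ]


if __name__ == "__main__":
    # A fade: r1 is brighter than r2, the current picture is closer to r1.
    w1, w2 = implicit_weights(tb=1, td=4)
    print(weighted_prediction([100, 120], [200, 40], w1, w2))
```

In the explicit type, w1 and w2 would instead be read from the slice header rather than computed from distances.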

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices:
• SP slice: the specially coded slice for efficient switching between video streams, similar to coding of a P slice
• SI slice: the switched slice, similar to coding of an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5), Bitstream B with pictures P(2,1) to P(2,5), and the switching slice S(3) connecting the two streams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse, and stream splicing.

-62-

Error Resilience
• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slice (SP/SI)
• Data partitioning
• Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B, and C.

2. A has the slice header and header data for each MB in the slice.

3. B has coded residual data for intra and SI slice MBs.

4. C has coded residual data for inter-coded MBs.

5. Place each partition A, B, and C in a separate NAL unit and transport them separately.

-64-

Error Resilience: Parameter Setting

• The sequence parameter set contains all information related to a sequence of pictures.
• A picture parameter set contains all information related to all the slices belonging to a single picture.
• The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2, and 3, synchronized through a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example contents of Parameter Set 3: video format NTSC; motion resolution 1/4; Enc: CABAC; frame width 11]
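The idea of referencing a stored parameter set from each slice header can be sketched as a simple lookup table. The class and the dictionary fields below are hypothetical and chosen only to mirror the figure; `pic_parameter_set_id` is used as the name of the referencing field in the slice header.

```python
class ParameterSetStore:
    """Minimal decoder-side store of parameter sets, keyed by identifier."""

    def __init__(self):
        self._sets = {}

    def store(self, ps_id, params):
        # Parameter sets arrive out of band (the "reliable exchange").
        self._sets[ps_id] = dict(params)

    def activate(self, slice_header):
        # Each coded slice only carries the identifier of the set to use.
        ps_id = slice_header["pic_parameter_set_id"]
        if ps_id not in self._sets:
            raise KeyError(f"parameter set {ps_id} was never received")
        return self._sets[ps_id]


if __name__ == "__main__":
    store = ParameterSetStore()
    store.store(3, {"video_format": "NTSC",
                    "motion_resolution": "1/4",
                    "entropy_coder": "CABAC"})
    active = store.activate({"pic_parameter_set_id": 3})
    print(active["entropy_coder"])
```

Because the bulky, critical header data is sent separately and reliably, a lost slice packet never takes the decoding parameters with it.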

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
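A small sketch can make the dispersed two-group case concrete. It assumes a checkerboard assignment of macroblocks to slice groups 0 and 1, and a deliberately naive concealment that replaces each lost macroblock by the mean of its received 4-neighbours; each macroblock is reduced to one representative value. Both the function names and the concealment rule are assumptions for illustration, since the standard does not specify concealment.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Assign macroblocks to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


def conceal_lost_group(mb_values, group_map, lost_group):
    """Replace every macroblock of the lost group by the mean of its
    horizontally and vertically adjacent macroblocks from the other group."""
    rows, cols = len(group_map), len(group_map[0])
    out = [row[:] for row in mb_values]
    for y in range(rows):
        for x in range(cols):
            if group_map[y][x] != lost_group:
                continue
            neighbours = [
                mb_values[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < rows and 0 <= nx < cols
                and group_map[ny][nx] != lost_group
            ]
            out[y][x] = sum(neighbours) / len(neighbours) if neighbours else None
    return out


if __name__ == "__main__":
    groups = checkerboard_slice_groups(4, 2)
    values = [[10, 0, 30, 0],
              [0, 20, 0, 40]]   # zeros stand in for the lost group 1 data
    print(conceal_lost_group(values, groups, lost_group=1))
```

With raster-order slices, a lost packet would instead remove a contiguous run of macroblocks, leaving far fewer intact neighbours to conceal from.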

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also utilizing fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
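The decoder behaviour described above amounts to a simple selection rule, sketched here. The dictionary layout with `slice_id`, `redundant`, and `payload` keys is a hypothetical stand-in for parsed slice data, not the actual bit stream syntax.

```python
def select_slices(received):
    """Pick one representation per slice: the primary if it arrived,
    otherwise the redundant copy."""
    chosen = {}
    for s in received:
        sid = s["slice_id"]
        if not s["redundant"]:
            chosen[sid] = s            # a primary slice always wins
        elif sid not in chosen:
            chosen[sid] = s            # fall back to the redundant copy
    return [chosen[sid] for sid in sorted(chosen)]


if __name__ == "__main__":
    received = [
        {"slice_id": 0, "redundant": False, "payload": "primary-0"},
        {"slice_id": 0, "redundant": True, "payload": "coarse-0"},
        # The primary copy of slice 1 was lost in transmission.
        {"slice_id": 1, "redundant": True, "payload": "coarse-1"},
    ]
    print([s["payload"] for s in select_slices(received)])
```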

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T

Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: >2x, 2x, 2x, T, T

Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x

Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T

Husky: 2x, 2x, >1x, 2x, 2x, 2x

Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T

Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: >1.7x, >1x, T, >1.7x, >1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application.

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
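For reference, the PSNR between an original and a reconstructed picture is computed from the mean squared error of their samples. The sketch below works on flat lists of 8-bit samples; the function name and the flat-list input are simplifications chosen for this example.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of samples (peak is 255 for 8-bit video)."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [100, 102, 98, 101]
    reconstructed = [101, 101, 99, 100]
    print(f"{psnr(original, reconstructed):.2f} dB")
```

R-D comparisons like the ones on these slides plot this value (usually for luma) against the bit rate of each codec.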

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application.

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products.

Related companies:
- UBVideo: http://www.ubvideo.com
- LSI Logic: http://www.lsilogic.com
- Microsoft: http://www.microsoft.com
- Envivio: http://www.envivio.com
- Broadcom: http://www.broadcom.com
- Nagravision: http://www.nagravision.com
- Philips: http://www.philips.com
- Polycom: http://www.polycom.com
- PixelTools Corporation: http://www.pixeltools.com
- Amphion: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation: http://www.ligos.com
- LifeSize: http://www.lifesize.com
- Netvideo: http://www.netvideo.com
- Motorola: http://www.motorola.com
- Vanguard Software Solutions: http://www.vsofts.com
- STMicroelectronics: http://us.st.com
- MainConcept: http://www.mainconcept.com
- Impact Labs Inc: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA: rao@uta.edu
- Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de


-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

• 4:2:2 and 4:4:4 formats
• > 8 bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• High resolution
• [use RGB color format] Y Cg Co

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
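The logarithmic relation between the quantization parameter and the step size can be sketched as follows. The six base step sizes and the QP range 0 to 51 are the values commonly quoted for 8-bit H.264 video; they are not given on these slides and are stated here as an assumption, as is the function name.

```python
# Base step sizes for QP 0..5 (commonly quoted values, assumed here).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size: doubles for every increase of 6 in QP,
    i.e. grows by roughly 12.5% per unit of QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Since six steps of about 12.5% compound to a factor of two, a single small table plus a shift covers the whole QP range.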

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools used, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks.

Apply the (4x4) or (8x8) IntDCT to intra prediction errors.

Note: the (8x8) IntDCT is FRExt-only.

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only).

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.
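The zig-zag scan mentioned on this slide can be generated rather than tabulated. The sketch below builds the classic zig-zag order for an n x n block by walking the anti-diagonals in alternating directions; for n = 4 it reproduces the familiar frame-mode order of a 4x4 block. The field (alternate) scan uses a different order and is not shown, and the function names are illustrative.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block of coefficients into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    # Label each position with its raster index to make the order visible.
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(block))
```

Scanning from low to high frequency in this way tends to group the zero-valued quantized coefficients at the end of the list, which the entropy coder can then represent compactly.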

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
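Three of the four full-MB modes are simple enough to sketch directly. The functions below take the row of pels above the block and the column of pels to its left, both assumed to be available, and build the predicted block; planar prediction is omitted because its exact equation is not given on the slide. The names and the `size` parameter (16 for a luma macroblock) are illustrative.

```python
def predict_vertical(above, size):
    """Each column is filled with the pel directly above the block."""
    return [list(above[:size]) for _ in range(size)]


def predict_horizontal(left, size):
    """Each row is filled with the pel directly left of the block."""
    return [[left[r]] * size for r in range(size)]


def predict_dc(above, left, size):
    """Every pel is the rounded average of the neighbouring pels
    above and to the left of the block."""
    dc = (sum(above[:size]) + sum(left[:size]) + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


if __name__ == "__main__":
    above = [10, 20, 30, 40]
    left = [1, 2, 3, 4]
    print(predict_vertical(above, 4)[0])
    print(predict_horizontal(left, 4)[2])
    print(predict_dc(above, left, 4)[0][0])
```

An encoder would typically try each mode, compute the prediction error against the actual block, and signal the mode giving the cheapest residual.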

-103-

Chroma spatial prediction (operates on the entire MB): vertical, horizontal, DC, and planar modes, similar to (16x16) luma MB prediction

• 4:2:0 (8x8)

• 4:2:2 (8x16)

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4.

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra
• Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level
• Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:
• 4x4 Intra Y; 4x4 Intra Cb, Cr
• 4x4 Inter Y; 4x4 Inter Cb, Cr
• 8x8 Intra Y; 8x8 Inter Y

-108-

Two scans (FRExt only), similar to those of the 4x4 transform, switched for frame/field coding: frame (zig-zag) and field.

Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan.

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture-, and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
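The construction and the three decoding steps can be sketched with bit strings. The function names are illustrative, and the codewords are represented as Python strings of '0' and '1' purely for readability; a real codec would operate on a bit buffer.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos].

    Returns (code_num, position of the next codeword)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count leading zeros
        m += 1
    start = pos + m + 1                        # skip the terminating 1
    info = int(bits[start:start + m], 2) if m else 0   # step 2: read INFO
    return (1 << m) + info - 1, start + m      # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Because the number of leading zeros announces the length of the rest of the codeword, consecutive codewords can be concatenated and still decoded without any separators.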

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]

-119-

RGB to YCbCr conversion

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}$, $C_r = \dfrac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
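The conversion on this slide can be sketched in a few lines. This is only an illustrative floating-point version, assuming R, G and B are normalized to the range 0 to 1; the function name `rgb_to_ycbcr` and its keyword defaults are hypothetical and not taken from the slides.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using the luma weights KR and KB.

    The defaults are the KR/KB values quoted on the slide; pass
    kr=0.299, kb=0.114 for the BT.601 weights mentioned there.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that Cb lies in [-0.5, 0.5]
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that Cr lies in [-0.5, 0.5]
    return y, cb, cr
```

Pure blue and pure red map to the extreme chroma values of 0.5, which is why the scale factors are 1 / (2(1 - KB)) and 1 / (2(1 - KR)).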

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo (Cg = green chroma, Co = orange chroma)

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for the input, the output and the stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
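The forward and inverse conversions on this slide use only additions and shifts, so they should invert each other exactly on integers. The sketch below checks that on a grid of 8-bit samples; the function names are hypothetical, and it relies on Python's `>>` being an arithmetic (floor) shift for negative integers.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion from the slide (integer inputs)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion from the slide; undoes rgb_to_ycgco exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


def roundtrip_is_lossless(step=17):
    """Check the round trip on a grid of 8-bit RGB triples."""
    values = range(0, 256, step)
    return all(
        ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
        for r in values for g in values for b in values
    )
```

Note that Co ranges from -255 to 255 for 8-bit input, which illustrates why the chroma samples need one extra bit of precision.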

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement in which a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower, nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
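Rule 2 above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the frame-size limit of 2,097,152 pixels used in the usage note is an assumed example value, not a figure taken from the level table on these slides.

```python
import math


def max_picture_dimension(max_pixels_per_frame):
    """Largest width or height allowed by rule 2: sqrt(pixels per frame x 8)."""
    return math.isqrt(max_pixels_per_frame * 8)


def fits_level(width, height, max_pixels_per_frame):
    """True if the picture respects both the area limit and the rule-2 side limit."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (
        width * height <= max_pixels_per_frame
        and width <= limit
        and height <= limit
    )
```

With an assumed limit of 2,097,152 pixels per frame, the side limit works out to 4096 pixels, so a 1920x1088 picture fits while an 8192x256 picture is rejected even though its area is within the limit.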

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) to (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-59-

VC Algorithm: B Slice, Direct mode

Forward/backward pair of bi-directional prediction: the prediction signal is calculated as a linear combination of two blocks that are determined by the forward and backward motion vectors pointing to two reference pictures

[Figure: List 0 reference, current picture and List 1 reference, showing the direct-mode partition, the co-located partition, the motion vectors mvCol, mvL0 and mvL1, and the temporal distances tb and td]

$mv_{L0} = \dfrac{t_b}{t_d}\, mv_{Col}$, $mv_{L1} = -\dfrac{t_d - t_b}{t_d}\, mv_{Col}$

where mvCol is the MV used in the co-located MB of the subsequent picture
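The two scaling formulas on this slide can be tried out numerically. The sketch below uses exact rational arithmetic for clarity, whereas a real codec would use fixed-point integer arithmetic; the function name is hypothetical, and it assumes tb is the temporal distance from the List 0 reference to the current picture and td the distance between the two reference pictures.

```python
from fractions import Fraction


def temporal_direct_mvs(mv_col, tb, td):
    """Derive (mvL0, mvL1) from the co-located motion vector mvCol.

    mv_col is an (x, y) pair; tb and td are temporal distances (td != 0).
    """
    scale = Fraction(tb, td)
    mv_l0 = tuple(scale * c for c in mv_col)          # mvL0 = tb * mvCol / td
    mv_l1 = tuple(-(1 - scale) * c for c in mv_col)   # mvL1 = -(td - tb) * mvCol / td
    return mv_l0, mv_l1
```

For a current picture midway between its references (tb = 1, td = 2), mvL0 and mvL1 are half of mvCol with opposite signs, and mvL0 - mvL1 always equals mvCol.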

-60-

VC Algorithm: B Slice, Weighted prediction

Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'

Different weighted prediction methods for a macroblock of a P slice or a B slice

A prediction signal p for a B slice is obtained by applying different weights to two reference signals r1 and r2:

p = w1 r1 + w2 r2

where w1 and w2 are weighting factors

Implicit type: the factors are calculated based on the temporal distance between the pictures

Explicit type: the factors are transmitted in the slice header
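A small sketch of the weighted combination p = w1 r1 + w2 r2 follows. The function names are hypothetical, and `implicit_weights` is a simplified floating-point assumption about how distance-based weights could be formed (the nearer reference gets the larger weight); it is not the normative integer derivation.

```python
def weighted_bipred(r1, r2, w1, w2):
    """Combine two reference sample lists: p = w1 * r1 + w2 * r2."""
    return [w1 * a + w2 * b for a, b in zip(r1, r2)]


def implicit_weights(tb, td):
    """Assumed implicit weights from temporal distances.

    tb: distance from the first reference to the current picture.
    td: distance between the two references.
    """
    w2 = tb / td
    return 1.0 - w2, w2
```

In the explicit type the pair (w1, w2) would simply be read from the slice header instead of being computed; equal weights of 0.5 reproduce plain averaging of the two references.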

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices

SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice

SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(11) to P(15), Bitstream B with pictures P(21) to P(25), and the switching slice S(3) between the two bitstreams]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice SP/SI (only in Extended Profile)
Data partitioning (only in Extended Profile)
Arbitrary Slice Order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2 and 3, synchronized by a reliable parameter set exchange. Example Parameter Set 3: Video format: NTSC; Motion resolution: ¼; Enc: CABAC; Frame width: 11. VCL data is transferred with PS 3]

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture

If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
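The two-slice-group example above can be illustrated with a checkerboard assignment, which is one possible dispersed pattern chosen here for illustration. The function names are hypothetical and the sketch does not reproduce the slice group map syntax of the standard.

```python
def dispersed_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(groups, x, y):
    """List the 4-connected neighbors of MB (x, y) that belong to the other group."""
    height, width = len(groups), len(groups[0])
    own = groups[y][x]
    candidates = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
    return [
        (nx, ny)
        for nx, ny in candidates
        if 0 <= nx < width and 0 <= ny < height and groups[ny][nx] != own
    ]
```

With this pattern every interior macroblock has all four of its direct neighbors in the other slice group, so if one group is lost, concealment can interpolate from received neighbors on every side.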

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Foreman >1x 2x 2x T 2x >2x T T

Paris >1x 2x 2x 2x 2x T 2x T

Head >2x 2x 2x T T

Zoom >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Football 2x 1x 2x 2x >1x >1x 1x >1x

Mobile 2x 1x 2x 2x >2x 4x >2x T

Husky 2x 2x >1x 2x 2x 2x

Tempete 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 | 1.5 2.25 3 4 6

Football >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

6 10 20 | 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed >1.7x >1x T >1.7x >1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies to replace or complement existing products

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

Extensions for various applications
FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High fidelity coding for the next generation optical discs
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups, JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
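The logarithmic relation between the quantization parameter and the step size can be sketched as below. This is an illustration under an assumption not given on the slide: the six base step sizes are the values commonly tabulated in the H.264 literature for QP 0 to 5, with the step size doubling for every increase of 6 in QP. The names `BASE_QSTEP` and `qstep` are hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (commonly tabulated values).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for QP in 0..51; doubles every 6 QP steps."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Because six QP steps double the step size, each single QP step increases it by roughly 12% on average.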

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of these tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
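The first three modes above are simple enough to sketch directly. The code below is a minimal illustration, assuming both the row of pels above and the column of pels to the left are available; the function name `intra_predict` is hypothetical, and the planar mode is deliberately left out.

```python
def intra_predict(mode, above, left):
    """Predict an NxN block from the N pels above and the N pels to the left.

    mode is "vertical", "horizontal" or "dc"; the result is a list of N rows.
    """
    n = len(above)
    if len(left) != n:
        raise ValueError("above and left must have the same length")
    if mode == "vertical":
        # Every row repeats the row of pels just above the block.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel just left of that row.
        return [[left[row]] * n for row in range(n)]
    if mode == "dc":
        # Rounded average of all 2N neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unsupported mode: {mode}")
```

For a full macroblock, `above` and `left` would each hold 16 luma pels; the encoder would then code only the difference between the actual block and the chosen prediction.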

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0: (8x8)
• 4:2:2: (8x16)
• 4:4:4: (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction (FRExt only)

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, reflecting visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding (FRExt only)

Coefficient scanning follows decreasing coefficient variances, so as to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type mb_type | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^{M} + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
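The codeword construction and the three decoding steps can be sketched in a few lines of Python. This is only an illustration of the [M zeros][1][INFO] rule for unsigned code_num values; the function names and the use of '0'/'1' strings in place of a real bit reader are assumptions made for readability, not part of the standard.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits         # [M zeros][1][INFO]


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":                  # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                                 # skip the terminating '1'
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m      # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Because every codeword is self-delimiting, several codewords can be concatenated and decoded one after another by passing the returned position back in as `pos`.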

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ Blu-ray Disc Association: FRExt is mandatory in the BD-ROM Video specification

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: in a 4:2:2 MB, Y is 16x16 and Cb and Cr are each 8 wide by 16 high; in a 4:4:4 MB, Y, Cb and Cr are each 16x16]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (so that $K_R + K_G + K_B = 1$ gives $K_G = 0.7152$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
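As a numerical check of the general formulas, the short Python sketch below derives the green weight and the two chroma scale factors from KR and KB, then converts one RGB sample. The function names are illustrative, and samples are assumed to be floating-point values in the range 0 to 1. For KR = 0.2126 and KB = 0.0722, the scale factors 1/(2(1 - KB)) and 1/(2(1 - KR)) evaluate to about 0.5389 and 0.6350.

```python
def ycbcr_coefficients(kr: float, kb: float):
    """Return (KG, Cb scale, Cr scale) for the given luma weights KR and KB."""
    kg = 1.0 - kr - kb                    # KR + KG + KB = 1
    cb_scale = 1.0 / (2.0 * (1.0 - kb))   # Cb = (B - Y) / (2 (1 - KB))
    cr_scale = 1.0 / (2.0 * (1.0 - kr))   # Cr = (R - Y) / (2 (1 - KR))
    return kg, cb_scale, cr_scale


def rgb_to_ycbcr(r: float, g: float, b: float, kr: float = 0.2126, kb: float = 0.0722):
    """Convert one RGB sample (floats in 0..1) to (Y, Cb, Cr)."""
    kg, cb_scale, cr_scale = ycbcr_coefficients(kr, kb)
    y = kr * r + kg * g + kb * b
    return y, cb_scale * (b - y), cr_scale * (r - y)


if __name__ == "__main__":
    kg, cb_scale, cr_scale = ycbcr_coefficients(0.2126, 0.0722)
    print(f"KG={kg:.4f}  Cb scale={cb_scale:.4f}  Cr scale={cr_scale:.4f}")
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))    # pure blue: Cb reaches its maximum of 0.5
```

Passing kr=0.299 and kb=0.114 gives the corresponding BT.601 factors from the same code.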

-120-

Rounding error in RGB to YCbCr conversion

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma

$Y = \tfrac{1}{2}\left[G + \tfrac{R + B}{2}\right]$

$C_g = \tfrac{1}{2}\left[G - \tfrac{R + B}{2}\right]$

$C_o = \tfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for the input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
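The forward and inverse conversions are built from integer additions and arithmetic shifts, so the inverse undoes the forward transform exactly. A minimal Python sketch of both directions follows; the function names are illustrative, and Python's `>>` on integers is an arithmetic (floor) shift, matching the operator described above.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward lifting-based conversion on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion; undoes rgb_to_ycgco step by step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(sample, "->", coded, "->", ycgco_to_rgb(*coded))
```

Note that Cg and Co can be negative and span roughly twice the range of the input samples, which is why one extra bit of precision is needed for the chroma components.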

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be larger, up to 16
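Rule 2 can be illustrated with a small Python check. The sketch assumes a level whose maximum picture size equals 720x576 pixels; that figure is chosen purely as an example input, and the function names are illustrative. It shows that a picture with an allowed pixel count can still be rejected when one side is too long.

```python
import math


def max_side_pixels(max_frame_pixels: int) -> float:
    """Upper bound on picture width or height: sqrt(8 x total pixels per frame)."""
    return math.sqrt(8 * max_frame_pixels)


def fits_level(width: int, height: int, max_frame_pixels: int) -> bool:
    """True if the picture respects both the pixel-count and the side-length limits."""
    bound = max_side_pixels(max_frame_pixels)
    return (width * height <= max_frame_pixels
            and width <= bound
            and height <= bound)


if __name__ == "__main__":
    limit = 720 * 576                       # example maximum picture size (assumption)
    print(round(max_side_pixels(limit), 1))
    print(fits_level(720, 576, limit))      # the typical picture size itself
    print(fits_level(1920, 216, limit))     # same pixel count, but too wide
```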

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

• DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
• ATSC has selected AC-3 plus from Dolby
• ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-60-

VC Algorithm: B Slice, Weighted prediction

• Different weights of reference signals for gradual transitions from scene to scene, i.e., 'fade to black' (the luma samples of the scene gradually approach zero) and 'fade from black'
• Different weighted prediction methods for a macroblock of a P slice or a B slice
• A prediction signal p for a B slice is obtained by applying different weights to two reference signals, r1 and r2:

$p = w_1 r_1 + w_2 r_2$

where w1 and w2 are weighting factors
• Implicit type: the factors are calculated based on the temporal distance between the pictures
• Explicit type: the factors are transmitted in the slice header
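The weighted sum p = w1 r1 + w2 r2 can be tried out with the Python sketch below. It is a simplification, not the normative process: the standard works with integer weights, offsets and rounding shifts, whereas this sketch uses floating-point weights on 8-bit samples. The implicit-mode rule shown here (weights inversely proportional to temporal distance) and all function names are assumptions made for illustration.

```python
import math


def weighted_bipred(r1, r2, w1: float, w2: float):
    """Sample-wise p = w1*r1 + w2*r2, rounded and clipped to the 8-bit range."""
    out = []
    for a, b in zip(r1, r2):
        p = math.floor(w1 * a + w2 * b + 0.5)
        out.append(min(255, max(0, p)))
    return out


def implicit_weights(t_cur: int, t_ref1: int, t_ref2: int):
    """Simplified implicit mode: the temporally closer reference gets the larger weight."""
    d1 = abs(t_cur - t_ref1)
    d2 = abs(t_cur - t_ref2)
    if d1 + d2 == 0:
        return 0.5, 0.5
    return d2 / (d1 + d2), d1 / (d1 + d2)


if __name__ == "__main__":
    # Fade to black: the second reference is already black, so the
    # prediction for the picture in between is a dimmed first reference.
    r1 = [100, 200, 50]
    r2 = [0, 0, 0]
    w1, w2 = implicit_weights(t_cur=1, t_ref1=0, t_ref2=2)
    print(w1, w2, weighted_bipred(r1, r2, w1, w2))
```

In explicit mode the same `weighted_bipred` call would simply be given the factors read from the slice header instead of the computed ones.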

-61-

VC Algorithm: SP and SI Slices (Extended profile only)

Switched slices
• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice
• SI slice: a switched slice coded similarly to an I slice

[Figure: Bitstream A with pictures P(1,1) to P(1,5) and Bitstream B with pictures P(2,1) to P(2,5), linked by switching slice S(3)]

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

• Parameter setting
• Flexible macroblock ordering (FMO)
• Redundant slice methods
• Switched slices (SP/SI): only in Extended profile
• Data partitioning: only in Extended profile
• Arbitrary slice order (ASO)

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B and C
2. A has the slice header and header data for each MB in the slice
3. B has coded residual data for intra and SI slice MBs
4. C has coded residual data for inter coded MBs
5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

• The sequence parameter set contains all information related to a sequence of pictures
• A picture parameter set contains all information related to all the slices belonging to a single picture
• The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and decoder each hold parameter sets 1, 2 and 3, delivered by reliable parameter set exchange. Example parameter set 3: video format NTSC, motion resolution 1/4, entropy coding CABAC, frame width 11. VCL data is then transferred with reference to PS 3]

-65-

Error Resilience: FMO

• Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order
• Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture
• If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group
• ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than in a single block of data
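The two-slice-group example can be made concrete with a checkerboard (dispersed) macroblock-to-slice-group map. The Python sketch below builds such a map for a QCIF picture (11x9 macroblocks) and counts, for each macroblock, how many of its four direct neighbors belong to the other slice group. The checkerboard rule and the function names are assumptions chosen for illustration; the standard allows other map types as well.

```python
def checkerboard_slice_groups(width_mbs: int, height_mbs: int):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def neighbors_in_other_group(mb_map, x: int, y: int) -> int:
    """Count the 4-connected neighbors of MB (x, y) that lie in the other slice group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != mb_map[y][x]:
            count += 1
    return count


if __name__ == "__main__":
    mb_map = checkerboard_slice_groups(11, 9)       # QCIF: 176x144 pixels
    for row in mb_map:
        print("".join(str(group) for group in row))
    worst = min(neighbors_in_other_group(mb_map, x, y)
                for y in range(9) for x in range(11))
    print("fewest surviving neighbors for any lost MB:", worst)
```

With this map, losing all of slice group 1 still leaves every lost macroblock with at least two correctly received neighbors (four in the picture interior) for concealment.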

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example the primary representation can be coded with a low quantization parameter (hence in good quality) whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality but also utilizing fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice However if the primary slice is missing the redundant slice can be reconstructed

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 | 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

6 10 20 | 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards. Comparison of standards:

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264 / MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | One | One | One | Multiple
Bidirectional prediction mode | Forward/backward | Forward/backward | Forward/backward | Forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being widely developed by several companies for replacing or complementing existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
• MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
• MPEG LA website: http://www.mpegla.com
• Via Licensing: http://www.vialicensing.com

FRExtensions
• Extension to 4:2:2 and 4:4:4 chroma formats
• 12-bit resolution for medical imaging
• Scalable coding
• Lossless coding for digital cinema applications
• High fidelity coding for the next generation of optical discs
• Extension for various applications
• H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
• Standard systems and file format support specifications
• Standardizing the reference software implementation
• Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open FTP website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
• Chair: Gary Sullivan (garysull@microsoft.com)
• Co-chair: Ajay Luthra (aluthra@motorola.com)
• Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm (High Profile is now officially mandatory for HD DVD Video, per the DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

• 4:2:2 and 4:4:4 formats
• > 8 bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform bypass: lossless mode uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [Use RGB color format] Y Cg Co
• High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of these tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and the corresponding chroma block size for full-MB prediction
• (8x8) luma prediction (FRExt-only)
• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB
• Horizontal: pels in the MB are predicted from the pels just to the left of the MB
• DC: pels in the MB are predicted as the average value of the neighboring pels
• Planar: assume the MB covers diagonally increasing luma values; the predictor is formed from the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only

HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS

Separate matrices for the 4x4 and 8x8 transforms

Separate matrices for Inter and Intra

The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded (parameter: description)

Sequence-, picture- and slice-layer syntax elements: headers and parameters

Macroblock type (mb_type): prediction method for each coded macroblock

Coded block pattern: indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: transmitted as a delta value from the previous value of QP

Reference frame index: identifies the reference frame(s) for inter prediction

Motion vector: transmitted as a difference (mvd) from the predicted motion vector

Residual data: coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zeros and no trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1

2. Read the M-bit INFO field

3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC codes transform coefficients; CABAC codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
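The codeword construction and the three decoding steps can be checked with a short sketch. It is a minimal illustration under the definitions M = floor(log2(code_num + 1)), INFO = code_num + 1 - 2^M and code_num = 2^M + INFO - 1; the function names are hypothetical, and bits are held in a text string for readability rather than in a packed bitstream.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num >= 0 as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits          # [M zeros][1][INFO]


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":               # step 1: count the M leading zeros
        m += 1
    pos += m + 1                              # skip the zeros and the marker 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: read the M-bit INFO
    return (1 << m) + info - 1, pos + m       # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
    stream = "".join(exp_golomb_encode(n) for n in (3, 0, 6))
    pos, decoded = 0, []
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        decoded.append(value)
    print(decoded)
```

Because each codeword announces its own length through the run of leading zeros, codewords can be concatenated without separators, as the small stream in the usage example shows.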

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB with a 16x16 Y block and 8x16 Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_G + K_B = 1$):

$$Y = 0.2126\, R + 0.7152\, G + 0.0722\, B$$

$$C_b = 0.5389\, (B - Y), \qquad C_r = 0.6350\, (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
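The numeric chroma scale factors follow directly from the luma weights. The sketch below assumes the usual definitions Cb = (B - Y) / (2(1 - KB)) and Cr = (R - Y) / (2(1 - KR)), with R, G, B normalized to the range 0 to 1; the function names are hypothetical and no quantization to integer sample values is modeled.

```python
def ycbcr_coefficients(kr, kb):
    """Return (kg, cb_scale, cr_scale) for the luma weights kr and kb."""
    kg = 1.0 - kr - kb                      # the three luma weights sum to 1
    cb_scale = 1.0 / (2.0 * (1.0 - kb))     # Cb = cb_scale * (B - Y)
    cr_scale = 1.0 / (2.0 * (1.0 - kr))     # Cr = cr_scale * (R - Y)
    return kg, cb_scale, cr_scale


def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB (0..1) to Y in 0..1 and Cb, Cr in -0.5..0.5."""
    kg, cb_scale, cr_scale = ycbcr_coefficients(kr, kb)
    y = kr * r + kg * g + kb * b
    return y, cb_scale * (b - y), cr_scale * (r - y)


if __name__ == "__main__":
    for name, (kr, kb) in {"KR=0.2126, KB=0.0722": (0.2126, 0.0722),
                           "KR=0.299,  KB=0.114 ": (0.299, 0.114)}.items():
        kg, cb, cr = ycbcr_coefficients(kr, kb)
        print(f"{name}: KG={kg:.4f} Cb scale={cb:.4f} Cr scale={cr:.4f}")
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))      # pure blue reaches Cb = 0.5
```

The scale factors are chosen so that pure blue and pure red map to the extreme chroma value 0.5, which is why they depend only on KB and KR respectively.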

-120-

Rounding error in RGB → YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma. To avoid any rounding error, add only one bit of precision to the chroma samples

$$Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right]$$

$$C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for input, output and stored reference pictures, and apply the forward and inverse color transformations inside the encoder and decoder to the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R − B

t = B + (Co >> 1)

Cg = G − t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y − (Cg >> 1)

G = t + Cg

B = t − (Co >> 1)

R = B + Co
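The forward and inverse conversions are exact inverses of each other in integer arithmetic, which a short round-trip check makes concrete. This is a sketch only; the function names are hypothetical. Python's `>>` on integers is an arithmetic (floor) shift, so it matches the shift used in the equations for negative values as well.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only integer adds and arithmetic shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes rgb_to_ycgco step by step, in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    converted = rgb_to_ycgco(*sample)
    print(sample, "->", converted, "->", ycgco_to_rgb(*converted))
```

Each inverse step recomputes the same shifted quantity that the forward step added or subtracted, so nothing is lost even though the shifts discard a bit. The price is that Cg and Co need one more bit of range than the inputs.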

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The maximum horizontal and vertical sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
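Note 2 is easy to evaluate numerically. The sketch below is an illustration only: the function names are hypothetical, the bound is rounded down to an integer as an assumption, and the frame-size limit of 2,097,152 pixels is an example value chosen here, not one read from the level table.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Upper bound on width or height: sqrt(8 * pixels per frame), rounded down."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_fits(width, height, max_pixels_per_frame):
    """Check a picture against both the area limit and the per-dimension limit."""
    bound = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= bound
            and height <= bound)


if __name__ == "__main__":
    limit = 2_097_152                             # example frame-size limit in pixels
    print(max_dimension(limit))                   # bound on each dimension
    print(picture_fits(1920, 1088, limit))        # ordinary HD-sized picture
    print(picture_fits(8192, 256, limit))         # same area limit, but far too wide
```

The second check shows why the rule exists: a picture can respect the total pixel count and still be rejected because it is too elongated.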

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) to (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter

• Only the control of the filter is adjusted: do not filter 4x4 blocks

• No change to the filter operation itself

♦ CABAC

• 61 new contexts and corresponding initialization values

• No change to the CABAC engine

♦ Signaling

• 8x8 transform on/off flag at PPS level

• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains

• Main profile

• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes

• Encoder-specified perceptual-based quantization scaling matrices

• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)

• HD film: 12%

• HD video (progressive): 12%

• HD video (interlace): 4% (only 2 test clips)

• SD video (interlace): 6%

Complexity impact

• Implementation beyond the Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding

• No increase in computational requirements

• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:

1. MPEG LA: www.mpegla.com

2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-61-

VC Algorithm SP and SI Slices (Extended profile only)

Switched slices

• SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice

• SI slice: a switched slice, coded similarly to an I slice

P(11) P(12) P(13) P(14) P(15)

P(21) P(22) P(23) P(24) P(25)

S(3)

Bitstream A

Bitstream B

Allows bit stream switching and additional functionalities such as random access, fast forward, reverse and stream splicing

-62-

Error Resilience

• Parameter setting

• Flexible macroblock ordering (FMO)

• Redundant slice methods

• Switched slice (SP/SI)

• Data partitioning

• Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter-coded MBs

5. Each partition (A, B and C) is placed in a separate NAL unit and transported separately

-64-

Error Resilience: Parameter setting

• The sequence parameter set contains all information related to a sequence of pictures

• A picture parameter set contains all information related to all the slices belonging to a single picture

• The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

[Figure: an H.264 encoder and an H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange; VCL data is transferred with a reference to PS 3. Example content of parameter set 3: video format NTSC, motion resolution ¼, Enc CABAC, frame width 11]

-65-

Error Resilience: FMO

• Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

• Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture

• If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data
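The two-slice-group example can be made concrete with a checkerboard (dispersed) allocation. The sketch below is a toy model, not decoder behavior defined by the standard: function names are hypothetical, each macroblock is reduced to one representative value, and concealment is a plain average of the received 4-neighbors.

```python
def checkerboard_groups(mb_cols, mb_rows):
    """Map each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(r + c) % 2 for c in range(mb_cols)] for r in range(mb_rows)]


def conceal(frame, groups, lost_group):
    """Replace each MB of the lost slice group by the mean of its received neighbors.

    frame  -- 2-D list holding one representative value per macroblock
    groups -- slice group map as returned by checkerboard_groups
    """
    rows, cols = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for r in range(rows):
        for c in range(cols):
            if groups[r][c] != lost_group:
                continue                     # this MB arrived intact
            neighbors = [frame[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols
                         and groups[rr][cc] != lost_group]
            out[r][c] = sum(neighbors) / len(neighbors)
    return out


if __name__ == "__main__":
    groups = checkerboard_groups(4, 4)
    frame = [[r + c for c in range(4)] for r in range(4)]   # smooth toy picture
    for row in conceal(frame, groups, lost_group=1):
        print(row)
```

With the checkerboard map, every horizontal and vertical neighbor of a lost macroblock belongs to the other slice group, so concealment always has received data to work from. With raster-scan slices, a lost packet would instead remove a contiguous run of macroblocks.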

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed
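The decoder-side rule amounts to keeping, for each picture region, the best representation that actually arrived. The sketch below is a simplified model with hypothetical names: each received slice is a tuple of a region identifier, a redundancy counter (0 for the primary slice, larger values for redundant copies) and an opaque payload.

```python
def select_slices(received):
    """Choose which received slice to reconstruct for each picture region.

    received -- iterable of (region_id, redundancy_count, payload) tuples;
                redundancy_count 0 marks the primary slice
    Returns a dict mapping region_id to the chosen payload.
    """
    best = {}
    for region, count, payload in received:
        # prefer the lowest counter: the primary slice wins whenever it arrived
        if region not in best or count < best[region][0]:
            best[region] = (count, payload)
    return {region: payload for region, (count, payload) in best.items()}


if __name__ == "__main__":
    # Region 0: primary and redundant copy arrived; region 1: primary was lost
    received = [
        (0, 1, "region 0, coarse redundant slice"),
        (0, 0, "region 0, primary slice"),
        (1, 1, "region 1, coarse redundant slice"),
    ]
    for region, payload in sorted(select_slices(received).items()):
        print(region, "->", payload)
```

In the example, region 0 is reconstructed from its primary slice and the redundant copy is discarded, while region 1 falls back to the coarser redundant slice.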

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence

Bitrate [kbps] for QCIF, Bitrate [kbps] for CIF

24 48 96 192 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence, Bitrate [Mbps] for MPEG-2 HiQ, Bitrate [Mbps] for MPEG-2 TM5

6 10 20 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan

1x 2x

New Mobile & Calendar

T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile
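For reference, the PSNR between an original and a reconstructed picture is 10 log10(peak^2 / MSE), where MSE is the mean squared difference between corresponding samples. The sketch below is a minimal illustration with a hypothetical function name; it assumes pictures are given as flat sequences of samples and an 8-bit peak value of 255.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf                       # identical pictures
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    reconstructed = [53, 55, 60, 66, 71, 61, 65, 73]
    print(f"{psnr(original, reconstructed):.2f} dB")
```

An R-D comparison such as the one on this slide plots this value against bit rate for each codec; the bitrate saving is then read off at equal PSNR.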

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions: H.264 outperforms the previous standards

Comparison of standards (each row lists: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10)

Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization: scalar quantization with step size of constant increment | scalar quantization with step size of constant increment | vector quantization | scalar quantization with step size increasing at the rate of 12.5%

Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions: Comparison of standards (continued)

(each row lists: MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10)

Pel accuracy: integer, ½-pel | integer, ½-pel | integer, ½-pel, ¼-pel | integer, ½-pel, ¼-pel

Profiles: No | 5 | 8 | 4

Reference picture: one | one | one | multiple

Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness: synchronization & concealment | data partitioning, FEC for important packet transmission | synchronization, data partitioning, header extension, reversible VLCs | data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate: up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards: n/a | Yes | Yes | No

Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software

- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing

• MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)

• MPEG LA website: http://www.mpegla.com

• Via Licensing: http://www.vialicensing.com

Extension for various applications

• FRExtensions to 4:2:2 and 4:4:4 chroma formats

• 12-bit resolution for medical imaging

• Scalable coding

• Lossless coding for digital cinema application

• High-fidelity coding for the next generation optical discs

• H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL

• Standard systems and file format support specifications

• Standardizing reference software implementation

• Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu

Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr

Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org

Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

• 4:2:2, 4:4:4 formats

• > 8 bit depths

• (8x8) integer DCT

• HVS weighting matrices

• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors

• Residual color transform

• Source editing such as alpha blending

• High bit rates [use RGB color format], YCgCo

• High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction

o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction

o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation

o Multiple reference pictures

o Reference B pictures

o Arbitrary referencing order

o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)

o 1/4-sample luma interpolation (1/4- or 1/8th-sample chroma interpolation)

o Weighted prediction

o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features

o Frame-field adaptation

Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level

MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs

o Field scan

♦ Lossless representation capability

o Intra PCM raw sample-value macroblocks

o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
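The logarithmic relation between the quantization parameter and the step size can be sketched as follows. The six base values and the QP range 0 to 51 are the ones commonly quoted for 8-bit H.264 video and are not taken from these slides; the function name `qstep` is hypothetical. The step size doubles for every increase of 6 in QP, which is an increase of roughly 12% per QP step.

```python
# Step sizes for QP = 0..5 (assumed base table); later values double every 6 QP steps
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for an integer QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in the range 0..51")
    return _QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 28, 36, 51):
        print(f"QP={qp:2d}  Qstep={qstep(qp):g}")
    ratio = (qstep(51) / qstep(0)) ** (1 / 51)
    print(f"average growth per QP step: {100 * (ratio - 1):.1f}%")
```

A logarithmic scale of this kind lets a small integer range of QP values cover step sizes spanning more than two orders of magnitude.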

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning

o Zig-Zag (Frame)

o Field (alternate scan)

♦ Lossless entropy coding

o Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I - Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)
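
As a rough illustration of the frame (zig-zag) scan mentioned above, the sketch below generates the zig-zag visiting order for a square block and flattens a block of quantized coefficients with it. It covers only the zig-zag scan, not the field scan, and the function names `zigzag_order` and `scan` are hypothetical.

```python
def zigzag_order(n: int = 4) -> list[tuple[int, int]]:
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            # Even diagonals are walked from bottom-left to top-right.
            diag.reverse()
        order.extend(diag)
    return order


def scan(block: list[list[int]]) -> list[int]:
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    block = [[9, 4, 1, 0],
             [3, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print(scan(block))
```

Because the non-zero coefficients cluster in the low-frequency corner, the scanned sequence ends in a long run of zeros, which the entropy coder can represent compactly.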

PicAFF: Field processing is similar to frame mode

MBAFF: If the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

(16x16) luma full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
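
The first three of the four modes listed above can be sketched in a few lines. This is a simplified illustration that assumes both the row of 16 pels above and the column of 16 pels to the left are available. The DC rounding `(sum + 16) >> 5` is the usual integer mean of 32 samples and is an assumption here, not something stated on the slide. The planar mode is left out, and the function name is hypothetical.

```python
def predict_16x16(mode: str, above: list[int], left: list[int]) -> list[list[int]]:
    """Build a 16x16 luma predictor from the 16 pels above and 16 pels to the left."""
    if len(above) != 16 or len(left) != 16:
        raise ValueError("expected 16 neighbouring pels on each side")
    if mode == "vertical":
        # Every row repeats the pels directly above the macroblock.
        return [list(above) for _ in range(16)]
    if mode == "horizontal":
        # Every row is filled with the pel directly to its left.
        return [[left[r]] * 16 for r in range(16)]
    if mode == "dc":
        # Rounded integer mean of the 32 neighbouring pels.
        dc = (sum(above) + sum(left) + 16) >> 5
        return [[dc] * 16 for _ in range(16)]
    raise ValueError("mode not covered by this sketch: " + mode)


if __name__ == "__main__":
    pred = predict_16x16("dc", [10] * 16, [20] * 16)
    print(pred[0][0])
```

An encoder would typically build each candidate predictor, measure the residual cost, and signal the cheapest mode.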

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

(8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices
These are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr
4x4 Inter Y; 4x4 Inter Cb, Cr
8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, switched for frame/field coding
Coefficient scanning is based on the decreasing variances, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description
Sequence-, picture- and slice-layer syntax elements: Headers and parameters
Macroblock type (mb_type): Prediction method for each coded macroblock
Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter: Transmitted as a delta value from the previous value of QP
Reference frame index: Identifies reference frame(s) for inter prediction
Motion vector: Transmitted as a difference (mvd) from the predicted motion vector
Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients - these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$$

$$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
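
The Exp-Golomb construction and the three decoding steps can be sketched as below. The bitstream is represented as a Python string of '0' and '1' characters purely for readability, and the function names are illustrative assumptions rather than part of any reference software.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Encode a non-negative code_num as [M zeros][1][INFO]."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    pos += m + 1                             # skip the zeros and the marker 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: read M-bit INFO
    return (1 << m) + info - 1, pos + m      # step 3: code_num = 2^M + INFO - 1


if __name__ == "__main__":
    stream = "".join(exp_golomb_encode(n) for n in (0, 3, 7))
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because every codeword announces its own length through the leading zeros, codewords can be concatenated and parsed back without separators.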

-113-

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

ADOPTIONS

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity - even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

High Profiles

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: 16x16 Y with 8x16 Cb and Cr. 4:4:4 MB: 16x16 Y, Cb and Cr.]

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_G + K_B = 1$):

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
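
A minimal sketch of the RGB to YCbCr conversion defined by the equations on this slide is given below. It works on normalized floating-point samples and ignores offsets, clipping and integer rounding, which a real system would add. The function name and the default arguments (the slide's K_R and K_B values) are illustrative.

```python
def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722) -> tuple[float, float, float]:
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr using weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that Cb lies in -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that Cr lies in -0.5..0.5
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))               # white
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))               # pure blue
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, 0.299, 0.114)) # pure red, BT.601 weights
```

The non-integer weights are the reason a round trip through YCbCr at a fixed bit depth generally introduces rounding error.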

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
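
The forward and inverse conversions on this slide translate almost line for line into code. The sketch below uses Python integers, whose `>>` operator is an arithmetic (floor) shift even for negative values, and checks that the round trip is exact. The function names are hypothetical.

```python
def rgb_to_ycgco(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward lossless color conversion (integer lifting steps)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int) -> tuple[int, int, int]:
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (255, 0, 128)
    coded = rgb_to_ycgco(*pixel)
    print(coded, ycgco_to_rgb(*coded) == pixel)
```

Each inverse step subtracts exactly the quantity the matching forward step added, so whatever the shifts discard is never needed again. That is why no rounding error accumulates, at the cost of one extra bit of range for Cg and Co.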

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

New Profiles in the H.264/AVC FRExt Amendment

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
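
Rule 2 can be turned into a small check, sketched below exactly as the slide words it, in pixels. Two things here are assumptions: the standard itself expresses these limits in macroblock units, and the function names are hypothetical.

```python
import math


def max_picture_dimension(max_pixels_per_frame: int) -> int:
    """Largest allowed width or height: floor(sqrt(8 * pixels per frame))."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_size_allowed(width: int, height: int, max_pixels_per_frame: int) -> bool:
    """Check the total picture size and the per-dimension limit of rule 2."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit
            and height <= limit)


if __name__ == "__main__":
    cif = 352 * 288
    print(max_picture_dimension(cif))
    print(picture_size_allowed(352, 288, cif))
    print(picture_size_allowed(1024, 96, cif))
```

The per-dimension limit rules out extremely wide or tall pictures that would otherwise fit within the total pixel budget of a level.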

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of filter is adjusted: do not filter 4x4 blocks
• No change to filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-62-

Error Resilience
Parameter setting
Flexible macroblock ordering (FMO)
Redundant slice methods
Switched slice (SP/SI)
Data partitioning
Arbitrary Slice Order (ASO)

Only in Extended Profile

-63-

Data partitioning slices (Extended profile only)

1. Coded data of a slice is placed in three separate data partitions: A, B & C

2. A has the slice header and header data for each MB in the slice

3. B has coded residual data for intra and SI slice MBs

4. C has coded residual data for inter-coded MBs

5. Place each partition A, B & C in a separate NAL unit and transport separately

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture.

The encoder chooses the appropriate picture parameter set to use by referencing the storage location in the slice header of each coded slice.

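[Figure: Reliable parameter set exchange between the H.264 encoder and the H.264 decoder, each holding parameter sets 1, 2 and 3. Example Parameter Set #3: video format NTSC, motion resolution ¼, entropy coding CABAC, frame width 11. VCL data is transferred with a reference to PS #3.]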
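
The idea of referencing a stored parameter set from each slice header can be sketched as follows. The class and field names are hypothetical and only mirror the example values in the figure; real sequence and picture parameter sets carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSet:
    """Illustrative parameter set; the fields mirror the slide's example only."""
    ps_id: int
    video_format: str
    motion_resolution: str
    entropy_coding: str
    frame_width: int


class Decoder:
    """Keeps parameter sets received out of band, keyed by their identifier."""

    def __init__(self) -> None:
        self._parameter_sets: dict[int, ParameterSet] = {}

    def receive_parameter_set(self, ps: ParameterSet) -> None:
        # Parameter sets are assumed to arrive over a reliable channel.
        self._parameter_sets[ps.ps_id] = ps

    def parameters_for_slice(self, ps_id_in_slice_header: int) -> ParameterSet:
        # The slice header carries only the identifier, not the parameters.
        try:
            return self._parameter_sets[ps_id_in_slice_header]
        except KeyError:
            raise LookupError(
                "parameter set %d has not been received" % ps_id_in_slice_header
            ) from None


if __name__ == "__main__":
    decoder = Decoder()
    decoder.receive_parameter_set(ParameterSet(3, "NTSC", "1/4", "CABAC", 11))
    print(decoder.parameters_for_slice(3))
```

Since the slice carries only a small identifier, losing a slice packet never destroys the header information needed to decode the slices that do arrive.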

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, then the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than in a single block of data.
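
The dispersed two-group allocation described for FMO can be illustrated with a checkerboard map and a very simple concealment step. Both are simplifying assumptions: each macroblock is reduced to a single representative value (for example its mean luma), a lost macroblock is replaced by the average of its received 4-neighbours, and the function names are hypothetical.

```python
def checkerboard_map(mb_cols: int, mb_rows: int) -> list[list[int]]:
    """Assign macroblocks alternately to slice group 0 and slice group 1."""
    return [[(r + c) % 2 for c in range(mb_cols)] for r in range(mb_rows)]


def conceal(values: list[list[int]], group_map: list[list[int]],
            lost_group: int) -> list[list[int]]:
    """Replace every MB of the lost slice group by the mean of its received neighbours."""
    rows, cols = len(values), len(values[0])
    out = [row[:] for row in values]
    for r in range(rows):
        for c in range(cols):
            if group_map[r][c] != lost_group:
                continue
            neighbours = [
                values[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
                and group_map[rr][cc] != lost_group
            ]
            # Fall back to 0 only if no received neighbour exists.
            out[r][c] = sum(neighbours) // len(neighbours) if neighbours else 0
    return out


if __name__ == "__main__":
    groups = checkerboard_map(4, 3)
    received = [[100 if groups[r][c] == 0 else -1 for c in range(4)] for r in range(3)]
    print(conceal(received, groups, lost_group=1))
```

With a checkerboard pattern every horizontal and vertical neighbour of a lost macroblock belongs to the other slice group, which is what makes this kind of spatial concealment possible.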

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bitstream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
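
The decoder behaviour described above amounts to a simple selection rule, sketched here. A slice is modelled as an optional `bytes` payload, with `None` standing for a slice lost in transmission. This representation and the function name are assumptions made for illustration.

```python
from typing import Optional


def select_slice(primary: Optional[bytes], redundant: Optional[bytes]) -> Optional[bytes]:
    """Pick the slice representation the decoder should reconstruct."""
    if primary is not None:
        # The primary slice arrived: use it and discard the redundant copy.
        return primary
    # Primary lost: fall back to the coarser redundant slice, if it arrived.
    return redundant


if __name__ == "__main__":
    print(select_slice(b"primary", b"redundant"))
    print(select_slice(None, b"redundant"))
    print(select_slice(None, None))
```

When both representations are lost the function returns `None`, and the decoder would have to fall back on error concealment.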

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF; Bitrate [kbps] for CIF

24 48 96 192; 96 192 384 768

Foreman: >1x 2x 2x T 2x >2x T T

Paris: >1x 2x 2x 2x 2x T 2x T

Head: >2x 2x 2x T T

Zoom: >1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence

Bitrate [kbps] for QCIF; Bitrate [kbps] for CIF

24 48 96 192; 96 192 384 768

Football: 2x 1x 2x 2x >1x >1x 1x >1x

Mobile: 2x 1x 2x 2x >2x 4x >2x T

Husky: 2x 2x >1x 2x 2x 2x

Tempete: 2x 2x >2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence

Bitrate [Mbps] for MPEG-2 HiQ; Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6; 1.5 2.25 3 4 6

Football: >1.5x >1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile: 4x 2.7x 2x T T >4x >2.7x >2x T T

Husky: >1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x >1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence; Bitrate [Mbps] for MPEG-2 HiQ; Bitrate [Mbps] for MPEG-2 TM5

6 10 20; 6 10 20

720 (60p)

Crew: 1.7x 2x T 1.7x 2x T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar: T 2x T T 2x T

1080 (25p)

River Bed: >1.7x >1x T >1.7x >1x T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP - High Latency Profile
ASP - Advanced Simple Profile
H.26L - H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC - Conversational High Compression
SP - Simple Profile
ASP - Advanced Simple Profile
H.26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size: 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size: 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform: 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization: Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding: VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation: Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access: Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature: MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy: Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles: No | 5 | 8 | 4
Reference picture: one | one | one | multiple
Bidirectional prediction mode: forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types: I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness: Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate: Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards: n/a | Yes | Yes | No
Encoder complexity: Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

Extension for various applications
FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema application
High fidelity coding for the next generation optical discs
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol, pp, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual", ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP", IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)", MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding", Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs - HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: httpwwwbtexactbtcom
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications", Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard", Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards", IEEE ICME, pp, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol, pp, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions", SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity", IEEE CAS Magazine, vol, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard", NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard", JVCIR, Special Issue on H.264/AVC, vol, pp, June-Aug. 2005.

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html
Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm
The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10 - MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] YCgCo
High resolution

Slices in a picture are compressed as follows:

♦ Intra (spatial, block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
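The logarithmic control of the step size can be illustrated with a short sketch. It assumes the commonly quoted H.264 behaviour that the step size doubles for every increase of 6 in the quantization parameter (QP) and that QP lies in 0..51 for 8-bit video; neither detail is stated on this slide, and the base table and the name `qstep` are illustrative only.

```python
# Step sizes for QP 0..5 (assumed values, commonly quoted for H.264).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp: int) -> float:
    """Return the approximate quantizer step size for a given QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    # Each block of 6 QP values doubles the step size (logarithmic control).
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))
```

Under these assumptions, one QP step changes the step size by roughly 12.5%, and six steps double it.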

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field scan) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
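The first three full-macroblock modes can be sketched in a few lines. This is a simplified illustration, not the normative process: it assumes both the row above and the column to the left are available, uses a rounded mean for DC, and omits the planar mode. The function names are hypothetical.

```python
N = 16  # macroblock width and height in luma samples

def predict_vertical(above):
    """Each column copies the sample directly above the macroblock."""
    return [list(above) for _ in range(N)]

def predict_horizontal(left):
    """Each row copies the sample directly to the left of the macroblock."""
    return [[left[y]] * N for y in range(N)]

def predict_dc(above, left):
    """Every sample is the rounded mean of the 32 neighbouring samples."""
    dc = (sum(above) + sum(left) + N) // (2 * N)
    return [[dc] * N for _ in range(N)]
```

An encoder would typically compute each candidate prediction, compare it with the source block, and keep the mode with the smallest prediction error.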

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction; modes are vertical, horizontal, DC and planar
• 4:2:2 (8x16)
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

The matrix can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y
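The idea of a weighting matrix as an extra multiplier during inverse quantization can be sketched as below. This is a floating-point simplification, not the integer arithmetic of the standard; it assumes that a weight of 16 means "no weighting" (a flat matrix), and the name `dequantize_4x4` is hypothetical.

```python
# A flat matrix: weight 16 in every position is assumed to leave scaling unchanged.
FLAT_4x4 = [[16] * 4 for _ in range(4)]

def dequantize_4x4(levels, qstep, weights=FLAT_4x4):
    """Scale quantized levels back to coefficients with per-frequency weights."""
    return [
        [levels[i][j] * qstep * weights[i][j] / 16 for j in range(4)]
        for i in range(4)
    ]
```

A weight above 16 at a high-frequency position makes the effective step size coarser there, which is how a perceptual matrix can spend fewer bits on detail the eye is less sensitive to.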

-108-

FRExt only

Two scans, similar to those of the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes, UVLC)

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$
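The construction above can be written as a minimal encoder sketch. The function name and the choice to return the codeword as a string of '0' and '1' characters are illustrative assumptions, made for readability rather than efficiency.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword for code_num as a string of bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits      # [M zeros][1][INFO]
```

For example, code_num 0 maps to `1`, code_num 1 to `010`, and code_num 6 to `00111`, each of length 2M + 1.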

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$
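The three decoding steps can be sketched as follows. The bitstream is modelled as a string of '0' and '1' characters, and the function name and the returned read position are assumptions made for illustration; the sketch does no error handling for truncated input.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":   # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                  # consume the terminating '1'
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: 2^M + INFO - 1
```

Because each codeword announces its own length through the leading zeros, codewords can be read back to back from one stream by feeding the returned position into the next call.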

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB consists of a 16x16 Y block with 8x16 Cb and Cr blocks; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}$

$C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
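As a worked illustration of the conversion above, the sketch below applies the general equations with $K_R$ and $K_B$ as parameters. It assumes R, G and B are normalized to the range 0..1 and does no offsetting or integer rounding; the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for the given luma weights."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))   # scaled so that pure blue gives Cb = 0.5
    cr = (r - y) / (2 * (1 - kr))   # scaled so that pure red gives Cr = 0.5
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned in the note. The need to round Y, Cb and Cr to integers in a real system is the source of the conversion error that the later color-transform slides address.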

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
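The forward and inverse steps can be checked with a short sketch. It assumes integer samples and reads the second inverse step as G = t + Cg; Python's `>>` on integers is an arithmetic (floor) shift, matching the definition on the slide. The function names are illustrative.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward lifting transform from integer RGB to (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse lifting transform; undoes rgb_to_ycgco exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity the matching forward step added, so the round trip is lossless even though the shifts discard a bit. The cost is that Cg and Co need one more bit of range than the input samples.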

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha-blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement in which a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16
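Rule 2 can be made concrete with a small sketch. The frame-size budget of 2,097,152 pixels used in the usage note (8192 macroblocks of 256 luma samples) is an assumed example value that is not taken from this slide, and the function names are hypothetical.

```python
import math

def max_dimension(max_pixels_per_frame: int) -> int:
    """Largest width or height allowed by rule 2, in pixels."""
    return math.isqrt(max_pixels_per_frame * 8)

def fits_level(width: int, height: int, max_pixels_per_frame: int) -> bool:
    """Check the total frame-size budget and the per-dimension limit."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)
```

With the assumed budget, `max_dimension(2_097_152)` evaluates to 4096, so a 1920x1080 picture passes both checks, while a very wide 8192x128 picture fails the dimension limit even though its total pixel count is within budget.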

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent video quality (i.e., difficult to distinguish from the original video without compression) at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-63-

Data partitioning slices (Extended profile only)

1. The coded data of a slice is placed in three separate data partitions: A, B and C

2. A has the slice header and the header data for each MB in the slice

3. B has the coded residual data for intra and SI slice MBs

4. C has the coded residual data for inter-coded MBs

5. Place each partition A, B and C in a separate NAL unit and transport them separately
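The split described in steps 1 to 5 can be sketched as follows. This is a data-structure illustration only: the dictionary layout, the `kind` labels and the function name are assumptions, and the real partitions are coded bitstreams rather than Python objects.

```python
def partition_slice(slice_header, macroblocks):
    """Split one slice into partitions A (headers), B (intra/SI) and C (inter)."""
    part_a = {"slice_header": slice_header, "mb_headers": []}
    part_b, part_c = [], []
    for index, mb in enumerate(macroblocks):
        # Partition A carries the header data of every macroblock.
        part_a["mb_headers"].append(mb["header"])
        # Residual data goes to B for intra and SI macroblocks, otherwise to C.
        target = part_b if mb["kind"] in ("intra", "si") else part_c
        target.append((index, mb["residual"]))
    return part_a, part_b, part_c
```

Keeping the macroblock index alongside each residual lets a receiver that lost partition B or C still match the surviving residuals to the headers in partition A.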

-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures; a picture parameter set contains all information related to all the slices belonging to a single picture.

The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice.

[Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2 and 3, kept in step by a reliable parameter set exchange. Example, parameter set 3: video format NTSC, motion resolution 1/4, Enc CABAC, frame width 11. VCL data is transferred with a reference to PS 3.]
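The referencing mechanism can be sketched as a small lookup table held by the decoder. The class, method and field names below are hypothetical, and the example values mirror the figure description rather than real syntax elements.

```python
class ParameterSetStore:
    """Decoder-side table of parameter sets received over a reliable channel."""

    def __init__(self):
        self._sets = {}

    def store(self, ps_id: int, params: dict) -> None:
        """Record (or overwrite) the parameter set stored under ps_id."""
        self._sets[ps_id] = dict(params)

    def activate(self, slice_header: dict) -> dict:
        """Return the parameter set that the slice header refers to."""
        ps_id = slice_header["ps_id"]
        if ps_id not in self._sets:
            raise KeyError(f"parameter set {ps_id} has not been received")
        return self._sets[ps_id]
```

Because each slice carries only a short identifier, the rarely changing settings can be sent once over a more reliable path, and the loss of a slice packet does not take those settings with it.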

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
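The two-slice-group example for FMO can be sketched with a checkerboard allocation, one possible dispersed pattern. The map layout and function names are illustrative assumptions; the standard defines several slice group map types that are not modelled here.

```python
def dispersed_slice_groups(width_mbs: int, height_mbs: int):
    """Assign macroblocks to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]

def neighbours_in_other_group(groups, x: int, y: int) -> int:
    """Count the 4-connected neighbours of (x, y) that lie in the other group."""
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(groups) and 0 <= nx < len(groups[0]):
            count += groups[ny][nx] != groups[y][x]
    return count
```

With this pattern, an interior macroblock has all four of its direct neighbours in the other slice group, so when one group is lost the concealment step can interpolate each missing macroblock from received data on every side.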

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in a much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available, and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
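The decoder behaviour described above can be sketched as a simple selection rule. The slice records, with a `region` key identifying the macroblocks covered and a `redundant` flag, are a hypothetical representation chosen for illustration.

```python
def select_slices(received):
    """Pick one slice per region, preferring the primary over a redundant copy."""
    chosen = {}
    for s in received:
        current = chosen.get(s["region"])
        # Take the slice if the region is uncovered, or if it is a primary
        # slice replacing a redundant one that arrived earlier.
        if current is None or (current["redundant"] and not s["redundant"]):
            chosen[s["region"]] = s
    return chosen
```

A region whose primary slice arrived is always decoded from that slice; the coarser redundant copy is used only when nothing better was received for the same macroblocks.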

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF (24, 48, 96, 192), then bitrate [kbps] for CIF (96, 192, 384, 768)

Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T

Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T

Head: > 2x, 2x, 2x, T, T

Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Columns: bitrate [kbps] for QCIF (24, 48, 96, 192), then bitrate [kbps] for CIF (96, 192, 384, 768)

Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x

Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T

Husky: 2x, 2x, > 1x, 2x, 2x, 2x

Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ (1.5, 2.25, 3, 4, 6), then bitrate [Mbps] for MPEG-2 TM5 (1.5, 2.25, 3, 4, 6)

Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Columns: bitrate [Mbps] for MPEG-2 HiQ (6, 10, 20), then bitrate [Mbps] for MPEG-2 TM5 (6, 10, 20)

720 (60p)

Crew: 1.7x, 2x, T, 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)

River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT, Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | One | One | One | Multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

FRExtensions to 4:2:2 and 4:4:4 chroma formats
12-bit resolution for medical imaging
Scalable coding
Lossless coding for digital cinema applications
High-fidelity coding for the next generation of optical discs

Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
Standard systems and file format support specifications
Standardizing the reference software implementation
Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows media video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp. , Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. , pp. , Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol. , pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol. , pp. , June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All of these codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
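
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is a minimal illustration, not text from the standard: the function name `qstep` is hypothetical, and the six base step sizes are the values commonly tabulated for H.264 (assumed here). The step size doubles for every increase of 6 in QP.

```python
# Base step sizes for QP 0..5, as commonly tabulated for H.264 (assumption).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantization step size for a QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # qp % 6 selects the base value; each full group of 6 doubles the step.
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Because the step grows geometrically, each unit increase in QP raises the step size by roughly 12%.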

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be of type I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
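
Three of the four (16x16) modes can be sketched in a few lines. This is an illustrative sketch only: the function name `predict_16x16` and the mode strings are hypothetical, both neighbor rows are assumed to be available, and the planar mode is omitted because its equation is not given on the slide.

```python
def predict_16x16(above, left, mode):
    """Build a 16x16 luma predictor from neighboring reconstructed pels.

    above: the 16 pels of the row just above the MB
    left:  the 16 pels of the column just left of the MB
    mode:  "vertical", "horizontal", or "dc" (hypothetical labels)
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("expected 16 neighboring pels on each side")
    if mode == "vertical":
        # Every row repeats the row above the MB.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)
```

The encoder would subtract the chosen predictor from the source MB and transform only the residual.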

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for Inter and Intra
• The encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan.

Figure: frame (zig-zag) scan and field scan (FRExt only)

-109-

Examples of parameters to be encoded

Parameter: Description
Sequence-, picture-, and slice-layer syntax elements: Headers and parameters
Macroblock type (mb_type): Prediction method for each coded macroblock
Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter: Transmitted as a delta value from the previous value of QP
Reference frame index: Identifies reference frame(s) for inter prediction
Motion vector: Transmitted as a difference (mvd) from the predicted motion vector
Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC codes transform coefficients. CABAC codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.
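
The [M zeros][1][INFO] construction and the three decoding steps can be checked with a small sketch. The function names `ue_encode` and `ue_decode` are hypothetical, and bits are held in a Python string of '0'/'1' characters purely for readability; a real codec would use a bit reader.

```python
def ue_encode(code_num: int) -> str:
    """Encode a non-negative code_num as an Exp-Golomb bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # total length is 2M + 1 bits


def ue_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":                # step 1: count leading zeros
        m += 1
    start = pos + m + 1                        # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m      # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, ue_encode(n))
```

Codewords can be concatenated without separators, because the run of leading zeros tells the decoder how many INFO bits follow.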

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, studio editing, and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: Y is 16x16, Cb and Cr are 8x16 each. 4:4:4 MB: Y, Cb, and Cr are 16x16 each.

-119-

RGB to YCbCr

Y = KR R + (1 - KR - KB) G + KB B

Cb = (B - Y) / (2 (1 - KB))

Cr = (R - Y) / (2 (1 - KR))

KR = 0.2126, KB = 0.0722, KR + KB + KG = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
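
As a quick numerical check of the general form Y = KR R + (1 - KR - KB) G + KB B, with each chroma difference divided by 2 (1 - K), here is a minimal floating-point sketch. The function name `rgb_to_ycbcr` is hypothetical, inputs are assumed to be normalized to 0..1, and no offset or integer rounding is applied.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - kr))   # red-difference chroma
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y near 1, chroma near 0
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb near 0.5
```

The 2 (1 - K) divisors scale both chroma components to the range -0.5..0.5, which is why pure blue and pure red land at the edge of that range.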

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

Y = (1/2) [G + (R + B)/2]

Cg = (1/2) [G - (R + B)/2]

Co = (R - B)/2

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (at the same bit depth) for the input, the output, and the stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
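
That the shift-based forward and inverse conversions are exactly reversible on integers can be verified with a short sketch. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the stated definition.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward lifting transform; Cg and Co need one extra bit of range."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse lifting transform; undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    sample = (255, 0, 0)
    coded = rgb_to_ycgco(*sample)
    print(coded, ycgco_to_rgb(*coded))
```

Each inverse step subtracts exactly the quantity the forward step added, so the rounding inside the shifts cancels and no information is lost.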

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha-blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement in which a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-64-

Error Resilience: Parameter setting

The sequence parameter set contains all information related to a sequence of pictures, and a picture parameter set contains all information related to all the slices belonging to a single picture

The encoder chooses the appropriate picture parameter set to use by referencing its storage location in the slice header of each coded slice

Figure: the H.264 encoder and the H.264 decoder each hold parameter sets 1, 2, and 3, synchronized through a reliable parameter set exchange; VCL data is then transferred with a reference to parameter set 3 (video format: NTSC; motion resolution: 1/4; Enc: CABAC; frame width: 11)
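
The idea of referencing a stored parameter set from the slice header can be sketched as a simple lookup table. Everything here is illustrative: the class names, the field names, and the dictionary-based slice header are assumptions, and the example values merely echo the figure's "Parameter Set 3".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSet:
    """Hypothetical container for the fields shown in the figure."""
    video_format: str
    motion_resolution: str
    entropy_coder: str
    frame_width: int


class ParameterSetStore:
    """Decoder-side storage, filled through the reliable exchange."""

    def __init__(self):
        self._sets = {}

    def store(self, ps_id: int, ps: ParameterSet) -> None:
        self._sets[ps_id] = ps

    def activate(self, slice_header: dict) -> ParameterSet:
        # The slice header carries only the storage location (an ID),
        # not the parameters themselves.
        ps_id = slice_header["parameter_set_id"]
        if ps_id not in self._sets:
            raise KeyError("parameter set %d was never received" % ps_id)
        return self._sets[ps_id]


if __name__ == "__main__":
    store = ParameterSetStore()
    store.store(3, ParameterSet("NTSC", "1/4", "CABAC", 11))
    print(store.activate({"parameter_set_id": 3}))
```

Because slices carry only a small ID, losing a slice packet never loses the header information that every other slice depends on.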

-65-

Error Resilience: FMO

Flexible macroblock ordering allows macroblocks to be assigned to slices in an order other than the scan order

Assume that all macroblocks of the picture are allocated either to slice group 0 or to slice group 1, and that the macroblocks in each slice group are dispersed through the picture. If the packet containing the information of slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group

ASO is similar to FMO: it randomizes data prior to transmission, so errors are distributed more randomly over the video frames rather than concentrated in a single block of data
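
The two-slice-group example can be sketched with a checkerboard map, one possible dispersed allocation. The function names are hypothetical, each macroblock is reduced to a single representative value, and the neighbor-averaging concealment is a deliberately simple stand-in for a real concealment method.

```python
def checkerboard_slice_groups(width_mbs: int, height_mbs: int):
    """Dispersed map: MB (x, y) is assigned to slice group (x + y) % 2."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def conceal_lost_group(mb_values, group_map, lost_group):
    """Fill each lost MB with the mean of its received 4-neighbors."""
    height, width = len(group_map), len(group_map[0])
    out = [row[:] for row in mb_values]
    for y in range(height):
        for x in range(width):
            if group_map[y][x] != lost_group:
                continue
            neighbours = [
                mb_values[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
                and group_map[ny][nx] != lost_group
            ]
            # None marks an MB with no received neighbor (e.g. a 1x1 picture).
            out[y][x] = sum(neighbours) / len(neighbours) if neighbours else None
    return out
```

With the checkerboard map, every horizontal and vertical neighbor of a lost macroblock belongs to the other slice group, so concealment always has data to work from.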

-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bit stream

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits)

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed
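
The decoder behavior described above amounts to a simple selection rule, sketched here. The function name `choose_slice` is hypothetical, and a lost slice is represented by `None`.

```python
def choose_slice(primary, redundant_slices):
    """Return the slice to reconstruct, or None if every copy was lost."""
    if primary is not None:
        # The primary is available: redundant copies are discarded.
        return primary
    for candidate in redundant_slices:
        if candidate is not None:
            # Fall back to the first redundant representation that arrived.
            return candidate
    return None
```

If `None` is returned, the decoder would have to fall back on error concealment for the affected macroblocks.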

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Foreman > 1x 2x 2x T 2x > 2x T T

Paris > 1x 2x 2x 2x 2x T 2x T

Head > 2x 2x 2x T T

Zoom > 1x 1x 2x 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF | Bitrate [kbps] for CIF

24 48 96 192 | 96 192 384 768

Football 2x 1x 2x 2x > 1x > 1x 1x > 1x

Mobile 2x 1x 2x 2x > 2x 4x > 2x T

Husky 2x 2x > 1x 2x 2x 2x

Tempete 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 | 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

6 10 20 | 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results for the 'Tempete' CIF 15 Hz sequence, for the video streaming application

HLP - High Latency Profile
ASP - Advanced Simple Profile
H.26L - H.264 Main Profile
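
For reference, PSNR between an original and a reconstructed picture can be computed as in the sketch below. The function name `psnr` is hypothetical, pictures are passed as flat sequences of sample values, and a peak value of 255 (8-bit video) is assumed.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR at the same bitrate, or the same PSNR at a lower bitrate, is what the R-D comparisons report as coding gain.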

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results for the 'Paris' CIF 15 Hz sequence, for the video conferencing application

CHC - Conversational High Compression
SP - Simple Profile
ASP - Advanced Simple Profile
H.26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | One | One | One | Multiple
Bidirectional prediction mode | Forward/backward | Forward/backward | Forward/backward | Forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions for various applications
- FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp site: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding, Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com; www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards/h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol., pp., June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:
♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8-bit depths
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be of type I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field scan) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, reflecting visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr
4x4 Inter Y; 4x4 Inter Cb, Cr
8x8 Intra Y; 8x8 Inter Y

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing variance, to maximize the number of zero-valued coefficients along the scan.

Frame (zig-zag) and Field scans

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes, UVLC)

-111-

These are variable-length codes with a regular construction:

[ M zeros ] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information. The first codeword has no leading zero and no trailing INFO field.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^{M}$$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^{M} + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
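
The Exp-Golomb construction and the three decoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the reference decoder: bit strings are plain Python strings of '0' and '1', and the function names `exp_golomb_encode` and `exp_golomb_decode` are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: 2^M + INFO - 1


# The first few codewords, then a round trip over a concatenated stream.
codewords = [exp_golomb_encode(n) for n in range(7)]
stream = "".join(codewords)
decoded, pos = [], 0
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    decoded.append(value)
```

Because each codeword announces its own length through the run of leading zeros, codewords can be concatenated without separators, which is what the round trip at the end exercises.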

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

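FRExt only: MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB consists of a 16x16 Y block with 8x16 (width x height) Cb and Cr blocks; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks]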

-119-

RGB to YCbCr conversion

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
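
As a quick numerical check of the conversion on this slide, the sketch below evaluates Y, Cb and Cr from the general formulas with the luma weights as parameters. It assumes R, G and B are floating-point values normalized to [0, 1]; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # equals 0.5389 * (B - Y) for kb = 0.0722
    cr = (r - y) / (2.0 * (1.0 - kr))   # equals 0.6350 * (R - Y) for kr = 0.2126
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)                        # no chroma expected
red = rgb_to_ycbcr(1.0, 0.0, 0.0)                          # Cr at its maximum
red_bt601 = rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)
```

The divisors 2(1 - K_B) and 2(1 - K_R) normalize both chroma components to the range [-0.5, 0.5], so pure red gives Cr = 0.5 and pure blue gives Cb = 0.5 for any choice of weights.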

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

Cg = green chroma, Co = orange chroma
To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform.

Keep the RGB domain (same bit depth) for the input, the output and the stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only.

This eliminates color-space conversion error without significantly increasing the overall complexity of the system.

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
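
The forward and inverse conversions on this slide are exactly invertible in integer arithmetic, which a short Python sketch can confirm. Python's `>>` on integers is an arithmetic (floor) shift, matching the slide's definition; the function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer color transform, following the slide step by step."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse integer color transform; undoes rgb_to_ycgco exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round-trip check over a coarse grid of 8-bit RGB values.
samples = [(r, g, b)
           for r in range(0, 256, 15)
           for g in range(0, 256, 15)
           for b in range(0, 256, 15)]
lossless = all(ycgco_to_rgb(*rgb_to_ycgco(*s)) == s for s in samples)
```

The inverse recomputes the same shifted terms `Cg >> 1` and `Co >> 1` that the forward transform added, so whatever the shift rounds away cancels out. The cost is that Cg and Co can be negative and need one more bit than the input samples.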

-123-

Auxiliary pictures: extra monochrome pictures sent along with the main video stream, which can be used for purposes such as alpha-blend compositing (specified as a different category of data than SEI).

Film grain characteristics SEI: allows a model of film grain statistics to be sent along with the video data. This enables an analysis-synthesis style of video enhancement in which synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with representing the exact film grain during encoding.

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI: allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures.

Stereo video SEI indicators: allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye.

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles.
All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/s.

2. The horizontal and vertical maximum sizes cannot exceed sqrt[(total number of pixels per frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
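
Rule 2 above can be checked with a one-line computation. The sketch below is only an illustration: the function name is hypothetical, and the frame-size budget of 2,097,152 pixels (8192 macroblocks of 16x16 pixels) is an assumed example value, not a figure taken from the level table in these slides.

```python
import math


def max_picture_dimension(max_pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(pixels per frame * 8))."""
    return math.isqrt(max_pixels_per_frame * 8)


# Assumed example budget: 8192 macroblocks of 16x16 pixels each.
example_budget = 8192 * 16 * 16
limit = max_picture_dimension(example_budget)

# A 1920x1080 picture respects both the pixel budget and the dimension limit.
fits = 1920 * 1080 <= example_budget and max(1920, 1080) <= limit
```

The factor of 8 bounds the aspect ratio: a picture that uses the whole pixel budget can be at most 8 times as wide as it is tall, or the reverse.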

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/s film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

• Deblocking filter
  - Only the control of the filter is adjusted: do not filter 4x4 blocks
  - No change to the filter operation itself

• CABAC
  - 61 new contexts and corresponding initialization values
  - No change to the CABAC engine

• Signaling
  - 8x8 transform on/off flag at PPS level
  - 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact:
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools offer licenses:
1. MPEG LA: wwwmpeglacom
2. Via Licensing: wwwvialicensingcom

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency).
ATSC has selected AC-3 plus from Dolby.
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-65-

Error Resilience: FMO

Flexible macroblock ordering (FMO) allows macroblocks to be assigned to slices in an order other than the scan order.

Assume that every macroblock of the picture is allocated either to slice group 0 or to slice group 1, and that the macroblocks of each slice group are dispersed throughout the picture. If the packet carrying slice group 1 is lost during transmission, the lost macroblocks can be recovered by the error concealment mechanism, since every lost macroblock has several spatial neighbors that belong to the other slice group.

ASO is similar to FMO: it randomizes data prior to transmission, so that errors are distributed more randomly over the video frames rather than concentrated in a single block of data.
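
The two-slice-group FMO example above can be illustrated with a small Python sketch. A checkerboard pattern is assumed here as one simple way of dispersing the two groups; it does not reproduce the standard's slice-group map syntax, and the function names are hypothetical.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock (x, y) to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbours(groups, x, y, lost_group):
    """Return the 4-connected neighbours of (x, y) that are not in lost_group."""
    height, width = len(groups), len(groups[0])
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height
            and groups[ny][nx] != lost_group]


# QCIF (176x144 pixels) is 11x9 macroblocks; suppose slice group 1 is lost.
groups = checkerboard_slice_groups(11, 9)
lost = [(x, y) for y in range(9) for x in range(11) if groups[y][x] == 1]
worst_case = min(len(surviving_neighbours(groups, x, y, 1)) for x, y in lost)
```

With this pattern every lost macroblock keeps at least three correctly received neighbours (four away from the picture border), which is what gives the concealment mechanism something to interpolate from.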

-66-

Error Resilience: Redundant Slice

Redundant slices allow one or more redundant representations of the same macroblocks to be placed in the bit stream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. If the primary slice is missing, however, the redundant slice can be reconstructed instead.
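
The decoder behaviour described above amounts to a simple selection rule, sketched below. The data layout is an assumption made for illustration: each picture region maps to a pair (primary, redundant) in which a missing slice is `None`, and the names are hypothetical.

```python
def choose_slices(received):
    """For each region, keep the primary slice if present, else the redundant one.

    `received` maps a region id to a (primary, redundant) pair; a lost slice
    is represented by None. Regions with neither slice are left out so that
    error concealment can handle them.
    """
    chosen = {}
    for region, (primary, redundant) in received.items():
        if primary is not None:
            chosen[region] = primary      # redundant copy is discarded
        elif redundant is not None:
            chosen[region] = redundant    # coarser fallback representation
    return chosen


received = {
    0: ("primary-0 (low QP)", "redundant-0 (high QP)"),
    1: (None, "redundant-1 (high QP)"),   # primary slice lost in transmission
    2: (None, None),                      # both lost
}
chosen = choose_slices(received)
```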

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and the MPEG-4 Part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Foreman: > 1x, 2x, 2x, T, 2x, > 2x, T, T
Paris: > 1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: > 2x, 2x, 2x, T, T
Zoom: > 1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile (MP) and the MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD.

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768
Football: 2x, 1x, 2x, 2x, > 1x, > 1x, 1x, > 1x
Mobile: 2x, 1x, 2x, 2x, > 2x, 4x, > 2x, T
Husky: 2x, 2x, > 1x, 2x, 2x, 2x
Tempete: 2x, 2x, > 2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for Standard Definition (SD).

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6
Football: > 1.5x, > 1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, > 4x, > 2.7x, > 2x, T, T
Husky: > 1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, > 1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Main Profile and MPEG-2 for High Definition (HD).

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: > 1.7x, > 1x, T, > 1.7x, > 1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT / Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | One | One | One | Multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High
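
The quantization row in the comparison of standards says that the H.264 step size grows by roughly 12.5% per increment of the quantization parameter. A minimal sketch of that behaviour follows. The six base step sizes are the commonly cited values for QP 0 to 5 and are an assumption here, since they do not appear in these slides; the function name `qstep` is hypothetical.

```python
# Commonly cited step sizes for QP = 0..5 (assumed, not from these slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for QP in 0..51; it doubles every 6 QP values."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# Step-to-step growth ratios; their geometric mean is 2 ** (1 / 6), about 1.12.
ratios = [qstep(qp + 1) / qstep(qp) for qp in range(51)]
```

Individual ratios vary a little from one QP to the next, but every six increments multiply the step size by exactly 2, which is the logarithmic control of step size listed among the coding tools.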

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products. Related companies:

- UBVideo website: httpwwwubvideocom
- LSI Logic website: httpwwwlsilogiccom
- Microsoft website: httpwwwmicrosoftcom
- Envivio website: httpwwwenviviocom
- Broadcom website: httpwwwbroadcomcom
- Nagravision website: httpwwwnagravisioncom
- Philips website: httpwwwphilipscom
- Polycom website: httpwwwpolycomcom
- PixelTools Corporation website: httpwwwpixeltoolscom
- Amphion website: httpwwwamphioncom

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: httpwwwligoscom
- LifeSize website: httpwwwlifesizecom
- Netvideo website: httpwwwnetvideocom
- Motorola website: httpwwwmotorolacom
- Vanguard Software Solutions website: httpwwwvsoftscom
- STMicroelectronics website: httpusstcom
- MainConcept website: httpwwwmainconceptcom
- Impact Labs Inc website: httpwwwimpactlabscom
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: httpwwwmpegorg
- JVT website: ftpstandardspolycomcom
- wwwmpegiforg

Test software
- httpiphomehhidesuehringtmldownload
- H.264/AVC JM Software: httpbshhide~suehringtmldownload

Test sequences
- httpisestanfordeduvideohtml
- httpkbscstu-berlinde~stewevcegsequenceshtm
- httpwwwitsbldrdocgovvqeg
- ftptntuni-hannoverdepubjvtsequences
- httptraceeasasueduyuvyuvhtml

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s).
- MPEG LA website: httpwwwmpeglacom
- Via Licensing: httpwwwvialicensingcom

Extensions for various applications
- FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for next-generation optical discs
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftpstandardspolycomcom, httpiphomehhidesuehring

JVT reflector subscription: httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail: jvt-expertsmailimtcorg

JVT management team
Chair: Gary Sullivan (garysullmicrosoftcom)
Co-chair: Ajay Luthra (aluthramotorolacom)
Co-chair: Thomas Wiegand (wiegandhhide)

Dr. K. R. Rao, UTA: raoutaedu
Dr. S. K. Kwon, Dongeui University: skkwondongeuiackr
Ms. A. Tamhankar, T-Mobile: arundhatiieeeorg
Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video", ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual", ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication", ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services", ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter", IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding", IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard", IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC", IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP", IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)", MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding", Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control", IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: httpwwwmpegorg
[20] JVT website: ftpstandardspolycomcom
[21] MPEG LA website: httpwwwmpeglacom
[22] H.264/AVC JM Software: httpbshhide~suehringtmldownload
[23] UBVideo website: httpwwwubvideocom
[24] LSI Logic website: httpwwwlsilogiccom
[25] Microsoft website: httpwwwmicrosoftcom
[26] Envivio website: httpwwwenviviocom
[27] PixelTools Corporation website: httpwwwpixeltoolscom
[28] Nagravision website: httpwwwnagravisioncom
[29] Philips website: httpwwwphilipscom

-84-

[30] Polycom website: httpwwwpolycomcom
[31] MainConcept website: httpwwwmainconceptcom
[32] Amphion website: httpwwwamphioncom
[33] Ligos Corporation website: httpwwwligoscom
[34] LifeSize website: httpwwwlifesizecom
[35] Broadcom website: httpwwwbroadcomcom
[36] Netvideo website: httpwwwnetvideocom
[37] Motorola website: httpwwwmotorolacom
[38] httpwwwmediawarecom
[39] Impact Labs Inc website: httpwwwimpactlabscom
[40] Vanguard Software Solutions website: httpwwwvsoftscom
[41] STMicroelectronics website: httpusstcom, wwwthomsonnet
[42] wwwconexantcom (H.264 decoder ICs, HDTV & SDTV)
[43] wwwpixtreecom

-85-

[44] BT Exact: httpwwwbtexactbtcom
[45] DemoGaFrX: wwwdolbycom
[46] Equator: httpwwwequatorcom
[47] Moonlight: wwwelecardcom
[48] Sand Video: wwwbroadcomcom
[49] VideoLocus: httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html
[50] W&W Communications (and DSP Research): httpwwwwwcomscom
[51] Cisco Systems: wwwciscocom
[52] Deutsche Telekom: httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: httpwwwfastvdocom
[54] Glance Networks: httpwwwglancenet
[55] RADVISION: wwwradvisioncom
[56] Sun Microsystems: httpwwwsuncom
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications", Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard", Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard", short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] httpecsituch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards", IEEE ICME, pp, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol pp, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions", SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity", IEEE CAS Magazine, vol, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard", NAB 2004, Las Vegas, NV, April 2004.
[64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] httpiphomehhidesuehringtmldownloadjm93zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard", JVCIR Special Issue on H.264/AVC, vol pp, June-Aug. 2005.

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

♦ Inter temporal prediction - block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation
  Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video
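The quarter-sample luma interpolation listed on this slide can be illustrated with a small one-dimensional sketch. The 6-tap filter (1, -5, 20, 20, -5, 1)/32 for half-sample positions and the rounded average for quarter-sample positions are taken from the H.264 standard rather than from these slides; the function names and the edge replication used here are assumptions made for the illustration, and the full two-dimensional case is not covered.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # half-sample luma filter taps, normalised by 32


def clip255(value):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (edges replicated)."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), last)  # clamp index at the row borders
        acc += tap * row[j]
    return clip255((acc + 16) >> 5)      # round and divide by 32


def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_sample(ramp, 3), quarter_sample(ramp, 3))
```

On a linear ramp the half-sample value lands midway between its two integer neighbours, which is the behaviour one would expect from an interpolation filter.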

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of macroblocks
o Field scan
♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
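The logarithmic step-size control in the last bullet can be sketched as follows. The six base step sizes and the doubling of the step size for every increase of 6 in QP come from the H.264 standard for 8-bit video (QP 0 to 51) and are not given on this slide; the function name is an assumption for the illustration.

```python
# Step sizes for QP 0..5; every further increase of 6 in QP doubles the step.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for an 8-bit video QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

Each single QP increment raises the step size by roughly 12.5%, so the step size doubles every six increments.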

-95-

♦ Deblocking filter (within the motion compensation loop)
♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)
♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices
♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

A slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)
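As a rough illustration of the (4x4) IntDCT applied to prediction errors, the sketch below computes the forward core transform W = Cf · X · Cf^T. The matrix Cf is the 4x4 integer core transform matrix from the H.264 standard (it does not appear on this slide), the scaling that the standard folds into quantization is left out, and the helper names are assumptions for the example.

```python
# Forward 4x4 integer core transform matrix (post-scaling is omitted here).
CF = ((1, 1, 1, 1),
      (2, 1, -1, -2),
      (1, -1, -1, 1),
      (1, -2, 2, -1))


def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_4x4(residual):
    """W = Cf * X * Cf^T for a 4x4 block of prediction errors."""
    return matmul(matmul(CF, residual), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
print(forward_core_4x4(flat))  # all energy ends up in the DC coefficient
```

Because only additions, subtractions and multiplications by 2 are involved, the transform can be computed exactly in integer arithmetic, so encoder and decoder cannot drift apart.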

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.

-101-

I slice (spatial prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes that the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
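The first three modes above can be sketched for an n x n block as shown below; planar prediction is left out for brevity. Choosing the mode by sum of absolute differences (SAD) is an encoder-side assumption made for this example, not something the standard prescribes, and all function names are hypothetical.

```python
def predict_vertical(above, n):
    """Every row repeats the n pels just above the block."""
    return [list(above[:n]) for _ in range(n)]


def predict_horizontal(left, n):
    """Every column repeats the n pels just left of the block."""
    return [[left[r]] * n for r in range(n)]


def predict_dc(above, left, n):
    """All pels take the rounded mean of the 2n neighbouring pels."""
    dc = (sum(above[:n]) + sum(left[:n]) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for block_row, pred_row in zip(block, pred)
               for b, p in zip(block_row, pred_row))


def best_mode(block, above, left):
    """Pick the mode with the smallest SAD (illustrative criterion only)."""
    n = len(block)
    candidates = {
        "vertical": predict_vertical(above, n),
        "horizontal": predict_horizontal(left, n),
        "dc": predict_dc(above, left, n),
    }
    return min(candidates, key=lambda mode: sad(block, candidates[mode]))


above, left = [10, 20, 30, 40], [5, 5, 5, 5]
print(best_mode([[10, 20, 30, 40]] * 4, above, left))
```

The same functions work for n = 16; a 4 x 4 block is used in the example only to keep the numbers readable.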

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• The matrix can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for Inter and Intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y | 4x4 Intra Cb, Cr
4x4 Inter Y | 4x4 Inter Cb, Cr
8x8 Intra Y | 8x8 Inter Y

-108-

Two scans (FRExt only), similar to those of the 4x4 transform, are switched for frame/field coding: frame (zig-zag) and field.

Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan.
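A classic zig-zag scan order can be generated as in the sketch below, shown here for a 4x4 block. This is only an illustration of the frame scan pattern: the field (alternate) scans are defined by tables in the standard and are not reproduced, and the function names are assumptions for the example.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in classic zig-zag order."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col of a diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()                    # even diagonals run upwards
        order.extend(diagonal)
    return order


def scan(block):
    """Flatten a square block of coefficients along the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Raster indices visited for a 4x4 block.
print([r * 4 + c for r, c in zigzag_order(4)])
```

Low-frequency coefficients come first in the resulting list, so the zero-valued high-frequency coefficients tend to cluster at the end of the scan.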

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
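The Exp-Golomb construction and the three decoding steps can be sketched as follows, with bit strings standing in for a real bitstream. The function names are assumptions for the example. The signed mapping at the end is the one the H.264 standard uses for signed elements such as motion vector differences and QP deltas; it is not shown on these slides.

```python
def ue_encode(code_num):
    """Exp-Golomb codeword [M zeros][1][INFO] as a string of '0'/'1'."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def ue_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":                # 1. count M leading zeros
        m += 1
    start = pos + m + 1                        #    skip the '1' marker
    info = int(bits[start:start + m], 2) if m else 0   # 2. read INFO
    return (1 << m) + info - 1, start + m      # 3. code_num = 2^M + INFO - 1


def se_to_code_num(value):
    """Signed mapping: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * value - 1 if value > 0 else -2 * value


for n in range(8):
    print(n, ue_encode(n))
```

Codewords are self-delimiting, so a decoder can read several of them back to back by feeding the returned position into the next call.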

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity - even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only), given as width x height

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16
4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr conversion

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

with $K_R + K_G + K_B = 1$. For $K_R = 0.2126$ and $K_B = 0.0722$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$.)
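The conversion can be checked numerically with a short floating-point sketch. It assumes R, G and B normalised to the range 0..1 and ignores the offsets and integer scaling used for digital sample values; the function name and default arguments are assumptions for the example. With K_R = 0.2126, the Cr scale factor 1 / (2(1 - K_R)) evaluates to about 0.635.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalised R, G, B (0..1) to Y, Cb, Cr; Cb and Cr span -0.5..0.5."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y close to 1, chroma close to 0
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches +0.5
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches +0.5
```

Passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned on the slide.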

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform.

Keep the RGB domain (same depth) for the input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only.

This eliminates color-space conversion error without significantly increasing the overall complexity of the system.

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
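The lifting steps on this slide translate almost line for line into code. The sketch below uses Python integers, whose ">>" is an arithmetic (floor) shift for negative values as well; the function names are assumptions for the example. It shows that the inverse recovers R, G and B exactly.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using the lifting steps of the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycgco(255, 0, 0))
print(ycgco_to_rgb(*rgb_to_ycgco(255, 0, 0)))
```

Cg and Co can be negative and need one more bit than the input samples, while Y stays within the input range.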

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16.
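Notes 1 and 2 can be turned into a small checker. The level table itself is not reproduced in this transcript, so the limits passed in below (2,097,152 pixels per frame and 245,760 macroblocks per second) are illustrative inputs rather than values attributed to a particular level, and the function names are assumptions for the example.

```python
import math

MAX_FRAME_RATE = 172  # upper bound on the frame rate quoted in note 1


def frame_size_allowed(width, height, max_pixels_per_frame):
    """Note 2: neither dimension may exceed sqrt(max pixels per frame x 8)."""
    limit = math.sqrt(max_pixels_per_frame * 8)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


def max_frame_rate(width, height, max_mb_per_second):
    """Note 1: smaller pictures allow higher frame rates, capped at 172."""
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    return min(MAX_FRAME_RATE, max_mb_per_second / macroblocks)


print(frame_size_allowed(1920, 1088, 2_097_152))   # fits
print(frame_size_allowed(8192, 256, 2_097_152))    # too wide
print(max_frame_rate(320, 240, 245_760))           # hits the cap
```

The square-root rule rules out extremely elongated pictures even when their total pixel count is within the limit.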

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-66-

Error Resilience: Redundant Slice

Redundant slices allow the encoder to place one or more redundant representations of the same macroblocks in the bit stream.

For example, the primary representation can be coded with a low quantization parameter (hence in good quality), whereas the redundant slice can be coded with a high quantization parameter (hence in much coarser quality, but also using fewer bits).

A decoder reacts to redundant slices by reconstructing only the primary slice if it is available and discarding the redundant slice. However, if the primary slice is missing, the redundant slice can be reconstructed.
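The decoder behaviour described above amounts to a simple selection rule, sketched below. The tuple layout (slice_id, is_redundant, payload) is a hypothetical container invented for this illustration and does not correspond to actual H.264 syntax elements.

```python
def choose_slices(received):
    """Pick one representation per slice position.

    `received` is a list of (slice_id, is_redundant, payload) tuples for the
    slices that actually arrived; the layout is hypothetical.
    """
    chosen = {}
    for slice_id, is_redundant, payload in received:
        current = chosen.get(slice_id)
        # A primary slice always wins; a redundant one is only a fallback.
        if current is None or (current[0] and not is_redundant):
            chosen[slice_id] = (is_redundant, payload)
    return {sid: payload for sid, (_, payload) in chosen.items()}


arrived = [(0, False, "primary-0"), (0, True, "redundant-0"),
           (1, True, "redundant-1")]       # primary slice 1 was lost
print(choose_slices(arrived))
```

Slice 0 is reconstructed from its primary representation, while slice 1 falls back to the coarser redundant copy because its primary never arrived.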

-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases.

Bit rate [kbps] for QCIF: 24, 48, 96, 192; bit rate [kbps] for CIF: 96, 192, 384, 768

Foreman: >1x, 2x, 2x, T, 2x, >2x, T, T
Paris: >1x, 2x, 2x, 2x, 2x, T, 2x, T
Head: >2x, 2x, 2x, T, T
Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases.

Bit rate [kbps] for QCIF: 24, 48, 96, 192; bit rate [kbps] for CIF: 96, 192, 384, 768

Football: 2x, 1x, 2x, 2x, >1x, >1x, 1x, >1x
Mobile: 2x, 1x, 2x, 2x, >2x, 4x, >2x, T
Husky: 2x, 2x, >1x, 2x, 2x, 2x
Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases.

Bit rate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6; bit rate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x
Mobile: 4x, 2.7x, 2x, T, T, >4x, >2.7x, >2x, T, T
Husky: >1.5x, 1.3x, 1x, 1.3x, 1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x
Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Bit rate [Mbps] for MPEG-2 HiQ: 6, 10, 20; bit rate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew: 1.7x, 2x, T, 1.7x, 2x, T
Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)
Stockholm Pan: 1x, 2x
New Mobile & Calendar: T, 2x, T, T, 2x, T

1080 (25p)
River Bed: >1.7x, >1x, T, >1.7x, >1x, T
Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bit-rate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP - High Latency Profile; ASP - Advanced Simple Profile; H.26L - H.264 Main Profile
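For reference, the PSNR between an original and a reconstructed picture is commonly computed from the mean squared error as 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak value 255) given as flat sequences of equal length; the function name is an assumption for the example, and the exact averaging used in the cited tests is not stated on the slide.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))
```

Identical pictures give an infinite PSNR, and every halving of the error amplitude adds about 6 dB.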

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bit-rate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC - Conversational High Compression; SP - Simple Profile; ASP - Advanced Simple Profile; H.26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s).
MPEG LA website: http://www.mpegla.com
Via Licensing: http://www.vialicensing.com

Extension for various applications
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12 bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp. Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," Short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp. Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp. Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol. pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups : JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol. pp. June-Aug. 2005.
[68] www.stmicroelectronics.com : WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
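The logarithmic relation between the quantization parameter (QP) and the step size can be illustrated as follows. This is a minimal sketch: the six base step sizes for QP 0 to 5 and the QP range 0 to 51 are taken from the H.264 literature, not from this slide, and should be read as assumptions. The step size doubles for every increase of 6 in QP, which is roughly a 12% increase per QP step.

```python
# Assumed base step sizes for QP = 0..5 (not given on the slide)
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP (doubles every 6 QP steps)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```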

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)
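The zig-zag (frame) scan mentioned above can be generated by walking the anti-diagonals of the block in alternating directions. The sketch below is illustrative: the function names are hypothetical, and the field (alternate) scan is not covered.

```python
def zigzag_order(n=4):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run from bottom-left to top-right, odd ones the other way
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


quantized = [
    [7, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(quantized))
```

Because the significant coefficients cluster in the low-frequency corner, the scan pushes the zeros to the end of the list, which is what the entropy coder exploits.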

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
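Three of the four full-macroblock modes listed above are simple enough to sketch directly. The code below is an illustration only: the function names are hypothetical, the neighbors are assumed to be available, the DC rounding (add half the divisor before dividing) is an assumption taken from common practice, and planar prediction is left out.

```python
def predict_vertical(above, size=16):
    """Every row repeats the row of pels just above the macroblock."""
    return [list(above[:size]) for _ in range(size)]


def predict_horizontal(left, size=16):
    """Every column repeats the column of pels just left of the macroblock."""
    return [[left[r]] * size for r in range(size)]


def predict_dc(above, left, size=16):
    """Every pel is the rounded average of the above and left neighbors."""
    total = sum(above[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)
    return [[dc] * size for _ in range(size)]


above = [10] * 16   # reconstructed pels above the MB (example values)
left = [20] * 16    # reconstructed pels left of the MB (example values)
print(predict_dc(above, left)[0][0])
```

An encoder would typically form each candidate prediction, subtract it from the source macroblock, and keep the mode with the cheapest residual; that selection step is an encoder choice outside the scope of the standard.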

-103-

Chroma spatial prediction (operates on the entire MB), similar to (16x16) luma MB prediction: vertical, horizontal, DC, planar

• 4:2:0 (8x8)

• 4:2:2 (8x16)

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for inter and intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level
• Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply an additional multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y
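The idea of a weighting matrix as an extra multiplier in inverse quantization can be sketched as below. This is a simplified floating-point illustration, not the integer arithmetic of the standard: the neutral weight of 16 (a flat matrix meaning "no weighting") is an assumption, and the function name and example matrices are hypothetical.

```python
FLAT_WEIGHT = 16  # assumed neutral weight: a matrix filled with 16 changes nothing


def dequantize(levels, weights, qstep):
    """Rescale quantized levels, applying a per-frequency perceptual weight."""
    return [
        [level * qstep * weight / FLAT_WEIGHT for level, weight in zip(lrow, wrow)]
        for lrow, wrow in zip(levels, weights)
    ]


levels = [[4, 2], [1, 1]]
flat = [[16, 16], [16, 16]]
coarse_high_freq = [[16, 20], [20, 32]]  # larger weights, coarser effective step

print(dequantize(levels, flat, qstep=2.0))
print(dequantize(levels, coarse_high_freq, qstep=2.0))
```

Larger weights at high frequencies give a coarser effective step size where the eye is less sensitive, which is the purpose of the HVS matrices.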

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. The coefficient scanning order follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M
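The construction above maps directly to a few lines of code. The following sketch builds the codeword as a string of '0' and '1' characters for readability; the function name is hypothetical and a real encoder would write bits rather than characters.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO]


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Each printed codeword has length 2M + 1, and small values of code_num get the shortest codes.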

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
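The three decoding steps can be sketched as follows. The bit stream is represented as a string of '0' and '1' characters for clarity; the function name and the returned position are illustrative choices, not part of the standard.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":        # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                       # skip the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: read the M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: code_num = 2^M + INFO - 1


stream = "1" + "010" + "00100" + "00111"
pos, values = 0, []
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    values.append(value)
print(values)
```

Because the leading zeros announce the codeword length, codewords can be concatenated without separators and still be parsed unambiguously.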

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: macroblock structure. 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height). 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16.]
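The chroma block dimensions for each format follow from the horizontal and vertical subsampling factors. A small sketch, with hypothetical names, that also counts the samples per macroblock:

```python
# (horizontal, vertical) chroma subsampling factors per format
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def chroma_block_size(chroma_format, mb_size=16):
    """Return (width, height) of each chroma block (Cb or Cr) in one macroblock."""
    sub_w, sub_h = SUBSAMPLING[chroma_format]
    return mb_size // sub_w, mb_size // sub_h


def samples_per_macroblock(chroma_format, mb_size=16):
    """Luma samples plus the two chroma blocks."""
    width, height = chroma_block_size(chroma_format, mb_size)
    return mb_size * mb_size + 2 * width * height


for fmt in SUBSAMPLING:
    print(fmt, chroma_block_size(fmt), samples_per_macroblock(fmt))
```

Moving from 4:2:0 to 4:4:4 doubles the number of samples per macroblock, which is one reason the higher formats need higher bit rates.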

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
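The conversion can be checked numerically with a short sketch. It uses floating-point values in the range 0 to 1 and the constants K_R = 0.2126 and K_B = 0.0722 from the slide; the function name is hypothetical, and offsets or integer scaling used in practical systems are deliberately left out.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the luma coefficients kr and kb."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: full luma, no chroma
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches its maximum
```

The divisors 2(1 - K_B) and 2(1 - K_R) are chosen so that both chroma components span the same range, from -0.5 to +0.5. Passing kr=0.299 and kb=0.114 gives the BT.601 variant.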

-120-

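Rounding error in RGB to YCbCr conversion

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma

To avoid any rounding error, only one extra bit of precision needs to be added to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$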
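The YCgCo equations can be evaluated exactly with rational arithmetic to see where rounding would enter. In the sketch below the inverse (G = Y + Cg, R = Y - Cg + Co, B = Y - Cg - Co) is derived from the forward equations rather than quoted from the slide, and the function names are hypothetical.

```python
from fractions import Fraction


def rgb_to_ycgco(r, g, b):
    """Exact YCgCo transform using rational numbers (no rounding)."""
    r, g, b = Fraction(r), Fraction(g), Fraction(b)
    half_rb = (r + b) / 2
    y = (g + half_rb) / 2
    cg = (g - half_rb) / 2
    co = (r - b) / 2
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse derived from the forward equations."""
    t = y - cg              # equals (R + B) / 2
    return t + co, y + cg, t - co


y, cg, co = rgb_to_ycgco(255, 0, 0)
print(y, cg, co)
print(ycgco_to_rgb(y, cg, co))
```

For integer inputs the outputs can land on halves and quarters, so storing them as integers of the same width would lose information; this is the rounding problem that the extra precision and the lifting form of the transform address.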

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
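The forward and inverse conversions form an exactly reversible pair in integer arithmetic, which can be verified by a round trip. The sketch follows the equations on the slide; the function names are hypothetical. Python's `>>` on integers is an arithmetic shift, matching the definition given.

```python
def forward(r, g, b):
    """Forward integer color transform (lifting form)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def inverse(y, cg, co):
    """Inverse transform: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Exhaustive check on a coarse grid of 8-bit values
lossless = all(
    inverse(*forward(r, g, b)) == (r, g, b)
    for r in range(0, 256, 5)
    for g in range(0, 256, 5)
    for b in range(0, 256, 5)
)
print(lossless, forward(255, 0, 0))
```

Each inverse step subtracts exactly the quantity the matching forward step added, so the shifts never cause a loss. For 8-bit inputs Co and Cg range over roughly -255 to 255, which is where the one extra bit for chroma comes from.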

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information
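Alpha blend compositing with an auxiliary monochrome picture can be sketched per sample as a weighted mix of foreground and background. This is a generic illustration: the 8-bit alpha range, the rounding and the function names are assumptions, and the standard's own interpretation of auxiliary picture samples is not reproduced here.

```python
def blend_sample(foreground, background, alpha):
    """Mix two 8-bit samples; alpha = 255 shows only the foreground."""
    return (alpha * foreground + (255 - alpha) * background + 127) // 255


def blend_picture(foreground, background, alpha_plane):
    """Composite two pictures of equal size using a per-sample alpha plane."""
    return [
        [blend_sample(f, b, a) for f, b, a in zip(frow, brow, arow)]
        for frow, brow, arow in zip(foreground, background, alpha_plane)
    ]


newscaster = [[200, 200], [200, 200]]
weather_map = [[100, 100], [100, 100]]
alpha = [[255, 128], [0, 64]]          # the auxiliary (monochrome) picture
print(blend_picture(newscaster, weather_map, alpha))
```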

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16
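The first two notes can be turned into simple checks. The sketch below is an illustration under stated assumptions: the level table itself is not reproduced in this section, so the frame-size and sample-rate limits are passed in as parameters, and the example numbers (a CIF-sized frame limit at 30 frames/sec) are hypothetical.

```python
import math

MAX_FRAME_RATE = 172  # frames/sec cap quoted in note 1


def size_allowed(width, height, max_frame_pixels):
    """Note 2: neither dimension may exceed sqrt(8 x pixels per frame)."""
    limit = math.sqrt(8 * max_frame_pixels)
    return width * height <= max_frame_pixels and width <= limit and height <= limit


def max_frame_rate(width, height, max_pixels_per_sec):
    """Note 1: smaller pictures may run faster, up to the cap."""
    return min(MAX_FRAME_RATE, max_pixels_per_sec / (width * height))


frame_limit = 352 * 288            # hypothetical level limit (CIF-sized frame)
rate_limit = frame_limit * 30      # hypothetical: CIF at 30 frames/sec

print(size_allowed(352, 288, frame_limit))     # ordinary CIF picture
print(size_allowed(1408, 72, frame_limit))     # same area, but far too wide
print(max_frame_rate(176, 144, rate_limit))    # QCIF can run faster
print(max_frame_rate(64, 64, rate_limit))      # tiny picture hits the cap
```

The square-root rule rules out extreme aspect ratios that would otherwise fit within the pixel budget.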

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2
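The frame-type pattern implied by N=15 and M=3 in the caption of Fig. 7 can be written out explicitly. The sketch below lists frame types in display order; the function name is hypothetical and the coding (transmission) order is not modeled.

```python
def gop_pattern(n=15, m=3):
    """Frame types of one group of pictures in display order.

    n: distance between I frames; m: distance between reference (I or P) frames.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


pattern = gop_pattern(15, 3)
print(pattern, pattern.count("B"))
```

With M=3 there are two B frames between consecutive reference frames, matching the "two non-reference B frames per reference I or P frame" in the caption.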

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at the PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

• H.264 is limited to video
• Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)
• DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
• ATSC has selected AC-3 plus from Dolby
• ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-67-

Comparison of Coding Efficiency: Subjective verification test

Comparison of the H.264 Baseline Profile (BP) and MPEG-4 part 2 Simple Profile (SP) for the multimedia definition (MD). The numbers in the table indicate the coding efficiency improvement achieved by H.264 where the codecs being compared provide statistically equivalent picture quality. The letter 'T' indicates that H.264 achieved transparency.

H.264 Baseline Profile achieves a coding efficiency improvement of 2 times or greater in 14 out of 18 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Foreman | >1x, 2x, 2x, T | 2x, >2x, T, T

Paris | >1x, 2x, 2x, 2x | 2x, T, 2x, T

Head: >2x, 2x, 2x, T, T

Zoom: >1x, 1x, 2x, 2x

-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football | 2x, 1x, 2x, 2x | >1x, >1x, 1x, >1x

Mobile | 2x, 1x, 2x, 2x | >2x, 4x, >2x, T

Husky: 2x, 2x, >1x, 2x, 2x, 2x

Tempete: 2x, 2x, >2x, T, 2x, 2x, T2x, T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: >1.5x, >1.3x, 1.3x, 1.5x, 2x, 1.8x, 1.3x, 1.5x

Mobile | 4x, 2.7x, 2x, T, T | >4x, >2.7x, >2x, T, T

Husky: >1.5x, 1.3x, 1x, 1.3x

1.5x, 2.7x, 2x, 1.8x, 2x, >1.5x

Tempete: T, 2x, T, T, T, T, T, 4x, T, T, T, T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for the High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew | 1.7x, 2x, T | 1.7x, 2x, T

Harbour: T, 3.3x, T, T, T, 1.7x, T, T

1080 (30i)

Stockholm Pan: 1x, 2x

New Mobile & Calendar | T, 2x, T | T, 2x, T

1080 (25p)

River Bed | >1.7x, >1x, T | >1.7x, >1x, T

Vintage Car: 1.7x, T, 2x, T, 1.7x, T, 2x, T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile
ASP: Advanced Simple Profile
H.26L: H.264 Main Profile
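PSNR between an original and a reconstructed picture is computed from the mean squared error. A minimal sketch for 8-bit samples, with pictures given as flat lists of sample values (the function name is hypothetical):

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("pictures must be non-empty and of equal size")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf          # identical pictures
    return 10 * math.log10(peak ** 2 / mse)


original = [52, 55, 61, 66, 70, 61, 64, 73]
reconstructed = [53, 54, 62, 65, 71, 60, 65, 72]   # every sample off by one
print(round(psnr(original, reconstructed), 2))
```

A rate-distortion curve is obtained by measuring the bit rate and the PSNR of the same sequence at several encoder settings and plotting one against the other.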

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression
SP: Simple Profile
ASP: Advanced Simple Profile
H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extension for various applications
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding: H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004
- Lossless coding for digital cinema applications
- High fidelity coding for the next generation of optical discs

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs: HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 prediction modes (8 directional plus DC) for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)
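
As a rough illustration of the (4x4) IntDCT step, the sketch below applies a 4x4 integer core transform to a block of prediction errors. The matrix is the one commonly published for H.264 and is an assumption here (the slides do not list it); the post-scaling that the codec folds into quantization is omitted, and all function names are hypothetical.

```python
# Core 4x4 forward transform matrix (integer approximation of the DCT).
# Assumed from published descriptions of H.264; not listed on the slides.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Return CF * block * CF^T for a 4x4 residual block (scaling omitted)."""
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    # A flat block puts all of its energy into the DC coefficient.
    print(forward_core_transform(flat))
```

Only additions, subtractions and doublings are needed, which is why the transform can be computed exactly in integer arithmetic.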

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
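
The zig-zag (frame) scan mentioned above can be sketched as follows. This assumes the 4x4 frame scan is the classic anti-diagonal zig-zag; the field (alternate) scan uses a different order and is not reproduced here. Function names are hypothetical.

```python
def zigzag_order(n: int = 4):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    block = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(block))
```

Scanning low-frequency positions first tends to group the zero-valued high-frequency coefficients at the end of the list, which helps the entropy coder.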

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
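
Three of the four full-MB modes listed above can be sketched in a few lines. This is an illustrative sketch only: it assumes all 32 neighboring pels are available, uses a rounded integer average for DC, and leaves out the planar mode; the function name and mode strings are hypothetical.

```python
def predict_16x16(mode, above, left):
    """Build a 16x16 luma prediction from 16 pels above and 16 pels to the left.

    mode is 'vertical', 'horizontal' or 'dc'; the planar mode is omitted.
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("need exactly 16 neighbors above and 16 to the left")
    if mode == "vertical":
        # Every row repeats the row of pels just above the macroblock.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel just to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    pred = predict_16x16("dc", [10] * 16, [20] * 16)
    print(pred[0][0])
```

An encoder would typically try each mode, subtract the prediction from the source block, and keep the mode whose residual is cheapest to code.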

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only

HVS Weighting Matrices

Matrix can be transmitted in SPS and PPS

Separate matrix for 4x4 and 8x8 transforms

Separate matrix for Inter and Intra

Encoder can design and use customized scaling matrices

These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y

4x4 Intra Cb, Cr

4x4 Inter Y

4x4 Inter Cb, Cr

8x8 Intra Y

8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan

[Figure: Frame (zig-zag) and Field scan patterns]

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$
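
The construction above can be sketched directly. This minimal example returns the codeword as a string of '0'/'1' characters for readability; a real bitstream writer would pack bits, and the function name is hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed without floating point.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Small values of code_num get short codewords, so the mapping from syntax element values to code_num is chosen so that the most likely values come first.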

-112-

Decoding

1. Read in M leading zeros followed by 1

2. Read the M-bit INFO field

3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
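
The three decoding steps can be sketched as follows, again treating the bitstream as a string of '0'/'1' characters for readability. The function name and the (value, next position) return convention are assumptions made for this illustration.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the first bit after the codeword).
    """
    # Step 1: count the M leading zeros, then skip the '1' that follows.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m


if __name__ == "__main__":
    stream = "010" + "011" + "00111"  # code_num 1, 2, 6 back to back
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the number of leading zeros announces the length of the INFO field, codewords can be parsed back to back without separators. A truncated stream raises IndexError in this sketch.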

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)
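
The alpha blending described above can be illustrated per sample. This sketch assumes 8-bit alpha values where 255 means fully foreground and uses integer rounding; the function name and the rounding choice are assumptions, not something the slides or the standard prescribe.

```python
def alpha_blend(fg: int, bg: int, alpha: int, max_alpha: int = 255) -> int:
    """Blend one foreground sample over one background sample.

    alpha = max_alpha shows only the foreground, alpha = 0 only the background.
    """
    if not 0 <= alpha <= max_alpha:
        raise ValueError("alpha out of range")
    # Weighted sum with rounding to the nearest integer.
    return (alpha * fg + (max_alpha - alpha) * bg + max_alpha // 2) // max_alpha


if __name__ == "__main__":
    newscaster, weather_map = 200, 100
    for a in (0, 128, 255):
        print(a, alpha_blend(newscaster, weather_map, a))
```

In this picture, an auxiliary monochrome picture would supply the alpha value for each sample of the main picture.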

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
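
A small numeric sketch of this conversion is given below. It assumes R, G and B are normalized to the range 0..1 and uses floating point, so it shows where the chroma scale factors come from rather than how a codec front end would do it; the function name is hypothetical.

```python
def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) for the given K_R and K_B."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))  # scale factor 1 / (2 (1 - K_B))
    cr = (r - y) / (2.0 * (1.0 - kr))  # scale factor 1 / (2 (1 - K_R))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y near 1, chroma near 0
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches +0.5
    print(1.0 / (2.0 * (1.0 - 0.0722)), 1.0 / (2.0 * (1.0 - 0.2126)))
```

The divisors are chosen so that Cb and Cr each span -0.5 to +0.5. Passing kr=0.299 and kb=0.114 gives the BT.601 variant.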

-120-

Rounding error in RGB to YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples
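
The YCgCo equations and their inverse can be checked with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that no rounding occurs at all; the inverse formulas are derived here from the forward equations and the function names are hypothetical.

```python
from fractions import Fraction


def rgb_to_ycgco(r, g, b):
    """Forward YCgCo transform using exact rational arithmetic."""
    r, g, b = Fraction(r), Fraction(g), Fraction(b)
    y = (g + (r + b) / 2) / 2
    cg = (g - (r + b) / 2) / 2
    co = (r - b) / 2
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: Y - Cg recovers (R + B) / 2."""
    t = y - cg
    return t + co, y + cg, t - co  # R, G, B


if __name__ == "__main__":
    y, cg, co = rgb_to_ycgco(100, 150, 50)
    print(y, cg, co)
    print(ycgco_to_rgb(y, cg, co))
```

The inverse needs only additions and subtractions, which is part of what makes this color space attractive for coding RGB sources.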

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
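
The forward and inverse steps above translate almost line for line into code. The sketch below relies on Python's `>>` being an arithmetic (floor) shift for negative integers, matching the slide's definition; the function names are hypothetical.

```python
def forward_color_transform(r: int, g: int, b: int):
    """Lossless integer RGB -> (Y, Cg, Co) using the lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def inverse_color_transform(y: int, cg: int, co: int):
    """Undo the lifting steps in reverse order to recover R, G, B exactly."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    y, cg, co = forward_color_transform(255, 0, 0)
    print(y, cg, co)
    print(inverse_color_transform(y, cg, co))
```

Each inverse step recomputes the same shifted value that the forward step used, so the rounding cancels and the round trip is exact. Cg and Co need one more bit than the input samples.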

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
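
Rule 2 can be turned into a quick check. The sketch below works in pixels, as the slide does (the standard itself expresses frame size in macroblock units), and the 414720-pixel budget in the usage example is a hypothetical figure equal to 720x576, not a value from a particular level. Function names are hypothetical.

```python
import math


def max_dimension(max_pixels_per_frame: int) -> int:
    """Largest allowed width or height: floor(sqrt(8 * pixels per frame))."""
    return math.isqrt(max_pixels_per_frame * 8)


def picture_fits(width: int, height: int, max_pixels_per_frame: int) -> bool:
    """Check a picture against the frame-size budget and the rule-2 limit."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


if __name__ == "__main__":
    budget = 720 * 576  # hypothetical frame-size budget in pixels
    print(max_dimension(budget))
    print(picture_fits(720, 576, budget))
    print(picture_fits(2048, 128, budget))  # small area, but too wide
```

The effect of the rule is to bound the aspect ratio, so a level cannot be used for extremely wide or extremely tall pictures even when the total pixel count is within budget.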

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

▪ Deblocking Filter

• Only control of the filter is adjusted: do not filter 4x4 blocks

• No change to the filter operation itself

▪ CABAC

• 61 new contexts and corresponding initialization values

• No change to the CABAC engine

▪ Signaling

• 8x8 transform on/off flag at PPS level

• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:

• Main profile

• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes

• Encoder-specified perceptual-based quantization scaling matrices

• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):

• HD Film: 12%

• HD Video (progressive): 12%

• HD Video (interlace): 4% (only 2 test clips)

• SD Video (interlace): 6%

Complexity impact:

• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding

• No increase in computational requirements

• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools to obtain the license:

1. MPEG LA: www.mpegla.com

2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-68-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile (MP) and MPEG-4 Part 2 Advanced Simple Profile (ASP) for the MD

H.264 Main Profile achieves a coding efficiency improvement of 2 times or greater in 18 out of 25 statistically conclusive cases

Sequence | Bitrate [kbps] for QCIF: 24, 48, 96, 192 | Bitrate [kbps] for CIF: 96, 192, 384, 768

Football | 2x, 1x, 2x, 2x | > 1x, > 1x, 1x, > 1x

Mobile | 2x, 1x, 2x, 2x | > 2x, 4x, > 2x, T

Husky: 2x 2x > 1x 2x 2x 2x

Tempete: 2x 2x > 2x T 2x 2x T2x T

-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 1.5, 2.25, 3, 4, 6 | Bitrate [Mbps] for MPEG-2 TM5: 1.5, 2.25, 3, 4, 6

Football: > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile | 4x, 2.7x, 2x, T, T | > 4x, > 2.7x, > 2x, T, T

Husky: > 1.5x 1.3x 1x 1.3x 1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete: T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)

Crew | 1.7x, 2x, T | 1.7x, 2x, T

Harbour: T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan: 1x 2x

New Mobile & Calendar | T, 2x, T | T, 2x, T

1080 (25p)

River Bed | > 1.7x, > 1x, T | > 1.7x, > 1x, T

Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile

ASP: Advanced Simple Profile

H.26L: H.264 Main Profile
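
For reference, PSNR between an original and a reconstructed picture can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) given as flat sequences of equal length; the function name is hypothetical, and the test setup behind the plotted results is not specified here.

```python
import math


def psnr(original, reconstructed, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    orig = [52, 55, 61, 66, 70, 61, 64, 73]
    recon = [52, 54, 61, 67, 70, 60, 64, 74]
    print(round(psnr(orig, recon), 2))
```

R-D comparisons typically plot this value (usually for luma) against bitrate, and the bitrate saving is read off at equal PSNR.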

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression

SP: Simple Profile

ASP: Advanced Simple Profile

H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products

Related companies

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups

- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software

- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences

- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing

MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)

- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extension for various applications

- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next-generation optical discs
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL

- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on the open ftp website: ftp://standards.polycom.com and http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team

- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu

Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr

Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org

Karsten.suehring@hhi.fraunhofer.de


-88-

[65] H.264/AVC reference software 9.3

[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip

[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, vol. pp., June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml

b. Mpegable: http://www.mpegable.com/show/home.html

c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

4:2:2, 4:4:4 formats

> 8-bit depths

(8x8) integer DCT

HVS weighting matrices

Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors

Residual color transform

Source editing such as alpha blending

High bit rates

High resolution

[Use RGB color format] Y, Cg, Co

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction

o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction

o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation

o Multiple reference pictures

o Reference B pictures

o Arbitrary referencing order

o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)

o 1/4-sample luma interpolation (1/4- or 1/8-sample chroma interpolation)

o Weighted prediction

o Frame- or field-based motion estimation for interlaced-scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan

Frame (zig-zag) scan; field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3–6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2[code_num + 1])

INFO = code_num + 1 – 2^M
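The construction above can be written out directly. This is a minimal sketch: the function name `exp_golomb_encode` is hypothetical, and a string of '0'/'1' characters stands in for a real bitstream writer.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of '0'/'1'."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO]


first_codewords = [exp_golomb_encode(n) for n in range(7)]
```

Each codeword is 2M + 1 bits long, so code_num 0 costs one bit, code_nums 1 and 2 cost three bits, and code_nums 3 to 6 cost five bits.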

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO – 1
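The three decoding steps can be sketched as below. The function name `exp_golomb_decode` and the use of a '0'/'1' string in place of a real bitstream reader are assumptions made for illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a '0'/'1' string starting at pos.

    Returns (code_num, next_pos).
    """
    m = 0
    while bits[pos + m] == "0":                # step 1: count the M leading zeros
        m += 1
    pos += m + 1                               # skip the zeros and the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: read the M-bit INFO field
    return (1 << m) + info - 1, pos + m        # step 3: code_num = 2^M + INFO - 1


# Three concatenated codewords: code_num 0, 1 and 6.
stream = "1" + "010" + "00111"
decoded = []
position = 0
while position < len(stream):
    value, position = exp_golomb_decode(stream, position)
    decoded.append(value)
```

Because the leading zeros announce the length of INFO, codewords can be concatenated without separators and still be parsed unambiguously.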

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions (FRExt) are mandatory

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

• Use more than 8 bits per sample of source video accuracy

• Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

• Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height)

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ (and $K_R + K_B + K_G = 1$):

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
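The conversion on this slide can be checked numerically with a short sketch. It assumes R, G and B are normalized to the range 0 to 1 and ignores offsets and integer quantization; the function name `rgb_to_ycbcr` and its keyword defaults are chosen for this example only.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the K_R / K_B formulation."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


# Scale factors implied by the default constants.
cb_scale = 1.0 / (2.0 * (1.0 - 0.0722))
cr_scale = 1.0 / (2.0 * (1.0 - 0.2126))

white = rgb_to_ycbcr(1.0, 1.0, 1.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)

# The same function with the BT.601 constants quoted on the slide.
red_601 = rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)
```

With this normalization, white has zero chroma, and pure blue and pure red reach the chroma extremes Cb = 0.5 and Cr = 0.5 respectively, whichever pair of constants is used.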

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples
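A small sketch of the YCgCo equations follows. The forward direction is taken from the slide; the inverse is derived here by solving those equations and is not stated on the slide. Function names are hypothetical and plain floating-point arithmetic is used.

```python
def rgb_to_ycgco(r, g, b):
    """Forward YCgCo transform as given by the three equations on the slide."""
    y = 0.5 * (g + 0.5 * (r + b))
    cg = 0.5 * (g - 0.5 * (r + b))
    co = 0.5 * (r - b)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse obtained by solving the forward equations (derived, not from the slide)."""
    g = y + cg          # Y + Cg = G
    t = y - cg          # Y - Cg = (R + B) / 2
    r = t + co
    b = t - co
    return r, g, b


sample = (200, 100, 50)
ycgco = rgb_to_ycgco(*sample)
restored = ycgco_to_rgb(*ycgco)
```

For 8-bit integer inputs the results have a fractional part of at most a quarter, so keeping them as integers needs extra precision or a rounding-free formulation such as a lifting scheme.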

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R – B

t = B + (Co >> 1)

Cg = G – t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and “>>” denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y – (Cg >> 1)

G = t + Cg

B = t – (Co >> 1)

R = B + Co
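The forward and inverse steps on this slide map directly onto integer code, which makes the lossless property easy to check. This is a sketch with hypothetical function names; Python's `>>` on integers is an arithmetic (floor) shift, matching the slide's definition.

```python
def forward_color_transform(r, g, b):
    """Integer forward conversion, step by step as on the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def inverse_color_transform(y, cg, co):
    """Integer inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


example = forward_color_transform(200, 100, 50)

# Round-trip check over a coarse grid of 8-bit RGB triples.
levels = range(0, 256, 17)
round_trip_ok = all(
    inverse_color_transform(*forward_color_transform(r, g, b)) == (r, g, b)
    for r in levels for g in levels for b in levels
)
```

Each inverse step recomputes exactly the shifted term that the forward step added or subtracted, so no rounding error can accumulate. The price is that Cg and Co are signed and need one more bit than the inputs.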

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A ‘higher’ profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower, nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
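Notes 1 and 2 can be turned into a small checker. This is a sketch under stated assumptions: the budgets used below (2,097,152 pixels per frame and 62,914,560 pixels per second) are illustrative values chosen for the example, not quoted from the level table, and all function names are hypothetical.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Note 2: neither width nor height may exceed sqrt(8 x pixels per frame)."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_fits(width, height, max_pixels_per_frame):
    """Check the frame-size budget and the per-dimension bound together."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


def max_frame_rate(width, height, max_pixels_per_sec, cap=172.0):
    """Note 1: smaller pictures may run faster, up to the 172 frames/sec cap."""
    return min(cap, max_pixels_per_sec / (width * height))


FRAME_BUDGET = 2_097_152        # illustrative pixels-per-frame budget
RATE_BUDGET = 62_914_560        # illustrative pixels-per-second budget

dim_limit = max_dimension(FRAME_BUDGET)
hd_ok = picture_fits(1920, 1088, FRAME_BUDGET)
strip_ok = picture_fits(8192, 256, FRAME_BUDGET)
cif_rate = max_frame_rate(352, 288, RATE_BUDGET)
hd_rate = max_frame_rate(1920, 1088, RATE_BUDGET)
```

The per-dimension bound rules out extreme aspect ratios: the 8192x256 strip fits the pixel budget but is rejected because its width exceeds the limit.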

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content,” JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-69-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for Standard Definition (SD)

When compared to MPEG-2 HiQ (real-time High Quality), H.264 Main Profile achieves a coding efficiency improvement of 1.5 times or greater in 8 out of 12 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.8 times or greater in 9 out of 12 statistically conclusive cases

Sequence

Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

1.5 2.25 3 4 6 | 1.5 2.25 3 4 6

Football > 1.5x > 1.3x 1.3x 1.5x 2x 1.8x 1.3x 1.5x

Mobile 4x 2.7x 2x T T > 4x > 2.7x > 2x T T

Husky > 1.5x 1.3x 1x 1.3x

1.5x 2.7x 2x 1.8x 2x > 1.5x

Tempete T 2x T T T T T 4x T T T T

-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases

Sequence | Bitrate [Mbps] for MPEG-2 HiQ | Bitrate [Mbps] for MPEG-2 TM5

6 10 20 | 6 10 20

720 (60p)

Crew 1.7x 2x T 1.7x 2x T

Harbour T 3.3x T T T 1.7x T T

1080 (30i)

Stockholm Pan 1x 2x

New Mobile & Calendar T 2x T T 2x T

1080 (25p)

River Bed > 1.7x > 1x T > 1.7x > 1x T

Vintage Car 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the ‘Tempete’ CIF 15 Hz sequence for the video streaming application

HLP – High Latency Profile; ASP – Advanced Simple Profile; H.26L – H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the ‘Paris’ CIF 15 Hz sequence for the video conferencing application

CHC – Conversational High Compression; SP – Simple Profile; ASP – Advanced Simple Profile; H.26L – H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, ½-pel | Integer, ½-pel | Integer, ½-pel, ¼-pel | Integer, ½-pel, ¼-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft’s VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder/encoder royalties for product manufacturers and participation fees for video streaming services, regardless of profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

Extensions for various applications:
• FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
• 12-bit resolution for medical imaging
• Scalable coding (H. Schwarz, D. Marpe and T. Wiegand, “SNR-scalable extension of H.264/AVC,” ICIP 2004, vol., pp., Singapore, Oct. 2004)
• Lossless coding for digital cinema applications
• High-fidelity coding for next-generation optical discs

FINAL STAGES OF APPROVAL:
• Standard systems and file format support specifications
• Standardizing the reference software implementation
• Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team: Chair Gary Sullivan (garysull@microsoft.com), Co-chair Ajay Luthra (aluthra@motorola.com), Co-chair Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu; Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr; Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org; Karsten.suehring@hhi.fraunhofer.de

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

• 4:2:2 and 4:4:4 formats
• > 8-bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [Use RGB color format] YCgCo
• High resolution

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
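The logarithmic relation between the quantization parameter and the step size can be illustrated with a short sketch. It assumes the commonly tabulated step sizes for 8-bit video (QP from 0 to 51, step size doubling every 6 QP values); the names `QSTEP_BASE` and `qstep` are hypothetical, and the values are not taken from these slides.

```python
# Step sizes for QP = 0..5; the pattern repeats, doubled, every 6 QP values.
# These base values are an assumption (commonly tabulated figures).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP under the assumptions above."""
    if not 0 <= qp <= 51:
        raise ValueError("QP outside the range covered by this sketch")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


# Average growth per QP increment when the step doubles every 6 values.
growth_per_step = 2.0 ** (1.0 / 6.0)

steps = [qstep(qp) for qp in (0, 6, 12, 28, 51)]
```

Doubling every six values gives an average growth of roughly 12% per QP increment, so a fixed change in QP corresponds to a fixed ratio of step sizes rather than a fixed difference.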

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices, and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr; 4:4:4 MB: 16x16 Y with 16x16 Cb and 16x16 Cr]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
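As a quick numeric check of the RGB to YCbCr relations on this slide, the following Python sketch evaluates them for a few colors. It assumes R, G and B are floating-point values normalized to the range 0..1; the function name `rgb_to_ycbcr` is hypothetical, and no offset, scaling to integer code values, or clipping is modeled.

```python
KR, KB = 0.2126, 0.0722   # luma weights quoted on the slide


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Apply Y = KR*R + (1-KR-KB)*G + KB*B and the matching chroma equations."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    # The chroma scale factors follow directly from KB and KR.
    print(round(1.0 / (2.0 * (1.0 - KB)), 4))   # factor applied to (B - Y)
    print(round(1.0 / (2.0 * (1.0 - KR)), 4))   # factor applied to (R - Y)
    for name, rgb in [("white", (1, 1, 1)), ("red", (1, 0, 0)), ("blue", (0, 0, 1))]:
        print(name, rgb_to_ycbcr(*rgb))
```

With this normalization, white maps to zero chroma, and pure blue and pure red reach the chroma extremes Cb = 0.5 and Cr = 0.5 respectively. Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned above.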

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \dfrac{1}{2}\left[G + \dfrac{(R + B)}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{(R + B)}{2}\right]$

$C_o = \dfrac{(R - B)}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform.

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only.

This eliminates color-space conversion error without significantly increasing the overall complexity of the system.

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
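The forward and inverse conversions on this slide use only additions and shifts, so they should invert exactly on integers. The Python sketch below checks that property; the function names are hypothetical, and it relies on Python's `>>` being an arithmetic (floor) shift for negative integers, which matches the shift described on the slide.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion from the slide, using an intermediate variable t."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion from the slide; undoes the forward steps in reverse."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    # Residual samples can be negative, so test signed inputs as well.
    samples = range(-255, 256, 15)
    ok = all(
        ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
        for r in samples for g in samples for b in samples
    )
    print("lossless round trip:", ok)
```

For 8-bit inputs in 0..255, Co and Cg span -255..255, which is the single extra bit of chroma precision that the earlier slide refers to.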

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools offer licenses:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.


-70-

Comparison of Coding Efficiency: Subjective verification test

Comparison of H.264 Main Profile and MPEG-2 for High Definition (HD)

When compared to MPEG-2 HiQ, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 7 out of 9 statistically conclusive cases.

When compared to MPEG-2 TM5, H.264 Main Profile achieves a coding efficiency improvement of 1.7 times or greater in 8 out of 9 statistically conclusive cases.

Sequence | Bitrate [Mbps] for MPEG-2 HiQ: 6, 10, 20 | Bitrate [Mbps] for MPEG-2 TM5: 6, 10, 20

720 (60p)
Crew: 1.7x 2x T 1.7x 2x T
Harbour: T 3.3x T T T 1.7x T T

1080 (30i)
Stockholm Pan: 1x 2x
New Mobile & Calendar: T 2x T T 2x T

1080 (25p)
River Bed: >1.7x >1x T >1.7x >1x T
Vintage Car: 1.7x T 2x T 1.7x T 2x T

-71-

Comparison of Coding Efficiency: Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP: High Latency Profile; ASP: Advanced Simple Profile; H.26L: H.264 Main Profile

-72-

Comparison of Coding Efficiency: Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC: Conversational High Compression; SP: Simple Profile; ASP: Advanced Simple Profile; H.26L: H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion Estimation & Compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB
Playback & Random Access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products.

Related companies:
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms, decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp/web sites: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
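To make the logarithmic step-size control concrete, here is a small Python sketch. The six base step sizes and the 0..51 QP range are not given on these slides; they are the values commonly quoted for 8-bit H.264 video and are stated here as an assumption. The function name `qstep` is hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (commonly quoted for 8-bit H.264 video).
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size: doubles every time QP increases by 6."""
    if not 0 <= qp <= 51:
        raise ValueError("this sketch covers QP in 0..51 only")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 36, 51):
        print(qp, qstep(qp))
    # Average growth per QP step is 2**(1/6), i.e. roughly 12%.
    print(round(2 ** (1 / 6), 4))
```

Because the step size doubles every 6 QP values, a fixed change in QP scales the quantizer by a fixed ratio regardless of the starting point, which is what "logarithmic control" means here.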

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending on the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
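Three of the four full-MB modes listed above (vertical, horizontal and DC) can be sketched directly from their descriptions. The Python sketch below is illustrative only: the function name `predict_16x16` is hypothetical, both neighbor rows are assumed to be available, the DC mode uses a rounded integer average as an assumption, and the planar mode is left out because its equation is not given on the slide.

```python
N = 16  # full-macroblock luma size


def predict_16x16(top, left, mode):
    """Return a 16x16 prediction block (list of rows) from neighboring pels.

    top  : the 16 reconstructed pels directly above the macroblock
    left : the 16 reconstructed pels directly to its left
    """
    if len(top) != N or len(left) != N:
        raise ValueError("expected 16 neighbors above and 16 to the left")
    if mode == "vertical":      # every row repeats the pels above
        return [list(top) for _ in range(N)]
    if mode == "horizontal":    # every column repeats the pels to the left
        return [[left[y]] * N for y in range(N)]
    if mode == "dc":            # flat block at the (rounded) neighbor average
        dc = (sum(top) + sum(left) + N) // (2 * N)
        return [[dc] * N for _ in range(N)]
    raise ValueError("mode must be 'vertical', 'horizontal' or 'dc'")


if __name__ == "__main__":
    top = list(range(100, 116))
    left = [50] * N
    for mode in ("vertical", "horizontal", "dc"):
        block = predict_16x16(top, left, mode)
        print(mode, block[0][:4], block[15][:4])
```

The encoder would subtract the chosen prediction from the actual macroblock and transform only the resulting prediction error.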

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16)
Vertical, Horizontal, DC, Planar
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for Inter and Intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

FRExt Only

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding.

Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]
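The frame (zig-zag) scan mentioned above can be generated for any square block by walking the anti-diagonals in alternating directions. The Python sketch below assumes the conventional zig-zag pattern and uses hypothetical function names; the field scan is a different, fixed order and is not reproduced here.

```python
def zigzag_order(n: int):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                            # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block (list of rows) into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    # Typical quantized block: energy concentrated at low frequencies (top-left).
    block = [[9, 4, 1, 0],
             [5, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print(zigzag_scan(block))
```

For a block like the one in the usage example, the scan groups the non-zero coefficients at the front and leaves a long run of trailing zeros, which is what the entropy coder exploits.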

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block


-135-

High Profile DetailsDeblocking Filter CABAC Signaling

1048707 Deblocking Filterbull Only control of filter is adjusted do not filter 4x4 blocksbull No change to filter operation itself

1048707 CABACbull 61 new contexts and corresponding initialization valuesbull No change to CABAC engine

1048707 Signalingbull 8x8 transform onoff flag at PPS levelbull 8x8 transform onoff flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary High Profile contains

Main profile Adaptive MB level switching between 8x8 and 4x4 transform block

sizes Encoder specified perceptual based quantization scaling matrices Encoder specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction) HD Film 12 HD Video (progressive) 12 HD Video (interlace) 4 (only 2 test clips) SD Video (interlace) 6

Complexity impact Implementation beyond Main Profile affects Intra prediction

transform deblocking filter control CABAC decoding No increase in computational requirements Slight increase in memory requirements (CABAC transform)

-137-

Licensing of H264AVC Technology

Two patent pools to obtain the license 1 MPEGLA wwwmpeglacom2 Via licensing wwwvialicensingcom

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H264 is limited to video Audio coder Bit rates Quality levels and of

channels ndash left to industry and standards groups (ATSC SCTE ARIB DVB etc)

DVB is considering AAC with SBR (AAC plus) ATSC has selected AC-3 plus from Dolby MPEG calls it HE-AAC (HE ndash High efficiency) ATSC SCTE ARIB MPEG etc will continue to use

MPEG-1 Audio MPEG-2 AAC and AC-3

Page 71: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-71-

Comparison of Coding Efficiency Objective test

PSNR (between original and reconstructed pictures) and bitrate saving results of the 'Tempete' CIF 15 Hz sequence for the video streaming application

HLP - High Latency Profile
ASP - Advanced Simple Profile
H.26L - H.264 Main Profile
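The objective measure used in these comparisons, PSNR between original and reconstructed pictures, can be sketched as follows. This is a minimal illustration that assumes 8-bit samples (peak value 255) and pictures given as flat lists of sample values; the function name `psnr` is hypothetical and not taken from any reference software.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Return the PSNR in dB between two equal-length sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("pictures must be non-empty and of equal size")
    # Mean squared error between original and reconstructed samples
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak ** 2 / mse)
```

A higher PSNR at the same bit rate, or the same PSNR at a lower bit rate, is what the bitrate-saving figures on these slides summarize.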

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of the 'Paris' CIF 15 Hz sequence for the video conferencing application

CHC - Conversational High Compression
SP - Simple Profile
ASP - Advanced Simple Profile
H.26L - H.264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards

Comparison of standards

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16
Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4
Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard
Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increase at the rate of 12.5%
Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC
Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible: up to 16 MVs per MB
Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature/Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products.

Related companies
- UBVideo website http://www.ubvideo.com
- LSI Logic website http://www.lsilogic.com
- Microsoft website http://www.microsoft.com
- Envivio website http://www.envivio.com
- Broadcom website http://www.broadcom.com
- Nagravision website http://www.nagravision.com
- Philips website http://www.philips.com
- Polycom website http://www.polycom.com
- PixelTools Corporation website http://www.pixeltools.com
- Amphion website http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website http://www.ligos.com
- LifeSize website http://www.lifesize.com
- Netvideo website http://www.netvideo.com
- Motorola website http://www.motorola.com
- Vanguard Software Solutions website http://www.vsofts.com
- STMicroelectronics website http://us.st.com
- MainConcept website http://www.mainconcept.com
- Impact Labs Inc. website http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website http://www.mpeg.org
- JVT website ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website http://www.mpegla.com
- Via Licensing http://www.vialicensing.com

Extension for various applications
- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema application
- High fidelity coding for the next generation optical discs
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol. pp., Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

- Dr. K. R. Rao, UTA, rao@uta.edu
- Dr. S. K. Kwon, Dongeui University, skkwon@dongeui.ac.kr
- Ms. A. Tamhankar, T-Mobile, arundhati@ieee.org
- Karsten.suehring@hhi.fraunhofer.de


-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol. pp., June-Aug. 2005
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept http://www.mainconcept.com/index_flash.shtml
     b. Mpegable http://www.mpegable.com/show/home.html
     c. Moonlight http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions
- 4:2:2, 4:4:4 formats
- > 8 bit depths
- (8x8) integer DCT
- HVS weighting matrices
- Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
- Residual color transform
- Source editing such as alpha blending
- High bit rates [use RGB color format], Y Cg Co
- High resolution

Slices in a picture are compressed as follows:
♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction - block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan
♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
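The logarithmic control of the quantization step size can be illustrated with a short sketch. It assumes the commonly tabulated H.264 relationship for 8-bit video (QP from 0 to 51, six base step sizes, and a doubling of the step size for every increase of 6 in QP, i.e. roughly 12% per QP step); the names `BASE_QSTEP` and `qstep` are hypothetical.

```python
# Step sizes for QP = 0..5; every further 6 QP steps doubles the step size.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Return the quantization step size for a quantization parameter QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in the 8-bit range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

With this mapping, a fixed change in QP scales the step size by a fixed ratio, which is what makes rate control in terms of QP approximately logarithmic.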

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
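As a small illustration of the scanning step, the sketch below reorders a quantized 4x4 block in frame (zig-zag) order before entropy coding. The table lists raster positions in the usual 4x4 zig-zag order; the names `ZIGZAG_4X4` and `zigzag_scan` are hypothetical, and the field (alternate) scan is not shown.

```python
# Raster indices (row * 4 + column) visited by the 4x4 frame zig-zag scan.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]

def zigzag_scan(block):
    """Return the 16 coefficients of a 4x4 block (list of rows) in zig-zag order."""
    flat = [coeff for row in block for coeff in row]
    return [flat[i] for i in ZIGZAG_4X4]
```

Because quantized blocks tend to have their non-zero values near the top-left (low-frequency) corner, the scan groups them at the start and leaves a long run of zeros at the end.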

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in MB are predicted from pels just above the MB

• Horizontal: pels in MB are predicted from pels just left of the MB

• DC: pels in MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
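The first three full-MB prediction modes can be sketched as below. This is a simplified illustration that assumes both the row of pels above and the column of pels to the left are available, and that the DC value is the rounded mean of those neighbors; the function names are hypothetical, and planar prediction is omitted.

```python
def predict_vertical(top):
    """Every row of the block copies the row of pels just above the block."""
    return [list(top) for _ in range(len(top))]

def predict_horizontal(left):
    """Every pel in a row copies the pel just left of that row."""
    return [[pel] * len(left) for pel in left]

def predict_dc(top, left):
    """Every pel takes the rounded average of the neighboring pels."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]
```

An encoder would typically form each candidate prediction, subtract it from the actual block, and keep the mode whose residual is cheapest to code.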

-103-

Chroma spatial prediction (operates on entire MB), similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

- Matrix can be transmitted in SPS and PPS
- Separate matrices for 4x4 and 8x8 transforms
- Separate matrices for Inter and Intra
- Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level
- Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

- 4x4 Intra Y
- 4x4 Intra Cb, Cr
- 4x4 Inter Y
- 4x4 Inter Cb, Cr
- 8x8 Intra Y
- 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding (FRExt only)

Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
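The codeword construction and the three decoding steps can be sketched as follows. This is a minimal illustration that represents bit strings as Python `str` values of '0' and '1' for readability; the function names are hypothetical, and a real codec would operate on a packed bit stream.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits):
    """Decode one codeword from the start of bits; return (code_num, length)."""
    m = 0
    while bits[m] == "0":                    # step 1: count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # step 3: code_num = 2^M + INFO - 1
```

The returned length (2M + 1) tells the caller how many bits to skip before decoding the next codeword.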

-113-

Adoptions

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExtensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB with a 16x16 Y block and 8x16 (width x height) Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
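The RGB to YCbCr conversion can be written directly from the definitions of Y, Cb and Cr in terms of K_R and K_B. The sketch below assumes normalized floating-point R, G, B values in the range 0 to 1 and applies no offset or integer scaling; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights K_R and K_B."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))   # Cb reaches +0.5 for pure blue
    cr = (r - y) / (2 * (1 - kr))   # Cr reaches +0.5 for pure red
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the ITU-R Rec. BT.601 weights instead. Because the weights are not exactly representable in fixed-point arithmetic, an integer implementation of this conversion is where rounding error enters.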

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo (Cg = green chroma, Co = orange chroma)

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation
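The lifting structure of this transform means the inverse undoes the forward steps exactly, with no rounding loss. A minimal sketch, assuming integer sample values and relying on Python's `>>`, which is an arithmetic (floor) shift for negative integers; the function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward color space conversion using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: reverses each forward step in the opposite order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Note that Cg and Co can be negative and span roughly twice the input range, which corresponds to the one extra bit of chroma precision mentioned for YCgCo.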

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
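The size limit in note 2 is simple arithmetic. The sketch below reads "total pixels per frame" as the maximum frame size allowed at a level, which is an assumption about the missing levels table; the function name is hypothetical, and `math.isqrt` requires Python 3.8 or later.

```python
import math

def max_picture_dimension(pixels_per_frame):
    """Largest width or height (in pixels) allowed by sqrt(pixels/frame x 8)."""
    return math.isqrt(8 * pixels_per_frame)
```

For example, a level sized for CIF (352x288 pixels) would allow neither dimension to exceed 900 pixels, so very wide or very tall aspect ratios are bounded even when the total pixel count is within the limit.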

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
  • Only control of filter is adjusted: do not filter 4x4 blocks
  • No change to filter operation itself

♦ CABAC
  • 61 new contexts and corresponding initialization values
  • No change to CABAC engine

♦ Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
- Main profile
- Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
- Encoder-specified perceptual-based quantization scaling matrices
- Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
- HD Film: 12%
- HD Video (progressive): 12%
- HD Video (interlace): 4% (only 2 test clips)
- SD Video (interlace): 6%

Complexity impact
- Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
- No increase in computational requirements
- Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA, www.mpegla.com
2. Via Licensing, www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

- DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency)
- ATSC has selected AC-3 plus from Dolby
- ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3

Page 72: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-72-

Comparison of Coding Efficiency Objective test

PSNR and bitrate saving results of lsquoParisrsquo CIF 15Hz sequence for the video conferencing application

CHC ndash Conversational High CompressionSP ndash Simple ProfileASP ndash Advanced Simple ProfileH26L ndash H264 Baseline Profile

-73-

Conclusions

H.264 outperforms the previous standards.

Comparison of standards

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Macroblock size | 16x16 | 16x16 (frame mode), 16x8 (field mode) | 16x16 | 16x16

Block size | 8x8 | 8x8 | 16x16, 16x8, 8x8 | 16x16, 8x16, 16x8, 8x8, 4x8, 8x4, 4x4

Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT/Wavelet | 4x4, 8x8 Int DCT; 4x4, 2x2 Hadamard

Quantization | Scalar quantization with step size of constant increment | Scalar quantization with step size of constant increment | Vector quantization | Scalar quantization with step size increasing at the rate of 12.5%

Entropy coding | VLC | VLC | VLC | VLC, CAVLC, CABAC

Motion estimation & compensation | Yes | Yes | Yes | Yes, more flexible; up to 16 MVs per MB

Playback & random access | Yes | Yes | Yes | Yes

-74-

Conclusions

Comparison of standards (continued)

Feature / Standard | MPEG-1 | MPEG-2 | MPEG-4 part 2 (visual) | H.264/MPEG-4 part 10

Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel

Profiles | No | 5 | 8 | 4

Reference picture | one | one | one | multiple

Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward

Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI

Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice

Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps

Compatibility with previous standards | n/a | Yes | Yes | No

Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Commercial H.264 codecs are currently being developed by many companies to replace or complement existing products.

Related companies
- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory in the Blu-ray Disc BD-ROM specification

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers, and participation fees for video streaming services, regardless of Profile(s).
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt
- Extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications: H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression: from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, first quarter 2004.
[63] W. Gao et al., "AVS: The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups (JVT-EXPERTS list, FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All of these codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom, and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6, and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[use RGB color format] YCgCo
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
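
The logarithmic step-size control in the last bullet can be sketched as follows. This is a minimal illustration, not text from the standard: the six base step sizes for QP 0 to 5 are not given on the slides and are taken here as an assumption (they are the values commonly quoted for 8-bit H.264), and the function name `qstep` is hypothetical. The step size doubles for every increase of 6 in QP, which corresponds to an average growth of roughly 12% per QP step.

```python
# Assumed base step sizes for QP = 0..5 (not stated on the slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for a QP in 0..51 (8-bit video assumed)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # The step size doubles every 6 QP values: logarithmic control.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 24, 36, 51):
        print(qp, qstep(qp))
```

Because QP enters as an exponent, a fixed change in QP scales the step size by a fixed factor regardless of the operating point.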

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
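
The zigzag scan of quantized coefficients mentioned on this slide can be illustrated with a short sketch. It generates the classic zigzag order for a 4x4 block by walking the anti-diagonals in alternating directions; the helper names `zigzag_order` and `scan_block` are hypothetical, and the field (alternate) scan is not covered here.

```python
def zigzag_order(n: int = 4) -> list:
    """Raster indices (row * n + col) of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes one anti-diagonal
        rows = [r for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:                  # even diagonals run bottom-left to top-right
            rows.reverse()
        order.extend(r * n + (s - r) for r in rows)
    return order


def scan_block(block):
    """Flatten a square block (list of rows) into zigzag order."""
    n = len(block)
    flat = [v for row in block for v in row]
    return [flat[i] for i in zigzag_order(n)]


if __name__ == "__main__":
    # Typical quantized block: energy concentrated at low frequencies.
    block = [[9, 4, 1, 0],
             [3, 2, 0, 0],
             [1, 0, 0, 0],
             [0, 0, 0, 0]]
    print(scan_block(block))
```

For a block like the one above, the scan groups the non-zero low-frequency coefficients at the start and leaves a long run of trailing zeros, which is what the entropy coder exploits.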

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
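
The first three of these modes are simple enough to sketch directly. The code below is only an illustration under stated assumptions: both the row above and the column to the left are assumed available, the DC value is taken as the rounded mean of those 32 neighboring samples, and the function names are hypothetical. The planar mode is omitted because the slides do not give its equation.

```python
N = 16  # luma macroblock width and height


def predict_vertical(top):
    """Each column repeats the sample directly above the macroblock."""
    return [list(top) for _ in range(N)]


def predict_horizontal(left):
    """Each row repeats the sample directly to the left of the macroblock."""
    return [[left[y]] * N for y in range(N)]


def predict_dc(top, left):
    """Every sample is the rounded mean of the 32 neighboring samples."""
    dc = (sum(top) + sum(left) + N) >> 5   # +16 rounds before dividing by 32
    return [[dc] * N for _ in range(N)]


if __name__ == "__main__":
    top = list(range(16))      # hypothetical neighbors above the MB
    left = [100] * 16          # hypothetical neighbors left of the MB
    print(predict_vertical(top)[5][:4])
    print(predict_horizontal(left)[3][:4])
    print(predict_dc(top, left)[0][0])
```

An encoder would compute the residual against each candidate predictor and keep the mode with the lowest cost; only the residual and the chosen mode are then coded.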

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y

-108-

Two scans (FRExt only), similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variances and is designed to maximize the number of zero-valued coefficients along the scan

[Figure: Frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^{M}$
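
The construction above maps directly to a few lines of code. The following is a minimal sketch that returns each codeword as a string of '0' and '1' characters for readability (a real encoder would write to a bit buffer); the function name `exp_golomb_encode` is hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""   # M-bit INFO field
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
```

Running the loop shows the pattern described on the slide: code_num 0 maps to the single bit 1, code_nums 1 and 2 carry a one-bit INFO field, and code_nums 3 to 6 carry a two-bit INFO field.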

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^{M} + \text{INFO} - 1$
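
The three decoding steps can be sketched as below. As an assumption for illustration, the bit stream is represented as a string of '0' and '1' characters and the function name `exp_golomb_decode` is hypothetical; a truncated stream is not handled gracefully and would raise an exception.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":        # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                       # skip the '1' that terminates the prefix
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    stream = "1" + "010" + "00100" + "00111"   # code_nums 0, 1, 3, 6 back to back
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the prefix of zeros announces the length of the INFO field, codewords can be concatenated without separators and still be parsed unambiguously.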

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: Y is 16 wide by 16 high, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb, and Cr are each 16 by 16.]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$, so $K_G = 0.7152$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
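
As a numerical check of the equations on this slide, the conversion can be sketched in floating point. The function name `rgb_to_ycbcr` and the use of normalized inputs in the range 0 to 1 are assumptions for illustration; integer scaling, offsets, and clipping used in practice are not modeled.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to Y, Cb, Cr using the constants K_R and K_B."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: chroma is (near) zero
    print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches its maximum
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum
    print(1.0 / (2.0 * (1.0 - 0.0722)), 1.0 / (2.0 * (1.0 - 0.2126)))
```

The last line prints the two chroma scale factors implied by $K_B$ and $K_R$. Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide. Because the coefficients are not exactly representable in fixed point, an integer implementation of this transform introduces rounding error, which motivates the YCgCo transform.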

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
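
The forward and inverse conversions above are exactly reversible in integer arithmetic, which can be checked with a short sketch. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, matching the definition on the slide, including for negative Co and Cg values.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using only additions, subtractions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycgco(255, 0, 0))
    # Round-trip check over a coarse grid of 8-bit RGB values.
    grid = range(0, 256, 17)
    ok = all(ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
             for r in grid for g in grid for b in grid)
    print(ok)
```

Note that Cg and Co can be negative and span one more bit than the inputs, consistent with the remark that only one bit of precision is added to the chroma samples.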

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than $\sqrt{8 \times (\text{total number of pixels per frame})}$

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
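
Rule 2 can be turned into a simple check. The sketch below assumes the level's frame-size limit is expressed in luma pixels, and uses 2,097,152 pixels (8192 macroblocks of 16x16) purely as a hypothetical example limit; the function names are also hypothetical, and other level constraints such as macroblock rate and bit rate are not modeled.

```python
import math


def max_dimension(max_frame_pixels: int) -> int:
    """Largest allowed width or height: floor(sqrt(8 * pixels per frame))."""
    return math.isqrt(8 * max_frame_pixels)


def picture_fits(width: int, height: int, max_frame_pixels: int) -> bool:
    """Check the frame-size limit and the rule-2 limit on each dimension."""
    limit = max_dimension(max_frame_pixels)
    return (width * height <= max_frame_pixels
            and width <= limit and height <= limit)


if __name__ == "__main__":
    LIMIT = 8192 * 256   # hypothetical level limit, in pixels
    print(max_dimension(LIMIT))
    print(picture_fits(1920, 1088, LIMIT))   # ordinary HD picture
    print(picture_fits(8192, 256, LIMIT))    # same area, but far too wide
```

The second example shows the purpose of the rule: a picture can respect the total frame size and still be rejected because its aspect ratio is too extreme.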

-128-

Compressed Bit Rate Multipliers for FRExt Profiles

To meet more demanding high-fidelity applications

Multipliers for the fourth column of the table on page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3

Page 73: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-73-

Conclusions H264 outperforms over the previous standards Comparison of standards

FeatureStandard MPEG-1 MPEG-2 MPEG-4 part 2 (visual)

H264MPEG-4 part 10

Macroblock size 16x16 16x16 (frame mode)

16x8 (field mode)

16x16 16x16

Block Size 8x8 8x8 16x16 16x8 8x8

16x16 8x16 16x8 8x8 4x8

8x4 4x4

Transform 8x8 DCT 8x8 DCT 8x8 DCTWavelet

4x4 8x8 Int DCT4x4 2x2

Hadamard

Quantization Scalar quantization

with step size of constant

increment

Scalar quantization with step size of

constant increment

Vector quantization

Scalar quantization with step size increase

at the rate of 125

Entropy coding VLC VLC VLC VLC CAVLC CABAC

Motion Estimation amp Compensation

Yes Yes Yes Yes more flexibleUp to 16 MVs per M

B

Playback amp Random Access

Yes Yes Yes Yes

-74-

Conclusions Comparison of standards (continued)

FeatureStandard

MPEG-1 MPEG-2 MPEG-4 part 2 (visual)

H264MPEG-4 part 10

Pel accuracy Integer frac12-pel Integer frac12-pel Integer frac12-pel frac14-pel

Integer frac12-pel frac14-pel

Profiles No 5 8 4

Reference picture one one one multiple

Bidirectional prediction mode

forwardbackward

forwardbackward

forwardbackward

forwardforwardforwardbackward

backwardbackward

Picture Types I P B D I P B I P B I P B SP SI

Error robustness Synchronization amp concealment

Data partitioning FEC

for important packet

transmission

Synchronization Data partitioning Header extension R

eversible VLCs

Data partitioningParameter

settingFlexible

macroblock ordering

Redundant slice Switched slice

Transmission rate Up to 15Mbps 2-15Mbps 64kbps - 2Mbps 64kbps -240Mbps

Compatibility with previous standards

na Yes Yes No

Encoder complexity

Low Medium Medium High

-75-

Conclusions

Currently the commercial H264 codecs are widely developed by several companies for replacing complementing existing products Related companies

- UBVideo website httpwwwubvideocom- LSI Logic website httpwwwlsilogiccom- Microsoft website httpwwwmicrosoftcom- Envivio website httpwwwenviviocom - Broadcom website httpwwwbroadcomcom- Nagravision website httpwwwnagravisioncom- Philips website httpwwwphilipscom- Polycom website httpwwwpolycomcom- PixelTools Corporation website httpwwwpixeltoolscom- Amphion website httpwwwamphioncom

-76-

Conclusions

Related companies (continued)- Ligos Corporation website httpwwwligoscom- LifeSize website httpwwwlifesizecom- Netvideo website httpwwwnetvideocom- Motorola website httpwwwmotorolacom- Vanguard Software Solutions website httpwwwvsoftscom- STMicroelectronics website httpusstcom- MainConcept website httpwwwmainconceptcom- Impact Labs Inc website httpwwwimpactlabscom- Sorenson media AVC Pro codec (H264)- Blu-Ray Disc Association (BDA) MPEG-4 AVC High Profile and Microsoftrsquos VC-1 vid

eo codec (based on Windows Media Video 9 codec) mandatory (blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related group - MPEG website httpwwwmpegorg- JVT website ftpstandardspolycomcom- wwwmpegiforg

Test software httpiphomehhidesuehringtmldownload

- H264AVC JM Software httpbshhide~suehringtmldownload Test sequences

- httpisestanfordeduvideohtml- httpkbscstu-berlinde~stewevcegsequenceshtm- httpwwwitsbldrdocgovvqeg- ftptntuni-hannoverdepubjvtsequences- httptraceeasasueduyuvyuvhtml

-78-

Conclusions H264 licensing MPEG LA and Via Licensing are now coordinating the licensi

ng terms decoder-encoder royalties for product manufacturers and participation fees for video streaming services regardless of Profile(s) MPEG LA website httpwwwmpeglacom Via Licensing httpwwwvialicensingcom

FRExtensions to 422 and 444 chroma formats 12 bit resolution for medical imaging Scalable coding Lossless coding for digital cinema application High fidelity coding for the next generation optical discs Extension for various applications H Schwartz D Marpe and T Wiegand ldquo S

NRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004

FINAL STAGES OF APPROVAL Standard systems and file format support specifications Standardizing reference software implementation Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information JVT documents and software on open ftp website ftpstandardspolyc

omcomhttpiphomehhidesuehring

JVT reflector subscription httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail jvt-expertsmailimtcorg

JVT management team Chair Gary Sullivan (garysullmicrosoftcom) Co-chair Ajay Luthra (aluthramotorolacom) Co-chair Thomas Wiegand (wiegandhhide)

Dr K R Rao UTA raoutaedu Dr S K Kwon Dongeui University skkwondongeuiackr Ms A Tamhankar T-Mobile arundhatiieeeorg Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs: HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGaFrX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, I quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups (JVT-EXPERTS list, FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format] Y Cg Co
High resolution

-92-

♦ Inter temporal prediction: block based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video
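As an illustration of the quarter-sample luma interpolation listed above, here is a minimal one-dimensional sketch. The slide does not give the filter; the 6-tap kernel (1, -5, 20, 20, -5, 1)/32 for half-sample positions and the rounded two-sample average for quarter-sample positions are taken from the published H.264 design, and the function names are hypothetical. The two-dimensional case, which works on unrounded intermediate values, is not covered.

```python
def clip8(value):
    """Clamp to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, value))


def half_sample(e, f, g, h, i, j):
    """Half-sample value between g and h from six integer samples in a row."""
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_sample(a, b):
    """Quarter-sample value: rounded average of the two nearest samples."""
    return (a + b + 1) >> 1


row = [10, 10, 10, 90, 90, 90]          # a sharp edge between g=10 and h=90
b = half_sample(*row)                   # half-sample position on the edge
q = quarter_sample(row[2], b)           # quarter-sample between g and b
print(b, q)                             # 50 30
```

On a flat row the filter returns the input value unchanged, since the taps sum to 32.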

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF)
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

In MBAFF, the choice of compression (frame or field) is selected for each vertical pair of macroblocks

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
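The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. The slide states only that the control is logarithmic; the six base step sizes and the doubling every 6 QP values are the figures usually quoted in the H.264 literature (for example Richardson [8]) for 8-bit video with QP in 0..51, and `qstep` is a hypothetical helper name.

```python
# Step sizes for QP 0..5; every further 6 QP values double the step size.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a QP in 0..51 (8-bit video assumed)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

An increase of 1 in QP therefore raises the step size by roughly 12%, and an increase of 6 doubles it.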

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in an I slice and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
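The first three of these modes can be sketched in a few lines. This is an illustrative sketch only: it assumes both the row above and the column to the left are available, uses the rounded mean of the 32 neighbors for DC, and leaves out the planar mode; `predict_16x16` is a hypothetical name.

```python
def predict_16x16(mode, above, left):
    """Build a 16x16 luma prediction block.

    above: the 16 reconstructed pels in the row just above the MB
    left:  the 16 reconstructed pels in the column just left of the MB
    Returns a list of 16 rows of 16 predicted pels.
    """
    n = 16
    if mode == "vertical":       # every row repeats the row above the MB
        return [list(above) for _ in range(n)]
    if mode == "horizontal":     # every row is filled with its left neighbor
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":             # rounded mean of the 32 neighboring pels
        dc = (sum(above) + sum(left) + n) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("planar and other modes are not covered in this sketch")


above = list(range(16))          # 0, 1, ..., 15
left = [100] * 16
print(predict_16x16("vertical", above, left)[5][3])    # 3
print(predict_16x16("horizontal", above, left)[5][3])  # 100
print(predict_16x16("dc", above, left)[5][3])          # 54
```

An encoder would typically compare the prediction error of each mode and signal the one it picks; that choice is outside the scope of the standard.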

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: vertical, horizontal, DC, planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt only: HVS Weighting Matrices

The matrix can be transmitted in the SPS and PPS. Separate matrices for 4x4 and 8x8 transforms. Separate matrices for inter and intra. The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level.

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y
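The idea that the weighting matrix is just an extra multiplier in inverse quantization can be sketched as below. This is a simplified floating-point illustration, not the normative integer process: it assumes a weight of 16 is neutral (a flat matrix of 16s leaves the result unchanged), and `dequantize`, `levels`, `weights` and `step` are hypothetical names.

```python
def dequantize(levels, weights, step):
    """Scale quantized levels back to coefficients, weighted per position.

    levels:  2-D list of quantized coefficient levels
    weights: 2-D list of scaling-matrix entries (16 = neutral, by assumption)
    step:    quantization step size for the block
    """
    return [
        [level * step * weight / 16 for level, weight in zip(lrow, wrow)]
        for lrow, wrow in zip(levels, weights)
    ]


levels = [[2, -1], [0, 3]]
flat = [[16, 16], [16, 16]]      # no perceptual weighting
hvs = [[16, 32], [32, 64]]       # coarser steps for higher frequencies
print(dequantize(levels, flat, 10.0))
print(dequantize(levels, hvs, 10.0))
```

A larger weight at a position means the encoder quantized that coefficient more coarsely, which is how perceptually less important frequencies are given fewer bits.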

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding. Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan.

Frame (zig-zag), Field

FRExt only
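The frame (zig-zag) scan can be generated for any square block with a short sort, as sketched below. This reproduces the classic zig-zag order only; the field scan is not shown, and `zigzag_order` and `scan` are hypothetical names.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def scan(block):
    """Read a square block of coefficients out in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


block = [
    [9, 4, 1, 0],
    [3, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(scan(block))   # [9, 4, 3, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because typical transform blocks have their energy near the top-left corner, the scan pushes the zeros into one long trailing run, which the entropy coder represents cheaply.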

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^{M}$
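The construction above translates directly into code. The following is a minimal sketch that builds each codeword as a string of '0' and '1' characters for readability; `exp_golomb_encode` is a hypothetical helper name, and a real encoder would write bits into a packed buffer instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO]


for code_num in range(7):
    print(code_num, exp_golomb_encode(code_num))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
# 5 00110
# 6 00111
```

Small values of code_num get the shortest codewords, so the mapping from syntax element values to code_num is chosen to make frequent values small.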

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
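The three decoding steps can be sketched as follows, again on a string of '0' and '1' characters. `exp_golomb_decode` and `decode_stream` are hypothetical names, and the sketch assumes a well-formed stream (a truncated one raises an exception).

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":                    # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                                   # ... and skip the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m        # step 3: 2^M + INFO - 1


def decode_stream(bits):
    """Decode back-to-back codewords until the string is exhausted."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


print(decode_stream("1" + "010" + "00111"))    # [0, 1, 6]
```

No separator is needed between codewords: the run of leading zeros tells the decoder how many INFO bits follow.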

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well in the US, along with various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only

[Figure: MB structure in 4:2:2 and 4:4:4 formats. 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16.]
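The macroblock structures in this figure determine how many samples each MB carries. The following small sketch tabulates this, using the chroma block sizes (width x height) given on the chroma prediction slide: 8x8 for 4:2:0, 8x16 for 4:2:2 and 16x16 for 4:4:4. The dictionary and function names are hypothetical.

```python
# Chroma block size (width, height) per macroblock for each chroma format.
CHROMA_BLOCK = {
    "4:2:0": (8, 8),
    "4:2:2": (8, 16),
    "4:4:4": (16, 16),
}

LUMA_SAMPLES = 16 * 16


def samples_per_mb(chroma_format):
    """Total luma + Cb + Cr samples carried by one macroblock."""
    width, height = CHROMA_BLOCK[chroma_format]
    return LUMA_SAMPLES + 2 * width * height


for fmt in CHROMA_BLOCK:
    print(fmt, samples_per_mb(fmt))
# 4:2:0 384
# 4:2:2 512
# 4:4:4 768
```

Relative to 4:2:0, a 4:4:4 macroblock carries twice as many samples, which is one reason the FRExt profiles allow higher bit rates.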

-119-

RGB → Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
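The conversion can be checked numerically with a few lines. This is a sketch on normalized floating-point values with no offset or integer scaling (assumptions of the sketch); `rgb_to_ycbcr` is a hypothetical name, and the default constants are the K_R and K_B values quoted on the slide.

```python
KR_DEFAULT, KB_DEFAULT = 0.2126, 0.0722     # constants quoted on the slide


def rgb_to_ycbcr(r, g, b, kr=KR_DEFAULT, kb=KB_DEFAULT):
    """Convert normalized R, G, B (0..1) to Y (0..1) and Cb, Cr (-0.5..0.5)."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))                      # white: Y near 1, chroma near 0
print(rgb_to_ycbcr(0.0, 0.0, 1.0))                      # pure blue: Cb = 0.5
print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # pure red with BT.601 constants
print(1 / (2 * (1 - KB_DEFAULT)), 1 / (2 * (1 - KR_DEFAULT)))   # the two scale factors
```

The scale factors are chosen so that pure blue and pure red reach the chroma extreme of 0.5; the last line prints roughly 0.5389 and 0.6350.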

-120-

Rounding error in RGB → Y Cb Cr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
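The forward and inverse steps above are exactly reversible in integer arithmetic, which a short sketch can confirm. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, matching the definition on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion, following the four steps on the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycgco(255, 0, 0))                    # (63, -127, 255)
print(ycgco_to_rgb(*rgb_to_ycgco(255, 0, 0)))     # (255, 0, 0)

# Exhaustive-style check over a coarse grid of 8-bit RGB triples.
levels = range(0, 256, 17)
assert all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in levels for g in levels for b in levels
)
```

Cg and Co can be negative and span one bit more than the input samples, which is the one extra bit of chroma precision mentioned for YCgCo.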

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles. All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
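Rule 2 can be worked through numerically. The sketch below assumes a level whose maximum frame size is 8192 macroblocks (the figure published for Level 4, used here only as an example because the level table itself is not reproduced in this transcript); the function names are hypothetical.

```python
import math

MB_PIXELS = 16 * 16


def max_picture_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(total pixels per frame x 8)."""
    return math.isqrt(max_pixels_per_frame * 8)


def fits(width, height, max_pixels_per_frame):
    """Check a picture size against the frame-size and dimension limits."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


max_pixels = 8192 * MB_PIXELS                 # 2,097,152 pixels per frame
print(max_picture_dimension(max_pixels))      # 4096
print(fits(1920, 1088, max_pixels))           # True
print(fits(4112, 16, max_pixels))             # False: wider than 4096
```

The rule lets a level accommodate unusual aspect ratios while still bounding the line-buffer size a decoder must provide.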

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-74-

Conclusions: Comparison of standards (continued)

Feature | MPEG-1 | MPEG-2 | MPEG-4 Part 2 (Visual) | H.264/MPEG-4 Part 10
Pel accuracy | Integer, 1/2-pel | Integer, 1/2-pel | Integer, 1/2-pel, 1/4-pel | Integer, 1/2-pel, 1/4-pel
Profiles | No | 5 | 8 | 4
Reference picture | one | one | one | multiple
Bidirectional prediction mode | forward/backward | forward/backward | forward/backward | forward/forward, forward/backward, backward/backward
Picture types | I, P, B, D | I, P, B | I, P, B | I, P, B, SP, SI
Error robustness | Synchronization & concealment | Data partitioning, FEC for important packet transmission | Synchronization, data partitioning, header extension, reversible VLCs | Data partitioning, parameter setting, flexible macroblock ordering, redundant slice, switched slice
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps - 2 Mbps | 64 kbps - 240 Mbps
Compatibility with previous standards | n/a | Yes | Yes | No
Encoder complexity | Low | Medium | Medium | High

-75-

Conclusions

Currently, commercial H.264 codecs are being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups:
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software:
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences:
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing: MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s). MPEG LA website: http://www.mpegla.com; Via Licensing: http://www.vialicensing.com

FRExtensions to 422 and 444 chroma formats 12 bit resolution for medical imaging Scalable coding Lossless coding for digital cinema application High fidelity coding for the next generation optical discs Extension for various applications H Schwartz D Marpe and T Wiegand ldquo S

NRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004

FINAL STAGES OF APPROVAL Standard systems and file format support specifications Standardizing reference software implementation Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information JVT documents and software on open ftp website ftpstandardspolyc

omcomhttpiphomehhidesuehring

JVT reflector subscription httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail jvt-expertsmailimtcorg

JVT management team Chair Gary Sullivan (garysullmicrosoftcom) Co-chair Ajay Luthra (aluthramotorolacom) Co-chair Thomas Wiegand (wiegandhhide)

Dr K R Rao UTA raoutaedu Dr S K Kwon Dongeui University skkwondongeuiackr Ms A Tamhankar T-Mobile arundhatiieeeorg Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm (High Profile is now officially mandatory for HD DVD Video; DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
Bit depths greater than 8 bits
(8x8) integer DCT
HVS weighting matrices
Transform-bypass lossless mode (uses prediction and entropy coding of the prediction errors)
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

-92-

♦ Inter temporal prediction – block based motion estimation and compensation

o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features

o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability

o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
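
The logarithmic relationship in the last bullet can be sketched as follows. This is a minimal illustration, not text from the slides: it assumes the commonly cited H.264 behaviour for 8-bit video, in which the quantization step size doubles for every increase of 6 in QP, with six base step sizes for QP 0 to 5. The function name `qstep` and the constant `BASE_QSTEP` are illustrative.

```python
# Base step sizes for QP = 0..5 (assumed from the commonly cited H.264 table).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Approximate quantization step size for a QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    # Every 6 QP steps double the step size, so QP acts like a log2 scale.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Under these assumptions an increase of 1 in QP raises the step size by roughly 12%, which is what makes the control logarithmic.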

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: Field processing is similar to frame mode

MBAFF: If the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
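
The first three of these modes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference decoder: both neighbors (the row above and the column to the left) are assumed to be available, the DC value is assumed to be the rounded mean of all 32 neighboring pels, and the planar mode is left out. The function name `predict_16x16` and the mode strings are hypothetical.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 predicted luma block as a list of rows.

    above: the 16 reconstructed pels in the row just above the MB
    left:  the 16 reconstructed pels in the column just left of the MB
    """
    n = 16
    if mode == "vertical":      # every row repeats the pels above the MB
        return [list(above) for _ in range(n)]
    if mode == "horizontal":    # every row is filled with its left neighbor
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":            # rounded mean of the 32 neighboring pels
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("planar mode is omitted from this sketch")
```

An encoder would typically form the residual as the difference between the source MB and the prediction, then pick the mode with the lowest cost.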

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS
Separate matrices for 4x4 and 8x8 transforms
Separate matrices for Inter and Intra
The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, reflecting visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr
4x4 Inter Y, 4x4 Inter Cb/Cr
8x8 Intra Y, 8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding
Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan

Scans shown: frame (zig-zag) and field

FRExt Only
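
As an illustration of the frame scan, the sketch below generates the classic zig-zag order for an n x n block by walking the anti-diagonals. It is a generic construction rather than a copy of the standard's tables, which remain normative (the field scan in particular uses a different table that is not reproduced here). The function name `zigzag_order` is hypothetical.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in classic zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order
```

For n = 4 this yields the raster indices 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15, so low-frequency coefficients come first and the trailing high-frequency zeros cluster at the end of the scan.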

-109-

Examples of parameters to be encoded

Parameter | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
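
The construction and the three decoding steps can be sketched as a pair of functions. This is a minimal illustration that works on strings of '0' and '1' characters for readability; a real bitstream reader would operate on packed bits. The function names are hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Encode a non-negative code_num as [M zeros][1][INFO]."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str):
    """Decode one codeword from the start of bits.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                    # step 1: count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # step 3: 2^M + INFO - 1
```

For example, code_num 0 maps to the single bit `1`, code_num 1 and 2 map to `010` and `011`, and code_num 3 maps to `00100`, matching the (2M + 1)-bit length rule.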

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP): supports 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P): supports 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P): supports up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P): supports up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supports efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16
4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ and $K_R + K_B + K_G = 1$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
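
The general form of this conversion, with the scale factors derived from $K_R$ and $K_B$ rather than hard-coded, can be sketched as follows. This is a floating-point illustration on samples normalized to the range 0 to 1; the function name `rgb_to_ycbcr` and its default arguments are illustrative, and the rounding that a fixed-point implementation would introduce is not modelled.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to Y (0..1) and Cb, Cr (-0.5..0.5)."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that pure blue gives Cb = 0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that pure red gives Cr = 0.5
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide. Because the coefficients are not exactly representable in integer arithmetic, a round trip through this transform generally does not reproduce the original RGB samples bit-exactly.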

-120-

Rounding error in RGB to YCbCr conversion

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

To avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
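
The forward and inverse steps on this slide can be checked with a short round-trip sketch. It relies on the fact that Python's `>>` on integers is an arithmetic (floor) shift, including for negative values, which matches the definition given above. The function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting transform: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so the round trip is lossless for any integer input. The price is that Cg and Co span about twice the input range, which is the extra bit of chroma precision.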

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than $\sqrt{8 \times (\text{total number of pixels per frame})}$

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-75-

Conclusions

Commercial H.264 codecs are currently being developed by several companies to replace or complement existing products. Related companies:

- UBVideo website: http://www.ubvideo.com
- LSI Logic website: http://www.lsilogic.com
- Microsoft website: http://www.microsoft.com
- Envivio website: http://www.envivio.com
- Broadcom website: http://www.broadcom.com
- Nagravision website: http://www.nagravision.com
- Philips website: http://www.philips.com
- Polycom website: http://www.polycom.com
- PixelTools Corporation website: http://www.pixeltools.com
- Amphion website: http://www.amphion.com

-76-

Conclusions

Related companies (continued):

- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp.tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

- FRExtensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

Final stages of approval
- Standard systems and file format support specifications
- Standardizing the reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com
http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
Chair: Gary Sullivan (garysull@microsoft.com)
Co-chair: Ajay Luthra (aluthra@motorola.com)
Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Herts, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs, HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block
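The two differentially coded rows of the table (quantiser parameter and motion vector) can be illustrated with a short sketch. The function names are hypothetical, and the predicted motion vector is simply passed in as an argument, since the slides do not describe how the predictor is derived.

```python
def encode_differences(qp, prev_qp, mv, predicted_mv):
    """Return (delta_qp, mvd), the values actually handed to the entropy coder."""
    delta_qp = qp - prev_qp
    mvd = (mv[0] - predicted_mv[0], mv[1] - predicted_mv[1])
    return delta_qp, mvd


def decode_differences(delta_qp, prev_qp, mvd, predicted_mv):
    """Invert encode_differences using the same previous QP and MV predictor."""
    qp = prev_qp + delta_qp
    mv = (predicted_mv[0] + mvd[0], predicted_mv[1] + mvd[1])
    return qp, mv
```

Because neighboring macroblocks tend to have similar QP and motion, the differences are usually small and therefore cheap to code.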

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^{M}$$
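The construction above can be sketched as a small encoder. The function name `exp_golomb_encode` and the choice to return the codeword as a string of '0'/'1' characters are assumptions of this example, made for readability rather than efficiency.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field (empty when M == 0).
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits
```

For example, `exp_golomb_encode(0)` gives `"1"` and `exp_golomb_encode(3)` gives `"00100"`, each of length 2M + 1.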

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^{M} + \text{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
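The three decoding steps can be sketched as follows. The function name `exp_golomb_decode`, the bit-string input, and the returned `(code_num, next_position)` pair are assumptions of this example; a real decoder would read from a packed bitstream.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a '0'/'1' string starting at pos.

    Returns (code_num, position of the first bit after the codeword).
    """
    # Step 1: count the M leading zeros that precede the marker bit '1'.
    m = 0
    while bits[pos + m] == "0":
        m += 1
    start = pos + m + 1
    if start + m > len(bits):
        raise ValueError("truncated codeword")
    # Step 2: read the M-bit INFO field (empty when M == 0).
    info = int(bits[start:start + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, start + m
```

Calling the function repeatedly with the returned position walks through a concatenated stream of codewords.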

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ Blu-ray Disc Association: FRExt is mandatory in the BD-ROM Video specification

♦ DVB (Digital Video Broadcasting) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y is 16x16, Cb and Cr are each 8 wide by 16 high. 4:4:4 MB: Y, Cb and Cr are each 16x16.]
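The macroblock layouts can be summarized in a small lookup. The table and function names below are hypothetical, and the 4:2:0 entry (8x8 chroma blocks) is included for comparison with the two FRExt formats.

```python
# Chroma block size (width, height) per macroblock for each sampling format.
CHROMA_BLOCK_SIZE = {
    "4:2:0": (8, 8),
    "4:2:2": (8, 16),
    "4:4:4": (16, 16),
}


def samples_per_macroblock(chroma_format):
    """Count the Y + Cb + Cr samples carried by one macroblock."""
    width, height = CHROMA_BLOCK_SIZE[chroma_format]
    luma = 16 * 16
    return luma + 2 * width * height
```

Relative to 4:2:0, a 4:4:4 macroblock carries twice as many samples, which is one reason the FRExt profiles allow higher bit rates.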

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_G + K_B = 1$):

$$Y = 0.2126\, R + 0.7152\, G + 0.0722\, B$$

$$C_b = 0.5389\, (B - Y), \qquad C_r = 0.6350\, (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
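A floating-point sketch of the conversion, written directly from the general formulas so that either set of constants can be plugged in. The function name `rgb_to_ycbcr` and the use of normalized inputs in [0, 1] are assumptions of this example.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) for the given luma weights."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant. Because the coefficients are not exactly representable in integer arithmetic, a fixed-point implementation of this transform introduces rounding error.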

-120-

Rounding error in RGB → YCbCr

FRExt Only: YCgCo

Cg = Green Chroma, Co = Orange Chroma

$$Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right]$$

$$C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right]$$

$$C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
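The lifting steps of this slide translate directly into integer code. In this sketch the function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation the slide describes. The point of the example is that the inverse undoes the forward transform exactly, with no rounding loss.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion; returns (Y, Cg, Co)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse lifting-based conversion; returns (R, G, B)."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Note that Cg and Co can be negative and span one more bit than the inputs, which corresponds to the extra bit of chroma precision mentioned for YCgCo.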

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
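Notes 1 and 2 can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the level limits are passed in as parameters rather than read from the level table, and the sample figures in the usage note (40,500 macroblocks per second, 720x576 frames) are illustrative values chosen for the example, not numbers taken from these slides.

```python
import math


def max_frame_rate(max_mb_per_sec, width, height, cap=172.0):
    """Frame rate allowed by a macroblock-rate limit, capped at 172 frames/sec."""
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return min(max_mb_per_sec / mbs_per_frame, cap)


def max_dimension(max_pixels_per_frame):
    """Largest width or height allowed: floor(sqrt(pixels per frame x 8))."""
    return math.isqrt(max_pixels_per_frame * 8)
```

With a limit of 40,500 macroblocks per second, a 720x576 picture (1,620 macroblocks) can run at 25 frames/sec, while a 176x144 picture would exceed the cap and is limited to 172 frames/sec.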

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO)

Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt, hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects Intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-76-

Conclusions

Related companies (continued)
- Ligos Corporation website: http://www.ligos.com
- LifeSize website: http://www.lifesize.com
- Netvideo website: http://www.netvideo.com
- Motorola website: http://www.motorola.com
- Vanguard Software Solutions website: http://www.vsofts.com
- STMicroelectronics website: http://us.st.com
- MainConcept website: http://www.mainconcept.com
- Impact Labs Inc. website: http://www.impactlabs.com
- Sorenson Media: AVC Pro codec (H.264)
- Blu-ray Disc Association (BDA): MPEG-4 AVC High Profile and Microsoft's VC-1 video codec (based on the Windows Media Video 9 codec) are mandatory (Blu-ray Disc BD-ROM specification)

-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software: http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for the next generation of optical discs
- Extension for various applications
- H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC", ICIP 2004, vol. pp., Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team:
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard", JVCIR Special Issue on H.264/AVC, vol. pp., June-Aug. 2005
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

• 4:2:2 and 4:4:4 formats
• > 8 bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [Use RGB color format] YCgCo
• High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video
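To make the 1/4-sample luma interpolation item concrete, here is a one-dimensional sketch. The slides do not give the filter; this example assumes the 6-tap filter (1, -5, 20, 20, -5, 1)/32 that H.264 uses for half-sample luma positions, with quarter-sample values obtained by rounded averaging of two neighbors. The function names and the 8-bit clipping range are assumptions of the example.

```python
def clip_8bit(value):
    """Clamp a filtered value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] using the 6-tap filter."""
    if i < 2 or i + 3 >= len(row):
        raise ValueError("need two samples before row[i] and three after it")
    a, b, c, d, e, f = row[i - 2:i + 4]
    return clip_8bit((a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5)


def quarter_sample(p, q):
    """Quarter-sample value as the rounded average of two neighboring values."""
    return (p + q + 1) >> 1
```

A quarter-sample position next to `row[i]` would then be `quarter_sample(row[i], half_sample(row, i))`.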

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
    - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of MBs
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
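The logarithmic relation between the quantization parameter and the step size can be sketched as below. The six base step sizes are the values commonly tabulated for H.264 with 8-bit video and are an assumption of this example, since the slides do not list them; the function name `qstep` is hypothetical.

```python
# Step sizes for QP 0..5; every further increase of QP by 6 doubles the step.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Under these assumptions each QP increment raises the step size by roughly 12%, so a fixed QP change corresponds to a roughly constant relative change in bit rate.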

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
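The zigzag (frame) scan mentioned on this slide can be generated algorithmically. This sketch covers only the zigzag order, not the field scan; the function names are hypothetical, and the block is assumed to be a square list of rows.

```python
def zigzag_order(n=4):
    """Return the (row, col) visiting order of an n x n zigzag scan."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + col == s, top to bottom.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            # Even diagonals are traversed from bottom-left to top-right.
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block of coefficients into zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because quantized blocks usually keep their non-zero values near the top-left corner, the scan tends to end in a long run of zeros, which the entropy coder represents compactly.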

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

diams High profile (HP) supporting 8-bit video with 420 sampling addressing high-end consumer use and otherapplications using high-resolution video without a need for extended chroma formats or extended sample accuracy

diams High 10 profile (Hi10P) supporting 420 video with up to 10 bits of representation accuracy per sample

diams High 422 profile (H422P) supporting up to 422 chroma sampling and up to 10 bits per sample and

High Profiles

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

RGB Y Cb Cr

Y = KR R + (1 ndash KR ndash KB) G + KB B

KR = 02126 KB = 00722 KR + KB + KG = 1

Y = 02126 R + 07152 G + 00722 B

Cb = 05389 (B ndash Y) Cr = 07874 (R ndash Y)

(ITU-R RecBT601 defines KB=0114 KR=0299)

( )

2(1 )b

B

B YC

K

( )

2(1 )r

R

R YC

K

-120-

Rounding error in RGB Y Cb Cr

FRExt Only YCgCo

Cg = Green Chroma Co = Orange ChromaTo further avoid any rounding error add only one bit of precision to chroma samples

1 ( )[ ]

2 21 ( )

[ ]2 2( )

2

g

o

R BY G

R BC G

R BC

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

lsquoHigherrsquo profile supports all capabilities of the lower ones Also capable of decoding all bit streams encoded for the lower

nested profilesAll high profiles support all features of the main profile

New Profiles in the H264AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1 If a picture size is smaller than the typical picture size then frame rate can be higher up to a maximum of 172 framessec

2 Horizontal and vertical maximum sizes cannot be more than sqrt[(Total of pixelsframe)x8]

3 If at a given level picture size is less than that in the table of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

diams The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

diams The High profile of FRExt produced nominally transparent (ie difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T Wedi Y Kashiwagi ldquoSubjective quality evaluation of H264AVC FRExt for HD movie contentrdquo JVT document JVT-L033 July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE - High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.


-77-

Conclusions

Related groups
- MPEG website: http://www.mpeg.org
- JVT website: ftp://standards.polycom.com
- www.mpegif.org

Test software
- http://iphome.hhi.de/suehring/tml/download
- H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download

Test sequences
- http://ise.stanford.edu/video.html
- http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
- http://www.its.bldrdoc.gov/vqeg
- ftp://tnt.uni-hannover.de/pub/jvt/sequences
- http://trace.eas.asu.edu/yuv/yuv.html

-78-

Conclusions

H.264 licensing
- MPEG LA and Via Licensing are now coordinating the licensing terms: decoder-encoder royalties for product manufacturers and participation fees for video streaming services, regardless of Profile(s)
- MPEG LA website: http://www.mpegla.com
- Via Licensing: http://www.vialicensing.com

FRExt: extensions to 4:2:2 and 4:4:4 chroma formats
- 12-bit resolution for medical imaging
- Scalable coding
- Lossless coding for digital cinema applications
- High-fidelity coding for next-generation optical discs
- Extension for various applications
- H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004

FINAL STAGES OF APPROVAL
- Standard systems and file format support specifications
- Standardizing reference software implementation
- Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information

JVT documents and software on open ftp website: ftp://standards.polycom.com, http://iphome.hhi.de/suehring

JVT reflector subscription: http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts

JVT reflector e-mail: jvt-experts@mail.imtc.org

JVT management team
- Chair: Gary Sullivan (garysull@microsoft.com)
- Co-chair: Ajay Luthra (aluthra@motorola.com)
- Co-chair: Thomas Wiegand (wiegand@hhi.de)

Dr. K. R. Rao, UTA: rao@uta.edu
Dr. S. K. Kwon, Dongeui University: skkwon@dongeui.ac.kr
Ms. A. Tamhankar, T-Mobile: arundhati@ieee.org
Karsten.suehring@hhi.fraunhofer.de

-80-

References

[1] MPEG-2: ISO/IEC JTC1/SC29/WG11 and ITU-T, "ISO/IEC 13818-2: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Video," ISO/IEC and ITU-T, 1994.
[2] MPEG-4: ISO/IEC JTC1/SC29/WG11, "ISO/IEC 14496:2000-2: Information Technology - Coding of Audio-Visual Objects - Part 2: Visual," ISO/IEC, 2000.
[3] H.263: International Telecommunication Union, "Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication," ITU-T, 1998.
[4] H.264: International Telecommunication Union, "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, 2003.
[5] T. Stockhammer, M. Hannuksela, and S. Wenger, "H.26L/JVT Coding Network Abstraction Layer and IP-based Transport," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 485-488, Sep. 2002.

-81-

[6] P. List, A. Joch, J. Lainema, G. Bjontegaard, and M. Karczewicz, "Adaptive Deblocking Filter," IEEE Trans. CSVT, Vol. 13, pp. 614-619, July 2003.
[7] K. R. Rao and P. Yip, Discrete Cosine Transform, Academic Press, 1990.
[8] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Wiley, 2003.
[9] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 598-603, July 2003.
[10] S. W. Golomb, "Run-Length Encoding," IEEE Trans. on Information Theory, IT-12, pp. 399-401, December 1966.
[11] D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 620-636, July 2003.

-82-

[12] M. Flierl and B. Girod, "Generalized B Picture and the Draft H.264/AVC Video-Compression Standard," IEEE Trans. CSVT, Vol. 13, pp. 587-597, July 2003.
[13] M. Karczewicz and R. Kurceren, "The SP- and SI-Frames Design for H.264/AVC," IEEE Trans. CSVT, Vol. 13, pp. 637-644, July 2003.
[14] S. Wenger, "H.264/AVC Over IP," IEEE Trans. CSVT, Vol. 13, pp. 645-656, July 2003.
[15] ISO/IEC JTC1/SC29/WG11, "Report of The Formal Verification Tests on AVC (ISO/IEC 14496-10 | ITU-T Rec. H.264)," MPEG2003/N6231, December 2003.
[16] M. Ghanbari, "Standard Codecs: Image Compression to Advanced Video Coding," Hertz, UK: IEE, 2003.
[17] A. Joch, F. Kossentini, H. Schwarz, T. Wiegand, and G. J. Sullivan, "Performance Comparison of Video Coding Standards using Lagrangian Coder Control," IEEE ICIP 2002, Rochester, New York, Vol. 2, pp. 501-504, Sept. 2002.

-83-

[18] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. CSVT, Vol. 13, pp. 560-576, July 2003.
[19] MPEG website: http://www.mpeg.org
[20] JVT website: ftp://standards.polycom.com
[21] MPEG LA website: http://www.mpegla.com
[22] H.264/AVC JM Software: http://bs.hhi.de/~suehring/tml/download
[23] UBVideo website: http://www.ubvideo.com
[24] LSI Logic website: http://www.lsilogic.com
[25] Microsoft website: http://www.microsoft.com
[26] Envivio website: http://www.envivio.com
[27] PixelTools Corporation website: http://www.pixeltools.com
[28] Nagravision website: http://www.nagravision.com
[29] Philips website: http://www.philips.com

-84-

[30] Polycom website: http://www.polycom.com
[31] MainConcept website: http://www.mainconcept.com
[32] Amphion website: http://www.amphion.com
[33] Ligos Corporation website: http://www.ligos.com
[34] LifeSize website: http://www.lifesize.com
[35] Broadcom website: http://www.broadcom.com
[36] Netvideo website: http://www.netvideo.com
[37] Motorola website: http://www.motorola.com
[38] http://www.mediaware.com
[39] Impact Labs Inc. website: http://www.impactlabs.com
[40] Vanguard Software Solutions website: http://www.vsofts.com
[41] STMicroelectronics website: http://us.st.com, www.thomson.net
[42] www.conexant.com (H.264 decoder ICs - HDTV & SDTV)
[43] www.pixtree.com

-85-

[44] BT Exact: http://www.btexact.bt.com
[45] DemoGraFX: www.dolby.com
[46] Equator: http://www.equator.com
[47] Moonlight: www.elecard.com
[48] Sand Video: www.broadcom.com
[49] VideoLocus: http://www.lsilogic.com/technologies/industry_standards/mpeg_based_standards_h_264.html
[50] W&W Communications (and DSP Research): http://www.wwcoms.com
[51] Cisco Systems: www.cisco.com
[52] Deutsche Telekom: http://www.telekom3.de/en-p/home/cc-startseite.html

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.
[57a] G. Sullivan and T. Wiegand, "Video compression - from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004.
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, pp. 7-34, first quarter 2004.
[63] W. Gao et al., "AVS - The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom, and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6, and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 26000 Swiss Francs)

-91-

Fidelity Range Extensions
• 4:2:2 and 4:4:4 formats
• > 8-bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates [use RGB color format], Y Cg Co
• High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction - 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction - 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction - block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
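The logarithmic step-size control in the last bullet can be illustrated with a short sketch. It assumes the commonly tabulated H.264 relationship for 8-bit video, in which the step size doubles for every increase of 6 in QP and QP ranges from 0 to 51; the table values and the function name `qstep` are not given in these slides and are stated here as assumptions.

```python
# Base step sizes for QP 0..5 (assumed, as commonly tabulated for H.264);
# every further increase of 6 in QP doubles the step size.
_BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantization step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51 for 8-bit video")
    return _BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Under these assumptions, `qstep(4)` is 1.0, `qstep(10)` is 2.0, and `qstep(51)` is 224.0, so a fixed QP increment corresponds to a fixed ratio of step sizes.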

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. - especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
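The first three modes can be sketched in a few lines. This is a minimal illustration, assuming the 16 reconstructed pels above and the 16 pels to the left of the MB are both available; the rounding used for the DC average and the function names are assumptions, and the planar mode is omitted.

```python
N = 16  # luma macroblock is 16x16 pels


def predict_vertical(above):
    """Each column copies the pel directly above the macroblock."""
    return [list(above) for _ in range(N)]


def predict_horizontal(left):
    """Each row copies the pel directly to the left of the macroblock."""
    return [[left[row]] * N for row in range(N)]


def predict_dc(above, left):
    """Every pel is the rounded average of the 32 neighboring pels."""
    dc = (sum(above) + sum(left) + N) // (2 * N)
    return [[dc] * N for _ in range(N)]
```

An encoder would compare each candidate predictor with the actual MB and code only the mode choice plus the prediction error.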

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt Only)

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra
• Encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y
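As a rough illustration of "a multiplier applied during the inverse quantization", the sketch below rescales a block of quantized levels with a per-coefficient weight. It is a simplified floating-point model, not the integer arithmetic of the standard: the normalization by 16 (so that a flat matrix of 16s leaves the result unchanged) and the function name are assumptions made for this example.

```python
FLAT_WEIGHT = 16  # assumed neutral weight: a matrix of all 16s changes nothing


def dequantize_block(levels, weights, qstep):
    """Rescale quantized levels: level * qstep * (weight / 16), per coefficient."""
    return [
        [lvl * qstep * w / FLAT_WEIGHT for lvl, w in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]
```

With larger weights at high frequencies, those coefficients are effectively quantized more coarsely, which is how a perceptual matrix shifts distortion to where it is less visible.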

-108-

Two scans, similar to the 4x4 transform, switched for frame/field coding.

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan.

Scans: Frame (Zig-Zag), Field

FRExt Only
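The frame (zig-zag) scan order can be generated for any square block size, as in the sketch below. It produces the classic zig-zag that walks the anti-diagonals in alternating directions; the field (alternate) scan is a fixed table and is not reproduced here. The function names are hypothetical.

```python
def zigzag_order(n):
    """Raster indices of an n x n block listed in classic zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(n - 1, s) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        for r in rows:
            order.append(r * n + (s - r))
    return order


def scan_block(block, order):
    """Flatten a 2-D block row by row, then reorder it along the scan."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in order]
```

For a 4x4 block this yields 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15, so low-frequency coefficients come first and the trailing run of zeros is as long as possible.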

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture-, and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M
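The construction above translates directly into code. The sketch below returns the codeword as a string of '0' and '1' characters for readability; the function name is hypothetical, and a real encoder would write bits into a buffer instead.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits
```

For example, code_num 0 maps to "1", code_num 1 to "010", and code_num 3 to "00100", each of length 2M + 1.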

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read M-bit INFO field
3. code_num = 2^M + INFO - 1
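The three decoding steps can be sketched as follows, again using a string of '0' and '1' characters as a stand-in for the bit stream. The function name and the convention of returning the next read position are assumptions of this example; it does not guard against a truncated stream.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos] == "0":   # step 1: count the M leading zeros ...
        m += 1
        pos += 1
    pos += 1                  # ... and consume the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    return (1 << m) + info - 1, pos + m            # step 3: 2^M + INFO - 1
```

Because each codeword announces its own length, several codewords can be concatenated and decoded one after another by feeding the returned position back in.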

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes

-113-

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

High Profiles

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt Only)

[Figure: 4:2:2 MB with Y 16x16, Cb 8x16, Cr 8x16; 4:4:4 MB with Y, Cb, and Cr each 16x16]
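The chroma block sizes per macroblock follow from the subsampling factors of each format. The lookup below is a small sketch of that relationship; the dictionary and function names are hypothetical, and sizes are given as (width, height) in samples.

```python
# (horizontal, vertical) chroma subsampling factors for each chroma format
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),  # halved in both directions
    "4:2:2": (2, 1),  # halved horizontally only
    "4:4:4": (1, 1),  # same resolution as luma
}


def chroma_block_size(chroma_format, luma_width=16, luma_height=16):
    """Width and height of each chroma block (Cb or Cr) in one macroblock."""
    sub_x, sub_y = CHROMA_SUBSAMPLING[chroma_format]
    return luma_width // sub_x, luma_height // sub_y
```

For a 16x16 luma macroblock this gives 8x8 chroma blocks in 4:2:0, 8 wide by 16 high in 4:2:2, and 16x16 in 4:4:4.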

-119-

RGB to Y Cb Cr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$, and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
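The conversion can be written as a short function that takes the two constants as parameters. This is a floating-point sketch with components assumed to be normalized to the range 0 to 1; the function name and the choice of default constants are assumptions for illustration.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) using constants KR and KB."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant. Because the results are real-valued, storing them at the same bit depth as the input requires rounding, which is the error source that the YCgCo approach is meant to avoid.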

-120-

Rounding error in RGB to Y Cb Cr

FRExt Only: YCgCo

Cg = Green Chroma, Co = Orange Chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
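The forward and inverse steps can be checked for exact reversibility with a few lines of integer code. In this sketch the function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation the slide describes, including for negative values.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based conversion on integers."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

For example, (R, G, B) = (255, 0, 0) maps to (Y, Cg, Co) = (63, -127, 255) and back to (255, 0, 0). Note that Cg and Co can be negative and span roughly twice the input range, which is where the extra bit of chroma precision is used.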

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

Page 78: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-78-

Conclusions H264 licensing MPEG LA and Via Licensing are now coordinating the licensi

ng terms decoder-encoder royalties for product manufacturers and participation fees for video streaming services regardless of Profile(s) MPEG LA website httpwwwmpeglacom Via Licensing httpwwwvialicensingcom

FRExtensions to 422 and 444 chroma formats 12 bit resolution for medical imaging Scalable coding Lossless coding for digital cinema application High fidelity coding for the next generation optical discs Extension for various applications H Schwartz D Marpe and T Wiegand ldquo S

NRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004

FINAL STAGES OF APPROVAL Standard systems and file format support specifications Standardizing reference software implementation Standardizing conformance bit streams and specifications

-79-

Contacts for Further Information JVT documents and software on open ftp website ftpstandardspolyc

omcomhttpiphomehhidesuehring

JVT reflector subscription httpmailimtcorgcgi-binlyrisplenter=jvt-experts

JVT reflector e-mail jvt-expertsmailimtcorg

JVT management team Chair Gary Sullivan (garysullmicrosoftcom) Co-chair Ajay Luthra (aluthramotorolacom) Co-chair Thomas Wiegand (wiegandhhide)

Dr K R Rao UTA raoutaedu Dr S K Kwon Dongeui University skkwondongeuiackr Ms A Tamhankar T-Mobile arundhatiieeeorg Karstensuehringhhifraunhoferde

-80-

References

[1] MPEG-2 ISOIEC JTC1SC29WG11 and ITU-T ldquoISOIEC 13818-2 Information Technology-Generic Coding of Moving Pictures and Associated Audio Information Videordquo ISOIEC and ITU-T 1994 [2] MPEG-4 ISOIEC JTCISC29WG11 ldquoISOIEC 14 4962000-2 Information on Technology-Coding of Audio-Visual Objects-Part 2 Visualrdquo ISOIEC 2000 [3] H263 International Telecommunication Union ldquoRecommendation ITU-T H263 Video Coding for Low Bit Rate Communicationrdquo ITU-T 1998[4] H264 International Telecommunication Union ldquoRecommendation ITU-T H264 Advanced Video Coding for Generic Audiovisual Servicesrdquo ITU-T 2003[5] T Stockhammer M Hannuksela and S Wenger ldquoH26LJVT Coding Network Abstraction Layer and IP-based Transportrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 485-488 Sep 2002

-81-

[6] P List A Joch J Lainema G Bjontegaard and M Karczewicz ldquoAdaptive Deblocking Filterrdquo IEEE Trans CSVT Vol 13 pp 614-619 July 2003[7] K R Rao and P Yip Discrete Cosine Transform Academic Press 1990 [8] I EG Richardson H264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia Wiley 2003 [9] H S Malvar A Hallapuro M Karczewicz and L Kerofsky ldquoLow-Complexity Transform and Quantization in H264AVCrdquo IEEE Trans CSVT Vol 13 pp 598-603 July 2003[10] S W Golomb ldquoRun-Length Encodingrdquo IEEE Trans on Information Theory IT-12 pp 399-401 December 1966[11] D Marpe H Schwarz and T Wiegand ldquoContext-Based Adaptive Binary Arithmetic Coding in the H264AVC Video Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 620-636 July 2003

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., “Windows Media Video 9: Overview and applications,” Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004.

[57a] G. Sullivan and T. Wiegand, “Video compression - from concepts to H.264/AVC standard,” Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.

[57b] C. Gomila, “The H.264/MPEG-4 AVC video coding standard,” Short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004.

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, “Performance comparison of the emerging H.264 video coding standard with the existing standards,” IEEE ICME, Baltimore, MD, July 2003.
[60] H. Schwarz, D. Marpe, and T. Wiegand, “SNR-scalable extension of H.264/AVC,” ICIP 2004, Singapore, Oct. 2004.
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions,” SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[62] J. Ostermann et al., “Video coding with H.264/AVC: Tools, performance and complexity,” IEEE CAS Magazine, pp. 7-34, 1st quarter 2004.
[63] W. Gao et al., “AVS - The Chinese next-generation video coding standard,” NAB 2004, Las Vegas, NV, April 2004.
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., “Overview of error resiliency schemes in H.264/AVC standard,” JVCIR, Special Issue on H.264/AVC, June-Aug. 2005.

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html
Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm
The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

4:2:2, 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates [use RGB color format], YCgCo
High resolution

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation

o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
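
The logarithmic control of the step size can be illustrated with a small sketch. It assumes the commonly quoted behavior for 8-bit video: QP runs from 0 to 51 and the step size doubles every 6 QP values. The six base values below are the ones usually cited for QP 0 to 5; they are not given on the slide and are an assumption here.

```python
# Base step sizes for QP 0..5 (commonly quoted values; an assumption,
# not taken from the slide).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Approximate quantization step size for a QP in 0..51.

    The step size doubles each time QP increases by 6, so QP acts as a
    logarithmic control of the step size.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

With these values an increase of 1 in QP raises the step size by roughly 12%, and an increase of 6 doubles it.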

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)

o Field (alternate scan)

♦ Lossless Entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
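
The zig-zag scan mentioned above can be sketched as follows. This is a minimal illustration for a 4x4 block that walks the anti-diagonals in alternating directions; the function names are hypothetical, and the field (alternate) scan is not covered.

```python
def zigzag_order(n: int = 4):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()  # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_scan(block):
    """Flatten a square block of coefficients into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


if __name__ == "__main__":
    # Label each position with its raster index to make the order visible.
    block = [[4 * r + c for c in range(4)] for r in range(4)]
    print(zigzag_scan(block))
```

Because low-frequency coefficients come first, the trailing part of the scanned sequence tends to be zeros, which the entropy coder can represent compactly.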

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
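
The first three 16x16 modes are simple enough to sketch directly. The code below is an illustration only: it assumes both the row above and the column to the left are available, uses a rounded mean of the 32 neighbors for DC, and omits the planar mode. The function and mode names are hypothetical.

```python
N = 16  # macroblock width and height in luma samples


def predict_16x16(mode: str, above, left):
    """Build a 16x16 prediction block from neighboring pels.

    above: the 16 pels in the row just above the MB.
    left:  the 16 pels in the column just left of the MB.
    """
    if mode == "vertical":
        # Every row repeats the row above the MB.
        return [list(above) for _ in range(N)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[y]] * N for y in range(N)]
    if mode == "dc":
        # Rounded average of all 32 neighboring pels.
        dc = (sum(above) + sum(left) + N) // (2 * N)
        return [[dc] * N for _ in range(N)]
    raise ValueError("planar prediction is not covered by this sketch")


if __name__ == "__main__":
    above = list(range(16))
    left = [100] * 16
    print(predict_16x16("dc", above, left)[0][0])
```

The encoder would subtract the chosen prediction from the actual MB and transform only the residual.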

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices
These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb/Cr

4x4 Inter Y, 4x4 Inter Cb/Cr

8x8 Intra Y, 8x8 Inter Y
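
The idea that a weighting matrix is just an extra multiplier at inverse quantization can be shown with a deliberately simplified sketch. It assumes a weight of 16 means "no weighting" (a flat matrix) and ignores the transform normalization factors that a real decoder folds into the same step; the names are hypothetical.

```python
# Flat matrix: every weight is 16, treated here as the neutral value.
FLAT_4X4 = [[16] * 4 for _ in range(4)]


def inverse_quantize(levels, weights, qstep):
    """Scale each quantized level by the step size and its perceptual weight.

    A weight above 16 makes the effective step coarser for that frequency,
    a weight below 16 makes it finer.
    """
    return [
        [levels[r][c] * qstep * weights[r][c] / 16 for c in range(4)]
        for r in range(4)
    ]


if __name__ == "__main__":
    levels = [[1] * 4 for _ in range(4)]
    custom = [row[:] for row in FLAT_4X4]
    custom[3][3] = 32  # coarser quantization for the highest frequency
    print(inverse_quantize(levels, custom, 2.0)[3][3])
```

An encoder would typically assign larger weights to high-frequency positions, where quantization error is less visible.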

-108-

FRExt Only

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan.

Scans shown: frame (zig-zag) and field

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^{M}$$
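
The construction above maps directly onto a few lines of code. The sketch below builds each codeword as a string of '0' and '1' characters for readability; a real encoder would write bits into a buffer. The function name is hypothetical.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword for code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1  # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)       # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits     # [M zeros][1][INFO]


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

The printed table shows the pattern described on the slide: one codeword of length 1, two of length 3, four of length 5, and so on.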

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
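
The three decoding steps can be sketched in the same spirit. The example reads from a string of '0' and '1' characters and returns the position after the codeword so that consecutive codewords can be parsed; the function name and the string representation are assumptions made for illustration.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos).
    """
    m = 0
    while bits[pos] == "0":  # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                 # skip the terminating '1'
    info = int(bits[pos:pos + m], 2) if m else 0  # step 2: M-bit INFO
    return (1 << m) + info - 1, pos + m           # step 3: 2^M + INFO - 1


if __name__ == "__main__":
    stream = "1" + "010" + "011" + "00111"
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)
```

Because the number of leading zeros announces the codeword length, no separator is needed between codewords.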

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation
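
The alpha blending named among the applications above is a per-sample weighted mix of two pictures. The sketch below is a generic illustration, not something specified by the slides: it assumes 8-bit samples and an 8-bit alpha plane (255 = foreground only), and the function name is hypothetical.

```python
def alpha_blend(foreground, background, alpha):
    """Blend two rows of 8-bit samples using an 8-bit alpha row.

    alpha = 255 keeps the foreground sample, alpha = 0 keeps the background.
    The +127 term rounds the integer division to the nearest value.
    """
    return [
        (a * f + (255 - a) * b + 127) // 255
        for f, b, a in zip(foreground, background, alpha)
    ]


if __name__ == "__main__":
    newscaster = [200, 200, 200]
    weather_map = [100, 100, 100]
    mask = [255, 128, 0]
    print(alpha_blend(newscaster, weather_map, mask))
```

In the weather-report example, the alpha plane plays the role of the mask that separates the newscaster from the map.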

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16.]

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

K_R = 0.2126, K_B = 0.0722, K_R + K_B + K_G = 1

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines K_B = 0.114, K_R = 0.299)
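
As a numerical check of the conversion above, the following sketch evaluates the same formulas in floating point. It assumes R, G and B are normalized to the range 0..1 and uses the K_R and K_B values from the slide as defaults; the function name is hypothetical. The rounding that a fixed-point implementation would introduce is exactly the error that the later slides set out to avoid.

```python
KR = 0.2126
KB = 0.0722


def rgb_to_ycbcr(r: float, g: float, b: float, kr: float = KR, kb: float = KB):
    """Convert normalized RGB (0..1) to Y (0..1) and Cb, Cr (-0.5..0.5)."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: full luma, no chroma
    print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: Cr reaches its maximum
    print(rgb_to_ycbcr(0.0, 0.0, 1.0, kr=0.299, kb=0.114))  # BT.601 constants
```

The scale factors 1/(2(1 - K_B)) and 1/(2(1 - K_R)) are what keep Cb and Cr within -0.5..0.5 for any valid RGB input.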

-120-

Rounding error in RGB → YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to the chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
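
The forward and inverse conversions on this slide are exactly reversible in integer arithmetic, which a short round-trip check makes concrete. The sketch relies on Python's `>>` being an arithmetic (floor) shift for negative integers; the function names are hypothetical.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using integer adds and shifts only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (12, 200, 77), (0, 0, 0), (255, 255, 255)]:
        assert ycgco_to_rgb(*rgb_to_ycgco(*rgb)) == rgb
    print("round trip is lossless for the sampled values")
```

Note that Cg and Co can be negative and span roughly twice the input range, which corresponds to the one extra bit of chroma precision mentioned earlier.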

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
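
Notes 1 and 2 can be turned into a small calculation. The sketch below is an illustration under stated assumptions: the per-level limits (maximum pixels per frame and maximum pixels per second) are passed in as plain numbers because the level table itself is not reproduced here, and the function names are hypothetical.

```python
import math

MAX_FRAME_RATE = 172  # frames/sec cap quoted in note 1


def max_dimension(max_frame_pixels: int) -> int:
    """Largest allowed width or height: sqrt(pixels per frame x 8) (note 2)."""
    return math.isqrt(8 * max_frame_pixels)


def max_frame_rate(max_pixels_per_sec: float, width: int, height: int) -> float:
    """Frame rate allowed for a picture size at a fixed pixel-rate budget.

    Smaller pictures may run faster, but never above the 172 frames/sec cap.
    """
    return min(MAX_FRAME_RATE, max_pixels_per_sec / (width * height))


if __name__ == "__main__":
    # Hypothetical budget: what a 1920x1080 picture at 30 frames/sec consumes.
    budget = 1920 * 1080 * 30
    print(max_frame_rate(budget, 1280, 720))  # smaller picture, higher rate
    print(max_frame_rate(budget, 176, 144))   # limited by the 172 cap
    print(max_dimension(2_097_152))           # frame-size limit in pixels
```

The square-root rule prevents extreme aspect ratios: a picture may be wide or tall, but neither dimension can exceed the bound implied by the frame-size limit.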

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content,” JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools offer licenses:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
ATSC has selected AC-3 plus from Dolby
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block
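As an illustration of the motion vector row above, the sketch below forms a predictor and the transmitted difference. It assumes the common case in which the predictor is the component-wise median of three neighboring motion vectors; the function names and the tuple representation are hypothetical, and special cases (unavailable neighbors, 16x8 and 8x16 partitions) are ignored.

```python
def median_mv(a, b, c):
    """Component-wise median of three neighboring motion vectors (x, y)."""
    return tuple(sorted(components)[1] for components in zip(a, b, c))


def mv_difference(mv, predicted):
    """Motion vector difference (mvd) that would be sent in the bit stream."""
    return (mv[0] - predicted[0], mv[1] - predicted[1])
```

Because neighboring blocks tend to move together, the difference is usually small and therefore cheap to entropy code.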

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$
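The construction above translates directly into code. This is a minimal sketch that returns the codeword as a string of '0'/'1' characters for readability; the function name `exp_golomb_encode` is hypothetical, and a real encoder would write bits into a buffer instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    # M = floor(log2(code_num + 1)), computed with integer arithmetic.
    m = (code_num + 1).bit_length() - 1
    # INFO = code_num + 1 - 2^M, written as an M-bit field.
    info = code_num + 1 - (1 << m)
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits
```

For example, code_num 0 maps to `1`, code_num 1 to `010`, and code_num 6 to `00111`.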

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
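The three decoding steps can be sketched as follows. The bit stream is modeled as a string of '0'/'1' characters, which is an assumption made for readability; the function name `exp_golomb_decode` is hypothetical.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos) so that codewords can be read back to back.
    """
    # Step 1: count M leading zeros, then consume the terminating 1.
    m = 0
    while bits[pos] == "0":
        m += 1
        pos += 1
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M is 0).
    info = int(bits[pos:pos + m], 2) if m else 0
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos + m
```

Since the codes are prefix-free, no separator is needed between consecutive codewords: the returned position is where the next one starts.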

-113-

ADOPTIONS

♦ DVD Forum: the High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

[Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: 16x16 Y with Cb and Cr each 8 wide by 16 high. 4:4:4 MB: 16x16 Y, 16x16 Cb and 16x16 Cr.]

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

Y = 0.2126 R + 0.7152 G + 0.0722 B

Cb = 0.5389 (B - Y), Cr = 0.6350 (R - Y)

(ITU-R Rec. BT.601 defines KB = 0.114, KR = 0.299)
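The conversion can be checked numerically with a short sketch. It assumes R, G and B are normalized to the range 0 to 1 and uses the KR and KB values quoted on the slide; the function name `rgb_to_ycbcr` is hypothetical.

```python
KR, KB = 0.2126, 0.0722  # luma weights quoted above


def rgb_to_ycbcr(r, g, b, kr=KR, kb=KB):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))  # scaled so that Cb lies in -0.5..0.5
    cr = (r - y) / (2 * (1 - kr))  # scaled so that Cr lies in -0.5..0.5
    return y, cb, cr
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned in the last line of the slide. The floating-point results must be rounded to integers for storage, which is the source of the rounding error discussed next.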

-120-

Rounding error in RGB to Y Cb Cr

YCgCo (FRExt only)

Cg = Green Chroma, Co = Orange Chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to chroma samples
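A small sketch of the YCgCo equations and their inverse, using floating-point arithmetic so that the fractional parts are visible. The function names are hypothetical, and the inverse is derived here from the forward equations rather than taken from the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward transform: Y and Cg from G and the R/B mean, Co from R - B."""
    mean_rb = (r + b) / 2
    y = (g + mean_rb) / 2
    cg = (g - mean_rb) / 2
    co = (r - b) / 2
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform, obtained by solving the forward equations."""
    g = y + cg
    mean_rb = y - cg
    return mean_rb + co, g, mean_rb - co
```

The coefficients are only halves and quarters, so the transform needs shifts and additions but no multiplications.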

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
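The forward and inverse steps can be run as integer code to confirm that the round trip is exact. This is a minimal sketch with hypothetical function names; it relies on Python's `>>` being an arithmetic (sign-preserving) shift for negative integers.

```python
def rgb_to_ycgco_lifting(r, g, b):
    """Forward conversion using the integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_lifting_to_rgb(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity the forward step added, so any bits discarded by a shift are never needed for reconstruction. The price is that Cg and Co span one more bit than the input samples.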

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. The horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be higher, up to 16
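Note 2 is easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, and the frame size used in the usage note is an example value rather than an entry taken from the level table.

```python
import math


def max_picture_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(8 x pixels per frame)."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_fits(width, height, max_pixels_per_frame):
    """Check both the area limit and the per-dimension limit of note 2."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)
```

With a limit of 2,097,152 pixels per frame, no dimension may exceed 4096: a 1920x1088 picture fits, while an 8192x256 picture has an acceptable area but is too wide.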

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO)

Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3



-91-

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

Fidelity Range Extensions
• 4:2:2 and 4:4:4 formats
• > 8-bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode (uses prediction and entropy coding of prediction errors)
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [Use RGB color format] Y Cg Co
• High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation (seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4- or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame- or field-based motion estimation for interlaced scanned video
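Sub-sample luma interpolation can be illustrated in one dimension. The sketch below assumes the 6-tap filter (1, -5, 20, 20, -5, 1)/32 for half-sample positions and a rounded average for quarter-sample positions, with 8-bit clipping; the function names are hypothetical, and the two-dimensional cases are not covered.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # half-sample filter taps, normalized by 32


def half_sample(p, i):
    """Half-sample value between p[i] and p[i + 1] (needs 2 <= i <= len(p) - 4)."""
    acc = sum(t * p[i - 2 + k] for k, t in enumerate(TAPS))
    return min(255, max(0, (acc + 16) >> 5))  # round, divide by 32, clip to 8 bits


def quarter_sample(p, i):
    """Quarter-sample value between p[i] and the half-sample to its right."""
    return (p[i] + half_sample(p, i) + 1) >> 1
```

On a flat signal both functions return the input value, and on a linear ramp the half-sample value falls midway between its two integer-position neighbors.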

-93-

♦ Interlaced coding features
o Frame-field adaptation
  - Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  - MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
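The logarithmic relationship in the last item can be sketched as a lookup. This assumes the commonly quoted mapping for 8-bit video, in which QP runs from 0 to 51 and the step size doubles for every increase of 6 in QP; the function name `qstep` and the table name are hypothetical.

```python
# Step sizes for QP 0..5; every further 6 QP units doubles the step size.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP (8-bit video, QP in 0..51)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

An increase of 1 in QP raises the step size by roughly 12%, which gives the encoder fine-grained control over bit rate across a wide range.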

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-zag (frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all of these tools. Depending upon the subset of tools used, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction


Page 81: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.


-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
   o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
   o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

• 4:2:2 and 4:4:4 formats
• > 8-bit sample depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [Use RGB color format] YCgCo
• High resolution

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
   o Multiple reference pictures
   o Reference B pictures
   o Arbitrary referencing order
   o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
   o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
   o Weighted prediction
   o Frame- or field-based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
   o Frame-field adaptation
      Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
      MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
   o Field scan
♦ Lossless representation capability
   o Intra PCM raw sample-value macroblocks
   o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
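
The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. This is a minimal illustration, not the normative integer arithmetic: it assumes the commonly tabulated step sizes for QP 0 to 5 with 8-bit video (QP range 0 to 51), and the names `BASE_QSTEP` and `qstep` are chosen here for illustration only.

```python
# Step sizes for QP 0..5 as commonly tabulated for 8-bit H.264 (assumed here).
# For larger QP the table repeats, with the step size doubling every 6 QP.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantization step size for a QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

Under these assumptions an increase of 6 in QP doubles the step size, so one QP step changes it by roughly 12%.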

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
   o Zig-Zag (Frame)
   o Field (alternate scan)

♦ Lossless entropy coding
   o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
   o Context Adaptive VLC (CAVLC)
   o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
   o Flexible Macroblock Ordering (FMO)
   o Arbitrary Slice Order (ASO)
   o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
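
The first three full-MB modes can be sketched as below. This is an illustrative sketch only: it assumes both the row above and the column to the left are available, omits the planar mode, and the function name `predict_16x16` and the mode strings are hypothetical, not taken from the standard.

```python
N = 16  # full-macroblock luma size


def predict_16x16(mode, above, left):
    """Build a 16x16 predictor from neighboring reconstructed pels.

    above: the 16 pels of the row just above the MB
    left:  the 16 pels of the column just left of the MB
    """
    if mode == "vertical":
        # Every row repeats the row above the MB.
        return [list(above) for _ in range(N)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[r]] * N for r in range(N)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + N) // (2 * N)
        return [[dc] * N for _ in range(N)]
    raise ValueError("unsupported mode: " + mode)


if __name__ == "__main__":
    above = list(range(16))
    left = list(range(100, 116))
    print(predict_16x16("dc", above, left)[0][0])
```

An encoder would typically try each mode and keep the one with the smallest prediction error; how that choice is made is outside the scope of the standard.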

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra
• Encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y
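
How a weighting matrix acts as an extra multiplier during inverse quantization can be sketched as follows. This is a simplified floating-point illustration, not the integer arithmetic of the standard; it assumes a weight of 16 is neutral (a flat matrix), and the names `FLAT` and `dequantize` are hypothetical.

```python
FLAT = 16  # assumed neutral weight: a matrix filled with 16 leaves coefficients unscaled


def dequantize(levels, weights, qstep):
    """Rescale quantized levels, applying a per-frequency perceptual weight."""
    return [
        [level * qstep * weight / FLAT for level, weight in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]


if __name__ == "__main__":
    levels = [[2, 1], [1, 0]]          # toy 2x2 block of quantized levels
    weights = [[16, 32], [32, 64]]     # larger weights = coarser high frequencies
    print(dequantize(levels, weights, qstep=2.0))
```

A larger weight at a given frequency position makes the effective step size coarser there, which is how the encoder can spend fewer bits on detail the eye is less sensitive to.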

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding. Coefficient scanning follows the decreasing coefficient variances, so as to maximize the number of zero-valued coefficients along the scan.

[Figure: Frame (Zig-Zag) scan and Field scan]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture-, and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[ M zeros ] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits.

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M
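
The construction above can be sketched as a small encoder. This is an illustration only; the function name `exp_golomb_encode` is hypothetical, and the codeword is returned as a string of '0'/'1' characters for readability rather than as packed bits.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(8):
        print(n, exp_golomb_encode(n))
```

Each codeword produced this way has 2M + 1 bits, matching the length rule stated above.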

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
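
The three decoding steps can be sketched as follows. This is a minimal sketch that assumes a well-formed bit string of '0'/'1' characters; the names `exp_golomb_decode` and `decode_all` are hypothetical.

```python
def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next position)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating '1'
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: code_num = 2^M + INFO - 1


def decode_all(bits: str):
    """Decode a concatenation of codewords into a list of code_num values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    print(decode_all("1" + "010" + "011" + "00100" + "00111"))
```

Because the number of leading zeros announces the length of INFO, codewords can be concatenated without separators and still be parsed unambiguously.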

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

FRExt only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16.]

-119-

RGB → YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = \frac{B - Y}{2(1 - K_B)} = 0.5389 (B - Y)$

$C_r = \frac{R - Y}{2(1 - K_R)} = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
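
The general luma and chroma equations of this slide can be sketched in floating point as follows. This is an illustration only: it assumes R, G, B are normalized to the range 0 to 1, ignores offsets and integer rounding, and the function name `rgb_to_ycbcr` is hypothetical. The default constants are the K_R and K_B values given on the slide.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights kr and kb."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))   # scaled so that Cb spans -0.5..0.5
    cr = (r - y) / (2 * (1 - kr))   # scaled so that Cr spans -0.5..0.5
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))                      # white
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))                      # pure blue
    print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))  # BT.601 weights
```

The division by 2(1 - K) normalizes each chroma component so that a fully saturated blue or red maps to 0.5; in integer arithmetic these multiplications are where rounding error enters.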

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples.

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.

Inverse color space conversion:

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co
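
The forward and inverse conversions of this slide can be checked for exact reversibility with a short sketch. The function names `rgb_to_ycgco` and `ycgco_to_rgb` are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the operation the slide describes.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward integer conversion using only adds and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    for rgb in [(255, 0, 0), (0, 255, 0), (12, 200, 77)]:
        ycgco = rgb_to_ycgco(*rgb)
        print(rgb, "->", ycgco, "->", ycgco_to_rgb(*ycgco))
```

Each inverse step subtracts exactly what the matching forward step added, so no information is lost; the price is that Cg and Co can be negative and need one more bit than the input samples.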

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI).

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process.

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

• A 'higher' profile supports all capabilities of the lower ones
• It is also capable of decoding all bit streams encoded for the lower nested profiles
• All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
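
Rule 2 can be illustrated with a little arithmetic. This is a sketch under stated assumptions: the frame-size limit of 2,097,152 pixels used below is an illustrative value only, not a figure taken from the table, and the names `max_dimension` and `fits` are hypothetical.

```python
import math


def max_dimension(max_frame_pixels: int) -> int:
    """Largest allowed width or height: floor(sqrt(8 * pixels per frame))."""
    return math.isqrt(8 * max_frame_pixels)


def fits(width: int, height: int, max_frame_pixels: int) -> bool:
    """Check both the total frame size and the per-dimension limit."""
    limit = max_dimension(max_frame_pixels)
    return width * height <= max_frame_pixels and width <= limit and height <= limit


if __name__ == "__main__":
    budget = 2_097_152                    # illustrative frame-size limit in pixels
    print(max_dimension(budget))          # per-dimension limit
    print(fits(1920, 1088, budget))       # ordinary aspect ratio
    print(fits(8192, 256, budget))        # same pixel count, but far too wide
```

The square-root rule therefore excludes extremely elongated pictures even when their total pixel count is within the limit for the level.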

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)–(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile DetailsDeblocking Filter CABAC Signaling

1048707 Deblocking Filterbull Only control of filter is adjusted do not filter 4x4 blocksbull No change to filter operation itself

1048707 CABACbull 61 new contexts and corresponding initialization valuesbull No change to CABAC engine

1048707 Signalingbull 8x8 transform onoff flag at PPS levelbull 8x8 transform onoff flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary High Profile contains

Main profile Adaptive MB level switching between 8x8 and 4x4 transform block

sizes Encoder specified perceptual based quantization scaling matrices Encoder specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction) HD Film 12 HD Video (progressive) 12 HD Video (interlace) 4 (only 2 test clips) SD Video (interlace) 6

Complexity impact Implementation beyond Main Profile affects Intra prediction

transform deblocking filter control CABAC decoding No increase in computational requirements Slight increase in memory requirements (CABAC transform)

-137-

Licensing of H264AVC Technology

Two patent pools to obtain the license 1 MPEGLA wwwmpeglacom2 Via licensing wwwvialicensingcom

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H264 is limited to video Audio coder Bit rates Quality levels and of

channels ndash left to industry and standards groups (ATSC SCTE ARIB DVB etc)

DVB is considering AAC with SBR (AAC plus) ATSC has selected AC-3 plus from Dolby MPEG calls it HE-AAC (HE ndash High efficiency) ATSC SCTE ARIB MPEG etc will continue to use

MPEG-1 Audio MPEG-2 AAC and AC-3

Page 82: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-82-

[12] M Flierl and B Girod ldquoGeneralized B Picture and the Draft H264AVC Video-Compression Standardrdquo IEEE Trans CSVT Vol 13 pp 587-597 July 2003[13] M Karczewicz and R Kurceren ldquoThe SP- and SI-Frames Design for H264AVCrdquo IEEE Trans CSVT Vol 13 pp 637-644 July 2003[14] S Wenger ldquoH264AVC Over IPrdquo IEEE Trans CSVT Vol 13 pp 645-656 July 2003[15] ISOIEC JTC1SC29WG11 ldquoReport of The Formal Verification Tests on AVC (ISOIEC14496-10 | ITU-T Rec H264)rdquo MPEG2003N6231 December 2003[16] M Ghanbari ldquoStandard Codecs Image Compression to Advanced Video Codingrdquo Hertz UK IEE 2003[17] A Joch F Kossentini H Schwarz T Wiegand and G J Sullivan ldquoPerformance Comparison of Video Coding Standards using Lagrangian Coder Controlrdquo IEEE ICIP 2002 Rochester New York Vol 2 pp 501-504 Sept 2002

-83-

[18] T Wiegand G J Sullivan G Bjontegaard and A Luthra ldquoOverview of the H264AVC Video Coding Standardrdquo IEEE Trans CSVT Vol 13 pp 560-576 July 2003[19] MPEG website httpwwwmpegorg[20] JVT website ftpstandardspolycomcom[21] MPEG LA website httpwwwmpeglacom[22] H264 AVC JM Software httpbshhide~suehringtmldownload[23] UBVideo website httpwwwubvideocom[24] LSI Logic website httpwwwlsilogiccom[25] Microsoft website httpwwwmicrosoftcom[26] Envivio website httpwwwenviviocom[27] PixelTools Corporation website httpwwwpixeltoolscom[28] Nagravision website httpwwwnagravisioncom[29] Philips website httpwwwphilipscom

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrix can be transmitted in SPS and PPS

• Separate matrix for 4x4 and 8x8 transforms

• Separate matrix for Inter and Intra

• Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

• Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y; 4x4 Intra Cb, Cr

• 4x4 Inter Y; 4x4 Inter Cb, Cr

• 8x8 Intra Y; 8x8 Inter Y
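The idea that the weighting matrix is just one more multiplier in inverse quantization can be sketched as follows. This is an illustration under assumptions, not the normative arithmetic: the function name is hypothetical, the step size is passed in as a plain number, and a weight of 16 is treated as the neutral ("flat") value.

```python
FLAT_WEIGHT = 16  # assumed neutral weight: a matrix of all 16s leaves scaling unchanged


def dequantize(levels, weights, qstep):
    """Scale quantized levels back to coefficient values.

    levels:  2-D list of quantized transform levels
    weights: 2-D list of per-frequency weights, same shape as levels
    qstep:   quantization step size for the block
    """
    return [
        [lvl * qstep * w / FLAT_WEIGHT for lvl, w in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]
```

A weight above 16 at a high-frequency position means the encoder quantized that coefficient more coarsely, which is where the perceptual tuning comes from.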

-108-

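Coefficient scanning for the 8x8 transform (FRExt only)

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the end of the scan

[Figure: 8x8 frame (zig-zag) scan and field scan]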
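A short sketch of how a conventional zig-zag order can be generated for an n x n block and used to read out quantized coefficients. The helper names are hypothetical, and only the frame (zig-zag) scan is generated here; the field scan is defined by a table and is not reproduced.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            # even diagonals are walked from bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def scan(block):
    """Read a square block of coefficients into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

With typical quantized blocks, where the nonzero values cluster at low frequencies, the scanned list ends in a long run of zeros that the entropy coder can represent cheaply.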

-109-

Examples of parameters to be encoded

Parameters                        Description
Sequence-, picture- and           Headers and parameters
  slice-layer syntax elements
Macroblock type (mb_type)         Prediction method for each coded macroblock
Coded block pattern               Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter               Transmitted as a delta value from the previous value of QP
Reference frame index             Identifies reference frame(s) for inter prediction
Motion vector                     Transmitted as a difference (mvd) from the predicted motion vector
Residual data                     Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is $(2M + 1)$ bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$
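The construction above can be sketched in a few lines. The function name is hypothetical and the codeword is returned as a string of '0'/'1' characters purely for readability; a real encoder would write bits into a buffer.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits         # [M zeros][1][INFO]
```

For example, code_num 0 gives "1", code_num 1 gives "010" and code_num 3 gives "00100", matching the statement that codewords 1 and 2 carry one INFO bit and codewords 3-6 carry two.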

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp_Golomb codes
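The three decoding steps can be sketched as a small parser. The function name and the bit-string input are assumptions made for illustration; the function returns the position after the codeword so that consecutive codewords can be read from one stream.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a '0'/'1' string starting at pos.

    Returns (code_num, next_pos).
    """
    m = 0
    while bits[pos] == "0":      # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                     # ... and consume the terminating 1
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    pos += m
    return (1 << m) + info - 1, pos                # step 3: 2^M + INFO - 1
```

Because the leading zeros announce the codeword length, no separator is needed between codewords: calling the function repeatedly with the returned position walks through the stream.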

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16. 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16]

-119-

RGB → YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \quad C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126, \quad K_B = 0.0722, \quad K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \quad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
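A small floating-point sketch of the conversion on this slide, with the luma weights as parameters so that the BT.601 values can be substituted. The function name is hypothetical, and inputs are assumed to be normalized to the range 0..1.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr) using luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that Cb stays within -0.5..0.5
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that Cr stays within -0.5..0.5
    return y, cb, cr
```

With the default weights the two scale factors evaluate to about 0.5389 and 0.6350. Pure blue maps to Cb = 0.5 and pure red to Cr = 0.5, which is why the denominators take the form 2(1 - K). Storing the results as integers is what introduces the rounding error discussed next.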

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
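The forward and inverse steps on this slide translate almost line for line into integer code. The sketch below (function names are hypothetical) can be used to check that the pair is exactly reversible; Python's ">>" on integers is an arithmetic shift, which is what the slide requires.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer color transform; all values are plain integers."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step recomputes exactly the shifted value used in the forward step, so the bits discarded by the right shifts never matter. Co and Cg can be negative and span one more bit than the inputs, consistent with the one extra bit of chroma precision mentioned earlier.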

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process
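As an illustration of the alpha blend compositing that auxiliary pictures are meant to support, the sketch below blends a foreground over a background sample by sample. It assumes an 8-bit monochrome alpha plane (255 = fully foreground) and uses a hypothetical function name; how a particular decoder or display pipeline applies the auxiliary picture is outside the scope of these slides.

```python
def alpha_blend(foreground, background, alpha, max_alpha=255):
    """Blend two equally sized sample lists using a per-sample alpha plane."""
    half = max_alpha // 2  # rounding offset for the integer division
    return [
        (a * f + (max_alpha - a) * b + half) // max_alpha
        for f, b, a in zip(foreground, background, alpha)
    ]
```

In the weather-report example, the newscaster would be the foreground, the map the background, and the alpha plane would be close to 255 over the newscaster and 0 elsewhere.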

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
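Note 2 can be checked numerically. The sketch below works in macroblock units (16x16 pels), where the same bound reads sqrt(8 x frame size in MBs). The function name is hypothetical, and the frame-size limit of 8192 macroblocks used in the usage note is an illustrative assumption; the real value should be read from the level table.

```python
import math


def fits_level(width_px, height_px, max_frame_size_mbs):
    """Check picture dimensions against a level's frame-size limit.

    max_frame_size_mbs: maximum frame size of the level, in macroblocks
    """
    width_mbs = -(-width_px // 16)    # ceiling division to whole macroblocks
    height_mbs = -(-height_px // 16)
    max_side_mbs = math.isqrt(8 * max_frame_size_mbs)   # sqrt(frame size x 8)
    return (
        width_mbs * height_mbs <= max_frame_size_mbs
        and width_mbs <= max_side_mbs
        and height_mbs <= max_side_mbs
    )
```

With a limit of 8192 macroblocks, neither side may exceed 256 macroblocks (4096 pels), so a 1920x1080 picture fits, while an extremely wide 8192x16 picture is rejected even though its area is small.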

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3



-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm
High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

Fidelity Range Extensions
• 4:2:2 and 4:4:4 formats
• > 8-bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• High resolution
• Use of RGB color format (YCgCo)

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video
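To make the 1/4-sample luma interpolation item concrete, the sketch below computes one half-sample value with a 6-tap filter and one quarter-sample value by averaging. The tap values (1, -5, 20, 20, -5, 1) and the rounding are taken from the H.264 standard rather than from these slides, the function names are hypothetical, and 8-bit samples are assumed.

```python
def half_sample(e, f, g, h, i, j):
    """Half-sample value between g and h from six neighboring integer samples."""
    value = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5
    return max(0, min(255, value))   # clip to the 8-bit sample range


def quarter_sample(a, b):
    """Quarter-sample value as the rounded average of two neighboring positions."""
    return (a + b + 1) >> 1
```

The taps sum to 32, so a flat region is reproduced exactly; the negative taps sharpen the interpolated signal compared with simple bilinear averaging.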

-93-

♦ Interlaced coding features
  o Frame-field adaptation
      Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
      MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
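The logarithmic relation between the quantization parameter and the step size can be sketched as below. The six base step sizes are the values commonly quoted for 8-bit H.264/AVC and are not given in these slides, so treat them, the function name and the 0..51 range as assumptions.

```python
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # assumed steps for QP 0..5


def qstep(qp):
    """Quantization step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # the step size doubles for every increase of 6 in QP
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Each increment of QP raises the step size by roughly 12%, and an increase of 6 doubles it, so a small integer range covers step sizes from 0.625 up to 224.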

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of tools used, a slice can be of type I, P, B, SP or SI. A picture may contain different slice types.


Page 84: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-84-

[30] Polycom website httpwwwpolycomcom[31] MainConcept website httpwwwmainconceptcom[32] Amphion website httpwwwamphioncom[33] Ligos Corporation website httpwwwligoscom[34] LifeSize website httpwwwlifesizecom[35] Broadcom website httpwwwbroadcomcom[36] Netvideo website httpwwwnetvideocom[37] Motorola website httpwwwmotorolacom[38] httpwwwmediawarecom[39] Impact Labs Inc website httpwwwimpactlabscom[40] Vanguard Software Solutions website httpwwwvsoftscom[41] STMicroelectronics website httpusstcom wwwthomsonnet[42] wwwconexantcom (H264 decoder ICs _ HDTV amp SDTV)[43] wwwpixtreecom

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows media video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004
[57a] G. Sullivan and T. Wiegand, "Video compression – from concepts to H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005
[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP News Letter, vol. 15, pp. 19-34, June 2004
[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, Baltimore, MD, July 2003
[60] H. Schwarz, D. Marpe, and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, Singapore, Oct. 2004
[61] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance, and complexity," IEEE CAS Magazine, pp. 7-34, first quarter 2004
[63] W. Gao et al., "AVS – The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004
[64] http://www.imtc.org/activity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, June-Aug. 2005
[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)
[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
     b. Mpegable: http://www.mpegable.com/show/home.html
     c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST, Thomson, Broadcom, and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6, and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00, Swiss francs)

-91-

Fidelity Range Extensions

Slices in a picture are compressed as follows:

♦ Intra spatial (block based) prediction
  o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

4:2:2 and 4:4:4 formats
> 8 bit depths
(8x8) integer DCT
HVS weighting matrices
Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
Residual color transform
Source editing such as alpha blending
High bit rates
[Use RGB color format] YCgCo
High resolution

-92-

♦ Inter temporal prediction – block based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4)
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
      Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
      MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made for each vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
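
To make the "logarithmic control" bullet concrete, here is a minimal Python sketch. It assumes the step-size values commonly tabulated for 8-bit H.264/AVC (six base values for QP 0 to 5, doubling every 6 QP steps, QP range 0 to 51); those numbers and the function name `qstep` are not given in the slides and are used here only for illustration.

```python
# Base quantization step sizes for QP = 0..5 (assumed, as commonly tabulated
# for 8-bit H.264/AVC; not stated in the slides).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step size for a quantization parameter QP.

    The step size doubles for every increase of 6 in QP, so QP acts as a
    logarithmic control of the step size.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

Under these assumptions, `qstep(qp + 6)` is always twice `qstep(qp)`, which is the logarithmic relationship the slide refers to.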

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-Zag (Frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I – Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction
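
As an illustration of the zigzag (frame) scan step, the sketch below generates a zigzag order for a 4x4 block and applies it. The function names are hypothetical, the field (alternate) scan is not covered, and the generated 4x4 order is the widely published frame scan rather than something quoted from the slides.

```python
def zigzag_order(n=4):
    """Return raster indices (row * n + col) of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col of one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:                              # even diagonals run bottom-left to top-right
            diag.reverse()
        order.extend(r * n + c for r, c in diag)
    return order


def scan(block, order):
    """Flatten a 2-D block of quantized coefficients into a 1-D list along `order`."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in order]
```

For a block whose nonzero coefficients sit in the low-frequency (top-left) corner, the scan groups them at the front and leaves a long run of trailing zeros, which is what the entropy coder exploits.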

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assumes the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
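
A minimal Python sketch of three of the four full-macroblock modes follows. It assumes both the row above and the column to the left are available and already reconstructed; the planar mode and the handling of missing neighbors at picture or slice edges are deliberately left out, and the function name and string mode labels are hypothetical.

```python
def intra16x16_predict(mode, above, left):
    """Build a 16x16 luma prediction block (list of rows).

    above: the 16 reconstructed pels in the row just above the MB
    left:  the 16 reconstructed pels in the column just left of the MB
    """
    n = 16
    if len(above) != n or len(left) != n:
        raise ValueError("expected 16 neighboring pels on each side")
    if mode == "vertical":        # every row repeats the row above
        return [list(above) for _ in range(n)]
    if mode == "horizontal":      # every row is filled with its left neighbor
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":              # rounded average of the 32 neighbors
        dc = (sum(above) + sum(left) + n) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("mode not covered by this sketch: " + mode)
```

An encoder would typically form each candidate prediction, compare it with the source macroblock, and code the residual of the mode that costs least; that selection step is outside this sketch.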

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only

HVS Weighting Matrices

Matrix can be transmitted in SPS and PPS
Separate matrix for 4x4 and 8x8 transforms
Separate matrix for Inter and Intra
Encoder can design and use customized scaling matrices
These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y
4x4 Intra Cb, Cr
4x4 Inter Y
4x4 Inter Cb, Cr
8x8 Intra Y
8x8 Inter Y

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients at the end of the scan

[Figure: frame (zig-zag) scan and field scan]

FRExt Only

-109-

Examples of parameters to be encoded

Parameters                                             Description
Sequence-, picture-, and slice-layer syntax elements   Headers and parameters
Macroblock type (mb_type)                              Prediction method for each coded macroblock
Coded block pattern                                    Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter                                    Transmitted as a delta value from the previous value of QP
Reference frame index                                  Identifies reference frame(s) for inter prediction
Motion vector                                          Transmitted as a difference (mvd) from the predicted motion vector
Residual data                                          Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$
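
The construction above can be sketched in a few lines of Python. The function name is hypothetical and the codeword is returned as a string of '0'/'1' characters purely for readability; a real encoder would write bits into a bitstream buffer.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""   # INFO as exactly M bits
    return "0" * m + "1" + info_bits
```

The first few codewords are `1`, `010`, `011`, `00100`, `00101`, which matches the pattern described above: codewords 1 and 2 carry one INFO bit, codewords 3-6 carry two.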

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$
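
The three decoding steps map directly onto code. The sketch below assumes the bitstream is available as a string of '0'/'1' characters and is well formed (it does not guard against a truncated codeword); both function names are hypothetical.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":          # step 1: count the M leading zeros
        m += 1
    pos += m + 1                         # skip the zeros and the '1' that follows
    info = int(bits[pos:pos + m], 2) if m else 0    # step 2: read the M-bit INFO field
    code_num = (1 << m) + info - 1       # step 3: code_num = 2^M + INFO - 1
    return code_num, pos + m


def decode_all(bits):
    """Decode a concatenation of codewords into a list of code_num values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values
```

Because the leading zeros announce the length of INFO, codewords can be concatenated without separators and still be parsed unambiguously.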

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: 16x16 Y with Cb and Cr blocks 8 wide by 16 high; 4:4:4 MB: 16x16 Y, Cb, and Cr blocks]
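
The effect of the chroma format on macroblock size is simple arithmetic, sketched below in Python. The block dimensions are the ones given in the slides (8x8 chroma for 4:2:0, 8 wide by 16 high for 4:2:2, 16x16 for 4:4:4); the table and function names are hypothetical.

```python
# Chroma block size (width, height) per macroblock for each chroma format.
CHROMA_BLOCK = {"4:2:0": (8, 8), "4:2:2": (8, 16), "4:4:4": (16, 16)}


def samples_per_macroblock(chroma_format):
    """Count luma plus chroma samples in one macroblock."""
    width, height = CHROMA_BLOCK[chroma_format]
    luma = 16 * 16                    # Y is always 16x16
    chroma = 2 * width * height       # one Cb block and one Cr block
    return luma + chroma
```

Relative to 4:2:0 (384 samples per macroblock), 4:2:2 carries 512 and 4:4:4 carries 768, so the raw data volume grows by a third and by a factor of two respectively.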

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B)\,G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
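
As a worked illustration of these equations, the following Python sketch converts one RGB sample to YCbCr. It assumes component values normalized to the range 0 to 1 and ignores offsets, scaling to integer code values, and clipping; the function name and default arguments are choices made here, not part of the slides.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to (Y, Cb, Cr).

    kr and kb default to the coefficients used on this slide; pass
    kr=0.299, kb=0.114 for the BT.601 values quoted in the note above.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr
```

With this normalization, Y stays in 0 to 1 while Cb and Cr span -0.5 to 0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5. Because the coefficients are not integers, storing the result in a fixed number of bits introduces the rounding error that motivates the YCgCo transform.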

-120-

Rounding error in RGB → YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma. To further avoid any rounding error, add only one bit of precision to chroma samples.

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R – B
t = B + (Co >> 1)
Cg = G – t
Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y – (Cg >> 1)
G = t + Cg
B = t – (Co >> 1)
R = B + Co
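
The forward and inverse conversions can be checked for exact reversibility with a short Python sketch. Python's `>>` on integers is an arithmetic (sign-preserving) shift, which matches the operation the slide calls for; the function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer conversion using only additions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion; undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

For example, pure red (255, 0, 0) maps to (Y, Cg, Co) = (63, -127, 255) and converts back to (255, 0, 0) exactly. Each inverse step recomputes the same shifted value that the forward step used, so no information is lost; the cost is that Cg and Co need one extra bit of range compared with the inputs.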

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
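
Note 2 can be turned into a small check, sketched below in Python. The per-frame pixel budget is passed in as a parameter because the level table itself is not reproduced here; the value used in the usage note (414,720 pixels, i.e. 720x576) is only an illustrative budget, and both function names are hypothetical.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(pixels per frame x 8))."""
    return math.isqrt(max_pixels_per_frame * 8)


def picture_size_allowed(width, height, max_pixels_per_frame):
    """Check both the frame-size budget and the per-dimension bound of note 2."""
    limit = max_dimension(max_pixels_per_frame)
    within_budget = width * height <= max_pixels_per_frame
    return within_budget and width <= limit and height <= limit
```

With a budget of 414,720 pixels the per-dimension bound is 1821, so a 1440x288 picture passes while a 4096x64 picture is rejected even though it contains fewer pixels. The rule keeps extremely wide or tall pictures out of a level.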

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) – (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
  • Only control of filter is adjusted (do not filter 4x4 blocks)
  • No change to filter operation itself

♦ CABAC
  • 61 new contexts and corresponding initialization values
  • No change to CABAC engine

♦ Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
  o Main profile
  o Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
  o Encoder-specified perceptual-based quantization scaling matrices
  o Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
  o HD Film: 12%
  o HD Video (progressive): 12%
  o HD Video (interlace): 4% (only 2 test clips)
  o SD Video (interlace): 6%

Complexity impact
  o Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
  o No increase in computational requirements
  o Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels, and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE – High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3

Page 85: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-85-

[44] BT Exact--httpwwwbtexactbtcom[45] DemoGaFrX--wwwdolbycom[46] Equator--httpwwwequatorcom[47] Moonlight--wwwelecardcom[48] Sand Video--wwwbroadcomcom[49] VideoLocus-httpwwwlsilogiccomtechnologiesindustry_standardsmpeg_based_standards_h_264html[50] WampW Communications (and DSP Research)--httpwwwwwcomscom[51] Cisco Systems -- wwwciscocom[52] Deutsche Telekom-- httpwwwtelekom3deen-phomecc-startseitehtml

-86-

[53] FastVDO-- httpwwwfastvdocom[54] Glance Networks---httpwwwglancenet[55] RADVISION-- wwwradvisioncom[56] Sun Microsystems--httpwwwsuncom[57] S Srinivasan et al ldquoWindows media video 9 Overview and applic

ationsrdquo Signal Processing Image Communication vol19 pp 851-875 Oct 2004

[57a] G Sullivan and T Wiegand ldquo Video compression ndash from concepts to H264AVC standardrdquo Proc IEEE vol93 pp 18-31 Jan 2005

[57b] C Gomila ldquo The H 264MPEG -4 AVC video coding standardrdquo Short tutorial EURASIP News Letter vol 15 pp 19-34 June 2004

[58] httpecsituch

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

diams High profile (HP) supporting 8-bit video with 420 sampling addressing high-end consumer use and otherapplications using high-resolution video without a need for extended chroma formats or extended sample accuracy

diams High 10 profile (Hi10P) supporting 420 video with up to 10 bits of representation accuracy per sample

diams High 422 profile (H422P) supporting up to 422 chroma sampling and up to 10 bits per sample and

High Profiles

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

RGB Y Cb Cr

Y = KR R + (1 ndash KR ndash KB) G + KB B

KR = 02126 KB = 00722 KR + KB + KG = 1

Y = 02126 R + 07152 G + 00722 B

Cb = 05389 (B ndash Y) Cr = 07874 (R ndash Y)

(ITU-R RecBT601 defines KB=0114 KR=0299)

( )

2(1 )b

B

B YC

K

( )

2(1 )r

R

R YC

K

-120-

Rounding error in RGB Y Cb Cr

FRExt Only YCgCo

Cg = Green Chroma Co = Orange ChromaTo further avoid any rounding error add only one bit of precision to chroma samples

1 ( )[ ]

2 21 ( )

[ ]2 2( )

2

g

o

R BY G

R BC G

R BC

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allow a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, then the number of reference frames for ME and MC can be up to 16.
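Rule 2 can be illustrated with a small calculation. The sketch below reads the rule literally in pixel units; `max_picture_dimension` is a hypothetical helper name, and the frame size in the example is chosen only for illustration:

```python
import math


def max_picture_dimension(pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(pixels_per_frame * 8))."""
    return math.isqrt(pixels_per_frame * 8)


# Illustrative frame budget of 2,097,152 pixels.
limit = max_picture_dimension(2_097_152)
print(limit)
```

The factor of 8 means a picture may be much wider than it is tall (or the reverse), as long as the total pixel count stays within the level's limit.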

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)–(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking filter
  • Only control of the filter is adjusted: do not filter 4x4 blocks
  • No change to the filter operation itself

♦ CABAC
  • 61 new contexts and corresponding initialization values
  • No change to the CABAC engine

♦ Signaling
  • 8x8 transform on/off flag at PPS level
  • 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High profile contains:
  • Main profile
  • Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
  • Encoder-specified perceptual-based quantization scaling matrices
  • Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
  • HD film: 12%
  • HD video (progressive): 12%
  • HD video (interlace): 4% (only 2 test clips)
  • SD video (interlace): 6%

Complexity impact:
  • Implementation beyond Main profile affects intra prediction, transform, deblocking filter control and CABAC decoding
  • No increase in computational requirements
  • Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE – High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.


-86-

[53] FastVDO: http://www.fastvdo.com
[54] Glance Networks: http://www.glance.net
[55] RADVISION: www.radvision.com
[56] Sun Microsystems: http://www.sun.com
[57] S. Srinivasan et al., "Windows Media Video 9: Overview and applications," Signal Processing: Image Communication, vol. 19, pp. 851-875, Oct. 2004

[57a] G. Sullivan and T. Wiegand, "Video compression – from concepts to the H.264/AVC standard," Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005

[57b] C. Gomila, "The H.264/MPEG-4 AVC video coding standard," short tutorial, EURASIP Newsletter, vol. 15, pp. 19-34, June 2004

[58] http://ecs.itu.ch

-87-

[59] N. Kamaci and Y. Altunbasak, "Performance comparison of the emerging H.264 video coding standard with the existing standards," IEEE ICME, pp., Baltimore, MD, July 2003
[60] H. Schwarz, D. Marpe and T. Wiegand, "SNR-scalable extension of H.264/AVC," ICIP 2004, vol., pp., Singapore, Oct. 2004
[61] G. J. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions," SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004
[62] J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance and complexity," IEEE CAS Magazine, vol., pp. 7-34, first quarter 2004
[63] W. Gao et al., "AVS – The Chinese next-generation video coding standard," NAB 2004, Las Vegas, NV, April 2004
[64] http://www.imtc.org/activity_groups: JVT-EXPERTS LIST (FAQ)

-88-

[65] H.264/AVC reference software 9.3
[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip
[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard," JVCIR, Special Issue on H.264/AVC, vol., pp., June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available.

-89-

[70] ST Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA).

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html. Some of the results and observations in the study may be interesting to the H.264/AVC community.

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm. The methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP.

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10: the published standard costs CHF 260.00)

-91-

Fidelity Range Extensions

• 4:2:2 and 4:4:4 formats
• > 8 bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform-bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• High resolution
• [Use RGB color format] YCgCo

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
  o Full-macroblock luma or chroma prediction – 4 modes (directions) for prediction
  o 8x8 (FRExt-only) or 4x4 luma prediction – 9 modes (directions) for prediction

-92-

♦ Inter temporal prediction – block-based motion estimation and compensation
  o Multiple reference pictures
  o Reference B pictures
  o Arbitrary referencing order
  o Variable block sizes for motion compensation
    Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
  o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
  o Weighted prediction
  o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
  o Frame-field adaptation
    Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
    MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of macroblocks
  o Field scan

♦ Lossless representation capability
  o Intra PCM raw sample-value macroblocks
  o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
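As an illustration of the logarithmic control: in H.264/AVC the quantization step size doubles for every increase of 6 in the quantization parameter (QP). The sketch below assumes the commonly tabulated nominal step sizes for QP 0 to 5 (they are not given on these slides) and the 8-bit QP range of 0 to 51; `qstep` is a hypothetical helper name:

```python
# Nominal step sizes for QP = 0..5 (assumed from commonly published tables).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Nominal quantization step size for a QP in 0..51 (8-bit video)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP out of range for 8-bit video")
    # Each block of 6 QP values doubles the step size.
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

With this mapping, a QP change of +1 raises the step size by roughly 12%, so equal QP increments correspond to roughly equal ratios of step size.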

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
  o Zig-zag (frame)
  o Field (alternate scan)

♦ Lossless entropy coding
  o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
  o Context Adaptive VLC (CAVLC)
  o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc. – especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC).

PicAFF: field processing is similar to frame mode.

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction.
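The zig-zag scan mentioned on this slide can be sketched as a walk over the anti-diagonals of the block, alternating direction on each diagonal. This is an illustrative Python version with hypothetical function names; it generates the classic frame zig-zag pattern only, and the field (alternate) scan is not shown:

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for d in range(2 * n - 1):  # d = row + col indexes one anti-diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        for r in rows:
            order.append((r, d - r))
    return order


def zigzag_scan(block):
    """Flatten a square block (list of rows) into zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Raster indices of a 4x4 block, listed in scan order.
print([r * 4 + c for r, c in zigzag_order(4)])
```

Scanning this way visits low-frequency coefficients first, so the trailing part of the scan tends to be a run of zeros that the entropy coder can represent compactly.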

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
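The first three modes can be sketched in a few lines. This is a simplified Python illustration, assuming both the row above and the column to the left are available; the planar mode and the handling of missing neighbors are omitted, and `predict_block` is a hypothetical name:

```python
def predict_block(above, left, mode):
    """Build an n x n intra predictor from neighboring pels.

    above: the n pels directly above the block
    left:  the n pels directly to the left of the block
    mode:  "vertical", "horizontal" or "dc"
    """
    n = len(above)
    if mode == "vertical":
        return [list(above) for _ in range(n)]        # copy the row downwards
    if mode == "horizontal":
        return [[left[r]] * n for r in range(n)]      # copy the column rightwards
    if mode == "dc":
        dc = (sum(above) + sum(left) + n) // (2 * n)  # rounded mean of 2n neighbors
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


above = [10] * 16
left = [20] * 16
print(predict_block(above, left, "dc")[0][0])
```

An encoder would typically try each mode, compare the predictor with the actual block, and keep the mode with the smallest prediction error.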

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction (FRExt only)

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for inter and intra
• Encoder can design and use customized scaling matrices; these are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

• 4x4 Intra Y; 4x4 Intra Cb, Cr

• 4x4 Inter Y; 4x4 Inter Cb, Cr

• 8x8 Intra Y; 8x8 Inter Y
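The idea of the weighting matrix as an extra per-coefficient multiplier can be sketched as follows. This is a simplified floating-point model and not the standard's integer arithmetic; it assumes that a weight of 16 is neutral, as with a flat matrix, and `dequantize` is a hypothetical name:

```python
def dequantize(levels, qstep, weights):
    """Rescale quantized levels with a per-coefficient weighting matrix.

    levels:  quantized coefficient levels (list of rows)
    qstep:   quantization step size for the block
    weights: weighting matrix of the same shape; 16 means no extra scaling
    """
    return [
        [level * qstep * w / 16 for level, w in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]


levels = [[4, 1], [1, 0]]
flat = [[16, 16], [16, 16]]
perceptual = [[16, 24], [24, 32]]  # coarser quantization at higher frequencies
print(dequantize(levels, 2.0, flat))
print(dequantize(levels, 2.0, perceptual))
```

Larger weights at high-frequency positions mean those coefficients are effectively quantized more coarsely, which is where the eye is least sensitive.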

-108-

FRExt only

Two scans, similar to those of the 4x4 transform, switched for frame/field coding.

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan.

[Figure: frame (zig-zag) scan and field scan]

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients – these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$$

$$\text{INFO} = \text{code\_num} + 1 - 2^M$$
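The construction above can be sketched as a small encoder that returns the codeword as a string of bits. This is an illustrative Python version; `exp_golomb_encode` is a hypothetical name, and a real encoder would write into a bit buffer rather than build strings:

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits      # [M zeros][1][INFO]


for n in range(7):
    print(n, exp_golomb_encode(n))
```

Small values of code_num get the shortest codewords, so the mapping from syntax element values to code_num is chosen to give frequent values small numbers.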

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$
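The three decoding steps can be sketched as follows, reading from a string of "0" and "1" characters. This is an illustrative Python version with hypothetical names; it assumes the input holds a complete codeword starting at `pos`:

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a bit string.

    Returns (code_num, next_pos), where next_pos is the index of the
    first bit after the codeword.
    """
    m = 0
    while bits[pos + m] == "0":   # step 1: count M leading zeros
        m += 1
    start = pos + m + 1           # skip the terminating 1
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m              # step 3: 2^M + INFO - 1


# Three codewords back to back: "1", "010", "011".
stream = "1010011"
pos = 0
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    print(value)
```

Because the count of leading zeros tells the decoder how many INFO bits follow, codewords can be concatenated without any separators.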

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.

-113-

Adoptions

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt (Fidelity Range Extensions) is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, both in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity – even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

Page 87: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-87-

[59] N Kamaci and Y Altunbasak ldquo Performance comparison of the emerging H264 video coding standard with the existing standardsrdquo IEEE ICME pp Baltimore MD July 2003[60] H Schwartz D Marpe and T Wiegand ldquo SNRndashscalable extension of H264AVCrdquo ICIP 2004 vol pp Singapore Oct 2004[61] G J Sullivan P Topiwala and A Luthra ldquoThe H264AVC advanced video coding standard Overview and introduction to the fidelity range extensionsrdquo SPIE Conf on applications of digital image processing XXVII vol 5558 pp 53-74 Aug 2004[62] J Ostermann et al ldquo Video coding with H264AVC Tools performance and complexityrdquo IEEE CAS Magazine vol pp7-34 I quarter 2004 [63] W Gao et al ldquo AVS ndash The Chinese next-generation video coding standardrdquo NAB 2004 Las Vegas NV April 2004 [64] httpwwwimtcorgactivity_groups JVT-EXPERTS LIST (FAQ)

-88-

[65] H264 AVC reference SOFWARE 93[66] httpiphomehhidesuehringtmldownloadjm93zip[67] S Kumar et al ldquoOverview of error resiliency schemes in

H264AVC standardrdquo JVCIR Special Issue on H264AVC VOL pp June-Aug 2005

[68] wwwstmicroelectronicscom WMV 9 and HD H264AVC decoder chip (STB7100)

[69] a Concept Mainhttpwwwmainconceptcomindex_flashshtmlb Mpegablehttpwwwmpegablecomshowhomehtmlc Moonlighthttpwwwmoonlightcoilcons_xmuxerphp

Moonlightrsquos codec is one of the popular ones in the industry and it supports AAC All the codecs have a trial version for download and also sample video clips are available

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits

$M = \lfloor \log_2(\text{code\_num} + 1) \rfloor$

$\text{INFO} = \text{code\_num} + 1 - 2^M$
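The construction above translates directly into code. The following is a minimal sketch that returns the codeword as a string of '0'/'1' characters for readability; the function name is hypothetical and a real encoder would write bits into a buffer instead.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "0{}b".format(m)) if m > 0 else ""
    return "0" * m + "1" + info_bits           # [M zeros][1][INFO]


codewords = [exp_golomb_encode(k) for k in range(7)]
```

Every codeword produced this way is 2M + 1 bits long, so small values of code_num, which should be assigned to the most frequent symbols, get the shortest codes.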

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. $\text{code\_num} = 2^M + \text{INFO} - 1$

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
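The three decoding steps can be sketched as follows. As an assumption for illustration, the bitstream is represented as a string of '0'/'1' characters and the hypothetical function returns both the decoded value and the position of the next unread bit, so that consecutive codewords can be parsed.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, next_pos).
    """
    m = 0
    while bits[pos] == "0":                    # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                                   # ...and consume the '1' marker
    info = int(bits[pos:pos + m], 2) if m > 0 else 0   # step 2: M-bit INFO
    pos += m
    return (1 << m) + info - 1, pos            # step 3: 2^M + INFO - 1


stream = "1" + "010" + "00111"                 # three codewords back to back
values = []
position = 0
while position < len(stream):
    value, position = exp_golomb_decode(stream, position)
    values.append(value)
```

No separator is needed between codewords: the run of leading zeros tells the decoder how many INFO bits follow.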

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB with Y (16x16), Cb (8x16) and Cr (8x16); 4:4:4 MB with Y, Cb and Cr (16x16 each)]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = \frac{B - Y}{2(1 - K_B)} = 0.5389 (B - Y)$

$C_r = \frac{R - Y}{2(1 - K_R)} = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
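As a quick numerical check of the conversion, the sketch below evaluates the luma equation and the two chroma fractions for normalized R, G, B values in [0, 1]. The function name is hypothetical, and the integer offsets and scaling used for 8-bit video are deliberately ignored.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the K_R / K_B weights."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # Cb reaches +0.5 for pure blue
    cr = (r - y) / (2.0 * (1.0 - kr))   # Cr reaches +0.5 for pure red
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)     # no chroma for a neutral color
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned above. The divisions are where rounding error enters once the results are stored as integers, which motivates the YCgCo transform.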

-120-

Rounding error in RGB to YCbCr conversion

FRExt Only: YCgCo

Cg = Green Chroma, Co = Orange Chroma

To further avoid any rounding error, add only one bit of precision to chroma samples

$Y = \frac{1}{2}\left[G + \frac{(R + B)}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{(R + B)}{2}\right]$

$C_o = \frac{(R - B)}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
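The forward and inverse conversions can be checked for exact reversibility with a short sketch. The function names are hypothetical; Python's `>>` on integers is an arithmetic (floor) shift, which matches the definition used on the slide, including for negative values.

```python
def rgb_to_ycgco(r, g, b):
    """Forward integer conversion (lifting steps, no rounding loss)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


pixel = (255, 0, 0)
converted = rgb_to_ycgco(*pixel)
restored = ycgco_to_rgb(*converted)    # identical to the original pixel
```

Each step adds or subtracts a quantity that the inverse can recompute exactly, so no information is lost. The price is that Cg and Co are signed and need one more bit than the input samples.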

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
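Rule 2 can be illustrated with a small check. The per-level frame-size limits are in a table that is not reproduced here, so the pixel budget below (2,097,152 pixels per frame) is only a hypothetical input, and the function names are illustrative.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(8 * pixels per frame))."""
    return math.isqrt(8 * max_pixels_per_frame)


def picture_fits(width, height, max_pixels_per_frame):
    """Check both the area limit and the per-dimension limit of rule 2."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


budget = 2_097_152                      # hypothetical pixels-per-frame limit
limit = max_dimension(budget)
hd_ok = picture_fits(1920, 1080, budget)
strip_ok = picture_fits(8192, 256, budget)   # same area, but far too wide
```

The square-root rule therefore excludes extreme aspect ratios even when the total number of pixels is within the limit.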

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content", JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-88-

[65] H.264/AVC reference software 9.3

[66] http://iphome.hhi.de/suehring/tml/download/jm93.zip

[67] S. Kumar et al., "Overview of error resiliency schemes in H.264/AVC standard", JVCIR Special Issue on H.264/AVC, Vol., pp., June-Aug. 2005

[68] www.stmicroelectronics.com: WMV 9 and HD H.264/AVC decoder chip (STB7100)

[69] a. MainConcept: http://www.mainconcept.com/index_flash.shtml
b. Mpegable: http://www.mpegable.com/show/home.html
c. Moonlight: http://www.moonlight.co.il/cons_xmuxer.php

Moonlight's codec is one of the popular ones in the industry, and it supports AAC. All the codecs have a trial version for download, and sample video clips are also available

-89-

[70] ST, Thomson, Broadcom and Ateme (http://www.ateme.com/products/h264.php) have decoder chips for H.264. Ateme has a real-time single-chip H.264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementations of the H.264 standard, including a widely used implementation of MPEG-4 ASP as a reference. The study is available at http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_en.html

Some of the results and observations in the study may be interesting to the H.264/AVC community

Another interesting test was performed in December 2004: http://www.doom9.org/codecs-104-1.htm

Its methodology is completely different from the one used by Moscow State University. It features H.264, WM9, RV10, VP6 and MPEG-4 ASP

-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm: High Profile is now officially mandatory for HD DVD Video (DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00 (Swiss francs))

-91-

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction
o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction
o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

Fidelity Range Extensions
• 4:2:2, 4:4:4 formats
• > 8 bit depths
• (8x8) integer DCT
• HVS weighting matrices
• Transform bypass lossless mode: uses prediction and entropy coding of prediction errors
• Residual color transform
• Source editing such as alpha blending
• High bit rates
• [Use RGB color format] YCgCo
• High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation. Seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 & 4x4
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is made at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is made at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
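The logarithmic relation between the quantization parameter (QP) and the step size can be sketched as follows. The six base values are the ones widely quoted in the H.264 literature for QP 0 to 5; they are not given on the slide, so treat them, the 0 to 51 range (8-bit video) and the function name as assumptions.

```python
# Step sizes for QP = 0..5 as commonly quoted; the pattern repeats,
# doubling every 6 QP steps.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51 for 8-bit video")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


doubling = qstep(30) / qstep(24)        # 6 QP steps apart, ratio 2.0
coarsest = qstep(51)
```

An increase of 1 in QP raises the step size by roughly 12%, so the parameter behaves like a logarithmic dial on bit rate versus distortion.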

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PicAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction


Page 89: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-89-

[70] ST Thomson Broadcom and Atemehttpwwwatemecomproductsh264php

have decoder chips for H264 Ateme has real time single chip H264 Main profile encoder (FPGA)

[71] Moscow State University has published a study of current implementation of H264 standard including a widely-used implementation of MPEG-4 ASP as a referenceThe study is available at

httpcompressionruvideocodec_comparisonmpeg-4_avc_h264_enhtmlSome of the results and observations in the study may be interesting to H264AVC community

Another interesting test has been performed in December 2004httpwwwdoom9orgcodecs-104-1htm The

methodology is completely different than the one used by the Moscow State UniversityIt features H264 WM9 RV10 VP6 and MPEG-4 ASP

-90-

httpwwwavc-allianceorg

httpftp3ituintav-archjvt-site

Httpwwwdvdforumorg29cmtg-resolutionhtmHigh Profile is now officially mandatory for HD DVD Video (DVD - Forum)

httptinyurlcom3u9ww (up to 3 recommendations can be downloaded per year)

httptinyurlcom6dnck (ISOIEC 14493-10 - MPEG-4 part 10 published standard costs CHF 26000 Swiss Franks)

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

Matrices can be transmitted in the SPS and PPS

Separate matrices for the 4x4 and 8x8 transforms

Separate matrices for inter and intra

The encoder can design and use customized scaling matrices

These are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y; 4x4 Intra Cb, Cr

4x4 Inter Y; 4x4 Inter Cb, Cr

8x8 Intra Y; 8x8 Inter Y
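A simplified sketch of how a weighting matrix can act as an extra multiplier during inverse quantization is given below. It assumes that a weight of 16 means "no perceptual scaling" and it ignores the integer scaling tables of a real decoder; the function name `dequantize` and the sample values are hypothetical.

```python
FLAT_WEIGHT = 16  # assumption: a weight of 16 leaves the step size unchanged


def dequantize(levels, qstep, weights):
    """Scale quantized levels back to coefficient values.

    levels and weights are equally sized 2-D lists; qstep is the
    quantization step size. Each weight rescales the step for one frequency.
    """
    return [
        [level * qstep * weight / FLAT_WEIGHT for level, weight in zip(lrow, wrow)]
        for lrow, wrow in zip(levels, weights)
    ]


levels = [[4, 1], [1, 1]]
flat = [[16, 16], [16, 16]]
coarse_high = [[16, 32], [32, 32]]  # hypothetical: coarser steps at high frequencies

plain = dequantize(levels, 2.0, flat)
weighted = dequantize(levels, 2.0, coarse_high)
```

With the flat matrix the result is simply level times step size; with the second matrix the higher-frequency positions are reconstructed with twice the step.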

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, so as to maximize the number of zero-valued coefficients along the scan

Frame (Zig-Zag), Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M
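The construction above can be sketched in a few lines. The function name `exp_golomb_encode` and the choice of a bit string as output are illustrative assumptions, not part of the standard's syntax.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1   # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)        # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits      # [M zeros] [1] [INFO]


for n in range(8):
    print(n, exp_golomb_encode(n))
```

The first codeword is the single bit `1`, and each codeword is 2M + 1 bits long, as stated above.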

-112-

Decoding

1. Read in M leading zeros followed by 1

2. Read the M-bit INFO field

3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes
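The three Exp-Golomb decoding steps on this slide can be sketched as follows. The function name `exp_golomb_decode`, the bit-string input and the returned position are assumptions made for illustration; a truncated input would raise an `IndexError` in this sketch.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one codeword from a string of '0'/'1' characters.

    Returns (code_num, next_pos), where next_pos indexes the first bit
    after the decoded codeword.
    """
    m = 0
    while bits[pos] == "0":       # step 1: count the M leading zeros
        m += 1
        pos += 1
    pos += 1                      # skip the 1 that ends the prefix
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO
    pos += m
    return (1 << m) + info - 1, pos   # step 3: code_num = 2^M + INFO - 1


# Three codewords back to back: code_num 0, 1 and 3.
stream = "1" + "010" + "00100"
first, p = exp_golomb_decode(stream)
second, p = exp_golomb_decode(stream, p)
third, p = exp_golomb_decode(stream, p)
```

Because each codeword announces its own length through the zero prefix, consecutive codewords can be read from the stream without separators.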

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

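FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB consists of a 16x16 Y block with 8x16 (width x height) Cb and Cr blocks; a 4:4:4 MB consists of 16x16 Y, Cb and Cr blocks]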

-119-

RGB to Y Cb Cr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
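The RGB to YCbCr definitions on this slide can be checked numerically with a small sketch. It assumes R, G and B are normalized to the range 0..1 and ignores offsets and integer scaling; the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B (0..1) to Y, Cb, Cr.

    The default kr and kb are the constants quoted on the slide; pass
    kr=0.299, kb=0.114 for the ITU-R Rec. BT.601 values.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
```

White maps to full luma with zero chroma, while pure red and pure blue reach the chroma extreme of 0.5, which is what the factors 1/(2(1 - K_R)) and 1/(2(1 - K_B)) are chosen to achieve.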

-120-

Rounding error in RGB to Y Cb Cr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2}\left[G + \frac{(R + B)}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{(R + B)}{2}\right]$

$C_o = \frac{(R - B)}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples
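A floating-point sketch of the YCgCo definition is shown below, with Y = (G + (R + B)/2)/2, Cg = (G - (R + B)/2)/2 and Co = (R - B)/2. The inverse is derived here from those three expressions and is not quoted from the slide; the function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward YCgCo transform in floating point."""
    y = 0.5 * (g + (r + b) / 2.0)
    cg = 0.5 * (g - (r + b) / 2.0)
    co = (r - b) / 2.0
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform, obtained by solving the forward equations."""
    g = y + cg
    t = y - cg          # t equals (R + B) / 2
    return t + co, g, t - co


y, cg, co = rgb_to_ycgco(200, 100, 50)
restored = ycgco_to_rgb(y, cg, co)
```

For integer inputs the outputs can land on multiples of 0.5 or 0.25, so truncating them to the input bit depth would lose information; this is the rounding issue that the extra bit of chroma precision is meant to address.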

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
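The forward and inverse lifting steps on this slide can be checked for exact reversibility with a short sketch. Python's `>>` on integers is an arithmetic shift, which matches the operation described; the function names `forward` and `inverse` are hypothetical.

```python
def forward(r, g, b):
    """Forward conversion using the lifting steps from the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def inverse(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Residual samples can be negative, so the check covers both signs.
values = range(-8, 9)
lossless = all(
    inverse(*forward(r, g, b)) == (r, g, b)
    for r in values for g in values for b in values
)
```

Each inverse step subtracts exactly what the matching forward step added, so the round trip is exact even though the shifts discard a bit.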

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
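The second rule can be illustrated with a short sketch. It treats the level limit as a total pixel count per frame; the budget of 8192 macroblocks (8192 x 256 pixels) used in the example is a hypothetical figure chosen for illustration, and the function names are not from the standard.

```python
import math


def max_picture_dimension(total_pixels_per_frame):
    """Largest width or height allowed: sqrt(total pixels per frame x 8)."""
    return math.isqrt(8 * total_pixels_per_frame)


def fits_level(width, height, total_pixels_per_frame):
    """Check both the frame-size budget and the per-dimension limit."""
    limit = max_picture_dimension(total_pixels_per_frame)
    return (
        width * height <= total_pixels_per_frame
        and width <= limit
        and height <= limit
    )


budget = 8192 * 256  # hypothetical frame-size budget in pixels
print(max_picture_dimension(budget))
print(fits_level(1920, 1088, budget))
print(fits_level(8192, 256, budget))
```

The last case shows why the rule exists: a very wide, very short picture can satisfy the pixel budget and still be rejected because one dimension exceeds the square-root limit.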

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter

• Only control of the filter is adjusted: do not filter 4x4 blocks

• No change to the filter operation itself

CABAC

• 61 new contexts and corresponding initialization values

• No change to the CABAC engine

Signaling

• 8x8 transform on/off flag at PPS level

• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:

• Main profile

• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes

• Encoder-specified perceptual-based quantization scaling matrices

• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):

• HD Film: 12%

• HD Video (progressive): 12%

• HD Video (interlace): 4% (only 2 test clips)

• SD Video (interlace): 6%

Complexity impact:

• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding

• No increase in computational requirements

• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:

1. MPEG LA: www.mpegla.com

2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-90-

http://www.avc-alliance.org

http://ftp3.itu.int/av-arch/jvt-site

http://www.dvdforum.org/29cmtg-resolution.htm (High Profile is now officially mandatory for HD DVD Video, per the DVD Forum)

http://tinyurl.com/3u9ww (up to 3 recommendations can be downloaded per year)

http://tinyurl.com/6dnck (ISO/IEC 14496-10, MPEG-4 Part 10; the published standard costs CHF 260.00)

-91-

Slices in a picture are compressed as follows:

♦ Intra spatial (block-based) prediction

o Full-macroblock luma or chroma prediction: 4 modes (directions) for prediction

o 8x8 (FRExt-only) or 4x4 luma prediction: 9 modes (directions) for prediction

Fidelity Range Extensions:

4:2:2, 4:4:4 formats

> 8 bit depths

(8x8) integer DCT

HVS weighting matrices

Transform-bypass lossless mode uses prediction and entropy coding of prediction errors

Residual color transform

Source editing such as alpha blending

High bit rates

[use RGB color format] Y Cg Co

High resolution

-92-

♦ Inter temporal prediction: block-based motion estimation and compensation

o Multiple reference pictures

o Reference B pictures

o Arbitrary referencing order

o Variable block sizes for motion compensation; seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4

o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)

o Weighted prediction

o Frame- or field-based motion estimation for interlaced scanned video
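To make the 1/4-sample luma interpolation item concrete, here is a one-dimensional sketch. It assumes the 6-tap filter (1, -5, 20, 20, -5, 1)/32 that H.264 uses for half-sample luma positions, 8-bit samples, and edge pels repeated at the picture boundary; these details come from general knowledge of the standard rather than from the slides, and the function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed half-sample filter, sums to 32


def half_sample(row, i):
    """Interpolate the half-sample value between row[i] and row[i + 1]."""
    def pel(k):
        # Repeat the edge pel when the filter reaches past the row.
        return row[min(max(k, 0), len(row) - 1)]

    acc = sum(tap * pel(i - 2 + n) for n, tap in enumerate(TAPS))
    return min(max((acc + 16) >> 5, 0), 255)   # round, then clip to 8 bits


def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_sample(row, i) + 1) >> 1


flat_row = [100] * 8
edge_row = [0, 0, 0, 0, 255, 255, 255, 255]
print(half_sample(flat_row, 3), half_sample(edge_row, 3), quarter_sample(edge_row, 3))
```

A flat row is left unchanged, while the sample halfway across a sharp edge lands midway between the two levels; quarter-sample positions are then simple rounded averages.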

-93-

♦ Interlaced coding features

o Frame-field adaptation

Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level

MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected for each vertical pair of MBs

o Field scan

♦ Lossless representation capability

o Intra PCM raw sample-value macroblocks

o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
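The logarithmic control in the last item can be sketched as follows: the step size doubles for every increase of 6 in the quantization parameter. The six base values and the 0..51 range are the ones commonly tabulated for 8-bit H.264 video and are not given on the slide; the function name `qstep` is hypothetical.

```python
# Assumed step sizes for QP 0..5; each later group of six doubles them.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("qp must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

An increase of 1 in the parameter therefore raises the step size by roughly 12 percent, which gives the encoder fine and evenly spaced control over bit rate.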

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning

o Zig-Zag (Frame)

o Field (alternate scan)

♦ Lossless entropy coding

o Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error Resilience Tools

o Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses


Page 91: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-91-

Fidelity Range ExtensionsSlices in a picture are compressed as followsdiams Intra spatial (block based) prediction

o Full-macroblock luma or chroma prediction ndash 4 modes (directions) for predictiono 8x8 (FRExt-only) or 4x4 luma prediction ndash 9 modes (directions) for prediction422 444 Formatsgt 8 bit depths(8x8) integer DCTHVS weighting matricesTransform bypass lossless mode uses prediction and entropy coding of prediction errorsResidual color transformSource editing such as Alpha blendingHigh bit rates [use RGB color format] Y Cg Co

High resolution

-92-

diams Inter temporal prediction ndash block based motion estimation and compensation

o Multiple reference pictureso Reference B pictureso Arbitrary referencing ordero Variable block sizes for motion compensationSeven block sizes16x16 16x8 8x16 8x8 8x4 4x8 amp 4x4o 14-sample luma interpolation (14 or 18th-sample

chroma interpolation)o Weighted predictiono Frame or Field based motion estimation for interlaced

scanned video

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture- and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1

2. Read M-bit INFO field

3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
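The Exp-Golomb construction and the three decoding steps can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical and the bitstream is modeled as a string of '0' and '1' characters rather than packed bytes.

```python
def exp_golomb_encode(code_num):
    """Return the codeword [M zeros][1][INFO] as a bit string."""
    m = (code_num + 1).bit_length() - 1        # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)             # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits

def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos].

    Returns (code_num, position of the next codeword).
    """
    m = 0
    while bits[pos + m] == "0":                # step 1: count M leading zeros
        m += 1
    pos += m + 1                               # skip the zeros and the '1'
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: read M-bit INFO
    return (1 << m) + info - 1, pos + m        # step 3: 2^M + INFO - 1

# Concatenate a few codewords and read them back
stream = "".join(exp_golomb_encode(n) for n in (0, 3, 6))
decoded = []
position = 0
while position < len(stream):
    value, position = exp_golomb_decode(stream, position)
    decoded.append(value)
```

Because each codeword announces its own length through the run of leading zeros, codewords can be concatenated without separators, as the short stream at the end of the sketch shows.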

-113-

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions (FRExt) are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution, content-distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e. 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

High Profiles

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

[Figure: 4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height); 4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2(1 - K_B)}, \quad C_r = \frac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
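As a worked illustration of the RGB to YCbCr equations, the following Python sketch applies them with the constants given on this slide. It is a simplified sketch under stated assumptions: samples are treated as real numbers (for example in the range 0 to 1), no offsets or integer scaling are applied, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Forward conversion using Y = KR*R + (1 - KR - KB)*G + KB*B."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr, kr=0.2126, kb=0.0722):
    """Inverse conversion obtained by solving the forward equations."""
    b = y + 2 * (1 - kb) * cb
    r = y + 2 * (1 - kr) * cr
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b

# Pure blue maps to the maximum Cb value of 0.5
y_blue, cb_blue, cr_blue = rgb_to_ycbcr(0.0, 0.0, 1.0)

# Passing kr=0.299, kb=0.114 selects the BT.601 constants instead
y601, cb601, cr601 = rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)
```

In floating point the round trip is exact up to machine precision. Once Y, Cb and Cr are rounded to integers of the same bit depth as the input, the inverse no longer recovers R, G and B exactly.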

-120-

Rounding error in RGB to YCbCr

FRExt Only: YCgCo

Cg = Green Chroma, Co = Orange Chroma

To further avoid any rounding error, add only one bit of precision to chroma samples

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
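The forward and inverse conversions on this slide can be checked with a short Python sketch. The function names are hypothetical; the arithmetic follows the slide's equations directly, relying on the fact that Python's `>>` on integers is an arithmetic (floor) shift, including for negative values.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion, integer-only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# Round trip over a coarse grid that includes negative values,
# as occur in residual (prediction error) data.
samples = range(-255, 256, 17)
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
```

Each step changes one value by a quantity that the inverse can recompute exactly, so the shifts discard no information and the round trip is lossless. Cg and Co span roughly twice the input range, which is the one extra bit of chroma precision mentioned earlier.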

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allow a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile

New Profiles in the H.264/AVC FRExt Amendment

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
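Note 2 above can be illustrated with a small Python check. This is a sketch under stated assumptions: the function names are hypothetical, the limit is applied exactly as worded on the slide (in pixels), and the frame-size limit used in the example is an illustrative value rather than one taken from the level table.

```python
import math

def max_dimension(max_pixels_per_frame):
    """Largest allowed width or height: sqrt(total pixels per frame x 8)."""
    return math.sqrt(max_pixels_per_frame * 8)

def fits_level(width, height, max_pixels_per_frame):
    """True if the picture respects both the area and the per-dimension limit."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit
            and height <= limit)

# Illustrative frame-size limit (assumption, not a value from the slides)
EXAMPLE_MAX_PIXELS = 1920 * 1088

ok_hd = fits_level(1920, 1088, EXAMPLE_MAX_PIXELS)
ok_strip = fits_level(8192, 128, EXAMPLE_MAX_PIXELS)
```

The second case has fewer pixels than the limit but is rejected because its width exceeds the square-root bound, which is how the rule excludes extremely elongated pictures.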

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e. difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-92-

♦ Inter temporal prediction: block-based motion estimation and compensation
o Multiple reference pictures
o Reference B pictures
o Arbitrary referencing order
o Variable block sizes for motion compensation: seven block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4)
o 1/4-sample luma interpolation (1/4 or 1/8th-sample chroma interpolation)
o Weighted prediction
o Frame or field based motion estimation for interlaced scanned video

-93-

♦ Interlaced coding features
o Frame-field adaptation
  Picture Adaptive Frame Field (PicAFF): the choice of compression (frame or field) is selected at the frame level
  MacroBlock Adaptive Frame Field (MBAFF): the choice of compression (frame or field) is selected at the level of a vertical pair of MBs
o Field scan

♦ Lossless representation capability
o Intra PCM raw sample-value macroblocks
o Entropy-coded transform-bypass lossless macroblocks (FRExt-only)

-94-

♦ 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of quantization control parameter
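The logarithmic relation between the quantization parameter (QP) and the step size can be sketched in Python. The slides do not give numbers, so the following rests on assumptions: the step size doubles for every increase of 6 in QP, the six base values are the ones commonly quoted for H.264 with 8-bit video, QP is taken to lie in 0..51, and the function name is hypothetical.

```python
# Assumed base step sizes for QP = 0..5 (not taken from the slides)
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size: doubles each time QP increases by 6."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))

# Step sizes across the assumed QP range
table = [qstep(qp) for qp in range(52)]
```

With this mapping an increase of 1 in QP raises the step size by roughly 12 percent, which gives fine control at low QP and a wide range overall without needing a large table.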

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)
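As an illustration of the zig-zag (frame) coefficient scan, the following Python sketch generates the classic zig-zag order for an n x n block and applies it to a 4x4 block of quantized coefficients. The function names and the sample block are hypothetical, and the field (alternate) scan is not shown.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in classic zig-zag order.

    Positions are grouped by anti-diagonal (row + col). Odd diagonals are
    walked with the row increasing, even diagonals with the row decreasing.
    """
    positions = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )

def scan(block):
    """Read a square block into a 1-D list along the zig-zag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Made-up quantized block with energy concentrated at low frequencies
block = [
    [9, 4, 1, 0],
    [5, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = scan(block)
raster_indices = [r * 4 + c for r, c in zigzag_order(4)]
```

The scan places the low-frequency coefficients first and pushes the zero-valued coefficients to the end of the list, which is what makes the subsequent entropy coding efficient.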

-96-

♦ Error Resilience Tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

Page 93: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-93-

diams Interlaced coding featureso Frame-field adaptation

Picture Adaptive Frame Field (PicAFF)Choice of compression (frame or field) is selected a the frame levelMacroBlock Adaptive Frame Field (MBAFF)

o Field scandiams Lossless representation capability

o Intra PCM raw sample-value macroblockso Entropy-coded transform-bypass lossless

macroblocks (FRExt-only)

In the MBAFF choice of compression (frame or field) is selected at the two-vertical-pair-MB pair

-94-

diams 8x8 (FRExt-only) or 4x4 Integer Inverse Transform (conceptually similar to the well-known DCT)

diams Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

diams Scalar quantization

diams Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

diams Logarithmic control of quantization step size as a function of quantization control parameter

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt only: MB structure in 4:2:2 and 4:4:4 formats

[Figure: a 4:2:2 MB has a 16x16 Y block and 8x16 (width x height) Cb and Cr blocks; a 4:4:4 MB has 16x16 Y, Cb, and Cr blocks]

-119-

RGB → YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \frac{B - Y}{2 (1 - K_B)}, \quad C_r = \frac{R - Y}{2 (1 - K_R)}$

With $K_R = 0.2126$ and $K_B = 0.0722$ (the ITU-R Rec. BT.709 values; $K_R + K_B + K_G = 1$):

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y), \quad C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
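As a quick numerical check of the general formulas, the conversion can be sketched as below. It assumes R, G, and B are normalized to the range [0, 1] and ignores the offsets and integer scaling a real system would apply; the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using the weights K_R and K_B."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scale factor 1 / (2 (1 - K_B))
    cr = (r - y) / (2.0 * (1.0 - kr))   # scale factor 1 / (2 (1 - K_R))
    return y, cb, cr
```

With this scaling, Cb and Cr stay within [-0.5, 0.5]: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5. Passing kr=0.299 and kb=0.114 gives the BT.601 variant.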

-120-

Rounding error in RGB → YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \frac{1}{2} \left[ G + \frac{R + B}{2} \right]$

$C_g = \frac{1}{2} \left[ G - \frac{R + B}{2} \right]$

$C_o = \frac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
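The forward and inverse steps can be checked for exact reversibility with a short sketch. Python's ">>" on integers is an arithmetic (floor) shift, which matches the operation described above; the function names are hypothetical.

```python
def rgb_to_ycgco(r, g, b):
    """Forward lifting-based color transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so the round trip is lossless for any integer inputs, including the negative values that occur in residual data.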

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt(8 x (total number of pixels per frame))

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
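Note 2 can be illustrated with a small check. This sketch follows the pixel-based wording of the note and takes the frame-size budget as a plain parameter; the helper names are hypothetical, and the per-level budgets themselves come from the levels table, which is not reproduced here.

```python
import math


def max_dimension(max_pixels_per_frame):
    """Largest width or height (in pixels) permitted by note 2."""
    return math.isqrt(8 * max_pixels_per_frame)


def dimensions_allowed(width, height, max_pixels_per_frame):
    """True if the picture fits the pixel budget and the per-side limit."""
    limit = max_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit
            and height <= limit)
```

The per-side limit rules out extremely elongated pictures: with a budget of 2,097,152 pixels, a 1920x1088 picture passes, while an 8192x256 picture uses the same number of pixels but exceeds the 4096-pixel side limit.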

-128-

Compressed Bit Rate Multipliers for FRExt Profiles

To meet more demanding high-fidelity applications

Multipliers for the fourth column of the table on page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

♦ Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

♦ CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

♦ Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus), which MPEG calls HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby. ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-94-

♦ 8x8 (FRExt-only) or 4x4 integer inverse transform (conceptually similar to the well-known DCT)

♦ Residual color transform for efficient RGB coding without conversion loss or bit expansion (FRExt-only)

♦ Scalar quantization

♦ Encoder-specified perceptually weighted quantization scaling matrices (FRExt-only)

♦ Logarithmic control of quantization step size as a function of the quantization control parameter
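The logarithmic relationship in the last item can be sketched as below. The six base step sizes and the QP range 0-51 are not given in these slides; they are the values commonly quoted for 8-bit H.264 video, and the function name is hypothetical.

```python
# Step sizes commonly quoted for QP = 0..5 (an assumption, not from the slides).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a given QP; doubles for every +6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP is assumed to lie in 0..51 for 8-bit video")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))
```

An increase of 1 in QP therefore raises the step size by roughly 12%, and an increase of 6 doubles it, which gives the encoder fine rate control over a wide range of bit rates.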

-95-

♦ Deblocking filter (within the motion compensation loop)

♦ Coefficient scanning
o Zig-Zag (Frame)
o Field (alternate scan)

♦ Lossless entropy coding
o Universal Variable Length Coding (UVLC) using Exp-Golomb codes
o Context Adaptive VLC (CAVLC)
o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

♦ Error resilience tools
o Flexible Macroblock Ordering (FMO)
o Arbitrary Slice Order (ASO)
o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc., especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only), and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP, or SI. A picture may contain different slice types.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply the (4x4) or (8x8) IntDCT to intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After the (8x8) IntDCT, HVS weighting is applied to the coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zig-zag or field) and then entropy coded (CAVLC or CABAC)

PICAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
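The first three modes can be sketched as follows, assuming both the row of 16 pels above the MB and the column of 16 pels to its left are available. The planar mode is omitted because its equation is not given here, and the function name and string mode labels are hypothetical.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 predicted luma block as a list of rows.

    above: the 16 reconstructed pels just above the MB.
    left:  the 16 reconstructed pels just left of the MB.
    """
    n = 16
    if mode == "vertical":
        # Every row repeats the pels above the MB.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)
```

The encoder would subtract the chosen prediction from the original MB and pass the residual on to the transform; picture-edge cases, where one set of neighbors is missing, need separate handling that this sketch leaves out.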

-103-

Chroma spatial prediction (operates on the entire MB)

• 4:2:0 (8x8)
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction (FRExt only): nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrices can be transmitted in the SPS and PPS
• Separate matrices for 4x4 and 8x8 transforms
• Separate matrices for Inter and Intra
• The encoder can design and use customized scaling matrices; these are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:
• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y
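The idea of the scaling matrix as an extra multiplier can be sketched in simplified floating-point form. This is not the integer arithmetic of a real decoder: it assumes a flat matrix of 16s means "no weighting", takes the step size as a plain parameter, and uses hypothetical names throughout.

```python
FLAT_4X4 = [[16] * 4 for _ in range(4)]   # assumed neutral (unweighted) matrix


def inverse_quantize(levels, qstep, weights=FLAT_4X4):
    """Rescale a 4x4 block of quantized levels to transform coefficients."""
    return [[levels[r][c] * qstep * weights[r][c] / 16.0
             for c in range(4)]
            for r in range(4)]
```

Raising an entry above 16 makes the effective step size at that frequency position coarser, which is how an encoder can spend fewer bits on coefficients that the human visual system is less sensitive to.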

-108-

Two scans (FRExt only), similar to those of the 4x4 transform, switched for frame/field coding: Frame (zig-zag) and Field

Coefficient scanning is based on decreasing variances, to maximize the number of zero-valued coefficients along the scan
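The frame scan can be illustrated with a generator for the classic zig-zag pattern over an n x n block. The field (alternate) scan is table-defined and is not reproduced here; the function names are hypothetical.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for d in range(2 * n - 1):                 # d = row + col (anti-diagonal)
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        if d % 2 == 0:
            rows = reversed(rows)              # even diagonals run upward
        for r in rows:
            order.append((r, d - r))
    return order


def scan_block(block):
    """Flatten a square block of coefficients into a zig-zag ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because the scan visits low-frequency positions first, the large coefficients tend to appear at the start of the list and the zeros cluster at the end, which is what the entropy coder exploits.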

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]


Page 95: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-95-

diams Deblocking filter (within the motion compensation loop)

diams Coefficient scanningo Zig-Zag (Frame)

o Field (alternate scan)

diams Lossless Entropy codingo Universal Variable Length Coding (UVLC) using Exp-Golomb codes

o Context Adaptive VLC (CAVLC)

o Context-based Adaptive Binary Arithmetic Coding (CABAC)

-96-

diams Error Resilience Toolso Flexible Macroblock Ordering (FMO)

o Arbitrary Slice Order (ASO)

o Redundant Slices

diams SP and SI synchronization pictures for streaming and other uses

-97-

diams Various color spaces supported (YCbCr of various types YCgCo RGB etc ndash especially in FRExt)

diams 420 422 (FRExt-only) and 444 (FRExt-only) color formats

diams Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools Depending upon the subset of these tools a slice can be I P B SP or SI A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

diams High profile (HP) supporting 8-bit video with 420 sampling addressing high-end consumer use and otherapplications using high-resolution video without a need for extended chroma formats or extended sample accuracy

diams High 10 profile (Hi10P) supporting 420 video with up to 10 bits of representation accuracy per sample

diams High 422 profile (H422P) supporting up to 422 chroma sampling and up to 10 bits per sample and

High Profiles

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

RGB Y Cb Cr

Y = KR R + (1 ndash KR ndash KB) G + KB B

KR = 02126 KB = 00722 KR + KB + KG = 1

Y = 02126 R + 07152 G + 00722 B

Cb = 05389 (B ndash Y) Cr = 07874 (R ndash Y)

(ITU-R RecBT601 defines KB=0114 KR=0299)

( )

2(1 )b

B

B YC

K

( )

2(1 )r

R

R YC

K

-120-

Rounding error in RGB Y Cb Cr

FRExt Only YCgCo

Cg = Green Chroma Co = Orange ChromaTo further avoid any rounding error add only one bit of precision to chroma samples

1 ( )[ ]

2 21 ( )

[ ]2 2( )

2

g

o

R BY G

R BC G

R BC

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

lsquoHigherrsquo profile supports all capabilities of the lower ones Also capable of decoding all bit streams encoded for the lower

nested profilesAll high profiles support all features of the main profile

New Profiles in the H264AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1 If a picture size is smaller than the typical picture size then frame rate can be higher up to a maximum of 172 framessec

2 Horizontal and vertical maximum sizes cannot be more than sqrt[(Total of pixelsframe)x8]

3 If at a given level picture size is less than that in the table of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

diams The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

diams The High profile of FRExt produced nominally transparent (ie difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T Wedi Y Kashiwagi ldquoSubjective quality evaluation of H264AVC FRExt for HD movie contentrdquo JVT document JVT-L033 July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3


-96-

♦ Error Resilience Tools
  o Flexible Macroblock Ordering (FMO)
  o Arbitrary Slice Order (ASO)
  o Redundant Slices

♦ SP and SI synchronization pictures for streaming and other uses

-97-

♦ Various color spaces supported (YCbCr of various types, YCgCo, RGB, etc.; especially in FRExt)

♦ 4:2:0, 4:2:2 (FRExt-only) and 4:4:4 (FRExt-only) color formats

♦ Auxiliary pictures for alpha blending (FRExt-only)

Each slice need not use all these tools. Depending upon the subset of these tools, a slice can be I, P, B, SP or SI. A picture may contain different slice types

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I-Slice (MB in I slice and intra MB in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to intra prediction errors

Note: (8x8) IntDCT is FRExt-only

After (8x8) IntDCT, HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF: field processing is similar to frame mode

MBAFF: if the MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
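The first three 16x16 modes can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the normative process: the function name `predict_16x16` and the string mode names are hypothetical, both the row above and the column to the left are assumed to be available, and the planar mode is left out.

```python
def predict_16x16(top, left, mode):
    """Build a 16x16 luma predictor from neighboring reconstructed pels.

    top  : the 16 pels in the row just above the macroblock
    left : the 16 pels in the column just left of the macroblock
    mode : "vertical", "horizontal" or "dc" (hypothetical names)
    """
    n = 16
    if len(top) != n or len(left) != n:
        raise ValueError("expected 16 neighboring pels on each side")
    if mode == "vertical":
        # Every row repeats the row above the macroblock.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left.
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels.
        dc = (sum(top) + sum(left) + n) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)
```

An encoder would subtract the chosen predictor from the original macroblock and transform only the prediction error.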

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8)

• 4:2:2 (8x16)

• 4:4:4 (16x16)

Similar to (16x16) luma MB prediction: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra
• Encoder can design and use customized scaling matrices
• These are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, reflecting visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y
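The idea that the weighting matrix is just one more multiplier inside inverse quantization can be sketched as follows. This is an illustrative floating-point simplification, not the integer arithmetic of a real decoder: the names `dequantize` and `NEUTRAL_WEIGHT` are hypothetical, and the sketch assumes that a weight of 16 leaves a coefficient unchanged.

```python
NEUTRAL_WEIGHT = 16  # assumption: a weight of 16 means "no perceptual scaling"


def dequantize(levels, qstep, weights):
    """Rescale quantized levels, applying a per-frequency weight.

    levels  : 2-D list of quantized coefficient levels
    qstep   : quantizer step size shared by the whole block
    weights : 2-D list of scaling-matrix entries, same shape as levels
    """
    return [
        [level * qstep * weight / NEUTRAL_WEIGHT
         for level, weight in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]
```

With all weights equal to 16 this reduces to plain `level * qstep`; larger weights at high frequencies make those coefficients coarser, which is how an encoder can spend fewer bits where errors are less visible.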

-108-

Two scans, similar to those of the 4x4 transform, are switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only
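The frame (zig-zag) scan order can be generated rather than tabulated. The sketch below builds the classic anti-diagonal zig-zag for an n x n block; the function name `zigzag_order` is hypothetical, and the field scan, which uses a different fixed order, is not covered.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zig-zag order.

    Anti-diagonals are visited from the top-left corner outward, and the
    direction of travel alternates from one diagonal to the next.
    """
    order = []
    for s in range(2 * n - 1):           # s = row + col on this diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)        # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def scan(block):
    """Flatten a square block of coefficients along the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because low-frequency coefficients sit near the top-left corner, scanning this way tends to push the zero-valued coefficients to the end of the list, which is what the entropy coder exploits.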

-109-

Examples of parameters to be encoded

Parameters                                            Description
Sequence-, picture- and slice-layer syntax elements   Headers and parameters
Macroblock type (mb_type)                             Prediction method for each coded macroblock
Coded block pattern                                   Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter                                   Transmitted as a delta value from the previous value of QP
Reference frame index                                 Identifies reference frame(s) for inter prediction
Motion vector                                         Transmitted as a difference (mvd) from the predicted motion vector
Residual data                                         Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits

M = floor(log2 [code_num + 1])

INFO = code_num + 1 - 2^M

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
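The Exp-Golomb construction and the three decoding steps can be sketched in a few lines. This is a minimal illustration that represents bits as a text string of "0" and "1" characters for readability; the function names `exp_golomb_encode` and `exp_golomb_decode` are hypothetical, and a real bitstream reader would work on packed bytes and guard against truncated input.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str, pos: int = 0):
    """Decode one codeword starting at pos; return (code_num, next_pos)."""
    m = 0
    while bits[pos + m] == "0":              # step 1: count M leading zeros
        m += 1
    start = pos + m + 1                      # skip the terminating 1
    info = int(bits[start:start + m], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, start + m    # step 3: 2^M + INFO - 1
```

Encoding the values 0, 3 and 7 yields the codewords `1`, `00100` and `0001000`, of lengths 1, 5 and 7 bits, in line with the (2M + 1) rule; concatenating them and calling `exp_golomb_decode` repeatedly recovers the original values.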

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the Fidelity Range Extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content-contribution, content-distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: 4:2:2 MB with a 16x16 Y block and 8x16 (width x height) Cb and Cr blocks; 4:4:4 MB with 16x16 Y, Cb and Cr blocks]

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$C_b = \dfrac{B - Y}{2(1 - K_B)}, \qquad C_r = \dfrac{R - Y}{2(1 - K_R)}$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = 0.5389 (B - Y)$, $C_r = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
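The conversion on this slide can be written directly from the definitions of Y, Cb and Cr in terms of KR and KB. The sketch below is illustrative only: the function name `rgb_to_ycbcr` is hypothetical, inputs are assumed to be normalized to the range 0 to 1, and no offset, scaling to integer code values or clipping is applied.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) for given luma weights.

    kr, kb default to the values on the slide; passing kr=0.299, kb=0.114
    gives the BT.601 weights mentioned there.
    """
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # about 0.5389 * (B - Y) by default
    cr = (r - y) / (2.0 * (1.0 - kr))   # about 0.6350 * (R - Y) by default
    return y, cb, cr
```

The divisors are chosen so that Cb and Cr stay within -0.5 to +0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5, while any gray input gives zero chroma. Since the coefficients are not exactly representable in fixed point, rounding in this conversion is the source of the error that YCgCo is meant to avoid.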

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$Y = \dfrac{1}{2}\left[G + \dfrac{R + B}{2}\right]$

$C_g = \dfrac{1}{2}\left[G - \dfrac{R + B}{2}\right]$

$C_o = \dfrac{R - B}{2}$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
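The forward and inverse conversions on this slide can be checked for exact reversibility with a short sketch. The function names `rgb_to_ycgco` and `ycgco_to_rgb` are hypothetical; the arithmetic follows the slide's steps, relying on the fact that Python's `>>` on integers is an arithmetic (sign-preserving) shift.

```python
def rgb_to_ycgco(r: int, g: int, b: int):
    """Forward conversion using only additions, subtractions and shifts."""
    co = r - b
    t = b + (co >> 1)        # t is an intermediate temporary variable
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y: int, cg: int, co: int):
    """Inverse conversion: undoes the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Each inverse step subtracts exactly the quantity the forward step added, so the bits discarded by the shifts never need to be recovered and the round trip is lossless. For 8-bit inputs Cg and Co span -255 to 255, which is the one extra bit of chroma precision mentioned earlier.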

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process



Page 98: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-98-

Slice

I (Intra)

P (Predicted)

B (Bidirectionally predicted) (Reference for temporal prediction or non-reference)

SP (Switching P)

SI (Switching I)

-99-

I ndash Slice(MB in I slice and intra MB in P and B slices)

Spatial intra prediction9 directional modes for (4x4) or (8x8) blocks

Apply (4 x4) or (8x8) IntDCT to Intra prediction errors

Note (8x8) IntDCT for FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1

2. Read the M-bit INFO field

3. $\mathrm{code\_num} = 2^M + \mathrm{INFO} - 1$

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MV

All other syntax elements are coded with the Exp-Golomb codes
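The codeword construction and the three decoding steps can be sketched as follows. This is a minimal illustration that works on strings of '0'/'1' characters rather than a real bitstream; the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits):
    """Decode one codeword from the start of a bit string.

    Returns (code_num, number_of_bits_consumed).
    """
    m = 0
    while bits[m] == "0":                    # step 1: count M leading zeros
        m += 1
    info = int(bits[m + 1:2 * m + 1], 2) if m else 0   # step 2: M-bit INFO
    return (1 << m) + info - 1, 2 * m + 1    # step 3: 2^M + INFO - 1


codewords = [exp_golomb_encode(n) for n in range(8)]
```

Each codeword produced this way is 2M + 1 bits long, so decoding can proceed codeword by codeword without any separator.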

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

Figure: MB structure in 4:2:2 and 4:4:4 formats (FRExt only). 4:2:2 MB: 16x16 Y with 8x16 Cb and 8x16 Cr. 4:4:4 MB: 16x16 Y, 16x16 Cb and 16x16 Cr.

-119-

RGB to YCbCr

$Y = K_R R + (1 - K_R - K_B) G + K_B B$

$K_R = 0.2126$, $K_B = 0.0722$, $K_R + K_B + K_G = 1$

$Y = 0.2126 R + 0.7152 G + 0.0722 B$

$C_b = \frac{B - Y}{2(1 - K_B)} = 0.5389 (B - Y)$

$C_r = \frac{R - Y}{2(1 - K_R)} = 0.6350 (R - Y)$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
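The conversion can be checked numerically with a short sketch. It assumes R, G, B normalized to the range [0, 1] and uses the K_R and K_B values quoted on the slide as defaults; the function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using the K_R / K_B weights."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))   # scaled so that Cb lies in [-0.5, 0.5]
    cr = (r - y) / (2.0 * (1.0 - kr))   # scaled so that Cr lies in [-0.5, 0.5]
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)     # no chroma expected
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)      # Cb reaches its maximum
red = rgb_to_ycbcr(1.0, 0.0, 0.0)       # Cr reaches its maximum

# The same function with the BT.601 weights mentioned on the slide.
red_601 = rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)
```

Because the weights are non-integer, converting integer samples this way requires rounding, which is the error the next slide refers to.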

-120-

Rounding error in RGB to YCbCr conversion

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples

$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$

$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$

$C_o = \frac{R - B}{2}$

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

This eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
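The forward and inverse conversions can be written out as a small sketch to confirm that the round trip is exact for integer samples. The function names are hypothetical. Python's ">>" on negative integers is an arithmetic (flooring) shift, which matches the operation the slide describes.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using only additions and arithmetic shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undoes each forward step in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Exhaustive round-trip check on a coarse grid of 8-bit values.
samples = range(0, 256, 15)
lossless = all(
    ycgco_to_rgb(*rgb_to_ycgco(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
```

Cg and Co can be negative and span one more bit than the inputs, which is the extra bit of chroma precision mentioned for YCgCo.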

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and which can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total pixels per frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
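Rule 2 can be illustrated with a minimal sketch. The frame-size limit of 8192 used below is only an illustrative value in whatever unit the level table uses, and the function names are hypothetical.

```python
import math


def max_dimension(max_frame_size):
    """Largest allowed width or height: floor(sqrt(8 * max frame size))."""
    return math.isqrt(8 * max_frame_size)


def picture_fits(width, height, max_frame_size):
    """Check the total-size limit and the per-dimension limit together."""
    limit = max_dimension(max_frame_size)
    return (
        width * height <= max_frame_size
        and width <= limit
        and height <= limit
    )


limit = max_dimension(8192)
ordinary = picture_fits(120, 68, 8192)      # a conventional aspect ratio
too_elongated = picture_fits(512, 16, 8192) # same area budget, extreme shape
```

The second case stays within the total frame size but is rejected, showing that the square-root rule exists to exclude extremely elongated pictures.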

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a) - (e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only the control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction)
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license

1. MPEG LA: www.mpegla.com

2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE = High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3

-99-

I-Slice (MBs in I slices and intra MBs in P and B slices)

Spatial intra prediction: 9 directional modes for (4x4) or (8x8) blocks

Apply (4x4) or (8x8) IntDCT to the intra prediction errors

Note: the (8x8) IntDCT is FRExt-only

After (8x8) IntDCT HVS weighting is applied to coefficients (FRExt-only)

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF: field processing is similar to frame mode

MBAFF: if an MB pair is in field mode (frame mode), field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

• (16x16) luma & corresponding chroma block size for full MB prediction

• (8x8) luma prediction (FRExt-only)

• (4x4) luma prediction

-102-

For (16x16) luma, full MB prediction has four modes

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar prediction: assume the MB covers diagonally increasing luma values; the predictor is formed based upon the planar equation
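The first three modes can be sketched in a few lines. This is a simplified illustration with hypothetical names: it assumes both the row above and the column to the left are available, and it omits planar prediction and the neighbour-availability rules.

```python
def predict_full_mb(mode, above, left):
    """Return an n x n predictor (list of rows) for a full-MB luma mode.

    above: the n reconstructed pels in the row just above the MB
    left:  the n reconstructed pels in the column just left of the MB
    """
    n = len(above)
    if mode == "vertical":
        # Every row repeats the row of pels above the MB.
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every pel in a row copies the left neighbour of that row.
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of all 2n neighbouring pels.
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("unsupported mode: " + mode)


above = list(range(16))   # hypothetical neighbouring pel values
left = [10] * 16
vertical = predict_full_mb("vertical", above, left)
horizontal = predict_full_mb("horizontal", above, left)
dc = predict_full_mb("dc", above, left)
```

An encoder would typically compare the residual produced by each candidate mode and signal the cheapest one; that selection is outside the scope of the standard.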

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction

• 4:2:2 (8x16): Vertical, Horizontal, DC, Planar

• 4:4:4 (16x16)

-104-

For (8x8) luma intra prediction

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

Page 100: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-100-

Quantized transform coefficients are scanned (zigzag or field) and then entropy coded (CAVLC or CABAC)

PICAFF Field processing similar to frame mode

MBAFF If MB pair in field mode (frame mode) field (frame) neighbors are used for spatial prediction

-101-

I Slice (Spatial Prediction)

bull(16x16) Luma amp Corresponding chroma block size for full MB prediction

bull(8x8) luma prediction (FRExt-only)

bull (4x4) Luma prediction

-102-

For (16x16) luma full MB prediction has four modes

bull Vertical pels in MB predicted from pels just above of MB

bull Horizontal pels in MB predicted from pels just left of MB

bullDC pels in MB are predicted as average value of the neighboring pels

bullPlanar PredictionAssume MB covers diagonally increasing luma valuesPredictor is formed based upon the planar equation

-103-

Chroma spatial prediction (operates on entire MB)

bull420 (8x8) Similar to (16x16) Luma MB prediction bull422 (8x16) Vertical Horizontal DC Planar

bull444 (16x16)

-104-

For (8x8) luma intra predictionNine Intra_8x8 prediction modes similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

diams High profile (HP) supporting 8-bit video with 420 sampling addressing high-end consumer use and otherapplications using high-resolution video without a need for extended chroma formats or extended sample accuracy

diams High 10 profile (Hi10P) supporting 420 video with up to 10 bits of representation accuracy per sample

diams High 422 profile (H422P) supporting up to 422 chroma sampling and up to 10 bits per sample and

High Profiles

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

RGB Y Cb Cr

Y = KR R + (1 ndash KR ndash KB) G + KB B

KR = 02126 KB = 00722 KR + KB + KG = 1

Y = 02126 R + 07152 G + 00722 B

Cb = 05389 (B ndash Y) Cr = 07874 (R ndash Y)

(ITU-R RecBT601 defines KB=0114 KR=0299)

( )

2(1 )b

B

B YC

K

( )

2(1 )r

R

R YC

K

-120-

Rounding error in RGB Y Cb Cr

FRExt Only YCgCo

Cg = Green Chroma Co = Orange ChromaTo further avoid any rounding error add only one bit of precision to chroma samples

1 ( )[ ]

2 21 ( )

[ ]2 2( )

2

g

o

R BY G

R BC G

R BC

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

lsquoHigherrsquo profile supports all capabilities of the lower ones Also capable of decoding all bit streams encoded for the lower

nested profilesAll high profiles support all features of the main profile

New Profiles in the H264AVC FRExt Amendment

-126-

Levels in H264AVC

Level 1b added in FRExt For some 3G wireless environments

-127-

Levels in H264AVC

1 If a picture size is smaller than the typical picture size then frame rate can be higher up to a maximum of 172 framessec

2 Horizontal and vertical maximum sizes cannot be more than sqrt[(Total of pixelsframe)x8]

3 If at a given level picture size is less than that in the table of reference frames for ME and MC can be up to 16

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 Framessec film1920x1080 progressive

diams The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

diams The High profile of FRExt produced nominally transparent (ie difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T Wedi Y Kashiwagi ldquoSubjective quality evaluation of H264AVC FRExt for HD movie contentrdquo JVT document JVT-L033 July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig 7 (a) ndash (e) Comparison of R-D curves for MPEG-2 (MP2) MPEG-4 ASP (MP4 ASP) and H264AVC (MP4 AVC) I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3) Courtesy Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO)Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt thus the remark in the figure about potential future performance

-135-

High Profile DetailsDeblocking Filter CABAC Signaling

1048707 Deblocking Filterbull Only control of filter is adjusted do not filter 4x4 blocksbull No change to filter operation itself

1048707 CABACbull 61 new contexts and corresponding initialization valuesbull No change to CABAC engine

1048707 Signalingbull 8x8 transform onoff flag at PPS levelbull 8x8 transform onoff flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA (www.mpegla.com)
2. Via Licensing (www.vialicensing.com)

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels, and # of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

• DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)
• ATSC has selected AC-3 plus from Dolby
• ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3


-101-

I Slice (Spatial Prediction)

• (16x16) luma and corresponding chroma block size for full-MB prediction

• (8x8) luma prediction (FRExt only)

• (4x4) luma prediction

-102-

For (16x16) luma, full-MB prediction has four modes:

• Vertical: pels in the MB are predicted from the pels just above the MB

• Horizontal: pels in the MB are predicted from the pels just to the left of the MB

• DC: pels in the MB are predicted as the average value of the neighboring pels

• Planar: assumes the MB covers diagonally increasing luma values; the predictor is formed from the planar equation
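
The first three 16x16 modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_16x16`, the string mode names, and the rounded-average form of the DC value are choices made for this example, and both neighbor rows are assumed to be available. Planar is left out because the slide does not give the planar equation.

```python
def predict_16x16(mode, above, left):
    """Return a 16x16 predicted luma block as a list of rows.

    above: the 16 reconstructed pels in the row just above the MB
    left:  the 16 reconstructed pels in the column just left of the MB
    """
    n = 16
    if mode == "vertical":
        # Every row repeats the row above the MB
        return [list(above) for _ in range(n)]
    if mode == "horizontal":
        # Every row is filled with the pel to its left
        return [[left[y]] * n for y in range(n)]
    if mode == "dc":
        # Rounded average of the 32 neighboring pels (rounding is an assumption)
        dc = (sum(above) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unsupported mode: {mode}")


above = list(range(16))   # hypothetical neighbor values
left = [100] * 16
print(predict_16x16("vertical", above, left)[5][:4])    # [0, 1, 2, 3]
print(predict_16x16("horizontal", above, left)[3][:4])  # [100, 100, 100, 100]
print(predict_16x16("dc", above, left)[0][0])           # 54
```

An encoder would typically form all candidate predictions and keep the one with the smallest residual cost; that selection step is not shown here.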

-103-

Chroma spatial prediction (operates on entire MB)

• 4:2:0 (8x8): similar to (16x16) luma MB prediction
• 4:2:2 (8x16)
• 4:4:4 (16x16)

Prediction modes: Vertical, Horizontal, DC, Planar

-104-

For (8x8) luma intra prediction: nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

FRExt Only: HVS Weighting Matrices

• Matrix can be transmitted in SPS and PPS
• Separate matrix for 4x4 and 8x8 transforms
• Separate matrix for Inter and Intra

The encoder can design and use customized scaling matrices. These are to be sent to the decoder at the sequence or picture level.

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply a multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

• 4x4 Intra Y
• 4x4 Intra Cb, Cr
• 4x4 Inter Y
• 4x4 Inter Cb, Cr
• 8x8 Intra Y
• 8x8 Inter Y
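
To make the "multiplier during inverse quantization" idea concrete, here is a simplified floating-point sketch. It does not reproduce the integer arithmetic of the standard. The function name `dequantize`, the sample weights, and the convention that a weight of 16 means "no change" are assumptions made for this illustration.

```python
FLAT = 16  # assumed neutral weight: a matrix of all 16s leaves scaling unchanged


def dequantize(levels, qstep, weights):
    """Scale each quantized level by the step size and its per-position weight."""
    n = len(levels)
    return [[levels[i][j] * qstep * weights[i][j] / FLAT for j in range(n)]
            for i in range(n)]


levels = [[3, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 0, 1]]
flat = [[FLAT] * 4 for _ in range(4)]
# Hypothetical perceptual matrix: coarser steps toward high frequencies
hvs = [[16, 16, 20, 24],
       [16, 20, 24, 28],
       [20, 24, 28, 32],
       [24, 28, 32, 32]]

print(dequantize(levels, 2.0, flat)[0][0])  # 6.0
print(dequantize(levels, 2.0, hvs)[3][3])   # 4.0 (weight 32 doubles the step)
```

Because the weight only changes the effective step size per coefficient position, the encoder can spend fewer bits on frequencies where errors are less visible.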

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning is based on decreasing variances, so as to maximize the number of zero-valued coefficients along the scan

Figures: frame (zig-zag) scan, field scan

FRExt Only

-109-

Examples of parameters to be encoded

Parameters | Description

Sequence-, picture-, and slice-layer syntax elements | Headers and parameters

Macroblock type (mb_type) | Prediction method for each coded macroblock

Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter | Transmitted as a delta value from the previous value of QP

Reference frame index | Identifies reference frame(s) for inter prediction

Motion vector | Transmitted as a difference (mvd) from the predicted motion vector

Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2[code_num + 1])

INFO = code_num + 1 - 2^M
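
The construction above translates directly into code. The following is a minimal sketch, assuming codewords are represented as strings of "0" and "1" for readability; the function name `exp_golomb_encode` is chosen for this example.

```python
def exp_golomb_encode(code_num):
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m else ""
    return "0" * m + "1" + info_bits


for n in range(8):
    print(n, exp_golomb_encode(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
# 5 00110
# 6 00111
# 7 0001000
```

The printed table shows the pattern stated on the slide: codeword 0 is a single bit, codewords 1 and 2 carry one INFO bit, and codewords 3-6 carry two.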

-112-

Decoding

1. Read in M leading zeros followed by 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1
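
The three decoding steps can be sketched as follows. The bit stream is assumed to be a well-formed string of "0" and "1" characters, and the function name `exp_golomb_decode` and its `pos` parameter are choices made for this illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; return (code_num, next_pos)."""
    m = 0
    while bits[pos] == "0":      # step 1: count M leading zeros
        m += 1
        pos += 1
    pos += 1                     # ...and consume the terminating "1"
    info = int(bits[pos:pos + m], 2) if m else 0   # step 2: M-bit INFO field
    pos += m
    return (1 << m) + info - 1, pos                # step 3: 2^M + INFO - 1


stream = "1" + "010" + "00100" + "00111"   # code_nums 0, 1, 3, 6 back to back
pos = 0
decoded = []
while pos < len(stream):
    value, pos = exp_golomb_decode(stream, pos)
    decoded.append(value)
print(decoded)   # [0, 1, 3, 6]
```

No separator is needed between codewords: the count of leading zeros tells the decoder exactly how many bits follow.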

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

♦ ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling, up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

MB structure in 4:2:2 and 4:4:4 formats

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16
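
The chroma block sizes in the macroblock structure follow from the horizontal and vertical subsampling factors of each format. A small sketch, in which the table name `CHROMA_SUBSAMPLING` and the function `chroma_block_size` are names invented for this example:

```python
# (horizontal, vertical) chroma subsampling factors per sampling format
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def chroma_block_size(fmt, mb_size=16):
    """Return (width, height) of each chroma block in a 16x16 macroblock."""
    sx, sy = CHROMA_SUBSAMPLING[fmt]
    return mb_size // sx, mb_size // sy


for fmt in CHROMA_SUBSAMPLING:
    print(fmt, chroma_block_size(fmt))
# 4:2:0 (8, 8)
# 4:2:2 (8, 16)
# 4:4:4 (16, 16)
```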

-119-

RGB → YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$, and $K_R + K_B + K_G = 1$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
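
The conversion can be checked numerically with a short sketch. It assumes R, G, and B are floating-point values normalized to [0, 1] and ignores offsets and integer quantization; the function name `rgb_to_ycbcr` and its keyword defaults are choices made for this example. With these constants the chroma scale factors 1/(2(1 - KB)) and 1/(2(1 - KR)) evaluate to about 0.5389 and 0.6350.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the given luma weights."""
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: Y close to 1, Cb and Cr close to 0
print(rgb_to_ycbcr(0.0, 0.0, 1.0))   # pure blue: Cb reaches its maximum of 0.5
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr reaches its maximum of 0.5

# The same function with the BT.601 weights quoted on the slide
print(rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114))
```

The division by 2(1 - K) is what normalizes each chroma component to the range [-0.5, 0.5].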

-120-

Rounding error in RGB → YCbCr

FRExt Only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion:

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion:

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
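
The forward and inverse conversions can be verified to round-trip exactly with integer arithmetic. A minimal sketch, assuming integer sample values; the function names are chosen for this example. Python's `>>` on integers is an arithmetic (sign-preserving, floor) shift, which matches the operation described on the slide.

```python
def rgb_to_ycgco(r, g, b):
    """Forward conversion using the lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


print(rgb_to_ycgco(255, 0, 128))                 # (95, -191, 127)
print(ycgco_to_rgb(*rgb_to_ycgco(255, 0, 128)))  # (255, 0, 128)
```

For 8-bit inputs, Cg and Co span roughly -255 to 255, so each chroma component needs one more bit than the source samples while Y stays within the original range.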

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures: extra monochrome pictures sent along with the main video stream, which can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI: allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement in which a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

-124-

Deblocking filter display preference SEI: allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI: indicators that allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

• A 'higher' profile supports all capabilities of the lower ones
• It is also capable of decoding all bit streams encoded for the lower nested profiles
• All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total # of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the # of reference frames for ME and MC can be up to 16
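
Note 2 can be checked with a few lines of arithmetic. The sketch below assumes the limit is evaluated directly in pixels, as the note states it. The function names and the 2,097,152-pixel frame-size cap are illustrative choices for this example and are not taken from the levels table.

```python
import math


def max_picture_dimension(max_pixels_per_frame):
    """Largest width or height allowed: floor(sqrt(pixels_per_frame * 8))."""
    return math.isqrt(max_pixels_per_frame * 8)


def dimensions_allowed(width, height, max_pixels_per_frame):
    """Check both the frame-size cap and the per-dimension cap."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (width * height <= max_pixels_per_frame
            and width <= limit and height <= limit)


cap = 2_097_152  # hypothetical frame-size cap used only for this example
print(max_picture_dimension(cap))            # 4096
print(dimensions_allowed(1920, 1088, cap))   # True
print(dimensions_allowed(8192, 256, cap))    # False: 8192 exceeds the 4096 limit
```

The second check shows why the rule exists: a very wide, very short picture can fit within the pixel budget yet still be rejected because one dimension is too large.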


-135-

High Profile DetailsDeblocking Filter CABAC Signaling

1048707 Deblocking Filterbull Only control of filter is adjusted do not filter 4x4 blocksbull No change to filter operation itself

1048707 CABACbull 61 new contexts and corresponding initialization valuesbull No change to CABAC engine

1048707 Signalingbull 8x8 transform onoff flag at PPS levelbull 8x8 transform onoff flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary High Profile contains

Main profile Adaptive MB level switching between 8x8 and 4x4 transform block

sizes Encoder specified perceptual based quantization scaling matrices Encoder specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction) HD Film 12 HD Video (progressive) 12 HD Video (interlace) 4 (only 2 test clips) SD Video (interlace) 6

Complexity impact Implementation beyond Main Profile affects Intra prediction

transform deblocking filter control CABAC decoding No increase in computational requirements Slight increase in memory requirements (CABAC transform)

-137-

Licensing of H264AVC Technology

Two patent pools to obtain the license 1 MPEGLA wwwmpeglacom2 Via licensing wwwvialicensingcom

These two patent pools do not guarantee that they cover the entire technology of H264 as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding amp systems

H264 is limited to video Audio coder Bit rates Quality levels and of

channels ndash left to industry and standards groups (ATSC SCTE ARIB DVB etc)

DVB is considering AAC with SBR (AAC plus) ATSC has selected AC-3 plus from Dolby MPEG calls it HE-AAC (HE ndash High efficiency) ATSC SCTE ARIB MPEG etc will continue to use

MPEG-1 Audio MPEG-2 AAC and AC-3

-104-

For 8x8 luma intra prediction:

Nine Intra_8x8 prediction modes, similar to the nine modes for Intra_4x4

FRExt Only

-105-

Integer 8x8 Transform (luma only)

FRExt Only

-106-

HVS Weighting Matrices (FRExt only)

Matrices can be transmitted in the SPS and PPS

Separate matrices for the 4x4 and 8x8 transforms

Separate matrices for Inter and Intra

The encoder can design and use customized scaling matrices

These are sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

The scaling matrix, which reflects visual perception, is simply an extra multiplier applied during inverse quantization (which is itself a multiplication)

Weighting matrices can be customized separately for:

4x4 Intra Y

4x4 Intra Cb, Cr

4x4 Inter Y

4x4 Inter Cb, Cr

8x8 Intra Y

8x8 Inter Y
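
As a rough illustration of how a weighting matrix enters inverse quantization, the sketch below scales each quantized level by a step size and by a per-position weight. It is a simplified floating-point model, not the normative integer arithmetic of the standard. The function name `dequantize_block`, the step size and the example weights are assumptions made for illustration. A weight of 16 is treated as neutral, in line with the flat matrix (all entries 16) that H.264 uses when no scaling matrix is signalled.

```python
def dequantize_block(levels, qstep, weights=None):
    """Reconstruct transform coefficients from quantized levels.

    levels  : 2-D list of quantized integer levels
    qstep   : quantizer step size (hypothetical value chosen by the caller)
    weights : 2-D list of per-position weights; 16 means "no weighting"
    """
    rows, cols = len(levels), len(levels[0])
    if weights is None:
        # Flat weighting: every position uses the neutral weight 16.
        weights = [[16] * cols for _ in range(rows)]
    return [[levels[r][c] * qstep * weights[r][c] / 16
             for c in range(cols)]
            for r in range(rows)]


levels = [[4, 2, 1, 0],
          [2, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]

# Illustrative weights only: larger reconstruction steps at higher frequencies.
weights = [[16, 20, 24, 28],
           [20, 24, 28, 32],
           [24, 28, 32, 36],
           [28, 32, 36, 40]]

flat = dequantize_block(levels, qstep=10)
shaped = dequantize_block(levels, qstep=10, weights=weights)
```

With the flat matrix every coefficient is reconstructed with the same step, while the shaped matrix makes the effective step grow with frequency, which is where the perceptual weighting comes from.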

-108-

Two scans, similar to those for the 4x4 transform, switched for frame/field coding

Coefficient scanning follows decreasing coefficient variance, to maximize the number of zero-valued coefficients along the scan

[Figure: frame (zig-zag) scan and field scan]

FRExt Only
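
The frame scan idea can be sketched with the classic zig-zag pattern, which visits anti-diagonals starting from the low-frequency corner so that the zero-valued high-frequency coefficients cluster at the end of the scan. The helper names `zigzag_order` and `scan_block` are hypothetical, and the sketch only generates the generic zig-zag pattern; the field scan is defined by a table and is not reproduced here.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):            # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)         # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def scan_block(block):
    """Read a square block of coefficients in zig-zag order."""
    n = len(block)
    return [block[r][c] for r, c in zigzag_order(n)]


block = [[9, 5, 1, 0],
         [4, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
scanned = scan_block(block)
```

For this example block the non-zero coefficients come out first and the remaining positions form a single run of zeros, which is what the entropy coder benefits from.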

-109-

Examples of parameters to be encoded

Parameter: Description

Sequence-, picture- and slice-layer syntax elements: Headers and parameters

Macroblock type (mb_type): Prediction method for each coded macroblock

Coded block pattern: Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter: Transmitted as a delta value from the previous value of QP

Reference frame index: Identifies reference frame(s) for inter prediction

Motion vector: Transmitted as a difference (mvd) from the predicted motion vector

Residual data: Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed and are also called Universal Variable Length Codes, UVLC)

-111-

These are variable-length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO

Codewords 1 and 2 have a single-bit INFO field, codewords 3-6 have a two-bit INFO field, and so on

The length of each Exp-Golomb codeword is (2M + 1) bits, where

M = floor(log2(code_num + 1))

INFO = code_num + 1 - 2^M
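
The construction above can be sketched in a few lines. The function name `exp_golomb_encode` and the use of a string of '0'/'1' characters for the codeword are choices made for illustration, not part of the standard.

```python
def exp_golomb_encode(code_num):
    """Return the Exp-Golomb codeword for code_num as a string of bits."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits          # [M zeros][1][INFO]


codewords = [exp_golomb_encode(k) for k in range(8)]
```

Each codeword has 2M + 1 bits, so small values of code_num, which should be the most frequent ones, receive the shortest codes.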

-112-

Decoding

1. Read in M leading zeros followed by 1

2. Read the M-bit INFO field

3. code_num = 2^M + INFO - 1
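
The three decoding steps can be sketched as follows. The function name `exp_golomb_decode`, the bit-string input and the returned read position are assumptions made for illustration.

```python
def exp_golomb_decode(bits, pos=0):
    """Decode one Exp-Golomb codeword from a string of '0'/'1' characters.

    Returns (code_num, next_pos), where next_pos is the index of the first
    bit after the decoded codeword.
    """
    m = 0
    while pos < len(bits) and bits[pos] == "0":   # step 1: count M leading zeros
        m += 1
        pos += 1
    if pos >= len(bits):
        raise ValueError("truncated codeword: missing '1' marker")
    pos += 1                                      # skip the '1' that ends the prefix
    if pos + m > len(bits):
        raise ValueError("truncated codeword: INFO field incomplete")
    info = int(bits[pos:pos + m], 2) if m else 0  # step 2: read the M-bit INFO field
    pos += m
    return (1 << m) + info - 1, pos               # step 3: code_num = 2^M + INFO - 1


def decode_all(bits):
    """Decode a concatenation of codewords into a list of code_num values."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = exp_golomb_decode(bits, pos)
        values.append(value)
    return values
```

Because the run of leading zeros announces the length of the INFO field, consecutive codewords can be decoded from one bit string without any separator.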

CAVLC: codes transform coefficients

CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt is mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile, and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: a 4:2:2 MB has a 16x16 Y block with 8x16 (width x height) Cb and Cr blocks; a 4:4:4 MB has 16x16 Y, Cb and Cr blocks]
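
The block sizes in the figure follow directly from the chroma subsampling factors, as the sketch below works out. The dictionary `CHROMA_SUBSAMPLING` and the function `macroblock_layout` are hypothetical names used only for this illustration.

```python
# (horizontal, vertical) chroma subsampling factors for each sampling format
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def macroblock_layout(chroma_format):
    """Return luma and chroma block sizes (width, height) and total samples per MB."""
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    chroma_w, chroma_h = 16 // sub_w, 16 // sub_h
    total = 16 * 16 + 2 * chroma_w * chroma_h   # one Y block plus Cb and Cr
    return {"luma": (16, 16), "chroma": (chroma_w, chroma_h), "samples": total}


layouts = {fmt: macroblock_layout(fmt) for fmt in CHROMA_SUBSAMPLING}
```

A 4:4:4 macroblock therefore carries twice as many samples as a 4:2:0 macroblock, which is one reason the extended profiles need higher bit-rate limits.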

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$ (ITU-R Rec. BT.709) and $K_R + K_B + K_G = 1$:

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
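
A minimal sketch of this conversion, assuming R, G and B are normalized to the range 0 to 1 and leaving out the offsets and quantization to integers that a real system would add. The function name `rgb_to_ycbcr` and its keyword defaults are illustrative.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to Y, Cb, Cr using the luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


white = rgb_to_ycbcr(1.0, 1.0, 1.0)
red = rgb_to_ycbcr(1.0, 0.0, 0.0)
blue = rgb_to_ycbcr(0.0, 0.0, 1.0)
red_bt601 = rgb_to_ycbcr(1.0, 0.0, 0.0, kr=0.299, kb=0.114)
```

The scale factors 1 / (2(1 - K_B)) and 1 / (2(1 - K_R)) are what keep Cb and Cr within -0.5 to +0.5: pure blue gives Cb = 0.5 and pure red gives Cr = 0.5 for any choice of weights.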

-120-

Rounding error in RGB to YCbCr conversion

YCgCo (FRExt only)

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples
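
To see where rounding would come from, the sketch below evaluates the YCgCo equations with exact rational arithmetic. The function names are hypothetical, and the inverse is not taken from the slides; it follows algebraically from the three forward equations.

```python
from fractions import Fraction


def rgb_to_ycgco(r, g, b):
    """Exact YCgCo values for integer RGB inputs."""
    r, g, b = Fraction(r), Fraction(g), Fraction(b)
    y = (g + (r + b) / 2) / 2
    cg = (g - (r + b) / 2) / 2
    co = (r - b) / 2
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse derived from the forward equations: Y - Cg = (R + B) / 2."""
    g = y + cg
    r = y - cg + co
    b = y - cg - co
    return r, g, b


y, cg, co = rgb_to_ycgco(1, 0, 0)
restored = ycgco_to_rgb(*rgb_to_ycgco(200, 100, 50))
```

Integer inputs generally produce halves and quarters, so storing Y, Cg and Co as integers of the same bit depth would force rounding. Extra precision, or the integer lifting form of the transform, avoids that loss.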

-121-

For 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same bit depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co
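
The forward and inverse conversions can be checked with a short sketch. The function names are hypothetical. Python's `>>` on integers is an arithmetic (floor) shift, including for negative values, which matches the operation the slide describes.

```python
def forward_color_transform(r, g, b):
    """Integer RGB to (Y, Cg, Co) using the lifting steps listed above."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def inverse_color_transform(y, cg, co):
    """Undo the lifting steps in reverse order to recover integer RGB."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Check exact reconstruction on a coarse grid of 8-bit values.
samples = range(0, 256, 17)
lossless = all(
    inverse_color_transform(*forward_color_transform(r, g, b)) == (r, g, b)
    for r in samples for g in samples for b in samples
)
```

Each inverse step subtracts exactly the quantity that the matching forward step added, so the discarded low bits of the shifts cancel and the round trip is exact. The cost is that Cg and Co need one more bit of range than the input samples.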

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha-blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones, and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt, for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, then the number of reference frames for ME and MC can be higher, up to 16
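
Notes 2 and 3 can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, the table entry (1920x1088 pictures with 4 reference frames) is an assumed example rather than a value taken from the slides, and the reference-frame estimate assumes that a level fixes a total reference memory budget that smaller pictures can divide into more frames.

```python
import math


def max_picture_dimension(pixels_per_frame):
    """Largest allowed width or height: floor(sqrt(8 * pixels per frame))."""
    return math.isqrt(8 * pixels_per_frame)


def max_reference_frames(table_pixels, table_ref_frames, actual_pixels, cap=16):
    """Reference frames available when the picture is smaller than the table entry.

    Assumes the memory budget equals table_pixels * table_ref_frames samples
    (an illustrative simplification) and applies the cap of 16 frames.
    """
    budget = table_pixels * table_ref_frames
    return min(cap, budget // actual_pixels)


# Assumed table entry for this example: 1920x1088 pictures, 4 reference frames.
table_pixels = 1920 * 1088
dimension_limit = max_picture_dimension(table_pixels)
refs_720p = max_reference_frames(table_pixels, 4, 1280 * 720)
refs_cif = max_reference_frames(table_pixels, 4, 352 * 288)
```

The square-root bound keeps extreme aspect ratios out of a level, and the cap of 16 applies however small the picture becomes.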

-128-

Compressed Bit Rate Multipliers for FRExt Profiles

To meet more demanding high-fidelity applications

Multipliers apply to the fourth column of the table on page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent video quality (i.e., difficult to distinguish from the original video without compression) at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15) and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(FastVDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter

• Only the control of the filter is adjusted: internal 4x4 block edges are not filtered in macroblocks that use the 8x8 transform

• No change to the filter operation itself

CABAC

• 61 new contexts and corresponding initialization values

• No change to the CABAC engine

Signaling

• 8x8 transform on/off flag at PPS level

• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:

• Main profile

• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes

• Encoder-specified perceptual-based quantization scaling matrices

• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):

• HD Film: 12%

• HD Video (progressive): 12%

• HD Video (interlace): 4% (only 2 test clips)

• SD Video (interlace): 6%

Complexity impact:

• Implementation beyond Main profile affects intra prediction, transform, deblocking filter control, and CABAC decoding

• No increase in computational requirements

• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain a license:

1. MPEG LA (www.mpegla.com)

2. Via Licensing (www.vialicensing.com)

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

Audio coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3

Page 106: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-106-

FRExt OnlyHVS Weighting Matrices

Matrix can be transmitted in SPS and PPS Separate Matrix for 4x4 and 8x8 transforms Separate Matrix for Inter and IntraEncoder can design and use customized scaling matricesThese are to be sent to the decoder at the sequence or picture level

Default matrices

-107-

HVS Weighting Matrices

Scaling matrix reflecting visual perception is simply a multiplier applied during the inverse quantization (This itself is a multiplication)

Weighting matrices can be customized separately for

4x4 Intra Y4x4 Intra Cb Cr

4x4 Inter Y4x4 Inter Cb Cr

8x8 Intra Y8x8 Inter Y

-108-

Two scans similar to 4x4 transform switched for framefield codingCoefficient scanning is based on the decreasing variances and to

maximize number of zero-valued coefficients along the scan

Frame Zig-Zag Field

FRExt Only

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

diams High profile (HP) supporting 8-bit video with 420 sampling addressing high-end consumer use and otherapplications using high-resolution video without a need for extended chroma formats or extended sample accuracy

diams High 10 profile (Hi10P) supporting 420 video with up to 10 bits of representation accuracy per sample

diams High 422 profile (H422P) supporting up to 422 chroma sampling and up to 10 bits per sample and

High Profiles

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

RGB Y Cb Cr

Y = KR R + (1 ndash KR ndash KB) G + KB B

KR = 02126 KB = 00722 KR + KB + KG = 1

Y = 02126 R + 07152 G + 00722 B

Cb = 05389 (B ndash Y) Cr = 07874 (R ndash Y)

(ITU-R RecBT601 defines KB=0114 KR=0299)

( )

2(1 )b

B

B YC

K

( )

2(1 )r

R

R YC

K

-120-

Rounding error in RGB Y Cb Cr

FRExt Only YCgCo

Cg = Green Chroma Co = Orange ChromaTo further avoid any rounding error add only one bit of precision to chroma samples

1 ( )[ ]

2 21 ( )

[ ]2 2( )

2

g

o

R BY G

R BC G

R BC

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI).

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process.

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures.

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye.

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi and Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus).
ATSC has selected AC-3 plus from Dolby.
MPEG calls AAC plus HE-AAC (HE: High Efficiency).
ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.


-107-

HVS Weighting Matrices

A scaling matrix reflecting visual perception is simply a multiplier applied during inverse quantization (which is itself a multiplication).

Weighting matrices can be customized separately for:

4x4 Intra Y, 4x4 Intra Cb, Cr
4x4 Inter Y, 4x4 Inter Cb, Cr
8x8 Intra Y, 8x8 Inter Y
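To make the "simply a multiplier" remark concrete, here is a deliberately simplified sketch of weighted inverse quantization. It is not the normative H.264 scaling process: the function name, the single quantizer step size `qstep`, and the convention that a weight of 16 means "no perceptual weighting" are all assumptions made for this illustration.

```python
def weighted_dequantize(levels, weights, qstep, flat_weight=16):
    """Scale each quantized level by the step size and its per-position weight.

    levels and weights are equally sized 2-D lists (e.g. 4x4 or 8x8).
    A weight equal to flat_weight leaves the plain level * qstep result unchanged.
    """
    return [
        [level * qstep * weight / flat_weight
         for level, weight in zip(level_row, weight_row)]
        for level_row, weight_row in zip(levels, weights)
    ]


levels = [[4, 1], [1, 0]]
flat = [[16, 16], [16, 16]]
perceptual = [[16, 24], [24, 32]]  # hypothetical: coarser for higher frequencies

print(weighted_dequantize(levels, flat, qstep=2.0))
print(weighted_dequantize(levels, perceptual, qstep=2.0))
```

Larger weights at high-frequency positions make the effective step size coarser there, which is how a matrix can reflect the lower visual sensitivity to fine detail.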

-108-

Two scans, similar to those for the 4x4 transform, are switched for frame/field coding.

Coefficient scanning is based on decreasing variances and aims to maximize the number of zero-valued coefficients along the scan.

[Figure: scan patterns; panel labels: Frame (Zig-Zag), Field]

FRExt only

-109-

Examples of parameters to be encoded

Parameters | Description
Sequence-, picture- and slice-layer syntax elements | Headers and parameters
Macroblock type (mb_type) | Prediction method for each coded macroblock
Coded block pattern | Indicates which blocks within a macroblock contain coded coefficients
Quantiser parameter | Transmitted as a delta value from the previous value of QP
Reference frame index | Identifies reference frame(s) for inter prediction
Motion vector | Transmitted as a difference (mvd) from the predicted motion vector
Residual data | Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction:

[M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

$M = \lfloor \log_2(\mathrm{code\_num} + 1) \rfloor$

$\mathrm{INFO} = \mathrm{code\_num} + 1 - 2^M$

-112-

Decoding

1. Read in M leading zeros followed by a 1
2. Read the M-bit INFO field
3. code_num = 2^M + INFO - 1

CAVLC: codes transform coefficients
CABAC: codes transform coefficients and MVs

All other syntax elements are coded with the Exp-Golomb codes.
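The construction and the three decoding steps for Exp-Golomb codes can be sketched as follows. This is an illustration only: codewords are handled as strings of '0' and '1' characters rather than as a packed bitstream, and the function names are invented for this example.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Build the codeword [M zeros][1][INFO] for a non-negative code_num."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1      # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)           # INFO = code_num + 1 - 2^M
    info_bits = format(info, f"0{m}b") if m > 0 else ""
    return "0" * m + "1" + info_bits


def exp_golomb_decode(bits: str) -> int:
    """Decode one codeword found at the start of a string of '0'/'1'."""
    m = 0
    while bits[m] == "0":                    # step 1: count leading zeros
        m += 1
    info_bits = bits[m + 1:m + 1 + m]        # step 2: read the M-bit INFO field
    info = int(info_bits, 2) if m > 0 else 0
    return (1 << m) + info - 1               # step 3: code_num = 2^M + INFO - 1


for n in range(7):
    print(n, exp_golomb_encode(n))
```

The printed table shows the pattern described on the slide: code_num 0 maps to the single bit `1`, code words 1 and 2 carry a one-bit INFO field, and code words 3 to 6 carry a two-bit INFO field.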

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players.

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory.

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes, best known for use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample

High Profiles



Page 109: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-109-

Examples of parameters to be encodedParameters Description

Sequence picture and Headers and parametersslice-layer syntax elements

Macroblock type mb_type Prediction method for each codedmacroblock

Coded block pattern Indicates which blocks within a macroblock contain coded coefficients

Quantiser parameter Transmitted as a delta value from the previous value of QP

Reference frame index Identify reference frame(s) for inter prediction

Motion vector Transmitted as a difference (mvd) from predicted motion vector

Residual data Coefficient data for each 4x4 or 2x2 block

-110-

Exponential Golomb Codes (for data elements other than transform coefficients ndash these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable length codes with a regular construction[ M Zeros] [ 1 ] [ INFO ]

INFO is an M-bit field carrying information The first codeword has no leading zero or trailing INFO

Code words 1 and 2 have a single-bit INFO field code words 3-6 have a two-bit INFO field and so on

The length of each Exp-Golomb codeword is (2M + 1) bitsM = Floor(log2 [ code_num + 1 ])

INFO = code_num + 1 ndash 2M

-112-

Decoding

1 Read in M leading zeros followed by 12 Read M-bit INFO field3 Code_num = 2M + INFO ndash 1

CAVLC Codes transform coefficientsCABAC Code transform coefficients and MV

All other syntax elements are coded with the Exp_Golomb codes

-113-

diams DVD Forum High Profile is mandatory for HD DVD players

diams The BD-ROM Video specification of the Blu-ray Disc Association FRExtentions are mandatory

diams The DVB (digital video broadcast) standards for European broadcast television For SD main is mandatory and high is optional For HD High is mandatory

ATSC has preliminarily selected high profile Several other environments may soon embrace it as well in the US and various designs for satellite and cable television

ADOPTIONS

-114-

For applications such as content-contribution content-distribution and studio editing and post-processing

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (ie 422 or 444 sampling as opposed to 420 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending of multiple video scenes best known for use in weather reporting where it is used to super- impose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity ndash even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

diams High profile (HP) supporting 8-bit video with 420 sampling addressing high-end consumer use and otherapplications using high-resolution video without a need for extended chroma formats or extended sample accuracy

diams High 10 profile (Hi10P) supporting 420 video with up to 10 bits of representation accuracy per sample

diams High 422 profile (H422P) supporting up to 422 chroma sampling and up to 10 bits per sample and

High Profiles

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

RGB Y Cb Cr

Y = KR R + (1 ndash KR ndash KB) G + KB B

KR = 02126 KB = 00722 KR + KB + KG = 1

Y = 02126 R + 07152 G + 00722 B

Cb = 05389 (B ndash Y) Cr = 07874 (R ndash Y)

(ITU-R RecBT601 defines KB=0114 KR=0299)

( )

2(1 )b

B

B YC

K

( )

2(1 )r

R

R YC

K

-120-

Rounding error in RGB Y Cb Cr

FRExt Only YCgCo

Cg = Green Chroma Co = Orange ChromaTo further avoid any rounding error add only one bit of precision to chroma samples

1 ( )[ ]

2 21 ( )

[ ]2 2( )

2

g

o

R BY G

R BC G

R BC

-121-

In 444 video FRExt has residual color transform

Keep RGB domain (same depth) for input output and stored reference pictures and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
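Note 2 above can be read as a simple bound on each picture dimension. The sketch below illustrates that reading only: the function names are hypothetical, and the pixel budget in the example is an illustrative value chosen for round numbers rather than one quoted from the level table in these slides.

```python
import math


def max_picture_dimension(max_pixels_per_frame: int) -> int:
    """Largest width or height (in pixels) allowed by note 2."""
    return math.isqrt(8 * max_pixels_per_frame)


def fits_level(width: int, height: int, max_pixels_per_frame: int) -> bool:
    """Check the frame-size budget and the per-dimension bound together."""
    limit = max_picture_dimension(max_pixels_per_frame)
    return (
        width * height <= max_pixels_per_frame
        and width <= limit
        and height <= limit
    )


if __name__ == "__main__":
    budget = 2_097_152  # illustrative pixel budget (assumption, not from the slides)
    print(max_picture_dimension(budget))    # 4096
    print(fits_level(1920, 1088, budget))   # True
    print(fits_level(8192, 256, budget))    # False: too wide, although the area fits
```

The second check shows why the bound exists: a very wide, very short picture can satisfy the area limit while still exceeding the per-dimension limit.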

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal use of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs. Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools offer licenses:
1. MPEG LA (www.mpegla.com)
2. Via Licensing (www.vialicensing.com)

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.


-110-

Exponential Golomb Codes (for data elements other than transform coefficients; these codes are actually fixed and are also called Universal Variable Length Codes (UVLC))

-111-

These are variable-length codes with a regular construction: [M zeros] [1] [INFO]

INFO is an M-bit field carrying information. The first codeword has no leading zero or trailing INFO.

Code words 1 and 2 have a single-bit INFO field, code words 3-6 have a two-bit INFO field, and so on.

The length of each Exp-Golomb codeword is (2M + 1) bits, where

\( M = \lfloor \log_2(\text{code\_num} + 1) \rfloor \)

\( \text{INFO} = \text{code\_num} + 1 - 2^{M} \)
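The construction above can be sketched as a small encoder. This is an illustrative sketch only: the function name is hypothetical, and the codeword is returned as a string of '0' and '1' characters for readability rather than as packed bits.

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the Exp-Golomb codeword [M zeros][1][INFO] for code_num >= 0."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    m = (code_num + 1).bit_length() - 1       # M = floor(log2(code_num + 1))
    info = code_num + 1 - (1 << m)            # INFO = code_num + 1 - 2^M
    info_bits = format(info, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + info_bits


if __name__ == "__main__":
    for n in range(7):
        print(n, exp_golomb_encode(n))
    # 0 1
    # 1 010
    # 2 011
    # 3 00100
    # 4 00101
    # 5 00110
    # 6 00111
```

The printed table matches the description: code word 0 has no prefix or INFO, code words 1 and 2 carry one INFO bit, and code words 3-6 carry two.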

-112-

Decoding

1. Read in M leading zeros followed by 1.
2. Read the M-bit INFO field.
3. code_num = 2^M + INFO - 1
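The three decoding steps can be sketched as follows. The function name and the string-of-bits input are assumptions made for illustration; a real decoder would read from a packed bitstream.

```python
def exp_golomb_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one Exp-Golomb codeword starting at bits[pos].

    Returns (code_num, position of the first bit after the codeword).
    """
    m = 0
    # Step 1: count M leading zeros, then consume the terminating '1'.
    while pos < len(bits) and bits[pos] == "0":
        m += 1
        pos += 1
    if pos >= len(bits) or pos + 1 + m > len(bits):
        raise ValueError("truncated Exp-Golomb codeword")
    pos += 1
    # Step 2: read the M-bit INFO field (empty when M == 0).
    info = int(bits[pos:pos + m], 2) if m > 0 else 0
    pos += m
    # Step 3: code_num = 2^M + INFO - 1.
    return (1 << m) + info - 1, pos


if __name__ == "__main__":
    stream = "1" + "010" + "00111"   # codewords for 0, 1, and 6 back to back
    pos = 0
    while pos < len(stream):
        value, pos = exp_golomb_decode(stream, pos)
        print(value)                 # prints 0, then 1, then 6
```

Because the prefix of zeros announces the length of INFO, codewords can be concatenated without separators and still be parsed unambiguously.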

CAVLC: codes transform coefficients.

CABAC: codes transform coefficients and MVs.

All other syntax elements are coded with the Exp-Golomb codes.

-113-

ADOPTIONS

♦ DVD Forum: High Profile is mandatory for HD DVD players.

♦ The BD-ROM Video specification of the Blu-ray Disc Association: the FRExt extensions are mandatory.

♦ The DVB (Digital Video Broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory.

ATSC has preliminarily selected the High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television.

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it is used to superimpose video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices.

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

[Figure: in a 4:2:2 MB, Y is 16x16 and Cb and Cr are each 8 wide by 16 high; in a 4:4:4 MB, Y, Cb, and Cr are each 16x16.]

-119-

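RGB to Y Cb Cr

\[ Y = K_R R + (1 - K_R - K_B) G + K_B B \]

\[ C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)} \]

With \( K_R = 0.2126 \) and \( K_B = 0.0722 \) (where \( K_R + K_B + K_G = 1 \)):

\[ Y = 0.2126 R + 0.7152 G + 0.0722 B \]

\[ C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y) \]

(ITU-R Rec. BT.601 defines \( K_B = 0.114 \), \( K_R = 0.299 \).)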
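As a quick numerical check of the general formulas, Cb = (B - Y) / (2(1 - KB)) and Cr = (R - Y) / (2(1 - KR)), the conversion can be sketched with the constants KR = 0.2126 and KB = 0.0722 from the slide. The function name is hypothetical and the samples are assumed to be normalized to the range 0 to 1; offsets and integer scaling used in practical systems are left out.

```python
def rgb_to_ycbcr(r: float, g: float, b: float,
                 kr: float = 0.2126, kb: float = 0.0722) -> tuple[float, float, float]:
    """Convert normalized RGB to (Y, Cb, Cr) using luma weights kr and kb."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    # Scale factors implied by the constants on the slide.
    print(round(1.0 / (2.0 * (1.0 - 0.0722)), 4))  # 0.5389
    print(round(1.0 / (2.0 * (1.0 - 0.2126)), 4))  # 0.635
    # Pure blue and pure red reach the chroma extremes of +0.5.
    print(rgb_to_ycbcr(0.0, 0.0, 1.0)[1])
    print(rgb_to_ycbcr(1.0, 0.0, 0.0)[2])
```

The divisors 2(1 - KB) and 2(1 - KR) are what normalize each chroma component to the range -0.5 to +0.5; passing kr=0.299 and kb=0.114 gives the BT.601 variant mentioned on the slide.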

-120-

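Rounding error in RGB to Y Cb Cr conversion

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

\[ Y = \frac{1}{2}\left[ G + \frac{R + B}{2} \right] \]

\[ C_g = \frac{1}{2}\left[ G - \frac{R + B}{2} \right] \]

\[ C_o = \frac{R - B}{2} \]

To further avoid any rounding error, add only one bit of precision to the chroma samples.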
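Assuming the usual YCgCo definition, Y = G/2 + (R + B)/4, Cg = G/2 - (R + B)/4, and Co = (R - B)/2, the transform has a very simple inverse built from additions and subtractions only. The sketch below uses exact rational arithmetic so that no rounding enters; the function names are hypothetical.

```python
from fractions import Fraction


def rgb_to_ycgco_exact(r, g, b):
    """Forward YCgCo transform in exact rational arithmetic."""
    r, g, b = Fraction(r), Fraction(g), Fraction(b)
    y = (g + (r + b) / 2) / 2
    cg = (g - (r + b) / 2) / 2
    co = (r - b) / 2
    return y, cg, co


def ycgco_to_rgb_exact(y, cg, co):
    """Inverse transform: only additions and subtractions are needed."""
    g = y + cg
    b = y - cg - co
    r = y - cg + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycgco_exact(4, 8, 0))    # (Fraction(5, 1), Fraction(3, 1), Fraction(2, 1))
    print(rgb_to_ycgco_exact(255, 0, 0))  # fractional results: extra precision is needed
    assert ycgco_to_rgb_exact(*rgb_to_ycgco_exact(255, 0, 0)) == (255, 0, 0)
```

The fractional outputs for inputs such as (255, 0, 0) illustrate why the transformed samples need additional precision if the round trip is to stay exact.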

Page 113: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-113-

ADOPTIONS

♦ DVD Forum: High profile is mandatory for HD DVD players

♦ The BD-ROM Video specification of the Blu-ray Disc Association: FRExt extensions are mandatory

♦ The DVB (digital video broadcast) standards for European broadcast television: for SD, Main is mandatory and High is optional; for HD, High is mandatory

ATSC has preliminarily selected High profile. Several other environments may soon embrace it as well, in the US and in various designs for satellite and cable television

-114-

For applications such as content contribution, content distribution, and studio editing and post-processing:

Use more than 8 bits per sample of source video accuracy

Use higher resolution for color representation than what is typical in consumer applications (i.e., 4:2:2 or 4:4:4 sampling as opposed to the 4:2:0 chroma sampling format)

Perform source editing functions such as alpha blending (a process for blending multiple video scenes, best known for its use in weather reporting, where it superimposes video of a newscaster over video of a map or weather-radar scene)

-115-

Use very high bit rates

Use very high resolution

Achieve very high fidelity, even representing some parts of the video losslessly

Avoid color-space transformation rounding error

Use RGB color representation

-116-

High Profiles

♦ High profile (HP), supporting 8-bit video with 4:2:0 sampling, addressing high-end consumer use and other applications using high-resolution video without a need for extended chroma formats or extended sample accuracy

♦ High 10 profile (Hi10P), supporting 4:2:0 video with up to 10 bits of representation accuracy per sample

♦ High 4:2:2 profile (H422P), supporting up to 4:2:2 chroma sampling and up to 10 bits per sample, and

-117-

♦ High 4:4:4 profile (H444P), supporting up to 4:4:4 chroma sampling and up to 12 bits per sample, and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices
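The capability limits listed for the four High profiles can be read as a small lookup table. The sketch below is only an illustration of that reading: the names `Profile`, `FREXT_PROFILES`, and `smallest_profile` are hypothetical, and the table holds nothing beyond the chroma format and bit-depth limits quoted on these slides.

```python
from typing import NamedTuple, Optional

# Chroma formats ordered from least to most chroma resolution.
CHROMA_ORDER = ["4:2:0", "4:2:2", "4:4:4"]


class Profile(NamedTuple):
    name: str
    max_chroma: str  # highest chroma format the profile supports
    max_bits: int    # highest bit depth per sample


# Limits as stated on the slides; listed from least to most capable.
FREXT_PROFILES = [
    Profile("High", "4:2:0", 8),
    Profile("High 10", "4:2:0", 10),
    Profile("High 4:2:2", "4:2:2", 10),
    Profile("High 4:4:4", "4:4:4", 12),
]


def smallest_profile(chroma: str, bits: int) -> Optional[str]:
    """Return the first (least capable) profile covering the source format."""
    need = CHROMA_ORDER.index(chroma)
    for p in FREXT_PROFILES:
        if CHROMA_ORDER.index(p.max_chroma) >= need and p.max_bits >= bits:
            return p.name
    return None  # no FRExt profile on the slides covers this format


if __name__ == "__main__":
    print(smallest_profile("4:2:2", 8))
    print(smallest_profile("4:4:4", 12))
```

Because the profiles are nested, scanning the list in order and stopping at the first match is enough; a source needing 4:2:2 at 12 bits falls through to High 4:4:4.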

-118-

MB structure in 4:2:2 and 4:4:4 formats (FRExt only)

4:2:2 MB: Y 16x16, Cb 8x16, Cr 8x16 (width x height)

4:4:4 MB: Y 16x16, Cb 16x16, Cr 16x16
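As a rough check on the macroblock structure above, the block sizes follow from the chroma subsampling factors alone. This is a minimal sketch; `CHROMA_SUBSAMPLING` and `macroblock_layout` are illustrative names, and 4:2:0 is included only for comparison.

```python
# (horizontal, vertical) chroma subsampling factors per format.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:4:4": (1, 1),  # chroma at full luma resolution
}


def macroblock_layout(chroma_format):
    """Return (width, height) of each plane in one macroblock and the
    total number of samples the macroblock carries."""
    sx, sy = CHROMA_SUBSAMPLING[chroma_format]
    luma = (16, 16)
    chroma = (16 // sx, 16 // sy)
    samples = luma[0] * luma[1] + 2 * chroma[0] * chroma[1]
    return {"Y": luma, "Cb": chroma, "Cr": chroma, "samples": samples}


if __name__ == "__main__":
    for fmt in CHROMA_SUBSAMPLING:
        print(fmt, macroblock_layout(fmt))
```

A 4:4:4 macroblock carries 768 samples, twice the 384 of a 4:2:0 macroblock, which is one reason the extended formats need higher bit rates.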

-119-

RGB to YCbCr

$$Y = K_R R + (1 - K_R - K_B)\, G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$, $K_B = 0.0722$, and $K_R + K_B + K_G = 1$:

$$Y = 0.2126\,R + 0.7152\,G + 0.0722\,B$$

$$C_b = 0.5389\,(B - Y), \qquad C_r = 0.6350\,(R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$)
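The conversion above can be sketched in a few lines. This assumes normalized floating-point R, G, B in [0, 1] and uses the slide's constants as defaults; the function name `rgb_to_ycbcr` is illustrative, and range offsets and quantization to integer code values are deliberately left out.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized RGB to (Y, Cb, Cr) using the general
    K_R / K_B form; Cb and Cr come out in [-0.5, 0.5]."""
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    # Scale factors implied by the chosen constants.
    print(round(1.0 / (2.0 * (1.0 - 0.0722)), 4))  # Cb scale
    print(round(1.0 / (2.0 * (1.0 - 0.2126)), 4))  # Cr scale
    print(rgb_to_ycbcr(0.0, 0.0, 1.0))             # pure blue
```

Passing `kr=0.299, kb=0.114` gives the BT.601 variant mentioned on the slide. Because the coefficients are not dyadic, any integer implementation has to round, which is the source of the color-space rounding error the following slides address.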

-120-

Rounding error in RGB to YCbCr

FRExt only: YCgCo

Cg = green chroma, Co = orange chroma

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right]$$

$$C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right]$$

$$C_o = \frac{R - B}{2}$$

To further avoid any rounding error, add only one bit of precision to the chroma samples
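To see why YCgCo is attractive, the sketch below evaluates the three equations with exact rational arithmetic and inverts them with additions and subtractions only. The function names are illustrative, and `fractions.Fraction` is used purely to make the exactness visible; a codec would work in fixed-point integers.

```python
from fractions import Fraction


def rgb_to_ycgco(r, g, b):
    """Forward YCgCo transform, computed exactly."""
    half_rb = Fraction(r + b, 2)
    y = (g + half_rb) / 2
    cg = (g - half_rb) / 2
    co = Fraction(r - b, 2)
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse transform: only additions and subtractions are needed."""
    g = y + cg
    t = y - cg          # equals (R + B) / 2
    return t + co, g, t - co


if __name__ == "__main__":
    y, cg, co = rgb_to_ycgco(255, 0, 0)
    print(y, cg, co)                  # quarter- and half-integer values
    print(ycgco_to_rgb(y, cg, co))
```

All coefficients are multiples of 1/4, so integer RGB inputs produce Y and Cg on a quarter-sample grid and Co on a half-sample grid. Rounding those back to the input bit depth is what loses information, which motivates carrying extra precision in the chroma samples.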

-121-

In 4:4:4 video, FRExt has a residual color transform

Keep the RGB domain (same depth) for input, output, and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only

Eliminates color-space conversion error without significantly increasing the overall complexity of the system

-122-

Forward color space conversion

Co = R - B

t = B + (Co >> 1)

Cg = G - t

Y = t + (Cg >> 1)

Inverse color space conversion

t = Y - (Cg >> 1)

G = t + Cg

B = t - (Co >> 1)

R = B + Co

where t is an intermediate temporary variable and “>>” denotes an arithmetic right shift operation
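The forward and inverse steps above form a lifting scheme, so the integer round trip is exact even though each shift discards a bit. A minimal sketch, with illustrative function names; Python's `>>` on integers is an arithmetic (floor) shift, matching the definition on the slide.

```python
def forward_lifting(r, g, b):
    """Forward conversion following the four lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def inverse_lifting(y, cg, co):
    """Inverse conversion: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    # Round-trip check over a coarse grid of 8-bit values.
    grid = range(0, 256, 17)
    ok = all(
        inverse_lifting(*forward_lifting(r, g, b)) == (r, g, b)
        for r in grid for g in grid for b in grid
    )
    print("lossless round trip:", ok)
    print(forward_lifting(255, 0, 0))
```

For 8-bit inputs Co and Cg span -255 to 255, so each needs one more bit than the input samples while Y stays within the original range. The same steps apply unchanged to signed residual values, which is how the residual color transform uses them.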

-123-

SEI: Supplemental Enhancement Information

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process
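As an illustration of how a monochrome auxiliary picture could drive alpha blend compositing, the sketch below treats it as a per-sample 8-bit opacity plane. The blending rule, the rounding, and the name `alpha_blend` are assumptions made for this example; they are not taken from the slides or from the standard's own compositing description.

```python
def alpha_blend(foreground, background, alpha, alpha_max=255):
    """Blend two equally sized sample planes (lists of rows).

    alpha == alpha_max shows the foreground sample, alpha == 0 shows
    the background sample; values in between mix the two with rounding.
    """
    out = []
    for fg_row, bg_row, a_row in zip(foreground, background, alpha):
        out.append([
            (a * fg + (alpha_max - a) * bg + alpha_max // 2) // alpha_max
            for fg, bg, a in zip(fg_row, bg_row, a_row)
        ])
    return out


if __name__ == "__main__":
    newscaster = [[200, 200, 200]]   # hypothetical foreground samples
    weather_map = [[100, 100, 100]]  # hypothetical background samples
    aux_alpha = [[255, 0, 128]]      # decoded auxiliary (alpha) picture
    print(alpha_blend(newscaster, weather_map, aux_alpha))
```

In the weather-report example, the newscaster video would be the foreground, the map the background, and the auxiliary picture the mask separating the two.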

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A ‘higher’ profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles

All High profiles support all features of the Main profile

-126-

Levels in H.264/AVC

Level 1b added in FRExt for some 3G wireless environments

-127-

Levels in H.264/AVC

1. If the picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8]

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16
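The three notes above can be turned into small calculations once a level's limits are known. The sketch below works in macroblock units (one macroblock is 16x16 luma samples) and assumes 8-bit 4:2:0 video, i.e. 384 bytes per macroblock. The Level 3 figures in the usage example (40500 macroblocks/sec, 1620 macroblocks/frame, 3110400 bytes of decoded picture buffer) are quoted from memory and should be checked against the level table in the standard; the function names are illustrative.

```python
import math


def max_frame_rate(max_mbps, frame_mbs, cap=172.0):
    """Note 1: a smaller picture allows a higher frame rate, up to the cap."""
    return min(max_mbps / frame_mbs, cap)


def max_dimension_mbs(max_fs):
    """Note 2: width or height in macroblocks cannot exceed sqrt(8 * MaxFS)."""
    return math.isqrt(8 * max_fs)


def max_ref_frames(max_dpb_bytes, frame_mbs, bytes_per_mb=384, cap=16):
    """Note 3: a smaller picture leaves room for more reference frames,
    up to 16 (384 bytes per macroblock assumes 8-bit 4:2:0)."""
    return min(max_dpb_bytes // (frame_mbs * bytes_per_mb), cap)


if __name__ == "__main__":
    # Assumed Level 3 limits (verify against the standard's level table).
    MAX_MBPS, MAX_FS, MAX_DPB_BYTES = 40500, 1620, 3110400
    for name, w, h in [("720x576", 720, 576), ("CIF", 352, 288), ("QCIF", 176, 144)]:
        mbs = (w // 16) * (h // 16)
        print(name, max_frame_rate(MAX_MBPS, mbs), max_ref_frames(MAX_DPB_BYTES, mbs))
    print("max dimension (pixels):", 16 * max_dimension_mbs(MAX_FS))
```

Expressed in pixels, the second limit is 16 x sqrt(8 x MaxFS), which equals sqrt(8 x pixels per frame) because a frame of MaxFS macroblocks holds 256 x MaxFS luma pixels.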

-128-

To meet more demanding high fidelity applications

Compressed Bit Rate Multipliers for FRExt Profiles

Multipliers for fourth column of table in page 125

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps)

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps

[9] T. Wedi, Y. Kashiwagi, “Subjective quality evaluation of H.264/AVC FRExt for HD movie content,” JVT document JVT-L033, July 2004

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted (do not filter 4x4 blocks)
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12%
• HD Video (progressive): 12%
• HD Video (interlace): 4% (only 2 test clips)
• SD Video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA, www.mpegla.com
2. Via Licensing, www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary

-138-

AUDIO coding & systems

H.264 is limited to video. Audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.)

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency)

ATSC has selected AC-3 plus from Dolby

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3

Page 117: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-117-

diams High 444 profile (H444P) supporting up to 444 chroma sampling up to 12 bits per sample and additionally supporting efficient lossless region coding and an integer residual color transform for coding RGB video while avoiding color-space transformation error

All of these profiles support all features of the Main profile and additionally support an adaptive transform block size and perceptual quantization scaling matrices

-118-

FRExt Only

422 MB

444 MB

MB structure in 422 and 444 formats

16 8 8

16

Y Cb Cr

16

16

16 16

-119-

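RGB to YCbCr conversion

$$Y = K_R R + (1 - K_R - K_B) G + K_B B$$

$$C_b = \frac{B - Y}{2(1 - K_B)}, \qquad C_r = \frac{R - Y}{2(1 - K_R)}$$

With $K_R = 0.2126$ and $K_B = 0.0722$ (where $K_R + K_B + K_G = 1$, so $K_G = 0.7152$):

$$Y = 0.2126 R + 0.7152 G + 0.0722 B$$

$$C_b = 0.5389 (B - Y), \qquad C_r = 0.6350 (R - Y)$$

(ITU-R Rec. BT.601 defines $K_B = 0.114$, $K_R = 0.299$.)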
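The conversion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rgb_to_ycbcr` is hypothetical, inputs are assumed to be normalized to [0, 1], and no offset or quantization to integer code values is applied.

```python
def rgb_to_ycbcr(r, g, b, kr=0.2126, kb=0.0722):
    """Convert normalized R, G, B to (Y, Cb, Cr).

    kr and kb default to the values quoted on the slide;
    pass kr=0.299, kb=0.114 for the BT.601 weights.
    """
    # Luma is a weighted sum; the green weight is whatever is left over.
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    # Chroma components are scaled colour differences.
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: chroma should vanish
    print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red
    print(rgb_to_ycbcr(0.0, 0.0, 1.0, kr=0.299, kb=0.114))  # blue, BT.601 weights
```

With this scaling, pure red gives Cr = 0.5 and pure blue gives Cb = 0.5 whatever the weights are, which is the reason for the factor 2(1 - K) in the denominators.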

-120-

Rounding error in RGB to YCbCr conversion

FRExt only: YCgCo

$$Y = \frac{1}{2}\left[G + \frac{R + B}{2}\right], \qquad C_g = \frac{1}{2}\left[G - \frac{R + B}{2}\right], \qquad C_o = \frac{R - B}{2}$$

Cg = green chroma, Co = orange chroma

To further avoid any rounding error, add only one bit of precision to the chroma samples.
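As a rough check of the YCgCo equations on this slide, the sketch below evaluates them with exact rational arithmetic. The function names are hypothetical, and the inverse is not taken from the slides: it is simply what the three forward equations imply algebraically (G = Y + Cg, R = Y - Cg + Co, B = Y - Cg - Co).

```python
from fractions import Fraction


def rgb_to_ycgco(r, g, b):
    """Forward YCgCo transform, computed exactly with rationals."""
    r, g, b = Fraction(r), Fraction(g), Fraction(b)
    s = (r + b) / 2          # average of red and blue
    y = (g + s) / 2
    cg = (g - s) / 2
    co = (r - b) / 2
    return y, cg, co


def ycgco_to_rgb(y, cg, co):
    """Inverse derived from the forward equations (an assumption of this sketch)."""
    g = y + cg
    s = y - cg               # recovers (R + B) / 2
    return s + co, g, s - co


if __name__ == "__main__":
    print(rgb_to_ycgco(1, 0, 0))
    print(ycgco_to_rgb(*rgb_to_ycgco(200, 17, 3)))
```

For integer inputs the outputs are in general fractions, so storing them in the same integer bit depth as the input would require rounding.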

-121-

In 4:4:4 video, FRExt has a residual color transform.

Keep the RGB domain (same depth) for input, output and stored reference pictures, and use the forward and inverse color transformations inside the encoder and decoder for processing of the residual data only.

This eliminates color-space conversion error without significantly increasing the overall complexity of the system.

-122-

Forward color space conversion

Co = R - B
t = B + (Co >> 1)
Cg = G - t
Y = t + (Cg >> 1)

Inverse color space conversion

t = Y - (Cg >> 1)
G = t + Cg
B = t - (Co >> 1)
R = B + Co

where t is an intermediate temporary variable and ">>" denotes an arithmetic right shift operation.
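The forward and inverse steps on this slide can be tried out directly. The following is a minimal sketch with hypothetical function names; it relies on the fact that Python's `>>` on integers is an arithmetic (floor) shift, matching the slide's definition.

```python
def rgb_to_ycgco_lifting(r, g, b):
    """Forward conversion using the integer lifting steps from the slide."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co


def ycgco_lifting_to_rgb(y, cg, co):
    """Inverse conversion: undo the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = t + cg
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycgco_lifting(255, 0, 0))
    # Round-trip check on a coarse grid of 8-bit values.
    samples = range(0, 256, 15)
    ok = all(
        ycgco_lifting_to_rgb(*rgb_to_ycgco_lifting(r, g, b)) == (r, g, b)
        for r in samples for g in samples for b in samples
    )
    print("lossless round trip:", ok)
```

Each inverse step subtracts exactly what the matching forward step added, so the round trip is exact regardless of how the shift rounds. For 8-bit inputs, Co and Cg fall in [-255, 255], which is one bit more than the input depth.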

-123-

Auxiliary pictures, which are extra monochrome pictures sent along with the main video stream, and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI).

Film grain characteristics SEI, which allows a model of film grain statistics to be sent along with the video data, enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding, rather than burdening the encoder with the representation of exact film grain during the encoding process.

SEI: Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI, which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures.

Stereo video SEI indicators, which allow the encoder to identify the use of the video on stereoscopic displays, with proper identification of which pictures are intended for viewing by each eye.

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones. It is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
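Note 2 above can be illustrated with a small calculation. This is a sketch only: the function names are hypothetical, the bound is rounded down to an integer as an assumption, and the frame budget of 2,097,152 pixels used in the example is an illustrative value chosen for this sketch rather than one read from the level table.

```python
import math


def max_dimension(total_pixels_per_frame):
    """Largest allowed width or height: sqrt(total pixels per frame x 8), rounded down."""
    return math.isqrt(total_pixels_per_frame * 8)


def fits_level(width, height, total_pixels_per_frame):
    """Check a picture size against the pixel budget and the per-dimension bound."""
    limit = max_dimension(total_pixels_per_frame)
    return (
        width * height <= total_pixels_per_frame
        and width <= limit
        and height <= limit
    )


if __name__ == "__main__":
    budget = 2_097_152  # illustrative pixel budget (assumption)
    print(max_dimension(budget))
    print(fits_level(1920, 1080, budget))
    print(fits_level(8192, 256, budget))
```

The last call shows why the bound exists: an 8192x256 picture fits the total pixel budget but is rejected because its width exceeds the per-dimension limit.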

-128-

Compressed Bit Rate Multipliers for FRExt Profiles

To meet more demanding high-fidelity applications.

Multipliers apply to the fourth column of the table on page 125.

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15% coding gain over MPEG-2.

MPEG-4 AVC yields 20% coding gain over MPEG-2.

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; hence the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of filter is adjusted: do not filter 4x4 blocks
• No change to filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD film: 12%
• HD video (progressive): 12%
• HD video (interlace): 4% (only 2 test clips)
• SD video (interlace): 6%

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools from which to obtain the license:
1. MPEG LA (www.mpegla.com)
2. Via Licensing (www.vialicensing.com)

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

Audio coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE: High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC and AC-3.

Page 122: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-122-

Co = (R - B)

t = B + (Co gtgt 1)

Cg = G ndash t

Y = t + (Cg gtgt 1)Where t is an intermediate temporary variable and ldquogtgtrdquo denotes an arithmetic right shift operation

Inverse color space conversion t = Y ndash (Cg gtgt 1)

G + t + Cg

B = t ndash (Co gtgt 1)

R = B + Co

Forward color space conversion

-123-

Auxiliary pictures which are extra monochrome pictures sent along with the main video stream and can be used for such purposes as alpha blend compositing (specified as a different category of data than SEI)

Film grain characteristics SEI which allow a model of film grain statistics to be sent along with the video data enabling an analysis-synthesis style of video enhancement wherein a synthesized film grain is generated as a post-process when decoding rather than burdening the encoder with the representation of exact film grain during the encoding process

SEI Supplemental Enhancement Information

-124-

Deblocking filter display preference SEI which allows the encoder to indicate cases in which the pictures prior to the application of the deblocking filter process may be perceptually superior to the filtered pictures

Stereo video SEI indicators which allow the encoder to identify the use of the video on stereoscopic displays with proper identification of which pictures are intended for viewing by each eye

-125-

New Profiles in the H.264/AVC FRExt Amendment

A 'higher' profile supports all capabilities of the lower ones and is also capable of decoding all bit streams encoded for the lower nested profiles.

All High profiles support all features of the Main profile.

-126-

Levels in H.264/AVC

Level 1b was added in FRExt for some 3G wireless environments.

-127-

Levels in H.264/AVC

1. If a picture size is smaller than the typical picture size, then the frame rate can be higher, up to a maximum of 172 frames/sec.

2. Horizontal and vertical maximum sizes cannot be more than sqrt[(total number of pixels/frame) x 8].

3. If, at a given level, the picture size is less than that in the table, the number of reference frames for ME and MC can be up to 16.
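
Note 2 can be illustrated with a small Python sketch. The helper names and the example pixel budget of 2,097,152 pixels per frame (8192 macroblocks of 16x16) are assumptions chosen for illustration; the actual per-level limits come from the levels table, which is not reproduced in this transcript.

```python
import math


def max_dimension(pixels_per_frame: int) -> int:
    """Largest width or height permitted by note 2: sqrt(pixels_per_frame * 8)."""
    return math.isqrt(pixels_per_frame * 8)


def fits(width: int, height: int, pixels_per_frame: int) -> bool:
    """Check a picture size against both the area budget and the note 2 bound."""
    limit = max_dimension(pixels_per_frame)
    within_area = width * height <= pixels_per_frame
    return within_area and width <= limit and height <= limit


if __name__ == "__main__":
    budget = 2_097_152  # hypothetical pixels-per-frame budget
    print(max_dimension(budget))
    print(fits(1920, 1088, budget))
    print(fits(8192, 256, budget))  # same area as the budget, but too wide
```

The bound keeps extreme aspect ratios out: a picture may use the full pixel budget, yet neither side may exceed the square root of eight times that budget.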

-128-

Compressed Bit Rate Multipliers for FRExt Profiles

To meet more demanding high-fidelity applications.

Multipliers for the fourth column of the table on page 125.

-129-

24 frames/sec film, 1920x1080 progressive

♦ The High profile of FRExt produced nominally better video quality than MPEG-2 when using only one third as many bits (8 Mbps versus 24 Mbps).

♦ The High profile of FRExt produced nominally transparent (i.e., difficult to distinguish from the original video without compression) video quality at only 16 Mbps.

[9] T. Wedi, Y. Kashiwagi, "Subjective quality evaluation of H.264/AVC FRExt for HD movie content," JVT document JVT-L033, July 2004.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)-(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP), and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS.

MP4 ASP yields 15 coding gain over MPEG-2

MPEG-4 AVC yields 20 coding gain over MPEG-2
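
The frame structure named in the Fig. 7 caption (N=15, M=3) can be spelled out with a minimal Python sketch. The function name gop_pattern is an assumption for illustration, and the pattern is listed in display order; the order in which an encoder actually transmits the frames differs and is not modeled here.

```python
def gop_pattern(n: int = 15, m: int = 3) -> str:
    """Return the frame types of one group of pictures in display order.

    n: distance between I frames.
    m: distance between reference (I or P) frames, so m - 1 B frames
       sit between consecutive reference frames.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")      # one I frame starts each group
        elif i % m == 0:
            types.append("P")      # reference frame every m frames
        else:
            types.append("B")      # non-reference frames in between
    return "".join(types)


if __name__ == "__main__":
    print(gop_pattern(15, 3))
```

With N=15 and M=3, each group holds one I frame, four P frames, and ten B frames.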

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps

-134-

(Fast VDO) Sub-optimal uses of B frames and other aspects make the plotted performance conservative for FRExt; thus the remark in the figure about potential future performance.

-135-

High Profile Details: Deblocking Filter, CABAC, Signaling

Deblocking Filter
• Only control of the filter is adjusted: do not filter 4x4 blocks
• No change to the filter operation itself

CABAC
• 61 new contexts and corresponding initialization values
• No change to the CABAC engine

Signaling
• 8x8 transform on/off flag at PPS level
• 8x8 transform on/off flag per macroblock allows adaptive use
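
The two-level signaling on this slide can be sketched in Python as follows. The parameter names pps_transform_8x8_enabled and mb_uses_8x8 are hypothetical stand-ins for the two flags; the sketch assumes the PPS-level flag acts as a master switch and does not model bitstream parsing.

```python
def transform_size(pps_transform_8x8_enabled: bool, mb_uses_8x8: bool) -> int:
    """Pick the transform block size for one macroblock.

    The PPS-level flag enables or disables the 8x8 transform for every
    picture that refers to that PPS; the per-macroblock flag then lets
    the encoder switch adaptively between 8x8 and 4x4.
    """
    if pps_transform_8x8_enabled and mb_uses_8x8:
        return 8
    return 4


if __name__ == "__main__":
    # Hypothetical per-macroblock decisions made by an encoder.
    mb_flags = [True, False, True]
    print([transform_size(True, flag) for flag in mb_flags])
    print([transform_size(False, flag) for flag in mb_flags])
```

When the PPS-level flag is off, every macroblock falls back to the 4x4 transform regardless of its own flag, which is what this sketch assumes a Main profile stream would look like.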

-136-

High vs Main Profile Summary

High Profile contains:
• Main profile
• Adaptive MB-level switching between 8x8 and 4x4 transform block sizes
• Encoder-specified perceptual-based quantization scaling matrices
• Encoder-specified separate control of each chroma component QP

Coding efficiency impact (measured as average bit-rate reduction):
• HD Film: 12
• HD Video (progressive): 12
• HD Video (interlace): 4 (only 2 test clips)
• SD Video (interlace): 6

Complexity impact:
• Implementation beyond Main Profile affects intra prediction, transform, deblocking filter control, and CABAC decoding
• No increase in computational requirements
• Slight increase in memory requirements (CABAC, transform)

-137-

Licensing of H.264/AVC Technology

Two patent pools to obtain the license:
1. MPEG LA: www.mpegla.com
2. Via Licensing: www.vialicensing.com

These two patent pools do not guarantee that they cover the entire technology of H.264, as participation of a patent owner in a patent pool is voluntary.

-138-

AUDIO coding & systems

H.264 is limited to video. The audio coder, bit rates, quality levels, and number of channels are left to industry and standards groups (ATSC, SCTE, ARIB, DVB, etc.).

DVB is considering AAC with SBR (AAC plus); MPEG calls it HE-AAC (HE = High Efficiency). ATSC has selected AC-3 plus from Dolby.

ATSC, SCTE, ARIB, MPEG, etc. will continue to use MPEG-1 Audio, MPEG-2 AAC, and AC-3.

Page 130: -1- 2004. 10. 20. Overview of H.264 / MPEG-4 Part10 Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington.

-130-

Courtesy Advanced Technology Group of Motorola BCS

-131-

Courtesy Advanced Technology Group of Motorola BCS

-132-

Fig. 7 (a)–(e): Comparison of R-D curves for MPEG-2 (MP2), MPEG-4 ASP (MP4 ASP) and H.264/AVC (MP4 AVC). I frames were inserted every 15 frames (N=15), and two non-reference B frames per reference I or P frame were used (M=3). Courtesy: Advanced Technology Group of Motorola BCS

MPEG-4 ASP yields 15% coding gain over MPEG-2

MPEG-4 AVC yields 20% coding gain over MPEG-2
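The GOP structure named in the Fig. 7 caption (N=15, M=3) can be sketched as a display-order pattern of frame types. This is an illustrative helper under the usual reading of N as the I-frame period and M as the spacing between reference frames; the function name is hypothetical.

```python
def gop_display_order(n: int = 15, m: int = 3) -> str:
    """Frame types of one GOP in display order.

    n: distance between I frames (GOP length).
    m: distance between reference (I or P) frames, so m - 1 B frames
       sit between consecutive reference frames.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")   # GOP starts with an I frame
        elif i % m == 0:
            types.append("P")   # every m-th frame is a P reference frame
        else:
            types.append("B")   # remaining frames are non-reference B frames
    return "".join(types)


if __name__ == "__main__":
    pattern = gop_display_order(15, 3)
    print(pattern)
    print("B frames per GOP:", pattern.count("B"))
```

In this pattern the two B frames at the end of each GOP are bracketed by the last P frame and the I frame of the following GOP.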

-133-

High profile at 8 Mbps nominally beat MPEG-2 at 24 Mbps

Nominally transparent video quality on 1080p24 at 16 Mbps
