Article About H264


www.edn.com December 11, 2003 | edn 73

H.264 DELIVERS TWICE-AS-GOOD COMPRESSION AND ENHANCED QUALITY.

As video goes digital, content producers, distributors, and users are all demanding ever-higher quality and ever-larger display screens; in other words, more megapixels per second. Consider a typical TV-broadcast video stream characterized by 24-bit color and 720×480-pixel resolution refreshing at 30 frames/sec. Uncompressed, it would require a bandwidth of greater than 248 Mbps. High-definition TV requires five times the bandwidth of standard-definition TV. Because the carrying capacity of most communication channels cannot keep up with pixel demand, video compression has been the only option, particularly in the high-definition era.

For more than a decade, standards from ISO MPEG (the International Organization for Standardization's Moving Picture Experts Group) and the ITU-T (International Telecommunication Union Telecommunication Standardization Sector) have successfully addressed the demand for high compression ratios.

MPEG-2 has been the most successful to date, achieving mass-market acceptance in applications such as DVD players, cable- and satellite-digital TV, and set-top boxes. However, operators are continuously reducing the operating point, affecting image quality, and consumers are becoming increasingly aware of compression artifacts, such as blocking, ringing, and drifting. More recently, MPEG and ITU have developed MPEG-4 (Advanced Simple Profile) and H.263, respectively, but in the face of new market demands, they are ready for a successor.

HDTV (high-definition TV) and video over IP (Internet Protocol) using an ADSL (asymmetrical-digital-subscriber-line) connection represent a set of bandwidth-hungry terrestrial-broadcast and wired applications. In the broadcast world, the cost of satellite transmission is increasing. It is becoming increasingly evident that two-times-better compression than MPEG-2 is the most cost-effective way to provide a sufficient number of local channels and to transmit HDTV. The same arguments are true for cable and even more imperative in Internet-content distribution.

By all accounts, the HD-DVD will take over where DVD leaves off and start a multibillion-dollar market for players as long as video-compression technology keeps pace with bandwidth demands. Meanwhile, IEEE 802.11e WLAN (wireless-LAN) "hot spots" and in-home wireless networks, in which multiple users share bandwidth, present an even more daunting engineering challenge. Engineers will meet that challenge when they adopt an ITU-T- and ISO MPEG-approved standard. H.264/MPEG-4 AVC (Advanced Video Coding) will deliver a twofold improvement in compression ratio and improved quality. As such, it represents the most significant improvement in coding efficiency and quality since MPEG-2/H.262 (Reference 1).

COMPRESSION BASICS

Compression essentially identifies and eliminates redundancies in a signal and provides instructions for reconstituting the bit stream into a picture when the bits are uncompressed. The basic types of redundancy are spatial, temporal, psycho-visual, and statistical. "Spatial redundancy" refers to the correlation between neighboring pixels in, for example, a flat background. "Temporal redundancy" refers to the correlation of a pixel's position between video frames. Typically, the background of a scene remains static in the absence of camera movement, so that you need not code and decode those pixels for every frame. Psycho-visual redundancy takes advantage of the varying sensitivities of the human visual system. The human eye is much more discriminating regarding changes in luminance than chrominance, for example, so a system with this feature can discard

Video compression’s quantum leap

designfeature By Didier LeGall, LSI Logic

Figure 1: H.264 computes the differences between actual incoming video and estimated/transformed video, using either motion estimation or intraframe estimation, so only the estimate and the difference appear in the compressed-video stream. (Block diagram: uncompressed video passes through estimation, transform, quantization, and entropy coding to produce compressed video; inverse quantization, an inverse transform, and a loop filter feed reconstructed frames back to the estimation stage.)


some color-depth information, and viewers do not recognize the difference. Statistical redundancy uses a more compact representation for elements that frequently recur in a video, thus reducing the overall size of the compressed signal.

Removing temporal redundancies is responsible for a significant percentage of all the video compression that you can achieve. Although H.264 makes advances in removing temporal redundancies, it is also better across the board, thanks to the adoption of innovative techniques.

INTO THE FUTURE WITH H.264

Video-compression schemes today follow a common set of interactive operations. First, they segment the video frame into blocks of pixels. The schemes then estimate frame-to-frame motion of each block to identify temporal redundancy, or estimate within the frame to identify spatial redundancy. In another operation, an algorithmic DCT (discrete cosine transform) decorrelates the motion-compensated data to produce an expression with the lowest number of coefficients, reducing spatial redundancy. The video-compression scheme then quantizes the DCT coefficients based on a psycho-visual-redundancy model. Entropy coding then removes statistical redundancy, reducing the average number of bits necessary to represent the compressed video. Coding, or rate, control (also known as mode decision) comes into play to select the most efficient mode of operation. Figure 1 provides an overview of coding.
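The sequence of operations just described can be sketched as a minimal, runnable encoder skeleton. Every stage here is an illustrative assumption, not part of any standard: the four stage functions are injected as arguments, the "estimation" is a plain frame difference, and the block size is fixed at 4×4.

```python
# Minimal sketch of the hybrid-coding pipeline: segment into blocks, estimate,
# transform, quantize, entropy-code. All stage implementations are stand-ins.

def split_into_blocks(frame, size=4):
    """Segment a 2-D frame (a list of rows) into size x size blocks."""
    h, w = len(frame), len(frame[0])
    return [[row[x:x + size] for row in frame[y:y + size]]
            for y in range(0, h, size) for x in range(0, w, size)]

def encode_frame(frame, reference, transform, quantize, entropy_code):
    """Estimate (here: plain block difference), transform, quantize, code."""
    stream = []
    for cur, ref in zip(split_into_blocks(frame), split_into_blocks(reference)):
        residual = [[c - r for c, r in zip(cr, rr)]
                    for cr, rr in zip(cur, ref)]          # temporal redundancy
        stream.append(entropy_code(quantize(transform(residual))))
    return stream
```

With identity functions for the three later stages, the "stream" is simply the per-block residual, which makes the order of operations easy to inspect.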

MOTION ESTIMATION

Estimating the movement of blocks of pixels from frame to frame and coding the displacement vector, not the details of the blocks themselves, reduce or eliminate temporal redundancy. To start, the compression scheme divides the video frame into blocks. Whereas MPEG-2 uses only 16×16-pixel motion-compensated blocks, or macroblocks, H.264 provides the option of motion compensating 16×16-, 16×8-, 8×16-, 8×8-, 8×4-, 4×8-, or 4×4-pixel blocks within each macroblock. The scheme accomplishes motion estimation by searching for a good match for a block from the current frame in a previously coded frame. The resulting coded picture is a P-frame.
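The block-matching search just described can be sketched as follows. The exhaustive ±2-pixel window, the 4×4 block size, and the function names are illustrative assumptions; real encoders search far larger windows with much smarter strategies.

```python
# Sketch of block-matching motion estimation: find the displacement of a
# block within a previously coded frame by minimizing the sum of absolute
# differences (SAD).

def sad(block, frame, y, x, n=4):
    """Sum of absolute differences between block and the frame region at (y, x)."""
    return sum(abs(block[i][j] - frame[y + i][x + j])
               for i in range(n) for j in range(n))

def motion_search(block, ref_frame, y0, x0, radius=2, n=4):
    """Return the (dy, dx) motion vector with the lowest SAD, plus its cost."""
    best = (None, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y <= len(ref_frame) - n and 0 <= x <= len(ref_frame[0]) - n:
                cost = sad(block, ref_frame, y, x, n)
                if cost < best[1]:
                    best = ((dy, dx), cost)
    return best
```

Only the winning (dy, dx) vector and the residual need to be coded, which is the source of the temporal-redundancy savings described above.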

The estimate may also involve combining pixels resulting from the search of two frames. In this case, the coded picture is a B-frame. (In MPEG-2, these two frames must be one temporally previous frame and one temporally future frame, whereas H.264 generalizes B-frames, thus removing this restriction.) Searching is an important aspect of the process because it must try to ascertain the best match for where the block has moved from one frame to the next.

To substantially improve the process, you can use subpixel motion estimation, which defines fractional pixels. Unlike MPEG-2, which offers half-pixel accuracy, H.264 uses quarter-pixel accuracy for both the horizontal and the vertical components of the motion vectors.

H.264 uses P- and B-frames to detect and code periodic motion. Although B-macroblocks often give better performance than P-macroblocks, using them in a traditional manner delays decoding. This delay occurs because H.264 must decode the future P-frames before decoding the temporally preceding B-frames. By using multiple frames, H.264 delivers superior performance for translational motion and occlusions.

For blocks that are poorly represented in previously decoded frames, due to such actions as camera panning or moving objects uncovering previously unseen background, motion compensation yields little significant compression benefit. In these instances, H.264 capitalizes on intraframe estimation to eliminate spatial redundancies. By also removing spatial redundancy in the pixel domain instead of exclusively in the frequency domain, as its predecessors do, H.264 achieves significantly better compression that is comparable to that of the JPEG-2000 still-image-compression standard.

Intraframe estimation operates at the pixel-block level and attempts to predict the current block by extrapolating the neighboring pixels from adjacent blocks in a defined set of directions. The method then codes the difference between the predicted block and the actual block. Intraframe estimation is particularly useful in coding flat backgrounds (Figure 2).
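The extrapolate-then-code-the-difference idea can be sketched as follows. Only two of H.264's nine 4×4 luma prediction directions (vertical and horizontal) are shown, and the helper names are illustrative.

```python
# Sketch of intraframe estimation: predict a 4x4 block from already-decoded
# neighboring pixels, pick the cheaper direction, and keep only the residual.

def predict_vertical(top):        # extrapolate the row of pixels above downward
    return [list(top) for _ in range(4)]

def predict_horizontal(left):     # extrapolate the column of pixels to the left rightward
    return [[left[i]] * 4 for i in range(4)]

def best_intra_residual(block, top, left):
    """Pick the prediction with the smallest absolute residual; code the rest."""
    candidates = {"vertical": predict_vertical(top),
                  "horizontal": predict_horizontal(left)}
    def cost(pred):
        return sum(abs(block[i][j] - pred[i][j])
                   for i in range(4) for j in range(4))
    mode = min(candidates, key=lambda m: cost(candidates[m]))
    pred = candidates[mode]
    residual = [[block[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    return mode, residual
```

For a flat or smoothly striped background, one of the directional predictions matches almost exactly, so the residual (and hence the coded data) is nearly zero, which is why intraframe estimation pays off there.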

DOMAIN TRANSFORMATION

Perhaps the best-known aspect of previous MPEG and H.26x standards is the use of DCTs to transform the video information that results from motion and intraframe estimation into the frequency domain in preparation for quantization. The widely used 8×8 DCT assumes a numerically accurate implementation, akin to floating point. This implementation leads to problems when you use mismatched inverse-DCT implementations


TABLE 1: COMPARISON OF H.264 ENTROPY-CODING APPROACHES

Characteristic | VLC | CABAC
Where it is used | MPEG-2, MPEG-4 ASP | H.264/MPEG-4 AVC (high-efficiency option)
Probability distribution | Static: probabilities never change | Adaptive: adjusts probabilities based on actual data
Leverages correlation between symbols | No: conditional probabilities ignored | Yes: exploits symbol correlations by using "contexts"
Noninteger code words | No: low coding efficiency for high-probability symbols | Yes: exploits arithmetic coding, which generates noninteger code words for higher efficiency

Figure 2: Intraframe estimation operates at the pixel-block level and attempts to predict the current block by extrapolating the neighboring pixels from adjacent blocks in a defined set of directions (a). It then codes the difference between the predicted block and the actual block (b).


in the encoder and the decoder. This mismatch causes "drifting," which results in visible degradations particularly apparent at low bit rates in streaming applications.

In a significant innovation, H.264 uses a DCT-like 4×4 integer transform to translate the motion-compensated data into the frequency domain. A key advantage of switching to the new algorithm is that the smaller block reduces blocking and ringing artifacts. In addition, integer coefficients eliminate the rounding errors inherent with floating-point coefficients. Rounding errors can cause drifting artifacts in MPEG-2 and MPEG-4 ASP.
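The 4×4 integer core transform can be written out directly: Y = Cf · X · Cfᵀ, with the integer matrix below (this is the standard's forward-transform matrix; omitting the normalization, which H.264 folds into the quantization step, is a simplification here). Because every operation is exact integer arithmetic, encoder and decoder compute bit-identical results, which is why drifting cannot occur.

```python
# H.264 forward 4x4 core transform, computed with exact integer arithmetic.
# Normalization/scaling is deliberately omitted (folded into quantization).

CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_transform(x):
    """Apply Y = CF * X * CF^T to a 4x4 block of residual samples."""
    ct = [[CF[j][i] for j in range(4)] for i in range(4)]  # transpose of CF
    return matmul(matmul(CF, x), ct)
```

A quick sanity check: a perfectly flat block concentrates all its energy in the single dc coefficient, with every ac coefficient exactly zero, no floating-point rounding involved.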

QUANTIZATION

Psycho-visual redundancy comes about because of the human eye's acute sensitivity to slow and linear changes (constant low-frequency information) and its relative insensitivity to high-frequency information, such as busy textures. The human eye has a lower sensitivity to spatial resolution in the chrominance signal; in consumer-video applications, therefore, the system commonly subsamples the chrominance signal by a factor of two, both horizontally and vertically. The output of the transform step completely represents information for all frequency levels, providing another opportunity for compression.

The quantization process eliminates high-frequency information by mapping, or quantizing, each DCT coefficient to a discrete set of levels. In H.264, the 4×4 transform expands to an 8×8 transform for chroma-predicted blocks and to a 16×16 transform for some luma-predicted blocks by applying second-level 2×2 and 4×4 integer transforms, respectively, to the lowest frequency information, to which the human eye is most sensitive. You use the smaller transforms for the quantization of chroma samples because the chroma samples are already decimated at a 2-to-1 ratio. The larger transform for the luma samples reduces fidelity, but the human eye cannot discern the result. Quantization is also useful in controlling the bit rate by selectively eliminating visual information.
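The mapping of coefficients onto a discrete set of levels can be sketched with a single uniform step size. H.264 actually derives per-coefficient step sizes from a quantization parameter, so the scalar `step` below is a simplification; the point is that division plus rounding is the one lossy stage, and small high-frequency coefficients collapse to zero.

```python
# Sketch of uniform scalar quantization: the larger the step, the more
# coefficients map to zero and the lower the bit rate.

def quantize(coeffs, step):
    """Map each transform coefficient to an integer level (round-to-nearest)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def dequantize(levels, step):
    """Reconstruct coefficients; the rounding above is the only loss."""
    return [[l * step for l in row] for row in levels]
```

Raising `step` is exactly the "reducing the operating point" lever mentioned earlier: it buys bit rate at the cost of discarded visual information.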

ENTROPY CODING

The entropy-coding stage maps symbols representing motion vectors, quantized coefficients, and macroblock headers into actual bits (Figure 3). In compression standards, all entropy coding shares a common goal: to reduce the average number of bits necessary to represent the compressed video. Previous standards accomplished this task by using VLC (variable-length-code) tables. The goal of VLC tables is to ensure that you use shorter code words for more frequently occurring symbols, such as small coefficient values. But you can also use arithmetic coding instead of VLC tables, and H.264 introduced this concept for the first time for video compression, even though it was an option in the JPEG standard for still images.

In either scheme, before entropy coding can begin, the system serializes the quantized DCT coefficients into a 1-D array by scanning them in zigzag order. The resulting serialization places the dc coefficient first, and the ac coefficients follow in low- to high-frequency order. Because higher frequency coefficients tend to be zero (the result of the quantization process), the system uses run-length encoding to group adjacent zeros, which results in more efficient entropy coding.

In MPEG-2, serialization depends on whether the coefficients originate from a motion-estimated or an intraframe-estimated macroblock. In H.264, serialization depends only on whether the coefficients originate from samples coded from the same video field (a field macroblock) or from a frame containing both the top and bottom video fields (a frame macroblock).
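The zigzag scan and the grouping of zeros can be sketched as follows. The scan order shown is the conventional 4×4 frame-macroblock order; the "EOB" marker is an illustrative stand-in for the real end-of-block syntax.

```python
# Serialize a 4x4 block of quantized coefficients in zigzag order (dc first,
# ac from low to high frequency), then drop the run of trailing zeros.

ZIGZAG_4x4 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
              (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]

def zigzag_scan(block):
    """Serialize a 4x4 block into a 1-D list, lowest spectral frequency first."""
    return [block[y][x] for y, x in ZIGZAG_4x4]

def truncate_trailing_zeros(seq):
    """Replace the run of trailing zeros with an end-of-block marker."""
    n = len(seq)
    while n > 0 and seq[n - 1] == 0:
        n -= 1
    return seq[:n] + ["EOB"]
```

Because quantization drives most high-frequency coefficients to zero, the scan typically ends in a long zero run, and a single marker replaces it, which is where the run-length savings come from.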

H.264 introduced CABAC (context-adaptive binary-arithmetic coding), which is more efficient than VLC for symbol probabilities greater than 50% because it allows you to represent a symbol with less than one bit. CABAC manages this task by adapting to the changing probability distribution of symbols and by exploiting correlations between symbols. Table 1 illustrates the differences between VLC and CABAC. H.264 also supports CAVLC (context-adaptive variable-length coding), which is superior to VLC without the full cost of CABAC.
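The sub-one-bit claim follows from information theory: a symbol of probability p carries -log2(p) bits of information, which drops below one bit once p exceeds 50%, a rate that a VLC, which must spend at least one whole bit per code word, can never reach. A quick check:

```python
# Ideal (Shannon) code length of a symbol. Arithmetic coding, as in CABAC,
# can approach this bound; VLC tables cannot go below 1 bit per symbol.

import math

def ideal_bits(p):
    """Information content of a symbol of probability p, in bits."""
    return -math.log2(p)
```

For a symbol seen 90% of the time, the ideal cost is about 0.15 bit, so an adaptive arithmetic coder can pack several such symbols into a single output bit.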

DEBLOCKING LOOP FILTER

Introducing more options and modes into the encoding algorithm also introduces more opportunities for discontinuities. Artifacts occur, for example, when you code adjacent macroblocks using different modes. The encoder may make independent decisions to compress macroblocks using the motion-estimation (interframe) mode, the spatial (intraframe) mode, or skip mode (skipping the macroblocks altogether). As a result, the pixels adjacent to two blocks compressed in different modes can have different values, even though they should be similar. Artifacts can also occur around block boundaries because of the transformation/quantization process and motion-vector differences between blocks.

To eliminate these artifacts, H.264 defines a deblocking filter that operates on both 16×16-macroblock and 4×4-block boundaries. In the case of the macroblocks, the filter eliminates artifacts resulting from motion or intraframe estimation or different quantizer scales. In the case of the smaller blocks, the

Figure 3: The entropy-coding stage maps symbols representing motion vectors, quantized coefficients, and macroblock headers into actual bits. (Diagram: an incoming 4×4 block, from intraframe estimation or motion estimation, is scanned from the starting point at the lowest spectral frequency to the ending point at the highest; higher frequency coefficients tend to be zero, and run-length encoding truncates trailing zeros in the outgoing stream of coefficients.)


filter removes artifacts that transformation/quantization and motion-vector differences between adjacent blocks cause. Generally, the loop filter modifies the two pixels on either side of the boundary using a content-adaptive, nonlinear filter. (Both the decoder and the decoding loop replicated within each encoder use the deblocking filter; hence, the term "loop filter.") The result is not only improved visual quality but also, because motion compensation uses deblocked decoded frames, improved coding efficiency.
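The content-adaptive idea can be sketched in one dimension. The threshold test and the averaging weights below are illustrative assumptions, not H.264's actual filter-strength rules, but they show the essential behavior: a large step across the boundary is kept as a real edge, while a small step is smoothed away as a coding artifact.

```python
# Sketch of a content-adaptive deblocking filter acting on the two pixels on
# either side of a block boundary in one row of samples.

def deblock_1d(row, boundary, threshold):
    """Conditionally smooth row[boundary-2:boundary+2] across the boundary."""
    p1, p0, q0, q1 = row[boundary - 2:boundary + 2]
    if abs(p0 - q0) >= threshold:      # large step: assume a real edge, keep it
        return row
    out = list(row)
    out[boundary - 1] = (p1 + 2 * p0 + q0 + 2) // 4   # smooth last pixel of block P
    out[boundary]     = (p0 + 2 * q0 + q1 + 2) // 4   # smooth first pixel of block Q
    return out
```

In the real standard, the threshold and the filter strength are derived from the neighboring blocks' coding modes and quantizer values, which is exactly the mode and quantizer-scale dependence described above.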

MODEST COMPLEXITY INCREASE

As mentioned, H.264 achieves compression ratios that are two to three times better than those of its immediate predecessors. You must balance these gains, however, against the increase in the complexity of the algorithm as well as its implementation in silicon. The result is a manageable increase when you compare it with MPEG-2, so the trade-offs have been positive for the industry. Nevertheless, implementation is far from trivial and demands a thorough understanding of the standard and the silicon-design and -fabrication process. Striking examples are the new options available for compression coding, which require a way to fit together macroblocks that you have compressed using different modes. Implementing this feature and other H.264 features in silicon requires years of experience to achieve the most cost-effective silicon approach.

Reference
1. Côté, Guy, and Lowell Winger, "Recent Advances in Video Compression Standards," IEEE Canadian Review, Spring 2002.

Author's biography
Didier LeGall is vice president of engineering and business development for LSI Logic's Broadband Entertainment Division. He is in charge of the engineering and business-development activities for digital-video products, including DVD, video peripheral, PVR/DVR, and video production and broadcasting. He has been involved with ISO's MPEG-standardization effort since its inception and served as chairman of the MPEG-Video group until 1995. He holds a doctorate in electrical engineering from the University of California, Los Angeles.