Digital Video Image Quality and Perceptual Coding


Transcript of Digital Video Image Quality and Perceptual Coding

  • Digital Video Image Quality and Perceptual Coding


  • Signal Processing and Communications

    Editorial Board

    Maurice G. Bellanger, Conservatoire National des Arts et Métiers (CNAM), Paris

    Ezio Biglieri, Politecnico di Torino, Italy

    Sadaoki Furui, Tokyo Institute of Technology

    Yih-Fang Huang, University of Notre Dame

    Nikil Jayant, Georgia Institute of Technology

    Aggelos K. Katsaggelos, Northwestern University

    Mos Kaveh, University of Minnesota

    P. K. Raja Rajasekaran, Texas Instruments

    John Aasted Sorenson, IT University of Copenhagen

    1. Digital Signal Processing for Multimedia Systems, edited by Keshab K. Parhi and Takao Nishitani

    2. Multimedia Systems, Standards, and Networks, edited by Atul Puri and Tsuhan Chen

    3. Embedded Multiprocessors: Scheduling and Synchronization, Sundararajan Sriram and Shuvra S. Bhattacharyya

    4. Signal Processing for Intelligent Sensor Systems, David C. Swanson

    5. Compressed Video over Networks, edited by Ming-Ting Sun and Amy R. Reibman

    6. Modulated Coding for Intersymbol Interference Channels, Xiang-Gen Xia

    7. Digital Speech Processing, Synthesis, and Recognition: Second Edition, Revised and Expanded, Sadaoki Furui

    8. Modern Digital Halftoning, Daniel L. Lau and Gonzalo R. Arce

    9. Blind Equalization and Identification, Zhi Ding and Ye (Geoffrey) Li

    10. Video Coding for Wireless Communication Systems, King N. Ngan, Chi W. Yap, and Keng T. Tan

    11. Adaptive Digital Filters: Second Edition, Revised and Expanded, Maurice G. Bellanger

    12. Design of Digital Video Coding Systems, Jie Chen, Ut-Va Koc, and K. J. Ray Liu

    13. Programmable Digital Signal Processors: Architecture, Programming, and Applications, edited by Yu Hen Hu

    14. Pattern Recognition and Image Preprocessing: Second Edition, Revised and Expanded, Sing-Tze Bow

    15. Signal Processing for Magnetic Resonance Imaging and Spectroscopy, edited by Hong Yan

    16. Satellite Communication Engineering, Michael O. Kolawole

    17. Speech Processing: A Dynamic and Optimization-Oriented Approach, Li Deng

    18. Multidimensional Discrete Unitary Transforms: Representation: Partitioning and Algorithms, Artyom M. Grigoryan and Sos S. Agaian

    19. High-Resolution and Robust Signal Processing, Yingbo Hua, Alex B. Gershman and Qi Cheng

    20. Domain-Specific Processors: Systems, Architectures, Modeling, and Simulation, Shuvra Bhattacharyya, Ed Deprettere and Jurgen Teich

    21. Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications, Mauro Barni and Franco Bartolini

    22. Biosignal and Biomedical Image Processing: MATLAB-Based Applications, John L. Semmlow

    23. Broadband Last Mile Technologies: Access Technologies for Multimedia Communications, edited by Nikil Jayant

    24. Image Processing Technologies: Algorithms, Sensors, and Applications, edited by Kiyoharu Aizawa, Katsuhiko Sakaue and Yasuhito Suenaga

    25. Medical Image Processing, Reconstruction and Restoration: Concepts and Methods, Jiri Jan

    26. Multi-Sensor Image Fusion and Its Applications, edited by Rick Blum and Zheng Liu

    27. Advanced Image Processing in Magnetic Resonance Imaging, edited by Luigi Landini, Vincenzo Positano and Maria Santarelli

    28. Digital Video Image Quality and Perceptual Coding, edited by H.R. Wu and K.R. Rao


  • Digital Video Image Quality and Perceptual Coding

    edited by

    H.R. Wu and K.R. Rao

    A CRC title, part of the Taylor & Francis imprint, a member of the Taylor & Francis Group, the academic division of T&F Informa plc.

    Boca Raton London New York


  • Published in 2006 by CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

    © 2006 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group.

    No claim to original U.S. Government works. Printed in the United States of America on acid-free paper. 10 9 8 7 6 5 4 3 2 1

    International Standard Book Number-10: 0-8247-2777-0 (Hardcover) International Standard Book Number-13: 978-0-8247-2777-2 (Hardcover) Library of Congress Card Number 2005051404

    This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

    No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

    For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

    Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

    Library of Congress Cataloging-in-Publication Data

    Digital video image quality and perceptual coding / edited by Henry R. Wu, K.R. Rao. p. cm. -- (Signal processing and communications)

    Includes bibliographical references and index. ISBN 0-8247-2777-0. 1. Digital video. 2. Imaging systems--Image quality. 3. Perception. 4. Coding theory. 5. Computer

    vision. I. Wu, Henry R. II. Rao, K. Ramamohan (Kamisetty Ramamohan) III. Series.

    TK6680.5.D55 2006    006.6'96--dc22    2005051404

    Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

    and the CRC Press Web site at http://www.crcpress.com

    Taylor & Francis Group is the Academic Division of Informa plc.
  • To those who have pioneered, inspired and persevered.


  • Copyrights release for ISO/IEC: All the figures and tables obtained from ISO/IEC used in this book are subject to the following: The terms and definitions taken from the Figures and Tables ref. ISO/IEC IS11172, ISO/IEC 11172-2, ISO/IEC 11172-73, ISO/IEC IS13818-2, ISO/IEC IS13818-6, ISO/IEC JTC1/SC29/WG11 Doc. N2196, ISO/IEC JTC1/SC29/WG11 N3536, ISO/MPEG N2502, ISO/IEC JTC1/SC29/WG1 Doc. N1595, ISO/IEC JTC SC29/WG11 Doc. 2460, ISO/IEC JTC SC29/WG11 Doc. 3751, ISO/MPEG N2501, ISO/IEC JTC-1 SC29/WG11 M5804, ISO/IEC JTC SC29/WG11, Recomm. H.262 ISO/IEC 13818-2, ISO/IEC IS 13818-1, ISO/MPEG N3746, ISO/IEC Doc. N2502, ISO/IEC JTC1/SC29/WG11, Doc. N2424 are reproduced with permission of the International Organization for Standardization, ISO. These standards can be obtained from any ISO member and from the Web site of the ISO Central Secretariat at the following address: http://www.iso.org. Non-exclusive copyright remains with ISO.

    The terms and definitions taken from the Figure ref. ISO/IEC 14496-1:1999, ISO/IEC 14496-2:1999, ISO/IEC 14496-3:1999, ISO/IEC 14496-4:2000, ISO/IEC 14496-5:2000, ISO/IEC 14496-10 AVC:2003 are reproduced with the permission of the International Organization for Standardization, ISO. These standards can be obtained from any ISO member and from the Web site of the ISO Central Secretariat at the following address: www.iso.org. Non-exclusive copyright with ISO. The editors, authors and Taylor and Francis are grateful to ISO/IEC for giving the permission.

    About the Shannon image on the front cover: The original image of Claude E. Shannon, the father of information theory, was provided by Bell Laboratories of Lucent Technologies. It was compressed using the JPEG coder, which is information lossy. Its resolution was 4181 × 5685 pixels with white margins around all sides. This digital image had scanning dust.

    The original image was cropped down to 4176 × 5680 for compression, and the scanning dust was removed by copying the surrounding pixels over the dust. The resultant image was used as the original image to produce the compressed Shannon image using the perceptual lossless image coder (PLIC) as described in Chapter 13, with an implementation intended for medical image compression. The PLIC was benchmarked against the JPEG-LS and the JPEG-NLS (d=2).

    The coding error images were produced between the cropped original Shannon image and the compressed images. In the error images, the red color represents a positive error, the blue a negative error and the white a zero error. The black color is not actually black; it is small-valued red or blue color. Comparing the two error images, it can be appreciated how the PLIC uses the human vision model to achieve perceptual lossless coding with a higher compression ratio than the benchmarks, whilst maintaining the visual fidelity of the original picture.

    The top image is the original image that can be compressed to 3.179 bpp using the JPEG-LS. The mid-right image is the JPEG-NLS (i.e., JPEG near lossless) compressed at d=2 with Bitrate = 1.424 bpp, Compression Ratio = 5.6180:1, MSE = 1.9689 and PSNR = 45.1885 dB. The mid-left image is the difference image between the original and that compressed by the JPEG-NLS (d=2). The bottom-right is the PLIC compressed with Bitrate = 1.370 bpp, Compression Ratio = 5.8394:1, MSE = 2.0420 and PSNR = 45.0303 dB. The bottom-left image is a difference image between the original and that compressed by the PLIC.
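    As a sanity check on the figures quoted above, the compression ratios and PSNR values follow directly from the bit rates and MSE for an 8-bit grayscale source. The short sketch below (our own illustration in Python, with the cover-image numbers hard-coded as inputs; it is not code from the book) shows the assumed relations: compression ratio = 8 bpp / coded bpp, and PSNR = 10 log10(255^2 / MSE).

```python
import math

# Quoted cover-image statistics (8-bit grayscale source, i.e., 8 bpp uncompressed).
results = {
    "JPEG-NLS (d=2)": {"bpp": 1.424, "mse": 1.9689},
    "PLIC":           {"bpp": 1.370, "mse": 2.0420},
}

for name, r in results.items():
    ratio = 8.0 / r["bpp"]                            # compression ratio vs. 8 bpp raw
    psnr = 10.0 * math.log10(255.0 ** 2 / r["mse"])   # peak signal-to-noise ratio, dB
    print(f"{name}: CR = {ratio:.4f}:1, PSNR = {psnr:.4f} dB")

# Agrees with the quoted values to within rounding:
#   JPEG-NLS (d=2): CR = 5.6180:1, PSNR ~= 45.189 dB
#   PLIC:           CR = 5.8394:1, PSNR ~= 45.030 dB
```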
  • Contributors

    Alan C. Bovik, University of Texas at Austin, Austin, Texas, U.S.A.

    Jorge E. Caviedes, Intel Corporation, Chandler, Arizona, U.S.A.

    Tao Chen, Panasonic Hollywood Laboratory, Universal City, California, U.S.A.

    Francois-Xavier Coudoux, Universite de Valenciennes, Valenciennes, Cedex, France.

    Philip J. Corriveau, Intel Media and Acoustics Perception Lab, Hillsboro, Oregon, U.S.A.

    Mark D. Fairchild, Rochester Institute of Technology, Rochester, New York, U.S.A.

    Marc G. Gazalet, Universite de Valenciennes, Valenciennes, Cedex, France.

    Jae Jeong Hwang, Kunsan National University, Republic of Korea.

    Michael Isnardi, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.

    Ryoichi Kawada, KDDI R&D Laboratories Inc., Japan.

    Weisi Lin, Institute for Infocomm Research, Singapore.

    Jeffrey Lubin, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.

    Makoto Miyahara, Japan Advanced Institute of Science and Technology, Japan.

    Ethan D. Montag, Rochester Institute of Technology, Rochester, New York, U.S.A.

    Franco Oberti, Philips Research, The Netherlands.

    Albert Pica, Sarnoff Corporation Inc., Princeton, New Jersey, U.S.A.

    K. R. Rao, University of Texas at Arlington, Arlington, Texas, U.S.A.

    Hamid Sheikh, Texas Instruments, Inc., Dallas, Texas, U.S.A.

    Damian Marcellinus Tan, Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.

    Zhou Wang, University of Texas at Arlington, Arlington, Texas, U.S.A.

    Stefan Winkler, Genista Corporation, Montreux, Switzerland.


    Hong Ren Wu, Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.

    Zhenghua Yu, National Information Communication Technology Australia (NICTA).

    Michael Yuen, ESS Technology, Inc., Beijing, China

    Jian Zhang, National Information Communication Technology Australia (NICTA).


  • Acknowledgments

    The editors, H. R. Wu and K. R. Rao, would like to thank all authors of this handbook for their contributions, efforts and dedication, without which this book would not have been possible.

    The editors and the contributors have received assistance and support from many of our colleagues that has made this handbook, Digital Video Image Quality and Perceptual Coding, possible. The generous assistance and support includes valuable information and materials used in and related to the book, discussions, feedback, comments on and proof reading of various parts of the book, and recommendations and suggestions that shaped the book as it is. Special thanks are due to the following persons:

    M. Akgun    Communications Research Center, Canada
    J. F. Arnold    Australian Defence Force Academy
    B. Baxter    Intel Corporation
    J. Cai    Nanyang Technological University
    N. Corriveau    Spouse of P. Corriveau
    S. Daly    Sharp Laboratories of America
    M. Frater    Australian Defence Force Academy
    N. G. Kingsbury    University of Cambridge
    L. Lu    IBM T. J. Watson Research Center
    Z. Man    Nanyang Technological University
    S. K. Mitra    University of California, Santa Barbara
    K. N. Ngan    The Chinese University of Hong Kong
    E. P. Simoncelli    New York University
    C.-S. Tan    Royal Melbourne Institute of Technology
    A. Vincent    Communications Research Center, Canada
    M. Wada    KDDI R&D Laboratories
    B. A. Wandell    Stanford University
    S. Wolf    Institute for Telecommunication Sciences
    D. Wu    Royal Melbourne Institute of Technology
    C. Zhang    Nanyang Technological University
    Z. Zhe    Monash University
    All members    VQEG

    A. C. Bovik acknowledges the support by the National Science Foundation under grant CCR-0310973.


    H. R. Wu and Z. Yu acknowledge the support by the Australian Research Council under grant A49927209.

    Assistance and support to this book project which H. R. Wu received from Monash University, where he lectured from 1990 to 2005, and from Nanyang Technological University, where he spent his sabbatical from 2002 to 2003, are gratefully acknowledged.

    Special thanks go to David Wu of Royal Melbourne Institute of Technology for his assistance in producing the final LaTeX version of this handbook and the compressed Shannon images shown on the front cover of the book.

    H. R. Wu and K. R. Rao would like to express their sincere gratitude to B. J. Clark, publishing consultant at CRC Press LLC, who initiated this book project and whose professional advice and appreciation of the efforts involved in this undertaking made the completion of this book possible. Sincere thanks also go to Nora Konopka, our Publisher, and Jessica Vakili, our Project Coordinator, at CRC Press LLC for their patience, understanding and unfailing support that helped see this project through. We are most grateful to our Project Editor, Susan Horwitz, whose professional assistance has made significant improvement to the book's presentation. The work by Nicholas Yancer on the back cover of the book and the brilliant cover design by Jonathan Pennell are greatly appreciated.

    Last but not least, without the patience and forbearance of our families, the preparation of this book would have been impossible. We greatly appreciate their constant and continuing support and understanding.


  • Preface

    The outset of digital video image coding research is commonly acknowledged [Cla95] to be around 1950, marked by Goodall's paper on television by pulse code modulation (or PCM) [Goo51, Hua65], Cutler's patent on differential quantization of communication signals (commonly known as differential pulse code modulation, or DPCM for short) [Cut52], Harrison's paper on experiments with linear prediction in television [Har52], and Huffman's paper on a method for the construction of minimum redundancy codes (commonly known as Huffman coding) [Huf52]; notwithstanding that some of the pioneering work on fundamental theories, techniques and concepts in digital image and video coding for visual communications can be traced back to Shannon's monumental work on the mathematical theory of communication in 1948 [Sha48], Gabor's 1946 paper on theory of communication [Gab46], and even as early as the late 1920s when Kell proposed the principle of frame difference signal transmission in a British patent [Kel29, SB65, Sey63, Gir03]. While international standardization of digital image and video coding [RH96] might be considered by many as the end of an era or, simply, of research in the area, for others it presents new challenges and signals the beginning of a new era; or, more precisely, it is high time that we addressed and, perhaps, solved a number of long standing open problems in the field.

    A brief review of the history and the state of the art of research in the field will reveal the fundamental concepts, principles and techniques used in image data compression for storage and visual communications. An important goal that was set fairly early by forerunners in image data compression is to minimize statistical (including source coding, spatio-temporal and inter-scale) and psychovisual (or perceptual) redundancies of the image data, either to comply with certain storage or communications bandwidth restrictions or limitations with the best possible picture quality, or to provide a certain picture quality service with the lowest possible amount of data or bit rate [Sey62]. It helped to set the course and to raise a series of widely researched issues which have inspired and, in many ways, frustrated generations of researchers in the field. Some of these issues and associated problems are better researched, understood and solved than others.

    Using information theory and optimization techniques, we understand reasonably well the definition of statistical redundancy and the theoretical lower bound set by Shannon's entropy in lossless image and video coding [Sha48, JN84]. We have statistically modelled natural image data fairly well, which has led to various optimal or sub-optimal compression techniques in the least mean square sense [Sey62, Cla85, NH95]. We routinely apply rate-distortion theory with the mean squared error (MSE) as a distortion measure in the design of constant bit rate coders. We have pushed the performance of a number of traditional compression techniques, such as predictive and transform coding, close to their limit in terms of decorrelation and energy packing efficiencies. Motion compensated prediction has been thoroughly investigated for inter-frame coding of video and image sequences, leading to a number of effective and efficient algorithms used in practical systems.
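    To make the entropy bound mentioned above concrete, the minimal sketch below (our own illustration, not code or data from the book; the function name and the synthetic test image are assumptions) estimates the zeroth-order Shannon entropy of an 8-bit image's pixel histogram, which lower-bounds the average code length, in bits per pixel, of any lossless coder that treats pixels as independent symbols.

```python
import numpy as np

def zeroth_order_entropy(image: np.ndarray) -> float:
    """Empirical Shannon entropy (bits/pixel) of an 8-bit image's histogram.

    This is the memoryless lower bound only; it ignores the spatial
    correlation that predictive and transform coding exploit to go further.
    """
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

# Illustrative use on a synthetic image (a real test image would be loaded instead):
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
print(f"Entropy: {zeroth_order_entropy(img):.3f} bits/pixel (vs. 8 bits/pixel raw)")
```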

    In model-, object- or segmentation-based coding, we have been trying to balance bit allocations between coding of model parameters and coding of the residual image, but we have yet to get it right. Different from classical compression algorithms, techniques based on matching pursuit, fractal transforms and projection onto convex sets are recursive, and encode transform or projection parameters instead of either pixel or transform coefficient values. Nevertheless, they have so far failed to live up to their great expectations in terms of rate-distortion performance in practical coding systems and applications. We have long since realized that much higher compression ratios can be achieved than what is achievable by the best lossless coding techniques or the theoretical lower bound set by information theory, without noticeable distortion when viewed by human subjects. Various adaptive quantization and bit allocation techniques and algorithms have been investigated to incorporate some aspects of the human visual system (HVS) [CS77, CP84, Cla85], most of which focus on spatial contrast sensitivity and masking effects. Various visually weighted distortion measures have also been explored in either performance evaluation [JN84] or rate-distortion optimization of image or video coders [Tau00].

    Limited investigations have been conducted in constant quality coder design, impeded by the lack of a commonly acceptable quality metric which correlates well with subjective or perceived quality indices, such as the mean opinion score (MOS) [ITU98]. Long has the question been asked, "What's wrong with mean-squared error?" [Gir84], and with its derivatives such as the peak signal to noise ratio (PSNR), as the quality or distortion measure. Nonetheless, obtaining credible and widely acceptable alternative perceptual based quantitative quality and/or impairment metrics has so far eluded us until most recently [LB82, VQE00]. Consequently, attempts and claims of providing users with guaranteed or constant quality visual services have been by and large unattainable or unsubstantiated. Lacking HVS-based quantitative quality or impairment metrics, more often than not we opt for a much higher bit rate for quality critical visual service applications than what is necessary, resulting in users carrying extra costs; and just as likely, a coding strategy may reduce a particular type of coding distortion or artifact at the expense of manifesting or enhancing other types of distortions. One of the most challenging questions begging for an answer is how to define psychovisual redundancy for lossy image and video coding, if it can ever be defined quantitatively in a similar way to the statistical redundancy defined for lossless coding. It would help to set the theoretical lower bound for lossy image data coding at a just noticeable level compared with the original.
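    As a toy illustration of that question (our own sketch, not an example from the book): two distortions with identical MSE, and hence identical PSNR, can differ greatly in visibility, for instance a small uniform brightness shift versus zero-mean noise of the same error energy.

```python
import numpy as np

rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# Distortion 1: a uniform +5 gray-level shift (typically hard to notice).
shifted = original + 5.0

# Distortion 2: zero-mean random noise rescaled to the same error energy
# (typically visible as grain), so both distortions share the same MSE.
noise = rng.standard_normal(original.shape)
noise *= 5.0 / np.sqrt(np.mean(noise ** 2))
noisy = original + noise

def mse(a, b):
    return float(np.mean((a - b) ** 2))

print(mse(original, shifted), mse(original, noisy))  # both equal 25, i.e., the same PSNR
```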

    This book attempts to address two of the above raised issues which may form a critical part of theoretical research and practical system development in the field, i.e., HVS based perceptual quantitative quality/impairment metrics for digitally coded pictures (i.e., images and videos), and perceptual picture coding. The book consists of three parts, i.e., Part I, Fundamentals; Part II, Testing and Quality Assessment of Digital Pictures; and Part III, Perceptual Coding and Postprocessing.

    Part I comprises the first three chapters, covering a number of fundamental concepts, theory, principles and techniques underpinning the issues and topics addressed by this book.

    Chapter 1, Digital Picture Compression and Coding Structure, by Hwang, Wu and Rao provides an introduction to digital picture compression, covering basic issues and techniques along with popular coding structures, systems and international standards for compression of images and videos.

    Fundamentals of Human Vision and Vision Modeling are presented by Montag and Fairchild in Chapter 2, which forms the foundation of the materials and discussions related to the HVS and its applications presented in Parts II and III on perceptual quality/impairment metrics, image/video coding and visual communications. The most recent achievements and findings in vision research that are relevant to digital picture coding engineering practice are included.

    Various digital image and video coding/compression algorithms and systems introduce highly structured coding artifacts or distortions, which are different from those in their counterpart analog systems. It is important to analyze and understand these coding artifacts in both subjective and objective quality assessment of digitally encoded images or video sequences. In Chapter 3, a comprehensive classification and analysis of various coding artifacts in digital pictures coded using well known techniques is presented by Yuen.

    Part II of this book consists of eight chapters dealing with a range of topics regarding picture quality assessment criteria, subjective and objective methods and metrics, testing procedures, and the development of international standards activities in the field.

    Chapter 4, Video Quality Testing by Corriveau, provides an in-depth discussion of subjective assessment methods and techniques, experimental design, and international standard test methods for digital video images, in contrast to objective assessment methods, highlighting a number of critical issues and findings. Commonly used test video sequences are presented. The chapter also covers test criteria, test procedures and related issues for various applications in digital video coding and communications. Although subjective assessment methods have been well documented in the literature and standardized by the international standards bodies [ITU98], there has been renewed interest in, and research publications on, various issues with subjective test methods and on new methods, approaches or procedures which may further improve the reliability of subjective test data.

    A comprehensive and up-to-date review of Perceptual Video Quality Metrics is provided by Winkler in Chapter 5, including both traditional measures, such as the mean square error (MSE) and the PSNR, and HVS based metrics as reported in the literature [YW00, YWWC02] as well as by international standards bodies such as VQEG [VQE00]. It discusses factors which affect human viewers' assessment of picture quality, the classification of objective quality metrics, and various approaches and models used for metric design.

    In Chapter 6, Miyahara and Kawada discuss the Philosophy of Picture Quality Scale. It provides insights into the idea and concept behind the PQS, which was introduced by Miyahara, Kotani and Algazi in [MKA98] as an extension of the method pioneered by Miyahara in 1988 [Miy88]. It examines applications of PQS to various digital picture services, including super HDTV, extra high quality images, and cellular video phones, in the context of international standards and activities.

    Wang, Bovik and Sheikh present a detailed account of Structural Similarity Based Image Quality Assessment in Chapter 7. The structural similarity based quality metric is devised to complement the traditional error sensitive picture assessment methods by targeting perceived structural information variation, an approach which mimics the high level functionality of the HVS. The quality prediction accuracy of the metric is evaluated, and it has significantly lower computational complexity than vision model based quality metrics.

    Vision Model Based Digital Video Impairment Metrics introduced recently are described by Yu and Wu in Chapter 8 for blocking and ringing impairment assessment. In contrast with the traditional vision modeling and parameterization methods used in vision research, the vision model used in the impairment metrics is parameterized and optimized using subjective test data provided by the VQEG, where original and distorted video sequences were used instead of simple test patterns. Detailed descriptions of the impairment metric implementations are provided along with performance evaluations, which have shown good agreement with the MOS obtained via subjective evaluations.

    Computational Models for Just-Noticeable Difference are reviewed and closely examined by Lin in Chapter 9. It provides a systematic introduction to the field to date as well as a practical user's guide to related techniques. JND estimation techniques in both the DCT subband domain and the image pixel domain are discussed, along with issues regarding conversions between the two domains.

    In Chapter 10, Caviedes and Oberti investigate issues with a No-Reference Quality Metric for Degraded and Enhanced Video. The concept of virtual reference is introduced and defined. It highlights the importance of assessing picture quality enhancement as well as degradation in visual communications services and applications in the absence of original pictures. A framework for the development of no-reference quality metrics is described. An extensive description is provided of the no-reference overall quality metric (NROQM) which the authors have developed for digital video quality assessment.

    In Chapter 11, Corriveau presents an overview of Video Quality Experts Group activities, highlighting its goals, test plans, major findings, and future work and directions.

    The next six chapters form Part III of this book, focusing on digital image and video coder designs based on the HVS, and on post-filtering, restoration, error correction and concealment techniques which play an increasing role in the improvement of perceptual picture quality by reducing perceived coding artifacts and transmission errors. A number of new perceptual coders introduced in recent years are presented in Chapters 12, 13 and 14, including rate-distortion optimization using perceptual distortion metrics and foveated perceptual coding. A noticeable feature of these new perceptual coders is that they use much more sophisticated vision models, resulting in significant visual performance improvement. Discussions are included in these chapters on possible new coding architectures based on vision models, as compared with the existing statistically based coding algorithms and architectures predominant in current software and hardware products and systems.

    Chapter 12 by Pica, Isnardi and Lubin examines critical issues associated with HVS Based Perceptual Video Encoders. It covers an overview of perceptual based approaches, possible architectures and applications, and future directions. Architectures which support perceptual based video encoding are discussed for an MPEG-2 compliant encoder.

    Tan and Wu present Perceptual Image Coding in Chapter 13, which provides a comprehensive review of HVS based image coding techniques to date. The review covers traditional techniques where various HVS aspects or simple vision models are used for coder designs. Until most recently, this traditional approach has dominated research on the topic with numerous publications, and it forms one of, at least, four approaches to perceptual coding design. The chapter describes a perceptual distortion metric based image coder and a vision model based perceptual lossless coder, along with detailed discussions on model calibration and coder performance evaluation results.

    Chapter 14 by Wang and Bovik investigates novel Foveated Image and Video Coding techniques, which they introduced most recently. It provides an introduction to the foveation feature of the HVS, a review of various foveation techniques that have been used to construct image and video coding systems, and detailed descriptions of example foveated picture coding systems.

    Chapter 15 by Chen and Wu discusses the topic of Artifact Reduction by Post-Processing in Image Compression. Various image restoration and processing techniques have been reported in recent years to eliminate or reduce picture coding artifacts introduced in the encoding or transmission process, in order to improve perceptual image or video picture quality. It has become widely accepted that these post-filtering algorithms are an integral part of a compression package or system from a rate-distortion optimization standpoint. This chapter focuses on the reduction of blocking and ringing artifacts in order to improve the visual quality of reconstructed pictures. A DCT domain deblocking technique with a fast implementation algorithm is described after a review of coding artifact reduction techniques to date.

    Color bleeding is a prominent distortion associated with color images encoded by block DCT based picture coding systems. Coudoux and Gazalet present in Chapter 16 a novel approach to Reduction of Color Bleeding in DCT Block-Coded Video, which they introduced recently. This post-processing technique is devised after a thorough analysis of the cause of color bleeding. The performance evaluation results have demonstrated marked improvement in the perceptual quality of reconstructed pictures.

    Issues associated with Error Resilience for Video Coding Service are investigated by Zhang in Chapter 17. It provides an introduction to error resilient coding techniques and concealment methods. Significant improvement in terms of visual picture quality has been demonstrated by using a number of the techniques presented.

    Chapter 18, the final chapter of the book, highlights a number of critical issues and challenges of the field which may be beneficial to readers for future research.

    Performance measures used to evaluate objective quality/impairment metrics against subjective test data are discussed in Appendix A.

    We hope that readers will enjoy reading this book as much as we have enjoyed writing it, and that they find the materials provided in it useful and relevant to their work and studies in the field.

    H. R. Wu, Royal Melbourne Institute of Technology, Australia

    K. R. Rao, University of Texas at Arlington, U.S.A.

    References

    [Cla85] R. J. Clarke. Transform Coding of Images. London: Academic Press, 1985.

    [Cla95] R. J. Clarke. Digital Compression of Still Images and Video. London: AcademicPress, 1995.

    [CP84] W.-H. Chen and W. K. Pratt. Scene adaptive coder. IEEE Trans. Commun., COM-32:225-232, March 1984.

    [CS77] W.-H. Chen and C. H. Smith. Adaptive coding of monochrome and color images. IEEE Trans. Commun., COM-25:1285-1292, November 1977.

    [Cut52] C. C. Cutler. Differential Quantization of Communication Signals, U.S. PatentNo.2,605,361, July 1952.

    [Gab46] D. Gabor. Theory of communication. Journal of IEE, 93:429-457, 1946.

    [Gir84] B. Girod. What's wrong with mean-squared error? In A. B. Watson, Ed., Digital Images and Human Vision, 207-220. Cambridge, MA: MIT Press, 1984.

    [Gir03] B. Girod. Video coding for compression and beyond, keynote. In Proceedings of IEEE International Conference on Image Processing, Barcelona, Spain, September 2003.

    [Goo51] W. M. Goodall. Television by pulse code modulation. Bell Systems Technical Journal, 28:33-49, January 1951.

    [Har52] C. W. Harrison. Experiments with linear prediction in television. Bell Systems Technical Journal, 29:764-783, 1952.

    [Hua65] T. S. Huang. PCM picture transmission. IEEE Spectrum, 2:57-63, December 1965.

    [Huf52] D. A. Huffman. A method for the construction of minimum redundancy codes. IRE Proc., 40:1098-1101, 1952.

    [ITU98] ITU. ITU-R BT.500-9, Methodology for the subjective assessment of the quality of television pictures. ITU-R, 1998.

    [JN84] N. S. Jayant and P. Noll. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Upper Saddle River, NJ: Prentice Hall, 1984.

    [Kel29] R. D. Kell. Improvements Relating to Electric Picture Transmission Systems, British Patent No. 341,811, 1929.

    [LB82] F. J. Lukas and Z. L. Budrikis. Picture quality prediction based on a visual model. IEEE Transactions on Communications, COM-30:1679-1692, July 1982.

    [Miy88] M. Miyahara. Quality assessments for visual service. IEEE Communications Magazine, 26(10):51-60, October 1988.

    [MKA98] M. Miyahara, K. Kotani, and V. R. Algazi. Objective picture quality scale (PQS) for image coding. IEEE Transactions on Communications, 46(9):1215-1226, September 1998.

    [NH95] A. N. Netravali and B. G. Haskell. Digital Pictures: Representation, Compression and Standards. New York: Plenum Press, 2nd ed., 1995.


    [RH96] K. R. Rao and J. J. Hwang. Techniques and Standards for Image, Video and AudioCoding. Upper Saddle River, NJ: Prentice Hall, 1996.

    [SB65] A. J. Seyler and Z. L. Budrikis. Detail perception after scene changes in television image presentations. IEEE Trans. on Information Theory, IT-11(1):31-43, January 1965.

    [Sey62] A. J. Seyler. The coding of visual signals to reduce channel-capacity requirements. Proc. IEE, pt. C, 109(1):676-684, 1962.

    [Sey63] A. J. Seyler. Real-time recording of television frame difference areas. Proc. IEEE, 51(1):478-480, 1963.

    [Sha48] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379-623, 1948.

    [Tau00] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Proc., 9:1158-1170, July 2000.

    [VQE00] VQEG. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment. VQEG, March 2000. Available from ftp.its.bldrdoc.gov.

    [YW00] Z. Yu and H. R. Wu. Human visual systems based objective digital video quality metrics. In Proceedings of International Conference on Signal Processing 2000 of 16th IFIP World Computer Congress, 2:1088-1095, Beijing, China, August 2000.

    [YWWC02] Z. Yu, H. R. Wu, S. Winkler, and T. Chen. Vision model based impairment metric to evaluate blocking artifacts in digital video. Proc. IEEE, 90(1):154-169, January 2002.
  • Contents

    List of Contributors ix

    Acknowledgments xi

    Preface xiii

    I Picture Coding and Human Visual System Fundamentals 1

    1 Digital Picture Compression and Coding Structure 3

    1.1 Introduction to Digital Picture Coding . . . . . . . . . . . . . . . . . . 3

    1.2 Characteristics of Picture Data . . . . . . . . . . . . . . . . . . . . . . 6

    1.2.1 Digital Image Data . . . . . . . . . . . . . . . . . . . . . . . . 6

    1.2.2 Digital Video Data . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.2.3 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . 9

    1.3 Compression and Coding Techniques . . . . . . . . . . . . . . . . . . 12

    1.3.1 Entropy Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    1.3.2 Predictive Coding . . . . . . . . . . . . . . . . . . . . . . . . . 13

    1.3.3 Transform Coding . . . . . . . . . . . . . . . . . . . . . . . . 14

    1.3.3.1 Discrete cosine transform (DCT) . . . . . . . . . . . 14

    1.3.3.2 Discrete wavelet transform (DWT) . . . . . . . . . . 18

    1.4 Picture Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    1.4.1 Uniform/Nonuniform Quantizer . . . . . . . . . . . . . . . . . 21

    1.4.2 Optimal Quantizer Design . . . . . . . . . . . . . . . . . . . . 21

    1.4.3 Vector Quantization . . . . . . . . . . . . . . . . . . . . . . . 24

    1.5 Rate-Distortion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 25

    1.6 Human Visual Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 26


    1.6.1 Contrast Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    1.6.2 Spatial Frequency . . . . . . . . . . . . . . . . . . . . . . . . . 28

    1.6.3 Masking Effect . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    1.6.4 Mach Bands . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

    1.7 Digital Picture Coding Standards and Systems . . . . . . . . . . . . . . 31

    1.7.1 JPEG-Still Image Coding Standard . . . . . . . . . . . . . . . 31

    1.7.2 MPEG-Video Coding Standards . . . . . . . . . . . . . . . . . 36

    1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    2 Fundamentals of Human Vision and Vision Modeling 45

    2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    2.2 A Brief Overview of the Visual System . . . . . . . . . . . . . . . . . 45

    2.3 Color Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

    2.3.1 Colorimetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    2.3.2 Color Appearance, Color Order Systems and Color Difference . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    2.4 Luminance and the Perception of Light Intensity . . . . . . . . . . . . . 55

    2.4.1 Luminance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    2.4.2 Perceived Intensity . . . . . . . . . . . . . . . . . . . . . . . . 57

    2.5 Spatial Vision and Contrast Sensitivity . . . . . . . . . . . . . . . . . . 59

    2.5.1 Acuity and Sampling . . . . . . . . . . . . . . . . . . . . . . . 60

    2.5.2 Contrast Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . 62

    2.5.3 Multiple Spatial Frequency Channels . . . . . . . . . . . . . . 64

    2.5.3.1 Pattern adaptation . . . . . . . . . . . . . . . . . . . 65

    2.5.3.2 Pattern detection . . . . . . . . . . . . . . . . . . . . 65

    2.5.3.3 Masking and facilitation . . . . . . . . . . . . . . . . 66

    2.5.3.4 Nonindependence in spatial frequency and orientation 68

    2.5.3.5 Chromatic contrast sensitivity . . . . . . . . . . . . . 70

    2.5.3.6 Suprathreshold contrast sensitivity . . . . . . . . . . 71

    2.5.3.7 Image compression and image difference . . . . . . . 74

    2.6 Temporal Vision and Motion . . . . . . . . . . . . . . . . . . . . . . . 75

    2.6.1 Temporal CSF . . . . . . . . . . . . . . . . . . . . . . . . . . 75


    2.6.2 Apparent Motion . . . . . . . . . . . . . . . . . . . . . . . . . 77

    2.7 Visual Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

    2.7.1 Image and Video Quality Research . . . . . . . . . . . . . . . . 80

    2.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

    3 Coding Artifacts and Visual Distortions 87

    3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    3.2 Blocking Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

    3.2.1 Intraframe Coded Macroblocks . . . . . . . . . . . . . . . . . 90

    3.2.2 Predictive Coded Macroblocks . . . . . . . . . . . . . . . . . . 90

    3.3 Basis Image Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    3.3.1 Visual Significance of Each Basis Image . . . . . . . . . . . . . 92

    3.3.2 Predictive Coded Macroblocks . . . . . . . . . . . . . . . . . . 92

    3.3.3 Aggregation of Major Basis Images . . . . . . . . . . . . . . . 93

    3.4 Blurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

    3.5 Color Bleeding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

    3.6 Staircase Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

    3.7 Ringing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

    3.8 Mosaic Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

    3.8.1 Intraframe Coded Macroblocks . . . . . . . . . . . . . . . . . 100

    3.8.2 Predictive-Coded Macroblocks . . . . . . . . . . . . . . . . . . 101

    3.9 False Contouring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

    3.10 False Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

    3.11 MC Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

    3.12 Mosquito Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

    3.12.1 Ringing-Related Mosquito Effect . . . . . . . . . . . . . . . . 108

    3.12.2 Mismatch-Related Mosquito Effect . . . . . . . . . . . . . . . 109

    3.13 Stationary Area Fluctuations . . . . . . . . . . . . . . . . . . . . . . . 110

    3.14 Chrominance Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . 112

    3.15 Video Scaling and Field Rate Conversion . . . . . . . . . . . . . . . . 113

    3.15.1 Video Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

    3.15.2 Field Rate Conversion . . . . . . . . . . . . . . . . . . . . . . 115


    3.16 Deinterlacing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    3.16.1 Line Repetition and Averaging . . . . . . . . . . . . . . . . . . 117

    3.16.2 Field Repetition . . . . . . . . . . . . . . . . . . . . . . . . . . 117

    3.16.3 Motion Adaptivity . . . . . . . . . . . . . . . . . . . . . . . . 118

    3.16.3.1 Luminance difference . . . . . . . . . . . . . . . . . 118

    3.16.3.2 Median filters . . . . . . . . . . . . . . . . . . . . . 118

    3.16.3.3 Motion compensation . . . . . . . . . . . . . . . . . 119

    3.17 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

    II Picture Quality Assessment and Metrics 123

    4 Video Quality Testing 125

    4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    4.2 Subjective Assessment Methodologies . . . . . . . . . . . . . . . . . . 126

    4.3 Selection of Test Materials . . . . . . . . . . . . . . . . . . . . . . . . 126

    4.4 Selection of Participants Subjects . . . . . . . . . . . . . . . . . . . 128

    4.4.1 Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

    4.4.2 Non-Experts . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

    4.4.3 Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

    4.5 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

    4.5.1 Test Chamber . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

    4.5.2 Common Experimental Mistakes . . . . . . . . . . . . . . . . . 131

    4.6 International Test Methods . . . . . . . . . . . . . . . . . . . . . . . . 132

    4.6.1 Double Stimulus Impairment Scale Method . . . . . . . . . . . 132

    4.6.2 Double Stimulus Quality Scale Method . . . . . . . . . . . . . 137

    4.6.3 Comparison Scale Method . . . . . . . . . . . . . . . . . . . . 141

    4.6.4 Single Stimulus Methods . . . . . . . . . . . . . . . . . . . . . 142

    4.6.5 Continuous Quality Evaluations . . . . . . . . . . . . . . . . . 143

    4.6.6 Discussion of SSCQE and DSCQS . . . . . . . . . . . . . . . 145

    4.6.7 Pitfalls of Different Methods . . . . . . . . . . . . . . . . . . . 147

    4.7 Objective Assessment Methods . . . . . . . . . . . . . . . . . . . . . . 150


    4.7.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

    4.7.2 Requirement for Standards . . . . . . . . . . . . . . . . . . . . 151

    4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

    5 Perceptual Video Quality Metrics A Review 155

    5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    5.2 Quality Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    5.3 Metric Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

    5.4 Pixel-Based Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

    5.5 The Psychophysical Approach . . . . . . . . . . . . . . . . . . . . . . 160

    5.5.1 HVS Modeling Fundamentals . . . . . . . . . . . . . . . . . . 160

    5.5.2 Single-Channel Models . . . . . . . . . . . . . . . . . . . . . . 163

    5.5.3 Multi-Channel Models . . . . . . . . . . . . . . . . . . . . . . 164

    5.6 The Engineering Approach . . . . . . . . . . . . . . . . . . . . . . . . 165

    5.6.1 Full-Reference Metrics . . . . . . . . . . . . . . . . . . . . . . 166

    5.6.2 Reduced-Reference Metrics . . . . . . . . . . . . . . . . . . . 167

    5.6.3 No-Reference Metrics . . . . . . . . . . . . . . . . . . . . . . 167

    5.7 Metric Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

    5.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

    5.7.2 Video Quality Experts Group . . . . . . . . . . . . . . . . . . . 170

    5.7.3 Limits of Prediction Performance . . . . . . . . . . . . . . . . 171

    5.8 Conclusions and Perspectives . . . . . . . . . . . . . . . . . . . . . . . 172

    6 Philosophy of Picture Quality Scale 181

    6.1 Objective Picture Quality Scale for Image Coding . . . . . . . . . . . . 181

    6.1.1 PQS and Evaluation of Displayed Image . . . . . . . . . . . . . 181

    6.1.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

    6.1.3 Construction of a Picture Quality Scale . . . . . . . . . . . . . 182

    6.1.3.1 Luminance coding error . . . . . . . . . . . . . . . . 183

    6.1.3.2 Spatial frequency weighting of errors . . . . . . . . . 183

    6.1.3.3 Random errors and disturbances . . . . . . . . . . . 185

    6.1.3.4 Structured and localized errors and disturbances . . . 186


    6.1.3.5 Principal component analysis . . . . . . . . . . . . . 188

    6.1.3.6 Computation of PQS . . . . . . . . . . . . . . . . . . 189

    6.1.4 Visual Assessment Tests . . . . . . . . . . . . . . . . . . . . . 189

    6.1.4.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . 191

    6.1.4.2 Test pictures . . . . . . . . . . . . . . . . . . . . . . 192

    6.1.4.3 Coders . . . . . . . . . . . . . . . . . . . . . . . . . 193

    6.1.4.4 Determination of MOS . . . . . . . . . . . . . . . . 193

    6.1.5 Results of Experiments . . . . . . . . . . . . . . . . . . . . . . 193

    6.1.5.1 Results of principal component analysis . . . . . . . 193

    6.1.5.2 Multiple regression analysis . . . . . . . . . . . . . . 195

    6.1.5.3 Evaluation of PQS . . . . . . . . . . . . . . . . . . . 195

    6.1.5.4 Generality and robustness of PQS . . . . . . . . . . . 196

    6.1.6 Key Distortion Factors . . . . . . . . . . . . . . . . . . . . . . 197

    6.1.6.1 Characteristics of the principal components . . . . . . 197

    6.1.6.2 Contribution of the distortion factors . . . . . . . . . 197

    6.1.6.3 Other distortion factors . . . . . . . . . . . . . . . . 198

    6.1.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

    6.1.7.1 Limitations in applications . . . . . . . . . . . . . . 198

    6.1.7.2 Visual assessment scales and methods . . . . . . . . 200

    6.1.7.3 Human vision models and image quality metrics . . . 200

    6.1.7.4 Specializing PQS for a specific coding method . . . . 201

    6.1.7.5 PQS in color picture coding . . . . . . . . . . . . . . 201

    6.1.8 Applications of PQS . . . . . . . . . . . . . . . . . . . . . . . 201

    6.1.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

    6.2 Application of PQS to a Variety of Electronic Images . . . . . . . . . . 202

    6.2.1 Categories of Image Evaluation . . . . . . . . . . . . . . . . . 203

    6.2.1.1 Picture spatial resolution and viewing distance . . . . 203

    6.2.1.2 Constancy of viewing distance . . . . . . . . . . . . 205

    6.2.1.3 Viewing angle between adjacent pixels . . . . . . . . 206

    6.2.2 Linearization of the Scale . . . . . . . . . . . . . . . . . . . . 206

    6.2.3 Importance of Center Area of Image in Quality Evaluation . . . 208


    6.2.4 Other Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 209

    6.3 Various Categories of Image Systems . . . . . . . . . . . . . . . . . . 209

    6.3.1 Standard TV Images with Frame Size of about 500 × 640 Pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

    6.3.2 HDTV and Super HDTV . . . . . . . . . . . . . . . . . . . . . 209

    6.3.3 Extra High Quality Images . . . . . . . . . . . . . . . . . . . . 211

    6.3.4 Cellular Phone Type . . . . . . . . . . . . . . . . . . . . . . . 212

    6.3.5 Personal Computer and Display for CG . . . . . . . . . . . . . 213

    6.4 Study at ITU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

    6.4.1 SG9 Recommendations for Quality Assessment . . . . . . . . . 213

    6.4.2 J.143 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

    6.4.2.1 FR scheme . . . . . . . . . . . . . . . . . . . . . . . 215

    6.4.2.2 NR scheme . . . . . . . . . . . . . . . . . . . . . . . 215

    6.4.2.3 RR scheme . . . . . . . . . . . . . . . . . . . . . . . 215

    6.4.3 J.144 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

    6.4.4 J.133 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

    6.4.5 J.146 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

    6.4.6 J.147 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

    6.4.7 J.148 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

    6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

    7 Structural Similarity Based Image Quality Assessment 225

    7.1 Structural Similarity and Image Quality . . . . . . . . . . . . . . . . . 225

    7.2 The Structural SIMilarity (SSIM) Index . . . . . . . . . . . . . . . . . 228

    7.3 Image Quality Assessment Based on the SSIM Index . . . . . . . . . . 233

    7.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

    8 Vision Model Based Digital Video Impairment Metrics 243

    8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

    8.2 Vision Modeling for Impairment Measurement . . . . . . . . . . . . . 247

    8.2.1 Color Space Conversion . . . . . . . . . . . . . . . . . . . . . 248

    8.2.2 Temporal Filtering . . . . . . . . . . . . . . . . . . . . . . . . 249

    8.2.3 Spatial Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 250


    8.2.4 Contrast Gain Control . . . . . . . . . . . . . . . . . . . . . . 251

    8.2.5 Detection and Pooling . . . . . . . . . . . . . . . . . . . . . . 253

    8.2.6 Model Parameterization . . . . . . . . . . . . . . . . . . . . . 254

    8.2.6.1 Parameterization by vision research experiments . . . 254

    8.2.6.2 Parameterization by video quality experiments . . . . 255

    8.3 Perceptual Blocking Distortion Metric . . . . . . . . . . . . . . . . . . 258

    8.3.1 Blocking Dominant Region Segmentation . . . . . . . . . . . . 259

    8.3.1.1 Vertical and horizontal block edge detection . . . . . 261

    8.3.1.2 Removal of edges coexisting in original and processed sequences . . . . . . . . . . . . . . . . . . . . . . 263

    8.3.1.3 Removal of short isolated edges in processed sequence 263

    8.3.1.4 Adjacent edge removal . . . . . . . . . . . . . . . . 263

    8.3.1.5 Generation of blocking region map . . . . . . . . . . 263

    8.3.1.6 Ringing region detection . . . . . . . . . . . . . . . 265

    8.3.1.7 Exclusion of ringing regions from blocking region map 265

    8.3.2 Summation of Distortions in Blocking Dominant Regions . . . 265

    8.3.3 Performance Evaluation of the PBDM . . . . . . . . . . . . . . 266

    8.4 Perceptual Ringing Distortion Measure . . . . . . . . . . . . . . . . . . 269

    8.4.1 Ringing Region Segmentation . . . . . . . . . . . . . . . . . . 271

    8.4.1.1 Modified variance computation . . . . . . . . . . . . 272

    8.4.1.2 Smooth and complex region detection . . . . . . . . 272

    8.4.1.3 Boundary labeling and distortion calculation . . . . . 273

    8.4.2 Detection and Pooling . . . . . . . . . . . . . . . . . . . . . . 274

    8.4.3 Performance Evaluation of the PRDM . . . . . . . . . . . . . . 274

    8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

    9 Computational Models for Just-Noticeable Difference 281

    9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

    9.1.1 Single-Stimulus JNDT Tests . . . . . . . . . . . . . . . . . . . . 282

    9.1.2 JND Tests with Real-World Images . . . . . . . . . . . . . . . 283

    9.1.3 Applications of JND Models . . . . . . . . . . . . . . . . . . . 283

    9.1.4 Objectives and Organization of the Following Sections . . . . . 284


    9.2 JND with DCT Subbands . . . . . . . . . . . . . . . . . . . . . . . . . 285

    9.2.1 Formulation for Base Threshold . . . . . . . . . . . . . . . . . 286

    9.2.1.1 Spatial CSF equations . . . . . . . . . . . . . . . . . 286

    9.2.1.2 Base threshold . . . . . . . . . . . . . . . . . . . . . 287

    9.2.2 Luminance Adaptation Considerations . . . . . . . . . . . . . 289

    9.2.3 Contrast Masking . . . . . . . . . . . . . . . . . . . . . . . . . 291

    9.2.3.1 Intra-band masking . . . . . . . . . . . . . . . . . . 291

    9.2.3.2 Inter-band masking . . . . . . . . . . . . . . . . . . 291

    9.2.4 Other Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

    9.3 JND with Pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

    9.3.1 JND Estimation from Pixel Domain . . . . . . . . . . . . . . . 294

    9.3.1.1 Spatial JNDs . . . . . . . . . . . . . . . . . . . . . . 294

    9.3.1.2 Simplified estimators . . . . . . . . . . . . . . . . . 295

    9.3.1.3 Temporal masking effect . . . . . . . . . . . . . . . 296

    9.3.2 Conversion between Subband- and Pixel-Based JNDs . . . . . . 297

    9.3.2.1 Subband summation to pixel domain . . . . . . . . . 297

    9.3.2.2 Pixel domain decomposition into subbands . . . . . . 298

    9.4 JND Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 298

    9.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

    10 No-Reference Quality Metric for Degraded and Enhanced Video 305

    10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

    10.2 State-of-the-Art for No-Reference Metrics . . . . . . . . . . . . . . . . 306

    10.3 Quality Metric Components and Design . . . . . . . . . . . . . . . . . 307

    10.3.1 Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . . . . 309

    10.3.2 Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 310

    10.3.3 Clipping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

    10.3.4 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

    10.3.5 Contrast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

    10.3.6 Sharpness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

    10.4 No-Reference Overall Quality Metric . . . . . . . . . . . . . . . . . . 313

    10.4.1 Building and Training the NROQM . . . . . . . . . . . . . . . 314


    10.5 Performance of the Quality Metric . . . . . . . . . . . . . . . . . . . . 317

    10.5.1 Testing NROQM . . . . . . . . . . . . . . . . . . . . . . . . . 318

    10.5.2 Test with Expert Viewers . . . . . . . . . . . . . . . . . . . . . 320

    10.6 Conclusions and Future Research . . . . . . . . . . . . . . . . . . . . . 321

    11 Video Quality Experts Group 325

    11.1 Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

    11.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

    11.3 Phase I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

    11.3.1 The Subjective Test Plan . . . . . . . . . . . . . . . . . . . . . 328

    11.3.2 The Objective Test Plan . . . . . . . . . . . . . . . . . . . . . 328

    11.3.3 Comparison Metrics . . . . . . . . . . . . . . . . . . . . . . . 329

    11.3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

    11.4 Phase II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330

    11.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

    11.5 Continuing Work and Directions . . . . . . . . . . . . . . . . . . . . . 332

    11.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

    III Perceptual Coding and Processing of Digital Pictures 335

    12 HVS Based Perceptual Video Encoders 337

    12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

    12.1.1 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

    12.2 Noise Visibility and Visual Masking . . . . . . . . . . . . . . . . . . . 338

    12.3 Architectures for Perceptual Based Coding . . . . . . . . . . . . . . . 340

    12.3.1 Masking Calculations . . . . . . . . . . . . . . . . . . . . . . 343

    12.3.2 Perceptual Based Rate Control . . . . . . . . . . . . . . . . . . 345

    12.3.2.1 Macroblock level control . . . . . . . . . . . . . . . 345

    12.3.2.2 Picture level control . . . . . . . . . . . . . . . . . . 346

    12.3.2.3 GOP level control . . . . . . . . . . . . . . . . . . . 348

    12.3.3 Look Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

    12.4 Standards-Specific Features . . . . . . . . . . . . . . . . . . . . . . . . 352


    12.4.1 Exploitation of Smaller Block Sizes in Advanced Coding Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352

    12.4.1.1 The origin of blockiness . . . . . . . . . . . . . . . . 352

    12.4.1.2 Parameters that affect blockiness visibility . . . . . . 352

    12.4.2 In-Loop Filtering . . . . . . . . . . . . . . . . . . . . . . . . . 356

    12.4.3 Perceptual-Based Scalable Coding Schemes . . . . . . . . . . . 356

    12.5 Salience/Maskability Pre-Processing . . . . . . . . . . . . . . . . . . . 357

    12.6 Application to Multi-Channel Encoding . . . . . . . . . . . . . . . . . 358

    13 Perceptual Image Coding 361

    13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361

    13.1.1 Watson's DCTune . . . . . . . . . . . . . . . . . . . . . . . . 362

    13.1.2 Safranek and Johnston's Subband Image Coder . . . . . . . . . 363

    13.1.3 Hontsch and Karam's APIC . . . . . . . . . . . . . . . . . . . 363

    13.1.4 Chou and Li's Perceptually Tuned Subband Image Coder . . . . 365

    13.1.5 Taubman's EBCOT-CVIS . . . . . . . . . . . . . . . . . . . . 366

    13.1.6 Zeng et al.'s Point-Wise Extended Visual Masking . . . . . . . 366

    13.2 A Perceptual Distortion Metric Based Image Coder . . . . . . . . . . . 368

    13.2.1 Coder Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 368

    13.2.2 Perceptual Image Distortion Metric . . . . . . . . . . . . . . . 369

    13.2.2.1 Frequency transform . . . . . . . . . . . . . . . . . . 369

    13.2.2.2 CSF . . . . . . . . . . . . . . . . . . . . . . . . . . 371

    13.2.2.3 Masking response . . . . . . . . . . . . . . . . . . . 372

    13.2.2.4 Detection . . . . . . . . . . . . . . . . . . . . . . . . 373

    13.2.2.5 Overall model . . . . . . . . . . . . . . . . . . . . . 373

    13.2.3 EBCOT Adaptation . . . . . . . . . . . . . . . . . . . . . . . . 375

    13.3 Model Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

    13.3.1 Test Material . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

    13.3.2 Generation of Distorted Images . . . . . . . . . . . . . . . . . 378

    13.3.3 Subjective Assessment . . . . . . . . . . . . . . . . . . . . . . 379

    13.3.4 Arrangements and Apparatus . . . . . . . . . . . . . . . . . . . 380

    13.3.5 Presentation of Material . . . . . . . . . . . . . . . . . . . . . 381


    13.3.6 Grading Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

    13.3.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383

    13.3.8 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

    13.3.9 Model Optimization . . . . . . . . . . . . . . . . . . . . . . . 386

    13.3.9.1 Full parametric optimization . . . . . . . . . . . . . 389

    13.3.9.2 Algorithmic optimization . . . . . . . . . . . . . . . 390

    13.3.9.3 Coder optimization . . . . . . . . . . . . . . . . . . 391

    13.3.9.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . 392

    13.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 394

    13.4.1 Assessment Material . . . . . . . . . . . . . . . . . . . . . . . 395

    13.4.2 Objective Evaluation . . . . . . . . . . . . . . . . . . . . . . . 395

    13.4.3 Objective Results . . . . . . . . . . . . . . . . . . . . . . . . . 397

    13.4.4 Subjective Evaluation . . . . . . . . . . . . . . . . . . . . . . . 400

    13.4.4.1 Dichotomous FCM . . . . . . . . . . . . . . . . . . 400

    13.4.4.2 Trichotomous FCM . . . . . . . . . . . . . . . . . . 400

    13.4.4.3 Assessment arrangements . . . . . . . . . . . . . . . 401

    13.4.5 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 402

    13.4.5.1 PC versus EBCOT-MSE . . . . . . . . . . . . . . . . 402

    13.4.5.2 PC versus EBCOT-CVIS . . . . . . . . . . . . . . . 406

    13.4.5.3 PC versus EBCOT-XMASK . . . . . . . . . . . . . . 406

    13.4.6 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . 406

    13.5 Perceptual Lossless Coder . . . . . . . . . . . . . . . . . . . . . . . . 412

    13.5.1 Coding Structure . . . . . . . . . . . . . . . . . . . . . . . . . 412

    13.5.2 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 414

    13.5.2.1 Subjective evaluation . . . . . . . . . . . . . . . . . 415

    13.5.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . 416

    13.5.2.3 Discussions . . . . . . . . . . . . . . . . . . . . . . 416

    13.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

    14 Foveated Image and Video Coding 431

    14.1 Foveated Human Vision and Foveated Image Processing . . . . . . . . 431

    14.2 Foveation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434


    14.2.1 Geometric Methods . . . . . . . . . . . . . . . . . . . . . . . . 434

    14.2.2 Filtering Based Methods . . . . . . . . . . . . . . . . . . . . . 436

    14.2.3 Multiresolution Methods . . . . . . . . . . . . . . . . . . . . . 438

    14.3 Scalable Foveated Image and Video Coding . . . . . . . . . . . . . . . 440

    14.3.1 Foveated Perceptual Weighting Model . . . . . . . . . . . . . . 440

    14.3.2 Embedded Foveation Image Coding . . . . . . . . . . . . . . . 445

    14.3.3 Foveation Scalable Video Coding . . . . . . . . . . . . . . . . 447

    14.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452

    15 Artifact Reduction by Post-Processing in Image Compression 459

    15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459

    15.2 Image Compression and Coding Artifacts . . . . . . . . . . . . . . . . 461

    15.2.1 Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . . . . 462

    15.2.2 Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 464

    15.3 Reduction of Blocking Artifacts . . . . . . . . . . . . . . . . . . . . . 465

    15.3.1 Adaptive Postfiltering of Transform Coefficients . . . . . . . . 469

    15.3.1.1 Consideration of masking effect . . . . . . . . . . . . 471

    15.3.1.2 Block activity . . . . . . . . . . . . . . . . . . . . . 473

    15.3.1.3 Adaptive filtering . . . . . . . . . . . . . . . . . . . 473

    15.3.1.4 Quantization constraint . . . . . . . . . . . . . . . . 474

    15.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 475

    15.3.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 477

    15.3.3.1 Results of block classification . . . . . . . . . . . . . 478

    15.3.3.2 Performance evaluation . . . . . . . . . . . . . . . . 478

    15.4 Reduction of Ringing Artifacts . . . . . . . . . . . . . . . . . . . . . . 482

    15.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484

    16 Reduction of Color Bleeding in DCT Block-Coded Video 489

    16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

    16.2 Analysis of the Color Bleeding Phenomenon . . . . . . . . . . . . . . . 490

    16.2.1 Digital Color Video Formats . . . . . . . . . . . . . . . . . . . 490

    16.2.2 Color Quantization . . . . . . . . . . . . . . . . . . . . . . . . 491


    16.2.3 Analysis of Color Bleeding Distortion . . . . . . . . . . . . . . 492

    16.3 Description of the Post-Processor . . . . . . . . . . . . . . . . . . . . . 495

    16.4 Experimental Results – Concluding Remarks . . . . . . . . . . . . . 499

    17 Error Resilience for Video Coding Service 503

    17.1 Introduction to Error Resilient Coding Techniques . . . . . . . . . . . . 503

    17.2 Error Resilient Coding Methods Compatible with MPEG-2 . . . . . . . 504

    17.2.1 Temporal Localization . . . . . . . . . . . . . . . . . . . . . . 504

    17.2.2 Spatial Localization . . . . . . . . . . . . . . . . . . . . . . . 506

    17.2.3 Concealment . . . . . . . . . . . . . . . . . . . . . . . . . . . 506

    17.2.4 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509

    17.3 Methods for Concealment of Cell Loss . . . . . . . . . . . . . . . . . . 513

    17.3.1 Spatial Concealment . . . . . . . . . . . . . . . . . . . . . . . 513

    17.3.2 Temporal Concealment . . . . . . . . . . . . . . . . . . . . . . 513

    17.3.3 The Boundary Matching Algorithm (BMA) . . . . . . . . . . . 517

    17.3.4 Decoder Motion Vector Estimation (DMVE) . . . . . . . . . . 520

    17.3.5 Extension of DMVE algorithm . . . . . . . . . . . . . . . . . . 522

    17.4 Experimental Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 523

    17.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 524

    17.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527

    18 Critical Issues and Challenges 543

    18.1 Picture Coding Structures . . . . . . . . . . . . . . . . . . . . . . . . . 543

    18.1.1 Performance Criteria . . . . . . . . . . . . . . . . . . . . . . . 545

    18.1.2 Complete vs. Over-Complete Transforms . . . . . . . . . . . . 549

    18.1.3 Decisions, Decisions . . . . . . . . . . . . . . . . . . . . . . . 551

    18.2 Vision Modeling Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 554

    18.3 Spatio-Temporal Masking in Video Coding . . . . . . . . . . . . . . . 558

    18.4 Picture Quality Assessment . . . . . . . . . . . . . . . . . . . . . . . . 559

    18.4.1 Picture Quality Metrics Design Approaches . . . . . . . . . . . 559

    18.4.2 Alternative Assessment Methods and Issues . . . . . . . . . . . 560

    18.4.3 More Challenges in Picture Quality Assessment . . . . . . . . . 561


    18.5 Challenges in Perceptual Coder Design . . . . . . . . . . . . . . . . . 562

    18.5.1 Incorporating HVS in Existing Coders . . . . . . . . . . . . . . 562

    18.5.2 HVS Inspired Coders . . . . . . . . . . . . . . . . . . . . . . . 563

    18.5.3 Perceptually Lossless Coding . . . . . . . . . . . . . . . . . . 565

    18.6 Codec System Design Optimization . . . . . . . . . . . . . . . . . . . 566

    18.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566

    A VQM Performance Metrics 575

    A.1 Metrics Relating to Model Prediction Accuracy . . . . . . . . . . . . 576

    A.2 Metrics Relating to Prediction Monotonicity of a Model . . . . . . . . 580

    A.3 Metrics Relating to Prediction Consistency . . . . . . . . . . . . . . . 581

    A.4 MATLAB Source Code . . . . . . . . . . . . . . . . . . . . . . . . . 583

    A.5 Supplementary Analyses . . . . . . . . . . . . . . . . . . . . . . . . . 591


  • Part I

    Picture Coding and Human Visual System Fundamentals


  • Chapter 1

    Digital Picture Compression and Coding Structure

    Jae Jeong Hwang, Hong Ren Wu and K.R. Rao

    Kunsan National University, Republic of Korea; Royal Melbourne Institute of Technology, Australia; University of Texas at Arlington, U.S.A.

    1.1 Introduction to Digital Picture Coding

    Digital video services have become an integral part of the entertainment, education, broadcasting, communication, and business arenas [Say00, Bov05, PE02, PC00, GW02, Ric03, Gha03, Gib97]. Digital camcorders are preferred over analog ones in the consumer market for their convenience and high quality. Still images and moving video taken by digital cameras can be stored, displayed, edited, printed or transmitted via the Internet. Digital television appeals strongly to the TV audience and is displacing analog television receivers from the market. Digital video and images are simply alternative means of carrying the same information as their analog counterparts. An ideal analog recorder should exactly record natural phenomena in the form of video, image or audio. An ideal digital recorder has to do the same work, with a number of added advantages such as interactivity, flexibility, and compressibility. Although ideal conditions seldom prevail in practice for either analog or digital techniques, digital compression is one of the techniques used to lower the cost of a video system while maintaining the same quality of service.

    Data compression is a process that yields a compact representation of a signal in digital format. For delivery or transmission of information, the key issue is to minimize the bit rate, that is, the number of bits per second in a real-time delivery system such as a video stream, or the number of bits per picture element (pixel or pel) in a static image. Digital data contains huge amounts of information. Full motion video, e.g., in NTSC format at 30 frames per second (fps) and at 720 x 480 pixel resolution, generates data for the luminance component at 10.4 Mbytes/sec, assuming 8 bits per sample quantization. If we include the color components of a 4:2:2 format, a data rate of 20.8 Mbytes/sec is needed, allowing only 31 seconds of video storage on a 650 Mbyte CD-ROM. A storage capacity of up to 74 minutes is only possible by means of compression technology; a short numerical check of these figures is sketched after the list below. How, then, can the video be compressed? There is considerable statistical redundancy in the signal.

    Spatial correlation: Within a single two-dimensional image plane, there usually exists significant correlation among neighboring samples.

    Temporal correlation: For temporal data, such as moving video, there usually exists significant correlation among samples in adjacent frames along the temporal direction.

    Spectral correlation: For multispectral images, such as satellite images, there usually exists significant correlation among different frequency bands.
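    As a quick check of the raw-rate figures quoted above, the data rate of uncompressed video follows directly from the sampling parameters. The short Python sketch below is illustrative only; the frame size, frame rate, sample depth and CD-ROM capacity are simply the values assumed in the text.

    # Raw data rate of uncompressed NTSC-resolution video (values assumed in the text).
    width, height = 720, 480            # pixels per frame
    fps = 30                            # frames per second
    bytes_per_sample = 1                # 8 bits per sample

    luma_rate = width * height * fps * bytes_per_sample     # luminance only: ~10.4 Mbytes/sec
    total_rate = 2 * luma_rate                               # 4:2:2 adds Cb and Cr at half width each

    cdrom_capacity = 650e6                                   # 650 Mbyte CD-ROM
    seconds_on_cdrom = cdrom_capacity / total_rate           # ~31 seconds of uncompressed video
    print(luma_rate, total_rate, seconds_on_cdrom)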

    Original video/image data containing any kind of correlation or redundancy can be compressed by appropriate techniques, such as predictive or transform based coding, which inherently reduce that correlation. Image compression aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible, while video compression removes temporal redundancy as well. This is called redundancy reduction, the principle behind compression. Another important principle behind compression is irrelevancy reduction, the removal of information that will not be noticed by the signal receiver, namely the Human Visual System (HVS). In terms of reproduction quality at the decoder, compression techniques fall into two classes: lossless compression and lossy compression. In lossless compression schemes, the reconstructed image after compression is numerically identical to the original image; this is also referred to as a reversible process. However, lossless compression can only achieve a modest amount of compression, depending on the amount of data correlation. An image reconstructed following lossy compression contains degradation relative to the original, often because the compression scheme completely discards redundant information. However, lossy schemes are capable of achieving much higher compression. Visually lossless coding is achieved if no visible loss is perceived by human viewers under normal viewing conditions. The different classes of compression techniques with respect to statistical redundancy and irrelevancy (or psychovisual redundancy) reductions are illustrated in Figure 1.1.

    Another classification, in terms of coding techniques, is based on prediction or transformation. In predictive coding, information already sent or available is used to predict future values, and the difference is coded and transmitted. Prediction can be performed in any domain, but is usually done in the image or spatial domain. It is relatively simple to implement and is readily adapted to local image characteristics. Differential Pulse Code Modulation (DPCM) is one particular example of predictive coding in the spatial or time domain.


    Figure 1.1: Illustration of digital picture compression fundamental concepts.

    Transform coding, on the other hand, first transforms the image from its spatial domain representation to a different type of representation using some well-known transforms, such as the DCT and DWT (see details in Section 1.3), and then encodes the transformed values (coefficients). This method provides greater data compression compared to predictive methods, although at the expense of higher computational complexity.
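    As a minimal, self-contained illustration of the transform coding idea (not any particular coder described in this book), the Python sketch below transforms an 8 x 8 image block with a 2-D DCT, coarsely quantizes the coefficients and reconstructs the block; the uniform quantizer step size is an arbitrary choice for illustration.

    import numpy as np
    from scipy.fft import dctn, idctn      # separable 2-D DCT and inverse DCT

    block = np.random.randint(0, 256, (8, 8)).astype(float)   # stand-in for an 8x8 image block

    coeff = dctn(block, norm='ortho')          # forward transform to the DCT domain
    step = 16.0                                # illustrative uniform quantizer step size
    coeff_q = np.round(coeff / step) * step    # quantization: the lossy stage
    recon = idctn(coeff_q, norm='ortho')       # inverse transform back to the pixel domain

    distortion = np.mean((block - recon) ** 2) # MSE introduced by the quantizer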

    As a result of the quantization process, errors or distortions inevitably appear in the decoded picture. Distortion measures can be divided into two categories: subjective and objective measures. A measure is said to be subjective if the quality is evaluated by human observers. The use of human assessors, however, is quite impractical and does not guarantee objectivity, since the assessment is not stationary and depends on the viewers' disposition. Moreover, the definition of distortion depends highly on the application, i.e., the best quality evaluation is not always made by people at all.

    In objective measures, the distortion is calculated as the difference between the original image, x_o, and the reconstructed image, x_r, by a predefined function. It is assumed that the original image is perfect; all changes are considered occurrences of distortion, no matter how they appear to a human observer. The quantitative distortion of the reconstructed image is commonly measured by the mean square error (MSE), the mean absolute error (MAE), and the peak-to-peak signal to noise ratio (PSNR):

    MSE = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left( x_o[m,n] - x_r[m,n] \right)^2        (1.1)

    MAE = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left| x_o[m,n] - x_r[m,n] \right|        (1.2)

    PSNR = 10 \log_{10} \frac{255^2}{MSE}        (1.3)

    where M and N are the height and the width of the image, respectively, and (1.3) is defined for an 8 bits/pixel monochrome image representation.
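    The three measures translate directly into code. A small numpy sketch follows, assuming x_o and x_r are 8-bit greyscale arrays of identical size; the function names are illustrative only.

    import numpy as np

    def mse(xo, xr):
        # Mean square error, Eq. (1.1)
        return np.mean((xo.astype(float) - xr.astype(float)) ** 2)

    def mae(xo, xr):
        # Mean absolute error, Eq. (1.2)
        return np.mean(np.abs(xo.astype(float) - xr.astype(float)))

    def psnr(xo, xr):
        # Peak-to-peak signal-to-noise ratio in dB, Eq. (1.3), for 8-bit images
        return 10.0 * np.log10(255.0 ** 2 / mse(xo, xr))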

    These measures are widely used in the literature. Unfortunately, they do not always coincide with the evaluations of a human expert. The human eye, for example, does not notice small changes of intensity between individual pixels, but it is sensitive to changes in the average value and contrast over larger regions. Thus, one approach would be to calculate local properties, such as the mean values and variances of small regions in the image, and then compare them between the original and the reconstructed images, as sketched below. Another deficiency of these distortion functions is that they measure only local, pixel-by-pixel differences and do not consider global artifacts, such as blockiness, blurring, jaggedness of the edges, ringing or any other type of structural degradation.
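    A hedged sketch of that region-based idea is given below: it compares the mean and variance of co-located blocks of the original and reconstructed images instead of individual pixels. The 8 x 8 block size and the simple averaging of the differences are arbitrary choices, not a method prescribed by this chapter.

    import numpy as np

    def local_stats_difference(xo, xr, block=8):
        # Average absolute difference of block means and block variances
        # between the original (xo) and reconstructed (xr) images.
        M, N = xo.shape
        d_mean, d_var, count = 0.0, 0.0, 0
        for m in range(0, M - block + 1, block):
            for n in range(0, N - block + 1, block):
                bo = xo[m:m + block, n:n + block].astype(float)
                br = xr[m:m + block, n:n + block].astype(float)
                d_mean += abs(bo.mean() - br.mean())
                d_var += abs(bo.var() - br.var())
                count += 1
        return d_mean / count, d_var / count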

    1.2 Characteristics of Picture Data

    1.2.1 Digital Image Data

    A digital image is visual information represented in a discrete form, suitable for digital electronic storage and transmission. It is obtained by image sampling techniques, in which a discrete array x[m, n] is extracted from the continuous image field at some time instant over some rectangular area M x N. The digitized brightness value is called the grey level value. Each image sample is a picture element, called a pixel or a pel. Thus, a two-dimensional (2-D) digital image is defined as:

    x[m,n] = \begin{bmatrix}
        x[0,0]   & x[0,1]   & \cdots & x[0,N-1]   \\
        x[1,0]   & x[1,1]   & \cdots & x[1,N-1]   \\
        \vdots   & \vdots   & \ddots & \vdots     \\
        x[M-1,0] & x[M-1,1] & \cdots & x[M-1,N-1]
    \end{bmatrix}        (1.4)

    where its array of image samples is defined on the two-dimensional Cartesian coordinate system, as illustrated in Figure 1.2. The number of bits, b, needed to store an image of size M x N with 2^q different grey levels is b = M x N x q. That is, to store a typical image of size 512 x 512 with 256 grey levels (q = 8), we need 2,097,152 bits or 262,144 bytes. We may try to reduce the factors M, N or q to save storage capacity or transmission bits, but this is not compression, since it results in a significant loss in the quality of the picture.

  • 1.2. Characteristics of Picture Data 7

    Figure 1.2: Geometric relationship between the Cartesian coordinate system and its array of image samples.

    1.2.2 Digital Video Data

    A natural video stream is continuous in both the spatial and temporal domains. In order to represent and process a video stream digitally, it is necessary to sample it both spatially and temporally, as shown in Figure 1.3. An image sampled in the spatial domain is typically represented on a rectangular grid, and a video stream is a series of still images sampled at regular intervals in time. Each such still image is usually called a frame. When processing a video signal in a television format, a pair of fields is interlaced to construct a frame; for a non-interlaced (frame-based) video signal, it is called a picture. Each spatio-temporal sample, or pixel, is represented as a positive digital number that describes the brightness (luminance) and color components.

    Figure 1.3: Three dimensional (spatial and temporal) domain in a video stream.

    A natural video scene is captured, typically with a camera, and converted to a sampled digital representation, as shown in Figure 1.4. Digital video is represented in a digital color-difference format YC1C2 rather than in the original RGB natural color format. It may then be handled in the digital domain in a number of ways, including processing, storage and transmission. At the final output of the system, it is displayed to a viewer by reproducing it on a video monitor.

    The RGB (red, green, and blue) color space is the basic choice for computer graphics and image frame buffers, because color CRTs use red, green, and blue phosphors as the three primary additive colors to create the desired color. Individual components are added together to form a color, and an equal addition of all components produces white. However, RGB is not very efficient for representing real-world images, since equal bandwidths are required to describe all three color components.


    Figure 1.4: Digital representation and color format conversion of natural video stream.

    The equal bandwidths result in the same pixel depth and display resolution for each color component. Using 8 bits per component requires 24 bits of information per pixel, three times the capacity of the luminance component alone. Moreover, the human eye is less sensitive to the color components than to the luminance component. For these reasons, many image coding standards and broadcast systems use luminance and color-difference signals: for example, YUV and YIQ for the analog television standards and YCbCr for their digital version.

    The YCbCr format, recommended by ITU-R BT.601 [ITU82] as a worldwide video component standard, is obtained from digital gamma-corrected RGB signals as follows:

    Y  =  0.299R + 0.587G + 0.114B
    Cb = -0.169R - 0.331G + 0.500B        (1.5)
    Cr =  0.500R - 0.419G - 0.081B

    The color-difference signals are given by:

    (B - Y) = -0.299R - 0.587G + 0.886B
    (R - Y) =  0.701R - 0.587G - 0.114B        (1.6)

    where the values of (B - Y) have a range of ±0.886 and those of (R - Y) a range of ±0.701, while Y has a range of 0 to 1.

    To restore the signal excursion of the color-difference signals to unity (-0.5 to +0.5), (B - Y) is multiplied by a factor of 0.564 (0.5 divided by 0.886) and (R - Y) is multiplied by a factor of 0.713 (0.5 divided by 0.701). Thus, Cb and Cr are the re-normalized blue and red color-difference signals, respectively.

    Given that the luminance signal is to occupy 220 levels (16 to 235), the luminance signal has to be scaled to obtain the digital value, Yd. Similarly, the color-difference signals are to occupy 224 levels with their zero level at level 128. The digital representations of the three components are expressed as [NH95]:


    Yd = 219Y + 16
    Cb = 224[0.564(B - Y)] + 128 = 126(B - Y) + 128        (1.7)
    Cr = 224[0.713(R - Y)] + 128 = 160(R - Y) + 128

    or in vector form:

    \begin{bmatrix} Y_d \\ C_b \\ C_r \end{bmatrix} =
    \begin{bmatrix}
         65.481 & 128.553 &  24.966 \\
        -37.797 & -74.203 & 112.000 \\
        112.000 & -93.786 & -18.214
    \end{bmatrix}
    \begin{bmatrix} R \\ G \\ B \end{bmatrix} +
    \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}        (1.8)

    where the corresponding level number after quantization is the nearest integer.
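    The matrix form (1.8) maps gamma-corrected RGB values, normalized to the range [0, 1], directly to digital YCbCr levels. A brief Python sketch of this conversion follows; it simply restates the matrix above and rounds to the nearest integer level.

    import numpy as np

    # Matrix and offset from Eq. (1.8); R, G, B are gamma-corrected values in [0, 1].
    A = np.array([[ 65.481, 128.553,  24.966],
                  [-37.797, -74.203, 112.000],
                  [112.000, -93.786, -18.214]])
    offset = np.array([16.0, 128.0, 128.0])

    def rgb_to_ycbcr(rgb):
        # rgb has shape (..., 3); returns integer (Yd, Cb, Cr) levels.
        return np.rint(rgb @ A.T + offset).astype(int)

    # Example: pure white maps to (Yd, Cb, Cr) = (235, 128, 128).
    print(rgb_to_ycbcr(np.array([1.0, 1.0, 1.0])))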

    Video transmission bit rate is decreased by adopting lower sampling rates while preserving acceptable video quality. Given an image resolution of 720 x 576 pixels, with each sample represented by 8 bits, the required bit rate is calculated as:

    4:4:4 resolution: 720 x 576 x 8 x 3 = 10 Mbits/frame
                      10 Mbits/frame x 29.97 frames/sec = 300 Mbits/sec

    4:2:0 resolution: (720 x 576 x 8) + (360 x 288 x 8) x 2 = 5 Mbits/frame
                      5 Mbits/frame x 29.97 frames/sec = 150 Mbits/sec

    The 4:2:0 version requires half as many bits as the 4:4:4 version, but compression is still necessary for transmission and storage.
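    The same arithmetic can be checked with a few lines of Python; the frame size, bit depth and frame rate below are just the values assumed in the text.

    width, height, bits = 720, 576, 8
    fps = 29.97

    bits_444 = width * height * bits * 3                                         # luma + two full-resolution chroma
    bits_420 = width * height * bits + 2 * (width // 2) * (height // 2) * bits   # luma + two quarter-resolution chroma

    print(bits_444 / 1e6, bits_444 * fps / 1e6)   # ~10 Mbits/frame, ~300 Mbits/sec
    print(bits_420 / 1e6, bits_420 * fps / 1e6)   # ~5 Mbits/frame,  ~150 Mbits/sec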

    1.2.3 Statistical Analysis

    The mean value of the discrete image array x, as defined in (1.4), expressed conveniently in vector-space form, is given by

    \bar{x} = E\{x\} = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x[m,n] = \sum_{k=0}^{2^b-1} x_k \, p(x_k)        (1.9)

    where x_k denotes the k-th grey level, which varies from 0 to the maximum level 2^b - 1 defined by the number of quantization bits b, and p(x_k) = n_k / (MN) is the probability of occurrence of x_k.

    The variance function of the image array, x, is defined as

    \sigma_x^2 = \frac{1}{MN} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left( x[m,n] - \bar{x} \right)^2 = \sum_{k=0}^{2^b-1} \left( x_k - \bar{x} \right)^2 p(x_k)
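    Both the mean and the variance can be computed either directly over the M x N array or from the grey-level histogram, as in (1.9) and the expression above. A small numpy sketch (assuming an 8-bit unsigned-integer image, i.e., b = 8) illustrates the equivalence; the function name is illustrative only.

    import numpy as np

    def mean_and_variance(x, b=8):
        # x: 2-D array of unsigned integer grey levels in [0, 2**b - 1].
        direct = (x.mean(), x.var())                 # direct computation over the array

        # Histogram-based computation: p(x_k) = n_k / (M N).
        levels = np.arange(2 ** b)
        p = np.bincount(x.ravel(), minlength=2 ** b) / x.size
        mean_h = np.sum(levels * p)
        var_h = np.sum((levels - mean_h) ** 2 * p)
        return direct, (mean_h, var_h)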