RATE-DISTORTION BASED
VIDEO COMPRESSION
Optimal Video Frame Compression and Object Boundary Encoding
Guido M. SCHUSTER U.S. Robotics
Skokie, Illinois, USA
and
Aggelos K. KATSAGGELOS Northwestern University
Evanston, Illinois, USA
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress
ISBN 978-1-4419-5172-4 ISBN 978-1-4757-2566-7 (eBook) DOI 10.1007/978-1-4757-2566-7
Printed on acid-free paper
All Rights Reserved
© 1997 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 1997
Softcover reprint of the hardcover 1st edition 1997
No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.
CONTENTS
LIST OF FIGURES
LIST OF TABLES
PREFACE
1 INTRODUCTION
1.1 Motivation for video compression
1.2 Problem Statement
1.3 Contributions
1.4 Overview
1.5 Summary
2 REVIEW OF LOSSY VIDEO COMPRESSION
2.1 Lossless versus lossy compression
2.2 Motion compensated waveform coding
2.3 Three dimensional waveform coding
2.4 Model-based video coding
2.5 Summary
3 BACKGROUND
3.1 Rate distortion theory
3.2 Operational rate distortion theory
3.3 Lagrangian multiplier method
3.4 Dynamic programming
3.5 Shortest path algorithm
3.6 Summary
4 GENERAL CONTRIBUTIONS
4.1 Optimal bit allocation for dependent quantizers using the minimum total distortion criterion
4.2 Very fast convex search based on a Bezier curve
4.3 Optimal bit allocation for dependent quantizers using the minimum maximum distortion criterion
4.4 Optimal scanning path for a quad-tree decomposition
4.5 Optimal quad-tree decomposition with leaf dependencies
4.6 Summary
5 OPTIMAL MOTION ESTIMATION AND MOTION COMPENSATED INTERPOLATION FOR VIDEO COMPRESSION
5.1 Optimal region matching
5.2 Optimal QT-based motion estimator
5.3 Motion compensated interpolation
5.4 Summary
6 A VIDEO COMPRESSION SCHEME WITH OPTIMAL BIT ALLOCATION BETWEEN DISPLACEMENT VECTOR FIELD AND DISPLACED FRAME DIFFERENCE
6.1 Introduction
6.2 Notation and assumptions
6.3 Lossless MCVC
6.4 Lossy MCVC
6.5 The minimum maximum distortion approach
6.6 A video compression scheme with optimal bit allocation between DVF and DFD
6.7 Implementation Issues
6.8 Experiments
6.9 Summary
7 A VIDEO COMPRESSION SCHEME WITH OPTIMAL BIT ALLOCATION AMONG SEGMENTATION, MOTION AND RESIDUAL ERROR
7.1 Introduction
7.2 Notation and assumptions
7.3 Lossless VBSMCVC
7.4 Lossy VBSMCVC
7.5 Implementation
7.6 Experimental Results
7.7 Summary
8 AN OPTIMAL POLYGONAL BOUNDARY ENCODING SCHEME
8.1 Introduction
8.2 Problem Formulation
8.3 Distortion measures based on the maximum operator
8.4 Distortion measures based on the summation operator
8.5 Including secondary objectives
8.6 Extension of the admissible vertex set to off-boundary pixels
8.7 Multiple boundary encoding
8.8 Vertex encoding scheme
8.9 Experimental Results
8.10 Summary
REFERENCES
INDEX
LIST OF FIGURES
Chapter 1
1.1 Block diagram of a motion compensated video coder
1.2 Decomposition of the original sequence
1.3 Operational rate distortion curve
Chapter 2
2.1 Tradeoff space of video compression
2.2 Generic motion compensated video coder
2.3 Block matching for estimating the DVF
2.4 Zig-zag scan for the DCT coefficients
2.5 Block diagram of a generic MC-DCT coder
2.6 A three level pyramid video coding scheme
Chapter 3
3.1 Communication system
3.2 Operational rate distortion function
3.3 Bisection method
3.4 Rate distortion plane
3.5 Example trellis
3.6 Topologically sorted trellis
3.7 Different stages of the shortest path algorithm
Chapter 4
4.1 Trellis for the image compression example
4.2 Continuous rate distortion curve
4.3 The R*(Dmax) function
4.4 Macro block MSE comparison
4.5 Macro block quantizer step size comparison
4.6 Comparison between Y channels of the three approaches
4.7 Frame segmented by a quad-tree
4.8 Quad-tree representation of the frame
4.9 Quad-tree notation
4.10 Recursive raster scan
4.11 Different scanning paths
4.12 Completely decomposed quad-tree
4.13 Recursive definition of the scanning path
4.14 Corrected raster scan
4.15 Optimal scanning path
4.16 Hilbert curve definition
4.17 Recursive Hilbert curve generation
4.18 The multilevel trellis for N = 5 and n0 = 3
4.19 Recursive rule for generating the "from" and "to" sets
4.20 Recursive distribution of the quad-tree encoding cost
4.21 Optimal path
4.22 Optimal Quad-tree decomposition
4.23 Modified Hilbert scan for a QCIF image
4.24 First frame encoded by H.263
4.25 First frame encoded by optimal mean value QT decomposition
4.26 Segmentation of the first frame encoded by QT decomposition
Chapter 5
5.1 The original 176-th and 180-th frames
5.2 The predicted frame and the DVF for TMN4 block matching
5.3 The predicted frame and the DVF when the rate is matched
5.4 The predicted frame and the DVF when the distortion is matched
5.5 Modified Hilbert scan for level n0 = 3 of a QCIF frame
5.6 The predicted frame and the DVF when the rate is matched for the QT-based scheme
5.7 The overall scanning path
5.8 The predicted frame and the DVF when the distortion is matched for the QT-based scheme
5.9 Motion compensated interpolation of the first frame
5.10 The 84-th, 86-th and 88-th reconstructed frames of the "Miss America" sequence
5.11 The interpolated frame for the "Miss America" sequence
5.12 The QT segmentation and the DVF of the 86-th interpolated frame of the "Miss America" sequence
5.13 The 176-th, 178-th and 180-th reconstructed frames of the "Mother and Daughter" sequence
5.14 The motion compensated interpolation result for the "Mother and Daughter" sequence
5.15 The QT segmentation and the DVF of the 178-th interpolated frame of the "Mother and Daughter" sequence
Chapter 6
6.1 The trellis of the lossless MCVC example
6.2 The neighborhood needed for TMN4
6.3 Rate comparison between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder.
6.4 Rate difference between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder.
6.5 Distortion comparison between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder.
6.6 Distortion difference between TMN4 and the proposed coder, where the TMN4 distortion is the target distortion of the proposed coder.
6.7 The 12th reconstructed frame of the "Mother and Daughter" sequence
6.8 The optimal mode selection for the 16th frame of the "Mother and Daughter" sequence
6.9 The optimal quantizer selection for the 16th frame of the "Mother and Daughter" sequence
6.10 The optimal motion vector field for the 16th frame of the "Mother and Daughter" sequence.
6.11 Rate comparison between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder.
6.12 Rate difference between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder.
6.13 Distortion comparison between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder.
6.14 Distortion difference between TMN4 and the proposed coder, where the TMN4 rate is the target rate of the proposed coder.
6.15 Rate comparison between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed.
6.16 Rate difference between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed.
6.17 Distortion comparison between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed.
6.18 Distortion difference between TMN4 and the proposed coder, where the distortion of the proposed coder is fixed.
Chapter 7
7.1 The multilevel trellis for N = 5 and n0 = 3
7.2 Rate comparison between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder.
7.3 Rate difference between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder.
7.4 Distortion comparison between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder.
7.5 Distortion difference between TMN4 and the optimal coder, where the TMN4 distortion is the target distortion of the optimal coder.
7.6 The 12th reconstructed frame of the "Mother and Daughter" sequence. This frame is used to predict the 16th frame.
7.7 The optimal mode selection for the 16th frame of the "Mother and Daughter" sequence
7.8 The optimal motion vector field for the 16th frame of the "Mother and Daughter" sequence.
7.9 Rate comparison between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder.
7.10 Rate difference between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder.
7.11 Distortion comparison between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder.
7.12 Distortion difference between TMN4 and the optimal coder, where the TMN4 rate is the target rate of the optimal coder.
7.13 Rate comparison between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed.
7.14 Rate difference between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed.
7.15 Distortion comparison between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed.
7.16 Distortion difference between TMN4 and the optimal coder, where the distortion of the optimal coder is fixed.
7.17 Optimal quad-tree segmentation and encoding modes for the 80th frame of the "Miss America" sequence
7.18 Optimal inhomogeneous motion vector field for the 80th frame of the "Miss America" sequence
Chapter 8
8.1 Interpretation of the boundary and the polygon approximation as a fully connected weighted directed graph
8.2 Examples of polygons with rapid changes in direction.
8.3 Interpretation of the boundary and the polygon approximation as a weighted directed graph
8.4 The R*(Dmax) function, which is a non-increasing function exhibiting a staircase characteristic
8.5 Pruned decision tree for the encoding of a boundary
8.6 The "band" concept
8.7 Vector increments for the ordering of the admissible vertices
8.8 Distance as a function of the index
8.9 Result of the ordering algorithm
8.10 Pruned decision tree for the optimal encoding of three boundaries
8.11 Improved orientation encoding
8.12 Original segmentation
8.13 Optimal segmentation for Dmax = 1 pixel
8.14 Optimal segmentation for Rmax = 280 bits
8.15 Closeup of the lower boundary
8.16 Lagrangian multiplier approach
8.17 Pruning approach
8.18 Comparison between the Lagrangian approach and the pruning approach
8.19 Operational rate distortion function
8.20 Optimal extended segmentation for Dmax = 1 pixel
LIST OF TABLES
Chapter 4
4.1 Statistics of the two paradigms
4.2 Code word length for DC prediction error
4.3 Comparison between fixed block sizes
Chapter 5
5.1 Comparison between optimal motion estimators
Chapter 6
6.1 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN4 and the proposed coder for different modes of operation
6.2 Average rate comparison for the "Mother and Daughter" sequence between TMN4 and the distortion matched proposed coder with differently constrained search spaces
6.3 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN6 and the proposed coder for different modes of operation
Chapter 7
7.1 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN4 and the proposed optimal coder for different modes of operation
7.2 Average rate distortion comparison for the "Mother and Daughter" sequence between TMN6 and the proposed optimal coder for different modes of operation
PREFACE
One of the most intriguing problems in video processing is the removal of the redundancy or the compression of a video signal. There are a large number of applications which depend on video compression. Data compression represents the enabling technology behind the multimedia and digital television revolution.
In motion compensated lossy video compression the original video sequence is first split into three new sources of information: segmentation, motion and residual error. These three information sources are then quantized, leading to a reduced rate for their representation but also to a distorted reconstructed video sequence. Once the decomposition of the original source into segmentation, motion and residual error information has been decided, the key remaining problem is the allocation of the available bits among these three sources of information. In this monograph a theory is developed which provides a solution to this fundamental bit allocation problem. It can be applied to all quad-tree-based motion compensated video coders which use a first order differential pulse code modulation (DPCM) scheme for the encoding of the displacement vector field (DVF) and a block-based transform scheme for the encoding of the displaced frame difference (DFD). An optimal motion estimator, which results in the smallest DFD energy for a given bit rate for the encoding of the DVF, is also a result of this theory. Such a motion estimator is used to formulate a motion compensated interpolation scheme which incorporates a global smoothness constraint for the DVF.
Several algorithms of a general nature pertaining to the problem mentioned in the previous paragraph are also presented in this monograph. Among them are an optimal bit allocation scheme for dependent quantizers with arbitrary dependencies, a very fast convex search based on a Bezier curve, an optimal bit allocation scheme for dependent quantization using the minimum-maximum distortion criterion, an optimal scanning path for quad-trees, and an optimal quad-tree decomposition with leaf dependencies.
The optimal bit allocation problem is a rather old one in the information theory community. The optimal bit allocation problem as part of the design of a coder and decoder (codec), or the design of a codec in the rate-distortion (R-D) sense, has become quite a popular problem in the last few years. Such a problem becomes extremely critical when codecs are designed to operate at low bit rates, as represented by one of the functionalities of the ongoing standardization effort MPEG4. Various other related encoding problems in the R-D sense have also become quite popular.
One such problem is the encoding of a boundary by either minimizing the bit rate given an acceptable level of distortion, or minimizing the distortion for a given bit budget. Such a problem plays an important role in object-oriented video coding. In this work a polygonal approximation of the original boundary is used and two solutions are developed. With one of them the vertices of the polygon coincide with boundary points, while with the other they are allowed to be located inside a band formed around the original boundary. Several different classes of distortion measures are investigated, which result in different algorithms. Most of these algorithms are based on a weighted directed acyclic graph formulation of the problem.
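The directed acyclic graph view of boundary encoding can be illustrated with a small sketch. It is not the vertex encoding scheme of Chapter 8, only a simplified instance under stated assumptions: the boundary is an open, ordered list of points; polygon vertices are restricted to boundary points; every polygon edge costs one unit of rate (as if each vertex took a fixed number of bits); and the distortion of an edge is the maximum distance from the skipped boundary points to that edge. All function names are hypothetical.

```python
import math


def segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def edge_distortion(boundary, i, j):
    """Maximum distance of the skipped points i+1..j-1 to the edge (i, j)."""
    return max(
        (segment_distance(boundary[k], boundary[i], boundary[j])
         for k in range(i + 1, j)),
        default=0.0,
    )


def min_vertex_polygon(boundary, d_max):
    """Fewest-edge polygon from the first to the last boundary point whose
    edges all have distortion <= d_max (assumed unit rate per edge).

    Boundary points are DAG nodes in boundary order; an edge i -> j (i < j)
    is admissible if its distortion is within d_max. Because the nodes are
    already topologically sorted, one forward pass finds the shortest path.
    """
    n = len(boundary)
    cost = [math.inf] * n
    prev = [-1] * n
    cost[0] = 0
    for j in range(1, n):
        for i in range(j):
            if cost[i] + 1 < cost[j] and edge_distortion(boundary, i, j) <= d_max:
                cost[j] = cost[i] + 1
                prev[j] = i
    # Backtrack from the last point to recover the chosen vertex indices.
    path = [n - 1]
    while path[-1] != 0:
        path.append(prev[path[-1]])
    return path[::-1]


# An L-shaped boundary: a tight tolerance keeps the corner as a vertex.
boundary = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(min_vertex_polygon(boundary, 0.5))
```

Replacing the unit edge weight with an actual bit count per vertex would turn the same forward pass into a minimum rate search for a given maximum distortion, which is the flavor of problem the paragraph above describes.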
This monograph was originally written by Guido M. Schuster as a doctoral thesis under the supervision of Prof. Aggelos K. Katsaggelos at Northwestern University. We gratefully acknowledge the financial support from Motorola Inc. and Northwestern University. The discussions with the members of the Motorola Visual Communications Group, especially Steve N. Levine, James C. Brailean, Mark R. Banham, Chueng Auyeung and Kevin O'Connell, have resulted in many important inputs to this research effort. Finally, Guido Schuster wants to thank his wonderful wife, Prof. Dawn Barnes-Schuster. She not only supported him morally, but she also spent many nighttime hours with him in the lab completing this work.
Guido M. Schuster and Aggelos K. Katsaggelos