GEOMETRICAL REPRESENTATION, PROCESSING, AND CODING OF VISUAL INFORMATION
Arthur Luiz Amaral da Cunha, Ph.D.
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign, 2007
Minh N. Do, Adviser
In recent years there has been considerable effort to construct representations for
visual information that exploit geometrical structure. The task of constructing
multidimensional transforms that satisfy some optimality condition, such as nonlinear
approximation, is ongoing. In this dissertation we make several contributions toward
this work.
In the first part of the dissertation, we propose a class of filter banks that have the
property of directional vanishing moments on the filters. This design criterion is an al-
ternative to the often used frequency localization property. The directional vanishing
moment property, we show, ensures that the filter annihilates directional information,
similar to the wavelet filter that annihilates smooth signals in one dimension. Our
technique produces filters that perform similarly to conventional ones, but that have
significantly shorter support size.
In the second part of the dissertation, we propose a multiscale, multidirection, shift-
invariant, and redundant transform that we call the nonsubsampled contourlet transform.
The redundant transform is tailored to applications where overcompleteness is an advan-
tage, for example image denoising and enhancement. This transform is studied in detail
and an associated filter design methodology is developed. The proposed design ensures
that the transform basis functions are regular, directional, and anisotropic.
In the third part of this dissertation we propose a model for the information rates of
the plenoptic function. Samples of the plenoptic function (POF) are seen in video and
in general visual content, and represent large amounts of information. We distinguish
between two cases, depending on whether the spatial positions of samples of the POF
are known.
In the first case, the video coding case, the spatial locations of the POF are not known.
In this case, we propose a stochastic model for video and compute its information rates.
We model camera motion with discrete random walks. We use information theoretic tools
to precisely characterize how such recurrences affect the overall bitrate. Both lossless and
lossy information rates are derived.
In the second case, we use a simple model to show that the information rates of the
POF are equivalent to the information rates of the scene around it.
GEOMETRICAL REPRESENTATION, PROCESSING, AND CODING OF VISUAL INFORMATION
BY
ARTHUR LUIZ AMARAL DA CUNHA
Engenheiro, University of Brasilia, 2000
Mestrado, Pontifical Catholic University of Rio de Janeiro, 2002
DISSERTATION
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Electrical and Computer Engineering
in the Graduate College of the
University of Illinois at Urbana-Champaign, 2007
Urbana, Illinois
ABSTRACT
In recent years there has been considerable effort to construct representations for
visual information that exploit geometrical structure. Such representations have the
potential to improve image and video processing, understanding, and practice on various
fronts including compression, denoising, and enhancement. The task of constructing
multidimensional transforms that satisfy some optimality condition such as nonlinear
approximation is ongoing. In this dissertation we make several contributions toward this
work.
In the first part of the dissertation, we propose a class of filter banks that have the
property of directional vanishing moments on the filters. This property is a generalization
of the vanishing moment property in one-dimensional filter banks, and is characterized
by a simple design criterion. This design criterion is an alternative to the often used
frequency localization property. The directional vanishing moment property, we show,
ensures that the filter annihilates directional information, similar to the wavelet filter
that annihilates smooth signals in one dimension. Our technique produces filters that
perform similarly to conventional ones, but that have significantly shorter support size.
The images reconstructed after coefficient truncation exhibit considerably fewer ringing
artifacts. In denoising experiments, the filters proposed outperform the best ones in the
literature, while being less complex.
In the second part of the dissertation, we propose a multiscale, multidirection, shift-
invariant, and redundant transform that we call the nonsubsampled contourlet transform.
The redundant transform is tailored to applications where overcompleteness is an advan-
tage, for example image denoising and enhancement. This transform is studied in detail
and an associated filter design methodology is developed. The proposed design ensures
that the transform basis functions are regular, directional, and anisotropic. Furthermore,
we propose a fast implementation of the transform and study its application in image
denoising, where the nonsubsampled contourlet transform compares favorably to other
similar decompositions in the literature.
In the third part of this dissertation we propose a model for the information rates of
the plenoptic function. The plenoptic function (Adelson and Bergen, 1991) describes the
visual information available to an observer at any point in space and time. Samples of the
plenoptic function (POF) are seen in video and in general visual content, and represent
large amounts of information. We distinguish between two cases, depending on whether
the spatial positions of samples of the POF are known.
In the first case, the video coding case, the spatial locations of the POF are not known.
In this case, we propose a stochastic model for video and compute its information rates.
The model has two sources of information representing ensembles of camera motion and
visual scene data (i.e., “realities”). The sources of information are combined, generat-
ing a vector process that we study in detail. We model camera motion with discrete
random walks. Recurrences are a key property associated with a random walk, and are
also observed in some video sequences. We use information theoretic tools to precisely
characterize how such recurrences affect the overall bitrate. Both lossless and lossy in-
formation rates are derived. The model is further extended to account for realities that
change over time. We derive bounds on the lossless and lossy information rates for this
dynamic reality model, stating conditions under which the bounds are tight.
In the second case, we use a simple model to show that the information rates of the
POF are equivalent to the information rates of the scene around it. That is, a random
traversal of the plenoptic function for the purpose of rendering at a receiver results in the
information rate of the surrounding scene.
TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 Signal Expansions and Filter Banks
1.1.1 Signal expansions
1.1.2 Filter banks
1.2 The Challenge of Geometry
1.3 The Plenoptic Function and Its Information Rates
1.4 Problem Statement and Contributions
1.5 Dissertation Organization

CHAPTER 2 FILTER BANKS WITH DIRECTIONAL VANISHING MOMENTS
2.1 Introduction
2.2 Directional Annihilating Filters
2.3 Two-channel Filter Banks with Directional Vanishing Moments
2.3.1 Preliminaries
2.3.2 Two-channel filter banks with DVMs
2.3.3 Characterization of the product filter
2.4 Design via Mapping
2.4.1 Design procedure
2.4.2 Filter size analysis
2.4.3 Design examples
2.5 Tree-Structured Filter Banks with Directional Vanishing Moments
2.6 Numerical Experiments
2.6.1 Annihilating directional edges
2.6.2 Nonlinear approximation with the contourlet transform
2.6.3 Image denoising with the contourlet transform
2.7 Conclusion
2.8 The Equivalence Between Ladder and Mapping Designs

CHAPTER 3 THE NONSUBSAMPLED CONTOURLET TRANSFORM: THEORY, DESIGN, AND APPLICATIONS
3.1 Introduction
3.2 Nonsubsampled Contourlets and Filter Banks
3.2.1 The nonsubsampled contourlet transform
3.2.1.1 The nonsubsampled pyramid (NSP)
3.2.1.2 The nonsubsampled directional filter bank (NSDFB)
3.2.1.3 Combining the nonsubsampled pyramid and nonsubsampled directional filter bank in the NSCT
3.2.2 Nonsubsampled filter banks
3.2.3 Frame analysis of the NSCT
3.3 Filter Design and Implementation
3.3.1 Implementation through lifting
3.3.2 Pyramid filter design
3.3.3 Fan filter design
3.3.4 Design examples
3.3.5 Regularity of the NSCT basis functions
3.4 Applications
3.4.1 Image denoising
3.4.1.1 Comparison to other transforms
3.4.1.2 Comparison to other denoising methods
3.5 Conclusion

CHAPTER 4 THE INFORMATION RATES OF THE PLENOPTIC FUNCTION
4.1 Introduction
4.1.1 Background
4.1.2 Prior art
4.1.3 Chapter contributions
4.2 Definitions and Problem Setup
4.2.1 The video coding problem
4.2.2 Properties of the random walk
4.3 Information Rates for a Static Reality
4.3.1 Lossless information rates for discrete memoryless wall
4.3.2 Memory constrained coding
4.3.3 Lossy information rates
4.4 Information Rates for Dynamic Reality
4.4.1 Lossless information rates
4.4.2 Lossy information rates for the AR(1) random field
4.5 The Recording Reality Case
4.5.1 A possible code: Shannon + run-length
4.5.2 Coding with a finite buffer
4.6 Conclusion

CHAPTER 5 CONCLUSION AND FUTURE DIRECTIONS
5.1 Summary
5.2 Future Directions
5.2.1 Filter banks with directional vanishing moments
5.2.2 The nonsubsampled contourlet transform
5.2.3 Complex contourlet transform
5.2.4 Information rates of the plenoptic function

REFERENCES

AUTHOR’S BIOGRAPHY
LIST OF FIGURES

1.1 Pictorial description of a problem studied in this dissertation. There is a camera following a random trajectory through the plenoptic function. The camera motion adds to the complexity of the dynamic scene being continuously acquired. The underlying scene is either static, or contains moving objects, or changes over time.

1.2 Filter design using the mapping approach. The 1-D filter bank is mapped to a 2-D filter bank. The mapping function is such that important properties of the 1-D filter bank such as phase linearity and perfect reconstruction are preserved.

2.1 The directional polyphase representation. Here u = (2, 1)^T and r1 = (0, 0)^T, r2 = (1, 1)^T, r3 = (0, 1)^T, and r4 = (1, 0)^T. The directional polyphase decomposition splits the signal into 1-D subsignals sampled along the direction u. Those signals tile the whole 2-D discrete plane. We highlight in the picture some of the subsignals.

2.2 Illustration of line zero moments as an edge annihilator. The piecewise polynomial image in (a) was filtered with a 2-D filter C(z1, z2) = (1 − z1 z2^2)^3. The output image (b) pixels are approximately zero.

2.3 Change of variable is equivalent to a pre/post resampling operation plus filtering with a modified filter. (a) Filter with DVM along u. (b) Equivalent filtering structure with horizontal DVM.

2.4 Filter banks with DVMs along a fixed arbitrary direction are equivalent to a filter bank with DVMs along the horizontal direction. (a) Filter bank in which the filters have DVMs along the direction u. (b) The equivalent filter bank with DVMs along the horizontal direction. Note that U is constructed according to Proposition 2 and S = US.

2.5 Frequency response of analysis and synthesis filters designed with a fourth-order directional vanishing moment. The filters degenerate to the 9-7 wavelet filters.

2.6 Frequency response of analysis and synthesis filters designed with a second-order directional vanishing moment.

2.7 Two types of prototype fan filter banks used in the DFB expansion tree. Each filter bank has one of its branches featuring a DVM.

2.8 The DVM directional filter bank. (a) The four-channel DFB with type 0 (horizontal) and type 1 (vertical) DVM filter banks. (b) The four-channel equivalent filter bank. The equivalent filter bank has DVMs in three different directions.

2.9 Directional vanishing moments on equivalent filters of a 16-channel DFB. Different arrangements of type 0 and type 1 fan filter banks lead to different numbers of distinct directions. Each distinct direction is numbered. (a) Tree 1 has 8 distinct directions. (b) Tree 2 has 7 distinct directions. (c) Tree 3 has 6 distinct directions.

2.10 Equivalent filters in an 8-channel DFB using two-channel filter banks with DVM. The filters are the ones designed in Example 2. Notice the good frequency localization in addition to the imposed DVMs (red line).

2.11 Decomposition of a synthetic image using two schemes. (a) Original image. (b) Wavelet decomposition. (c) DVM decomposition.

2.12 The DVM Haar filters. In (a) the response of the filter H0(z) = (1 − z1^-1 z2^-3)/√2 is shown. Notice the single DVM that is replicated due to periodicity. In (b) we have the response of the other Haar filter H1(z) = (1 − z1^-2 z2)/√2, also used in the experiment.

2.13 Nonlinear approximation behavior of the contourlet transform with DVM filters for a toy image. (a) Synthetic piecewise polynomial image. (b) NLA curves (on a semilog scale). This simple toy image is better represented by the contourlet transform with DVM filters.

2.14 Nonlinear approximation behavior of the contourlet transform with DVM filters for natural images. NLA curves (on a semilog scale) for the “Peppers” (a) and “Barbara” (b) images.

2.15 “Peppers” image reconstructed with 2048 coefficients. (a) PKVA filters, PSNR = 26.05 dB. (b) DVM filters of Example 1, PSNR = 26.76 dB. The image on the right shows fewer ringing artifacts.
3.1 The nonsubsampled contourlet transform. (a) Nonsubsampled filter bank structure that implements the NSCT. (b) The idealized frequency partitioning obtained with the proposed structure.

3.2 The proposed nonsubsampled pyramid is a 2-D multiresolution expansion similar to the 1-D nonsubsampled wavelet transform. (a) A three-stage pyramid decomposition. The lighter gray regions denote the aliasing caused by upsampling. (b) The subbands on the 2-D frequency plane.

3.3 A four-channel nonsubsampled directional filter bank constructed with two-channel fan filter banks. (a) Filtering structure. The equivalent filter in each channel is given by Uk^eq(z) = Ui(z) Uj(z^Q). (b) Corresponding frequency decomposition.

3.4 The need for upsampling in the NSCT. (a) With no upsampling, the highpass at higher scales will be filtered by the portion of the directional filter that has “bad” response. (b) Upsampling ensures that filtering is done in the “good” region.

3.5 The two-channel nonsubsampled filter banks used in the NSCT. The system is two times redundant and the reconstruction is error free when the filters satisfy Bezout’s identity. (a) Pyramid NSFB. (b) Fan NSFB.

3.6 Lifting structure for the nonsubsampled filter bank designed with the mapping approach. The 1-D prototype is factored with the Euclidean algorithm. The 2-D filters are obtained by replacing x ↦ f(z).

3.7 Magnitude response of the filters designed in Example 3 with maximally flat filters. The nonsubsampled pyramid filter bank underlies almost tight analysis and synthesis frames.

3.8 Fan filters designed with prototype filters of Example 3 and diamond maximally flat mapping filters.

3.9 Basis functions of the nonsubsampled contourlet transform. (a) Basis functions of the second stage of the pyramid. (b) Basis functions of the third (top 8) and fourth (bottom 8) stages of the pyramid.

3.10 Image denoising with the NSCT and hard thresholding. The noise intensity is 20. (a) Original Lena image. (b) Denoised with the NSWT, PSNR = 31.40 dB. (c) Denoised with the curvelet transform and hard thresholding, PSNR = 31.52 dB. (d) Denoised with the NSCT, PSNR = 32.03 dB.

3.11 Comparison between the NSCT-LAS and BLS-GSM denoising methods. The noise intensity is 20. (a) Original Barbara image. (b) Denoised with the BLS-GSM method, PSNR = 30.28 dB. (c) Denoised with NSCT-LAS, PSNR = 30.60 dB.

4.1 The problem under consideration. There is a world and a camera that produces a “view of reality” that needs to be coded with finite or infinite memory.

4.2 A stochastic model for video. (a) Simplified model. (b) The resulting vector process V. Each sample of the vector process is a block of L samples from the process X taken at the position indicated by the random walk Wt. In the figure L = 4.

4.3 Bounds on information rate. (a) Lower and upper bounds as a function of pW for the binary wall with pX = 1/2 and L = 8.

4.4 Memory constrained coding. Difference H(V) − H(V_M | V^M) as a function of M. When pW = 0.5, the bit rate can be lowered significantly at the cost of large memory. A moderate bit rate reduction is obtained with small values of M when pW = 0.1. The curves are computed using Theorem 1 for X uniform over an alphabet of size 256.

4.5 A model for the dynamic reality. (a) It entails a random field that is Markov in the time dimension t, and i.i.d. in the spatial dimension n. (b) Motion then occurs within this random field.

4.6 The binary random field. Innovations are in the form of bit flips caused by binary symmetric channels between consecutive time instants.

4.7 The binary symmetric innovations. (a) The curves show the lower and upper bounds on the entropy rate. Notice that the bounds are sharp for various values of pI. (b) Contour plots of the upper bound for various pI and pW. The lines indicate points of similar entropy but with different amounts of spatial and temporal innovation.

4.8 Memory and innovations. Shown is the difference between the conditional entropy and the true entropy for the binary innovations with pX = 0.5, pW = 0.5, and L = 8. The curves show the intuitive fact that when the background changes too rapidly, there is little to be gained in bitrate by utilizing more memory.

4.9 Differential entropy bounds for the Gaussian AR(1) case as a function of the innovation parameter ρ. In this example Pe is small enough that the lower and upper bounds practically coincide. Note that the slope of the differential entropy curve is influenced by the value of pW.

4.10 Performance of DPCM with motion for various ρ and pW. For ρ = 0.99 and ρ = 0.9 the upper bound is valid for SNR greater than 23 dB and 12.8 dB, respectively. (a) Memory provides considerable gains, pW = 0.5, ρ = 0.99. (b) Modest gains when pW = 0.1. (c) Modest gains when ρ = 0.9, as the background changes too rapidly.

4.11 The proposed code for the trajectory. The proposed code with buffer size K attains an entropy rate of roughly 1/K. Notice that when K is infinity, the code attains the entropy rate bound as the number of samples j goes to infinity.

5.1 The idealized analytic complex transform. The frame elements are supported on the first and third quadrants of the frequency plane. The real and imaginary parts of each atom are supported in the whole plane following the dashed boundaries.

5.2 Complex contourlet transform basis functions (4 out of 8 directions shown). Real and imaginary parts on top and bottom, respectively. Note the different symmetry of the real and imaginary parts.
LIST OF TABLES

2.1 Improvement in image denoising.

3.1 Frame bounds evolving with scale for the pyramid filters given in Example 3 in Section 3.3.

3.2 Maximally flat mapping polynomials used in the design of the nonsubsampled fan filter bank.

3.3 Denoising performance of the NSCT. The left-most columns are hard thresholding and the right-most ones soft estimators. For hard thresholding, the NSCT consistently outperforms curvelets and the NSWT. The NSCT-LAS performs on a par with the more sophisticated BLS-GSM estimator and is superior to the BivShrink estimator.

3.4 Relative loss in PSNR performance (dB) when using the NSP with a critically sampled DFB and the LAS estimator with respect to the NSCT-LAS method.
CHAPTER 1
INTRODUCTION
1.1 Signal Expansions and Filter Banks
1.1.1 Signal expansions
In a variety of signal processing applications, processing can be done more efficiently
over the domain of an invertible linear transform. Early examples of such transforms
in signal processing are the discrete Fourier transform (DFT) and the discrete cosine
transform (DCT). A more recent example is the discrete wavelet transform (DWT),
which has been proven effective in a wide range of applications (see e.g., [1]). Transforms
such as these are characterized by a set of vectors that form an orthogonal basis for the
underlying Hilbert space. The DWT has the attractive feature of being multiscale. That
is, it decomposes the signal in several scales, each one characterized by a set of basis
vectors [2]. The multiscale property enables the transform to highlight different features
in different scales, thereby facilitating processing.
An important feature of orthogonal bases is that they are nonredundant. This means
that for finite signals, the number of samples of the transformed signal is the same as
that of the input. This is a crucial property in compression applications. By contrast,
a frame is a redundant transform such that its output for a given input signal can be
reconstructed in a stable way [2, 3].
A frame is characterized by a set of vectors that are linearly dependent. This set often
leads to a straightforward expansion resembling that of an orthogonal or biorthogonal
basis. Frames are typically better alternatives to orthogonal bases in applications where
redundancy is not a major issue. In addition, in some cases the design of frame systems
can be considerably easier than that of bases, due to the smaller number of constraints.
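To make the basis-versus-frame distinction concrete, here is a minimal numerical sketch of our own (not from the dissertation): the classical three-vector “Mercedes-Benz” tight frame in R^2 is redundant (three coefficients for a two-sample signal), yet reconstruction from the coefficients is stable and exact.

```python
import numpy as np

# The "Mercedes-Benz" tight frame in R^2: three unit vectors at 120-degree
# angles.  It is redundant (3 vectors spanning a 2-D space) yet allows
# stable, basis-like reconstruction, with frame bounds A = B = 3/2.
angles = np.pi / 2 + np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
Phi = np.stack([np.cos(angles), np.sin(angles)])  # 2x3 synthesis matrix

x = np.array([1.0, -2.0])
coeffs = Phi.T @ x                      # 3 coefficients for a 2-sample signal
x_hat = (2.0 / 3.0) * (Phi @ coeffs)    # tight frame: divide by A = 3/2

assert coeffs.shape == (3,)             # redundant: more coefficients than samples
assert np.allclose(x, x_hat)            # yet reconstruction is exact
```

The factor 2/3 is the inverse of the frame bound; for a general (non-tight) frame, reconstruction uses the dual frame instead of a simple scaling.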
1.1.2 Filter banks
Filter banks have been a very active research subject in the last 20 years. Since the
discovery of perfect reconstruction filter banks [4–6], great progress has been made. This
culminated in a number of books on the subject [1, 7–10]. Noteworthy is the connection
between tree-structured filter banks and orthogonal wavelet bases [2, 11]. This link
provides an easy interchange between continuous and discrete time that is useful in
understanding and in applications.
Perfect reconstruction filter banks can be critically sampled or oversampled. Critically
sampled filter banks are nonexpansive and underlie orthogonal or biorthogonal bases in
discrete time. In an oversampled filter bank [12, 13], the number of samples in the
output signal is greater than that of the input (i.e., the analysis part is expansive).
Thus, oversampled filter banks implement redundant expansions that under additional
conditions can constitute frames.
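As a concrete sketch of the critically sampled case (our own illustration, using the classical orthogonal Haar filters rather than any filter bank designed in this dissertation), the analysis stage produces exactly as many samples as the input, and the synthesis stage recovers the input exactly:

```python
import numpy as np

# A critically sampled two-channel filter bank with orthogonal Haar filters:
# analysis filters + downsample by 2 in each channel, then upsample + synthesis.
def haar_analysis(x):
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)   # lowpass channel, downsampled
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)   # highpass channel, downsampled
    return lo, hi

def haar_synthesis(lo, hi):
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

x = np.random.default_rng(0).standard_normal(16)
lo, hi = haar_analysis(x)
assert len(lo) + len(hi) == len(x)              # critically sampled: nonexpansive
assert np.allclose(haar_synthesis(lo, hi), x)   # perfect reconstruction
```

An oversampled bank would keep more than len(x) samples across its channels; the nonsubsampled banks discussed next keep the full length in every channel.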
Among oversampled filter banks is the nonsubsampled filter bank where the redun-
dancy is given by the number of channels in the bank. Because nonsubsampled filter
banks do not have downsamplers and upsamplers, they have the property of being shift-
invariant. This property is hard to obtain with critically sampled filter banks, as it
requires ideal filters, which are not realizable.
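The shift-invariance claim admits a quick numerical check. The sketch below is our own, using a simple two-tap difference filter and circular shifts for convenience: the undecimated branch commutes with a shift of the input, while the decimated branch does not.

```python
import numpy as np

# Compare a decimated and an undecimated branch of a two-tap difference filter.
def decimated(x):
    y = x - np.roll(x, 1)     # filter (circular convolution)
    return y[0::2]            # downsample by 2 -> shift-VARIANT

def undecimated(x):
    return x - np.roll(x, 1)  # no downsampling -> shift-invariant

x = np.zeros(16)
x[4] = 1.0
xs = np.roll(x, 1)            # input shifted by one sample

# Undecimated: output of the shifted input is the shifted output.
assert np.allclose(undecimated(xs), np.roll(undecimated(x), 1))
# Decimated: the same test fails, exposing the shift-variance.
assert not np.allclose(decimated(xs), np.roll(decimated(x), 1))
```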
The theoretical aspects of filter banks are well understood in general, both in one and
several dimensions. However, the design tools available in the literature mostly focus on
the 1-D and critically sampled case. Recently there have been several methods to design
oversampled 1-D filter banks in the context of framelets [14, 15]. For the multidimensional
case, there are very few design methodologies for both critically sampled and oversampled
cases.
1.2 The Challenge of Geometry
Natural images are rich in geometric structure. Yet most image transforms employed
in practice are “geometry blind.” Such transforms are typically constructed with basis
functions that are tensor products of one-dimensional basis functions. The transform
is thus computed via row-column processing; in other words, it treats a 2-D signal as a
collection of 1-D signals. Hence, it fails to exploit regularity along edges, contours, and other
geometrical features of the image.
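The row-column computation described above can be sketched as follows. The orthonormal DCT-II matrix here is standard; the example itself is ours, not the dissertation's. A separable 2-D transform applies one 1-D transform to every row and then every column, i.e., Y = C X C^T, so every 2-D basis function is a tensor product of two 1-D ones.

```python
import numpy as np

# Orthonormal DCT-II matrix: C[k, m] = s_k * cos(pi * (2m + 1) * k / (2n)).
def dct_matrix(n):
    k = np.arange(n)[:, None]
    C = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C  # rows are orthonormal: C @ C.T = I

n = 8
C = dct_matrix(n)
X = np.random.default_rng(1).standard_normal((n, n))

Y = C @ X @ C.T                       # transform rows, then columns
assert np.allclose(C @ C.T, np.eye(n))
assert np.allclose(C.T @ Y @ C, X)    # invertible: X = C^T Y C
```

The geometry blindness is visible in the structure: the (k, l) transform coefficient only sees the outer product of the k-th and l-th 1-D basis vectors, with no mechanism to follow an oblique contour.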
In recent years, there have been several efforts toward constructing transforms that
exploit geometric structure. For example, Mallat and others constructed the bandelet
transform [16, 17], which is an adaptive transform – it adapts the basis vectors according
to the signal being represented. The wedgelet transform is also adaptive. It tiles smooth
geometry with building block wedge-like tiles [18, 19]. The curvelet transform is a fixed
transform that is a frame. Despite being nonadaptive, the curvelet transform essentially
attains the same theoretical performance as bandelets and wedgelets. Do and Vetterli
[20] constructed the contourlet transform. Contourlets are curvelets’ discrete-time cousin.
The notable feature of contourlets is that the transform can be efficiently computed with
filter banks.
On a related front, signal processing researchers have long tried to build transforms
with additional directionality. For example, the steerable pyramid [21] is a multiscale
transform that has better directional resolution than the separable DWT. The directional
filter bank of Bamberger and Smith is a directional decomposition that can be computed
with quincunx filter banks [22, 23]. The complex wavelet transform of [24] improves
the directional resolution of wavelets while being complex-valued at the same time. This
results in an almost shift-invariant transform with low redundancy that can be computed
efficiently.
1.3 The Plenoptic Function and Its Information Rates
The problem of sensing visual information for storage and later reproduction can be
cast in terms of sampling and compressing the plenoptic function [25]. Given a 3-D scene,
the plenoptic function describes the light intensity passing through every viewpoint, in
every direction, for all time, and for every wavelength. It is usually denoted by
POF(x, y, z, φ, ϕ, t, λ),
where (x, y, z) is the point in Euclidean space being considered, t is time, λ is the light
ray wavelength, and the angles (φ, ϕ) characterize the direction at which the light ray
hits the point (x, y, z).
The sampling part of the plenoptic function (POF) has received a lot of attention
recently. In [26] a sampling framework based on epipolar geometry is proposed, while
in [27], the plenoptic function is shown to have infinite bandwidth. In [28], a spectral
analysis of the sampling problem for the plenoptic function is presented.
Compression schemes for several simplifications of the POF are reviewed in [29].
While several algorithms to compress the POF have been proposed, a sound theoretical
understanding of the associated source coding problem is still lacking.
In some applications, the POF falls into the setup shown in Figure 1.1. In it, there is
a camera traversing the plenoptic function. As it moves through the scene, the camera
generates a process that needs to be coded and reproduced at a decoder. The process
can consist of a sequence of snapshots, as in the case of video. Information
can also be acquired for the purpose of reproducing the scene around the camera, as,
for example, in the light field.
1.4 Problem Statement and Contributions
Given the context outlined in the previous section, we consider in this dissertation
the following unresolved problems:
Figure 1.1 Pictorial description of a problem studied in this dissertation. There is a camera following a random trajectory through the plenoptic function. The camera motion adds to the complexity of the dynamic scene being continuously acquired. The underlying scene is either static, or contains moving objects, or changes over time.
• It is known that the DWT is the optimal1 representation for 1-D piecewise smooth
signals [1]. This is due to the vanishing moments in the filter bank, which ensure that
the smooth part is zeroed out in the highpass branch. Is there a similar property for
edges in 2-D signals? If so, how can the corresponding filter bank be designed?
• Given the lack of directionality of the nonsubsampled wavelet transform, we seek to
design and construct a transform that is multiscale, multidirectional, shift-invariant,
and can be implemented using a fast computational algorithm.
• Consider the plenoptic function. We seek to quantify its compression limits. A par-
ticular case of the plenoptic function is that of video. How can one construct a statistical
model for a scene that has motion in it such that information rates such as entropy
and rate-distortion can be computed with precision? Another common setup of the
POF is the light field [30, 31]. What are the information rates associated with this
problem?

1In the nonlinear approximation (NLA) sense: let the signal be reconstructed from the N largest-magnitude transform coefficients; the transform is optimal in the NLA sense if the decay of the mean-squared error (MSE) as a function of N is the fastest possible for that signal.
To address the problems outlined above several contributions are made. These con-
tributions are summarized below:
• We propose a new filter design criterion for multiple dimensions, that is, filters with
directional vanishing moments.
• We characterize the eigensignals of such filters. We also develop a design method-
ology that can be extended to any number of dimensions.
• We propose a new transform construction that is fully shift-invariant. We study this
transform in detail and provide methods for its design and efficient computation.
• We study the compression problem of the plenoptic function. We propose a statis-
tical model for a camera traversing the plenoptic function. Within this model, we
distinguish between two coding problems, that of video and that of the light field.
We then propose a stochastic model for video whereby information rates can be
precisely computed. We also characterize the information rate in the case of the
light field.
Perhaps the greatest challenge in constructing transforms that better handle geometry
and that can be computed with filter banks is filter design. The design of 2-D filter banks
still lacks a definitive tool. The main obstacle is the absence of a factorization theorem
in multiple dimensions [32].
In this dissertation we use the mapping approach to design filters. This method is
essentially illustrated in Figure 1.2, for the critically sampled case. It extends easily to
the nonsubsampled case. Despite its simplicity, the mapping design has several advan-
tages over other methods. In particular, it offers seamless control over frequency and
phase responses, regularity is easily attained, and it provides filter banks that can be
implemented with a fast algorithm.
[Figure 1.2: (a) a 1-D two-channel filter bank: analysis filters H0(z), H1(z) with downsampling by 2 producing subbands y0, y1, followed by upsampling by 2 and synthesis filters G0(z), G1(z); (b) the corresponding 2-D filter bank, obtained through the mapping z ↦ f(z), with sampling matrix S.]
Figure 1.2 Filter design using the mapping approach. The 1-D filter bank is mapped to a 2-D filter bank. The mapping function is such that important properties of the 1-D filter bank, such as phase linearity and perfect reconstruction, are preserved.
1.5 Dissertation Organization
In Chapter 2 we propose filter banks with directional vanishing moments. To guarantee
good nonlinear approximation behavior, the directional filters in the contourlet filter
bank require a sharp frequency response, which in turn demands a large support size for the filters.
We seek to isolate the key filter property that ensures good approximation. In this di-
rection, we propose filters with directional vanishing moments (DVM). These filters, we
show, annihilate information along a given direction. We study two-channel filter banks
with DVM filters. We provide conditions under which the design of DVM filter banks
is possible. A complete characterization of the product filter is thus obtained. We pro-
pose a design framework that avoids two-dimensional factorization using the mapping
technique. The filters designed, when used in the contourlet transform, exhibit nonlinear
approximation comparable to that of conventional filters while being shorter, and therefore
provide better visual quality with fewer ringing artifacts.
In Chapter 3 we develop the nonsubsampled contourlet transform (NSCT) and study
its applications. The construction proposed in this chapter is based on a nonsubsampled
pyramid structure and nonsubsampled directional filter banks. The result is a flexible
multiscale, multidirection, and shift-invariant image decomposition that can be efficiently
implemented via the à trous algorithm. At the core of the proposed scheme is the
nonseparable two-channel nonsubsampled filter bank. We exploit the less stringent design
condition of the nonsubsampled filter bank to design filters that lead to a NSCT with
better frequency selectivity and regularity when compared to the contourlet transform.
We propose a design framework based on the mapping approach that allows for a fast
implementation based on a lifting or ladder structure and, in some cases, uses only
one-dimensional filtering. In addition, our design ensures that the corresponding frame
elements are regular, symmetric, and the frame is close to a tight one. We assess the
performance of the NSCT in image denoising. The NSCT compares favorably to other
existing methods in the literature.
In Chapter 4 we propose a model to study information rates of the plenoptic function.
The POF setup enables us to construct a stochastic model for video generation. This
model is studied with information theoretic tools, and the associated information rates
are computed. We extend this video model to account for dynamic changes in a scene.
Experiments with synthetic sources using DPCM coding suggest that the introduction
of motion makes DPCM perform suboptimally relative to the rate-distortion bound,
even with perfect knowledge of the motion. We also consider the coding
problem associated with light fields.
In Chapter 5 we make concluding remarks, and we outline our ongoing work and
potential developments to be made in the future.
CHAPTER 2
FILTER BANKS WITH DIRECTIONAL VANISHING MOMENTS
The contourlet transform was proposed to address the limited directional resolution
of the separable wavelet transform. One way to guarantee good approximation behavior
is to let the directional filters in the contourlet filter bank have sharp frequency response.
This requires filters with large support size. We seek to isolate the key filter property
that ensures good approximation. In this direction, we propose filters with directional
vanishing moments (DVM). These filters, we show, annihilate information along a given
direction. We study two-channel filter banks with DVM filters. We provide conditions
under which the design of DVM filter banks is possible. A complete characterization
of the product filter is thus obtained. We propose a design framework that avoids two-
dimensional factorization using the mapping technique. The filters designed, when used
in the contourlet transform, exhibit nonlinear approximation comparable to the conven-
tional filters while being shorter and therefore provide better visual quality with fewer
ringing artifacts. Furthermore, experiments show that the proposed filters outperform
the conventional ones in image approximation and denoising.
2.1 Introduction

The results of this chapter are presented in references [33–35].

The separable discrete wavelet transform has established itself as a state-of-the-art
tool in several image processing applications, including compression, denoising, and
feature extraction. A key property, which partially justifies the efficiency of wavelets
in applications, is that they provide a sparse representation for several classes of images.
Such sparsity can in some cases be precisely measured by the decay of the coefficient
magnitudes. In spite of their wide applicability, separable wavelets fail to exploit
the geometric regularity present in most natural scenes, thus offering a suboptimal
sparse representation. In this context, it is believed that the next generation of
transform coding algorithms will use a transform that better handles orientation and
geometric information. In this direction, a number of researchers have proposed image
representation schemes that achieve optimal sparsity behavior for some reasonable image
model. Such is the case of the curvelet tight frames proposed by Candes and Donoho [36].
Inspired by curvelets, Do and Vetterli proposed the contourlet transform [20], which is
a multiscale directional representation constructed in the discrete grid by combining the
Laplacian pyramid [25, 37] and the directional filter bank (DFB) [23]. The distinctive
feature of both curvelets and contourlets is that they are nonadaptive schemes.
Geometric regularity in images is exhibited through the fact that image edges are typ-
ically located along smooth contours. Thus, image singularities (i.e., edges) are localized
in both location and direction. In order to extend the crucial property of 1-D wavelets
for good approximation, namely vanishing moments [1], new 2-D representations like
contourlets require a new condition named the directional vanishing moment [20]. For
contourlets, such a property can be imposed by carefully designing the refinement filters. Ideally, if the fil-
ters in the contourlet construction (see [20, 38] for details) are sinc-type filters with ideal
response, then the contourlet atoms are guaranteed to have DVMs in an infinite number
of directions. In practice, however, ideal filters are approximated with a finite number
of coefficients, and to ensure having DVMs, this number has to be large, thus increasing
complexity. Alternatively, if FIR filters with enough DVMs can be obtained, one could
achieve similar performance with potentially shorter filters, which would result in a fast
and efficient decomposition algorithm. In addition, as we learned from the wavelet
experience, short filters (e.g., the filters chosen for the JPEG2000 standard) are very
desirable for images as they are less affected by Gibbs phenomenon artifacts.
In this chapter we study two-channel critically sampled filter banks with DVMs. The
DVM property leads to a new filter bank design problem and, to the best of our knowl-
edge, this is the first work that addresses this problem. Two-channel filter banks are
attractive since they are simpler to design and can be used in a tree structure to generate
more complicated systems such as the DFB. Our goal is to impose directional vanishing
moments in the contourlet basis function without resorting to long filters. That is, we
attempt to cancel directional information using DVMs instead of good frequency
selectivity, thus working with shorter filters and avoiding the Gibbs phenomenon (see
Figure 2.15). Although our initial motivation for the DVM filter bank design is the
contourlet transform, we point out that our methods are general and apply in broader
contexts. Potential applications of the filters designed in this work are the contourlet
transform of [20], the CRISP-contourlet system [39], and directionlets [40]. A preliminary
version of the present work has appeared in [33].
The chapter is structured as follows. In Section 2.2 we study filters with DVM and the
class of signals that are annihilated by such a filter. In Section 2.3 we study the DVM
in the context of 2-D FIR two-channel filter banks. We provide existence conditions as
well as the design constraints. We also provide a complete characterization of the product
filter of those filter banks. To overcome 2-D factorization, in Section 2.4 we propose a
design procedure using the mapping technique. The design is simple to carry out and
uses the solution introduced in Section 2.3. In Section 2.5 we study the use of filter banks
with DVM in the contourlet construction. Experiments illustrating the approximation
properties of the proposed filters are presented in Section 2.6, and conclusions are drawn
in Section 2.7.
Notation: Throughout the chapter we use boldface and capital boldface characters
to represent two-dimensional (2-D) vectors and 2 × 2 matrices, respectively. Thus, a
discrete 2-D signal is denoted by x[n], where n = (n1, n2)^T. The 2-D z-transform of a
signal x[n] is denoted by X(z), where it is understood that z is shorthand for (z1, z2)^T.
If u = (u1, u2)^T is a vector in Z^2, then we denote z^u = z1^{u1} z2^{u2}, whereas
z^S = (z^{s1}, z^{s2})^T, with the integer vectors s1 and s2 being the columns of the matrix S.
Note that with this notation, if S is a matrix of integers and u is an integer vector, then
(z^S)^u = z^{Su}. On the unit circle we write X(ω) for X(e^{jω}), where ω = (ω1, ω2)^T.
We use both notations X(z) and X(ω), according to which one is more convenient. A 1-D
signal and its z-transform are denoted by x[n] and X(z), respectively.
2.2 Directional Annihilating Filters
Much of the efficiency of wavelets in analyzing transient signals is due to the vanishing
moments of the wavelet function and their practical consequences [1]. Together with
the time localization property, wavelets with vanishing moments provide a sparse
representation for piecewise polynomial signals. Most successful wavelet filters, such as the
orthogonal Daubechies family and the JPEG2000 filters, were designed with vanishing
moments as a primary design criterion. This is in contrast to early filter bank
constructions, in which frequency selectivity was a primary goal. Vanishing moments in a
wavelet transform can be characterized by zeros in the highpass filters of the underlying
filter bank. Suppose H1(z) is the highpass analysis filter of a two-channel filter bank. A
vanishing moment of order d is characterized by d zeros at z = 1, or ω = 0 on the unit
circle. That is, the filter factors as H1(z) = (1 − z)^d R1(z). The filter H1(z) is related to
discrete polynomial signals of degree less than d, that is, signals of the form

x[n] = Σ_{j=0}^{i} αj n^j,

with αj real and 0 ≤ i < d. In particular, filtering x[n] with H1(z) produces a zero
output (see, for example, [9, 41]). In other words, the filter H1(z) totally annihilates
discrete polynomials of degree less than d.
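As a quick numerical check, a minimal pure-Python sketch (the helper name diff_filter is ours) applies the factor (1 − z)^d as d-fold differencing and confirms that a discrete polynomial of degree less than d is mapped to the zero signal:

```python
def diff_filter(x, d):
    """Apply (1 - z)^d, i.e., take the first difference d times."""
    for _ in range(d):
        x = [x[n] - x[n + 1] for n in range(len(x) - 1)]
    return x

d = 3
# Discrete polynomial of degree 2 (< d): x[n] = 2 + 3n + n^2
sig = [2 + 3 * n + n * n for n in range(20)]
out = diff_filter(sig, d)
print(all(v == 0 for v in out))  # every output sample is zero: True
```

A degree-3 polynomial, by contrast, survives the same filter, which matches the "degree less than d" condition.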
For 2-D filter banks with two channels, the vanishing moment concept can be general-
ized as point zeros at z = (1, 1)T or ω = (0, 0)T [42, 43]. However, filters with point-zeros
on the 2-D frequency plane do not cancel piecewise smooth images with discontinuities.
A somewhat different philosophy, motivated by contourlets, is the directional vanishing
moment, in which the zeros are required to lie along a line. Formally, we define the DVM
as follows.
Definition 1 Let C(z) be a discrete filter and u = (u1, u2)^T be a 2-D vector of coprime
integers. We say C(z) has a DVM of order d along the direction u if it can be factored
as

C(z) = (1 − z1^{u1} z2^{u2})^d R(z),  or  C(ω) = (1 − e^{j(ω1 u1 + ω2 u2)})^d R(ω).  (2.1)
For contourlets, the filter C(z) is a composite one, which involves the Laplacian pyramid
filters and the polyphase components of the directional filters [20].
A question of interest is: What signal would be annihilated (i.e., completely filtered
out) by the filter in (2.1)? Such a signal is an eigensignal of the complementary branch
of a two-channel filter bank in which C(z) is an analysis filter. This filter bank will be
studied in detail in the next section. Similar to the 1-D case, 2-D filter banks with
filters of the form in (2.1) have interesting properties with respect to approximation of
smooth signals. In order to see those properties, we introduce the directional polyphase
representation.
Lemma 1 Suppose that u ∈ Z^2 and u2 ≠ 0. Then for every n ∈ Z^2 there exists a unique
pair (k, r), where k ∈ Z and r ∈ R := Z × {0, 1, . . . , |u2| − 1}, such that

n = ku + r.  (2.2)

Proof. Notice that (2.2) is equivalent to having n1 = ku1 + r1 and n2 = ku2 + r2. From
the second equation, k and r2 are uniquely determined as the quotient and remainder of
n2 divided by u2. Given k, from the first equation, r1 is also uniquely determined. □
Throughout the chapter we assume u2 ≠ 0. The case u2 = 0 can be handled similarly
by swapping the two variables u1 and u2. Lemma 1 allows us to partition any 2-D signal
x[n] into a set of disjoint 1-D signals {xr[k] : r ∈ R} with

xr[k] := x[ku + r].  (2.3)
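The index split of Lemma 1 is easy to compute. The sketch below (pure Python; the function name is ours, and we assume u2 > 0 for simplicity) decomposes each n into the pair (k, r) and verifies the decomposition by exact reconstruction:

```python
def directional_polyphase_index(n, u):
    """Split n = k*u + r as in Lemma 1 (assumes u2 > 0 for simplicity)."""
    n1, n2 = n
    u1, u2 = u
    k, r2 = divmod(n2, u2)      # quotient and remainder of n2 divided by u2
    r1 = n1 - k * u1            # r = (r1, r2) with r1 in Z, 0 <= r2 < u2
    return k, (r1, r2)

u = (2, 1)  # the direction used in Figure 2.1
# Every grid point decomposes uniquely and reconstructs exactly.
for n1 in range(-3, 4):
    for n2 in range(-3, 4):
        k, (r1, r2) = directional_polyphase_index((n1, n2), u)
        assert (k * u[0] + r1, k * u[1] + r2) == (n1, n2)
        assert 0 <= r2 < u[1]
print("ok")
```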
[Figure 2.1: the discrete plane (n1, n2) with a 2-D signal x[n] partitioned into 1-D subsignals xr1[k], xr2[k], xr3[k], xr4[k] along the direction u.]
Figure 2.1 The directional polyphase representation. Here u = (2, 1)^T and r1 = (0, 0)^T, r2 = (1, 1)^T, r3 = (0, 1)^T, and r4 = (1, 0)^T. The directional polyphase decomposition splits the signal into 1-D subsignals sampled along the direction u. Those signals tile the whole 2-D discrete plane. We highlight some of the subsignals in the picture.
Figure 2.1 illustrates the directional polyphase representation. Note that each signal
xr[k] is a 1-D slice of x[n] along the direction u. Therefore, the directional polyphase
representation is distinct from the ordinary polyphase representation. Using Lemma 1,
we can characterize the signals that are annihilated by the filter C(z).
Proposition 1 Let C(z1, z2) be a 2-D filter with a factor (1 − z1^{u1} z2^{u2})^d. Then a
signal x[n] is annihilated by C(z) if each 1-D signal xr[k] defined in (2.3) is a discrete
polynomial of degree less than d.
Proof. Using Lemma 1 we have that

X(z) = Σ_{n∈Z^2} x[n] z^{−n}
     = Σ_{k∈Z} Σ_{r∈R} x[ku + r] z^{−(ku+r)}
     = Σ_{r∈R} z^{−r} Σ_{k∈Z} xr[k] z^{−ku}
     = Σ_{r∈R} z^{−r} Xr(z^u).

Since the signals xr[k] are polynomials of degree less than d, as in the 1-D case, each
term Xr(z^u) is annihilated by a factor (1 − z^u)^d. Thus it follows that X(z) is annihilated
by C(z1, z2). □
An immediate consequence of Proposition 1 is that discrete signals sampled from
a continuous-time signal that is smooth away from line discontinuities along a given
direction are also annihilated by C(z1, z2). In other words, if xc(t) is a continuous-time
piecewise polynomial signal of degree less than d and x[n] = xc(ΔT n), then it follows
that x[n] is annihilated by a filter C(z) with a factor (1 − z^u)^d.
As an illustration, we filter a piecewise smooth image with a 2-D filter having a third-
order DVM along the direction u = (1, 2)^T. The image is described by

e^{−(x^2 + y^2)/α} + 1_{β1 < y − 2x < β2}.

Such an image is well approximated by a piecewise polynomial image of sufficiently large
degree d. As can be seen in Figure 2.2, the edge was totally annihilated by the filtering
operation. Notice that the DVM formulation is in the space domain, and the annihilation
of directional edges takes place regardless of the frequency response of the filters. This
is similar to the 1-D wavelet case, in which “zeros at π” alone ensure the cancellation of
smooth signals.
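This experiment can be reproduced in miniature. The sketch below (pure Python; names are ours) keeps only the indicator part of the image, for which the annihilation is exact, and applies the DVM factor as d-fold differencing along u = (1, 2)^T, assuming u has nonnegative entries:

```python
def dvm_filter(img, u, d):
    """Apply (1 - z1^{u1} z2^{u2})^d: d-fold differencing along direction u.

    img is a list of rows indexed by n1; columns are indexed by n2.
    Assumes u1, u2 >= 0; the valid region shrinks by u at each pass.
    """
    u1, u2 = u
    for _ in range(d):
        h, w = len(img), len(img[0])
        img = [[img[i][j] - img[i + u1][j + u2]
                for j in range(w - u2)] for i in range(h - u1)]
    return img

# Indicator image: 1 on the band beta1 < n2 - 2*n1 < beta2 (edges along u = (1, 2)).
beta1, beta2 = -4, 4
img = [[1 if beta1 < n2 - 2 * n1 < beta2 else 0 for n2 in range(32)]
       for n1 in range(32)]
out = dvm_filter(img, (1, 2), 3)
print(all(v == 0 for row in out for v in row))  # the edges are fully annihilated: True
```

An edge along any other direction (e.g., a band depending on n1 only) would not be cancelled by this filter, consistent with the space-domain character of the DVM.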
2.3 Two-channel Filter Banks with Directional Vanishing Moments
2.3.1 Preliminaries
Our setup consists of a general two-dimensional critically sampled two-channel filter
bank with a valid sampling S that has downsampling ratio 2, i.e., |det S| = 2. Figure 2.4
(a) illustrates such a filter bank. In this setting, given a set of analysis/synthesis filters
the reconstructed signal is a perfect replica of the original provided that [7]

H0(ω) G0(ω) + H1(ω) G1(ω) = 2,  (2.4)
H0(ω + 2πS^{−T} k1) G0(ω) + H1(ω + 2πS^{−T} k1) G1(ω) = 0,  (2.5)

where k1 is the nonzero integer vector in the set N(S) := {S^T x : x ∈ [0, 1) × [0, 1)} [7].
The modulation term 2πS^{−T} k1 is a function of the sampling lattice generated by S [44].
Because |det S| = 2, the vector 2S^{−T} k1 has integer entries. Thus, the modulation term
2πS^{−T} k1 has the form (m1 π, m2 π)^T, where (m1, m2)^T = 2S^{−T} k1. Moreover,
(a) Original (b) Filtered

Figure 2.2 Illustration of line zero moments as an edge annihilator. The piecewise polynomial image in (a) was filtered with the 2-D filter C(z1, z2) = (1 − z1 z2^2)^3. The output image (b) has pixels that are approximately zero.
since Hi(ω) and Gi(ω) are 2π-periodic functions, the system of equations (2.4)-(2.5)
admits only three distinct cases, corresponding to when both m1 and m2 are odd and
when one is odd and the other even. These cases in turn correspond to three distinct
lattices for sampling by a factor of two in 2-D, generated, for instance, by1

S0 = [1 1; 1 −1],  S1 = [2 0; 0 1],  and  S2 = [1 0; 0 2].

The sampling lattice generated by S0 is called the quincunx lattice [45], whereas the
other two are called rectangular lattices. We consider only the first two cases, since
the third can be obtained from the second by swapping the two dimensions. Therefore,
the two cases corresponding to S0 and S1 encompass all possible cases in 2-D. It can
be checked that k1 = (1, 0)^T for both S0 and S1, so that 2πS^{−T} k1 = (−π, −π)^T for S0
and 2πS^{−T} k1 = (−π, 2π)^T for S1.
1All matrix generators of a given lattice are equivalent up to right multiplication by a unimodular integer matrix [7]. A square matrix is unimodular if its determinant is equal to one.

Throughout the chapter we assume FIR filters. In such cases, using an argument
similar to the one in [6], we can show the synthesis filters are completely determined (up
to a scale factor and a delay) from the pair (H0(ω), G0(ω)) [6] through the relations

H1(ω) = e^{jω^T k1} G0(ω + 2πS^{−T} k1),
G1(ω) = e^{−jω^T k1} H0(ω + 2πS^{−T} k1).  (2.6)

As a result, the reconstruction condition reduces to

H0(ω) G0(ω) + H0(ω + 2πS^{−T} k1) G0(ω + 2πS^{−T} k1) = 2.  (2.7)
The above biorthogonal relation specializes to the orthogonal one when G0(ω) = H0(−ω).
Moreover, we say that G0(ω) is the complementary filter to H0(ω) whenever they satisfy
(2.7).
2.3.2 Two-channel filter banks with DVMs
In general, given the desired direction of the zero moment, the product filter H0(ω)G0(ω)
takes the form

H0(ω) G0(ω) = (1 − e^{jω^T u})^L R(ω),

where L denotes the order of the DVM. Substituting this in (2.7), we obtain the design
equation

(1 − e^{jω^T u})^L R(ω) + (1 − e^{j(ω + 2πS^{−T} k1)^T u})^L R(ω + 2πS^{−T} k1) = 2,  (2.8)

where R(ω) is the complementary filter to (1 − e^{jω^T u})^L. We always assume that u1 and
u2 are coprime integers. Note that the above relation sets up a system of linear equations
which can be solved under certain conditions. Since |det(S^{−T})| = 1/2 for S = S0 or S1,
it follows that u^T 2S^{−T} k1 is an integer scalar. If u^T 2S^{−T} k1 is even, then a factor
(1 − e^{jω^T u}) exists in both terms on the left of (2.8), in which case an FIR solution is
not possible. Consequently, u^T 2S^{−T} k1 being odd is a necessary condition for solving
(2.8). In that case, (2.8) reduces to

(1 − e^{jω^T u})^L R(ω) + (1 + e^{jω^T u})^L R(ω + 2πS^{−T} k1) = 2.  (2.9)

Although in principle it is possible to solve (2.8) directly, the following proposition
simplifies the problem.
Proposition 2 Consider the perfect reconstruction equation (2.8), where u has coprime
entries and u^T 2S^{−T} k1 is odd. Then there exists a unimodular integer matrix U such
that if R(ω) solves (2.8), then R′(ω) = R(U^T ω) solves

(1 − e^{jω1})^L R′(ω) + (1 + e^{jω1})^L R′(ω + 2πS′^{−T} k1) = 2,  (2.10)

where S′ = US. Conversely, if R′(ω) is a solution to (2.10) with S′ given and u as
above, then there exists a matrix U such that R(ω) = R′(U^T ω) is a solution to (2.8)
with S = US′.
Proof. We need to construct U so that e^{j(U^T ω)^T u} = e^{j(Uu)^T ω} = e^{jω1}. We thus set

U := [a b; −u2 u1]  (2.11)

and choose a, b ∈ Z so that au1 + bu2 = 1. Because u1 and u2 are assumed to be coprime,
such a and b are guaranteed to exist. Since u^T 2S^{−T} k1 is odd, substituting ω ↦ U^T ω
in (2.9) gives (2.10). Conversely, if R′(ω) solves (2.10), then set V = U^{−1}, with U as in
(2.11), and substitute ω ↦ V^T ω in (2.10) to obtain (2.8). □
Remark 1 The fact that U has integer entries and is unimodular implies that it is a
resampling matrix. Hence, the change of variables ω ↦ U^T ω, or equivalently z ↦ z^U,
amounts to a resampling operation on the filter R(z), which can be seen as a rearrangement
of the filter coefficients in the 2-D discrete plane. This has the signal processing
interpretation illustrated in Figure 2.3 (in the z-domain). Thus, we see that a filter with
a DVM along a direction other than the horizontal one can be implemented in terms of
a filter with a horizontal DVM plus pre/post resampling operations.
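Constructing U from u is a small exercise in the extended Euclidean algorithm. A sketch (pure Python; helper names are ours, assuming positive coprime u1 and u2) builds the matrix of (2.11) and checks that it is unimodular and maps u to the horizontal direction:

```python
def egcd(p, q):
    """Extended Euclid: returns (g, a, b) with a*p + b*q = g = gcd(p, q)."""
    if q == 0:
        return p, 1, 0
    g, a, b = egcd(q, p % q)
    return g, b, a - (p // q) * b

def unimodular_map(u):
    """Build U of (2.11) sending the direction u to the horizontal direction."""
    u1, u2 = u
    g, a, b = egcd(u1, u2)
    assert g == 1, "u1, u2 must be coprime"
    return [[a, b], [-u2, u1]]

u = (2, 1)
U = unimodular_map(u)
det = U[0][0] * U[1][1] - U[0][1] * U[1][0]   # equals a*u1 + b*u2 = 1
Uu = (U[0][0] * u[0] + U[0][1] * u[1], U[1][0] * u[0] + U[1][1] * u[1])
print(det, Uu)  # 1 (1, 0): U is unimodular and maps u to e1
```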
Remark 2 This change of variables can also be done in the filters of a filter bank. We
thus obtain the equivalence shown in Figure 2.4 for a unimodular matrix U and S′ = US.
The equivalence can be easily checked using multirate identities. The filter bank within
the dotted region in Figure 2.4 (b) has perfect reconstruction if and only if the filter bank
in Figure 2.4 (a) does. Since the equivalence is one-to-one, one can design filter banks with
horizontal DVMs and then, following Proposition 2, obtain filter banks with DVMs in
any direction u such that u^T 2S^{−T} k1 is odd.
Remark 3 Notice that a vertical DVM, i.e., a factor of the form (1 − e^{jω2})^L, could be
obtained similarly by exchanging the rows of the matrix U constructed in the proof
above.
[Figure 2.3: (a) x filtered by H(z) = (1 − z^u)^L R(z) to produce y; (b) the equivalent structure: resampling by U, filtering with H′(z) = H(z^U) = (1 − z1)^L R′(z), and inverse resampling.]
Figure 2.3 A change of variables is equivalent to a pre/post resampling operation plus filtering with a modified filter. (a) Filter with a DVM along u. (b) Equivalent filtering structure with a horizontal DVM.
Proposition 2 gives a simpler equation in the sense that a complete characterization
of its solution is possible. Furthermore, with the aid of Proposition 2 we can establish a
sufficient condition for solving (2.8) as the next proposition shows.
Proposition 3 Let u be a vector of coprime integers. Then (2.8) admits an
FIR solution if and only if u^T 2S^{−T} k1 is an odd integer.

Proof. We already discussed necessity. To establish sufficiency, suppose that u^T 2S^{−T} k1
is an odd integer. Using Proposition 2, we can reduce the problem to that of (2.10).
Thus, if (2.10) is solvable, then we are done. Now consider the modulation shift 2πS′^{−T} k1.
If the first entry of 2πS′^{−T} k1 is an odd multiple of π, then at least a univariate solution
R′(ω) = R′(ω1) is guaranteed to exist [46]. But because S′ = US, one can easily check
that the first entry of 2S′^{−T} k1 is 2u^T S^{−T} k1, which is odd by assumption. □
If u^T 2S^{−T} k1 is an odd integer, we then say that the direction u is admissible for the
sampling matrix S. Proposition 3 asserts that not all DVMs can be obtained for a given
[Figure 2.4: (a) a two-channel filter bank: analysis filters H0(z), H1(z) with downsampling by S producing subbands y0, y1, followed by upsampling by S and synthesis filters G0(z), G1(z); (b) the equivalent bank with filters H0(z^U), H1(z^U), G0(z^U), G1(z^U), sampling S′, and pre/post resampling by U.]
Figure 2.4 Filter banks with DVMs along a fixed arbitrary direction are equivalent to a filter bank with DVMs along the horizontal direction. (a) Filter bank in which the filters have DVMs along the direction u. (b) The equivalent filter bank with DVMs along the horizontal direction. Note that U is constructed according to Proposition 2 and S′ = US.
downsampling matrix S. In particular, for the quincunx lattice generated by S0 we
have u^T 2S^{−T} k1 = u1 + u2, so that u is admissible if u1 + u2 is an odd integer;
similarly, for the rectangular lattice generated by S1, u^T 2S^{−T} k1 = u1, so that u is
admissible whenever u1 is odd. For instance, u = (2, 1)^T is admissible for S0 but not for
S1, whereas u = (1, 1)^T is only admissible for S1.
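These admissibility checks are mechanical. A sketch (pure Python with exact rational arithmetic; function names are ours) computes 2S^{−T} k1 directly from S and tests the parity condition of Proposition 3:

```python
from fractions import Fraction

def two_SinvT_k1(S, k1=(1, 0)):
    """Compute the integer vector 2 * S^{-T} * k1 exactly."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    # S^{-T} = (1/det) * [[d, -c], [-b, a]]
    SinvT = [[d / det, -c / det], [-b / det, a / det]]
    v = [2 * (SinvT[i][0] * k1[0] + SinvT[i][1] * k1[1]) for i in range(2)]
    assert all(x.denominator == 1 for x in v)   # integer entries since |det S| = 2
    return [int(x) for x in v]

def admissible(u, S):
    """u is admissible for S iff u^T (2 S^{-T} k1) is odd."""
    m = two_SinvT_k1(S)
    return (u[0] * m[0] + u[1] * m[1]) % 2 == 1

S0 = [[1, 1], [1, -1]]   # quincunx
S1 = [[2, 0], [0, 1]]    # rectangular
print(admissible((2, 1), S0), admissible((2, 1), S1))  # True False
print(admissible((1, 1), S0), admissible((1, 1), S1))  # False True
```

The printed values reproduce the two examples in the text.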
The aforementioned discussion provides necessary and sufficient conditions for having
one of the branches of the filter bank featuring DVMs. In the context of contourlets (see
Section 2.5) it is desirable to have DVMs in both channels so that the DFB expansion tree
is balanced in the sense that DVMs are present in all frequency channels. Unfortunately,
this is not possible to attain with FIR filters. Likewise, it is not possible to have
different DVMs in the same filter channel. We summarize these assertions in the next
proposition.
Proposition 4 Consider a two-channel 2-D filter bank with FIR filters H0(ω), H1(ω),
G0(ω), and G1(ω), and downsampling matrix S. Let u = (u1, u2)^T and v = (v1, v2)^T
be two distinct admissible directions. Then the filter bank cannot have the perfect
reconstruction property if one of the following is true:

1. The filter H0(ω) has a factor (1 − e^{jω^T u}) and H1(ω) has a factor (1 − e^{jω^T v}),
leaving FIR remainders.

2. One of the filters, say H0(ω), has factors (1 − e^{jω^T u}) and (1 − e^{jω^T v}) simultaneously,
leaving an FIR remainder.
Proof. 1. If the factor (1 − e^{jω^T u}) is in H0(ω) and (1 − e^{jω^T v}) is in H1(ω), then the
reconstruction condition (2.4) is not satisfied when e^{jω} = (1, 1)^T.

2. Suppose H0(ω) has a factor (1 − e^{jω^T u})(1 − e^{jω^T v}). Because v is admissible, we
have from Proposition 3 that v^T 2S^{−T} k1 is odd. Consequently,

e^{j(ω + 2πS^{−T} k1)^T v} = −e^{jω^T v}.

It then follows from (2.6) that G1(ω) has a factor (1 + e^{jω^T v}). Consider the system of
equations {1 − e^{jω^T u} = 0, 1 + e^{jω^T v} = 0}, which is equivalent to

[u1 u2; v1 v2] (ω1, ω2)^T = (2π, π)^T.

Because u and v are distinct, we have u2 v1 ≠ u1 v2, which guarantees a solution. It
then follows that H0(ω) and G1(ω) have a common zero, thus violating (2.4). □
The previous proposition shows we can only afford to have DVM in one branch of
the filter bank. In the next section, we present methods for solving the DVM filter bank
design problem in the form of (2.10).
2.3.3 Characterization of the product filter
With the aid of Proposition 2 (see also Figure 2.4), in order to design filter banks with
DVMs we need to consider (2.10) with two possible forms for R(ω + 2πS−Tk1), namely
R(ω1+π,ω2) or R(ω1+π,ω2+π), corresponding to the rectangular and quincunx lattices,
respectively. In the z-domain, we equivalently have R(−z1, z2) and R(−z1,−z2). We
denote those two cases collectively as R(−z1, sz2) where s ∈ {1,−1}. It turns out that
a complete characterization of the solution of (2.10) is possible as the next proposition
shows.
Proposition 5 Let s ∈ {1, −1}. An FIR filter R(z1, z2) is a solution to the equation

(1 − z1)^L R(z1, z2) + (1 + z1)^L R(−z1, s z2) = 2  (2.12)

if and only if it has the form

R(z1, z2) = RL(z1) + (1 + z1)^L Ro(z1, z2),  (2.13)

with RL(z1) being a univariate solution given explicitly by

RL(z1) = Σ_{i=0}^{L−1} (L+i−1 choose L−1) 2^{−(L+i−1)} (1 + z1)^i  (2.14)

and Ro(z) satisfying

Ro(z1, z2) + Ro(−z1, s z2) = 0.  (2.15)
Proof. First, the 1-D complementary filter to (1 − z1)^L is guaranteed to exist, as a
consequence of the Bezout theorem for polynomials [46]. Moreover, RL(z1) as in (2.14)
is the 1-D minimum-degree polynomial that solves (2.12), which can be found by Taylor
series expansion [2]. Furthermore, if Ro(z1, z2) satisfies (2.15), it can be readily checked
that R(z) given in (2.13) solves (2.12).

To prove sufficiency, suppose R(z1, z2) solves (2.12). Let R′(z1, z2) := R(z1, z2) −
RL(z1). Since RL(z1) and R(z1, z2) are both solutions to (2.12), we must have

R′(−z1, s z2)(1 + z1)^L = −R′(z1, z2)(1 − z1)^L,  (2.16)

which implies that R′(z1, z2) = (1 + z1)^L Ro(z1, z2). Now let z1 ≠ ±1. Then (2.16)
implies (2.15), and since Ro(z) is an FIR filter, it follows that (2.15) is valid for all
z ∈ C^2. □
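The minimum-degree solution (2.14) can be verified symbolically. The sketch below (pure Python with exact rational coefficients; helper names are ours) builds RL(z1) and checks that (1 − z1)^L RL(z1) + (1 + z1)^L RL(−z1) = 2 for several L:

```python
from fractions import Fraction
from math import comb

def pmul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def padd(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def ppow(p, L):
    out = [Fraction(1)]
    for _ in range(L):
        out = pmul(out, p)
    return out

def pneg_arg(p):
    """p(z) -> p(-z)."""
    return [c if i % 2 == 0 else -c for i, c in enumerate(p)]

def RL(L):
    """Minimum-degree solution (2.14): sum of C(L+i-1, L-1) 2^{-(L+i-1)} (1+z)^i."""
    out = [Fraction(0)]
    for i in range(L):
        coef = Fraction(comb(L + i - 1, L - 1), 2 ** (L + i - 1))
        out = padd(out, [coef * c for c in ppow([Fraction(1), Fraction(1)], i)])
    return out

for L in (1, 2, 3, 4):
    r = RL(L)
    lhs = padd(pmul(ppow([Fraction(1), Fraction(-1)], L), r),          # (1-z)^L R(z)
               pmul(ppow([Fraction(1), Fraction(1)], L), pneg_arg(r))) # (1+z)^L R(-z)
    assert lhs[0] == 2 and all(c == 0 for c in lhs[1:])
print("ok")
```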
Remark 4 The above proposition is akin to its 1-D counterpart, which is used to construct compactly supported wavelets (see, e.g., [2]). The distinction occurs in the higher
order term R_o(z1, z2), which now can be any two-dimensional function satisfying (2.15).
This higher order term will make the filter a “truly” 2-D one, meaning a filter with a
nonseparable support. Moreover, the higher order term can be used to control the shape
of the 2-D frequency response.
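As a quick numerical illustration (ours, not from the dissertation), the closed form (2.14) can be checked by polynomial arithmetic: restricted to one variable, (2.12) reads (1 − z)^L R_L(z) + (1 + z)^L R_L(−z) = 2, and the coefficients produced by (2.14) satisfy it exactly.

```python
import numpy as np
from math import comb

def r_min(L):
    """Coefficients (ascending powers of z) of the minimum-degree complement
    R_L(z) in eq. (2.14): sum over i < L of C(L+i-1, L-1) 2^{-(L+i-1)} (1+z)^i."""
    r = np.zeros(L)
    for i in range(L):
        w = comb(L + i - 1, L - 1) * 2.0 ** -(L + i - 1)
        r[: i + 1] += w * np.array([comb(i, k) for k in range(i + 1)])
    return r

def one_pm_z_pow(sign, L):
    """Coefficients of (1 + sign*z)^L in ascending powers of z."""
    return np.array([comb(L, k) * float(sign) ** k for k in range(L + 1)])

L = 4
rL = r_min(L)                       # [1, 29/16, 20/16, 5/16]
# Halfband check: (1-z)^L R_L(z) + (1+z)^L R_L(-z) must equal the constant 2.
lhs = (np.convolve(one_pm_z_pow(-1, L), rL)
       + np.convolve(one_pm_z_pow(+1, L), rL * (-1.0) ** np.arange(L)))
print(np.round(lhs, 12))            # -> [2. 0. 0. 0. 0. 0. 0. 0.]
```

All quantities are dyadic rationals, so the identity holds exactly in floating point.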
Remark 5 If L is even, then it is easy to check that

R(z1, z2) = (−2z1)^{−L/2} R_{L/2}((z1 + z1^{−1})/2) + (1 + z1^{−1})^L R_o(z1, z2)    (2.17)

with R_o(z1, z2) satisfying (2.15), also solves (2.12). If in addition R_o(z1, z2) = R_o(z1^{−1}, z2^{−1}),
this solution provides a class of linear-phase biorthogonal filters with DVM in which each
one of the filters in the analysis and synthesis is a degenerate 1-D solution. Thus, this
solution can be seen as a 2-D generalization of the 1-D biorthogonal spline wavelet filters
of [2].
The result also extends to the orthogonal case as the next corollary shows.
Corollary 1 Let s be as in Proposition 5. Consider the orthogonal perfect reconstruction
condition

H0(z1, z2) H0(z1^{−1}, z2^{−1}) + H0(−z1, sz2) H0(−z1^{−1}, sz2^{−1}) = 2,

with H0(z1, z2) = (1 − z1)^L r0(z1, z2). Also set R0(z1, z2) = r0(z1, z2) r0(z1^{−1}, z2^{−1}). Then
R0(z1, z2) has the form

R0(z1, z2) = R_L((z1 + z1^{−1})/2) + (1 + (z1 + z1^{−1})/2)^L R_o(z1, z2),

where R_L is as in (2.14), and the higher order term R_o(z1, z2) = R_o(z1^{−1}, z2^{−1}) satisfies (2.15).
The proof of this corollary is a direct application of Proposition 5 by making the change of
variables z1′ = (z1 + z1^{−1})/2 and z2′ = (z2 + z2^{−1})/2. Notice that orthogonal FBs with DVMs could be
obtained using the above result by taking the square root of the filter R0(ω) = |r0(ω)|^2.
This requires 2-D spectral factorization, which is a hard task. Furthermore, such a
square root is not guaranteed to exist, and one has to carefully select the higher order
term R_o(ω) so as to make R0(ω) factorizable. For biorthogonal solutions, one can avoid
spectral factorization using the mapping approach, as discussed in the next section.
2.4 Design via Mapping
2.4.1 Design procedure
Due to the lack of a factorization theorem for 2-D polynomials, the design of nonsep-
arable 2-D filter banks is substantially harder than the 1-D counterpart. In particular,
we cannot easily factorize the solution for the product filter given by Proposition 5 into
H0(z) and G0(z) as in (2.7). There are two known ways to avoid factorization: (1)
Constructing the polyphase matrix in a lattice structure and (2) Mapping 1-D filters
to 2-D by appropriate change of variables. Most filters designed in the literature use
one of these two approaches (see, e.g., [32, 47–50].) The first method has the attractive
feature of possible construction of both orthogonal and biorthogonal solutions. How-
ever, it is harder to impose vanishing moments, since the corresponding conditions in the
polyphase domain are nonlinear (see, e.g., [51].) For processing images, orthogonal FBs
have the shortcoming of lack of phase linearity which causes severe visual distortions.
For biorthogonal FIR solutions, one can use the general mapping approach proposed in
[49]. In this approach, we first design 1-D prototype filters H0^(1D)(z) and G0^(1D)(z) such
that P^(1D)(z) := H0^(1D)(z) G0^(1D)(z) is a halfband filter, i.e.,

P^(1D)(z) + P^(1D)(−z) = 2.    (2.18)

Next, we apply the change of variables z ↦ M(z) to map the 1-D filters to 2-D ones:

H0(z) = H0^(1D)(M(z)),  G0(z) = G0^(1D)(M(z)).
It can be easily checked that the mapped 2-D filters will satisfy the perfect recon-
struction condition (2.7) provided
M(z1, z2) = −M(−z1, sz2). (2.19)
Notice that for FIR solutions, it is necessary that the 1-D prototype filters have only
positive powers of z. This automatically precludes FIR orthogonal solutions.
Mapping 1-D filters can also be carried over to the polyphase domain as done in [50].
A more careful examination of the filters proposed in [50] reveals that the polyphase
mapping can also be performed in the filter domain, and as such, the technique boils
down to a particular case of the mapping method [52]. For completeness, we include a
derivation of this equivalence in Section 2.8.
In the context of filter banks with DVMs, the goal is to devise a mapping function
M(z) such that each of the 2-D filters H0(z) and G0(z) has a given number of (1 − z1)
factors. In addition, M(z) must be chosen so that perfect reconstruction is preserved after mapping.
The next proposition shows an explicit form of the required mapping function.
Proposition 6 Let H0^(1D)(z), G0^(1D)(z) be such that P^(1D)(z) = H0^(1D)(z) G0^(1D)(z) satisfies
(2.18), and let s ∈ {1, −1}. Suppose M(z) is an FIR mapping function such that

M(z1, z2) = (1 − z1)^L R(z1, z2) + c0,    (2.20)

where c0 is such that H0^(1D)(z) has a factor (z − c0)^{Na/L}, G0^(1D)(z) has a factor (z − c0)^{Ns/L},
and R(z1, z2) satisfies the valid mapping equation:

(1 − z1)^L R(z1, z2) + (1 + z1)^L R(−z1, sz2) = 2c0.    (2.21)

Then

1. The mapped filters H0(z) = H0^(1D)(M(z)) and G0(z) = G0^(1D)(M(z)) are perfect
reconstruction, i.e., they satisfy

H0(z1, z2) G0(z1, z2) + H0(−z1, sz2) G0(−z1, sz2) = 2.    (2.22)
2. The mapped filters are factored as

H0(z) = (1 − z1)^{Na} R_{H0}(z),  G0(z) = (1 − z1)^{Ns} R_{G0}(z).    (2.23)

Proof. Suppose M(z) is as in (2.20). Then, from (2.21) it follows that M(z1, z2) =
−M(−z1, sz2), which is (2.19); hence the mapped filters satisfy (2.22). Moreover, substituting z ↦ M(z) in the factors (z − c0)^{Na/L} of H0^(1D)(z) and (z − c0)^{Ns/L} of G0^(1D)(z) gives
H0(z) and G0(z) as in (2.23). □
Interestingly, it turns out the valid mapping equation (2.21) is similar to equation
(2.12) for the product filter in Proposition 5. We then can use Proposition 5 to find an
explicit solution to (2.21). In short, we see that the mapping overcomes the need for
spectral factorization and, together with Proposition 5, gives a straightforward design
methodology. Hence, we can formulate the design of DVM filter banks via mapping as
follows.
Problem: Design 2-D filters H0(z) and G0(z) satisfying the perfect reconstruction
condition (2.7) and such that H0(z) has a factor (1 − z1)^{Na} and G0(z) has a factor
(1 − z1)^{Ns}.
Step 1 Design 1-D filters H0^(1D)(z) and G0^(1D)(z) with Na/L and Ns/L zeros at some
point c0 ∈ C, respectively, and such that P^(1D)(z) = H0^(1D)(z) G0^(1D)(z) satisfies
(2.18).
Step 2 Let M(z) = (1 − z1)^L R(z) + c0 with

R(z) = R_L(z1) + (1 + z1)^L R_o(z1, z2), and
R_o(z1, z2) = −R_o(−z1, sz2).

Step 3 Set H0(z) = H0^(1D)(M(z)) and G0(z) = G0^(1D)(M(z)) to obtain the desired 2-D
filters.
Notice that one can choose M(z) so that M(z) = M(z^{−1}) and, as a result, the 2-D
filters are zero-phase [49]. In this case, L is necessarily even and R(z) can have the more
convenient form in (2.17) instead of the one in (2.13). In the design examples that follow
we use c0 = 1. It is easy to check that this ensures that the gain of H1(z) at z = (1, z2)^T is
√2 whenever H0^(1D)(−1) = √2.
2.4.2 Filter size analysis
One shortcoming of the mapping design procedure is that the size of the support of the
filters tends to be increasingly large. However, if extra care is taken when designing the
mapping function as well as the 1-D prototypes, the filters can have reasonable support
size. The support of the filters can be easily quantified, as we show next. We use the
notation deg[·] to denote the support size of the filter. For a 2-D filter, the support size will
be a pair of integers that represent the sides of the smallest discrete rectangle that contains
all the filter coefficients, including the boundary. Thus, since H0(z) = H0^(1D)(M(z)) and
G0(z) = G0^(1D)(M(z)), we have that deg[H0(z)] = deg[H^(1D)(z)] deg[M(z)] and similarly
deg[G0(z)] = deg[G^(1D)(z)] deg[M(z)]. Notice that deg[M(z)] = deg[R(z)] + (L, 0)^T.
Moreover, from Proposition 5 we have that R(z1, z2) = R_L(z1) + (1 + z1)^L R_o(z). Since
R_L(z1) is the minimum degree complementary filter to (1 − z1)^L, it has support size L.
Therefore, if we further assume that R_o(z) is supported around the origin, it follows
that deg[R(z1, z2)] is dominated by deg[(1 + z1)^L R_o(z)]. Thus, denoting deg[R_o(z)] =
(µ1, µ2)^T, we have that

deg[R(z1, z2)] ≤ deg[(1 + z1)^L R_o(z)] = (L + µ1, µ2)^T,

and consequently,

deg[H0(z1, z2)] ≤ deg[H^(1D)(z)] (µ1 + 2L, µ2)^T.    (2.24)
Similarly, for the synthesis filter,

deg[G0(z1, z2)] ≤ deg[G^(1D)(z)] (µ1 + 2L, µ2)^T.    (2.25)
From the foregoing discussion we see that, for fixed µ1 and µ2, increasing the number
of DVMs in the mapping function, i.e., increasing L, stretches the filter support
along the n1 direction. Furthermore, for a fixed mapping function, the support of the
resulting 2-D filter will increase linearly in both the n1 and n2 directions with the number of
vanishing moments in the prototype filters H^(1D)(z). Thus, to keep the filters from getting
too large, the 1-D prototype filters should be as short as possible,
preferably with only one zero at c0. We present design examples next.
2.4.3 Design examples
Example 1 Nonseparable filter family that includes the 1-D 9-7 filters
For the purpose of this example we assume the quincunx lattice, and we generate DVMs
along the horizontal direction u1 = 0. Following the discussion in the previous section, we
choose the prototype filters H0^(1D)(z) and G0^(1D)(z) to have zeroes at z = −1. We consider
the minimum degree complementary filter to (1 − z)^4, which from (2.14) gives the product
filter

P^(1D)(z) = (1/16)(16 + 29z + 20z^2 + 5z^3)(1 − z)^4.

We let each prototype filter have a factor (1 − z)^2 and then we split the factor (16 +
29z + 20z^2 + 5z^3) between the two prototypes, assigning the real root to H0^(1D)(z) and the
two complex-conjugate roots to G0^(1D)(z).
In the mapping function we impose the condition M(z1, z2) = M(z1^{−1}, z2^{−1}) so that the
filters are zero-phase. Following (2.20), in order to generate a second-order horizontal
DVM, we set

M(z1, z2) = (1 − z1)^2 R(z1, z2) − 1.
To guarantee that the map satisfies the valid mapping condition (2.21) and is zero-phase, we use (2.17) to obtain

R(z1, z2) = −(1/2) z1^{−1} + (1 + z1^{−1})^2 R_o(z1, z2).

Notice that R_o(z1, z2) can be any zero-phase filter that satisfies (2.15). For simplicity
we choose R_o(z1, z2) = α(z2 + z2^{−1}). With α = 0 we recover the 9-7 filters. Figure 2.5
displays the frequency response of the filters when we set α = −4√2.
Figure 2.5 Frequency response of the analysis filters |H0(e^{jω})|, |H1(e^{jω})| and synthesis filters |G0(e^{jω})|, |G1(e^{jω})|, designed with fourth-order directional vanishing moment. The filters degenerate to the 9-7 wavelet filters.
Example 2 Using the higher order term to improve frequency response
In this example we use the extra degrees of freedom in our proposed design to obtain
filters with low order DVMs and better frequency selectivity. The following filters can be
checked to satisfy (2.18):
H0^(1D)(z) = K (1 + k2 z + k2 k3 z^2),
G0^(1D)(z) = K^{−1} (1 − k1 z − k3 z + k1 k2 z^2 − k1 k2 k3 z^3).
To obtain the prototype we choose the constants K, k1, k2, and k3 such that each filter
has a zero at z = 1 and, in addition, that H0^(1D)(−1) = G0^(1D)(−1) = √2. The following
prototypes are obtained:
H0^(1D)(z) = (1/2)(1 − z)(2 + (2 − √2)z),
G0^(1D)(z) = (1/2)(1 − z)(2 + (6 − 4√2)z + (4 − 3√2)z^2).
We then use the same mapping function of Example 1, but now we let

R_o(z1, z2) = r0 (z1 + z1^{−1}) + r1 (z2 + z2^{−1}) + r2 (z1 + z1^{−1})(z2 + z2^{−1})^2    (2.26)
and optimize the coefficients r0, r1, r2 so that the filters approximate the ideal fan response.
The resulting filters H0(z) and G0(z) have sizes 13 × 9 and 19× 13, respectively. Figure
2.6 displays the frequency response of all the filters.
Figure 2.6 Frequency response of the analysis filters |H0(e^{jω})|, |H1(e^{jω})| and synthesis filters |G0(e^{jω})|, |G1(e^{jω})|, designed with second-order directional vanishing moment.
2.5 Tree-Structured Filter Banks with Directional Vanishing Moments
In order to study the approximation properties of DVM filters we replace the con-
ventional DFB in the contourlet transform with a DFB constructed with fan filter banks
that have DVMs. The DFB is constructed with fan filter banks and pre/post resampling
operations in a tree structure [23]. Referring to the fan-shaped fundamental frequency
support, it is natural to impose vanishing moments along the vertical and horizontal
directions. Notice that there are different ways to impose DVMs in the contourlet trans-
form. For instance, one could consider the Laplacian pyramid in conjunction with the
directional filters to impose DVMs. This is equivalent to considering a 2-D oversampled
filter bank with its highpass channels followed by the DFB. Such a design, however, is
outside the scope of this work.
Figure 2.7 Two types of prototype fan filter banks used in the DFB expansion tree: (a) type 0 and (b) type 1. Each filter bank has one of its branches featuring a DVM.
As we already discussed in Proposition 4, for a 2-D, two-channel filter bank, we can
only have DVMs in one of its branches. Thus, for the prototype fan filter bank used
in the DFB, we have the two possible configurations (denoted type 0 and type 1) with
similar frequency decomposition illustrated in Figure 2.7.
In light of that, in each node of the DFB tree structure we use a sheared/rotated
filter bank, according to the DFB expansion rule, obtained from the prototype fan filter
bank of either type 0 or type 1. This naturally opens the question of how to arrange the
filter bank types in the tree structure efficiently. Notice that, for a DFB with l stages,
Figure 2.8 The DVM directional filter bank. (a) The four-channel DFB with type 0 (horizontal) and type 1 (vertical) DVM filter banks. (b) The four-channel equivalent filter bank. The equivalent filter bank has DVMs in three different directions.
a total of Π_{i=1}^{l} 2^i tree arrangements are possible, each with a different DVM allocation
among the DFB channels. Figure 2.8 illustrates a possible arrangement for a four-channel
DFB. Notice that with type 0 and type 1 prototype fan filter banks we obtain DVMs in
different directions (in this case the two diagonal directions).
For a single node in the DFB tree, the possible DVMs for either type 0 or type 1
fan filter bank to be appended in that node will depend on the overall downsampling
matrix of that particular node. Furthermore, from Proposition 4, we have that each stage
Figure 2.9 Directional vanishing moments on equivalent filters of a 16-channel DFB. Different arrangements of type 0 and type 1 fan filter banks lead to different numbers of distinct directions, each distinct direction being numbered in the figure. (a) Tree 1 has 8 distinct directions. (b) Tree 2 has 7 distinct directions. (c) Tree 3 has 6 distinct directions.
0 ≤ q ≤ l in the DFB tree introduces DVMs in 2^{q−1} channels. Heuristically, we observe
that for representing natural images with directional information spread among several
orientations, it is desirable to have the set of distinct DVMs as large as possible. Thus, in
each DFB stage, the goal is to introduce as many “new” DVMs as possible. For instance,
Figure 2.9 displays three possible arrangements for a 16-channel expansion, each with a
different number of distinct directions. It can be verified that the arrangement in Figure
2.9 (a) (Tree 1) yields the maximum number of distinct directions.
We stress that the DVM formulation is a space-domain one. As a result, when
coupled with the DFB tree structure, the DVM filters alone do not ensure the directional
resolution of a DFB with ordinary fan filters. However, if the DVM design also considers
frequency localization, as in Example 2, then the DFB tree structure may have good
localization in the frequency domain. Figure 2.10 shows the equivalent response using
the fan filters with DVMs designed in Example 2. Notice that the equivalent filters have
good frequency localization in addition to the DVMs in all but one direction.
Figure 2.10 Equivalent filters in an 8-channel DFB using two-channel filter banks with DVM. The filters are the ones designed in Example 2. Notice the good frequency localization in addition to the imposed DVMs (red line).
2.6 Numerical Experiments
2.6.1 Annihilating directional edges
In order to illustrate the potential of the filter banks with DVMs we construct a toy
example using Haar-type filters and bilevel images. That is, we use filters with one DVM
and two nonzero coefficients. As a test image, we use a bilevel polygon image displayed
in Figure 2.11. An efficient representation of the image shown can be obtained in the
following way. Consider an ordinary separable wavelet decomposition with downsampling
along the rows on the first filtering stage. Each level of the line-column wavelet transform
can be seen as a degenerate 2-D decomposition, where the filters are 1-D. In this case, the
corresponding downsampling matrices for the first and second levels are diag(1, 2) and diag(2, 1),
respectively. The DVM filters in this two-stage filter bank are as follows. On the first
level we use the filter H0(z) = (1 − z1^{−1} z2^{−3})/√2, so that one of the directions is annihilated,
namely the one along the angle θ ≈ 2π/5. Notice that the z1 exponent is odd, so, from
Proposition 3, perfect reconstruction is possible. On the second filtering stage we use the
filters H00(z) = (1 − z1^{−1} z2^{3})/√2 in the low-pass branch and H10(z) = (1 − z1^{−2} z2)/√2 in
the high-pass branch. Using multirate identities [7] we see that the first splitting yields a
DVM along θ ≈ −5π/16, whereas the second yields θ ≈ −6π/77. The frequency response
of both filters used in the experiment is shown in Figure 2.12.
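The annihilation at the heart of this experiment is easy to reproduce; the sketch below (our illustration, not code from the dissertation) applies the first-level filter H0(z) = (1 − z1^{−1} z2^{−3})/√2 to a bilevel image whose edge runs along the (1, 3) direction, and the output is identically zero.

```python
import numpy as np

# A bilevel image whose edge runs along the direction (1, 3): the intensity
# depends only on 3*n1 - n2, so stepping (n1, n2) -> (n1+1, n2+3) leaves it
# unchanged.
n1, n2 = np.mgrid[0:32, 0:96]
x = (3 * n1 - n2 >= 0).astype(float)

# One-DVM Haar-type branch y[n1, n2] = (x[n1, n2] - x[n1-1, n2-3]) / sqrt(2),
# i.e., filtering with H0(z) = (1 - z1^{-1} z2^{-3}) / sqrt(2).
y = np.zeros_like(x)
y[1:, 3:] = (x[1:, 3:] - x[:-1, :-3]) / np.sqrt(2)

# The DVM annihilates everything constant along (1, 3):
print(float(np.abs(y).max()))  # -> 0.0
```

Rotating the exponents of the filter changes the annihilated direction, which is exactly what the second-stage filters H00 and H10 do.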
Figure 2.11 Decomposition of a synthetic image using two schemes: (a) original image, (b) wavelet (Haar) decomposition, (c) DVM decomposition.
Similar to the ordinary wavelet decomposition, we iterate the filter bank on the low-
pass channel using the same filter bank on each scale. Figure 2.11 shows the four-
scale decomposition. Also shown is the four-level decomposition using Haar filters. As
the pictures show, the DVM filters produce a more efficient decomposition in the sense
that fewer significant coefficients are present. Furthermore, we observe a reduction of
about 50% in the first-order entropy after uniformly quantizing the coefficients in the
two expansions. Note that the DVMs in the expansion closely match the edges in the
image. For general images, we need DVMs along several directions.
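The first-order entropy figure quoted above can be computed with a short helper (a generic illustration we add here; the 50% reduction is the dissertation's measurement and is not reproduced by this toy):

```python
import numpy as np

def first_order_entropy(coeffs, step):
    # Uniformly quantize, then compute the empirical first-order entropy
    # (bits per coefficient) of the resulting symbol stream.
    q = np.round(np.asarray(coeffs) / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A sparser set of quantized coefficients has lower entropy:
print(round(first_order_entropy([0, 0, 0, 8.0], 1.0), 4))  # -> 0.8113
print(round(first_order_entropy([0, 5, 3, 8.0], 1.0), 4))  # -> 2.0
```

This is the sense in which fewer significant coefficients translate into a lower bit budget.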
Figure 2.12 The DVM Haar filters. (a) The response of the filter H0(z) = (1 − z1^{−1} z2^{−3})/√2; notice the single DVM that is replicated due to periodicity. (b) The response of the other Haar filter H10(z) = (1 − z1^{−2} z2)/√2, also used in the experiment.
2.6.2 Nonlinear approximation with the contourlet transform
To illustrate the applicability of the directional vanishing moment filters proposed,
we perform an experiment in which we replace the conventional DFB in the contourlet
transform with a DFB built with DVM fan filter banks following the discussion in Section
2.5 (see Figure 2.8). The filters we use are those designed in Example 1. To study
the nonlinear approximation (NLA) behavior of the filters in our proposed design, we
reconstruct the image using the N coefficients with largest magnitude and compute the
resulting PSNR. It is recognized that the faster the asymptotic decay of the NLA error,
the sparser the decomposition. This sparsity is important for potential applications
including denoising and compression [1]. The directional expansion tree we use in each
scale is one that leads to a maximum number of distinct DVMs and is the same for
all test images, hence the expansion is fixed. The analysis filter is zero-phase with 7 ×
13 coefficients and the synthesis with 9 × 17, also zero-phase (see Example 1). As a
comparison, we use the quincunx/fan filters of [50] (PKVA), where the analysis filter has
23 × 23 taps and the synthesis 45 × 45. We observe that the PKVA filters give the best
PSNR performance in the contourlet transform among existing designs in the literature.
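For readers who want to reproduce the flavor of this experiment without a contourlet implementation, the sketch below (ours; a one-level orthonormal Haar transform stands in for the contourlet expansion) performs the same keep-the-N-largest-coefficients reconstruction and PSNR computation:

```python
import numpy as np

def haar2(x):
    # One level of the separable orthonormal 2-D Haar transform.
    a, d = (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)
    def cols(u):
        return np.hstack([(u[:, 0::2] + u[:, 1::2]) / np.sqrt(2),
                          (u[:, 0::2] - u[:, 1::2]) / np.sqrt(2)])
    return np.vstack([cols(a), cols(d)])

def ihaar2(c):
    h, w = c.shape[0] // 2, c.shape[1] // 2
    def icols(u):
        out = np.empty_like(u)
        out[:, 0::2] = (u[:, :w] + u[:, w:]) / np.sqrt(2)
        out[:, 1::2] = (u[:, :w] - u[:, w:]) / np.sqrt(2)
        return out
    a, d = icols(c[:h]), icols(c[h:])
    out = np.empty_like(c)
    out[0::2], out[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
    return out

def nla_psnr(x, n_keep):
    # Keep the n_keep largest-magnitude coefficients, zero the rest, invert.
    c = haar2(x)
    thr = np.sort(np.abs(c), axis=None)[-n_keep]
    xr = ihaar2(np.where(np.abs(c) >= thr, c, 0.0))
    mse = np.mean((x - xr) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(0)
img = np.kron(rng.integers(0, 256, (8, 8)).astype(float), np.ones((8, 8)))
# For an orthonormal transform, PSNR is nondecreasing in N:
print(nla_psnr(img, 64) <= nla_psnr(img, 512))  # -> True
```

Substituting the contourlet (or DVM-contourlet) analysis/synthesis for `haar2`/`ihaar2` gives exactly the NLA curves discussed here.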
Figure 2.13 displays the NLA curve obtained for a piecewise polynomial image. For
this synthetic image, a significant improvement is observed. This improvement is due
to the fact that the synthetic image has directional information in a very small set of
directions, which, due to DVMs, are well represented in the expansion.
Figure 2.13 Nonlinear approximation behavior of the contourlet transform with DVM filters for a toy image. (a) Synthetic piecewise polynomial image. (b) NLA curves (PSNR in dB versus number of retained coefficients, on a semilog scale) for the DVM, PKVA, and wavelet expansions. This simple toy image is better represented by the contourlet transform with DVM filters.
Figure 2.14 shows the NLA curves for the standard 512×512 “Peppers” and “Barbara”
images. As the plots show, the DVM filters slightly improve over PKVA for both natural
images. For a highly textured image such as “Barbara” there is significant improvement
over wavelets. In contrast, for a smooth image such as “Peppers,” the redundancy
inherent to the contourlet expansion is more apparent. However, when the number of
coefficients is very low, the results are comparable.
Because the DVM filters are considerably shorter, we observe fewer ringing artifacts
when compared against the PKVA filters, even when both give similar PSNR. Figure
2.15 shows the “Peppers” image reconstructed with 2048 coefficients using both of the
filters. As can be seen, the image reconstructed with the DVM filters exhibits many fewer
ringing artifacts. This result is akin to that in the 1-D wavelet case in which subband
filters without vanishing moments produce similar PSNR results, but have more artifacts
due to long filters.
Figure 2.14 Nonlinear approximation behavior of the contourlet transform with DVM filters for natural images. NLA curves (PSNR in dB versus number of retained coefficients, on a semilog scale) for the (a) “Peppers” and (b) “Barbara” images.
2.6.3 Image denoising with the contourlet transform
To assess the performance of the DVM filters relative to the conventional approach,
we consider threshold estimators on the contourlet transform domain. Such estimators
are very efficient in removing additive Gaussian noise from images [1]. In our experiment
we consider a hard threshold applied to the highpass directional subbands. The threshold
is chosen as T_j = K σ_{N,j}, where σ_{N,j} is the estimated noise standard deviation in that
subband, and K is a constant. We choose K = 3 for the coarse scale and K = 4 for the
finest scale. Table 2.1 shows the PSNR of the denoised images. For comparison purposes
we also include the results obtained with the wavelet transform.²
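The threshold rule T_j = K σ_{N,j} is simple to sketch; below is an illustration (ours, on a single Haar detail subband rather than a contourlet subband) that estimates the subband noise deviation with the usual robust median rule and then hard-thresholds:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 20.0
clean = np.kron(rng.integers(0, 256, (16, 16)).astype(float), np.ones((16, 16)))
noisy = clean + sigma * rng.normal(size=clean.shape)

# Horizontal Haar detail subband of the noisy image (one highpass channel).
# The blocks are aligned with the column pairs, so the clean detail is zero
# and this subband is essentially pure noise.
d = (noisy[:, 0::2] - noisy[:, 1::2]) / np.sqrt(2)

# Robust estimate of the subband noise deviation, then T = K * sigma_N.
sigma_n = np.median(np.abs(d)) / 0.6745
K = 3.0
d_hat = np.where(np.abs(d) > K * sigma_n, d, 0.0)

print(np.sum(d_hat ** 2) < np.sum(d ** 2))  # -> True (most noise energy removed)
```

In the contourlet setting the same rule is applied per directional subband, with K varying across scales as described above.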
The DVM filters slightly improve on the conventional filters, which in turn improve over wavelets.
The improvements are more noticeable in the Barbara image.
² Note that our intent is only to compare the performance of the filters. State-of-the-art denoising methods often would result in better performance than the ones shown in Table 2.1.
Figure 2.15 “Peppers” image reconstructed with 2048 coefficients. (a) PKVA filters, PSNR = 26.05 dB. (b) DVM filters of Example 1, PSNR = 26.76 dB. The image on the right shows fewer ringing artifacts.
2.7 Conclusion
We have studied two-channel biorthogonal filter banks in which one filter bank chan-
nel annihilates information along a prescribed direction by means of directional vanishing
moments. We investigated in detail the classes of signals that are annihilated by filters
having DVMs. In addition, we studied the DVM filter bank design problem and provided
a complete characterization of the product filter. The characterization splits the com-
plementary filter into two terms, one minimum order 1-D degenerate filter and a higher
order 2-D term. Using the mapping design methodology, we proposed a design procedure
in which the mapping can be calculated explicitly. Our approach is easy to carry out
and yields a large class of linear-phase 2-D filter banks with DVMs of any prescribed
order. We also investigated the potential usage of such filter banks in the context of the
contourlet transform. Nonlinear approximation curves indicate that filters with DVMs
can be as good as filters designed with frequency response as the primary criterion and,
Table 2.1 Improvement in image denoising (PSNR in dB).

LENA
σ     Noisy   DWT     CT-PKVA   CT-DVM
10    28.12   32.31   32.08     32.54
20    22.10   28.34   28.40     28.83
30    18.59   25.80   25.93     26.35
40    16.08   23.88   24.04     24.45
50    14.14   22.34   22.52     22.96

BARBARA
σ     Noisy   DWT     CT-PKVA   CT-DVM
10    28.12   29.96   30.18     30.70
20    22.11   25.68   26.40     26.84
30    18.58   23.30   24.18     24.57
40    16.08   21.75   22.50     22.91
50    14.15   20.60   21.18     21.60
in some cases, yield better results. In addition, because the filters are short, the Gibbs
phenomenon is considerably reduced.
2.8 The Equivalence Between Ladder and Mapping Designs
We now show that the Nyquist filters proposed by Phoong et al. in [50] can be seen
as a special case of the mapping approach proposed by Tay and Kingsbury [49], and here
extended to handle directional vanishing moments. We present the derivation for the
quincunx lattice – the rectangular lattice case can be similarly handled. First recall the
general form of the 2-D filters obtained in [50]:
H0(z1, z2) = (1/2)(z1^{−2N} + z1^{−1} p(z1 z2^{−1}) p(z1 z2))    (2.27)
H1(z1, z2) = −p(z1 z2^{−1}) p(z1 z2) H0(z1, z2) + z1^{−4N+1},    (2.28)

where p(z) is usually chosen as a halfband filter. The synthesis lowpass filter is G0(z1, z2) =
−H1(−z1, −z2) and, from the above, it follows that

G0(z1, z2) H0(z1, z2) − G0(−z1, −z2) H0(−z1, −z2) = z1^{−6N+1}.
Now, consider the delayed versions of H0(z) and G0(z) given by H̃0(z1, z2) = z1^{2N} H0(z1, z2)
and G̃0(z1, z2) = z1^{4N−1} G0(z1, z2). Then

H̃0(z1, z2) = (1/2)(1 + z1^{2N−1} p(z1 z2^{−1}) p(z1 z2)),
G̃0(z1, z2) = (1/2)[2 + z1^{2N−1} p(z1 z2^{−1}) p(z1 z2)(1 − z1^{2N−1} p(z1 z2^{−1}) p(z1 z2))].    (2.29)
The above filters are simply delayed versions of the starting ones, thus being the same filters for practical purposes. Notice that the filters now satisfy H̃0(z)G̃0(z) + H̃0(−z)G̃0(−z) =
2. Setting z := z1^{2N−1} p(z1 z2^{−1}) p(z1 z2) in (2.29), we see that the filters can be written in
terms of the 1-D polynomials
H0^(1D)(z) = (1/2)(1 + z),
G0^(1D)(z) = (1/2)[2 + z(1 − z)],
where H0^(1D)(z) G0^(1D)(z) + H0^(1D)(−z) G0^(1D)(−z) = 2. Finally, note that z1^{2N−1} p(z1 z2^{−1}) p(z1 z2)
is odd regardless of the nature of p(z). Hence we have the following result.
Proposition 7 The filters proposed in [50] constitute a particular case of the mapping
design of [49], where the mapping function is M(z1, z2) = z1^{2N−1} p(z1 z2^{−1}) p(z1 z2).
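The oddness claim is mechanical to verify; the snippet below (our illustration, with arbitrary made-up coefficients for p) represents z1^{2N−1} p(z1 z2^{−1}) p(z1 z2) as a 2-D Laurent polynomial and confirms M(−z1, −z2) = −M(z1, z2):

```python
from collections import defaultdict

def flip(P):
    # (z1, z2) -> (-z1, -z2): each monomial z1^a z2^b picks up (-1)^(a+b).
    return {k: (-1) ** (k[0] + k[1]) * v for k, v in P.items()}

def mul(P, Q):
    R = defaultdict(float)
    for (a, b), v in P.items():
        for (c, d), w in Q.items():
            R[(a + c, b + d)] += v * w
    return dict(R)

# Arbitrary example 1-D kernel p(z); p(z1 z2^{-1}) and p(z1 z2) as 2-D polynomials.
p = {0: 0.5, 1: 0.9, 3: -0.4}
p_diff = {(k, -k): v for k, v in p.items()}   # p(z1 z2^{-1})
p_sum = {(k, k): v for k, v in p.items()}     # p(z1 z2)
N = 2
M = mul({(2 * N - 1, 0): 1.0}, mul(p_diff, p_sum))  # z1^{2N-1} p(z1 z2^{-1}) p(z1 z2)

minusM = {k: -v for k, v in M.items()}
print(flip(M) == minusM)  # -> True
```

The reason is visible in the exponents: every monomial of M has total degree 2N − 1 + 2j, which is odd, so the modulation negates every coefficient regardless of p.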
CHAPTER 3
THE NONSUBSAMPLED CONTOURLET TRANSFORM: THEORY, DESIGN, AND
APPLICATIONS
In this chapter we develop the nonsubsampled contourlet transform (NSCT) and
study its applications. The construction proposed in this chapter is based on a nonsub-
sampled pyramid structure and nonsubsampled directional filter banks. The result is a
flexible multiscale, multidirection, and shift-invariant image decomposition that can be
efficiently implemented via the à trous algorithm. At the core of the proposed scheme is
the nonseparable two-channel nonsubsampled filter bank. We exploit the less stringent
design condition of the nonsubsampled filter bank to design filters that lead to a NSCT
with better frequency selectivity and regularity when compared to the contourlet trans-
form. We propose a design framework based on the mapping approach that allows for a
fast implementation based on a lifting or ladder structure, and only uses one-dimensional
filtering in some cases. In addition, our design ensures that the corresponding frame
elements are regular, symmetric, and the frame is close to a tight one. We assess the
performance of the NSCT in image denoising. The NSCT compares favorably to other
existing methods in the literature.
This chapter is joint work with Minh Do and Jianping Zhou. The results of this chapter are summarized in the references [53–55].
3.1 Introduction
A number of image processing tasks are efficiently carried out in the domain of an
invertible linear transformation. For example, image compression and denoising are ef-
ficiently done in the wavelet transform domain [1, 8]. An effective transform captures
the essence of a given signal or a family of signals with few basis functions. The set of
basis functions completely characterizes the transform, and this set can be redundant or
not, depending on whether the basis functions are linearly dependent. By allowing redundancy, it is possible to enrich the set of basis functions so that the representation is more
efficient in capturing some signal behavior. In addition, redundant representations are
generally more flexible and easier to design. In applications such as denoising, enhance-
ment, and contour detection, a redundant representation can significantly outperform a
nonredundant one.
Another important feature of a transform is its stability with respect to shifts of the
input signal. The importance of the shift-invariance property in imaging applications
dates back at least to Daugman [56] and was also advocated by Simoncelli et al. in
[21]. An example that illustrates the importance of shift-invariance is image denoising by
thresholding where the lack of shift-invariance causes pseudo-Gibbs phenomena around
singularities [57]. Thus, most state-of-the-art wavelet denoising algorithms (see for ex-
ample [58–60]) use an expansion with less shift sensitivity than the standard maximally
decimated wavelet decomposition — the most common being the nonsubsampled wavelet
transform (NSWT) computed with the à trous algorithm [61].¹
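As a sketch of the à trous idea (our 1-D illustration with circular extension, not code from the dissertation): each level filters with an upsampled, zero-inserted kernel and never downsamples, so every subband retains full length and the decomposition commutes with shifts.

```python
import numpy as np

def atrous_level(x, h, g, j):
    # Level j of the undecimated (a trous) transform: the filters are upsampled
    # by 2**j (zeros inserted between taps) and no downsampling is applied.
    def dilated(x, f, step):
        y = np.zeros_like(x, dtype=float)
        for k, c in enumerate(f):
            y += c * np.roll(x, k * step)   # circular signal extension
        return y
    return dilated(x, h, 2 ** j), dilated(x, g, 2 ** j)

h = np.array([0.5, 0.5])    # toy lowpass (Haar-like)
g = np.array([0.5, -0.5])   # toy highpass
x = np.arange(16, dtype=float)

a0, d0 = atrous_level(x, h, g, 0)
a1, d1 = atrous_level(a0, h, g, 1)   # next level: holes of size 2
print(a1.shape, d1.shape)            # -> (16,) (16,)
# Shift-invariance: transforming a shifted input shifts the subbands.
print(np.allclose(atrous_level(np.roll(x, 3), h, g, 0)[1], np.roll(d0, 3)))  # -> True
```

The NSCT applies the same principle in 2-D, with nonseparable pyramid and directional filters in place of the toy kernels above.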
In addition to shift-invariance, it has been recognized that an efficient image represen-
tation has to account for the geometrical structure pervasive in natural scenes. In this
direction, several representation schemes have recently been proposed [16, 18–20, 36].
The contourlet transform [20] is a multidirectional and multiscale transform that is con-
structed by combining the Laplacian pyramid [25, 37] with the directional filter bank
¹ Denoising by thresholding in the NSWT domain can also be realized by denoising multiple circular shifts of the signal with a critically sampled wavelet transform and then averaging the results. This has been termed cycle spinning after [57].
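Cycle spinning amounts to a few lines; the sketch below (ours, 1-D, with a stand-in shift-variant Haar thresholder as the base denoiser) averages denoised circular shifts exactly as the footnote describes:

```python
import numpy as np

def base_denoise(x, thr=30.0):
    # Stand-in shift-VARIANT denoiser: one-level orthonormal Haar,
    # hard-threshold the detail coefficients, invert.
    a, d = (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)
    d = np.where(np.abs(d) > thr, d, 0.0)
    y = np.empty_like(x)
    y[0::2], y[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
    return y

def cycle_spin(x, n_shifts):
    # Denoise every circular shift, unshift, and average (Coifman-Donoho).
    acc = np.zeros_like(x)
    for s in range(n_shifts):
        acc += np.roll(base_denoise(np.roll(x, s)), -s)
    return acc / n_shifts

x = np.repeat([0.0, 100.0], 8) + 5.0 * np.random.default_rng(2).normal(size=16)
# With a single shift, cycle spinning reduces to the base denoiser.
print(np.allclose(cycle_spin(x, 1), base_denoise(x)))  # -> True
```

Averaging over all shifts of a critically sampled transform is equivalent to thresholding in the corresponding nonsubsampled transform, which is the motivation for working directly with the NSWT/NSCT.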
(DFB) proposed in [23]. The pyramidal filter bank structure of the contourlet transform
has very little redundancy, which is important for compression applications. However,
designing good filters for the contourlet transform is a difficult task. In addition, due to
downsamplers and upsamplers present in both the Laplacian pyramid and the DFB, the
contourlet transform is not shift-invariant.
In this chapter we propose an overcomplete transform that we call the nonsubsampled
contourlet transform (NSCT). Our main motivation is to construct a flexible and efficient
transform targeting applications where redundancy is not a major issue (e.g., denoising).
The NSCT is a fully shift-invariant, multiscale, and multidirection expansion that has
a fast implementation. The proposed construction leads to a filter design problem that,
to the best of our knowledge, has not been addressed elsewhere. The design problem
is much less constrained than that of contourlets. This enables us to design filters with
better frequency selectivity, thereby achieving better subband decomposition. Using the
mapping approach we provide a framework for filter design that ensures good frequency
localization in addition to having a fast implementation through ladder steps. The NSCT
has proven to be very efficient in image denoising and image enhancement [53–55].
The chapter is structured as follows. In Section 3.2 we describe the NSCT and its
building blocks. We introduce a pyramid structure that ensures the multiscale feature
of the NSCT and the directional filtering structure based on the DFB. The basic unit in
our construction is the nonsubsampled filter bank (NSFB), which is discussed in Section
3.2. In Section 3.3 we study the issues associated with the NSFB design and implemen-
tation problems. Application of the NSCT in image denoising is discussed in Section 3.4.
Conclusions are drawn in Section 3.5.
Notation: Throughout the chapter, a two-dimensional (2-D) filter is represented by its
z-transform $H(z)$, where $z = [z_1, z_2]^T$. Evaluated on the unit sphere, a filter is denoted by
$H(e^{j\omega})$, where $e^{j\omega} = [e^{j\omega_1}, e^{j\omega_2}]^T$. If $m = [m_1, m_2]^T$ is a 2-D vector, then $z^m = z_1^{m_1} z_2^{m_2}$,
whereas if $M$ is a $2 \times 2$ matrix, then $z^M = [z^{m_1}, z^{m_2}]$ with $m_1, m_2$ the columns of $M$.
In this chapter we often deal with zero-phase 2-D filters. On the unit sphere, such filters
can be written as polynomials in $\cos\omega = (\cos\omega_1, \cos\omega_2)^T$. We thus write $F(x_1, x_2)$ for a
zero-phase filter, in which $x_1$ and $x_2$ denote $\cos\omega_1$ and $\cos\omega_2$, respectively.
Abbreviations: A number of abbreviations are used throughout the chapter:
NSCT - Nonsubsampled Contourlet Transform.
NSFB - Nonsubsampled Filter Bank.
NSP - Nonsubsampled Pyramid.
NSDFB - Nonsubsampled Directional Filter Bank.
NSWT - Nonsubsampled 2-D Wavelet Transform.
LAS - Local Adaptive Shrinkage.
3.2 Nonsubsampled Contourlets and Filter Banks
3.2.1 The nonsubsampled contourlet transform
Figure 3.1 (a) displays an overview of the proposed NSCT. The structure consists of
a bank of filters that splits the 2-D frequency plane into the subbands illustrated in Figure
3.1(b). Our proposed transform can thus be divided into two shift-invariant parts: (1)
A nonsubsampled pyramid structure that ensures the multiscale property and (2) A
nonsubsampled DFB structure that gives directionality.
3.2.1.1 The nonsubsampled pyramid (NSP)
The multiscale property of the NSCT is obtained from a shift-invariant filtering struc-
ture that achieves a subband decomposition similar to that of the Laplacian pyramid.
This is achieved by using two-channel nonsubsampled 2-D filter banks. Figure 3.2 il-
lustrates the proposed nonsubsampled pyramid (NSP) decomposition with J = 3 stages.
Such an expansion is conceptually similar to the 1-D nonsubsampled wavelet transform
computed with the à trous algorithm [61] and has $J + 1$ redundancy, where $J$ denotes the
number of decomposition stages. The ideal passband support of the lowpass filter at
the $j$-th stage is the region $[-\pi/2^j, \pi/2^j]^2$. Accordingly, the ideal support of the equivalent
highpass filter is the complement of the lowpass support, i.e., the region $[-\pi/2^{j-1}, \pi/2^{j-1}]^2 \setminus [-\pi/2^j, \pi/2^j]^2$.
Figure 3.1 The nonsubsampled contourlet transform. (a) Nonsubsampled filter bank structure that implements the NSCT. (b) The idealized frequency partitioning obtained with the proposed structure.
The filters for subsequent stages are obtained by upsampling the filters of the first stage.
This gives the multiscale property without the need for additional filter design. The
proposed structure is thus different from the separable nonsubsampled wavelet transform
(NSWT). In particular, one bandpass image is produced at each stage resulting in J + 1
redundancy. By contrast, the NSWT produces three directional images at each stage
resulting in 3J + 1 redundancy.
Figure 3.2 The proposed nonsubsampled pyramid is a 2-D multiresolution expansion similar to the 1-D nonsubsampled wavelet transform. (a) A three-stage pyramid decomposition with filters $H_0(z)$, $H_1(z)$, $H_0(z^{2I})$, $H_1(z^{2I})$, $H_0(z^{4I})$, $H_1(z^{4I})$ producing outputs $y_0, \ldots, y_3$ from the input $x$. The lighter gray regions denote the aliasing caused by upsampling. (b) The subbands on the 2-D frequency plane.
The 2-D pyramid proposed in [62] is obtained with a similar structure. Specifically,
the NSFB of [62] is built from lowpass filter H0(z). One then sets H1(z) = 1−H0(z), and
the corresponding synthesis filters G0(z) = G1(z) = 1. A similar decomposition can be
obtained by removing the downsamplers and upsamplers in the Laplacian pyramid and
then upsampling the filters accordingly. Those perfect reconstruction systems can be seen
as a particular case of our more general structure. The advantage of our construction is
that it is general and, as a result, better filters can be obtained. In particular, in our
design G0(z) and G1(z) are lowpass and highpass. Thus, they filter certain parts of the
noise spectrum in the processed pyramid coefficients.
3.2.1.2 The nonsubsampled directional filter bank (NSDFB)
The directional filter bank of Bamberger and Smith [23] is constructed by combining
critically sampled two-channel fan filter banks and resampling operations. The result is a
tree-structured filter bank that splits the 2-D frequency plane into directional wedges. A
shift-invariant directional expansion is obtained with a nonsubsampled DFB (NSDFB).
The NSDFB is constructed by eliminating the downsamplers and upsamplers in the DFB.
This is done by switching off the downsamplers/upsamplers in each two-channel filter
bank in the DFB tree structure and upsampling the filters accordingly. This results in a
tree composed of two-channel nonsubsampled filter banks. Figure 3.3 illustrates a four-
channel decomposition. Note that in the second level, the upsampled fan filters Ui(zQ),
i = 0, 1 have checker-board frequency support and, when combined with the filters in
the first level, give the four directional frequency decomposition shown in Figure 3.3.
The synthesis filter bank is obtained similarly. Just like the critically sampled directional
filter bank, all filter banks in the nonsubsampled directional filter bank tree structure are
obtained from a single nonsubsampled filter bank with fan filters (see Figure 3.5 (b)).
Moreover, each filter bank in the NSDFB tree has the same computational complexity
as that of the prototype NSFB.
Figure 3.3 A four-channel nonsubsampled directional filter bank constructed with two-channel fan filter banks. (a) Filtering structure; the first level uses the fan filters $U_0(z), U_1(z)$ and the second level the upsampled filters $U_0(z^Q), U_1(z^Q)$. The equivalent filter in each channel is given by $U_k^{\mathrm{eq}}(z) = U_i(z) U_j(z^Q)$. (b) Corresponding frequency decomposition.
3.2.1.3 Combining the nonsubsampled pyramid and nonsubsampled directional filter bank in the NSCT
The NSCT is constructed by combining the NSP and the NSDFB as shown in Figure
3.1 (a). In constructing the nonsubsampled contourlet transform, care must be taken
when applying the directional filters to the coarser scales of the pyramid. Due to the
tree-structure nature of the NSDFB, the directional response at the lower and upper
frequencies suffers from aliasing, which can be a problem in the upper stages of the
pyramid (see Figure 3.8). This is illustrated in Figure 3.4 (a), where the passband region
of the directional filter is labeled as “Good” or “Bad.” Thus we see that for coarser
scales, the highpass channel in effect is filtered with the bad portion of the directional
filter passband. This results in severe aliasing and in some observed cases a considerable
loss of directional resolution.
We remedy this by judiciously upsampling the NSDFB filters. Denote the $k$-th directional filter by $U_k(z)$. Then for higher scales, we substitute $U_k(z^{2^m I})$ for $U_k(z)$, where $m$ is
chosen to ensure that the good part of the response overlaps with the pyramid passband.
Figure 3.4 (b) illustrates a typical example. Note that this modification preserves perfect
Figure 3.4 The need for upsampling in the NSCT. (a) With no upsampling, the highpass at higher scales will be filtered by the portion of the directional filter that has "bad" response. (b) Upsampling ensures that filtering is done in the "good" region.
reconstruction. In a typical five-scale decomposition, we upsample by 2I the NSDFB
filters of the last two stages.
Filtering with the upsampled filters does not increase computational complexity.
Specifically, for a given sampling matrix S and a 2-D filter H(z), to obtain the out-
put y[n] resulting from filtering x[n] with H(zS), we use the convolution formula
$y[n] = \sum_{k \in \mathrm{supp}(h)} h[k]\, x[n - Sk]. \quad (3.1)$
This is the à trous filtering algorithm [61] ("à trous" is French for "with holes").
Therefore, each filter in the NSDFB tree has the same complexity as that of the building-
block fan NSFB. Likewise, each filtering stage of the NSP has the same complexity as
that incurred by the first stage. Thus, the complexity of the NSCT is dictated by the
complexity of the building-block NSFBs. If each NSFB in both NSP and NSDFB requires
L operations per output sample, then for an image of N pixels the NSCT requires about
BNL operations, where B denotes the number of subbands. For instance, if L = 32, a
typical decomposition with 4 pyramid levels, 16 directions in the two finer scales, and 8
directions in the two coarser scales would require a total of 1536 operations per image
pixel.
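To make (3.1) concrete, below is a minimal sketch (not the dissertation's code) of à trous filtering with a general sampling matrix $S$; the function name, NumPy representation, and circular boundary handling are assumptions of this illustration:

```python
import numpy as np

def atrous_filter(x, h, S):
    """Compute y[n] = sum_{k in supp(h)} h[k] x[n - S k]  (Eq. 3.1),
    i.e., filtering x with H(z^S), with circular boundary handling.

    x : 2-D input array, h : 2-D filter taps, S : 2x2 integer matrix.
    The number of multiplies equals the number of nonzero taps of h,
    regardless of S, which is why upsampled filters cost no more.
    """
    y = np.zeros_like(x, dtype=float)
    for k1 in range(h.shape[0]):
        for k2 in range(h.shape[1]):
            if h[k1, k2] == 0.0:
                continue  # only taps in supp(h) contribute
            s1, s2 = (S @ np.array([k1, k2])).astype(int)
            # np.roll(x, (s1, s2))[n] equals x[n - (s1, s2)] circularly
            y += h[k1, k2] * np.roll(x, shift=(s1, s2), axis=(0, 1))
    return y
```

With $S = 2^j I$ this realizes the upsampled pyramid filters $H(z^{2^j I})$ of Figure 3.2 at the cost of the first stage.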
If the building block 2-channel NSFBs in the NSP and NSDFB are invertible, then
clearly the NSCT is invertible. It also underlies a frame expansion (see Section 3.2.3).
The frame elements are localized in space and oriented along a discrete set of directions.
The NSCT is flexible in that it allows any number $2^{l_j}$ of directions at each scale. In
particular, it can satisfy the anisotropic scaling law, a key property in establishing
the nonlinear approximation behavior of the expansion [20, 36]. This property is ensured by
doubling the number of directions in the NSDFB expansion at every other scale. The
NSCT has redundancy given by $1 + \sum_{j=1}^{J} 2^{l_j}$, where $l_j$ denotes the number of levels in
the NSDFB at the $j$-th scale.
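The redundancy and operation-count bookkeeping above can be sketched as follows (hypothetical helper functions, not part of the transform itself); `levels` lists the number of NSDFB levels $l_j$ per scale:

```python
def nsct_redundancy(levels):
    """Redundancy 1 + sum_j 2^{l_j}: one lowpass band plus 2^{l_j}
    directional subbands at each of the J scales."""
    return 1 + sum(2 ** l for l in levels)

def ops_per_pixel(levels, L):
    """Approximate cost B*L operations per pixel, with B the number
    of directional subbands and L the cost of one NSFB output sample."""
    return L * sum(2 ** l for l in levels)

# The example from the text: 4 pyramid levels, 16 directions in the
# two finer scales and 8 in the two coarser ones, L = 32 operations.
redundancy = nsct_redundancy([4, 4, 3, 3])   # 1 + 16 + 16 + 8 + 8 = 49
cost = ops_per_pixel([4, 4, 3, 3], 32)       # 32 * 48 = 1536
```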
3.2.2 Nonsubsampled filter banks
At the core of the proposed NSCT structure is the 2-D two-channel nonsubsampled
filter bank. Shown in Figure 3.5 are the NSFBs needed to construct the NSCT. In this
chapter we focus exclusively on the FIR case simply because it is easier to implement
in multiple dimensions. For a general FIR two-channel NSFB, perfect reconstruction is
achieved provided the filters satisfy the Bezout identity:
H0(z)G0(z) + H1(z)G1(z) = 1. (3.2)
The Bezout identity puts no constraint on the frequency response of the filters in-
volved. Therefore, to obtain good solutions one has to impose additional conditions on
the filters.
Figure 3.5 The two-channel nonsubsampled filter banks used in the NSCT. The system is two times redundant and the reconstruction is error free when the filters satisfy Bezout's identity. (a) Pyramid NSFB with analysis filters $H_0(z), H_1(z)$ and synthesis filters $G_0(z), G_1(z)$. (b) Fan NSFB with analysis filters $U_0(z), U_1(z)$ and synthesis filters $V_0(z), V_1(z)$.
3.2.3 Frame analysis of the NSCT
The nonsubsampled filter bank can be interpreted in terms of analysis/synthesis operators of frame systems. A family of vectors $\{\phi_n\}_{n\in\Gamma}$ constitutes a frame for a Hilbert space $H$ if there exist two positive constants $A, B$ such that for each $x \in H$ we have

$A\|x\|^2 \le \sum_{n\in\Gamma} |\langle x, \phi_n\rangle|^2 \le B\|x\|^2. \quad (3.3)$

In the event that $A = B$ the frame is said to be tight. The frame bounds are the tightest positive constants satisfying (3.3).
Consider the NSFB of Figure 3.5 (a). The family $\{h_0[\cdot - n], h_1[\cdot - n]\}_{n\in\mathbb{Z}^2}$ is a frame for $\ell^2(\mathbb{Z}^2)$ if and only if there exist constants $0 < A \le B < \infty$ such that [12]

$A \le \underbrace{|H_0(e^{j\omega})|^2 + |H_1(e^{j\omega})|^2}_{t(e^{j\omega})} \le B. \quad (3.4)$
Thus, the frame bounds of an NSFB can be computed by

$A = \operatorname*{ess\,inf}_{\omega\in[-\pi,\pi]^2} t(e^{j\omega}), \qquad B = \operatorname*{ess\,sup}_{\omega\in[-\pi,\pi]^2} t(e^{j\omega}), \quad (3.5)$
where ess. inf and ess. sup denote the essential infimum and essential supremum, respec-
tively. From (3.4), we see that the frame is tight whenever t(ejω) is almost everywhere
constant. For FIR filters, this means that H0(z)H0(z−1) + H1(z)H1(z−1) = c. Such a
condition can only be met with linear phase FIR filters if H0(z) and H1(z) are either
trivial delays or combinations of two delays (for a formal proof, see [7] pp. 337-338 or
[43]).
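As an illustration of (3.4)-(3.5), the bounds of an FIR NSFB can be estimated numerically by sampling $t(e^{j\omega})$ on a dense frequency grid; for FIR filters the essential infimum and supremum reduce to the ordinary min and max of a continuous function. The helper name and Haar-like test pair below are choices of this sketch, not filters from the text:

```python
import numpy as np

def frame_bounds(h0, h1, grid=256):
    """Estimate the frame bounds A, B of a two-channel NSFB by
    sampling t(e^{jw}) = |H0(e^{jw})|^2 + |H1(e^{jw})|^2 on a
    grid x grid set of frequencies (Eqs. 3.4-3.5)."""
    H0 = np.fft.fft2(h0, s=(grid, grid))
    H1 = np.fft.fft2(h1, s=(grid, grid))
    t = np.abs(H0) ** 2 + np.abs(H1) ** 2
    return t.min(), t.max()

# A separable Haar-like pair: t(e^{jw}) = 1 everywhere, so the
# frame is tight with A = B = 1.
h0 = np.array([[0.5, 0.5]])
h1 = np.array([[0.5, -0.5]])
A, B = frame_bounds(h0, h1)
```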
Because the NSFB is redundant, an infinite number of inverses exist. Among them,
the pseudo-inverse is optimal in the least square sense [1]. Given a frame of analysis
filters, the synthesis filters corresponding to the frame pseudo-inverse are given by Gi(z) =
Hi(z)/t(z) for i = 0, 1 [12]. In this case, the synthesis filters form the dual frame with
lower and upper frame bounds given by B−1 and A−1, respectively. When the analysis
filters are FIR, then unless the frame is tight, the synthesis filters of the pseudo-inverse
will be IIR.
From the above discussion we gather two important points: (1) linear phase filters
and tight frames are mutually exclusive; (2) the pseudo-inverse is desirable, but is IIR if
the frame is not tight. Consequently, an FIR NSFB system with linear phase filters and
with synthesis filters corresponding to the pseudo-inverse is not possible. However, we
can approximate the pseudo-inverse with FIR filters. For a given number of filter taps, the closer the frame is to being tight, the better an FIR approximation of the pseudo-inverse can be [13]. Thus, in the design of the filters we seek linear phase filters that underlie a frame that is as close to a tight one as possible.
In a general FIR perfect reconstruction NSFB system both analysis and synthesis
filters form a frame. If we denote the analysis and synthesis frame bounds by Aa, Ba and
As, Bs, respectively, the frames will be close to tight provided [13]
$r_a := B_a/A_a \approx 1, \quad \text{and} \quad r_s := B_s/A_s \approx 1.$
We always assume that the filters are normalized so that we have Aa ≤ 1 ≤ Ba. In
case the pseudo-inverse is used, then we also have As ≤ 1 ≤ Bs. The following result
shows that the NSCT is a frame operator for $\ell^2(\mathbb{Z}^2)$ whenever each constituent NSFB forms a frame.
Proposition 8 In the nonsubsampled contourlet transform, if the pyramid filter bank
constitutes a frame with frame bounds Ap and Bp, and the fan filters constitute a frame
with frame bounds $A_q$ and $B_q$, then the NSCT is a frame with bounds $A$ and $B$ satisfying

$A_p^J A_q^{\min\{l_j\}} \le A \le B \le B_p^J B_q^{\max\{l_j\}}.$
Proof. Consider the pyramid shown in Figure 3.2 (a). If $J = 1$, then we have that
$\|y_0\|^2 + \|y_1\|^2 \le B_p\|x\|^2$. Now, suppose we have $J$ levels and assume $\sum_{j=0}^{J} \|y_j\|^2 \le B_p^J \|x\|^2$. Then if we further split $y_J$ into $y'_J$ and $y_{J+1}$, noting that $B_p \ge 1$ we have that

$\sum_{j=0}^{J-1} \|y_j\|^2 + \|y'_J\|^2 + \|y_{J+1}\|^2 \le \sum_{j=0}^{J-1} \|y_j\|^2 + B_p \|y_J\|^2 \le B_p \left( \sum_{j=0}^{J-1} \|y_j\|^2 + \|y_J\|^2 \right) \le B_p^{J+1} \|x\|^2.$

Thus, by induction we conclude that $\sum_{j=0}^{J} \|y_j\|^2 \le B_p^J \|x\|^2$ for any $J \ge 1$. A similar argument shows that in the NSDFB with $l_j$ stages, one has that $\sum_{k=0}^{2^{l_j}-1} \|y_{j,k}\|^2 \le B_q^{l_j} \|y_j\|^2$, so that

$\|y_J\|^2 + \sum_{j=0}^{J-1} \sum_{k=0}^{2^{l_j}-1} \|y_{j,k}\|^2 \le \|y_J\|^2 + \sum_{j=0}^{J-1} B_q^{l_j} \|y_j\|^2 \le \|y_J\|^2 + B_q^{\max\{l_j\}} \sum_{j=0}^{J-1} \|y_j\|^2 \le B_p^J B_q^{\max\{l_j\}} \|x\|^2.$

The bound for $A$ is proved similarly, by just reversing the inequalities. $\square$
Remark 6 When both the pyramid and fan filter banks form tight frames with bound
1, then Ap = Aq = Bp = Bq = 1, and from the above proposition, the nonsubsampled
contourlet transform is also a tight frame with bound 1.
The above estimates on A and B can be accurate in some cases, especially when the
frame is close to a tight one and the number of levels is small (e.g., J = 4). In general,
however, they are not accurate estimates. Their purpose is more of giving an interval for
the frame bounds rather than the actual values. Table 3.1 shows estimates for different
numbers of scales. The actual frame bounds are computed from (3.4)-(3.5), whereas the
estimates are given according to Proposition 8.
Table 3.1 Frame bounds evolving with scale for the pyramid filters given in Example 3 in Section 3.3.

J   A actual   A estimated   B actual   B estimated
1   0.9586     0.9596        1.0435     1.0435
2   0.9393     0.9189        1.0504     1.0889
3   0.9332     0.8808        1.0515     1.1362
4   0.9316     0.8444        1.0517     1.1857
3.3 Filter Design and Implementation
The filter design problem of the NSCT comprises the two basic NSFBs displayed in
Figure 3.5. The goal is to design the filters imposing the Bezout identity (i.e., perfect
reconstruction) and enforcing other properties such as sharp frequency response, easy
implementation, regularity of the frame elements, and tightness of the corresponding
frames. It is also desirable that the filters are linear-phase.
Two-channel 1-D NSFBs that underlie tight frames are designed in [12]. However, the
design methodology of [12] is not easy to extend to 2-D designs since it relies on spectral
factorization, which is hard in 2-D. If we relax the tightness constraint, then the design
becomes more flexible. In addition, as we alluded to earlier, nontight filters can be linear
phase.
An effective and simple way to design 2-D filters is the mapping approach first pro-
posed by McClellan [63] in the context of digital filters and then used by several authors
[34, 43, 49, 50] in the context of filter banks. In this approach, the 2-D filters are obtained
from 1-D ones. In the context of NSFBs, a set of perfect reconstruction 2-D filters is
obtained in the following way:
Step 1. Construct a set of 1-D polynomials $\{H_i^{(1D)}(x), G_i^{(1D)}(x)\}_{i=0,1}$ that satisfies the Bezout identity.

Step 2. Given a 2-D FIR filter $f(z)$, the filters $\{H_i^{(1D)}(f(z)), G_i^{(1D)}(f(z))\}_{i=0,1}$ are 2-D filters satisfying the Bezout identity.
Thus, one has to design the set of 1-D filters and the mapping function f(z) so that
the ideal responses are well approximated with a small number of filter coefficients. In
the mapping approach, one can control the frequency and phase responses through the
mapping function. Moreover, if the mapping function is zero-phase, then $f(z) = f(z^{-1})$
and it follows that the mapped filters are also zero-phase. In this case, on the unit
sphere, the mapping function is a 2-D polynomial in $(\cos\omega_1, \cos\omega_2)$. We thus denote it
by $F(x_1, x_2)$, where it is implicit that $f(e^{j\omega}) = F(\cos\omega_1, \cos\omega_2)$.
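The key property of Step 2, that substituting $x \mapsto f(z)$ into a 1-D Bezout pair yields 2-D filters still satisfying (3.2), can be checked numerically. The sketch below, with an illustrative 1-D pair and mapping of my own choosing rather than the designs of this chapter, represents bivariate polynomials by coefficient arrays and composition by repeated 2-D convolution:

```python
import numpy as np

def conv2d(a, b):
    """Product of two bivariate polynomials = 2-D convolution of
    their coefficient arrays."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1))
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return out

def padded_add(a, b):
    """Sum of bivariate polynomials with different coefficient shapes."""
    out = np.zeros((max(a.shape[0], b.shape[0]),
                    max(a.shape[1], b.shape[1])))
    out[:a.shape[0], :a.shape[1]] += a
    out[:b.shape[0], :b.shape[1]] += b
    return out

def compose(poly1d, f):
    """Map a 1-D polynomial H(x) to the 2-D filter H(f(z))."""
    out = np.zeros((1, 1))
    power = np.array([[1.0]])          # f^0
    for c in poly1d:                   # poly1d[k] = coeff of x^k
        out = padded_add(out, c * power)
        power = conv2d(power, f)
    return out

# Illustrative 1-D Bezout pair: x*(2 - x) + (1 - x)*(1 - x) = 1.
H0, G0 = [0.0, 1.0], [2.0, -1.0]
H1, G1 = [1.0, -1.0], [1.0, -1.0]
f = np.array([[0.0, 0.5], [0.5, 0.0]])  # f(x1, x2) = (x1 + x2)/2

bezout = padded_add(conv2d(compose(H0, f), compose(G0, f)),
                    conv2d(compose(H1, f), compose(G1, f)))
# bezout equals the constant polynomial 1, confirming (3.2) in 2-D.
```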
Figure 3.6 Lifting structure for the nonsubsampled filter bank designed with the mapping approach. The 1-D prototype is factored with the Euclidean algorithm into lifting steps $P^{(1D)}$ and $Q^{(1D)}$. The 2-D filters are obtained by replacing $x \mapsto f(z)$.
3.3.1 Implementation through lifting
Filters designed with the mapping approach can be efficiently factored into a ladder
[64] or lifting [65] structure that simplifies computations. To see this, assume without loss
of generality that the degree of the highpass prototype polynomial $H_1^{(1D)}(x)$ is smaller
than that of $H_0^{(1D)}(x)$. Suppose also that there are synthesis filters $G_0^{(1D)}(x)$ and $G_1^{(1D)}(x)$
such that the Bezout identity is satisfied. In this case it follows that $\gcd\{H_0^{(1D)}, H_1^{(1D)}\} = 1$.
The Euclidean algorithm then enables us to factor the filters in the following way
[64–66]:
$\begin{bmatrix} H_0^{(1D)}(x) \\ H_1^{(1D)}(x) \end{bmatrix} = \prod_{i=0}^{N} \begin{bmatrix} 1 & 0 \\ P_i^{(1D)}(x) & 1 \end{bmatrix} \begin{bmatrix} 1 & Q_i^{(1D)}(x) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \quad (3.6)$
As a result, we can obtain a 2-D factorization by replacing x with f(z). This factor-
ization characterizes every 2-D NSFB derived from 1-D through the mapping method.
Figure 3.6 illustrates the ladder structure with one stage.
In general, the lifting implementation at least halves the number of multiplications
and additions of the direct form [65]. The complexity can be reduced further if the lifting
steps in the 1-D prototype are monomials and the mapping filter $f(z)$ has the form

$f(z) = f_1(z_1^{p_1} z_2^{p_2})\, f_2(z_1^{q_1} z_2^{q_2}) \quad (3.7)$

for suitable $f_1(z)$, $f_2(z)$, and integers $p_1, p_2, q_1, q_2$. Note that if $f(z)$ is a 1-D filter, then the
2-D filter $f(z_1^p z_2^q)$ for integers $p$ and $q$ has the same complexity as that of $f(z)$. Therefore,
filters of the form in (3.7) have the same complexity as that of separable filters (i.e., filters
of the form $f(z) = f(z_1) f(z_2)$), which amounts to two 1-D filtering operations. Notice
that if $f(z)$ is as in (3.7), then for an arbitrary sampling matrix $S$, $f(z^S)$ also has the form
in (3.7). Consequently, all NSFBs in the NSDFB tree structure can be implemented
with 1-D operations whenever the prototype fan NSFB can be implemented with 1-D
filtering operations. The same reasoning applies to the NSFBs of the NSP.
3.3.2 Pyramid filter design
In the pyramid case, we impose line zeros at $\omega = (\pi, \omega_2)$ and $\omega = (\omega_1, \pi)$. Notice
that an $N$-th order line zero at $\omega = (\pi, \omega_2)$, for example, amounts to a $(1 + e^{j\omega_1})^N$
factor in the lowpass filter. Such zeros are useful to obtain a good approximation of the
ideal frequency response of Figure 3.5, in addition to imposing regularity of the scaling
function. We point out that for the approximation of smooth $C^\alpha$ images, point zeros
of order $1 + 2\alpha/3$ at $(\pm\pi, \pm\pi)$ would suffice [67]. However, our experience shows that point zeros alone
do not guarantee a "reasonable" frequency response of the pyramid filters. The following
proposition characterizes the mapping functions that generate zeros in the resulting 2-D
filters.
Proposition 9 Let $G^{(1D)}(z)$ be a polynomial with roots $\{z_i\}_{i=1}^n$, where each $z_i$ has multiplicity $n_i$. Suppose we want a mapping function $F(x_1, x_2)$ such that

$G^{(1D)}(F(x_1, x_2)) = (x_1 - c)^{N_1} (x_2 - d)^{N_2} L(x_1, x_2), \quad (3.8)$

where $L(x_1, x_2)$ is a bivariate polynomial. Then $G^{(1D)}(F(x_1, x_2))$ has the form in (3.8) if and only if $F(x_1, x_2)$ takes the form

$F(x_1, x_2) = z_j + (x_1 - c)^{N'_1} (x_2 - d)^{N'_2} L_F(x_1, x_2), \quad (3.9)$

for some root $z_j \in \{z_i\}_{i=1}^n$, where $L_F(x_1, x_2)$ is a bivariate polynomial, and $N'_1, N'_2$ are such that $N'_1 n_j \ge N_1$ and $N'_2 n_j \ge N_2$.
Proof. We prove the claim for the case in which the zeros of $G^{(1D)}(z)$ are distinct. The proof for the case of repeated roots can be handled similarly. Denote

$G^{(1D)}(z) = g_0 \prod_{i=1}^{n} (z - z_i), \qquad g_0 \in \mathbb{C}. \quad (3.10)$

Then sufficiency follows by direct substitution of (3.9) in (3.10).

We prove necessity by induction. Suppose $G^{(1D)}(F(x_1, x_2)) = (x_1 - c)^{N_1} L(x_1, x_2)$ for some polynomial $L(x_1, x_2)$. Note that $G^{(1D)}(F(c, x_2)) = 0$ for all $x_2$ if and only if $F(c, x_2) = z_j$ for all $x_2$ and some zero $z_j$ of $G^{(1D)}(z)$. So, it follows that

$\left. \left( F(x_1, x_2) - z_j \right) \right|_{x_1 = c} = 0 \quad \text{for all } x_2, \quad (3.11)$

which implies that $F(x_1, x_2) = z_j + (x_1 - c) L_1(x_1, x_2)$ with $L_1(x_1, x_2)$ a polynomial. Suppose

$F(x_1, x_2) = z_j + (x_1 - c)^{k-1} L_{k-1}(x_1, x_2),$

where $k \le N_1$. By successively applying the chain rule for differentiation we get that

$0 = \left. \frac{\partial^{k-1} G^{(1D)}(F(x_1, x_2))}{\partial x_1^{k-1}} \right|_{x_1 = c} = G^{(1D)\prime}(F(c, x_2)) \left. \frac{\partial^{k-1} F(x_1, x_2)}{\partial x_1^{k-1}} \right|_{x_1 = c}.$

Because $F(c, x_2) = z_j$ and the $z_j$'s are distinct, we have that $G^{(1D)\prime}(F(c, x_2)) \ne 0$ and then

$\left. \frac{\partial^{k-1} F(x_1, x_2)}{\partial x_1^{k-1}} \right|_{x_1 = c} = 0$
$\Rightarrow \frac{\partial^{k-1} F(x_1, x_2)}{\partial x_1^{k-1}} = (x_1 - c) L_1(x_1, x_2) \quad (3.12)$
$\Rightarrow \frac{\partial F(x_1, x_2)}{\partial x_1} = (x_1 - c)^{k-1} L_{k-1}(x_1, x_2), \quad (3.13)$

where (3.13) follows by successively integrating (3.12). Combining (3.13) with (3.11) we obtain that $F(x_1, x_2) = z_j + (x_1 - c)^{k} L_k(x_1, x_2)$. By induction we conclude that

$F(x_1, x_2) = z_j + (x_1 - c)^{N_1} L_{N_1}(x_1, x_2).$

If $G^{(1D)}(F(x_1, x_2)) = (x_2 - d)^{N_2} L(x_1, x_2)$ for some polynomial $L(x_1, x_2)$, then a similar argument shows that

$F(x_1, x_2) = z_i + (x_2 - d)^{N_2} L_{N_2}(x_1, x_2),$

where $z_i$ is a zero of $G^{(1D)}(z)$. Thus,

$z_i - z_j = (x_1 - c)^{N_1} L_{N_1}(x_1, x_2) - (x_2 - d)^{N_2} L_{N_2}(x_1, x_2),$

and hence we have $z_j = z_i$. Therefore,

$(x_1 - c)^{N_1} L_{N_1}(x_1, x_2) = (x_2 - d)^{N_2} L_{N_2}(x_1, x_2),$

and so

$L_{N_1}(x_1, x_2) = (x_2 - d)^{N_2} L_F(x_1, x_2),$

and (3.9) is established with $N'_1 = N_1$ and $N'_2 = N_2$. $\square$
The above result holds in general, even for point zeros (as opposed to line zeros as in
Proposition 9). We will explore this extensively in the designs that follow.
Suppose the prototype filters $H_0^{(1D)}(x)$, $G_0^{(1D)}(x)$ each have zeros at $x = -1$. Then, in
order to produce a suitable zero-phase mapping function for the pyramid NSFB, we
consider the class of maximally flat filters given by the polynomials [2]

$P_{N,L}(x) := \left(\frac{1+x}{2}\right)^{N} \sum_{l=0}^{L-1-N} \binom{N+l-1}{l} \left(\frac{1-x}{2}\right)^{l}, \quad (3.14)$

where $N$ controls the degree of flatness at $x = -1$ and $L$ controls the degree of flatness at
$x = 1$. Following Proposition 9, we can construct a family of mapping functions as

$F(x_1, x_2) = -1 + 2 P_{N_0,L_0}(x_1) P_{N_1,L_1}(x_2) \quad (3.15)$

so that zero moments at $x_1 = -1$ and $x_2 = -1$ are guaranteed. Note that, except for
the constant $-1$, $F(x_1, x_2)$ has the form of (3.7) and hence can be implemented with 1-D
filtering operations only.
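The polynomials (3.14) are easy to generate symbolically. The sketch below (the helper name is an assumption of this illustration) builds the coefficient array of $P_{N,L}(x)$ with NumPy's polynomial utilities and can be used to form the mapping (3.15):

```python
import numpy as np
from math import comb
from numpy.polynomial import polynomial as P

def maxflat(N, L):
    """Coefficients (ascending powers of x) of P_{N,L}(x) in (3.14):
    ((1+x)/2)^N * sum_{l=0}^{L-1-N} C(N+l-1, l) ((1-x)/2)^l."""
    base = np.array([1.0])
    for _ in range(N):
        base = P.polymul(base, [0.5, 0.5])      # (1+x)/2 factors
    s = np.array([0.0])
    term = np.array([1.0])                      # ((1-x)/2)^l
    for l in range(L - N):
        s = P.polyadd(s, comb(N + l - 1, l) * term)
        term = P.polymul(term, [0.5, -0.5])
    return P.polymul(base, s)

# P_{1,2}(x) = (1+x)/2; the mapping (3.15) with N0 = N1 = 1 and
# L0 = L1 = 2 is then F(x1, x2) = -1 + (1+x1)(1+x2)/2.
p12 = maxflat(1, 2)
```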
Table 3.2 Maximally flat mapping polynomials used in the design of the nonsubsampled fan filter bank.

N = 1: $F_1(x_1, x_2) = \frac{1}{2}(x_1 + x_2)$
N = 2: $F_2(x_1, x_2) = \frac{1}{4}(x_1 + x_2)(3 - x_1 x_2)$
N = 3: $F_3(x_1, x_2) = \frac{1}{16}(x_1 + x_2)(15 - x_1^2 - 8 x_1 x_2 - x_2^2 + 3 x_1^2 x_2^2)$
N = 4: $F_4(x_1, x_2) = \frac{1}{32}(x_1 + x_2)(35 - 5 x_1^2 - 25 x_1 x_2 + 3 x_1^3 x_2 - 5 x_2^2 + 15 x_1^2 x_2^2 + 3 x_1 x_2^3 - 5 x_1^3 x_2^3)$
N = 5: $F_5(x_1, x_2) = \frac{1}{256}(x_1 + x_2)(315 - 70 x_1^2 + 3 x_1^4 - 280 x_1 x_2 + 72 x_1^3 x_2 - 70 x_2^2 + 228 x_1^2 x_2^2 - 30 x_1^4 x_2^2 + 72 x_1 x_2^3 - 120 x_1^3 x_2^3 + 3 x_2^4 - 30 x_1^2 x_2^4 + 35 x_1^4 x_2^4)$
3.3.3 Fan filter design
To design the fan filters idealized in Figure 3.5(b) we use the same methodology as
in the pyramid case. The distinction occurs in the mapping function. The fan-shaped
response can be obtained from a diamond-shaped response by simple modulation in one
of the frequency variables. This modulation preserves the perfect reconstruction prop-
erty. A useful family of mapping functions for the diamond-shaped response is obtained
by imposing flatness around ω = (±π,±π) and ω = (0, 0) in addition to point zeros at
(±π,±π). If the mapping function is zero-phase, this amounts to imposing flatness in a
polynomial at (x1, x2) = (−1,−1) and (x1, x2) = (1, 1), and zeros at (x1, x2) = (−1,−1).
Mapping functions satisfying these desiderata with a minimum number of polynomial coefficients are given by
FN(x1, x2) = −1 + QN (x1, x2), (3.16)
where the polynomials QN (x1, x2) give the class of maximally flat half-band filters with
diamond support. A closed-form expression for $Q_N(x_1, x_2)$ is given in [68]. Table 3.2
displays the first five mapping functions $F_N(x_1, x_2)$ for the diamond filter bank.
We point out that diamond maximally flat mapping polynomials can be generated
from 1-D ones by a separable product and then an appropriate change of variables. In
this case the mapping function is separable and has the form $f(z) = f_1(z_1 z_2) f_2(z_1 z_2^{-1})$.
This has been done for instance in [50]. However, the nonseparable solution is generally
shorter (roughly by a factor of 2) while yielding the same number of zeros at the aliasing
frequencies and a similar frequency response. For faster implementation, one may choose
the longer mapping filters which can be implemented with 1-D filtering operations.
3.3.4 Design examples
The design through mapping is based on a set of 1-D polynomials that satisfies the
Bezout identity, that is, $H_0^{(1D)}(x) G_0^{(1D)}(x) + H_1^{(1D)}(x) G_1^{(1D)}(x) = 1$. The design can be
simplified if we impose the restriction that

$H_1^{(1D)}(x) = G_0^{(1D)}(-x) \quad \text{and} \quad G_1^{(1D)}(x) = H_0^{(1D)}(-x).$
One advantage of this choice is that the frequency response of the filters can be controlled
by the lowpass branch — the highpass will automatically have the complementary re-
sponse. Another advantage is that, under an additional condition, the frame bounds of
the analysis and synthesis frames are the same and can be computed from the 1-D proto-
types. To see this, suppose that f is the mapping function and that Ran(f) = Ran(−f)
with $\mathrm{Ran}(f)$ denoting the range of the mapping function $f$. Then we have that

$A_a = \inf_{\omega \in [-\pi,\pi]^2} |H_0^{(1D)}(f(e^{j\omega}))|^2 + |G_0^{(1D)}(-f(e^{j\omega}))|^2$
$\phantom{A_a} = \inf_{x \in \mathrm{Ran}(f)} |H_0^{(1D)}(x)|^2 + |G_0^{(1D)}(-x)|^2$
$\phantom{A_a} = \inf_{x \in \mathrm{Ran}(-f)} |H_0^{(1D)}(-x)|^2 + |G_0^{(1D)}(x)|^2$
$\phantom{A_a} = \inf_{x \in \mathrm{Ran}(f)} |H_0^{(1D)}(-x)|^2 + |G_0^{(1D)}(x)|^2$
$\phantom{A_a} = A_s. \quad (3.17)$

A similar argument shows that

$B_a = B_s = \sup_{x \in \mathrm{Ran}(f)} |H_0^{(1D)}(-x)|^2 + |G_0^{(1D)}(x)|^2. \quad (3.18)$
Example 3 (Pyramid filters very close to tight ones) In order to get filters that
are almost tight, we design the prototypes $H_0^{(1D)}(x)$ and $G_0^{(1D)}(x)$ to be very close to each
other. If we let $H_1^{(1D)}(x) = G_0^{(1D)}(-x)$ and $G_1^{(1D)}(x) = H_0^{(1D)}(-x)$, then the following
filters can be checked to satisfy the Bezout identity:

$H_0^{(1D)}(x) = K (1 + k_2 x + k_2 k_3 x^2),$
$G_0^{(1D)}(x) = K^{-1} (1 - k_1 x - k_3 x + k_1 k_2 x^2 - k_1 k_2 k_3 x^3).$

To obtain the prototype we choose the constants $K$, $k_1$, $k_2$, and $k_3$ such that each filter has
a zero at $x = -1$ and in addition that $H_0^{(1D)}(1) = G_0^{(1D)}(1) = 1$. We obtain

$H_0^{(1D)}(x) = \frac{1}{2}(x + 1)\left(\sqrt{2} + (1 - \sqrt{2})x\right),$
$G_0^{(1D)}(x) = \frac{1}{2}(x + 1)\left(\sqrt{2} + (4 - 3\sqrt{2})x + (2\sqrt{2} - 3)x^2\right).$
The lifting factorization of the prototype filters is given by

$\begin{bmatrix} H_0^{(1D)}(x) \\ H_1^{(1D)}(x) \end{bmatrix} = \begin{bmatrix} 1 & \alpha x \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \beta x & 1 \end{bmatrix} \begin{bmatrix} 1 & \gamma x \\ 0 & 1 \end{bmatrix} \begin{bmatrix} K_1 \\ K_2 \end{bmatrix} \quad (3.19)$

with

$\alpha = \gamma = 1 - \sqrt{2}, \qquad \beta = \frac{1}{\sqrt{2}}, \qquad K_1 = K_2 = \frac{1}{\sqrt{2}}.$
Notice that this implementation has 4 multiplies/sample whereas the direct form has
7 multiplies/sample. In this example we set $F(x_1, x_2) = -1 + 2 P_{2,4}(x_1) P_{2,4}(x_2)$ so that each
of the filters has a fourth-order zero at $\omega_1 = \pm\pi$ and at $\omega_2 = \pm\pi$. Since the ladder steps
are monomials, the NSFB can be implemented with 1-D filtering operations. The frame
bounds are computed using (3.17)-(3.18):

$A_a = A_s = 0.96, \qquad B_a = B_s = 1.04.$

Thus we have $r_a = r_s = 1.083$ and the frame is almost tight. The support size of $H_0(z)$
is $13 \times 13$, whereas $G_0(z)$ has support size $19 \times 19$. Figure 3.7 shows the response of the
resulting filters.
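The claims of Example 3 can be verified numerically. The sketch below checks the Bezout identity of the prototype filters and recovers the frame-bound values $A \approx 0.96$, $B \approx 1.04$ by sampling $t(x) = |H_0^{(1D)}(x)|^2 + |G_0^{(1D)}(-x)|^2$ over $\mathrm{Ran}(f) = [-1, 1]$; the helper name `flip` is my own:

```python
import numpy as np
from numpy.polynomial import polynomial as P

r2 = np.sqrt(2.0)
# Prototype filters of Example 3, coefficients in ascending powers of x.
h0 = 0.5 * P.polymul([1.0, 1.0], [r2, 1.0 - r2])
g0 = 0.5 * P.polymul([1.0, 1.0], [r2, 4.0 - 3.0 * r2, 2.0 * r2 - 3.0])

def flip(p):
    """p(x) -> p(-x): negate the odd-power coefficients."""
    q = np.array(p, dtype=float)
    q[1::2] *= -1.0
    return q

h1, g1 = flip(g0), flip(h0)   # H1(x) = G0(-x),  G1(x) = H0(-x)

# Bezout identity (3.2): H0*G0 + H1*G1 must equal the constant 1.
bezout = P.polyadd(P.polymul(h0, g0), P.polymul(h1, g1))

# Frame bounds via (3.17)-(3.18), sampling x in Ran(f) = [-1, 1].
xs = np.linspace(-1.0, 1.0, 2001)
t = P.polyval(xs, h0) ** 2 + P.polyval(-xs, g0) ** 2
A, B = t.min(), t.max()       # approximately 0.96 and 1.04
```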
Example 4 (Maximally flat fan filters) Using the prototype filters of Example 3, we
modulate a maximally flat diamond mapping function to obtain the maximally flat fan
mapping function. Thus we choose FN (x1, x2) in Table 3.2 with N = 3. We use the
Figure 3.7 Magnitude responses $|H_0(e^{j\omega})|$, $|G_0(e^{j\omega})|$, $|H_1(e^{j\omega})|$, and $|G_1(e^{j\omega})|$ of the filters designed in Example 3 with maximally flat filters. The nonsubsampled pyramid filter bank underlies almost tight analysis and synthesis frames.
lifting factorization of the previous example. The diamond filters obtained have vanishing
moments on the corner points (±π,±π). Figure 3.8 shows the magnitude response of the
filters. The support size of U0(z) is 21× 21, whereas the support size of V0(z) is 31× 31.
3.3.5 Regularity of the NSCT basis functions
The regularity of the NSCT basis functions can be controlled by the NSP lowpass
filters. Denoting by $H_0(\omega)$ the scaling lowpass filter used in the pyramid (we write $H_0(\omega)$
instead of $H_0(e^{j\omega})$ for convenience), we have the associated scaling function

$\Phi(\omega) := \prod_{j=1}^{\infty} H_0(2^{-j}\omega),$
Figure 3.8 Magnitude responses $|U_0(e^{j\omega})|$, $|V_0(e^{j\omega})|$, $|U_1(e^{j\omega})|$, and $|V_1(e^{j\omega})|$ of the fan filters designed with the prototype filters of Example 3 and diamond maximally flat mapping filters.
where convergence is in the weak sense. In our proposed design the filter H0(ω) can be
factored as
$H_0(\omega) = \left(\frac{1 + e^{j\omega_1}}{2}\right)^{N_1} \left(\frac{1 + e^{j\omega_2}}{2}\right)^{N_2} R_{H_0}(\omega). \quad (3.20)$
Notice that the remainder filter $R_{H_0}(\omega)$ is not separable. Therefore, one cannot separate
the regularity estimation into two 1-D problems. Nonetheless, a similar argument can be
developed and an estimate of the 2-D regularity of the scaling filter is obtained.
Proposition 10 Let $H_0(\omega)$ be a scaling filter as in (3.20) with the corresponding scaling function $\Phi(\omega)$. Let

$B = \sup_{\omega \in [-\pi,\pi]^2} |R_{H_0}(\omega)|.$

Then

$|\Phi(\omega)| \le C \left(\frac{1}{1 + |\omega_1|}\right)^{N_1 - \log_2 B} \left(\frac{1}{1 + |\omega_2|}\right)^{N_2 - \log_2 B}.$
Proof. The proof follows the same lines as for the 1-D case (see e.g., [1, p. 245]). Using
the identity∞∏
k=1
(1 + ej 2−kω
2
)N
=
(1 + ejω
ω
)N
we have that
Φ(ω) =
(1 + ejω1
ω1
)N1(
1 + ejω2
ω2
)N2 ∞∏
j=1
R0(2−j
ω).
Because H_0 is continuously differentiable at ω = 0, the same also holds for R_0. Now
write R_0 in polar coordinates (r, ϕ), where r² = |ω_1|² + |ω_2|². Since R_0 is continuously
differentiable, for each ϕ, R_0(r, ϕ) is a continuously differentiable function of r. Since
R_0(0) = 1, from the mean value theorem we have that for ε > 0 and 0 ≤ r < ε,

    |R_0(r, ϕ)| ≤ 1 + |∂_r R_0(ρ, ϕ)| r ≤ 1 + Kr,

where K := sup{|∂_r R_0(ρ, ϕ)| : 0 ≤ ρ < ε, ϕ ∈ [0, 2π]} < ∞. This gives

    0 ≤ r < ε  ⟹  ∏_{j=1}^∞ |R_0(2^{-j}ω)| ≤ ∏_{j=1}^∞ (1 + K2^{-j}r) ≤ e^{Kε},

where we have used the inequality 1 + r ≤ e^r. Now, for r > ε, choose J so that
2^{J−1}ε ≤ r ≤ 2^J ε. We then obtain

    ∏_{j=1}^∞ |R_0(2^{-j}ω)| = ∏_{j=1}^J |R_0(2^{-j}ω)| · ∏_{j=1}^∞ |R_0(2^{-j-J}ω)| ≤ B^J e^{Kε} ≤ C_1 r^{log_2 B} e^{Kε},

so that for each ω ∈ R²,

    ∏_{j=1}^∞ |R_0(2^{-j}ω)| ≤ e^{Kε}(1 + C_1 r^{log_2 B}) = e^{Kε}(1 + C_1(|ω_1|² + |ω_2|²)^{(1/2) log_2 B}).
Putting it all together, we obtain

    |Φ(ω)| ≤ C_2 (1/|ω_1|^{N_1}) (1/|ω_2|^{N_2}) (1 + C_1(|ω_1|² + |ω_2|²)^{(1/2) log_2 B})
           ≤ C_3 (1/(1 + |ω_1|))^{N_1 − log_2 B} (1/(1 + |ω_2|))^{N_2 − log_2 B},
which completes the proof. □
As an example, consider the prototype filter in Example 3 and the mapping F(x_1, x_2) =
−1 + 2P_{1,2}(x)P_{1,2}(y). The resulting filter has second-order zeros at ω_1 = ±π and at
ω_2 = ±π. It can be verified that |R_{H_0}(ω)| ≤ 1.83 and |R_{G_0}(ω)| ≤ 1.49, so that the
regularity exponent² is at least 2 − log_2 1.83 ≈ 1.13 for H_0(ω) and 2 − log_2 1.49 ≈ 1.43
for G_0(ω).
Thus the corresponding scaling functions and wavelets are at least continuous. We
point out that better estimates are possible by applying similar 1-D techniques. For instance,
one could prove a result similar to Lemma 7.1.2 in [2] as a consequence of Proposition
10.
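Once B = sup |R(ω)| has been measured, the regularity estimate of Proposition 10 reduces to simple arithmetic. The sketch below reproduces the computation for the B values quoted above (the helper name is ours):

```python
import numpy as np

def regularity_lower_bound(N, B):
    # Proposition 10: |Phi(w)| decays like (1 + |w_i|)^{-(N_i - log2 B)},
    # so N - log2(B) lower-bounds the regularity exponent.
    return N - np.log2(B)

# B values quoted in the text for the Example 3 prototype with the
# diamond maximally flat mapping filters.
print(regularity_lower_bound(2, 1.83))  # for H0
print(regularity_lower_bound(2, 1.49))  # for G0
```

Since both exponents exceed 1, the corresponding scaling functions are at least continuous, as stated in the text.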
Figure 3.9 shows the basis functions of the NSCT obtained with the filters designed
via mapping. As the picture shows, the functions have a good degree of regularity.
3.4 Applications
3.4.1 Image denoising
In order to illustrate the potential of the NSCT design using the techniques previously
discussed, we study additive white Gaussian noise (AWGN) removal from images by
means of thresholding estimators.
3.4.1.1 Comparison to other transforms
To highlight the performance of the NSCT relative to other transforms, we perform
hard thresholding on the subband coefficients of the various transforms. We choose the
threshold T_{i,j} = Kσ_{n_{i,j}} for each subband. This has been termed K-sigma thresholding in
[69]. We set K = 4 for the finest scale and K = 3 for the remaining ones. We use five
scales of decomposition for the nonsubsampled contourlet transform, contourlet transform
²The regularity exponent of a scaling function φ(t) is the largest number α such that Φ(ω) decays as fast as 1/(1 + |ω_1| + |ω_2|)^α.
Figure 3.9 Basis functions of the nonsubsampled contourlet transform. (a) Basis functions of the second stage of the pyramid. (b) Basis functions of the third (top 8) and fourth (bottom 8) stages of the pyramid.
(CT), and the nonsubsampled wavelet transform. For the NSCT and CT we use 4, 8, 8, 16,
and 16 directions in the scales from coarser to finer, respectively.
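The K-sigma rule above can be sketched as follows; the subband container layout and the data are hypothetical, not an actual NSCT decomposition:

```python
import numpy as np

def k_sigma_hard_threshold(subbands, sigma_n, K_finest=4, K_other=3):
    # K-sigma hard thresholding: zero every coefficient whose magnitude
    # falls below T_{i,j} = K * sigma_n[j][i].  Here subbands[j][i] holds
    # the i-th directional subband at scale j (coarse to fine), and
    # sigma_n has the matching per-subband noise standard deviations.
    out = []
    for j, scale in enumerate(subbands):
        K = K_finest if j == len(subbands) - 1 else K_other
        out.append([np.where(np.abs(c) >= K * s, c, 0.0)
                    for c, s in zip(scale, sigma_n[j])])
    return out

# Toy example with two scales of one subband each (hypothetical data).
rng = np.random.default_rng(0)
sub = [[rng.normal(0, 1, (8, 8))], [rng.normal(0, 1, (8, 8))]]
den = k_sigma_hard_threshold(sub, [[1.0], [1.0]])
print(den[1][0].shape)
```

In a real pipeline the thresholded subbands would then be passed to the inverse transform for reconstruction.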
Table 3.3 (left columns) shows the PSNR results for various transforms and noise
intensities. The results show that the NSCT is consistently superior to curvelets and the NSWT
Table 3.3 Denoising performance of the NSCT. The left-most columns are hard thresholding and the right-most ones soft estimators. For hard thresholding, the NSCT consistently outperforms curvelets and the NSWT. The NSCT-LAS performs on a par with the more sophisticated estimator BLS-GSM and is superior to the BivShrink estimator of [59].
Lena          Comparison to other transforms    |  Comparison to other methods
σ    Noisy   NSWT    Curvelets   CT      NSCT   |  NSWT-LAS   BivShrink   BLS-GSM   NSCT-LAS
10   28.13   34.26   34.17       31.90   34.69  |  35.19      35.34       35.59     35.46
20   22.13   31.40   31.52       28.34   32.03  |  32.12      32.40       32.62     32.50
30   18.63   29.66   30.01       27.10   30.35  |  30.30      30.54       30.84     30.70
40   16.13   28.37   28.84       25.84   29.10  |  29.01      -           29.58     29.38
50   14.20   27.41   27.78       24.87   28.10  |  28.00      -           28.61     28.34

Barb.         Comparison to other transforms    |  Comparison to other methods
σ    Noisy   NSWT    Curvelets   CT      NSCT   |  NSWT-LAS   BivShrink   BLS-GSM   NSCT-LAS
10   28.17   31.58   32.28       29.62   33.01  |  33.40      33.35       34.03     34.09
20   22.15   27.23   28.89       26.26   29.41  |  29.45      29.80       30.28     30.60
30   18.63   25.10   26.93       24.42   27.24  |  27.22      27.65       28.11     28.56
40   16.14   24.02   25.51       23.16   25.79  |  25.76      -           26.58     27.12
50   14.20   23.37   24.31       22.29   24.79  |  24.72      -           25.43     26.02

Pepp.         Comparison to other transforms    |  Comparison to other methods
σ    Noisy   NSWT    Curvelets   CT      NSCT   |  NSWT-LAS   BLS-GSM   NSCT-LAS
10   28.17   33.71   33.59       31.30   33.81  |  34.35      34.63     34.41
20   22.15   31.19   31.13       28.57   31.60  |  31.65      32.06     31.82
30   18.63   29.43   29.45       26.81   30.07  |  29.95      30.41     30.19
40   16.13   28.09   28.01       25.50   28.85  |  28.65      29.20     28.95
50   14.20   27.04   26.70       24.47   27.82  |  27.62      28.24     27.93
in PSNR measure. For the “Barbara” image, the NSCT yields improvements in excess
of 1.90 dB in PSNR over the NSWT. The NSCT is also superior to the CT, as the results
show. Figure 3.10 displays the reconstructed images using the NSWT, curvelets, and
the NSCT. As the figure shows, both the NSCT and the curvelet transform offer a better
recovery of edge information relative to the NSWT, but further improvements can be seen
in the NSCT, particularly around the eye.
Figure 3.10 Image denoising with the NSCT and hard thresholding. The noise intensity is 20. (a) Original Lena image. (b) Denoised with the NSWT, PSNR = 31.40 dB. (c) Denoised with the curvelet transform and hard thresholding, PSNR = 31.52 dB. (d) Denoised with the NSCT, PSNR = 32.03 dB.
3.4.1.2 Comparison to other denoising methods
We perform soft thresholding (shrinkage) independently in each subband. Following
[58] we choose the threshold

    T_{i,j} = σ²_{N_{i,j}} / σ_{i,j,n},

where σ_{i,j,n} denotes the standard deviation of the n-th coefficient at the i-th directional
subband of the j-th scale, and σ²_{N_{i,j}} is the noise variance at scale j and direction i. It is shown
in [58] that shrinkage estimation with T = σ²/σ_X, assuming X is generalized Gaussian
distributed, yields a risk within 5% of the optimal Bayes risk. As studied in [70], contourlet
coefficients are well modeled by generalized Gaussian distributions. The signal variances
are estimated locally using the neighboring coefficients contained in a square window
within each subband and a maximum likelihood estimator. The noise variance in each
subband is inferred using a Monte Carlo technique where the variances are computed for
a few normalized noise images and then averaged to stabilize the results. We refer to this
method as local adaptive shrinkage (LAS). Effectively, our LAS method is a simplified
version of the denoising method proposed in [71] that works in the NSCT or NSWT
domain. In the LAS estimator we use four scales for both the NSCT and NSWT. For
the NSCT we use 3,3,4,4 directions in the scales from coarser to finer, respectively.
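The LAS rule described above can be sketched as follows. This is a simplified, loop-based illustration with hypothetical window size, data, and helper name; the dissertation's actual implementation additionally estimates the per-subband noise variances by Monte Carlo:

```python
import numpy as np

def las_soft_threshold(band, sigma_n, win=7):
    # Local adaptive shrinkage (simplified sketch).  For each coefficient,
    # the signal variance is estimated by ML over a win x win window,
    # var_X = max(local second moment - sigma_n^2, 0), and the soft
    # threshold is T = sigma_n^2 / sigma_X, as in the text.
    h = win // 2
    pad = np.pad(band, h, mode='reflect')
    out = np.zeros_like(band)
    for r in range(band.shape[0]):
        for c in range(band.shape[1]):
            m2 = np.mean(pad[r:r + win, c:c + win] ** 2)
            var_x = max(m2 - sigma_n ** 2, 1e-12)
            T = sigma_n ** 2 / np.sqrt(var_x)
            v = band[r, c]
            out[r, c] = np.sign(v) * max(abs(v) - T, 0.0)
    return out

# Hypothetical subband: a sparse block of large coefficients plus noise.
rng = np.random.default_rng(1)
x = np.zeros((16, 16))
x[4:8, 4:8] = 5.0
noisy = x + rng.normal(0, 1, x.shape)
den = las_soft_threshold(noisy, sigma_n=1.0)
print(np.mean((den - x) ** 2) < np.mean((noisy - x) ** 2))
```

Where the window sees only noise, the estimated σ_X is small, the threshold is large, and the coefficient is shrunk to zero; where the window contains signal energy, the threshold is small and the coefficient is kept nearly intact.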
To benchmark the performance of the NSCT-LAS scheme we have used two of the
best denoising methods in the literature: (1) bivariate shrinkage with local variance
estimation (BivShrink) [59]; (2) Bayes least-squares with a Gaussian scale-mixture model
(BLS-GSM) proposed in [60]. Table 3.3 (right columns) shows the results obtained.³
The NSCT coupled with the LAS estimator (NSCT-LAS) produced very satisfactory
results. In particular, among the methods studied, the NSCT-LAS yields the best results
for the “Barbara” image, being surpassed by the BLS-GSM method for the other images.
Despite its slight loss in performance relative to BLS-GSM, we believe the NSCT has
potential for better results. This is because by comparison, the BLS-GSM is a consider-
ably richer and more sophisticated estimation method than our simple local thresholding
estimator. However, studying more complex denoising methods in the NSCT domain
is beyond the scope of the present chapter. Figure 3.11 displays the denoised images
obtained with both the BLS-GSM and NSCT-LAS methods. As the pictures show, the NSCT offers
³The PSNR values of the BivShrink method were obtained from the tables in [59]. In [59] the authors do not use the “Peppers” image as a test image, hence we do not have a BivShrink column for “Peppers.”
a slightly better reconstruction. In particular, the tablecloth texture is better recovered
by the NSCT-LAS scheme.
Figure 3.11 Comparison between the NSCT-LAS and BLS-GSM denoising methods. The noise intensity is 20. (a) Original Barbara image. (b) Denoised with the BLS-GSM method, PSNR = 30.28 dB. (c) Denoised with NSCT-LAS, PSNR = 30.60 dB.
We briefly mention that in denoising applications, one can reduce the redundancy of
the NSCT by using critically sampled directional filter banks on top of the nonsubsampled
pyramid. This results in a transform with redundancy J + 1, which is considerably
faster. There is, however, a loss in performance, as Table 3.4 shows. Nonetheless, in some
applications, the small performance loss may be a fair price to pay for the reduced
redundancy of this alternative construction.
Table 3.4 Relative loss in PSNR performance (dB) when using the NSP with a critically sampled DFB and the LAS estimator, with respect to the NSCT-LAS method.

σ     “Lena”   “Barbara”   “Peppers”
10    -0.23    -0.48       -0.32
20    -0.09    -0.44       -0.09
30    -0.05    -0.37       -0.01
40    -0.04    -0.34       -0.00
50    -0.05    -0.31       -0.01
3.5 Conclusion
We have developed a fully shift-invariant version of the contourlet transform, the non-
subsampled contourlet transform. The design of the NSCT reduces to the design of
a nonsubsampled pyramid filter bank and a nonsubsampled fan filter bank. We address
this new, less stringent filter design problem using a mapping approach, thus overcoming
the need for factorization. We also developed a lifting/ladder structure for the 2-D
NSFB. This structure, when coupled with the filters designed via mapping, provides
a very efficient implementation that, under some additional conditions, can be reduced
to 1-D filtering operations. Applications of our proposed transform in image denoising
and enhancement were studied. In denoising, we studied the performance of the NSCT
when coupled with a hard thresholding estimator and with a local adaptive shrinkage
estimator. For hard thresholding, our results indicate that the NSCT provides better
performance than competing transforms such as the NSWT and curvelets. Likewise, our
local adaptive shrinkage results are competitive with other denoising methods. In particular,
our results show that a fairly simple estimator in the NSCT domain yields performance
comparable to that of state-of-the-art denoising methods that are considerably more
sophisticated. In image enhancement, the results obtained with the NSCT are superior to
those of the NSWT both visually and with respect to objective measurements.
CHAPTER 4

THE INFORMATION RATES OF THE PLENOPTIC FUNCTION
The plenoptic function (Adelson and Bergen, 1991) describes the visual information
available to an observer at any point in space and time. Samples of the plenoptic function
(POF) are seen in video and in general visual content, and represent large amounts of
information.
In this chapter we study the compression limits of the plenoptic function. A model
for the POF is that of a camera moving randomly through space and acquiring samples
of the POF at each time instant. The model has two sources of information representing
ensembles of camera motions and ensembles of visual scene data (i.e., “realities”). An
ensemble of camera motions is obtained by considering discrete random walks, and an
ensemble of realities is modelled with stationary ergodic processes.
Within our model, there are two cases to consider. In the first case, which we refer
to as the “video coding case,” the locations of the samples of the POF are not available
to the encoder, and the goal is to reproduce the sequence of samples of the POF at the
decoder. This results in a stochastic model for video that we study in detail. Both lossless
and lossy information rates are studied. The model is further extended to account for
realities that change over time. We derive bounds on the lossless and lossy information
rates for this dynamic reality model, stating conditions under which the bounds are tight.
Examples with synthetic sources suggest that in the presence of scene motion, simple
The results of this chapter appear in part in the references [72, 73]. Parts of this work were done during visits to LCAV-EPFL in Aug-Sept 2005 and May-Jul 2006. This is joint work with Prof. Martin Vetterli and Prof. Minh Do.
hybrid coding using motion estimation with DPCM performs suboptimally relative to
the true rate-distortion bound.
In the second case, which we refer to as the “recording reality” case, the trajectory
is available at the encoder, and the goal is to reproduce the reality at the decoder. We
show that in this case, the information rate of the resulting process is the same as
that of the underlying reality process, even in the case of a random traversal. We also
propose a simple code for the process that essentially attains the information rate bound.
4.1 Introduction
4.1.1 Background
Consider a moving camera that takes sample snapshots of an environment over time.
The samples are to be coded for later transmission or storage. Because the movements
of the camera are small relative to the scene, there are large correlations among multiple
acquisitions.
Examples of such scenarios include video compression and the compression of lightfields.
More generally, the compression problem in these examples can be seen as representing
and compressing samples of the plenoptic function [74]. The 7-D plenoptic
function (POF) describes the light intensity passing through every viewpoint, in every
direction, for all times, and for every wavelength. Thus, the samples of the plenoptic
function can be used to reconstruct a view of reality at the decoder. The POF is usually
denoted by POF(x, y, z, φ, ϕ, t, λ), where (x, y, z) represents a point in 3-D space, (φ, ϕ)
characterizes the direction of the light rays, t denotes time, and λ denotes the wavelength
of the light rays. The POF is usually parametrized in order to reduce its number
of dimensions. This is common in image-based rendering [75, 76]. Examples of POF
parameterizations include digital video, the lightfield and lumigraph [30, 31], concentric
mosaics [77], and the surface plenoptic function [28].
Regardless of the parametrization, due to the large size of the data set, compression
is essential. Given a parametrization, a typical scenario involves a camera traversing
the domain of the POF and acquiring its samples to be compressed and then stored for
later rendering (see Figure 1.1). The information to be compressed is thus POF(W(t), t),
where the trajectory W(t) collectively represents a sequence of positions and angles at which
light rays are acquired. In such a context, it is crucial to know the compression limits and
how the parameters involved influence those limits. This would provide a benchmark to
assess compression schemes for such data sets.
4.1.2 Prior art
The practical aspects of compressing video and other examples of the plenoptic function
have been studied extensively (see e.g., [28, 78], and references therein). But very
little has been done in terms of rate-distortion analysis addressing the general question of
how many bits are needed to code such a source. Due to the complexity inherent in
visual data, the source is difficult to model statistically. As a result, precise information
rates are difficult to obtain. Often, one obtains the rate-distortion behavior resulting from
a particular coding method, such as the hybrid coder used in video. For instance, Girod
in [79] analyzes the rate-distortion performance of hybrid coders using a Gauss-Markov
model for the video sequence as well as for the prediction error that is transmitted as
side information. A similar rate-distortion analysis for light-field compression is done in
[80]. Such models are interesting, but they assume predictive coding from the start and
are thus somewhat constrained. The compression of the POF is also studied in [81], but
in a distributed setting. Using piecewise smooth models, the authors derive operational
rate-distortion bounds based on a parametric sampling model.
4.1.3 Chapter contributions
The general problem can be posed as shown in Figure 4.1. There is a physical world
or “reality” (e.g., scenes, objects, moving objects), and a camera that generates a “view
of reality” V. This “view of reality” (e.g., a video sequence) is coded with a source coder
with memory M, giving an average rate of R bits. This bitstream is decoded with a
decoder with memory M to reconstruct a view of reality V̂ close to the original one in
the MSE sense. We refer to memory and rate in a loose sense here. Precise definitions of
memory and rate are given in Section 4.2.1.
In this chapter we propose a simplified stochastic model for the plenoptic function
that bears the elements of the general case. Within our model we distinguish between
two cases:
1. The video coding case. We take the viewpoint that video can be seen as
a 3-D slice of the POF. Our approach is to come up with a statistical model for
video data generation, and within that model establish information rate bounds.
We first propose a model in which the background scene is drawn randomly at time
0, but otherwise does not change as time progresses. Within this “static reality”
model we develop information rates for the lossless and lossy cases. Furthermore, we
compute the conditional information rate, which provides a coding limit when memory
resources are constrained. We then extend the model to account for background
scene changes by proposing a “dynamic reality” model based on a Markov random
field, and we compute bounds on its information rates. For the Gaussian case, we
compute lower and upper bounds that are tight in the high-SNR regime. Examples
validating our theoretical findings are presented.
2. The “recording reality” case. In this case, the samples of the POF are coded
and sent with the aim of reproducing the underlying scene, and not the sequence of
Figure 4.1 The problem under consideration. There is a world and a camera that produces a “view of reality” that needs to be coded with finite or infinite memory.
snapshots taken over time. This is similar to the case one finds in light field coding
where the image samples resulting from a random trajectory in the camera array
are sent to the decoder with the aim of reconstructing the underlying scene (see
[80, 82]). We show, in this case, that the information rate of the resulting source is
the same as the original stationary source representing the visual reality. We also
propose a simple code that within our model, attains the given information rate.
The chapter is organized as follows. Section 4.2 sets up the problem and introduces
notation. The video coding problem is treated in Sections 4.3-4.4. We present results
for the static reality case in Section 4.3, and treat the dynamic case in Section 4.4. In
Section 4.5 we present results for the “recording reality” case. Concluding remarks are
made in Section 4.6.
4.2 Definitions and Problem Setup
We describe a simplified model for the process displayed in Figure 4.1. Consider a
camera moving according to a Bernoulli random walk. The random walk is defined as
follows:
Definition 2 The Bernoulli random walk is the process W = (W_t : t ∈ Z_+) such that
Pr{W_0 = 0} = 1 and, for t ≥ 1,

    W_t = Σ_{i=1}^t N_i,

where the {N_i} are drawn i.i.d. from the set {−1, 1} with probability distribution
Pr{N_i = 1} = p_W.
We assume without loss of generality that pW ≤ 0.5.
In front of the camera there is an infinite wall that represents a scene that is projected
onto a screen in front of the camera path (i.e., we ignore occlusion). The wall is modelled
as a 1-D strip “painted” with an i.i.d. process X = (Xn : n ∈ Z) that is independent of
the random walk W . The process X follows some probability distribution pX drawing
values from an alphabet X. Here we focus on the rather unrealistic i.i.d. case because of its
simplicity. Generalization to stationary processes is left for future work. In the static case,
the wall process X is drawn at t = 0. Figure 4.2 (a) illustrates the proposed model.
Figure 4.2 A stochastic model for video. (a) Simplified model: at position W_t the camera sees Image(t); for example, with W_0 = 0 and L = 4, V_0 := (X_0, X_1, X_2, X_3). (b) The resulting vector process V. Each sample of the vector process is a block of L samples from the process X taken at the position indicated by the random walk W_t. In the figure L = 4.
At each random walk time step, the camera sees a block of L samples from the infinite
wall, where L ≥ 1. This results in a vector process V = (Vt : t ∈ Z+) indexed by the
random walk positions, as defined below.
Definition 3 Let W be a random walk independent of X, and let L be an integer greater
than one. The vector process V = (V_t : t ∈ Z_+) is defined as

    V_t := (X_{W_t}, X_{W_t+1}, · · · , X_{W_t+L−1}).    (4.1)
The random walk is a simple stochastic model for an ensemble of camera movements.
It includes camera panning as a special case, i.e., when p_W = 0. Notice that consecutive
samples of the vector process, which are vectors of length L, have at least L − 1 entries
in common. Furthermore, because the process X is i.i.d., it follows that the vector
process V is stationary and mean-ergodic. Figure 4.2 (b) illustrates the vector process
V.
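The wall/random-walk model above is straightforward to simulate. A minimal sketch follows (parameter choices and helper name are illustrative, not from the dissertation):

```python
import numpy as np

def sample_vector_process(T, L, p_w=0.3, alphabet=2, seed=0):
    # Draw (V_0, ..., V_T) from the model of Definitions 2 and 3: an
    # i.i.d. uniform wall X and a Bernoulli random walk W with
    # Pr{N_i = +1} = p_w.  The wall is materialized only over the
    # sites that are actually visited.
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=T, p=[1 - p_w, p_w])
    W = np.concatenate([[0], np.cumsum(steps)])          # W_0 = 0
    lo, hi = int(W.min()), int(W.max()) + L
    wall = {n: int(rng.integers(alphabet)) for n in range(lo, hi)}
    V = [tuple(wall[int(w) + k] for k in range(L)) for w in W]
    return W, V

W, V = sample_vector_process(T=10, L=4)
# Consecutive samples share at least L - 1 wall entries.
print(W[:3], V[0], V[1])
```

Because each step moves the camera by ±1, V_t and V_{t+1} always overlap in L − 1 wall samples, which is the source of the large correlations discussed above.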
4.2.1 The video coding problem
Given the vector process V = (V0, V1, · · · ), the coding problem consists in finding an
encoder/decoder pair that is able to describe and reproduce the process V at the decoder
using no more than R bits per vector sample. The decoder reproduces the vector process
as V̂ = (V̂_0, V̂_1, · · · ) with some delay. The reproduction can be lossless or lossy with fidelity
D. The encoder encodes each sample Vt based on the observation of M previous vector
samples Vt−1, . . . , Vt−M . Thus, M is the memory of the encoder/decoder. Since encoding
is done jointly, there is a delay incurred. The lossless and lossy information rates of the
process V provide the minimum rate needed to either perfectly reproduce the process
V at the decoder, or to reproduce it within distortion D, respectively. The information
rate (lossless or lossy) is usually only achievable at the expense of infinite memory and
delay [83].
4.2.2 Properties of the random walk
The following notions are needed in what follows.
Definition 4 Let W be a random walk. The set of recurrent paths of length t is the
event set
Rt := {(W0, W1, . . . , Wt) : Wt = Ws for some s, 0 ≤ s < t}.
If a path belongs to Rt, we call it a recurrent path. We call Pr {Rt} the probability of
recurrence at step t.
The probability of the complementary set, Pr{R̄_t}, is called the first-passage probability.
When a site W_t has not occurred before, we refer to it as a new site. A related
quantity is the probability of return.
Definition 5 Let W be a random walk, and let t > s ≥ 0. Consider the event set

    T_s^t := {(W_0, W_1, . . . , W_t) : W_t = W_s but W_t ≠ W_i for any i such that s < i < t}.

We call Pr{T_s^t} the probability of return at step t after step s.
When s = 0 we write T^t for T_0^t. From Definitions 4 and 5 one can check that

    R_t = ⋃_{i=1}^t T_{t−i}^t,    (4.2)

where the union is a disjoint one. Furthermore, the sets T_s^t are shift invariant in the
sense that

    Pr{T_s^t} = Pr{T^{t−s}}.    (4.3)

Combining (4.2) and (4.3), we also have that

    Pr{R_t} = Σ_{i=1}^t Pr{T_{t−i}^t} = Σ_{i=1}^t Pr{T^i}.    (4.4)
In addition to the above, for the case of the Bernoulli random walk we have the
following [84, 85].

Lemma 2 For the Bernoulli random walk with p_W ≤ 1/2, the following holds:

(i) lim_{t→∞} Pr{R̄_t} = 1 − 2p_W.

(ii) For t > 0, Pr{T^{2t−1}} = 0 and Pr{T^{2t}} = 2C_{t−1}((1 − p_W)p_W)^t, where
C_t := (1/(t + 1)) (2t choose t) is the t-th Catalan number.
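Lemma 2 (ii) gives the first-return probabilities in closed form via Catalan numbers. The sketch below cross-checks the formula against a Monte Carlo simulation of the walk (the parameter values are illustrative):

```python
import numpy as np
from math import comb

def return_prob_exact(t, p_w):
    # Lemma 2(ii): Pr{T^{2t}} = 2 C_{t-1} ((1 - p_w) p_w)^t, where
    # C_{t-1} = binom(2t-2, t-1) / t is a Catalan number.
    catalan = comb(2 * (t - 1), t - 1) // t
    return 2 * catalan * ((1 - p_w) * p_w) ** t

def return_prob_mc(t, p_w, trials=200_000, seed=0):
    # Monte Carlo estimate of the first return to the origin at step 2t.
    rng = np.random.default_rng(seed)
    steps = rng.choice([-1, 1], size=(trials, 2 * t), p=[1 - p_w, p_w])
    paths = np.cumsum(steps, axis=1)
    first_return = (paths[:, -1] == 0) & np.all(paths[:, :-1] != 0, axis=1)
    return first_return.mean()

p_w = 0.4
for t in (1, 2, 3):
    print(t, return_prob_exact(t, p_w), return_prob_mc(t, p_w))
```

For t = 1, both approaches give 2(1 − p_W)p_W, the probability that the first two steps cancel.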
4.3 Information Rates for a Static Reality
4.3.1 Lossless information rates for discrete memoryless wall
Denote V^t = (V_1, . . . , V_t). We assume that V_0 is known to the decoder. Unless
otherwise specified, we assume that X takes values on a finite alphabet X. We seek to
quantify the entropy rate of V [86]:

    H(V) = lim_{t→∞} (1/t) H(V^t) = lim_{t→∞} H(V_t|V^{t−1}).    (4.5)
To characterize H(V ), we describe intuitively an upper and a lower bound (resp. sufficient
and necessary rates) that will be formalized in Theorem 1 below. For a sufficient rate,
note that V can be reproduced up to time t when both the trajectory W^t = (W_1, . . . , W_t)
and the samples of the wall occurring at the new sites of W^t are available. When t is
large, this amounts to H(W^t) = tH(p_W) bits for the trajectory, plus t Pr{R̄_t}H(X) ≈
t(1 − 2p_W)H(X) bits for the new sites. So a sufficient average rate is H(p_W) +
(1 − 2p_W)H(X). Moreover, the complexity of V is at least the complexity of the new
sites, and so (1 − 2p_W)H(X) is a necessary rate. This intuitive lower bound can be
improved by examining the probability of correctly inferring the random path W^t from
observing the vector process V^t. This probability is related to the following event:

    A_L := {(X_0, . . . , X_L) = (x_0, x_1, x_0, x_1, . . .) for some x_0, x_1 ∈ X}.    (4.6)
To see this, let L = 4 and consider inferring W_1 from the observation of (V_0, V_1). If
V_0 = (x_0, x_1, x_0, x_1) and V_1 = (x_1, x_0, x_1, x_0), then W_1 cannot be unambiguously
determined from (V_0, V_1). Intuitively, if W^t can be determined from V^t, then the
complexity of the trajectory is embedded in V^t and thus has to be fully described.
If, however, there is ambiguity about W^t, then the sets of trajectories W^t that are
consistent with V^t can be indexed and coded with a lower rate. We are now ready to
state and prove Theorem 1.
Theorem 1 Consider the vector process V consisting of L-tuples generated by a Bernoulli
random walk with transition probability p_W ≤ 1/2 and a wall process X drawing values
i.i.d. on a finite alphabet with entropy H(X). The conditional entropy H(V_t|V^{t−1}) obeys

    Pr{R̄_t}H(X) + H(p_W) Pr{Ā_L} ≤ H(V_t|V^{t−1}) ≤ (1/t) Σ_{i=1}^t Pr{R̄_i}H(X) + H(p_W),    (4.7)

where A_L is as in (4.6) and the overbar denotes the complementary event. The entropy
rate H(V) satisfies

    (1 − 2p_W)H(X) + H(p_W) Pr{Ā_L} ≤ H(V) ≤ (1 − 2p_W)H(X) + H(p_W).    (4.8)
Proof. For each t we have

    H(V_t|V^{t−1}) (a)≤ (1/t) Σ_{i=1}^t H(V_i|V^{i−1}) = H(V^t)/t
                   (b)≤ [H(V^t) + H(W^t|V^t)]/t    (4.9)
                      = [H(W^t) + H(V^t|W^t)]/t    (4.10)
                      = [H(W^t) + Σ_{i=1}^t H(V_i|V^{i−1}, W^t)]/t
                   (c)= H(p_W) + (1/t) Σ_{i=1}^t H(V_i|V^{i−1}, W^i),    (4.11)

where (a) follows because H(V_t|V^{t−1}) decreases with t, (b) holds because H(W^t|V^t) ≥ 0,
and (c) is true because H(W^t) = tH(p_W) and (W_{i+1}, . . . , W_t) is independent of (V^i, W^i).
Further, it is true that

    H(V_i|V^{i−1}, W^i = w^i) = 0 if w^i is recurrent,
    H(V_i|V^{i−1}, W^i = w^i) = H(X) if w^i is not recurrent.

Consequently,

    H(V_i|V^{i−1}, W^i) = Σ_{w^i ∈ R̄_i} Pr{W^i = w^i} H(V_i|V^{i−1}, W^i = w^i) = Pr{R̄_i}H(X).    (4.12)
Combining (4.9) and (4.12) gives the upper bound in (4.7). We now turn to the lower
bound. Using the chain rule for mutual information and the information inequality, we
have

    H(V_t|V^{t−1}) = H(V_t|V^{t−1}, W^t) + I(W^t; V_t|V^{t−1})
                   = H(V_t|V^{t−1}, W^t) + I(W^{t−1}; V_t|V^{t−1}) + I(W_t; V_t|V^{t−1}, W^{t−1})
                   ≥ H(V_t|V^{t−1}, W^t) + I(W_t; V_t|V^{t−1}, W^{t−1}).    (4.13)
Moreover, because the random walk increment W_t − W_{t−1} is independent of (V^{t−1}, W^{t−1}),
it follows that

    I(W_t; V_t|V^{t−1}, W^{t−1}) = H(W_t|V^{t−1}, W^{t−1}) − H(W_t|V^t, W^{t−1})
                                 = H(p_W) − H(W_t|V^t, W^{t−1}).    (4.14)

We proceed by finding an upper bound for H(W_t|V^t, W^{t−1}). If (v^t, w^{t−1}) is such that W_t
can be inferred with probability one from (v^t, w^{t−1}), then the conditional entropy is zero.
Otherwise, if (v^t, w^{t−1}) is such that W_t cannot be inferred with probability one, then the
conditional entropy is at most H(p_W). Thus, denote by A_t the set of pairs (v^t, w^{t−1})
such that W_t cannot be inferred from (v^t, w^{t−1}) with probability one. We have

    H(W_t|V^t, W^{t−1}) = Σ_{(v^t, w^{t−1})} Pr{w^{t−1}, v^t} H(W_t|V^t = v^t, W^{t−1} = w^{t−1})
                        ≤ H(p_W) Pr{(w^{t−1}, v^t) ∈ A_t}.

The event set on the right-hand side above is contained in the event set {V_{t−1} =
(x_0, x_1, . . .), V_t = (x_1, x_0, . . .)}. By conditioning on (W_{t−1}, W_t), it follows that the
probability of this event is Pr{A_L}, where A_L is as in (4.6). So the right-hand side above
is upper-bounded by H(p_W) Pr{A_L}. Combining this with (4.13)-(4.14) and (4.12), we
assert the lower bound in (4.7). By letting t → ∞ in (4.7) and using Lemma 2 (i) we
obtain (4.8). □
Remark 7 The upper bound of Theorem 1 contains slack. One trivial example is when
the entropy of the process X is 0. In such a case the bound reduces to H(pW ), which is
clearly loose given that the vector process V has zero entropy in this case.
Remark 8 The recurrences of the random walk have the effect of reducing the entropy
of the process. In particular, for a random walk with pW = 1/2, the entropy rate of the
vector process reduces to that of the random walk.
Remark 9 The size of the conditional entropy H(W^t|V^t) determines the amount of slack
in the bounds (see (4.9)). This entropy depends, among other things, on the size of the
alphabet of the process V and on the block length L, as the next example illustrates.
Theorem 1 shows that, under some conditions, optimal encoding in the information-
theoretic sense can be attained by extracting and optimally coding the trajectory W t,
and optimally coding the spatial innovations in the vector samples V t.
Example 5 Suppose that X is uniformly distributed over |X| values. Then it is easily
seen that

    Pr{A_L} = 1/|X|^{L−2}.

Consequently, the difference between the upper and lower bounds in (4.7) decays exponentially
fast as the block length L → ∞. For fixed L, the difference also decays as |X| increases.
Thus, for L and |X| sufficiently large, we have Pr{A_L} ≈ 0, and we can approximate
the entropy rate as

    H(V) ≈ (1 − 2p_W) log |X| + H(p_W)

bits per block. Figure 4.3 illustrates the bounds when X is Bern(1/2) and L = 8.
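Under the approximation of Example 5, the bounds in (4.8) are easy to tabulate. A sketch, assuming the uniform-wall expression Pr{A_L} = |X|^{−(L−2)} (the helper names are ours):

```python
import numpy as np

def entropy_bits(p):
    # Binary entropy H(p) in bits.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def rate_bounds(p_w, alphabet, L):
    # Bounds (4.8) on the entropy rate of V for a uniform wall, using
    # Pr{A_L} = |X|^{-(L-2)} from Example 5.
    HX = np.log2(alphabet)
    p_AL = float(alphabet) ** (-(L - 2))
    lower = (1 - 2 * p_w) * HX + entropy_bits(p_w) * (1 - p_AL)
    upper = (1 - 2 * p_w) * HX + entropy_bits(p_w)
    return lower, upper

# Binary wall with L = 8, the setting of Figure 4.3.
for p_w in (0.1, 0.25, 0.5):
    lo, hi = rate_bounds(p_w, alphabet=2, L=8)
    print(p_w, round(float(lo), 4), round(float(hi), 4))
```

The gap between the two bounds is exactly H(p_W) Pr{A_L}, so it vanishes exponentially as L grows.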
Figure 4.3 Bounds on information rate. (a) Lower and upper bounds as a function of p_W for the binary wall with p_X = 1/2 and L = 8.
4.3.2 Memory constrained coding
From source-coding theory, the entropy rate H(V ) can be attained with an encoder-
decoder pair with unbounded memory and delay. In the finite memory case, often the
encoder has to code Vt based on the observation of Vt−1, . . . , Vt−M , and the decoder
proceeds accordingly. This situation is similar to one encountered in video compression,
where a frame at time t is coded based on M previously coded frames [78]. In this case, the
average code-length is bounded below by the conditional entropy H(V_t|V_{t−1}, . . . , V_{t−M}) =
H(V_M|V_{M−1}, . . . , V_0). The bound (4.7) in Theorem 1 describes the behavior of the
conditional entropy H(V_M|V^{M−1}). Intuitively, by looking at the stored samples from t − M up
to t, the encoder can separately code W_t and take advantage of recurrences occurring from
t − M to t − 1. In effect, finite memory prevents the encoder from exploiting long-term
recurrences that are not visible in the memory. Similar observations are verified in practice,
for instance in [87–89].
Figure 4.4 illustrates how memory influences coding when X is uniform over an alphabet
of size |X| = 256. The curves are computed using the upper bound in (4.7).
Because the alphabet size is large, the bound is approximately tight. In the most recurrent
case, with p_W = 0.5, the conditional entropy approaches the entropy rate slowly as
M → ∞ [see (4.7)-(4.8)]. Furthermore, as M approaches infinity, there is a significant
reduction in the conditional entropy. For instance, an encoder that uses 1 frame in the
past with optimal coding would need about twice as many bits as one that uses 4 frames.
By contrast, when p_W = 0.1, because longer-term recurrences are rare, moderate values
of M are already enough to attain the limiting rate. As a result, there is little to gain
by increasing M.

The observations drawn from Figure 4.4 are also verified in practice, for instance in
[87, 88, 90]. Finally, we point out that the issue of exploiting long-term recurrences dates
back to Ziv-Lempel [91] in lossless compression. Extension of the Lempel-Ziv algorithm
to the lossy case is also discussed in [92].
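The upper bound (4.7) can be evaluated numerically by combining it with Lemma 2 (ii), which is essentially how curves like those of Figure 4.4 can be produced. The following is our own sketch, not the author's code:

```python
import numpy as np
from math import comb

def p_new_site(t, p_w):
    # Pr{R-bar_t}: probability that W_t is a new site.  By (4.4) and
    # Lemma 2(ii), Pr{R-bar_t} = 1 - sum over first returns up to t,
    # with Pr{T^{2k}} = 2 C_{k-1} ((1-p_w)p_w)^k, C_{k-1} = binom(2k-2,k-1)/k.
    total = sum(2 * (comb(2 * k - 2, k - 1) // k) * ((1 - p_w) * p_w) ** k
                for k in range(1, t // 2 + 1))
    return 1.0 - total

def cond_entropy_upper(M, p_w, HX):
    # Upper bound (4.7) on H(V_M | V^{M-1}), in bits per vector sample.
    Hp = -p_w * np.log2(p_w) - (1 - p_w) * np.log2(1 - p_w)
    return Hp + HX * np.mean([p_new_site(i, p_w) for i in range(1, M + 1)])

# Gap to the limiting rate (1 - 2 p_w) HX + H(p_w); HX = 8 bits matches
# the |X| = 256 wall of Figure 4.4.  The gap shrinks slowly for p_w = 0.5.
HX = 8.0
p_w = 0.5
limit = (1 - 2 * p_w) * HX + 1.0  # H(0.5) = 1 bit
for M in (1, 10, 100):
    print(M, cond_entropy_upper(M, p_w, HX) - limit)
```

The printed gap decreases with M, mirroring the memory-versus-rate trade-off discussed above.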
Figure 4.4 Memory constrained coding. Difference H(V) − H(V_M|V^M) as a function of M, for p_W = 0.5, 0.4, 0.3, 0.2, 0.1. When p_W = 0.5, the bit rate can be lowered significantly at the cost of large memory. A moderate bit rate reduction is obtained with small values of M when p_W = 0.1. The curves are computed using Theorem 1 for X uniform over an alphabet of size 256.
4.3.3 Lossy information rates
In this section we assume again that the process X is i.i.d. and that Xn takes values
over a finite alphabet X . Information rates for the lossy case take the form of a rate
distortion function. Consider a t-tuple (V_1, ..., V_t) where each V_j is a random vector
taking values in X^L. A reproduction t-tuple is denoted by (V̂_1, ..., V̂_t), and its entries
take values on a reproduction alphabet X̂. A distortion metric is defined as follows:
\[
d(V^t, \hat{V}^t) = \frac{1}{tL} \sum_{i=1}^{t} d_s(V_i, \hat{V}_i),
\]
where d_s : X^L × X̂^L → R_+ is a distortion metric for an L-dimensional vector. For example,
for the MSE metric we have d_s(V_i, V̂_i) = ‖V_i − V̂_i‖².
The rate-distortion function for each t, and for a given distortion metric, is written as
\[
R_{V^t}(D) = \inf_{\mathbb{E}\, d(V^t, \hat{V}^t) \le D} \frac{I(V^t; \hat{V}^t)}{t}, \qquad (4.15)
\]
where the infimum of the normalized mutual information I(V^t; V̂^t)/t is taken over all joint
probability distributions of (V_1, ..., V_t) and (V̂_1, ..., V̂_t) such that E d(V^t, V̂^t) ≤ D.
The rate-distortion function for the process V = (V_1, V_2, ...) is given by [83]
\[
R_V(D) = \lim_{t \to \infty} R_{V^t}(D). \qquad (4.16)
\]
Because the process V is stationary, it can be shown that the above limit always exists
(see [83, p. 270], or [93]).
By coding the side information W t separately, an upper bound for RV (D) similar to
Theorem 1 can be developed. The upper bound is based on the notion of conditional
rate-distortion [83, 94]. This notion is developed in the lemma below.
Lemma 3 (Gray [94]) Let V be a random vector taking values in X and let W be another
random variable. Define the conditional rate-distortion function:
\[
R_{V|W}(D) = \inf_{\mathbb{E}\, d(V, \hat{V}) \le D} I(V; \hat{V} \mid W), \qquad (4.17)
\]
where the infimum is taken over all joint distributions of V and V̂ conditional on W.
The conditional rate-distortion function obeys
\[
R_{V|W}(D) \le R_V(D) \le R_{V|W}(D) + I(V; W). \qquad (4.18)
\]
The conditional rate-distortion function of V^t conditional on W^t is defined as follows:
\[
R_{V^t|W^t}(D) = \inf_{\mathbb{E}\, d(V^t, \hat{V}^t) \le D} \frac{I(V^t; \hat{V}^t \mid W^t)}{t}, \qquad (4.19)
\]
where the infimum is taken over all joint probability distributions of V^t and V̂^t conditional
on W^t. The conditional rate-distortion function can be bounded in terms of the rate-distortion
function of the process X.
Proposition 11 The conditional rate-distortion function satisfies
\[
\limsup_{t \to \infty} R_{V^t|W^t}(D) \le (1 - 2p_W)\, R_X(D). \qquad (4.20)
\]
Proof. Let λ(w^t) denote the number of new sites along the path w^t. Then, conditional on
w^t, the tuple V^t has only λ(w^t) entries that need to be encoded. For each (w^t, v^t), let
f_{w^t}(v^t) denote the vector with the λ(w^t) entries of v^t to be coded. Moreover, let V and
V̂ be such that E|V_i[j] − V̂_i[j]|² ≤ D for i = 0, ..., t, and j = 0, ..., L−1. We have
\[
\begin{aligned}
I(V^t; \hat{V}^t \mid W^t) &= \sum_{w^t} \Pr\{W^t = w^t\}\, I(V^t; \hat{V}^t \mid W^t = w^t) \\
&\ge \sum_{w^t} \Pr\{W^t = w^t\}\, I\bigl(f_{w^t}(V^t); f_{w^t}(\hat{V}^t) \mid W^t = w^t\bigr) \\
&\ge \sum_{w^t} \Pr\{W^t = w^t\}\, \lambda(w^t)\, R_X(D) \\
&= \mathbb{E}\,\lambda(W^t)\, R_X(D), \qquad (4.21)
\end{aligned}
\]
where we have used the inequality I(X; Y) ≥ I(f(X); g(Y)) for measurable functions
f, g [95], the fact that the process X is i.i.d. and independent of W^t, and the fact that the
individual distortions are less than D. The lower bound can be achieved as follows. Let
p*(X̂|X) be the test channel that attains R_X(D). We let X̂_n be the result of passing X_n
through the channel p*(X̂_n|X_n). For each given w^t we construct V̂^t from X̂ and w^t. This
results in a joint conditional distribution that attains the lower bound (4.21).
Because the lower bound is attainable, it follows that
\[
R_{V^t|W^t}(D) \le \frac{\mathbb{E}\,\lambda(W^t)}{t}\, R_X(D).
\]
Moreover, using Lemma 2 it is straightforward to check that t^{-1} E λ(W^t) converges to
(1 − 2p_W), which concludes the proof. □
The above proposition enables us to derive an upper bound for the rate-distortion
function.
Theorem 2 Consider the i.i.d. process X such that Xn takes values over a finite alphabet
X . Let RX(D) denote its rate-distortion function. The rate-distortion function of the
process V satisfies
\[
R_V(D) \le H(p_W) + (1 - 2p_W)\, R_X(D). \qquad (4.22)
\]
Proof. Using Lemma 3 we have the following bound based on the conditional rate-distortion
function [94]:
\[
R_{V^t|W^t}(D) \le R_{V^t}(D) \le R_{V^t|W^t}(D) + \frac{1}{t} I(V^t; W^t) \le R_{V^t|W^t}(D) + H(p_W).
\]
Letting t → ∞ and using Proposition 11 asserts (4.22). □
Remark 10 Because the alphabet is finite, if the reproduction alphabet X̂ is a superset
of the original alphabet X, then for each t, R_{V^t}(D) converges to t^{-1} H(V^t) as D → 0 [83].
Consequently, for large alphabet sizes and large block lengths, the entropy rate bound of
(4.22) is sharp, and so the above bound on the rate-distortion function is also sharp for
small distortion values.
Theorem 2 shows that in the low-distortion regime, optimal encoding in the
information-theoretic sense can be attained by extracting and coding W^t losslessly, and using the
remaining bits to optimally code the vector samples corresponding to spatial innovations.
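As an illustration, the upper bound (4.22) can be evaluated numerically. The sketch below assumes per-symbol Hamming distortion, for which R_X(D) of a uniform source over |X| symbols has the well-known closed form log2|X| − H(D) − D log2(|X| − 1); the function names are illustrative.

```python
from math import log2

def h2(p):
    # binary entropy in bits
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def r_uniform_hamming(d, k):
    # rate-distortion function of a uniform source over k symbols with
    # Hamming distortion, valid for 0 <= d <= (k-1)/k
    return log2(k) - h2(d) - d * log2(k - 1)

def rv_upper_bound(d, p_w, k=256):
    # upper bound of Eq. (4.22): R_V(D) <= H(p_W) + (1 - 2 p_W) R_X(D)
    return h2(p_w) + (1.0 - 2.0 * p_w) * r_uniform_hamming(d, k)

# at D = 0 the bound reduces to the lossless entropy-rate bound
assert abs(rv_upper_bound(0.0, 0.1) - (h2(0.1) + 0.8 * 8.0)) < 1e-12
# the bound decreases as the allowed distortion grows
assert rv_upper_bound(0.1, 0.1) < rv_upper_bound(0.01, 0.1)
```

For pW = 0.5 the factor (1 − 2pW) vanishes and the bound is flat at H(pW) = 1 bit, consistent with the fully recurrent case.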
4.4 Information Rates for Dynamic Reality
The model in the previous section assumes a “static background.” More precisely, the
infinite wall process X is drawn at time 0 and does not change after that. In practice,
however, the scene background changes with time and a suitable model would have to account
for those changes. New information comes fundamentally in two forms: the first consists
of information that is “seen” by the camera for the first time, while the second consists of
changes to old information (e.g., changes in the background). In this section, we propose
a model that accounts for both these sources of new information.
To develop a model for scenes that change over time, we model X as a 2-D random
field indexed by (n, t) ∈ Z × Z_+. A simple yet rich model for the field is one that is
first-order Markov over time and i.i.d. in space. The random field is defined as follows:

Definition 6 The random field is the field RF = {X_n^{(t)} : (n, t) ∈ Z × Z_+} such that
(X_n^{(0)} : n ∈ Z) is i.i.d. and, for each n ∈ Z, the process (X_n^{(t)} : t ∈ Z_+) is a
first-order time-homogeneous Markov process.
The fact that the random field (X_n^{(0)} : n ∈ Z) is i.i.d. simplifies calculations
considerably. One justification for this model is the case where the field is Gaussian. In that
case, independence is attained by a simple linear transformation of the process
(X_n^{(0)} : n ∈ Z). It can be shown that such a transformation preserves Markovianity in the time
dimension, so the i.i.d. assumption can be justified in this case.
Throughout this section, we assume that the Markov chain of the vector process is
already in steady-state. This assumption is common, for example, in calculating rate-
distortion functions for Gaussian processes with memory [83].
The dynamic vector process V is defined similarly to the static case, but now takes
snapshots, or vectors, from the random field:

Definition 7 Let RF = {X_n^{(t)} : (n, t) ∈ Z × Z_+} be a random field, and let W be a
random walk. The dynamic vector process is the process V = (V_t : t ∈ Z_+) such that for
each t > 0,
\[
V_t = \bigl(X_{W_t}^{(t)}, X_{W_t+1}^{(t)}, \dots, X_{W_t+L-1}^{(t)}\bigr).
\]
The random field and the corresponding vector process are illustrated in Figure 4.5.
4.4.1 Lossless information rates
In the development that follows we assume, for simplicity, that the random field takes
values on a finite alphabet X . The results can equally be developed for a random field
taking values over R, under suitable technical conditions.
Figure 4.5 A model for the dynamic reality. (a) It entails a random field that is Markov in the time dimension t and i.i.d. in the spatial dimension n. (b) Motion then occurs within this random field.
To derive bounds for H(V) in the dynamic-reality case, we compute the following
conditional entropy rate:
\[
H(V|W) := \lim_{t \to \infty} H(V_t \mid V^{t-1}, W^t), \qquad (4.23)
\]
provided the limit exists. As we shall see in the examples that follow, the above limit can
be computed analytically. The key is to compute H(V_t | V^{t-1}, W^t = w^t) by splitting
the set of all paths into recurrent and nonrecurrent paths, and further splitting the set of
recurrent paths according to (4.2).
Referring to Figure 4.5(b), let w^t be a given path and consider the process V^t. Note
that each V_t has L−1 entries from the same spatial locations as L−1 entries of V_{t−1}.
The remaining entry corresponds to either a nonrecurrent or a recurrent location,
depending on w^t. If w^t is nonrecurrent, then by the Markov property of the field we have
\[
H(V_t \mid V^{t-1}, W^t = w^t) = H(X_0^{(t)}) + (L-1)\, H(X_0^{(t)} \mid X_0^{(t-1)}).
\]
If a path is recurrent at t, then there is an s < t such that w_s = w_t but w_t ≠ w_i
for s < i < t. Using the Markov property again, it follows that
\[
H(V_t \mid V^{t-1}, W^t = w^t) = H(X_0^{(t)} \mid X_0^{(s)}) + (L-1)\, H(X_0^{(t)} \mid X_0^{(t-1)}).
\]
The above argument is written out explicitly as follows:
\[
\begin{aligned}
H(V_t \mid V^{t-1}, W^t)
&= \sum_{w^t \in \bar{R}_t} H(V_t \mid V^{t-1}, W^t = w^t)\Pr\{W^t = w^t\}
 + \sum_{w^t \in R_t} H(V_t \mid V^{t-1}, W^t = w^t)\Pr\{W^t = w^t\} \\
&= \bigl(H(X_0^{(t)}) + (L-1)H(X_0^{(t)} \mid X_0^{(t-1)})\bigr)\Pr\{\bar{R}_t\}
 + \sum_{i=1}^{\lfloor t/2 \rfloor} \sum_{w^t \in T_{t-2i}^t} H(V_t \mid V^{t-1}, W^t = w^t)\Pr\{W^t = w^t\} \\
&= \bigl(H(X_0^{(t)}) + (L-1)H(X_0^{(t)} \mid X_0^{(t-1)})\bigr)\Pr\{\bar{R}_t\}
 + \sum_{i=1}^{\lfloor t/2 \rfloor} \bigl(H(X_0^{(t)} \mid X_0^{(t-2i)}) + (L-1)H(X_0^{(t)} \mid X_0^{(t-1)})\bigr)\Pr\{T_{t-2i}^t\} \\
&= (L-1)H(X_0^{(t)} \mid X_0^{(t-1)}) + H(X_0^{(t)})\Pr\{\bar{R}_t\}
 + \sum_{i=1}^{\lfloor t/2 \rfloor} H(X_0^{(2i)} \mid X_0^{(0)})\Pr\{T_0^{2i}\}.
\end{aligned}
\]
Letting t → ∞ and using Lemma 2 (i) leads to
\[
H(V|W) = H(X_0^{(\infty)})(1 - 2p_W) + (L-1)\, H(X_0^{(1)} \mid X_0^{(0)})
+ \sum_{i=1}^{\infty} H(X_0^{(2i)} \mid X_0^{(0)})\, \Pr\{T_0^{2i}\}, \qquad (4.24)
\]
where Pr{T_0^{2i}} is the probability of return given in Lemma 2 (ii). The infinite sum on
the right-hand side of (4.24) is well defined: it is an infinite sum of positive numbers, and
it is bounded above by H(X_0^{(\infty)}) Σ_{i=1}^{∞} Pr{T_0^{2i}} = 2p_W H(X_0^{(\infty)}).
With the conditional entropy rate in (4.24) we can derive lower and upper bounds
on the entropy rate H(V). To derive an upper bound, we bound H(V^t)/t for each t and
let t → ∞. For the lower bound, similarly to Section 4.3, we bound H(V_t | V^{t-1}) from
below. Because the alphabet X is finite and the process is stationary, the limits of
H(V^t)/t and H(V_t | V^{t-1}) as t → ∞ coincide.
The upper bound is obtained from the inequality H(V^t) ≤ t H(p_W) + H(V^t | W^t). Note
that H(V^t | W^t) = Σ_{i=1}^{t} H(V_i | V^{i-1}, W^t), so that if H(V_i | V^{i-1}, W^t)
converges to a limit as t → ∞, then necessarily t^{-1} H(V^t | W^t) converges to the same
limit (see, e.g., [86, p. 64]). So,
\[
\lim_{t \to \infty} \frac{H(V^t)}{t}
\le H(p_W) + \lim_{t \to \infty} \frac{H(V^t \mid W^t)}{t}
= H(p_W) + \underbrace{\lim_{t \to \infty} H(V_t \mid V^{t-1}, W^t)}_{H(V|W)}.
\]
To derive a lower bound, note that the development leading to (4.13)-(4.14) for the
static case also holds for the dynamic case. So, we have
\[
H(V_t \mid V^{t-1}) \ge H(p_W) + H(V_t \mid V^{t-1}, W^t) - H(W_t \mid V^t, W^{t-1}). \qquad (4.25)
\]
Thus, a lower bound is obtained by finding an upper bound for H(W_t | W^{t-1}, V^t).
Because the process X changes at each time step, we cannot use the event A_L to obtain
an upper bound for H(W_t | W^{t-1}, V^t) as in the static case. A useful upper bound for
H(W_t | W^{t-1}, V^t) is obtained by using Fano's inequality. Let P_e denote the probability
of error in estimating W_t based on observing Y_t := (V_t, V_{t-1}, W_{t-1}), i.e.,
\[
P_e = \Pr\{\hat{W}(Y_t) \ne W_t\},
\]
where Ŵ(·) is a given estimator assumed to be the same for all t. Since W_{t-1} is observed,
estimating W_t amounts to estimating the increment N_t = W_t − W_{t-1}. Because V is
stationary and N_t is i.i.d., it follows that P_e does not depend on t. From Fano's inequality,
since N_t takes only two values, we have that
\[
H(W_t \mid V^t, W^{t-1}) \le H(N_t \mid Y_t) \le H(P_e) + P_e \log_2(2-1) = H(P_e). \qquad (4.26)
\]
Consequently, a lower bound is obtained by combining (4.25) with (4.26) above.¹ By
letting t → ∞ we arrive at the following:

Theorem 3 Consider the vector process V consisting of L-tuples generated by a Bernoulli
random walk with transition probability p_W ≤ 1/2, and the random field
RF = {X_n^{(t)} : (n, t) ∈ Z × Z_+} that is i.i.d. in the n dimension and first-order Markov in
the t dimension. The entropy rate of the process V obeys
\[
H(p_W) + H(V|W) - H(P_e) \le H(V) \le H(p_W) + H(V|W), \qquad (4.27)
\]
where H(V|W) is as in (4.24), and P_e is the probability of error in estimating W_1 based
on the observation of Y_1 = (V_1, V_0, W_0) with any estimator Ŵ(Y_1).

¹Sharper lower bounds can be obtained by estimating N_t using (V^t, W^{t-1}). However, the estimate using Y_t is easily computed and already leads to a sharp enough bound.
The lower and upper bounds become sharp when P_e → 0. This occurs with large block
sizes and small changes in the background. The examples that follow illustrate the
sharpness of the above bounds. In the first example we consider a binary process X, and in
the second a Gaussian process with AR(1) temporal innovations.
Figure 4.6 The binary random field. Innovations are in the form of bit flips caused by binary symmetric channels between consecutive time instants.
Example 6 (BSC innovations) Suppose that at t = 0 the process is a strip of bits that
are i.i.d. Bernoulli with parameter p_X. Suppose that from t to t+1 there is a nonzero
probability p_I that the bit X_n^{(t)} is flipped. This amounts to a binary symmetric channel
(BSC) between X_n^{(t)} and X_n^{(t+1)}, as illustrated in Figure 4.6. The t BSCs in series
between X_n^{(0)} and X_n^{(t)} are equivalent to a single BSC with transition probability
(see [86, p. 221], problem 8)
\[
p_{I,t} = 0.5\bigl(1 - (1 - 2p_I)^t\bigr). \qquad (4.28)
\]
Note that for p_I > 0 we have lim_{t→∞} p_{I,t} = 0.5. So, for each n, the distribution of
X_n^{(t)} converges to the stationary distribution Bern(0.5). Substituting into (4.24) gives,
for p_I > 0:
\[
H(V|W) = H(\tfrac{1}{2})(1 - 2p_W) + (L-1)\, H(p_I) + \sum_{i=1}^{\infty} H(p_{I,2i})\, \Pr\{T_0^{2i}\}. \qquad (4.29)
\]
Notice that when p_I = 0 we recover the static case. Using the above in (4.27) we obtain
the corresponding bounds. Figure 4.7(a) illustrates the lower and upper bounds for L = 8
and p_X = 0.5. We compute the bounds using (4.24) and (4.27), truncating the infinite
sum in (4.24) at a very large t. The probability P_e is computed through Monte Carlo
simulation using a simple Hamming-distance detector. The bounds are surprisingly sharp
in this case and provide a good approximation of the true entropy rate. Notice that as
p_I increases, the entropy rate of the recurrent case (p_W = 0.5) crosses that of the
panning case (p_W = 0.05). This is because in the recurrent case a greater share of the bits
is spent coding the innovations.
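Equation (4.29) can be reproduced with a short computation. The sketch below truncates the infinite sum at a large index; it assumes the first-return probabilities of the Bernoulli walk in the Catalan-number form Pr{T_0^{2i}} = 2 C_{i−1} (p_W(1−p_W))^i (consistent with Lemma 2, since these probabilities sum to 2p_W for p_W ≤ 1/2), and the function names are illustrative.

```python
from math import log2

def h2(p):
    # binary entropy in bits; the endpoints give 0
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def p_cascade(p_i, t):
    # equivalent crossover probability of t BSCs in series, Eq. (4.28)
    return 0.5 * (1.0 - (1.0 - 2.0 * p_i) ** t)

def cond_entropy_rate(p_i, p_w, L, n_terms=20000):
    # H(V|W) of Eq. (4.29); the infinite sum is truncated at n_terms.
    # Pr{T_0^{2k}} is updated with the Catalan-number recurrence
    # C_k = C_{k-1} * 2(2k-1)/(k+1), which is numerically stable.
    pq = p_w * (1.0 - p_w)
    f = 2.0 * pq                      # Pr{T_0^2} = 2 * C_0 * pq
    tail = 0.0
    for k in range(1, n_terms + 1):
        tail += h2(p_cascade(p_i, 2 * k)) * f
        f *= pq * 2.0 * (2 * k - 1) / (k + 1)
    return h2(0.5) * (1.0 - 2.0 * p_w) + (L - 1) * h2(p_i) + tail

# two groups of BSCs in series compose exactly as (4.28) predicts
p3, p4 = p_cascade(0.03, 3), p_cascade(0.03, 4)
assert abs(p_cascade(0.03, 7) - (p3 * (1 - p4) + (1 - p3) * p4)) < 1e-12

# with p_I = 0 every innovation term vanishes and the static case
# H(V|W) = H(1/2)(1 - 2 p_W) is recovered
assert abs(cond_entropy_rate(0.0, 0.3, L=8) - 0.4) < 1e-9
```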
Figure 4.7(b) shows the contour plots of the upper bound for various pairs (p_I, p_W).
The plot shows how the two innovations combine to generate a given entropy value.
Notice that as p_W approaches 1/2, the entropy of the trajectory becomes significant and
compensates for the smaller amount of spatial innovation.
To measure the effect of memory in the dynamic case, we evaluate the upper bound on
the conditional entropy rate (as in (4.11)) and the upper bound on the true entropy rate
given by Theorem 3. Figure 4.8 illustrates the difference between the conditional-entropy
upper bound and the true-entropy upper bound. The curves are similar to the ones obtained
in the static case with spatial innovation (Figure 4.4), and confirm the intuitive fact
that memory is less useful when the scene changes rapidly.
Example 7 (AR(1) innovations) Although the development leading to Theorem 3 was
made for finite alphabets, the same calculation can be done for a random field taking
values on R, provided it has absolutely continuous joint densities. In this case, the
entropies involved become differential
Figure 4.7 The binary symmetric innovations. (a) The curves show the lower and upper bounds on the entropy rate. Notice that the bounds are sharp for various values of pI. (b) Contour plots of the upper bound for various pI and pW. The lines indicate points of similar entropy but with different amounts of spatial and temporal innovation.
entropies. For example, for each n ∈ Z and 0 < ρ < 1, let
\[
X_n^{(t)} = \rho X_n^{(t-1)} + \varepsilon_t
\]
Figure 4.8 Memory and innovations. Shown is the difference between the conditional entropy and the true entropy for the binary innovations with pX = 0.5, pW = 0.5, and L = 8. The curves show the intuitive fact that when the background changes too rapidly, there is little to be gained in bit rate by utilizing more memory.
for t ∈ Z_+, where ε_t ∼ N(0, 1 − ρ²) i.i.d. and independent of X. Such a random field
model is used, for instance, in [96] for bit allocation over multiple frames. Let φ(σ²)
denote the differential entropy of a Gaussian density with variance σ²:
\[
\phi(\sigma^2) := \frac{1}{2} \log_2(2\pi e \sigma^2).
\]
It is then easy to check that h(X_1^{(\infty)}) = φ(1) and h(X_1^{(i)} | X_1^{(0)}) = φ(1 − ρ^{2i}),
so that we obtain a lower and an upper bound on the differential entropy rate using Theorem 3.
The conditional differential entropy rate h(V|W) is
\[
h(V|W) = \phi(1)(1 - 2p_W) + (L-1)\phi(1 - \rho^2) + \sum_{i=1}^{\infty} \phi(1 - \rho^{4i})\, \Pr\{T_0^{2i}\}. \qquad (4.30)
\]
The infinite sum on the right-hand side is well defined: because 1 − ρ^{4k} converges to 1
as k → ∞, for any value of ρ in (−1, 1) the tail of the infinite sum is a sum of positive
numbers. Using (4.4) and Lemma 2 (i), we see that Σ_{i=1}^{∞} Pr{T_0^{2i}} = 2p_W.
Because φ(·) is concave, we can use Jensen's inequality as follows:
\[
\sum_{k=1}^{\infty} \phi(1 - \rho^{4k})\, \Pr\{T_0^{2k}\}
= 2p_W \sum_{k=1}^{\infty} \phi(1 - \rho^{4k})\, \frac{\Pr\{T_0^{2k}\}}{2p_W}
\le 2p_W\, \phi\!\left( \sum_{k=1}^{\infty} (1 - \rho^{4k})\, \frac{\Pr\{T_0^{2k}\}}{2p_W} \right).
\]
Using Lemma 2 (ii) and the generating function for the Catalan numbers [97], one
can further check that
\[
\sum_{k=1}^{\infty} (1 - \rho^{4k})\, \Pr\{T_0^{2k}\} = \bigl(1 - 4(1 - p_W)p_W \rho^4\bigr)^{1/2} - (1 - 2p_W),
\]
so that the last term is controlled by
\[
\sum_{k=1}^{\infty} \phi(1 - \rho^{4k})\, \Pr\{T_0^{2k}\}
\le 2p_W\, \phi\!\left( \frac{\bigl(1 - 4(1 - p_W)p_W \rho^4\bigr)^{1/2} - (1 - 2p_W)}{2p_W} \right). \qquad (4.31)
\]
The above upper bound turns out to be a very good approximation of the infinite sum in
(4.30) when pW is close to 0, and when ρ is away from 1.
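The closed form above is easy to confirm numerically. The sketch below again assumes the Catalan form of the first-return probabilities, Pr{T_0^{2k}} = 2 C_{k−1} (p_W(1−p_W))^k, and compares a truncation of the left-hand sum with the closed-form right-hand side; the function names are illustrative.

```python
from math import comb, sqrt

def first_return(k, p_w):
    # Pr{T_0^{2k}} written via the Catalan number C_{k-1}
    catalan = comb(2 * k - 2, k - 1) // k
    return 2.0 * catalan * (p_w * (1.0 - p_w)) ** k

def lhs(rho, p_w, n_terms=400):
    # truncated sum_k (1 - rho^{4k}) Pr{T_0^{2k}}
    return sum((1.0 - rho ** (4 * k)) * first_return(k, p_w)
               for k in range(1, n_terms + 1))

def rhs(rho, p_w):
    # closed form from the Catalan generating function
    return sqrt(1.0 - 4.0 * (1.0 - p_w) * p_w * rho ** 4) - (1.0 - 2.0 * p_w)

for rho, p_w in [(0.8, 0.3), (0.5, 0.1), (0.9, 0.25)]:
    assert abs(lhs(rho, p_w) - rhs(rho, p_w)) < 1e-9
```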
Notice that for L large and ρ close to 1, P_e and H(P_e) are small, so that the bounds
in Theorem 3 are sharp. Figure 4.9 displays the bounds on the differential entropy rate as
a function of ρ. The bounds are computed following Theorem 3 and (4.30). Here P_e
is inferred via Monte Carlo simulation with 10^7 trials and a minimum-MSE detector
for W_t. The inferred P_e is so low that the lower and upper bounds practically coincide.
Analytical computation of P_e is a detection problem beyond the scope of this dissertation.
4.4.2 Lossy information rates for the AR(1) random field
Consider the AR(1) innovations of the previous example. Under the MSE distortion
metric it is possible to derive an upper bound on the lossy information rate. The key is
to compute R_{V^t|W^t}(D), defined as in (4.19), and use the upper bound [94]:
\[
R_{V^t}(D) \le H(p_W) + R_{V^t|W^t}(D), \qquad (4.32)
\]
Figure 4.9 Differential entropy bounds for the Gaussian AR(1) case as a function of the innovation parameter ρ. In this example Pe is small enough that the lower and upper bounds practically coincide. Note that the slope of the differential entropy curve is influenced by the value of pW.
for each t > 0. The conditional rate-distortion function satisfies the Shannon lower bound
(SLB) [83]:
\[
R_{V^t|W^t}(D) \ge \frac{h(V^t \mid W^t)}{t} - L\,\phi(D). \qquad (4.33)
\]
The key observation is that for a given fixed trajectory wt, the rate-distortion function
of V t is that of a Gaussian vector consisting of the samples of the random field covered
by W t. For a Gaussian vector, the SLB is tight when the per sample distortion is less
than the minimum eigenvalue of the covariance matrix (see [83, p. 111]). The next
proposition gives a condition under which (4.33) is tight, and thus when combined with
(4.32) provides an upper bound on the rate-distortion function.
Proposition 12 Consider the vector process V resulting from the Gaussian AR(1) random
field with correlation coefficient 0 < ρ < 1, and a Bernoulli random walk with probability
p_W ≤ 1/2. The Shannon lower bound for the conditional rate-distortion function is tight
whenever the distortion satisfies
\[
0 < D < \frac{1 - \rho}{1 + \rho}. \qquad (4.34)
\]
Proof. To assert the claim we rely on the following lemmas:

Lemma 4 Let X_1, X_2, ..., X_m be a sequence of Gaussian vectors in R^d such that
X_j ∼ N(0, C_j), and where each C_j has spectrum λ(C_j). Let W be a random variable
independent of X_1, ..., X_m such that Pr{W = j} = μ_j for j = 1, ..., m. Consider the mixture
\[
X = \sum_{j=1}^{m} I_{\{W=j\}} X_j.
\]
Denote by R_{X|W}(D) the conditional rate-distortion function with per-sample MSE
distortion D. Then, if
\[
D \le \min \bigcup_{j=1}^{m} \lambda(C_j),
\]
the conditional rate-distortion function is
\[
R_{X|W}(D) = \sum_{j=1}^{m} \mu_j R_{X_j}(D).
\]
Proof. Let p(X̂, X | W) be such that d^{-1} E‖X − X̂‖² ≤ D. Then,
\[
I(X; \hat{X} \mid W) = \sum_{j=1}^{m} \mu_j I(X; \hat{X} \mid W = j) \qquad (4.35)
\]
\[
\ge \sum_{j=1}^{m} \mu_j R_{X_j}(D_j), \qquad (4.36)
\]
with
\[
d^{-1}\, \mathbb{E}\|X - \hat{X}\|^2 = \sum_{j=1}^{m} \mu_j D_j \le D,
\]
and D_j := d^{-1} E(‖X − X̂‖² | W = j). The above is minimized when
\[
R'_{X_j}(D_j) = \theta,
\]
where θ is some constant. Suppose D ≤ min ∪_{j=1}^{m} λ(C_j) and D_j = D. We have
\[
R_{X_j}(D_j) = \frac{1}{d} \sum_{p=1}^{d} \frac{1}{2} \log_2\!\left(\frac{\lambda_{j,p}}{D}\right),
\]
where λ_{j,p} are the eigenvalues of C_j; moreover, R'_{X_j}(D_j) = −1/D, so that the conditions
for a minimum are satisfied. The lower bound can be attained by setting
\[
p(\hat{X}, X \mid W = j) = p_j^*(\hat{X}, X),
\]
where p_j^*(X̂_j, X_j) attains R_{X_j}(D_j). □
Lemma 5 ([98, p. 189]) Let A be an n × n Hermitian matrix, and let 1 ≤ m ≤ n. Let A_m
denote a principal submatrix of A, obtained by deleting n − m rows and the corresponding
columns of A. Then, for each integer k such that 1 ≤ k ≤ m, we have
\[
\lambda_k(A) \le \lambda_k(A_m), \qquad (4.37)
\]
where λ_k(·) denotes the k-th smallest eigenvalue of the matrix.
Proof of Proposition 12. The SLB for each t > 0 is given by
\[
R_{V^t|W^t}(D) \ge \frac{h(V^t \mid W^t)}{t} - L\,\phi(D). \qquad (4.38)
\]
Because
\[
I(V^t; \hat{V}^t \mid W^t) = \sum_{w^t} \Pr\{W^t = w^t\}\, I(V^t; \hat{V}^t \mid W^t = w^t),
\]
in view of Lemma 4 it suffices to show that for each t > 0 and for 0 ≤ D ≤ (1−ρ)/(1+ρ),
the bound
\[
\frac{I(V^t; \hat{V}^t \mid w^t)}{t} \ge \frac{h(V^t \mid w^t)}{t} - L\,\phi(D),
\quad \text{for } \mathbb{E}\bigl(d(V^t, \hat{V}^t) \mid w^t\bigr) \le D,
\]
is achievable. Given W^t = w^t, the above bound is attainable if D is smaller than the
minimum eigenvalue of the covariance matrix of the random field samples covered by w^t.
Denote this covariance by C_{w^t} := Cov(V^t | w^t). Because the random field is independent
in the spatial dimension n, the spectrum of the covariance matrix is the disjoint union
of the spectra of the covariance matrices corresponding to the random field samples of
V^t at the same location n. Each such block is a principal submatrix of the t × t Toeplitz
matrix T_t(ρ) with entries [T_t(ρ)]_{ij} = ρ^{|i−j|}. Since λ_min(T_t(ρ)) decreases to
(1 − ρ)/(1 + ρ) as t → ∞ [99], applying Lemma 5 we conclude that
\[
\lambda_{\min}(C_{w^t}) \ge \lambda_{\min}(T_t(\rho)) \ge \frac{1 - \rho}{1 + \rho}. \qquad (4.39)
\]
Therefore, the bound (4.38) is achievable for each t, and since the limit of R_{V^t|W^t}(D)
exists, it follows that the bound is achievable as t → ∞. □
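The two eigenvalue facts used in the proof can be checked directly with a few lines of linear algebra (a sketch; the matrix size and index set are arbitrary choices):

```python
import numpy as np

def toeplitz_corr(t, rho):
    # the t x t Toeplitz matrix T_t(rho) with entries rho^{|i-j|}
    idx = np.arange(t)
    return rho ** np.abs(idx[:, None] - idx[None, :])

rho, t = 0.9, 60
T = toeplitz_corr(t, rho)
lam_min_full = np.linalg.eigvalsh(T)[0]   # eigvalsh sorts ascending

# lambda_min(T_t(rho)) stays above its limit (1 - rho)/(1 + rho) ...
assert lam_min_full > (1.0 - rho) / (1.0 + rho)

# ... and, as in Lemma 5, a principal submatrix (the field samples
# covered by some subset of time instants) has a minimum eigenvalue
# no smaller than that of the full matrix
keep = np.array([0, 3, 4, 10, 25, 59])
sub = T[np.ix_(keep, keep)]
assert np.linalg.eigvalsh(sub)[0] >= lam_min_full - 1e-12
```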
Example 8 We simulate the AR(1) dynamic reality model. To compress the process
V t, we estimate the trajectory and send it as side information. With the trajectory at
hand, we encode the samples with DPCM, encoding the residual with entropy constrained
scalar quantization (ECSQ). We build two encoders. In the first one, prediction is done
utilizing only the previously encoded vector sample; in the second, all encoded samples
up to time t are available to the encoder (and decoder). Figure 4.10 illustrates the SNR
as a function of rate when the block-length L = 8. In Figure 4.10 (a) and (b) we have
ρ = 0.99 and the upper bound is valid for SNR greater than 23 dB. In Figure 4.10 (a),
we have pW = 0.5. Because the scene changes slowly and is highly recurrent, the infinite
memory encoder (M = ∞) is about 3.5 dB better than the M = 1 encoder. The same behavior
is not observed when the scene is not recurrent (panning case, pW = 0.1, Figure 4.10(b)),
nor when the background changes too rapidly (ρ = 0.9, Figure 4.10(c)).
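The encoder used in this experiment can be sketched as follows. The fragment below is a minimal stand-in, not the experimental code: it assumes a known trajectory, M = 1 prediction from the previous reconstruction only, and a plain uniform quantizer in place of ECSQ; dpcm_ar1 and the step size are illustrative.

```python
import numpy as np

def dpcm_ar1(x, rho, step):
    # DPCM: predict rho * (previous reconstruction), quantize the
    # prediction residual with a uniform mid-tread quantizer, and
    # track the reconstruction exactly as the decoder would
    recon = np.empty_like(x)
    prev = 0.0
    for i, sample in enumerate(x):
        pred = rho * prev
        q = step * np.round((sample - pred) / step)
        recon[i] = pred + q
        prev = recon[i]
    return recon

rng = np.random.default_rng(0)
rho, n, step = 0.99, 5000, 0.05
eps = rng.normal(0.0, np.sqrt(1.0 - rho ** 2), n)
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):            # stationary AR(1), unit marginal variance
    x[i] = rho * x[i - 1] + eps[i]

recon = dpcm_ar1(x, rho, step)
# with an unbounded uniform quantizer the error is pointwise <= step/2
assert np.max(np.abs(x - recon)) <= step / 2 + 1e-12
snr_db = 10.0 * np.log10(np.var(x) / np.mean((x - recon) ** 2))
```

Because the predictor uses the decoder's reconstruction, the quantization error does not accumulate: the reconstruction error at every sample is bounded by half the quantizer step.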
4.5 The Recording Reality Case
In some applications, not only the samples of the POF are available, but also the locations
of those samples. Consider the vector process in Definition 3, where X is stationary and
takes values on a finite alphabet. Suppose that the position information is available.
In this case, the encoder has access to (V_t, W_t) for each t. In such a scenario, the goal
usually is not to reproduce (V_t, W_t) at the decoder, but rather to reconstruct the underlying
scene X. This coding problem is the one encountered, for example, in the compression of
light fields (see, e.g., [100]).
Figure 4.10 Performance of DPCM with motion for various ρ and pW. For ρ = 0.99 and ρ = 0.9 the upper bound is valid for SNR greater than 23 dB and 12.8 dB, respectively. (a) Memory provides considerable gains, pW = 0.5, ρ = 0.99. (b) Modest gains when pW = 0.1. (c) Modest gains when ρ = 0.9, as the background changes too rapidly.
Because the scene is assumed to be static, only the positions corresponding to new
states need to be transmitted. These states correspond to new visual information being
"seen" by a camera for the first time. It is thus natural to define the sequence of times
at which the camera moves into a new location:²
\[
t_0 = 0, \qquad t_j = \min\{t > t_{j-1} : W_t \ne W_i,\ 0 \le i < t_{j-1}\}, \quad j > 0.
\]
²In probability theory parlance, these are in fact stopping times for the standard filtration generated by W.
The above sequence of stopping times enables us to define the corresponding process of
interest:
\[
Z = \bigl(Z_j := (V_{t_j}, W_{t_j}) : j \in \mathbb{Z}_+\bigr). \qquad (4.40)
\]
The process Z is the subvector process consisting of the samples that contain new spatial
information. We seek to characterize the entropy rate of Z, defined as
\[
H(Z) = \lim_{j \to \infty} \frac{1}{j} H(Z_0, Z_1, \dots, Z_j).
\]
We show next that despite the randomness of the trajectory W , the entropy rate of
the process Z is that of the underlying process X. That is, even in the worst case of a
random trajectory, there is no increase in the entropy rate due to random camera motion.
Theorem 4 Consider the vector process V where the process X is stationary and takes
values on a finite alphabet X. Let Z = (Z_j := (V_{t_j}, W_{t_j}) : j ∈ Z_+), and denote by H(X)
the entropy rate of the process X. Then,
\[
H(Z) = H(X).
\]
Proof. Denote W̄_j = W_{t_j} and V̄_j = V_{t_j}. Assume without loss of generality that V̄_0 is
known to the decoder. Then, for each j,
\[
H(Z^j) = H(\bar{W}^j) + H(\bar{V}^j \mid \bar{W}^j) \qquad (4.41)
\]
\[
= H(\bar{W}^j) + H(X_1, \dots, X_j). \qquad (4.42)
\]
The last equality holds because, conditional on W̄^j, each entry of V̄^j contains a single
new sample of X. So, it suffices to assert that
\[
\lim_{j \to \infty} \frac{1}{j} H(\bar{W}_0, \dots, \bar{W}_j) = 0.
\]
Let N̄_j = W̄_j − W̄_{j-1} denote the increments of W̄. Then, there is a one-to-one
correspondence between W̄^j and N̄^j so that
\[
H(\bar{W}_0, \dots, \bar{W}_j) = H(\bar{N}_0, \dots, \bar{N}_j).
\]
To check that j^{-1} H(N̄_0, ..., N̄_j) converges to 0, it suffices to show that
H(N̄_j | N̄^{j-1}) = H(N̄_j | N̄_{j-1}) converges to zero (see [86, p. 64]). This conditional
entropy in turn depends on the transition probability Pr{N̄_j | N̄_{j-1}}. The transition
probabilities are similar to the ones in the gambler's ruin calculation done, for example,
in [85]. This gives, for p_W < 1/2,
\[
\Pr\{\bar{N}_j = -1 \mid \bar{N}_{j-1} = 1\} = \frac{\alpha^j - \alpha^{j+1}}{1 - \alpha^{j+1}},
\qquad
\Pr\{\bar{N}_j = 1 \mid \bar{N}_{j-1} = -1\} = \frac{\alpha^{-j} - \alpha^{-j-1}}{1 - \alpha^{-j-1}},
\]
where α := p_W/(1 − p_W). For p_W = 1/2,
\[
\Pr\{\bar{N}_j = -1 \mid \bar{N}_{j-1} = 1\} = \Pr\{\bar{N}_j = 1 \mid \bar{N}_{j-1} = -1\} = \frac{1}{j+1}.
\]
So, for 0 ≤ p_W ≤ 1/2, both transition probabilities converge to zero as j goes to
infinity, and by continuity we have that lim_{j→∞} H(N̄_j | N̄_{j-1}) = 0, which asserts the
claim. □
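As a quick numerical check of the transition probabilities above (a sketch with an illustrative function name), the p_W < 1/2 expression tends to the p_W = 1/2 expression 1/(j + 1) as p_W → 1/2, and for fixed p_W < 1/2 it decays geometrically in j:

```python
def p_switch(j, p_w):
    # Pr{N_j = -1 | N_{j-1} = +1} for p_w < 1/2, with
    # alpha = p_w / (1 - p_w)
    a = p_w / (1.0 - p_w)
    return (a ** j - a ** (j + 1)) / (1.0 - a ** (j + 1))

# continuity at p_w = 1/2: the formula approaches 1/(j + 1)
for j in (1, 5, 20):
    assert abs(p_switch(j, 0.4999999) - 1.0 / (j + 1)) < 1e-4

# for fixed p_w < 1/2 the switching probability vanishes with j
assert p_switch(50, 0.3) < p_switch(10, 0.3) < p_switch(1, 0.3)
```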
Remark 11 Note that if the trajectory W is deterministic, then the result is obvious.
What Theorem 4 shows is that the same holds even in the case where the trajectory is
random.
Remark 12 The result is also very similar to the rate-distortion problem for remote
sources considered in [83]. In this case the random walk can be seen as a channel between
the wall and the encoder. However, the results in [83] deal with memoryless channels
only and thus are not directly applicable to the problem at hand.
To optimally code a randomly moving camera taking samples of the plenoptic function,
one needs an average rate of H(X) bits. Suppose that there is a code for X that attains
the entropy rate H(X). Then, to attain the entropy rate of Z, a code for the positions W
that has zero rate on average is required. In the next section we propose a code for the
increments N_j whose average code length essentially attains the zero-rate lower bound.
4.5.1 A possible code: Shannon + run-length
A code that essentially achieves the entropy rate of the process N is constructed.
For simplicity, we assume that p_W = 1/2; the results for p_W < 1/2 are analogous.
The process N is a time-varying first-order Markov process and its entropy rate is zero.
Notice that the Lempel-Ziv code is not necessarily optimal here, as the source is not
stationary. Our proposal uses a run-length code: we buffer the runs of equal symbols
("left" or "right," corresponding to −1 or +1), and the runs are then coded according
to their probability. The code can thus be described as follows. Suppose J samples have
already been coded. We start buffering the samples N_J, N_{J+1}, and so on, until
N_{J+r} ≠ N_J. Denote by C the random variable describing the length of the next run.
Code C with a Shannon code [86]; thus, if C = c, its codeword length is
\[
l_J(c) = \lceil -\log P_J(c) \rceil.
\]
From the gambler's ruin calculation, the probability that the next run has length C,
given that J increments N_j were already coded, is
\[
P_J(C) = P(N_{J+C-1} \ne N_{J+C-2}) \prod_{j=J}^{J+C-2} P(N_j = N_{j-1})
= \frac{1}{J+C} \prod_{j=0}^{C-2} \frac{J+j}{J+j+1}
= \frac{1}{J+C} \cdot \frac{J}{J+C-1}. \qquad (4.43)
\]
We will show that this code has an average number of bits per sample converging to zero
as J grows large. To prove this result, we need the following lemma:

Lemma 6 The function f(x) = log(x)/x is strictly decreasing for x > e.

Proof. Note that f'(x) = (1 − log x)/x², so that f'(x) < 0 for x > e. □

This simple lemma is crucial in the proof of the next proposition.
Proposition 13 Consider the run-length Shannon code for the increments N_j. Then
the expected number of bits per sample in the next run, given that J increments have
already been coded, goes to zero as J → ∞ at a rate of O((log J)²/J).
Proof. Denote by l_J(C) the length in bits of a coded run of length C when J increments
have been coded. Then, using Lemma 6, we have
\[
\begin{aligned}
0 \le \mathbb{E}\!\left[\frac{l_J(C)}{C}\right]
&\le \mathbb{E}\!\left[\frac{1 + \log(1/P_J(C))}{C}\right]
= \sum_{C \ge 1} \frac{P_J(C)\bigl(1 + \log(1/P_J(C))\bigr)}{C} \\
&= \sum_{C \ge 1} \frac{1}{C}\,\frac{J}{(J+C)(J+C-1)} \left(1 + \log\frac{(J+C)(J+C-1)}{J}\right) \\
&\le \sum_{C \ge 1} \frac{1}{C}\,\frac{J}{(J+C-1)^2} \left(1 + \log\frac{(J+C-1)^2}{J}\right) \\
&= \sum_{C \ge 1} \frac{1}{C}\,\frac{J}{(J+C-1)^2} \left(1 + \log J + 2\log\frac{J+C-1}{J}\right) \\
&= \sum_{C \ge 1} \frac{1}{C}\,\frac{J + J\log J}{(J+C-1)^2}
 + \sum_{C \ge 1} \frac{2J}{C}\,\frac{\log\frac{J+C-1}{J}}{(J+C-1)^2}.
\end{aligned}
\]
Now we prove that each of the two terms on the right-hand side converges to zero. The
first one gives
\[
\begin{aligned}
\sum_{C \ge 1} \frac{1}{C}\,\frac{J + J\log J}{(J+C-1)^2}
&= (J + J\log J)\left(\frac{1}{J^2} + \sum_{C > 1} \frac{1}{C}\,\frac{1}{(J+C-1)^2}\right) \\
&\le (J + J\log J)\left(\frac{1}{J^2} + \int_{C \ge 1} \frac{1}{C}\,\frac{1}{(J+C-1)^2}\, dC\right) \\
&= (J + J\log J)\left(\frac{1}{J^2} + \frac{1 - J + J\log J}{J(J-1)^2}\right),
\end{aligned}
\]
where we have used the fact that the summand is decreasing in C. The majorizing term
goes to zero essentially as O((log J)² J^{-1}). For the second term we use the inequality
log x ≤ x − 1 for x > 0. This gives
\[
\sum_{C \ge 1} \frac{2J}{C}\,\frac{\log\frac{J+C-1}{J}}{(J+C-1)^2}
\le \sum_{C \ge 1} \frac{2}{C}\,\frac{C-1}{(J+C-1)^2}
\le \int_{C > 1} \frac{2}{C}\,\frac{C-1}{(J+C-1)^2}\, dC
= 2\,\frac{J - 1 - \log J}{(J-1)^2},
\]
which goes to zero as ∼ 1/J. Thus, we conclude that E[l_J(C)/C] goes to zero at least as
fast as O((log J)² J^{-1}). □
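The behavior established by Proposition 13 is easy to observe numerically. The sketch below (illustrative function names; the tail of each sum is truncated) checks that the run probabilities (4.43) telescope to a valid distribution and that the expected number of bits per sample decays as J grows:

```python
from math import ceil, log2

def run_pmf(c, j):
    # P_J(C) of Eq. (4.43)
    return j / ((j + c) * (j + c - 1.0))

def avg_bits_per_sample(j, c_max=200000):
    # E[l_J(C)/C] with Shannon lengths ceil(-log2 P_J(C)); the tail
    # beyond c_max is summable and negligible here
    return sum(run_pmf(c, j) * ceil(-log2(run_pmf(c, j))) / c
               for c in range(1, c_max + 1))

# the probabilities telescope: sum_C P_J(C) = 1
assert abs(sum(run_pmf(c, 100) for c in range(1, 10**6)) - 1.0) < 1e-3

# the expected bits per sample decay as J grows, as in Proposition 13
assert avg_bits_per_sample(1000) < avg_bits_per_sample(100) < avg_bits_per_sample(10)
```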
4.5.2 Coding with a finite buffer
The code suggested in the previous section actually requires unbounded buffer sizes
(the runs can have arbitrarily large length). However, one may still obtain bounds on
the average number of bits spent. Suppose we impose a limit on the complexity; that is,
we let the runs be no larger than some maximum value denoted by K. Then the
probability mass function of the runs becomes
$$P_J^K(C) = \frac{J}{(J+C)(J+C-1)}\,I_{\{1\le C<K\}} + \frac{J}{J+K}\,I_{\{C\ge K\}}. \qquad (4.44)$$
Following the previous development we can assign a Shannon “run-length” code based
on the above probabilities. Thus, when C ≥ K, we assign a run of size K with codeword
length according to its probability. A result similar to Proposition 13 can easily be
derived.
Proposition 14 Consider the Shannon run-length code with the runs bounded by K.
Then the average bit rate on the next run, given that J bits have already been coded,
denoted by $E[l_J^K(C)/C]$, converges to 1/K as J → ∞.
Proof. Note that
$$E\left[\frac{l_J^K(C)}{C}\right] \le \sum_{C=1}^{K}\frac{\left(1-\log P_J^K(C)\right)P_J^K(C)}{C} + \frac{J\left(1-\log\frac{J}{J+K}\right)}{K(J+K)}. \qquad (4.45)$$
The first term on the RHS converges to zero. To check that, we consider the maximum
inside the summand. We thus get:
$$\sum_{C=1}^{K}\frac{\left(1-\log P_J^K(C)\right)P_J^K(C)}{C} \le K\,P_J^K(1) - K\,P_J^K(K)\log P_J^K(K). \qquad (4.46)$$
Since for each K, $P_J^K(1) \to 0$ and $\log P_J^K(K) \to 0$ as $J \to \infty$, it
follows that the above converges to zero as fast as log J/J. Now, the second term clearly
converges to 1/K. Hence we conclude that
$$\limsup_{J\to\infty} E\left[\frac{l_J^K(C)}{C}\right] \le \frac{1}{K}.$$
To get a lower bound, notice that when J is large, $1/P_J^K(K)$ approaches 1
from above, so that in turn $\log(1/P_J^K(K))$ approaches 0 from above. Consequently, we
have $\lceil \log(1/P_J^K(K)) \rceil = 1$ for J large enough. Thus,
$$E\left[\frac{l_J^K(C)}{C}\right] \ge \sum_{C=1}^{K}\frac{\left(-1-\log P_J^K(C)\right)P_J^K(C)}{C} + \frac{J}{K(J+K)}. \qquad (4.47)$$
The first term in the RHS goes to zero. To see that, consider the minimum inside the
summand. The second term in the RHS converges to 1/K. This leads to
$$\frac{1}{K} \le \liminf_{J\to\infty} E\left[\frac{l_J^K(C)}{C}\right] \le \limsup_{J\to\infty} E\left[\frac{l_J^K(C)}{C}\right] \le \frac{1}{K},$$
which proves the proposition. □
The above proposition suggests that, with sufficient memory, the proposed code can
come very close to the average length achieved when the complexity is not bounded.
We verify the results of Propositions 13 and 14 by simulating the Shannon code for
the run-lengths of new states with computer-generated random walks. Thus we apply
our proposed code to several sample paths and average the results. As Figure 4.11
indicates, in the case where the code has infinite resources, the coded runs can be
arbitrarily large and consequently the average rate converges to zero, as Proposition 13
suggests. In the case where we limit the buffer size to K, the simulation shows the rate
converges to 1/K, as predicted by Proposition 14.
[Figure 4.11 plot: average rate versus number of coded samples j (10^1 to 10^4, log scale), for K = 3, 5, 10, 50, and ∞.]
Figure 4.11 The proposed code for the trajectory. The proposed code with buffer size K attains an entropy rate of roughly 1/K. Notice that when K is infinity, the code attains the entropy rate bound as the number of samples j goes to infinity.
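The trend in Figure 4.11 can also be checked without Monte Carlo simulation by evaluating the expected per-sample rate of the finite-buffer code directly from the pmf (4.44). The sketch below is our own illustration: it uses base-2 Shannon lengths $\lceil \log_2(1/P) \rceil$ and charges a clipped run only its K coded samples, and the resulting rate settles near 1/K for large J:

```python
import math

def pmf_bounded(J, C, K):
    # Eq. (4.44): run-length pmf with the buffer limited to K;
    # all runs of length >= K are coded as a single run of size K.
    if C < K:
        return J / ((J + C) * (J + C - 1.0))
    return J / (J + K)

def avg_rate(J, K):
    # Expected bits per sample: Shannon length ceil(log2(1/P)) for each
    # run, divided by the C (or K) samples the run accounts for.
    rate = sum(pmf_bounded(J, C, K)
               * math.ceil(math.log2(1 / pmf_bounded(J, C, K))) / C
               for C in range(1, K))
    p_k = pmf_bounded(J, K, K)
    return rate + p_k * math.ceil(math.log2(1 / p_k)) / K

for K in (3, 10, 50):
    assert abs(avg_rate(10**6, K) - 1 / K) < 0.01  # rate -> 1/K as J grows
```

For J = 10^6 the clipped-run term dominates: its codeword length is a single bit while the rare short runs contribute only on the order of 10^-4 bits per sample.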
4.6 Conclusion
We have proposed a stochastic model for video that enables the precise computation
of information rates. For the static case, we provided lossless and lossy information rate
bounds that are tight in a number of interesting cases. In some scenarios, the theoretical
results support the ubiquitous hybrid coding paradigm of extracting motion and coding
a motion-compensated sequence.
We extended the model to account for changes in the background scene, and com-
puted bounds for the lossless and lossy information rates for the particular case of AR(1)
innovations. The bounds for this “dynamic reality” are tight in some scenarios, namely
when the background scene changes slowly with time (i.e., ρ close to 1).
The model explains precisely how long-term motion prediction helps coding in both
static and dynamic cases. In the dynamic model, this is related to the two parameters
(pW, ρ), which symbolize the rate of recurrence in motion and the rate of change in
the scene. As (pW, ρ) → (0.5, 1), long-term memory predictions result in significant
improvements (in excess of 3.5 dB). By contrast, if either ρ is away from 1 or pW is
away from 0.5, long-term memory brings very little improvement.
Although we developed the results for the Bernoulli random walk, the model can
be generalized to other random walks on Z and Z². Our current work includes such
generalizations. It also includes estimating ρ and pW for real video signals and fitting
the model to such signals.
We have shown that the entropy of the POF in problems such as the light field reduces
to the entropy of the scene around it. To establish this, we have used a random walk model
and we have shown that the trajectory of the random walk eventually has zero entropy
rate. For the simple example of a 1-D camera and a Bernoulli random walk, we have
shown a simple code that attains the entropy bound.
CHAPTER 5
CONCLUSION AND FUTURE DIRECTIONS
5.1 Summary
In the introduction we emphasized that efficient representation of visual information
requires a good understanding and handling of geometrical structure. This is the case in
static images, as well as in motion pictures. We have examined several problems related
to geometrical visual representation, processing, and coding. In particular, the following
was accomplished:
• Digital multidimensional filters with directional vanishing moments were proposed,
and a new filter bank design criterion suitable for multidimensional expansions was
developed. This novel class of filters has the property of annihilating directional
edges in images. The associated filter design problem was studied and characterized.
A flexible design methodology was presented. Applications of the proposed filter
banks in the context of the contourlet transform have shown that the proposed
filters yield reconstructed images with fewer ringing artifacts and thus better visual
quality. Moreover, the proposed filters are shorter and less complex than those of
competing designs.
• The nonsubsampled contourlet transform was proposed. This transform is suit-
able for applications that can afford redundancy and complexity such as denoising,
enhancement, and curvature detection. The proposed construction was studied in
detail and a frame analysis was provided. A design method that allows for regularity,
as well as frame stability control and sharp frequency resolution, was proposed.
Our design not only ensures the regularity of the basis vectors, but it also ensures
the almost tightness of the associated frame operator, and it has a fast algorithm
that results in substantial computational savings. The proposed transform is shown
to outperform similar transforms such as curvelets and the undecimated wavelet
transform in image denoising via hard-thresholding. Moreover, when coupled with
a fairly simple denoising strategy based on soft-thresholding, the resulting denoising
algorithm performed similarly to the more sophisticated denoising scheme of [60].
• A stochastic model to study the information rates of the plenoptic function was
proposed. In the two cases considered, namely that of video and that of light
field, information rates were derived. In the video case, the simplicity of the model
enabled us to compute precise information rates. To the best of our knowledge, ours
is the first model for video that attempts to compute the true information-theoretic
rate-distortion rather than the optimal rate-distortion performance of a particular
coding method. We proposed models for static and dynamic realities and computed
information rates for both models. The proposed methodology gives new insight
into the source coding problem associated with video. In particular, the model
provides a characterization of performance in the presence of long-term memory,
and it also supports the hybrid coding paradigm of compensating for motion prior
to predictive coding in the low distortion regime. In the light field case we have
shown that the entropy of the process of interest reduces to the entropy of the scene
around it. That is, when reconstruction of the scene is the primary objective, the
trajectory and motion of the camera become irrelevant for coding purposes. In our
model we have shown by means of a simple example how a run-length code can
essentially code the trajectories at an average code length close to zero.
We point out that geometry in the POF comes in the form of explicit modeling and
coding of camera positions. This is analogous to capturing edges in images using filters
with directional vanishing moments, or the nonsubsampled contourlet transform.
5.2 Future Directions
Exploiting and representing geometrical structure in digital data is a challenging and
active research area. The use of multidimensional filter banks such as the proposed DVM
filter bank, or the NSCT, to decorrelate visual data rich in geometrical structure is a very
promising approach. Moreover, the understanding of the POF in the form of video is
very important from a practical viewpoint. Our proposed model for POF video is simple
and provides a framework in which rate-distortion type calculations can be done. In that
respect, our contribution offers a valuable new perspective with potential to influence
several aspects of video compression. With this in mind, we outline below some
directions for future research.
5.2.1 Filter banks with directional vanishing moments
Even though in this dissertation we focused on the use of DVM filters in conjunc-
tion with the contourlet transform, there are many other applications where filter banks
with DVM can be useful. For example, the DVM filters with the NSCT transform can
potentially provide a better complexity/performance tradeoff with better visual quality
in applications such as denoising. Moreover, in critically sampled transforms such as the
one in [39], the DVM filters will likely be a better alternative to filters designed with fre-
quency selectivity as the primary design criterion. The DVM filters can also be useful in
edge and curvature detection. In this case, one can use a multiscale decomposition such
as contourlets coupled with DVM filters to detect straight lines on a single scale. Notice
that since this is an analysis task, there is no need to impose perfect reconstruction and
as a result we can use filters with many DVMs.
5.2.2 The nonsubsampled contourlet transform
The NSCT is a very useful transform. We strongly believe the NSCT has the potential
to redefine the state-of-the-art in several image processing applications. In denoising,
for instance, we anticipate that more sophisticated estimation techniques in the NSCT
[Figure 5.1 diagram: frequency plane (ω1, ω2) over [−π, π]².]
Figure 5.1 The idealized analytic complex transform. The frame elements are supported on the first and third quadrants of the frequency plane. The real and imaginary parts of each atom are supported in the whole plane following the dashed boundaries.
domain can lead to even better results. Such techniques will likely involve a better
statistical model for the distribution of the coefficients in the NSCT domain.
One shortcoming of the NSCT is that it can be highly redundant and complex. We
already discussed in Chapter 3 that an alternative for lowering complexity is to use a critically
sampled directional filter bank. This alternative suffers from aliasing due to the tree
structure. However, this aliasing can be substantially reduced by carefully designing
the filters in the DFB. Preliminary experiments with this approach indicate that when
denoising ultrasound images, the new fast NSCT performs similarly to the full NSCT,
but at a much reduced computational cost.
5.2.3 Complex contourlet transform
The greatest shortcoming of the NSCT is its increased redundancy. To address this,
we investigate possible ways of constructing a complex contourlet transform
(CCT). This alternative is akin to the one offered by the complex wavelet transform
(CWT) [24, 101]. The CWT is almost shift-invariant, but it is much less redundant than
the nonsubsampled wavelet transform. The goal is to have a decomposition consisting of
complex filters resulting in the subband decomposition shown in Figure 5.1.
A CCT can be obtained using the Hilbert transformers of [102] as postprocessors.
This leads to a complex contourlet transform in which the basis elements have different
Figure 5.2 Complex contourlet transform basis functions (4 out of 8 directions shown). Real and imaginary parts on top and bottom, respectively. Note the different symmetry of the real and imaginary parts.
orientations in addition to different symmetries. This construction is more redundant
than the one in [103], but its filter design is much easier and, unlike [103], it has the
perfect reconstruction property.
Figure 5.2 shows the basis function (real and imaginary parts) at a coarse scale of the
proposed CCT.
The CCT can be a far less expensive alternative to the NSCT. In this direction,
we plan to investigate other applications where the CCT performs similarly to the
NSCT but at a much reduced cost.
5.2.4 Information rates of the plenoptic function
We have proposed a simple model to study the compression problem of visual scenes
in the presence of camera motion. The proposed model is powerful and already provides
further understanding of the video compression problem.
One missing point, however, is to use the model to make accurate predictions
with real video sources. The proposed dynamic reality model is realistic only in rather
contrived scenarios. To account for the complexities observed in a typical video sequence,
the trajectory model should include random walks that better fit typical motion vector
trajectories. For example, one such model is to consider
2-D random walks on the Z² lattice. Preliminary experiments in this direction indicate
that our model can indeed make valuable predictions. For instance, the model provides a
precise characterization of multiframe prediction and how it affects the bitrate. Further
experiments and conclusions in this direction are the goals of our future work.
REFERENCES
[1] S. Mallat, A Wavelet Tour of Signal Processing. London, UK: Academic Press,
1999.
[2] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM, 1992.
[3] R. J. Duffin and A. C. Schaeffer, “A class of nonharmonic Fourier series,” Trans. Amer.
Math. Soc., vol. 72, pp. 341–366, 1952.
[4] M. J. T. Smith and T. P. Barnwell III, “Exact reconstruction techniques for tree-
structured subband coders,” IEEE Trans. Acoust. Speech, and Signal Process.,
vol. 34, pp. 434–441, June 1986.
[5] F. Mintzer, “Filters for distortion-free two-band multirate filter banks,” IEEE
Trans. Acoust. Speech, and Signal Process., vol. 33, pp. 626–630, June 1985.
[6] M. Vetterli, “Filter banks allowing perfect reconstruction,” Signal Processing,
vol. 10, no. 3, pp. 219–244, 1986.
[7] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ:
Prentice Hall, 1993.
[8] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs,
NJ: Prentice Hall, 1995.
[9] G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley, MA: Wellesley-
Cambridge Press, 1996.
[10] H. S. Malvar, Signal Processing with Lapped Transforms. Boston, MA: Artech
House, 1992.
[11] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet
representation,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.
PAMI-11, no. 7, pp. 674–693, 1989.
[12] Z. Cvetkovic and M. Vetterli, “Oversampled filter banks,” IEEE Trans. on Signal
Proc., vol. 46, no. 5, pp. 1245–1255, May 1998.
[13] H. Bolcskei, F. Hlawatsch, and H. G. Feichtinger, “Frame-theoretic analysis of
oversampled filter banks,” IEEE Trans. Signal Proc., vol. 46, no. 12, pp. 3256–
3268, December 1998.
[14] I. W. Selesnick, “The double density wavelet transform,” in Wavelet in Signal and
Image Analysis: From Theory to Practice, A. Petrosian and F. G. Meyer, Eds.
Norwell, MA: Kluwer, 2001.
[15] I. Daubechies, B. Han, A. Ron, and Z. Shen, “Framelets: MRA-based constructions
of wavelet frames,” Appl. Comput. Harmon. Anal., vol. 14, no. 1, pp. 1–46, 2003.
[16] E. L. Pennec and S. Mallat, “Sparse geometric image representation with ban-
delets,” IEEE Trans. Image Proc., vol. 14, no. 4, pp. 423–438, April 2005.
[17] G. Peyre and S. Mallat, “Surface compression with geometric bandelets,” ACM
Transactions on Graphics (SIGGRAPH’05), vol. 14, no. 3, pp. 601–608, 2005.
[18] M. B. Wakin, J. K. Romberg, H. Choi, and R. G. Baraniuk, “Wavelet-domain
approximation and compression of piecewise smooth images,” IEEE Trans. Image
Proc., vol. 15, no. 5, pp. 1071–1087, May 2006.
[19] D. L. Donoho, “Wedgelets: nearly minimax estimation of edges,” Ann. Statist.,
vol. 27, no. 3, pp. 859–897, 1999.
[20] M. N. Do and M. Vetterli, “The contourlet transform: An efficient directional
multiresolution image representation,” IEEE Trans. Image Proc., vol. 14, no. 12,
pp. 2091–2106, Dec. 2005.
[21] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable
multiscale transforms,” IEEE Trans. Info. Theory, vol. 38, no. 2, pp. 587–607,
March 1992.
[22] R. H. Bamberger, “The directional filter bank: A multirate filter bank for the di-
rectional decomposition of images,” Ph.D. dissertation, Georgia Institute of Tech-
nology, 1990.
[23] R. H. Bamberger and M. J. T. Smith, “A filter bank for the directional decompo-
sition of images: Theory and design,” IEEE Trans. Signal Proc., vol. 40, no. 4, pp.
882–893, April 1992.
[24] N. G. Kingsbury, “Image processing with complex wavelets,” Phil. Trans. R. Soc.
Lond., vol. 357, no. 1760, pp. 2543–2560, Sept. 1999.
[25] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,”
IEEE Trans. Communications, vol. 31, no. 4, pp. 532–540, April 1983.
[26] J.-X. Chai, S.-C. Chan, H.-Y. Shum, and X. Tong, “Plenoptic sampling,” in Pro-
ceedings of SIGGRAPH, 2000, pp. 307–318.
[27] M. N. Do, D. Marchand-Maillet, and M. Vetterli, “On the bandlimitedness of the
plenoptic function,” in Proceedings of the IEEE International Conference on Image
Processing (ICIP), vol. 3, Genoa, Italy, 2005, pp. 17–20.
[28] C. Zhang and T. Chen, “Spectral analysis for sampling image-based rendering
data,” IEEE Trans. on CSVT Special Issue on Image-Based Modeling, Rendering
and Animation, vol. 13, pp. 1038–1050, Nov. 2003.
[29] H.-Y. Shum, S. B. Kang, and S.-C. Chan, “Survey of image-based representations
and compression techniques,” IEEE Trans. on CSVT Special Issue on Image-Based
Modeling, Rendering and Animation, vol. 13, no. 11, pp. 1020–1037, Nov. 2003.
[30] M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of SIGGRAPH,
1996, pp. 31–42.
[31] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. Cohen, “The lumigraph,” in
Proceedings of SIGGRAPH, 1996, pp. 43–54.
[32] J. Kovacevic and M. Vetterli, “Nonseparable 2-dimensional and 3-dimensional
wavelets,” IEEE Trans. on Signal Proc., vol. 43, no. 5, pp. 1269–1273, May 1995.
[33] A. L. da Cunha and M. N. Do, “Bi-orthogonal filter banks with directional vanishing
moments,” in Proceedings of the IEEE ICASSP, vol. 4, Philadelphia, PA, 2005, pp.
553–556.
[34] A. L. da Cunha and M. N. Do, “On two-channel filter banks with directional
vanishing moments,” IEEE Trans. Image Proc., to be published, 2007.
[35] A. L. da Cunha and M. Do, “Linear-phase filter design for directional multireso-
lution decompositions,” in Proc. of SPIE Conference on Wavelet Applications in
Signal and Image Processing XI, vol. 5914, San Diego, CA, July 2005, pp. 263–273.
[36] E. J. Candes and D. L. Donoho, “New tight frames of curvelets and optimal rep-
resentations of objects with piecewise C2 singularities,” Comm. Pure and Appl.
Math, vol. 57, no. 2, pp. 219–266, February 2004.
[37] M. N. Do and M. Vetterli, “Framing pyramids,” IEEE Trans. Signal Proc., vol. 51,
no. 9, pp. 2329–2342, Sept. 2003.
[38] M. N. Do, “Directional multiresolution image representations,” Ph.D. dissertation,
Swiss Federal Institute of Technology, Lausanne, Switzerland, December 2001.
[39] Y. Lu and M. Do, “Crisp-contourlet: A critically sampled directional multireso-
lution representation,” in Proc. SPIE Conf. on Wavelets X, San Diego, CA, Aug.
2003, pp. 655–665.
[40] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P. L. Dragotti, “Direction-
lets: Anisotropic multidirectional representation with separable filtering,” IEEE
Transactions on Image Processing, vol. 15, no. 7, pp. 1916–1933, July 2006.
[41] M. Vetterli, “Wavelets, approximation, and compression,” IEEE Signal Proc. Mag.,
vol. 18, pp. 59–73, Sept. 2001.
[42] A. Cohen and I. Daubechies, “Non-separable bidimensional wavelet bases,” Rev.
Mat. Iberoamericana, vol. 9, no. 1, pp. 51–137, 1993.
[43] J. Kovacevic and M. Vetterli, “Nonseparable multidimensional perfect reconstruc-
tion filter banks and wavelet bases for Rn,” IEEE Trans. Information Theory,
vol. 38, no. 2, pp. 533–555, March 1992.
[44] E. Viscito and J. P. Allebach, “The analysis and design of multidimensional FIR
perfect reconstruction filter banks for arbitrary sampling lattices,” IEEE Trans.
Circuits and Systems, vol. 38, no. 1, pp. 29–41, Jan. 1991.
[45] M. Vetterli, “Multi-dimensional subband coding: Some theory and algorithms,”
Signal Processing, vol. 6, no. 2, pp. 97–112, 1984.
[46] M. Vetterli and C. Herley, “Wavelets and filter banks: Theory and design,” IEEE
Trans. on Signal Proc., vol. 40, pp. 2207–2232, Sept. 1992.
[47] R. Ansari, C. W. Kim, and M. Dedovic, “Structure and design of two-channel filter
banks derived from a triplet of halfband filters,” IEEE Trans. CAS-II, vol. 46,
no. 12, pp. 1487–1496, December 1999.
[48] D. Wei and S. Guo, “A new approach to the design of multidimensional nonsepara-
ble two-channel orthonormal filter banks and wavelets,” IEEE Signal Proc. Letters,
vol. 7, no. 11, pp. 327–330, November 2000.
[49] D. B. H. Tay and N. G. Kingsbury, “Flexible design of multidimensional perfect
reconstruction FIR 2-band filters using transformation of variables,” IEEE Trans.
Image Proc., vol. 2, no. 4, pp. 466–480, October 1993.
[50] S.-M. Phoong, C. W. Kim, P. P. Vaidyanathan, and R. Ansari, “A new class of
two-channel biorthogonal filter banks and wavelet bases,” IEEE Trans. on Signal
Proc., vol. 43, no. 3, pp. 649–661, March 1995.
[51] D. Stanhill and Y. Y. Zeevi, “Two-dimensional orthogonal wavelets with vanishing
moments,” IEEE Trans. Signal Proc., vol. 44, no. 10, pp. 2579–2590, October 1996.
[52] D. B. H. Tay, “Design of filter banks/wavelets using TROV: A survey,” Digital
Signal Processing, vol. 7, no. 4, pp. 229–238, Oct. 1997.
[53] A. L. da Cunha, J. Zhou, and M. Do, “The nonsubsampled contourlet transform:
Theory, design, and applications,” IEEE Trans. Img. Proc., vol. 15, no. 10, pp.
3089–3101, Oct. 2006.
[54] A. L. da Cunha, J. Zhou, and M. Do, “The nonsubsampled contourlet transform:
Filter design and application in denoising,” in Proceedings of the IEEE Interna-
tional Conference on Image Processing (ICIP), vol. 1, Genoa, Italy, 2005, pp. 749–
752.
[55] J. Zhou, A. L. da Cunha, and M. Do, “The nonsubsampled contourlet transform:
Construction and application in enhancement,” in Proceedings of the IEEE Inter-
national Conference on Image Processing (ICIP), vol. 1, Genoa, Italy, 2005, pp.
469–472.
[56] J. G. Daugman, “Uncertainty relation for resolution in space, spatial frequency,
and orientation optimized by two-dimensional visual cortical filters,” J. Opt. Soc.
Am. A, vol. 2, no. 7, pp. 1160–1169, July 1985.
[57] R. R. Coifman and D. L. Donoho, “Translation invariant de-noising,” in Wavelets
and Statistics, A. Antoniadis and G. Oppenheim, Eds. New York: Springer-Verlag,
1995, pp. 125–150.
[58] S. G. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image
denoising and compression,” IEEE Trans. Image Proc., vol. 9, no. 9, pp. 1532–
1546, September 2000.
[59] L. Sendur and I. W. Selesnick, “Bivariate shrinkage with local variance estimation,”
IEEE Signal Proc. Letters, vol. 9, no. 12, pp. 438–441, December 2002.
[60] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising
using scale mixtures of Gaussians in the wavelet domain,” IEEE Trans. Image
Proc., vol. 12, no. 11, pp. 1338–1351, 2003.
[61] M. J. Shensa, “The discrete wavelet transform: Wedding the a trous and Mallat
algorithms,” IEEE Trans. Signal Proc., vol. 40, no. 10, pp. 2464–2482, October
1992.
[62] J. L. Starck, F. Murtagh, and A. Bijaoui, Image Processing and Data Analysis.
New York, NY: Cambridge University Press, 1998.
[63] J. H. McClellan, “The design of two-dimensional digital filters by transformation,”
in Proc. 7th Annual Princeton Conf. Information Sciences and Systems, Princeton,
NJ, 1973, pp. 247–251.
[64] S. Mitra and R. Sherwood, “Digital ladder networks,” IEEE Trans. on Audio and
Electroacoustics, vol. AU-21, no. 1, pp. 30–36, February 1973.
[65] W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal
wavelets,” Appl. Comput. Harmon. Anal., vol. 3, no. 2, pp. 186–200, 1996.
[66] R. E. Blahut, Fast Algorithms for Digital Signal Processing. Reading, MA:
Addison-Wesley, 1985.
[67] R. Jia, “Approximation properties of multivariate wavelets,” Mathematics of Com-
putation, vol. 67, pp. 647–665, 1998.
[68] T. Cooklev, T. Yoshida, and A. Nishihara, “Maximally flat half-band diamond-
shaped FIR filters using the Bernstein polynomial,” IEEE Trans. CAS-II, vol. 40,
no. 11, pp. 749–751, Nov. 1993.
[69] J.-L. Starck, E. J. Candes, and D. L. Donoho, “The curvelet transform for image
denoising,” IEEE Trans. Image Proc., vol. 11, no. 6, pp. 670–684, June 2002.
[70] D. D.-Y. Po and M. N. Do, “Directional multiscale modeling of images using the
contourlet transform,” IEEE Trans. Img Proc., vol. 15, no. 6, pp. 1610–1620, June
2006.
[71] S. G. Chang, B. Yu, and M. Vetterli, “Spatially adaptive wavelet thresholding with
context modeling for image denoising,” IEEE Trans. Image Proc., vol. 9, no. 9, pp.
1522–1531, September 2000.
[72] A. L. da Cunha, M. N. Do, and M. Vetterli, “On the information rates of the
plenoptic function,” in Proceedings of the IEEE International Conference on Image
Processing (ICIP), Atlanta, GA, 2006, pp. 2489–2492.
[73] A. L. da Cunha, M. N. Do, and M. Vetterli, “A stochastic model for video and its
information rates,” in Proc. of IEEE Data Compression Conference (DCC), March
2007, pp. 3–12.
[74] E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of
early vision,” in Computational Models of Visual Processing, M. Landy and J. A.
Movshon, Eds. Cambridge, UK: MIT Press, 1991, pp. 3–20.
[75] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Englewood
Cliffs, NJ: Prentice Hall, 2002.
[76] C. Zhang and T. Chen, “A survey on image-based rendering representation,
sampling and compression,” EURASIP Signal Processing: Image Communication,
vol. 19, pp. 1–28, Jan. 2004.
[77] L.-W. He and H.-Y. Shum, “Rendering with concentric mosaics,” in Proceedings of
SIGGRAPH, 1999, pp. 299–306.
[78] A. M. Tekalp, Digital Video Processing. Upper Saddle River, NJ: Prentice-Hall, 1995.
[79] B. Girod, “The efficiency of motion-compensating prediction for hybrid coding of
video sequences,” IEEE Journal of Selected Areas in Communications, vol. SAC-5,
no. 7, pp. 1140–1154, August 1987.
[80] P. Ramanathan and B. Girod, “Rate-distortion analysis for light field coding and
streaming,” EURASIP Signal Processing: Image Communication, vol. 21, no. 6,
pp. 462–475, July 2006.
[81] N. Gehrig and P. L. Dragotti, “Distributed compression of the plenoptic function,”
in Proc. IEEE Int. Conf. on Image Proc., vol. 1, Singapore, 2004, pp. 529–532.
[82] P. Ramanathan and B. Girod, “Receiver-driven rate-distortion optimized streaming
of light fields,” in Proc. IEEE International Conference on Image Processing, vol. 3,
Genoa, Italy, 2005, pp. 25–28.
[83] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression.
Englewood Cliffs, NJ: Prentice-Hall, 1972.
[84] J. Rudnick and G. Gaspari, Elements of the Random Walk. Cambridge, UK:
Cambridge University Press, 2004.
[85] W. Feller, An Introduction to Probability Theory and Its Applications. New York,
NY: John Wiley and Sons, 1957.
[86] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY:
John Wiley & Sons, 1991.
[87] H. Li and R. Forchheimer, “Extended signal-theoretic techniques for very low bit-
rate video coding,” in Video Coding: The Second Generation Approach. Norwell,
MA: Kluwer, 1996, pp. 383–428.
[88] T. Wiegand, X. Zhang, and B. Girod, “Long-term memory motion-compensated
prediction,” IEEE Trans. CSVT, vol. 9, no. 1, pp. 70–84, February 1999.
[89] C. Herley, “ARGOS: Automatically extracting repeating objects from multimedia
streams,” IEEE Trans. on Multimedia, vol. 8, no. 1, pp. 115–129, Feb 2006.
[90] N. Vasconcelos and A. Lippman, “Library-based image coding,” in Proc. IEEE Int.
Conf. Acoust., Speech, and Signal Proc., vol. 5, Adelaide, Australia, April 1994,
pp. 489–492.
[91] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate cod-
ing,” IEEE Transactions on Information Theory, vol. 24, no. 5, pp. 530–536, Sept.
1978.
[92] Y. Steinberg and M. Gutman, “An algorithm for source coding subject to a fidelity
criterion, based on string matching,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp.
877–886, May 1993.
[93] M. S. Pinsker, Information and Information Stability of Random Variables. San
Francisco, CA: Holden Day, 1984.
[94] R. M. Gray, “A new class of lower bounds to information rates of stationary sources
via conditional rate-distortion functions,” IEEE Trans. on Info. Theory, vol. IT-19,
no. 4, pp. 480–489, July 1973.
[95] R. M. Gray, Entropy and Information Theory. New York: Springer-Verlag, 1990.
[96] Y. Sermadevi, J. Chen, S. Hemami, and T. Berger, “When is bit allocation for
predictive video coding easy?” in Data Compression Conference (DCC), Snowbird,
UT, 2005, pp. 289–298.
[97] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foun-
dation for Computer Science. Boston, MA: Addison-Wesley Longman Publishing
Co., Inc., 1989.
[98] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge
Univ. Press, 1999.
[99] R. M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matrices,”
IEEE Trans. Inf. Theory, vol. 18, no. 6, pp. 725–730, 1972.
[100] M. Magnor and B. Girod, “Data compression for light-field rendering,” IEEE
Transactions on CSVT, vol. 10, no. 3, pp. 338–343, 2000.
[101] I. W. Selesnick, R. G. Baraniuk, and N. G. Kingsbury, “The dual-tree complex
wavelet transform,” IEEE Signal Proc. Mag., vol. 22, no. 6, pp. 123–151, Nov.
2005.
[102] F. C. A. Fernandes, R. L. C. van Spaendonck, and C. S. Burrus, “A new framework
for complex wavelet transforms,” IEEE Trans. Sign. Proc., vol. 51, no. 7, pp. 1825–
1837, July 2003.
[103] T. T. Nguyen and S. Oraintara, “Shift-invariant multiscale multidirectional image
decomposition,” in Proc. of IEEE ICASSP, vol. 2, Toulouse, France, 2006, pp.
153–156.