
Adaptive Linear Prediction Lossless Image Coding

Giovanni Motta 1, James A. Storer 1 and Bruno Carpentieri 2

Abstract: The practical lossless digital image compressors that achieve the best compression ratios are also simple and fast algorithms, with low complexity both in terms of memory usage and running time. Surprisingly, the compression ratio achieved by these systems cannot be substantially improved, even by using image-by-image optimization techniques or more sophisticated and complex algorithms [6]. A year ago, B. Meyer and P. Tischer were able, with their TMW [2], to improve some of the current best results (they do not report results for all test images) by using global optimization techniques and multiple blended linear predictors. Our investigation aims to determine the effectiveness of an algorithm that uses multiple adaptive linear predictors, locally optimized on a pixel-by-pixel basis. The results we obtained on a test set of nine standard images are encouraging: we improve over CALIC on some images.

Introduction

After the Call for Contributions for ISO/IEC JTC 1.29.12 (lossless JPEG), the field of greylevel lossless image compression received great attention from many researchers in the data compression community. Most of the contributions are very effective in compressing images while keeping the computational complexity and the memory requirements low. On the other hand, most of them use heuristics and, even if the compression ratio they achieve cannot easily be improved in practice, it is not completely clear whether they capture the real entropy of the image. In [2] and [5] B. Meyer and P. Tischer proposed TMW, a lossless image coding algorithm that, by using linear predictors, achieves on some test images compression performance higher than CALIC [3], the best (in terms of compression ratio) algorithm known so far. TMW improves the current best results by using global optimization and blended linear predictors; a TMW compressed file consists of two parts: a header that contains the parameters of the model, and the encoded data itself. Even though TMW has a computational complexity several orders of magnitude greater than CALIC, the results are in any case surprising because:

Linear predictors are known to be ineffective in capturing fast transitions in image luminosity (edges) [6];

Global optimization seemed unable to substantially improve the performance of lossless image compressors [6];

CALIC was thought to achieve a data rate extremely close to the real entropy of the image.

In this paper, we discuss a series of experiments we made with an algorithm that uses multiple adaptive linear predictors, locally optimized on a pixel-by-pixel basis. We address the problem of greylevel lossless image compression exclusively from the point of view of the achievable compression ratio, without being concerned about computational complexity or memory requirements. The preliminary results we obtained on a test set of nine standard images are encouraging: we improve over CALIC on some test images and we believe that, with a better encoding of the prediction error, our algorithm can be competitive with CALIC and TMW.

1 Brandeis University, Computer Science Dept., Waltham MA-02454, {gim, storer}@cs.brandeis.edu.
2 Università di Salerno, Dip. di Informatica ed Applicazioni "R.M. Capocelli", I-84081 Baronissi (SA), Italy, [email protected].


Description of the Algorithm

Our algorithm is based on adaptive linear prediction and consists of two main steps, pixel prediction and entropy coding; a pseudocode description is given in Figure 3. The input image is scanned from top to bottom and from left to right, and the luminosity of each pixel PIX(x, y) is predicted by a weighted sum of its neighbors (its context), rounded to the nearest integer value:

$\widehat{PIX}(x, y) = \mathrm{round}\big( w_0\, PIX(x, y-2) + w_1\, PIX(x-1, y-1) + w_2\, PIX(x, y-1) + w_3\, PIX(x+1, y-1) + w_4\, PIX(x-2, y) + w_5\, PIX(x-1, y) \big)$

Figure 1 shows the pixels that form the context of PIX(x, y). The context has a fixed shape and only the weights are allowed to change.


Figure 1: Context of the pixel PIX(x, y).
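For concreteness, the prediction above can be sketched in a few lines of Python; the NumPy representation, the helper names and the img[y, x] indexing are our own illustration, not the authors' implementation:

```python
import numpy as np

# Offsets (dy, dx) of the six causal neighbors, in the order w0..w5,
# matching the prediction formula above; the image is indexed as img[y, x].
CONTEXT_OFFSETS = [(-2, 0), (-1, -1), (-1, 0), (-1, 1), (0, -2), (0, -1)]

def context_of(img, x, y):
    """Return the 6-pixel causal context of PIX(x, y) as a float vector."""
    return np.array([float(img[y + dy, x + dx]) for dy, dx in CONTEXT_OFFSETS])

def predict(img, x, y, w):
    """Weighted-sum prediction of PIX(x, y), rounded to the nearest integer."""
    return int(round(float(np.dot(w, context_of(img, x, y)))))
```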

After the prediction, an error ERR(x, y) (the prediction error, or residual) is computed as the difference between the pixel and its prediction:

$ERR(x, y) = PIX(x, y) - \widehat{PIX}(x, y)$

and finally the prediction error is entropy encoded and sent to the decoder. If we encode the image in raster-scan order, with a top-to-bottom, left-to-right scan, the context is composed of previously encoded pixels, and the prediction error is sufficient for the decoder to make a faithful reconstruction of the original pixel value.

During the encoding process, the weights $w_0, \ldots, w_5$ are adaptively changed and optimized on a per-pixel basis. Our intent is to determine the predictors' weights so that they model the local characteristics of the image being encoded. After several experiments, we decided to determine the predictor by minimizing the energy of the prediction error inside a small window $W_{x,y}(R_p)$ of radius $R_p$ centered on PIX(x, y):

$E(x, y) = \min_{w_0, \ldots, w_5} \sum_{PIX(x', y') \in W_{x,y}(R_p)} \big( ERR(x', y') \big)^2$

Using a window of previously encoded pixels allows a backward prediction scheme: the encoder has no need to send any side information to the decoder.
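A sketch of the window collection and of the energy $E(x, y)$ defined above, reusing the hypothetical helpers from the previous snippet (rows 0-1 and the outermost columns are skipped so that every sample has a complete context):

```python
def causal_window(img, x, y, Rp):
    """Previously encoded pixels inside W_{x,y}(Rp), in raster order.
    Pixels whose 6-pixel context would fall outside the image are skipped."""
    H, W = img.shape
    out = []
    for yy in range(max(2, y - Rp), min(H, y + 1)):
        for xx in range(max(2, x - Rp), min(W - 1, x + Rp + 1)):
            if (yy, xx) >= (y, x):      # keep only pixels already coded
                break
            out.append((xx, yy))
    return out

def window_energy(img, x, y, w, Rp):
    """Energy of the prediction error of predictor w over W_{x,y}(Rp)."""
    return sum((float(img[yy, xx]) - np.dot(w, context_of(img, xx, yy))) ** 2
               for xx, yy in causal_window(img, x, y, Rp))
```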


On the other hand, backward prediction has a well-known major drawback: poor performance in the presence of edges. The radius $R_p$ of the window $W_{x,y}(R_p)$ (see Figure 2) is one of the essential features of our algorithm. Its size affects the prediction quality: if $R_p$ is too small, only a few samples fall in the window and the predictor "overspecializes", making large errors in the presence of edges. On the other hand, too many samples in the window ($R_p$ too big) tend to generate predictors that are not specific enough to remove local variations in the image. In our experiments, we decided to keep $R_p$ constant and equal for all the images.


Figure 2: Window $W_{x,y}(R_p)$ of radius $R_p$, centered on PIX(x, y).

To improve the prediction, the optimization is performed only on a subset of the samples collected in the window. The rationale is that we want the predictors' weights to be representative of the relation existing between the context and the pixel being encoded. By discarding samples whose context is "too different" from that of the current pixel, we can specialize the prediction and follow fine periodic patterns in the window.

Most algorithms in the literature use a simple pixel predictor and compensate for the weak prediction with sophisticated heuristics that model the error as a function of the context in which it occurs (see for example LOCO-I [1]). Our algorithm, instead, embeds the contextual encoding inside the error prediction step. The classification of the samples into clusters of pixels that have a similar context is performed by using a Generalized Lloyd Algorithm [9] (or LBG), as sketched below. This classification method, although not optimal in our framework, is good enough to improve the performance of the basic adaptive predictor; we are confident that a better classification would further improve the performance of our algorithm.

Once all the samples in the window are classified and a representative centroid is determined for each cluster, a cluster of pixels is selected according to the minimum distance between the context of the corresponding centroid and the context of the current pixel. Then, among a set of predictors, the one that achieves the lowest prediction error on the selected centroid is chosen. This predictor is finally refined by applying Gradient Descent optimization to the samples collected in the selected cluster.
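The classification step could be realized as follows; this is a plain Lloyd (k-means style) iteration standing in for the Generalized Lloyd Algorithm of [9], with illustrative names and a fixed iteration count:

```python
def lbg_cluster(contexts, n, iters=10, seed=0):
    """Cluster context vectors into n groups with a plain Lloyd iteration;
    returns (centroids, labels). Requires len(contexts) >= n."""
    rng = np.random.default_rng(seed)
    centroids = contexts[rng.choice(len(contexts), size=n, replace=False)].copy()
    labels = np.zeros(len(contexts), dtype=int)
    for _ in range(iters):
        # Assign each context to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(contexts[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each non-empty cluster's centroid to the cluster mean.
        for k in range(n):
            if np.any(labels == k):
                centroids[k] = contexts[labels == k].mean(axis=0)
    return centroids, labels
```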

for every pixel PIX(x,y) in the input image do begin
    Collect all the pixels and their contexts in Wx,y(Rp)
    Determine n centroids C1,...,Cn by applying the LBG algorithm to the contexts in Wx,y(Rp)
    Let K1,...,Kn be the corresponding clusters
    Classify each pixel/context in Wx,y(Rp) into one of the clusters K1,...,Kn
    Classify the context of the current pixel PIX(x,y); let k be the index of its cluster
    Let Pi = {w0,...,w5} be the predictor that achieves the smallest error on Ck among the set of predictors P1,...,Pn
    Apply Gradient Descent to the pixels in Kk to refine the predictor Pi
    Use the refined predictor P'i to predict PIX(x,y)
    Generate the prediction error ERR(x,y)
end

Figure 3: Pseudocode description of the adaptive prediction.
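A condensed Python rendering of Figure 3 follows, under the assumptions of the earlier snippets; the sample threshold, the evaluation of candidate predictors on the pixels of the selected cluster (rather than on the centroid itself), and the helper `gradient_refine` (sketched after the weight-update rule below) are our own simplifications, not the authors' code:

```python
PLANAR = np.array([0.0, -1.0, 1.0, 0.0, 0.0, 1.0])   # default planar predictor
MIN_SAMPLES = 8                                       # illustrative threshold

def cluster_energy(img, pixels, w):
    """Sum of squared prediction errors of predictor w over the given pixels."""
    return sum((float(img[yy, xx]) - np.dot(w, context_of(img, xx, yy))) ** 2
               for xx, yy in pixels)

def prediction_errors(img, n_pred=2, Rp=10):
    """Per-pixel adaptive prediction; returns the residual image ERR."""
    H, W = img.shape
    preds = [PLANAR.copy() for _ in range(n_pred)]    # predictor bank P1..Pn
    errors = np.zeros((H, W), dtype=np.int32)
    for y in range(H):
        for x in range(W):
            samples = causal_window(img, x, y, Rp)
            if len(samples) < MIN_SAMPLES:            # near the top/left border
                w = PLANAR                            # default predictor, no descent
            else:
                ctxs = np.array([context_of(img, xx, yy) for xx, yy in samples])
                cents, labels = lbg_cluster(ctxs, n_pred)
                # Cluster whose centroid is closest to the current context.
                k = int(np.linalg.norm(cents - context_of(img, x, y),
                                       axis=1).argmin())
                Kk = [s for s, l in zip(samples, labels) if l == k]
                # Candidate predictor with the smallest error on the cluster,
                # then refined by gradient descent and kept for later pixels.
                i = min(range(n_pred),
                        key=lambda j: cluster_energy(img, Kk, preds[j]))
                preds[i] = gradient_refine(img, Kk, preds[i])
                w = preds[i]
            errors[y, x] = int(img[y, x]) - predict(img, x, y, w)
    return errors
```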

At each step t of the optimization, and until the difference between the previous and the current error energy falls below a fixed threshold, the weights $w_i$ of the predictor are updated according to

$w_i(t+1) = w_i(t) - \mu \frac{\partial E}{\partial w_i}$

where E is the error energy and $\mu$ a small constant. When only a few samples are available in the window, for example when PIX(x, y) is close to the top or to the left border, a default fixed predictor is used and the Gradient Descent optimization is not applied. In our implementation, we used as a default the classic "planar predictor" [6]:

$P_{def} = \{ w_0 = 0,\; w_1 = -1,\; w_2 = 1,\; w_3 = 0,\; w_4 = 0,\; w_5 = 1 \}$
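Since $E$ is a sum of squared errors, its partial derivatives have the closed form $\partial E / \partial w_i = -2 \sum ERR(x', y')\, PIX_i(x', y')$, which gives a direct implementation of the update; `gradient_refine` below is the illustrative routine used in the sketch above, with a fixed step count and threshold as assumptions:

```python
def gradient_refine(img, pixels, w, mu=1e-6, steps=20, tol=1e-3):
    """Refine predictor w by gradient descent on the squared prediction error
    over the given pixels; stops after `steps` iterations or when the error
    improvement drops below `tol` (the paper uses a fixed threshold)."""
    w = w.copy()
    prev_E = float("inf")
    for _ in range(steps):
        grad = np.zeros_like(w)
        E = 0.0
        for xx, yy in pixels:
            ctx = context_of(img, xx, yy)
            err = float(img[yy, xx]) - np.dot(w, ctx)
            E += err * err
            grad += -2.0 * err * ctx        # dE/dw_i = -2 * err * PIX_i
        if prev_E - E < tol:                # improvement below threshold: stop
            break
        prev_E = E
        w -= mu * grad                      # w_i(t+1) = w_i(t) - mu * dE/dw_i
    return w
```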

At each step, the algorithm uses the refined predictors from the previous iterations, so initialization is not an issue and the predictors P1,...,Pn can be initialized with random values without compromising the performance. We also experimented with initializing the predictors with the values used in JPEG-LS; this only resulted in a slightly faster convergence of the Gradient Descent optimization. Reinitializing the predictors at every pixel, instead of reusing the previously refined weights, results in a much slower convergence but doesn't seem to change the compression ratio.

Entropy Coding

As is common in the literature [10], we assume that for most images prediction errors can be closely approximated by a Laplace distribution. In our case, adaptive linear prediction generates a skewed Laplacian distribution, centered on zero and with very long tails. We decided to use an Arithmetic Encoder [8] for the entropy coding of the error. Arithmetic encoding divides the coding step into the determination of a probabilistic model for the source and the entropy coding that uses that model. This results in a very general framework in which the modeling part can be easily customized to perform experiments.


The minimum and maximum errors have absolute values that are much greater than the operative limits of the distribution; during the experiments, we observed that 95% of the errors are concentrated in an interval [−σ, ..., +σ] that is substantially narrower than [Min, ..., Max]. While typical values for σ are in the range [8, ..., 20], Min and Max assume, in general, values in the range [−120, ..., 120]. The Arithmetic Coder implementation that we used [11] has the limitation that the initial probabilities must always be greater than zero. As a consequence, when only a small number of samples is available to model the distribution, encoder efficiency can be compromised by the use of a coding range that is substantially greater than necessary. For this reason, we decided to experiment with two different models: one for the "typical errors", in the range [−σ, ..., +σ], and another for the errors outside that range.

Like Min and Max, the parameter σ is determined by an off-line observation of all the errors and must be sent to the decoder. While sending those parameters has a cost that is irrelevant from the point of view of the compressed file size, it makes our algorithm an off-line procedure. Adaptive arithmetic encoders that rescale their range could possibly be used to overcome this problem.

Errors are encoded by separating magnitude and sign. This is reasonable because the sign of the prediction error has little or no correlation with its magnitude, and two different probabilistic models can be used for the encoding. Our implementation uses an arithmetic encoder with four different models:

ACM_P: the parameters model, used only to transmit the header of the compressed file with all the global parameters (Min, Max, σ, Re);

ACM_M: used to encode the magnitude of the typical errors. It has symbols in the range [0, ..., σ], plus an extra symbol (σ+1) that is used to send an escape signal to the decoder;

ACM_E: an adaptive model used to encode the magnitude of the non-typical errors. It has symbols in the range [σ+1, ..., max(abs(Max), abs(Min))];

ACM_S: used to encode the error sign. It has two symbols [0, 1] that represent positive and negative errors.

Unlike the other three models, ACM_M is not automatically updated: the probability distribution for the magnitude of the prediction error ERR(x, y) is determined each time by observing the previously encoded error magnitudes in a window $W_{x,y}(R_e)$ of radius $R_e$.
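A sketch of this window-driven model, assuming the magnitudes of the previously coded errors are kept in an array; the real coder [11] maintains cumulative frequency tables, so the function below only illustrates how the counts would be gathered:

```python
import numpy as np

def acm_m_frequencies(err_mag, x, y, Re, sigma):
    """Frequency table for ACM_M at pixel (x, y): counts of the magnitudes
    observed in the causal window W_{x,y}(Re), over symbols 0..sigma+1
    (the last symbol is the escape). All counts start at 1 because the
    coder requires strictly positive initial probabilities."""
    freq = np.ones(sigma + 2, dtype=np.int64)
    H, W = err_mag.shape
    for yy in range(max(0, y - Re), min(H, y + 1)):
        for xx in range(max(0, x - Re), min(W, x + Re + 1)):
            if (yy, xx) >= (y, x):          # keep only already coded errors
                break
            m = int(err_mag[yy, xx])
            freq[m if m <= sigma else sigma + 1] += 1
    return freq
```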

A gain in compression ratio can be achieved by properly modeling the error sign; our current implementation, however, uses a simpler and less effective adaptive model.
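To make the scheme concrete, here is a sketch of the off-line choice of σ and of the per-error coding; `ac.encode(model, symbol)` is an assumed interface standing in for the coder of [11], not its actual API, and skipping the sign for zero errors is our own choice:

```python
import numpy as np

def estimate_sigma(errors, coverage=0.95):
    """Off-line choice of sigma: the smallest s such that at least `coverage`
    of the error magnitudes fall inside [-s, ..., +s]."""
    mags = np.sort(np.abs(np.asarray(errors).ravel()))
    return int(mags[int(np.ceil(coverage * len(mags))) - 1])

def encode_error(ac, err, sigma):
    """Encode one prediction error: magnitude through ACM_M (escaping to
    ACM_E for non-typical values), sign through ACM_S."""
    mag = abs(err)
    if mag <= sigma:
        ac.encode("ACM_M", mag)                  # typical magnitude: 0..sigma
    else:
        ac.encode("ACM_M", sigma + 1)            # escape symbol
        ac.encode("ACM_E", mag)                  # non-typical magnitude
    if mag != 0:                                 # zero carries no sign (our choice)
        ac.encode("ACM_S", 0 if err > 0 else 1)  # 0 = positive, 1 = negative
```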

Results and Discussion

Experiments were performed in order to assess the algorithm on a test set composed of nine greylevel images of 720 × 576 pixels, digitized with a resolution of 8 bits (256 grey levels) per pixel.


[Figure 4 plot: bits per pixel for each test image (baloon, barb, barb2, board, boats, girl, gold, hotel, zelda); series: LOCO-I (error entropy after context modeling), LOCO-I (entropy of the prediction error), and 2 predictors, Rp=10, single adaptive AC.]

Figure 4: Comparisons with the entropy of the prediction error in LOCO-I.

The same test set is widely used for comparisons in the lossless data compression literature and can be downloaded from an ftp site [12]. Main results are expressed in bits per pixel or by giving the size of the compressed file. This (inelegant) choice was necessary to evaluate the small variations that are usually rounded off when results are expressed in bits per pixel.

Figure 4 compares the entropy of the prediction error of the simple fixed predictor used in LOCO-I with the entropy of the prediction error achieved by our algorithm. The results were obtained by using 2 predictors and by optimizing the predictors in a window of radius $R_p = 10$. For comparison, we also indicate the overall performance of LOCO-I, after its sophisticated entropy coding with context modeling.

# of predictors        1        2        4        6        8
Baloon            154275   150407   150625   150221   150298
Barb              227631   223936   224767   225219   225912
Barb2             250222   250674   254582   256896   258557
Board             193059   190022   190504   190244   190597
Boats             210229   208018   209408   209536   210549
Girl              204001   202004   202326   202390   202605
Gold              235682   237375   238728   239413   240352
Hotel             236037   236916   239224   240000   240733
Zelda             195052   193828   194535   195172   195503
Total (in bytes) 1906188  1893180  1904699  1909091  1915106

Table 1: Compressed file size vs. number of predictors. Results shown for a window of radius $R_p = 6$; the error is coded by using a single adaptive arithmetic encoder.


Rp                     6        8       10       12       14
Baloon            150407   149923   149858   150019   150277
Barb              223936   223507   224552   225373   226136
Barb2             250674   249361   246147   247031   246265
Board             190022   190319   190911   191709   192509
Boats             208018   206630   206147   206214   206481
Girl              202004   201189   201085   201410   201728
Gold              237375   235329   234229   234048   234034
Hotel             236916   235562   235856   236182   236559
Zelda             193828   193041   192840   192911   193111
Total (in bytes) 1893180  1884861  1881625  1884897  1887100

Table 2: Compressed file size vs. window radius $R_p$. The number of predictors used is 2; the prediction error is entropy encoded by using a single adaptive arithmetic encoder.

It is evident that our adaptive linear predictors are (understandably) much more powerful than the fixed predictor used in LOCO-I; however, even adaptive prediction is not powerful enough to capture edges and sharp transitions, present for example in the picture "hotel".

Tables 1, 2 and 3 summarize the experiments we made in order to understand the sensitivity of the algorithm to its parameters. In these experiments, we measured the variations in the compressed file size when only one of the parameters changes. In Table 1, the number of predictors is changed while keeping the window radius $R_p = 6$; conversely, in Table 2, the number of predictors is kept fixed at 2 and the performance is evaluated with respect to changes in the window size.

Both experiments described in Tables 1 and 2 were performed by using a very simple entropy coding scheme for the prediction error: a single adaptive arithmetic coder. As we also verified experimentally, the performance of a single adaptive arithmetic encoder is a good approximation of the first-order entropy of the encoded data.
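The first-order entropy used as a reference here is simply the self-information of the empirical error histogram, for example:

```python
import numpy as np

def first_order_entropy(errors):
    """Empirical first-order entropy of the prediction errors, in bits/pixel."""
    _, counts = np.unique(np.asarray(errors).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```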

Re                     6        8       10       12       14       16       18       20
baloon            147518   147227   147235   147341   147479   147620   147780   147885
barb              218411   216678   216082   215906   215961   216135   216370   216600
barb2             237523   234714   233303   232696   232455   232399   232473   232637
board             187058   186351   186171   186187   186303   186467   186646   186800
boats             203837   202168   201585   201446   201504   201623   201775   201943
girl              198050   197243   197013   197040   197143   197245   197356   197465
gold              232617   230619   229706   229284   229111   229026   229012   229053
hotel             231125   229259   228623   228441   228491   228627   228785   228949
zelda             190311   189246   188798   188576   188489   188461   188469   188500
Total (in bytes) 1846450  1833505  1828516  1826917  1826936  1827603  1828666  1829832

Table 3: Compressed file size vs. error window radius $R_e$. The number of predictors is 2 and $R_p = 10$. The prediction error is encoded as described in the "Entropy Coding" section.


                            baloon   barb  barb2  board  boats   girl   gold  hotel  zelda  Average
SUNSET CB9 [1]                2.89   4.64   4.71   3.72   3.99   3.90   4.60   4.48   3.79     4.08
LOCO-I [1]                    2.90   4.65   4.66   3.64   3.92   3.90   4.47   4.35   3.87     4.04
UCM [6]                       2.81   4.44   4.57   3.57   3.85   3.81   4.45   4.28   3.80     3.95
2 Pred., Rp=10, EC with Re    2.84   4.16   4.48   3.59   3.89   3.80   4.42   4.41   3.64     3.91
CALIC [6]                     2.78   4.31   4.46   3.51   3.78   3.72   4.35   4.18   3.69     3.86

TMW [2],[13] does not report results for all the test images; its reported values, in image order, are 2.65, 4.08, 4.38, 3.61 and 4.28, with an average of 3.80 over those images.

Table 4: Compression rate in bits per pixel achieved on the test set by some popular lossless image coding algorithms. The number of predictors used in our results is 2, $R_p = 10$, and entropy encoding is performed as described in the "Entropy Coding" section.

Table 3 reports the conclusive experiments: the number of predictors is kept fixed at 2, $R_p = 10$, and performance is evaluated by encoding the prediction error as described in the "Entropy Coding" section. Results are reported for changes in the value of $R_e$.

Comparisons with some popular lossless image codecs (see Table 4 and Figure 5) show that the proposed algorithm achieves good performance on most images of the test set. The cases where we fall short of CALIC confirm that linear prediction, even in this form, is not adequate to model image edges. Also, unlike CALIC, our codec doesn't use any special mode to encode high-contrast image zones, so our results are penalized by images like "hotel" that have highly contrasted regions. A closer look at the prediction error magnitude and sign for "board" and "hotel", two images in the test set, shows that most of the edges in the original image are still present in the prediction error (Figures 6 and 7).

Conclusion

The preliminary results we obtained experimenting on a test set of nine standard images are encouraging. With a better classification and selection of the contexts in the prediction window, and with a more sophisticated encoding of the prediction error, it may be possible to achieve stable and better results on all the test images. It is also likely that the computational complexity can be substantially reduced without sacrificing performance by using alternative methods for the optimization of the predictors. Further complexity reduction may be possible by substituting more efficient entropy coders for the arithmetic coder.

Acknowledgment

We wish to thank Martin Cohn for fruitful discussions.


Figure 5: Graphical representation of the data in Table 4.


Figures 6 and 7: Magnitude (left column) and sign (right column) of the prediction error in two images of the Test Set. Images are "board" (top row) and "hotel" (bottom row).



Bibliography

[1] M.J. Weinberger, G. Seroussi and G. Sapiro, "LOCO-I: A Low Complexity, Context-Based, Lossless Image Compression Algorithm", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1996.

[2] B. Meyer and P. Tischer, "Extending TMW for Near Lossless Compression of Greyscale Images", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1998.

[3] X. Wu and N. Memon, "Context-based, Adaptive, Lossless Image Codec", IEEE Trans. on Communications, Vol.45, No.4, Apr 1997.

[4] X. Wu, W. Choi and N. Memon, "Lossless Interframe Image Compression via Context Modeling", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1998.

[5] B. Meyer and P. Tischer, "TMW - a New Method for Lossless Image Compression", Proceedings International Picture Coding Symposium (PCS97), Sep 1997.

[6] X. Wu, "An Algorithmic Study on Lossless Image Compression", Proceedings IEEE Data Compression Conference (DCC), Snowbird, Utah, Mar-Apr 1996.

[7] D. Speck, "Fast Robust Adaptation of Predictor Weights from Min/Max Neighboring Pixels for Minimal Conditional Entropy", Proceedings Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Oct 30 - Nov 2, 1995, pp. 234-242.

[8] I.H. Witten, R. Neal and J.G. Cleary, "Arithmetic Coding for Data Compression", Communications of the ACM, Vol.30, No.6, Jun 1987, pp.520-540.

[9] Y. Linde, A. Buzo and R.M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Trans. Communications, Vol. COM-28, Jan 1980, pp. 84-95.

[10] P.G. Howard, "The Design and Analysis of Efficient Lossless Data Compression Systems", Ph.D. Dissertation, Department of Computer Science, Brown University, June 1993.

[11] F. Wheeler, Adaptive Arithmetic Coding, source code from: http://ipl.rpi.edu/wheeler/ac/.

[12] X. Wu, Test Images, from: ftp://ftp.csd.uwo.ca/pub/from_wu/images/.

[13] B. Meyer, TMW Code and New Results, from: http://www.cs.monash.edu.au/~bmeyer/tmw.
