
LOSSLESS IMAGE COMPRESSION USING BURROWS WHEELER TRANSFORM (METHODS AND TECHNIQUES)

Elfitrin Syahrul, Julien Dubois, Vincent Vajnovszki, Taoufik Saidani*, Mohamed Atri*

Laboratoire Electronique Informatique et Image – Le2i, Université de Bourgogne, France
[email protected]

*Laboratoire Electronique et Microélectronique – Lab. It06, Université de Monastir, Tunisie

Abstract

The Burrows-Wheeler Transform (BWT) is a combinatorial algorithm originally created for text compression, where it underlies compressors such as bzip2, and it has recently been applied to the field of image compression. This paper focuses on the impact of compression schemes based on this combinatorial transform on high-resolution medical images. It reviews the original scheme and some improvements that have been developed for the post-BWT processing stages in this context. The performances of these techniques are compared and discussed. Moreover, the influence of image size and data format is also considered.

1. Introduction

The performance of the Burrows–Wheeler Compression Algorithm (BWCA) has been improved continuously since its creation [1]. Many improvements to this algorithm have been presented over the past years. Some of them address the calculation of the Burrows–Wheeler Transform (BWT) itself. Other studies address the entropy coding of the data stream. Finally, many publications concern the middle part of the algorithm, where the BWT output symbols are prepared for the subsequent entropy coding. This paper reviews different BWCA techniques applied to image compression.

2. Original scheme

A typical scheme of the Burrows-Wheeler Compression Algorithm (BWCA) has been introduced by Abel [1]. It consists of four stages, as shown in Figure 1. Each stage transforms the input data and passes its output to the next stage; the stages are processed sequentially from left to right. The first stage is the BWT itself. It sorts the data so that symbols with a similar context are grouped closely together, and it keeps the number of symbols constant during the transformation. The second stage is called here the Global Structure Transform (GST); it transforms the local contexts of the symbols into a global context. A typical representative of a GST stage is the Move-To-Front Transform (MTF). Burrows and Wheeler introduced it in their original publication [2], and it was the first algorithm used as the GST stage in the original BWCA scheme. The MTF stage is a List Update Algorithm (LUA), which replaces the input symbols with corresponding ranking values. Just like the BWT stage, the LUA stage does not alter the number of symbols. A minimal sketch of these two stages is given below.
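As an illustration, here is a minimal Python sketch of the first two stages. The naive rotation sort shown for the BWT is for illustration only (practical codecs use the suffix-sorting techniques of Section 3.1), and the function names are ours, not from the original publications.

    # A minimal sketch of the BWT and MTF stages, assuming small inputs:
    # the O(n^2 log n) rotation sort below is for illustration only.

    def bwt_encode(data: bytes) -> tuple[bytes, int]:
        """Return the last column of the sorted rotations of `data`,
        plus the row index the decoder needs to invert the BWT."""
        n = len(data)
        rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
        last_column = bytes(data[i - 1] for i in rotations)  # data[-1] wraps around
        return last_column, rotations.index(0)

    def mtf_encode(data: bytes) -> bytes:
        """Move-To-Front: replace each symbol by its rank in a
        self-organizing list; local runs become runs of small values."""
        alphabet = list(range(256))
        out = bytearray()
        for byte in data:
            rank = alphabet.index(byte)
            out.append(rank)
            alphabet.pop(rank)         # move the just-seen symbol...
            alphabet.insert(0, byte)   # ...to the front of the list
        return bytes(out)

    # Both stages preserve the number of symbols:
    last, idx = bwt_encode(b"abracadabra")
    assert len(mtf_encode(last)) == len(b"abracadabra")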

Figure 1. Typical scheme of the Burrows-Wheeler Compression Algorithm.

The third stage typically shrinks the number of symbols by applying a Run Length Encoding scheme (RLE). Different algorithms have been presented for this purpose; the Zero Run Transform (RLE-0) from Wheeler has been found to be an efficient one. The last stage is the Entropy Coding (EC) stage, which compresses the symbols using an adaptive model.

We focus on lossless compression because of the targeted applications in the medical field; nevertheless, this scheme can be considered for lossy image compression as well. In the lossy configuration, a DCT-based preprocessing step is added to the compression chain [3].

3. Method evolution

3.1 Improvements of BWT

Several authors have presented improvements to the original algorithm. Andersson and Nilsson published a radix sort algorithm [4] that can be used as the first sorting step of the BWT. In his final BWT research report, Fenwick described several BWT sort improvements, including sorting long words instead of single bytes [5]. Kurtz and Balkenhol presented several papers on BWT sorting stages based on suffix trees, which need less space than other suffix tree implementations and run in linear time [6]. Sadakane described a fast suffix-array sorting scheme [7], and Larsson presented an extended suffix-array sorting scheme [8].


Based on already sorted suffixes, Seward developed in 2000 two fast suffix sorting algorithms called "copy" and "cache" [9]. Itoh and Tanaka presented a fast sorting algorithm called the "two-stage suffix sort" [10]. Kao improved the two-stage suffix sort with some new techniques, which make it very fast for sequences of repeated symbols [11]. Manzini and Ferragina [12] improved suffix-array sorting techniques based on the results of Seward and of Itoh and Tanaka.

3.2 Improvement of RLE

The main function of the RLE is to support the probability estimation of the next stage. Long runs of identical values tend to overestimate the global symbol probabilities, which leads to lower compression. Balkenhol and Shtarkov call this phenomenon "the pressure of runs" [13]. The RLE stage helps to decrease this pressure. In order to improve the probability estimation of the EC stage, common BWCA schemes position the RLE stage directly in front of the EC stage [1].

One common RLE stage for BWT-based compressors is Run Length Encoding Zero (RLE-0). Wheeler suggested coding only the runs of the 0 symbol and no runs of other symbols, since 0 is the symbol with the most runs. To this end, an offset of 1 is added to all symbols greater than 0. The run length is incremented by one, and all bits of its binary representation except the most significant bit – which is always 1 – are stored using the symbols 0 and 1. Some authors have suggested an RLE stage before the BWT stage for speed optimization and for reducing the BWT input, but such a stage generally deteriorates the compression ratio [14]. Instead, specific sorting algorithms are used to handle runs of symbols in practically linear time [9,10,11,12].

Another type of Run Length Encoding is RLE-2, which has been used by Abel [1]. The RLE-2 stage replaces every run of two or more symbols by a run of exactly two symbols. In contrast to other approaches, the length of the run is not placed behind the two symbols inside the symbol stream but transmitted in a separate data stream, so the length information does not disturb the context of the main data stream. A minimal sketch of the RLE-0 coding rule described above is given below.
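The following Python sketch implements the RLE-0 rule exactly as stated: runs of 0 are replaced by the binary representation of (run length + 1) without its leading 1-bit, written with the symbols 0 and 1, while all other symbols are shifted up by 1. The bit order within a run-length code is not specified in the text, so the most-significant-bit-first order used here is an assumption.

    # A minimal sketch of Wheeler's RLE-0; bit order is assumed MSB-first.

    def rle0_encode(data: bytes) -> list[int]:
        out, i = [], 0
        while i < len(data):
            if data[i] == 0:
                run = 0
                while i < len(data) and data[i] == 0:
                    run, i = run + 1, i + 1
                # Increment the run length, then emit all bits of its binary
                # representation except the leading 1, as symbols 0 and 1
                # (e.g. a run of 3 zeros -> 4 = 100b -> '00').
                for bit in bin(run + 1)[3:]:   # strip '0b' and the MSB
                    out.append(int(bit))
            else:
                out.append(data[i] + 1)        # offset non-zero symbols by 1
                i += 1
        return out

    print(rle0_encode(bytes([0, 0, 0, 5, 0, 7])))  # -> [0, 0, 6, 0, 8]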

3.3 Improvement of Global Structure Transform

Most GST stages use a recent-ranking scheme for the List Update problem, like the Move-To-Front (MTF) algorithm used in the original BWCA approach of Burrows and Wheeler. Many authors have presented improved MTF stages based on a delayed behavior, such as the MTF-1 and MTF-2 approaches of Balkenhol et al. or a sticky version by Fenwick [5]. Another approach, which achieves a much better compression ratio than MTF stages, is the Weighted Frequency Count (WFC) stage presented by Deorowicz [14]; however, this scheme has a very high computational cost. Other GST schemes, like Inversion Frequencies (IF) [13], use a distance measurement between occurrences of the same symbol. Similar to the WFC stage of Deorowicz, Abel presented a list-of-counters scheme, the Incremental Frequency Count (IFC) [1]; its difference to the WFC stage is that it minimizes calculation.

3.4 Improvement of Entropy Coding

The very first proposal of Burrows and Wheeler was to use a Huffman coder as the last stage; it is fast and simple, but an arithmetic coder is a better choice to achieve a higher compression ratio. Abel modified the arithmetic coding, because the coding of the IFC output inside the EC stage has a strong influence on the compression rate; indeed, it is not sufficient to compress the index stream with a simple arithmetic coder using a common order-n context. The index frequencies of the IFC output have a nonlinear decay, and even after the use of an RLE-2 stage, the index 0 is still the most common index symbol on average. An illustrative sketch of the Huffman option is given below.
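For illustration, the sketch below derives Huffman code lengths from the symbol frequencies of an index stream, i.e. the table a Huffman-based EC stage would build its bitstream from. It is the generic textbook construction corresponding to Burrows and Wheeler's original Huffman proposal, not Abel's modified arithmetic coder.

    # Huffman code lengths via a heap; an illustrative sketch only.

    import heapq
    from collections import Counter

    def huffman_code_lengths(symbols) -> dict:
        """Map each symbol to its Huffman code length (in bits)."""
        freq = Counter(symbols)
        if len(freq) == 1:                    # degenerate single-symbol case
            return {next(iter(freq)): 1}
        # Heap entries: (weight, unique tiebreak, {symbol: depth so far}).
        heap = [(w, k, {s: 0}) for k, (s, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            w1, _, d1 = heapq.heappop(heap)   # merge the two lightest trees
            w2, _, d2 = heapq.heappop(heap)
            merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
            heapq.heappush(heap, (w1 + w2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    # Frequent indices (like 0 after the GST stage) get the shortest codes:
    print(huffman_code_lengths([0, 0, 0, 0, 1, 1, 2, 3]))  # {0:1, 1:2, 2:3, 3:3}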

4. Experiments and Results

The experiments use medical images from the IRMA (Image Retrieval in Medical Applications) database [15]. This database consists of primary and secondarily digitized X-ray films in portable network graphics (PNG) and tagged image file format (TIFF), at 8 bits per pixel (bpp); example images are shown in Figure 2. The image sizes lie between 101 KB and 4684 KB.

Figure 2. Example of tested images. Upper row: directly digital; lower row: secondarily captured. From left to right: hand; head; pelvis; chest, frontal; chest, lateral.

The first experiment applies the original BWCA chain (Figure 1) to medical images from the IRMA database, for both directly digital and secondarily digitized formats. These are high-resolution images. The results of this test are presented in Table 1.


For this study, lossless compression schemes are used as references. We selected TIFF (a raw image file format) and the Joint Photographic Experts Group formats JPEG and JPEG 2000, the latter based on wavelet decomposition. Table 1 summarizes the observed compression ratios. The original BWCA scheme achieves better compression ratios than JPEG, but JPEG 2000 is significantly better than the original BWCA scheme: the average compression ratio of JPEG 2000 is 2.650, versus 2.387 for the original BWCA scheme. The original BWCA scheme outperforms JPEG 2000 on only two images, the second images of Heads Secondary and of Pelvis Secondary. For the latter, the compression ratio of the original BWCA scheme is 3.104, while JPEG 2000 reaches only 2.178; the difference of 0.926 is quite significant. The original scheme proposed by Burrows and Wheeler has a few flaws. RLE-0 is not very effective at reducing the data, because many consecutive identical characters still exist after RLE-0. Employing Move-To-Front (MTF) as the GST before RLE-0 cannot reduce this phenomenon effectively, because MTF transforms one string of symbols into another string of the same length, only with a different distribution.

Another GST, the Incremental Frequency Count (IFC) introduced by Abel [1], is compared with MTF. It avoids a disadvantage of MTF, which always moves each new symbol directly to the front of the list, no matter how seldom the symbol has appeared in the near past. IFC borrows from the Weighted Frequency Count (WFC) technique of Deorowicz [14], which weights the frequencies of all symbols in the near past; symbols outside a sliding window are no longer taken into account. By choosing the proper window size and weights, the WFC achieves very good results, but it has a high computational cost, since the weighting of the symbols within the sliding window and the sorting of the list have to be recalculated for each symbol processed. IFC was therefore proposed to reduce this weakness [1]. A simplified sketch of such a frequency-count ranking stage is given below.
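The following deliberately simplified Python sketch illustrates the idea of a frequency-count ranking stage: counters decay so that the recent past dominates, and each symbol is replaced by its current rank. It is an illustrative approximation only; Abel's actual IFC avoids the full re-sort performed here, which is precisely how it reduces the WFC's computational cost [1].

    # A simplified frequency-count ranking stage, in the spirit of IFC/WFC;
    # NOT Abel's exact algorithm: real IFC avoids the full re-sort below.

    def ifc_like_encode(data: bytes, increment=16.0, decay=0.95) -> list[int]:
        counters = [0.0] * 256
        ranking = list(range(256))       # symbols ordered by counter, descending
        out = []
        for byte in data:
            out.append(ranking.index(byte))
            for s in range(256):
                counters[s] *= decay     # age every counter: the near past dominates
            counters[byte] += increment  # boost the symbol just seen
            # Python's sort is stable: ties keep their previous relative order.
            ranking.sort(key=lambda s: -counters[s])
        return out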

In general, the model of our tests is based on the model of Lehmann [16], as shown in Figure 3.

Figure 3. The improved BWCA with an RLE-2 stage after the BWT stage.

Table 1. First comparative results using the original BWCA scheme. For each image: raw size, compressed size, and compression ratio (CR) per format. Each category comprises two images (two rows).

Name of image             Raw size    TIFF size   CR     JPEG size   CR     JPEG 2000   CR     BWCA orig.  CR
Hands Primary             2 235 688   1 434 628   1.558  994 043     2.249  746 812     2.994  921 077     2.427
                          1 120 960   778 982     1.439  553 455     2.025  404 790     2.769  503 559     2.226
Hands Secondary           431 172     227 802     1.893  201 901     2.136  157 759     2.733  201 396     2.141
                          1 667 040   782 492     2.130  761 412     2.189  573 070     2.909  608 922     2.738
Heads Primary             1 515 533   1 071 570   1.414  760 802     1.992  593 391     2.554  681 419     2.224
                          2 839 656   1 838 850   1.544  1 284 695   2.210  966 688     2.938  1 119 363   2.537
Heads Secondary           2 788 500   1 297 898   2.148  1 179 829   2.363  951 033     2.932  1 041 038   2.679
                          3 256 000   1 441 664   2.259  1 357 005   2.399  1 277 882   2.548  1 143 073   2.848
Pelvis Primary            3 239 730   2 772 998   1.168  1 877 742   1.725  1 589 535   2.038  1 770 899   1.829
                          3 126 784   2 592 926   1.206  1 740 236   1.797  1 485 588   2.105  1 661 580   1.882
Pelvis Secondary          1 076 768   803 374     1.340  506 967     2.124  420 919     2.558  501 369     2.148
                          7 036 956   3 184 574   2.210  3 374 061   2.086  3 230 414   2.178  2 267 335   3.104
Thoraces Frontal Prim.    3 713 600   3 244 154   1.145  2 046 205   1.815  1 830 742   2.028  2 011 249   1.846
                          3 405 076   2 912 946   1.169  1 806 522   1.885  1 611 065   2.114  1 780 515   1.912
Thoraces Frontal Sec.     6 957 060   2 832 738   2.456  2 651 775   2.624  2 047 942   3.397  2 431 091   2.862
                          7 006 860   3 374 332   2.077  3 027 914   2.314  2 543 669   2.755  2 607 353   2.687
Thoraces Lateral Prim.    6 184 913   4 357 022   1.420  2 590 276   2.388  2 115 375   2.924  2 430 634   2.545
                          2 186 181   1 836 094   1.191  1 227 943   1.780  1 053 533   2.075  1 170 793   1.867
Thoraces Lateral Sec.     5 859 510   3 611 076   1.623  1 957 078   2.994  1 429 536   4.099  1 773 996   3.303
                          220 580     220 778     0.999  112 457     1.961  93 861      2.350  114 544     1.926


Table 2. Comparison of the original BWCA scheme and its improved variants (compressed size in symbols, and compression ratio, CR).

Name of     Image size   Image size   Original BWCA       BWCA using IFC      BWCA using RLE-2
image       (.tiff)      (.raw)       Symbols     CR      Symbols     CR      Symbols     CR
knee_0      2 701 240    2 696 640    791 153     3.408   736 782     3.660   774 751     3.481
knee_1      2 655 704    2 651 176    825 728     3.211   763 734     3.471   804 860     3.294
leg_0       1 728 972    1 725 500    568 049     3.038   537 945     3.208   563 096     3.064
leg_1       1 318 720    1 315 640    526 059     2.501   498 760     2.638   522 071     2.520
pelvis_0    3 124 892    3 119 852    1 642 274   1.900   1 566 239   1.992   1 624 185   1.921
pelvis_1    3 034 932    3 029 956    1 571 699   1.928   1 495 185   2.026   1 555 180   1.948
sinus_0     2 424 218    2 419 802    811 206     2.983   761 426     3.178   798 670     3.030
sinus_1     2 241 804    2 237 492    804 833     2.780   760 809     2.941   795 408     2.813
breast_0    3 752 938    3 746 730    983 851     3.808   936 388     4.001   974 505     3.845
breast_1    3 678 612    3 672 396    1 096 598   3.349   1 046 294   3.510   1 084 486   3.386
foot_0      3 125 062    3 119 694    782 377     3.987   731 163     4.267   766 994     4.067
foot_1      2 235 408    2 231 304    752 290     2.966   702 304     3.177   736 294     3.030
hand_0      2 500 096    2 484 368    822 399     3.021   759 646     3.270   799 105     3.109
hand_1      2 535 246    1 279 773    411 577     3.109   380 944     3.359   400 375     3.196
head_0      1 088 424    2 651 925    599 394     4.424   572 652     4.631   595 266     4.455
head_1      2 608 068    2 603 188    724 980     3.591   691 279     3.766   719 246     3.619
spine_0     1 759 608    1 755 944    917 233     1.914   873 973     2.009   908 556     1.933
spine_1     1 786 082    1 782 450    924 235     1.929   877 948     2.030   915 842     1.946
thorax_0    3 537 852    3 531 492    1 614 415   2.187   1 535 721   2.300   1 601 907   2.205
thorax_1    2 854 408    2 849 280    1 227 010   2.322   1 170 481   2.434   1 216 031   2.343

The model in Figure 3 is used to compare the effect of IFC and MTF. The results of this comparison are given in Table 2: IFC decreases the data by about 4.3%. We also compare RLE-0 with another RLE model, the RLE-2 proposed by Lehmann [16]; RLE-2 increases the average compression performance of the original BWCA by around 1.5%.

The effect of a block-oriented scheme is also investigated. The compression rates increase with the image resolution; this feature can be observed by splitting the image into blocks. The blocks are processed one by one, each producing one compressed data stream, and the streams are regrouped to form the compressed image. This bit stream is compared with the compression of the full-resolution image. The results of this test are presented in Tables 3 and 4 for the directly digital image Hand Primary (see Figure 2), split into 10 blocks; four different block sizes are considered (Table 3). The block-oriented scheme provides a lower compression ratio; nevertheless, it significantly decreases the processing time, as discussed in the experiments presented in Table 5. A minimal sketch of the block-oriented processing is given below.
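The following Python sketch outlines this block-oriented processing, assuming the image is available as a flat raw-byte buffer; compress_block is a hypothetical stand-in for the whole BWCA chain of Figure 1.

    # A minimal sketch of the block-oriented scheme; compress_block is
    # a hypothetical placeholder for the full BWCA chain of Figure 1.

    def compress_blockwise(image: bytes, block_size: int, compress_block):
        """Compress fixed-size blocks independently and regroup the
        per-block streams into one compressed image."""
        streams = [compress_block(image[i:i + block_size])
                   for i in range(0, len(image), block_size)]
        return streams  # the decoder decodes and concatenates the blocks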

We extend this study by investigating the data format. In the 4-bit process, each pixel of the BWT input is split into two parts: the 4 least significant bits (LSB) and the 4 most significant bits (MSB) are separated, and each part becomes a new 8-bit character. The BWT input is therefore doubled in size, while the alphabet shrinks from 256 possible values (8 bits) to 16. The aim of reducing the number of distinct BWT input symbols is to increase the number of identical symbols. The same process is applied in the 2-bit and 1-bit decompositions; in the 2-bit process, the image becomes 4 times larger than the fully processed 8-bit image, but the alphabet is reduced to only 4 distinct values. The results of bit decomposition with the original and the modified BWCA schemes are shown in Tables 3 and 4. Bit decomposition does not significantly change the compression ratios; there is only a small variation for each block. Binary plane decomposition can therefore be considered: the algorithm can then be adapted to the binary nature of the data. Moreover, binary decomposition opens the possibility of a hardware implementation based on logical operators, which could represent the simplest and lowest-cost implementation compared to existing solutions [17]. A sketch of these decompositions is given below.
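The sketch below illustrates the 4-bit split and the extraction of one binary plane, assuming 8-bit grayscale pixels in a flat buffer; the exact symbol ordering used in the experiments is not specified in the text, so the ordering shown here is an assumption.

    # A sketch of the data-format decompositions; symbol ordering is assumed.

    def split_nibbles(pixels: bytes) -> bytes:
        """4-bit process: each pixel yields two 8-bit characters, so the
        BWT input doubles while only 16 distinct values remain."""
        out = bytearray()
        for p in pixels:
            out.append(p & 0x0F)   # 4 least significant bits
            out.append(p >> 4)     # 4 most significant bits
        return bytes(out)

    def bit_plane(pixels: bytes, bit: int) -> bytes:
        """1-bit process: extract one of the 8 binary planes; the planes
        can then be compressed independently (or in parallel hardware)."""
        return bytes((p >> bit) & 1 for p in pixels)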


Table 3. Results of image decomposition using the original BWCA scheme (compressed size and compression ratio, CR, per data format).

Block size (.raw)   8 bits    CR      4 bits    CR      2 bits    CR      1 bit     CR
262 144             98 309    2.667   95 508    2.745   91 181    2.875   92 122    2.846
262 144             140 923   1.860   142 920   1.834   138 394   1.894   142 401   1.841
262 144             135 903   1.929   134 693   1.946   133 530   1.963   135 678   1.932
262 144             130 594   2.007   128 790   2.035   129 165   2.030   130 844   2.003
230 400             69 222    3.328   66 390    3.470   63 063    3.653   62 500    3.686
230 400             113 890   2.023   113 399   2.032   110 177   2.091   113 292   2.034
230 400             129 343   1.781   129 919   1.773   129 312   1.782   132 488   1.739
230 400             109 198   2.110   108 026   2.133   104 283   2.209   103 829   2.219
141 312             71 461    1.977   69 369    2.037   69 511    2.033   72 029    1.962
124 200             50 119    2.478   48 127    2.581   45 893    2.706   46 028    2.698
Average CR                    2.216             2.259             2.324             2.296

The running time is also studied; the results, given in milliseconds and measured on a 2.13 GHz Pentium with 1 GB of RAM, are presented in Table 5. They show that the running time does not depend on the image size alone but also on the nature of the image, as the differing results in lines 1 to 4 of Table 5 illustrate.

Figure 4. Decomposition of image Hand.

The first and second lines of Table 5 give the processing times obtained for two different blocks extracted from the whole image; the two blocks, of identical size, are shown in Figure 4. The left image contains many pixels with similar gray levels, and Table 5 shows that it requires more processing time than the right image. The running time therefore depends not only on the image size (or block size) but also on the image content.

The algorithm aims to sort and regroup pixels of similar gray level; the amount of sorting depends on the number of similar pixels, so the processing time increases with data redundancy. For the 8-bit decomposition, real-time encoding and decoding can be obtained. Obviously, the processing time increases with the image size and with the decomposition into data planes: the binary data arrangement, for instance, requires processing 8 planes instead of only one for the 8-bit arrangement, so the processing time is 8 times higher with a sequential implementation. This binary data arrangement is intended for a parallel hardware implementation based on logical operators; a sequential software implementation is therefore not the target for this data arrangement.

5. Conclusion and Perspectives

The BWCA lossless compression scheme achieves reduction rates of up to 4 when applied to radiographs.

Table 4. Results of image decomposition using the modified BWCA scheme (compressed size and compression ratio, CR, per data format).

Block size (.raw)   8 bits    CR      4 bits    CR      2 bits    CR      1 bit     CR
262 144             97 330    2.693   98 790    2.654   95 308    2.750   98 390    2.664
262 144             140 810   1.862   142 920   1.834   140 916   1.860   146 317   1.792
262 144             135 573   1.934   137 500   1.907   137 129   1.912   140 729   1.863
262 144             129 918   2.018   131 853   1.988   132 301   1.981   135 742   1.931
230 400             67 447    3.416   69 272    3.326   67 569    3.410   68 649    3.356
230 400             113 789   2.025   115 532   1.994   113 045   2.038   117 231   1.965
230 400             129 031   1.786   131 357   1.754   132 115   1.744   136 000   1.694
230 400             108 492   2.124   109 792   2.099   108 200   2.129   109 339   2.107
141 312             71 565    1.975   73 601    1.920   72 320    1.954   75 060    1.883
124 200             50 113    2.478   51 537    2.410   49 545    2.507   50 337    2.467
Average CR                    2.231             2.189             2.229             2.172


Table 5. Running times of the image decompositions with the original scheme, in milliseconds (enc = encoding, dec = decoding).

Block size (.raw)   8 bits           4 bits           2 bits           1 bit
                    enc      dec     enc      dec     enc      dec     enc      dec
262 144             547      32      1 391    31      4 156    125     13 765   281
262 144             469      15      1 251    30      3 256    125     10 125   313
262 144             500      16      1 203    47      3 515    125     11 563   312
262 144             500      31      1 218    31      3 515    187     11 422   312
230 400             484      16      1 359    31      4 265    78      14 406   234
230 400             406      16      1 031    32      3 000    94      9 641    250
230 400             391      31      984      32      2 813    109     8 734    266
230 400             453      16      1 172    31      3 641    125     12 000   297
141 312             235      <1      594      16      1 609    31      5 046    125
124 200             250      <1      688      16      1 922    31      6 344    125

Other modifications of the method are currently being investigated, for instance the inclusion of preprocessing stages before the BWT in the compression scheme. Moreover, binary versions of the BWT are currently being investigated in order to propose a low-cost hardware implementation.

6. References

[1] J. Abel, "Improvements to the Burrows-Wheeler compression algorithm: after BWT stages", ACM Trans. Computer Systems, submitted for publication, 2003.
[2] M. Burrows and D.J. Wheeler, "A block-sorting lossless data compression algorithm", SRC Research Report 124, Digital Systems Research Center, Palo Alto, 1994.
[3] Y. Wiseman, "Burrows-Wheeler based JPEG", Data Science Journal, Vol. 6, 2007, pp. 19-27.
[4] A. Andersson and S. Nilsson, "Implementing radix sort", ACM Journal of Experimental Algorithmics, Vol. 3, 1998, pp. 7-22.
[5] P. Fenwick, "Block sorting text compression – final report", Technical Report 130, Department of Computer Science, University of Auckland, New Zealand, 1996.
[6] S. Kurtz and B. Balkenhol, "Space efficient linear time computation of the Burrows and Wheeler transformation", Festschrift in honour of Rudolf Ahlswede's 60th birthday, 1999, pp. 375-384.
[7] K. Sadakane, "Unifying text search and compression – suffix sorting, block sorting and suffix arrays", PhD thesis, University of Tokyo, Japan, 2000.
[8] N.J. Larsson, "Structures of string matching and data compression", PhD thesis, Lund University, Sweden, 1999.

[9] J. Seward, "On the performance of BWT sorting algorithms", Proceedings of the Data Compression Conference, 2000, pp. 173-182.
[10] H. Itoh and H. Tanaka, "An efficient method for construction of suffix arrays", Transactions of the Information Processing Society of Japan, Vol. 41, 1999, pp. 31-39.
[11] T. Kao, "Improving suffix-array construction algorithms with applications", Master's thesis, Department of Computer Science, Gunma University, Japan, 2001.
[12] G. Manzini and P. Ferragina, "Engineering a lightweight suffix array construction algorithm", Lecture Notes in Computer Science, Vol. 2461, Springer-Verlag, 2002, pp. 698-710.
[13] B. Balkenhol and Y. Shtarkov, "One attempt of a compression algorithm using the BWT", SFB343: Discrete Structures in Mathematics, Faculty of Mathematics, University of Bielefeld, Germany, 1999.
[14] S. Deorowicz, "Second step algorithms in the Burrows-Wheeler compression algorithm", Software – Practice and Experience, Vol. 32, No. 2, 2002, pp. 99-111.
[15] T.M. Lehmann, M. Güld, C. Thies, B. Fischer, K. Spitzer, D. Keysers, H. Ney, M. Kohnen, H. Schubert and B. Wein, "Content-based image retrieval in medical applications", Methods of Information in Medicine, Vol. 43, No. 4, October 2004, pp. 354-361.
[16] T.M. Lehmann, J. Abel and C. Weiss, "The impact of lossless image compression to radiographs", Proceedings of SPIE, Vol. 6145, 2006, pp. 290-297.
[17] J. Martinez, R. Cumplido and C. Feregrino, "An FPGA-based parallel sorting architecture for the Burrows-Wheeler transform", Proceedings of the 2005 International Conference on Reconfigurable Computing and FPGAs, 2005, pp. 17-23.
