Lecture 4 Run Length Encoding

November 1, 2015 1 [email protected]


Run-Length Encoding

RLE Text Compression

RLE Image Compression

Golomb Coding

Example 1

Decoding

Example 2

Contents

November 1, 2015 3

Data often contains sequences of identical bytes. By replacing these

repeated byte sequences with the number of occurrences, a substantial

reduction of data can be achieved. This is known as run-length coding.

Run-Length Encoding

Example: the character C occurs eight times in a row and is compressed

to the three characters C!8:

Uncompressed data: ABCCCCCCCCDEFGGG

Run-length coded: ABC!8DEFGGG

The technique can be described precisely as follows: if a byte occurs at

least four times in a row, then the number of occurrences is counted.

The compressed data contain the byte, followed by the M-byte and the

number of occurrences of the byte.

[email protected]


To get an idea of the compression ratios produced by RLE, we assume a

string of N characters that needs to be compressed. We assume that the

string contains M repetitions of average length L each. Each of the M

repetitions is replaced by 3 characters (escape, count, and data), so the

size of the compressed string is N − M × L +M ×3 = N −M(L − 3) and the

compression factor is

Examples: N = 1000,M = 10, L = 4 yield

a compression factor of 1000 / [1000 − 10(4 − 3)] = 1.01.

A better result is obtained in the case N = 1000,M = 50, L = 10, where the

factor is 1000 / [1000 − 50(10 − 3)] = 1.538.

RLE Text Compression

November 1, 2015 [email protected] 5

Compressing an image using RLE is based on the observation that if we

select a pixel in the image at random, there is a good chance that its

neighbors will have the same color.

RLE Image Compression

The compressor therefore scans the bitmap row by row, looking for runs

of pixels of the same color.

The size of the compressed stream depends on the complexity of the

image. The more detail, the worse the compression.


the run length cannot be 0, it makes sense to write the [run length minus

one] on the output stream. Thus the pair (3, 87) denotes a run of four

pixels with intensity 87. This way, a run can be up to 256 pixels long.

In color images it is common to have each pixel stored as three bytes,

representing the intensities of the red, green, and blue components of the

pixel.

binary (black-and-white) images, usually consist of runs of 0’s or 1’s. As an

example, if a segment of a binary image is represented as

d =

000000001111111111100000000000000011100000000000001001111111111

and it can be compactly represented as c(d) = (9, 11, 15, 3, 13, 1, 2, 10) by

simply listing the lengths of alternate runs of 0’s and 1’s.


Run-length encoding is a simple approach to source coding when there

exists a long run of the same data, in a consecutive manner, in a data

set. As an example, the data

d = 5 5 5 5 5 5 5 19 19 19 19 19 19 19 19 19 19 19 19 0 0 0 0 0 0 0 0 8

23 23 23 23 23 23

contains long runs of 5‘s, 19’s, 0’s, 23’s, etc. Rather than coding each

sample in the run individually, the data can be represented compactly by

simply indicating the value of the sample and the length of its run when

it appears. In this manner, the data d can be run-length encoded as

(5 7) (19 12) (0 8) (8 1) (23 6).

For ease of understanding, we have shown a pair in each parentheses.

Here the first value represents the pixel, while the second indicates the

length of its run.


In some cases, the appearance of runs of symbols may not be very

apparent. But the data can possibly be preprocessed in order to aid run-

length coding.

Consider the data d = 26 29 32 35 38 41 44 50 56 62 68 78 88 98 108 118

116 114 112 110 108 106 104 102 100 98 96. We can simply preprocess

this data, by taking the sample difference e( i ) = d ( i ) - d ( i - l), to produce

the processed data E= 26 3 3 3 3 3 3 6 6 6 6 10 10 10 10 10 -2 -2 -2 -2 -2 -

2 -2 -2 -2 -2 -2. This preprocessed data can now be easily run-length

encoded as (26 1) (3 6) (6 4) (10 5) (-2 11).


Golomb Coding

Golomb coding is a lossless data compression method invented by

Solomon W. Golomb.

Symbols following a geometric distribution will have a Golomb code as an

optimal prefix code, making Golomb coding highly suitable for situations in

which the occurrence of small values in the input stream is significantly

more likely than large values.


The Golomb-Rice codes belong to a family of codes designed to encode

integers with the assumption that the larger an integer, the lower its

probability of occurrence.

The simplest code for this situation is the unary code. The unary code for

a positive integer n is simply n 1’s followed by a 0.

the code for 4 is 11110, and the code for 7 is 11111110

The unary code is the same as the Huffman code for the semi-infinite

alphabet {1, 2, 3, . . .} with probability model

Because the Huffman code is optimal, the unary code is also optimal

for this probability model.


The Golomb code is actually a family of codes parameterized by an integer m > 0.

In the Golomb code with parameter m, we represent an integer n > 0 using two

numbers q and r ,where

q is the quotient.

r is the remainder when n is divided by m.

• Let k be the smallest positive integer such that 2k ≥ 2𝑚.

• Then the corresponding code dictionary contains exactly m words of

every word length ≥ 𝑘. • as well as 2k-1 – m words of length k-1.


Example 1

We will consider the cases m=14 and m=16.

In the case of m=14 we find k=5 and 2k-1 - m = 2

((2k≥ 2𝑚) … 25≥ 2*14 )

So that there are two codewares of length 4, followed by 14 codewares of

lengths 5,6,7 etc.


In general:

q is the quotient and r is the reminder when n is divided by m.

The quotient can take on values 0,1, 2, … and is represented by the

unary code of q.

The reminder r can take on the values 0, 1, 2, … m-1.

If m is a power of two, we use the log2 m-bit binary representation of r.

If m is not a power of two, we could still use log2 m bits.

We can reduce the number of bits required if we use:

the log2 m bit binary representation of r for the first 2 𝑙𝑜𝑔2 𝑚 − 𝑚

values.

and the log2 m bit binary representation of r + 2 𝑙𝑜𝑔2 𝑚 − 𝑚 for the

rest of the values.

• x is integer part of x.


Log2 14 = 3.807 m = 14

log2 14 = 4 bits

log2 14 = 3

log2 m bits binary representation of r for the first 2 𝑙𝑜𝑔2 𝑚 − 𝑚 values.

log2 14 = 3 bits

2 𝑙𝑜𝑔2 𝑚 −𝑚 = 24 – 14 = 16 – 14 = 2

log2 m bit binary representation of r + 2 𝑙𝑜𝑔2 𝑚 − 𝑚 for the rest of the

values.

log2 14 = 4 bits

r + 2 𝑙𝑜𝑔2 𝑚 − 𝑚 = r + 24 – 14 = r + 16 – 14 = r + 2


since m=16 is a power of 2, the corresponding dictionary contains

exactly 16 words of every word length starting with length 5.


m=16

Log2 16 = 4 bits binary representation of r.

2k≥ 2𝑚) … 25≥ 2*16


Decoding

The dictionaries in the previous table exhibit striking patterns which

suggest a rather simple decoding procedure might be employed.

For the case m=16, when m is power of 2, the following rule for decoding

is adequate.

Start at the beginning (left end) of the word, and counts the number of

1’s preceding the first 0. lets this number be A ≥ 0.

The word consist of A+4 bits by regarded as the ordinary binary

representation of the integer R, 0 ≤ R ≤ 15.

Then the correct decoding of the word is 16 A + R.



For the rest of values, let the last 4 bits be the binary representation of the

integer R.

Then the correct decoding of the codeword is 14 A + R - 2.

we regard the codeword as consisting of the total of A + 4 bits.

Letting the 4 bits be the binary representation of the first integer R’ , the

correct decoding in this case is 14 A + R’ (A equal zero).

If m is not a power of two, we could still use log2 m bits.

We can reduce the number of bits required if we use:

the log2 m bit binary representation of r for the first 2 𝑙𝑜𝑔2 𝑚 − 𝑚

values.

and the log2 m bit binary representation of r + 2 𝑙𝑜𝑔2 𝑚 − 𝑚 for the

rest of the values.

The case m=14, is only slightly more complicated.


Example 2

Lets design a Golomb code for m=5

Log2 5 = 2.321 m = 5

log2 5 = 3 bits

log2 5 = 2 bits

log2 m bits binary representation of r for the first 2 𝑙𝑜𝑔2 𝑚 − 𝑚 values.

log2 5 = 2 bits

2 𝑙𝑜𝑔2 𝑚 −𝑚 = 23 – 5 = 8 – 5 = 3 values of r (that is, r = 0, 1, 2) will be

represented by the 2-bit binary representation of r.

log2 m bit binary representation of r + 2 𝑙𝑜𝑔2 𝑚 − 𝑚 for the rest of the

values that is, r = 3, 4).

log2 5 = 3 bits

r + 2 𝑙𝑜𝑔2 𝑚 − 𝑚 = r + 23 – 5 = r + 8 – 5 = r + 3

Lecture 4 Run Length Encoding

Science

Transcript of Lecture 4 Run Length Encoding