1
Lempel-Ziv Compression Techniques
Introduction to Run-Length Encoding
Introduction to Lempel-Ziv Encoding: LZ77 & LZ78
Example 1: Encoding using LZ78
Example 2: Decoding using LZ78
2
Introduction to Run-Length Encoding
A run is defined as a sequence of identical characters. Run-length encoding assumes data has many runs.
Each character that repeats itself in a sequence three or more times is replaced by the single character and its frequency.
Data to be transmitted is first scanned to find the coding information before transmission.
For example, the messageAAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD
is replaced by4A3BAA5B8CDABCB3A4B3CD
Saving can be dramatic when long runs are present.
3
Run-Length Encoding: Possible Problems A problem arises if one of the characters being transmitted
is a digit, as in 11111111111544444
which is represented as 1111554 (eleven 1s, one 5, five 4s).
A solution is instead of using a number n, a character can be used whose ASCII value is n.
For example, the run of 43 consecutive letters “c” is represented as +c (“+” has ASSCII code 43), and the run of 49 1s is coded as 11 (“1” has ASCII code 49).
This method is only modestly efficient for text files and widely used for encoding fax images.
The main advantage of this algorithm is simplicity and speed.
4
Introduction to Lempel-Ziv Encoding
Data compression up until the late 1970's mainly directed towards creating better methodologies for Huffman coding.
An innovative, radically different method was introduced in1977 by Abraham Lempel and Jacob Ziv.
This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78.
Due to patents, LZ77 and LZ78 led to many variants:
The zip and unzip use the LZH techique while UNIX's compress methods belong to the LZW and LZC classes.
LZ77 Variants
LZRLZSSLZBLZH
LZ78 Variants
LZWLZCLZTLZMWLZJLZFG
5
LZ78 Compression AlgorithmStart with empty dictionaryP = emptyWHILE (there more characters in the char
stream) C = next character in the char streamIF (the string P+C in the dictionary)
P = P+CELSE //(if P is empty,zero is its codeword)
output the code word corresponding to P output Cadd the string P+C to the dictionary P = empty
IF (P is not empty)output the code word corresponding to P
END
6
Example 1: LZ78 Compression 1Example 1: Use the LZ78 algorithm to encode the message
ABBCBCABABCAABCAAB
Solution: The encoding process is presented below in which: The column STEP indicates the number of the encoding step. The column POS indicates the current position in the input
data. The column DICTIONARY shows what string has been added
to the dictionary. The index of the string is equal to the step number.
The column OUTPUT presents the output in the form (W,C). W represents the index of prefix in the dictionary.
The output of each step decodes to the string that has been added to the dictionary.
7
Example 1: LZ78 Compression Step 1
ABBCBCABABCAABCAAB POS = 1C = empty
P = empty
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
8
Example 1: LZ78 Compression Step 2
ABBCBCABABCAABCAAB POS = 2C = empty
P = empty
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
2 2)0,B(B
9
Example 1: LZ78 Compression Step 3
ABBCBCABABCAABCAAB POS = 3C = empty
P = empty
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
2 2)0,B(B
3 3)2,C(BC
10
Example 1: LZ78 Compression Step 4
ABBCBCABABCAABCAAB POS = 5C = empty
P = empty
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
2 2)0,B(B
3 3)2,C(BC
4 5)3,A(BCA
11
Example 1: LZ78 Compression Step 5
ABBCBCABABCAABCAAB POS = 8C = empty
P = empty
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
2 2)0,B(B
3 3)2,C(BC
4 5)3,A(BCA
5 8)2,A(BA
12
Example 1: LZ78 Compression Step 6
ABBCBCABABCAABCAAB POS = 10C = AP = BCA
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
2 2)0,B(B
3 3)2,C(BC
4 5)3,A(BCA
5 8)2,A(BA
6 10)4,A(BCAA
13
Example 1: LZ78 Compression Step 7
ABBCBCABABCAABCAAB POS = 14C = empty
P = empty
STEP POSOUTPUTDICTIONARY
1 1(0,A)A
2 2)0,B(B
3 3)2,C(BC
4 5)3,A(BCA
5 8)2,A(BA
6 10)4,A(BCAA
7 14)6,B(BCAAB
14
Example 1: Coded Information in Bits Now, we calculate the number of bits needed to represent the
coded information <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
The number of bits needed to represent each integer with index n is at most equal to the number of bits needed to represent the (n-1)th index.
For example, the number of bits needed to represent the integer 3 within index 4 is equal to 2, because it takes two bits to express 3 (the (4-1)th index) in binary.
Thus, we see that the number of bits required when the string ABBCBCABABCAABCAAB is compressed is
(1+8)+(1+8)+(2+8)+(2+8)+(2+8)+(3+8)+(3+8) = 70 bits
15
LZ78 Decompression Algorithm
At the start the dictionary is emptyWhile (more code words in the code stream)
W = code wordC = character following itoutput the string of W + Cadd the string of W + C to the dictionary
end
16
Example 2: LZ78 Decompression
Example 2: Use the above algorithm to decompress the
compressed output sequence of Example 1, namely,
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
17
Example 2: LZ78 Decompression Step 1
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
18
Example 2: LZ78 Decompression Step 2<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
22)0,B(BB
19
Example 2: LZ78 Decompression Step 3<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
22)0,B(BB
33)2,C(BCBC
20
Example 2: LZ78 Decompression Step 4<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
22)0,B(BB
33)2,C(BCBC
45)3,A(BCABCA
21
Example 2: LZ78 Decompression Step 5<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
22)0,B(BB
33)2,C(BCBC
45)3,A(BCABCA
58)2,A(BABA
22
Example 2: LZ78 Decompression Step 6<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
22)0,B(BB
33)2,C(BCBC
45)3,A(BCABCA
58)2,A(BABA
610)4,A(BCAABCAA
23
Example 2: LZ78 Decompression Step 7<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>
W=empty C=empty
STEPPOSINPUTOUTPUTDICTIONARY
11)0,A(AA
22)0,B(BB
33)2,C(BCBC
45)3,A(BCABCA
58)2,A(BABA
610)4,A(BCAABCAA
714)6,B(BCAABBCAAB
24
LZ78: Concluding Remarks
LZ77 is based on a sliding window. Its output is very similar to that LZ78.
The compression ratio of LZ77 is similar to that of LZ78.
The biggest advantage of LZ78 over the LZ77 algorithm is the reduced number of string comparisons in each encoding step.
More application areas of LZ77 and LZ78:• LZ77: gzip, Squeeze, LHA, PKZIP, ZOO• LZ78: compress, GIF, CCITT (modems), ARC, PAK
25
Exercises1. How many bits are required to represent the codeword
(2,C) generated on page 11? How many bits are required to represent the associated string without compression?
2. Use LZ78 to trace encoding the string
SATATASACITASA.
3. Write a Java program that encodes a given string using LZ78.
4. Write a Java program that decodes a given set of encoded code words using LZ78.
Top Related