Uploaded by aron-george
Compression
Compression ratio: how much is the size reduced?
Symmetric/asymmetric: do compression and decompression take about the same time, or very different times?
Lossless/lossy: is any information lost in the process of compressing and decompressing?
Adaptive/static: does the compression dictionary change during processing?
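As a quick illustration of compression ratio and losslessness, here is a minimal Python sketch using the standard-library `zlib` module (DEFLATE, which combines LZ77 matching with Huffman coding). The helper name `compression_ratio` is hypothetical, and the ratio is taken as original size divided by compressed size; some texts report the inverse or a percentage saved instead.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size: 4.0 means 4:1."""
    compressed = zlib.compress(data)
    # DEFLATE is lossless, so decompression must give back the exact input.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


text = b"Compression seeks to minimize the amount of data transmitted. " * 50
print(f"{compression_ratio(text):.1f}:1")
```

Highly repetitive input like this compresses very well; already-compressed or random data can even grow slightly, giving a ratio below 1.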
Huffman encoding
Statistical encoding
Requires knowledge of the relative frequency of elements in the string
Sender and receiver must both know the encoding chosen
Create a tree structure that assigns the longest representations to the most rarely used symbols
Huffman example
First, start with statistics about the occurrence of symbols in the text to be compressed.
The assumption that these statistics apply might not be right for every message.
They are sometimes expressed as percentages, sometimes as relative frequencies:
A(7) E(9) S(8) I(4) D(4) G(3) N(2) P(2) R(1) W(1)
We want shorter codes for A, E, S and longer codes for R, W, to minimize the overall message length.
In other words, analysis of a large body of typical text finds, for example, that E occurs 9 times as often as W.
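The statistics have to come from somewhere. A minimal sketch, assuming the sample text is available as a Python string, counts each symbol with `collections.Counter` and converts the counts to percentages. The function name `symbol_statistics` is hypothetical, and upper and lower case are counted separately.

```python
from collections import Counter


def symbol_statistics(sample: str) -> dict[str, tuple[int, float]]:
    """Map each symbol to (count, percentage of all symbols), most common first."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {sym: (n, 100 * n / total) for sym, n in counts.most_common()}


for sym, (n, pct) in symbol_statistics("banana").items():
    print(f"{sym}({n}) {pct:.1f}%")   # a(3) 50.0%, n(2) 33.3%, b(1) 16.7%
```

Applied to the slide's counts, the same conversion makes E's 9 occurrences out of 41 about 22%.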
Constructing the Code
First, combine the least frequently used symbols
R(1) + W(1) → 2
The weight (frequency) of the pair (R,W) is 2. For the next step there are two choices: join the pair (N,P), with weight 4, or join the existing (R,W) tree with N, which also gives weight 4.
N(2) + P(2) → 4
OR
(R(1) + W(1) → 2) + N(2) → 4
Either will work. The second version gets us closer to the canonical form.
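The merge rule used on these slides, always combining the two lightest trees, is the core of Huffman's algorithm. A minimal Python sketch with a `heapq` priority queue follows. `huffman_code` is a hypothetical name, left branches get 0 and right branches 1, and ties are broken by insertion order, so the individual codes may differ from the slides' hand-built ones while the total length comes out the same.

```python
import heapq
from itertools import count


def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two lightest trees."""
    tiebreak = count()  # unique counter, so ties never compare the trees themselves
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # lightest tree
        w2, _, t2 = heapq.heappop(heap)   # second lightest tree
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))

    codes: dict[str, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, tuple):       # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[tree] = prefix or "0"   # a one-symbol alphabet still needs 1 bit

    walk(heap[0][2], "")
    return codes


FREQ = {"A": 7, "E": 9, "S": 8, "I": 4, "D": 4, "G": 3, "N": 2, "P": 2, "R": 1, "W": 1}
codes = huffman_code(FREQ)
print(sum(FREQ[s] * len(codes[s]) for s in FREQ))  # 125 bits for the 41 symbols
```

Ties such as the one above ("Either will work") change which codes come out, not the total number of bits.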
Insert the next least frequently used symbol, P (weight 2). We could attach P to the weight-4 tree to get a node of weight 6, or we could join the P(2) and G(3) nodes to get a combined node of weight 5. We choose the lower weight.
(R(1) + W(1) → 2) + N(2) → 4
P(2) + G(3) → 5
Next come D and I, each of weight 4. We could add one of them to the weight-4 node in the tree for a new weight of 8, or join them to each other, also for a weight of 8. To get the canonical form, let's join D and I.
D(4) + I(4) → 8
Now we need to insert A, with weight 7. What are the options? Join 4 + 5 = 9, or A(7) + 4 = 11, A(7) + 5 = 12, A(7) + 8 = 15, or A(7) + S(8) = 15. Where shall we put the A?
Trees so far: (R(1) + W(1) → 2) + N(2) → 4; P(2) + G(3) → 5; D(4) + I(4) → 8
Our lowest possible weight is 4 + 5 = 9, so we join those trees. Then our options are to join A(7) to the 9 (weight 16) or to the 8 (weight 15), so we choose the 8.
4 (R, W, N) + 5 (P, G) → 9
(D(4) + I(4) → 8) + A(7) → 15
9 (R, W, N, P, G) + 15 (D, I, A) → 24
The 9 and 15 trees are joined into a node of weight 24. Continuing, we need to insert S(8) and E(9). We can get a weight of 17 by joining them together. We could have inserted them elsewhere, but that would have broken the canonical form.
E(9) + S(8) → 17
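Completed tree: 24 + 17 → 41 at the root; each branch is labeled 0 or 1, and reading the labels from the root down to a leaf gives that symbol's code.
41 → 0: 24, 1: 17
24 → 0: 9, 1: 15
17 → 0: S(8), 1: E(9)
9 → 0: 4, 1: 5
15 → 0: 8, 1: A(7)
4 → 0: 2, 1: N(2)
5 → 0: P(2), 1: G(3)
8 → 0: D(4), 1: I(4)
2 → 0: R(1), 1: W(1)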
Completed code
S(8) = 10
E(9) = 11
A(7) = 011
I(4) = 0101
N(2) = 0001
P(2) = 0010
G(3) = 0011
D(4) = 0100
R(1) = 00000
W(1) = 00001
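Average code length: multiply each symbol's weight by its code length (A has weight 7 and length 3, etc.), add the products, and divide by the total weight, 41:
8×2 + 9×2 + 7×3 + 4×4 + 2×4 + 2×4 + 3×4 + 4×4 + 1×5 + 1×5 = 125
125 / 41 ≈ 3.049 bits per symbol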
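The same arithmetic as a short Python check, using the code table and frequencies from the slides (the variable names are illustrative):

```python
CODE = {"S": "10", "E": "11", "A": "011", "I": "0101", "N": "0001",
        "P": "0010", "G": "0011", "D": "0100", "R": "00000", "W": "00001"}
FREQ = {"A": 7, "E": 9, "S": 8, "I": 4, "D": 4, "G": 3, "N": 2, "P": 2, "R": 1, "W": 1}

# Weighted sum: each symbol's count times the length of its code.
total_bits = sum(FREQ[s] * len(CODE[s]) for s in FREQ)
average = total_bits / sum(FREQ.values())   # divide by the 41 symbols
print(total_bits, round(average, 3))        # 125 3.049
```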
In class exercise
Working in pairs, encode a message of at least 10 letters using the code we just generated.
Do not leave any spaces between the letters in your message.
Pass your message to another team; make sure every team both gives and receives a message. Decode the message you receive.
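To check answers to the exercise, here is a minimal sketch of encoding and decoding with the code table above (the function names are hypothetical). Decoding reads bits until they match a complete codeword; that works only because of the prefix property, since no codeword is the start of another. The sample word uses only the ten coded letters.

```python
CODE = {"S": "10", "E": "11", "A": "011", "I": "0101", "N": "0001",
        "P": "0010", "G": "0011", "D": "0100", "R": "00000", "W": "00001"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(message: str) -> str:
    # No separators between letters: the prefix property makes them unnecessary.
    return "".join(CODE[ch] for ch in message)


def decode(bits: str) -> str:
    out, current = [], ""
    for b in bits:
        current += b
        if current in DECODE:        # a complete codeword: emit it and start over
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits {current!r} are not a complete codeword")
    return "".join(out)


bits = encode("DISPARAGES")
print(len(bits), decode(bits))   # 33 DISPARAGES
```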
Entropy per symbol
Entropy, E, is the average information content per symbol
The information content of a symbol, $-\log_2 p_i$ bits, grows as its probability of occurrence shrinks
$E = -\sum_{i=1}^{n} p_i \log_2 p_i$
where n is the number of symbols and $p_i$ is the probability of occurrence of the ith symbol. This is the lower bound on the average number of bits per symbol, the goal to shoot for. How well did we do in our code?
About 2.99 bits per symbol (entropy) compared to 3.049 for our code
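Evaluating the entropy formula for the slide's counts, with $p_i$ = count / 41, takes only a few lines; here is a sketch in Python with hypothetical names. It gives about 2.99 bits per symbol, so the Huffman average of 3.049 sits within about 0.06 bits of the bound.

```python
from math import log2

FREQ = {"A": 7, "E": 9, "S": 8, "I": 4, "D": 4, "G": 3, "N": 2, "P": 2, "R": 1, "W": 1}


def entropy(freqs: dict[str, int]) -> float:
    """Shannon entropy in bits per symbol, estimated from raw counts."""
    total = sum(freqs.values())
    return -sum((n / total) * log2(n / total) for n in freqs.values() if n > 0)


print(round(entropy(FREQ), 3))   # 2.993
```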
Properties of the Huffman code
Variable-length code
Prefix property: no code is a prefix of another, so coded symbols need no separators
Average bits per symbol: Huffman codes approach the theoretical limit (the entropy) for the amount of information per symbol
Static coding: the code must be known by sender and receiver and used consistently
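The prefix property can be checked mechanically. In this minimal sketch (the helper name is hypothetical), the codewords are sorted first; if one codeword is a prefix of another, it ends up directly in front of a codeword it prefixes, so only neighbours need comparing.

```python
def is_prefix_free(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of another (duplicates also fail)."""
    ordered = sorted(codewords)
    # Sorted order puts a prefix immediately before a word that extends it.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


CODE = ["10", "11", "011", "0101", "0001", "0010", "0011", "0100", "00000", "00001"]
print(is_prefix_free(CODE))                 # True
print(is_prefix_free(["0", "01", "11"]))    # False: 0 is a prefix of 01
```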
Dynamic Huffman Code
Build the code as the message is transmitted.
The code will be the best for this particular message.
Sender and receiver use the same rules for building the code.
Constructing the tree
Sender and receiver begin with an initial tree consisting of a root node and a left child holding the null character, with weight 0.
The first character is sent uncompressed and is added to the tree as the right branch from the root. The new node is labeled with the character and has weight 1; the tree branch leading to it is labeled 1.
A list shows the tree entries in order.
Example
Message to send: banana (r denotes the root of the tree)
Initial tree: *(0)
Transmit b: root r has left child *(0) and right child b(1)
Weight (1) = number of times that character has occurred so far
List version of the tree: *(0) b(1)
A new character seen
Whenever a new character appears in the message, it is sent as follows:
send the path to the empty node;
send the uncompressed representation of the new character.
Then place the new character into the tree and update the list representation.
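Before: root r has left child *(0) and right child b(1)
After sending a: root r has left child 1 (an internal node with children *(0) and a(1)) and right child b(1)
List: *(0) a(1) 1 b(1)
The null node moves down to make room for the new node as its sibling.
The list is formed by reading the tree left to right, bottom level to top level.
Message so far: ba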
Another character
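Next character: n
Tree: root r has left child 2 and right child b(1); node 2 has children 1 and a(1); node 1 has children *(0) and n(1)
List: *(0) n(1) 1 a(1) 2 b(1)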
(Note: all left branches are coded as 0 and all right branches as 1.)
The list entries are not in non-decreasing order (2 comes before b(1)). Adjust the list and show the corresponding tree.
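Adjusted list: *(0) n(1) 1 a(1) b(1) 2
Adjusted tree: root r has left child b(1) and right child 2; node 2 has children 1 and a(1); node 1 has children *(0) and n(1)
Message so far: ban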
Our first repeated character
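Next character: a (already in the tree)
Tree: root r has left child b(1) and right child 3; node 3 has children 1 and a(2); node 1 has children *(0) and n(1)
List: *(0) n(1) 1 a(2) b(1) 3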
Again there is a problem: the numbers in the list do not obey the requirement of non-decreasing order.
Adjust the list and make the tree match: *(0) n(1) 1 b(1) a(2) 2
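Adjusted tree: root r has left child a(2) and right child 2; node 2 has children 1 and b(1); node 1 has children *(0) and n(1)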
Note that the 3 changed to a 2 as a result of the tree restructuring.
Message so far: bana
Another repeat: n
List after incrementing n: *(0) n(2) 2 b(1) a(2) 3
Another misfit. Move b(1) and adjust the tree as needed:
*(0) b(1) 1 n(2) a(2) 3
The code sent for this n will be 101, corresponding to the original position of n. Then the restructuring will be done.
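Tree before restructuring: root r has left child a(2) and right child 3; node 3 has children 2 and b(1); node 2 has children *(0) and n(2)
Message so far: banan
Tree after restructuring: root r has left child a(2) and right child 3; node 3 has children 1 and n(2); node 1 has children *(0) and b(1)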
One more letter
Next character: a. This a is encoded as 0, and no restructuring of the tree is needed.
List: *(0) b(1) 1 n(2) a(3) 3
Message complete: banana
Tree: root r has left child a(3) and right child 3; node 3 has children 1 and n(2); node 1 has children *(0) and b(1)
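The slides do not name their update rule, but the lists and codes in the banana walkthrough appear to match the FGK (Faller-Gallager-Knuth) adaptive Huffman algorithm: before a node's weight is incremented, it is swapped with the highest-numbered node of equal weight, unless that node is its parent. The slides increment first and then repair the list, while FGK swaps first; for banana both give the same lists. The sketch below is an illustration under assumptions, not the slides' own procedure: `AdaptiveHuffman`, `encode` and `listing` are hypothetical names, it covers the encoder only, it returns the path bits without the uncompressed character that follows a new symbol, and it codes left branches as 0 and right branches as 1.

```python
from typing import Optional


class Node:
    """Tree node; symbol is None for internal nodes and for the null node *."""

    def __init__(self, symbol: Optional[str] = None, parent: Optional["Node"] = None):
        self.symbol, self.parent, self.weight = symbol, parent, 0
        self.left: Optional[Node] = None
        self.right: Optional[Node] = None


class AdaptiveHuffman:
    """Encoder-side sketch of FGK adaptive Huffman coding (hypothetical API).

    `order` holds every node bottom level to top, left to right, with the root
    last: the slides' list plus the root. Left branches are 0, right are 1.
    """

    def __init__(self) -> None:
        self.null = Node()                 # the *(0) node, initially the root
        self.order = [self.null]
        self.leaves: dict[str, Node] = {}

    def path(self, node: Node) -> str:
        bits = []
        while node.parent is not None:
            bits.append("0" if node.parent.left is node else "1")
            node = node.parent
        return "".join(reversed(bits))

    def _swap(self, a: Node, b: Node) -> None:
        # Exchange two subtrees, both in the tree and in the list order.
        i, j = self.order.index(a), self.order.index(b)
        self.order[i], self.order[j] = b, a
        pa, pb = a.parent, b.parent
        a_left, b_left = pa.left is a, pb.left is b
        if a_left:
            pa.left = b
        else:
            pa.right = b
        if b_left:
            pb.left = a
        else:
            pb.right = a
        a.parent, b.parent = pb, pa

    def _update(self, node: Optional[Node]) -> None:
        while node is not None:
            # Highest-numbered node with the same weight (the FGK block leader).
            leader = next(n for n in reversed(self.order) if n.weight == node.weight)
            if leader is not node and leader is not node.parent:
                self._swap(node, leader)
            node.weight += 1
            node = node.parent

    def encode(self, symbol: str) -> tuple[str, bool]:
        """Return (path bits, is_new). A new symbol's raw form would follow."""
        if symbol in self.leaves:
            leaf = self.leaves[symbol]
            bits = self.path(leaf)         # path is sent before the tree changes
            self._update(leaf)
            return bits, False
        bits = self.path(self.null)
        old = self.null                    # the null node becomes an internal node
        old.left, old.right = Node(parent=old), Node(symbol, parent=old)
        self.order[0:0] = [old.left, old.right]   # new nodes get the lowest numbers
        self.null, self.leaves[symbol] = old.left, old.right
        self._update(old.right)
        return bits, True

    def listing(self) -> str:
        """The slides' list: every node except the root."""
        parts = []
        for n in self.order[:-1]:
            if n is self.null:
                parts.append(f"*({n.weight})")
            elif n.symbol is not None:
                parts.append(f"{n.symbol}({n.weight})")
            else:
                parts.append(str(n.weight))
        return " ".join(parts)


coder = AdaptiveHuffman()
for ch in "banana":
    bits, is_new = coder.encode(ch)
    print(ch, repr(bits), "new" if is_new else "seen")
print(coder.listing())
```

Traced by hand, the paths come out as '', 0, 00, 11, 101 and 0, so the repeated n is sent as 101 and the final a as 0, as on the slides, and the final list is *(0) b(1) 1 n(2) a(3) 3. The same class can be used to check the Tennessee exercise.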
In class exercise
Create the dynamic Huffman code for the message "Tennessee".
Summary
Compression seeks to minimize the amount of data transmitted by making efficient representations for the data.
Static compression keeps the same codes and depends on the distribution of characters staying consistent from message to message.
Dynamic compression adjusts as it works to allow the most efficient compression for the current message.
Some extra resources
Huffman coding resources: http://www.dogma.net/DataCompression/Huffman.shtml